Digital Image Processing Book
Week 3
11 Image Formation - I 327
12 Image Formation - II 340
13 Image Geometry - I 369
14 Image Geometry - II 402
15 Stereo Imaging Model - II 437
Week 4
16 Interpolation and Resampling 467
17 Interpolation Techniques 478
18 Interpolation with examples - I 494
19 Interpolation with Examples - II 504
20 Image Transformation - I 519
Week 5
21 Image Transformation - II 531
22 Separable Transformation 540
23 Basis Images 549
24 Fourier Transformation 560
25 Properties of FT 576
Week 6
26 FT Result Display 584
27 Rotation Invariance Property 594
28 DCT and Walsh Transform 604
29 Hadamard Transformation 618
30 Histogram Equalization and Specifications - I 630
Week 7
31 KL-Transform 641
32 Image Enhancement Point Processing Techniques 652
33 Contrast Stretching Operation 665
34 Histogram Equalization and Specification - I 678
35 Histogram Equalization and Specification - II 688
Week 8
36 Histogram Implementation - I 700
37 Histogram Implementation - II 710
38 Image Enhancement Mask Processing Techniques - I 726
39 Image Enhancement Mask Processing Techniques - II 735
40 Image Enhancement Mask Processing Techniques - III 749
Week 9
41 Frequency Domain Processing Techniques 764
42 Image Restoration Techniques - I 787
43 Image Restoration Techniques - II 799
44 Estimation of Degradation Model and Restoration Techniques - I 808
45 Estimation of Degradation Model and Restoration Techniques - II 822
Week 10
46 Other Restoration Techniques - I 833
47 Other Restoration Techniques - II 852
48 Image Registration - I 866
49 Image Registration - II 880
50 Colour Image Processing Colour Fundamentals 895
Week 11
51 Colour Model 912
52 Conversion of one color model to another - I 921
53 Conversion of one color model to another - II 933
54 Pseudo color image processing 947
55 Full color image processing 960
Week 12
56 Different Approaches for Image Segmentation 977
57 Image Segmentation Global Processing (Hough Transform) 1000
58 Region based Segmentation Operation. Thresholding Techniques 1019
59 Region Splitting and Merging Technique 1041
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module Number 01 Lecture Number 01
Introduction to Digital Image Processing
(Refer Slide Time 00:17)
Now in today's lecture we will have an introduction to the various image processing
techniques and their applications, and in subsequent lectures we will go into the details of
different image processing algorithms. To start with, let us see what digital image processing
means.
So if you just look at this name, digital image processing, you will find that there are 3 terms.
(Refer Slide Time 00:52)
The first one is processing, then image, then digital; so digital image processing means
processing of images, which are digital in nature, by a digital computer. Before we come to the
other details, let us see why we need to process images at all.
So you will find that digital image processing techniques are motivated by two major
applications. The first application is improvement of pictorial information for human
perception. This means that whatever image we get, we want to enhance its quality so that the
image looks better to a human observer. The second important application of digital image
processing techniques is for autonomous machine applications. This has various applications in
industry, particularly for quality control, in assembly automation and many such applications.
We will look at them one after another.
And of course, there is a third application, which is efficient storage and transmission. Say, for
example, we want to store an image on a computer; then this image will need a certain amount
of disk space. We will look at whether it is possible to process the image, using certain image
properties, so that the disk space required to store it is reduced. Not only that, we may also want
to transmit the image or the video signal over a transmission medium, and if the bandwidth of
the transmission medium is very low, we will see how to process the image or the video so that
it can be transmitted over low-bandwidth communication channels.
So let us first look at the first major application, which is meant for human perception.
(Refer Slide Time 03:23)
Now these methods mainly employ different image processing techniques to enhance the
pictorial information for human interpretation and analysis. A typical application of these
kinds of techniques is noise filtering.
In some cases, the images that we get may be very noisy, so we have to filter those images so
that the noise present in them is removed and the image appears much better. In some other
kinds of applications, we may have to enhance certain characteristics of the image. So among
the different kinds of applications under this category,
(Refer Slide Time 04:10)
one is contrast enhancement. Sometimes the image may have very poor contrast
(Refer Slide Time 04:14)
and we have to enhance the contrast of the image so that it is visually better. In some other
cases, the image may be blurred. This blurring may occur for various reasons: maybe the
camera setting is not proper, or the lens is not focused properly, and that leads to one kind of
blurring. The other kind of blurring can occur if we take a picture from a moving platform, say
from a moving car or a moving train; in that case also, you might have observed that the image
you get is not a clear image but a blurred one. So we will look at whether image processing
techniques can help to rectify such images. The other kind of application is remote sensing,
where the images we deal with are aerial images, and in most cases the aerial images are taken
from a satellite.
Now let us look at different examples under these different categories.
(Refer Slide Time 05:18)
Here the first image that is shown in this slide is a noisy image, and this kind of image is quite
common on a TV screen. Now you will find that digital image processing techniques can be
used to filter these images; the filtered image is shown on the right-hand side, and you will find
that it looks much better than the noisy image shown on the left side.
(Refer Slide Time 05:54)
(Refer Slide Time 05:58)
you will find that again on the left hand side, we have an image and on the right hand side we
have the corresponding image which is processed to enhance its contrast. If you compare
these two images, you will find that in the low contrast image there are many details which are
not clearly visible, for example the waterline of the river. If you look at the right image, which
is the processed and enhanced version of the low contrast image, you will find that the
waterlines of the river are clearly visible. So
here after processing we have got an image which is visually much better than the original
low contrast image.
This is another example of image enhancement. Here on the left hand side, we have a low
contrast image but in this case it's a color image. And I am sure that none of you will like to
have a color image of this form. On the other hand, on the right hand side we have the same
image which is enhanced by using the digital image processing techniques and you will find
that in this case again, the enhanced image is much better than the low contrast image.
I talked about the other kind of enhancement, where we said that in some applications the
image may be blurred. So here, on the top row on the left side, you find an image which is
blurred, and in this case the blurring has occurred because of a defocused lens: when the image
was taken, the lens was not focused properly. We have also talked about another kind of
blurring, which occurs if you take a picture from a moving platform, maybe from a moving
train or a moving car. In such cases, the type of image that we usually get is the kind of image
shown on the right-hand side of the top row, and you find that this kind of blurring is mostly
motion blurring. The third image, on the bottom row, shows the processed image, where by
processing these different blurred images we have been able to improve the quality of the
image.
(Refer Slide Time 08:28)
Another major application of digital image processing techniques is in the area of medicine. I
am sure that many of you must have seen CT scan images, where images of the human brain
are formed by CT scan machines. Here one slice of a CT scan image is shown, and the image
is used to determine the location of a tumor. So you find that the left-hand image
is the original CT scan image, and the middle image and the image on the right-hand side are
the processed images.
(Refer Slide Time 09:12)
So here, in the processed image, the yellow and red regions indicate the presence of the tumor
in the brain. Now these kinds of images and image processing techniques are very important in
medical applications, because through these processing techniques the doctors can find out the
exact location of the tumor, the size of the tumor and many other things which can help the
doctor to plan the operation. And obviously this is very important because in many cases it
saves lives.
This is another application of image processing techniques in the medical field, where we have
shown some mammogram images which show the presence of cancerous tissue. So image
processing in the medical field is very helpful for detecting the formation of cancers.
This shows another kind of imaging which is very popular, and I believe most of you have
heard the name ultrasonography. So here we have shown two ultrasonic images
which are used to study the growth of a baby while the baby is in the mother's womb. And
this also helps the doctor to monitor the health of the baby before the baby is actually born.
(Refer Slide Time 10:43)
The image processing techniques are also very, very important for remote sensing
applications. So here, this is an image, a satellite image which is taken over the region of
Kolkata, and you can see a lot of information present in the image. The thick blue line shows
the river Ganges, and different color codings are used to indicate different regions. When we
have a remote sensing image, an aerial image of this form, taken from a satellite, we can study
various things. For example, we can study whether the river has changed its path, we can study
the growth of vegetation over a certain region, and we can study whether there is any pollution
in some region of that area.
So these are various applications of remote sensing images. Not only that, such remote sensing
or aerial images can also be used for planning a city. Suppose we have to build a city over a
certain region; then through these aerial images we can study the nature of the region over
which the city has to be built, and from this one can determine where the residential area should
come up, where an industrial area should come up, through which regions the paths are to be
laid out and the roads constructed, where a car parking region can be built, and all those things
can be planned if you have an aerial image like this.
(Refer Slide Time 12:32)
Here is another application of remote sensing images. So here you find that the remote
sensing images are used for terrain mapping. So this shows the terrain map of a hilly region
which is not accessible very easily. What we can do is get images of that inaccessible region
through the satellite and then process those images to find out the 3D terrain map; this
particular image shows such a terrain map of a hilly region.
(Refer Slide Time 13:09)
This is another application of the remote sensing images. Here you find that this particular
satellite image shows a fire which took place in Borneo. These kinds of images are useful to
find out the extent of the fire or the direction in which the fire is moving. Once you identify
that, you can determine the loss that has been caused in the wake of this fire; and not only that,
if we can find out the direction in which the fire is moving, we can warn people well in advance
so that precautionary action can be taken and many lives as well as property can be saved.
The image processing technique is also very important for weather forecasting. I am sure that
whenever you watch the news on a television channel and the weather forecast is given, some
images are overlaid on a map which tell you the cloud formation over certain regions. That
gives you an idea of whether there is going to be some rain, some storm and things like that.
This is an image which shows the formation of the hurricane Dennis, which happened in 1990,
and through this image we can find out the extent of the hurricane and the strength of this
hurricane, and decide what precautionary measures can be taken beforehand to save lives as
well as property.
Image processing techniques are also very useful for atmospheric study.
(Refer Slide Time 14:55)
So if you look at this image, you will find that what is shown in the center part is the formation
of an ozone hole. Many of you know that the ozone layer is very important for us because it
forms a protective layer over our atmosphere, and because of this protective ozone layer many
of the unwanted rays from the sun cannot enter the earth's atmosphere; by that our health is
protected. Whenever such an ozone hole forms, it indicates that all those unwanted rays can
reach the earth's surface through that hole. So in the region over which such an ozone hole is
formed, people have to take precautionary measures to protect themselves against such
unwanted radiation. So this is also very important: such image processing techniques are very
useful for atmospheric study.
Image processing techniques are also important for astronomical studies. For example, this
particular case shows the image of a star formation process.
Again, the next image shows the image of a galaxy. So you find that the applications of image
processing techniques are becoming virtually unlimited; they are applied in various fields for
various purposes.
Next we come to the other domain of application of image processing techniques, which is
machine vision applications. In all the earlier applications which we have shown, the purpose
was visualization, the improvement of the visual quality of the image so that it becomes better
for human perception.
(Refer Slide Time 16:54)
Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module Number 01 Lecture Number 02
Application of Digital Image Processing
In machine vision applications, on the other hand, the purpose is to extract some description or
some features which can be used for further processing by a digital computer. Such processing
can be applied in industrial machine vision for product assembly and inspection. It can be used
for automated target detection and
tracking. This can be used for fingerprint recognition. This can also be used for processing of
aerial and satellite images, for weather prediction, crop assessment and many other
applications.
So let us look at these different applications one after another. This shows an application of
automation of a bottling plant. What the plant does is fill some liquid, some chemical, into
bottles; after a bottle is filled up, the bottles are carried away by conveyor belts, after which
they are packed and finally sent to the customers. So here, checking the quality of the product
is very important, and in this particular application the quality of the product indicates whether
the bottles are filled properly or whether some bottles are coming out empty or partially filled.
Naturally, if we can find out that some bottles are partially filled or empty, then we do not want
those bottles to be delivered to the customer, because if the customer gets such bottles then the
goodwill of that company would be lost. So detection of empty bottles or partially filled bottles
is very important, and here image processing techniques can be used to automate this particular
process. So here we have shown an image, a snapshot of this bottling process, where you find
that
(Refer Slide Time 02:28)
there are some bottles which are completely filled up and one bottle in the middle which is
partially filled. So naturally we want to detect this particular bottle and remove it from the
production line, so that finally when the bottles go to the customer, no empty or partially filled
bottles are delivered.
(Refer Slide Time 02:53)
The next application that we will look at is the use of boundary information in machine vision,
for machine vision purposes. Now before I go to that application, I have
shown an image to highlight the importance of boundary information in image processing. So
here you find we have shown the boundary image of an animal. There is no other information
available in this image except the boundary contours. And if I ask you whether you can identify
this particular animal, I am sure that all of you will identify it to be a giraffe. So even though we
don't have any other information except the boundary, or the border, of the giraffe, we have
still been able to identify this particular animal. In many cases, or in most cases, the boundaries
contain most of the information about the objects present in the scene, and using this boundary
information we can develop various applications of image processing techniques. Here is an
application.
So this is again an automated inspection process, and here the objects that we are interested in
inspecting are some refractory bricks. Here we have shown 4 different images. The first one is
the original image of the refractory brick, as captured by the camera. The second one is what
we call a thresholded image or a segmented image; we will come to the details of this later.
There we have been able to identify which regions actually belong to the object and which
regions belong to the background. Naturally, when we are interested in inspecting this
particular object, we will not be interested in the background region; what we will be interested
in is the region that belongs to that particular object. So this background and object separation
process is very important in all these kinds of applications. The third image, the left one on the
bottom, is a filled image: you will find that the second image is not very smooth, with a number
of black patches over the white region, and the third one has filled up all those black patches,
so it shows the profile of the object, the 2D projection of the object that we can get. And the
fourth image shows the boundary of this object. Using this boundary information we can
inspect various properties of this particular object. For example, in this particular application,
there can be two different types of defects.
(Refer Slide Time 05:50)
One kind of defect is a structural defect. When we say structural defect, by structure what I
mean is the dimension of every side of the object and the angle at every corner of the object;
this is the structural information of that particular object. The other kind of inspection that we
are interested in is of the surface characteristics of this particular object: whether the surface is
uniform or non-uniform. So let us see how these inspections can be made.
(Refer Slide Time 06:28)
So here, in the first image, what we have done is process the boundary image in such a way
that, since there are 4 different boundary regions, we have fitted 4 different straight lines; these
4 straight lines tell you what the ideal boundary of the object should be. Once we get these 4
straight lines, we can find out their points of intersection, and we know that in the ideal
situation those points of intersection are actually the locations of the corners of the object. So
you find that in the first image there are 4 white dots which indicate the corners of the object.
Once we have this information, the corners of the object and the boundary lines of the object,
we can find out the dimension, or the length, of each and every side of the object. We can also
find out the angle subtended at every corner of the object, and if we compare the information
obtained through image processing with the information already stored in the database for this
particular object, we can find out whether the dimensions that we have got are within the
tolerable limit or not. So if it is within the tolerable limit
(Refer Slide Time 08:04)
then we can accept the object. If it is not within the tolerable limit then we will not accept the
object.
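As an illustration of the corner and tolerance check just described, here is a minimal Python sketch (not the system used in the lecture): each fitted boundary line is written as a*x + b*y = c, two lines are intersected by solving a 2 x 2 linear system, and one side length is compared with a stored nominal dimension. All coefficients, the nominal length and the tolerance are hypothetical values.

import numpy as np

def intersect(l1, l2):
    # Intersection of two lines given as (a, b, c) with a*x + b*y = c
    A = np.array([l1[:2], l2[:2]], dtype=float)
    c = np.array([l1[2], l2[2]], dtype=float)
    return np.linalg.solve(A, c)                # corner coordinates (x, y)

def side_ok(corner1, corner2, nominal_len, tol):
    # Accept a side if its measured length is within +/- tol of the nominal value
    measured = np.linalg.norm(corner2 - corner1)
    return abs(measured - nominal_len) <= tol

# Hypothetical fitted boundary lines of a rectangular brick (pixel units)
top   = (0.0, 1.0,  10.0)     # y = 10
left  = (1.0, 0.0,  20.0)     # x = 20
right = (1.0, 0.0, 220.0)     # x = 220

c1 = intersect(top, left)     # corner near (20, 10)
c2 = intersect(top, right)    # corner near (220, 10)
print(side_ok(c1, c2, nominal_len=200.0, tol=2.0))   # True: side is within tolerance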
(Refer Slide Time 08:13)
you will find that there are two different corners, the corner on the right-hand side and the
corner on the left-hand side, which are broken. Not only that, on the left-hand side, if you look
at the middle, you can identify that there is a certain crack. These are also defects of this
particular object, and through image processing techniques we are interested in identifying
these defects. Now let us see how these defects are identified. Here again, in the first image,
once we have got the ideal boundary and the ideal corners of the object, we can fill up the
region bounded by these 4 different edges to get an ideal projection of the object. So the second
image in this particular slide shows the ideal projection; the third image shows the actual
projection that has been obtained through image processing. Now if we take the difference of
this ideal projection and the actual projection, then we can identify these defects.
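A minimal sketch of this projection-difference idea, assuming the ideal and the actual projections are available as binary masks of the same size (the sizes and regions below are made up):

import numpy as np

# Hypothetical binary masks: 1 = object pixel, 0 = background
ideal = np.zeros((120, 240), dtype=np.uint8)
ideal[10:110, 20:220] = 1                 # ideal projection filled in from the fitted corners
actual = ideal.copy()
actual[10:30, 200:220] = 0                # a broken corner in the actual projection

defects = np.logical_xor(ideal, actual)   # pixels where the two projections disagree
print(defects.sum(), "defective pixels")  # area of the broken region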
So you find that in the fourth image the two broken corner regions have been represented by
white patches; also on the left-hand side, in the middle, you can see that the crack is identified
as well. So these image processing techniques can be used for inspection of industrial objects
like this. And as we mentioned,
(Refer Slide Time 09:51)
the other application or the other kind of inspection that we are interested in is the surface
characteristics, whether the surface is uniform, or the surface is non-uniform.
(Refer Slide Time 10:01)
So when we want to find out or study the surface characteristics, the type of processing
techniques which will be used is called texture processing. And this one shows that
(Refer Slide Time 10:14)
the surface of the object is not really uniform; rather, it contains 2 different textures, and in the
right image those 2 textures are indicated by 2 different gray shades.
(Refer Slide Time 10:31)
It shows the application of image processing techniques for automated inspection in other
applications.
For example, the inspection of Integrated Circuits during the manufacturing phase. Here you
find that in the first image there is a broken bond, whereas in the second image some bond is
missing which should have been there. So naturally these are the defects which are to be
identified because otherwise, if this IC is made, then the IC will not function properly.
(Refer Slide Time 11:10)
So those are machine-vision applications, used for automating some operations; in most cases,
they are used for automating the inspection process or the assembly process.
Now we have another kind of application: processing a sequence of images, known as a video
sequence. A video sequence is nothing but different image frames displayed one after another.
So when the image frames are displayed one after another, if there is any movement in the
scene, that movement is clearly detected. The major emphasis in image sequence processing is
therefore to detect the moving parts. This has various applications, for example detection and
tracking of moving targets; a major application is in security surveillance. Another application
can be to find the trajectory of a moving target. Also, monitoring the movement of organ
boundaries in medical applications is very important.
(Refer Slide Time 12:24)
Here you find that in the first image some person is moving against a green background. So
let us see this.
(Refer Slide Time 12:43)
(Refer Slide Time 12:55)
So through image processing techniques we can identify this movement. So, in the second
processed sequence you find
(Refer Slide Time 13:09)
the moving person shown against a black background. That means we have been able to
separate the background from
the moving object.
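A very simple version of this background separation can be sketched as differencing each frame against a static background; this is only an illustration of the idea, not the method used in the demonstration, and the frames below are synthetic:

import numpy as np

def moving_mask(frame, background, threshold=25):
    # Mark pixels whose gray value differs noticeably from the static background
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)       # 1 = moving pixel, 0 = background

background = np.full((240, 320), 120, dtype=np.uint8)   # hypothetical empty scene
frame = background.copy()
frame[100:180, 150:200] = 40                             # a darker "person" region
mask = moving_mask(frame, background)
print(mask.sum(), "pixels flagged as moving")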
(Refer Slide Time 13:18)
Now this particular application which has been shown, here the image was taken or the video
sequence was taken in broad daylight. But in many applications, particularly for
security applications, the images are to be taken
(Refer Slide Time 13:35)
during the night also, when there is no sunlight. Then what kind of image processing
techniques can be used for such
surveillance applications? So here we have shown an image, or rather a sequence,
(Refer Slide Time 13:51)
which is taken during the night, and the kind of imaging that we have to use is not ordinary
optical imaging; here we have to go for infrared imaging or thermal imaging. So this particular
sequence is a thermal sequence. Again, here you find that a person is moving against a still
background; if you just concentrate on this region you find that the person is moving. And
again, through image processing techniques, we have identified just this moving person against
the still background. So here you find that the person is moving and the background is
completely black. So these kinds of image processing
techniques can also be used for video sequence processing. Now let us look at another
application of image processing techniques.
Here we have a moving target, say like this. And our interest is
(Refer Slide Time 15:11)
to track this particular target. That is, we want to find out the trajectory that this particular
moving object is following. So what we will do is highlight the particular point that we want to
track. I make a window like this, so that the window covers the region that I want to track.
After making that window, I make a template of the object region within this window. After
making the template, I go for tracking this particular object. So you will find, again in this
particular application, that the object is being tracked in this video sequence. Just look at this.
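The window-and-template idea can be illustrated with a brute-force template matcher. The sketch below scores candidate positions with a sum of squared differences, which is only one possible matching criterion and not necessarily the one used in this demonstration; the frames are synthetic:

import numpy as np

def match_template(frame, template):
    # Exhaustive search: return the top-left corner where the template fits best (smallest SSD)
    fh, fw = frame.shape
    th, tw = template.shape
    best, best_pos = np.inf, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw].astype(float)
            ssd = np.sum((patch - template.astype(float)) ** 2)
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos

frame1 = np.random.randint(0, 256, (60, 80)).astype(np.uint8)
template = frame1[20:30, 30:45]                       # window drawn around the target
frame2 = np.roll(frame1, shift=(3, 5), axis=(0, 1))   # the target has moved by (3, 5) pixels
print(match_template(frame2, template))               # prints (23, 35): the new location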
(Refer Slide Time 16:00)
So you find that over the sequence the object is changing its shape. But even after the shape has
changed, we have been able to track this particular object. But when the shape has changed so
much
(Refer Slide Time 16:23)
that the object cannot be matched any further, it indicates poor detection. So what is the
application of this kind of image processing techniques? Here the application is, if I track this
moving object using a single camera, then with the help of that single camera I can find out the
azimuth and elevation of that particular object with respect to a certain reference coordinate
system. If I track the object with 2 different cameras and find out the azimuth and elevation
with the help of both cameras, then I can identify the x, y, z coordinates of that object with
respect to that 3D coordinate system. And by locating those positions in different frames, I can
find out which path the object follows over time, and I can determine the trajectory that the
moving object follows. So these are the different applications of video sequence processing.
(Refer Slide Time 17:40)
The next domain is image compression, and in compression what we want is to process the
image to reduce the space required to store it, or, if we want to transmit the image, to be able to
transmit it over a low-bandwidth channel. Now let us look at this image, and in particular at the
blue circular region. You will find that in this particular region the intensity of the image is
more or less uniform. That is, if I know the intensity of the image at a particular point, I can
predict the intensity of its neighboring points. If that prediction is possible, then it can be
argued: why do we have to store all those image points? Rather, I store one point and, since its
neighborhood can be predicted, I just mention the prediction mechanism; then the same
information can be stored in much less space. Now look at the second region. Here again, in
most of the region the intensity is more or less uniform, except in certain areas like the eye, the
hat boundary, the hair and things like that. So these are the kinds of things which are known as
redundancy.
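Pixel redundancy and the prediction idea can be illustrated on a single image row. The sketch below uses simple previous-pixel (differential) prediction, which is only one possible prediction mechanism; the pixel values are made up:

import numpy as np

# One hypothetical image row with a nearly uniform region (8-bit values)
row = np.array([120, 121, 121, 122, 122, 122, 121, 200, 60, 61], dtype=np.int16)

# Predict each pixel from its left neighbor and keep only the prediction error
residual = np.diff(row)                  # [1, 0, 1, 0, 0, -1, 79, -140, 1]

# In smooth regions the residuals are tiny, so they can be coded with far fewer bits
# than the raw 8-bit values, and the original row is recovered exactly:
recovered = np.cumsum(np.concatenate(([row[0]], residual)))
print(np.array_equal(recovered, row))    # True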
(Refer Slide Time 19:13)
So whenever we talk about an image, the image usually shows 3 kinds of redundancies. The
first kind of redundancy is called a pixel redundancy which is just shown here.
The second kind of redundancy is called a coding redundancy and the third kind of
redundancy is called a psychovisual redundancy. So these are
(Refer Slide Time 19:33)
the 3 kinds of redundancy which are present in an image. So whenever we talk about an image,
the image contains two types of entities: the first one is the information content of the image,
and the second one is the redundancy, of these three different kinds. So what is done for image
compression is that we process the image and try to remove the redundancy present in the
image, retaining only the information present in the image. So if
we retain only the information, then obviously the same information can be stored using much
less space. The first application of this is reduced storage: as I have already mentioned, if I
want to store this image on a hard disk, or if I want to store a video sequence on a hard disk,
then the same image or the same digital video can be stored in much less space.
(Refer Slide Time 20:39)
The other application is reduced bandwidth. That is, if I want to transmit this image over a
communication channel, or if I want to transmit the video over a communication channel, then
the same image or the same video will take up much less bandwidth of the communication
channel. Now, given all these applications, let us see
(Refer Slide Time 21:02)
what we get after compression. So here we have the first image, which is the original image;
the second one shows the same image, but here it is compressed 55 times. If I compare the first
image and the second image, I find the visual quality of the two images is almost the same; at
least visually we cannot see much difference. Whereas if you look at the third image, which is
compressed 156 times, and compare it with the original image, you will find that in the third
image there are a number of blocked regions, called blocking artifacts, which are clearly visible
when you compare it with the original image. The reason is, as we said, that the image contains
information as well as redundancy; if I remove only the redundancy and retain the information,
then the reconstructed image does not look much different from the original image. But there is
another kind of compression technique, called lossy compression.
(Refer Slide Time 22:12)
In the case of lossy compression, what we remove is not only the redundancy but also some of
the information, chosen so that after removing it the quality of the reconstructed image is still
acceptable. In such cases, because we are removing some of the information present in the
image, the quality of the reconstructed image will naturally not be the same as that of the
original image. So there will be some loss or some distortion, and this is taken care of by what
is called the rate distortion theorem.
Now if I just compare the space requirements of these 3 images: if the original image is of size,
say, 256 by 256 bytes, that is 64 kilobytes, the second image, which is compressed 55
times, will take something around 10 kilobytes. So you see the difference: the original image
takes 64 kilobytes, the second one takes something around 10 kilobytes, whereas the third one
will take around 500 bytes or even less than 500 bytes. So you will find how much reduction in
the space requirement we can achieve by using these image compression techniques.
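As a back-of-envelope check of these storage figures, assuming one byte per pixel for the 256 x 256 image:

# Rough storage arithmetic for a 256 x 256, 8-bit (1 byte per pixel) image
width, height, bytes_per_pixel = 256, 256, 1
original = width * height * bytes_per_pixel        # 65,536 bytes = 64 kilobytes

def compressed_size(original_bytes, ratio):
    # Approximate size after compressing by the given ratio (156 means 156:1)
    return original_bytes / ratio

print(original)                         # 65536
print(compressed_size(original, 156))   # ~420 bytes, consistent with "around 500 bytes or less"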
Now let us look at some history of image processing. Though the application of image
processing has become very popular over the last one or two decades, the concept of image
processing is not that young. In fact, as early as the 1920s image processing techniques were
being used, and in those days digital images were used to transmit newspaper pictures between
London and New York. These digital pictures were carried by submarine cables, in a system
which was known as the Bartlane system. Now, when you transmit these digital images via a
submarine cable, then obviously on the transmitting side you have to have a facility for
digitization of the image, and similarly on the receiving side a facility for reproduction of the
image. In those days, on the receiving side, the pictures were reproduced by telegraphic
printers, and here you find a particular picture which was reproduced by a telegraphic printer.
(Refer Slide Time 25:21)
Next, in 1921, there was an improvement in the printing procedure. In the earlier days, images
were reproduced by telegraphic printers; in 1921 the photographic process of picture
reproduction was introduced. In this case, at the receiver, instead of using the telegraphic
printer, the codes of the digital images were perforated on a tape and photographic printing was
carried out using those tapes. So here you find that there are 2 images: the second image is the
image that we showed in the earlier slide, and the first image is the one produced using this
photographic printing process. The improvement both in terms of tonal quality and in terms of
resolution is quite evident: if you compare the first image and the second image, the first image
appears much better than the second.
(Refer Slide Time 26:33)
Now the Bartlane system that I mentioned, which was used during the 1920s, was capable of
coding 5 distinct brightness levels. This was increased to 15 levels by 1929; so here you find an
image with 15 different intensity levels, and the quality of this image is better than the quality
of the image produced by the earlier system. Now, from 1929,
(Refer Slide Time 27:08)
for the next 35 years, researchers paid their attention to improving the image quality, or the
reproduction quality.
In 1964, image processing techniques were used at the Jet Propulsion Laboratory to improve
the pictures of the moon transmitted by Ranger 7. We can say this is the time from which
digital image processing techniques got a boost, and this is considered to be the basis of
modern image processing techniques. Now, given the applications as well as
(Refer Slide Time 27:58)
the history of image processing techniques, let us now see how an image is to be represented in
a digital computer. This representation is very important because unless we are able to
represent the image in a digital computer, we obviously cannot process it. So here we have
shown an image, and a particular point (x, y) in the image; conventionally, the x coordinate is
taken vertically downwards and the y axis is taken horizontally towards the right. If I look at
this image, it is nothing but a two-dimensional intensity function, represented by f(x, y). Now if
at any particular point (x, y) we find the intensity value, represented by f(x, y), this f(x, y) is
nothing but the product of two terms: one term is r(x, y) and the other term is i(x, y). This
r(x, y) is the reflectivity of the surface at the corresponding image point.
After all, how do we get an image, or how can we see an object? There must be some light
source; if I take an image in daylight, this light source is usually the sun. The light from the
light source falls on the object surface, gets reflected, reaches our eye, and only then can we see
that particular object. So this r(x, y) represents the reflectivity of the point on the object surface
from where the light gets reflected and falls on the image plane, and this i(x, y) represents the
intensity of the incident light. If I take the product of the reflectivity and the intensity, these two
terms together give the intensity at a particular point in the image.
Now if this is an analog image, how many points can we have in this image? Obviously there is
information at every possible point, both in the x direction and the y direction; that means there
are an infinite number of points in this image. And at every point the intensity value is also
continuous between some minimum and some maximum; theoretically, the minimum value can
be 0 and the maximum value can be infinite. So can we represent or store such an image in a
digital computer, where we have an infinite number of points and infinitely many possible
intensity values? Obviously not! So what we have to do is go for some processing of this image
(Refer Slide Time 31:13)
and what we do is, instead of storing the intensity values at all possible points in the image, we
take samples of the image on a regular grid. So here a grid is superimposed on this particular
image, and we take image samples at the various grid points. So the first step that we need for
representation of an image in a digital computer is spatial discretization by a grid. Once we get
these sample values, the value of each particular sample is again continuous, so it can assume
any of infinitely many possible values, which again cannot be represented in a digital computer.
So after sampling, the second operation that we have to do is discretization of the intensity
values of the different samples, the process which is called quantization.
(Refer Slide Time 32:14)
So effectively what we need is that an image be represented by a matrix like this. This is a
matrix of finite dimension: it has m rows and n columns. Typically, for image processing
applications, the image size used is either 256 x 256 elements, 512 x 512 elements, 1 K x 1 K
elements and so on. Each of the elements in this matrix representation is called a pixel or a pel.
Now, coming to quantization of these matrix elements: each of these locations represents a
particular grid location where I have stored a particular sample value. Each of these sample
values is quantized, and typically for image processing applications the quantization is done
using 8 bits for a black and white image and 24 bits for a color image, because in the case of
color there are 3 color planes, red, green and blue, and if I use 8 bits of quantization for each
plane, that gives 24 bits for the representation of a digital color image.
So here we show an example: given this image, if I take a small rectangular area somewhere
here, then the intensity values of that rectangular area are given by a matrix like this.
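A minimal sketch of this matrix representation, using NumPy arrays with arbitrary sizes and values:

import numpy as np

# A digitized grayscale image: an M x N matrix of 8-bit samples (values 0..255)
gray = np.zeros((256, 256), dtype=np.uint8)
gray[100:120, 80:160] = 200              # a bright rectangular patch

# A small rectangular area of the image is itself a small matrix of integer pixel values
print(gray[98:103, 78:83])

# A color image adds a third axis: 3 planes (R, G, B), 8 bits each = 24 bits per pixel
color = np.zeros((256, 256, 3), dtype=np.uint8)
print(color.shape)                       # (256, 256, 3)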
(Refer Slide Time 33:54)
Now let us see what the steps in digital image processing are. Obviously, the first step is image
acquisition. The next step after acquisition is some kind of processing known as preprocessing,
which takes care of removing the noise, enhancing the contrast and so on. The third operation is
segmentation, that is, partitioning an input image into its constituent parts or objects. This
segmentation is also responsible for separating the object points from the background points.
(Refer Slide Time 34:32)
After segmentation, the next step is to extract some description of the image objects which is
suitable for further computer processing. So these steps
(Refer Slide Time 34:45)
are mainly used for machine vision applications. Then we have to go for recognition: once we
get the descriptions of the objects, from those descriptions we have to interpret or recognize
what the object is. And the last step is
(Refer Slide Time 35:01)
the knowledge base; the knowledge base helps with efficient processing as well as with
inter-module cooperation among all the previous processing steps.
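As a rough sketch of how these steps chain together, the pipeline can be written as a sequence of placeholder functions; the names and return values below are purely illustrative stand-ins, not an actual implementation, and in a real system each stage would also consult the knowledge base:

def acquire():             return "raw image"         # image acquisition
def preprocess(image):     return image               # noise filtering, contrast enhancement
def segment(image):        return ["object region"]   # partition into constituent parts
def describe(regions):     return ["features"]        # descriptions suited to further processing
def recognize(features):   return "interpretation"    # recognize and interpret the objects

result = recognize(describe(segment(preprocess(acquire()))))
print(result)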
(Refer Slide Time 35:12)
So here we have shown all those different steps with the help of a diagram where the first
step is image acquisition, the second step is preprocessing, then go for segmentation, then go
for representation and description and finally go for recognition and interpretation and you
get the image understanding result. And at the core of the system, we have shown a
knowledge base and here you find that the knowledge base has a link with all these different
modules, so the different modules can take help of the knowledge base for efficient
processing as well as for communicating or exchanging the information from one module to
another module. So these are the different steps
which are involved in digital image processing techniques and in our subsequent lectures we
will elaborate on these different processing steps one after another. Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 01 Lecture Number 03
Image Digitization, Sampling, Quantization and Display
We will also talk about what is meant by signal bandwidth. We will talk about how to select
the sampling frequency of a given signal, and we will also see the image reconstruction
process from the sampled values.
(Refer Slide Time 00:42)
So in today's lecture, we will try to find out the answers to 3 basic questions. The first
question is why do we need digitization? Then we will try to find the answer to what is meant
by digitization, and thirdly we will see how to digitize an image.
So let us take these up one after another. First, let us see why image digitization is
necessary.
(Refer Slide Time 01:17)
In this slide we have shown an image, the image of a girl, and as we indicated in our
introductory lecture, an image can be viewed as a two-dimensional function of the form f(x, y).
Now this image has a certain length and a certain height. The image shown here has a length of
L, where L is in units of distance or length; similarly, the image has a height of H, which is also
in units of distance or length. Any point in this two-dimensional space is identified by the
image coordinates x and y. Conventionally, as we have said, the x axis is taken vertically
downwards and the y axis is taken as horizontal, so every coordinate in this two-dimensional
space has limits like this: the value of x varies from 0 to H and the value of y varies from 0 to L.
Now if I consider any point (x, y) in this image, the intensity or the color value at the point
(x, y), which can be represented as a function of x and y where (x, y) identifies a point in the
image space, is actually a multiplication of two terms: one is r(x, y) and the other is i(x, y). We
said during our introductory lecture that r(x, y) represents the reflectance of the surface point to
which this particular image point corresponds, and i(x, y) represents the intensity of the light
that is falling on the object surface. Theoretically, r(x, y) can vary from 0 to 1 and i(x, y) can
vary from 0 to infinity, so a point f(x, y) in the image could in principle have a value anywhere
between 0 and infinity. But practically, the intensity or the color at a particular point (x, y)
varies from a certain minimum, given by Imin, to a certain maximum, Imax. So the intensity at
this point (x, y), represented by f(x, y), varies from a certain minimum intensity value to a
certain maximum intensity value.
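Collected in LaTeX notation, the relations just described read:

f(x, y) = i(x, y)\, r(x, y), \qquad 0 \le r(x, y) \le 1, \qquad 0 \le i(x, y) < \infty,

with, in practice, I_{\min} \le f(x, y) \le I_{\max}, \quad 0 \le x \le H, \quad 0 \le y \le L.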
Now you will find the second figure in this particular slide. It shows that if I take a horizontal
line on this image space and if I plot intensity values along that line, the intensity profile will
be something like this. It again shows that this is the minimum intensity value along that line
and this is the maximum intensity value along the line. So the intensity at any point in the
image or intensity along a line whether it is horizontal or vertical, can assume any value
between the maximum and the minimum.
Now here lies the problem. When we consider a continuous image, the intensity can assume
any value between a certain minimum and a certain maximum, and the coordinates x and y can
also assume any value in their ranges: x can vary from 0 to H and y can vary from 0 to L.
Now, from the theory of real numbers, you know that between any two points there are
infinitely many points. So again when I come to
(Refer Slide Time 05:21)
this image, as x varies from 0 to H, there can be infinite possible values of x between 0 and
H. Similarly, there can be infinite values of y between 0 and L. So effectively
(Refer Slide Time 05:42)
if I want to represent this image in a computer, then the image has to be represented by an
infinite number of points. And secondly, when I consider the intensity value at a particular
point, we have seen
(Refer Slide Time 05:56)
that the intensity value f(x, y) varies between a certain minimum Imin and a certain maximum
Imax. Even if I take Imin and Imax to be the minimum and maximum possible intensity values,
the problem is that the number of intensity values that can lie between the minimum and the
maximum is again infinite, which means that if I want to represent an intensity value exactly in
a digital computer then I would need an infinite number of bits. And obviously such a
representation is not possible in any digital computer.
So naturally we have to find a way out,
(Refer Slide Time 06:50)
that is, our requirement is to represent this image in a digital computer in digital form. So what
is the way out? If you remember, in our introductory lecture we said that instead of considering
every possible point in the image space, we take some discrete set of points, and that discrete
set of points is decided by a grid. If we have a uniform rectangular grid, then at each grid
location we can take a particular point and consider the intensity at that particular point. This is
the process known as sampling.
So what is desired is that an image should be represented in the form of a finite
two-dimensional matrix like this. This is a matrix representation of an image, and the matrix
has a finite number of elements. If you look at this matrix, you find that it has m rows, indexed
from 0 to m minus 1, and n columns, indexed from 0 to n minus 1. Typically, for image
processing applications, we have mentioned that the dimension is usually taken as 256 x 256,
512 x 512, 1 K x 1 K and so on. But whatever the size, the matrix is still finite: we have a finite
number of rows and a finite number of columns. So after sampling, what we get is an image in
the form of a matrix like this.
Now the second requirement: if I don't do any other processing on these matrix elements, what
does a matrix element represent? Every matrix element represents an intensity value at the
corresponding image location. And we have said that the number of possible intensity values
between a certain minimum and maximum can again be infinite, which is again not possible to
represent in a digital computer. So here what we want is that each of the matrix elements should
also assume one of a finite set of discrete values. So we do both of these: the first operation is
sampling, to represent the image in the form of a finite two-dimensional matrix, and then each
of the matrix elements has to be digitized so that a particular element in the matrix can assume
only values from a finite set of discrete values. These two together complete the image
digitization process.
We have shown an image on the left-hand side, and if I take a small rectangle in this image and
look at the values in that small rectangle, you find that these values are in the form of a finite
matrix, and every element in this small rectangle, or this small matrix, assumes an integer
value. So an image, when it is digitized, will be represented in the form of a matrix like this.
So what we have said till now indicates that by digitization we mean, first, representation of the
image by a two-dimensional finite matrix, the process known as sampling; and second, that
each matrix element must be represented by one of a finite set of discrete values, an operation
which is called quantization. In today's lecture
(Refer Slide Time 10:56)
we will mainly concentrate on the sampling; and quantization we will talk about later.
Now let us see what the different blocks in an image processing system should be. Firstly, we
have seen that computer processing of images requires that images be available in digital form,
and so we have to digitize the image. The digitization process is a two-step process: the first
step is sampling and the second step is quantization. Then, when the digitized image has been
processed by the computer, our final aim will obviously be that we want to see the result.
(Refer Slide Time 11:51)
So we have to display the image on a display device. Now the processed image is in digital
form, but when we want to have the display, the display must be in analog form. So whatever
process we have done during digitization, we must do the reverse during visualization or
display. So for displaying the images, the image has to be first converted into an analog signal,
which is then displayed on a normal display. So if we just look at it in the form of a block
diagram,
it appears something like this: during digitization, first we have to sample the image by a unit
known as a sampler; then every sampled value has to be digitized, the process known as
quantization; and after quantization the digitized image is passed to the digital computer for
processing. When we want to see the processed image, that is, how the image looks after the
processing is complete, the digital computer gives a digital output.
This digital output goes to a D-to-A converter, and finally the digital-to-analog converter output
is fed to the display, on which we can see how the processed image looks.
(Refer Slide Time 13:25)
Now let us come to the first step of the digitization process, that is, sampling. To understand
sampling, before going to the two-dimensional image let us take an example from one
dimension. Let us assume that we have a one-dimensional signal x(t), which is a function of t.
Here we take t to be time, and you know that whenever a signal is represented as a function of
time, its frequency content is expressed in Hz, where Hz means cycles per unit time. So here
again,
when you look at this particular signal x(t), you find that this is an analog signal: t can assume
any value, t is not discretized, and similarly the functional value x(t) can also assume any value
between a certain maximum and minimum. So obviously this is an analog signal, and we have
seen that an analog signal cannot be represented in a computer. So what is the first step that we
have to do? As we said, for the digitization process the first operation is the sampling operation:
instead of considering the signal values at every possible value of t, we consider the signal
values at certain discrete values of t. So here in this figure it is shown that we take the value of
the signal x(t) at t = 0; we also consider the value of the signal at t = Δts, at t = 2Δts, at t = 3Δts
and so on. So instead of considering signal values at every possible instant, we are considering
the signal values at some discrete instants of time. This is the process known as sampling. And
when we consider the signal values at an interval of Δts, we can find out the sampling
frequency: Δts is the sampling interval, and the corresponding sampling frequency, if I
represent it by fs, is fs = 1/Δts.
Now when you sample the signal like this, you will find that a lot of information is missed. For
example, here we have a local minimum, here we have a local maximum, here again a local
minimum, a local maximum, and here again a local maximum; when we sample at an interval
of Δts, this is information that cannot be captured by these samples. So what is the alternative?
(Refer Slide Time 16:39)
The alternative is to increase the sampling frequency, or equivalently to decrease the sampling
interval. If I do that, you will find that these bold lines, the bold golden lines, represent the
earlier samples,
(Refer Slide Time 17:00)
whereas these dotted green lines represent the new samples that we want to take. When we take
the new samples, what we do is reduce the sampling interval by half. That is, our earlier
sampling interval was Δts; now I make the new sampling interval Δts' = Δts/2, and obviously in
this case the sampling frequency fs' = 1/Δts' becomes twice fs. That is, earlier we had a
sampling frequency of fs; now we have a sampling frequency of 2fs. And when you increase
the sampling frequency, you find that with the earlier samples, represented by the solid lines,
this particular information, the tip in between these two solid lines, was missed. Now when I
introduce a sample in between, some information about this local minimum or this local
maximum can be retained. Similarly, here, some information about this local minimum can
also be retained. So obviously this says that when I increase the sampling frequency, or reduce
the sampling interval,
(Refer Slide Time 18:24)
then the information that I can retain in the sampled signal will be more than when the
sampling frequency is lower. Now the question is whether there is a theoretical background by
which we can decide what the proper sampling frequency for a given signal is. We will come to
that a bit later.
Now let us see that what does this sampling actually mean?
We have seen that we have a continuous signal x(t) and for digitization instead of considering
the signal values at every possible value of t, we have considered the signal values at some
discrete instants of time t, Ok. Now this particular sampling process can be represented
(Refer Slide Time 19:16)
mathematically. Consider that I have a sampling function, and this sampling function is a one-dimensional array of Dirac delta functions situated at a regular spacing of Δt. So this sequence of Dirac delta functions can be represented in this form: each of these is a Dirac delta function, and the spacing between two successive functions is Δt. In short, this kind of function is represented by a comb function, a comb function in t at an interval of Δt, and mathematically this comb function can be represented as comb(t; Δt) = Σ_{m=-∞}^{∞} δ(t - mΔt).
Now δ is the Dirac delta function. The Dirac delta function says that if I have a Dirac delta function δ(t), then the functional value will be 1 whenever t = 0, and the functional value will be 0 for all other values of t. In this case, when I have δ(t - mΔt), the functional value will be 1 only when the quantity t - mΔt = 0. That means the function will assume a value of 1 whenever t = mΔt, for different values of m varying from -∞ to ∞.
So effectively this mathematical expression gives rise to a series of Dirac delta functions in
this form where at an interval of Δt, I get a value of 1. For all other values of t, I get values of
0. Now this sampling
(Refer Slide Time 21:14)
as you find that we have represented the same figure here, we had this continuous signal x(t),
original signal. After sampling we get a number of samples like this. Now here, these samples
can now be represented by multiplication of x(t) with the series of Dirac delta functions that
you have seen, that is, comb(t; Δt). So if I multiply this, whenever this comb function gives
me a value 1, only the corresponding value of t will be retained in the product and whenever
this comb function gives you a value 0, the corresponding points, the corresponding values of
x(t) will be set to 0. So effectively, this particular sampling, when from this analog signal,
this continuous signal, we have gone to this discrete signal, this discretization process can be
represented mathematically as Xs(t) = x(t)·comb(t; Δt), and if I expand this comb function and consider only the values of t where the comb function has a value 1, then this mathematical expression translates to Xs(t) = Σ_{m=-∞}^{∞} x(mΔt) δ(t - mΔt), right.
(Refer Slide Time 22:43)
So after sampling what you have got is, from a continuous signal we have got the sampled
signal represented by Xs(t) where the sample values exist at discrete instants of time.
After sampling, what we get is a sequence of samples, as shown in this figure, where Xs(t) has got
the signal values at discrete time instants and during the other time intervals, the value of the
signal is set to 0. Now this sampling will be proper if we are able to reconstruct the original,
continuous signal x(t) from these sample values. And you will find out that while sampling
we have to maintain certain conditions so that the reconstruction of the analog signal x(t) is
possible.
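As a small numerical sketch of this multiplication-by-a-comb view (the dense grid standing in for continuous time, the test sine and the 0.02 s sampling interval are all illustrative assumptions):

```python
import numpy as np

# x_s(t) = x(t) * comb(t; dt_s), approximated on a dense grid that stands in for
# continuous time. Grid spacing, signal and sampling interval are made-up values.
dt = 0.001                               # dense grid step ("continuous" time)
t = np.arange(0.0, 1.0, dt)
x = np.sin(2 * np.pi * 5.0 * t)          # the signal x(t)

dt_s = 0.02                              # sampling interval delta-t_s
comb = np.zeros_like(t)
comb[::round(dt_s / dt)] = 1.0           # unit impulses at t = 0, dt_s, 2*dt_s, ...

x_s = x * comb                           # product keeps x(t) only where the comb is 1
print(int(comb.sum()), len(t))           # 50 impulses out of 1000 grid points
```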
Now let us look at some mathematical background which will help us to find out the
conditions which we have to impose for this kind of reconstruction. So here you find that if
we have a continuous signal in time which is represented by x(t)
then we know that the frequency components of this signal x(t) can be obtained by taking the
Fourier transform of this x(t). So if I take the Fourier transform of x(t) which is represented
by F{x(t)}, which is also represented in the form of X(ω), where ω is the frequency
component, and mathematically this will be represented as F{x(t)} = X(ω) = ∫_{-∞}^{∞} x(t) e^{-jωt} dt. So this is the Fourier transform of the signal x(t). Now this is possible if the signal x(t) is aperiodic. But when the signal x(t) is periodic, in that case, instead of taking the Fourier transform, we
have to go for Fourier series expansion. And the Fourier series expansion of a periodic signal
say v(t) is given by
(Refer Slide Time 26:25)
v(t) = Σ_{n=-∞}^{∞} c(n) e^{jnωot}, where n varies from -∞ to ∞.
(Refer Slide Time 26:31)
Now in this case, c(n) is known as the Fourier coefficient, the nth Fourier coefficient, and the value of c(n) = (1/To) ∫_{To} v(t) e^{-jnωot} dt.
(Refer Slide Time 26:46)
Now in our case, we have this v(t) in the form of a series of Dirac delta functions, so we know
(Refer Slide Time 27:24)
that the value of v(t) will be 1 when t = 0, and the value of v(t) is 0 for any other value of t within a single period.
So in our case,
(Refer Slide Time 27:40)
To, that is the period of this periodic signal, = Δts, because every delta function appears at an
interval of Δts. And we have v(t) = 1, for t = 0 and v(t) = 0, otherwise. Ok.
(Refer Slide Time 28:23)
So when we evaluate this integral for c(n), the integrand is nonzero only at t = 0 and it is 0 for all other values of t within the period.
(Refer Slide Time 28:28)
(Refer Slide Time 28:35)
So c(n) becomes c(n) = 1/Δts, and this is nothing but the sampling frequency, say ωs. So
this is the frequency of the sampling signal.
(Refer Slide Time 28:52)
(Refer Slide Time 28:57)
v(t) can be represented as v(t) = (1/Δts) Σ_{n=-∞}^{∞} e^{jnωot}. So what does it mean? This means that if I take
the Fourier series expansion of our periodic signal which is in our case Dirac delta function,
this will have frequency components
(Refer Slide Time 29:33)
various frequency components for the fundamental component of frequency is ωo and it will
have other frequency components of 2ωo, 3ωo, 4ωo and so on.
(Refer Slide Time 29:50)
So if I plot those frequencies, or the frequency spectrum, we find that we will have the fundamental frequency ωo, and in this case this ωo is nothing but the same as the sampling frequency ωs.
We will also have a frequency component of 2ωs, we will also have a frequency component
of 3ωs and this continues like this. So you will find that the comb function as the sampling
function
(Refer Slide Time 30:22)
that we have taken, the Fourier series expansion of that is again a comb function. Now this is
about the continuous domain.
(Refer Slide Time 30:38)
for a discrete time signal, say x(n), where n indexes the nth sample of the signal, the Fourier Transform is given by X(k) = Σ_{n=0}^{N-1} x(n) e^{-j(2π/N)nk}, where n varies from 0 to N-1 and N indicates the number of samples over which we are taking the Fourier transform. And given this Fourier Transform, we can find out the original sampled signal by the inverse Fourier transformation, which is obtained as x(n) = (1/N) Σ_{k=0}^{N-1} X(k) e^{j(2π/N)nk}. So you find that
we get a Fourier Transform pair, in one case from the discrete time signal, we get the
frequency component, discrete frequency components by the forward Fourier transform and
in the second case, from the frequency components, we get the discrete time signal by the
inverse Fourier transform. And these two equations taken together form a Fourier Transform
pair.
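A small sketch of this transform pair, written directly from the two summations above (with the 1/N factor in the inverse so that the pair inverts exactly; the test sequence is an arbitrary illustrative choice):

```python
import numpy as np

# Forward and inverse discrete Fourier transform, straight from the summations.
def dft(x):
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                                   # one row per output index k
    return np.sum(x * np.exp(-1j * 2 * np.pi * n * k / N), axis=1)

def idft(X):
    N = len(X)
    k = np.arange(N)
    n = k.reshape(-1, 1)                                   # one row per output index n
    return np.sum(X * np.exp(1j * 2 * np.pi * n * k / N), axis=1) / N

x = np.array([1.0, 2.0, 0.0, -1.0])
X = dft(x)
print(np.allclose(idft(X), x))           # True: forward followed by inverse recovers x(n)
print(np.allclose(X, np.fft.fft(x)))     # True: agrees with the library's convention
```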
(Refer Slide Time 32:32)
Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module Number 01 Lecture Number 04
Signal Reconstruction Form Samples: Convolution Concept
(Refer Slide Time 00:17)
you find that we have represented our sampled signal as Xs(t) = x(t).comb(t,Δt) , Ok.
(Refer Slide Time 00:54)
So what we are doing is, we are taking 2 signals in time domain and we are multiplying these
2 signals. Now, what will happen if we take Fourier Transform of these 2 signals? Or let us
put it like this. I have 2 signals, x(t) and I have another signal say h(t). Both these signals are
in the time domain. We define an operation called convolution which is defined as h(t)* x (t).
This convolution operation is represented as h(t)*x(t) = ∫_{-∞}^{∞} h(τ) x(t-τ) dτ.
(Refer Slide Time 01:59)
Now what does it mean? This means that whenever we want to take the convolution of two
signals h(t) and x(t), the first thing we do is time-invert the signal x(t); so instead of taking x(τ) we take x(-τ). So if I have 2 signals of this form, say h(t) is represented like this and we have a signal, say x(t), which is represented like this, then what we have to do, as our expression says, is compute the convolution of h(t) and x(t), which is nothing but h(t)*x(t) = ∫_{-∞}^{∞} h(τ) x(t-τ) dτ.
(Refer Slide Time 03:02)
(Refer Slide Time 03:17)
Then, for the convolution, we take h(τ) and x(-τ). So if I take x(-t), this function will look like this; so this is x(-t). And for this integration, we have to take h(τ) for each value of τ, and x(-τ) has to be translated by the value t; then the corresponding values of h and x have to be multiplied, and then you have to do the integration from -∞ to ∞.
So if I take an instance like this, Ok, at this point I want to find out the convolution value. Then I have to multiply the corresponding values of h with these values of x, doing the multiplication at each and every time instant, and then I have to integrate from -∞ to ∞.
I will come to application of this a bit later. Now let us see that, if we have a convoluted
signal. Say we have h(t), which is convoluted with x(t); and if I want to take Fourier
Transform of this signal, then what we will get?
(Refer Slide Time 05:41)
So this is the Fourier Transform of the convolution of those 2 signals, h(t) and x(t). Now if
you do this integration, you will find that the same integration can be written in the form F{h(t)*x(t)} = ∫ h(τ) [ ∫ x(t-τ) e^{-jω(t-τ)} dt ] e^{-jωτ} dτ.
Now you will find that what does this inner integral mean? From the definition of Fourier
Transform, this inner integral is nothing but the Fourier Transform of x(t).
So,
(Refer Slide Time 06:53)
So this expression becomes F{h(t)*x(t)} = ∫ h(τ) X(ω) e^{-jωτ} dτ.
Now what I can do is, because this X(ω) is independent of τ, I can take X(ω) out of the integral. So my expression will now be F{h(t)*x(t)} = X(ω) ∫ h(τ) e^{-jωτ} dτ.
(Refer Slide Time 07:51)
Again you will find that from the definition of Fourier Transformation, this is nothing but the
Fourier Transformation of the time signal h(t). So effectively this expression comes out to be
X(ω).H(ω), where X(ω) is the Fourier Transform of the signal x(t) and H(ω) is the Fourier
Transform of the signal h(t).
So effectively this means that if I take the convolution of 2 signals x(t) and h(t) in time
domain, this is equivalent to multiplication of the two signals in the frequency domain.
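This equivalence is easy to check numerically; a minimal sketch on two random finite-length sequences (the signals and lengths are arbitrary, and zero-padded DFTs are used so that the DFT product corresponds to the linear convolution):

```python
import numpy as np

# Numerical check of the convolution theorem: convolving two sequences in the
# "time" domain matches multiplying their (zero-padded) DFTs and transforming back.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = rng.standard_normal(64)

y_time = np.convolve(h, x)                  # linear convolution, length 64 + 64 - 1
n_fft = len(h) + len(x) - 1                 # pad so the DFT product gives the linear result
y_freq = np.fft.ifft(np.fft.fft(h, n_fft) * np.fft.fft(x, n_fft)).real

print(np.allclose(y_time, y_freq))          # True
```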
(Refer Slide Time 08:42)
So convolution of two signals x(t) and h(t) in the time domain is equivalent to multiplication
of the same signals in the frequency domain. The reverse is also true. That is, if we take the
convolution of X(ω) and H(ω)in the frequency domain, this will be equivalent to
multiplication of x(t) and h(t) in the time domain.
So both these relations are true, and we will apply these relations to find out how to reconstruct the original signal from its sampled values.
(Refer Slide Time 09:23)
So now let us come back to our original signal. So here we have seen
that we have been given these sample values and from the sample values, our aim is to
reconstruct this continuous signal x(t). And we have seen
(Refer Slide Time 09:51)
that this sampling is actually equivalent to multiplication of two signals in the time domain,
one signal is x(t) and the other signal is the comb function, comb(t; Δt). So these relations, as we have said, are true: if I multiply 2 signals x(t) and y(t) in the time domain, that is equivalent to convolution of the two signals X(ω) and Y(ω) in the frequency domain.
Similarly, if I take the convolution of two signals in time domain, that is equivalent to
multiplication of the same signals in frequency domain.
So for sampling when you have said that you have got Xs(t),
that is the sampled values of the signal x(t) which is nothing but multiplication of x(t) with
the series of Dirac delta functions represented by comb(t, Δt). So that will be equivalent to, in
frequency domain I can find out Xs(ω),
which is equivalent to the frequency domain representation X(ω) of the signal x(t) convoluted
with the frequency domain representation of the comb function, comb(t, Δt) and we have
seen that for this comb function, the Fourier Transform or the Fourier series expansion shows that
(Refer Slide Time 11:25)
we have another comb function in the frequency domain and we have to take the convolution
of these two.
(Refer Slide Time 11:35)
What does this convolution actually mean? Here we have taken 2 signals h(n) and x(n), both
of them for this purpose are in the sample domain. So h(n) is represented by this and x(n) is
represented by this. You will find that this h(n) is actually nothing but a comb function where
the spacing of the impulses plays the role of Δts: we have h(n) = 1 at n = 0, h(n) = 1 at n = -9, h(n) = 1 at n = 9, and this pattern repeats. So this is nothing but the representation of a comb function. And if I assume that my x(n) is of this form, that is, at n = 0 the value of x(n) = 7, at n = -1 it is 5, at n = -2 it is 2 and, similarly, on this slide, x(1) = 9 and x(2) = 3, then the convolution expression that we have stated in the continuous domain is translated, in the discrete data domain, to this form: y(n) = Σ_m h(m) x(n-m). So let us see
(Refer Slide Time 13:11)
that how this convolution actually takes place. So if I really understand this particular
expression, y(n) = Σ_m h(m) x(n-m), we said that this actually means that we have to take
the time inversion of the signal x(n). So if I take the time inversion, the signal will be
something like this, 3, 9, 7, 5 and 2 and when I take the convolution, that is, I want to find out
the various values of y(n), that particular expression can be computed in this form. So if I want
to take the value of y(-11), so what I have to do is, I have to give a translation of -11 to this
particular signal x(-m), so it comes here. Then I have to take the summation of this product
from m = -∞ to ∞. So here what does it do? You will find that I do point by point
multiplication of these signals. So here 0 multiplied with 3 + it will be 0 multiplied with 9 + 0
multiplied with 7 + 0 multiplied with 5 + 1 multiplied with 2, so the value I get is 2. And this
2 comes at this location y(-11).
Now for getting the value of y(-10), again I do the same computation and here you find that
this 1 gets multiplied with 5 and all other values get multiplied with 0. And when you take
the summation of all of them I get 5 here. Then I get value at -10, I get 7 here, following the
same operation, sorry this is at -9. I get at - 8, I get at - 7. I get at - 6. At - 6, you find that the
value is 0. If I continue like this here, again at n = - 2, I get value = 2. At n = - 1, I get value =
5. At n = 0, I get value of 7. At n = plus 1, I get value of 9, at n = plus 2, I get value of 3, at n
= 3, again I get the value of 0.
So if I continue like this, you will find that after completion of this convolution process, this
h(n) convoluted with x(n) gives me this kind of pattern. And here you notice one thing, that
when I have convoluted this x(n) with this h(n), the convolution output y(n), you will notice, is a repetition of the pattern of x(n), and it is repeated at those locations
where the value of h(n) was = 1. So by this convolution, what I get is, I get repetition of the
pattern x(n) at the locations of delta functions in the function h(n).
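The same walk-through can be reproduced in a couple of lines (the values are copied from the worked example above; the indices are simply shifted so that everything is non-negative):

```python
import numpy as np

# h(n): impulses 9 samples apart; x(n): the pattern 2, 5, 7, 9, 3 from the example.
x = np.array([2, 5, 7, 9, 3])
h = np.zeros(19)
h[[0, 9, 18]] = 1                       # corresponds to the impulses at n = -9, 0, 9 above

y = np.convolve(h, x)                   # y(n) = sum_m h(m) x(n - m)
print(y)                                # the pattern of x(n) repeated at every impulse
```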
So by applying this, when I convolve the 2 signals X(ω) and the Fourier Transform of this comb function, that is COMB(ω), in the frequency domain, what I get is
something like this.
When x(t) is band limited, that means the maximum frequency component in the signal x(t) is
say ωo, then the frequency spectrum of the signal x(t) which is represented by X(ω) will be
like this. Now when I convolve this with this comb function, COMB(ω), then as we have
done in the previous example, what I get is this: at those locations where the comb function had a value 1, I will get just a replica of the frequency spectrum X(ω). So this X(ω) gets replicated
at all these locations.
So what do we find here? You find that the same frequency spectrum X(ω) gets translated like this when x(t) is actually sampled. That means the frequency spectrum of Xs,
or Xs(ω) is like this. Now this helps us in reconstruction of the original signal x(t). So here
what I do is, around ω = 0, I get a copy of the original frequency spectrum. So what I can do
is, if I have a low pass filter whose cutoff frequency is just beyond ωo, and this frequency
signal, this spectrum, the signal with this spectrum I pass through that low pass filter, in that
case the low pass filter will just take out this particular frequency band and it will cut out all
other frequency bands. So since I am getting the original frequency spectrum of x(t), so
signal reconstruction is possible. Now here you notice one thing. As we said we will just try
to find out that what is the condition that original signal can be reconstructed. Here you find
that we have a frequency gap between this frequency band and this translated frequency
band. Now the difference of, between center of this frequency band and the center of this
frequency band is nothing but 1 /Δts or ωs, that is the sampling frequency.
Now as long as this condition, that is 1 /Δts – ωo is greater than ωo, that is the lowest
frequency of this translated frequency band is greater than the highest frequency of the
original frequency band, then only these 2 frequency bands are disjoint. And when these 2
frequency bands are disjoint, then only by use of a low-pass filter, I can take out this original
frequency band. And from this relation, you get the condition that 1/Δts, that is the sampling frequency, here written as fs, must be > 2ωo, where ωo is the highest
frequency component in the original signal x(t). And this is what is known as Nyquist rate.
That is, we can reconstruct, perfectly reconstruct
(Refer Slide Time 19:56)
the continuous signal only when the sampling frequency is more than twice the maximum frequency component of the original continuous signal.
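A tiny numerical illustration of what goes wrong below the Nyquist rate (the 9 Hz signal and 12 Hz sampling frequency are made-up numbers):

```python
import numpy as np

# A 9 Hz sine sampled at 12 Hz (below its Nyquist rate of 18 Hz) yields exactly the
# same samples as a 3 Hz sine with inverted sign, so the original cannot be recovered.
f_sig, f_s = 9.0, 12.0
t = np.arange(0.0, 1.0, 1.0 / f_s)
samples = np.sin(2 * np.pi * f_sig * t)
alias = np.sin(2 * np.pi * (f_s - f_sig) * t)   # the 3 Hz alias

print(np.allclose(samples, -alias))             # True: the two are indistinguishable
```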
(Refer Slide Time 20:08)
Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 01 Lecture Number 05
Signal Reconstruction from Image
(Refer Slide Time 00:17)
We will also talk about the Optimum Mean Square Error or Lloyd-Max Quantizer. Then we
will also talk about that how to design an optimum quantizer which, with the given signal
probability density function.
(Refer Slide Time 00:43)
Now let us briefly recapitulate that what we have done in the last class.
(Refer Slide Time 00:56)
We had taken a one-dimensional signal x(t), which is a function of a single variable, say t. And then what we have done is, we have sampled this one-dimensional signal with a sampling function which is represented in the form of a comb function, say comb(t, Δt), and we get the sample values represented by Xs(t). And we have also said that this Xs(t) can be represented in the form of multiplication of x(t) by comb(t, Δt).
Now the same function can also be represented in the form Xs(t) = Σ_{m=-∞}^{∞} x(mΔt) δ(t - mΔt), where the summation of delta functions gives you the comb function. So this Xs(t), that is the sampled version of the signal x(t), can be represented as Xs(t) = Σ_{m=-∞}^{∞} x(mΔt) δ(t - mΔt).
Now our problem is, that given these sample values, how to reconstruct the original signal
x(t) from the sample values of
(Refer Slide Time 02:41)
and for this purpose we have introduced what is known as Convolution Theorem. The
Convolution Theorem says if you have two signals x(t)
and y(t) in time domain, then the multiplication of x(t) and y(t) in time domain is equivalent
to, if you take the convolution of the frequency spectrum of x(t) and frequency spectrum of
y(t) in the frequency domain. So that, that is to say that x(t). y(t) is equivalent to
(Refer Slide Time 03:17)
X(ω) convoluted with Y(ω). Similarly, if you take the convolution of x(t) and y(t) in time
domain, that is equivalent to multiplication of X(ω) and Y(ω) in the frequency domain. So by
using this concept of the convolution theory, we will see that how to reconstruct the original
signal x(t) from the sampled values of Xs(t). Now as per this Convolution Theorem, we have
seen that Xs(t) is nothing but multiplication of x(t) into the comb function comb(t, Δ ts). So in
the frequency domain that will be equivalent to Xs(ω) is equal to X(ω) convoluted with the
frequency spectrum of comb(t, Δ ts) where Δ ts is the sampling interval.
(Refer Slide Time 04:27)
So here, this is the frequency spectrum of the signal, whose extent is the bandwidth of the signal, as presented here, and this is the frequency spectrum of the sampling function. When we convolve these two, the convolution result will be like this, where the original frequency spectrum of
the signal gets replicated along the frequency axis at an interval, at an interval of 1/ Δts, where
1/ Δts is nothing but the sampling frequency fs. And here you find that for proper
reconstruction what you have to do is, this original spectrum, the spectrum of the original
signal has to be taken out and if we want to take out this, then we have to make use of a filter
which will only take out this particular band and the remaining frequency components will
simply be discarded. And for this filtering operation to be successful, we require that 1/Δts -
ωo, where ωo is the bandwidth of the signal or the maximum frequency component present in
the signal x(t), so 1/Δts - ωo must be greater than or equal to ωo. And that leads to the
condition that the sampling frequency fs must be greater than twice of ωo, where ωo is the
bandwidth of the signal and this is what is the Nyquist rate.
(Refer Slide Time 06:02)
Now what happens if the sampling frequency is less than twice of ωo? In that case, as it is shown in this figure, you will
find that
(Refer Slide Time 06:13)
subsequent frequency bands after sampling, they overlap. And because of this overlapping, a
single frequency band cannot be extracted using any of the low-pass filters. So effectively, as
a result what we get is,
after low pass filtering the signal, which is reconstructed is a distorted signal, it is not the
original signal. And this effect is what is known as aliasing.
So now let us see what happens in case of two-dimensional image which is a function of two
variables x and y. Now find that here, in this slide
(Refer Slide Time 06:55)
we have shown two figures. On the top we have shown the same signal x(t) which we have
used earlier, which is a function of t, and the bottom figure is an image, which is a function of
two variables x and y. Now if t is time, in that case x(t) is a signal which varies with time.
And for such a signal the frequency is measured as you know in terms of Hertz which is
nothing but cycles per unit time. Now how do you measure the frequency in case of an
image? You find that, in case of an image, the dimension is represented either in the form of
say 5 cm x 5 cm or say 10 cm x 10 cm and so on. So for an image, when we measure the
frequency, it has to be cycles per unit length, not the cycles per unit time as is done in case of
a time varying signal.
Now in this figure we have shown that, as in the case of the signal x(t), we had its frequency spectrum represented by X(ω), and we say that the signal x(t) is band limited if X(ω) is equal to 0 for ω greater than ωo, where ωo is the bandwidth of the signal x(t). Similarly, in case of
an image, because the image is a two-dimensional signal which is a variable, which is a
function of two variables x and y, so it is quite natural that in case of image we will have
frequency components which will have two components, one in the x direction and other in
the y direction. So we call them ωx and ωy. So we say that the image is band limited if F(ωx, ωy) = 0 for ωx > ωxo and ωy > ωyo; so in this case, the maximum frequency component in the x direction is ωxo and the maximum frequency component in the y direction is ωyo. And this figure on the bottom
left shows how the frequency spectrum of an image looks like.
the base of this frequency spectrum on the ωx ωy plane is what is known as the region of
support of the frequency spectrum of the image.
(Refer Slide Time 10:00)
Now let us see what happens in case of two dimensional sampling or when we try to sample
an image. The original image is represented by the function f(x, y), Ok, and as we have seen in case of the one-dimensional signal, where x(t) is multiplied by comb(t, Δt) for the sampling operation, in case of an image also, f(x, y) has to be multiplied by comb(x, y; Δx, Δy) to give you the sampled signal fs(x, y). Now this comb function, because it is again a function of 2 variables x and y, is nothing but a two dimensional array of delta functions, where along the x direction the spacing is Δx and along the y direction the spacing is Δy. So again as before, this fs(x, y) can be represented as fs(x, y) = Σ_m Σ_n f(mΔx, nΔy) δ(x - mΔx, y - nΔy).
(Refer Slide Time 11:41)
if we want to find out frequency spectrum of this sampled image, then this frequency
spectrum of the sampled image Fs(ωx, ωy) will be same as F(ωx, ωy), which is the frequency
spectrum of the original image f(x, y), which has to be convoluted with COMB(ωx, ωy),
where COMB(ωx, ωy) is nothing but the Fourier Transform of comb(x, y, Δx, Δy), Ok and if
you compute this Fourier Transform, you find that COMB(ωx, ωy) comes out in the form COMB(ωx, ωy) = ωxs ωys Σ_m Σ_n δ(ωx - mωxs, ωy - nωys). Here, ωxs is nothing but 1/Δx, which is the sampling frequency along the x direction, and ωys is equal to 1/Δy, the sampling frequency along the y direction.
So, coming back to the same concept as in the case of the one-dimensional signal x(t): Fs(ωx, ωy) is the convolution of F(ωx, ωy), the frequency spectrum of the original image, with COMB(ωx, ωy), where COMB(ωx, ωy) is the Fourier Transform of the sampling function in two dimensions. And as we have seen earlier, such a convolution operation replicates the frequency spectrum of the original signal along the ω axis in the case of a one-dimensional signal; so here again, in case of the two-dimensional signal, the original spectrum will be replicated into
(Refer Slide Time 13:52)
a two-dimensional array of the spectrum of the image as shown in this particular figure.
So here again, you find that we have simply shown the region of support getting replicated: along the y direction and along the x direction the spectrum gets replicated, and the spacing between two subsequent frequency bands along the x direction is equal to ωxs, which is nothing but 1/Δx, and along the y direction the spacing is 1/Δy, which is ωys.
Now if we want to reconstruct the original image from this particular spectrum, then what
you have to do is
we have to take out a particular frequency band, say a frequency band which is around the
origin in the frequency domain. And if we want to take out this particular frequency band,
then, as we have seen before, this signal has to be low-pass filtered, and we pass it through a low-pass filter whose response is given by H(ωx, ωy) = 1/(ωxs ωys) for (ωx, ωy) in the region R, where the region R just covers this central band, and it is equal to 0 outside this region R. In that
case it is possible that we will be able to take out just these particular frequency components
within this region R by using this low-pass filter. And again for taking out this particular
frequency region the same condition of the Nyquist rate applies, that is sampling frequency in
the x direction must be greater than twice of ωxo which is the maximum frequency
component along x. And sampling frequency along the y direction again has to be greater
than twice of ω yo , which is the maximum frequency component along direction y.
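As a small sketch of this two-dimensional sampling condition (the grid spacings and the test pattern below are illustrative assumptions, not values from the lecture):

```python
import numpy as np

# A band-limited 2-D test pattern sampled on a grid with spacings dx and dy.
dx, dy = 0.01, 0.01                           # sampling intervals along x and y
x = np.arange(0.0, 1.0, dx)
y = np.arange(0.0, 1.0, dy)
X, Y = np.meshgrid(x, y, indexing="xy")

fx, fy = 4.0, 7.0                             # cycles per unit length along x and y
f = np.cos(2 * np.pi * fx * X) * np.cos(2 * np.pi * fy * Y)

fs_x, fs_y = 1.0 / dx, 1.0 / dy               # sampling frequencies along x and y
print(fs_x > 2 * fx, fs_y > 2 * fy)           # True True: both Nyquist conditions hold
```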
(Refer Slide Time 16:17)
Here we have 4 different images. So here you will find that the first image, which is shown here, was
sampled with 50 dots/inch or 50 samples/inch. Second one was sampled with 100 dots/inch,
third one with 600 dots/inch and fourth one with 1200 dots/inch. So out of these 4 images you
find the quality of first image is very, very bad. It is very blurred and the details in the image
are not at all recognizable. As we increase the sampling frequency, when we go for the
second image where we have 100 dots/inch, you find that the quality of the reconstructed
image is better than the quality of the first image. But here again, still you find that if you
study this particular region or wherever you have edges, the edges are not really continuous.
They are slightly broken. So if I increase the sampling frequency further, you will find that
these breaks have been smoothed out. So at, with a sampling frequency of 600 dots/inch, the
quality of the image is quite acceptable. Now if we increase the sampling frequency further
when we go from 600 dots/inch to 1200 dots/inch sampling rate, you find that the
improvement in the image quality is not that much, as the improvement we have got when we
moved from say, 50 dots/inch to 100 dots/inch or 100 to 600 dots/inch. So it shows that when
your sampling frequency is above the Nyquist rate, you are not going to get any improvement in the image quality, whereas when the sampling frequency is less than the Nyquist rate, the reconstructed image is very bad. So till now
(Refer Slide Time 18:24)
we have covered the first phase of the image digitization process, that is sampling, and we have also seen, through the examples of the reconstructed images, how the quality of the reconstructed image is going to vary if we vary the sampling frequency below and above the Nyquist rate. Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 01 Lecture Number 06
Quantizer Design
(Refer Slide Time 00:17)
Hello, welcome to the course on Digital Image Processing. In the last class we covered the first phase of the image digitization process, that is sampling, and we have also seen through the examples of the reconstructed images that if we vary the sampling frequency below and above the Nyquist rate, how the quality of the reconstructed image is going to vary. So now let us go to the second phase, that is quantization of the sample values.
Now this quantization is a mapping of the continuous variable u to a discrete variable u’,
where u’ takes values from a set of discrete variables. So if your input signal is say u, after
quantization the quantized signal becomes u’, where u’ is one of the discrete variables as
shown in this case as r1 to rL. So we have L number of discrete variables from r1 to rL and u’
takes a value of one of these variables.
Now what is this quantization?
You find that after sampling of a continuous signal, what we have got is a set of samples.
These samples are discrete in time domain, Ok.
But still every sample value is an analog value. It is not a discrete value. So what we have
done after sampling is, instead of considering all possible time instants, the signal values at
all possible time instants, we have considered the signal values at some discrete time instants.
(Refer Slide Time 02:12)
And at each of these discrete time instance, I get a sample value. Now the value of this
sample is still an analog value. Similar is the case with an image.
So here, in case of an image the sampling is done in two-dimensional grids where at each of
the grid locations, we have a sample value which is still analog. Now if I want to represent a
sample value on a digital computer, then this analog sample value cannot be represented. So I
have to convert this sample value again in the discrete form. So that is where the quantization
comes into picture.
(Refer Slide Time 02:57)
So for quantization what is done is, you define a set of decision or transition levels which in
this case has been shown as transition level tk, where k varies from 1 to L+1. So we have
defined a number of transition levels or decision levels which are given as t1, t2, t3, t4 up to
tL+1, Ok and here t1 is the minimum value and tL+1 is the maximum value. And you also
define a set of the reconstruction levels that is rk. So what we have shown in the previous
slide that the reconstructed value u’ takes one of the discrete values rk, so the quantized value
will take the value rk if the input signal u lies between the decision levels tk and tk+1. So this is
how you do the quantization.
So let us come to this particular slide.
So it shows the input output relationship of a quantizer. So it says whenever your input signal
u, so along the horizontal direction we have put the input signal u and along the vertical
direction we have put the output signal u’ which is the quantized signal. So this particular
figure shows that if your input signal u lies between the transition levels t1 and t2, then the
reconstructed signal or the quantized signal will take the value r1. If the input signal lies
between t2 and t3, the reconstructed signal or the quantized signal will take a value r2.
Similarly, if the input signal lies between tk and tk+1, then the reconstructed signal will take
the value of rk and so on. So given an input signal which is analog in nature, you are getting
the output signals which have, which is discrete in nature. So the output signal can take only
one of these discrete values. The output signal cannot take any arbitrary value.
(Refer Slide Time 05:34)
Now let us see that what is the effect of this? So as we have shown in this second slide that
ideally we want that whatever is the input signal, the output signal should be same as the
input signal and that is necessary for the perfect reconstruction of the signal. But whenever
we are going for quantization, your output signal, as it takes one of the discrete set of values,
is not going to be same as the input signal always. So in this, in this particular slide, again we
have shown the same staircase function
where along the horizontal direction we have the input signal and in the vertical axis we have
put the output signal.
So this pink staircase function shows what is the quantization function that will be used and
this green line which is inclined at an angle of 45o with the u axis, this shows that what
should be the ideal input output characteristics. So if the input output function follows this
green line, in that case, for every possible input signal I have the corresponding output signal.
So the output signal should be able to take every possible value. But when you are using this
staircase function, in that case, because of the staircase effect, whenever the input signal lies
within certain region, the output signal takes a discrete value. Now because of this staircase
function, you are always introducing some error in the output signal or in the quantized
signal. Now let us see that what is the nature of this error.
Here we have shown the same figure. Here you find that when this green line which is
inclined at 45o with the u axis crosses the staircase function, at this point whatever is your
signal value, it is same as the reconstructed value. So only at these crossover points, your
error in the quantized signal will be 0. At all other points, the error in the quantized signal
will be a non-zero value. So at this point the error will be maximum, maximum and negative, and it will keep on reducing. At this point it is going to be 0, and beyond
this point again it is going to increase. So if I plot this quantization error, you find that the
plot of the quantization error will be something like this between every pair of transition levels. So
between t1 and t2, the error value is like this. Between t2 and t3, the error continuously
increases. Between t3 and t4, error continuously increases and so on. Now what is the effect of
this error on the reconstructed signal?
(Refer Slide Time 08:44)
So for that, let us again take a one-dimensional signal f(t), which is a function of t as shown here,
and let us see that what will be the effect of quantization on the reconstructed signal.
(Refer Slide Time 09:04)
So here we have plotted the same signal, Ok. So here we have shown the signal is plotted in
the vertical direction so that we can find out what are the transition levels or the part of the
signal which is within which particular transition level. So you find that this part of the signal
is in the transition level say tk-1 and tk. So when the signal, input signal lies between the
transition tk-1 and tk, the corresponding reconstructed signal will be rk-1. So that is shown by
this red horizontal line. Similarly, the signal from this portion to this portion lies in the range
tk and tk+1. So corresponding to this, the output reconstructed signal will be rk, so which is
again shown by this horizontal red line. And this part of the signal, the remaining part of the
signal lies within the range tk+1 and tk+2 and corresponding to this, the output reconstructed
signal will have the value rk+1. So to have a clear figure,
(Refer Slide Time 10:22)
you find that in this, the green curve, it shows the original input signal and this red staircase
lines, staircase functions it shows that what is the quantization signal, quantized signal or
f’(t).
Now from this, it is quite obvious that I can never get back the original signal from this
quantized signal, because within this region the signal might have, might have had any
arbitrary value. And the details of that is lost in this quantized form, quantized output. So
because from the quantized signal I can never get back the original signal so we are always
introducing some error in the reconstructed signal which can never be recovered. And this
particular error is known as quantization error or quantization noise. Obviously the
quantization error or quantization noise will be reduced, if the quantizer step size that is the
transition interval, say tk to tk+1, reduces and, similarly, if the reconstruction step size, that is the interval rk to rk+1, is also reduced.
(Refer Slide Time 11:42)
So for quantizer design, the aim of the quantizer design will be to minimize this quantization
error. So accordingly we have to have an optimum quantizer and this Optimum Mean Square
Error quantizer, known as the Lloyd-Max Quantizer, minimizes the mean square error for a given number of quantization levels.
And here we assume that let u be a real scalar random variable with a continuous probability
density function pU(u) . And it is desired to find the decision levels tk and the reconstruction
levels rk for an L level quantizer which will reduce or minimize
Now you remember that u is the input signal and u' is the quantized signal. So the error of reconstruction is the difference between the input signal u and the quantized signal u', where t1 is the minimum transition level and tL+1 is the maximum transition level. So if I just integrate the squared error weighted by the density, ξ = ∫_{t1}^{tL+1} (u - u')² pU(u) du, I get the mean square error. This same integration can be rewritten in the form ξ = Σ_{i=1}^{L} ∫_{ti}^{ti+1} (u - ri)² pU(u) du, because ri is the reconstruction level whenever the input u lies between ti and ti+1, and we take the summation of this for i equal to 1 to L, Ok. So this modified expression will be the same as the first one, and it tells you what the square error of the reconstructed signal is. And the purpose of designing the quantizer will be to minimize this error value.
(Refer Slide Time 14:44)
So obviously from school level mathematics we know that for minimization of the error
value, because now we have to design the transition levels and the reconstruction levels
which will minimize the error, the way to do that is to differentiate the error function, the error value, with respect to tk and with respect to rk, and equate those equations to 0. So if I
differentiate this particular
error value
(Refer Slide Time 15:16)
ξ = Σ_{i=1}^{L} ∫_{ti}^{ti+1} (u - ri)² pU(u) du, in that case what I get is, with ξ as the error value, ∂ξ/∂tk = (tk - rk-1)² pU(tk) - (tk - rk)² pU(tk) = 0.
Similarly, the second equation is ∂ξ/∂rk = -2 ∫_{tk}^{tk+1} (u - rk) pU(u) du = 0, where the integration has to be taken from tk to tk+1,
(Refer Slide Time 16:01)
and using the fact that tk-1 is less than tk, we get two values, one for the transition level and the other for the reconstruction level. So the transition level tk is given by tk = (rk-1 + rk)/2, and the reconstruction level rk is given by rk = ∫_{tk}^{tk+1} u pU(u) du / ∫_{tk}^{tk+1} pU(u) du. So what we get from these two is that the optimum transition levels tk lie halfway between the optimum reconstruction levels. So
that is quite obvious
(Refer Slide Time 16:57)
the optimum reconstruction levels in turn lie at the center of mass of the probability density in
between the transition levels. So which is given by the second equation
(Refer Slide Time 17:22)
that rk = ∫_{tk}^{tk+1} u pU(u) du / ∫_{tk}^{tk+1} pU(u) du. So this is nothing but the center of mass of the probability density between tk and tk+1. So the Lloyd-Max Quantizer gives you the reconstruction value, the optimum reconstruction
value
(Refer Slide Time 17:56)
and the optimum transition levels in terms of probability density of the input signal.
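In practice, these two coupled conditions can be solved by simply alternating between them until the levels stop changing; the sketch below does this on a discretized density (the Gaussian shape, the grid and L = 8 levels are illustrative assumptions, and this alternating scheme is just one common way to solve the conditions, not necessarily the method used in the lecture):

```python
import numpy as np

# Alternate between the two optimality conditions: t_k = (r_{k-1} + r_k)/2 and
# r_k = centroid of p_U(u) between t_k and t_{k+1}. Density and L are illustrative.
u = np.linspace(-4.0, 4.0, 4001)              # grid covering [t_1, t_{L+1}]
du = u[1] - u[0]
p = np.exp(-0.5 * u**2)
p /= p.sum() * du                             # normalized density p_U(u)

L = 8
r = np.linspace(-3.0, 3.0, L)                 # initial reconstruction levels

for _ in range(200):
    t = np.concatenate(([u[0]], (r[:-1] + r[1:]) / 2.0, [u[-1]]))
    for k in range(L):
        mask = (u >= t[k]) & (u < t[k + 1])   # samples falling in [t_k, t_{k+1})
        r[k] = np.sum(u[mask] * p[mask]) / np.sum(p[mask])

print(np.round(t, 3))                         # transition levels
print(np.round(r, 3))                         # reconstruction levels
```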
Now you find these two equations are non-linear equations
and we have to solve these non-linear equations simultaneously given the boundary values t1
and tL+1 and for solving this one can make use of the Newton method, Newton iterative
method to find out the solutions. An approximate solution or an easier solution will be when
the number of quantization levels is very large. So if the number of quantization levels is very
large you can approximate pU(u) , the probability density function as piecewise constant
function. So how do you do this piecewise constant approximation?
(Refer Slide Time 18:52)
So in this figure you see that a probability density function has been shown,
(Refer Slide Time 18:58)
which is like a Gaussian function. So we can approximate it this way: in between the levels tj and tj+1, we take the midpoint of the interval, which is halfway between tj and tj+1, and within this interval we can approximate pU(u), which is actually non-linear, by its constant value at that midpoint. So in between tj and tj+1, that is in between every two
transition levels, we approximate the probability density function to be a constant one which
is same as the probability density function at the midway, halfway between these two
transition levels. So if I do that, this continuous probability density function will be
approximated by staircase functions like this. So if I use this approximation and recomputed
those values, you will find
that tk+1 = A · [ ∫_{t1}^{zk+t1} pU(u)^{-1/3} du ] / [ ∫_{t1}^{tL+1} pU(u)^{-1/3} du ] + t1, where the constant A = tL+1 - t1, and we have said that tL+1 is the maximum transition level and t1 is the minimum transition level.
So we can find out tk + 1 by using this particular formulation, when the continuous probability
density function was approximated by piecewise constant probability density function.
And once we do that, after that we can find out the values of the corresponding reconstruction
levels. Now for solving
(Refer Slide Time 21:25)
this, we require t1 and tL+1 to be finite; that is, the minimum transition level and the maximum transition level must be finite. At the same time, we have to assume t1 and tL+1 apriori, before placement of the decision and reconstruction levels. These t1 and tL+1 are also called the overload points, and these two values determine the dynamic range of the quantizer.
(Refer Slide Time 22:24)
So if you find that when we have a fixed t1 and tL+1, then any value less than t1 or any value
greater than tL+1, cannot be properly quantized by this quantizer; so this represents the dynamic range of the quantizer.
Now once we get the transition levels then we can find out the reconstruction levels by
averaging the subsequent transition levels. So once I have the reconstruction levels and the
transition levels, then the quantization mean square error can be computed: the mean square error of this designed quantizer will be ξ = (1/(12L²)) [ ∫_{t1}^{tL+1} pU(u)^{1/3} du ]³. And this
expression gives an estimate of the quantizer error in terms of probability density and the
number of quantization levels.
(Refer Slide Time 23:18)
Normally two types of probability density functions are used. One is the Gaussian, where the Gaussian probability density function is given by the well-known expression pU(u) = (1/√(2πσ²)) exp(-(u - μ)²/(2σ²)), and the other is the Laplacian probability density function, which is given by pU(u) = (α/2) exp(-α|u - μ|), where μ and σ² denote the mean and variance of the input signal u; the variance in case of the Laplacian density function is given by σ² = 2/α².
Now, though the earlier quantizer was designed for any kind of probability density function, it is not always possible to find out the probability distribution of a signal apriori. So what is done in practice is that you assume a uniform distribution, a uniform probability distribution, which is given by pU(u) = 1/(tL+1 - t1), where u lies between t1 and tL+1, and pU(u) = 0 when u is outside the region t1 to tL+1. So this is the uniform probability distribution of the input signal u. And by using this uniform probability distribution, if I compute this, then you will find that the reconstruction level rk will be nothing but rk = (tk + tk+1)/2, where tk will be tk = (rk-1 + rk)/2, which is the same as (tk-1 + tk+1)/2. So I get the reconstruction levels and the decision levels for a uniform quantizer.
(Refer Slide Time 25:50)
Now these relations lead to tk - tk-1 being the same as tk+1 - tk, a constant equal to q, which is known as the quantization step. So finally, what we get is the quantization step, given by q = (tL+1 - t1)/L, where tL+1 is the maximum transition level, t1 is the minimum transition level and L is the number of quantization steps. We also get the transition level tk in terms of the transition level tk-1 and q as tk = tk-1 + q, and the reconstruction level rk in terms of the transition level tk as rk = tk + q/2. So we obtain all the related terms of a uniform quantizer
using this mean square error quantizer design which is the Lloyd Max quantizer for a uniform
distribution.
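A minimal uniform quantizer built directly from these relations (the dynamic range [0, 1] and L = 8 levels are arbitrary choices for illustration):

```python
import numpy as np

# Uniform (linear) quantizer: q = (t_{L+1} - t_1)/L, r_k = t_k + q/2.
t1, tL1, L = 0.0, 1.0, 8          # dynamic range [t_1, t_{L+1}] and number of levels
q = (tL1 - t1) / L                # quantization step

def quantize(u):
    """Map each input to the reconstruction level of the interval it falls in."""
    k = np.clip(np.floor((u - t1) / q), 0, L - 1)    # index of the decision interval
    return t1 + k * q + q / 2.0                      # r_k = t_k + q/2

u = np.array([0.03, 0.26, 0.49, 0.77, 0.99])
print(quantize(u))                # [0.0625 0.3125 0.4375 0.8125 0.9375]
```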
So here you find that all the transition levels as well as the reconstruction levels are equally
spaced and the
quantization error in this case is uniformly distributed over the interval -q/2 to q/2. And the mean square error in this particular case, if you compute it, will be given by ξ = (1/q) ∫_{-q/2}^{q/2} u² du, which is nothing but q²/12.
So for uniform distribution the Lloyd Max quantizer equation becomes linear because all the
equations that we had derived earlier, they are all linear equations giving equal intervals
between the transition levels and the reconstruction levels, and so this is also sometimes referred to as a linear quantizer.
(Refer Slide Time 28:04)
Ok. So there are some more observations from this linear quantizer. The variance σu² of a uniform random variable whose range is A is given by σu² = A²/12. So for a uniform quantizer with B bits, where every level has to be represented by B bits, the number of steps will be 2^B, and thus the quantization step will be q = A/2^B. From this you find that ξ/σu² = 2^{-2B}, and from this we can compute the signal to noise ratio. In case of a uniform quantizer the signal to noise ratio is given by SNR = 10 log10(2^{2B}) dB, which is nothing but approximately 6B dB.
So this says that signal to noise ratio that can be achieved by an optimum mean square
quantizer for a uniform distribution is about 6 dB per bit. That means if you increase the number of bits by 1, the number of quantization levels will be increased by a factor of 2, and in that case you gain 6 dB in the signal to noise
ratio in the reconstructed signal.
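This 6 dB per bit behaviour is easy to check numerically (the uniformly distributed test signal and the bit depths below are illustrative choices, not from the lecture):

```python
import numpy as np

# Empirical signal-to-noise ratio of a uniform quantizer versus the 6B dB rule.
rng = np.random.default_rng(1)
u = rng.uniform(0.0, 1.0, 1_000_000)            # uniformly distributed input signal

for B in (4, 6, 8):
    q = 1.0 / 2**B                              # quantization step for B bits
    u_q = (np.floor(u / q) + 0.5) * q           # quantized signal
    snr = 10 * np.log10(np.var(u) / np.mean((u - u_q) ** 2))
    print(B, round(float(snr), 1))              # roughly 6 dB per bit: ~24, ~36, ~48 dB
```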
(Refer Slide Time 29:59)
So with this we come to an end on our discussion on the image digitization process. So here
we have seen that how to sample an image or how to sample a signal in one-dimension, how
to sample an image in two-dimension. We have also seen that after you get the sample values,
where each of the sample values are analog in nature, how to quantize those sample values so
that you can get the exact digital signal as well as exact digital image. Thank you
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 01 Lecture Number 07
Relationship between Pixels
(Refer Slide Time 00:19)
So in today's lecture, we will see, we will try to see that, what are the relationships that exist
among the pixels of an image? And among these relationships the first relationship that we
will talk about is the neighborhood and we will also see that, what are different types of
neighborhood of a pixel in an image? Then we will also try to explain that what is meant by
connectivity in an image. We will also learn the connected component labeling algorithm, the
importance of this connected component labeling algorithm, and the properties related to connectivity we will discuss later. We will also explain what is meant by adjacency, and we will
see, what are the different types of adjacency relationships? Then we will also learn different
distance measures and towards the end of today's lecture we will try to find out,
what are the different image operations? We will try to see that what are pixel by pixel
operations and what are the neighborhood operations in an image.
(Refer Slide Time 01:45)
So the first relationship is the neighborhood, or the neighborhood relationship. Now let us first try to understand what is meant by
neighborhood.
We say that the people around us are our neighbors. Or we say that a person who is living in
the house next to mine is my neighbor. So it is the closeness of the different persons which
forms the neighborhood of the persons. So it is the persons who are very close to me, they are
my neighbors. Similarly, in case of an image also, we say that pixels are neighbors if the
pixels are very close to each other.
(Refer Slide Time 02:32)
Formally, what is meant by neighborhood in case of an image pixel? Here let us consider a
pixel p at location (x, y), shown as this middle pixel. Now you find that
because the image in our case is represented by a
two-dimensional matrix, so the matrix will have a number of rows and a number of columns.
So when I consider this pixel p whose location is (x, y) that means the location of the pixel in
the matrix is in row number x, in row x and in column y. Obviously, there will be a row
which is just before x that is row, x-1. There will be a row just after the row x which is row
x+1. Similarly, there will be a column just before the column y that is column, y-1 and there
will be a column just after column y which is column y+1.
(Refer Slide Time 03:43)
So come back to this figure. Coming to this particular pixel p which is at location (x, y) I can
have 2 different pixels, one is in the row just above row x, other one in the row just below
row x but in the same column location y. So I will have 2 different pixels, one is in the
vertically upward direction, the other one is in the vertically downward direction. So these are
the 2 pixels which are called the vertical neighbors of point p. Similarly, if I consider the
columns, there will be a pixel at location (x, y-1), that is in row number x and column number y-1, and there will be a pixel at (x, y+1), that is in row number x and column number y+1.
So in this case these are the two pixels which are the horizontal neighbors of the point p. So
in this case these are not 4,
(Refer Slide Time 04:51)
Rather, these should be 2; here this will also be 2. So this pixel p has two neighbors in the
horizontal direction and two neighbors in the vertical direction. So these total 4 pixels are
called 4 neighbors of the point p and is represented by
N4(p), that is these pixels are 4 neighbors of the pixel p or point p. Each of these neighbors, if
you find out the distance between these neighboring pixels you will find that each of the
neighbors is at a unit distance from point p. Obviously if p is a boundary pixel then it will
have fewer neighbors. Let us see why.
Say I have a
(Refer Slide Time 05:56)
two-dimensional image where this image is represented in the form of a matrix. So I have pixels in different rows and pixels in different columns. Now if this pixel p, the point p, is one
of the
boundary pixels, say I take this corner pixel, then as we said that for a pixel p usually there
are
(Refer Slide Time 06:27)
4 different pixels taken from a row above it, a row below it, the column before it and the
column after it. But, when we consider
this particular pixel p, this pixel p does not have any pixel in the row above it, and it does not have any pixel in the column before this particular column. So for this particular pixel p, I
will have only 2 neighboring pixels, one is in this location, the other one is in this location
which are part of the 4 neighbors, or N4(p). So you find that for all the pixels which belong to the boundary of an image, the number of neighboring pixels is less than 4, whereas,
(Refer Slide Time 07:24)
for all the pixels which are inside an image, the number of neighboring pixels is equal to 4. So this is the 4-neighborhood of
a particular pixel.
Now as we have done, as we have taken the points from vertically upward direction and
vertically downward direction or horizontally from the left as well as from right, similarly we
can find that there are 4 other points which are in the diagonal direction. So those points are
here.
(Refer Slide Time 07:57)
(Refer Slide Time 08:04)
But now, if I consider the diagonal points, you find that there are 4 diagonal points, one at location (x-1, y-1), one at (x-1, y+1), one at (x+1, y-1) and one at (x+1, y+1).
(Refer Slide Time 08:25)
(Refer Slide Time 08:33)
Now we say these 4 pixels, because they are in the diagonal directions, these 4 pixels are
known as diagonal neighbors of point p.
And it is represented by ND(p). So I have got 4 pixels which belong to N4(p) that is those are
the 4 neighbors of point p, and I have got 4 more points which are the diagonal neighbors, represented by ND(p). Now I can combine these 2 neighborhoods and I can call that an 8-neighborhood.
So again coming back to this, if I take the points both from N4(p)
(Refer Slide Time 09:18)
and ND(p), they together are called 8 neighbors of point p and represented by N8(p). So
obviously, this N8(p) is the union of N4(p) and ND(p).
(Refer Slide Time 09:38)
So here again you find that if the point p belongs to the boundary of the image, then ND(p), the number of diagonal neighbors of point p, will be less than 4; similarly the points belonging to N8(p), or the number of 8 neighbors of the point p, will be less than 8, whereas if p is inside an image, it is not a
boundary point, in that case there will be 8 neighbors, 4 in the horizontal and vertical
directions and 4 in the diagonal directions so there will be 8 neighbors of point p if point p is
inside an image. So these are the different neighborhoods of the point p.
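These definitions translate almost directly into code; a small helper (assuming the matrix convention used above, with x as the row index and y as the column index) might look like this:

```python
# N4(p), ND(p) and N8(p) of a pixel p = (x, y) inside an image with the given size;
# neighbors falling outside the image are dropped, so boundary pixels get fewer.
def neighborhoods(x, y, rows, cols):
    def inside(p):
        return 0 <= p[0] < rows and 0 <= p[1] < cols

    n4 = [p for p in [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)] if inside(p)]
    nd = [p for p in [(x - 1, y - 1), (x - 1, y + 1),
                      (x + 1, y - 1), (x + 1, y + 1)] if inside(p)]
    return n4, nd, n4 + nd                      # N8(p) is the union of N4(p) and ND(p)

n4, nd, n8 = neighborhoods(0, 0, 5, 5)          # corner pixel of a 5 x 5 image
print(len(n4), len(nd), len(n8))                # 2 1 3: fewer neighbors on the boundary
print(len(neighborhoods(2, 2, 5, 5)[2]))        # 8: an interior pixel has all 8 neighbors
```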
(Refer Slide Time 10:33)
Now after this, we have another property which is called connectivity. Now the
connectivity of the points
in an image is a very, very important concept which is used to find the region property of the
image or the property of the particular region within the image. You recollect that in our
introductory lectures, we have given an example of segmentation that is we had an image of
certain object and we wanted to find out the points which belong to the object and the points
which belong to the background. And for doing that, we had used a very, very primitive
operation called the thresholding operation. So here in this particular case, we have shown
that
(Refer Slide Time 11:26)
if the intensity value or if F(x, y) at a particular point (x, y) is greater than certain threshold
say Th, then in that case we decided that the point (x, y) belongs to the object.
Where as if the point or the intensity level at the point (x, y) is less than the threshold, then
we have said, we have decided that the point (x, y) belongs to the background.
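This thresholding rule can be written very compactly; the following is a hedged NumPy sketch (the array name f, the threshold name Th and the example values are my own, not from the lecture):
```python
import numpy as np

# Minimal sketch of the thresholding rule described above:
# 1 where F(x, y) > Th (object), 0 otherwise (background).
def threshold(f, Th):
    return (f > Th).astype(np.uint8)

f = np.array([[10, 200, 40],
              [220, 180, 30],
              [20, 25, 210]])
print(threshold(f, 128))
# [[0 1 0]
#  [1 1 0]
#  [0 0 1]]
```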
(Refer Slide Time 11:55)
So simply by performing this operation, if you represent every object point as a white
pixel or assign a value 1 to it and
every background pixel as a black pixel or assign a value 0 to it, in that case the type of
image that we will get after the thresholding operation is like this.
So here
(Refer Slide Time 12:21)
And you find that for all the points which belong to the object, the intensity value is greater
than the threshold. So the decision that we have taken is, these points belong to the object, so
in case of the segmented image or the thresholded image
(Refer Slide Time 12:40)
we have assigned value 1 to each of those image points, whereas in other regions, like
this one, we decided that these points belonged to the background, so we have assigned a value 0
to these particular points. Now just by performing this operation, what we have done is
we have identified a certain number of pixels, the pixels which belong to the background, and the
pixels which belong to the object. But just by identification of the pixels belonging to the
background or the pixels belonging to the object, I cannot find out what the
property of the object is until and unless I do some more processing to say that those pixels
belong to the same object. That means I have to do some sort of grouping operation.
(Refer Slide Time 13:43)
(Refer Slide Time 14:06)
So just by using this thresholding operation, what I have done is, I have decided that all the
pixels in this region will get a value 1, and
all the pixels in this region will also get a value 1. So both these two sets of pixels
belong to the object. But
(Refer Slide Time 14:32)
our solution does not end here. We have to identify that this particular
set of pixels belongs to one object
(Refer Slide Time 14:47)
(Refer Slide Time 14:55)
and I have to identify which pixels are connected and which pixels are not connected.
So I will say that the pixels having value equal to 1 which are connected belong to one
region, and another set of pixels having value equal to 1 but not connected to the
other set belongs to some other object. So this connectivity property between the pixels
is a very, very important property, and using this connectivity property
(Refer Slide Time 15:29)
we can establish the object boundaries. We can find out what the area of the object is, and
likewise we can find out many other properties or descriptors of the object
which will be useful for further
high-level processing techniques, where we will try to recognize or identify a particular
object. So now let us try to see what this connectivity property is. What do we mean by
connectivity?
(Refer Slide Time 16:01)
We say that 2 pixels are connected if they are adjacent in some sense. So this term "some
sense" is very, very important. This adjacency means that they have to be neighbors; that is,
if I say two points p and q are connected, then by adjacency we mean that p must be a
neighbor of q or q must be a neighbor of p. That means q has to belong to N4(p), or ND(p), or
N8(p). And in addition to this neighborhood, one more constraint that has to be put is that the
intensity values
(Refer Slide Time 16:52)
So let us take this example. Here we have shown 2 or 3 different situations where we have
taken points p and q. So here we find that point q belongs to
(Refer Slide Time 17:21)
the 4-neighborhood of point p. And in this case we will say that points p and
q are connected; obviously the neighborhood restriction holds true because q and p are
neighbors, and along with this we have said that another
(Refer Slide Time 17:45)
restriction or another constraint must be satisfied, that their intensity values must be similar.
So in this particular case, because we are considering
a binary image, we will say that the two points are connected if q belongs to the
neighborhood of p, or p belongs to the neighborhood of q, and the intensity value at point p is the same as the intensity value at point q.
So because it is a binary image,
(Refer Slide Time 18:13)
in this case, if the pixels have value equal to 1, then we will assume
those 2 pixels to be connected. So in this case, if for both p and q
(Refer Slide Time 18:31)
the intensity value is equal to 1, and since they are neighbors, we will say that
points p and q are connected. Now, from this connectivity
So the earlier example that we had taken is the connectivity in case of a binary image, where
the intensity values
(Refer Slide Time 19:04)
are either 0 or 1. This connectivity property can also be defined in case of a gray level image, so
how do you define connectivity in case of a gray level image? In case of a gray level image,
we define a set of gray levels. Say for example, in this case we have defined V
to be a set of gray levels which is used to define the connectivity of two points p and q. So, if the
intensity values at points p and q belong to the set V, that is, not the points p and q themselves but the
intensity values f(p) and f(q), and
points p and q are neighbors, then we can say that points p and q are connected. And here
again, we can define 3 different types of connectivity. One is 4-connectivity: in this
case, the intensity values at p and q must be from the set V and p must be a 4-neighbor of q, or q
must be in N4(p). In that case we define 4-connectivity. Similarly, we define 8-connectivity. If
the intensity values at point p and q
belong to the set V and p is in N8(q) or q is in N8(p). There is another type of connectivity
which is defined, which is called m-connectivity or mixed connectivity. In case of
m-connectivity it is defined like this: the intensity values at points p and q
obviously have to be from the same set V, and either q belongs to N4(p), or q
belongs to ND(p), that is, the diagonal neighborhood of p, and N4(p) ∩ N4(q) = φ. So this
concept extends, or puts some restriction on, the 8-connectivity in the sense that here we say that
either q has to be in N4(p) or p has to be in N4(q), or q has to be in ND(p) but at the same time
N4(p) ∩ N4(q) = φ.
And you find that this N4(p) ∩ N4(q) indicates the set of points which are 4-neighbors
of both the points p and q. So this says that if the point q belongs to the diagonal neighborhood of
p and there is a common set of points which are 4-neighbors of both the points p and q, then
m-connectivity is not valid. So the reason why this m-connectivity is introduced is to avoid
some problems that may arise with the simple 8-connectivity concept. Let us see what these
problems are.
(Refer Slide Time 22:37)
So the problem is like this. Here again we have taken the example of a binary image, and in
case of a binary image we say that two points may be connected if the values of both the
points are equal to 1, so the set V contains a single intensity value which is equal to 1.
(Refer Slide Time 22:57)
Now
here we have depicted one particular situation where we have shown the different pixels in a
binary image. So you find that if I consider this point at the middle of this image which is
having the value 1,
(Refer Slide Time 23:17)
there is one more pixel on the row above this which is also having the value 1, a diagonal
pixel which is having a value 1, and a diagonally downward pixel which is also having a value
equal to 1. Now if I define 4-connectivity, then you find that this point is 4-connected to this
point. This point is 4-connected to this point because this particular point is a member of the 4-
neighborhood of this particular point. This point is a member of the 4-neighborhood of this point. But by 4-
connectedness, this point is not connected because it is not a 4-neighbor of any of these
points. Now from 4-connectivity, if I move to 8-connectivity, then what do I get?
Again I have the same set of points. Now you find that we have defined 8-connectivity
(Refer Slide Time 24:21)
such that when I consider this central pixel, again these 2 connections, which come from 4-connectivity,
still exist. In addition to this, the point which was not connected considering the 4-
neighborhood now gets connected because it belongs to the diagonal neighborhood of this
central point. So these two points are also connected.
Now the problem arises here. This point was connected through the 4-neighborhood and at the
same time, this point, because it is a diagonal neighbor of the central point, is
also connected through the diagonal neighborhood. So if I consider this situation and I
simply consider 8-connectivity, then you find that multiple
paths for connection exist in this particular case.
So m-connectivity or mixed connectivity has been introduced to avoid this multiple
connection path. So just recollect the restriction we have put in case of mixed
connectivity. In case of mixed connectivity, we have said that 2 points are m-connected if one
is the 4-neighbor of the other, or one is a diagonal neighbor of the other and at the same time they don't have
any common 4-neighbor. So just by extending this concept in this case, you find that
(Refer Slide Time 26:27)
for m-connectivity, these are diagonal neighbors so they are connected, but these 2 points,
though they are diagonal neighbors, are not m-connected because these 2 points have
a common point here: this point is a 4-neighbor of one and at the same time a 4-neighbor of the other. So when
I introduce this m-connectivity concept, you find
that the problem that arose, that is the multipath connection which we got in case of 8-
connectivity, no longer exists in case of m-connectivity. So in case of m-connectivity, even if
we consider the diagonal neighbors, the problem of multiple paths does not arise. So this is
the advantage that you get in case of m-connectivity.
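To make the three adjacency tests concrete, here is a hedged Python sketch (the function names, the set V convention and the tiny example image are my own choices; the "empty intersection" condition of m-connectivity is implemented in its usual reading, namely that the common 4-neighbors of p and q contain no pixel whose value is in V):
```python
# Minimal sketch (not from the lecture) of 4-, 8- and m-adjacency tests.
# 'img' is a 2-D list of intensities and V is the set of foreground values.

def inside(img, p):
    return 0 <= p[0] < len(img) and 0 <= p[1] < len(img[0])

def in_V(img, p, V):
    return inside(img, p) and img[p[0]][p[1]] in V

def n4_of(p):
    x, y = p
    return {(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)}

def nd_of(p):
    x, y = p
    return {(x - 1, y - 1), (x - 1, y + 1), (x + 1, y - 1), (x + 1, y + 1)}

def adjacent4(img, p, q, V):
    return in_V(img, p, V) and in_V(img, q, V) and q in n4_of(p)

def adjacent8(img, p, q, V):
    return in_V(img, p, V) and in_V(img, q, V) and q in (n4_of(p) | nd_of(p))

def adjacent_m(img, p, q, V):
    """m-adjacency: q in N4(p), or q in ND(p) while the common 4-neighbors of
    p and q contain no pixel in V (the usual reading of N4(p) ∩ N4(q) = φ)."""
    if not (in_V(img, p, V) and in_V(img, q, V)):
        return False
    if q in n4_of(p):
        return True
    if q in nd_of(p):
        return all(not in_V(img, r, V) for r in n4_of(p) & n4_of(q))
    return False

# A situation like the one discussed above: the centre pixel and its upper-right
# diagonal neighbour are 8-adjacent but not m-adjacent, because the pixel between
# them (a common 4-neighbour with value 1) would create a second connection path.
img = [[0, 1, 1],
       [0, 1, 0],
       [0, 0, 1]]
V = {1}
print(adjacent8(img, (1, 1), (0, 2), V))   # True
print(adjacent_m(img, (1, 1), (0, 2), V))  # False
```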
(Refer Slide Time 27:22)
(Refer Slide Time 27:27)
(Refer Slide Time 27:35)
Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 02 Lecture Number 08
Relationship of Adjacency and Connected Components Labeling
(Refer Slide Time 00:18)
(Refer Slide Time 00:29)
they are connected. So since for connectivity we have introduced 3 different types of
connectivity, that is, 4-connectivity,
(Refer Slide Time 00:46)
8-connectivity and m-connectivity. So, for all these 3 different types of connectivity, we will
have 3 different types of adjacency, because our adjacency definition is that two points are
adjacent if they are connected. So just by extension of this definition you find that we have
got
3 different types of adjacency: the first one is 4-adjacency, the second one is 8-adjacency, and
the third one is m-adjacency. And you define the type of adjacency depending upon the type
of connectivity that is used. Now this is about the adjacency of two different
(Refer Slide Time 01:28)
points, or two or more different points. We can extend this concept of adjacency to image
regions, that is, we can also say that 2 image regions may be adjacent or they may not be
adjacent. So what is the condition for adjacency of 2 image regions? In this case, you find that
we define the adjacency of 2 image regions
like this: if there are 2 image subsets Si and Sj, we say that Si and Sj will be adjacent if
there exists a point p in image region Si and a point q in image region Sj such that p and q are
adjacent.
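Before the figure-based elaboration that follows, this condition can also be written down directly; a minimal Python sketch, assuming regions are represented as sets of pixel coordinates and that some pixel-level adjacency test is available (both assumptions are mine):
```python
# Minimal sketch: two image subsets Si and Sj (sets of pixel coordinates) are
# adjacent if some point p in Si is adjacent to some point q in Sj.
# 'is_adjacent' stands for any of the 4-, 8- or m-adjacency tests.

def regions_adjacent(Si, Sj, is_adjacent):
    return any(is_adjacent(p, q) for p in Si for q in Sj)

# Example with a plain 8-adjacency test on coordinates alone:
def touch8(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1])) == 1

Si = {(0, 0), (0, 1)}
Sj = {(1, 2), (2, 2)}
print(regions_adjacent(Si, Sj, touch8))  # True, since (0, 1) and (1, 2) touch diagonally
```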
So just let us try to elaborate this. So I have this overall image region. This is the
(Refer Slide Time 02:37)
This is the whole image and within this image I have two image regions, one is here
(Refer Slide Time 02:49)
So the adjacency relation between two image regions is defined like this
that I have to have some point in one region which is adjacent to a point in the other region
So if I call say this is image region Si
(Refer Slide Time 03:05)
and this is image region Sj then I must have some point p in Si and some other point q in
image Sj so that this p and q they are adjacent
so if p and q are adjacent then I say that this image region Si is adjacent to image region Sj
that means Si and Sj they must
(Refer Slide Time 03:34)
(Refer Slide Time 03:46)
We can also define a path between 2 points p and q. So the definition of a path is like this.
We say that
a path exists from a point p having coordinate (x, y) to a point q having coordinate (s, t) if
there exists a sequence of distinct pixels, say (x0, y0), (x1, y1), (x2, y2) and so on up to (xn, yn),
where (x0, y0) = (x, y), that is, the same as point p, and (xn, yn) = (s, t), which is the same
as point q, and all the subsequent points, like (x1, y1) and (x2, y2), must be adjacent,
in the sense that (xi, yi) has to be adjacent to (xi-1, yi-1)
for all values of i lying between 1 and n. So if I have such a sequence of points between
(Refer Slide Time 05:05)
p and q such that all the points which are traversed in between p and q, all those subsequent
points, are adjacent, then we say that a path exists from point p to point q, and
we also define the length of the path to be n;
(Refer Slide Time 05:34)
counting the end points p and q and all the points in between, the length of
the path is said to be n.
So this is what we define as a path. Now, a very important concept that arises from this is that of a
(Refer Slide Time 05:59)
connected region. We have said that two pixels are connected if
they are adjacent in some sense that is they are neighbors and their intensity values are also
similar. We have also defined two regions to be adjacent if there is a point in one region
which is adjacent to some other point in another region. And we have also defined a path
between a point p and q if there are a set of points in between, which are adjacent to each
other. Now this concept can be extended to define what is called a connected component.
We take a subset S of an image I and we take two points p and q which belong to this subset S
of image I. Then we say that p is connected to q in S, so just mind this term, that p is
connected to q in the subset S, if there exists a path from p to q consisting entirely of pixels in
S. For any such p belonging to S, the set of pixels in S that are connected to p is called a
connected component of S. So the concept is like this.
Say this is my entire image I,
and here, say, we have a point p, and we take any other point q.
(Refer Slide Time 08:03)
If there exists a path from p to q whose intermediate
points all belong to the same subset S, that is, a path between p and q
consisting of intermediate points belonging to the same subset S, then we say that the points p
and q are connected; and if there are a number of such points to which a path exists from
p, then the set of all these points is said to be connected to p and they form a connected
component of S.
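As an illustration, the connected component containing a seed point p can be grown by a simple breadth-first traversal; this is a hedged sketch (the queue-based formulation, the names and the use of 8-adjacency are my own choices, not the lecture's):
```python
from collections import deque

# Minimal sketch (not from the lecture): the connected component of a seed
# pixel p in a binary image, grown by breadth-first traversal under 8-adjacency.
def connected_component(img, p):
    rows, cols = len(img), len(img[0])
    component, frontier = {p}, deque([p])
    while frontier:
        x, y = frontier.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                q = (x + dx, y + dy)
                if q == (x, y) or q in component:
                    continue
                if 0 <= q[0] < rows and 0 <= q[1] < cols and img[q[0]][q[1]] == 1:
                    component.add(q)
                    frontier.append(q)
    return component

img = [[1, 1, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 1]]
print(sorted(connected_component(img, (0, 0))))
# [(0, 0), (0, 1), (1, 1)] -- the second group of 1s is a separate component
```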
(Refer Slide Time 08:49)
a region in an image. So going back to our earlier example where we have said that simply by
identifying that a pixel belongs to an object does not give me the entire solution because I
have to group the pixels, which belong to the same object and give them some identification,
that these are the group of pixels which belong to the same object and then I can go for
extracting the region property which will tell me what is the property of that particular object.
And now that belongingness to a particular object
(Refer Slide Time 09:33)
So any two pixels of a connected component, we say they are connected to each other and
distinct connected components are disjoint. Obviously the points belonging to one particular
region and points belonging to another
(Refer Slide Time 09:56)
particular region are not connected, but the points belonging to the same particular region
are connected with each other.
So for this group identification, what we have to do is, when we identify a set
of pixels which are connected, then to all those points belonging to a particular
group, we have to assign a particular identification number.
Say for example, in this particular figure you find that there are
two groups of pixels. So in the first figure we have a set of pixels here; we have another set of
pixels here. So you find that this set of pixels is connected, and this set of pixels is also
connected; so this forms one connected component. This set of pixels forms another connected
component. So the connected component labeling problem is that I have to assign a group
identification number to each of these pixels. That means the first set of pixels which are
connected to each other, I have to give them one group identification number. In this particular
case, all these pixels are identified to be yellow, and I have to give another group
identification to the second set of pixels. So in this particular case, all these pixels are given
the color red.
Once we identify the pixels that belong to a particular region, then we can go for finding some region properties,
and those region properties may be the shape of that particular
(Refer Slide Time 11:45)
region, it may be the area of that particular region, it may be the boundary or the length of the
boundary of this particular region, and many other shape, area or boundary based features can
be extracted
once we identify
(Refer Slide Time 12:02)
(Refer Slide Time 12:08)
the pixels belonging to the region. So
(Refer Slide Time 12:17)
now let us see what the algorithm is that has to be followed
(Refer Slide Time 12:21)
to find out the group identification for a particular region, or for the pixels belonging
to a particular region. So the idea, the algorithm, will be like this. You scan the
image from left to right and from top to bottom. So as shown in this particular figure, if I scan
the image like this, from left to right and from top to bottom, this will be our scanning
order, and for the time being let us assume that the
(Refer Slide Time 12:57)
neighbors of p which will already have been scanned are point r and point t. So by using this particular fact,
that these are the points which will be scanned before you scan point p,
the steps involved will be like this. When I consider a point p, I assume that I(p) is the pixel
value at that particular location. And I also say that L(p) will be the label assigned to the pixel
at location p. Then the algorithm steps will be like this. If I(p) is equal to zero, because as
we have seen in the previous case, that after segmentation we say
(Refer Slide Time 14:40)
that whenever the intensity value at a particular location is above a certain threshold we assign
a value 1 to that particular location, whereas if the intensity value is less than the threshold
we assign the value 0 to that particular location. So by using this convention, when I want to
find out the region property based on the shape, the pixels or points which are of importance
are the points having a value equal to 1, and we assume that the
points having a value equal to 0 belong to the background, so they are not of importance.
So just by this, if a point
has a value equal to 0, that is I(p) equal to 0, we don't assign any label to it; we just
move to the next scanning position, either to the right or down to the next row.
But if I(p) is equal to 1, that is, the value at that particular point is equal to 1, then while
scanning we have already come across the 2 points r and t. So when I find a point p for which the value is equal
to 1 and the values at both the points r and t are equal to 0, we assign a new label
to position p.
If I(p) is equal to 1 and only one of the two neighbors, that is r and t, is 1, then because r and t have
already been scanned, whichever of them had the value equal to 1
(Refer Slide Time 16:22)
already has a label. So in this particular case, if I(p) is equal to 1 and one of r and t is equal to 1, to p
we assign the same label which was assigned to whichever of r or t was 1.
(Refer Slide Time 16:47)
So if I(p) is equal to 1 and only one of the two neighbors is 1, then assign the label of that
neighbor to point p.
But the problem comes if I(p) is equal to 1 and both r and t are equal to 1. The
assignment is still simple if the label which was assigned to t and the label
which was assigned to r were the same. So if L(r) is equal to L(t), then you assign the same label
to point p.
So you see in this particular case that if L(r) is equal to L(t), then L(p) gets L(r), which is
obviously the same as L(t).
But the problem comes if the label assigned to r and the label assigned to t were not the
same. In this particular case, what we have to do is assign one of the
two labels to point p and note that these two labels are equivalent, because p and t,
or p and r, are adjacent while the labels of r and t were different.
So after doing this initial labeling, we have to do some post-processing
(Refer Slide Time 18:06)
so that all these pixels p, r and t get the same label. So here what we have to do is
assign to point p one of the labels, the label of r or the label of t, and keep
a note that the label of the other pixel and the label which is assigned to p
are equivalent, so that in the post-processing stage this anomaly that has been generated
can be resolved.
(Refer Slide Time 18:40)
So at the end of the scan, all pixels with value 1 will have some label, and some of the labels
will be equivalent. So during post-processing, or the second pass, what we will do is
identify all the equivalent pairs to form equivalence classes, and we can assign a different label
to each of these
equivalence classes. And in the second pass you go through the image once more and,
(Refer Slide Time 19:10)
for every label which belongs to a particular equivalence class, you replace the original label
by the label that has been assigned to that equivalence class.
So with these two passes,
(Refer Slide Time 19:23)
at the end of the second pass, you get a labeled image where you
identify the region belongingness of a particular pixel by looking at the label assigned
to that particular pixel.
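The two-pass procedure just described can be sketched in Python as follows. This is a hedged illustration, not the lecture's own code: the lecture only names the steps, so the data structures, the small union-find table used to record and resolve equivalent labels, and the use of the two already-scanned neighbors (the pixel above and the pixel to the left) are my own choices.
```python
# Minimal sketch of two-pass connected component labeling (4-connectivity).
def label_components(img):
    rows, cols = len(img), len(img[0])
    L = [[0] * cols for _ in range(rows)]   # labels, 0 = background
    parent = {}                              # union-find table over labels

    def find(a):                             # representative of a label's class
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):                         # record that labels a and b are equivalent
        parent[find(a)] = find(b)

    next_label = 1
    # First pass: provisional labels plus equivalences.
    for i in range(rows):
        for j in range(cols):
            if img[i][j] == 0:
                continue
            up = L[i - 1][j] if i > 0 else 0
            left = L[i][j - 1] if j > 0 else 0
            if up == 0 and left == 0:        # neither scanned neighbor labeled: new label
                L[i][j] = next_label
                parent[next_label] = next_label
                next_label += 1
            elif up != 0 and left != 0:      # both labeled: keep one, note equivalence
                L[i][j] = up
                if up != left:
                    union(up, left)
            else:                            # exactly one labeled: copy its label
                L[i][j] = up or left
    # Second pass: replace each label by its equivalence-class representative.
    for i in range(rows):
        for j in range(cols):
            if L[i][j]:
                L[i][j] = find(L[i][j])
    return L

img = [[1, 1, 0, 1],
       [0, 1, 0, 1],
       [0, 1, 1, 1]]
for row in label_components(img):
    print(row)
# The two groups of 1s merge into one component because they touch at the bottom row.
```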
So here
(Refer Slide Time 19:45)
let us
(Refer Slide Time 19:48)
(Refer Slide Time 19:53)
So in this particular example you find that we have shown two different regions, or two
different connected regions, of pixels having value equal to 1. So here what we do is, during
the initial pass, as we scan this image from left to right and from top to bottom,
the first white pixel that I get, I assign a label 1 to it. Then you continue your scanning. When
you come to the second white pixel, you find that by connectivity it
belongs to the same region, but when I come to this particular pixel, if I go to the top pixel
and the left pixel, I find that there is no other pixel which is having a value equal to 1. So I
have to assign a new value to this particular pixel, and this pixel gets a value equal to 2.
Come to the next one. This pixel again gets the value equal to 1 because its top pixel is equal to
1. Coming to the next one, this one gets the value
equal to 2 because its top pixel has a value equal to 2. Come to the next one, this gets the
value 3 because it has to be a new label, because neither its top neighbor nor the left
neighbor has any other label.
So the next one again gets the value 3 because its left neighbor has the label 3. Again this gets
the value 1 because top neighbor is equal to 1. This gets the value 1 because the left neighbor
is equal to 1. This gets the value 1. This gets the value 1. Now you find that in this case, there
is an anomaly. Because for this pixel the top pixel has label equal to 2 and the left pixel has
value equal to 1. So I have to assign one of these two labels to this particular pixel. So here
we have assigned label 1 to this pixel and after this what we have to do is, we have to declare
that 2 and 1, they are equivalent.
Then you continue your processing. Here again I have to assign a new label because neither
the top nor the left neighbor of this pixel has got any label, so it gets a label 4.
Coming to the next one, here you find that its top one has got label 3 and left one has got
label 4. So I have to assign one of the labels and in this case the label assigned is 4 but at the
same time, I have to keep a note that label 3 and label 4, they are equivalent. So you mark 3
and 4, equivalent.
So if you continue like this, the next one is again 4, the next one again gets a 4, this one gets a
1, this one again gets 1, this one gets 4, this one gets 4, and this one gets a 5, that's a new label
because for this particular pixel neither the top nor the left neighbor has a label.
(Refer Slide Time 23:07)
The next pixel gets a label 1 because its top pixel has already had a label 1, but at this particular point
we have to keep a note that 5 and 1 are equivalent. So you note 5 and 1 to be equivalent. You
continue like this. The other pixel gets a label 4. This pixel gets a label 4. This pixel gets a
label 4. This pixel again gets a label 5 because its top pixel is already having a label equal to
5. So at the end of this scanning
(Refer Slide Time 23:44)
I get 3 equivalent pairs. One equivalent pair is 1, 2. The other equivalent pair is 3, 4, and the
third equivalent pair is 1, 5.
(Refer Slide Time 23:56)
So in the second pass what I have to do is process these equivalent pairs to identify
the equivalence classes, that means all the labels which are equivalent. So by
processing this you find
(Refer Slide Time 24:14)
that 1 and 2, and 1 and 5, are equivalent. So 1, 2 and 5, these three labels form a
particular equivalence class. And similarly 3 and 4 are equivalent, forming another
equivalence class. So if I assign a label 1 to the equivalence class containing the labels 1, 2
and 5, and at the same time I assign a label 3 to the equivalence class containing labels 3 and
4, then during the second pass, what I will do is scan over this image which is
already labeled and reassign the labels. So wherever the label was equal to 1, I will
maintain that equal to 1. And wherever I get a label which is 2 or 5, I will reassign that label
to be equal to 1. So in this particular case, if you remember, this pixel had got the label equal
to 2. I reassign it because 2 belongs to an equivalence class consisting of the labels 1, 2 and 5, to
which we have assigned the label equal to 1. So wherever I get the label equal to 2, I reassign
that label to be equal to 1. So continue this way. This was already 1. This was 2 which has been
reassigned to be 1. This was 3, which remains because 3 and 4 form an equivalence class and the
label assigned to this equivalence class is 3. This is also 3; that remains. This was 1; that
remains. This was 1; that remains. This was 1; that remains. This was possibly 2 or 1, so I
make it equal to 1. This had got a label equal to 4, so that has been reassigned the label value
equal to 3.
So you will find that at the end of the second pass, I identify all the pixels belonging to a
particular group to have a single label,
(Refer Slide Time 26:30)
and similarly all the pixels belonging to this particular group to have
(Refer Slide Time 26:36)
another label. So I will stop here today. I will continue with this
(Refer Slide Time 26:43)
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 02 Lecture Number 09
Application of Distance Measures
(Refer Slide Time 00:19)
(Refer Slide Time 00:32)
Now, coming to finding out the distance between two points, we are all familiar that if I know the
coordinates or the locations of two different points, I can find out the distance between
the two points. Say for example, if I have two points, say one point p and another
point q, and I know that the coordinate of point p is given by (x, y) and the coordinate of point q
is given by (s, t), then we all know from our school level mathematics the distance
between them.
(Refer Slide Time 01:19)
I represent this as D(p, q), that is, the distance between p and q, which will be given by
D(p, q) = √((x - s)² + (y - t)²).
(Refer Slide Time 01:48)
This we know from our school level mathematics. Now when I come to the digital domain, this is not the only
distance measure that can be used. There are various other distance measures which can be
used in the digital domain; those distance measures are, say, city block distance, chess board
distance and so on. So let us see what
(Refer Slide Time 02:22)
is the property that should be followed by this distance function D(p, q); for this let us take
3 points, we take 3 points here: p
having a coordinate (x, y), q having a coordinate (s, t), and another point z having the
coordinate (u, v). Then D is called a distance measure, a valid distance measure or a valid
distance metric, if D(p, q) ≥ 0 for any two points p and q.
(Refer Slide Time 02:59)
So D(p, q) ≥ 0, and D(p, q) will be 0 only if p = q. That is quite obvious, because the distance
of a point from the point itself has to be equal to 0.
Then the distance metric, or
(Refer Slide Time 03:23)
distance function, should be symmetric. That is, if I measure the distance from p to q, that
should be the same as the distance if I measure from q to p. That is the second property that must
hold true, that is,
D(p, q) should be equal to D(q, p). And there is a third property, which is an inequality: that
is, if I take a third point z, then the distance between p and z satisfies
D(p, z) ≤ D(p, q) + D(q, z).
(Refer Slide Time 03:56)
From our school level mathematics you know that if I have, say, 3 points
(Refer Slide Time 04:05)
p, q and another point z, and if I measure the distance between p and z, this must be
less than or equal to the distance between p and q plus the distance between q and z. So this is what we all have
done
(Refer Slide Time 04:30)
(Refer Slide Time 04:36)
with other distance functions. So these are the 3 properties which must hold true for a function if the
function is to be considered a distance function or a distance metric.
Now the first of these, that is, the distance between p and q which we have already seen: if p
has a
coordinate (x, y) and q has a coordinate (s, t), then D(p, q), the distance between p and q, is
equal to √((x - s)² + (y - t)²). This is the distance measure which is called the Euclidean distance.
So in case of the Euclidean distance you will find that the set of points q for which
(Refer Slide Time 05:27)
D(p, q), the distance between p and q, obviously we are talking about the Euclidean
distance, is less than or equal to some value r,
that set of all these points is the set of points contained within a disk of radius r, where the center of
the disk is located
(Refer Slide Time 05:46)
(Refer Slide Time 05:58)
and I take a point q and I say that the distance between p and q is r. So if I take set of all these
points where the distance is equal to r that forms a circle like this, so all other points having a
distance less than r from the point p will be the points within this circle. So set of all these
points where the distance value is less than or equal to r, obviously we are talking about the
Euclidean distance,
in that case the set of all these points forms a disk of radius r, where the center of the disk
(Refer Slide Time 06:43)
is at location p.
Now coming to the second distance measure, which is also called the D4 distance or city block
distance; this is also known as the Manhattan distance. It is defined as
D4(p, q) = |x - s| + |y - t|.
(Refer Slide Time 07:08)
(Refer Slide Time 07:19)
So again I have point p with coordinate (x, y) and I have point q with coordinate (s, t). The D4 distance is
defined as D4(p, q) = |x - s| + |y - t|. So this clearly indicates, if I want to move from
point p to point q then, how much distance I have to move along the x direction and how
much distance I have to move along the y direction. Because |x – s| is the distance travelled
along x direction and |y - t| is the distance travelled along the y direction. So the sum of these
distances along x direction and y direction gives you the city block distance, that
(Refer Slide Time 08:30)
is, the set of points whose city block distance from point p is less than or equal to some value r will form a diamond centered at point p,
which is quite obvious from here.
(Refer Slide Time 08:52)
Here you find that if p is the point at the center, then all the points having city block
distance equal to 1 are just the 4 neighbors of the point p. Similarly, all the points having the city
block distance equal to 2 are simply the points which are at a distance 2, that is, the
distance taken in the horizontal direction plus the distance taken in the vertical direction
becomes equal to 2, and the set of all these points with city block distance equal to 2
simply forms a diamond of radius 2, and similarly for other points at distances 3, 4 and so on.
Now we come to the third distance measure,
(Refer Slide Time 09:49)
As you have seen in case of city block distance, the distance between two points was defined
as the sum of the distances that you cover along x direction + the distance along the y
direction. In case of chess board distance, it is the maximum of the distances that you cover
along x direction and y direction. So this is D8(p, q) which is equal to max(|x - s|,|y - t|) ,
(Refer Slide Time 10:21)
where we take the absolute value of both x - s and y - t. And following the same argument,
here you find that the set of points with a chess board distance of less than or equal to r now
forms a square centered at point p. So here all the points with a chess board distance of equal
to 1 from point p, they are nothing but the 8 neighbors of point p.
Similarly, the set of points with a chess board distance equal to 2 will be just the
points outside the points having a chess board distance equal to 1. So if you continue like this
you will find that all the points having a chess board distance of less than or equal to r from a
point p will form a square with point p at the center of the square. So these are the different
distance measures that can be used in the digital domain.
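The three measures translate directly into small Python functions; a hedged sketch (function names and the sample points are mine):
```python
import math

# Minimal sketch of the three distance measures discussed above,
# for points p = (x, y) and q = (s, t).

def euclidean(p, q):
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def city_block(p, q):          # D4 / Manhattan distance
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chess_board(p, q):         # D8 distance
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (2, 3), (5, 7)
print(euclidean(p, q))     # 5.0
print(city_block(p, q))    # 7
print(chess_board(p, q))   # 4
```
As described above, the points with D4 distance less than or equal to r from p fill a diamond, while the points with D8 distance less than or equal to r fill a square.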
Now let us see what the applications of these distance measures are; one obvious
application is that if I want to find out the distance between 2 points, I can make use of either
the Euclidean distance, the city block distance or the chess board distance. Now let us see one
particular application other than
just finding out the distance between 2 points. Say for example, here I want to match two
different shapes which are shown in this particular diagram. Now you will find that these two
shapes are almost similar except that you have a hole in the second shape. So if I simply go
for matching these two shapes, they will appear almost the same. So just by using these original
figures,
I cannot possibly distinguish between these two shapes. So if I want to say that these two
shapes are not the same, that they are dissimilar, in that case I cannot work on these original
shapes, but I can make use of some other feature of this particular shape. So let us see what
that other feature is.
If I do so,
in that case you will find that the third figure gives you what is the skeleton of the first
shape. Similarly, the fourth figure gives you
(Refer Slide Time 13:00)
the skeleton of the second shape. Now if I compare these two skeletons, rather than
comparing the original shapes, you find that there is a lot of difference between these two
skeletons.
So I can now describe the shapes with the help of the skeletons in the sense that I can find out
that how many line segments are there in the skeleton. Similarly, I can find out that how
many points are there where more than 2 line segments meet. So by this if I compare the two
skeletons you will find that for the
(Refer Slide Time 13:38)
skeleton of the first shape there are only 5 line segments, whereas for the skeleton of the
second shape there are 10 line segments. Similarly, the number of points where more than 2
line segments meet; in the first skeleton there are only 2 such points where as in the second
skeleton there are 4 such points. So if I compare using the skeletons
rather than comparing using the original shapes, you will find that there is a lot of difference
that can be found, both in terms of the number of line segments the skeleton has and also
in terms of the number of points where more than 2 line segments meet. So using these
descriptions, which I have obtained from the skeleton, I can distinguish between the two
shapes as shown in this particular figure.
Now the question is, how do we get this skeleton
(Refer Slide Time 14:36)
and what is this skeleton? If you analyze the skeletons, you will find that
the skeletons are obtained
by removing some of the foreground points, but the points are removed in such a way that the
shape information as well as the dimensionality, that is, the length or breadth of that
particular shape, is more or less retained in the skeleton. So this is what the
skeleton of a particular shape is, and now the question is, how do you obtain the skeleton?
Now before coming to how you obtain it,
(Refer Slide Time 15:14)
let us see how the skeleton can be found out. The skeleton can be found out in this
manner. I assume that the foreground region in the input binary image is made of some
uniform slow-burning material, and then what I do is
(Refer Slide Time 15:36)
light a fire at all the points along the boundary of this region, that is, the foreground region.
Now if I light the fire at all the boundary points simultaneously, then the fire lines will move in
slowly, because the foreground region consists of slow-burning material. Then you will find
that as the fire lines move in, there will be some points in the foreground region where the
fire coming from two different boundaries will meet, and at those points the fire will extinguish
itself. So the set of all those points is
(Refer Slide Time 16:21)
the quench line that we obtain by using this fire propagation concept.
(Refer Slide Time 16:37)
Now, to obtain this kind of skeleton, you find that this simple description of the movement of the
fire line
does not give you an idea of how to compute the skeleton of a particular shape. So for that,
what we can use is the distance measure. In the
same manner, we can say that when we are lighting the fire at all the boundary points
simultaneously, and the fire is moving inside the foreground region slowly, we can note at
every point how much time the fire takes to reach that particular point, that is, the
minimum time the fire takes to reach that particular point. And at every such foreground point,
if we note this time taken, the time the fire takes to reach that particular point, then
effectively, what we get is a distance transformation of the image. So in this case, you will
find that
this distance transform is normally used for binary images, and because at every point we are
noting the time the fire takes to reach that particular point, by applying the distance
transformation what we get is an image whose shape is similar to the input
binary image, but in this case the image itself will not be binary; it will be a gray level
image where the gray level intensities of points inside the foreground region are
changed to show the distance of each point from its closest boundary point. So let us see.
(Refer Slide Time 18:22)
Here we have shown a particular binary image where the foreground
region is a rectangular region, and if I take the distance transform of this, the distance
transformed image is shown on the right hand side. Here you find that all the boundary points
are getting a distance value equal to 1. Then the points just
inside the boundary points get the distance value equal to 2,
(Refer Slide Time 18:56)
and the points further inside get the distance value equal to 3. So you will find that the
intensity value that we are assigning to different points within the foreground region
increases slowly as we move inside.
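One standard way to compute such a distance transform is a two-pass sweep; the sketch below (my own naming and a city block distance version, not code from the lecture) reproduces the behaviour described above: boundary foreground pixels get the value 1, the next layer gets 2, and so on.
```python
# Minimal sketch: city block (D4) distance transform of a binary image by the
# classic two-pass sweep. Foreground pixels (value 1) receive the D4 distance to
# the nearest background pixel, as in the rectangle example above.

def distance_transform_d4(img):
    rows, cols = len(img), len(img[0])
    INF = rows + cols                      # larger than any possible D4 distance
    d = [[0 if img[i][j] == 0 else INF for j in range(cols)] for i in range(rows)]
    # Forward pass: propagate distances from the top and the left.
    for i in range(rows):
        for j in range(cols):
            if d[i][j]:
                if i > 0:
                    d[i][j] = min(d[i][j], d[i - 1][j] + 1)
                if j > 0:
                    d[i][j] = min(d[i][j], d[i][j - 1] + 1)
    # Backward pass: propagate distances from the bottom and the right.
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if d[i][j]:
                if i < rows - 1:
                    d[i][j] = min(d[i][j], d[i + 1][j] + 1)
                if j < cols - 1:
                    d[i][j] = min(d[i][j], d[i][j + 1] + 1)
    return d

img = [[0, 0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 0]]
for row in distance_transform_d4(img):
    print(row)
# The middle row of the rectangle comes out as 0 1 2 2 2 1 0.
```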
(Refer Slide Time 19:27)
Now, coming back to the shapes that we have just discussed, the two rectangular shapes, on the left hand side you have
the original rectangular shape, or the binary image. On the right hand side what is shown is
the distance transformed image. And here you find again that in this distance transformed image,
as you move inside the foreground region, the distance value increases gradually. And
now if you analyze this distance transformed image, you find that there are a few points at
which there is some discontinuity of the curvature. So from this distance transformed image,
if I can identify the points of discontinuity, or curvature discontinuity, those are actually the
points which lie on the skeleton of this particular shape.
So as shown
in the next slide, you find that on the left hand side we have the original image. The middle
column shows you the distance transformed image, and the rightmost column shows you the
skeleton of this particular image. And if you now correlate this rightmost column with the
middle column, you will find that the skeleton in the rightmost column can be easily
obtained from the middle column, which shows you the distance transform of the
shape that we have considered.
This slide shows some more skeletons
of some more shapes. Again, on the left hand side we have the original image, in the middle
column we have the distance transformed image, and in the rightmost column we have the
skeleton of the particular shape. So here again you can find that the relation between the
skeleton and the distance transformed image is quite prominent. So for all such shapes,
wherever
we go for some shape matching problem or shape discrimination problem, in that case, instead
of processing the original shapes, if we compare the shapes using the skeletons, then the
discrimination will be better than if we compare the original shapes. Now here, when I am
going for the distance transformation of a particular shape, as you have seen, we can have
different kinds of distance measures or distance metrics: the Euclidean distance metric, the
city block distance metric, or even the chess board distance metric.
So when I take the distance transformation, for each of the distance metrics there will
be a different distance transformation. And obviously the different
distance transformations will produce different results. Similarly,
when we get the skeleton from the distance transformed image, the skeletons that
we will get using different distance metrics will be slightly different, but they will be
almost similar. So this is just another application of the distance metric, and you find that
here
(Refer Slide Time 22:48)
the skeleton is very, very useful because it provides a simple and compact representation of
shape that preserves many of the topological and size characteristics of the original shape.
Also, from this skeleton
we can get a rough idea of the length of a shape. Because when I get the skeleton,
(Refer Slide Time 23:16)
I get the different end points of the skeleton, and if I find out the distances between every pair of
end points in the skeleton, then the maximum of all those pairwise distances will give me an
idea of the length of that particular shape. And as we have already said, using this
skeleton we can qualitatively differentiate between different shapes,
because we can get a description of the shape from the skeleton in terms
of the number of line segments that the skeleton has and also in terms of the number of points
in the skeleton where more than two line segments meet.
So, as we have said, the distance metric or the distance function is not only
(Refer Slide Time 24:11)
useful for finding out the distance between two points in an image; the distance metric is
also useful for other applications, though here also we have found out the distance
measure between different pairs of points. For skeletonization, what we have used is, first
we have taken the distance transformation, and in case of the distance transformation we have
taken the distance of every foreground pixel from its nearest boundary pixel; that is what
gives you a distance transformed image, and from the distance transformed image we can
find out the skeleton of that particular shape, considering the points of curvature discontinuity
in the distance transformed image. And later on also we will see that this distance metric is
useful in many other cases.
Now, after our discussion on these distance metrics,
let us see what simple operations we can perform on images. As you have seen,
in case of a numerical system, whether it is the decimal number system or the binary number
system, we can have arithmetic as well as logical operations. Similarly, for images also
we can have arithmetic and logical operations. Now coming to images, I can add two images
pixel by pixel, that is, a pixel from one image can be added to the corresponding pixel of a
second image. I can subtract two images pixel by pixel, that is, a pixel of one image can be
subtracted from the corresponding pixel of another image. I can go for pixel by pixel
multiplication. I can also go for pixel by pixel division. So these are the different arithmetic
operations that I can perform on two images, and these operations are applicable both in case
of gray level images as well as in case of binary images. Similarly,
(Refer Slide Time 26:16)
in case of binary images, we can have logical operations: ANDing pixel by pixel,
ORing pixel by pixel, and similarly inverting (NOT) pixel by pixel. So these are
the different arithmetic operations that we can do on gray level images and, similarly, the
logical operations on binary images.
So here is an example. If I have a binary image A where all the pixels with value
equal to 1 are shown in green color and the pixels with value equal to 0 are shown
in black color, then I can just invert this particular binary image, that is, I can apply a NOT
operation or invert operation. So NOT of A is another binary image
(Refer Slide Time 27:10)
where all the pixels which were black in the original image now become white, or 1, and the
pixels which were white, or 1, in the original image become equal to 0. Similarly, I
can perform other operations:
given two images A and B, I can find out A AND B, the logical ANDing operation, which is
shown in the left image. Similarly, I can find out the XOR operation, and the
image that I get after XOR is shown in the right image.
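In NumPy, these pixel-by-pixel arithmetic and logical operations map directly onto element-wise array operators; a brief hedged sketch with made-up example arrays:
```python
import numpy as np

# Minimal sketch of pixel-by-pixel logical and arithmetic operations.
A = np.array([[0, 1], [1, 1]], dtype=bool)
B = np.array([[1, 1], [0, 1]], dtype=bool)
print(~A)       # NOT A
print(A & B)    # A AND B
print(A ^ B)    # A XOR B

# Arithmetic operations on gray level images work the same way, element-wise.
f = np.array([[10, 20], [30, 40]], dtype=np.int32)
g = np.array([[1, 2], [3, 4]], dtype=np.int32)
print(f + g)    # pixel-by-pixel addition
print(f - g)    # pixel-by-pixel subtraction
print(f * g)    # pixel-by-pixel multiplication
print(f // g)   # pixel-by-pixel (integer) division
```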
(Refer Slide Time 27:44)
So these are the different pixel operations, or pixel by pixel operations, that I can perform.
In some other applications
we can also perform some neighborhood operations, that is, the intensity value at a particular
pixel may be replaced by a function of the intensity values of the pixels which are neighbors
of that particular pixel. Say for example, in this particular case, this 3x3 matrix
represents
(Refer Slide Time 28:18)
a part of an image which has got 9 pixel elements z1 to z9, and I want to replace every
pixel value by the average of its neighborhood, considering the pixel itself. So you find that at
location z5, if I want to take the average, the average is simply given by (z1 + z2 + z3 + ... + z9) / 9.
So this is a simple averaging operation at individual pixels that I can
perform, which is nothing but a neighborhood operation, because at every pixel we are
replacing the
intensity by a function of the intensities of its neighborhood pixels. And this averaging
operation, we will see later, is the simplest form of low pass filtering to remove noise
from a noisy image.
Now this kind of
(Refer Slide Time 29:22)
neighborhood operation can be generalized with the help of templates. So here what we do is,
we define a 3x3 template, which is shown in the right hand figure, where the template
contains 9 elements w1 to w9. And if I want to perform the neighborhood operation, what
we do is, you put this particular template on the original image in such a way
that the center of the template just coincides
with the pixel at which I want to replace the value. And then at that particular location, we replace the value with the weighted
sum of the values taken from the image and the corresponding points from the template. So in
this case the value which will be replaced is given by z = Σ (i = 1 to 9) wi zi, and here you find that if
I simply put wi = 1/9, that is, all the points in the template have the same
value, which is equal to 1/9, then the resultant image that I will get is nothing but the averaged
image which we have just considered in the previous slide. So this
neighborhood operation using a template is a very, very general operation. It is useful not
only for the averaging purpose; it is useful for many other neighborhood operations, and we will see later that
it can be used for noise filtering, it can be used for thinning of binary images, and this same
template operation can also be used for edge detection in different images.
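The weighted-sum template operation is essentially a correlation of the image with a 3x3 mask; here is a hedged Python sketch (the function name, the choice to leave border pixels unchanged, and the example image are my own):
```python
# Minimal sketch (not the lecture's code) of the 3x3 template operation
# z = sum_{i=1..9} w_i * z_i, applied at every interior pixel.
# Border pixels are simply left unchanged in this simple version.

def apply_template(img, w):
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    s += w[di + 1][dj + 1] * img[i + di][j + dj]
            out[i][j] = s
    return out

# With all weights equal to 1/9 this reduces to the neighborhood average,
# i.e. the simple low pass filtering mentioned above.
avg_mask = [[1 / 9] * 3 for _ in range(3)]
img = [[10, 10, 10, 10],
       [10, 90, 10, 10],
       [10, 10, 10, 10]]
print(apply_template(img, avg_mask)[1][1])   # about 18.9, the noisy 90 is smoothed down
```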
So with this, we complete our lecture today.
Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 02 Lecture Number 10
Basic Transform
(Refer Slide Time 00:19)
Hello, welcome to the video lecture series on Digital Image Processing. So in today's lecture
as we said that we will discuss about some basic mathematical transformations which will
include translation, rotation and scaling and this we will discuss both in two-dimension as
well as in three-dimension. We will also discuss about the inverse transformations of these
different mathematical transformations. We will find out the relationship between the
Cartesian coordinate system and the homogeneous coordinate system, and we will see that this
homogeneous coordinate system is very, very useful while discussing the image
formation
(Refer Slide Time 01:09)
by a camera. We will also talk about the perspective transformation and the imaging
process, and then we will talk about the inverse perspective transformation. Now coming to
the basic mathematical transformations. Let us first talk about that, what is the translation
operation and we will start our discussion with a point in
(Refer Slide Time 01:40)
two-dimension.
So you know that if I have a two dimensional coordinate system given by the axes x and y,
and if I have a point p which has a coordinate given by, say, (x, y), and I want to
translate this point p(x, y) by a vector (x0, y0), then after translating this point by (x0, y0), I get the
translated point, say point p', whose coordinates are x' and y'. And because the
translation vector in this case has been assumed to be (x0, y0), you know that after translation
the new position
x' will be given by x + x0 and y' will be given by y + y0. Now this is the basic relation when a
point at location (x, y) is translated by (x0, y0).
(Refer Slide Time 02:58)
Now let us see how this can be represented more formally by using a matrix equation. So
if I translate this relation into the form of a matrix, the equation looks like this. I have to find
out the new
location vector (x', y'), and we have said that this x' is nothing but x + x0 and y' is nothing but
y + y0. So this particular relation, if I represent it in the form of a matrix, will simply look like
this.
So you find that if you solve this particular matrix expression, it gives you the same
expressions x' = x + x0 and y' = y + y0. So on the right hand side you find that I have a product
of two matrices, which is added to another column matrix or column vector. Now if I want to
combine all these operations in a single matrix form,
then the operation will be something like this. On the left hand side, I will have x' and y',
which will be in the form of a matrix, and on the right hand side, I will have the matrix with rows 1 0 x0 and 0 1 y0,
multiplied by the vector (x, y, 1). So if I again do this same matrix computation, it will be x' = x + 0
+ x0, which is nothing but x' = x + x0. Similarly, y' will be 0 + y + y0, which is nothing
but y' = y + y0. But you find that in this particular case, there is some asymmetry in this
particular expression. So if I want to make this expression symmetric, then I can write it in
this form: I take (x', y') and introduce one more component which I make equal to 1,
    [x']   [1 0 x0] [x]
    [y'] = [0 1 y0] [y]
    [1 ]   [0 0 1 ] [1]
So you find that this second expression, which I have just obtained from the first
one, is now a symmetric expression, and this is what is called a unified expression. So you find that
basically what I have is
(Refer Slide Time 06:34)
I had the original coordinate (x, y) of the point p, which is appended with one more component
that is given as 1, and if this modified coordinate is now transformed by the transformation
matrix as
    [x']   [1 0 x0] [x]
    [y'] = [0 1 y0] [y]
    [1 ]   [0 0 1 ] [1]
then I get the translated point as the vector (x', y', 1), where, if I just
neglect the additional component, which in this case is 1, then I get the translated point.
In the same manner, given a point p again in 2D, so again I have this point p
having a coordinate (x, y), and suppose I want to rotate this point p around the origin by
an angle θ. Now, one way of representing this point p is: if r is the distance of point p from the
origin, then these are the coordinates of the point p, this is the x coordinate, this is the y
coordinate, and suppose this angle is α; then I can also represent x and y
as x = r·cos(α) and y = r·sin(α). Now suppose I want to rotate this point p by angle θ in the
clockwise direction, so the new position of p will now be p' having the coordinate location x'
and y', and the angle of rotation is now the angle θ. So our job is to find what
these coordinates x' and y' will be. So here you find that I can write
x' = r·cos(α - θ) and y' = r·sin(α - θ). Ok, so if I simply
expand this, I have x' = r·cos(α - θ) and y' = r·sin(α - θ). If I expand the cosine term,
it will simply be r·cos(α)cos(θ) + r·sin(α)sin(θ). Now we know that r·cos(α) is nothing but x, so
it becomes x·cos(θ) + y·sin(θ). Similarly, in this case, if I expand the sine term, it becomes
r·sin(α)cos(θ) - r·cos(α)sin(θ). So again, r·sin(α) is nothing but y. So this takes the
form y·cos(θ) - x·sin(θ). So here you find that x' is x·cos(θ) + y·sin(θ) and y' is
given by y·cos(θ) - x·sin(θ). So now, if I represent this in the form of a matrix equation, when a point is rotated by
(Refer Slide Time 11:21)
an angle θ around the origin in the clockwise direction, in that case the transformation matrix
which gives you the rotation transformation is given by this particular matrix, which
is
    [cosθ   sinθ]
    [-sinθ  cosθ]
Now, in the same manner, if I go for scaling, say for example I have a scaling
factor of Sx in the x direction and a scaling factor Sy in the y direction, in that case the
transformation for scaling can also be represented as
    [x']   [Sx 0 ] [x]
    [y'] = [0  Sy] [y]
So here you
find that the transformation matrix for performing the scaling operation is nothing but the
matrix
    [Sx 0 ]
    [0  Sy]
So these are the simple transformations that I can have in two-dimension.
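These building blocks are easy to check numerically. The following hedged NumPy sketch (names are mine) writes the translation in unified coordinates and embeds the 2x2 rotation and scaling matrices into the same 3x3 homogeneous form so that they can later be concatenated; the rotation used is the clockwise form derived above.
```python
import numpy as np

# Minimal sketch: 2-D translation (unified coordinates), clockwise rotation
# about the origin, and scaling, as derived above.

def translate(x0, y0):
    return np.array([[1, 0, x0],
                     [0, 1, y0],
                     [0, 0, 1]], dtype=float)

def rotate_cw(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  s, 0],
                     [-s, c, 0],
                     [0,  0, 1]], dtype=float)

def scale(sx, sy):
    return np.array([[sx, 0, 0],
                     [0, sy, 0],
                     [0,  0, 1]], dtype=float)

p = np.array([2.0, 0.0, 1.0])           # point (2, 0) in unified coordinates
print(translate(3, 4) @ p)              # [5. 4. 1.]
print(rotate_cw(np.pi / 2) @ p)         # approximately [0. -2. 1.]
print(scale(2, 3) @ p)                  # [4. 0. 1.]
```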
Now it is also possible to concatenate the transformations.
For example, here I have considered the rotation of a point around the origin. Now if my
application demands that I have to rotate the point p around an arbitrary q in the two-
dimension. Then finding out the expression for this rotation of point p by an angle θ around
another point q is not an easy job, I mean that expression will be quite complicated. So I can
simplify this operation just by translating the point q to the origin and the point p also has to
be translated by the same vector and after performing this transformation, the translation, if I
now rotate point p by the same angle θ and now it will be the rotation around the origin so
298
cosθ sinθ
whatever expression that we have found here that is , the same transformation
-sinθ cosθ
matrix will be applicable and after getting this rotation, now you translate back the rotated
point by the same vector but in the opposite direction. So here
the transformations that we are applying are: first we perform a translation of point p by
the vector, and after performing this translation, we perform the rotation, say Rθ.
So first we translate by a vector, say r, then we perform the
rotation Rθ, and after doing this, whatever point I get has to be translated back
by -r, so I will put it as a translation by the vector -r.
So this entire operation will give you the rotation of a point p, suppose this is point p and I
want to rotate this around the point q. So if I want to rotate p around q by an angle θ, then this
operation can be performed by concatenation of this translation, rotation then followed by
inverse translation, which puts back the point to its original point where it should have been
after rotating around
299
(Refer Slide Time 15:04)
point q by angle θ.
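As an illustration of this concatenation (a sketch of my own, not from the lecture, with made-up numbers), the following snippet rotates a 2D point p about an arbitrary point q by translating q to the origin, rotating about the origin, and translating back.

    import numpy as np

    def T(x0, y0):                     # unified 2D translation matrix
        return np.array([[1, 0, x0], [0, 1, y0], [0, 0, 1]], dtype=float)

    def R(theta):                      # clockwise rotation about the origin
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]], dtype=float)

    def rotate_about(p, q, theta):
        # translate q to the origin, rotate, then translate back:
        # the concatenated matrix is T(q) @ R(theta) @ T(-q)
        M = T(q[0], q[1]) @ R(theta) @ T(-q[0], -q[1])
        return (M @ np.array([p[0], p[1], 1.0]))[:2]

    # rotating (3, 2) about q = (2, 2) by 90 degrees clockwise gives (2, 1)
    print(rotate_about((3.0, 2.0), (2.0, 2.0), np.pi / 2))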
So these are the different transformations, the basic mathematical transformations that we can
do in a two-dimensional space. Now let us see, what will be the corresponding
300
(Refer Slide Time 15:19)
301
(Refer Slide Time 15:26)
302
(Refer Slide Time 15:45)
So first, let us see, as you have seen in case of two-dimension, that if a point (X, Y, Z) is
translated to a new coordinate say (X*, Y*, Z*) using a displacement (Xo, Yo, Zo), then the
translated coordinates will be given by X* = X + Xo, Y* = Y + Yo and Z* = Z + Zo. So you see
that in our previous case, because we had only the coordinates x and y, this third expression
Z* = Z + Zo was absent. But now we are considering the three-dimensional space, a 3D
coordinate system, so we have 3 coordinates x, y and z, and all these 3 coordinates are to be
translated by the translation vector (Xo, Yo, Zo), and the new point we get is (X*, Y*, Z*).
Now if I write these 3 equations in the form of a matrix then the matrix equation will be
303
(Refer Slide Time 17:03)
like this: [X*; Y*; Z*] = [1 0 0 Xo; 0 1 0 Yo; 0 0 1 Zo] [X; Y; Z; 1].
So this is the similar situation that we have also seen in case of two-dimension
304
(Refer Slide Time 17:27)
that we have to add an additional component which is equal to 1 in our original position
vector (x y z). So, in this case again we have added the additional component which is equal
to 1, so our next position, our new position vector becomes [x y z 1]T which has to be
multiplied by the translation matrix given by [1 0 0 Xo; 0 1 0 Yo; 0 0 1 Zo]. So again, as before, we
can go for a unified expression where this translation matrix which at this moment is having a
dimension 3 by 4 that is, it is having 3 rows and 4 columns, in unified representation we
represent this matrix, the dimension of the matrix will be 4 by 4 which will be a square
matrix and the left hand side also will have the same unified coordinate, that is (X*, Y*, Z*,
1).
305
(Refer Slide Time 18:32)
So the unified representation, as we have already said, is given by
[X*; Y*; Z*; 1] = [1 0 0 Xo; 0 1 0 Yo; 0 0 1 Zo; 0 0 0 1] [X; Y; Z; 1]. So this
particular matrix, that is [1 0 0 Xo; 0 1 0 Yo; 0 0 1 Zo; 0 0 0 1],
this represents a transformation matrix used for the translation and we will represent this
matrix by this upper case letter T. So that is about the simple translation that we can have
306
(Refer Slide Time 19:31)
So in our unified matrix representation, if we have a position vector v which is transformed by
the transformation matrix A, the transformation matrix A is a 4 by 4 matrix; and if the original
position vector was [x y z]T, we have added an additional component 1 to it in our unified
matrix representation, so v now becomes a four-dimensional vector having components x, y, z
and 1. Similarly, the transformed position vector v* is also a four-dimensional vector having
components (X*, Y*, Z*, 1). So this is how, in the unified matrix representation, we can
represent the translation of a position vector or
307
(Refer Slide Time 20:36)
that this is the transformation matrix which is represented, which is used for translating a
point in 3 D by
(Refer Slide Time 20:45)
308
(Refer Slide Time 20:50)
Similarly, as we have seen in case of scaling in two-dimension, suppose we have the scaling
factors Sx, Sy and Sz along the directions x, y and z. So along direction x, we have the scaling
factor Sx. Along direction y, we have the scaling factor
309
(Refer Slide Time 21:17)
Sy and
(Refer Slide Time 21:29)
along the direction z we have the scaling factor Sz. Then the transformation matrix for this
scaling operation can be written as S = [Sx 0 0 0; 0 Sy 0 0; 0 0 Sz 0; 0 0 0 1]. So here again you find
310
(Refer Slide Time 21:45)
that a position vector [x y z]T in unified form will be [x y z 1]T. If that position vector is
transformed by this scaling matrix, then what we get is the new position vector corresponding to
the point (x, y, z, 1) in the scaled form, and then if we remove the last component, that is equal to
1, what we get is the scaled 3D coordinate of the point, which has been scaled up or scaled
down. It will be scaling up or scaling down depending upon whether the values of the scale
factors are greater than 1 or less than 1.
311
(Refer Slide Time 22:35)
312
(Refer Slide Time 23:54)
will look like in 3D. So here you find that we have shown, on the right hand side, a figure
which shows the rotation of a point about the coordinate axes. If the point is rotated about the
x axis, the rotation is indicated by α; if it is rotated about the z axis, the rotation is
indicated by θ; and if the rotation is done about the y axis, the rotation angle is indicated by β.
So when I am rotating a point about the z axis, then
(Refer Slide Time 24:46)
obviously the z coordinate of the point will remain unchanged even in the rotated position.
But what will change is the x coordinate and y coordinate of the point in its new rotated
position. And because the z coordinate is remaining unchanged so we can think that this is a
rotation on a plane which is parallel to the x-y plane. So the same transformation which we
had done for rotating a point in two-dimension in the (x,y) coordinate, the same
313
transformation matrix holds true for rotating this point in three-dimension about the z axis.
But now, because the number of components in our position vector is more, we have to
take care of the other components as well. So using this you find that
when I rotate the point around the z axis, the rotation angle is given by θ and the rotation matrix
is given by [cosθ sinθ 0 0; -sinθ cosθ 0 0; 0 0 1 0; 0 0 0 1]. So this is the transformation
matrix or rotation matrix for rotating a point around the z axis. So here you find that the
first few components, that is
314
(Refer Slide Time 26:11)
cosθ, sinθ, then -sinθ and cosθ, this 2 by 2 block is identical to the transformation
matrix or rotation matrix that we have obtained in case of two-dimension. This is
because the z coordinate remains the same, so the x coordinate and y coordinate, due to
this rotation around the z axis, follow the same relation that we had derived in case of two-
dimension.
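A small NumPy sketch (my own illustration, not part of the lecture) of the unified 4 x 4 rotation about the z axis given above, together with the rotation about the x axis written in the same sign convention (the latter is the form that appears later in the lecture as the tilt matrix Rα):

    import numpy as np

    def Rz(theta):
        # rotation about the z axis: x and y change, z is untouched
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[ c, s, 0, 0],
                         [-s, c, 0, 0],
                         [ 0, 0, 1, 0],
                         [ 0, 0, 0, 1]], dtype=float)

    def Rx(alpha):
        # rotation about the x axis in the same convention: x stays fixed
        c, s = np.cos(alpha), np.sin(alpha)
        return np.array([[1,  0, 0, 0],
                         [0,  c, s, 0],
                         [0, -s, c, 0],
                         [0,  0, 0, 1]], dtype=float)

    p = np.array([1.0, 0.0, 5.0, 1.0])     # unified 3D point (1, 0, 5)
    print(Rz(np.pi / 2) @ p)               # z component stays 5
    print(Rx(np.pi / 2) @ p)               # x component stays 1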
Similarly, when I rotate the point around the y axis, the angle of rotation is given by β, so Rβ,
as given in this particular case, gives you the rotation matrix; and if you rotate the point around
the x axis, where the rotation angle is given by α, you will find that Rα gives you the
corresponding rotation matrix, or the corresponding transformation matrix, for rotation around
the x axis. So as before, you find that when you rotate the point around the x axis, the x
coordinate will remain the same whereas the y coordinate and the z coordinate of the point are
going to differ; similarly, when you rotate the point around the y axis,
315
(Refer Slide Time 27:37)
the y coordinate is going to remain the same but x coordinate and z coordinate they are going
to be different. Now as we have also mentioned, in case of two-dimension that
(Refer Slide Time 27:54)
different transformations can be concatenated. So here we have shown that, how we can
concatenate the different transformations. Here you find that all the transformations that you
have considered in three-dimension, all of them are in the unified form, that is, every
transformation matrix is a 4 by 4 matrix and all the coordinates that we consider, we add 1 to
the
316
(Refer Slide Time 28:25)
coordinate (x, y, z), so that our position vector becomes a four-dimensional vector and the
translated point is also a four-dimensional vector.
Then this translation, scaling and rotation can be concatenated as follows: first you translate the
point v by the translation matrix T, then you perform scaling, then you perform rotation, and
this rotation is Rθ; and all these 3 different transformation matrices, that is Rθ, S and T, all of
them being 4 by 4 matrices, can be combined into a single matrix,
317
(Refer Slide Time 29:21)
say this single transformation matrix, in this case A, which is nothing but the product of Rθ, S
and T, and this A again will be a matrix of dimension 4 by 4. Now you note that, whenever we
are going for the concatenation of the transformations, the order in which these
transformations are to be applied is very, very important, because these matrix
operations are, in general, not commutative.
Now just to illustrate
that these matrix operations are not commutative, let us take this example. Suppose I have
this particular point v and to this point v, I want to perform two kinds of operations. One is
translation by vector, so the translation vector, we have represented by this arrow by which
this point has to be translated and the point v is also to be rotated by certain angle. Now there
318
are two ways in which these two operations can be done. The first one is: suppose I rotate
point v first by using the rotation R, giving Rv. So here the transformation matrix is R for the
rotation operation, and after rotating this point v by using this transformation R, I translate the
rotated point by using the transformation matrix T. So if I do that you find that this v is the
original position of this point. If I first rotate it by using this rotation transformation, the point
v comes here in the rotated position. And after this, if I give translation to this point v by this
translation vector then the translated point is coming over here, which is represented by v2.
So the point v2 is obtained by v, first by applying the rotation by transformation R followed
by applying the translation by the transformation T.
Now if I do it in reverse, that is, first I translate the point v using this translation transformation
T, and after this translation I rotate this translated point, which is now Tv, by the same angle θ.
So what I do is, first I translate the point using the transformation T and after that this
translated point is rotated by using the rotation transformation R and this gives me the rotated
point or the new point that is equal to v1. Now from here you find that, in the earlier case
where I got the point v2 and now I get the point v1, this v1 and v2 they are not the same point.
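The same observation can be checked numerically; the following small sketch (my own, with arbitrary numbers, not from the lecture) applies a rotation R and a translation T to the same point v in the two possible orders and gets two different results.

    import numpy as np

    theta = np.pi / 4
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]], dtype=float)   # rotation
    T = np.array([[1, 0, 2], [0, 1, 0], [0, 0, 1]], dtype=float)    # translation by (2, 0)

    v = np.array([1.0, 1.0, 1.0])
    v2 = T @ (R @ v)    # rotate first, then translate
    v1 = R @ (T @ v)    # translate first, then rotate
    print(v2)           # about [3.414, 0.0, 1.0]
    print(v1)           # about [2.828, -1.414, 1.0] -- a different point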
(Refer Slide Time 32:51)
319
(Refer Slide Time 33:22)
So the order in which the concatenation is applied, the order in which the transformations are
applied, has to be thought of very, very carefully.
(Refer Slide Time 33:31)
Now suppose I have to transform a set of points; say for example, I can have a square figure,
like this, in a two-dimensional space
320
(Refer Slide Time 33:53)
(x, y). So this figure will have 4 vertices, which I can represent as points p1, p2, p3 and p4.
Now, so far, the transformations that we have
discussed, that is the transformation of a single point around origin or the transformation of a
single point around another arbitrary point in the same space. Now here, if I have to
transform the entire figure, that is for example, I want to rotate this entire figure about the
origin or I want to translate this entire figure by certain vector say v. So for example I want to
translate this entire figure to this particular position, so you find that here, all these points p1,
p2, p3 and p4 all these points are going to be translated by the same displacement vector v.
So it is also possible that we can apply transformation to
321
(Refer Slide Time 35:10)
all these points simultaneously, rather than applying the transformation to individual points one
by one. So for a set of m points, what we have to do is construct a matrix v of dimension 4 x m;
that is, every individual point, considered of course in the unified form, will now be a column
vector of a matrix which is of dimension 4 x m, and then we have
to apply the transformation A to this entire matrix and the transformation, after this
transformation we get the new matrix v* which is given by the transformation A multiplied
by the matrix v. So here we find that any particular column, the ith column in the matrix v*
which is a vi* is the transformed point corresponding to the ith column of matrix v which is
represented by vi. So if I have a set
322
of points which are to be transformed by the same transformation then all those points can be
arranged in the form of columns of a new matrix. So if I have m number of points I will have
a matrix having m number of columns. The matrix will also obviously have 4 rows and this
new 4 x m matrix that I get, this entire matrix has to be transformed using the same
transformation operation, and I get the transformed points again in the form of a matrix. And
from that transformed matrix I can identify which column is the transformed point of which
original point.
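In NumPy this batching falls out of a single matrix product; a minimal sketch (my own, with made-up vertex coordinates) follows.

    import numpy as np

    # four 3D vertices stored as the columns of a 4 x m matrix in unified form
    V = np.array([[0, 1, 1, 0],          # x coordinates of the m = 4 points
                  [0, 0, 1, 1],          # y coordinates
                  [0, 0, 0, 0],          # z coordinates
                  [1, 1, 1, 1]], dtype=float)

    A = np.array([[1, 0, 0, 2],
                  [0, 1, 0, 3],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # translation of every point by (2, 3, 0)

    V_star = A @ V       # column i of V_star is the transform of column i of V
    print(V_star[:3])    # the translated (x, y, z) of all four vertices at once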
Now once we get these transformations,
(Refer Slide Time 37:08)
again we can get the corresponding inverse transformations. So the inverse transformations in
most of the cases can be obtained just by observation. Say for example, if I apply,
323
if I translate a point by a displacement vector v, then inverse transformation should bring
back that translated point, that transformed point to its original position. So if my translation
is by a vector v, the inverse transformation or the inverse translation should be by a vector -v.
So the inverse transformation matrix
(Refer Slide Time 37:47)
T-1 can be obtained as [1 0 0 -Xo; 0 1 0 -Yo; 0 0 1 -Zo; 0 0 0 1]. So you remember that the
corresponding transformation matrix that we said was [1 0 0 Xo; 0 1 0 Yo; 0 0 1 Zo; 0 0 0 1].
324
(Refer Slide Time 38:07)
So you will find that xo yo and zo, they have just been negated to give you the inverse
translation matrix T-1. So similarly by the same observation we can get inverse rotations
325
(Refer Slide Time 38:30)
Rθ -1, where what we have to do is in the transformation matrix, original rotation matrix we
had the term cosθ, sinθ, -sinθ, cosθ, now all these thetas are to be replaced by –θ, which gives
me the inverse rotation matrix around the z axis. Similarly, we can
(Refer Slide Time 38:54)
also find out the inverse matrix for scaling where the Sx will be replaced by 1/ Sx. Thank you.
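As a quick numerical sanity check of these inverse relations (my own addition, not part of the lecture): negating the offsets inverts the translation and replacing θ by -θ inverts the rotation about z.

    import numpy as np

    def translation(x0, y0, z0):
        M = np.eye(4)
        M[:3, 3] = [x0, y0, z0]
        return M

    def rotation_z(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, s, 0, 0], [-s, c, 0, 0],
                         [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)

    print(np.allclose(translation(2, -1, 3) @ translation(-2, 1, -3), np.eye(4)))  # True
    print(np.allclose(rotation_z(0.3) @ rotation_z(-0.3), np.eye(4)))              # True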
326
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 03 Lecture Number 11
Image Formation - 1
(Refer Slide Time 00:19)
Hello, welcome to the video lecture series on Digital Image Processing. Now we will see
another form of transformation which is called a perspective transformation. Now this
perspective transformation is very, very important to understand how a point in three-
dimension, in the 3D world is imaged by a camera. So this perspective transformation is also
known as an
327
imaging transformation. And the purpose of this imaging transformation is to project a 3D
point, a 3D world point, onto the image plane. And this gives an approximation to the
image formation process which is actually followed by a camera.
Now let us see what is this perspective transformation.
Here we have shown a figure, which is an approximation of the image formation process.
Here you find that, we have two coordinate systems which are superimposed one over the
other. One is the 3D world coordinate system represented by (X, Y, Z), so this is the 3D
world coordinate system and I also have the camera coordinate system which is given by (x,
y, z). Now here we have assumed that this camera coordinate system and the 3D world
328
coordinate system, they are perfectly aligned, that is, X axis of the 3D world coordinate
system coincides with the x axis
of the camera coordinate system, Y axis of the world coordinate system coincides with the y
axis of the camera coordinate system; similarly, the Z axis of the world coordinate system
coincides with the z axis of the camera coordinate system. They have, both these coordinate
systems have the same origin.
Now if I have a point (X, Y, Z)
in 3D, so this is the point (X, Y, Z) in three-dimension and I assume that the center of the lens
is at the location (0, 0, λ). So obviously the λ which is, the z coordinate of the lens center, this
is also nothing but the focal length of the camera and this (X, Y, Z), this particular 3D point,
329
we assume that it is mapped to the camera coordinate given by (x, y). Now our purpose is,
that if I know this 3D coordinate system (X, Y, Z) and I know the value of λ that is the focal
length of the camera, whether it is possible to find out the coordinate, the image coordinate
corresponding to this 3D world coordinate (X, Y, Z).
330
(Refer Slide Time 03:56)
similar triangles. So here, what we do is, by using similar triangles, we can find out a relation
between the image coordinates and the world coordinates in terms of the ratio Y/(Z-λ). So from
this I can find out that the image coordinates of the 3D world point are given by
x = λX/(λ-Z) and y = λY/(λ-Z).
Now these expressions also can be represented in the form of a matrix and here we will find
that if I go for homogenous coordinate system then this matrix expression is even simpler.
331
So let us see, what is this homogenous coordinate system? If I have the Cartesian coordinate
(X, Y, Z), then we have said that in the unified coordinate system we just append a value 1 as an
additional component. In the homogenous coordinate system, instead of simply appending 1, we
multiply all the coordinates X, Y and Z by an arbitrary non-zero constant, say k, and append
the same k as the fourth component. So given the Cartesian coordinate (X, Y, Z), I
can convert this to the homogenous
coordinate (kX, kY, kZ, k).
The inverse process is also very simple. That if I have a homogenous coordinate then what I
have to do is I have to divide all the components of this homogenous coordinate by the fourth
term. In this case the fourth term is k and all other terms were (k X, k Y, k Z). So if I divide
all this three
332
(Refer Slide Time 06:22)
333
(Refer Slide Time 06:31)
the 3D point. So this way I can very easily convert the coordinates from the Cartesian
coordinate system to the homogenous coordinate system, and I can also very easily convert from
the homogenous coordinate system back to the Cartesian coordinate system.
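The two conversions are easy to write down; here is a minimal sketch (my own, not from the lecture, with arbitrary numbers), showing that the Cartesian point recovered does not depend on the choice of k.

    import numpy as np

    def to_homogeneous(w, k=1.0):
        # Cartesian (X, Y, Z) -> homogenous (kX, kY, kZ, k), for any non-zero k
        return np.append(k * np.asarray(w, dtype=float), k)

    def to_cartesian(wh):
        # divide the first three components by the fourth one
        return np.asarray(wh[:3], dtype=float) / wh[3]

    w = [2.0, 3.0, 5.0]
    print(to_homogeneous(w, k=4.0))                 # [ 8. 12. 20.  4.]
    print(to_cartesian(to_homogeneous(w, k=4.0)))   # [2. 3. 5.], independent of k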
Now to understand the imaging process,
334
(Refer Slide Time 06:49)
let us define a perspective transformation which is given by the matrix
P = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 -1/λ 1].
And we convert our world coordinate w to the homogenous coordinate, so it becomes (kX, kY,
kZ, k). Now if I transform this homogenous world coordinate by this perspective transformation
matrix P, then I get the homogenous camera coordinate ch, which is
given by ch = (kX, kY, kZ, -kZ/λ + k). So this is the homogenous camera coordinate.
335
(Refer Slide Time 07:50)
camera coordinate to the Cartesian camera coordinate, I find that the Cartesian camera
coordinate is given by c = [x; y; z] = [λX/(λ-Z); λY/(λ-Z); λZ/(λ-Z)]. So, on the right hand
side, this X, Y and Z, they
are all in upper case indicating those are the coordinates of the world point.
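A small sketch (my own, with an arbitrary focal length and world point, not part of the lecture) that carries out exactly these steps: build P, multiply it with the homogenous world coordinate, and divide by the fourth component.

    import numpy as np

    lam = 0.05                                   # focal length, arbitrary value for illustration
    P = np.array([[1, 0, 0,          0],
                  [0, 1, 0,          0],
                  [0, 0, 1,          0],
                  [0, 0, -1.0 / lam, 1]], dtype=float)

    w  = np.array([1.0, 2.0, 4.0])               # a 3D world point (X, Y, Z)
    wh = np.append(w, 1.0)                       # homogenous form with k = 1
    ch = P @ wh                                  # homogenous camera coordinate
    c  = ch[:3] / ch[3]                          # back to Cartesian form

    print(c[:2])                                                 # image point (x, y)
    print(lam * w[0] / (lam - w[2]), lam * w[1] / (lam - w[2]))  # same values, directly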
Now if I compare these expressions
336
(Refer Slide Time 08:30)
with the camera coordinates that we have obtained with respect to our previous diagram, you
find that, here we get x = λX/(λ - Z) , similarly y is also nothing but y = λY/(λ - Z) . So using
our previous diagram we have also seen that these lower case x and lower case y, they are the
image points on the image plane of the world coordinate X, Y and Z. So this shows clearly
that using the perspective transformation that we have defined as the matrix
337
(Refer Slide Time 09:24)
that is
338
(Refer Slide Time 09:43)
the value of z, which is not of importance in our case because in the camera coordinate the
value of z is always equal to 0 because we are assuming that
(Refer Slide Time 09:59)
the imaging plane is the x-y plane of the world coordinate system as well as the camera
coordinate system. So we will stop our discussion here today and in the next class we will see
that as we have seen with the perspective transformation, we can transform a world
coordinate, we can project a world point, a 3D world point on to an imaging plane, similarly
using the inverse perspective transformation, whether it is possible that given a point in an
image plane, whether we can find out the corresponding 3D point in the 3D world coordinate
system. Thank you
339
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 03 Lecture Number 12
Image Formation - 2
(Refer Slide Time 00:20)
Hello, welcome to the video lecture series on Digital Image Processing and we have said that
these transformations are very, very useful to understand the image formation process. So in
the last class, what we had talked about is
the basic transformations and we had talked about the transformations like translation,
rotation and scaling and these transformations we have said both in the two-dimension and
the three-dimensional cases. Then, for all these transformations we have also seen what is the
340
corresponding inverse transformations. Then after that we have gone for the conversion from
the Cartesian coordinate system to homogenous coordinate system and we have seen the use
of homogenous coordinate system in perspective transformation where perspective
transformation we have said is an approximation of the imaging process
so that when a camera takes an image of a point in a three-dimensional world, then imaging
transformation can be approximated by the perspective transformation that we have discussed
in the last class.
Today
341
(Refer Slide Time 01:47)
that the perspective transformation takes an image of a point or a set of points in the three-
dimensional world and these points are mapped to the imaging plane, which is a two-
dimensional plane. The inverse perspective transformation just does the reverse process that
is given a point in the imaging plane; we will see that using this inverse perspective
transformation whether it is possible to find out that what is the point in the three-
dimensional coordinate system to which this particular image point corresponds. Then we
will also talk about the imaging
geometry, where the world coordinate system and the camera coordinate system are not
aligned. You try to remember in the last class that the imaging
342
(Refer Slide Time 02:39)
geometry that we had considered, there we had assumed that the three dimensional world
coordinate system is aligned with the camera coordinate system that is x axis of the camera is
aligned with the x axis of the 3D world, y axis of the camera is aligned with the y axis of the
3D world and z axis of the camera is also aligned with the z axis of the 3D world. In addition
to that the origin of the camera coordinate system also coincides with the origin of the image
coordinate system. In today's lecture we will take a generalized imaging model where the
camera coordinate system and the 3D world coordinate system are not aligned, which is the
general situation. Then we will try to see what are the transformations
343
which are involved in such a generalized imaging set up, which will help us, to understand
the image formation process in a generalized set up. Then we will illustrate this concept with
the help of an example
Now let us briefly recapitulate
what we had done in the last class. This figure shows the imaging geometry that you had
considered where the 3D world coordinate system is aligned with the
camera coordinate system. There we have taken a 3D point whose coordinates are given by
(X, Y, Z) all in the capital and (x, y) lower case coordinates are the corresponding
344
(Refer Slide Time 04:21)
image points in the imaging plane. And we have assumed that the focal length of the camera
is lambda. That means the coordinate of the focal point of the lens center is (0, 0, λ). Now
using this particular figure
we have tried to find out a relation between the 3D world coordinate (X, Y, Z) and the
corresponding
345
(Refer Slide Time 04:50)
image point which is (x, y). For that what we have done is we have
taken a conversion from the Cartesian coordinate system to a homogenous coordinate system.
So while doing this conversion what we have done is, every component of the coordinate,
that is (X, Y, Z) is multiplied by a non-zero arbitrary constant k and the same value of k is
appended with the three components. For Cartesian coordinates, (X, Y, Z) the corresponding
homogenous coordinate is given
346
(Refer Slide Time 05:27)
by (k X, k Y, k Z, k)
So for a world coordinate point (X, Y, Z) once we have the corresponding homogenous
coordinate (k X, k Y, k Z, k) then we found that after this conversion if we define
perspective transformation, so
347
(Refer Slide Time 05:51)
this perspective transformation matrix, which in this case is
P = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 -1/λ 1], and the
homogenous coordinate wh is transformed with this perspective transformation matrix P then
what we get is the homogenous coordinate of the camera point to which this world point w
will be mapped and the homogenous coordinate of the camera point of the image point after
the perspective transformation is obtained as ch = (kX, kY, kZ, -kZ/λ + k), and you will see
that if we convert
this homogenous camera point, the homogenous image point
348
(Refer Slide Time 06:45)
this conversion
349
(Refer Slide Time 06:50)
gives us the Cartesian coordinates of the image point as
c = [x; y; z] = [λX/(λ-Z); λY/(λ-Z); λZ/(λ-Z)]. So you just
note that (x, y, z) in the lower case letters, these indicate the
(Refer Slide Time 07:16)
camera coordinate, the image coordinate whereas, (X, Y, Z) in the capital form, this
represents the coordinate in the 3D world or the 3D coordinate of the world point w.
Now what we are interested in is the camera coordinate x and y at this moment we are not
interested in the image coordinate z. So this can be obtained
350
(Refer Slide Time 07:43)
by simple conversion that if we find out the value of lower case z with respect to λ and capital
Z, then after solving the same equation, here that is lower case x lower case y and lower case
z. We find that the image coordinate x and y in terms of
(Refer Slide Time 08:26)
the 3D coordinate capital X and capital Z is given by x = λX/(λ - Z) and image coordinate
y = λY/(λ - Z) . So as we said the other value that is the z coordinate in the image plane is of
no importance at this particular moment, but we will see later that when we talk about the
inverse perspective transformation, when we try
351
(Refer Slide Time 09:02)
an image point to the corresponding 3D point in the 3D world then we will make use of this
particular coordinate z in the image plane as a free variable. So now let us see that,
(Refer Slide Time 09:17)
352
(Refer Slide Time 09:26)
a 3D point on to a point in the image plane. The purpose of the inverse perspective transformation
is just the reverse. That is, given a point in the image plane, the inverse perspective
transformation or P-1 tries to find out the corresponding 3D point in the 3D world. So for
doing that, again we make use of the homogenous coordinate system that is the camera
coordinate c of the image coordinate point c will be replaced, will be converted to the
corresponding homogenous form which is given by ch and the world coordinate, the world
point w will also be obtained in the form in the homogenous coordinate form wh. And we
353
define the inverse perspective transformation P-1 = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 1/λ 1],
and you can easily
verify
that this matrix, this transformation matrix, is really an inverse of the perspective
transformation matrix P, because if we multiply the perspective transformation matrix by this
matrix P-1, what we get is the identity matrix.
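This can be verified numerically in a couple of lines (a sketch of my own, with an arbitrary λ):

    import numpy as np

    lam = 0.05
    P    = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1 / lam, 1]], dtype=float)
    Pinv = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0,  1 / lam, 1]], dtype=float)
    print(np.allclose(P @ Pinv, np.eye(4)))    # True: the product is the identity matrix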
Now given this
(Refer Slide Time 10:55)
354
(Refer Slide Time 10:57)
as we said that if we assume an image point say (xo, yo) and we want to find out, what is the
corresponding 3D world point w to which this (xo, yo)
(Refer Slide Time 11:12)
image point corresponds? So the first step that we will do is to convert this image point (xo,
yo) to the corresponding homogenous coordinate which will be obtained as kxo, kyo and
355
0, and the fourth component comes as k. Now here you find that the third component, or z
coordinate, we had taken as 0, because what we have is a point in two-dimension, that is, on the
imaging plane, so we have assumed the z coordinate to be 0.
(Refer Slide Time 11:47)
Now if we multiply, or if we transform, this homogenous coordinate (kxo, kyo, 0, k) with the inverse
perspective transformation P-1, then what we get is the homogenous coordinate corresponding
to the 3D world point, which is obtained as wh = P-1 ch = (kxo, kyo, 0, k), as given in this equation.
356
(Refer Slide Time 12:19)
Now from this homogenous coordinate system, if I convert this to the Cartesian coordinate
form then the Cartesian coordinate corresponding to this homogenous coordinate is obtained
as w equal to capital X, Y and Z, which is nothing but xo, yo and 0. So you find that in this
particular case the 3D world coordinate is coming as xo, yo and 0, which is the same point
from where we have started, that is the image point from where we had started.
(Refer Slide Time 12:55)
Moreover, for all the 3D coordinate points, the z component always comes as 0.
Obviously this solution is not acceptable because for every coordinate or for every point in
the three-dimensional world, the z coordinate cannot be 0. So what is the problem here? If
you remember the figure of
357
(Refer Slide Time 13:21)
imaging system that we have used, let me just redraw this particular figure: we had an
imaging plane, the x-y plane, like this
(Refer Slide Time 13:33)
on which the camera coordinate system and the 3D world coordinate system are perfectly
aligned. So we had this x same as capital X, we had this y same as capital Y, we had this z
same as capital Z, and this is
the origin of both the coordinate systems and we had somewhere the optical center of the
lens.
358
(Refer Slide Time 14:09)
Now if I take some point here, some image point here and if I draw a line passing through
this image point and the camera optical center and the world point w comes somewhere at
this location. So we have seen in the previous figures that this point, if I call this point as c,
this point c is the image point corresponding to this 3D world point w whose coordinate is
given by (X, Y, Z), and this c in our case has a coordinate of (xo, yo, 0). And when we have
tried to map this point c back to the 3D world coordinate system, what we have got is that for
every point w the value of Z comes out as 0, which cannot be the case.
Now the problem comes here is because of the fact that if I analyze this particular mapping
that is mapping of point w in the
359
(Refer Slide Time 15:17)
3D world to point c in the image plane, this mapping is not a one to one mapping. Rather it is
a many to one mapping. Say for example, if I take any point
(Refer Slide Time 15:29)
on this particular straight line passing through the point c and the point (0, 0, λ) which is
nothing but the optical center of the camera lens then all these points on this line will be
mapped to the same point c in the image plane so
360
(Refer Slide Time 15:49)
Naturally, this being a many to one mapping, when I do the inverse transformation using the
inverse perspective transformation matrix, from the image point c to the corresponding 3D
world point, the solution
(Refer Slide Time 16:02)
that I get cannot be an acceptable solution. So we have to have something more in this
formulation, and let us see what it is that we can add
361
(Refer Slide Time 16:12)
over here
(Refer Slide Time 16:13)
Now here
362
(Refer Slide Time 16:16)
if I try to find out equation of the straight line which passes through the point (xo, yo, 0)
(Refer Slide Time 16:24)
that is, the image point, and the point (0, 0, λ), that is, the optical center of the camera lens,
the equation of the straight line will come out in this form: X = (xo/λ)(λ - Z) and
Y = (yo/λ)(λ - Z). So this is the equation of the straight line, so that
363
(Refer Slide Time 16:51)
every point in this straight line is mapped to the same point (xo, yo) in the image plane.
So the inverse perspective transformation as we have said that it cannot give you a unique
point in the 3D world because the mapping, the perspective transformation was not a one to
one mapping. So even if, by using the inverse perspective transformation, we can't get exactly
the 3D point, at least the inverse transformation matrix should be able to tell me which
particular line contains the points that map to this point (xo, yo) in the image plane. So let us see
if we can have this information at least.
So for doing this,
in earlier case when we have converted the image point (xo, yo) to the homogenous coordinate
then we had taken (kxo, kyo, 0, k). Now here for the z coordinate, what we will do is instead
364
of assuming the z coordinate to be zero we will assume the z coordinate to be a free variable.
So, in our homogenous coordinate we will assume the homogenous coordinate to be (kxo, kyo,
kz, k). Now this point when it is inverse transformed using the inverse transformation matrix
then what we get is the world coordinate, the world point, in the homogenous coordinate system
as wh = P-1 ch, and in this particular case you find that wh = (kxo, kyo, kz, kz/λ + k), so this wh we have
got in the homogenous coordinate system. Now what we have to do is, this homogenous
coordinate, we have to convert to the Cartesian coordinate system and as we have said earlier
that for this conversion we have to divide all the components by the last component; in this
case kxo, kyo and kz, all of them will be divided by kz/λ + k. So after doing this division
operation, what I get in the Cartesian coordinate system is
w = [X; Y; Z] = [λxo/(λ+z); λyo/(λ+z); λz/(λ+z)]. So on the right hand side all the z's are the
lower case letters, which is
the free variable that we had assumed that we had used for the image
365
(Refer Slide Time 20:04)
coordinate. And for the matrix, the column matrix on the left hand side, all X, Y, Z are in the
upper case letters which indicates that these X, Y, Z are the 3D coordinates.
So now what we do is
366
(Refer Slide Time 20:23)
we try to solve for the values of capital X and capital Y. So just from this previous matrix you
find that X = λxo/(λ+z), Y = λyo/(λ+z) and Z = λz/(λ+z). So from these three equations I can
obtain X = (xo/λ)(λ - Z) and Y = (yo/λ)(λ - Z).
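Before comparing this with the straight-line equation, a quick numerical check (my own sketch, with arbitrary values of λ and of the image point) that every choice of the free variable z gives a world point satisfying these two relations:

    import numpy as np

    lam = 0.05
    x0, y0 = 0.01, -0.02                         # an image point, arbitrary values
    Pinv = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                     [0, 0, 1, 0], [0, 0, 1 / lam, 1]], dtype=float)

    for z in (0.0, 0.5, 2.0):                    # the free variable for the image z coordinate
        wh = Pinv @ np.array([x0, y0, z, 1.0])   # homogenous world point, k = 1
        X, Y, Z = wh[:3] / wh[3]
        print(np.isclose(X, x0 / lam * (lam - Z)),
              np.isclose(Y, y0 / lam * (lam - Z)))   # True, True for every z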
So if you recall the equation of the straight line that passes through (xo, yo) and (0, 0, λ), you
find that the equation of the straight line was exactly this, that is X = (xo/λ)(λ - Z) and
Y = (yo/λ)(λ - Z). So using this inverse perspective transformation, we have not been able
to identify the 3D world point, which is of course not possible but we have been able to
identify
367
(Refer Slide Time 21:39)
the equation of the straight line, so that the points on this straight line map to the image point
(xo, yo) in the image plane. And now, if I want to exactly find out the particular 3D point to which
this image point (xo, yo) corresponds then I need some more information. Say for example I at
least need to know what is the z coordinate value of the particular 3D point w and once we
know this then using the perspective transformation along with this information of the z
coordinate value, we can exactly identify the point w which maps to point (xo, yo) in the image
plane. Thank you.
368
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 03 Lecture Number 13
Image Geometry - 1
(Refer Slide Time 00:19)
Hello, welcome to the video lecture series on Digital Image Processing. In the last class we have
seen how to identify the point w which maps to point (xo, yo) in the image plane. Now, till now,
all the discussions that we had
done, for all these discussions we have assumed that the image coordinate system and the
camera coordinate system, they are perfectly aligned. Now let us discuss about a general
situation where the image coordinate system and the camera coordinate system, they are not
perfectly aligned. So here we assume that the
369
camera is mounted on a gimbal. So if you mount the camera on the gimbal, then using the
gimbal the camera can be given a pan of angle theta, it can also be given a tilt by an angle
alpha. So you remember that pan is the rotation around z axis and the tilt is the rotation
around x axis. We also assume that the gimbal center is displaced from the 3D world
coordinate origin (0, 0, 0) by a vector wo which is equal to (Xo, Yo, Zo) and finally, we also
assume that the camera center or the center of the imaging plane is displaced from the gimbal
center by a vector r, which will have components say r1, r2 and r3 in the x, y and z direction of
the 3D world coordinate system.
Now here our interest is, given such a type of imaging arrangement, now if we have a point in
the 3D world coordinate w what would be the camera coordinate, what would be the image
point c to which this world point w will be mapped? So this is a general situation. And now
let us see how we can obtain the solution to this particular problem that for this generalized
imaging setup, for a world point w what will be the corresponding image point c?
So
370
(Refer Slide Time 02:43)
So the steps would be like this. Since our earlier formulations were very simple in which case
we had assumed that both the camera coordinate system and the 3D world coordinate system,
they are perfectly
aligned, in this generalized situation we will also try to find out a set of transformations
which, if applied one after another will bring the camera coordinate system and the world
coordinate system in perfect alignment. So once that alignment is made, then we can apply
the perspective transformation to the transformed 3D world points and this perspective
transformation to the transformed 3D world points give us the corresponding image
coordinate of the transformed point w. So what are the transformation steps that we need in
this particular case?
371
(Refer Slide Time 03:40)
So the first step is, we assume that the image coordinate system and the 3D world coordinate
system, they are perfectly aligned. So from this, we displace the gimbal center from the origin
by the vector wo, and after displacing the gimbal center from the origin by wo, we pan the x
axis by an angle θ, followed by a tilt of the z axis by angle α, which will be followed
by the final displacement of the image plane with respect to the gimbal center by the vector r. So
we have 4 different transformation steps
which are to be applied one after another and these transformation steps will give you the
transformed coordinates of the 3D world point w. So let us see how this transformation is to
be applied one after another.
372
(Refer Slide Time 04:43)
So here on the left hand side we have shown a figure, where the camera coordinate system
and the world coordinate system are perfectly aligned. Now from this alignment if we give a
displacement by a vector wo to the gimbal center then, the camera will be displaced as shown
on the right hand side of the figure where you find that the center is displaced by vector wo.
You remember that if I displace the camera center
by vector wo, then all the world coordinates, all the world points, will be displaced by a
vector -wo with respect to the camera. Now you just recollect that when we tried to find out
the image point of a 3D world point then the image point, the location of the image point is
decided by the location of the 3D world point with respect to the camera coordinate. It is not
with respect to the 3D world coordinate.
373
So in this case, also after the set of transformations, we have to find out what are the
coordinates of the 3D world point with respect to the camera coordinate system, where
originally the coordinates of the 3D world point are specified with respect to the 3D world
coordinate systems. So here as we displace the camera center by vector wo, so all the world
coordinate points, all the world points will be displaced by the vector which is negative of wo
that is by -wo and if wo has components xo along x direction, yo along y direction and zo along
z direction, so the corresponding transformation to the 3D points will be -xo, -yo and -zo. And
we have seen earlier that if a 3D point is to be displaced by -xo, -yo and -zo, then in the
unified representation the corresponding transformation matrix for this translation is given
by G = [1 0 0 -Xo; 0 1 0 -Yo; 0 0 1 -Zo; 0 0 0 1].
So this is a transformation matrix which translates all the world coordinates, all the world
points by vector (-xo, -yo, -zo) and this transformation is now
374
(Refer Slide Time 07:33)
after this displacement, we pan the camera by angle θ. And this panning is done along the z
axis. So when we pan along the z axis, the coordinates which are going to change
375
(Refer Slide Time 07:56)
is the x coordinate and the y coordinate. The z coordinate value is not going to change at all,
and for this panning by an angle θ, again we have seen earlier that the corresponding
transformation matrix for rotation by θ is given by
Rθ = [cosθ sinθ 0 0; -sinθ cosθ 0 0; 0 0 1 0; 0 0 0 1].
So when we rotate the camera by an angle θ all the world coordinate points, all the world
points will be rotated by the same angle θ but in opposite direction and that corresponding
matrix will be given by this matrix Rθ.
So we have completed two steps, first displacement of the camera center with respect to the
origin of the world coordinate
376
(Refer Slide Time 08:54)
system, then panning the camera by angle θ. The third step is, now we have to tilt the camera
by an angle α. And again we have to find out what is the corresponding transformation for
this tilt operation which has to be applied to all the 3D points. So for this tilt
377
(Refer Slide Time 09:36)
that these are the basic transformations which we had already discussed in the previous class
and how these transformations are being used to understand the imaging process.
So, so far we have applied one displacement and two rotation transformations, namely Rθ and
Rα. Now you find that this Rθ and Rα can be combined into a single rotation matrix R, which is
equal to Rα concatenated with Rθ, and the corresponding transformation matrix is
R = [cosθ sinθ 0 0; -sinθcosα cosθcosα sinα 0; sinθsinα -cosθsinα cosα 0; 0 0 0 1].
378
(Refer Slide Time 10:33)
that we have to give to the camera center, or the center of the imaging plane, a displacement
from the gimbal center by a vector r, and this vector r has the components r1, r2 and r3 along
the x, y and z directions. And by this transformation, now all the world points
379
(Refer Slide Time 11:01)
are to be transformed, are to be translated, by a vector (-r1, -r2, -r3), and the corresponding
translation matrix now will be T = [1 0 0 -r1; 0 1 0 -r2; 0 0 1 -r3; 0 0 0 1].
380
(Refer Slide Time 11:30)
381
(Refer Slide Time 11:43)
that these transformations T, R and G taken together on this homogenous coordinate wh gives
you the homogenous transformed point w as seen by the camera. And once I have these
transformed
382
(Refer Slide Time 12:21)
ch is given by ch = PTRGwh . So you remember that this coordinate comes in the homogenous
form.
Then final operation that you have to do is to convert this homogenous coordinate ch into the
corresponding Cartesian coordinate c, so that Cartesian coordinate,
383
(Refer Slide Time 12:46)
if I solve those equations, will come in this form (you can try to derive these equations):
x = λ [(X-Xo)cosθ + (Y-Yo)sinθ - r1] / [-(X-Xo)sinθsinα + (Y-Yo)cosθsinα - (Z-Zo)cosα + r3 + λ]
and the camera image coordinate
y = λ [-(X-Xo)sinθcosα + (Y-Yo)cosθcosα + (Z-Zo)sinα - r2] / [-(X-Xo)sinθsinα + (Y-Yo)cosθsinα - (Z-Zo)cosα + r3 + λ].
So these are
384
(Refer Slide Time 13:53)
the various transformation steps that we have to apply if I have a generalized imaging set up
in which case the 3D coordinate axis and camera coordinate axis, they are not aligned. So the
steps that we have to follow are: first we assume that the camera coordinate axes and the 3D
coordinate axes are perfectly aligned; then we give a set of transformations to the camera
to bring it to its given setup, and apply the corresponding transformations, but in the reverse
direction, to the 3D world coordinate points. So by applying these transformations to the 3D
world coordinate points, the 3D world coordinate points as seen by the camera will be obtained
as the transformed points, and after that, if I apply the simple perspective transformation to
these transformed 3D points, what I get is the image point
corresponding to those transformed 3D world points.
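The whole chain ch = P T R G wh can be packed into one small function; this is a sketch of my own (the function name, argument names and the numbers in the usage line are arbitrary), directly following the matrices given above.

    import numpy as np

    def image_point(w, w0, theta, alpha, r, lam):
        # w: 3D world point, w0: gimbal offset, theta: pan, alpha: tilt,
        # r: image-plane offset from the gimbal centre, lam: focal length
        ct, st = np.cos(theta), np.sin(theta)
        ca, sa = np.cos(alpha), np.sin(alpha)
        G = np.eye(4); G[:3, 3] = -np.asarray(w0, dtype=float)
        T = np.eye(4); T[:3, 3] = -np.asarray(r, dtype=float)
        R = np.array([[ct, st, 0, 0],
                      [-st * ca, ct * ca, sa, 0],
                      [st * sa, -ct * sa, ca, 0],
                      [0, 0, 0, 1]], dtype=float)
        P = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                      [0, 0, 1, 0], [0, 0, -1.0 / lam, 1]], dtype=float)
        ch = P @ T @ R @ G @ np.append(np.asarray(w, dtype=float), 1.0)
        return (ch[:3] / ch[3])[:2]          # the image coordinates (x, y)

    # usage with made-up geometry values
    print(image_point(w=[2, 1, 3], w0=[0.5, 0.5, 1.0], theta=np.radians(30),
                      alpha=np.radians(20), r=[0.02, 0.02, 0.03], lam=0.05))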
Now let us try to see an example to illustrate this operation.
385
(Refer Slide Time 15:02)
So let us take a figure where we assume that, the camera or the center of the imaging plane of
the camera is located at location (0, 0, 1) with respect to the 3D world coordinate system (x,
y, z). And we have an object placed in the x-y plane where one of the corners of the object is
at location (1, 1, 0.2) and we want to find out that what will be the image coordinate for this
particular 3D world point which is now a corner of this object
as placed in this figure. So what we will try to do is, we will try to apply the set of
transformations to the camera plane one after another and try to find out
386
(Refer Slide Time 16:04)
that what are the different transformations that we have to apply or what are the different
corresponding transformations to the 3D world point that will bring, that will give us the
world coordinate points, world points as seen by the camera. So initially I
assume, again, that the world coordinate system and the camera coordinate
system are perfectly aligned. Now after this assumption, what I have to do is give a
displacement to the camera by the vector (0, 0, 1), so what I will do is I will bring
the camera to a location here. So this is my
387
(Refer Slide Time 16:58)
camera where the image plane center is at location (0, 0, 1). So this is my x axis, this is my y
axis, this is the z axis. Now if I do this transformation then you find that all the 3D points will
be transformed by the vector
388
(Refer Slide Time 17:35)
(0, 0, -1), and the corresponding transformation matrix for translating the
points in the 3D coordinate system is given by [1 0 0 0; 0 1 0 0; 0 0 1 -1; 0 0 0 1].
So this is the first transformation that has to be applied to all the 3D points Now after the
camera is displaced by the vector (0, 0, 1), the next operation that you have to apply is to pan
the camera by an angle 135o. I just forgot to mention that in that arrangement, that pan was
135o; the tilt was also 135o. So after the initial
389
(Refer Slide Time 18:28)
transformation, the displacement of the camera by the vector (0, 0, 1), we have to apply a pan of
135o to this camera. So if I
represent that, let us take a two-dimensional view. As we said, panning is nothing but
rotation around z axis, so if I say that this is the x axis, this is the y axis then by panning we
have to make an angle of 135o between the x axis of the camera coordinate system so the
situation will be something like this. So this is the y axis of the camera coordinate system.
This is the x axis of the camera coordinate system and by pan of 135o, we have to rotate the
camera imaging plane in such a way that the angle between the x axis of the camera
coordinate and the x axis of the 3D world coordinate is 135o.
390
(Refer Slide Time 19:38)
And once we do this, here you find that this rotation of the camera is in the anti-clockwise
direction.
391
(Refer Slide Time 19:59)
which is now given by Rθ = [cos135° sin135° 0 0; -sin135° cos135° 0 0; 0 0 1 0; 0 0 0 1].
So this is the rotation
transformation
that has to be applied to all the world coordinate points. So after we apply this Rθ the next
operation that we have to perform is to tilt the camera by
392
(Refer Slide Time 20:51)
an angle 135o.
So again to have the look at this tilt operation, we take again a two dimensional view so the
view will be something like this. So we take the z axis of the 3D world coordinate system and
in this case it will be the y-z plane of the 3D world coordinate system. And by tilt what we
mean is something like this. This is the z axis of the camera coordinate system
and the angle between the z axis of the 3D world coordinate system and the camera
coordinate system is again 135o. So this is the angle, tilt angle α. So here again you find that
the tilt is in the anti-clockwise direction so the corresponding transformation in the 3D world
point will be
393
(Refer Slide Time 21:53)
rotating the 3D world point by 135o in the clock wise direction around the
394
(Refer Slide Time 22:37)
the transformation matrix that has to be applied to the tilt operation So after doing this, you
find that the 3D world coordinate, the 3D world point for which we want
395
(Refer Slide Time 23:03)
This is the 3D world coordinate point, and after application of all these transformations, the
transformed coordinate of this 3D world point, if we write it as (x, y, z), has to be
represented in unified form, so this will be like this:
[x; y; z; 1] = Rα Rθ T [1; 1; 0.2; 1] in the unified form.
Now if I compute this
396
(Refer Slide Time 23:50)
we have just computed, that we have just derived, you will find that this transformation
matrix can be computed as
[-0.707 0.707 0 0; 0.5 0.5 0.707 -0.707; 0.5 0.5 -0.707 0.707; 0 0 0 1].
This is the overall transformation
matrix which takes care of
397
(Refer Slide Time 24:45)
the translation of the image plane, then the pan by angle θ and also the tilt by angle α. So if I
apply this transformation to my original 3D world coordinates, which were (1, 1, 0.2, 1), then
what I get are the coordinates of the point as observed by the camera. So if you compute this,
you will find that this will come in the form (0, 0.43, 1.55, 1), again in the unified form. So the
corresponding Cartesian coordinates, as seen by the camera, will be given by x = 0, y = 0.43,
z = 1.55. So these
398
(Refer Slide Time 2:53)
length of the camera is 0.035, then we obtain the image coordinates as x = λx/(λ - z), which
will be in this case 0, and y = λy/(λ - z), which, if you compute, will come as -0.0099. So these are
λ - z
the image coordinates of the world coordinate point that we have considered. Now note that
the y coordinate in the image plane
399
has come out to be negative. This is obvious because the original y coordinate, as obtained
after applying the transformations, came out to be positive, so obviously in the image plane
there will be an inversion and this value of the y coordinate will come out to be negative. So
this particular example illustrates the set of transformations that we have to apply, followed by
the perspective transformation, so that we can get the image point for any arbitrary point in the
3D world.
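The numbers of this example can be reproduced with a few lines of NumPy (my own sketch, using the matrices exactly as built above); the intermediate point comes out as about (0, 0.43, 1.57) and the image coordinates as (0, -0.0099), matching the values quoted in the lecture up to rounding.

    import numpy as np

    th = al = np.radians(135.0)                  # pan and tilt of this example
    ct, st, ca, sa = np.cos(th), np.sin(th), np.cos(al), np.sin(al)

    G  = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, -1], [0, 0, 0, 1]], dtype=float)
    Rt = np.array([[ct, st, 0, 0], [-st, ct, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    Ra = np.array([[1, 0, 0, 0], [0, ca, sa, 0], [0, -sa, ca, 0], [0, 0, 0, 1]], dtype=float)

    A = Ra @ Rt @ G                              # the overall transformation of this example
    v = A @ np.array([1.0, 1.0, 0.2, 1.0])       # the point as seen by the camera

    lam = 0.035
    print(np.round(A, 3))                        # the 4 x 4 matrix shown above
    print(v[:3])                                 # about [0.    0.434 1.566]
    print(lam * v[0] / (lam - v[2]),             # x image coordinate: 0.0
          lam * v[1] / (lam - v[2]))             # y image coordinate: about -0.0099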
So with this, we complete our discussion on the different transformations and the different
imaging models that
400
(Refer Slide Time 27:46)
we have taken.
Thank you.
401
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 03 Lecture Number 14
Image Geometry - 2
(Refer Slide Time 00:20)
Hello, welcome to the video lecture series on Digital Image Processing. In the last class we have
seen, for a 3D coordinate system where the world coordinate system and the camera coordinate
system are not perfectly aligned, what is the set of transformations to be applied to the points
in the 3D world coordinate system so that they are transformed to the form as seen by a
camera; then, following that, if we apply the perspective transformation, we get the
image coordinates for different points in the 3D world coordinate system.
So what we have seen in the last class is
402
(Refer Slide Time 01:02)
how to apply the inverse perspective transformation to get the equation of the straight line so
that the points on that straight line map to a particular image point on the imaging plane Then
we have seen a generalized imaging geometry
403
(Refer Slide Time 01:24)
where the world coordinate system and the camera coordinate system are not aligned and we
have also discussed the set of transformations
404
(Refer Slide Time 01:39)
we have also seen how to find image coordinate for any arbitrary point in the 3D world
coordinate system in such a generalized imaging setup and the concept, we have illustrated
with the help of an example.
In today's lecture, we will see that given an imaging setup, how to calibrate the camera and
then we will also explain the concept of how to extract the 3D point from two images, which
is also known as stereo images.
405
(Refer Slide Time 02:19)
So in the last class, what we have done is, we have taken an imaging setup like this, where the
3D world coordinate system is given by (X, Y, Z). In this world coordinate system, we had
placed a camera, where that
406
(Refer Slide Time 02:37)
camera coordinate system is given by (x, y, z) and we have assumed that camera is placed, is
mounted on a gimbal, where the gimbal is displaced from the origin of the world coordinate
system
407
(Refer Slide Time 02:58)
is displaced from the gimbal by a vector r, the camera is given a pan of angle θ and it is also
given a tilt of angle α and in such a situation, if w is a point in the 3D world coordinate
system, we have seen that how to find out the corresponding image point corresponding to
point w in the image plane of the camera. So for that we have done
a set of transformations and with the help of the set of transformations, what we have done is
408
(Refer Slide Time 03:37)
we have brought the 3D world coordinate system and camera coordinate system in alignment
and after the 3D world coordinate system and the camera coordinate system are perfectly
aligned with that set of transformations then, we have seen that we can find out the image
point corresponding to any 3D world point by applying the perspective transformation. So the
type of transformations that
we have to apply is, first we have to apply that transformation for the displacement of the
gimbal center from the origin of the 3D world coordinate system by vector wo followed by a
transformation corresponding to the pan of x axis of the camera coordinate system by θ,
which is to be followed by a transformation corresponding to a tilt of the z axis of the camera
409
coordinate system by angle α and finally the displacement of the camera image plane with
respect to gimbal center by vector r. So
the transformation, the first transformation which translates the gimbal center
410
(Refer Slide Time 04:57)
by vector wo is given by the transformation matrix
G = [1 0 0 -Xo; 0 1 0 -Yo; 0 0 1 -Zo; 0 0 0 1]. The pan of
the x axis of the
411
(Refer Slide Time 05:30)
camera coordinate system by angle θ is given by Rθ as before, and the tilt of the z axis by
angle α is given by the other transformation matrix
Rα = [1 0 0 0; 0 cosα sinα 0; 0 -sinα cosα 0; 0 0 0 1].
(Refer Slide Time 05:51)
Then we have the translation of the center of the image plane with respect to the gimbal center by the vector r. So, if we assume that the vector r has components r1, r2 and r3 along the x direction, y direction and z direction of the 3D world coordinate system, then the corresponding transformation matrix with respect to this translation is given by
\[ T = \begin{bmatrix} 1 & 0 & 0 & -r_1 \\ 0 & 1 & 0 & -r_2 \\ 0 & 0 & 1 & -r_3 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \]
(Refer Slide Time 06:43)
We have also seen in the last class that the rotation
(Refer Slide Time 07:14)
transformation matrices Rθ and Rα can be combined into a single rotation matrix R. So
(Refer Slide Time 07:21)
if we transform the world points first by the translation matrix G, then by the rotation matrix R, followed by the second translation matrix T,
what we do is we align the coordinate system of the camera with the 3D world coordinate
system. That means now that every point in the 3D world will have a transformed coordinate
as seen by the camera coordinate system. So once we do this then finally, applying the
perspective transformation to these
(Refer Slide Time 07:56)
transformed points, we can find out the coordinates of the point in the image plane for any point in the 3D world coordinate system. So here you
find, the
final form of the expression is like this, that both of the world coordinate system and the
camera coordinate system, in this case they are represented in the homogenous form. So wh is
the homogenous coordinate corresponding to the world coordinate w and ch is the
homogenous form of the image coordinate c. So for a world point w whose homogenous
coordinate is
(Refer Slide Time 08:35)
represented by wh, here you can find that I can find out the image coordinate of the point w, again in the homogenous form, which is given by the matrix equation ch = PTRGwh. And here you note that each of these transformation matrices, that is P, T, R and G, all of them are of dimension
(Refer Slide Time 09:05)
4 by 4. So when I multiply all these matrices together to give a single transformation matrix,
then the dimension of that transformation matrix will also be 4 by 4. So what we have now is of
this form. So after doing this
transformation I can find out the image coordinate of the corresponding point w, where x and
y coordinates will be given by these expressions.
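To make this chain of operations concrete, here is a minimal Python/NumPy sketch (not from the lecture itself). The matrices G, Rθ, Rα and T follow the forms written above; the perspective matrix P is written with the usual −1/λ entry, which is an assumption here, though it is consistent with the image-coordinate formula x = λX/(λ − Z) used later in this course.

```python
import numpy as np

def imaging_matrices(w0, theta, alpha, r, lam):
    """Build the 4x4 homogeneous matrices of the generalized imaging model.
    w0  = (X0, Y0, Z0): gimbal centre in world coordinates
    theta, alpha      : pan and tilt angles (radians)
    r   = (r1, r2, r3): image-plane centre with respect to the gimbal centre
    lam               : focal length lambda
    """
    X0, Y0, Z0 = w0
    r1, r2, r3 = r
    G = np.array([[1, 0, 0, -X0], [0, 1, 0, -Y0], [0, 0, 1, -Z0], [0, 0, 0, 1]], float)
    R_theta = np.array([[ np.cos(theta), np.sin(theta), 0, 0],
                        [-np.sin(theta), np.cos(theta), 0, 0],
                        [0, 0, 1, 0], [0, 0, 0, 1]], float)
    R_alpha = np.array([[1, 0, 0, 0],
                        [0,  np.cos(alpha), np.sin(alpha), 0],
                        [0, -np.sin(alpha), np.cos(alpha), 0],
                        [0, 0, 0, 1]], float)
    R = R_alpha @ R_theta                         # combined rotation (pan, then tilt)
    T = np.array([[1, 0, 0, -r1], [0, 1, 0, -r2], [0, 0, 1, -r3], [0, 0, 0, 1]], float)
    P = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
                  [0, 0, -1.0 / lam, 1]], float)  # assumed standard perspective matrix
    return G, R, T, P

def project(w, G, R, T, P):
    """Image coordinates (x, y) of a world point w = (X, Y, Z): ch = P T R G wh."""
    wh = np.array([w[0], w[1], w[2], 1.0])        # homogeneous world point with k = 1
    ch = P @ T @ R @ G @ wh
    return ch[0] / ch[3], ch[1] / ch[3]           # x = ch1/ch4, y = ch2/ch4
```

For a camera aligned with the world frame (θ = α = 0 and wo = r = 0), this reduces to x = λX/(λ − Z) and y = λY/(λ − Z), matching the formulas used in the stereo discussion that follows.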
So after doing this what
(Refer Slide Time 09:38)
I have is, I have an equation, a transformation or a matrix equation so that for any world point
w, I can find out what is the homogenous coordinate
(Refer Slide Time 09:57)
(Refer Slide Time 10:29)
of the corresponding image point. Now, the transformations T, R and G depend on the imaging setup. So G corresponds to the
transformation of the gimbal center from the origin of the 3D world coordinate system, R
corresponds to the pan angle and the tilt angle and T corresponds to translation of the image
plane center from the gimbal center. So these 3 transformation matrices depend upon the
geometry of the imaging system, whereas the other transformation matrix, that is P or the
perspective transformation matrix, this is entirely a property of the camera because you will
find the components of this transformation matrix P has a term λ, which is equal to the focal
length of the camera.
(Refer Slide Time 11:26)
So once the focal length λ is known, I can find out what is the corresponding perspective transformation matrix P, whereas to find out the other transformation matrices like T, R and G, I have to physically measure what is the translation of the gimbal center from the origin of the 3D world coordinate system, what is the pan angle, and what is the tilt angle. I also have to measure physically what is the displacement of the image center,
image plane center from the gimbal center. And in many cases measuring these quantities is
not very easy and it is more difficult if the imaging setup is changed quite frequently. So in
such cases it is always better that you first have an imaging setup and then try to calibrate the
imaging setup with the help of the images of some known points of 3D objects that will be
obtained with the help of the same imaging setup.
So by calibration, what I mean is as we said that now I have a combined transformation
matrix for the given imaging setup which is A, which is nothing
but the product of P, T, R and G. So, this being a 4 by 4 matrix, what I have to do is estimate the different element values of this matrix A. So if I can estimate the different element values of the total
transformation matrix A, from some known images, then given any other point in the 3D, I
can find out what will be the corresponding image point. Not only that if I have an image
point, a point in the image, by applying the inverse transformation I can find out what will be
the equation of the straight line on which the corresponding world point will be lying. So this
calibration means that we have to estimate the different values of this matrix A. Now let us
see how we can estimate
these values of the matrix A. So here you find that we have this matrix equation which is of
this form that is ch = Awh, where we have said
(Refer Slide Time 14:12)
that wh is the world point put in homogenous form, ch is the image point on the image plane, again in homogenous form, and A is the total transformation matrix. So here, if the world point w has the coordinates, say, (X, Y, Z), the corresponding homogenous coordinate will be given by
\[ w_h = \begin{bmatrix} kX \\ kY \\ kZ \\ k \end{bmatrix}. \]
So this will be the
(Refer Slide Time 14:53)
homogenous coordinate wh, corresponding to the point w. Now without any loss of generality
I can assume the value of k equal to 1. So if I take k equal to 1 and if I expand this matrix
equation then what I get is, I get the component (ch1, ch2, ch3, ch4) this will be, now I expand
the matrix A also,
so A will have the components a11, a12, ..., a44, and the expanded equation becomes
\[ \begin{bmatrix} c_{h1} \\ c_{h2} \\ c_{h3} \\ c_{h4} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \]
that is, the matrix A multiplied into the homogenous coordinate of the point in the 3D space. So you remember that we have now assumed
(Refer Slide Time 16:21)
the value of k equal to 1. So for calibration I have to estimate this matrix A, or I have to estimate the component values a11, a12, a13 and so on. Now here
(Refer Slide Time 16:39)
once I have the homogenous image coordinates ch1, ch2, ch3 and ch4, then we had already
discussed that the corresponding Cartesian coordinate in the image plane is given by
x = ch1/ch4 and y = ch2/ch4. So this is simply a conversion
(Refer Slide Time 17:11)
from the homogenous coordinate system to the Cartesian coordinate system. Now here, if I replace the values of ch1 and ch2 by x times ch4 and y times ch4 in our matrix equation, then the left-hand side of the matrix equation will look like
\[ \begin{bmatrix} x\,c_{h4} \\ y\,c_{h4} \\ c_{h3} \\ c_{h4} \end{bmatrix}. \]
This will be equal to
(Refer Slide Time 17:53)
the matrix A multiplied by the 3D world point coordinate in homogenous form, which is
\[ \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}. \]
(Refer Slide Time 18:36)
So if I expand this matrix equation, what I get is x ch4 = a11X + a12Y + a13Z + a14, similarly y ch4 = a21X + a22Y + a23Z + a24, and ch4 = a41X + a42Y + a43Z + a44. You will notice
(Refer Slide Time 19:40)
that while doing this matrix equation or while trying to solve these matrix equations, we had
ignored the third component of the image point. That is because the third component
corresponds to the z value and we have said for this kind of calculation the z value is not
important to us. Now from these given 3 equations, what we can do is
we can find out what is the value of ch4 in terms of X, Y and Z, and if we replace this value of ch4 in the earlier two equations, then these two equations will simply be converted into the form
a11X + a12Y + a13Z + a14 - a41xX - a42xY - a43xZ - a44x = 0 and
a21X + a22Y + a23Z + a24 - a41yX - a42yY - a43yZ - a44y = 0. In these two equations,
(Refer Slide Time 21:02)
(Refer Slide Time 21:53)
x and y are the coordinates in the image plane of a point in the 3D world coordinate system
whose coordinates are given by X, Y and Z. So if I take a set of images for which a point in
the 3D world coordinate system, that is X, Y and Z are known and also find out what is the
corresponding image point, image coordinate in the image plane, then for every such pair of
readings I get 2 equations, one is the first equation, other one is the second equation. Now if
you study these two equations, you find that the unknowns are a11, a12, a13, a14, a41, a42, a43 and a44 from the first equation, and in addition a21, a22, a23 and a24 from the second equation;
so there are twelve unknowns. So for solving these 12 unknowns, we need 12 different
equations and for every known point in the
3D world, I get two equations. So if I take such images for 6 known points then I can find
out…
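A compact way to carry out this calibration numerically is sketched below (an illustration, not the lecture's own procedure): each known point contributes the two equations written above, the twelve unknowns are stacked into one vector, and, since the equations only determine A up to an overall scale, the unit-norm least-squares solution is taken via the SVD.

```python
import numpy as np

def calibrate_camera(world_pts, image_pts):
    """Estimate a11..a14, a21..a24, a41..a44 (rows 1, 2 and 4 of A; row 3 is
    ignored, as in the lecture) from N >= 6 known 3D points and their image
    coordinates.  world_pts: (N, 3) array of (X, Y, Z); image_pts: (N, 2) of (x, y).
    """
    rows = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        # a11 X + a12 Y + a13 Z + a14 - a41 x X - a42 x Y - a43 x Z - a44 x = 0
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z, -x])
        # a21 X + a22 Y + a23 Z + a24 - a41 y X - a42 y Y - a43 y Z - a44 y = 0
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z, -y])
    M = np.asarray(rows, dtype=float)
    # The system is homogeneous, so take the unit-norm minimiser of ||M a||:
    # the right singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(M)
    a = vt[-1]
    return a[0:4], a[4:8], a[8:12]   # rows 1, 2 and 4 of A, up to a common scale
```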
Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 03 Lecture Number 15
Stereo Imaging Model - 2
(Refer Slide Time 00:19)
Hello, welcome to the video lecture series on Digital Image Processing. So this camera
calibration, using this procedure, can be done for any given imaging setup. But the problem still exists that, given an image point, I cannot uniquely identify what is the
location of the 3D world point. So for the identification of the 3D world point or finding out
all the 3 x, y and z coordinates of a 3D world point I can make use of another camera. So let
us look at a setup like this
(Refer Slide Time 00:59)
where on the left side I have image 1, on the right side I have image 2. The image 1 is taken,
is captured with the help of one camera; image 2 is taken with the help of another camera. So
image 1 has the coordinate system, say, (x1, y1, z1); image 2 has the coordinate system (x2, y2, z2). And we can assume that the 3D world coordinate system (X, Y, Z) is aligned with the left camera, that means the left image coordinate system is the same as the 3D world coordinate system, whereas the right image coordinate system is different. Now once I have
this, given a point
(Refer Slide Time 01:47)
w in the three-dimensional world, three-dimensional space, you find that the corresponding
image point in image 1 is given by (x1, y1) and the image point for the same point w in image
2 is given by (x2, y2). I assume that both the cameras are identical, that means they have the same value of the focal length λ, so they will have the same perspective transformation as well as
inverse perspective transformation. Now once I know that in image 1, the image point
corresponding to point w is (x1, y1), then by applying the inverse perspective transformation I
can find out the equation of the straight line on which the three-dimension, the point w will
exist. Similarly, from image 2 where I know the location (x2, y2) of the image point if I apply
the
(Refer Slide Time 02:46)
inverse perspective transformation, I also get the equation of another straight line on which this point w will exist. So now you find that by using these two images, I got the equations of
two straight lines. So if I solve these two equations, then the point of intersection of these two
straight lines gives me the (X, Y, Z) coordinate of
point w. But here you find that we have taken a general stereo imaging setup, where there is
no alignment between the left camera and the right camera or between the first camera and
the second camera. So for doing all the mathematical operations what we have to do is, we
have to apply again a set of transformations to one of the camera coordinate systems so that,
both the camera coordinate systems are aligned. So these transformations will again involve maybe a transformation for some translation, a transformation for some rotation, and possibly also some transformation for scaling if the image resolutions of the two cameras are not the same. So there will be a set of transformations, a number of transformations, and the
corresponding mathematical operations to align the 2 camera systems. But here you find that the positioning of the cameras is in our control. So why do we consider such a generalized setup? Instead we can arrange the cameras in such a way that we can put the imaging planes of both the cameras to be coplanar. And we use the coordinate systems in such a way that the x axis of one camera and the x axis of the other camera are perfectly aligned. The y axis and the z axis of the two cameras will then be displaced from each other along this common x axis. So effectively the camera setup that we will be having
is something like this. Here you find that for the two cameras, the image plane 1 and the
image plane 2, they are in the same plane, the x axis of both the cameras, the camera
coordinate systems they are collinear. The y axis and z axis, they have a shift of value B. So
this shift B, this value B is called the camera displacement. We assume both the cameras are
identical otherwise, that is they have the same resolution, they have the same focal length λ.
Again here for the given 3D point w we have in image 1, the corresponding image point (x1,
y1) and image 2, we have the corresponding image point as (x2, y2). Now this imaging setup
can be seen
(Refer Slide Time 05:55)
as a section, where you find that the y axis of both the cameras is now perpendicular to the plane of the figure. So I have the x axis which is horizontal, the z axis which is vertical and the y axis which is perpendicular to this plane. So in this figure I assume that the camera coordinate
system of one of the cameras, in this case the camera 1 which is also called the left camera is
aligned with the 3D world coordinate system
(X, Y, Z), the coordinate system of the left camera is assumed to be (x1, y1). The coordinate
system of the right camera is assumed to be (x2, y2). Now given this particular imaging setup
you find that for any particular world point, say w, with respect to the cameras, camera 1 and camera 2, this point w will have the same value of the z coordinate. It will have the same
value of the y coordinate but it will have different values of the x coordinates because the
cameras are shifted or displaced only in the x axis, not in the z axis or y axis. So origin of this
world coordinate system and origin of the left coordinate system, they are perfectly aligned.
Now taking this particular imaging setup, now I can develop a set of equations. So the set of
equations will be something like this. We have seen that for image 1, for point w the
corresponding image point is at location (x1, y1). For the same point w in right image, the
image point is at location (x2, y2). So these are the image coordinates in the left camera and the right camera. Now, by applying the inverse perspective transformation, we find that the
equation of straight line with respect to left camera on which point w will lie is given by the
equation, say,
\[ X_1 = \frac{x_1}{\lambda}(\lambda - Z). \]
Similarly, with respect to the right camera, the equation of the straight line on which the same point w will exist is given by
\[ X_2 = \frac{x_2}{\lambda}(\lambda - Z), \]
where this capital X1 is the x coordinate of the point w with respect to the camera coordinate
of camera 1, and capital X2 is the x coordinate of the 3D point w with respect to
(Refer Slide Time 09:07)
the camera coordinate of the second camera. Now recollect the figure that we have shown
that is the arrangement of the camera, where the cameras are displaced by the displacement
B. So with respect to that camera arrangement we can easily find out that the value of X2 will
be simply
(Refer Slide Time 09:36)
X1 + B. So if I substitute this value of X2 = X1 + B in this particular equation, then I get a set of equations, which gives
\[ \frac{x_1}{\lambda}(\lambda - Z) + B = \frac{x_2}{\lambda}(\lambda - Z). \]
And from this I get an equation of the form
\[ Z = \lambda - \frac{\lambda B}{x_2 - x_1}. \]
So you find that this Z is the z coordinate with respect to the
(Refer Slide Time 10:25)
coordinate system of the first camera; it is the same as the Z value with respect to the coordinate system of the second camera. It is also the Z value with respect to the 3D world coordinate system. So that means that it gives me what is
(Refer Slide Time 10:48)
the Z value of the 3D point for which the left image point was
(x1, y1) and the right image point was (x2, y2) and I can estimate this value of Z from the
knowledge
(Refer Slide Time 11:05)
of the focal length λ, from the knowledge of the displacement between the two cameras, which is B, and from the knowledge of the difference of the x coordinates, that is x2 - x1, in the left image and the right image. So this
(Refer Slide Time 11:23)
x2 - x1, this term, this particular quantity, is also known as the disparity. So if I know this disparity for a particular point in the left image and the right image, and I know λ, that is the focal length of the camera,
(Refer Slide Time 11:46)
and I know the displacement between the two cameras, I can find out what is the corresponding depth value, that is Z. And once I know this depth value, I can also find the X coordinate and Y coordinate of the 3D point w with respect to the 3D world coordinate system, for which we have already seen the equations are given by
\[ X = \frac{x_1}{\lambda}(\lambda - Z) \]
(Refer Slide Time 12:18)
and
\[ Y = \frac{y_1}{\lambda}(\lambda - Z). \]
So first we have computed
(Refer Slide Time 12:26)
the value of Z from the disparity and the displacement between the cameras, and then from this value of Z and the image coordinates in, say, the left image, that is (x1, y1), I can find out what is the X value, the X coordinate value,
(Refer Slide Time 12:47)
and similarly the Y coordinate value. The question that remains is, given a point in the left image, what will be the corresponding point
(Refer Slide Time 13:03)
in the right image. So this is the problem which is called as stereo correspondence problem.
So in
today's lecture we are not going to deal with the details of the stereo correspondence problem,
that is how do you find out a point in the left image and the corresponding point in the right
image. But today what we will discuss is about the complexity of this correspondence
operation. So our problem is like this. We have a left image and we have a right image. So
this is the left image and this is the right image. So
(Refer Slide Time 13:50)
if I have a point say cL in the left image, I have to find a point cR in the right image which
corresponds to cL. And once I do this here
I find out what is the image coordinate of this point cL, which is say (x1, y1), and what is the image coordinate of this point cR, which is (x2, y2). So once I know these
(Refer Slide Time 14:20)
image coordinates, I can compute x2 - x1, which is the disparity, and then this x2 - x1 is used for the computation of the depth value Z. Now what about the complexity of this search operation?
Say I identify a particular point say cL in the left image, then a corresponding
point cR in the right image may appear anywhere in the right image. So if I have
(Refer Slide Time 14:50)
the images whose dimensions are of the order N x N, that means I have N number of rows and N number of columns, then you find that for every point in the left image I have to search N² points in the right image, and because there are N² points in the left image, in the worst case I have to check on the order of N⁴ pairs of points to establish
(Refer Slide Time 15:16)
correspondence between every point in the left image and the corresponding point in the right image. So this is a massive computation; so, how do we reduce this computation? Fortunately
(Refer Slide Time 15:32)
the imaging geometry that we have used, that helps us in reducing the computation that we
will be doing. So you find that
for the point (X, Y, Z) in the 3D space, the corresponding left image point is given by
\[ x_1 = \frac{\lambda X_1}{\lambda - Z_1}. \]
So I assume
(Refer Slide Time 16:09)
that (X1, Y1, Z1) are the coordinates of the point with respect to the first camera. And I also assume that (X2, Y2, Z2) are the coordinates with respect to the second camera. So this is for camera 1 and this is for camera 2.
So with respect to camera 1, the value of the image point x1 is given by
\[ x_1 = \frac{\lambda X_1}{\lambda - Z_1}. \]
Similarly, y1, the y coordinate in the first image, is given by
\[ y_1 = \frac{\lambda Y_1}{\lambda - Z_1}. \]
Now with respect to the second image
(Refer Slide Time 17:02)
the image coordinate x2 is given by
\[ x_2 = \frac{\lambda X_2}{\lambda - Z_2}. \]
Similarly, y2 is given by
\[ y_2 = \frac{\lambda Y_2}{\lambda - Z_2}. \]
(Refer Slide Time 17:25)
Now we find that in the imaging setup that we have used, we have seen that Z1 = Z2 and Y1 = Y2, but X1 ≠ X2. This is because
(Refer Slide Time 17:53)
the two cameras are displaced only in the x direction; they don't have any displacement in the y direction, nor do they have any displacement in the z direction. So for both the camera coordinate systems the z coordinate and the y coordinate values will be the same, whereas the x coordinate will be different. So, since Z1 = Z2 and Y1 = Y2, you find that among the image coordinates in the two images, image 1 and image 2, y1 will be equal to y2. So what does
(Refer Slide Time 18:40)
this mean? This means that whatever is the (x1, y1) value of point cL in the left image, the corresponding right image point cR will have a different x coordinate value but it will have the same y coordinate value. That means two corresponding image points must lie on the same row. So if I pick up cL belonging to row i in the left image, the corresponding point cR in the right image will also belong to the same row i. So for a given point I don't have to search the entire right image to find out the correspondence; I simply search that particular row of the right image to which cL belongs. This saves a lot of time when searching for correspondence between a point in the left image and the corresponding point in the right image, as the small sketch below illustrates.
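A small sketch of how this row constraint is used in practice (illustrative only; the lecture does not prescribe a matching cost, so a simple sum-of-absolute-differences block comparison is assumed, and the comparison window is assumed to fit inside both images):

```python
import numpy as np

def find_correspondence(left, right, yL, xL, win=5, max_disp=64):
    """Given a point (xL, yL) in the left image, search ONLY row yL of the
    right image for the best matching block (sum of absolute differences).
    Returns the matching x coordinate in the right image."""
    h = win // 2
    patch = left[yL - h:yL + h + 1, xL - h:xL + h + 1].astype(np.int32)
    best_x, best_cost = xL, np.inf
    for xR in range(max(h, xL - max_disp), min(right.shape[1] - h, xL + max_disp + 1)):
        cand = right[yL - h:yL + h + 1, xR - h:xR + h + 1].astype(np.int32)
        cost = np.abs(patch - cand).sum()
        if cost < best_cost:
            best_cost, best_x = cost, xR
    return best_x        # the disparity is then best_x - xL, which gives the depth Z
```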
(Refer Slide Time 19:41)
So till now we have discussed how, using
two different cameras
and having a stereo imaging setup, we can find out the 3D world coordinates of the points
which have a point, an image point in the left image and a corresponding point in the right
image. But by studying this stereo imaging setup you can find out that it is, it may not always
be possible to find a point in the right image for every possible point in the left image. So there will be a certain region in three-dimensional space such that, for all the points in that region, I will have image points both in the left image and the right image; but for any point outside that region I will have a point in only one of the images, either the left image or the right image, and I cannot have points in both the
images. And unless I have points in both the images I cannot estimate the three-dimensional
(X, Y, Z) coordinates of those points. So till now we have seen that, while a single camera is not enough, using 2 cameras in a stereo setup I can estimate the depth value of the three-dimensional points. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-16.
Interpolation and Resampling.
Hello, welcome to the video lecture series on Digital Image Processing. Till the last class we
have seen various geometric transformations. And we have seen how those geometric
transformations can be used to model an image formation process. We have also seen how to calibrate a camera given a particular imaging setup. And we have also seen that, using two identical cameras, we can have a stereo imaging setup using which the 3D coordinates of a point in the 3 dimensional scene can be obtained.
Now, in today’s lecture we will try to explain some interpolation operations, we will explain
when the interpolation operation is needed. And at the end of today’s lecture, the students
will be able to write algorithms for different image transformations and the needed
interpolation operations. Now, let us see why and when we need image interpolation and image resampling.
(Refer Slide Time: 1:42)
So, let us first introduce this problem. Say, for example, we have a 3x3 image like this. So, we have this XY coordinate system and in this XY coordinate system we have a 3x3 image. So, I have an image pixel here, I have an image pixel here, I have an image pixel here, I have an image pixel here, here, here, here, here and here. So, you can easily identify the coordinates of the image pixels: this particular image pixel is (0,0), this is (1,0), this is (2,0), this is (0,1), this is (1,1), this is (2,1), this is (0,2), this is (1,2) and this is (2,2).
Now, let us try to apply some simple transformations, geometric transformations on these
images. Say for example I want to scale up this image by a factor of 3 in both the X
dimension and Y dimension. So, if I scale up this image by a factor of 3, in that case this 3 x 3 image will become a 9 x 9 image. And let us see how the pixels in
the 9 x 9 image can be obtained.
So, I just apply a scaling operation by factor Sx = 3 and Sy = 3. That is both in the X
direction and Y direction I am applying a scaling of factor 3. So, naturally this 3 x 3 image
after being scaled up by factor 3 in both the directions will be converted to an image of 9 x 9.
So, let us see what these pixel values will look like. Again I put this XY coordinate system; now you remember that this scaling operation is given by, say,
\[ \begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix} = \begin{bmatrix} S_x & 0 \\ 0 & S_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}, \]
where \(\begin{bmatrix} S_x & 0 \\ 0 & S_y \end{bmatrix}\) is the transformation matrix, \(\begin{bmatrix} x \\ y \end{bmatrix}\) is the coordinate of the pixel in the original image and \(\begin{bmatrix} \hat{x} \\ \hat{y} \end{bmatrix}\) is the coordinate of the pixel in the transformed image. So, if I
simply apply this scaling transformation then obviously this (0,0) point will lie at location
(0,0) in the transformed image. But what will happen to the other image points, so because
this will be converted to a 9 x 9 image.
So, let us first form a 9 x 9 grid, ok. So, now you find that this (0,0) pixel even after this
scaling transformation remains at location (0,0). But the other pixels, say for example this pixel (1,0), which was originally at location x = 1 and y = 0, will be transformed: the y coordinate will remain 0 but the x coordinate will now become equal to 3.
So, this point will be mapped to this particular location. Similarly, (2,0) will be mapped to
(6,0) location, so this becomes 1,2,3,4,5,6. So, this pixel will be mapped to this particular
location. Similarly, (0,1) pixel will now be mapped to (0,3) location. (0,2) pixel will now be
mapped to (0,6) location so 3,4,5,6. Similarly, I will have pixels in these different locations in
the scaled image.
But, you find that because in the original image I had 9 different pixels, even in the
transformed image I got 9 different pixels that is, 3 pixels in the horizontal direction and 3
pixels in the vertical direction. But because I am applying a scaling of factor 3 in both X
direction and Y direction my final image size after scaling should be 9 pixels in the horizontal
direction and 9 pixels in the vertical direction.
So we find that, there are many pixels which are not filled up in this scaled up image. So
those pixels some of them I can simply mark, say this is one pixel which has not been filled
up. This pixel has not been filled up, this pixel has not been filled up, this pixel has not been filled up, this point has not been filled up; so likewise there are many pixels in this particular image, in this scaled up image, which have not been filled up.
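A tiny sketch of this forward-mapping problem (the 3x3 input values below are arbitrary and are used only to count how many positions of the 9x9 grid remain unfilled):

```python
import numpy as np

src = np.arange(1, 10).reshape(3, 3)        # an arbitrary 3x3 test image
Sx = Sy = 3                                  # scaling factors
dst = np.full((3 * Sy, 3 * Sx), -1)          # -1 marks "not filled up"

# Forward mapping: each source pixel (x, y) lands at (Sx*x, Sy*y).
for y in range(src.shape[0]):
    for x in range(src.shape[1]):
        dst[Sy * y, Sx * x] = src[y, x]

print("unfilled pixels:", np.count_nonzero(dst == -1))   # 81 - 9 = 72 holes
```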
(Refer Slide Time: 8:03)
Let us try to take another example: say, instead of scaling, I apply a rotation operation to all these different pixels. So, I have this 3 x 3 image and suppose I rotate this image by an angle of 45° in the clockwise direction. So, if I rotate this image by 45° in the clockwise direction, you know that we had a transformation matrix which takes care of rotation, and that is given by
\[ \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}. \]
So, this is the rotation matrix which, when applied to different pixels in the original image, will give you the pixels in the rotated image. So, in the rotated image I can represent these pixels as (x̂, ŷ), whereas my original pixels are (x, y). And in this particular case the value of θ is simply 45°. So, if I apply this transformation, the rotation transformation, to all the
pixels in the original image you find that (0,0) location will be transformed to location (0,0)
even in the transformed image.
(0,1) location will be transformed to (0.707, 0.707) location. Then (0,2) point will be
transformed to location (1.414, 1.414). Similarly, (1,0) location pixel at location (1,0) will be
transformed to location (0.707, -0.707). Location (1,1) will be transformed to location (1.414,
0) and location (1,2) will be transformed to location (2.121, 0.707). Similarly, the other
coordinates (2,0) this pixel will be transformed to location (1.414, -1.414), (2,1) this will be
transformed to location (2.121, -0.707). And (2, 2) this particular image pixel will be
transformed to location (2.828, 0).
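These transformed locations can be checked with a few lines of code (a verification sketch, not part of the lecture):

```python
import numpy as np

theta = np.deg2rad(45)                       # 45 degrees, clockwise convention used here
R = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

for x in range(3):
    for y in range(3):
        xr, yr = R @ np.array([x, y])
        xi, yi = int(np.rint(xr)), int(np.rint(yr))   # nearest-integer pixel location
        print(f"({x},{y}) -> ({xr:.3f}, {yr:.3f}) -> rounded ({xi}, {yi})")
# e.g. (0,1) -> (0.707, 0.707) -> (1, 1), and (2,2) -> (2.828, 0.000) -> (3, 0)
```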
So, these are the various transformed locations of the pixels in the rotated image. Now, if you
just look at these transform locations, you find that the coordinates that we get are not always
integer coordinates. In many cases, in fact in this particular example in most of the cases the
coordinates are real valued coordinates. But, whenever we are going to have a digital image
whether it is the original image or the transformed image. Even in the transformed image all
the row index and the column index should have an integer value.
I cannot represent any real number or a fractional number as a row index or a column index.
So, in this case whenever I I am going for this particular transformation, what I have to do is
whenever I am getting a real number as a row index or a column index. I had to take its
nearest integer where that particular pixel will be put. So, in this case for the original image
location (0,1) which has now been transformed to location (0.707, 0.707) this has to be
mapped to a pixel location (1,1) in the transformed image.
So, if I do that mapping, you find that all the pixel locations will now be mapped like this.
So, I put a new grid and the new grid will appear like this. So, you find that the (0,0) location
has been mapped to (0,0) location. So, I have a point over here, pixel over here. (0,1) location
in the original image has been transformed to (0.707, 0.707) in the transformed image. So, what
I have to do is I have to take the nearest integer of these fractional numbers where this
particular pixel will be put.
So, (0.707) in the X direction and (0.707) in the Y direction will be mapped to (1,1) in the
transformed image. So, this (0,1) point will now be mapped to location (1,1) in the
transformed image. So, I get a point here. Similarly, (0,2) you find that the corresponding
transform location is (1.414, 1.414) again the nearest integer of (1.414) is 1. So this point
(0,2) will also be mapped to location (1,1) in the transformed image. (1,0) in the same way
will be mapped to location (1, -1), so (1,0) will be mapped to this particular point in the
transformed image.
(1,1) will be mapped to (1.414, 0); here again, by integer approximation, this point will be mapped to the (1,0) location in the transformed image. So, I have a pixel in this particular location. The (1,2) point will be mapped to (2.121, 0.707); so, again by integer approximation, I map this pixel to (2,1), so this is the point where the pixel (1,2) of the original image will be mapped.
In the same manner (2,0) point will be mapped to location (1, -1) where I already have a
particular point, (2,1) location will be mapped to location (2, -1). So, I will have a point here,
(2, 2) location will be mapped to (3, 0) in the transformed image. So, I will have a pixel in
this particular location. So we find that in the original image we had 9 pixels whereas, in this
rotated image we are going to have only 7 pixels.
And this happens because, when you rotate an image, some of the integer coordinates of the original pixels turn out to be fractions or real numbers in the transformed image. And these fractions or real numbers cannot be represented in digital form, so we have to round those numbers to the nearest integer, which leads to this kind of problem.
And not only this you find that if I just rotate this original image ideally my rotated image
should have been something like this, ok. But, here you find that there are a number of points
where I do not have any information. Say for example this point I do not have any
information, this point I do not have any information, this point I do not have any
information, similarly all these points I do not have any information. And the reason is
because of digitization.
So to fill up these points what I have to do is I have to identify in the transformed image what
are the locations where I do not have any information.
So, to take the simple case, take the previous one. Here, you find that I do not have any information at location (1,1) of the transformed image, or the scaled image. So, because I do not have any information here, now I have to look in the original image to find out which value should be put at this particular location.
Now, because this image I have obtained using a scaling of 3 in both the x and y directions, if I want to go back to the original image, then to this transformed image I have to apply a scaling of 1/3 in both the x direction and the y direction. Now we find that in the transformed image this particular pixel has a coordinate of (1,1). So, if I inverse transform this using the scaling factors of 1/3 and 1/3, I get, in the original image, x = 1/3 and y = 1/3.
Now, here comes the problem. In the original image I have information at location (0,0), I have information at location (0,1), I have information at location (1,0), I have information at location (1,1). But at location (1/3, 1/3), I do not have any
information. Because you remember from our earlier classes that whenever we have gone for
image digitization the first step we had done was sampling and the second step that we had
done was quantization.
Now, the moment we sampled the image, what we have done is we have taken some representative values from discrete grid points. We have not considered the intensity values at
all possible points in the continuous image. So in the process of sampling, whatever value
was there at location (1/3, 1/3) in the continuous image, that information is lost. So in the digital image, at this particular location (1/3, 1/3), I do not have any information.
So what is the way out? Now the only thing that I can do is, I have to go for
approximation of the intensity value which should have been at this location (1/3, 1/3). Now
how to get that approximate value, that is the problem. So, only way in which this can be
done is, I have to interpolate the image in these points where I do not have any information.
And after interpolation, so in these locations where I do not have any information in my
discrete image I have to interpolate the values in these locations.
And, after interpolation, I have to check what should be the interpolated value at location (1/3, 1/3). And whatever value I get at this location (1/3, 1/3), this particular value has to be taken to fill up this location (1,1) in the transformed image.
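The inverse-mapping procedure just described can be sketched as follows (a minimal illustration; nearest-neighbour rounding is used here as the simplest approximation, and the B-spline interpolation discussed in the next lecture would replace that rounding step):

```python
import numpy as np

def scale_inverse_map(src, Sx, Sy):
    """Scale 'src' by factors (Sx, Sy) using inverse mapping: every destination
    pixel is mapped back by (1/Sx, 1/Sy) and the intensity is approximated
    from the nearest source sample, so no destination pixel is left empty."""
    H, W = src.shape
    out = np.zeros((int(H * Sy), int(W * Sx)), dtype=src.dtype)
    for yd in range(out.shape[0]):
        for xd in range(out.shape[1]):
            xs, ys = xd / Sx, yd / Sy          # inverse transform, e.g. (1,1) -> (1/3, 1/3)
            xi = min(int(round(xs)), W - 1)    # nearest-neighbour approximation
            yi = min(int(round(ys)), H - 1)
            out[yd, xd] = src[yi, xi]
    return out

scaled = scale_inverse_map(np.arange(1, 10).reshape(3, 3), 3, 3)   # no holes remain
```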
(Refer Slide Time: 20:34)
Similar is the case for the other transformation, that is rotation; in this case also, this rotated image we have obtained by rotating the original image by 45° in the clockwise direction. So, whenever I find any point in the rotated image where there is no information, what I have to do is inverse transform that particular coordinate, that is, I have to give a rotation to that particular point by minus 45° and go back to the original image point, and obviously in this case, in the original image, these row and column indices will not be integers but real numbers or fractions.
And, because we have real numbers or fractions for which we do not have any information in
the original digitized image, we have to go for interpolation and after interpolation we have to
go for resampling to find out what is, what should be the intensity value or approximate
intensity value at that particular location. Then take that intensity value and put it into the point in the transformed image where I do not have the information.
So, this is why you find that the interpolation and resampling is very, very important
whenever you are working in the digital domain or you are doing some sort of
transformations over a digital image.
Now, whenever we go for interpolation, this gives the situation that we have a one dimensional signal f(t), a function of t, and after sampling we have got the sampled signal fs(t).
So, here we find that after sampling what we have got is the sample values which are
represented by fs(t). Now, as we have fs(t) these values are present only at discrete locations.
So at any intermediate location in this particular one, we do not have any information of this
function f(t). And because we do not have these informations we have to go for interpolation
for all those values of t where we do not have the samples present.
And after interpolation again we have to go for resampling to fill up those positions. So, this
slide shows a sampled one dimensional signal f(t) of a function t. So after sampling we have
represented the signal by fs(t), where fs(t) is nothing but a sequence of sample values. So, in
this case you find that we have the values available for say t = 0, here I can put t = 0, at t = 1,
t = 2, t = 3 and so on.
But, I do not have any information for a value of t which is in between 0 and 1. So if I need to
obtain a value of f at location say (0.3) then what I have to do is, I have to interpolate this
function fs(t) and I have to find out after resampling that what will be the value of the
function at t equal to (0.3). So this is why the interpolation and resampling is very, very
important whenever you are working with a digital signal and digital image in particular and
you are going for any type of transformation, particularly the rotation and scaling of the digital image. Usually the translation operation, if it is only a translation, does not need any kind of resampling or interpolation. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture - 17.
Interpolation Techniques.
Now, whenever we go for interpolation, the interpolation operation should have certain desirable properties. Firstly, the interpolation function that you use for interpolating the discrete values should have a finite region of support. That means interpolation should be done based on local information; it should not be based on global information, or in other words it should not take into consideration all the sample values of that particular digitized signal.
The second desirable property is that the interpolation should be very smooth, that is, the interpolation should not introduce any discontinuity in the signal. And the third property is that the interpolation should be shift invariant, so if the signal is shifted or given some translation, then also the same interpolation function should apply. And the B Spline function is one such function which satisfies all these three desired properties.
Now let us see, what is this B Spline function? A B Spline function is a piecewise polynomial function that can be used to provide local approximations of curves using a very small number of parameters. And because it is useful for local approximation of curves, it can be very, very useful for smoothening of some discrete curves; it is also very, very useful for interpolation of a function from a discrete number of samples.
(Refer Slide Time: 2:25)
So let us see, what is this B Spline function? A B Spline function is usually represented by, say,
\[ x(t) = \sum_{i=0}^{n} p_i B_{i,k}(t), \]
where i runs from 0 to n, that is, n + 1 is the number of samples which are to be approximated. Now these points pi are called the control points and Bi,k is the normalized B Spline of order k.
So, these control points actually decide how the B Spline functions should be guided to give you a smooth curve. Now this normalized B Spline, that is Bi,k, can be recursively defined: Bi,1(t) = 1 whenever \(t_i \le t < t_{i+1}\), and Bi,1(t) = 0 whenever t takes any other value.
So, this is equal to 1 for all values of t lying between \(t_i\) and \(t_{i+1}\), with \(t_i\) inclusive, and Bi,1(t) = 0 for any other value of t. And then we can find out Bi,k(t) using the relation
\[ B_{i,k}(t) = \frac{(t - t_i)\,B_{i,k-1}(t)}{t_{i+k-1} - t_i} + \frac{(t_{i+k} - t)\,B_{i+1,k-1}(t)}{t_{i+k} - t_{i+1}}. \]
So, from this Bi,1(t), we can recursively compute the values of Bi,k(t) using this relation.
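This recursive definition translates directly into code; the sketch below assumes uniform integer knots t_i = i, which is consistent with the figures discussed next but is an assumption, since the knot sequence is not spelled out in the transcript.

```python
def bspline(i, k, t):
    """Normalized B-spline B_{i,k}(t) of order k on uniform knots t_j = j."""
    if k == 1:
        return 1.0 if i <= t < i + 1 else 0.0          # B_{i,1}(t)
    # Cox-de Boor style recursion; with t_j = j both denominators equal k - 1.
    left = (t - i) / (k - 1) * bspline(i, k - 1, t)
    right = (i + k - t) / (k - 1) * bspline(i + 1, k - 1, t)
    return left + right

print(bspline(0, 4, 1.5))   # 0.479..., matching (-3t^3 + 12t^2 - 12t + 4)/6 at t = 1.5
```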
(Refer Slide Time: 6:45)
Now, we find that once we have this relation for Bi,k(t), you can easily verify that once I have B0,k(t), then Bi,k(t) is nothing but a translate of B0,k(t). So this Bi,k(t) can be written as nothing but B0,k(t - i). This can be easily verified from the way the B Spline function is defined. Now, this B Spline function for various values of i and k can be obtained like this: you can easily get that B0,1(t) = 1 whenever 0 ≤ t < 1.
Because earlier we have said that Bi,1(t) = 1 whenever \(t_i \le t < t_{i+1}\), and Bi,1(t) = 0 whenever t takes other values, just by extending this I can write that B0,1(t) = 1 whenever t lies between 0 and 1, with 0 inclusive, and B0,1(t) = 0 otherwise. So we find that B0,1(t) is constant in the region 0 to 1. Similarly, we can find out that B0,2(t) = t for 0 ≤ t < 1, it will be equal to 2 - t for 1 ≤ t < 2, and it will be 0 otherwise.
(Refer Slide Time: 9:01)
Similarly, you can find that B0,3(t) can be written as
\[ B_{0,3}(t) = \begin{cases} \dfrac{t^2}{2}, & 0 \le t < 1, \\[4pt] -t^2 + 3t - 1.5, & 1 \le t < 2, \\[4pt] \dfrac{(3 - t)^2}{2}, & 2 \le t < 3, \\[4pt] 0 & \text{otherwise.} \end{cases} \]
Similarly, we can also find out that
\[ B_{0,4}(t) = \frac{t^3}{6} \;\text{ for } 0 \le t < 1, \qquad B_{0,4}(t) = \frac{-3t^3 + 12t^2 - 12t + 4}{6} \;\text{ for } 1 \le t < 2, \]
with two more symmetric pieces on the intervals 2 ≤ t < 3 and 3 ≤ t < 4, and B0,4(t) = 0 otherwise.
(Refer Slide Time: 11:16)
Now, after giving these equations, let us try to see what is the nature of these B Spline functions. So we find that Bi,1(t) is constant, equal to 1, between t = i and t = i + 1. So, the first figure in this particular case tells you what is the nature of this Bi,1(t).
The second figure shows, what is the nature of this B Spline function if it is Bi,2(t) . And it
shows that it is a linear function. So, Bi,2(t) will lie between, will have a support from i to i +
2 and the points which will be supported by this Bi,2(t) are i, i + 1 and i + 2. Similarly, a
quadratic function which is Bi,3(t) is given by this figure and a cubic function which is Bi,4(t)
is given by this fourth figure.
And, here you find that the region of support for this cubic function is 5 points, the region of
support for quadratic function is 4 points, the region of support for the linear B Spline is 3
points whereas the region of support for the B Spline of order 1 is only 2 points. So in all the
cases the region of support is finite. Now, let us see that using these B Splines how we can go
for interpolation.
(Refer Slide Time: 13:05)
As we have said earlier, using these B Splines a function is approximated as
\[ f(t) = \sum_{i=0}^{n} p_i B_{i,1}(t). \]
So, in this case let us take the value of k to be equal to 1, that means we
have a B Spline of order 1. And we have shown that if we have a B Spline of order 1 then the
nature of the B Spline function is it is constant between i and i + 1, and which is equal to 1.
So, using this if I try to interpolate this particular function as shown in this diagram which are
nothing but some sample values. I take say t = 0 here, t = 1 here, t = 2 here, 3 here, 4 here, 5,
6, 7, 8 and so on. Now, if I want to find out say f(1.3), so f(1.3) should lie somewhere here.
So to find out this f(1.3), what I have to do is evaluate this function f(t) at t = 1.3, where f(t) is given by the expression
\[ f(t) = \sum_{i=0}^{n} p_i B_{i,1}(t). \]
Now we find
that this Bi,1(t) when, if I take i = 1 so B1,1(t) is a function like this. So in between 1 and 2 this
B1,1(t) is constant and that is equal to 1, ok.
So, if I find out the value at this particular point, it will be p1, that is fs(1), multiplied by B1,1(t) at t = 1.3, which is equal to 1. And this B1,1(t) = 1 for all values of t from t = 1 to t = 2, but excluding t = 2. So if I interpolate this function using this Bi,1(t), you will find that this interpolation will be of this form, so it goes like this. So at all points between t = 1 and t = 2 the interpolated value is fs(1).
(Refer Slide Time: 16:45)
Similarly, between t = 2 and t = 3 the interpolated values is equal to fs(2) and so on.
Similarly, if I go for interpolation using say linear interpolation where value of k = 2 in that
case again you find that if I put say t = 0 here, t = 1, t = 2, t = 3, 4, 5, 6, 7, 8, like this. Now
B1,2(t) is a linear function between 1 and 3. So B1,2(t) is something like this. Similarly,
B2,2(t) is something like this, B3,2(t) is something like this.
So if now I want to have, I want to interpolate or I want to find out the value of this function
at point say 2.5, at t = 2.5 say here. Then you find that the sample values which take part in
this interpolation are f(2) and f(3) and by using this two sample values by linear interpolation
I have to find out what is the interpolated value at t = 2.5. Now you take a case that I want to
interpolate this function value f(t) at t = 3.1 that means somewhere here.
So if I want to do this, then what I have to do is, this will be an interpolation of p2 B2,2(t) at the point t = 3.1 plus p3 B3,2(t), again at the point 3.1. So, I want to interpolate this value here.
So weight of p2 is given by this value whereas, weight of p3 is given by this value. So when I
am interpolating 3.1 I am giving less weightage to p3 which is nearer to this particular point
t= 3.1 and I am giving more weightage to this particular point p2 which is away from 3.1,
which is not very logical. So, in this case we have to go for some modification of this
interpolation function.
(Refer Slide Time: 19:54)
So, what is the modification that we can think of? The kind of modification that we can do in this case is, instead of having the interpolation as, say,
\[ f(t) = \sum_{i=0}^{n} p_i B_{i,1}(t), \]
I will just modify this expression: I will keep the same pi, but the B Spline will be shifted by some value, so I will take it as
\[ f(t) = \sum_{i=0}^{n} p_i B_{i-s,k}(t), \]
where again i varies from 0 to n. And the value of this shift s will depend upon what is the value of k.
So, I will take s = 0.5 if I have k = 1, that is constant interpolation; I will take s = 1 if I have k = 2, that is linear interpolation; and I will take s = 2 if k = 4, that is if I have cubic interpolation. Now you find that I have not considered k = 3, which is a quadratic interpolation, because quadratic interpolation leads to asymmetric interpolation.
If I go for the other interpolations, that is k = 1 with of course a shift of Bi,k(t) by a value of 0.5, so s = 0.5, or k = 2 with s = 1, or k = 4 with s = 2, what I get is symmetric interpolation. So, these are the different B Spline interpolation functions that we can use for interpolating a function from its sample values.
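Combining the shift s with the recursive `bspline` helper sketched earlier gives a small interpolation routine (again a sketch, assuming uniform integer sample positions t = 0, 1, 2, ...):

```python
def interpolate(samples, t, k=4):
    """f(t) = sum_i p_i B_{i-s,k}(t), with s = 0.5 (k=1), 1 (k=2), 2 (k=4)."""
    shift = {1: 0.5, 2: 1.0, 4: 2.0}[k]
    return sum(p * bspline(i - shift, k, t) for i, p in enumerate(samples))

# For k = 2 this reduces to ordinary linear interpolation between neighbours:
# interpolate([10, 20, 40], 1.25, k=2) -> 25.0
```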
(Refer Slide Time: 22:39)
So this shows the situation when we have shifted this Bi,1(t) by a value of 0.5; so here you find now that the shifted B0,1(t) is constant from -0.5 to 0.5, the next one from 0.5 to 1.5, then from 1.5 to 2.5 and so on. Similarly, for Bi,2(t), the region of support of the shifted B0,2(t) is now between -1 and 1, that of B1,2(t) is from 0 to 2 and so on. Similarly, for cubic interpolation I do the corresponding shifting.
And by using this, the interpolation can now be obtained, if I go for cubic interpolation, as
\[ f(t) = \sum_{i=0}^{n} p_i B_{i-2,4}(t). \]
And here it shows that if I want to interpolate this function at this particular point, the weight given by this particular sample is only this much, the weight given by this sample is this much, the weight given by this sample is this much, and this last sample is weighted by this much.
So, by taking this weighted average the weights given by these B Spline functions, I can find
out what will be the interpolated value at this particular location.
So if I use that cubic interpolation function, possibly I will have, for this set of sample values, a smooth interpolation like this. So maybe this kind of smooth interpolation is possible using the cubic B Spline function.
Now let us see some of the results on the images that we have obtained. So this is the
example of a scaling operation, I have a small image which is scaled up by factor 3 in both X
direction and Y direction. You find that the image on the left side is obtained by scaling
without applying any interpolation. So obviously you find that here the image appears to be a
set of dots where many of the pixels in the image are not filled up.
The interpolated image, in this case, appears to be a collection of blocks.
The same experiment is also done in the case of rotation. This particular image has been rotated by 30 degrees. And if I do not apply any interpolation, the rotated image is shown on the left hand side, and here again you find that there are a number of black spots in this rotated image which could not be filled up, because no interpolation was used in this case. Whereas on the right hand side, this particular image after rotation has been interpolated using the cubic interpolation function. So you find that all those black spots in the digital image have been filled up.
Now, let us see the answers of the quiz questions that we had given in the last class. In the
last class we had given a quiz question for finding out the 3D coordinate point of a world
point where its coordinate in the left camera and the coordinates in the right camera are
given.
The value of the focal length λ was given as 40 mm and the x coordinate in one case was
given as 0.2 in the left camera and in the right camera it was given as 0.4.
And the camera separation was given as 6 cm, that is 60 mm. So if I simply use the formula
\[ Z = \lambda + \frac{\lambda B}{x_2 - x_1}, \]
you find that, because x2 - x1 in this case is 0.2, this comes out to be 12040 mm. So this is the value of Z, that is the depth information.
And once I have the value of Z, then the 3D coordinates X and Y can be computed from this value of Z, so X is nothing but
\[ X = \frac{x_1}{\lambda}(\lambda - Z) = \frac{0.2}{40}(-12000), \]
which is equal to -60 mm. And by applying the same procedure you can find out Y = -90 mm. So this is about the first question.
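Purely as a numerical check of this first answer (using the sign convention of this worked example; note that the derivation in the stereo lecture wrote the disparity term with a minus sign, so the sign depends on how x1, x2 and B are measured in the problem):

```python
lam, B = 40.0, 60.0              # focal length and camera separation, in mm
x1, x2 = 0.2, 0.4                # x coordinates in the left and right images

Z = lam + lam * B / (x2 - x1)    # 12040.0 mm, as above
X = (x1 / lam) * (lam - Z)       # (0.2 / 40) * (-12000) = -60.0 mm
print(Z, X)                      # the same formula with the left-image y coordinate
                                 # (not transcribed here) gives Y = -90 mm
```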
(Refer Slide Time: 29:17)
Second question was to find out, what is the minimum depth that can be obtained by using
the stereo camera where the geometry of the stereo camera was specified.
Now this is also very simple. We have 2 cameras with a specified dimension of the imaging plate and a specified focal length, and I just find out what is the limit of the image points. So we find that if there is any point beyond this particular line, this point cannot be imaged by the left camera, and if there is any point in this direction, that cannot be imaged by the right camera, because it goes beyond the imaging plate.
And also for finding out the depth information it is necessary that the same point should be
imaged by both the cameras. So the points which can be imaged by both the cameras are only
the points lying in this particular conical region. The points belonging to this region or the
points belonging to this region cannot be imaged by both the cameras, so all the points must
be lying within this.
So the minimum depth which can be computed is this particular depth. Now I know what is
the separation between the cameras, I know what is the dimension of the imaging planes, ok.
I also know what is the focal length. So, from these informations by using the concept of
similar triangles you can easily find out what is the minimum depth that can be computed by
using this stereo setup, thank you.
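For completeness, one way that similar-triangles argument works out symbolically (a sketch under assumptions, since the actual plate dimensions given in the quiz are not reproduced in this transcript): assume each camera has an imaging plate of width W centred on its optical axis at focal distance λ, that the optical axes are parallel, and that the lens centres are separated by B. Then each camera sees a field of half-width approximately WZ/(2λ) at depth Z, and the two fields first overlap when these half-widths together span the baseline, giving
\[ \frac{W Z_{\min}}{2\lambda} + \frac{W Z_{\min}}{2\lambda} = B \quad\Longrightarrow\quad Z_{\min} = \frac{\lambda B}{W}. \]
Substituting the separation, plate width and focal length given in the quiz then yields the required minimum depth.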
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-18.
Interpolation with Examples-I.
Hello, welcome to the video lecture series on Digital Image Processing. So let us see what we mean by the image
interpolation operation. So here we have shown a diagram which shows the sample values of
a one dimensional signal say f(t), which is a function of t. So we find that in this particular
diagram we have given a number of samples and here the samples are present for t = 0, t = 1,
t = 2, t = 3, t = 4 and t = 5.
The function values are given like this; this is the sampled data. You find that the sample values are available only at 0, 1, 2, 3, 4 and 5. But in some applications we may need to find out the
approximate value of this function at say t = 2.3 or say t = 3.7 and so on. So again here in this
diagram you find that at t = 2.3 say somewhere here I do not have any information.
Or say t = 3.7 somewhere here again I do not have any information. So the purpose of image
interpolation, or signal interpolation, is that by using the sample values at these discrete locations, we have to reconstruct or approximate the value of the function f(t) at any arbitrary point on the time axis. So that is the basic purpose of the
interpolation operation.
(Refer Slide Time: 2:21)
Rather if I want to approximate the function value at location say t = 2.3, then the samples
that should be considered are the samples which are nearer to t = 2.3. So I can consider the sample at t = 1, I can consider the sample at t = 2, I can consider the sample at t = 3, I can consider the sample at t = 4 and so on. But to approximate the functional value at t = 2.3, I should not consider the sample value at say t =
50. So that is what is meant by finite region of support.
Then the second property which this interpolation operation must satisfy is it should be a
smooth interpolation. That means by interpolation we should not introduce any discontinuity
in the signal. Then the third operation, the third condition that must be satisfied for this
interpolation operation is that the interpolation must be shift invariant. That is if I shift the
signal by say t = 5, even then the same interpolation operation the same interpolation function
should give me the same result in the same interval.
So these are what are known by the shift invariance property of the interpolation. And we
have seen in the last class that B Spline interpolation functions satisfy all these three
properties which are desirable properties for interpolation.
(Refer Slide Time: 4:22)
So these B-Spline functions are something like this. We have seen that for interpolation with
the help of B-Spline function we use a B-Spline function which is given by Bi,k(t) .
So let me just go to what is the interpolation operation that we have to do. So for interpolation what we use is, say,
\[ x(t) = \sum_{i=0}^{n} p_i B_{i,k}(t), \]
where pi indicates the i-th sample and Bi,k(t) is the interpolation function. And we have defined in the last class that this Bi,k(t) can
be defined recursively as
\[ B_{i,k}(t) = \frac{(t - t_i)\,B_{i,k-1}(t)}{t_{i+k-1} - t_i} + \frac{(t_{i+k} - t)\,B_{i+1,k-1}(t)}{t_{i+k} - t_{i+1}}, \]
where Bi,1(t) is given by Bi,1(t) = 1 whenever \(t_i \le t < t_{i+1}\), and it is equal to 0 otherwise.
So you find that we have defined Bi,1(t) to be 1 within a certain region and equal to 0 beyond that region. Then, using this Bi,1(t), I can calculate the values of the other Bi,k(t) by using the recursive relation. And pictorially, these defined values of Bi,k(t) are as follows: for k = 1 it is a constant; for Bi,2(t), that is for k = 2, it is a linear function; for k = 3, Bi,3(t) is a quadratic function; and for k = 4, that is Bi,4(t), it is a cubic function.
And we have said in the last class, so here you find that, the region of support for Bi,1(t) is
just one sample interval, for Bi,2(t) the region of support is just two sample intervals. For
Bi,3(t) it is three sample intervals and for Bi,4(t) it is four sample intervals. And we have
mentioned in the last class that out of this the quadratic one that is for the value k = 3, it is
normally not used because this does not give a symmetric interpolation.
Whereas using the other three that is Bi,1(t) , Bi,2(t) and Bi,4(t) we can get symmetric
interpolation. So normally the functions, the B Spline functions which are used for
interpolation purpose are the first order that is k = 1, the second order for linear that is k = 2
and the cubic interpolation that is for k = 4. And we have also said that these functions for k =
1, k = 2 and k = 4 can be approximated as follows: B0,1(t) = 1 for 0 ≤ t < 1 and it is = 0 otherwise.
497
(Refer Slide Time: 9:08)
So only in the range 0 to 1 excluding t = 1, B0,1(t) =1 . And, beyond this range B0,1(t) = 0 .
Then B0,2(t) is defined like this: B0,2(t) = t for 0 ≤ t ≤ 1, it will be = 2 - t for 1 ≤ t ≤ 2, and it is equal to 0 otherwise.
So here again you find that for the values of t between 0 and 1, B0,2(t) increases linearly. For
t = 1 to 2, the value of B0,2(t) decreases linearly and beyond 0 and 2, that is for values of t less
than 0 and for values of t greater than 2, the value of B0,2(t) = 0. Similarly, for the cubic one, B0,4(t) is defined as t³/6 for 0 ≤ t ≤ 1, and it is defined as (-3t³ + 12t² - 12t + 4)/6 for 1 ≤ t ≤ 2.
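Again as a side illustration, here is a small sketch of these closed-form pieces; only the [0, 1] and [1, 2] pieces of the cubic are quoted in the lecture, so the sketch assumes the (true) symmetry B0,4(t) = B0,4(4 - t) to cover the range [2, 4]:

```python
# Closed-form B-spline pieces quoted above (zero-order, linear, cubic).
def b01(t):
    return 1.0 if 0 <= t < 1 else 0.0

def b02(t):
    if 0 <= t <= 1:
        return t
    if 1 <= t <= 2:
        return 2 - t
    return 0.0

def b04(t):
    if t > 2:                      # mirror about t = 2 (symmetry assumption)
        t = 4 - t
    if 0 <= t <= 1:
        return t ** 3 / 6
    if 1 <= t <= 2:
        return (-3 * t ** 3 + 12 * t ** 2 - 12 * t + 4) / 6
    return 0.0

# b04(1.3) is about 0.35 and b04(2.3) about 0.59, values used later in the lecture.
print(b01(0.5), b02(1.5), b04(1.3), b04(2.3))
```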
498
So these are the different orders of B Spline function, so here again you find that for value of
t < 0, B0,4(t) = 0 . And similarly for value of t > 4, B0,4(t) = 0 . So these are the B-Spline
functions using which the interpolation operation can be done. Now let us see that how do we
interpolate, again I take the example of this sample data where I have a number of samples of
a function t and which is represented by fs(t).
499
Now here Bi,1(t) , so when I had computed say B0,1(t) we have said that this value is equal to
1 for t lying between 0 and 1 and this is equal to 0 otherwise.
So if I interpolate this particular sample data say for example, I want to find out what is the
value of the signal at say 1.3, so this is the point t = 1.3. So I want to find out the value of f(t)
at t = 1.3. So to do this, my interpolation formula says that f(1.3) should be equal to p0B0,1(1.3) + p1B1,1(1.3) and so on. Now if I plot this B0,1(t) and superimpose it on this particular sample data diagram,
you find that B0,1(t), the function, is equal to 1 in the range 0 to 1 excluding 1. Similarly, B1,1(t) will be equal to 1 in the range 1 to 2 excluding 2, and it will
be 0 beyond this. So when I try to compute the function value at t = 1.3, I have to compute p0,
that is the sample value at t = 0, multiplied by this B0,1(t). Now because B0,1(t) = 0 for values of t ≥ 1, this particular term p0B0,1(1.3) will be equal to 0.
Now, when I compute this p1B1,1(1.3), you find that B1,1(t) is equal to 1 in the range 1 to 2, excluding 2, and beyond 2 the value of B1,1(t) = 0; similarly, for values of t < 1, the value of B1,1(t) = 0. Similarly, B2,1(t) = 1 within the range 2 to 3, and beyond this range B2,1(t) = 0.
So, when I try to compute the value at the point 1.3, you find that this will be nothing but p1B1,1(1.3), and since B1,1(1.3) = 1, the value at this point will be simply equal to p1, which in this case is fs(1). And you find that for any other value of t within the range 1 to 2, the value of f(t) will be the same as f(1) or p1. So I can do this interpolation like this: the function values for all values of t between 1 and 2 will be equal to f(1).
Following similar argument, you find that between 2 and 3, the function value will be equal
to f(2), between 3 and 4 the function value will be equal to f(3). Between 4 and 5 the function
value will be equal to f(4), and it will continue like this. So this is what I get if I use this simple interpolation formula, that is f(t) = Σ_{i=0}^{n} pi Bi,1(t). Similar is also the situation if I go for linear interpolation.
500
(Refer Slide Time: 17:18)
So what do I get in case of linear interpolation? In case of linear interpolation, f(t) is given by Σ_{i=0}^{n} pi Bi,2(t), where we take the summation of all these terms for values of i from 0 to n. Now what do we get in this case? You find that we have said that Bi,2(t) is nothing but a linearly increasing and decreasing function. So if I plot B0,2(t), it is a function like this, which increases linearly between 0 and 1, reaches a value of 1 at t = 1, and then from t = 1 to t = 2 the value of B0,2(t) decreases linearly and becomes 0 at t = 2.
Similarly, B1,2(t) will have a function value something like this it will increase linearly from
1 to 2, it will reach a value of 1 at t = 2 and again from t = 2 to t = 3 it decreases linearly then
at t = 3 the value of B1,2(t) becomes equal to 0. So here again if I want to find out, say for
example the value of function at say t = 1.7.
So I have the functional values at t = 1, I have the value of the function at t = 2. Now at t =
1.7, I have to approximate the value of this function using its neighboring samples. Now if I try
to approximate this you find that using this particular interpolation formula here again f (1.7)
is to be computed as p0B0,2(1.7) + p1B1,2(1.7), so it will continue like this.
Now here you find that the contribution to this point by this sample p0, by this sample value
p0 is given by this interpolation function B0,2(t) and by this the contribution to this point t =
1.7, by this sample f(1) or p1 is given by B1,2(1.7), contribution to this point by the sample
value f(2) is given by B2,2(1.7). But B2,2(1.7) = 0, at t = 1.7, because B2,2(1.7) is something
like this, so value of this function is 0 at t = 1.7.
501
So the only contribution we get at t = 1.7 is from the sample f(0) and from the sample f(1), ok. So using this, I can estimate what will be the value of B0,2(t) at 1.7, and I can also estimate what will be the value of B1,2(1.7). And in this particular case we use a property of these B-Spline functions that we have said earlier, that Bi,k(t) is nothing but B0,k(t - i); that is a property of these B-Spline functions.
So when I do this, you find that this B1,2(1.7) is nothing but B0,2(0.7), because this is t - i with i = 1, so this will be B0,2(0.7). So if I simply calculate the value of B0,2 for different values of t, I can estimate what B0,2(1.7) will be. And in that case the value at this location, that is f(1.7), will now be given by, if I simply calculate this, f(1.7) = p0B0,2(1.7) + p1B0,2(0.7), because B0,2(0.7) is the same as B1,2(1.7).
Here the value of p0 = 0.5, which is the sample value at location t = 0, and the value of p1 = 1,
that is the sample value at location t = 1. Now here you find that there is a problem that, if
when I am trying to compute the value at t = 1.7, the contribution only comes from the
sample values at t = 0 and t = 1. But this interpolation of this approximate value does not
have any contribution from t = 2 or t = 3.
So the interpolation or approximation that you are doing is quite poor, because it is only considering the sample values to the left of this particular point; we are not considering the sample values to the right of this particular point t = 1.7. So that is a problem with this basic interpolation formula.
502
So to solve this problem, what we do is, instead of using the simple formula f(t) = Σ_{i=0}^{n} pi Bi,k(t), we slightly modify this interpolation formula, thank you.
503
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-19.
Interpolation With Examples-II.
Welcome to the lecture series on digital image processing. So in the modified interpolation formula we take, say, f*(t) = Σ_{i=0}^{n} pi Bi-s,k(t). And the value of s we decide as follows: for k = 1, that is when you go for constant interpolation, we assume the value of s to be 0.5; for k = 2, that is for linear interpolation, we assume the value of s to be 1; and for k = 4, that is for cubic interpolation, we assume the value of s to be 2.
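A rough sketch of this modified formula, reusing the recursive B-spline definition with uniform knots; the first four sample values (0.5, 1, 0.6, 1.1) are the ones read from the lecture's figure, while the last two are made up for illustration:

```python
# Shifted B-spline interpolation f*(t) = sum_i p_i * B_{i-s,k}(t),
# with s = 0.5, 1 or 2 for k = 1, 2 or 4 as stated in the lecture.
def bspline(i, k, t):
    if k == 1:
        return 1.0 if i <= t < i + 1 else 0.0
    return ((t - i) / (k - 1)) * bspline(i, k - 1, t) + \
           ((i + k - t) / (k - 1)) * bspline(i + 1, k - 1, t)

def interpolate(samples, t, k, s):
    return sum(p * bspline(i - s, k, t) for i, p in enumerate(samples))

p = [0.5, 1.0, 0.6, 1.1, 0.8, 0.9]        # last two values are hypothetical
print(interpolate(p, 2.3, 1, 0.5))         # constant: picks the nearest sample p2 = 0.6
print(interpolate(p, 2.3, 2, 1.0))         # linear: 0.7*p2 + 0.3*p3, about 0.75
```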
So here again we find that I have not considered k = 3, because as we said that k = 3 gives
you the quadratic interpolation. And in case of quadratic interpolation, the interpolation is not
symmetric. So what we effectively do by changing piBi,k(t) to piBi-s,k(t) is that we give the B Spline interpolation function a shift by s in the left direction while we consider the contribution of the sample pi to the point t for the interpolation purpose.
Now let us see what we get after doing this. So as we said that with k = 1, I take the value of s
= 0.5. So in the formula Σ pi Bi,k(t), where I consider the contribution of sample p0, in the earlier formulation we had to use the B-Spline function B0,k(t); and since k = 1 is a constant interpolation, I have to consider B0,1(t). Using this modified formulation, when I
consider the contribution of point p0, I do not consider the B Spline function to be B0,1(t) , but
rather I consider the B Spline function to be B-0.5,1(t) .
Similarly, for the linear operation when I take the contribution of point p0 to any arbitrary
point along with p0, I had to consider as per the initial formulation B0,2(t) . Now using, this
modified formulation I will use B-1,2(t) . Similarly, for the cubic interpolation again with p0, I
will consider the B Spline function to be B-2,4(t) instead of B0,4(t) . So here you find that in
this particular diagram using this formulation effectively what we are doing is, we are
shifting the B Spline functions by the value of s in the leftward direction.
So, B0,1(t) in the earlier case, we had B0,1(t) to be 1 between 0 and 1. So, B0,1(t) was
something like this, now along with p0, I do not consider B0,1(t) but I will consider B-0.5,1(t)
and B-0.5,1(t) = 1 , for values of t between -0.5 and 0.5 and value of B-0.5,1(t) = 0 beyond this
range. Similarly, is also the case for the linear interpolation the linear B Spline and it is also
similar for the cubic B Spline that is Bi,4(t) and in this case it will be Bi-2,4(t) .
505
(Refer Slide Time: 4:27)
So using this, let us see that how it helps us in the interpolation operation. So as I said that for
interpolation when I consider the contribution of p0 to any particular point along with p0, I
will consider the B Spline function to be B-0.5,1(t) for constant interpolation. So it appears like
this: for the contribution of p0, I consider B-0.5,1(t); and to find out the contribution of p1, I consider the B-Spline function to be B0.5,1(t).
Similarly, in case of linear interpolation to find out the contribution of point p0, I consider the
B Spline function to be B-1,2(t) and to find out the contribution of p1, I consider the B Spline
function to be B0,2(t) , and so on.
506
Similar is also case for the cubic interpolation, here again to find out the contribution of say
p0, I have to consider the B Spline function of B-2,4(t) . To find out the contribution of p1, I
have to consider the B Spline function of B-1,4(t) .
To find out the contribution of p2, I have to consider the B Spline function B0,4(t), and so on. Now let us see, using this modified formulation, whether our
interpolation is going to improve or not. So let us take this particular case again we go for
constant interpolation.
Here again I have shown the same set of samples and now suppose I want to find out, what
will be the value of the function at say t = 2.3. For this I will consider the equation f(t) = Σ_{i=0}^{n} pi Bi-0.5,1(t), that is constant interpolation with k = 1.
So this will give me the approximate or interpolated value at t. So again you find that coming
to this diagram as we have said that when I consider B-0.5,1(t) , ok. In that case B-0.5,1(t) , is
equal to 1 between the range –0.5 to 0.5 and beyond this range B-0.5,1(t) , will be equal to 0.
So when I compute p0, B-0.5,1(t) , for computation of this particular component along with this
sample p0 which is equal to 0.5, I have to consider the B-Spline interpolation function, has
this which is equal to 1 from -0.5 to 0.5.
507
So here again you find that because this B-0.5,1(t) = 0 beyond t = 0.5, this p0 does not have any contribution to the point t = 2.3, because at this point the product p0 B-0.5,1(t) = 0. Similarly, if I consider the effect of point p1 at t = 2.3, the effect of point p1 will also be equal to 0, because B0.5,1(t) = 0 beyond t = 1.5, so the product p1 B0.5,1(t) = 0 at t = 2.3.
But, if I consider the effect of p2, which is equal to 0.6, here you find that this the B Spline
interpolation function has a range something like this, the region of support. So this is equal
to 1 for t = 1.5 to t = 2.5. And it is equal to 0 beyond this range. To find out the contribution
of p3, which is equal to 1.1, here again you can see that for the contribution of this particular point, the corresponding B-Spline function, that is B2.5,1(t), is equal to 1 in the range 2.5 to
3.5 and it is equal to 0 outside.
So even p3 does not have any contribution to this particular point t = 2.3, so at t equal to 2.3,
if I expand this I will have a single term which is equal to p2 B1.5,1(t) , so at t = 2.3 and the
same will be applicable for any value of t in the range t = 1.5 to t = 2.5. So I can say that
using this formulation what I am getting is, I am getting the interpolation something like this.
After interpolation using this constant interpolation function with the modified formulation, the value of the interpolated function will be: from t = 0 to t = 0.5, the value of f(t) will be equal to f(0); between 0.5 and 1.5, the value of f(t) will be equal to f(1); from 1.5 to 2.5, the value of f(t) will be equal to f(2); from 2.5 to 3.5, the value of f(t) will be equal to f(3); from 3.5 to 4.5, the value of f(t) will be equal to f(4); and from 4.5 to 5.5, the value of f(t) will be equal to f(5).
And this appears to be a more reasonable approximation, because whenever we are trying to interpolate at a particular value of t, what we are doing is we are trying to find out the nearest sample to that particular value of t, and whatever the value of the nearest sample is, we are simply copying that value to this desired location t.
So here for any point within this range that is for t = 1 to t = 1.5, the nearest sample is f(1).
For any sample from 1.5 to 2, the nearest sample is p2, or f(2), so this f(2) is copied to this
particular location t, where t is from 1.5 to 2. So this appears to be a more logical
interpolation than the original formulation of interpolation. Similar is also case for linear
interpolation.
In case of linear interpolation, when I consider the value of p0, what I do is I consider B-1,2(t), and B-1,2(t) is something like this: from -1 to 0 it increases linearly and attains a value of 1 at t = 0. Similarly, when I consider the contribution of p1, the corresponding B Spline interpolation function that I have to consider is B0,2(t), which is something like this. So now,
if I want to find out what is the value at the same point say 2.3.
You find that to find out the value at point 2.3, the contribution of p1 will be equal to 0, because the value of B0,2(t) = 0 beyond the point t = 2. And by this you will find that the only contribution that you can get at this point t = 2.3 is from the point p2 and from the point p3, ok. And in this particular case, because I have to use the index i - 1 here, I will have f(2.3) = p2 B1,2(2.3) + p3 B2,2(2.3), just this.
So the contribution of this point to this point t = 2.3 will be given by this value and the
contribution of point p3 will be given by this value. And here you find that because the
function increases linearly from 0 to 1, between t = 2 and t = 3 and it decreases linearly from
3 to 0 when t varies from 3 to 4. So here you find that the value that we will get will be nothing but p2 times the value of this function, which in this particular case is equal to 0.7, plus p3 times 0.3.
So if I simply replace the value of p2 which is equal to 0.6 and p3 which is equal to 1.1, I can
find out, what is the value of f(t) at t equal to 2.3.
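Carrying out this arithmetic with the sample values quoted above (p2 = 0.6, p3 = 1.1), the interpolated value works out to f(2.3) = 0.7 × 0.6 + 0.3 × 1.1 = 0.42 + 0.33 = 0.75.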
So similar is also the case for cubic interpolation; in case of cubic interpolation you find the nature of the region of support will be something like this, and again by using the same formulation, that is f(t) = Σ_{i=0}^{n} pi Bi-2,4(t), I can find out what will be the value of f(t) at any arbitrary time instant t.
510
So by modifying these interpolation operations we can go for all these different types of interpolation. Now to explain this, let us take a particular example. I take a function f whose sample values are like this: starting from f(0), I take f(1) equal to say 1.5, f(2) = 2.5, f(3) equal to say 3, f(4) equal to something like 2.5, f(5) equal to again 3, f(6) maybe something like 2.4, f(7) something like 1, and f(8) something like 2.5.
And I want to, find out the approximate value of this function at say t = 4.3, so given this
sample values. I want to find out what is the value of this function at t equal to 4.3, so I want
to find out f(4.3), given these sample values. And suppose the kind of interpolation that I will
use is a cubic interpolation. So I use cubic interpolation and using this samples, I want to find
out the value of f(4.3).
511
(Refer Slide Time: 19:08)
So let us see how we can do it. Here you find that, considering the region of support, f(4.3) can be written as f(3) B1,4(4.3); since this f(3) is nothing but p3 in our case, Bi becomes Bi-2, so 3 - 2 = 1, and I am considering B1,4(4.3). So this will be + f(4), which is nothing but p4, times B2,4(4.3), + f(5) B3,4(4.3) + f(6) B4,4(4.3), ok.
Now as we said, we have told that Bi,k(t) is nothing but B0,k(t - i). So just by using this particular property of the B Spline functions, I can now rewrite this equation in the form f(3), or p3, times B0,4(3.3), because this is now t - i with i = 1, + f(4) B0,4(2.3) + f(5) B0,4(1.3) + f(6) B0,4(0.3).
Now you can compute the values of this B0,4(3.3), B0,4(2.3), B0,4(1.3) and B0,4(0.3) using the analytical formula of B0,4(t) that we have given, and you have seen that this is nothing but a cubic formula in the variable t.
So if I do this, you will find that B0,4(3.3) gets a value of 0.057, B0,4(2.3) gets a value of 0.59, B0,4(1.3) gets a value of 0.35, and B0,4(0.3) gets a value of 0.0045, and you can verify this by using the analytical
formula that we have given.
512
And by using the values of f(3), f(4), f(5) and f(6), if I compute this equation I get the final interpolated value to be 2.7068. Now if I do the same computation using constant interpolation, as I said, the constant interpolation is nothing but a nearest neighbor interpolation. So when I try to find out the value f(4.3), the point t = 4.3 is nearest to the point t
= 4 at which I have a sample value.
So using nearest neighbor or constant interpolation, f(4.3) will be simply equal to f(4), which in our case is equal to 2.5. Whereas if I go for linear interpolation, again you can compute this using the linear equations that we have described: using linear interpolation the value of f(4.3) = 2.65. So we find that there is a slight difference in the interpolated value depending on whether we go for constant interpolation, linear interpolation or cubic
interpolation.
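As a side check (not part of the lecture), here is a self-contained sketch that reproduces these numbers; the small difference from 2.7068 comes only from the rounding of the weights quoted above:

```python
# Cubic interpolation of the example samples at t = 4.3.
# b04 is the closed-form cubic B-spline (using its symmetry about t = 2);
# only f(3)..f(6) contribute, so f(0) is not needed.
def b04(t):
    if t > 2:
        t = 4 - t
    if 0 <= t <= 1:
        return t ** 3 / 6
    if 1 <= t <= 2:
        return (-3 * t ** 3 + 12 * t ** 2 - 12 * t + 4) / 6
    return 0.0

f = {3: 3.0, 4: 2.5, 5: 3.0, 6: 2.4}
# f(4.3) = sum_i f(i) * B_{i-2,4}(4.3) = sum_i f(i) * B_{0,4}(4.3 - (i - 2))
cubic = sum(f[i] * b04(4.3 - (i - 2)) for i in f)
constant = f[4]                         # nearest neighbor: 2.5
linear = 0.7 * f[4] + 0.3 * f[5]        # 2.65
print(constant, linear, round(cubic, 4))  # cubic is about 2.70; the lecture's
                                          # 2.7068 uses the rounded weights above
```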
So using this type of formulation we can go for interpolation of the one dimensional sampled
functions. Now, when I go for the interpolation of image functions, you find that images
consist of a number of rows and a number of columns. Now the interpolation that you have
discussed so far, these interpolations are mainly valid for one dimension. Now the question is
how do we extend this one dimensional interpolation operation into two dimension so that it
can be applied for interpolation images as well?
So in case of an image, what we will do is, as the image is nothing but a set of pixels arranged in a set of rows and a set of columns, let us consider a two dimensional grid for the image pixels. So I have the grid points like this, and in case of an image, I have the image points or image pixels located at these locations; so these are the grid points at which I have the image pixels.
So it is something like this, so I say this is location (0,0), this is location say (0,1), this is
location (0,2) and so on. So I have this as the 0th row, this as the 0th column. And similarly I
have row number 1, row number 2, row number 3, row number 4 and column number 1,
column number 2, column number 3, column number 4 and so on. Now given this particular
situation, so I have the sample values present at all these grid points.
Now giving this particular pixel array which is again in the form of two dimensional matrix.
If I have to find out what will be the pixel value at this particular location. Suppose this is say
location (4, 4). Let us assume, this is at pixel location say (4 ,5), fourth row, fifth column.
This may be a pixel location say (5,4), that is fifth row, fourth column. And this may be the
pixel location say (5, 5), that is fifth row and fifth column.
So at these different pixel locations, I have the intensity values or pixel values. And using this
I want to interpolate what will be the pixel value at location say (4.3, 4.2). So I want to
compute what will be the pixel value at this particular location (4.3, 4.2). So now we find that
the earlier discussion what we had, for interpolation in case of one dimension. Now that has
to be extended for two dimension, to get the image interpolation.
Now the job is not very complicated, it is again a very simple job. What you have to do is I
have to take the rows one after another, I also have to take the columns one after another. So
first what you do is you go for interpolation along the rows and then you try for interpolation
along the column. And for this interpolation again I can go for either constant interpolation,
or I can go for linear interpolation, or I can go for cubic interpolation.
But now, because our interpolation will be in two dimension, so the kind of interpolation if it
is a linear interpolation it will be called a bilinear interpolation. For cubic interpolation it will
be called a bi-cubic interpolation. So let us see that what will be the nature of interpolation if
I go for a constant interpolation. So effectively what we will do is, so in this particular case,
we will interpolate along the row 4, will also interpolate along the row 5.
So we will try to find out the pixel value at location (4, 4.2), and we will also try to find out the pixel value at the point (5, 4.2). Once I have the interpolated pixel values at these two locations, that is (4, 4.2) and (5, 4.2), then using these two values I will try to interpolate the value at this location (4.3, 4.2).
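A minimal sketch of this row-then-column procedure; the four pixel values used here are hypothetical, only the interpolation steps follow the description above:

```python
# Bilinear interpolation at a fractional location (row, column).
import numpy as np

img = np.zeros((8, 8))
img[4, 4], img[4, 5], img[5, 4], img[5, 5] = 10.0, 20.0, 30.0, 40.0  # made-up values

def bilinear(image, r, c):
    r0, c0 = int(r), int(c)          # top-left grid point
    dr, dc = r - r0, c - c0
    top = (1 - dc) * image[r0, c0] + dc * image[r0, c0 + 1]             # along row r0
    bottom = (1 - dc) * image[r0 + 1, c0] + dc * image[r0 + 1, c0 + 1]  # along row r0+1
    return (1 - dr) * top + dr * bottom                                 # along the column

print(bilinear(img, 4.3, 4.2))   # interpolated value at (4.3, 4.2)
```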
514
(Refer Slide Time: 29:36)
Now if I want to find out what will be the value of the pixel at this particular location. What I
do is, I simply try to find out which is the nearest neighbor of this particular point. And here
you find that the nearest neighbor of this particular point is this point. So all the points which
are nearest to this particular point so within this square all the pixel values will get the value
of this particular pixel. So it will be something like this.
Similarly, all the pixels, all the points lying within this region will get the pixel values of this
particular region. Similar is the case here and similar is the case here. When you go for
bilinear interpolation, when I try to interpolate the pixel at any particular location like this.
515
(Refer Slide Time: 31:01)
So again here, in case of bilinear interpolation, if I want to find out the pixel value at any
particular, any arbitrary grid location say, something like this. Say somewhere here, what I
have to do is I have to consider this pixel and this pixel, do the bilinear interpolation to find
out the pixel at this location. I consider this pixel and this pixel, do bilinear interpolation to
find out the pixel value at this location. Then using this and this, doing bilinear interpolation
along the column, I can find out what is the pixel value at this particular location.
And the same concept can also be extended for bi-cubic interpolation. So by this we have explained how to do interpolation: either constant interpolation, which as we have said is also the nearest neighbor interpolation, or linear interpolation, which in case of an image becomes bilinear interpolation, or cubic interpolation, which in case of an image becomes bi-cubic interpolation. In each case you have to do the interpolation along the rows, and after the rows you can do it along the columns.
It can also be reversed, first you can do the interpolation along column then using the
interpolated value along two or more columns, I can find out the interpolated value on any
row location which does not fall on any regular grid point, ok. So now let us see that, what
are the results, this results we had already shown in the last class.
516
(Refer Slide Time: 32:31)
So we find that the first one is interpolated using nearest neighbor interpolation, and as we have explained, because the value of the nearest pixel is copied to every arbitrary location, this is likely to produce blocking artifacts. And in this nearest neighbor interpolated image you also find that those blocking artifacts are quite prominent. We have also seen the output with other
interpolation operations.
We have shown the output with linear B Spline interpolation. We have also shown the output
with cubic B Spline Interpolation, ok. So this is the case with rotation: again, when you rotate, if you do not interpolate, you get a number of black patches, as is shown in the left image. If you go for interpolation, all those black patches will be removed and you get a continuous image.
As it is shown in the right image, now this interpolation operation is useful not only for this
translation or rotation kind of operations, you find that in many other application, for example
in case of satellite imagery when the image of the Earth surface is taken with the help of a
satellite. Now because of Earth’s rotation the image which is obtained from the satellite the
pixels always do not fall on regular grids.
So in such cases what we have to go for is to, rectify the distortion or to correct the distortion,
which appears in the satellite images and this distortion is mainly due to the rotation of the
Earth’s Surface. So for correction of those distortions, the similar type of interpolation is also
used, thank you.
518
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-20.
Image Transformation-I.
Welcome to the course on digital image processing. In the last class we have seen the interpolation
and resampling operation of images and we have seen different applications of the
interpolation and resampling operations. So while we have talked about the interpolation and
resampling, we have seen that it is the B-Spline functions or B-Spline interpolation functions
of different orders which are mainly used for image interpolation purpose.
And before this interpolation, we have also talked about, the basic transformation operations
and the transformation operations that we have discussed, those were mainly in the class of
geometric transformations. That is we have talked about the transformation like translation,
we have talked about rotation, we have talked about scaling and we have seen that these are
the kind of transformations which are mainly used for coordinate translation.
That is given a point in one coordinate system, we can translate the point or we can represent
the point in another coordinate system, where the second coordinate system may be a
translated or rotated version of the first coordinate system. We have also talked about another
type of transformation, which is perspective transformation and this perspective
transformation is mainly used to find out or to map a point in a three dimensional world
coordinate system to a two dimensional plane where this two dimensional plane, is the
imaging plane.
So there our purpose was that given a point or the 3D coordinates of a point in a three
dimensional coordinate system, what will be the coordinate of that point on the image plane,
when it is imaged by a camera. In today's lecture we will talk about another kind of transformation, which we call image transformation. So we will talk about, or we will explain, the different image transformation operations.
Now before coming to specific transformation operations, like say the Fourier transform or the discrete cosine transform, we will first talk about unitary transformations, which form a class of transformations. All the different transforms, whether it is the discrete Fourier transform, the discrete cosine transform or the Hadamard transform, are different cases of this class of unitary transformations. Then
when you talk about this unitary transformation, we will also explain what is an orthogonal
and orthonormal basis function. So you see that what is known as an orthogonal basis
function, what is also known as an orthonormal basis function. We will also explain how an
arbitrary one dimensional signal can be represented by series summation of orthogonal basis
vectors and we will also explain how an arbitrary image can be represented by a series
summation of orthonormal basis images.
520
(Refer Slide Time: 4:02)
Now firstly let us see that, what is the image transformation? You find that in this case we
have shown a diagram, where the input is an image and after the image is transformed, we get
another image. So if the size of the input image is NxN, say it is having N number of rows
and N number of columns. The transformed image is also of same size, that of size NxN. And
given this transformed image, if we perform the inverse transformation, we get back the
original image.
That is, image of size NxN. Now if given an image by applying transformation, we are
transforming back to another image of same size and doing the inverse transformation
operation we get back the original image, then the question naturally comes that what is the
use of this transformation. And here you find that after transformation the second image of
same size NxN that we get that is called the transformed coefficient matrix.
So the natural question that arises in this case that if by transformation I am going to another
image. And by using inverse transformation I get back the original image, then why do we go
for this transformation at all. Now we will find and we will also see on in our subsequent
lectures that this kind of transformation has got a number of very, very important
applications. One of the application is for preprocessing, in case of image preprocessing of
the images.
521
(Refer Slide Time: 5:32)
If the image contains noise, then you know that contamination by noise gives rise to high frequency components in the image. So if by using some sort of unitary transformation we can find out what the frequency components in the image are, and from these frequency coefficients we can suppress the high frequency components, then if you take the inverse transform of that modified coefficient matrix, the reconstructed image that we get is a filtered image.
So filtering is very, very important application where this image transformation techniques
can be applied. The other kind of preprocessing techniques we will also see later on that, it is
also very, very useful for image enhancement operation.
Say for example if we have an image which is very blurred, that is the contrast of the image is
very, very poor, then again in the transformation domain or using the transform coefficients,
we can do certain operations by which, we can enhance the contrast of the image so that is
what is known as enhancement operation. We will also see that this image transformation
operations are very, very useful for data compression.
So suppose I have to transmit an image, or I have to store the image on a hard disk. You can easily see that if I have an image of size say 512x512 pixels, then if it is a black and white image every pixel contains 8 bits, and if it is a colour image it normally contains 24 bits per pixel. So storing a colour image of 512x512 pixel size takes a huge amount of disk space.
522
So if by some operation I can compress the space or I can reduce the space required to store
the same image then obviously on the on a limited disk space I can store more number of
images. Similar is the case, if I go for transmission of the image or transmission of image
sequences or video.
In that case the bandwidth of the channel over which this image or the video has been
transmitted is a bottle neck, which forces us that we must employ some data compression
techniques, so where the bandwidth requirement for the transmission of the image or the
transmission of the video will be reduced. And we will also see later on that this image
transformation techniques is the first step in most of the data compression or image or video
compression techniques.
These transformation techniques are also very, very useful for feature extraction operation.
By features I mean that in the images if I am interested to find out the edges or I am
interested to find out the corners of certain shapes. Then this transformation techniques or if I
work in the transformation domain then finding out the edges or finding out the corners of
certain objects that also becomes very, very convenient.
So these are some of the applications where this image transformation techniques can be
used, so apparently we have seen that by image transformation I just transform an original
image to another image. And by inverse transformation that transformed image can be
retransformed to the original image. So the application of this image transformation operation
can be like this and here I have cited only few of the applications we will see later that
applications of these image transformations are much more than what I have listed here.
523
(Refer Slide Time: 9:52)
So for a unitary matrix A, A-1 = A*T, where A* is the complex conjugate of the matrix A, that is, the complex conjugate of each and every element of matrix A. And we will call such unitary matrices the basis images. So the purpose of this image transformation operation is to represent any arbitrary image as a series summation of such unitary matrices, or a series summation of such basis images.
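As a quick illustration that is not from the lecture itself, the unitary property A-1 = A*T can be checked numerically, taking the N-point DFT matrix as one standard example of a unitary matrix A:

```python
# Numerical check of the unitary property A^{-1} = A^{*T} for the DFT matrix.
import numpy as np

N = 4
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
A = np.exp(-2j * np.pi * k * n / N) / np.sqrt(N)   # unitary DFT matrix

print(np.allclose(A.conj().T @ A, np.eye(N)))      # True: A*T A = I
print(np.allclose(np.linalg.inv(A), A.conj().T))   # True: A^{-1} = A*T
```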
524
(Refer Slide Time: 11:29)
Now to start with, I will first try to explain with the help of one dimensional signal. So let us
take an arbitrary one dimensional signal say I take a signal say x(t). So I take an arbitrary
signal x(t) and you see that this is a function of t. So this x(t), the nature of x(t) can be
anything say, let us take that I have a signal like this, x(t) which is a function of t. Now this
arbitrary signal x(t), can be represented as a series summation of a set of orthogonal basis
function.
So I am just taking this as an example in for one dimensional signal and later on we will
extend to two dimension that is in, for the images. So this arbitrary signal this one
dimensional signal x(t), we can represent by the series summation of a set of orthogonal basis
functions. Now the question is, what is orthogonal? By orthogonal I mean that if I consider a
set of real valued continuous functions.
So I consider a set of real valued continuous functions say an(t), which is equal to set say,
a0(t), a1(t) and so on, ok. So this is a set of real valued continuous functions and this set of
real valued continuous functions is said to be orthogonal over an interval say t0 to t0 + T. So I define that this set of continuous real valued functions will be orthogonal over an interval t0 to t0 + T if, when I take the integral ∫_T am(t)·an(t) dt over the interval of length T, this integral is equal to some constant k if m = n, and it is equal to 0 if m ≠ n.
So I take two functions am(t) and an(t), take the product and integrate the product over the interval T. If this integration is equal to some constant k when m = n and is equal to 0 whenever m ≠ n, then this set of real valued continuous functions forms an orthogonal set of basis functions.
And if the value of this constant k is equal to 1, so if the value of this constant k = 1, then we
say that the set is orthonormal, ok. So an orthogonal basis function as we have defined this
non-zero constant k, if this is equal to 1, then we say that it is an orthonormal set of basis
functions. Let us just take an example that what do you mean by this. Suppose we take a set
like this.
So, sinωt, sin2ωt and sin3ωt: this is my set of functions an(t), ok. Now if I plot sinωt over the interval t = 0 to T, where ω = 2π/T and T is the period of this sinusoidal wave, then you will find that sinωt in the period 0 to T is something like this.
So this is t, this is sinωt and this is the time period T. If I plot sin2ωt over this same diagram, sin2ωt will be something like this, ok. Now if I take the product of sinωt and sin2ωt in the interval 0 to T, the product will appear something like this. So we find that in this particular region both sin2ωt and sinωt are positive.
526
So the product will be of this form, in this region sinωt is positive but sin2ωt is negative, so
the product will be of this form. In this particular region sin2ωt is positive, whereas sinωt is
negative. So the product is going to be like this, this will be of this form. And in this
particular region both sinωt and sin2ωt, they are negative so the product is going to be
positive, so it will be of this form.
Now if I integrate this, that is if I take the integral ∫ from 0 to T of sinωt·sin2ωt dt, this integral is nothing but the area covered by this curve. And if you take this area, you will find that the positive half will be cancelled by the negative half, and this integration will come out to be 0.
Similar is the case if I multiply sinωt with sin3ωt and take the integration, and similar will also be the case if I multiply sin2ωt with sin3ωt and take the integration. So this particular set, that is sinωt, sin2ωt and sin3ωt, is a set of orthogonal basis functions.
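As a small numerical aside that is not from the lecture, this orthogonality claim can be checked directly; the integral is approximated here by a simple sum over one period:

```python
# Pairwise integrals of sin(m*w*t)*sin(n*w*t) over one period T.
import numpy as np

T = 1.0
w = 2 * np.pi / T
dt = T / 100000
t = np.arange(0.0, T, dt)
for m in (1, 2, 3):
    for n in (1, 2, 3):
        integral = np.sum(np.sin(m * w * t) * np.sin(n * w * t)) * dt
        print(m, n, round(integral, 6))   # T/2 = 0.5 when m == n, ~0 otherwise
```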
Now suppose we have an arbitrary real valued function x(t), and we consider this function within the region t0 to t0 + T. This function x(t) can be represented as a series summation: we can write x(t) = Σ_{n=0}^{∞} cn an(t), where, you remember, an(t) is the set of orthogonal basis functions.
527
Then this term cn is called the nth coefficient of expansion. Now the problem is how do we find out, or how do we calculate, the value of cn. To calculate the value of cn, what we can do is multiply both the left hand side and the right hand side by another function from the set of orthogonal basis functions. So multiply both sides by a function, say am(t), and take the integration over the interval T.
Now if I expand this, you find that this will be of the form
c0 ∫_T a0(t)·am(t) dt + c1 ∫_T a1(t)·am(t) dt + ...... + cm ∫_T am(t)·am(t) dt + ........ .
Now as per the definition of orthogonality we have said that ∫_T an(t)·am(t) dt will be equal to some constant k if and only if m = n, and this integral will vanish in all the cases where m is not equal to n. So by using that property of orthogonality, what we get on the left hand side is simply ∫_T x(t)·am(t) dt.
This will be simply equal to the constant k times cm, because on the right hand side all the terms will be equal to 0 except the term cm ∫_T am(t)·am(t) dt, and ∫_T am(t)·am(t) dt = k. So what we get here is ∫_T x(t)·am(t) dt = k·cm. From this we can easily calculate that the mth coefficient cm will be given by cm = (1/k) ∫_T x(t)·am(t) dt.
And obviously, if the set is an orthonormal set rather than just an orthogonal set, the value of k = 1, and in that case we simply get the mth coefficient as cm = ∫_T x(t)·am(t) dt. So this is how we can get the mth coefficient of expansion of any arbitrary function x(t), and this computation can be done if the set of basis functions that we are taking, that is the set an(t), is an orthogonal basis set.
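As an illustrative aside, the coefficient formula can be applied numerically; here the orthogonal set is assumed to be an(t) = sin(nωt) over one period (for which the constant k is T/2), and the test signal is made up:

```python
# Computing expansion coefficients c_n = (1/k) * integral of x(t)*a_n(t) dt.
import numpy as np

T = 1.0
w = 2 * np.pi / T
dt = T / 200000
t = np.arange(0.0, T, dt)
x = 2.0 * np.sin(w * t) - 0.5 * np.sin(3 * w * t)   # hypothetical test signal

k_const = T / 2.0                                   # value of k for this sine set
for n in range(1, 5):
    a_n = np.sin(n * w * t)
    c_n = np.sum(x * a_n) * dt / k_const
    print(n, round(c_n, 4))                         # recovers 2, 0, -0.5, 0
```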
Now the set of orthogonal basis functions an(t) is said to be complete, or closed, if one of the following two conditions holds. The first condition is that there is no signal x(t) with nonzero energy that is orthogonal to all the basis functions an(t) of the set. And the second condition is that for any piecewise continuous signal x(t), and for any ε > 0, however small, there exists an N and a finite expansion x̂(t) = Σ_{n=0}^{N-1} cn an(t) such that ∫_T |x(t) - x̂(t)|² dt < ε.
So this says that for a piecewise continuous function x(t) having finite energy, given an ε which is greater than 0 but very small, there must be some constant N such that we can have an expansion x̂(t) = Σ_{n=0}^{N-1} cn an(t) for which the term ∫_T |x(t) - x̂(t)|² dt < ε.
So we find that this x(t) is the original signal and x̂(t) is its expansion. In the earlier case we have seen that if we go for an infinite expansion, then this x(t) can be represented exactly. Now what we are doing is going for a truncated expansion: we are not going to take an infinite number of terms but only N number of terms. So obviously this x(t) is not being represented exactly; we are going to have its approximate expansion x̂(t), where N is the number of terms used for the reconstructed signal. And this ∫_T |x(t) - x̂(t)|² dt is nothing but the energy of the error signal.
So we say that the set of orthogonal basis functions an(t) is complete, or closed, if at least one of these conditions holds, that is the first condition or the second condition. This says that when we have a complete set of orthogonal functions, the expansion enables representation of x(t) by a finite set of coefficients, where the finite set of coefficients is c0, c1 and so on up to cN-1. So this is the finite set of coefficients; so if we have a complete set of orthogonal functions, then using this complete set of orthogonal functions, thank you.
530
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-21.
Image Transformation-2.
Welcome to the course on digital image processing. Then using this complete set of
orthogonal functions we can go for a finite expansion of a signal x(t) using the finite number
of expansion coefficients C0, C1, upto CN-1, as is shown here. So I have a finite set of
expansion coefficients. So from this discussion what we have seen is that an arbitrary
continuous signal x(t) can be represented by the series summation of a set of orthogonal basis
functions.
531
(Refer Slide Time: 1:06)
And this series expansion is given as x(t) = Σ_{n=0}^{∞} cn an(t) if I go for the infinite expansion; or, as we have seen, this can also be represented by a finite series expansion, in which case it will be x̂(t) = Σ_{n=0}^{N-1} cn an(t), where n now varies from 0 to N-1. So this is x̂(t); obviously we are going for an approximate representation of x(t), not the exact representation of x(t). So this is the case that we have for continuous signals x(t).
But, in our case we are not dealing with the continuous signal but we are dealing with the
discrete signals. So in case of discrete signals what we have is a set of samples or a series of
samples.
532
(Refer Slide Time: 2:30)
So the series of samples can be represented by say u(n), where 0 ≤ n ≤ N-1. So we have a series of discrete samples u(n), and in this case we have N number of samples.
So obviously you can see that this is a one dimensional sequence of samples. And because it
is a one dimensional sequence of samples and the sample size is capital N, that is, we have
capital N number of samples. So I can represent this set of samples by a vector say u of
dimension capital N. So I am representing this by a vector u of dimension capital N. And for
transformation what I do is I pre multiply this vector u by a unitary matrix A, of dimension
NxN.
So given this vector u, if I pre multiply this with a unitary matrix capital A, where the
dimension of this unitary matrix is NxN. So you find that this u is a vector of dimension N.
And I have a matrix, a unitary matrix of dimension NxN. So this multiplication results in
another vector v. So this vector v we call as a transformed vector or transformation vector.
This is transformed vector and this unitary matrix A is called the transformation matrix.
So what I have done is I have taken an N dimensional vector u pre multiplied by, pre
multiplied that n dimensional vector u by a unitary matrix of dimension NxN. So after
multiplication I got again an n dimensional vector v. Now, so by matrix equation this is v
equal to A times u.
533
(Refer Slide Time: 5:02)
Now what I do is, I expand this matrix equation. So if I expand this matrix equation, it can be represented as a series summation which will be given by v(k) = Σ_{n=0}^{N-1} a(k,n)·u(n), where n varies from 0 to N-1. And this has to be computed for k equal to 0, 1, up to N-1, so that I get all the N elements of
the vector v(k). Now if A is a unitary matrix, then from vector v, I can also get back our
original vector u, so for doing that what we will do is we will pre multiply v by A-1. So this
should give me the original vector u, and this A-1v because this is an unitary matrix will be
nothing but A conjugate transpose v.
And if I represent this same equation in the form of a series summation, this will come out to be u(n) = Σ_{k=0}^{N-1} a*(k,n)·v(k), where k now varies from 0 to N-1. And this has to be computed for all values of n varying from 0, 1, up to N-1. Now, what is this a*(k, n)? If I expand the matrix of elements a(k, n), it is of the form a00, a01, a02 and so on up to a0n in the first row; then a10, a11 and so on in the next row; and it continues like this, so that finally I will have ak0, ak1, and so on up to ak,n.
Now find that in this expression we are multiplying a*(k, n) by v(k), a*(k, n), which is the
conjugate of a(k, n) into v(k). Now this a*(k, n) is nothing but the column vector of matrix
A*. So if I have this matrix A, this a*(k, n) is nothing but a column vector of matrix A*. So
these column vectors, or column vectors of matrix A* transpose.
534
(Refer Slide Time: 8:29)
So these column vectors a*(k, n) are actually called the basis vectors of the matrix A. And you do remember that this matrix A is a unitary matrix. And here what we have done is that the sequence of samples u(n), or the vector u, has been represented as u(n) = Σ_{k=0}^{N-1} a*(k,n)·v(k). So this vector u has been
represented as a series summation of a set of basis vectors. Now if this basis vector has to be
orthogonal or orthonormal, then what is the property it has to follow.
So we have a set of basis vectors, and in this case we have said that the columns of A*T form the set of basis vectors. If I take any two different columns and take their dot product, the dot product is going to be 0, and if I take the dot product of a column with itself, that dot product is going to be non-zero. So if I take a column say Ai and take the dot product of Ai with Ai, or I take two columns Ai and Aj and take the dot product of these two columns, then these dot products will be equal to some constant k whenever i = j, and will be equal to 0 whenever i ≠ j. If this property is followed, then the matrix A will be a unitary matrix.
So in this case we have represented the vector v or vector u by a series summation of a set of
basis vectors. So this is what we have got in case of one dimensional signal or a one
dimensional vector u.
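As a brief aside that is not from the lecture, here is a sketch of this forward and inverse pair, again taking the unitary DFT matrix as the example transformation matrix A:

```python
# 1D unitary transform pair: v = A u and u = A^{*T} v.
import numpy as np

N = 8
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
A = np.exp(-2j * np.pi * k * n / N) / np.sqrt(N)   # unitary DFT matrix

u = np.arange(N, dtype=float)        # some input sequence u(n)
v = A @ u                            # forward: v(k) = sum_n a(k,n) u(n)
u_back = A.conj().T @ v              # inverse: u(n) = sum_k a*(k,n) v(k)

print(np.allclose(u_back.real, u))   # True: the original sequence is recovered
```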
535
Now we are talking about image transformation, so in our case our interest is on image
transformations. Now the same concept of representing a vector as a series summation of a
set of basis vectors, can also be extended in case of an image.
So in case of an image, the vector u that we defined in the earlier case will now be a two dimensional matrix. So u, instead of being a vector, will now be a two dimensional matrix, and we represent it by u(m,n), where m and n are the row and column indices, with 0 ≤ m, n ≤ N-1.
So we are considering an image of dimension NxN. Now the transformation on this image can be represented as a series summation, v(k,l) = Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} a_{k,l}(m,n)·u(m,n), ok. So here you find that a_{k,l}(m,n) is again a matrix of dimension NxN, but in this case the matrix itself has an index (k, l), and this computation of v(k, l) has to be done for 0 ≤ k, l ≤ N-1.
So this clearly shows that the matrix we are taking is of dimension NxN, and not only that, we have NxN, that is N², number of such matrices or such unitary matrices; because in a_{k,l}(m,n) both k and l take values from 0 to N-1, I have N² such basis matrices.
And from this v(k, l), which in this case is the transformed coefficient matrix, I can get back the original matrix u(m, n) by applying the inverse transformation. So in this case u(m, n) will be given by u(m,n) = Σ_{k=0}^{N-1} Σ_{l=0}^{N-1} a*_{k,l}(m,n)·v(k,l).
So we find that by extending the concept of series expansion from one dimensional vectors to two dimensions, we can represent an image as a series summation of basis unitary matrices. So in this case all the a_{k,l}(m,n), or a*_{k,l}(m,n), will be the unitary matrices. Now what are the orthogonality and completeness properties in this case?
The orthogonality property says that Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} a_{k,l}(m,n)·a*_{k',l'}(m,n) will be equal to the Kronecker delta function δ(k - k', l - l'); that is, this value will be equal to 1 whenever k = k' and l = l', and in all other cases this summation will be 0. And the completeness property says that if I take the summation Σ_{k=0}^{N-1} Σ_{l=0}^{N-1} a_{k,l}(m,n)·a*_{k,l}(m',n'), this will be equal to the Kronecker delta function δ(m - m', n - n'); that is, this summation will be equal to 1 whenever m = m' and n = n'.
So by applying this kind of transformation, the matrix v which we get, which is nothing but the set of v(k, l), is what is called the transform matrix or the transformation coefficients; this is also called the transformed coefficients. So we find that in this particular case any arbitrary image is represented by a series summation of a set of basis images or a set of unitary matrices.
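As a side sketch that is not from the lecture, this general 2D transform pair can be checked numerically by choosing a concrete set of N² basis images; here they are built from the 2D DFT kernel, one familiar unitary example (the lecture keeps a_{k,l}(m,n) general):

```python
# 2D transform pair using N^2 basis images a_{k,l}(m,n).
import numpy as np

N = 4
u = np.random.rand(N, N)                     # arbitrary N x N image
m, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")

def basis(k, l):                             # a_{k,l}(m,n), an N x N matrix
    return np.exp(-2j * np.pi * (k * m + l * n) / N) / N

# forward: v(k,l) = sum over m,n of a_{k,l}(m,n) * u(m,n)
v = np.array([[np.sum(basis(k, l) * u) for l in range(N)] for k in range(N)])

# inverse: u(m,n) = sum over k,l of a*_{k,l}(m,n) * v(k,l)
u_back = np.zeros((N, N), dtype=complex)
for k in range(N):
    for l in range(N):
        u_back += np.conj(basis(k, l)) * v[k, l]

print(np.allclose(u_back.real, u))           # True: exact reconstruction
```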
Now suppose we truncate the summation. In this case what we get is a set of coefficients whose size is the same as the original image size; that is, if we have an NxN image, the coefficient matrix will also be NxN. Now, while doing the inverse transformation, if I do not consider all the coefficients but only a subset of them, then what we are going to get is an approximate reconstructed image.
And it can be shown that this approximate reconstructed image will have a limited error if the set of basis matrices, or set of basis images, that we are considering is complete. So this error will be minimized if the basis images that we consider form a complete set. In that case, what we will have is the reconstructed image û(m,n) = Σ_{k=0}^{P-1} Σ_{l=0}^{Q-1} a*_{k,l}(m,n)·v(k,l).
k=0 l=0
Now suppose l will vary from 0 to Q - 1 and say k will vary from 0 to P - 1. So instead of
considering both k and l, varying from 0 to N - 1, I am considering only Q number of
coefficients along l and P number of coefficients along k. So the number of coefficients that I
am considering for reconstructing the image of an inverse transformation is PxQ instead of
N2 .
538
So using this PxQ number of coefficients I get the reconstructed image û; obviously this û is not the exact image but an approximate image, because I did not consider all the coefficient values, and the sum of squared error in this case will be given by Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} |u(m,n) - û(m,n)|².
And it can be shown that this error will be minimized if the set of basis images, that is a_{k,l}(m,n), is complete. Now another point to be noted here: if you compute the amount of computation that is involved, you find that if N² is the image size, the number of computations needed, both for the forward transformation and for the inverse transformation, will be of order N⁴.
So for doing this we have to incur a tremendous amount of computation, and one of the problems is how to reduce this computational requirement when we go for the forward transformation or the inverse transformation, thank you.
539
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-22.
Separable Transformation.
Hello, welcome to the video lecture series on digital image processing. So let us see, what we
have done in our last lecture? In our introductory lecture on image transformations we have
said the basics of image transformation. We have seen what is meant by a unitary transform?
We have also seen what is orthogonal and orthonormal basis vectors. We have seen how an
arbitrary one dimensional signal can be represented by series summation of orthogonal basis
vectors.
And we have also seen how an arbitrary image can be represented by series summation of
orthonormal basis images. So when we talk about the image transformation basically the
image is represented as a series summation of orthonormal basis images.
540
(Refer Slide Time: 1:30)
After today’s lecture the students will be able to analyze the computational complexity of
image transform operations. They will be able to explain what is meant by a separable unitary
transformation. They will also know how separable unitary transforms help to implement fast
transformations and of course they will be able to write algorithms for fast transforms. So
first let us see that what we have done in the last class. In the last class we have taken one
dimensional sequence of the discrete signal samples, say given in the form u(n), where n
varies from 0 to some N-1.
541
So we have taken initially a one dimensional sequence of discrete samples like this that is
u(n) and we have found out what is meant by unitary transformation of this one dimensional
discrete sequence. So the unitary transformation of this one dimensional discrete sequence is given by v = Au, where A is a unitary matrix. And this can be expanded in the form v(k) = Σ_{n=0}^{N-1} a(k,n)·u(n), where n varies from 0 to N-1.
Assuming that we have N number of samples in the input discrete sequence. Now we say that
this transformation is a unitary transformation, if the matrix A is a unitary matrix. So what is
meant by a unitary matrix, the matrix A will be said to be a unitary matrix if it obeys the
relation A^{-1} = A*^T, that is, the inverse of matrix A is given by its conjugate transpose. That is, if you take the conjugate of every element of matrix A and then take the transpose of those conjugated elements, then that should be equal to the inverse of matrix A itself.
So this says that A A*^T = A*^T A = I, the identity matrix. So if this relation is true for the matrix A, then we say that A is a unitary matrix and
the transformation which is given by this unitary matrix is a unitary transformation. So using
this matrix A we go for unitary matrix, unitary transformation.
Now, once we have this transformation and we get the transformation coefficient say v(k), or
the transformed vector transformed sequence v(k). We should be also able to find out that
how from these transformation coefficients we get back the original sequence u(n).
542
So this original sequence is obtained by a similar relation, given by u = A^{-1} v, and in our case, since A^{-1} = A*^T, this can be written as
u = A*^T v, which can be expanded as u(n) = Σ_{k=0}^{N-1} v(k) a*(k,n).
And we have to compute this for all values of n varying from 0 to N-1, that is 0 ≤ n ≤ N-1. So by using the unitary transformation we can get the
transformation coefficients and using the inverse transformation we can obtain the input
sequence, input discrete sequence from the coefficients, from this sequence of coefficients.
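A small Python/NumPy sketch (my own, not from the lecture) may help fix these two equations; it builds one possible unitary matrix A, checks that A A*^T = I, and applies the forward and inverse 1D transforms.

    import numpy as np

    # Minimal sketch of a 1D unitary transform pair (illustrative names and values).
    N = 4
    n = np.arange(N)
    A = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # a unitary matrix (DFT matrix)

    assert np.allclose(A @ A.conj().T, np.eye(N))               # unitarity: A A*T = I

    u = np.array([1.0, 2.0, 3.0, 4.0])   # input sequence u(n)
    v = A @ u                            # forward transform v = A u
    u_back = A.conj().T @ v              # inverse transform u = A*T v
    assert np.allclose(u_back, u)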
And this expression says that the input sequence u(n) is now represented in the form of a
series summation of a set of vectors or orthonormal basis vectors. So this is what we get in
case of one dimensional sequence. Now let us see what will be the case in case of a two
dimensional sequence.
So if I go for the case of two dimensional signals, then the same transformation equation will be of the form of a double summation, v(k,l) = Σ_m Σ_n u(m,n) a_{k,l}(m,n), where both m and n vary from 0 to N-1.
So here u(m,n) is the input image, the two dimensional image; again we are transforming this using the unitary matrix A, and in the expanded form the expression can be written like this:
v(k,l) = Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} u(m,n) a_{k,l}(m,n).
And this has to be computed for all the values of k and l, where k and l vary from 0 to N-1. So all k and l will be in the range 0 to N-1.
543
In the same manner, we can have the inverse transformation so that we can get the original
two dimensional matrix from the transformation coefficient matrix and this inverse
transformation in the expanded form can again be written like this. So from v(k,l) we have to get back u(m,n), and we can write it as
u(m,n) = Σ_{k=0}^{N-1} Σ_{l=0}^{N-1} v(k,l) a*_{k,l}(m,n),
where both k and l vary in the range 0 to N-1. And this we have to compute for all values of m and n in the range 0 to N-1.
Here this image transform a_{k,l}(m,n) is nothing but a set of complete orthonormal discrete basis functions. And in our last class we have said what is meant by the
complete set of orthonormal basis functions. And in this case this quantity, the v(k,l) what we
are getting these are known as transformed coefficients.
Now let us see what will be the computational complexity of these expressions. If you take any of these expressions, say for example the forward transformation, we have this particular expression:
v(k,l) = Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} u(m,n) a_{k,l}(m,n).
So to compute this v(k,l), you find that if I compute this particular expression. For every
v(k,l) the number of complex multiplication and complex addition that has to be performed is
of the order of N2, ok. And you remember that this has to be computed for every value of k
and l, where k and l vary in the range 0 to N-1, that is k is having N number of values, l will
also have N number of values.
So to find out v(k,l), a single coefficient v(k,l) we have to have of the order of N2, number of
complex multiplications and additions. And because this has to be computed for every v(k,l)
and we have N2 number of coefficients because both k and l vary in the range 0 to N-1. So
there are N2 number of coefficients. And for computation of each of the coefficient we need
N2 number of complex addition and multiplication.
So the total amount of computation that will be needed in this particular case is of the order
of N4, ok. Obviously this is quite expensive for any of the practical size images because in
practical cases we get images of the size of say 256x256 pixels or 512x512 pixels, even it can
go up to say 1k x 1k number of pixels or 2k x 2k number of pixels and so on.
So if the computational complexity is of the order of N^4, where the image is of size N×N, you can see what a tremendous amount of computation has to be performed for doing the image transformations using this simple relation. So what is the way out? We have to
think that how we can reduce the computational complexity. Obviously to reduce the
computational complexity, we have to use some mathematical tools and that is where we
have the concept of separable unitary transforms.
A transformation is said to be separable if the basis function a_{k,l}(m,n) can be represented in the form a_k(m) times b_l(n), or equivalently, a_{k,l}(m,n) = a_k(m).b_l(n). So if this a_{k,l}(m,n) can be represented as a product of a_k(m) and b_l(n), then this is called separable.
So in this case both ak(m), where k varies from 0 to N-1 and bl(n), where l also varies from 0
to N-1. So these two sets ak(m) and bl(n), they are nothing but one dimensional complete
orthogonal sets of basis vectors. So both a ak(m) and bl(n) they are one dimensional complete
orthonormal basis vectors. Now, if I represent this set of orthonormal basis vectors both ak(m)
and bl(n) in the form of matrices, ok.
That is, we represent A, as ak(m), as matrix A and similarly bl(n), the set of this orthonormal
basis vectors if we represent in the form of matrix B. Then both A and B themselves should
be unitary matrices. And we have said that if they are unitary matrices, then A A*^T = A*^T A, which should be equal to the identity matrix. So if this holds true, in that case we say that the
transformation that we are going to have is a separable transformation.
545
And we are going to see next that how this separable transformation helps us to reduce the
computational complexity. So in the original form we had the computational complexity of
the order N4. And we will see that whether this computational complexity can be reduced
from, from the order N4. Now in most of the cases what we do is we assume these two
matrices A and B to be same and that is how these are decided.
So if I take both A and B to be the same, then the transformation equation can be written in the form
v(k,l) = Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} a_{k,l}(m,n) u(m,n).
So compare this with our earlier expression, where we had a_{k,l}(m,n). Now this a_{k,l}(m,n) we are separating into two components, one is a(k,m) and the other one is a(l,n), and this is possible because the transformation is separable. So because this is a separable transformation, we can write
v(k,l) = Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} a(k,m) u(m,n) a(l,n),
where again in this case both m and n will vary from 0 to N-1. And in matrix form, this equation can be represented as V = A U A^T, where U is the input image of dimension N×N and V is the coefficient matrix, again of dimension N×N. The matrix A is also of dimension N×N.
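To make the separable form concrete, here is a short Python/NumPy check (my own sketch, with an arbitrarily chosen unitary A) that V = A U A^T gives exactly the same coefficients as the direct double-sum definition.

    import numpy as np

    # Compare the separable matrix form with the direct definition
    # v(k,l) = sum_m sum_n a(k,m) u(m,n) a(l,n).
    N = 4
    idx = np.arange(N)
    A = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)   # an example unitary matrix
    U = np.arange(N * N, dtype=float).reshape(N, N)                 # a small test image

    V_fast = A @ U @ A.T                                            # separable form

    V_direct = np.zeros((N, N), dtype=complex)
    for k in range(N):
        for l in range(N):
            for m in range(N):
                for n in range(N):
                    V_direct[k, l] += A[k, m] * U[m, n] * A[l, n]

    assert np.allclose(V_fast, V_direct)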
In the same manner, for the inverse transformation: what we have got is the coefficient matrix, and by inverse transformation we want to obtain the original image matrix from the coefficient matrix. So in the same manner, the inverse transformation can now be written as
u(m,n) = Σ_{k=0}^{N-1} Σ_{l=0}^{N-1} a*(k,m) v(k,l) a*(l,n).
So this is the expression for the inverse transformation, and again as before this inverse
transformation can be represented in the form of a matrix equation, where the matrix equation
will look like this: U = A*^T V A*. And these are called two dimensional separable
transformations. So we find that from our original expressions, we have now brought it to an
expression in the form of separable transformations.
So we find that, for this particular expression, if you go back to our previous slide, we have written V = A U A^T. So if I just write it in the form V = A U A^T, I get the coefficient matrix V from our original image matrix U by using the separable transformation.
The same equation we can also represent in the form V^T = A[AU]^T. Now what does this
equation mean, you find that here what it says that if I compute A, the matrix multiplication
of A and U, take the transpose of this then pre multiply that result with the matrix A itself
then what we are going to get is the transpose of the coefficient matrix V.
So if I analyze this equation it simply indicates that this two dimensional transformation can
be performed by first transforming each column of U with matrix A and then transforming
each row of the result to obtain the rows of the coefficient matrix V. So that is what is meant by this particular expression. So AU, what it does is, it transforms each column of the input image U with the matrix A.
And this intermediate result you get, you transform each row of this again with matrix A and
that gives you the rows of the transformation matrix or the rows of the coefficient matrix V.
And so if I take the transpose of this final result what we are going to get is the set of
coefficient matrix that we want to have. Now if I analyze this particular expression you find
that A is a matrix of dimension NxN.
U is also a matrix of the same dimension NxN. And then from matrix algebra, we know that
if I want to multiply two matrices of dimension NxN, then the complexity or the number of
additions or multiplications that we have to do is of order N3.
So here, to perform this first multiplication, we have to have of the order of N^3 number of multiplications and additions. The resultant matrix is also of dimension N×N. And the second matrix multiplication that we want to perform, that is A[AU]^T, will also need of the order of N^3
number of multiplications and additions. So the total number of addition and multiplication
that we have to perform when I implement this as a separable transformation is nothing but of
order 2N3.
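The row-column decomposition can be checked in the same way; the sketch below (again my own, under the same assumed unitary matrix A) confirms V^T = A (A U)^T and prints the rough operation counts being compared.

    import numpy as np

    # V^T = A (A U)^T: column transforms first, then row transforms.
    N = 8
    idx = np.arange(N)
    A = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
    U = np.random.default_rng(1).random((N, N))

    V = A @ U @ A.T
    V_T = A @ (A @ U).T
    assert np.allclose(V_T, V.T)

    print("two matrix products:", 2 * N**3, "multiply-adds (approx.)")
    print("direct evaluation:  ", N**4, "multiply-adds (approx.)")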
And you compare this with our original configuration when we had seen that the number of
addition and multiplication that has to be done is of order N4. So what we have obtained in
this particular case is a reduction of the computational complexity by a factor of N. So this simply indicates that if the transformation is done in the form of a separable transformation, the amount of computation comes down from the order of N^4 to the order of 2N^3, thank you.
548
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-23.
Basis Images.
Hello, welcome to the video lecture series on digital image processing. Now let us see that
what is meant by the basis images. So what is meant by basis image. Now here we assume
that a*_k denotes the kth column of the matrix A conjugate transpose, that is, the kth column of A*^T, where A is the transformation matrix. Similarly, a*_l is the lth column of the matrix A*^T. So if I take the product a*_k a*_l^T, then I get a matrix A*_{k,l}, ok. And let us also define the inner product of say two
NxN matrices, so I define inner product of two NxN matrices, say F and G.
This inner product is defined in the form ⟨F, G⟩ = Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} f(m,n) g*(m,n).
549
(Refer Slide Time: 3:16)
So now by using these two definitions now if I rewrite our transformation equations, so now
we can write the transformation equation as before: you find that the old expression that we have written is
v(k,l) = Σ_{m=0}^{N-1} Σ_{n=0}^{N-1} u(m,n) a_{k,l}(m,n).
So this is nothing but as per our definition so if you just look at this definition. This is nothing
but an expression of an inner product. So this was the expression of the inner product. So this
transformation equation is nothing but an expression of an inner product and this inner
product is the inner product of the image matrix U with the transformation matrix A*k,l .
Similarly if I write the inverse transformation u(m,n) which is given as again in the form of
u(m,n) = Σ_{k=0}^{N-1} Σ_{l=0}^{N-1} v(k,l) a*_{k,l}(m,n).
So again we find that in the matrix form this will be written as U = Σ_{k=0}^{N-1} Σ_{l=0}^{N-1} v(k,l) A*_{k,l}, where
both k and l vary from 0 to capital N-1. So if you look at this particular expression you find
that our original image matrix now is represented by a linear combination of N2 matrices A*k,l
. Because both k and l vary from 0 to N-1, so I have N2 such matrices A k,l .
And by looking at this expression you find that our original image matrix U is now
represented by a linear combination of N^2 matrices A*_{k,l}, where each of these N^2 matrices is of dimension N×N. And these matrices A*_{k,l} are known as the basis images. So this particular
derivation simply says that the purpose of image transformation is to represent an input image
in the form of linear combination of a set of basis images.
Now, to see how these basis images look, let us see some of the images. Here we find that we have shown two sets of images; we will see later that these are the basis images of dimension 8x8. So here we have shown basis images of dimension 8x8, and there are in total 8x8, that is 64, basis images. We will see later that in case
of discrete Fourier transformation we get two components one is the real component, other
one is the imaginary component.
551
So accordingly we have to have two basis images one corresponds to the real component,
other one corresponds to the imaginary component. Similarly, this is another basis image
which corresponds to the discrete cosine transformation. So again, here I have shown the basis images of size 8x8; of course, the displayed image is quite expanded, and again we have 8x8, that is 64, images.
So here we find that a row of this represents the index k and the column indicates the index l.
So again we have 64 images, each of these 64 images is of size 8x8 pixels. Similarly, we
have the basis images for other transformations like Walsh Transform, Hadamard Transform
and so on. So once we look at the basis images, so the purpose of showing these basis images
is that as we said that the basic purpose of image transformation is to represent an input
image as linear combination of a set of basis images.
And when we take this linear combination, each of these basis images will be weighted by
the corresponding coefficient, the transformation coefficient v(k,l) that we compute after the
transformation. And as we have said that this v(k,l) is nothing but the inner product of (k,l)th
basis image. So when we compute this v(k,l), as we have seen earlier, so if you just look at
this, this v(k,l) which is represented as inner product of the input image U and the (k,l)th
basis image A*k,l .
So each of these coefficients v(k,l) is actually given by v(k,l) = ⟨U, A*_{k,l}⟩. And because this is the inner product of the input image U and the (k,l)th basis image A*_{k,l}, this is also called the projection of the input image on the (k,l)th basis image. So this is also called the
projection of the input image u onto the (k,l)th basis image A*k,l , ok.
And this also shows that any NxN image, any image, input image of size, any input image U
of size NxN can be expanded using a complete set of N2 basis images. So that is the basic
purpose of the image transformation. So let us take an example.
553
So let us consider an example of this transformation. Say we have been given a
transformation matrix A = (1/√2)[1 1; 1 -1] and we have the input image matrix U = [1 2; 3 4].
And in this example will try to see that how this input image U can be transformed with this
transform matrix A and the transformation coefficients that you get, if I take the inverse
transformation of that we should be able to get back our original input image U. So given
this, the transformed image, we can compute the transformed image like this: the transformation gives
V = (1/2) [1 1; 1 -1] [1 2; 3 4] [1 1; 1 -1].
So if you just see our expressions, you find that our expressions were something like this: when we computed V, we had computed V = A U A^T, ok. So by using that we have A, then U, then A^T, and by the nature of this transformation matrix A, you find that A^T is nothing but the same as A. So we will have the product written above. And if you do this matrix computation, it will simply come out to be (1/2) [4 6; -2 -2] [1 1; 1 -1], and on completing this matrix multiplication the final coefficient matrix V will come out to be [5 -1; -2 0]. So I get the coefficient matrix V = [5 -1; -2 0], ok.
554
Now let us see that what is the, for this particular transformation what will be the
corresponding basis images. Now when we define the basis images, you remember that we
have said that we have assumed a*_k to be the kth column of the matrix A*^T, ok. Now using the same concept, our basis images were taken as A*_{k,l}, where the (k,l)th basis image was computed as A*_{k,l} = a*_k a*_l^T.
So this is how we had computed, how we had defined, the basis images. So using the same concept in this particular example, where the transformation matrix is given as A = (1/√2)[1 1; 1 -1], I can compute the basis images. A*_{0,0}, the (0,0)th basis image, will be simply (1/√2)[1; 1] times (1/√2)[1 1], so this will be nothing but (1/2)[1 1; 1 1].
Similarly, we can also compute A*_{0,1}, that is the (0,1)th basis image, which will be given as (1/2)[1 -1; 1 -1]; its transpose is A*_{1,0}, that is the (1,0)th basis image, (1/2)[1 1; -1 -1]. And similarly, we can also compute A*_{1,1}, that is the (1,1)th basis image, which will come out to be (1/2)[1 -1; -1 1]. So this is simply by the matrix
multiplication operations we can compute these basis images from the rows of, from the
columns of A*T.
555
Now to see that what will be the result of inverse transformation, you remember the
transformation coefficient matrix V we had obtained as [5 -1; -2 0]. So this was our coefficient matrix. By inverse transformation, what we get is U = A*^T V A*, which, substituting these values, becomes (1/2)[1 1; 1 -1][5 -1; -2 0][1 1; 1 -1]. And if you compute this matrix multiplication, the result will be U = [1 2; 3 4], which is nothing
but our original image matrix U. So here again you find that by the inverse transformation we
get back our original image U and we have also found that what are the basis images the four
basis images A*_{0,0}, A*_{0,1}, A*_{1,0} and A*_{1,1} for this particular transformation matrix A which has
to be operated on the image matrix U and we have also seen that by the inverse
transformation, we can get back the original image matrix U.
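The whole worked example can be reproduced in a few lines of Python/NumPy; the sketch below is my own check, not part of the lecture material. It computes V, the four basis images and the reconstruction of U as a weighted sum of those basis images.

    import numpy as np

    A = np.array([[1, 1], [1, -1]]) / np.sqrt(2)      # the 2 x 2 unitary matrix of the example
    U = np.array([[1, 2], [3, 4]], dtype=float)       # the input image

    V = A @ U @ A.T                                   # coefficient matrix, [[5, -1], [-2, 0]]

    # Basis images A*_{k,l} = a*_k a*_l^T, with a*_k the k-th column of A*T.
    A_star_T = A.conj().T
    basis = {(k, l): np.outer(A_star_T[:, k], A_star_T[:, l])
             for k in range(2) for l in range(2)}

    # The image is the linear combination of basis images weighted by v(k, l).
    U_rec = sum(V[k, l] * basis[(k, l)] for k in range(2) for l in range(2))
    assert np.allclose(U_rec, U)

    # The inverse transform A*T V A* recovers U as well.
    assert np.allclose(A.conj().T @ V @ A.conj(), U)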
556
(Refer Slide Time: 18:56)
Now let us look further in this separable transformations. So what we had in our case is we
had U as the original image matrix and after transformation, we get V as the coefficient
matrix. And you would remember that both these matrices are of dimension NxN. Now what
we do is for both these matrices U and V, we represent them in the form of vectors by row
ordering. That is, we concatenate one row after another.
So by this row ordering, what we are doing is we are transforming this matrix of dimension
NxN to a vector of dimension capital N2. And by this row ordering the vector that we get let
us represent this by the variable say u. So by row ordering the input image matrix is mapped
to a vector say u, similarly by row ordering the matrix coefficient matrix V is also
represented by v.
Now, once we do this, this transformation equation can also be written as v = (A ⊗ A) u, ok. This Kronecker product of A and A can be represented as ₳, so the transformation is written as v = ₳u. Similarly, the inverse transformation can also be written as u = (A ⊗ A)*^T v, where the symbol ⊗ represents the Kronecker product. And the matrix ₳,
which is equal to the kronecker product of the two matrices A and A, this is also a unitary
matrix. So once we do this, then you find that our two dimensional transformation after doing
this row ordering of the input image U and the coefficient matrix V, once they are
represented as one dimensional vectors of dimension N2, so this two dimensional image
transformation is now represented in the form of, in a one dimensional transformation form.
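A quick Python/NumPy sketch of this row-ordering view (my own illustration): numpy.kron builds A ⊗ A, and applying it to the row-ordered image gives the row-ordered coefficient matrix.

    import numpy as np

    N = 4
    idx = np.arange(N)
    A = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)   # an example unitary matrix
    U = np.random.default_rng(2).random((N, N))

    V = A @ U @ A.T
    u_vec = U.reshape(-1)                    # row ordering: rows concatenated into one vector
    v_vec = np.kron(A, A) @ u_vec            # one-dimensional form of the 2D transform
    assert np.allclose(v_vec, V.reshape(-1))

    # The Kronecker product of two unitary matrices is again unitary.
    K = np.kron(A, A)
    assert np.allclose(K @ K.conj().T, np.eye(N * N))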
557
(Refer Slide Time: 22:49)
So by this, what we have is that any arbitrary one dimensional signal, say x, can now be transformed as y = ₳x. And we say that this particular transformation is separable, where ₳ is the transformation matrix, if this transformation matrix ₳ can be represented as the Kronecker product of two matrices A1 and A2.
Now, if we represent this in this form, then it can be shown that if both A1 and A2 are of dimension N×N, then, because ₳ is the Kronecker product of A1 and A2, ₳ will be of dimension N^2 × N^2, that is, a total of N^4 elements. So the amount of computation that we have to do for this matrix multiplication will again be of order N^4.
And because this transformation ₳ is separable and can be represented as the Kronecker product of A1 and A2, you find that this particular operation can now be obtained using of the order of N^3 operations. So this again says that if a
transformation matrix is represented as kronecker product of two smaller matrices then we
can reduce the amount of computation, thank you.
558
559
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-24.
Fourier Transform.
Hello, welcome to the video lectures series on digital image processing. In the last two
classes, we have seen the basic theories of unitary transformations and we have seen we have
analyzed the computational complexity of the unitary transformation operations particularly
with respect to the image transformations. We have explained the separable unitary
transformation.
We have explained how separable unitary transformation helps to implement the fast
transformations and fast transformation implementation as we have seen during our last class.
It reduces the computational complexity of the transformation operations. After giving the
general unitary introduction to the general unitary transformations, in today’s lecture, we are
going to discuss about the Fourier transformation, which is a specific case of the unitary
transformation.
560
(Refer Slide Time: 1:25)
So during today’s lecture, we will talk about the Fourier transformation and we will talk
about Fourier transformation both in the continuous domain as well as in discrete domain.
We will see what are the properties of the Fourier transformation operations and we will also
see that what is meant by Fast Fourier Transform, that is fast implementation of the Fourier
transformation operation.
Now this Fourier Transformation operation, we have discussed in brief when we have
discussed about the sampling theorem that is giving given an analog image or continuous
image while discretization. The first stage of discretization was sampling the analog image.
So during our discussion on sampling, we have talked about the Fourier Transformation and
there we have said that Fourier Transformation gives you the frequency components present
in the image.
And for sampling, we must meet the condition that your sampling frequency must be greater
than twice the maximum frequency present in the continuous image. In today’s lecture, we
will discuss about the Fourier Transformation in greater details. So first let us see what is
meant by the Fourier Transformation. As we have seen earlier, that if we assume, a function
say f(x).
561
(Refer Slide Time: 2:51)
So we will first talk about the Fourier Transformation in the continuous domain, and if we
assume that f(x) is a continuous function, so this f(x) is a continuous function of some
variable say x, then the Fourier transformation of this function f(x), which is also written as capital F(u), is given by the integral expression
F(u) = ∫_{-∞}^{∞} f(x) e^{-j2πux} dx.
Here the integration is carried over from minus infinity to infinity. Now this variable u, this is
the frequency variable. So given a function f(x), a continuous function f(x), by using this
integration operation, we can find out the Fourier transformation or the Fourier transform of
this continuous function f(x) and the Fourier transform is given by F(u).
Now for doing this continuous Fourier transformation, this function f(x) has to meet some
requirement. The requirement is, the function f(x) must be continuous, it must be continuous
and it must be integrable. So if f(x) meets these two requirements, that is f(x) is continuous
and integrable then using this integral operation, we can find out the Fourier transformation
of this continuous function f(x).
Similarly, we can also have the inverse Fourier transformation, that is given the Fourier
transform F(u) of a function f(x), and if F(u) is integrable, then we can find out the inverse Fourier transform of F(u), which is nothing but the continuous function f(x). This is given by a similar integration operation, and now it is
f(x) = ∫_{-∞}^{∞} F(u) e^{j2πux} du,
and this integration again has to be carried out from minus infinity to infinity.
So from f(x) using this integral operation we can get the Fourier transformation which is the
F(u) and if F(u) is integrable then using the inverse Fourier transformation, we can get back
the original continuous function f(x) and these two expressions that is F(u) and f(x), the
expressions for F(u) and expression for f(x), these two expressions are known as Fourier
transform pairs.
So these two are known as Fourier transform pairs. Now from this expression, you find that
because for doing the Fourier transformation, what we are doing is we are taking the function
f(x) multiplying it with an exponential e -j2πux and integrating this over the interval -∞ to ∞. So
naturally this expression F(u) that you get is in general complex because e -j2πux , this quantity
is a complex quantity.
Or the same F(u) can also be written in the form |F(u)| e^{jφ(u)}, where the modulus of F(u), the modulus of this complex quantity F(u), is nothing but [R^2(u) + I^2(u)]^{1/2}. Ok? And this is what is known as the Fourier spectrum of f(x). So this we call the Fourier spectrum of the function f(x), and this φ(u) is given by tan^{-1}[I(u)/R(u)].
This is what is called the phase angle. This is the phase angle. So from this, we get what is
known as the Fourier spectrum of f(x), which is nothing but the magnitude of the Fourier transform F(u), and tan^{-1}[I(u)/R(u)], which is the phase angle for a particular value of u.
Now there is another term which is called the power spectrum. The power spectrum of the function f(x), which is also represented as P(u), is nothing but |F(u)|^2, and if you expand this, it will be simply R^2(u) + I^2(u). So you get the power spectrum, we get the Fourier
spectrum and we also get the phase angle from the Fourier transformation coefficients.
And this is what, we have in case of 1 dimensional image because we have because we have
taken a function f(x) which is a function of single variable x. Now because in our case, we are
discussing about the image processing operations and we have already said that the image is
nothing but a 2 dimensional function, which is a function of two variables x and y, so we
have to discuss about the Fourier transformation in 2 dimension rather than in single
dimension.
564
(Refer Slide Time: 10:46)
So when you go for 2 dimensional Fourier transformation, so you talk about 2D Fourier
Transform. The 1 dimensional Fourier transform that we have discussed just before can be
easily extended to 2 dimension in the form that now in this case, our function is a 2
dimensional function f(x, y) which is a function of two variables x and y and the Fourier
transform of this f(x, y) is now given by
F(u, v) = ∫∫_{-∞}^{∞} f(x, y) e^{-j2π(ux + vy)} dx dy.
So we find that from one dimension we have easily extended this to the two dimensional Fourier transformation, and now the integration has to be taken over x and y because our image is a two dimensional image, which is a function of two variables x and y. So the forward transformation is given by the expression F(u, v) = ∫∫_{-∞}^{∞} f(x, y) e^{-j2π(ux + vy)} dx dy.
In the same manner, the inverse Fourier transformation, so you can take the inverse Fourier
transformation to get f(x, y), that is the image from its Fourier transform coefficient F(u,v) by
taking a similar integral operation, and in this case it will be f(x, y) = ∫∫_{-∞}^{∞} F(u, v) e^{j2π(ux + vy)} du dv.
565
(Refer Slide Time: 13:33)
So for this 2 dimensional signal, the Fourier spectrum is given by |F(u,v)| = [R^2(u,v) + I^2(u,v)]^{1/2}, where as before R gives you the real component and I gives you the imaginary component.
So this is the Fourier spectrum of the 2 dimensional signal f(x, y). We can get the phase angle in the same manner: the phase angle φ(u,v) is given by tan^{-1}[I(u,v)/R(u,v)], and the power spectrum, in the same manner, we get as P(u,v) = |F(u,v)|^2, which is nothing but R^2(u,v) + I^2(u,v).
So you find that all these quantities which we had defined in case of the single dimensional
signal is also applicable in case of the 2 dimensional signal that is f(x, y). Now to illustrate
this Fourier transformation, let us take an example.
566
(Refer Slide Time: 15:10)
Suppose we have a continuous function like this, the function f(x, y) which is again a
function of two variables x and y and the function in our case is like this that f(x, y) assumes
a value, a constant value say capital A, for all values of x lying between 0 to X and all values
of y lying between 0 to Y. So what we get is a rectangular function like this where all values
of x > X, the function value is zero and all values of y > Y, the function value is also zero.
And between 0 to X and 0 to Y, the value of the function is equal to A. Let us see, how we
can find out the Fourier transformation of this particular 2 dimensional signal.
567
(Refer Slide Time: 16:18)
So to compute the Fourier transformation, we follow the same expression. We have said that
F(u,v) is nothing but ∫∫_{-∞}^{∞} f(x, y) e^{-j2π(ux + vy)} dx dy. Now in our case, this f(x, y) is equal to A within the region 0 ≤ x ≤ X, 0 ≤ y ≤ Y, and outside this region the value of f(x, y) = 0. So you can break this particular integral into a product form: it will be the same as A times the integration over x, which in this particular case is ∫_0^X e^{-j2πux} dx, multiplied by the integration over y, ∫_0^Y e^{-j2πvy} dy.
So if I compute these two integrals and evaluate them at their limits, you will find that the result takes the value
F(u,v) = AXY [sin(πuX)/(πuX)] e^{-jπuX} [sin(πvY)/(πvY)] e^{-jπvY}.
So after doing all these integral operations I get an expression like this.
568
(Refer Slide Time: 19:42)
So from this expression, if you compute the Fourier spectrum, the Fourier spectrum will be
something like this. So what we are interested in is the, Fourier spectrum. So the Fourier
spectrum, that is |F(u,v)|, will be given by
|F(u,v)| = AXY |sin(πuX)/(πuX)| |sin(πvY)/(πvY)|.
So this is what is the
Fourier spectrum of the Fourier transformation that we have got.
Now if we plot the Fourier spectrum, the plot will be something like this. So this is the plot of this Fourier spectrum. So you find that
this is again a 2 dimensional function. Of course in this case the spectrum that has been
shown is shifted, so that the spectrum comes within the range for its complete visibility.
So for a rectangular function, rectangular 2 dimensional function, you will find that the
Fourier spectrum will be something like this and you can find out that if I say that this is the
u-axis and this is the v-axis, and assuming the centre to be at the origin, you will find that along the u-axis, at the points 1/X, 2/X and so on, the value of this Fourier spectrum will be equal to 0. Similarly, along the v-axis, at the values 1/Y, 2/Y, the values of this spectrum will also be
equal to 0.
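These zero crossings are easy to confirm numerically. The short sketch below is my own, using NumPy's normalised sinc, where numpy.sinc(t) = sin(πt)/(πt), and illustrative values of A, X and Y that are not from the lecture.

    import numpy as np

    # |F(u,v)| = A X Y |sin(pi u X)/(pi u X)| |sin(pi v Y)/(pi v Y)| for the rectangle function.
    A_amp, X, Y = 1.0, 2.0, 3.0      # illustrative values

    def spectrum(u, v):
        return A_amp * X * Y * np.abs(np.sinc(u * X)) * np.abs(np.sinc(v * Y))

    print(spectrum(0.0, 0.0))        # peak value A*X*Y at the origin
    print(spectrum(1 / X, 0.0))      # zero at u = 1/X
    print(spectrum(0.0, 2 / Y))      # zero at v = 2/Y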
So what we get is the Fourier spectrum and the nature of the Fourier spectrum of the
particular 2 dimensional signal. Now so far what we have discussed, is the case of the
continuous functions or analog functions. But in our case, we have to be interested in the case
for discrete images or digital images where the functions are not continuous but the functions
are discrete.
So all these integration operations that we are doing in case of the continuous functions, they
will be replaced by the corresponding summation operations. So when you go for the 2
dimensional signal, ok? So in case of discrete signals, the discrete Fourier transformation will be of this form, with the integrations now replaced by summations:
F(u,v) = (1/MN) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux/M + vy/N)},
because our images are of size M×N, and because our images are discrete, the frequency variables u and v are also going to be discrete.
So the frequency variables u will vary from 0, 1 up to M-1 and the frequency variable v will
similarly, vary from 0, 1, up to N-1. So this is what is the forward discrete Fourier transform.
Forward 2 dimensional discrete Fourier transformation. In the same manner we can also
obtain, the inverse Fourier transformation for this 2 dimensional signal.
So the inverse Fourier transformation will be given by
f(x,y) = Σ_{u=0}^{M-1} Σ_{v=0}^{N-1} F(u,v) e^{j2π(ux/M + vy/N)}.
And obviously, this will give you back the digital image f(x, y), the discrete image
where x will now vary from 0 to M-1 and y will now vary from 0 to N-1. So we have
formulated these equations, in a general case where the discrete image is represented by a 2
dimensional array of size MxN. Now as is said, that in most of the cases, the image is mostly
represented in the form of square array where M = N.
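Before specialising to square images, the M×N definition above can be sanity-checked against a library FFT. The sketch below is my own; note that numpy.fft.fft2 applies no scale factor on the forward transform, so the 1/(MN) factor used in the lecture is added by hand.

    import numpy as np

    M, N = 4, 4
    f = np.random.default_rng(3).random((M, N))

    F_direct = np.zeros((M, N), dtype=complex)
    for u in range(M):
        for v in range(N):
            for x in range(M):
                for y in range(N):
                    F_direct[u, v] += f[x, y] * np.exp(-2j * np.pi * (u * x / M + v * y / N))
    F_direct /= (M * N)                                   # the 1/(MN) factor of the lecture

    assert np.allclose(F_direct, np.fft.fft2(f) / (M * N))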
So if the image is represented in the form of a square array, in that case these transformation equations will be represented as
F(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux+vy)/N}.
And similarly, the inverse Fourier transform f(x, y) will be given by
f(x,y) = (1/N) Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} F(u,v) e^{j2π(ux+vy)/N}.
where the number of rows and the number of columns are same.
And as we have discussed earlier, e^{j2π(ux + vy)/N} is what gives us the basis images. This we have discussed when we talked about the unitary transformation, and we have shown at that time that these basis images will look like this.
572
(Refer Slide Time: 28:05)
So as the Fourier transformation as we have seen that it is a complex quantity, so for the
Fourier transformation we will have two basis images. One basis image corresponds to the
real part, the other basis image corresponds to the imaginary part and these are the two basis
images, one for the real part and the other one for the imaginary part.
Now as we have defined the Fourier transform, the Fourier spectrum, the phase the power
spectrum in case of analog image. All these quantities can also be defined, are also defined in
the case of discrete image in the same manner.
573
So in case of this discrete image, the Fourier spectrum is given by the similar expression |F(u,v)| = [R^2(u,v) + I^2(u,v)]^{1/2}. The phase is given by φ(u,v) = tan^{-1}[I(u,v)/R(u,v)], and the power spectrum P(u,v) is given by the similar expression P(u,v) = |F(u,v)|^2 = R^2(u,v) + I^2(u,v),
where R(u,v) is the real part of the Fourier coefficient and I (u,v) is the imaginary part of the
Fourier coefficient. So after discussing about this Fourier transformation both in the forward
direction and also in the reverse direction. Let us look at how these Fourier transform
coefficients look like.
So here we have the result on one of the images and you find that this is a popular image, a
very popular image which is cited in most of the image processing textbooks that is the image
of Lena. So if you take the discrete Fourier transformation of this particular image, the right
hand side, this one shows that DFT which is given in the form of an intensity plot and the
bottom one that is this particular plot, is the 3 dimensional plot of the DFT coefficients. Here
again when these coefficients are plotted, it is shifted so that the origin is shifted at the center
of the plane so that you can have a better view of all these coefficients.
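For readers who want to reproduce such displays, a minimal Python/NumPy sketch (my own; any image array can be substituted for the random one used here) computes the spectrum, phase and power spectrum and centres the DC term the way these plots do.

    import numpy as np

    f = np.random.default_rng(4).random((8, 8))    # stand-in for an image such as Lena

    F = np.fft.fft2(f)                             # DFT coefficients (scale factor aside)
    spectrum = np.abs(F)                           # |F(u,v)| = sqrt(R^2 + I^2)
    phase = np.arctan2(F.imag, F.real)             # phi(u,v) = arctan(I / R)
    power = spectrum ** 2                          # P(u,v) = |F(u,v)|^2

    F_centered = np.fft.fftshift(F)                # shift the origin (DC term) to the centre
    print(spectrum[0, 0], np.abs(F_centered[4, 4]))   # both show the DC coefficient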
Here we find that at the origin, the intensity of the coefficient or the value of the coefficient is
quite high compared to the values of the coefficients as you move away from the origin. So this indicates that the Fourier coefficient is maximum, at least for this particular image, at the origin, that is when u = 0, v = 0, and later on we will see that u = 0, v = 0 gives you what is
the DC component of this particular image.
And in most of the images the DC component is maximum and as you move towards the
higher frequency components, the energy of the higher frequency signals is less compared to
the DC component. Thank you.
575
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-25.
Properties of Fourier Transform.
Hello, welcome to the video lecture series on digital image processing. So after discussing
about all these Fourier transformation, the inverse Fourier transformation and looking at how
the Fourier coefficients look like, let us see some of the properties of these Fourier
transformation operations. So now, we will see some of the properties, important properties
of Fourier transformation. So the first property that we will talk about is the separability.
576
(Refer Slide Time: 1:03)
Now if you analyze the expression of the Fourier transformation, where you have said that
the Fourier transformation is
F(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux+vy)/N}.
Now you find that this particular expression of the Fourier transformation can be rewritten in the form
F(u,v) = (1/N) Σ_{x=0}^{N-1} e^{-j2πux/N} · N [ (1/N) Σ_{y=0}^{N-1} f(x,y) e^{-j2πvy/N} ].
So it is the same Fourier expression, but now we have separated the variables x and y into
two different summation of operations. So the first summation operation, you will find that it
involves the variable x and the second summation operation involves the variable y. Now if
you look at this function f(x,y) for which we are trying to find out the Fourier
transformations.
Now the second summation operation where the summation is taken over y, where y varies
from 0 to N-1, you will find that in this f(x,y) if we keep the value of x to be fixed, that is for
a particular value of x, the defined values of f(x,y) that represents nothing but a particular row
of the image. So in this particular case, for a particular value of x, if I keep x to be fixed, so
for a fixed value of x, this f(x,y) represents a particular row of the image which is nothing but
an 1 dimensional signal.
577
So by looking at that, what we are doing is, we are transforming the rows of image and
different rows of the image for different values of x. So after expansion or elaboration of this
particular expression, the same expression now gets converted to
F(u,v) = (1/N) Σ_{x=0}^{N-1} F(x,v) e^{-j2πux/N},
where I represent the inner row transform, together with the multiplication term N, as
F(x,v) = N [ (1/N) Σ_{y=0}^{N-1} f(x,y) e^{-j2πvy/N} ].
the second summation, the second summation operation gives you the Fourier transformation
of the different rows of the image and that Fourier transformation of the different rows which
now we represent by F(x,v). Ok?
This x is the index of a particular row, and the outer summation then takes these intermediate Fourier coefficients and, on these Fourier coefficients, performs the Fourier transformation over the columns to give us the complete Fourier transformation F(u,v).
So the first operation that we are performing, is the Fourier transformation over of different
rows of the image multiplying this intermediate result by the factor of N and then this
intermediate result intermediate Fourier transformation matrix that we get, we further take the
Fourier transformation of different columns of this intermediate result to get the final Fourier
transformation.
578
So graphically, we can represent this entire operation like this that, this is our x-axis, this is
our y-axis. I have an image f(x,y). So first of all, what we are doing is we are taking the
Fourier transformation along the row. So we are doing row transformation. And after doing
row transformation, we are multiplying all these intermediate values by a factor N. so you
multiply by the, by the factor N.
And this gives us, the intermediate Fourier transformation coefficients which now we
represent as F(x,v). So you get one of the frequency components which is v. And then, what
we do is we take this intermediate result and initially we had done row transformation and
now we do column transformation. And after doing this column transformation what we get
is, so here it will be x and it will be axis v and we get the final result as u, v and our final
transformation coefficients will be F(u,v).
Of course, this is the origin (0,0). All these values are N-1. Here also it is (0,0). This is N-1,
N-1. Here also it is (0,0). Here it is N-1, here it is N-1. So you find using this separability
property, what we have done is, this 2 dimensional Fourier operation is now converted into
two 1 dimensional Fourier transformation Fourier operations.
So in the first case what we are doing is, we are doing the 1 dimensional Fourier
transformation operation over different rows of the image and the intermediate result that you
get, that you multiply with the dimension of the image which is N, and this intermediate
result, you take and now you do again 1 dimensional Fourier transformation across the
different columns of this intermediate result and then you finally get the 2 dimensional
Fourier transformation coefficient.
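In Python/NumPy this row-column scheme is a one-liner per step; the sketch below (my own check, scale factors left out since fft2 applies none) shows that row transforms followed by column transforms reproduce the full 2D DFT.

    import numpy as np

    f = np.random.default_rng(5).random((8, 8))

    step1 = np.fft.fft(f, axis=1)        # 1D DFT of every row
    step2 = np.fft.fft(step1, axis=0)    # 1D DFT of every column of the intermediate result
    assert np.allclose(step2, np.fft.fft2(f))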
579
(Refer Slide Time: 10:06)
So again as before, you will find that this second operation, this is nothing but inverse
discrete Fourier transformation along a row. Ok? So the second expression, this gives you
the inverse Fourier transformation along the row and when you finally convert this and get
the final expression, this will be
f(x,y) = (1/N) Σ_{u=0}^{N-1} N f(u,y) e^{j2πux/N},
where f(u,y) denotes the intermediate result of the inverse transformation along the rows.
In the same manner, in the inverse Fourier transformation, we can also take the Fourier coefficient array, do the inverse Fourier transformation along the rows, and on all the intermediate results that you get, in the second step, do the inverse discrete Fourier transformation along the columns. And these two operations complete the inverse Fourier transformation of the 2 dimensional array to give you the 2 dimensional signal f(x,y).
580
So because of this separability property, we have been able to convert the 2 dimensional
Fourier transformation operation into two 1 dimensional Fourier transformation operations.
And because now it has to be implemented as 1 dimensional Fourier transformation
operation, so the operation is much more simple than in case of 2 dimensional Fourier
transformation operation.
Now let us look at the second property of this Fourier transformation. The second property
that we will talk about is the translation property. Translation property says that if we have a
2 dimensional signal f(x,y) and translate this by a (xo,yo). So along x direction, you translate it
by xo and along y direction you translate it by yo. So the function that you get is f(x-xo, y-yo).
So if I take the Fourier transformation of this translated signal f(x-x0, y-y0), how will the Fourier transformation look? So you can find out the Fourier transformation of this translated signal, and let us call this Fourier transformation Ft(u,v). So going by the similar expression, this will be nothing but
Ft(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x-x0, y-y0) e^{-j2π(ux + vy)/N}.
Ok? So if I expand this, what I will get is
Ft(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux+vy)/N} · e^{-j2π(ux0+vy0)/N},
by simply expanding this particular expression. So here, if you consider the first expression, that is
(1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux+vy)/N},
this particular term is nothing but our Fourier transformation F(u,v). So by doing this translation, what you get is the final expression: Ft(u,v) will come in the form F(u,v) e^{-j2π(ux0+vy0)/N}
. So this is the final expression of this translated signal that we get. So if I compare, if you
compare these 2 expressions, F(u,v) and Ft(u,v) you will find that the Fourier spectrum of the
signal after translation does not change because the magnitude of this Ft(u,v) and the
magnitude of F(u,v) will be the same.
So because of this translation, what you get is only it introduces some additional phase
difference. So whenever f(x,y) is translated by (x0,y0), the additional phase term which is introduced is e^{-j2π(ux0+vy0)/N}. But otherwise, the magnitude of the Fourier transformation, that is the Fourier spectrum, remains unaltered.
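This can be verified numerically; the sketch below is my own check, using a circular shift so that the translation is consistent with the periodicity of the DFT. It confirms that the spectrum is unchanged and that only the stated phase factor appears.

    import numpy as np

    N = 8
    f = np.random.default_rng(6).random((N, N))
    x0, y0 = 2, 3
    f_shifted = np.roll(f, shift=(x0, y0), axis=(0, 1))    # f(x - x0, y - y0), periodically

    F = np.fft.fft2(f)
    Ft = np.fft.fft2(f_shifted)
    assert np.allclose(np.abs(Ft), np.abs(F))              # Fourier spectrum unchanged

    u, v = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    assert np.allclose(Ft, F * np.exp(-2j * np.pi * (u * x0 + v * y0) / N))   # phase factor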
In the same manner, if we talk about the inverse Fourier transformation, the inverse Fourier
transformation of F(u-u0, v-v0), this will give rise to f(x,y) e^{j2π(u0x+v0y)/N}. So this says that if f(x,y) is multiplied by this exponential term, then its Fourier transformation is going to be displaced by (u0, v0).
And this is the property which will we will use later on to find out that how the Fourier
transformation coefficients can be better visualized. So here in this case, we get the Fourier
transformation, the forward Fourier transformation and the inverse Fourier transformation
with translation and you will find that and we have found that a shift in f(x,y) say (xo,yo) does
not change the Fourier spectrum of the signal.
What we get is just an additional phase term gets introduced in the Fourier spectrum. Thank
you.
583
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-26.
FT Result Display-2.
Hello, welcome to the video lecture series on digital image processing. In our last lecture, we
have started discussion on the Fourier transformation and towards the end, we have seen
some of the properties of Fourier transformation.
So what we have done in the last class is, we have talked about the Fourier transformation
both in the continuous and the discrete domain and we have talked about some of the
properties of the Fourier transformation like the separability property and the translation
property.
Today we will continue with our lecture on the Fourier transformation and we will see the
other properties of the Fourier transformation and we will talk about how to implement
Fourier transformation in a faster way, that is we will talk about the fast Fourier
transformation algorithm.
584
(Refer Slide Time: 1:21)
So in today’s lecture, we will see the properties of the discrete Fourier transformation,
specifically the periodicity and conjugate property of the Fourier transformation.
We will talk about the rotation property of the Fourier transformation, we will see the
distributivity and the scaling property of the Fourier transformation followed by the
convolution and the correlation property of the Fourier transformation. And then we will talk
about an implementation, a fast implementation of the Fourier transformation which is called
Fast Fourier Transform.
585
So first let us see, what just try to repeat what we have done in the last class. So in the last
class, we have talked about the separability. We have talked about the separability of the
Fourier transformation and here we have seen that given our 2 dimensional signal f(x,y) in
the discrete domain, that is samples of this 2 dimensional signal f(x,y), we can compute the
Fourier transformation of f(x,y) as
F(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux+vy)/N},
where our original signal f(x,y) is an N×N array of samples. And if I rearrange this particular expression, then this expression can be written in the form
F(u,v) = (1/N) Σ_{x=0}^{N-1} e^{-j2πux/N} · N [ (1/N) Σ_{y=0}^{N-1} f(x,y) e^{-j2πvy/N} ].
So in the inner summation, it is taken from y = 0 to N-1 and the outer summation is taken
from x = 0 to N-1 and here we have seen that this inner summation, this gives the Fourier
transformation of different rows of the input image f(x,y) and the outer summation, this outer
summation gives the Fourier transformation of different columns of the intermediate result
that we have obtained.
So the advantage of the separability property that we have seen in the last classes because,
because of this separability property we can do the Fourier transformation, 2 dimensional
Fourier transformation in two steps. In the first step we take the Fourier transformation of
every individual row of the input image array and in the second step we can take the Fourier
transformation of every column of the intermediate result that has been obtained in the first
step.
586
(Refer Slide Time: 5:46)
So now, the implementation of the 2 dimensional Fourier transformation becomes very easy.
So the scheme that we have said in the last class, is like this, if I have an input array given by
f(x,y). Here this is the x-dimension, this is the y-dimension. So first what we do is, we do row
transformation that is take Fourier transformation of every row of the input image, multiply
the result by N.
So what I get is an intermediate result array and this intermediate result array gives Fourier
transformation of different rows of the input image. So this is represented as F(x,v) and this is
my x dimension and this becomes the v dimension. And after getting this intermediate result,
I take the second step of the Fourier transformation and now the Fourier transformation is
taken for every column.
So we do column transformations and that gives us the final result of the 2 dimensional
Fourier transformation F(u,v). So this becomes my u axis, the frequency axis u, this becomes
the frequency axis v and of course this is the origin (0,0). So it shows that because of the
separability property, now the implementation of the 2 dimensional Fourier transformation
has been simplified because the 2 dimensional Fourier transformation can now be
implemented as two step of 1 dimensional Fourier transformation operations.
And that is how we get this final Fourier transformation F(u,v) in the form of the sequence of
1 dimensional Fourier transformations. And we have seen in the last class that the same is
also true for inverse Fourier transformation. Inverse Fourier transformation is also separable.
So given an array F(u,v), we can do first inverse Fourier transformation of every row
followed by inverse Fourier transformation of every column, and that gives us the final output in the form of f(x,y), which is the image array.
So this is the advantage that we get because of separability property of the Fourier
transformation. The second one, the second property that we have discussed in the last class
is the translation property. So this translation property says that if we have an input image
f(x,y), input image that is f(x,y) then translate this input image by (xo,yo). So what we get is a
translated image f(x-xo, y-yo).
So if we take the Fourier transformation of this, we have found that the Fourier
transformation of this translated image which we had represented as Ft(u,v), this became
equal to F(u,v) e^{-j2π(ux0+vy0)/N}. So in this case the Fourier transformation of the translated image is F(u,v), that is, the Fourier transformation of the original image f(x,y), multiplied by e^{-j2π(ux0+vy0)/N}.
So if we consider the Fourier spectrum of this particular signal, you will find that the Fourier spectrum |Ft(u,v)| will be the same as |F(u,v)|. Now this term e^{-j2π(ux0+vy0)/N} simply introduces an additional phase shift. But the Fourier spectrum remains unchanged, and in the same manner, if the Fourier transform F(u,v) is translated by (u0,v0).
So instead of taking F(u,v), we take F(u- uo, v-vo) which obviously is the translated version
of F(u,v) where F(u,v) has been translated by vector (uo,vo) in the frequency domain. And if I
take the inverse Fourier transform of this, the inverse Fourier transform will be f(x,y) e^{j2π(u0x+v0y)/N}.
So this also can be derived in the same manner in which we have done the forward Fourier
transformation. So here we find that if f(x,y) is multiplied by this exponential term e^{j2π(u0x+v0y)/N}, then in the frequency domain its Fourier transform is simply translated by the vector (u0,v0). So you get F(u-u0, v-v0).
So under this translation property, the DFT pairs become: if we have f(x,y) e^{j2π(u0x+v0y)/N}, the corresponding Fourier transformation of this is F(u-u0, v-v0); and if we have the translated image f(x-x0, y-y0), the corresponding Fourier transformation will be F(u,v) e^{-j2π(ux0+vy0)/N}.
So these are the Fourier transform pairs under translation. So these two expressions give you the Fourier transform pairs, the DFT pairs under
translation. So these are the two properties that we have discussed in the last lecture. Today
let us talk about some other properties. So the third property that we will talk about today is
the periodicity and conjugate property.
So the first one that we will discuss is the periodicity and the conjugate property. The
periodicity property says that both the discrete Fourier transform and the inverse discrete
Fourier transform, that is the DFT and the IDFT, are periodic with a period N. So let us see how this periodicity can be proved. So this periodicity property says that F(u,v), this is the Fourier
transform of our signal f(x,y), this is equal to F(u+N,v) which is same as F(u,v+N) which is
same as F(u+N,v+N).
So this is what is meant by periodic. So if find that the Fourier transformation F(u,v) is
periodic both in x direction and in y direction that give rise to F(u,v) = F(u+N,v+N) which is
same as F(u+N, v) and which is also same as F(u,v+N). Now let us see how we can derive or
we can prove this particular property. So you have seen, the Fourier transformation
expression as we have discussed many times:
F(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux+vy)/N}.
Of course, we have to have the scaling factor 1/ N where both x and y vary from 0 to N-1.
Now if we try to compute F(u+N, v+N) then what do we get? Following the same expression,
this will be nothing but
F(u+N, v+N) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux+vy+Nx+Ny)/N},
because now u is replaced by u+N and v is replaced by v+N, so you will have the extra terms Nx+Ny. Here both x and y will vary from 0 to N-1.
Now this same expression, if we take out this N x+N y in a separate exponential then this will
take the form
F(u+N, v+N) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux+vy)/N} · e^{-j2π(x+y)}.
Now if you look at the second exponential term, that is e^{-j2π(x+y)}, you find that x and y are integer values. So x plus y will always be an integer, so this is an exponential of some integer multiple of 2π, and hence the value of this second exponential will always be equal to 1. So finally what we get is
(1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux+vy)/N}.
And you find that this is exactly the expression of F(u,v). So as we said, the discrete Fourier transformation is periodic with period N both in the u direction as well as the v direction, and that can very easily be proved by this mathematical derivation. We have found that F(u+N, v+N) = F(u,v).
And the same is true in case of inverse Fourier transformation, so if we derive the inverse
Fourier transformation then we will get the similar result showing that the inverse Fourier
transformation is also periodic with period N. Now the other property that we said is the
Conjugate property.
(Refer Slide Time: 19:36)
The conjugate property says that if f(x,y) is a real-valued function, then the Fourier transformation satisfies F(u,v) = F*(-u,-v), where F* indicates the complex conjugate. And obviously, because of this, if I take the Fourier spectrum, |F(u,v)| = |F(-u,-v)|. So this is what is known as the conjugate property of the discrete Fourier transformation.
Now we find that the periodicity property helps us to visualize the Fourier spectrum of a given signal. So let us see how this periodicity property helps us to properly visualize the Fourier spectrum. For this, we will consider a 1 dimensional signal; obviously this can very easily be extended to a 2 dimensional signal. So by this, what we mean is, if we have a 1 dimensional signal say f(x) whose Fourier transform is given by F(u), then as we said, the periodicity property says that F(u) is equal to F(u+N). And also the Fourier spectrum |F(u)| is the same as |F(-u)|. So this says that F(u) has a period of length N, and because the spectrum |F(u)| is the same as |F(-u)|, the magnitude of the Fourier transform is centered at the origin. So by this what we mean is, let us
consider a figure like this.
You will find that this is a typical Fourier transform of a particular signal, and here you find that the Fourier spectrum, the Fourier transform, is centered at the origin. If you look at the frequency axis, this is the u axis, you will find that the magnitude of F(-u) is the same as the magnitude of F(u). So this figure shows that the transform values in the range N/2 + 1 to N-1 are nothing but the transform values of the half period to the left of the origin. So just looking at this, the transform values from N/2 + 1 to N-1, you find that these values are nothing but reflections of the half period to the left of the origin, 0. But what we have done is we have computed the Fourier transformation in the range 0 to N-1. So you will get all the Fourier coefficients in the range 0 to N-1, that is, the Fourier coefficients for values of u from 0 to N-1.
And because of this conjugate property, you will find, we find that in this range 0 to N-1,
what we get is two back to back half periods of this interval. So this is nothing but two back
to back half periods. So this is one half period, this is one half period and they are placed
back to back. So to display this Fourier transformation coefficient in the proper manner, what
we have to do is we have to displace the origin by a value, N/2.
So by displacement, what we get is this. So here we find that in this particular case, the origin has been shifted to N/2. So now, instead of considering the Fourier transformation F(u), we are considering the Fourier transformation F(u - N/2), and for this displacement, what we have to do is multiply f(x) by (-1)^x. So every f(x) has to be multiplied by (-1)^x, and if you take the DFT of this result, then what we get is the Fourier transformation coefficients in this particular centered form, because e^{j2π(N/2)x/N} = e^{jπx} = (-1)^x. And this comes from the shifting (translation) property of the Fourier transformation. So this operation we have to do if we want to go for the proper display of the Fourier transformation coefficients. Thank you.
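Before moving on, here is a minimal sketch (not from the lecture) of this centring operation for a 1 dimensional signal: multiplying f(x) by (-1)^x before taking the DFT puts the zero-frequency term in the middle of the array, which is exactly what NumPy's fftshift does when applied to the unmodified transform. The test signal below is an assumed example.

import numpy as np

N = 8
x = np.arange(N)
f = np.cos(2 * np.pi * x / N) + 0.5                   # assumed example signal f(x)

F_centred_by_sign = np.fft.fft(f * (-1.0) ** x)       # DFT of (-1)^x f(x), i.e. F(u - N/2)
F_centred_by_shift = np.fft.fftshift(np.fft.fft(f))   # reorder F(u) so that u = 0 sits at index N/2

# Both give the spectrum with the DC term in the middle of the array
print(np.allclose(F_centred_by_sign, F_centred_by_shift))   # True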
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-27.
Rotation Invariance Property.
Hello. Welcome to the video lecture series on Digital Image Processing. Now Rotation
property. Rotation property of the discrete Fourier transformation. So to explain this rotation
property, we will introduce the polar coordinate system; that is, we will now replace x by x = r cosθ, y by y = r sinθ. Ok? u will be replaced by u = ω cosφ and v will be replaced by v = ω sinφ.
So by this, now our original 2 dimensional signal, 2 dimensional array in the plane f(x,y) gets
transformed into f(r,θ) and the Fourier transformation F(u,v), the Fourier transform
coefficients F(u,v) now gets transformed into F(ω,φ) . Now using these polar coordinates, if
we find out, compute the Fourier transformation, then it will be found that, f(r,θ+θo) , the
corresponding Fourier transformation will be given by F(ω,φ+θo) .
So this will be the Fourier transformation pair in the polar coordinate system. This indicates that our original signal was f(r,θ); if I rotate this f(r,θ) by an angle θ0, then the rotated image becomes f(r,θ+θ0), and if I take the Fourier transform of f(r,θ+θ0), that is the image rotated by an angle θ0, then the Fourier transform becomes F(ω,φ+θ0), where F(ω,φ) was the Fourier transform of the original image f(r,θ). So this simply says that if I rotate an image f(x,y) by an angle say θ0, its Fourier transformation will also be rotated by the same angle θ0, and that is obvious from this particular expression, because f(r,θ+θ0) gives rise to the Fourier transformation F(ω,φ+θ0), where F(ω,φ) was the Fourier transformation of f(r,θ).
So by rotating an input image, by an angle theta naught, the corresponding Fourier transform
is also rotated by the same angle theta naught. So we will illustrate this, let us come to this
particular figure, so here we find that we had a rectangle, an image where we have all values,
we have, pixel values equal to 1 within a rectangle and outside this, the pixel values are equal
to 0.
And the corresponding Fourier transformation is this; here the Fourier transformation coefficients, or the Fourier spectrum, are represented in the form of intensity values in an image. The second pair shows the same rectangle, now rotated by an angle of 45°; so here we have rotated this rectangle by an angle of 45°, and if you compare the Fourier transformation of the original rectangle and the Fourier transformation of this rotated rectangle, you find that the Fourier transform coefficients are also rotated by the same angle of 45°. So this illustrates the rotation property of the discrete Fourier transformation.
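A rough sketch of this illustration (not part of the lecture) is given below: it builds a rectangle image, rotates it by 45° with scipy.ndimage.rotate, and compares the magnitude spectra. The rectangle size and position are assumed values, and because rotating a sampled image needs interpolation and suffers boundary effects, the two spectra agree only approximately.

import numpy as np
from scipy.ndimage import rotate

N = 128
img = np.zeros((N, N))
img[44:84, 54:74] = 1.0                                  # assumed rectangle of 1s on a 0 background

spec = np.abs(np.fft.fftshift(np.fft.fft2(img)))         # spectrum of the original rectangle

img45 = rotate(img, angle=45, reshape=False, order=1)    # image rotated by 45 degrees
spec45 = np.abs(np.fft.fftshift(np.fft.fft2(img45)))     # spectrum of the rotated rectangle

# Rotating the original spectrum by the same 45 degrees gives (approximately) the
# spectrum of the rotated image; discretisation and interpolation cause small errors.
spec_rot = rotate(spec, angle=45, reshape=False, order=1)
err = np.linalg.norm(spec_rot - spec45) / np.linalg.norm(spec45)
print(f"relative difference ~ {err:.2f}")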
The next property that we will talk about is, what is called distributivity and scaling property.
The distributivity property says that if I take two signals, two arrays f1(x,y) and f2(x,y). So
these are two arrays. Take the summation of these two arrays f1(x,y) and f2(x,y) and then you
find out the Fourier transformation of this particular result that is f1(x,y) + f2(x,y) and take the
Fourier transform of this. Now this Fourier transformation will be same as the Fourier
transformation of f1(x,y) + Fourier transformation of f2(x,y).
So this is true under addition that is, for these two signals f1(x,y) and f2(x,y), if I take the
addition, if I take the summation and then take the Fourier transformation. The Fourier
transformation of this will be the summation of the Fourier transformation of individual
signals f1(x,y) and f2(x,y). But if I take the multiplication, that is if I take f1(x,y).f2(x,y) and take the Fourier transformation of this product, this in general is not equal to F1(u,v).F2(u,v).
So this shows, that the discrete Fourier transformation and same is true for the inverse Fourier
transformation; so this shows that the discrete Fourier transformation and its inverse is
distributive over addition but the discrete Fourier transformation and its inverse is in general
not distributive over multiplication. So that distributivity property is valid for addition of the
two signals but it is not in general valid for multiplication of two signals.
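As a quick numerical check of this distributivity statement (an illustrative sketch, not from the lecture; the two arrays are assumed random test data):

import numpy as np

rng = np.random.default_rng(1)
f1 = rng.random((16, 16))    # assumed test array
f2 = rng.random((16, 16))    # assumed test array

F = np.fft.fft2
print(np.allclose(F(f1 + f2), F(f1) + F(f2)))   # True: DFT is distributive over addition
print(np.allclose(F(f1 * f2), F(f1) * F(f2)))   # False in general: not distributive over multiplication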
So the next property of the same discrete Fourier transform that we will talk about is the
scaling property. The scaling property says that if we have two scalar quantities a and b. Now
given a signal f(x,y) multiply this by the scalar quantity a, its corresponding Fourier
transformation will be F(u,v) multiplied by the same scalar quantity a and the inverse is also
true.
So if I multiply a signal by a scalar quantity a and take its Fourier transformation then you
will find that Fourier transformation of this multiplied signal is nothing but the Fourier
transformation of the original signal multiplied by the same scalar quantity and the same is
true for the reverse that is also for inverse Fourier transformation.
And the second one is, if I take f(ax, by), that is, you scale the individual dimensions: x is scaled by the scalar quantity a and the dimension y is scaled by the scalar quantity b. The corresponding Fourier transformation will be (1/|ab|) F(u/a, v/b), and the reverse relation holds similarly. So these are the scaling properties.
Now we can also compute the average value of the signal f(x,y). The average value of f(x,y), if I represent it like this, is nothing but f̄ = (1/N²) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y). So this is the average value of the signal f(x,y). Now let us look at the Fourier coefficient, the transform coefficient F(0,0).
What is this coefficient? This is nothing but F(0,0) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y), because all the exponential terms become 1 when u = v = 0, and this summation is taken for x and y varying from 0 to N-1. So you find that there is a direct relation between the average of the 2 dimensional signal f(x,y) and its zeroth Fourier coefficient, the DFT coefficient F(0,0).
So this clearly shows that the average value of f(x,y) is nothing but (1/N) F(0,0). And because here the frequency u equals 0 and the frequency v equals 0, F(0,0) is nothing but the DC component of the signal. So the DC component divided by N gives you the average value of the particular signal.
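A minimal check of this relation (not from the lecture; the array is an assumed test signal). Note that the lecture defines F(u,v) with a 1/N factor in front, whereas NumPy's fft2 has no such factor, so with NumPy the average is F[0,0]/N².

import numpy as np

N = 32
rng = np.random.default_rng(2)
f = rng.random((N, N))           # assumed N x N signal

F = np.fft.fft2(f)               # NumPy's DFT has no 1/N factor in front
# Lecture's convention: F(0,0) = (1/N) * sum(f), so mean = F(0,0)/N.
# NumPy's convention:   F[0,0] = sum(f),         so mean = F[0,0]/N**2.
print(np.allclose(F[0, 0] / N**2, f.mean()))    # True: the DC term gives the average value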
The next property, which we have already discussed in one of our earlier lectures on sampling and quantization, is the convolution property. The convolution property says that if we have two signals f(x) and g(x) and we multiply them, f(x)·g(x), then in the frequency domain this is equivalent to the convolution F(u)∗G(u). Similarly, if I take the convolution of the two signals f(x) and g(x), the corresponding Fourier transformation in the Fourier domain will be the multiplication of F(u) and G(u). So the convolution of two signals in the spatial domain is equivalent to multiplication of the Fourier transformations of the same signals in the frequency domain. On the other hand, multiplication of two signals in the spatial domain is equivalent to convolution of the Fourier transforms of the same signals in the frequency domain.
So this is what is known as the convolution property. The other one is called the correlation property. The correlation property says that if we have two signals f(x,y) and g(x,y), so now we are taking 2 dimensional signals, and if I take the correlation of these two signals f(x,y) and g(x,y), in the frequency domain this will be equivalent to the multiplication F*(u,v)·G(u,v), where this * indicates the complex conjugate. And similarly, if I take the multiplication in the spatial domain, that is f*(x,y)·g(x,y), in the frequency domain this will be equivalent to the correlation of F(u,v) and G(u,v). So these are the two properties,
which are known as the convolution property and the correlation property of the Fourier
transformations. So with this we have discussed, the various properties of the discrete Fourier
transformation.
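A small sketch verifying the convolution and correlation theorems for the DFT in 1 dimension (illustrative, not from the lecture). The DFT theorems hold for circular convolution and circular correlation, so those are computed directly; the signals are assumed random test data.

import numpy as np

rng = np.random.default_rng(3)
N = 16
f = rng.random(N)               # assumed 1-D signals
g = rng.random(N)

# Circular convolution computed directly in the spatial domain
conv = np.array([sum(f[m] * g[(n - m) % N] for m in range(N)) for n in range(N)])
# Convolution theorem: DFT of the circular convolution = F(u) * G(u)
print(np.allclose(np.fft.fft(conv), np.fft.fft(f) * np.fft.fft(g)))                 # True

# Circular correlation computed directly in the spatial domain
corr = np.array([sum(np.conj(f[m]) * g[(n + m) % N] for m in range(N)) for n in range(N)])
# Correlation theorem: DFT of the circular correlation = conj(F(u)) * G(u)
print(np.allclose(np.fft.fft(corr), np.conj(np.fft.fft(f)) * np.fft.fft(g)))        # True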
Now let us see an implementation of the Fourier transformation. If you look at the expression of the Fourier transformation, the expression we have written many times, it is F(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) e^{-j2π(ux+vy)/N}. So if I analyze this particular expression, which we have done earlier also in relation with the unitary transformation, you will find that this takes N⁴ number of computations.
In case of a 1 dimensional signal, F(u) = (1/N) Σ_{x=0}^{N-1} f(x) e^{-j2πux/N}, where we have to scale by 1/N. This takes N² number of computations for all the N coefficients. So we find that a computational complexity of N² for a data set of size N is quite high.
So for implementation, we have discussed earlier that if our transformations are separable in
that case we can go for fast implementation of the transformations. Let us see how that fast
implementation can be done in case of this discrete Fourier transformation. So because of the
separability property, we can implement this discrete transformation in a faster way.
So for that, what I do is, let us represent this particular expression F(u) = (1/N) Σ_{x=0}^{N-1} f(x) e^{-j2πux/N} in the form F(u) = (1/N) Σ_{x=0}^{N-1} f(x) W_N^{ux}. Now here this W_N is nothing but e^{-j2π/N}.
So we have simply introduced this term for simplification of our expressions. Now if I assume, which generally is the case, that the number of samples N is of the form 2^n, then this N can be represented as 2M, where M is a positive integer. And let us see how this particular assumption helps us.
And with this assumption, now we can rewrite F(u) as F(u) = (1/2M) Σ_{x=0}^{2M-1} f(x) W_{2M}^{ux}, because N = 2M. The same expression I can rewrite as F(u) = (1/2) [ (1/M) Σ_{x=0}^{M-1} f(2x) W_{2M}^{u(2x)} + (1/M) Σ_{x=0}^{M-1} f(2x+1) W_{2M}^{u(2x+1)} ]. Here also x varies from 0 to M - 1.
Now by this you see that what we have done. f(2x) as x varies from 0 to M - 1, this gives us
only the even samples of our input sequence. Similarly f(2x + 1) as x varies from 0 to M - 1
this gives us only the odd samples of the input sequence. So we have simply separated out the
even samples from the odd samples.
And if I further simplify, this expression can now be written in the form F(u) = (1/2) [ (1/M) Σ_{x=0}^{M-1} f(2x) W_M^{ux} + (1/M) Σ_{x=0}^{M-1} f(2x+1) W_M^{ux} · W_{2M}^{u} ], using the identity W_{2M}^{u(2x)} = W_M^{ux}. So after some simplification, the same expression can be written in this particular form.
Now if you analyze this particular expression, you find that the first summation, this one
gives you the Fourier transform of all the even samples, so this gives you Feven(u) and this
quantity in the second summation, this gives you the Fourier transformation of all the odd
samples, so I will write it as Fodd(u). And in this particular case u varies from 0 to M - 1.
Ok? So by separating the even samples and odd samples, I can compute that the Fourier
transformation of the even samples to give me Feven(u), I can compute the Fourier
transformation of the odd samples to give me Fodd(u) and then I can combine these two to
give me the Fourier DFT coefficients of values from 0 to M - 1. Ok.
(Refer Slide Time: 22:34)
Now following some more properties, effectively what we have got is F(u) = (1/2) [ F_even(u) + F_odd(u) W_{2M}^{u} ]. Now, we can also show that W_M^{u+M} = W_M^{u}; this can be derived from the definition of W_M. And we can also find that W_{2M}^{u+M} = -W_{2M}^{u}. So this tells us that F(u+M) = (1/2) [ F_even(u) - F_odd(u) W_{2M}^{u} ]. So here again, u varies from 0 to M-1, which means this gives us the coefficients from M to 2M-1. So I get back all the coefficients. The first part gives us the coefficients from 0 to M-1 and this half gives us the coefficients from M to 2M-1.
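To make this even/odd splitting and the two combination formulas concrete, here is a minimal recursive sketch (illustrative, not the lecture's own code). It keeps the lecture's 1/N scaling, assumes the length is a power of 2, and is written for clarity rather than speed.

import numpy as np

def fft_recursive(f):
    # Radix-2 decimation-in-time DFT with the lecture's 1/N scaling.
    # Assumes len(f) is a power of 2.
    N = len(f)
    if N == 1:
        return np.array(f, dtype=complex)
    M = N // 2
    F_even = fft_recursive(f[0::2])                  # DFT of the even samples, with 1/M scaling
    F_odd = fft_recursive(f[1::2])                   # DFT of the odd samples, with 1/M scaling
    W = np.exp(-2j * np.pi * np.arange(M) / N)       # W_{2M}^u for u = 0 .. M-1
    first_half = 0.5 * (F_even + F_odd * W)          # F(u),     u = 0 .. M-1
    second_half = 0.5 * (F_even - F_odd * W)         # F(u + M), u = 0 .. M-1
    return np.concatenate([first_half, second_half])

f = np.random.default_rng(4).random(16)              # assumed test sequence
print(np.allclose(fft_recursive(f), np.fft.fft(f) / 16))   # matches NumPy up to the 1/N factor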
Now what is the advantage that we have got? In our original formulation, we have seen that the number of complex multiplications and additions was of the order of N². Now we have divided the N samples into two halves. For each of the halves, when I compute the discrete Fourier transformation, the amount of computation will be N²/4. And the total amount of computation will be of the order of N²/2, considering the 2 halves separately. So straightaway we have got a reduction in the computation by a factor of 2. It is further possible that the odd half of the samples and the even half of the samples that we have got can be further subdivided. So from N/2, we can go to N/4, from N/4 we can go to N/8 number of samples.
From N/8 we can go to N/16 number of samples and so on, until we are left with only two
samples. So if I go further, breaking the sequence of samples into smaller sizes, compute the
DFTs of each of those smaller size samples and then combine them together, then you will
find that we can gain enormously in terms of amount of computation.
And it can be shown that for this fast Fourier transform implementation, the total number of computations is given by N log₂ N, where the logarithm is taken with base 2. So this gives an enormous gain in computation as against the N² computations needed for direct implementation of the discrete Fourier transformation. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-28.
DCT and Walsh Transform.
Hello. Welcome to the video lecture series on Digital Image Processing. We will talk about
the Discrete Cosine Transform, we will talk about the Discrete Walsh Transform, we will talk
about the Discrete Hadamard Transform and we will also see some properties of these
different transformation techniques.
Now during the last two classes, when we have talked about the Discrete Fourier Transformation,
Transformation, you might have noticed one thing, that this Discrete Fourier Transformation
is nothing but a special case of a class of transformations or a class of separable
transformations. Some of these discussions, we have done while we have talked about the
unitary transformations.
Now before we start our discussion on the Discrete Cosine Transformation or Walsh
Transformation or Hadamard Transform, let us have some more insight on this class of
transformations. Now as we said, that Discrete Fourier Transformation is actually a special
case of a class of transformations.
Let us see what that class of transformations is. You find that we define a transformation of this form: T(u,v) = Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y)·g(x,y,u,v), where both x and y vary from 0 to N-1. So we are assuming that our 2 dimensional signal f(x,y) is an N×N array. And the corresponding inverse transformation is given by f(x,y) = Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} T(u,v)·h(x,y,u,v),
where g(x,y,u,v) is called the forward transformation kernel and h(x,y,u,v) is called the inverse transformation kernel; these kernels are also called the basis functions. Now this class of transformations will be separable if we can write g(x,y,u,v) in the form g(x,y,u,v) = g1(x,u)·g2(y,v). So if g(x,y,u,v) can be written in the form g1(x,u)·g2(y,v), then this transformation will be a separable transformation. Moreover, if g1(x,u) and g2(y,v) are functionally the same, that means if I can write this as g1(x,u)·g1(y,v), then, because g(x,y,u,v) has been written as a product of two functionally identical functions, the transformation will be called symmetric in addition to being separable.
Now we find that for the 2 dimensional Discrete Fourier Transformation, we had g(x,y,u,v) of this form: g(x,y,u,v) = (1/N) e^{-j2π(ux+vy)/N}. So this was the forward transformation kernel in case of the 2 dimensional Discrete Fourier Transform or 2D DFT. Obviously this transformation is separable as well as symmetric because I can now write g(x,y,u,v) = g1(x,u)·g1(y,v), which is nothing but (1/√N) e^{-j2πux/N} · (1/√N) e^{-j2πvy/N}.
So you find that the first term g1(x,u) and the second term g1(y,v) are functionally the same; only the arguments differ, ux in one case and vy in the other. So obviously, this 2 dimensional Discrete Fourier Transformation is separable as well as symmetric. So as we said, the 2 dimensional Discrete Fourier Transformation represents a specific case of a class of transformations, and we had also discussed the same when we talked about the unitary transformation.
In today’s lecture, we will talk about some other transformations belonging to the same class.
The first transformation belonging to this class that we will talk about is called the Discrete
Cosine Transformation or DCT. Let us see what are the forward as well as inverse transform
kernels of this Discrete Cosine Transformation. So now let us talk about the discrete Cosine
Transformation or DCT.
In case of the Discrete Cosine Transformation, the forward transformation kernel g(x,y,u,v) is given by g(x,y,u,v) = α(u)·α(v)·cos[(2x+1)uπ/2N]·cos[(2y+1)vπ/2N], which is the same as the inverse transformation kernel h(x,y,u,v). So you find that in case of the Discrete Cosine Transformation, if you analyze this, both the forward transformation kernel and the inverse transformation kernel are identical. And not only that, these transformation kernels are separable as well as symmetric, because in this I can take g1(y,v) = α(v)·cos[(2y+1)vπ/2N] and g1(x,u) = α(u)·cos[(2x+1)uπ/2N]. So this Cosine Transformation is separable as well as symmetric.
And the inverse transformation kernel and the forward transformation kernel are identical. Now we have to see what the values of α(u) and α(v) are. Here α(u) is given by α(u) = √(1/N) for u = 0, and α(u) = √(2/N) for values of u equal to 1, 2, ..., N-1. So these are the values of α(u) for different values of u, and the values of α(v) for different values of v are similar.
Now using this forward and inverse Transformation kernels, let us see how the basis
functions or the basis images look like in case of Discrete Cosine Transform. So this figure
shows the 2 dimensional basis images or basis functions in case of Discrete Cosine
Transformation where we have shown the basis images for an 8x8 Discrete Cosine
Transformation or 8x8, 2 dimensional Discrete Cosine Transformation.
Now using these kernels, we can write the expression for the 2 dimensional Discrete Cosine Transformation in the form C(u,v) = α(u)·α(v) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y)·cos[(2x+1)uπ/2N]·cos[(2y+1)vπ/2N].
Now you find that there is one difference. In case of Forward Discrete Cosine
Transformation, the terms α(u) and α(v) were kept outside the summation, double
summation, whereas in case of inverse Discrete Cosine Transformation, the terms α(u) and
α(v) are kept inside the double summation.
The reason being, in case of the forward transformation the summation is taken over x and y varying from 0 to N-1, so the terms α(u) and α(v) are independent of the summation operation. Whereas in case of the inverse Discrete Cosine Transformation, the double summation is taken over u and v varying from 0 to N-1, so the terms α(u) and α(v) are kept inside the double summation operation.
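A minimal sketch of this forward 2D DCT (not from the lecture) is given below: it builds the separable cosine kernel with the α(u) values above and checks the result against SciPy's type-II DCT with 'ortho' normalisation, which uses the same α convention. The 8×8 test block is an assumed value.

import numpy as np
from scipy.fftpack import dct

def dct2_direct(f):
    # C(u,v) = a(u) a(v) sum_x sum_y f(x,y) cos((2x+1)u pi / 2N) cos((2y+1)v pi / 2N)
    N = f.shape[0]
    a = np.full(N, np.sqrt(2.0 / N))
    a[0] = np.sqrt(1.0 / N)                         # alpha(u) as defined above
    idx = np.arange(N)
    # 1-D kernel matrix K[u, x] = a(u) cos((2x+1) u pi / 2N); separability gives C = K f K^T
    K = a[:, None] * np.cos((2 * idx[None, :] + 1) * idx[:, None] * np.pi / (2 * N))
    return K @ f @ K.T

f = np.random.default_rng(5).random((8, 8))         # assumed 8x8 block
C = dct2_direct(f)
C_scipy = dct(dct(f, axis=0, norm='ortho'), axis=1, norm='ortho')
print(np.allclose(C, C_scipy))                      # True: matches SciPy's orthonormal DCT-II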
So using this 2 dimensional Discrete Cosine Transformation, let us see what kind of output we get for a given image. This figure shows the Discrete Cosine Transformation coefficients for the same image which is very popular in the image processing community, the image of Lena. The results are shown in two forms: in the first figure, the coefficients are shown as an intensity plot, a 2 dimensional array, whereas the other figure shows the same coefficients plotted as a surface in 3 dimensions. Now if you closely look at these output coefficients, you find that in case of the Discrete Cosine Transformation, the energy of the coefficients is concentrated mostly in a particular region near the origin, that is (0,0), which is more visible in the 3 dimensional plot.
So you find that here, in this particular case, the energy is concentrated in a small region in
the coefficient space near about the (0,0) coefficients. So this is a very, very important
property of the Discrete Cosine Transformation which is called energy compaction property.
Now among the other properties of Discrete Cosine Transformation which is obviously
similar to the Discrete Fourier Transformation as we have said that the Discrete Cosine
Transformation is separable as well as symmetric. It is also possible to have a faster
implementation of Discrete Cosine Transformation or FDCT in the same manner as we have
implemented FFT in case of Discrete Fourier Transformation.
The other important property of the Discrete Cosine Transformation is the periodicity
property. Now in case of Discrete Cosine Transformation, you will find that the periodicity is
not same as in case of Discrete Fourier Transformation. In case of Fourier Transformation,
we have said that the Discrete Fourier Transform is periodic with period N, where N is the
number of samples.
In case of the Discrete Cosine Transformation, the magnitude of the coefficients is periodic with a period 2N, where N is the number of samples. So the period in case of the Discrete Cosine Transformation is twice the period in case of the Discrete Fourier Transformation. And we will see later that this particular property helps to obtain data compression, and a smoother data compression, using the Discrete Cosine Transformation rather than the Discrete Fourier Transformation.
The other property which obviously helps the data compression using Discrete Cosine
Transformation is the energy compaction property because most of the signal energy or
image energy is concentrated in a very few number of coefficients near the origin or near the
(0,0) value in the frequency domain in the uv-plane.
So by coding a few coefficients we can represent most of the signal energy or most of the image energy, and that also helps in data compression using the Discrete Cosine Transformation, a property which is not normally found in case of the Discrete Fourier Transformation.
So after discussing all these different properties of the Discrete Cosine Transformation, let us go to the other transformation, the Walsh Transformation. So now let us discuss the Walsh Transform. In case of 1D, the Discrete Walsh Transform kernel is given by g(x,u) = (1/N) Π_{i=0}^{n-1} (-1)^{b_i(x)·b_{n-1-i}(u)}. So you find in this particular case that N gives you the number of samples and the lowercase n is the number of bits needed to represent x as well as u. So, N is the number of samples, the lowercase n is the number of bits needed to represent both x and u, and in this case the forward transformation kernel is given by the product expression g(x,u) = (1/N) Π_{i=0}^{n-1} (-1)^{b_i(x)·b_{n-1-i}(u)}.
(Refer Slide Time: 22:46)
Now in this particular case, the convention is that bk(z) represents the kth bit in the binary representation of z. So that is the interpretation of b_i(x). So using this, the forward Discrete Walsh Transformation will be given by W(u) = (1/N) Σ_{x=0}^{N-1} f(x) Π_{i=0}^{n-1} (-1)^{b_i(x)·b_{n-1-i}(u)}.
The inverse transformation kernel in case of this Discrete Walsh Transformation is identical with the forward transformation kernel: the inverse transformation kernel is h(x,u) = Π_{i=0}^{n-1} (-1)^{b_i(x)·b_{n-1-i}(u)}. And using this inverse transformation kernel, we can get the inverse Walsh transformation as f(x) = Σ_{u=0}^{N-1} W(u) Π_{i=0}^{n-1} (-1)^{b_i(x)·b_{n-1-i}(u)}.
So this is the inverse kernel and this is the inverse transformation. So here you find that for the discrete Walsh Transformation, the forward transformation and the inverse transformation are identical except for the multiplicative factor 1/N. But otherwise, because the transformations are identical, the same algorithm used to perform the forward transformation can also be used to perform the inverse Walsh transformation.
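A small sketch of this 1-D Walsh transform, written straight from the kernel definition above (illustrative only, not the lecture's code; the test sequence is an assumed value). It builds the kernel matrix from bit products and uses the fact that the same kernel, without the 1/N factor, performs the inverse.

import numpy as np

def walsh_kernel(N):
    # Matrix G with G[u, x] = prod_i (-1)^(b_i(x) * b_{n-1-i}(u)), n = log2(N) bits
    n = N.bit_length() - 1                   # assumes N is a power of 2
    bit = lambda z, k: (z >> k) & 1          # b_k(z): k-th bit of z
    G = np.zeros((N, N))
    for u in range(N):
        for x in range(N):
            G[u, x] = (-1) ** sum(bit(x, i) * bit(u, n - 1 - i) for i in range(n))
    return G

N = 8
G = walsh_kernel(N)
f = np.arange(N, dtype=float)                # assumed test sequence
W = G @ f / N                                # forward Walsh transform, with the 1/N factor
f_back = G @ W                               # inverse uses the same kernel without 1/N
print(np.allclose(f_back, f))                # True: the same algorithm inverts the transform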
(Refer Slide Time: 25:19)
Now in case of a 2 dimensional signal, we will have the transformation kernel g(x,y,u,v) = (1/N) Π_{i=0}^{n-1} (-1)^{b_i(x)·b_{n-1-i}(u) + b_i(y)·b_{n-1-i}(v)}. And the inverse transformation kernel in this case is identical with the forward transformation kernel, so the inverse transformation kernel is given by h(x,y,u,v) = (1/N) Π_{i=0}^{n-1} (-1)^{b_i(x)·b_{n-1-i}(u) + b_i(y)·b_{n-1-i}(v)}.
So using this forward transformation kernel and the inverse transformation kernel, we find that the forward Discrete Walsh Transformation can be implemented as W(u,v) = (1/N) Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) Π_{i=0}^{n-1} (-1)^{b_i(x)·b_{n-1-i}(u) + b_i(y)·b_{n-1-i}(v)}, where the summation is taken over x and y varying from 0 to N-1. And in the same manner, because the forward transformation and the inverse transformation are identical in case of the discrete Walsh Transformation, I can take the same expression, replace f(x,y) by W(u,v), and take the summation over u and v varying from 0 to N-1.
What I get is the inverse Walsh transformation, and I get back the original signal f(x,y)
from the Transformation coefficients W(u,v). So you find that here the same algorithm which
is used for computing the Forward Walsh Transformation can also be used for computing the
inverse Walsh Transformation. So now let us see that what are the basis functions of this
Walsh Transformation? And what are the results on some image?
So for the Walsh transformation, the basis functions, or the set of basis images, appear like this. Here the basis images are given for a 4×4 2D Walsh Transformation, and if I apply this Walsh Transformation on the same image, say Lena, this is the kind of result that we get. So here again, you find a property similar to that of the Cosine Transformation: the coefficients near 0 have the maximum energy, and as you go away from the origin in the uv-plane, the energy of the coefficients reduces. So this transformation also has the energy compaction property, but here the energy compaction is not as strong as in case of the discrete Cosine Transformation. That is, the concentration of coefficient energy in this particular region is not as strong as the compaction of energy in case of the discrete Cosine Transformation. And by analyzing the forward as well as inverse Walsh Transform kernels, you can again find that this Walsh Transformation is separable as well as symmetric.
Not only that, for this Walsh Transformation it is also possible to have a fast implementation of the 2D Walsh transformation, almost in the same manner as we have done in case of the Discrete Fourier Transformation, where we computed the fast Fourier Transform or FFT.
(Refer Slide Time: 31:14)
And that is also true in case of the Discrete Cosine Transformation. So first you perform a 1 dimensional Walsh Transformation along the rows of the image, and then on the intermediate result you perform a 1 dimensional Walsh Transformation along the columns of the intermediate matrix. So you get the final transformation coefficients. The same is also true in case of the Discrete Cosine Transformation, because the Discrete Cosine Transformation is also separable. So to illustrate the faster implementation of the Walsh Transformation, I take the 1 dimensional case; here the fast implementation can be done in this form. I can write W(u) = (1/2)[W_even(u) + W_odd(u)]. In this case u varies from 0 to N/2 - 1 and M = N/2. So you find that almost in the same manner in which we have implemented the fast Fourier Transformation, the fast Discrete Walsh Transformation can also be implemented.
Here we divide all the samples of which the Walsh Transformation has to be taken into even
numbered samples and odd numbered samples. Compute the Walsh Transformation of the
even numbered samples. Compute the Walsh Transform of the odd numbered samples, then
combine these two intermediate results to give you the Walsh Transformation of the total
number of samples.
And because this division can be applied recursively: first I have N samples, I divide them into N/2 odd samples and N/2 even samples; each of these halves can again be divided into N/4 odd samples and N/4 even samples.
And if I continue this and finally I come to a stage where I am left with only 2 samples, I
perform the Walsh Transformation of those two samples, then hierarchically combine those
intermediate results to get the final Walsh Transformation. So here again by using this fast
implementation of the Walsh Transformation, you may find that the computational
complexity will be reduced drastically. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-29.
Hadamard Transformation.
Hello, welcome to the video lecture series on Digital Image Processing. The next transformation that we discuss is the Hadamard transformation. In case of the Hadamard transform, first let us consider the case in 1 dimension. The forward transformation kernel is given by g(x,u) = (1/N)(-1)^{Σ_{i=0}^{n-1} b_i(x)·b_i(u)}.
So again N as well as n have the same interpretation as in the case of the Walsh transformation. And using this forward transformation kernel, the forward Hadamard transformation can be obtained as H(u) = (1/N) Σ_{x=0}^{N-1} f(x)(-1)^{Σ_{i=0}^{n-1} b_i(x)·b_i(u)}.
And for Hadamard transform also the forward transformation as well as the inverse
transformation, they are identical. That is the forward transformation kernel and the inverse
transformation kernel, they are identical. So here again, the same algorithm can be used for
forward transformation as well as the inverse transformation.
(Refer Slide Time: 2:34)
So here, the inverse transformation kernel has the same form, h(x,u) = (-1)^{Σ_{i=0}^{n-1} b_i(x)·b_i(u)}, and using this, the inverse Hadamard transformation is obtained as f(x) = Σ_{u=0}^{N-1} H(u)(-1)^{Σ_{i=0}^{n-1} b_i(x)·b_i(u)}. So these are the forward and inverse Hadamard transformation expressions in 1 dimension.
Obviously this can be easily extended to 2 dimensions as in the other cases, where the 2 dimensional forward transformation kernel will be given by g(x,y,u,v) = (1/N)(-1)^{Σ_{i=0}^{n-1} [b_i(x)·b_i(u) + b_i(y)·b_i(v)]}. And similarly, the inverse transformation kernel h(x,y,u,v) is the same as g(x,y,u,v), that is (1/N)(-1)^{Σ_{i=0}^{n-1} [b_i(x)·b_i(u) + b_i(y)·b_i(v)]}.
So we find that the forward transformation kernel and the inverse transformation kernel in
case of 2 dimensional discrete Hadamard transformation are identical, so that gives us the
forward transformation and the inverse transformation for the 2 dimensional discrete
Hadamard transformation to be same which enables us to use the same algorithm or same
program to compute the forward transformation as well as the inverse transformation.
And if you analyze this, you will find that the Hadamard transformation is also separable and symmetric. That means, in the same manner, this 2 dimensional Hadamard transformation can be implemented as a sequence of 1 dimensional Hadamard transformations: for the image, first we implement the 1 dimensional Hadamard transformation over the rows of the image, and then implement the 1 dimensional Hadamard transformation over the columns of this intermediate matrix. And that gives you the final Hadamard transformation output.
Now let us further analyze the kernels of this Hadamard transformation. Because we have said that the 2 dimensional Hadamard transformation can be implemented as a sequence of 1 dimensional Hadamard transformations, we analyze further with respect to the 1 dimensional Hadamard transformation. So as we have seen, the 1 dimensional Hadamard transformation kernel is given by g(x,u) = (1/N)(-1)^{Σ_{i=0}^{n-1} b_i(x)·b_i(u)}. And let me mention here that all these summations in the exponent follow modulo 2 arithmetic; that means these summations are actually nothing but exclusive-OR (XOR) operations on the different bits. Now if I analyze this one dimensional forward Hadamard transformation kernel, you will find that if I omit the multiplicative term 1/N, then this forward transformation kernel leads to a matrix which is known as the Hadamard matrix.
(Refer Slide Time: 7:58)
So to see that what is this Hadamard matrix, you find that for different values of x and u, the
Hadamard matrix will look like this. So over here we have shown an Hadamard matrix for
Hadamard transformation of dimension 8.
So for N = 8, this Hadamard matrix has been formed and here, ‘+’ means it equals to +1 and
‘-’ means it is equal to -1. Now if you analyze this particular Hadamard matrix, you will find
that it is possible to generate a recursive relation. It is possible to formulate a recursive
relation to generate the transformation matrices. Now how that can be done?
You will find that if I consider these 4x4 blocks of elements, they are identical, whereas this 4x4 block of the matrix is just the negative of that one. And the same pattern can be observed in all other parts of this matrix. So by observing this, we can formulate a recursive relation to generate these transformation matrices.
(Refer Slide Time: 9:42)
So to have that recursive relation, let us first take the Hadamard matrix of the lowest order, that is for N = 2. For this lowest order we have the Hadamard matrix H2 = [ 1  1 ; 1  -1 ] (rows separated by semicolons). And then, using this recursively, a Hadamard matrix of dimension 2N can be obtained from a Hadamard matrix of dimension N by the relation H2N = [ HN  HN ; HN  -HN ].
So a Hadamard matrix of higher dimension can be recursively formed from a Hadamard matrix of
lower dimension. So this is a very, very important property of the Hadamard transformation.
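A short sketch of this recursive construction (not from the lecture) is given below; it builds the Hadamard matrix of order 8 from H2 and checks it against scipy.linalg.hadamard, which uses the same Sylvester construction.

import numpy as np
from scipy.linalg import hadamard

def hadamard_recursive(N):
    # Build the (unnormalised) Hadamard matrix of order N recursively; N must be a power of 2.
    if N == 1:
        return np.array([[1]])
    H = hadamard_recursive(N // 2)
    return np.block([[H, H], [H, -H]])      # H_{2N} = [H_N H_N; H_N -H_N]

H8 = hadamard_recursive(8)
print(np.array_equal(H8, hadamard(8)))      # True: matches SciPy's Hadamard matrix
print(np.array_equal(H8 @ H8.T, 8 * np.eye(8, dtype=int)))   # True: rows are orthogonal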
Now let us analyze the Hadamard matrix further; suppose I want to analyze this particular Hadamard matrix. Here you find that we can consider the number of sign changes along a particular column.
(Refer Slide Time: 10:58)
So you will find that the number of sign changes along column number 0 is equal to 0. The number of sign changes along column number 1 is equal to 7. The number of sign changes along column number 2 is equal to 3. Along column number 3, the number of sign changes is equal to 4. Along column 4 it is equal to 1. Along column 5, it is equal to 6. Along column 6 it is equal to 2. Along column 7 it is equal to 5.
So if I define the number of sign changes along a particular column as the sequency which is
similar to the concept of frequency in case of discrete Fourier transformation or in case of
discrete cosine transformation. So in case of Hadamard matrix, we are defining the number of
sign changes along a particular column, for a particular value of u as the sequency. So here
we find, that for value of u = 0, the sequency is equal to 0, u = 1, the sequency equal to 7, u =
2, the sequency equal to 3. So there is no straightforward relation between the value of u and
the corresponding sequency unlike in case of discrete Fourier transform or in case of discrete
cosine transform, where we have seen that increasing values of the frequency variable u
corresponds to increasing values of the frequency components.
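A tiny sketch (not from the lecture) that reproduces these sequency counts by counting the sign changes down each column of the order-8 Hadamard matrix (SciPy's Hadamard matrix is symmetric, so rows and columns give the same counts):

import numpy as np
from scipy.linalg import hadamard

H = hadamard(8)
# Sequency of column u = number of sign changes going down that column
sequency = [(np.diff(np.sign(H[:, u])) != 0).sum() for u in range(8)]
print(sequency)          # [0, 7, 3, 4, 1, 6, 2, 5], as listed above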
(Refer Slide Time: 13:10)
So if we want to have a similar kind of concept in case of the Hadamard transformation also, then what we need is some sort of reordering of this Hadamard matrix. And that kind of reordering can be obtained by another transformation whose kernel is given by g(x,u) = (1/N)(-1)^{Σ_{i=0}^{n-1} b_i(x)·p_i(u)}. And this particular term p_i(u) can be obtained from b_i(u) using these relations: p0(u) = b_{n-1}(u), p1(u) = b_{n-1}(u) + b_{n-2}(u), p2(u) = b_{n-2}(u) + b_{n-3}(u), and continuing like this, p_{n-1}(u) = b1(u) + b0(u). All these summations are again modulo-2 summations, that is, they can be implemented using binary exclusive-OR (XOR) operations.
(Refer Slide Time: 15:03)
Now by using this modification, the modified forward transformation kernel that you get leads to a modified Hadamard matrix. So let us see what this modified Hadamard matrix is. The modified Hadamard matrix that you get is of this particular form. And if you look at this modified Hadamard matrix, you will find that here the sequency for u = 0 is again equal to 0, the sequency for u = 1 is equal to 1, the sequency for u = 2 is equal to 2, and now for increasing values of u, we have increasing values of sequency. So using this modified or ordered Hadamard matrix forward kernel in 2 dimensions, the ordered Hadamard basis functions are obtained in this particular form.
(Refer Slide Time: 15:53)
And if I compare these ordered Hadamard basis functions with the Walsh basis functions, you will find that the basis images in case of the Walsh transformation and the basis images in case of the ordered Hadamard transformation are identical; there is only a difference in the ordering of the Walsh basis functions and the ordered Hadamard basis functions. Otherwise, the basis functions for the Walsh transformation and the ordered Hadamard transformation are identical, and because of this, in many cases the term Walsh-Hadamard transformation is used. And this term Walsh-Hadamard transformation is actually used to mean either the Walsh transformation or the Hadamard transformation; it means one of the two.
Now using this ordered Hadamard transformation, the results on the same image that you get
is something like this. So here you find, again if I look at the energy distribution of different
Hadamard coefficients. The ordered Hadamard coefficients, you will find that here the energy
is concentrated more towards zero compared to the Walsh transformation. So the energy
compaction property of the, of the ordered Hadamard transformation is more than the energy
compaction property of Walsh transformation.
Now this slide shows, the comparison of the transformation coefficients of these different
transformations. The first one shows the coefficient matrix for the Discrete Fourier
Transformation. The second one shows the matrix for the Discrete Cosine Transformation.
Third one is for Discrete Walsh Transformation and fourth one is for Discrete Hadamard
Transformation. It is ordered Hadamard Transformation.
Now by comparing all these four different results, you find that in case of Discrete Cosine
Transformation, the Discrete Cosine Transformation has the property of strongly
concentrating the energy in very few number of coefficients. So the energy compaction
property of Discrete Cosine Transformation is much more compared to the other
transformations.
And that is why this Discrete Cosine Transformation is very popular for the data compression
operations unlike the other cases. And in case of Discrete Fourier Transformation and
Discrete Cosine Transformation, though we can associate, the frequency term with the
transformation coefficients, it is not possible to have such a physical interpretation of the
coefficients of the Discrete Walsh Transformation nor in case of Discrete Hadamard
Transformation.
So though we cannot have such a physical interpretation, still, because of this energy compaction property, the Hadamard transform as well as the Walsh transform can have some application in data compression operations. So with this we come to the end of our discussion on the Discrete Cosine Transformation, Discrete Walsh Transformation and Discrete Hadamard Transformation, and we have also seen some comparison with the Discrete Fourier Transformation. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-30.
K-L Transform.
Hello, welcome to the video lecture series on Digital Image Processing. We will talk about another transform operation which is fundamentally different from the transformations that we have discussed in the last few classes. So the transformation that we will discuss today is called the K-L transformation. We will
see what is the fundamental difference between K-L Transform and other transformations.
We will see the properties of K-L Transform. We will see the applications of K-L Transform
for data alignment and data compression operations.
And we will also see the computation of K-L Transform for an image. Now as we said that
K-L Transform is fundamentally different from other transformations. So before we start
discussion on K-L Transform, let us see what is the difference. The basic difference, is in all
the previous transformations that we have discussed, that is whether it is the Fourier
Transformation or Discrete Cosine Transformation or Walsh transformation or Hadamard
Transformation.
(Refer Slide Time: 1:43)
In all these cases, the transformation kernel, whether it is the forward transformation kernel or the inverse transformation kernel, is fixed. So for example, in case of the Discrete Fourier Transformation or DFT, we have seen that the transformation kernel is given by g(x,u) = e^{-j2πux/N}.
Similarly, for the Discrete Cosine Transformation as well as for other transformations like
Walsh Transform or Hadamard Transform, in all those cases the transformation kernels are
fixed. The values of the transformation kernel depend upon the locations x and the location u.
The kernels are independent of the data over which the transformation has to be performed.
But unlike these transformations, in case of the K-L Transformation, the transformation kernel is actually derived from the data. So the K-L Transform actually operates on the basis of the statistical properties of a vector representation of the data. So let us see how these transformations are obtained. To go for the K-L Transformation, our requirement is that the data has to be represented in the form of vectors.
So let us assume a population of vectors, say X, given like this; the members of this population are actually vectors of dimension n. Now given such a population of vectors X, we can find out the mean vector μx, which is nothing but the expectation value E{x} over this vector population X.
And similarly, we can also find out the covariance matrix CX, which is given by CX = E{(x - μx)(x - μx)ᵀ}. This particular covariance matrix will be of dimension n×n. So this is the dimensionality of the covariance matrix CX, and obviously the dimensionality of the mean vector μx will be equal to n.
Now in this covariance matrix, CX, you will find that an element Cii, that is the element in the
ith row and the ith column is nothing but the variance of the element xi of the vectors x.
Similarly, an element say Cij, this is nothing but the covariance of the elements xi and xj of
the vectors x. And you find that this particular covariance matrix CX, it is real and symmetric.
So because this covariance matrix is real and symmetric, we can always find a set of n
orthonormal eigenvectors. So because, this covariance matrix CX is real and symmetric, we
can find out a set of orthonormal eigenvectors of this covariance matrix CX. Now if we
assume that suppose ei is an eigenvector of this covariance matrix CX which corresponds to
the eigenvalue λi.
So corresponding to the eigenvalue λi, we have the eigenvector say ei and we assume that
these eigenvalues are arranged in descending order of magnitude of the eigenvalues. That is,
we assume that λj is greater than or equal to λj+1, for j varying from 1, 2 up to n-1. So what we
are taking? We are taking the eigenvalues of the covariance matrix CX and we are taking the
eigenvectors corresponding to every eigenvalue.
So corresponding to the eigenvalue λi we have this eigenvector ei and we also assume that
these eigenvalues are arranged in descending order of magnitude that is λj is greater than or
equal to λj+1 for j varying from 1 to n-1. Now from this set of eigenvectors, we form a matrix
say A. So you form matrix A, from this set of eigenvectors and this matrix A is formed in
such a way that the first row of matrix A is the eigenvector corresponding to the largest
eigenvalue.
And similarly the last row of this matrix A, corresponds to the eigenvector, is the eigenvector
which corresponds to the smallest eigenvalue of the covariance matrix CX. Now if we use
such a matrix A to obtain the transform operations, then what we get is, we get a
transformation of the form say y = A(x - μx) . So using this matrix A which has been so
formed, we form a transformation like y = A(x - μx) , where you find that x is a vector and μX
is the mean vector.
Now this particular transformation, the transformed output Y that you get that follows certain
important relationship. The first relationship, the important property is that the mean of these
vectors y or μY = 0. So these are the properties of the vector Y that is obtained. So the first
property is the mean of Y, mean of vectors Y, μY = 0.
Similarly, the covariance matrix of Y, given by CY, can also be obtained from CX, the covariance matrix of X, and the transformation matrix A that we have generated. The relationship is CY = A·CX·Aᵀ. Not only that,
this covariance matrix CY is a diagonal matrix whose elements along the main diagonal are
the eigenvalues of CX.
So this CY will be of the form CY = diag(λ1, λ2, λ3, ..., λn), that is, a diagonal matrix with λ1, λ2, ..., λn along the main diagonal and zeros everywhere else. So this is the covariance matrix of Y, that is CY. And obviously, in this particular case, you find that the eigenvalues of CY are the same as the eigenvalues of CX, which are nothing but λ1, λ2, up to λn.
And it is also a fact that the eigenvectors of CY will also be same as the eigenvector of CX.
And since in this case, we find that the off diagonal elements are always 0, that means the
elements of Y vectors, they are uncorrelated. So the property of the vectors Y that we have
got is the mean of the vectors equal to 0. We can obtain the covariance matrix CY from the
covariance matrix CX and the transformation matrices A.
The eigenvalues of CY are same as the eigenvalues of CX and also as the off diagonal
elements of CY are equal to 0, that indicates that the elements of the vectors Y, different
elements of the vector Y are uncorrelated. Now let us see what is the implication of this. To
see the implication of these observations, let us come to the following figure. So in this figure
we have, a binary image, 2 dimensional binary image.
Here we assume that at all the pixel locations which are white, an object element is present, and wherever the pixel value is 0, no object element is present. So in this particular case, the object region consists of the pixels X = { (3,4)ᵀ, (4,3)ᵀ, (4,4)ᵀ, (4,5)ᵀ, (5,4)ᵀ, (5,5)ᵀ, (5,6)ᵀ, (6,5)ᵀ }. So these are the pixel locations which contain the object, and the other pixel locations do not contain the object.
Now what we plan to do is find out the K-L Transform of those pixel locations where an object is present. So from this, we have the population of vectors which is given by this: just consider the locations of the pixels where an object is present, that is where the pixel is white, and take those locations as vectors. So the population of vectors X is given by: we have (3,4)ᵀ because at location (3,4) an object is present; we have (4,3)ᵀ, here also an object is present; we have (4,4)ᵀ, here also an object is present; we have (4,5)ᵀ, then (5,4)ᵀ, then (5,5)ᵀ, then (5,6)ᵀ and then (6,5)ᵀ. So we have 1, 2, 3, 4, 5, 6, 7, 8 vectors, eight 2 dimensional vectors, in this particular population. Now from these vectors it is quite easy to compute the mean vector μX, and you can easily verify that the mean vector μX in this particular case will be nothing but (4.5, 4.5)ᵀ.
So this is the mean vector that we have got. Once we have the mean vector, we can go for computing the covariance matrix, and you will find that the covariance matrix CX was defined as E{(x - μx)(x - μx)ᵀ}.
(Refer Slide Time: 16:01)
So finding out (x - μx)(x - μx)ᵀ for all the vectors x and taking the average of them gives us the expectation value of (x - μx)(x - μx)ᵀ over the population X, which is nothing but the covariance matrix CX. So here, for the first vector x1, we can find out (x1 - μX); you find that x1 is nothing but the vector (3, 4)ᵀ. So (x1 - μX) = (-1.5, -0.5)ᵀ, and from this we can compute (x1 - μx)(x1 - μx)ᵀ; if we compute this, it comes out to be the matrix [ 2.25  0.75 ; 0.75  0.25 ]. Similarly we find out (x - μx)(x - μx)ᵀ for all the other vectors in the population X.
And finally, the average of all of them gives us the covariance matrix CX, and if you compute like this, you can easily obtain that the covariance matrix CX comes out to be [ 0.75  0.375 ; 0.375  0.75 ]. So this is the covariance matrix of the population of vectors X. Now once we have this covariance matrix, to find out the K-L Transformation we have to find out the eigenvalues of this covariance matrix.
eigenvalues of this covariance matrix.
And to determine the eigenvalues of the covariance matrix, you all might be knowing that the
operation is like this that given the covariance matrix, we simply perform
0.75-λ 0.375
0 0.75 and set this determinant is equal to 0. And then you solve for the
0.375 0.75-λ
values of λ. So if you do this, you will find that this simply gives an equation of the form
0.75-λ
2
= 0.3752 .
Now if you solve this, the solution is very simple. The λ comes out to be 0.75 ± 0.375,
whereby we get λ1=1.125 and λ2 = 0.375. So these are the two eigenvalues of the covariance
matrix CX in this particular case and once we have these eigenvalues, we have to find out
what are the eigenvectors corresponding to these eigenvalues.
And to find out the eigenvectors, you know that the relation is, for a given matrix, say A, or in our particular case CX: CX times a vector Z has to be equal to λZ, that is CX·Z = λZ, if Z is the eigenvector corresponding to the eigenvalue λ. And if we solve this, we find that we get 2 different eigenvectors corresponding to the two different λ. Corresponding to the eigenvalue λ1 = 1.125 we get the eigenvector e1 = (1/√2)·(1, 1)ᵀ, and corresponding to the eigenvalue λ2 = 0.375 we get the eigenvector e2 = (1/√2)·(1, -1)ᵀ.
So you find that once we get these eigenvectors, we can formulate the corresponding transformation matrix. As we said, we get the transformation matrix A from the eigenvectors of the covariance matrix CX, where the rows of the transformation matrix A are the eigenvectors of CX, such that the first row will be the eigenvector corresponding to the maximum eigenvalue and the last row will be the eigenvector corresponding to the minimum eigenvalue. So in this case the transformation matrix A will be simply given by A = (1/√2)·[ 1  1 ; 1  -1 ]. Now what is the implication of this? So you find that using this particular transformation matrix, if I apply the K-L Transformation, then the transformed output, the transform vector, will be y = A(x - μx).
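A small NumPy sketch reproducing this worked example (illustrative, not from the lecture): it computes the mean vector, the covariance matrix, the eigen decomposition, and the transform y = A(x - μx). Note that numerical eigenvectors may come out with the opposite sign to the hand calculation, which is fine since eigenvectors are only defined up to sign.

import numpy as np

# The eight object pixel locations from the example, one column vector per pixel
X = np.array([[3, 4], [4, 3], [4, 4], [4, 5], [5, 4], [5, 5], [5, 6], [6, 5]], dtype=float).T

mu = X.mean(axis=1, keepdims=True)                 # mean vector -> [[4.5], [4.5]]
C = (X - mu) @ (X - mu).T / X.shape[1]             # covariance -> [[0.75, 0.375], [0.375, 0.75]]

eigvals, eigvecs = np.linalg.eigh(C)               # eigh: ascending eigenvalues for symmetric C
order = np.argsort(eigvals)[::-1]                  # reorder so the largest eigenvalue comes first
A = eigvecs[:, order].T                            # rows of A are the eigenvectors of C
print(eigvals[order])                              # [1.125, 0.375]

Y = A @ (X - mu)                                   # K-L transform y = A (x - mu_x)
print(np.allclose(Y.mean(axis=1), 0))              # True: the mean of y is zero
print(np.round(A @ C @ A.T, 6))                    # diagonal matrix diag(1.125, 0.375) = C_Y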
(Refer Slide Time: 22:08)
So you find that application of this particular transformation amounts to establishing a new coordinate system whose origin is at the centroid of the object pixels. So this K-L Transformation basically establishes a new coordinate system whose origin will be at the centre of the object, and the axes of this new coordinate system will be parallel to the directions of the eigenvectors.
So by this what we mean is like this one. So this was our original figure where all the white
pixels are the object pixels. Now by application of this transformation, this K-L Transform
with transformation matrix A, we get two eigenvectors, the eigenvectors are this e1 and e2. So
you find that this e1 and e2, it forms a new coordinate system and the origin of this coordinate
system is located at the center of the object and the axes are parallel to the directions of the
vectors e1 and e2.
And this figure also shows that this is basically a rotation transformation, and this rotation transformation aligns the data with the eigenvectors; because of this alignment, the different elements of the vector y become uncorrelated. So it is only because of this alignment that the data becomes uncorrelated, and also the eigenvalues λi appear along the main diagonal of CY, as we have seen earlier.
This λi basically tells the variance of the component yi along the eigenvector ei. And later we
see the application of this kind of transformation to align the objects around the eigenvectors
and this is very, very important for object recognition purpose. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-31.
K-L Transform-2.
Now let us see the other aspects of the K-L transformation. So this is one of the applications where we have said that the K-L transformation basically aligns the data along the eigenvectors. Another important property of the K-L transformation deals with the reconstruction of
the vector x from the vector y. So by K-L transformation what we have got is, we have got a
set of vectors y from another set of vectors x using the transformation matrix A, where A was
derived using the eigenvectors of the covariance matrix of x that is CX.
(Refer Slide Time: 1:10)
So our K-L transformation expression was y = A(x - μx) . Now here we find that because this
matrix A, the rows of this matrix A are the eigenvectors of the covariance matrix CX. So A
consists of rows which are orthogonal vectors and because rows of A are orthogonal vectors,
so this simply says that the inverse of A is nothing but A^T. So now the inverse of A is very simple.
If you simply take the transpose of the transform matrix A, you get the inverse of A. So from
the forward transform, forward K-L transform, we can very easily find out the inverse K-L
transform to reconstruct x from the transformed image or the transformed data y and in this
case, the reconstruction expression is very simple. It is given by x = A^T y + μX. This follows directly from the expression of the forward transformation. Now the important
property of this particular expression is like this that suppose here you find that matrix A has
been formed by using all the eigenvectors of the covariance matrix CX. Now suppose I
choose that I will make a transformation matrix where I will not consider, I will not take all
the eigenvectors of the covariance matrix CX.
Rather, I will consider say k number of eigenvectors and using that k number of eigenvectors,
I will make a transformation matrix say Ak. So this Ak is formed using k number of
eigenvectors, k number of eigenvectors of matrix CX. I am not considering all the
eigenvectors of the matrix CX. And obviously because I am taking k number of eigenvectors,
I will take those eigenvectors corresponding to k largest eigenvalues.
So obviously, this matrix Ak will now have k number of rows and every row will have n number of elements. So the matrix Ak will be of dimension k×n, and the inverse transformation will also be done in a similar manner. So using this transformation matrix Ak, I now apply the transformation, so I get y = Ak(x - μX). Now because Ak is of dimension k×n and x is of dimension n×1, naturally this transformation will generate vectors y which are of dimension k.
Now in the earlier case, in our original formulation, the transformed vector y was of dimension n. But when I have made a reduced transformation matrix Ak considering only k number of eigenvectors, I find that using the same transformation, the transformed vectors y that I get are no longer of dimension n, but are vectors of dimension k.
Now using these vectors of reduced dimension, if I try to reconstruct x, obviously the
reconstruction will not be perfect. But what I will get is an approximate value of x. So let me
write that expression like this.
Here what I will get is an approximate x. Let me write it as x̂, which will be given by x̂ = Ak^T y + μX. Now here you find that the Ak matrix was of dimension k×n.
The vector y was of dimension k. Now when I take Ak^T, Ak^T becomes of dimension n×k. Now if I multiply this matrix Ak^T, which is of dimension n×k, by this vector y which is of dimension k, obviously I get a vector which is of dimension n×1. So by this inverse transformation, I get back a vector x̂ which is of the same dimension as x but is not the exact value of x. This is an approximate value of x, and it can be shown that the
mean square error of this reconstruction, that is the mean square error between x and x̂, is given by the expression e_ms = Σ (j=1 to n) λj - Σ (i=1 to k) λi.
So you find, for this mean square error term that we have got, you remember that while forming our transformation matrix Ak, we have considered k number of eigenvectors of matrix CX, and these k eigenvectors correspond to the largest eigenvalues of matrix CX. And in this particular expression, the mean square error is given by the sum of those eigenvalues whose corresponding eigenvectors were not considered for the formation of our transformation matrix Ak.
And the corresponding eigenvalues are the smallest eigenvalues. So this particular transformation and the corresponding inverse transformation ensure that the mean square error of the reconstructed signal, that is the mean square error between x and x̂, will be minimum. That is because this summation consists of only those eigenvalues which have the minimum values.
So that is why this K-L transform is often called an optimum transform, because it minimizes the error of reconstruction between x and x̂. Now this is a very, very important property of
K-L transformation which is useful for data compression. And in this particular case, let us
see that how this particular property of K-L transformation will help to reduce or to compress
the image data.
So obviously the first operation you have to do is, if I want to apply this K-L transformation
over an image, I have to see how to apply this K-L transformation over an image. So we have
already seen a digital image is a 2 dimensional array of quantized intensity values.
(Refer Slide Time: 10:21)
So this 2 dimensional image or 2 dimensional array, which consists of N number of rows and
N number of columns can be converted into a set of vectors in more than one ways. So let us
assume in this particular case, that we represent every column of this 2 dimensional array as a
vector. So if we do that, then every column of this, so this will be represented by a vector say
x0 this column will be represented by a vector say x1.
This column will be represented by a vector say x2 and this way we will have say N number
of vectors as there are N number of columns. So once we have these N number of vectors for
these N number of columns, we can find out the mean vector μX, and this is given by μX = (1/N) Σ (i=0 to N-1) xi.
And similarly we can also find out the covariance matrix of these N vectors, and the expression for the covariance matrix, as we have already seen, is CX = (1/N) Σ (i=0 to N-1) (xi - μX)(xi - μX)^T. And here we find that our mean vector μX is of dimension N, whereas the covariance matrix CX is of dimension N×N.
(Refer Slide Time: 14:00)
So once we have obtained the mean vector μX and the covariance matrix CX, we can find out
the eigenvectors and eigenvalues of this covariance matrix CX. And as we have already seen,
that because this particular covariance matrix CX is of dimension N×N, there will be N number of eigenvalues λi, where i varies from 0 to N-1, and corresponding to every eigenvalue λi there will be an eigenvector ei.
So for this eigenvector ei, i will again vary from 0 to N-1. So given these N eigenvectors for the N eigenvalues, we can form the transformation matrix A. Here this transformation matrix A will be formed with rows e0^T, e1^T and so on; I write these as transposes because each ei, being an eigenvector, is normally represented as a column vector.
So we will write the matrix A, whose rows are the eigenvectors of the covariance matrix CX. So this A will be A = [e0^T; e1^T; … ; eN-1^T], where e0 corresponds to the eigenvalue λ0.
And obviously in this case, as we have already said, our assumption is λ0 ≥ λ1 ≥ λ2 and so on, continuing like this up to λN-1. So this is how we form the transformation matrix A.
(Refer Slide Time: 16:31)
Now from this transformation matrix, we can make a truncated transformation matrix where
instead of using all the eigenvectors of the covariance matrix CX, we consider only the first k
number of eigenvectors which corresponds to k number of eigenvalues, k number of the
largest eigenvalues. So we form the transformation matrix, the modified transformation
matrix AK using the first k number of eigenvectors.
So in our case, Ak will have rows e0^T, e1^T and likewise it will go up to ek-1^T. And using this Ak, we take the transformation of the different column vectors of the image, which we have represented by the vectors xi. So for every xi, we get a transformed vector, say yi. So here, the transformation equation is yi = Ak(xi - μX), where Ak is the modified transformation matrix and i varies from 0 to N-1.
So here we find that the dimension of Ak is k×N, and xi and μX are both of dimension N×1. So (xi - μX) is a vector of dimension N×1. So you find that when I perform this transformation Ak(xi - μX), this actually leads to a transformed vector yi, where yi will be of dimension k×1.
So this is the dimensionality of yi. That means using this transformation with the transformation matrix Ak, we are getting the transformed vector yi of dimension k. So if this transformation is carried out for all the column vectors of the 2 dimensional image, in that case I get N transformed vectors yi, where each transformed vector is a vector of dimension k.
That means the transformed image that I will get will consist of N column vectors, where every column is of dimension k. That means the transformed image will now be of dimension k×N, having k number of rows and N number of columns. You remember that our original image was of dimension N×N.
Now using this transformed image, if I do the inverse transformation, to get back the original
image, as we said earlier that we do not get the perfect, perfectly reconstructed image rather
what we will get is an approximate image.
So this approximate image will be given by x̂i = Ak^T yi + μX, where this x̂i, you find, will be of dimension N. So the collection of all these x̂i gives you the reconstructed image from the transformed
image. As we have said that the mean square error between the reconstructed image and the
original image in this particular case will be minimum because that is how we have formed
the transformation matrix and there we have said that the mean square error of the
reconstructed vector from the original vector was summation of the eigenvalues which are
left out, corresponding to which the eigenvectors were not considered for formation of the
transformation matrix.
So here we find, in this case, for getting the reconstructed image, what are the quantities that we have to save? Obviously, the first quantity that we have to save is the transformation matrix Ak. So this Ak needs to be saved. And the other information that we have to save is the transformed matrix, or the set of transformed vectors yi, for i = 0 to N-1. So if we save these two quantities, Ak and the set of transformed vectors yi, then from these two quantities we can reconstruct an approximate original image given by the vectors x̂i. So we find that in this case, the amount of compression that can be obtained depends upon the value of k, that is how many eigenvectors we really take into account for the formation of our transformation matrix Ak.
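For concreteness, the column-wise compression and reconstruction just described can be sketched in NumPy as follows. This is only a minimal sketch under stated assumptions: the image is assumed to be an N×N grayscale array named img, the helper names kl_compress and kl_reconstruct are illustrative, and no attempt is made at efficiency.

import numpy as np

def kl_compress(img, k):
    X = img.astype(float).T                 # row i of X is the i-th column vector x_i
    mu = X.mean(axis=0)                     # mean vector mu_X (length N)
    D = X - mu
    C = (D.T @ D) / X.shape[0]              # N x N covariance matrix C_X
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]       # indices of eigenvalues, largest first
    Ak = eigvecs[:, order[:k]].T            # k x N truncated transformation matrix
    Y = (Ak @ D.T).T                        # y_i = A_k (x_i - mu), each of length k
    return Ak, mu, Y, eigvals[order]

def kl_reconstruct(Ak, mu, Y):
    Xhat = Y @ Ak + mu                      # x_hat_i = A_k^T y_i + mu_X
    return Xhat.T                           # columns of the approximate image

Only Ak, mu and Y need to be stored; the average of ||xi - x̂i||^2 over the columns should come out close to the sum of the discarded eigenvalues, which is the optimality property stated above.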
So the value of k can be 1, where we consider only one eigenvector to form our
transformation matrix A. It can be 2, where we consider only two eigenvectors to form the
transformation matrix A, and depending upon the number of eigenvectors, the amount of compression that we can achieve will vary. Now let us see what kind of results we get with different values of k.
So here you find that we have shown some images; the top image is the original image. Now this original image is transformed and reconstructed using the transformation matrix with only one eigenvector, the eigenvector which corresponds to the largest eigenvalue of the covariance matrix.
Then the reconstructed image that we get is given by this result. So here you find that the reconstructed image is not at all good, but still from the reconstructed image we can make out what this image is about. Now we can increase the number of eigenvectors in the transformation matrix. When I use 5 eigenvectors to form the transformation matrix, then the reconstructed image is given by this one.
You find that the amount of information which is contained in this particular image is quite
improved though this is not identical with the original image. If we increase the number of
eigenvectors further that is we use 25 eigenvectors to form the transformation matrix then this
is the reconstructed image that we get.
Now if you closely observe and compare these two images, you find that there are some artefacts, say for example in this particular region there is an artefact, something like a vertical line, which was not present in the original image, and that is improved to a large extent in this particular image. So again the image quality has been improved when I increase the number of eigenvectors from 5 to 25.
So here we have discussed the K-L transformation, where we have said that the K-L transformation is fundamentally different from the other transformations that we have discussed earlier, that is the discrete Fourier transformation, discrete cosine transformation and so on. There we have said that in those transformations, the transformation matrix or the transformation kernel is fixed.
Whereas in case of the K-L transformation, you find that the transformation kernel, that is the transformation matrix A, is derived from the covariance matrix, and this covariance matrix actually represents the statistical property of the vector representation of the data. So here the kernel of the transformation, or the transformation matrix, is dependent upon the data. It is not fixed.
So that is the fundamental difference between the other transformations and the K-L transformation. But the advantage of the K-L transformation, which is quite obvious from the reconstructed images, is the energy compaction property which we mentioned earlier: in case of the K-L transformation, the energy compaction is much higher than that of any other transformation.
Here you find from the earlier result that we have shown, using only one eigenvector as the transformation matrix, I can still reconstruct the image and I can say what the content of that image is, though the reconstruction quality is very poor. So that shows that the energy compaction into a small number of components is much higher in case of the K-L transform than in case of other transformations.
But as is quite obvious, the computational complexity of the K-L transformation is quite high compared to the other transformations, and in fact that is the reason why, despite its strong energy compaction property, the K-L transformation has not been very popular for data compression operations. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-32.
Image Enhancement: Point Processing Techniques.
Hello, welcome to the video lecture series on Digital Image Processing. Today and for
coming few lectures, we will be talking about Image Enhancement Techniques. First we will
see that what is the necessity of image enhancement. Then we will see that image
enhancement techniques fall under two broad categories; one of the categories is Spatial
domain operations. In spatial domain operations, the enhancement techniques work directly
on the image pixels.
And these spatial domain operations can have three different forms. One is the point
processing, other one is the histogram based processing techniques and the third one is mask
processing techniques. Of course histogram based processing technique is also a form of
point processing technique. For these spatial domain operations, we said that we do not do any pre-processing of the images; the images are directly operated upon in their spatial domain to give us the processed images,
which are the enhanced images. The other category of this image enhancement techniques,
they work on normally the Discrete Fourier Transformation coefficients of the images. So
they are called as frequency domain operations and you will see later that there are different
operations which can be done in frequency domain like low pass filtering, band pass filtering,
high pass filtering and so on.
And then also we have different forms of these different filters. Now let us see that, what is
meant by image enhancement? By image enhancement, what we mean is it is a technique of
processing an image to enhance certain features of the image. Now as it is said that it is for
the enhancement of certain features of the image. So obviously, depending upon which
feature we want to enhance, there are different forms of image enhancement techniques.
In some applications, our input image may be noisy, so we want to reduce the noise so that the image becomes visually better. So reduction of this noise or removal of the noise
from the images is also a form of image enhancement. In many cases, we have found that the
images which are captured by image capturing devices for example camera, they are very
dark and an image may become very dark because of various reasons.
So for such kind of applications, the image enhancement technique may need to increase the
contrast of the image or to increase the intensity of the image. So for that kind of application,
we will have some other type of image enhancement techniques. Some applications may need that the edges of the objects present in the image should be highlighted.
So again in such cases, the image enhancement techniques should be able to highlight the
edges of the objects present in the image. So you find, that the image enhancement
techniques, these techniques vary depending upon the application, different types of
applications need enhancement of different types of features in the image.
So, the result, the ultimate aim of the image enhancement techniques is such that we want to
process an image so that the result becomes more suitable than the original image for certain
specific applications. So as we have already said, obviously the processing techniques are
very much problem oriented because different kinds of problem demand enhancement of
different kinds of features in the image.
(Refer Slide Time: 5:13)
Now as we have already said that, image enhancement techniques, fall under two broad
categories. The first category is the spatial domain technique where the image enhancement
processes, they work directly on the image plane itself. That means these techniques try to
directly manipulate the pixels in the image. The other category of the image enhancement
techniques are frequency domain techniques.
So in case of frequency domain techniques, first we have to take the Fourier Transformation
of the image, then whatever Fourier transformation coefficients we get, you modify those coefficients, and on this modified set of coefficients you take the inverse Fourier transform to obtain the enhanced image or the modified image as needed.
So first, we will be talking about the image enhancement techniques in the spatial domain.
So let us see that what are the different spatial domain image enhancement techniques that we
can have. So as we said, that the spatial domain techniques work directly on the image pixels.
So naturally we have to define a transformation function which will transform an image pixel
from the original image to a pixel in the enhanced or processed image. So such a function,
can be defined in this form.
(Refer Slide Time: 6:58)
We can write that g(x) is equal to some transformation of f(x), that is g(x) = T[f(x)], or because in this case we are dealing with 2 dimensional images, we will write the expression as g(x, y) = T[f(x, y)]. So in this case f(x,y) is the original image and T is the transformation which, operating on f, gives us the processed image g.
Now as we said, in case of spatial domain techniques you will find that this transformation T works directly on f(x,y), that is in the spatial domain or in the image plane, to give us the processed image g(x,y), where T is an operator which works on the original image f and is defined over a neighborhood of the point (x,y) in the original image f(x,y).
And later on we will see that this transformation operator T can also operate on more than one image. For the time being, we are considering the case where this transformation operator T works on a single image, and when we want to find the processed image at location (x,y), this operator T works on the original image f at location (x,y), considering a certain neighborhood of the point (x,y), to determine what the processed pixel value at location (x,y) in the processed image g will be.
(Refer Slide Time: 9:20)
Now the neighborhood of a point (x,y) is usually a square sub-image which is centered at
point (x,y). So let us look at this particular figure. Here, you find that we have taken a
rectangular image f. So this outer rectangle represents the image f and within this image f we
have taken pixel at a particular location (x,y). So this is the pixel location (x,y). And the
neighborhood of point (x,y) as we said that it is usually square sub- image around point (x,y).
So this shows a 3x3 neighborhood around the pixel point (x,y) in the image f. Now what
happens in case of point processing? We said that this transformation operator T operates at point (x,y) considering a certain neighborhood of the point (x,y), and in this particular case we have shown a neighborhood of size 3x3 around point (x,y). Now for
different application the neighborhood size may be different.
We can have a neighborhood size of 5x5, 7x7 and so on depending on the type of the image
and the type of operation that we want to have. Now in case of point processing, the
neighborhood size that is considered is of size 1x1.
(Refer Slide Time: 10:48)
So the neighborhood size of a point in case of point processing is of size 1x1. So that means
that this operator T now works on the single pixel location. So it works on only that particular
pixel location (x,y) and depending upon the value, depending upon the intensity value at that
location (x,y), it determines what will be the intensity in the corresponding location in the
processed image g. It does not consider the pixel values of its neighboring locations.
So in such cases we can write the transformation function in the form, s = T(r) , where this r
is the pixel value in the original image and s is the pixel value in the corresponding location
in the processed image. So this transformation function simply becomes of the form s = T(r), where r and s are individual pixel values at corresponding locations and the mapping does not depend on the neighbouring pixels.
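Because s = T(r) depends only on the single pixel value r, a point processing operation can be implemented as a simple lookup table over the L gray levels. The sketch below is a minimal illustration, assuming an 8-bit image stored as an integer NumPy array named img; the function name and the example transform are only illustrative.

import numpy as np

def point_process(img, T, L=256):
    # Precompute s = T(r) for every possible gray level r, then apply per pixel
    # by table lookup; img is assumed to hold integer values in the range [0, L).
    lut = np.array([T(r) for r in range(L)])
    return lut[img]

# For example, a transform that doubles every gray level (clipped at L - 1):
# out = point_process(img, lambda r: min(2 * r, 255))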
(Refer Slide Time: 12:13)
Now these transformation functions can be put in the form of these two figures. So in this
particular case, the first figure shows a transformation function where you find that here in
this case along the x axis or along the horizontal axis we have put the intensity values r of the
original image and along the vertical axis we have put the intensity values of different pixels
in the processed image g and obviously they are related by s = T(r) . And the transformation
function is given by this particular curve.
So this is our T(r). And in this particular figure, as it is shown, the pixel values near 0 have been marked as dark regions. So it is quite obvious that in an image, if the intensity values of the pixels are near 0, that is very small intensity values, those regions appear very dark, and the regions where the intensity values are higher appear as light regions.
So this, first one, the first transformation function shows that in this particular range, a very
narrow range of the intensity values in the original image is mapped to a wide range of
intensity values in the processed image g. And effectively this is the operation which gives
enhancement of the image.
In the second figure that we have shown, here you find that this particular transformation
function says that if I consider say this is some intensity value say I, so for all the pixels in
the input image if the intensity values are less than I then in the processed image, the
corresponding pixel will be replaced by value 0, whereas a pixel where the intensity value is
greater than I, the corresponding pixel in the processed image will have a maximum value.
So this second transformation operation actually generates a binary image consisting of only the low values and the high values, and this particular operation is known as the thresholding operation. So this is the kind of operation that will be done for point processing. Now the other kind of spatial domain operation is where the neighborhood size is larger than one, say a neighborhood size of 3x3 or 5x5 or 7x7 and so on. That kind of operation is usually
known as mask operations. So in case of mask operations what we have to do is we have to
define a neighborhood around every pixel (x,y) at which point we want to get the intensity
value in the processed image.
And for doing this, it is not only the intensity value of that particular pixel but also the
intensity values of the pixels around that point which is within the neighborhood of that point.
All of them take part in deciding what will be the intensity value at the corresponding
location (x,y) in the processed image g.
So let us see how that operation is done. Here again we have copied the same 3x3 neighborhood that we have seen in our previous slide. So if I consider a 3x3 neighborhood, then for mask processing we also have to define a 3x3 mask, and in this particular case, on the right hand side of the figure, we have defined a mask where the values in the mask are represented as W-1,-1, W-1,0, W-1,1, W0,-1, W0,0, W0,1, W1,-1, W1,0 and W1,1.
So these are the different values, also known as coefficients, which are present in this 3x3 mask. Now to generate the intensity value at location (x,y) in the processed image, the operation that has to be done is given by the expression at the bottom, which says that g(x,y) = Σ (i=-1 to 1) Σ (j=-1 to 1) Wi,j · f(x+i, y+j).
So what does this actually mean? This means that if I place this mask on the image centered at location (x,y), then each image pixel under the mask and the corresponding mask coefficient have to be multiplied together, and then the sum is taken over all such mask locations. What I get gives me the intensity value of the processed image g at location (x,y).
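A direct, unoptimized sketch of this mask operation is given below, assuming the image and the mask are NumPy arrays; the function name is illustrative, and border pixels are simply left at zero, which is one common simplification rather than the only possible choice.

import numpy as np

def apply_mask(f, W):
    # Slide the (odd-sized) mask W over the image f and compute the weighted
    # sum at every interior pixel location (x, y).
    m = W.shape[0] // 2
    g = np.zeros(f.shape, dtype=float)
    for x in range(m, f.shape[0] - m):
        for y in range(m, f.shape[1] - m):
            region = f[x - m:x + m + 1, y - m:y + m + 1]
            g[x, y] = np.sum(W * region)        # sum of W_{i,j} * f(x+i, y+j)
    return g

# Example: a 3x3 averaging mask, W = np.ones((3, 3)) / 9.0, smooths the image.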
So this is what is meant by mask operation and depending upon the size of the mask that we
want or the size of neighborhood we consider, we have to define the 3x3 mask or 5x5 mask
or 7x7 mask and so on. And the coefficient values, these W values defined in the mask, determine what kind of image enhancement operation we are going to do.
Whether this will be an image sharpening operation, image averaging operation, edge
enhancement operations and so on, all of them depend upon the mask values that is the Wi, j
present in this particular mask. So this is the basic difference between the point processing
and mask processing and obviously both these processing techniques fall under the category
of spatial domain techniques because in these cases we have not considered the Discrete
Fourier Transform coefficients of the original image which is to be processed.
(Refer Slide Time: 19:06)
Now let us come to the point processing techniques. The first one that we will consider is a point processing technique which we will call the negative image. Now in many cases the images that we get contain white or grey level information embedded in black or very, very dark pixels, and the nature of the information is such that we have very little white or grey level information present in a background which is very dark.
So in such cases, finding out the information from the raw input images becomes very, very difficult. So in such cases it is beneficial, instead of considering that raw image, to just take the negative of the image; that is, all the white pixels or the larger intensity values that we have in the image, you make them darker, and the darker intensity values, you make them lighter or brighter.
So in effect what we get is a negative of the image, and within this negative image we will find from the result that visualizing or extracting the information which we want will be more convenient than in the original image. The kind of transformation that we need in this particular case is shown in this figure. Here we consider that the digital image will have L number of intensity levels represented from 0 to L-1 in steps of 1.
So again along the horizontal axis, we have put the intensity values or grey level values of the
input image and along the vertical axis we have put the intensity values or grey level values
of the processed image. And this corresponding transformation T now can be represented as
s = T(r), which is nothing but L - 1 - r. So you find that whenever r = 0, then s will be equal to L-1, which is the maximum intensity value within our digital image.
And when r = L-1, that is the maximum intensity value in the original image, in that case s = 0. So the maximum intensity value in the original image will be converted to the minimum intensity value in the processed image, and the minimum intensity value in the original image will be converted to the maximum intensity value in the processed image.
So in effect what we are getting is a negative of the image, and graphically this transformation can be put in the form of this figure. So here you find that this transformation is a straight line with a slope of minus 45 degrees, passing through the points (0,L-1) and (L-1,0) in this r-s plane.
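As a minimal sketch, assuming an 8-bit grayscale image (L = 256) stored in a NumPy array named img, the negative transformation s = L - 1 - r can be written as:

import numpy as np

def negative(img, L=256):
    # s = L - 1 - r applied to every pixel; for an 8-bit image this is 255 - img.
    return (L - 1 - img.astype(int)).astype(np.uint8)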
Now let us see what is the kind of result that we will get by applying this kind of
transformation. So here we have shown two images. On the left hand side, we have a digital
mammogram image and on the right hand side, we have the negative of this image which is
obtained by the transformation that we have just now discussed. So you find that in this
original image, we have some white grains and there is a white patch which indicates a
cancerous region.
And these grains, which correspond to the tissues, are not very prominent. I mean it is very difficult to make out which is what in this original image. Now if
I take a negative of this particular image, on the right hand side we have got this negative.
So here you find that all the darker regions in the original image have been converted to brighter regions in this processed image.
And the brighter regions in the original image have been converted to darker regions in the processed image. And now it is very convenient to see what information we can get from this negative image. This kind of transformation, the negative transformation, is very, very useful in medical image processing. And this is just an example which shows that, for understanding this particular digital mammogram image, the negative transformation gives us much more information than we have in the original.
And as we said maybe this is the transformation which is best suited for this particular
application. But this transformation may not be the best transformation for other kind of
applications. Thank You.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-33.
Contrast Stretching Operation.
So now let us see what other kinds of image enhancement techniques we can have. The next image enhancement technique that we are going to discuss is again a very, very simple enhancement technique called the contrast stretching operation.
So, why do we need such contrast stretching? You might have found that in many cases, the
images that we get from an imaging device is very dark and this may happen because of
various reasons. One of the reasons is when you have taken the image of certain object or
certain scene, the illumination of the object or the illumination of the scene was very poor
that means the object itself was very dark.
So naturally the image has become very dark. The second reason why an image may be dark
is, the dynamic range of the sensor on which you are imaging is very small. Now what I mean
by dynamic range is, it is the capacity of the sensor to record the minimum intensity value
and the maximum intensity value, so the difference between the minimum intensity value and
the maximum intensity value is what is the dynamic range of the sensor.
So even if your scene is properly illuminated, if your sensor itself is not capable of recording all those variations in the scene intensity, that also leads to an image which is very, very dark. Another reason which may lead to dark images is that when you have taken the photograph,
photograph, maybe the aperture of the lens of the camera was not properly set. Maybe the
aperture was very small, so that only a very small amount of light was allowed to pass through the lens to the imaging sensor.
So if the aperture is not properly set, that also leads to an image which is very, very dark. So
for such dark images, the kind of processing techniques which is very suitable is called the
contrast stretching operation.
Now let us see what is the kind of dark image that we can have. Here we show an image
which is a low contrast image. So here obviously you find that the contrast of the image or
the intensity of the image is very, very poor and overall appearance of this image is very
dark.
And the purpose of contrast stretching is to process such images so that the dynamic range of the image will be quite high, so that the different details of the objects present in the image will be clearly visible.
(Refer Slide Time: 3:37)
Now a typical transformation which may be applied for contrast stretching operation is
shown in this particular figure. So here you find that in this particular transformation, we
have indicated two different points.
One is (r1,s1) that is this particular point and the other point is (r2,s2) that is this particular
point. Now it is the locations of these points (r1,s1) and (r2,s2) which control the shape of this transformation function, and accordingly they influence what type of contrast enhancement we can obtain in the processed image.
Now the locations of these (r1,s1) and (r2,s2) are very, very important. You will find that if we make r1 equal to s1 and r2 equal to s2, then the transformation function becomes a straight line with a slope equal to 45 degrees. That means that whatever intensity we have in the input image, in the processed image we will have the same intensity level.
At the other extreme, you will find that if I make r1 = r2, s1 = 0 and s2 = L-1, then that leads to the thresholding operation. So the corresponding operation generates a binary image as the processed image. Now for the enhancement operation, usually what is used is r1 < r2 and s1 < s2, which gives us a transformation function as given in this particular figure.
And this transformation function generally leads to image enhancement. Now the condition that r1 ≤ r2 is very, very important. So the condition is, as we have just said, that r1 ≤ r2 and s1 ≤ s2. This particular condition is very, very important because you find that if this condition is maintained, then the transformation function that we get becomes a single valued transformation function and the transformation is monotonically increasing.
That is very, very important to maintain the order of the intensity values in the processed image. That is, a point which is darker in the original image will remain darker in the processed image, and a point which is brighter in the original image will remain brighter in the processed image. But the difference we are going to have is between the difference of intensity values that we have in the original image and the difference of intensity values we get in the processed image.
That is what gives us the enhancement. But if it is reversed, if the order is reversed, in that
case the processed image will look totally different from the original image. And all the
transformations that we are going to discuss, except the negative operation that we mentioned initially, maintain this particular property that the order of the intensity values is maintained. That is, the transfer function is monotonically increasing, and we will have a transfer function which is a single valued transfer function.
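The piecewise-linear transformation controlled by (r1,s1) and (r2,s2) can be sketched as below. This is a minimal sketch, assuming 8-bit gray levels and 0 < r1 < r2 < L-1 so that no segment has a zero-length input range; the particular parameter values in the usage line are only illustrative.

import numpy as np

def contrast_stretch(img, r1, s1, r2, s2, L=256):
    # Piecewise-linear mapping through (0, 0), (r1, s1), (r2, s2), (L-1, L-1).
    r = img.astype(float)
    out = np.where(
        r < r1, s1 / r1 * r,
        np.where(r <= r2,
                 s1 + (s2 - s1) / (r2 - r1) * (r - r1),
                 s2 + (L - 1 - s2) / (L - 1 - r2) * (r - r2)))
    return np.clip(out, 0, L - 1).astype(np.uint8)

# Example: contrast_stretch(img, r1=70, s1=20, r2=180, s2=230) expands the
# middle band of gray levels into a wider output range.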
(Refer Slide Time: 8:02)
Now using this particular transfer function, let us see what kind of result we can obtain. In our earlier slide we had shown a low contrast image, which is shown again on the left hand side of this particular slide. So this left hand side image is the original image, which is a low contrast image, and by using the contrast enhancement operation, what we have got is the processed image shown on the right hand side.
And here you can clearly observe that more details are available in the processed image than
in the original image. So obviously the contrast of the processed image has become much
much higher than the contrast in the original image. So this is a technique which is called
Contrast Stretching Technique which is mostly useful for images where the contrast is very,
very poor and we have said that we can get a poor contrast because of various reasons.
Either the scene illumination was poor, or the dynamic range of the image sensor was very small, or the aperture setting of the camera lens was not proper. And in such cases the dark
images that we get, that can be enhanced by using this Contrast Stretching Techniques. Now
there are some other kind of applications where we need to reduce the dynamic range of the
original images.
Now an application where we need to reduce the dynamic range is, say for example, when I have an original image whose dynamic range is so high that it cannot be properly reproduced by our display device. Normally we have a grey level or black and white display device which uses 8 bits, which means it can display intensity levels from 0 to 255, that is a total of 256 different intensity levels.
But in the original image, if I have a minimum intensity value of say 0 and a maximum intensity value of say a few thousand, then the dynamic range of the original image is very high but my display device cannot take care of such a high dynamic range. So the display device will mostly display the highest intensity values, and the lower intensity values will in most cases be suppressed.
And the kind of image that we will usually get in that case is something like this. Here on the left hand side we have shown an image which is basically the Fourier transformation, the DFT coefficients, of a certain image, and you find that only at the centre we have a bright dot, and outside this the image is mostly dark or mostly black.
But actually there are a number of intensity levels between the minimum and the maximum which could not be reproduced by this particular device because its dynamic range is very poor. On the right hand side, we have shown the same image after some preprocessing, that is after reducing the dynamic range of the original image by using the image enhancement techniques.
(Refer Slide Time: 12:15)
And here you find that in the processed image, in addition to the bright spot at the centre, we
have many other coefficients which are visible as you move away from the centre. So here
our application is to compress the dynamic range of the input image and the kind of
transformations which can give us this dynamic range compression is a transformation of this
form, which is a Logarithmic transformation.
So here again we assume that r is the intensity of a pixel in the original image and s is the intensity of the pixel in the processed image, and the relation is s = T(r) = c log(1 + r), where c is a constant. This constant has to be decided depending upon the dynamic range of your display device and the dynamic range of the input image which is to be displayed.
And log(1 + r) is taken because otherwise, whenever r is equal to 0, that is an intensity level in the input image is equal to 0, log(0) is not defined. So to take care of that we take 1 + r, and if you take c log(1 + r), that gives a compression of the dynamic range, and the image can be properly displayed on a display where the dynamic range is limited.
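A minimal sketch of this logarithmic transformation is shown below, assuming an 8-bit display range; here c is chosen so that the largest input value maps to L - 1, which is one reasonable choice rather than the only one.

import numpy as np

def log_transform(img, L=256):
    # s = c * log(1 + r), with c set so that r_max maps to L - 1.
    r = img.astype(float)
    c = (L - 1) / np.log(1 + r.max())
    return (c * np.log(1 + r)).astype(np.uint8)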
(Refer Slide Time: 13:29)
A similar operation that can again be used for enhancement is called the Power-Law Transformation. The Power-Law Transformation is normally used for different imaging devices. It is used for image capturing devices, for image printers and so on. In case of Power-Law devices, the transformation function between the original image intensity and the processed image intensity is given by s = T(r) = c·r^γ.
In the plot that we have shown, this curve is drawn for different values of γ with c = 1. So you find that for a value of γ which is < 1, towards the lower intensity side this transformation function expands a very small intensity range in the input image into a wider range, whereas on the higher intensity side, a higher range of input intensities is mapped to a lower range of intensity values in the processed image.
And the reverse is true for values of γ which are greater than 1. Now for this kind of
transformation, the exponent is conventionally represented by the symbol γ and that is why
this kind of transformation, this kind of correction is also known as γ correction. And this
kind of processing is used as I said for different types of display devices. It is used for
different types of printing devices. It is used for different types of capturing devices.
The reason is that all those devices mostly follow this Power-Law characteristic. So if I give an input image, it will be converted by the Power-Law before the image is actually produced. Now, to compensate for this Power-Law which is introduced by the device itself, if I do the reverse operation beforehand, then the actual image that I want to display will be displayed properly.
Say for example, in case of a CRT display, the relation between intensity and voltage follows the Power-Law, with the value of γ normally varying from about 1.8 to 2.5. So if I use the value γ = 2.5 and come to this particular figure, then you find that with γ = 2.5, this is the curve, or the transformation function, that will be applied.
So whichever image I want to display, the device itself will transform the image using this
particular curve before displaying the particular image. And as this curve shows that the
image which will be displayed will normally be darker than the original image that we intend
to display. So what you have to do is, we have to take some corrective measure before giving
the image to the CRT for the display purpose.
And because of this correction we can compensate this Power-Law so that our image will be
displayed properly. So coming to the next slide, you find that here we have shown an image which is to be displayed, and the image is in the top left corner. The monitor has a Power-Law characteristic which is given by s = r^2.5. And as we said, because of this Power-Law characteristic the image will be darker, which is obvious from the image as displayed on the device, given on the right hand side. You will find that this image is darker than the original image. So to compensate for this,
what I do is, before giving this image to the CRT for display, we go for a γ correction. That means you transform the image using the transformation function s = r^(1/2.5).
So by this transformation, and if you refer back to our Power-Law curves, you will find that the original image now becomes a brighter image; that is, the lower intensity range in the input image has now been mapped to a larger intensity range in the processed image. So as a result, the image has become brighter. And when this brighter image is given to the CRT display for display, the monitor will apply its characteristic power law, that is s = r^2.5.
And because of this characteristic, the γ correction that we incorporated earlier gets nullified, and at the bottom right we now find the actual image which will be displayed on the CRT screen; this image now appears to be almost the same as the original image that we want to display.
So this is a sort of enhancement, because if I do not use this kind of correction, then the image that we are going to display on the CRT screen will be a distorted image, but because of the Power-Law correction, or the γ correction as it is called, the image that we get on the CRT screen will be almost the same as the original image that we want to display.
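A minimal sketch of this power-law (gamma) operation is given below. It assumes 8-bit gray levels normalised to [0, 1] before the power law is applied, which is a common convention (the lecture works directly with the raw levels), and the function name is illustrative.

import numpy as np

def power_law(img, gamma, L=256):
    # s = c * r^gamma with c = 1, on gray levels normalised to [0, 1].
    r = img.astype(float) / (L - 1)
    s = np.power(r, gamma)
    return (s * (L - 1)).astype(np.uint8)

# Gamma correction for a display with characteristic s = r^2.5:
# corrected = power_law(img, 1 / 2.5)     # pre-correction applied beforehand
# displayed = power_law(corrected, 2.5)   # what the monitor then effectively shows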
Now this kind of Power-Law Transformation, it is not only useful for imaging devices like
CRT display or image printer and so on. Similar Power-Law Transformations can also be
used for enhancing the images. Now the advantage that you get in case of Power-Law
Transformation is that the transformation curve gets various shapes depending upon different
values of γ.
And as we have shown in our previous slides, that if the value of γ < 1, then on the darker
side, the lower range of intensity values will be mapped into a larger range of intensity values
in the processed image, whereas on the brighter side, a larger range of intensity values will be
mapped into a lower range of intensity values in the processed image. And the reverse is true
when γ > 1.
(Refer Slide Time: 21:01)
Here the top left image is the original image, which has a washed out appearance, so we cannot get the details of the image very easily. Now if we process this image using the
Power-Law Transformation then you find that the other three images that is the right top
image is obtained by using the Power-Law Transformation with certain value of γ. Similarly,
the bottom left image using some other value of γ and the right bottom image is also obtained
by using some other value of γ.
And here you find that the first image, that is the right top image, has been corrected with a value of γ which is less than the value of γ used for the image shown at the bottom left, which is again less than the value of γ used for obtaining the image shown at the bottom right.
And as is quite obvious, in all these cases you find that the washed out characteristic of the original image has been controlled; that is, in the processed image we can get much more detail of the image content. And you find that as we increase the value of γ, the image becomes more and more dark, which is obvious from the Power-Law transformation function plot that we have already shown.
So this is another kind of processing operation, the Power-Law Transformation, that can be used to enhance some features of the input image. The other type of transformation that we can use for image enhancement is called grey level slicing. In case of grey level slicing, the application may not be interested in all the intensity levels, but may need only a certain range of grey level values to be highlighted.
So in such cases, for enhancement, what you can use is the grey level slicing operation, and the transformation function is shown over here. The transformation function on the left hand side says that for intensity levels in the range A to B the image will be enhanced, and for all other intensity levels the pixels will be suppressed. On the right hand side, the transformation function shows that again within A and B the image will be enhanced, but outside this range the original intensity values will be retained.
(Refer Slide Time: 24:04)
And the results that we get are something like this. The first image shows that only the desired intensity levels are retained with enhancement, while all other regions have been suppressed. The right hand image shows that the desired range of intensities has been enhanced, but the other intensity levels have remained as they were.
So with this we stop today's discussion on point processing. We will continue with this
topic in our next lecture. Thank You.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, kharagpur.
Lecture-34.
Histogram Equalization and Specifications-I.
So we have talked about the image enhancement using point processing and under that we
have talked about the various point processing techniques like Negative Image
Transformation, and in case of the Negative Image Transformation, we have seen that the processed image that we get is a negative version of the original input image. And such
processed images are useful in case we have very few pixels in the original image where the
information content is mostly in the white pixels or grey pixels which are embedded into
large regions of dark pixels.
So in such cases, if we take the negative of the image, then in the processed image the information content becomes much more convenient to visualize. The
other kind of point processing techniques that we have discussed is the Contrast Stretching
operation. In case of Contrast Stretching operation, we have seen that these kind of contrast
stretching operations is useful where the original image is very dark.
And we have said that we can have such dark images when the scene illumination is very poor, or we can have a very dark image when the dynamic range of the sensor is so small that it cannot record all the intensity values present in the scene, or dark images can also be obtained if, during image acquisition, the aperture setting of the camera lens is not proper. So for these different kinds of cases, we can have a dark image, and Contrast
Stretching is a very, very useful technique to enhance the contrast of such dark images.
The other kind of transformation that we have used for image enhancement is a Logarithmic
Transformation and there we have said that Logarithmic Transformation basically
compresses the dynamic range of the input image. And this kind of transformation, we have
said that is very, very useful when an image which is to be displayed on a display device. But
the dynamic range of the input image is very, very large which the display device cannot
handle.
So for such cases you go for the Logarithmic Transformation which compresses the dynamic
range of the input image so that it can be reproduced faithfully on the display. Then we have
also talked about the other kind of image enhancement techniques like Power-law
Transformation and we have said that this Power-Law Transformation is very, very useful for
image display devices, for printing devices as well as for image acquisition devices.
Because by nature all these devices provide a Power-Law Transformation of the image that is to be produced, whether it is on the display or on the printer, or of the image which is to be captured. So because the devices themselves transform the image using the Power-Law Transformation, if we do not take any action before providing the image to those devices, then the images which will be produced will be distorted in nature.
The other kind of image enhancement technique that we have discussed is the Gray Level Slicing operation, and we have said that these Gray Level Slicing operations are useful for applications where the application wants certain gray levels to be enhanced. So there again, we have seen two different types of transformation functions.
In one case, the transformation enhances all the intensity values within a given range, and the intensity values outside that given range are suppressed or made 0. In the other kind of Gray Level Slicing Transformation, as we have said, within the given range the intensity values are enhanced, but outside that particular range the intensity values remain untouched. That is, whatever the intensity values are in the original image, the same intensity values are reproduced in the processed image.
Whereas within the given range, the intensity values are enhanced. So this kind of transformation is very, very useful for applications where the application wants the intensity values within a certain range to be highlighted. Now, all these different point processing techniques that we have discussed till now do not consider the overall appearance of the image. They simply apply a transformation to a particular intensity value and accordingly produce the output intensity value.
Now in today’s discussion, we will talk about another approach where the transformation
techniques also take care of the global appearance of the image. So Histogram is such a
measure which provides a global description of the appearance of an image. So a few of the enhancement techniques that we are going to discuss today are based on histogram processing.
So in today’s discussion we will first talk about what a histogram is. Then we will talk about two histogram based techniques: one of them is called histogram equalization and the other one is called histogram specification, sometimes also called histogram matching or histogram modification. Then, apart from these histogram based techniques, we will also talk about two more image enhancement techniques.
You remember from our previous discussion that a transformation function T is applied on the original image f to give us the processed image g, and we have said that this transformation function T maps an intensity value in the input image to an intensity value in the processed image. We have also mentioned that it is not necessary that the transformation function T works on a single image.
The transformation function T can also work on multiple images, that is, on more than one image. So we will discuss two such approaches: one is image enhancement using the image subtraction operation and the other is image enhancement using the image averaging operation. So first let us start the discussion on histogram processing, and before that let us see what we mean by the histogram of an image. To define the histogram of an image we consider that the image has gray level intensities in the range 0 to L-1.
So the digital images that we are talking about will have L discrete intensity levels, we will represent those intensity levels in the range 0 to L-1, and we say that a variable rk represents the kth intensity level.
Now the histogram is represented by h(rk) = nk, where nk is the number of pixels in the image having intensity level rk. So once we get the number of pixels having intensity value rk, and if we plot these counts, the number of pixels having the different intensity values, against the corresponding intensity value, then the plot that we get is known as the histogram.
In this particular case you will find that, because we are considering discrete images, this function, the histogram h(rk), will also be discrete. Here rk is a discrete intensity level, nk is the number of pixels having intensity level rk, and h(rk), which is the same as nk, also assumes discrete values. In many cases we talk about what is called a normalized histogram; so instead of taking the simple histogram as just defined, we sometimes take a normalized histogram.
A normalized histogram is very easily derived from the original histogram: the normalized histogram is represented as p(rk) = nk / n, where, as before, nk is the number of pixels having intensity value rk and n is the total number of pixels in the digital image. So from this expression you find that p(rk) actually tells you the probability of occurrence of a pixel having intensity value equal to rk.
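As a minimal sketch (the function names and the use of NumPy's bincount are illustrative choices, not part of the lecture), the histogram and normalized histogram of an 8-bit image can be computed as follows:

```python
import numpy as np

def histogram(image, L=256):
    """Count h(r_k) = n_k for every gray level r_k of an integer image."""
    return np.bincount(image.ravel(), minlength=L)      # n_k for k = 0 .. L-1

def normalized_histogram(image, L=256):
    """Return p(r_k) = n_k / n, an estimate of the gray-level probabilities."""
    counts = histogram(image, L)
    return counts / counts.sum()                         # n = total number of pixels

# Example: a small random 8-bit image
img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
p = normalized_histogram(img)
print(p.sum())  # should be 1.0 (up to floating-point error)
```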
And such histograms give, as we said, a global description of the appearance of an image. So now let us see what different types of images we usually get and what the corresponding histograms are. Here we find that the first image, as you see, is a very dark image; it is very difficult to make out the content of this particular image.
If we plot the histogram of this particular image, the histogram is shown on the right hand side, and you will find that most of the pixels of this image have intensity values which are near to 0. Since we are considering images which are digitized with every pixel quantized using 8 bits, we have a total of 256 intensity levels, represented by intensity values from 0 to 255. For this particular dark image, most of the pixels have intensity values near 0, and that gives the image a very dark appearance.
Now let us see a second image. Here you find that this image is very bright and if you look at
the histogram of this particular image, you will find that, for this image the histogram shows
that most of the pixels of this image have intensity values which are near to the maximum.
That is near value 255. And because of this, the image becomes very bright. Let us come to a
third image category.
(Refer Slide Time: 14:20)
This is an image where the intensity values are higher than those of the first image we showed but lower than the intensity values of the previous image. So it is something in between, and the histogram of this image shows that most of the pixels have intensity values in the middle range; not only that, the spread of the intensity values of these pixels is also very small. So this appears to be a medium kind of image: it is neither very dark nor very bright. But at the same time the variation of the intensity values in this particular image is very poor, and as a result the image gives a medium kind of appearance, neither very bright nor very dark, with intensity variations that are not very clear.
(Refer Slide Time: 15:48)
That means the contrast of the image is very poor. So let us look at the fourth category of image. In this image, the histogram plot shows that the intensity values vary from very low values to very high values, that is, they span the full range from 0 to 255. As a result the image appears very prominent, having both low and high intensity values, and at the same time, if you look at the image, you find that many of its details are easily visible.
So as we said, the nature of the histogram shows the global appearance of an image, which is also quite obvious from these four different types of images that we have shown. The first one was a dark image. The second one was a bright image, the third one was a medium category image whose contrast was very poor, and we will say that the fourth one is an ideal image, at least for visualization purposes, where the image brightness is proper and at the same time the details of the objects present in the image can be very easily understood. So this is a high contrast image.
So when we talk about histogram based processing, most of the histogram based enhancement techniques try to improve the contrast of the image, whether we talk about histogram equalization or the histogram modification techniques. Now, in these histogram based techniques, the histogram just gives you a global description of the image. It does not tell you anything about the content of the image, and that is quite obvious in these cases.
Just by looking at the histogram we cannot say what the content of the image is; we can only get an idea of the global appearance of that particular image. Histogram based techniques try to modify the histogram of an image so that the image appears in a particular way: either dark, or bright, or with very high contrast. And depending upon the type of operation that we do using the histogram, we can have either the histogram equalization operation or the histogram modification operation.
So now that we have seen what a histogram is and what the histogram tells us, let us see how these histograms can be processed to enhance the images. The first technique we will talk about is the histogram equalization operation. For this histogram equalization operation, we initially assume that r is a variable representing the gray level in an image. For the time being we will also assume that the pixel values in the image are continuous and are normalized to the range 0 to 1, where 0 indicates a black pixel.
So 0 indicates a black pixel and 1 indicates a white pixel. Later on we will extend these ideas to the discrete formulation, where we will consider pixel values in the range 0 to L-1, L being the number of discrete gray levels present in the image.
Now, as we said, for point processing we are interested in finding a transformation of the form s = T(r), where r is the intensity in the original image and s is the intensity in the processed (or transformed, or enhanced) image. This transformation function T has to satisfy two conditions. The first condition is that T(r) must be single valued and monotonically increasing in the range 0 ≤ r ≤ 1; and the second condition that T(r) must satisfy is 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1. The first condition is very important because it maintains the order of the gray levels in the processed image.
That is a pixel which is dark in the original image should remain darker in the processed
image. A pixel which is brighter in original image should remain brighter in the processed
image. So the intensity ordering does not change in the processed image; and that is
guaranteed by the first condition that is T(r) should be single valued and monotonically
increasing in the range 0 to 1 of the values of r.
The second condition, 0 ≤ T(r) ≤ 1, is the one which ensures that the processed image does not contain a pixel value which is higher than the maximum intensity value that is allowed. So this ensures that the processed image will have pixel values which are always within the allowable minimum and maximum range. And it can be shown that if these conditions are satisfied by T(r), then the inverse transformation r = T^{-1}(s) will also satisfy these two conditions. So we want a transformation function T which satisfies these conditions, and if these conditions are satisfied by T(r), then the inverse transformation will also satisfy them. Thank You.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-35.
Histogram Equalization and Specifications-II.
We will also satisfy this particular condition. Now let us see how the histogram helps us to get a transformation function of this form. As before, we assume that the images have normalized intensity values in the range 0 to 1, and that r is an intensity value in the original image while s is an intensity value in the processed image.
(Refer Slide Time: 1:15)
Now given this, from elementary probability theory we know that if pr(r) and the transformation function T(r) are known, and T^{-1}(s) is single valued and monotonically increasing, then we can obtain the PDF of s as

ps(s) = pr(r) |dr/ds|, evaluated at r = T^{-1}(s).

So this is what elementary probability theory gives us: if we know pr(r) and T(r), and T^{-1}(s) is single valued and monotonically increasing, then ps(s) can be obtained from pr(r) by the expression above. Now, all the histogram processing techniques try to modify the probability density function ps(s) so that the image gets a particular appearance, and this appearance is obtained via the transformation function T(r).
(Refer Slide Time: 4:12)
So now, what type of transformation function T(r) can we have? Let us consider a particular transformation function of the form

s = T(r) = ∫_0^r pr(ω) dω,

where the integral runs from 0 to r and r varies in the range 0 to 1. You will find that this integral is the cumulative distribution function of the variable r.

Now, if I take T(r) of this particular form, then this T(r) satisfies both of the conditions that we stated earlier, and from it we can compute ds/dr, which is nothing but pr(r). So, by substitution into our earlier expression ps(s) = pr(r) |dr/ds|, obtained from elementary probability theory, we get in this particular case

ps(s) = pr(r) / pr(r) = 1.

So you find that if we take this particular transformation function, which is nothing but the cumulative distribution function of the variable r, then the transformation generates an image which has a uniform probability density function of the intensity values s.
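In compact form, the whole derivation above can be written as:

```latex
s = T(r) = \int_0^r p_r(\omega)\,d\omega
\;\Longrightarrow\;
\frac{ds}{dr} = p_r(r)
\;\Longrightarrow\;
p_s(s) = p_r(r)\left|\frac{dr}{ds}\right| = \frac{p_r(r)}{p_r(r)} = 1,
\qquad 0 \le s \le 1 .
```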
And we have seen earlier that a high contrast image has a histogram in which pixels take intensity values over the entire range 0 to 255. So if I go for this kind of transformation, since we are getting a uniform probability density function for the processed image, this is expected to enhance the contrast of the image. This particular result, ps(s) = 1, is very important, and note that we have obtained it irrespective of T^{-1}(s). That matters because it may not always be possible to obtain T^{-1} analytically. So whatever the nature of T^{-1}(s), if we take the cumulative distribution function of r and use that as the transformation function T(r), then the image is going to be enhanced.
So this simply says that by using the CDF, the Cumulative Distribution Function, as the transformation function, we can enhance the contrast of an image; and by contrast enhancement we mean that the dynamic range of the intensity values is going to be increased. Now, what we have discussed till now is valid for the continuous domain, but all the images that we are going to consider are discrete images, so we must have a discrete formulation of the derivation we have done till now.
So now let us see how we can have a discrete formulation of these derivations. For the discrete formulation, as we have seen earlier, pr(rk) is given by pr(rk) = nk / n, where nk is the number of pixels having intensity value rk and n is the total number of pixels in the image; and a plot of pr(rk) for all values of rk gives us the histogram of the image.

So the technique to obtain histogram equalization, and by that the image enhancement, will be: first we find the Cumulative Distribution Function, the CDF, of rk. That gives us

sk = T(rk) = Σ_{i=0}^{k} pr(ri) = Σ_{i=0}^{k} ni / n.
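A minimal sketch of this discrete equalization, assuming an 8-bit grayscale image held in a NumPy array (the function name, the simple rounding used for the final rescaling, and the synthetic test image are illustrative choices, not from the lecture):

```python
import numpy as np

def histogram_equalize(image, L=256):
    """Histogram equalization of an integer image with gray levels 0..L-1."""
    counts = np.bincount(image.ravel(), minlength=L)   # n_k
    p = counts / counts.sum()                          # p_r(r_k) = n_k / n
    cdf = np.cumsum(p)                                 # s_k = T(r_k), values in [0, 1]
    # Map the [0, 1] values back to displayable levels 0..L-1 (simple rounding here;
    # the lecture's exact rescaling formula is given later in this discussion).
    mapping = np.round(cdf * (L - 1)).astype(np.uint8)
    return mapping[image]                              # apply the lookup table per pixel

# Usage on a synthetic low-contrast image
img = np.clip(np.random.normal(60, 10, size=(128, 128)), 0, 255).astype(np.uint8)
eq = histogram_equalize(img)
print(img.min(), img.max(), "->", eq.min(), eq.max())
```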
So let us see what results we can get using such a histogram equalization operation. Here on the left hand side we have an image, and it is obvious that the contrast of this image is very poor. On the right hand side we have shown the histogram of this particular image, and here again you find from this histogram that most of the pixels in the image have intensities which are very close to zero, and there are very few pixels which have higher intensity values.
After histogram equalization, the image that you get is shown at the bottom, and you find that this image obviously has a contrast which is higher than the previous one, because many details which are not very clear in the original image are very clear in this second image. On the right hand side we have shown the histogram of this processed image. If you compare the two histograms, you will find that the histogram of the processed image is more or less uniform; so by histogram equalization we can have such a kind of enhancement. This
shows another image, again processed by histogram equalization. On the top you will find the image of a part of a car, and because of this enhancement not only does the image appear better, but if you look at the number plate you will find that in the original image the numbers are not readable, whereas in the processed image I can easily read the number, something like F N 0 9 6 8. So it is not readable in the original image but it is readable in the processed image, and the histogram of this processed image is nearly uniform.
So this is one kind of histogram based processing technique, histogram equalization, which enhances the contrast of the image. Though it gives contrast enhancement, histogram equalization has certain limitations. The first limitation is that, using histogram equalization, whatever equalized image you get is fixed; I cannot have any interactive manipulation of the image. So it generates only a single processed image.
Now, to overcome this limitation, if some application demands that we enhance only a certain region of the histogram, that is, we want the details within a certain region of the histogram rather than what is given by the histogram equalization process, then the kind of technique that should be used is what is called histogram matching, or the histogram specification technique.
Initially, we again assume two variables: a variable r representing the continuous gray levels in the given image, and a variable z representing the intensities in the processed image, where the desired processed image is specified in the form of a probability density function pz(z).
So this pz(z) specifies our target histogram. From the given image we can obtain pr(r), the histogram of the given image; this we compute from the input image, whereas pz(z), the target histogram, is specified. Now, for this histogram matching, what we have to do is the following. First, we equalize the given image using the transformation function

s = T(r) = ∫_0^r pr(ω) dω,

as we have seen earlier, with the integral running from 0 to r. If I equalize the image using this transformation function, then what I get is an image whose intensity values have a uniform probability density function. Next, using pz(z), we compute the transformation function G(z), which is obtained as

G(z) = ∫_0^z pz(t) dt,

with the integral running from 0 to z.
Then, from these two equations, we can set G(z) = T(r) = s, and this gives z = G^{-1}(s) = G^{-1}(T(r)). So the operations we perform are: first, we equalize the given image using the histogram equalization technique; next, we find the transformation function G(z) from the target histogram that has been specified; then the equalized image is inverse transformed using the inverse transformation G^{-1}(s). The resultant image obtained by this operation is likely to have a histogram given by the target histogram pz(z). So our procedure is: first equalize the original image, using the histogram obtained from the given image.
Then find the transformation function G(z) from the target histogram that has been specified, and then inverse transform the equalized image using not T^{-1} but G^{-1}, where G^{-1} is obtained from the target histogram. By doing this, the image that you get becomes a histogram modified image, a processed image whose histogram is likely to be the same as the histogram that has been specified as the target.
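For reference, the whole histogram specification chain can be summarized in one line:

```latex
s = T(r) = \int_0^r p_r(\omega)\,d\omega,
\qquad
G(z) = \int_0^z p_z(t)\,dt,
\qquad
G(z) = T(r) = s
\;\Longrightarrow\;
z = G^{-1}\!\bigl(T(r)\bigr).
```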
Again, this is a continuous domain formulation, but our images are digital, so we have to go for a discrete formulation of these derivations. So let us see how we can discretize this formulation. As before, we can find

sk = T(rk) = Σ_{i=0}^{k} ni / n,

and this we obtain from the given input image. From the target histogram pz(z) that is specified, we get a transformation function

vk = G(zk) = Σ_{i=0}^{k} pz(zi),

and we set this equal to sk, for k = 0, 1, ..., L-1. Finally, we obtain the processed image intensity as zk = G^{-1}(T(rk)).
So this is the discrete formulation of the continuous domain derivation that we did earlier. Now let us see what kind of operations we have using this. Here the slide shows the transformation function s = T(r) on the left hand side, which is obtained from the given image, and using the target histogram we obtain the function G(z) on the right.
This function T(r) gives the value sk for a particular intensity value rk in the given image. The function G(z) is supposed to give an output value vk for an input value zk. Now, coming to G(z), you find that zk is the intensity value which is not known to us; we want to find zk from rk. So the operation we will do is: whatever sk we get from rk, we set that sk in the second transformation and then apply the inverse transformation operation.
As shown in the next slide, we set sk along the vertical axis of the v = G(z) transformation function, and then do the inverse transformation, that is, from sk you come to zk. So what we have to apply is an inverse transformation function to get the value zk for a given intensity value rk in the original image. Conceptually, or graphically, this is very simple, but the question is how to implement it. In the continuous domain we may not get an analytical solution for G^{-1}, but in the discrete domain the problem becomes simpler because we are dealing with only discrete values. So in the discrete domain, the transformation functions,
that is, rk to sk (s = T(r)) and zk to vk (v = G(z)), can be implemented by simple lookup tables.
What I mean by this is the following. T(r) is represented by an array where, for rk, the index k points into the array and the element at that array location gives us the value sk. So whenever a value rk is specified, using k you immediately go to this array and the content of that array location gives the corresponding value sk. Similarly, for vk = G(zk) we have a similar operation: if zk is known, I can use k as an index into the array for z and get the corresponding value vk. Now, the first case is very simple: I know the value of rk, so I can find the corresponding value of sk from the array. But the second one is an inverse operation: I know sk, or, as we have equated sk to vk, I know vk.
(Refer Slide Time: 24:33)
Now, from this vk I have to find the corresponding value zk. This is an inverse problem, and to solve it we go for an iterative solution approach. The iteration can be set up in this form: we know that G(zk) = sk, so this gives G(zk) - sk = 0. Our approach will be to iterate on the values of z to get a solution of this equation. What we do is initialize z to a small value, say ẑ, and for each k take zk = ẑ, where ẑ is the smallest integer which satisfies G(ẑ) - sk ≥ 0. That is, we start with the smallest value of z and go on incrementing z by 1 at every step until this condition is satisfied.
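As an illustrative sketch (not from the lecture), this search for the smallest integer ẑ can be written as a short Python function; `G` and `s_k` are assumed to be a precomputed cumulative array and an equalized value:

```python
def smallest_z(G, s_k, tol=1e-9):
    """Return the smallest gray level z such that G[z] - s_k >= 0.

    G   : sequence of cumulative values G(z) for z = 0 .. L-1 (non-decreasing)
    s_k : equalized value s_k = T(r_k) obtained from the input image
    tol : small tolerance to absorb floating-point round-off
    """
    for z, g in enumerate(G):
        if g - s_k >= -tol:
            return z
    return len(G) - 1  # fall back to the last level if nothing satisfies the test
```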
So when this condition is satisfied, then the value of z we get, that is the zk corresponding to
this given value sk. So now let us stop our discussion today, we will continue with this topic
in our next class. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-36.
Histogram Implementation-I.
Hello, welcome to the video lecture series on digital image processing. For the last few classes we have been discussing image enhancement techniques. In the previous class we saw what is meant by a histogram, how the global appearance of an image is reflected in its histogram, and that histogram based enhancement techniques aim at modifying the global appearance of an image by modifying its histogram. Then we started the discussion on the histogram equalization technique and the histogram specification or histogram matching technique.
(Refer Slide Time: 1:12)
In today’s class we will talk about some implementation issues of the histogram equalization and histogram specification techniques, and we will discuss these implementation issues with the help of some examples. Then we will compare the performance of histogram specification and histogram equalization with the help of results obtained on some images. Lastly, we will talk about two more point processing techniques for image enhancement: one of them is the image subtraction technique and the other one is the image averaging technique.
So now let us briefly recapitulate what we did in the last class. As we have said, the histogram of an image indicates the global appearance of the image. We have also seen these images in the last class, but just for a quick recapitulation: on the left hand side we have shown an image which is very dark, and we call this the dark image. On the right hand side we have shown the corresponding histogram, and you find that this histogram shows that most of the pixels in this particular image have an intensity value which is near about 0. There is practically no pixel having higher intensity values, and that is what gives this particular image a dark appearance.
Then the second one that we have shown is a bright image, or a light image, and again from the histogram you find that most of the pixels in this image have intensity values which are near the maximum value, that is 255 in this case. Since all the images in our application are quantized with 8 bits per pixel, the intensity levels vary from 0 to 255; so in our case the minimum intensity of a pixel will be 0 and the maximum intensity will be 255. In this particular example, as the histogram shows, most of the pixels have intensities which are near 255, the maximum value.
(Refer Slide Time: 3:55)
The next image shows a case where the pixels have intensity values in the middle range, but the range of intensity values is very narrow. As a result the image is neither very bright nor very dark, but because the dynamic range of the intensity values is very low, the image contrast is very poor. Then the next slide shows what we call a high contrast image, where most of the details of the objects present in the image are visible, and by looking at the corresponding histogram we find that the pixels of this image have a wide range of intensity values, starting from a very low value near 0 to the maximum value near 255. So we say that a particular image has high contrast if its pixel intensity values cover a wide range, from a very low value to a very high value.
So all these 4 examples tell us how the global appearance of an image is reflected in its histogram, and that is why all the histogram based enhancement techniques try to adjust the global appearance of the image by modifying its histogram. The first histogram based enhancement technique that we discussed in the last class is called histogram equalization, so let us quickly review what we mean by histogram equalization.
As this expression suggests, pr(rk) tells us the probability of a pixel having the value rk being present in the image, and the plot of the pr(rk) values for the different values of rk defines the histogram of this particular image. When we talk about histogram equalization, the technique makes use of this histogram to find the transformation function from an intensity level in the original image to an intensity level in the processed image.
That transformation function is given by

sk = T(rk) = Σ_{i=0}^{k} ni / n = Σ_{i=0}^{k} pr(ri),

where i varies from 0 to k. This is the transformation function to be used for histogram equalization. Now, because the histogram is defined as pr(rk) = nk / n, it is a normalized histogram, so every value of pr(rk) lies in the range 0 to 1; and similarly, when the transformation function T(rk) gives us a value sk corresponding to an intensity level rk in the input image, the values of sk also lie in the range 0 to 1.

So the minimum value of the intensity, as suggested by this expression, will be 0 and the maximum value will be 1. But we know that for digital images the minimum intensity can be 0 and the maximum intensity can be rL-1, as k varies from 0 to L-1, and in our discrete case rL-1 = 255, because the intensity values of our images are quantized with 8 bits, so the intensity varies from 0 to 255.
Whereas this transformation function sk = T(rk) gives us a maximum intensity value sk in the processed image which is equal to 1. So for practical implementation we have to do some post processing, so that all the sk values that we get in the range 0 to 1 can be mapped to the full dynamic range of the image, that is from 0 to 255. The kind of mapping function that we have to use can be written as

s' = Int[ (s - smin) / (1 - smin) × (L-1) + 0.5 ],

because we will be getting only integer values, where L-1 is the maximum intensity level and the shift of 0.5 rounds to the nearest integer. So whatever value of s we get by the transformation sk = T(rk) has to be scaled by this function to give an intensity level in the processed image which varies from 0 to the maximum level, that is 0 to L-1; in our case L-1 is equal to 255.
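A minimal sketch of this rescaling step in Python (the function name and the use of NumPy are illustrative choices, not from the lecture):

```python
import numpy as np

def rescale_levels(s, L=256):
    """Map equalized values s in [0, 1] to integer display levels 0 .. L-1.

    Implements s' = Int[(s - s_min) / (1 - s_min) * (L - 1) + 0.5].
    """
    s = np.asarray(s, dtype=float)
    s_min = s.min()
    return np.floor((s - s_min) / (1.0 - s_min) * (L - 1) + 0.5).astype(int)

# Example with L = 8 display levels
print(rescale_levels([0.0, 0.25, 0.5, 1.0], L=8))  # [0 2 4 7]
```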
Now let us take an example to illustrate this. Suppose we have an input image with 8 discrete intensity values, that is, r varies over 0, 1, 2, ..., 7. Similarly, the processed image that we want to generate will also have 8 discrete intensity values varying from 0 to 7. Suppose the probability density function, or histogram, of the input image is specified like this: pr(0), the probability that an intensity value equals 0, is 0; pr(1) is the same as pr(2), which is given as 0.1; pr(3) is 0.3; pr(4) = pr(5) = 0; pr(6) is 0.4; and pr(7) is 0.1. Our aim is that, given this histogram of the input image, we want to find the transformation function T(r) which will map such an input image to the corresponding output image, so that the output image is equalized.
(Refer Slide Time: 14:33)
To do this, we have to find the mapping function T(r). We can generate this mapping function by putting all the values in the form of a table: r varies from 0 to 7, and the corresponding probability values pr are 0, 0.1, 0.1, 0.3, 0, 0, 0.4 and 0.1. From this probability density function we can compute the transformation function T(r), which is nothing but the summation of pr(i) for i from 0 to r.

If we compute this transformation function, it comes out like this: 0, 0.1, 0.2, 0.5, then, because the next two probability values are 0, it remains 0.5 and again 0.5, then 0.9, and finally 1.0. So this is the transformation function that we have. This means that if my input intensity is 0, the transformation function gives me a value s, that is sk, equal to 0; if the input intensity is 1, the output sk is 0.1; if the input intensity is 2, the output sk is 0.2.
Similarly, if the input intensity is 6, the output value will be 0.9. But since the output intensity has to vary from 0 to the maximum value, which is 7, we have to scale these s values to cover the entire range of intensities. For that we use the same mapping expression as before,

s' = Int[ (s - smin) / (1 - smin) × 7 + 0.5 ].
Doing this calculation and taking the nearest integer value that we get, that will be my reconstructed intensity level. If I do this, then for the different values of s the reconstructed s' will be: for r = 0, s' = 0; for r = 1, s' = 1; for r = 2, s' = 2; and for r = 3, I get s = 0.5 and the minimum s is 0, so the numerator is 0.5 and the denominator is 1, and 0.5 × 7 gives 3.5, plus 0.5 equals 4, so s' = 4. So the first column and the last column together give us the mapping from a given input intensity value to the corresponding output intensity value, and this defines the processed, enhanced image which is to be displayed. This is how the histogram equalization operation is done, and we have seen in the last class the results of such histogram equalization operations.
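For reference, the cumulative values T(r) of the 8-level example above can be reproduced with a couple of lines of Python (an illustrative addition; the rescaling to the 0–7 range then follows the formula given earlier):

```python
import numpy as np

p_r = np.array([0.0, 0.1, 0.1, 0.3, 0.0, 0.0, 0.4, 0.1])  # given 8-level histogram
T = np.cumsum(p_r)                                         # s_k = T(r_k)
print(T)  # approximately [0, 0.1, 0.2, 0.5, 0.5, 0.5, 0.9, 1.0]
```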
So here we have shown an image which is very dark, and on the right hand side we have the corresponding histogram. Once we do histogram equalization, what we get is the equalized or processed image: on the bottom row you find that we have a brighter image, which is the histogram equalized image, and on the right hand side we have the corresponding histogram.
As we have mentioned in our last class, whenever we go for histogram equalization, the probability density function of the intensity values of the equalized image is ideally a uniform distribution. In this particular case you will find that the histogram of the equalized image we have obtained is not absolutely uniform; however, it is nearly uniform. The derivation which shows that the intensity distribution will be uniform is a theoretical one; in practical, discrete situations, in most cases we do not get a perfectly uniform intensity distribution.
The reason is that in the discrete case there may be situations where many of the allowed pixel values are not present in the image, and because of this the histogram, or intensity distribution, that you get will in most cases not be uniform. So this shows us the case of histogram equalization. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-37.
Histogram Implementation-II.
Hello, welcome to the video lecture series on digital image processing. Now let us come to the case of histogram specification, or histogram modification as it is also called. As we said in our last class, histogram equalization is an automated process, so whatever processed image you get by using the histogram equalization technique is fixed. Histogram equalization is therefore not suitable for interactive image manipulation, whereas interactive manipulation or interactive enhancement can be done with the histogram specification technique.
In the histogram specification technique, what we do is this: we have the input image, from which we can find the histogram; then a target histogram is specified, and we have to process the input image in such a way that the histogram of the processed image will be close to the specified target histogram. So here we have two different quantities. Firstly, we have pr(rk), which as we have seen is nothing but nk / n, where nk is the number of pixels in the given image with intensity value equal to rk; this we compute from the given image which is to be processed.
And we have a target histogram which is specified, so that our processed image will have a histogram which is close to this target, and the target histogram is specified in the form pz(zk). Note that we do not have an image corresponding to this histogram; it is only the histogram that is specified. We use the subscripts in pr(rk) and pz(zk), that is r and z, to indicate that these two probability density functions, pr(rk) and pz(zk), are different.
So in the case of histogram specification, the process is done in this manner. Firstly, using pr(rk), you find the transformation function corresponding to equalization, which, as we have seen earlier, is given by

sk = T(rk) = Σ_{i=0}^{k} ni / n = Σ_{i=0}^{k} pr(ri).

Then, to obtain the histogram specification, we define a variable zk with the transformation function

vk = G(zk) = Σ_{i=0}^{k} pz(zi),

which we can compute from the specified histogram, and we set this equal to sk. You find that this intermediate stage, vk = G(zk), is a hypothetical one, because we really do not have the image corresponding to the specified histogram pz(zk).
Now, once I get this, to obtain the reconstructed or processed image I have to take the inverse transformation. As we have defined, for this particular zk we have vk = G(zk) = sk, with G(zk) computed as above, and from here, to get the intensity value zk in the processed image, we have to take the inverse transformation; but in this case the inverse is taken not with respect to T but with respect to G. So our zk in this case will be zk = G^{-1}(sk). What we have to do is: for the given image we find the transformation function T(r) corresponding to the histogram of the given image, and using this transformation function every intensity value rk of the input image is mapped to an equalized intensity value sk; that is
the first step. The second step is: from the specified histogram pz we get a transformation function G, and then the sk that we obtained in the previous step has to be inverse transformed using G to give the intensity value zk in the processed image.
Now, as far as we have discussed, finding T(rk) is very simple, that is, the forward process is simple; the difficulty comes in getting the inverse transformation G^{-1}. It may not always be possible to get analytical expressions for T and G, and similarly it may not always be possible to get an analytical expression for G^{-1}. So the best way to solve this inverse problem is to go for an iterative approach.
Now let us see what this entire formulation means. Here we have shown the same formulation graphically: on the left hand side we have the transformation function s = T(r), obtained from the histogram of the given image, and on the right hand side we have the transformation function v = G(z), which has to be obtained from the target histogram that is specified.
Once we have these two transformation functions, given an rk I can find the value of sk, and given a zk I can find the corresponding value vk. But the problem is that zk is unknown; this is the quantity we have to find by using the inverse transform G^{-1}. So the process, as per our definitions, since vk = sk, is: for the given rk we find the corresponding value sk using the transformation s = T(r), and once we get this, we set vk = sk, that is, from the first transformation function we come to the second transformation function.
So we set vk = sk and then find zk in the reverse direction; now the direction of the transformation is reversed, and we find zk from this value of sk using the transformation curve G(z). But, as we have mentioned, it may not always be possible to find analytical expressions for T and G. So though the method appears very simple, its implementation is not that simple. In the discrete domain, however, the matter can be simplified, in the sense that both transformation functions, s = T(r) and v = G(z), can actually be implemented in the form of arrays.
The arrays are like this: I have an array for r, as shown at the top, where the intensity value rk of the input image is taken as an index into the array, and the content of that array element is the corresponding value sk. Similarly, the second transformation function G(z) can also be implemented with the help of an array, where zk is an index into the array and the corresponding element gives us the value vk.
Now you find that, using these arrays, the forward transformation is very simple: when we want to find sk = T(rk), we use rk to come to the corresponding element in the array, read the value stored at that location, and that gives us sk. But the matter is not so simple when we go for the inverse transformation. For the inverse transformation, we have to find a location in the second array, that is, we have to find the value of zk for which the stored element is equal to sk.
So this is what we have to do. We find that while the forward transformation sk = T(rk) is very simple, the reverse transformation is not; to do this reverse transformation we have to go for an iterative procedure. The iterative procedure is as follows. As per our definition we have vk = G(zk) = sk; if this is true, then we must have G(zk) - sk = 0. The solution would have been very simple if zk were known, but here it is precisely the value of zk that we are trying to find.
So to find the value of zk we take the help of an iterative procedure: we initialize zk to some value, say ẑ, and iterate on this ẑ until a condition like G(ẑ) - sk ≥ 0 is met. Until this condition is satisfied, you go on incrementing the value of ẑ by one at every iteration; that is, you start with the minimum value of ẑ and then increase ẑ in steps of one. The minimum value of ẑ for which this condition is satisfied gives you the corresponding value of zk. So this is a simple iterative procedure. Again, as before, we can
illustrate this with the help of an example. Here again we assume that both r and z take values from 0 to 7, and we take the probability density function pr(r) like this: pr(0) = 0, pr(1) = pr(2) = 0.1, pr(3) = 0.3, pr(4) = pr(5) = 0, pr(6) = 0.4, and pr(7) = 0.1.
This is what we assume is obtained from the given image. Similarly, the target histogram is given in the form pz(z), with values pz(0) = 0, pz(1) = 0.1, pz(2) = 0.2, pz(3) = 0.4, pz(4) = 0.2, pz(5) = 0.1, pz(6) = pz(7) = 0. So this is the target histogram that has been specified. Our aim is now to find the transformation or mapping function from r to z.
(Refer Slide Time: 16:10)
To do this we follow a procedure similar to the one used for histogram equalization. For the different values of r we have pr(r) as: for r = 0, pr(r) = 0; for r = 1, 0.1; for r = 2, also 0.1; for r = 3, 0.3; for r = 4 and 5, 0; for r = 6, 0.4; and for r = 7, 0.1. From this we can find the corresponding values of s, which are 0, 0.1, 0.2, 0.5, 0.5, 0.5, 0.9 and 1.0. Similarly, from the target histogram, for z = 0, 1, 2, 3, 4, 5, 6, 7 the values pz(z) are 0, 0.1, 0.2, 0.4, 0.2, 0.1, 0, 0, and the corresponding G(z) values are 0, 0.1, 0.3, 0.7, 0.9, 1.0, 1.0 and 1.0.
Now I follow the same procedure to map from r to z: first I map from r to s, then from s to z, and for the second step I have to find the minimum value of z for which G(ẑ) - sk ≥ 0. For each s I put the corresponding value of z in a column, call it z'. When s = 0, the minimum value of z for which G(ẑ) - sk ≥ 0 is 0. For s = 0.1, again starting with z = 0, the minimum value of z satisfying the condition is 1. For s = 0.2, the minimum value of z for which G(ẑ) - sk ≥ 0 is 2. When I come to r = 3, the corresponding value of s is 0.5, and doing the same thing, the minimum value of z for which the condition is satisfied is 3. When I come to r = 4, the value of s is again 0.5, and computing G(z) - s, the minimum value of z for which G(ẑ) - sk ≥ 0 is 3, because for z = 3, G(z) = 0.7, which is greater than 0.5; so this also equals 3.
If I follow the same procedure for all values, the corresponding mapping comes out like this: for r = 0 the processed image will have an intensity value equal to 0; for r = 1 it will be 1; for r = 2 it will be 2; for r = 3 it will be 3; for r = 4 and 5 the processed image will have intensity values equal to 3; for r = 6 the intensity value will be 4; and for r = 7 it will be 5.
first column of r, and the column of z' , this two columns gives us that mapping between an
intensity level and the corresponding processed image intensity level when I go for this
histogram equalization sorry, histogram matching.
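As an illustrative check (not part of the lecture), the complete r → z mapping of this example can be reproduced in a few lines of Python; the small tolerance guards against floating-point round-off in the cumulative sums:

```python
import numpy as np

p_r = np.array([0.0, 0.1, 0.1, 0.3, 0.0, 0.0, 0.4, 0.1])   # histogram of the given image
p_z = np.array([0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0, 0.0])   # specified target histogram

T = np.cumsum(p_r)   # s_k = T(r_k)
G = np.cumsum(p_z)   # v_k = G(z_k)

def smallest_z(G, s, tol=1e-9):
    # smallest z with G(z) - s >= 0 (iterative inverse of G)
    for z, g in enumerate(G):
        if g - s >= -tol:
            return z
    return len(G) - 1

mapping = [smallest_z(G, s) for s in T]
print(mapping)   # expected: [0, 1, 2, 3, 3, 3, 4, 5]
```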
So you find that our overall procedure will be something like this: first you obtain the histogram of the given image; then you precompute a mapped level sk for each level rk using this relation; then from the target histogram you obtain the mapping function G using the corresponding expression; then you precompute the values of zk for each value of sk using the iterative scheme. Once these two steps are over, I have a precomputed transformation in the form of a table which maps an input intensity value to the corresponding output intensity value, and once that is done, it is used for the final enhancement of the image.
I take the input intensity value and map it to the corresponding output intensity value using the mapping function; that is our final step. If this is done for each and every pixel location in the input image, the final output image will be an enhanced image whose intensity levels have a distribution close to the distribution that was specified.
So let us see what results we get using this histogram specification technique. Again I take the same image, the dolphin image; on the top is the original image, and on the right hand side I have the corresponding histogram.
On the bottom I have the histogram matched image, and on the right hand side the corresponding histogram. You find that these two histograms are quite different, and I will come back a bit later to what target histogram was taken. Now, to compare the result of histogram equalization with histogram specification, on the top I have shown the same histogram equalized image that we showed earlier, and on the bottom row we have shown the histogram matched image.
At this point, for this histogram specification the target histogram which was specified was the histogram obtained using the equalization process. So this histogram was the target histogram, and using this target histogram, when we did the histogram specification operation, this is the processed image that we get, and this is the corresponding histogram.
Now if you compare the histogram equalized image and the histogram matched image, you can note a number of differences. For example, the contrast of the background is much higher than in the histogram equalized image, and the details on the water surface are more prominent in the histogram specified image than in the histogram equalized image. Similar differences can be obtained by specifying other target histograms.
So this is our histogram specification operation. So this shows another result with histogram
specification, on the top we have the dark image, on the left bottom we have the histogram
equalized image, on the right bottom we have the histogram specified image.
Here again you find that the background in the histogram equalized image is almost washed out, whereas the background is highlighted in the histogram specified image. This is the kind of difference we can get between a histogram equalized image and a histogram specified image, and this is what is meant by histogram specification and histogram equalization. Now, as I have said, I will discuss two more techniques for image enhancement: one is the image differencing technique and the other is the image averaging technique.
(Refer Slide Time: 25:07)
Now, as the name suggests, whenever we go for image differencing we take the difference of pixel values between two images. Given two images, say f(x,y) and h(x,y), the difference of the two images is given by g(x,y) = f(x,y) - h(x,y). As this operation suggests, in g(x,y) all those pixel locations will be highlighted where there is a difference between the corresponding locations in f(x,y) and h(x,y); wherever f(x,y) and h(x,y) are the same, the corresponding pixel in g(x,y) will have a value which is near zero.
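A minimal sketch of this differencing step in Python (an illustrative addition; with unsigned 8-bit images the subtraction is done in a signed type, and the absolute difference is kept here so that changes in either direction are highlighted):

```python
import numpy as np

def difference_image(f, h):
    """Return |f - h| as an 8-bit image, highlighting locations where f and h differ."""
    diff = f.astype(np.int16) - h.astype(np.int16)   # avoid uint8 wrap-around
    return np.abs(diff).astype(np.uint8)

# Example: two nearly identical images differing in a small region
f = np.full((100, 100), 120, dtype=np.uint8)
h = f.copy()
h[40:60, 40:60] += 50          # simulated change in one region
g = difference_image(f, h)
print(g.max(), g.mean())       # large values only where the images differ
```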
So this image differencing operation mainly highlights the difference between two images, or the locations where the contents of the two images differ. Such an image difference operation is very useful, particularly in medical image processing. In medical image processing there is an operation called mask mode radiography. In mask mode radiography, you take an X-ray image of a certain body part of a patient, captured with the help of a TV camera, where the camera is normally placed opposite an X-ray source. Then you inject a contrast medium into the blood stream of the patient, and after injecting it you again take a series of images of the same anatomical portion of the patient's body using the same TV camera. The first image, the one taken before injection of the contrast medium, is called the mask, and that is why the name is mask mode radiography.
If you take the difference of each frame obtained after injection of the contrast medium from the mask image, you will find that all the regions where the contrast medium flows through the arteries are highlighted in the difference image. This kind of processing is very useful for finding out how the contrast medium flows through the arteries of the patient, and that is very helpful for detecting any arterial disease, for example a blockage in the artery or similar conditions. So mask mode radiography makes use of the difference image operation to highlight the arterial regions in the patient's body, which is useful for detecting any arterial disease or disorder.
To show a result: in this particular case, on the left hand side is the mask which was obtained, and on the right hand side is the difference image, that is, the difference between an image taken after injection of the contrast medium and the mask. Here you find that all the arteries through which the contrast medium is flowing are clearly visible, and because of this it is very easy to find out if there is any disorder in the artery of the patient. So this is difference image processing, which can be used to enhance certain regions within an image wherever there is a difference between two images.
(Refer Slide Time: 29:56)
Similarly, if we take the average of a number of images of the same scene, then it is possible to reduce the noise in the image, and that noise reduction is possible because of the fact that normally if I have a pure image, say f(x,y), then the image that we capture, if I call it g(x,y), that is the captured image, is the pure image f(x,y) contaminated by additive noise, say η(x,y), that is g(x,y) = f(x,y) + η(x,y). Now if this noise η(x,y) is additive and zero mean, then by averaging a large number of such noisy frames, it is possible to reduce the noise.
Because, simply because, if I take the average of k number of such frames, then the average image is given by ḡ(x,y) = (1/k) Σ g_i(x,y), where i runs from 1 to k, and if I take the expectation value or the average value of this ḡ(x,y), then this average value is nothing but f(x,y). Our condition is that the noise must be zero mean additive noise, and because it is zero mean, I assume that at every pixel location the noise is uncorrelated and the mean is zero.
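A minimal sketch of this averaging idea, assuming zero-mean additive Gaussian noise simulated with NumPy; the image, noise level and frame counts below are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.tile(np.linspace(0, 255, 64), (64, 1))      # stand-in for the pure image f(x,y)

def noisy_frame(f, sigma=30.0):
    """g_i(x,y) = f(x,y) + eta(x,y), with zero-mean additive Gaussian noise."""
    return f + rng.normal(0.0, sigma, size=f.shape)

for k in (1, 8, 128):
    frames = np.stack([noisy_frame(f) for _ in range(k)])
    g_bar = frames.mean(axis=0)                    # average of k noisy frames
    rmse = np.sqrt(np.mean((g_bar - f) ** 2))
    print(f"k={k:4d}  residual noise (RMSE) = {rmse:.2f}")
# the residual noise shrinks roughly as sigma / sqrt(k)
```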
722
(Refer Slide Time: 31:41)
So that is why, if you take the average of a large number of frames, the noise is going to get cancelled out. And this kind of operation is very, very useful for astronomical
cases, because in case of astronomy, normally the objects which are imaged, the intensity of
the images is very, very low. So the image that you capture is likely to be dominated by the
presence of noise. So here, this is the image of a galaxy, on the top left, on the top right, we
have the corresponding noisy image, and on the bottom, we have the images which are
averaged over a number of frames.
So, the last one is an average image, where the average is taken over 128 frames, and in the other ones the number of frames is less; and as is quite obvious from this, as you increase the number of frames, the amount of noise that remains in the processed image becomes less and less. So with this we come to the end of our discussion on point processing techniques for image enhancement operations. Now let us discuss the questions
723
that we have placed in the last class.
The first one is, what is an image histogram? You will find that a few of the questions are very obvious, so we are not going to discuss them. Now the fourth one is very interesting: suppose a digital image is subjected to histogram equalization; what effect will a second pass of equalization have over the equalized image? As we have already mentioned, once an image is histogram equalized, the histogram of the processed image will be a uniform histogram, that means it will have a uniform probability density function. And if I want to equalize this equalized image, then you find the corresponding transformation function will be a linear one, where the straight line is inclined at an angle of 45 degrees with the x axis. So that clearly indicates that whatever equalization we do over an already equalized image is not going to have any further effect on the processed image. This is the ideal case, but practically we have seen that after equalization the histogram that you get is not really uniform, so there will be some effect in the second pass, but the effect may be negligible.
The sixth one is again a tricky one: what condition must be satisfied by the target histogram to be used in the histogram specification technique? You find that in case of the histogram specification technique, the target histogram is used for the inverse transformation, that is G⁻¹. So it must be true that the transformation function G is monotonically increasing, and that is only possible if the value of pz(z) is non-zero for every possible value of z. So that is the condition that must be satisfied by the target histogram.
724
(Refer Slide Time: 34:37)
Now coming to today’s questions, the first one is: explain why the discrete histogram equalization technique does not, in general, yield a flat histogram. The second: an image has gray level PDF pr(r) as shown here and the target histogram as shown on the right; we have to find out the transformation in terms of r and z, that is, what is the mapping from r to z. In the third question we have given two probability density functions, and again you have to find out the transformation between r and z. Thank you.
725
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-38.
Image Enhancement: Mask Processing Techniques-I.
Hello, welcome to the video lecture series on digital image processing. Now in today’s class
we will talk about another spatial domain technique which is called mask processing
technique. The previous lectures also were dealing with the spatial domain techniques, and
we have said that image enhancement techniques can broadly be categorized into spatial
domain techniques and frequency domain techniques, the frequency domain techniques we
will talk later on.
So in today’s class, we will talk about another class of spatial domain techniques, which are known as mask processing techniques, and under this we will discuss three different types of operations: the first one is the linear smoothing operation, the second one is a nonlinear operation which is based on the statistical features of the image, known as the median filtering operation, and the third kind of mask processing technique that we will talk about is the sharpening filter.
726
(Refer Slide Time: 1:37)
Now let us see, what this mask processing technique means. Now in our earlier discussions,
we have mentioned that while going for this contrast enhancement, what we basically do is
given an input image say f(x,y), we transform this input image by a transformation operator
say T, which gives us an output image g(x,y); and the nature of this output image g(x,y) depends upon what this transformation operator T is.
In the point processing technique, we have said that this transformation operator T operates on a single pixel in the image, that is, it operates on a single pixel intensity value. But as we said earlier, in general T is an operator which operates on a neighborhood of the pixel at location (x,y). So for the point processing operation, the neighborhood size was 1 by 1. So if we consider,
a neighborhood of size more than one, that is we can consider a neighborhood of size, say
3x3, we may consider a neighborhood of size say, 5x5, we may consider a neighborhood of
size 7x7 and so on.
So if we consider a neighborhood of size more than one, then the kind of operation that we
are going to have, that is known as mask processing operation. So let us see, what does this
mask processing operation actually mean?
727
(Refer Slide Time: 3:18)
Here, we have shown a 3x3 neighborhood around a pixel location (x,y), so this outer
rectangle represents a particular image, and in the middle of this we have shown a 3x3
neighborhood and this 3x3 neighborhood is taken around a pixel at location (x,y).
By mask processing what we mean is, so if I consider a neighborhood of size 3x3, I also
consider a mask of size 3x3. So you find that here on the right hand side, we have shown a
mask, so this is a given mask of size 3x3, and the different elements in the mask, that is w(-1,-1), w(-1,0), w(-1,1), w(0,-1), and so on up to w(1,1), represent the coefficients of this mask.
So for all this mask processing techniques, what we do is we place this mask on this image,
where the mask center coincides with the pixel location (x,y).
Once you place this mask on this particular image, then you multiply every coefficient of the mask by the corresponding pixel in the image, and then you take the sum of all these products; the sum of all these products is given by this particular expression. And whatever sum you get, that is placed at location (x,y) in the image g(x,y). So by mask processing operation, the mathematical expression we get is g(x,y) = Σ_{i=-1}^{1} Σ_{j=-1}^{1} w(i,j) f(x+i, y+j).
So this is the operation that has to be done for a 3x3 neighborhood, in which case we use a mask of size 3x3. Of course, as we said, we can have masks of higher dimension: I have to consider a mask of size 5x5 if I consider a 5x5 neighborhood, a mask of size 7x7 if I consider a 7x7 neighborhood, and so on. So if this particular operation is done, at
728
every pixel location (x,y) in the image, then the output image g(x,y) for various values of x
and y that we get is the processed image g.
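One straightforward way to realize this sum of products is sketched below in plain NumPy; the reflected border handling is an illustrative choice, not something specified in the lecture.

```python
import numpy as np

def apply_mask(f, w):
    """Place a (2a+1)x(2b+1) mask w over every pixel of f and sum the products."""
    a, b = w.shape[0] // 2, w.shape[1] // 2
    fp = np.pad(f.astype(float), ((a, a), (b, b)), mode='reflect')  # handle borders
    g = np.zeros_like(f, dtype=float)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            # g(x,y) = sum_i sum_j w(i,j) * f(x+i, y+j)
            g[x, y] = np.sum(w * fp[x:x + 2 * a + 1, y:y + 2 * b + 1])
    return g

demo = np.arange(25, dtype=float).reshape(5, 5)
print(apply_mask(demo, np.ones((3, 3)) / 9.0))   # 3x3 average around every pixel
```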
So, this is what we mean by mask processing operation. Now the first of the mask processing
operation that we will consider is the image averaging or the image smoothing operation. So
image smoothing is a spatial filtering operation, where the value at a particular location (x,y) in the processed image is the average of all the pixel values in the neighborhood of (x,y).
So because it is average, this is also known as averaging filter and later on we will see that
this averaging filter is nothing but a low pass filter.
So when we have such an averaging filter, the corresponding mask can be represented in this
form. So again here, we are showing a mask, a 3x3 mask and here you find that all
coefficients in this 3x3 mask are equal to one. And by going back to our mathematical expression, I get an expression of this form, that is g(x,y) = (1/9) Σ_{i=-1}^{1} Σ_{j=-1}^{1} f(x+i, y+j).
So naturally, as this expression says, you find that what we are doing, we are taking the
summation of all the pixels in the 3x3 neighborhood of the pixel location (x,y) and then
dividing this summation by 9, which is nothing but the average of all the pixel values in the 3x3 neighborhood of (x,y), including the pixel at location (x,y).
729
And this average is placed at location (x,y) in the processed image g(x,y). So this is what is
known as averaging filter and also this is called a smoothing filter.
And the particular mask for which all the filter coefficients, or mask coefficients, are the same, equal to one in this particular case, is known as a box filter. Now when we perform this kind of operation, then naturally, because we are going for
averaging of all the pixels in the neighborhood, so the output image is likely to be a smooth
image.
That means it will have a blurring effect: all the sharp transitions in the image will be smoothed out and replaced by blurred transitions. As a result, if there is any sharp edge in the image, that sharp edge will also be blurred. So to reduce the effect of
blurring, there is another kind of mask, averaging mask, or smoothing mask which performs
weighted average. So such a kind of mask is given by this particular mask.
So here you find that in this mask, the center coefficient is equal to 4, the coefficients vertically up, vertically down, horizontally left and horizontally right are equal to 2, and all the diagonal neighbors of the center element in this mask are equal to 1. So effectively what we are doing is, when we are taking the average, we are weighting every pixel in the neighborhood by the corresponding coefficient, and what we get is a weighted average. So the center pixel, that is the pixel at the location (x,y), gets the maximum weightage, and as
730
you move away from the pixel locations, from the center location, the weightage of the pixels
are reduced.
So when we apply this kind of mask, then our general expression of this mask operation becomes g(x,y) = (1/16) Σ_{i=-1}^{1} Σ_{j=-1}^{1} w(i,j) f(x+i, y+j), and that will give the value which is to be placed at location (x,y) in the processed image g(x,y). So this becomes the expression of
g(x,y). Now the purpose of going for this kind of weighted averaging is that because here we
are weighting the different pixels in the image for taking the average, the blurring effect will
be reduced in this particular case.
So in case of the box filter, the image will be very, very blurred, and of course the blurring will be more if I go for a bigger and bigger neighborhood size, that is a bigger and bigger mask size. When I go for weighted averaging, in such cases the blurring effect will be reduced.
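Using this weighted mask and the earlier box mask, a small sketch (here with scipy.ndimage.convolve, which performs the same mask operation) shows how the two masks spread a single bright pixel differently; the toy impulse image is purely illustrative.

```python
import numpy as np
from scipy.ndimage import convolve   # equivalent to the mask operation above

box_3x3 = np.ones((3, 3)) / 9.0                    # box filter: all nine weights equal

weighted_3x3 = np.array([[1., 2., 1.],
                         [2., 4., 2.],
                         [1., 2., 1.]]) / 16.0     # weighted average: centre weighted most

f = np.zeros((7, 7)); f[3, 3] = 255.0              # a single bright pixel (an impulse)
print(convolve(f, box_3x3, mode='reflect')[2:5, 2:5])       # spread evenly -> more blur
print(convolve(f, weighted_3x3, mode='reflect')[2:5, 2:5])  # weighted spread -> less blur
```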
Now let us see what kind of result we get. So this gives the general expression: when we consider the coefficients w(i,j), we have to have a normalization factor, that is, the summation has to be divided by the sum of the coefficients; and as we said, the 3x3 neighborhood is only a special case, I can have neighborhoods of other sizes.
So here it shows that you can have a neighborhood of size M x N, where M = 2a + 1 and N = 2b + 1, where a and b are some positive integers, and here it is
731
shown that the mask is usually of odd dimension, not of even dimension. And it is normally a mask of odd dimension which is used in case of image processing.
Now using this kind of mask operation, here we have shown some results, you find that the
top left image is a noisy image, when you do the masking operation or averaging operation
on this noisy image, the right top image shows the averaging with a mask of size 3x3.
The left bottom image is obtained using a mask of size 5x5, and the right bottom image is
obtained using a mask of size 7x7. So from these images, it is quite obvious that as I increase the size of the mask, the blurring effect becomes more and more. So you find that
the right bottom image which is obtained by a mask of size 7x7 is much more blurred
compared to the other two images. And this effect is more prominent if you look at the edge
regions of these images, say if I compare this particular region with the similar region in the
upper image or the similar regions in the original image.
You find that here, in the original image that is very, very sharp, whereas when I do the
smoothing using a 7x7 mask, it becomes very blurred. Whereas the blurring effect when I use
a 3x3 mask is much less. Similar such result is obtained with other images also.
732
(Refer Slide Time: 14:06)
So here is another image; again we do the masking operation or the smoothing operation with different mask sizes. On the top left we have the original noisy image, and the other
images are the smooth images using various mask sizes.
So on the right top, this is obtained using a mask size of 3x3, the left bottom is an image
obtained using a mask of size 5x5 and the right bottom is an image obtained using a mask of
size 7x7. So you find that as we increase the mask size, the noise is reduced to a greater extent, but at the cost of an added blurring effect. So though the noise is reduced, the image becomes very blurred. That is the effect of using the box filters or the smoothing filters: though the noise will be removed, the images will be blurred, or the sharp contrast in the image will be reduced.
So there is a second kind of masking operation, which is based on order statistics, which will reduce this kind of blurring effect.
733
(Refer Slide Time: 15:42)
So let us consider one such filter based on order statistics. Unlike the earlier filters, these filters are nonlinear filters. Here, in case of these order statistics filters, the response is based on the ordering of the pixel intensity values in the neighborhood of the point under consideration. So what we do is, we take the set of intensity values which are in the neighborhood of the point (x,y), then order all those intensity values in a particular order, and based on this ordering, we select a value which will be put at location (x,y) in the processed image g(x,y).
And that is how the output image you get is a processed image, but here the processing is
done using the order statistics filter. Thank you.
734
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-39.
Image Enhancement: Mask Processing Techniques-II.
Hello, welcome to the video lecture series on digital image processing. A widely used filter
under this order statistic is what is known as a median filter. So in case of a median filter,
what we have to do is, I have an image, and what I do is around point (x,y), I take a 3x3
neighborhood and consider all the nine pixels, intensity values of all the nine pixels in this
3x3 neighborhood. Then I arrange this pixel values, the pixel intensity values in a certain
order and take the median of this pixel intensity values.
Now how do you define the median? We define the median, say ξ, of a set of values such that half of the values in the set will be less than or equal to ξ, and the remaining half of the values of the set will be greater than or equal to ξ. So let us take a particular example, suppose I
take a 3x3 neighborhood around a pixel location (x,y) and the intensity values in this 3x3
neighborhood, let us assume that this is 100, this is say 85, this is say 98, this may have a
value 99, this may have a value say 105, this may have a value say 102, this may have a value
say 90, this may have a value say 101, this may have a value say 108. And suppose this
represents a part of my image say f(x,y).
735
Now what I do is, I take all these pixel intensity values and put them in ascending order of magnitude. So if I put them in ascending order of magnitude, you find that the minimum of these values is 85, the next value is 90, the next one is 98, the next one is 99, then comes 100, the next one is 101, the next one is 102, the next one is 105, and the last one is 108.
So these are the nine intensity values put in ascending order of magnitude. Once I put them into ascending order of magnitude, from this I take the 5th value, which is equal to 100. If I take the 5th value, you find that there will be an equal number of values which are greater than or equal to this 5th value, and the same number of values which are less than or equal to this 5th value.
So I consider, this particular pixel value of 100, and when I generate the image g(x,y), in
g(x,y) at location (x,y), I put this value 100, which is the median of the pixel values within
this neighborhood. So this gives my processed image g(x,y); of course the intensities at other pixel locations will be decided by the median value of the neighborhood of the corresponding pixels, that is, if I want to find out what will be the pixel value at some other location, then the neighborhood that I have to consider will be the neighborhood around that location.
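A small sketch of this median step on the same 3x3 neighbourhood used above; scipy.ndimage.median_filter would apply the same operation over a whole image.

```python
import numpy as np
from scipy.ndimage import median_filter

neighbourhood = np.array([[100,  85,  98],
                          [ 99, 105, 102],
                          [ 90, 101, 108]])

# sort the nine intensities and pick the 5th (middle) one
print(np.sort(neighbourhood.ravel()))        # [ 85  90  98  99 100 101 102 105 108]
print(np.median(neighbourhood))              # 100.0 -> value placed at (x, y) in g

# for a full image f, a 3x3 median filter would be:
# g = median_filter(f, size=3)
```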
So this is how I can get the median filtered output, and as you can see, this kind of filtering operation is based on order statistics. Now let us see what kind of result we can
736
have using this median filter. So here you find, again on the same building image, that the left top is our original noisy image, on the right hand side is the smoothed image using the box filter, and on the bottom we have the image using the median filter.
So here again, as you see, the processed image obtained using the median filtering operation maintains the sharpness of the image to a greater extent than that obtained using the smoothing filter.
Coming to the second one, again this is one of the images that we have shown earlier, a noisy image having four coins. Here again you find that after doing the smoothing operation, the edges become blurred, and at the same time the noise is not reduced to a great extent; this particular image is still noisy.
So if I want to remove all this noise, what I have to do is smooth these images using a larger neighborhood size, and the moment I go for the larger neighborhood size, the blurring effect will be more and more. On the right hand side, the image that we have is also a processed image, but here the filtering operation which is done is median filtering. So here you find that because of the median filtering operation, the noise in the output image has almost vanished, but at the same time the contrast of the image, or the sharpness of the image, remains more or less intact.
737
So this is the advantage that you get if we go for median filtering rather than smoothing filtering or averaging filtering. To show the advantage of this median filtering, we take another example: this is a noisy image of a butterfly; on the bottom left the image that is shown is an averaged image, where the averaging is done over a neighborhood of size 5x5. On the bottom right is the image which is filtered by using median filtering.
This result clearly shows the superiority of the median filtering over the smoothing or averaging operation. And such median filtering is very, very useful for a particular kind of random noise which is known as salt and pepper noise, because of the appearance of the noise in the image. So these are the different filtering operations which reduce the noise in a particular image, or the filtering operations which introduce blurring or smoothing over the image.
738
(Refer Slide Time: 8:39)
We will now consider another kind of spatial filters which increases the sharpness of the
image. So the spatial filter that we will consider now is called sharpening spatial filter, so we
will consider sharpening spatial filter. So the objective of this sharpening spatial filter is to
highlight the details, the intensity details or variation details in an image. Now through our
earlier discussion we have seen that if I do averaging over an image, or smoothing over an
image, then the image becomes blurred, or the details in the image are removed.
Now when I go for the derivative operations, I can use two types of derivatives, I can use the
first order derivative, or I can also use the second order derivative. So I can either use the first
order derivative operation, or I can use the second order derivative operation to enhance the sharpness of the image. Now let us see what are the desirable effects that these derivative operations are going to give.
739
(Refer Slide Time: 10:44)
If I use a first order derivative operation or a first order derivative filter, then the desirable
effect of this first order derivative filter is, it must be zero, the response must be zero in areas
of constant gray level in the image.
And the response must be non-zero, at the onset of a gray level step or at the onset of a gray
level ramp. And it should be non-zero along ramps. Whereas if I use a second order
derivative filter, then the second order derivative filter response should be zero in the flat
areas, it should be non-zero at the onset and end of gray level step or gray level ramp. And it
should be zero along ramps of constant slope. So these are the desirable features or the
desirable responses of a first order derivative filter and the desirable response of a second
order derivative filter. Now whichever derivative filter I use, whether it is a first order
derivative filter, or a second order derivative filter, I have to look for discrete domain
formulation of those derivative operations.
740
(Refer Slide Time: 12:17)
So let us see how we can formulate the derivative operations, the first order derivative and the second order derivative, in the discrete domain. Now we know that in the continuous domain the derivative is given by, let us consider a one-dimensional case, that if I have a function f(x) which is a function of a variable x, then the derivative of this is given by df/dx = lim_{Δx→0} [f(x+Δx) - f(x)] / Δx.
So this is the definition of derivative in continuous domain. Now when I come to discrete
domain, in case of our digital images, the digital images are represented by a discrete set of
points or pixels, which are represented at different grid locations, and the minimum distance
between two pixels is equal to 1. So in our case, we will consider the value Δx = 1, and this derivative operation, in case of one dimension, now reduces to ∂f/∂x = f(x+1) - f(x).
Now here I use the partial derivative notation because our image is a two dimensional image. So when I take the derivative in two dimensions, we will have partial derivatives along x, and we will have partial derivatives along y. So the first order derivative of a one dimensional discrete signal is given by this particular expression. Similarly, the second order derivative of a discrete signal in one dimension can be approximated by ∂²f/∂x² = f(x+1) + f(x-1) - 2f(x).
741
So these are the first order derivative and the second order derivative, and you will find that these two definitions of the derivative operations satisfy the desirable properties that we have discussed earlier.
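A quick sketch of these two discrete derivatives on an illustrative 1-D signal; the signal values below are made up, with flat regions, a ramp, an isolated point and a step, in the spirit of the example that follows.

```python
import numpy as np

# flat region, downward ramp, flat, isolated point, flat, step
f = np.array([6, 6, 6, 5, 4, 3, 2, 1, 1, 1, 6, 1, 1, 1, 1, 7, 7, 7], dtype=float)

first  = f[1:] - f[:-1]                     # f(x+1) - f(x)
second = f[2:] + f[:-2] - 2 * f[1:-1]       # f(x+1) + f(x-1) - 2 f(x)

print(first)    # non-zero all along the ramp and at the step
print(second)   # non-zero only at ramp onset/end, strong at the isolated point
```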
Now to illustrate the response of these derivative operations, let us take an example, say this
is a one dimensional signal, where the values of the one dimensional signals for various
values of x are given in the form of array like this.
And the plot of these functional values, these discrete values are given on the top. Now if you
take the first order derivative of this as we have just defined, the first order derivative is given
in the second array, and the second order derivative is given in the third array. So if you look
at this functional values, the plot of this functional value, this represents various regions, say
for example, here this part is a flat region, this particular portion is a flat region, this is also a
flat region, this is also a flat region.
This is a ramp region, this represents an isolated point, this area represents a very thin line
and here we have a step kind of discontinuity. So now if you compare the response of the first order derivative and the second order derivative of this particular discrete function, you find that the first order derivative is non-zero all along the ramp, whereas the second order derivative is 0 along the ramp; the second order derivative is non-zero only at the onset and end of the ramp.
742
Similarly coming to this isolated point, if I compare the response of the first order derivative
and the response of the second order derivative, you find that the response of the second
order derivative for an isolated point is much stronger than the response of the first order
derivative. Similar is the case for a thin line, the response of the second order derivative is
greater than the response of the first order derivative. Coming to this step edge, the response
of the first order derivative and the response of the second order derivative is almost the
same, but the difference is in case of second order derivative, I have a transition from a
positive polarity to a negative polarity.
Now because of this transition from positive polarity to negative polarity, the second order derivative normally leads to double lines in case of a step discontinuity in an image, whereas the first order derivative leads to a single line; the usefulness of getting this double line we will discuss later. But as we have seen, the second order derivative gives a stronger response to isolated points and to thin lines, and the details in an image normally have the property of being either isolated points or thin lines, to which the second order derivative gives a stronger response.
So it is quite natural to think that the second order derivative based operator will be most
suitable for image enhancement operations. So our observation is, as we have discussed
previously that first order derivative generally produces a thicker edge because we have seen
that during a ramp or along a ramp the first order derivative is non-zero, whereas the second
743
order derivative along a ramp is 0 but it gives non zero values at the starting of the ramp and
the end of the ramp.
So that is why the first order derivatives generally produce a thicker edge in an image. The
second order derivative gives stronger response to fine details such as thin lines and isolated
points. The first order derivative has a stronger response to a gray level step. And the second order derivative produces a double response at step edges. As we have already said, since the
details in the image are either in the form of isolated points or thin lines, so the second order
derivatives are better suited for image enhancement operations.
So we will mainly discuss the second order derivatives for image enhancement, but to use them for image enhancement operations, because our images are digital, as we have said many times, we have to have a discrete formulation of these second order derivative operations. And the filter that we design should be isotropic, that means the response of the second order derivative filter should be independent of the orientation of the discontinuity in the image.
And the most widely used, or popularly known, second order derivative operator of isotropic nature is what is known as the Laplacian operator, so we will discuss the Laplacian operator. As we know, the Laplacian of a function is given by ∇²f = ∂²f/∂x² + ∂²f/∂y². So this is the Laplacian operator in the continuous domain. But what we have
to have is the Laplacian operator in the discrete domain, and as we have already seen that
744
∂²f/∂x² in the discrete domain is approximated as f(x+1) + f(x-1) - 2f(x). So this is in case of a one dimensional signal.
In our case, our function is a two dimensional function, that is a function of variables x and y.
So for this two dimensional signal, we can write ∂²f/∂x², which will simply be f(x+1,y) + f(x-1,y) - 2f(x,y). Similarly, ∂²f/∂y² will be given by f(x,y+1) + f(x,y-1) - 2f(x,y), and if I add these two, I get the Laplacian operator in the discrete domain, which is given by ∇²f = ∂²f/∂x² + ∂²f/∂y² = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4f(x,y), and you will find that this can be represented again in the form of a two dimensional mask; that is, for this Laplacian operator, we can have a two dimensional mask.
And the two dimensional mask in this particular case will be given like this: on the left hand side, the mask that is shown considers the Laplacian operation only in the vertical and horizontal directions, and if we also include the diagonal directions, then the Laplacian mask is the one given on the right hand side. So you find that from this particular mask which is shown on the left hand side, I can always derive the expression that we have just shown.
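The two Laplacian masks referred to here can be written out explicitly; the sketch below (scipy.ndimage.convolve) is one way to apply them to an image.

```python
import numpy as np
from scipy.ndimage import convolve

# Laplacian using only the horizontal/vertical neighbours (negative centre coefficient)
lap_4 = np.array([[ 0,  1,  0],
                  [ 1, -4,  1],
                  [ 0,  1,  0]], dtype=float)

# Laplacian that also includes the diagonal neighbours
lap_8 = np.array([[ 1,  1,  1],
                  [ 1, -8,  1],
                  [ 1,  1,  1]], dtype=float)

def laplacian(f, diagonal=False):
    """Discrete Laplacian of a grayscale image f; the output can be negative."""
    return convolve(f.astype(float), lap_8 if diagonal else lap_4, mode='reflect')
```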
745
Now here I can have two different types of masks depending upon the polarity of the coefficient at the center pixel; the center coefficient can have either negative or positive polarity. So if the polarity of the center coefficient is positive, then I can have a mask of this form, where the center coefficient has positive polarity but otherwise the nature of the mask remains the same.
Now if I have this kind of operation, then you find that the image that you get will just highlight the discontinuous regions in the image, whereas all the smooth regions in the image will be suppressed. So this shows an original image; on the right hand side we have the output of the Laplacian, and if you closely look at this particular image, you will find that all the discontinuous regions have some value. However, this particular image cannot be displayed properly, so we have to have some scaling operation, because the output of the Laplacian will have both positive values as well as negative values.
So for scaling, what we have to do is add the magnitude of the minimum negative value to that particular image and then scale it down so that the image can be displayed properly on the given terminal. So this shows the scaled Laplacian image. And the right hand side shows an enhanced image. Now what is this enhancement, what do we mean by enhancement in this particular case? As we have just said, this Laplacian operator simply enhances the discontinuities in the image, whereas all the smooth regions in the image are suppressed.
746
But if we want to superimpose these enhanced discontinuities on the original image, in that case what we will have is an image which is an enhanced version of the original image. Now how can we obtain such an enhancement? This can be obtained simply by either subtracting the Laplacian output from the original image or adding it to the original image.
So what we can have is a function of the form g(x,y) = f(x,y) - ∇²f(x,y), which is used when the center coefficient of the Laplacian mask is negative, or we have to perform the operation g(x,y) = f(x,y) + ∇²f(x,y) when the center coefficient of the Laplacian mask is positive.
So if I do this, then all the details in the image will be added to the original image so the
background smoothness will be maintained, however the image that we will get is an
enhanced image. So that is what we have got in this particular case that on the right bottom,
the image that we have got is an enhanced image.
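A hedged sketch of this enhancement step for the mask whose centre coefficient is negative, realized directly with a composite mask whose centre magnitude is one more than the Laplacian's, as discussed just below.

```python
import numpy as np
from scipy.ndimage import convolve

# composite sharpening mask: combines f(x,y) with -Laplacian(f)(x,y) in one pass;
# the centre magnitude is incremented from 4 to 5 so the original pixel is added back
composite_4 = np.array([[ 0, -1,  0],
                        [-1,  5, -1],
                        [ 0, -1,  0]], dtype=float)

def laplacian_sharpen(f):
    """g(x,y) = f(x,y) - Laplacian(f)(x,y), done as a single composite mask operation."""
    g = convolve(f.astype(float), composite_4, mode='reflect')
    return np.clip(g, 0, 255).astype(np.uint8)   # bring back into the displayable range
```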
747
And the composite mask, after performing this operation, is as given over here; you find that the magnitude of the center coefficient in both the cases has been incremented by one, which indicates that the Laplacian output is added to the original pixel value f(x,y). So we will stop our discussion today and we will continue with our
discussion on the same topic as well as our frequency domain techniques in the next class.
Thank you.
748
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-40.
Image Enhancement: Mask Processing Techniques-III.
Hello, welcome to the video lecture series on digital image processing. So today, we will talk
about some more mask processing techniques, like we will talk about unsharp masking, we
will talk about the high boost filter, and we will also see how the first order derivative operators can help in enhancement of image content, particularly at the discontinuities and edge regions of an image. And then we will move to our next topic of discussion, which is the frequency domain techniques for image enhancement.
And here again we will talk about various types of filtering operations, like low pass filtering, high pass filtering, the equivalent of high boost filtering, and then finally we will talk about homomorphic filtering, and all these filtering operations will be frequency domain operations.
749
(Refer Slide Time: 1:23)
So let us first quickly see that what we have done in the last class. So in the last class we have
talked about the averaging filters or low pass filters and we have talked about two types of
spatial masks which are used for this averaging operation.
One we have called the box filter, and we have said that in case of the box filter all the coefficients in the filter mask have the same value, in this case equal to one. The other type of mask that we have used is for the weighted average operation, and
here it shows the corresponding mask which gives the weighted averaging and we have said
that if you use this weighted average mask instead of the box filter mask, then what
advantage we get is this weighted average mask tries to retain the sharpness of the image or
contrast of the image as much as possible.
750
(Refer Slide Time: 2:23)
Whereas, if you simply use the box filter then the image gets blurred too much. Then these
are the different kinds of results that we have obtained, here the result is shown for an image
which is on the top left, on the top right the image is averaged by a 3 by 3 box filter, on the
bottom left this is an averaging over a 5x5 filter, and on the bottom right this is an image with averaging over a 7x7 filter. And as is quite obvious from these results, as we take the average or smooth out the image with the help of these box filters, the image gets more and more blurred.
751
Similar such results are also obtained, as shown in this particular case; here also you find that using the low pass filter, the noise in the image gets removed, but at the cost of the sharpness of the image. That is, when we take the average over a larger size mask, it helps to reduce the noise, but at the same time a larger mask introduces a large amount of blurring in the original image.
So there we have said that, instead of using the simple box filter or the simple averaging filter, we can go for filtering based on order statistics, like the median filter, where the pixel value at a particular location in the processed image will be the median of the pixels in
the neighborhood of the corresponding location in the original image. In that case, this kind
of filtering also reduces the noise, but at the same time, it tries to maintain the contrast of the
image.
So here, we have shown one such result on the top left is the original noisy image, on the top
right is the image which is obtained using the box filter and the bottom image is the image
which is obtained using the median filter. And here it is quite obvious that when we go for the median filtering operation, the median filtering reduces the noise, but at the same time it maintains the sharpness of the image. Whereas, if we go for box filtering of a higher size, then the noise is reduced, but at the same time the image sharpness is also reduced, that means the image gets blurred.
752
(Refer Slide Time: 4:58)
This is another set of results where you find that if you compare the similar result that we
have shown earlier using the median filter, the noise is almost removed, but at the same time
the contrast of the image is also maintained. So this is the advantage of the median filter that
we get, that in addition to removal of noise, you can maintain the contrast of the image. But this kind of median filtering, as we have mentioned, is very suitable for removal of a particular kind of noise which we have called salt and pepper noise; the name comes from the appearance of this noise in the given image.
753
Then this shows another median filter result: of the bottom two images, on the left side is the image obtained using the box filter, and on the right hand side is the image obtained using the median filter. The improvement of the median filter over the box filter is quite obvious
from this particular image.
Then we have said that for the enhancement operation we use the second order derivatives, and the kind of mask that we have used for the second order derivative is the Laplacian mask; these are the two different masks which we have used for the Laplacian operation, and we can also use another type of mask where the center coefficient is positive.
754
You find that in case of the earlier masks, the center coefficient is negative whereas all the neighboring coefficients are positive; in this Laplacian mask, the center coefficient is positive whereas all the neighboring coefficients are negative.
Now using this Laplacian mask, we can find out the high frequency, or detailed, contents of an image, as has been shown in this particular one; here you find that when the original image is processed using the Laplacian mask, the details of the image are obtained, and on the bottom left we have shown the details of the image.
On the bottom right, what we have done is, it is the same image which is displayed after
scaling so that the details are displayed properly on the screen. Now here what has been done
is, we have just shown the details of the image, but in many applications what is needed is, if
this detailed information is superimposed on the original image, then it is better for
visualization. So these detailed images are to be added to the original image so that we can
get an enhanced image.
755
(Refer Slide Time: 7:39)
So the next one shows that if we have this original image, and these are the same detail images that we have shown earlier, on the bottom right you have the enhanced image, where the detail image is added to the original image. And for performing this operation, we can have a composite mask, where the composite mask is given like this; here you find that the center coefficient of the mask is equal to five, whereas, if you recollect, in case of the Laplacian mask the center coefficient of the corresponding mask was equal to four.
So if I change the center coefficient from four to five, that means the original image value f(x,y) is going to be added to the detail image to give us the enhanced image. So that is what is done by using this composite mask.
756
(Refer Slide Time: 8:22)
And this is the result that we obtained using the composite mask, similar to the one that we have shown earlier. You find that on the top we have the original image; the bottom left is an enhanced image where we use a mask in which only the horizontal and vertical neighbors have non-zero values, whereas the bottom right is obtained using the mask where we consider both the horizontal, vertical and diagonal coefficients to be non-zero. And it is quite clear from this particular result that when you go for the mask having both horizontal, vertical and diagonal components as non-zero values, the enhancement is much more. Now today we will talk about some more spatial domain or mask operations. The first one that we will talk about is called unsharp masking.
757
(Refer Slide Time: 9:58)
So by unsharp masking we mean the following: for many years the publishing companies were using a kind of enhancement where the enhancement in the image was obtained by subtracting a blurred version of the image from the original image. So in such cases, if I represent the sharpened image by fs(x,y), then this was obtained as fs(x,y) = f(x,y) - f̄(x,y).
Here f̄(x,y) is nothing but a blurred version of f(x,y). So if we subtract the blurred image from the original image, what we get is the details in the image, or a sharpened image. So this fs(x,y) is the sharpened image, and this kind of operation was known as
unsharp masking. Now we can slightly modify this particular equation to get an expression
for another kind of masking operation which is known as high boost filtering.
So high boost filtering is nothing but a modification of this unsharp masking operation; we can write the high boost filtered image in the form fhb(x,y) = A f(x,y) - f̄(x,y), for A ≥ 1. So we find that if I set the value of this constant A = 1, then this high boost filtering becomes the same as unsharp masking. Now I can rewrite this particular expression in the form fhb(x,y) = (A-1) f(x,y) + f(x,y) - f̄(x,y).
Now this f(x,y) - f̄(x,y) is nothing but the sharpened image fs(x,y). So the expression that I finally get for high boost filtering is fhb(x,y) = (A-1) f(x,y) + fs(x,y). Now it does not matter in which way we obtain the sharpened image. So if I use the Laplacian operator to
758
obtain this sharpened image, in that case the high boost filtered output fhb(x,y) simply becomes A f(x,y) - ∇²f(x,y), and this is the case when the center coefficient in the Laplacian mask is negative. Or I will have the same expression written in the form A f(x,y) + ∇²f(x,y), when the center coefficient in the Laplacian mask is positive. So as we have seen earlier,
that this first expression will be used if the center coefficient in the Laplacian mask is
negative, and this second expression will be used if the center coefficient in the Laplacian
mask is positive.
So using this we can get a similar type of mask, where the mask is given by this particular
expression. So using these masks we can go for high boost filtering operation and if I use this
high boost filtering, I get the high boost output as we have already seen earlier.
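A small sketch of high boost filtering under these definitions; the box blur used here to stand in for the blurred image f̄(x,y) is an illustrative choice, not the only possible one.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def high_boost(f, A=1.5, size=3):
    """High boost filtering: f_hb = (A-1) f + f_s, with f_s = f - f_blurred.
    Setting A = 1 reduces this to plain unsharp masking."""
    f = f.astype(float)
    f_blurred = uniform_filter(f, size=size)    # simple box blur standing in for f-bar
    f_s = f - f_blurred                         # the sharpened (detail) image
    return np.clip((A - 1.0) * f + f_s, 0, 255).astype(np.uint8)
```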
Now, so far, the derivative operators that we have used for the sharpening operation are all second order derivative operators. We have not used first order derivative operators
for filtering so far. But first order derivative operators are also capable of enhancing the
content of the image, particularly at discontinuities and at region boundaries or edges.
759
(Refer Slide Time: 15:05)
Now the way we obtain the first order derivative of a particular image is like this: what we use for obtaining the first order derivative is the gradient operator. As the gradient is a vector, we write it as the vector ∇f = (∂f/∂x, ∂f/∂y). So this is what gives the gradient of a function f, and what we are concerned about for enhancement is the magnitude of the gradient. The magnitude of the gradient we write as |∇f|, which is nothing but the magnitude of the vector ∇f, and is usually given as |∇f| = [(∂f/∂x)² + (∂f/∂y)²]^(1/2). But you find that if I use this particular expression, it leads to some computational difficulty, in the sense that we have to go for squaring and then a square root, and getting a square root in the digital domain is not an easy task.
760
(Refer Slide Time: 16:54)
So this is what gives us the first order derivative operator on an image, and if I want to obtain ∂f/∂x, you find that this can simply be computed as ∂f/∂x = [f(x+1,y-1) + 2f(x+1,y) + f(x+1,y+1)] - [f(x-1,y-1) + 2f(x-1,y) + f(x-1,y+1)]. So this is the first order derivative along the x direction, and in the same manner we can also obtain the first order derivative in the y direction.
Now once we have this kind of discrete formulation of the first order derivative along x, I can similarly find out ∂f/∂y, which will have a similar form. So once I have such discrete formulations of the first order derivatives, we can have a mask which will compute the first order derivative of an image.
761
(Refer Slide Time: 18:37)
So for computing the first order derivative along x direction, the left hand side shows the
mask and for computing the first order derivative along y direction, the right hand side shows
the mask, and later on we will see that these operators are known as Sobel operators.
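A sketch of these Sobel masks and the resulting gradient image; the absolute-value sum used here is a common approximation of the gradient magnitude that avoids the square root mentioned as a difficulty earlier.

```python
import numpy as np
from scipy.ndimage import convolve

sobel_x = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)   # first derivative along x (rows)
sobel_y = sobel_x.T                               # first derivative along y (columns)

def gradient_magnitude(f):
    """Edge strength of a grayscale image f via the Sobel operators."""
    gx = convolve(f.astype(float), sobel_x, mode='reflect')
    gy = convolve(f.astype(float), sobel_y, mode='reflect')
    return np.abs(gx) + np.abs(gy)                # |grad f| ~ |df/dx| + |df/dy|
```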
And using these first order derivatives, when we apply these first order derivatives on the
images, the kind of processed image that we get is like this. So you find that on the left hand
side, we have the original image and on the right hand side we have the processed image and
in this case you find that this processed image is an image which highlights the edge regions or discontinuity regions in the original image. Now, in many practical applications, such
simple derivative operators are not sufficient, so in such cases what we may have to do is we
762
may have to go for combinations of various types of operators which gives us the enhanced
image.
So with this, we come to the end of our discussion on spatial domain processing techniques,
thank you.
763
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-41.
Frequency Domain Processing Techniques.
Hello, welcome to the video lecture series on digital image processing. Now we start
discussion on the frequency domain processing techniques. Now, so far you must have noticed that these mask operations, or the spatial domain operations using masks, whatever we have done, are nothing but convolution operations in two dimensions.
So, what we have done is we have the original image f(x,y), we define a mask corresponding
to the type of operation that we want to perform on the original image f(x,y) and using this
mask, the kind of operation that is done, the mathematical expression of this is given on the
bottom, and if you analyze this, you will find that this is nothing but a convolution operation.
So using this convolution operation, we are going for spatial domain processing of the
images. Now, we have already seen during our earlier discussions that a convolution operation in the spatial domain is equivalent to multiplication in the frequency domain. Similarly, a convolution in the frequency domain is equivalent to multiplication in the spatial domain.
764
(Refer Slide Time: 2:01)
So, what we have seen is that if we have a convolution of, say two functions f(x,y) and h(x,y)
in the spatial domain. The corresponding operation in the frequency domain is multiplication
of F(u,v) and H(u,v), where F(u,v) is the Fourier transform of this spatial domain function
f(x,y) and H(u,v) is the Fourier transform of the spatial domain function h(x,y). Similarly, if
we multiply two functions f(x,y) and h(x,y) in the spatial domain, the corresponding
operation in the frequency domain is the convolution operation of the Fourier transforms of
f(x,y) which is F(u,v) that has to be convolved with H(u,v).
So these are the convolution theorems that we have done in, during our previous discussions.
So to perform this convolution operation, the equivalent operation can also be done in the
frequency domain, if I take the Fourier transform of the image f(x,y) and I take the Fourier
transform of the spatial mask, that is h(x,y). So the Fourier transform of the spatial mask
h(x,y) as we have said that this is nothing but H(u,v) in this particular case.
So the equivalent filtering operations, we can do in the frequency domain by choosing the
proper filter H(u,v), then after taking the product of F(u,v) and H(u,v) if I take the inverse
Fourier transform, then I will get the processed image in the spatial domain. Now to analyze
this further, what we will do is, we will take the cases in one dimension and we will consider
the filters based on Gaussian functions for analysis purposes. The reason we are choosing filters based on the Gaussian function is that the shapes of such functions can be easily specified and easily analyzed.
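The whole frequency domain pipeline described here fits in a few lines of NumPy; this is a generic sketch in which H is assumed to be any filter array of the same size as the centred spectrum of the image.

```python
import numpy as np

def frequency_filter(f, H):
    """Filter image f with frequency response H: g = IFFT( H * FFT(f) )."""
    F = np.fft.fftshift(np.fft.fft2(f))      # centred Fourier transform of the image
    G = H * F                                # multiplication in the frequency domain
    g = np.fft.ifft2(np.fft.ifftshift(G))    # back to the spatial domain
    return np.real(g)                        # imaginary part is numerical round-off
```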
765
(Refer Slide Time: 4:52)
Not only that, the forward transformation, the forward Fourier transformation and the inverse
Fourier transformation of Gaussian functions are also Gaussian. So if I take a Gaussian filter
in the frequency domain, I will write the Gaussian filter in the frequency domain as H(u) = A e^(-u²/2σ²), where σ is the standard deviation of the Gaussian function. And if I take the inverse Fourier transform of this, then the corresponding filter in the spatial domain will be given by h(x) = √(2π) σ A e^(-2π²σ²x²).
Now if you analyze these two functions, that is H(u) in the frequency domain and h(x) in the spatial domain, you find that both these functions are Gaussian as well as real. And not only that, these functions behave reciprocally with each other; that means when H(u), this particular function in the frequency domain, has a broad profile, that is it has a large value of the standard deviation σ,
the corresponding h(x) in the spatial domain will have a narrow profile. Similarly, if H(u) has a narrow profile, h(x) will have a broad profile. In particular, when σ tends to infinity, this function H(u) tends to be a flat function, and in such a case the corresponding spatial domain filter h(x) tends to be an impulse function. So this shows that H(u) and h(x) are reciprocal to each other. Now let us see what will be the nature of such low pass filter functions.
766
(Refer Slide Time: 7:08)
So here, on the left hand side we have shown the frequency domain filter H(u) as a function
of u and on the right hand side we have shown the corresponding spatial domain filter h(x)
which is a function of x. Now from these filters, it is quite obvious that once I specify the filter H(u) as a function of u in the frequency domain, the corresponding filter h(x) in the spatial domain will have all positive values, that is, h(x) never becomes negative for any value of x.
And the narrower the frequency domain filter, the more it will attenuate even the lower frequency components, resulting in more blurring effect. And if I make the frequency domain filter narrower, that means the corresponding spatial domain filter or spatial domain mask will be flatter, that means the mask size in the spatial domain will be larger.
767
(Refer Slide Time: 8:17)
So this slide shows two such masks that we have already discussed during our previous
discussion. So this is a mask where all the coefficients are positive and the same, and in this other mask the coefficients are all positive, but their variation shows that it has some sort of Gaussian distribution in nature.
And we have already said that if the frequency domain filter becomes very narrow, it will
attenuate even the low frequency components leading to a blurring effect of the processed
image. Correspondingly, in the spatial domain the mask size will be larger, and we have seen through our results that if I use a larger mask size for the smoothing operation, then the image gets more and more blurred. Now, in the same manner as we have made the low pass filter, we can also make high pass filters again using Gaussian functions.
768
(Refer Slide Time: 9:26)
So in this case, using the Gaussian function, a high pass filter in the frequency domain can be written as H(u) = A(1 - e^(-u²/2σ²)). If I take the inverse Fourier transform of this, the corresponding spatial domain filter will be given by h(x) = A(δ(x) - √(2π) σ e^(-2π²σ²x²)). So if I plot this in the frequency domain, this shows the high pass filter in the frequency domain, and as is quite obvious from this plot, it will attenuate the low frequency components, whereas it will pass the high frequency components.
769
And the corresponding filter in the spatial domain has this form, given by h(x) as a function of x. Now as you note from this particular figure, this function h(x) can assume both positive as well as negative values. An important point to note over here is that once h(x) becomes negative, it remains negative; it does not become positive any more. And in the spatial domain, the Laplacian operator that we have used earlier was of a similar nature.
So in the Laplacian mask that we have used, we have seen that the center coefficient has a positive value, whereas all the neighboring coefficients have negative values, and this is true for both the Laplacian masks, whether I consider only the vertical and horizontal components or, along with the vertical and horizontal components, I also consider the diagonal components. So these are the two Laplacian masks, where the center coefficient is positive, and the neighboring coefficients, once they become negative, remain negative.
So this shows that using the Laplacian mask in the spatial domain, the kind of operation that
we have done is basically a high pass filtering operation. So now first of all we will consider
the smoothing frequency domain filters, or low pass filters, in the frequency domain. Now as we have already discussed, edges as well as sharp transitions such as noise lead to high frequency components in the image, and if we want to reduce these high frequency components, then the kind of filter that we have to use is a low pass filter.
770
(Refer Slide Time: 13:08)
Where the low pass filter will allow the low frequency components of the input image to be
passed to the output and it will cut off the high frequency components of the input image
which will not be passed to the output. So our basic model for this filtering operation will be
like this: we will have the output in the frequency domain, which is given by G(u,v) = H(u,v) · F(u,v), where F(u,v) is the Fourier transform of the input image, and we have to select a proper filter function H(u,v) which will attenuate the high frequency components and let the low frequency components be passed to the output.
Now here we will consider an ideal low pass filter, which we define like this: H(u,v) = 1 if D(u,v) ≤ D0, and H(u,v) = 0 if D(u,v) > D0, where D(u,v) is the distance of the point (u,v) in the frequency domain from the origin of the frequency rectangle. So this clearly means that if I multiply F(u,v) with such an H(u,v), then all the frequency components lying within a circle of radius D0 will be passed to the output, and all the frequency components lying outside this circle of radius D0 will not be allowed to pass to the output.
Now if the Fourier transform F(u,v) is the centered Fourier transform, that means the origin of the frequency rectangle is set at the middle of the rectangle, then this distance value D(u,v) is simply computed from the center. Assuming that we have an
771
image of size MxN, D(u,v) will be computed as D(u,v) = [(u - M/2)² + (v - N/2)²]^(1/2), if the Fourier transform F(u,v) is the centered Fourier transform.
A plot of this kind of function is like this: the left hand side shows the perspective plot of such an ideal filter, whereas the right hand side shows the cross section of the same filter. In such cases, we define the cut off frequency of the filter to be the point of transition between H(u,v) = 1 and H(u,v) = 0. In this particular case, that point of transition is the value D0, so we consider D0 to be the cut off frequency of this particular filter.
Now it may be noted that such a sharp cut off filter is not realizable using electronic components. However, using software, using a computer program, it is different, because we are simply letting some values pass to the output and setting the other values to zero. So this kind of ideal low pass filter can be implemented in software, whereas using electronic components we are not able to implement such an ideal low pass filter.
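As the lecture notes, such an ideal cut off is easy to realize in software. Below is a minimal NumPy sketch of how this might be done, assuming a centred Fourier spectrum and a grayscale two dimensional input; the function name and coding choices are illustrative, not something prescribed in the lecture.

```python
import numpy as np

def ideal_lowpass(image, D0):
    """Apply an ideal low pass filter with cut-off radius D0 (in frequency samples)."""
    M, N = image.shape
    F = np.fft.fftshift(np.fft.fft2(image))   # centred Fourier transform of the image
    u = np.arange(M) - M // 2                  # frequency coordinates measured from the centre
    v = np.arange(N) - N // 2
    V, U = np.meshgrid(v, u)
    D = np.sqrt(U**2 + V**2)                   # D(u,v): distance from the origin of the rectangle
    H = (D <= D0).astype(float)                # H = 1 inside the circle of radius D0, 0 outside
    G = H * F                                  # G(u,v) = H(u,v) . F(u,v)
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))
```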
772
(Refer Slide Time: 17:05)
A more practical alternative is the Butterworth low pass filter of order n, whose transfer function is H(u,v) = 1 / [1 + (D(u,v)/D0)^(2n)], where D0 again plays the role of the cut off frequency.
The plot of such a Butterworth filter is shown here: the perspective plot of the Butterworth filter, and on the right hand side the cross section of this Butterworth filter. Now if I apply the
773
ideal low pass filter and the Butterworth filter on an image, let us see what kind of output image we get. In all these cases, we assume that first we take the Fourier transform of the image, then multiply that Fourier transform with the frequency response of the filter, and then take the inverse Fourier transform of the product to obtain our processed image in the spatial domain.
So here we use two images for test purposes: on the left hand side we have shown an image without any noise, and on the right hand side an image where we have added some amount of noise. Then, if I process these images using the ideal low pass filter and using the Butterworth filter, the top row shows the results with the ideal low pass filter when the image
774
is without noise, and the bottom row shows the result of applying the Butterworth filter, again when there is no noise contamination in the image.
Here you find, as the top row shows, that if I use the ideal low pass filter for the same cut off frequency, say 10, the blurring of the image is very high compared to the blurring introduced by the Butterworth filter. If I increase the cut off frequency, say to twenty, you find that the ideal low pass filtered image is quite sharp, but the disadvantage is that if you look along these locations, you find that there is some ringing effect. That means there are a number of undesired lines which are not present in the original image.
The same is the case over here: the ideal low pass filter introduces the ringing effect, a ringing effect which is not visible in the case of the Butterworth filter. Now the reason why the ideal low pass filter introduces the ringing effect is this: we have seen that in the frequency domain the ideal low pass filter response, if I plot u versus H(u), is a rectangular function. If I take the inverse Fourier transform of this, the corresponding h(x) is a sinc-like function of this form.
So here you find that there is a main central component and there are other secondary components. Now the spread of this main component is inversely proportional to D0, which is the cut off frequency of the ideal low pass
775
filter. So as I reduce D0, this spread is going to increase, and that is what is responsible for the more and more pronounced blurring of the smoothed image. Similarly, the width of the secondary components is also inversely proportional to this cut off frequency D0.
And these secondary components are the ones responsible for the ringing effect. The outputs that we have shown here using the Butterworth filter are obtained with a Butterworth filter of order one, that is, n = 1. A Butterworth filter of order 1 does not lead to any kind of ringing effect, whereas if we go for a Butterworth filter of higher order, that may lead to ringing. In the same manner, we can also go for a Gaussian low pass filter.
And we have already said that for a Gaussian low pass filter the filter response is H(u,v) = e^(-D²(u,v)/2σ²), and if I set σ equal to the cut off frequency D0, then this becomes H(u,v) = e^(-D²(u,v)/2D0²) for the filtering operation. As we have already said, the inverse Fourier transform of a Gaussian is also Gaussian in nature, so using the Gaussian filters we will never have any ringing effect in the processed image.
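The Butterworth and Gaussian low pass transfer functions discussed above can be sketched in the same style; the helper name distance_grid and the default order n = 1 are assumptions made for illustration.

```python
import numpy as np

def distance_grid(M, N):
    # D(u,v): distance of each frequency sample from the centre of the frequency rectangle
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    V, U = np.meshgrid(v, u)
    return np.sqrt(U**2 + V**2)

def butterworth_lowpass(M, N, D0, n=1):
    # H(u,v) = 1 / [1 + (D(u,v)/D0)^(2n)]
    D = distance_grid(M, N)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))

def gaussian_lowpass(M, N, D0):
    # H(u,v) = exp(-D^2(u,v) / (2 D0^2)), i.e. sigma set equal to the cut-off D0
    D = distance_grid(M, N)
    return np.exp(-(D**2) / (2.0 * D0**2))
```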
So these are the kinds of low pass filtering or smoothing operations in the frequency domain that we can have. We can also have the high pass, or sharpening, filters
776
in the frequency domain. So just as low pass filters give the smoothing effect, the sharpening effect is given by the high pass filters. Again, we can have the ideal high pass filter, the Butterworth high pass filter, and the Gaussian high pass filter.
So just in the reverse way, we can define an ideal high pass filter: H(u,v) = 0 if D(u,v) ≤ D0, and H(u,v) = 1 if D(u,v) > D0. Similarly, we can have the Butterworth high pass filter, where H(u,v) is given by the expression H(u,v) = 1 / [1 + (D0/D(u,v))^(2n)], and we can also have the Gaussian high pass filter, which is given by H(u,v) = 1 - e^(-D²(u,v)/2D0²).
777
(Refer Slide Time: 25:53)
And you find that in all these cases the frequency response of a high pass filter can be written as H_hp(u,v) = 1 - H_lp(u,v); that is, the high pass filter response can be obtained from the corresponding low pass filter response with the same cut off frequency. Now, using such high pass filters, the kind of results we can obtain is shown here: this is the ideal high pass filter response, where the left hand side gives you the perspective plot and the right hand side gives you the cross section.
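Continuing the sketch above, a high pass transfer function can be formed from the corresponding low pass one with the same cut off, following the relation H_hp(u,v) = 1 - H_lp(u,v); the sizes and the cut off value below are illustrative.

```python
# Reusing the butterworth_lowpass / gaussian_lowpass helpers sketched earlier
M, N, D0 = 256, 256, 10                           # illustrative image size and cut-off
H_butter_hp = 1.0 - butterworth_lowpass(M, N, D0, n=1)   # Butterworth high pass, order 1
H_gauss_hp  = 1.0 - gaussian_lowpass(M, N, D0)           # Gaussian high pass
```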
778
779
This shows the perspective plot as well as the cross section of a Butterworth high pass filter of order one. And if I apply such high pass filters to the same image, the results that we obtain are something like this. Here, on the left hand side is the response of an ideal high pass filter, and on the right hand side we have shown the response of a Butterworth high pass filter; in both these cases the cut off frequency was taken to be equal to 10. In this next result the cut off frequency was taken to be equal to fifty, and if you closely look at the ideal filter output, here again you can find that there are ringing effects around the boundaries.
Whereas in the case of the Butterworth filter there is no ringing effect; and again, this is a Butterworth filter of order one, and if I go for higher order Butterworth filters, that also may lead to ringing effects. A Gaussian high pass filter, on the other hand, does not lead to any ringing effect. So using the low pass filters I can go for the smoothing operation, and using the high pass filters I can go for the image sharpening operation.
The same sharpening operation can also be done using the Laplacian in the frequency domain. This is simply because, if for a function f(x,y) the corresponding Fourier transform is F(u,v), then taking the Fourier transform of the Laplacian ∇²f(x,y) gives, as can be shown, -(u² + v²)·F(u,v). So, using this relation, if I
780
consider H(u,v) = -(u² + v²), use this as a filter on F(u,v), and then compute the inverse Fourier transform, the output that we get is nothing but the Laplacian-filtered image, from which the enhanced, sharpened output can be obtained.
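A minimal sketch of Laplacian filtering in the frequency domain, using H(u,v) = -(u² + v²) on a centred spectrum; how the Laplacian image is finally combined with the original (added or subtracted) depends on the sign convention used and is not shown here.

```python
import numpy as np

def frequency_laplacian(image):
    """Compute the Laplacian of an image via H(u,v) = -(u^2 + v^2) in the frequency domain."""
    M, N = image.shape
    F = np.fft.fftshift(np.fft.fft2(image))
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    V, U = np.meshgrid(v, u)
    H = -(U**2 + V**2)                              # Laplacian transfer function
    return np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
```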
Another kind of filtering that we have already done in connection with our spatial domain operations is high boost filtering. There we said that in the spatial domain the high boost filtered output can be represented as f_hb(x,y) = A·f(x,y) - f_lp(x,y), which can also be written as (A-1)·f(x,y) + f_hp(x,y), where f_hp(x,y) is the high pass filtered output. In the frequency domain, the corresponding filter can be represented by H_hb(u,v) = (A - 1) + H_hp(u,v). So this is the high boost filter in the frequency domain.
781
So if I apply this high boost filter to an image, the kind of result that we get is something like this, where again on the left hand side is the original image, and on the right hand side is the high boost filtered image.
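A hedged sketch of high boost filtering in the frequency domain using H_hb(u,v) = (A - 1) + H_hp(u,v); the use of a Gaussian high pass, the boost factor A and the cut off D0 are illustrative choices, and gaussian_lowpass refers to the helper sketched earlier.

```python
import numpy as np

def high_boost(image, A=1.2, D0=30):
    """High boost filtering in the frequency domain (illustrative parameter values)."""
    M, N = image.shape
    H_hp = 1.0 - gaussian_lowpass(M, N, D0)        # H_hp(u,v) = 1 - H_lp(u,v)
    H_hb = (A - 1.0) + H_hp                        # H_hb(u,v) = (A - 1) + H_hp(u,v)
    F = np.fft.fftshift(np.fft.fft2(image))
    return np.real(np.fft.ifft2(np.fft.ifftshift(H_hb * F)))
```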
Now let us consider another very interesting filter, which we call the homomorphic filter. The idea stems from one of our earlier discussions, where we said that the intensity at a particular point in the image is a product of two terms: an illumination term and a reflectance term. That is, f(x,y) can be represented as an illumination term i(x,y) multiplied by r(x,y), where r(x,y) is the reflectance term.
782
Now coming to the corresponding frequency domain: because this is a product of two terms, the illumination and the reflectance, the Fourier transform of the product does not separate into the transforms of the two factors, so we cannot operate on them directly. What we do instead is define a function z(x,y) = ln f(x,y), which is nothing but z(x,y) = ln i(x,y) + ln r(x,y). And if I compute the Fourier transform, then the Fourier transform of z(x,y), represented by Z(u,v), will have two components, Fi(u,v) + Fr(u,v).
Here Fi(u,v) is the Fourier transform of ln i(x,y) and Fr(u,v) is the Fourier transform of ln r(x,y). Now if I define a filter H(u,v) and apply it to Z(u,v), the output is S(u,v) = H(u,v)·Z(u,v) = H(u,v)·Fi(u,v) + H(u,v)·Fr(u,v). Taking the inverse Fourier transform, I get s(x,y) = i'(x,y) + r'(x,y), and finally I get g(x,y) = e^(s(x,y)) = e^(i'(x,y))·e^(r'(x,y)), which is nothing but i0(x,y)·r0(x,y).
So the first term is the illumination component, and the second term is the reflectance component. Because of this separation, it is possible to design a filter which can enhance the high frequency components and attenuate the low frequency components. It is generally the case that in an image the illumination component leads to low frequency content, because illumination varies slowly, whereas the reflectance component leads to high frequency content, particularly at the boundaries between two reflecting objects.
783
(Refer Slide Time: 36:11)
As a result, the reflectance term leads to high frequency components and the illumination term leads to low frequency components. So now if we define a filter response like this, with γH > 1 and γL < 1, it will amplify the high frequency components, that is, the contribution of the reflectance, and it will attenuate the low frequency components, that is, the contribution due to the illumination.
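A minimal sketch of the homomorphic filtering chain described above (logarithm, Fourier transform, filtering, inverse transform, exponentiation). The Gaussian-shaped transfer function rising from γL to γH, the constant c and the default parameter values are common choices assumed here, not values given in the lecture.

```python
import numpy as np

def homomorphic(image, gamma_L=0.5, gamma_H=2.0, D0=30, c=1.0):
    """Homomorphic filtering sketch: z = ln f -> Z(u,v) -> H(u,v).Z(u,v) -> s(x,y) -> exp."""
    M, N = image.shape
    z = np.log1p(image.astype(float))               # z = ln(f); log1p avoids ln(0)
    Z = np.fft.fftshift(np.fft.fft2(z))
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    V, U = np.meshgrid(v, u)
    D2 = U**2 + V**2
    # Transfer function that tends to gamma_L at low frequencies and gamma_H at high frequencies
    H = (gamma_H - gamma_L) * (1.0 - np.exp(-c * D2 / (2.0 * D0**2))) + gamma_L
    s = np.real(np.fft.ifft2(np.fft.ifftshift(H * Z)))
    return np.expm1(s)                               # inverse of log1p: g = exp(s) - 1
```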
Now using this type of filtering, the kind of result that we get is something like this: on the left hand side is the original image, and on the right hand side is the enhanced image. If you look inside the boxes, you find that many details which are
784
not visible in the original image are now visible in the enhanced image. So using such homomorphic filtering, we can go for this kind of enhancement where the contribution due to illumination is reduced, and even in the dark areas we can bring out the details.
So with this, we come to the end of our discussion on image enhancement. Now let us look at some questions from today's lecture. First, a digital image contains an unwanted region of size 7 pixels; what should the smoothing mask size be to remove this region? Second, why is the Laplacian operator normally used for the image sharpening operation? Third, what is unsharp masking? Fourth, give a 3x3 mask for performing unsharp masking in a single pass through an image. Fifth, state some applications of the first derivative in image processing.
785
(Refer Slide Time: 38:14)
Then, what is ringing? Why do ideal low pass and high pass filters lead to ringing effects? How does blurring vary with the cutoff frequency? Does a Gaussian filter lead to a ringing effect? Give the transfer function of a high boost filter. And what is the principle of the homomorphic filter? Thank you.
Thank you.
786
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-42.
Image Restoration Techniques-I.
Hello, welcome to the video lecture series on digital image processing. In today's lecture, and in a number of lectures starting from today, we will talk about image restoration techniques. We will see the difference between image enhancement and image restoration, we will talk about the image formation process and the degradation model involved in it, and we will see the degradation operation for continuous functions and how it can be formulated in the discrete domain.
Now when we talked about image enhancement, particularly using a low pass filter or smoothing masks in the spatial domain, we saw that one of the effects of using a low pass filter or a smoothing mask is that the noise content of the image gets reduced. The simple reason is that noise leads to high frequency components in the image, so if I can remove or reduce the high frequency components, that also leads to a reduction of the noise.
Now this type of noise reduction is also a sort of restoration, but it is not usually termed restoration. Rather, a process which tries to recover or restore an image that has been degraded, using some knowledge of the degradation process which has
787
degraded the image, is the operation that is known as image restoration. So in the case of image restoration, the image degradation model is very, very important. We have to find out the phenomenon or the model which has degraded the image, and once that degradation model is known, we have to apply the inverse process to recover or restore the desired image.
So this is the difference between image enhancement, or simple noise filtering, and image restoration: in the case of image enhancement or simple noise filtering, we do not make use of any degradation model, nor do we bother about the process which is degrading the image. Whereas in the case of image restoration, we work with the degradation model: we try to estimate the model that has degraded the image, and using that model we apply the inverse process and try to restore the image.
So degradation modeling is very, very important in the case of image restoration. And when we try to restore an image, in most cases we define some goodness criterion. Using this goodness criterion, we can find an optimally restored image which is almost the same as the original image. And we will see later that image restoration operations, as in the case of image enhancement, can be applied both in the frequency domain as well as in the spatial domain.
(Refer Slide Time: 4:15)
So first of all, let us see what the image degradation model is that we will consider in our subsequent lectures. Here we assume that
788
our input image f(x,y) is a two dimensional function as before, and we assume that this input image f(x,y) is degraded by a degradation function H. So we will put it like this: we have a degradation function H which operates on the input image f(x,y).
Then the output of this degradation function is combined with additive noise: we add a noise term, which we represent by η(x,y), to the degradation output, and this finally gives us the output image g(x,y). This g(x,y) is the degraded image, and from it we want to recover the original input image f(x,y) using image restoration techniques. For recovering f(x,y), what we have to do is perform some filtering operation, and we will see later that these filters are actually derived using knowledge of the degradation function H. The output of the filters is our restored image, and we denote it as f̂(x,y). We write it as f̂(x,y) because in most cases we are unable to restore the image exactly; that means we cannot get back f(x,y) itself.
Rather, by using the goodness criterion that we have just mentioned, what we can do is get an approximation of the original image f(x,y). So the reconstructed image f̂(x,y) is an approximation of the original image f(x,y). The blocks up to the point of obtaining g(x,y) constitute the process of degradation: we first have a degradation function H which operates on the input image f(x,y), then the output of this degradation function block is added to an additive noise term, which in this particular case we have represented as η(x,y), and this sum is the degraded image that we actually observe.
And this degraded image is filtered using restoration filters: g(x,y) is passed through the restoration filters, and the filter output is the reconstructed image f̂(x,y). As we have just said, this f̂(x,y) is an approximation of the original image f(x,y). So this particular block represents the restoration operation, and as we have said, in the process we call image restoration, knowledge of the degradation model is very, very essential.
789
So one of the fundamental and very important tasks in the restoration process is to estimate the degradation model which has degraded the input image. Later on we will see various techniques for estimating this degradation model, that is, for estimating the degradation function H. And we will see in a short while that this particular operation, the conversion from f(x,y) to g(x,y), can be represented in the spatial domain as g(x,y) = h(x,y) * f(x,y) + η(x,y), where * denotes convolution.
So this is the operation in the spatial domain, and the corresponding operation in the frequency domain is represented by G(u,v) = H(u,v)·F(u,v) + N(u,v), where
H(u,v) is the Fourier transformation of h(x,y), F(u,v) is the Fourier transformation of the
input image f(x,y), N(u,v) is the Fourier transform of the additive noise η(x,y) and G(u,v) is
the Fourier transform of the degraded image g(x,y).
This is the frequency domain operation, and the equivalent operation in the spatial domain is the upper one. Here you see that in the spatial domain we have represented this operation as a convolution, and we have said earlier that convolution in the spatial domain is equivalent to multiplication in the frequency domain; that is what the second expression, G(u,v) = H(u,v)·F(u,v) + N(u,v), says. Here the convolution in the spatial domain is replaced by multiplication in the frequency domain. These two are very, very important expressions, and we will make use of them more or less throughout our discussion of the image restoration process.
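To make the two expressions concrete, here is a small sketch that simulates the degradation g = h * f + η through the frequency domain product H(u,v)·F(u,v) + N(u,v). Zero-mean Gaussian noise and a point spread function stored with its peak at the array centre are assumptions of this sketch, not details fixed by the lecture.

```python
import numpy as np

def degrade(f, h, noise_sigma=0.0, rng=None):
    """Simulate g = h * f + eta via the frequency-domain relation G = H.F + N.
    h is assumed to be a point spread function of the same size as f (zero padded),
    with its peak at the centre of the array."""
    rng = np.random.default_rng() if rng is None else rng
    F = np.fft.fft2(f)
    H = np.fft.fft2(np.fft.ifftshift(h))      # move the PSF centre to the array origin
    eta = noise_sigma * rng.standard_normal(f.shape)
    return np.real(np.fft.ifft2(H * F)) + eta
```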
790
(Refer Slide Time: 11:23)
Now before we proceed further, let us recapitulate some definitions that will be used throughout our discussion on image restoration. We have a degraded image g(x,y), which we now represent as g(x,y) = H[f(x,y)] + η(x,y), where we assume that H is the degradation operator which operates on the input image f(x,y), and that its output, when added to the additive noise η(x,y), gives us the degraded image g(x,y). Now if for the time being we neglect the term η(x,y), that is, we set η(x,y) = 0 for simplicity of our analysis, then what we get is g(x,y) = H[f(x,y)], where, as we said, H is the degradation operator.
(Refer Slide Time: 13:28)
791
Now the first term that we will define is what is known as linearity, that is, what we mean when we say that the degradation operator H is a linear operator. For defining linearity, suppose we have two functions f1(x,y) and f2(x,y) and two constants k1 and k2. If the relation H[k1·f1(x,y) + k2·f2(x,y)] = k1·H[f1(x,y)] + k2·H[f2(x,y)] holds for these two functions and these two constants, then the operator H is said to be linear. And we know very well from linear system theory that this is nothing but the famous superposition theorem, and as per our definition of a linear system, the superposition theorem must hold true if the system is a linear system. Now, using this same equation, if I set k1 = k2 = 1, it leads to H[f1(x,y) + f2(x,y)] = H[f1(x,y)] + H[f2(x,y)]. We have simply replaced k1 and k2 by 1, and this is what is known as the additivity property.
So the additivity property simply says that the response of the system to the sum of two
inputs is same as the sum of their individual responses. So here, we have two inputs f1(x,y)
and f2(x,y), so if I take the summation of f1(x,y) and f2(x,y), and then allow H to operate on it,
then whatever result we will get, that will be same as, when H operates on f1(x,y) and f2(x,y)
individually, and we take the sum of those individual responses.
792
(Refer Slide Time: 17:20)
And these two must be equal for a linear system; this is the additivity property. Now, if I instead assume that f2(x,y) = 0, this gives H[k1·f1(x,y)] = k1·H[f1(x,y)], which is the property known as homogeneity. So these are the different properties of a linear system. The system is also called position invariant, or location invariant, if H[f(x-α, y-β)] = g(x-α, y-β).
Here, obviously, we have assumed that g(x,y) = H[f(x,y)]; when this is true, the operator H is called position invariant if H[f(x-α, y-β)] = g(x-α, y-β), and this should hold for any function f(x,y) and any values of α and β. This position invariance property simply says that the response of H at any point in the image depends solely on the value of the image at that point, and not on the position of the point in the image. That is what is expressed by H[f(x-α, y-β)] = g(x-α, y-β). Now, given these definitions, let us see what the degradation model will be in the case of continuous functions.
(Refer Slide Time: 20:37)
793
So to look at the degradation model in case of continuous functions, we make use of an old
mathematical expression where we have seen that if I take a delta function say, δ(x,y) and the
definition of δ(x,y) we have seen earlier that δ(x,y) = 1, if x=0 and y=0, and this is equal to
0, otherwise. So this is the definition of a delta function that we have already used, and we
can use a shifted version of this delta function, that is δ(x - x0, y - y0) = 1, if x = x0 and y = y0,
and it will be 0 otherwise.
So this is the definition of a delta function. Now, earlier we have seen that if we have an image, a two dimensional function f(x,y), then for the integral ∫∫ f(x,y) δ(x - x0, y - y0) dx dy, taken from -∞ to ∞,
the result of the integral will be simply equal to f(x0, y0). So this says that if I multiply a two
dimensional function f(x,y) with the delta function δ(x - x0, y - y0) and integrate the product
over the interval -∞ to ∞, then the result will be simply the value of the two dimensional
function f(x,y) at location (x0, y0).
(Refer Slide Time: 22:46)
794
So by slightly modifying this particular expression, we can have an equivalent expression: I can write the two dimensional function f(x,y) as a similar integral, f(x,y) = ∫∫ f(α,β) δ(x-α, y-β) dα dβ, with both integrals running from -∞ to ∞.
Now, for the time being, consider the noise term η(x,y) = 0 for simplicity. We have seen earlier that the degraded image is g(x,y) = H[f(x,y)] + η(x,y); so, assuming the additive noise term η(x,y) is zero or negligible, and replacing f(x,y) by the integral above, the degraded image can be written as g(x,y) = H[ ∫∫ f(α,β) δ(x-α, y-β) dα dβ ].
So I get an expression for the degraded image g(x,y) in terms of this integral definition of the function f(x,y), operated on by the degradation operator H. Now once I
795
get this kind of expression, if I apply the linearity (additivity) property of the linear system, this expression gets converted to g(x,y) = ∫∫ H[f(α,β) δ(x-α, y-β)] dα dβ.
And this is what we have obtained by applying the linearity and additivity properties to the earlier expression for the degraded image. Now, the term f(α,β) is independent of the variables x and y, and because of this (by the homogeneity property) the same expression can be rewritten in a slightly different form.
(Refer Slide Time: 26:31)
That form is g(x,y) = ∫∫ f(α,β) H[δ(x-α, y-β)] dα dβ.
Now, the term H[δ(x-α, y-β)] can be written as h(x, α, y, β), which is nothing but the impulse response of H, that is, the response of the operator H when the input is an impulse of the form δ(x-α, y-β). In optics, this impulse response is popularly known as the point spread function or PSF. Using this impulse response, the same g(x,y) can be written as g(x,y) = ∫∫ f(α,β) h(x, α, y, β) dα dβ.
And this is what is popularly known as the superposition integral of the first kind. This expression is very, very important; it simply says that if the impulse response of the operator H is known, then it is possible to find the response of H to any arbitrary input f(α,β). That is what has been done here: using the knowledge of the impulse response h(x, α, y, β), we have been able to find the response of the system to an input f(α,β).
This impulse response is what uniquely and completely characterizes a particular system. So given any system, if we know its impulse response, then we can find its response to any arbitrary input. Now, in addition to this, suppose the operator H is position invariant.
(Refer Slide Time: 30:06)
Then H[δ(x-α, y-β)] = h(x-α, y-β), and the superposition integral reduces to g(x,y) = ∫∫ f(α,β) h(x-α, y-β) dα dβ. Looking at this particular expression, you find that it is nothing but the convolution of the two functions f(x,y) and h(x,y). And that is what we said when we drew our degradation model: the input image f(x,y) is convolved with the degradation function h(x,y) to give g(x,y). So this is nothing but that convolution operation. Now, recall that earlier we have
797
considered the noise term η(x,y) = 0. If I now include the noise term η(x,y), then our degradation model becomes simply g(x,y) = ∫∫ f(α,β) h(x-α, y-β) dα dβ + η(x,y).
So this is the general image degradation model, and you find that here we have assumed that the degradation function H is linear and position invariant. It is very important to note that many of the degradations which we encounter in reality can be approximated by such linear, position invariant (space invariant) models. The advantage is that once a degradation can be approximated by a linear position invariant model,
the entire machinery of linear system theory can be used to find the solution of the image restoration problem. That means we can use all the tools of linear system theory to estimate the restored image f̂(x,y) from a given degraded image g(x,y), provided we have some knowledge of the degradation function h(x,y) and some knowledge of the noise function η(x,y). Thank you.
798
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-43.
Image Restoration Techniques-II.
Hello, welcome to the video lecture series on digital image processing. Now this formulation
that we have done until now, this formulation is for the continuous case. And as we have said
many times that, in order to use this mathematical operation for our digital image processing
techniques, we have to find out a discrete formulation of this mathematical model. So let us
see that, how we can have an equivalent discrete formulation of this particular degradation
model.
So let us consider first the one dimensional case, where f(x) is the input function and h(x) is the degradation function. For discretization of the same formulation, we have to uniformly sample these two functions, f(x) and h(x). We assume that f(x) is uniformly sampled to give an array of dimension A
799
and h(x) is uniformly sampled to give an array of dimension B. That means, for f(x), in the discrete case, x varies from 0, 1, ..., A-1, and for h(x), x varies from 0, 1, ..., B-1.
Then what we do is append additional zeros to f(x) and h(x) to make both of them of the same dimension, say M. So we make both of them of dimension M by zero padding, and we assume that after the addition of these zero terms, both f(x) and h(x) become periodic with period M.
Once we have done this, the same convolution operation that we had in the continuous case can also be written in the discrete case. After converting both f(x) and h(x) into arrays of dimension M, we denote the new arrays by fe(x), the extended version of f(x), and he(x), the extended version of h(x). The discrete convolution is then ge(x) = Σ_{m=0}^{M-1} fe(m) he(x - m), where x takes values from 0 to M-1. So this is the discrete formulation of the convolution equation that we obtained for the continuous case. Now if you analyze this convolution expression, you find that it can be written in the form of a matrix operation. In matrix form, these equations will be like
800
this: g = Hf, where the array f is simply the column vector [fe(0), fe(1), ..., fe(M-1)] and the array g, similarly, is the column vector [ge(0), ge(1), ..., ge(M-1)].
Recollect that fe and ge are the names given to the sampled versions of the functions f(x) and g(x) after extending them by adding additional zeros to make them of dimension M.
And in this particular case, the matrix H will be of dimension MxM, with elements given by H(i,j) = he(i-j); that is,

H = [ he(0)      he(-1)     ...   he(-M+1)
      he(1)      he(0)      ...   he(-M+2)
      ...        ...        ...   ...
      he(M-1)    he(M-2)    ...   he(0)    ]

So this is the form of the matrix H, which is the degradation matrix in this particular case.
801
And here you find that the elements of this degradation matrix H are actually generated from the degradation function he(x). Now remember that we have assumed that he(x) is periodic, with period M.
So, by using this periodicity assumption (he(-k) = he(M-k)), this degradation matrix H can be written in a different form:

H = [ he(0)      he(M-1)    he(M-2)   ...   he(1)
      he(1)      he(0)      he(M-1)   ...   he(2)
      he(2)      he(1)      he(0)     ...   he(3)
      ...        ...        ...       ...   ...
      he(M-1)    he(M-2)    he(M-3)   ...   he(0)  ]
Now if you analyze this particular matrix, you find that the degradation matrix H has a very interesting property: different rows of this matrix are generated by rotating the previous row to the right. If you look at the second row, you find that it is generated by rotating the first row to the right by one position; similarly, the third row is generated by rotating the second row to the right by one position. So in this matrix the different rows are generated by rotating the previous row to the right, and it is therefore called a circulant matrix, because the rows are generated by circular rotation. The circularity in this matrix is also complete
802
in the sense that if I rotate the last row to the right, what I get is the first row of the matrix. So this kind of matrix is known as a circulant matrix. Thus, in the discrete formulation, the degradation is also a convolution operation.
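A small sketch of how the MxM circulant degradation matrix can be built from the extended, periodic sequence he(x); the function name is mine, and the construction simply encodes H(i,j) = he((i - j) mod M).

```python
import numpy as np

def circulant_from(h_e):
    """Build the M x M circulant degradation matrix from the extended (periodic) h_e.
    Each row is the previous row rotated one position to the right."""
    M = len(h_e)
    H = np.empty((M, M))
    for i in range(M):
        for j in range(M):
            H[i, j] = h_e[(i - j) % M]   # periodicity: he(-k) = he(M - k)
    return H

# With this matrix, g_e = H @ f_e reproduces the circular convolution of f_e and h_e.
```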
And in the matrix equation of the degradation model, the degradation matrix H that we obtained is actually a circulant matrix. Now let us extend this discrete formulation from one dimension to two dimensions, that is, to two dimensional images. In two dimensions, we have the input image function f(x,y) and the degradation function h(x,y).
We assume that f(x,y) is sampled to an array of dimension AxB and h(x,y) is sampled to an array of dimension CxD. As we did in the one dimensional case, where the functions f(x) and h(x) were extended by appending zeros to make both of them of the same size M, here we add zeros to both f(x,y) and h(x,y) to get the extended functions fe(x,y) and he(x,y), both of dimension MxN.
We also assume that fe(x,y) and he(x,y) are periodic, with period M in the x dimension and period N in the y dimension. Following a similar procedure, we can obtain the convolution expression in two dimensions, ge(x,y) = Σ_{m=0}^{M-1} Σ_{n=0}^{N-1} fe(m,n) he(x-m, y-n).
803
(Refer Slide Time: 14:22)
And if I write this convolution expression in the form of a matrix equation, incorporating the noise term η(x,y), I get an equation of the form g = Hf + η, where the vector f is of dimension MN, obtained by concatenating the rows of the two dimensional function f(x,y); that is, the first N elements of the vector f are the elements of the first row of f(x,y). Similarly, we obtain the vector η by concatenating the rows of η(x,y). The degradation matrix H in this case will be of dimension MNxMN.
And this matrix H has a very interesting form; it can now be represented as

H = [ H0      HM-1    ...   H1
      H1      H0      ...   H2
      ...     ...     ...   ...
      HM-1    HM-2    ...   H0  ]

where each term Hj is itself a matrix of dimension NxN, generated from the jth row of the extended degradation function he(x,y).
804
(Refer Slide Time: 16:15)
So you find that each block Hj, which is a component of the degradation matrix H, is a circulant matrix as we defined earlier, and the degradation matrix H is itself arranged as a circulant matrix of these blocks. So this matrix H is what is known as a block circulant matrix. So this is what is
805
called a block circulant matrix. So in the case of a two dimensional function, that is, in the case of a digital image, we have seen that the degradation model can simply be represented by the expression g = Hf + η, where the vector f is of dimension MN.
And the degradation matrix H, which is of dimension MNxMN, is actually a block circulant matrix, where the jth block is obtained from the jth row of the degradation function he(x,y). In our next lecture we will see how this particular degradation model is applied to restore an image from its degraded version. Now let us see some of the questions for this particular lecture.
806
So the first question is, what is the difference between image enhancement and image restoration? Second, what is a linear position invariant system? Third, what is the homogeneity property? Fourth, what is a circulant matrix? What is a block circulant matrix? And why does the degradation matrix H become circulant? Thank you.
807
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 09 Lecture Number 44
Estimation of Degradation Model and Restoration Techniques - 1
Hello, welcome to the video lecture series on digital image processing. In the last class we started our discussion on image restoration. We said that there are certain cases where image restoration is necessary, in the sense that in many cases, while capturing or acquiring the image, some distortions appear in it. For example, if you want to capture a moving object with a camera, then because of the relative motion between the object and the camera, it is possible that the captured image will be blurred, which is known as motion blurring.
There are many other situations; for example, if the camera is not properly focused, then also the image that you get is a distorted image. In such situations, what we have to go for is restoration of the image, or recovery of the original image from the distorted image.
Now, regarding this, in the last class we talked about what an image restoration technique is. In previous classes we talked about image filtering, that is, when the image is contaminated with noise, and we discussed various types of filters, both in the spatial domain and in the frequency domain, to remove that noise. And we mentioned in our last class that this kind of
808
noise removal is also a sort of restoration, because there too we are trying to recover the original image from a noisy image. But conventionally this kind of simple filtering is not known as restoration.
By restoration, what we mean is this: we know a degradation model by which the image has been degraded, and some noise has been added to that degraded image. Recovery or restoration of the original image from such a degraded image, using a priori knowledge of the degradation function or model by which the image has been degraded, is what is normally known as the restoration process.
So this is the basic difference between restoration and image filtering or image enhancement. Then we looked at the image formation process, where the degradation is involved, and we discussed that degradation model for continuous functions as well as its discrete formulation.
So in today's lecture we will talk about the estimation of the degradation model, and we will see that there are basically three different techniques for estimating it. The first is simply by observation, that is, by looking at the degraded image we can estimate the degradation function that has degraded the original image. The second approach is through experimentation, where you estimate the degradation model
809
by using some experimental setup. And the third approach is by using mathematical modeling techniques.
Now, whichever way we estimate the degradation model, whether by observation, by experimentation, or by using mathematical models, once we know the degradation model we can go for restoration of the original image from the degraded image. So we will talk about various such restoration techniques: the first one we will see is what is called inverse filtering, the second is minimum mean square error or Wiener filtering, and the third approach is the constrained least squares filtering approach.
Now in our last class we saw a diagram like this, in which we have shown the degradation function. Here we have an input image f(x,y) which is degraded by a degradation function H, as shown in the diagram. Once we get the output of this degradation function H, a random noise η(x,y) is added to it, and finally we get what we call the degraded image g(x,y).
It is this degraded image g(x,y) that is normally available to us, and from this g(x,y), by using the knowledge of the degradation function H, we have to restore the original image. For that,
810
what we have to make use of is a kind of restoration filter, and depending upon what kind of restoration filter we use, we have different types of restoration techniques.
Now, in our last class, based on this model, we said that the mathematical expression of this degradation operation can be written in one of three forms. The first is given by g(x,y) = h(x,y) * f(x,y) + η(x,y); here the original image f(x,y) and the degradation function h(x,y) are specified in the spatial domain.
So in the spatial domain, the original image f(x,y) is convolved with the degradation function h(x,y), and then a random noise η(x,y) is added to that to give the observed image, which in this case we are calling g(x,y). This is the operation that has to be done in the spatial domain. And we have seen earlier that a convolution in the spatial domain is equivalent to multiplication of the corresponding Fourier transforms.
So if for the spatial domain image f(x,y) the Fourier transform is F(u,v), and for the degradation function h(x,y) the Fourier transform is H(u,v), then if I multiply H(u,v) and F(u,v) in the frequency domain and take the inverse transform to obtain the corresponding function in the spatial domain, I will get the same result. That is, convolution in the spatial domain is equivalent to multiplication in the frequency domain.
By applying that convolution theorem, we get the second mathematical expression of the degradation model, G(u,v) = H(u,v)·F(u,v) + N(u,v), where N(u,v) is the Fourier transform of the random noise η(x,y) and G(u,v) is the Fourier transform of the degraded image g(x,y). So we can either perform this operation in the frequency domain, using the frequency coefficients, or perform the same operation directly in the spatial domain.
And in the last class we derived another mathematical expression for the same degradation operation, this time in the form of a matrix equation, which, as shown here, is g = Hf + η, where g is a column vector of dimension MN (for an image of dimension MxN), and f is also a column vector of the same dimension MN.
811
The degradation matrix H is of dimension MNxMN, so there will be MN rows and MN columns. You find that the dimension of this degradation matrix H is quite high if our input image is of dimension MxN. Similarly, η is the noise term, and all these terms together give the degradation expression in the form of a matrix equation.
Now, a direct solution using this matrix expression is not an easy task, so we will talk about restoration using the matrix expression a bit later. For the time being, we will talk about some simpler expressions which follow directly from the mathematical expression given in the frequency domain. Here you should note one point: whether we do the operation in the frequency domain, in the spatial domain, or by making use of the matrix equation for the restoration operation, the same requirement applies.
In all of these cases, knowledge of the degradation function is essential, because that is the nature of our restoration problem: we try to restore or recover the original image using a priori knowledge of the degradation function. Accordingly, as we said earlier, estimation of the degradation function which has degraded the image is very, very important.
812
And there are three basic approaches using which we can estimate the degradation function. The first approach is by observation: we observe a given degraded image, and by observing it we can form an estimate of the degradation function.
The second approach is by experimentation; that means we will have an experimental set up using which we can estimate the degradation function that degraded the image. And the third approach is by mathematical modeling. So we can estimate the degradation function using one of these three approaches, and whichever degradation function or degradation model we get, using that we try to restore our original image from the observed degraded image.
The method of restoring the original image from the degraded image using a degradation function obtained by one of these three methods is what is called blind deconvolution. The reason it is called blind deconvolution is that the degradation function we get using one of these estimation techniques is just an approximation; it is not the actual degradation that took place to produce the degraded image. Because the degradation function is only an approximation, the method of applying the inverse process, that is, obtaining the restored image using one of these estimated degradation functions, is known as blind deconvolution.
813
So we will talk about these estimation approaches one by one. The first one we will talk about is estimation of the degradation function by observation. When we try to estimate a degradation function by observation, no a priori knowledge of the degradation function is given; what we have is the degraded image g(x,y), and by looking at this degraded image we have to estimate the degradation function involved.
For doing this, you look at the degraded image and try to identify a region having some simple structure. So given the complete degraded image, you identify a small region which contains a simple structure; for example, it may be an object boundary where a part of the object as well as a part of the background is present.
After you identify such a region having a simple structure, what we do is try to construct an estimate of the original sub-image which, when degraded, should have given this degraded sub-image. This estimated sub-image should be of the same size and structure as the sub-image chosen from the degraded image, and its gray levels should be chosen by observing the gray levels in the different regions of the chosen degraded sub-image.
Once I get this, I have my approximate reconstructed sub-image and my degraded sub-image; because it is a sub-image, instead of calling it g(x,y), let me call it gs(x,y), and its Fourier transform Gs(u,v).
Similarly, the sub-image that we have formed by observation, representing what the actual image should have been, I call fs(x,y), and taking its Fourier transform I get Fs(u,v). Our purpose is to obtain an estimate of the degradation function, given by Hs(u,v) = Gs(u,v) / Fs(u,v).
While writing this particular expression, what we have done is neglect the noise term. Now, in order for this approach
814
to be a logical one, when I choose a sub-image from the degraded image, it should come from a region where the image content is very strong, so as to minimize the effect of the noise on this estimation of Hs(u,v).
Now this Hs(u,v) has been estimated over a small sub-region of the degraded image, for which we have formed an approximation of what the original should have been, so naturally Hs(u,v) is of smaller size. But for restoration purposes we need H(u,v) to be of size MxN if the original image is of size MxN. So the next operation is to extend this Hs(u,v) to a full H(u,v) that encompasses all the frequency components of the image.
Now let us just look at an example. Here we have shown a degraded sub-image which has been cut out from a bigger degraded image. By observation we form an estimate of the original: we infer that if this is the degraded sub-image, then the original should have been something like this; and while constructing this approximate original, the intensity value in each region is kept similar to the intensity value in the corresponding region of the degraded sub-image.
Similarly, in this other region the intensity value is maintained to be similar to the corresponding intensity value. So this is my fs(x,y) and this one is my gs(x,y). From the first, by taking the Fourier transform, we
815
will compute Fs(u,v), and from the second, using the Fourier transform, we will compute Gs(u,v). By combining these two, I can have an estimate of the degradation function, given by Hs(u,v) = Gs(u,v) / Fs(u,v). So this is the method by which we can estimate the degradation function by observation.
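A hedged sketch of this estimation-by-observation step: the Fourier transforms of the chosen degraded sub-image and of the hand-constructed estimate are divided point by point, with a small constant added to avoid division by zero (that guard is my addition, not part of the lecture).

```python
import numpy as np

def estimate_H_by_observation(g_sub, f_sub_estimate, eps=1e-8):
    """Estimate the degradation over a small sub-image, ignoring noise:
    Hs(u,v) ~= Gs(u,v) / Fs(u,v)."""
    Gs = np.fft.fft2(g_sub)
    Fs = np.fft.fft2(f_sub_estimate)
    return Gs / (Fs + eps)
```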
The next technique for estimating the degradation function is by experimentation. What we do in this case is try to get an imaging set up which is similar to the one used to obtain the degraded image. So first we acquire an imaging set up similar to the original one, and the assumption is that if I can estimate the degradation function of this similar imaging set up, then the same degradation function also applies to the original one.
So here our purpose will be to find the point spread function, or impulse response, of this imaging set up. We said during our earlier discussion that it is the impulse response which fully characterizes any particular system: once the impulse response is known, the response of the system to any arbitrary input can be computed from it. So our idea here is that
816
we want to obtain the impulse response of this imaging set up, and we assume that, because this set up is similar to the original, the same impulse response is also valid for the original imaging set up.
So here the first operation we have to do is simulate an impulse; the first requirement is impulse simulation. Now, how do you simulate an impulse? An impulse can be simulated by a very bright spot of light. Because our imaging set up is a camera, we have a bright spot of light, as small as possible, falling on the camera. If this bright spot is very small, it is equivalent to an impulse, and using this bright spot of light as input, whatever image you get is the response to that bright spot, which in our case is the impulse.
So the image gives you the response to an impulse applied in the form of a bright spot of light, and the intensity of the light you generate tells you the strength of that particular impulse. From this simulated impulse and the image that you get, I obtain the impulse response, and this impulse response is what uniquely characterizes our imaging set up; in this case we assume that it is also valid for the original imaging set up.
817
Now let us see what this impulse response looks like; that is what is shown in this particular slide. The left most image is the simulated impulse: at the center we have a bright spot of light. Of course, this spot is shown in magnified form; in reality it will be even smaller. And on the right hand side is the image captured by the camera when this impulse falls on the camera lens.
So this is my simulated impulse, and this is my impulse response. Once I have the impulse and the impulse response, I can find the degradation function of this imaging system. We know from our earlier discussion that for a very, very narrow impulse, the Fourier transform of the impulse is a constant.
That means that where the input image f(x,y) is, in this particular case, the impulse, its Fourier transform F(u,v) will be a constant, say A. Our relation for the observed image is G(u,v) = H(u,v)·F(u,v). Because F(u,v) is now simply the constant A, from here I straight away get the degradation function H(u,v) = G(u,v) / A.
So in this case G(u,v) is the Fourier transform of the observed image, which is nothing but the response to the simulated impulse that has fallen on the camera, and A is the Fourier transform of the impulse falling on the lens. The ratio of these two, that is G(u,v) divided by the constant A, gives us the degradation function of this particular imaging set up.
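As a rough illustration of this experimental procedure, the sketch below is a minimal outline, assuming NumPy, that the captured impulse response is already available as a small image psf_image, and that the impulse strength A is known; it simply estimates H(u,v) as the Fourier transform of the recorded response divided by the constant A.

import numpy as np

def estimate_H_by_experiment(psf_image, A):
    """Estimate the degradation function H(u,v) from a recorded impulse response.

    psf_image : 2-D array, image captured when a tiny bright spot (simulated
                impulse) of strength A falls on the lens.
    A         : strength of the simulated impulse (the Fourier transform of an
                impulse is the constant A).
    """
    G = np.fft.fft2(psf_image)   # Fourier transform of the observed response
    H = G / A                    # H(u,v) = G(u,v) / A
    return H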
So here you find that we have got the degradation function through an experiment. The experimental set up is this: we have an imaging set up and a light source which can simulate an impulse. Using that impulse, we record an image which is the impulse response of the imaging system. We assume that the Fourier transform of the impulse is a constant A, as has been shown here, and we obtain the Fourier transform of the response, which is G(u,v). The ratio G(u,v)/A is then the degradation function H(u,v) of this particular imaging set up. So I get the degradation function, and we assume that the same degradation function is also valid for the actual imaging system. One point should be kept in mind here: the intensity of the simulated light should be very, very high so that the effect of noise is reduced.
If the intensity of light is not very high, if the light is very feeble, then it is the noise component which will be dominant, and whatever estimate of H(u,v) we obtain will not be a correct estimate; it will be very far from reality.
819
Now the third estimation technique, as we said, is estimation by mathematical modeling. This mathematical modeling approach for estimating the degradation function has been used for many, many years, and there are some strong reasons for using it. The first is that it provides an insight into the degradation process: once I have a mathematical model for degradation, I can have an insight into the degradation process. The second reason is that such a mathematical model can model even the atmospheric disturbance which leads to degradation of the image.
One such mathematical model, which can also model the atmospheric turbulence that leads to degradation of the image, is given by the expression H(u,v) = e^(-k(u² + v²)^(5/6)). So this is one mathematical model of degradation which is capable of modeling the atmospheric turbulence that leads to degradation in the observed image.
Here the constant k characterizes the nature of the turbulence: if the value of k is large the turbulence is very strong, and if the value of k is low the turbulence is not that strong, it is a mild turbulence. So by varying the value of k we can set the intensity of the turbulence that is to be modeled.
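As a small illustration, the sketch below is a minimal example assuming NumPy; the constant k and the image size are arbitrary choices. It builds this turbulence degradation function on a centred frequency grid and applies it to an image in the frequency domain.

import numpy as np

def turbulence_H(M, N, k):
    """Atmospheric-turbulence model H(u,v) = exp(-k (u^2 + v^2)^(5/6)),
    with (u,v) measured from the centre of the M x N frequency plane."""
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    U, V = np.meshgrid(u, v, indexing="ij")
    return np.exp(-k * (U**2 + V**2) ** (5.0 / 6.0))

def degrade(image, k):
    """Blur an image with the turbulence model (no noise added)."""
    M, N = image.shape
    H = turbulence_H(M, N, k)
    F = np.fft.fftshift(np.fft.fft2(image))      # centred spectrum
    G = H * F                                    # G(u,v) = H(u,v) F(u,v)
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))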
820
Now using this model we can generate a number of degraded images, as shown in this particular slide. On the top left we have the original image. The first degraded image was obtained with a value of k of about 0.00025, the next one with k of about 0.001, and the last one with k of about 0.005.
So in the first degraded image the turbulence is very mild, since it has been generated using the same model with a small k; here the turbulence is medium, and here the turbulence is strong. If you look closely at these images you will find that all three are degraded to some extent: in the last case the degradation is maximum, and here the degradation is minimum. So this is how the degradation which occurs because of turbulence can be modeled. Thank you.
821
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 09 Lecture Number 45
Estimation of Degradation Model and Restoration Techniques - 2
Hello, welcome to the video lecture series on Digital Image Processing. Now there are other approaches to mathematical modeling for estimating the degradation, which are obtained from fundamental principles. So from basic principles also we can obtain what the degradation function should be.
Here we will discuss one such case: estimation of the degradation model from basic principles, where we try to find the degradation function when the image is degraded by linear motion. This is a very common situation: if you try to image a fast moving object, in many cases the image that you get is degraded. There is some sort of blurring which is known as motion blurring, and this motion blurring occurs because whenever we take a snap of the scene the shutter of the camera is open for a certain duration of time.
During this period, while the shutter is open, the object is not stationary; the object is moving. So considering any particular point in the imaging plane, the light which arrives there does not come from a single point of the scene. Rather, the light you get at a particular point on the imaging sensor is the aggregation of the reflected light from various points in the scene. That tells us what the basic approach should be to estimate the degradation model in case of motion of the scene with respect to the camera.
So that is what we are trying to estimate here. We assume that the image f(x,y) undergoes motion, and when f(x,y) undergoes motion there will be motion components. So assume two components x₀(t) and y₀(t), which are the time varying motion components along the x direction and y direction respectively.
So once the object is moving, the total exposure at any point in the imaging plane can be obtained by an aggregation or integration operation, where the integration is taken over the period during which the shutter remains open. If I assume the shutter remains open for a time duration T, then the total exposure at any point, that is the observation g(x,y) at point (x,y), will be of the form g(x,y) = ∫₀ᵀ f(x − x₀(t), y − y₀(t)) dt. Here T is the duration of time during which the shutter of the camera remains open, and x₀(t) and y₀(t) are the time varying motion components along the x direction and y direction respectively. And this g(x,y) gives us the
observed blurred image. Now from this we have to estimate what is the degradation function or
the blurring function. So once we get g(x,y) then our purpose is to get the Fourier transform of
this that means we are interested in the Fourier transformation G(u, v) of g(x, y).
823
(Refer Slide Time: 05:15)
And this G(u,v), as we know from the Fourier transform equations, is given by G(u,v) = ∫∫ g(x,y) e^(−j2π(ux+vy)) dx dy. Using this Fourier transform expression we can find out G(u,v), the Fourier transform of the degraded image g(x,y).
824
Now this g(x,y) is to be replaced by g(x,y) = ∫₀ᵀ f(x − x₀(t), y − y₀(t)) dt. So if we just do some reorganisation of this particular integration, we can write G(u,v) in the form
G(u,v) = ∫₀ᵀ [ ∫∫ f(x − x₀(t), y − y₀(t)) e^(−j2π(ux+vy)) dx dy ] dt.
So this will be the final expression.
Now if you look at the inner part, within the bracket, this is nothing but the Fourier transform of a shifted f(x,y), where the shift in the x direction is by x₀(t) and the shift in the y direction is by y₀(t). From the properties of the Fourier transform we know that a shift in the spatial domain leaves the Fourier transform magnitude unchanged and only introduces a phase term. So we can say that f(x − x₀(t), y − y₀(t)) has a Fourier transform which is nothing but F(u,v) e^(−j2π[u x₀(t) + v y₀(t)]). This follows from the translation property of the Fourier transform.
Using this, the expression for G(u,v) can be written as G(u,v) = ∫₀ᵀ F(u,v) e^(−j2π[u x₀(t) + v y₀(t)]) dt, and because the term F(u,v) is independent of t, you can take F(u,v) outside the integration. So the final expression that you get is G(u,v) = F(u,v) ∫₀ᵀ e^(−j2π[u x₀(t) + v y₀(t)]) dt. From this you find that if I define my degradation function H(u,v) to be this particular integral, then I get the expression G(u,v) = H(u,v)·F(u,v).
So here the degradation function is given by the integral of this particular expression, and in this expression x₀(t) and y₀(t) are the motion variables, which are assumed known. If the motion variables are known, then using their values I can find out the degradation function, and using that degradation function I can set up the degradation model. So I know H(u,v), I know G(u,v), and from these I can find out the restored image F(u,v).
826
(Refer Slide Time: 13:10)
Now in this particular case, if we assume that x₀(t) = at/T and similarly y₀(t) = bt/T, that means over the period T during which the camera shutter is open, the movement in the x direction is by an amount a and the movement in the y direction is by an amount b. By assuming this we can evaluate the integral and find that
H(u,v) = [T / (π(ua + vb))] · sin[π(ua + vb)] · e^(−jπ(ua + vb)).
So this is the degradation function, or the blurring function, for uniform linear motion.
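For concreteness, here is a small sketch (assuming NumPy; a, b and T are free parameters) that evaluates this uniform linear motion blur H(u,v) on a centred frequency grid, using NumPy's sinc to handle the points where π(ua + vb) is zero.

import numpy as np

def motion_blur_H(M, N, a, b, T=1.0):
    """Uniform linear motion blur:
    H(u,v) = T/(pi(ua+vb)) * sin(pi(ua+vb)) * exp(-j*pi*(ua+vb)),
    with (u,v) measured from the centre of the M x N frequency plane."""
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    U, V = np.meshgrid(u, v, indexing="ij")
    s = np.pi * (U * a + V * b)
    # np.sinc(x) = sin(pi x)/(pi x), which behaves gracefully as s -> 0
    return T * np.sinc(U * a + V * b) * np.exp(-1j * s)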
827
(Refer Slide Time: 14:35)
So now let us see, using this degradation function, what kind of degradation is actually obtained. Here again on the left hand side we have the original image and on the right hand side the corresponding blurred image, where the blurring is introduced assuming uniform linear motion. For obtaining this particular blurring we assumed a = 0.1 and b = 0.1, and using these values of a and b we have obtained the blurring or degradation function.
Using this degradation function we have obtained this type of degraded image. You find that this is quite a common scene: whenever you take the image of a very fast moving object, the kind of degradation that you obtain in the image is similar to this. Now the problem is: once I have obtained a degradation function, or an estimated degradation function, then given a blurred image how do I restore, or recover, the original image?
As we have mentioned, there are different types of filtering techniques for restoring the original image from a degraded image.
828
(Refer Slide Time: 16:15)
The simplest kind of filtering technique is what is known as inverse filtering, and the concept of inverse filtering is very simple. Our expression for G(u,v), the Fourier transform of the degraded image, is G(u,v) = H(u,v)·F(u,v), where H(u,v) is the degradation function in the frequency domain and F(u,v) is the Fourier transform of the original image.
Now H(u,v)·F(u,v) is a point by point multiplication, that is, for every value of u and v the corresponding F component and the corresponding H component are multiplied together to give the final matrix, which is again in the frequency domain. From this expression it is quite obvious that I can obtain F(u,v) = G(u,v)/H(u,v).
Here H(u,v) is our degradation function in the frequency domain, and G(u,v) I can always compute by taking the Fourier transform of the degraded image that is obtained. So if I divide the Fourier transform of the degraded image by the degradation function in the frequency domain, what I get is the Fourier transform of the original image. And as I said, when I compute H(u,v) it is only an estimate; it will never be exact.
829
So the recovered image that we get is not the actual image but an approximation of the original image, which we represent by F̂(u,v). As we have already said, if I include the noise term then G(u,v) is given by G(u,v) = H(u,v)·F(u,v) + N(u,v). From here, if I compute the Fourier transform of the reconstructed image, that is F̂(u,v) = G(u,v)/H(u,v), and from this expression this is nothing but F̂(u,v) = G(u,v)/H(u,v) = F(u,v) + N(u,v)/H(u,v). This expression says that even if H(u,v) is known exactly, perfect reconstruction may not be possible, because we have seen earlier that in most cases the value of H(u,v) becomes very, very small when u and v are large. That means for those cases the term N(u,v)/H(u,v) will be very high, and the reconstructed image will be dominated by noise.
And that is what is observed in practice as well. To avoid this problem, for reconstruction purposes, instead of considering the entire frequency plane we have to restrict the reconstruction to the frequency components which are near the origin. If I do that kind of limited reconstruction, the dominance of noise can be avoided. So now let us see what kind of result we can obtain using this inverse filtering.
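The following sketch is an illustrative outline of this radially limited inverse filtering, assuming NumPy and a centred degradation function H such as the ones built above; the cutoff radius is a free parameter. It divides G(u,v) by H(u,v) only within a given distance from the origin of the frequency plane.

import numpy as np

def inverse_filter(degraded, H, radius=None, eps=1e-8):
    """Inverse filtering F_hat(u,v) = G(u,v)/H(u,v), optionally restricted to
    frequency components within `radius` of the centre of the frequency plane."""
    M, N = degraded.shape
    G = np.fft.fftshift(np.fft.fft2(degraded))     # centred spectrum of g(x,y)
    F_hat = G / (H + eps)                          # guard against division by zero
    if radius is not None:
        u = np.arange(M) - M // 2
        v = np.arange(N) - N // 2
        U, V = np.meshgrid(u, v, indexing="ij")
        F_hat[np.sqrt(U**2 + V**2) > radius] = 0.0   # keep only low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))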
830
(Refer Slide Time: 20:40)
So this shows an inverse filtering result. Here we have the original image, and in the middle we have the degraded image, which we had already shown. On the right hand side is the image reconstructed using inverse filtering where all the frequency coefficients are considered. As we have said, as you go away from the (0,0) component in the frequency domain, that is as you go away from the origin, the H(u,v) term becomes negligible, and it is the noise term which tends to dominate.
So you find that in this reconstructed image nothing is recognizable. Whereas if we go for restricted reconstruction, that is we consider only a few frequency components near the origin, as has been shown here where only the frequency terms within a radius of 10 from the origin were considered, then this is the reconstructed image. As is obvious, because the frequency components considered for reconstruction are very limited, the reconstructed image becomes very blurred; this is nothing but a low pass filter, and that is the property of a low pass filter: if the cutoff frequency is very low then the reconstructed image has to be very blurred. In the middle of the bottom row we have again shown the reconstructed image, but in this case we have increased the cutoff frequency: instead of 10, now we have used a cutoff of 40. Here you find that if you compare the original image with this reconstructed image, the reconstruction is quite accurate.
If we increase the cutoff frequency further, as we said, it is the noise term which is going to dominate. In the rightmost image we have increased the cutoff to 80, and here you find that we can observe the reconstructed image, but it is as if the objects are behind a curtain of noise. That means it is the noise term which dominates as we increase the cutoff frequency of the filter.
So with this we complete today's discussion. Now let us come to the questions on today's lecture. The first one is: what is a point spread function? The second one: how can you estimate the point spread function of an imaging system? Third question: which degradation function can model atmospheric turbulence? And the fourth question: what problem is faced in applying the inverse filtering method to restore an image degraded by uniform motion? Thank you.
832
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 10 Lecture Number 46
Other Restoration Techniques - 1
Hello, welcome to the video lecture series on Digital Image Processing. For the last few classes we have been discussing restoration of blurred images. What we did in our last class is estimation of degradation models; we have seen that whatever restoration technique we use, it mainly relies on a priori knowledge of the degradation model which degrades the image. So estimation of the degradation model degrading an image is very important for the restoration operation.
In our last class we saw three methods for estimation of the degradation model. The first method we discussed is estimation of the degradation model by observation, the second is estimation by experimentation, and the third is mathematical modeling of the degradation. Then we also saw the corresponding restoration technique: in our last class we talked about the inverse filtering technique. Today we will talk about other restoration techniques which also make use of the estimated degradation model, the estimated degradation function.
833
(Refer Slide Time: 02:04)
So in today's lecture we will first see the restoration of a motion blurred image using the inverse filtering technique; in our last class we applied inverse filtering where the image was degraded by the atmospheric turbulence model. We will also talk about the minimum mean square error or Wiener filtering approach for restoration of a degraded image. We will talk about another technique called the constrained least square filter, which mainly uses the mean and standard deviation of the noise contaminating the degraded image. And then we will also talk about a restoration technique for the case where the noise present in the image is periodic noise.
834
(Refer Slide Time: 03:05)
So first let us quickly go through what we did in our last class. We were talking about estimation of the degradation model, because that is a very basic requirement for the restoration operation. The first method is estimation of the degradation model by observation. In this case what is given to us is only the degraded image, and by looking at the degraded image we have to estimate the degradation function. Here we have shown one such degraded image, and we have said that once a degraded image is given we have to look for a part of the image which contains some simple structure.
At the same time, the signal energy content in that sub image should be high, to reduce the effect of the noise. So if you look at this particular degraded picture you will find that the red rectangle marks a region of the degraded image which contains a simple structure: from it, it appears that there is a rectangular figure present in this part of the image, with two distinct grey levels, one of the object, which is towards black, and the other of the background, which is greyish.
835
(Refer Slide Time: 04:30)
So taking a small sub image from this portion, what I do is try to manually estimate what the corresponding original image should be. As shown in this slide, the top part is the degraded sub image cut out from the image we have just shown, and the bottom part is the estimated original. Now if you take the Fourier transform of the top one, what I get is G(u,v), the Fourier transform of the degraded sub image; and the lower one we assume to be the original, so if I take its Fourier transform what I get is F(u,v), the Fourier transform of the original. Obviously the degradation function in the Fourier domain is then given by H(u,v) = G(u,v)/F(u,v), and you remember that in this case the division operation has to be done point by point. So this is how the degradation function can be estimated by observation, when only the degraded images are available.
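A rough sketch of this observation-based estimate is shown below; it is an illustration only, assuming NumPy, where degraded_patch is the sub image cut from the blurred picture and estimated_patch is the manually constructed guess of what that patch originally looked like.

import numpy as np

def estimate_H_by_observation(degraded_patch, estimated_patch, eps=1e-8):
    """Estimate H(u,v) = G_s(u,v) / F_s(u,v) from a small sub image and a
    manual estimate of its undegraded appearance (point-by-point division)."""
    G_s = np.fft.fft2(degraded_patch)    # Fourier transform of the degraded patch
    F_s = np.fft.fft2(estimated_patch)   # Fourier transform of the estimated original
    return G_s / (F_s + eps)             # guard against division by zero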
836
(Refer Slide Time: 05:55)
The other approach for estimating the degradation function, as we have said, is estimation by experimentation. Our requirement is that, for whichever imaging device or set up was used to record the degraded image, we will have a similar imaging set up in our laboratory for experimentation, and then we try to find out the impulse response of that imaging set up.
As we have already discussed, it is the impulse response which completely characterizes any system; if we know the impulse response of the system, we can always calculate the response of the system to any type of input signal. So by experimentation what we do is take a similar imaging set up and simulate an impulse by using a very narrow, strong beam of light, as has been shown in this particular diagram.
On the left hand side is shown one such simulated impulse, and on the right hand side is the response to this impulse as recorded by the imaging device. Now if I take the Fourier transform of the impulse, this is going to give me F(u,v), and if I take the Fourier transform of the response, this is going to give me G(u,v). And you see that the input, the original, is an impulse.
837
The Fourier transform of an impulse is a constant. So if I simply take the Fourier transform of the response, which is the impulse response or in this case the point spread function, then this divided by the corresponding constant will give me the degradation function H(u,v). So this is how we estimate the degradation function, or the degradation model, of the imaging set up through experiment.
The third approach for obtaining the degradation model is by mathematical modelling. In our last class we considered two such mathematical models. The first mathematical model we considered tries to give the degradation function corresponding to atmospheric turbulence, and that degradation function in the frequency domain is given by H(u,v) = e^(-k(u² + v²)^(5/6)).
838
(Refer Slide Time: 09:05)
So in this particular slide we have shown four images. The top left image is the original one, and the other three are degraded images obtained using the degradation model we have just described. In that model, a low value of k indicates that the disturbance is very low, and similarly a higher value of k indicates that the disturbance is very high. So this image has been obtained with a very low value of k, this image with a very high value of k, and this image with a medium value of k. You can see how the degradation of the original image changes depending upon the value of k.
839
(Refer Slide Time: 10:10)
The second model that we have considered is motion blur, the blurring which is introduced due to motion. We have said that because the camera shutter is kept open for a finite duration of time, the intensity obtained at any point on the imaging sensor does not really come from a single point of the scene; rather, the intensity at any point of the sensor is actually the integration of the light intensities which fall on that particular point from different points of the moving object. This integration has to be taken over the duration during which the camera shutter remains open.
Using that concept, what we have got is a mathematical model for motion blur, which in our last class we derived to be H(u,v) = ∫₀ᵀ e^(−j2π[u x₀(t) + v y₀(t)]) dt, where x₀(t) indicates the movement along the x direction and y₀(t) indicates the movement along the y direction. If I assume that x₀(t) = at/T and y₀(t) = bt/T, then this particular integral can be computed and we get our final degradation function, or degradation model, as given in the lower equation.
840
(Refer Slide Time: 11:55)
After this, in our last class we used inverse filtering. Using this motion blurring model, this is the kind of blurring that we obtain: this is the original image, and on the right hand side we have shown the motion blurred image. We have then seen that once you have the model for the blurring operation, you can employ inverse filtering to restore a blurred image. So in our last class we used inverse filtering to restore images which were blurred by atmospheric turbulence.
841
Here again on the left hand side of the top row we have shown the original image, and this is the degraded image. We have said in the last class that in inverse filtering the blurring function H(u,v) comes in the denominator, so if the value of H(u,v) is very low then the term G(u,v)/H(u,v) becomes very high. So if I consider all the frequency components, all values of u and v in H(u,v), for inverse filtering, the result may not always be good.
That is what has been demonstrated here: this image was reconstructed considering all values of u and v, and you find that this fully reconstructed image does not contain the information that you want. So what we have to do is, along with this inverse filtering, employ some sort of low pass filtering operation so that the higher frequency components, the higher values of u and v, will not be considered for reconstruction.
On the bottom row, the leftmost image shows the reconstruction where we have considered only the values of u and v within a radius of 10 from the center of the frequency plane. Because this is a low pass filtering operation with a very low cutoff frequency, the reconstructed image is again very blurred, since many of the low frequency components have been cut out along with the high frequency ones. The middle image is to some extent quite good; here the distance within which H(u,v) values were considered for reconstruction was taken to be 40, and you find that this reconstructed image contains most of the information that was contained in the original image.
So this is a fairly good restoration. Now if I increase the distance further, say to 80, that means many of the high frequency components are also going to be incorporated during restoration. You find that the rightmost image on the bottom row, where the distance was equal to 80, is also a reconstructed image, but it appears that the image is behind a curtain of noise.
This clearly indicates that if I take more and more u and v values, more frequency components, for restoration using inverse filtering, then the restored image quality is going to degrade; it is likely to be dominated by the noise components present in the image.
842
Now, this inverse filtering operation is fine for the turbulence kind of blurring, the blurring introduced by atmospheric turbulence, but direct inverse filtering does not give good results in the case of motion blur. Here we have shown the result of direct inverse filtering in the case of motion blur: the topmost one is the original image, on the left we have shown the degraded image, and the rightmost one on the bottom is the restored image obtained using direct inverse filtering.
The blurring considered in this particular case is motion blur. Now let us see why direct inverse filtering does not give a satisfactory result in the case of motion blur. The reason becomes clear if you look at the degradation function.
843
(Refer Slide Time: 16:45)
Consider the motion degradation function: in this particular case the degradation function H(u,v) is given in the frequency domain by the expression above. This term will be equal to 0 whenever the component ua + vb is an integer, so for any integer value of ua + vb the corresponding component of H(u,v) is 0, and for nearly integer values of ua + vb, H(u,v) is going to be very, very low. So in direct inverse filtering, when we compute G(u,v)/H(u,v) as the Fourier transform of the reconstructed image, wherever H(u,v) is very low, near about 0, the corresponding F̂(u,v) term will be abnormally high.
When you take the inverse Fourier transform, those very high values are reflected in the reconstructed image, and that is what gives a reconstructed image of the form shown here. So what is the way out; can't we use inverse filtering for restoration of a motion blurred image? We have attempted a roundabout approach: what we have done is again take an impulse and try to find out what the point spread function will be if we apply this kind of motion blur to it.
844
(Refer Slide Time: 18:35)
So by using the motion blur model you blur this impulse, and what I get is an impulse response like this, which is the point spread function in this particular case. Once I have this point spread function then, as before, I take its Fourier transform. This Fourier transform now gives me G(u,v), and because my input was an impulse, for this impulse F(u,v) is equal to a constant, say A.
From these two I can recompute H(u,v), which is given by H(u,v) = G(u,v)/A. Obviously the value of the constant is the same as the intensity of the impulse, so if I take a unit impulse then the constant A = 1, and in that case the Fourier transform of the point spread function directly gives me the degradation function H(u,v).
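A minimal sketch of this workaround is shown below; it assumes NumPy and the hypothetical motion_blur_H helper sketched earlier. It blurs a unit impulse to get the point spread function and then takes the Fourier transform of that point spread function as the recomputed H(u,v).

import numpy as np

def recompute_H_from_psf(M, N, a, b):
    """Blur a unit impulse with the analytic motion-blur model, then take the
    Fourier transform of the resulting point spread function as H(u,v)."""
    impulse = np.zeros((M, N))
    impulse[0, 0] = 1.0                                # unit impulse, so A = 1
    H_model = motion_blur_H(M, N, a, b)                # analytic model from the earlier sketch
    F = np.fft.fftshift(np.fft.fft2(impulse))          # Fourier transform of the impulse
    psf = np.real(np.fft.ifft2(np.fft.ifftshift(H_model * F)))   # point spread function
    return np.fft.fftshift(np.fft.fft2(psf))           # recomputed H(u,v), centred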
845
(Refer Slide Time: 19:45)
Now if I perform the inverse filtering using this recomputed degradation function, we find that the reconstruction result is very good. This was the blurred image, and this is the image reconstructed by direct inverse filtering, but here the degradation model was recomputed from the Fourier transform of the point spread function.
So although using the direct mathematical model of the motion blur in inverse filtering does not give a good result, recomputing the degradation function in this way gives a satisfactory result. But again, the major problem with this inverse filtering approach is, as we said, that we have to consider only (u,v) values within a limited domain for reconstruction. How far in (u,v) we should go is image dependent, so it is not easy to decide what extent of frequency components we should consider for reconstruction of the original image if we go for direct inverse filtering. So there is another approach, the minimum mean square error approach, also called the Wiener filtering approach.
The Wiener filtering approach tries to reconstruct the degraded image by minimizing an error function.
846
(Refer Slide Time: 21:10)
The error function that the Wiener filter tries to minimize is e = E[(f − f̂)²], that is, the error value e is the expectation of (f − f̂)², where f is the original image and f̂ is the image restored from the degraded one.
So (f − f̂)² gives you the squared error, and Wiener filtering tries to minimize the expectation of this error. Our assumption here is that the image intensity and the noise intensity are uncorrelated, and using that particular assumption the Wiener filtering works. We will not go into the mathematical details of the derivation here.
847
(Refer Slide Time: 23:15)
But it can be shown that the frequency domain solution, that is the F̂(u,v) for which this error function is minimum, is given by
F̂(u,v) = [ H*(u,v) S_f(u,v) / ( S_f(u,v) |H(u,v)|² + S_η(u,v) ) ] G(u,v),
where H*(u,v) indicates the complex conjugate of H(u,v), G(u,v) as before is the Fourier transform of the degraded image, F̂(u,v) is the Fourier transform of the reconstructed image, S_f(u,v) is the power spectrum of the original, undegraded image, and S_η(u,v) is the power spectrum of the noise.
848
(Refer Slide Time: 25:05)
Now, if I simplify this particular expression, I get an expression of the form
F̂(u,v) = [ 1/H(u,v) ] · [ |H(u,v)|² / ( |H(u,v)|² + S_η(u,v)/S_f(u,v) ) ] G(u,v).
So this is the expression for the Fourier transform of the restored image. In this case you might notice that if the image does not contain any noise, then S_η(u,v), the power spectrum of the noise, will be equal to zero, and in that case the Wiener filter becomes identical with the inverse filter. But if the degraded image also contains additive noise in addition to blurring, then the Wiener filter and the inverse filter are different.
Here you find that this Wiener filter involves the ratio of the power spectrum of the noise to the power spectrum of the original undegraded image. Even if I assume that the additive noise contained in the degraded image is white noise, for which the noise power spectrum is constant, it is still not possible to find out the power spectrum of the original undegraded image. So for that purpose what is normally done is that this ratio, S_η(u,v)/S_f(u,v), the ratio of the power spectrum of the noise to the power spectrum of the original undegraded image, is taken to be a constant K. If I do this, then the expression becomes
F̂(u,v) = [ 1/H(u,v) ] · [ |H(u,v)|² / ( |H(u,v)|² + K ) ] G(u,v).
Here K is a constant which has to be adjusted manually so that the reconstructed image appears visually best. So using this expression, let us see what kind of reconstructed image we get.
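As a small illustration, the sketch below assumes NumPy and a centred degradation function H; K is the manually tuned constant standing in for S_η/S_f. It applies this simplified Wiener filter in the frequency domain.

import numpy as np

def wiener_filter(degraded, H, K=0.01, eps=1e-8):
    """Simplified Wiener filter:
    F_hat = (1/H) * |H|^2 / (|H|^2 + K) * G, with K ~ S_eta/S_f held constant."""
    G = np.fft.fftshift(np.fft.fft2(degraded))    # centred spectrum of the degraded image
    H2 = np.abs(H) ** 2
    F_hat = (H2 / (H2 + K)) * G / (H + eps)       # point-by-point in (u,v)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))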
So here we have shown restoration of the degraded image using the Wiener filter. Again, on the left hand side of the top row is the original image, and the right hand side of the top row gives you the degraded image. The leftmost image of the bottom row shows the full reconstruction using inverse filtering, where all the frequencies were considered. The middle one shows the reconstruction using inverse filtering where only the frequency components within a distance of 40 from the center of the frequency plane were considered.
850
The rightmost one is obtained using Wiener filtering, and for this particular reconstruction the value of K was manually adjusted for best appearance. Now if you compare these two, that is the inverse filtered image with distance 40 and the Wiener filtered image, you will find that the reconstructed images are more or less the same; but if you look very closely, it may be found that the Wiener filtered image is slightly better than the inverse filtered image.
Visually, however, they appear to be almost the same. The advantage of the Wiener filter is that I don't have to decide what extent of frequency components to consider for restoration of the image. But the Wiener filter still has a disadvantage, which is the manual adjustment of the value of K. As we have said, K was introduced to simplify the expression, where this constant is nothing but the ratio of the power spectrum of the noise to the power spectrum of the undegraded image, and taking this ratio to be constant in all cases may not be a justified approach. Thank you.
851
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 10 Lecture Number 47
Other Restoration Techniques - 2
Hello, welcome to the video lecture series on Digital Image Processing. We now consider another kind of filtering operation, which is called constrained least square filtering. The performance of the Wiener filter depends upon correct estimation of the value of K, that is, upon how correctly you can estimate the power spectrum of the original undegraded image. In contrast, the constrained least square filter does not make any assumption about the original undegraded image; it makes use only of the noise probability density function, the noise PDF.
Mainly it uses the mean of the noise, which we will write as mη, and the variance of the noise, which we will write as ση². We will see how reconstruction using the constrained least square filter approach makes use of these noise parameters, the mean and the variance of the noise. To obtain this constrained least square filter, we will start with the expression that we derived in an earlier class, g = Hf + n.
You remember that this is the expression, derived earlier, which gives the degradation model for the image, where H is the matrix derived from the impulse response h and n is the noise vector. Now, you will notice that this formulation is very sensitive to noise, so to take care of that we define an optimality criterion, and the reconstruction is done using that criterion.
Because the degradation matrix H is very sensitive to noise, the optimality criterion we will use for reconstruction is image smoothness. We know from our earlier discussion that the second derivative operation, the Laplacian operator, tends to enhance irregularities or discontinuities in the image.
So if we can minimize the Laplacian of the reconstructed image, that will ensure that the reconstructed image is smooth. Our optimality criterion in this particular case is therefore
C = Σ [∇²f(x,y)]², summed over x = 0 to M−1 and y = 0 to N−1,
where our assumption is that the blurred image we are trying to reconstruct is of size M x N. So the optimality criterion is based on the Laplacian operator, since ∇²f(x,y) is nothing but the Laplacian. Our approach will be to minimize this criterion subject to the constraint ||g − Hf̂||² = ||n||², where f̂ is the reconstructed image; and that is why it is called constrained least square filtering.
853
(Refer Slide Time: 06:05)
Again, without going into the details of the mathematical derivation, we will simply give the frequency domain solution of this constrained least square estimation, which is
F̂(u,v) = [ H*(u,v) / ( |H(u,v)|² + γ |P(u,v)|² ) ] G(u,v).
As before, H* indicates the complex conjugate of H. Here again we have a constant term γ, which is to be adjusted so that the specified constraint ||g − Hf̂||² = ||n||² is met; γ is a scalar constant whose value is adjusted so that this particular constraint is maintained. The quantity P(u,v) is the Fourier transform of the mask
p(x,y) =
 0  -1   0
-1   4  -1
 0  -1   0
So this is my p(x,y), and P(u,v) is nothing but the Fourier transform of this p(x,y). You can easily identify that this is nothing but the Laplacian mask that we have already discussed earlier.
854
For implementation, you have to keep in mind that our image is of size M x N. So before we compute the Fourier transform of p(x,y), which is given in the form of a 3x3 mask, we have to pad it with an appropriate number of zeros so that p(x,y) also becomes an array of dimension M x N. Only after converting it to an array of dimension M x N can we compute P(u,v), and that P(u,v) has to be used in this particular expression.
As we said, γ has to be adjusted manually for obtaining the optimum result; γ is adjusted so that the specified constraint is maintained. However, it is also possible to estimate the value of γ automatically by an iterative approach.
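A minimal sketch of this filter is given below, assuming NumPy and a centred H; the value of γ is left as a manually supplied parameter. It zero-pads the 3x3 Laplacian mask to the image size before taking its Fourier transform, as described above.

import numpy as np

def cls_filter(degraded, H, gamma):
    """Constrained least squares filter:
    F_hat = H* / (|H|^2 + gamma |P|^2) * G, with P the FFT of the zero-padded
    Laplacian mask p(x,y)."""
    M, N = degraded.shape
    p = np.array([[0, -1, 0],
                  [-1, 4, -1],
                  [0, -1, 0]], dtype=float)
    p_pad = np.zeros((M, N))
    p_pad[:3, :3] = p                               # pad p(x,y) with zeros to M x N
    P = np.fft.fftshift(np.fft.fft2(p_pad))         # centred P(u,v)
    G = np.fft.fftshift(np.fft.fft2(degraded))
    F_hat = np.conj(H) / (np.abs(H) ** 2 + gamma * np.abs(P) ** 2) * G
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))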
For that iterative approach, we define a residual vector r, where r = g − Hf̂. Remember that g is obtained from the degraded image, the degradation matrix H is obtained from the degradation function h, and f̂ is the estimated restored image. Now, we have seen that F̂(u,v), and hence f̂ in the spatial domain, is a function of γ, so obviously r, which is a function of f̂, will also be a function of γ.
Now if I define a function φ(γ), which is nothing but rᵀr, the squared Euclidean norm of r, it can be shown that this is a monotonically increasing function of γ: whenever γ increases the Euclidean norm of r increases, and if γ decreases the Euclidean norm of r also decreases.
Making use of this property, it is possible to find the optimum value of γ to within some specified accuracy. Our aim is to estimate the value of γ such that ||r||² = ||n||² ± a, where a is a specified accuracy factor which gives the tolerance of the reconstruction. Obviously, if ||r||² = ||n||² then the specified constraint is exactly met; however, that is very difficult to achieve, so we specify some tolerance through the accuracy factor a, and we want the value of γ to be such that the Euclidean norm of r falls within this range.
Given this background, an iterative algorithm for estimating the value of γ can be put like this. In step 1 you select an initial value of γ; in step 2 you compute φ(γ), the squared Euclidean norm of r; in step 3 you terminate the algorithm if ||r||² = ||η||² ± a. If this is not the case, then you proceed to step 4, where you increase the value of γ if ||r||² < ||η||² − a, or decrease the value of γ if ||r||² > ||η||² + a. Using the new value of γ you recompute the image, using the frequency domain reconstruction expression given above, and with this recomputed f̂ you go back to step 2 and repeat the iteration until the termination condition ||r||² = ||η||² ± a is met.
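The sketch below is an illustrative outline of these steps, assuming NumPy, the hypothetical cls_filter from the earlier sketch and a centred H; eta_norm2 is the estimated ||η||², a is the tolerance, and γ is adjusted with a simple doubling/halving update.

import numpy as np

def estimate_gamma(degraded, H, eta_norm2, a, gamma=1e-3, max_iter=50):
    """Iteratively adjust gamma so that ||r||^2 = ||g - H f_hat||^2 lies within
    ||eta||^2 +/- a, using the monotonic dependence of ||r||^2 on gamma."""
    G = np.fft.fftshift(np.fft.fft2(degraded))
    for _ in range(max_iter):
        f_hat = cls_filter(degraded, H, gamma)                   # step 2: restore with current gamma
        R = G - H * np.fft.fftshift(np.fft.fft2(f_hat))          # residual in the frequency domain
        r_norm2 = np.sum(np.abs(R) ** 2) / R.size                # ||r||^2 via Parseval's relation
        if eta_norm2 - a <= r_norm2 <= eta_norm2 + a:            # step 3: within tolerance, stop
            return gamma, f_hat
        gamma = gamma * 2.0 if r_norm2 < eta_norm2 - a else gamma / 2.0   # step 4: adjust gamma
    return gamma, f_hat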
Using this kind of approach we have obtained some reconstructed images. Here you find the same original image and the degraded version of that image. On the bottom row, the left hand side gives you the reconstruction using Wiener filtering, and the right hand side gives you the reconstruction obtained using the constrained least square filter.
857
(Refer Slide Time: 15:01)
The same has been done for the motion degraded image, where we have also considered some additive noise. On the top row, the left image is obtained by direct inverse filtering, and you find prominent noise in this case. The right one has been obtained by Wiener filtering; here too the amount of noise has been reduced, but the image is still noisy. The bottom one has been obtained using constrained least square filtering.
If you look at these three images you find that the amount of noise is greatly reduced in the bottom one, the image restored by the constrained least square filtering approach. As we said, the constrained least square filtering approach makes use of estimates of the mean and the standard deviation of the noise, so it is quite expected that its noise performance will be quite satisfactory. That is what is observed here: in the image obtained using the constrained least square filter the noise has been removed to a great extent, whereas the other reconstructions cannot remove the noise to that extent. However, if you look at this reconstructed image, its reconstruction quality is not that good.
858
That clearly says that the reconstruction obtained using the optimality criterion, the optimum reconstructed image, may not always be visually the best; to obtain a visually best image, the best approach is to manually adjust the constant γ.
Now, as I said, this constrained least square filtering approach makes use of the noise parameters, that is the mean and the variance of the noise. So how can we estimate the mean and variance of the noise from the degraded image? It is possible if you look at a more or less uniform intensity region in the image: if you take a sub image of the degraded image where the intensity is more or less uniform and compute its histogram, the shape of the histogram is the same as the probability density function (PDF) of the noise which contaminates the image.
859
(Refer Slide Time: 18:20)
So we can obtain the noise estimate, that is we can compute the noise term ||η||² in the expression used for constrained least square filtering, in this way. The noise variance is given by
ση² = (1/MN) Σ [η(x,y) − mη]², summed over x = 0 to M−1 and y = 0 to N−1,
and the noise mean mη is given by
mη = (1/MN) Σ η(x,y), summed over the same range.
Now, the sum of η(x,y)² over the image is nothing but our ||η||², so making use of the variance and the mean of the noise we get that this noise term is
||η||² = MN[ση² + mη²].
The constraint that we have specified uses ||η||², which therefore depends only on ση and mη. This clearly says that optimum reconstruction is possible if I have the noise standard deviation, or the noise variance, and the noise mean, so the estimation of the noise variance and noise mean is very important. If I have only the degraded image, what I will do is look at some uniform grey-level region within the degraded image and find the histogram of that particular region; the shape of that histogram is the same as the probability density function of the noise.
As has been shown in this particular diagram, the region shown here is taken from one such noisy image, and this is the histogram of that particular region. This histogram tells you the PDF of the noise which contaminates the image.
Once I have this probability density function, from it I can compute the variance ση² and the mean mη. In most cases the noise is assumed to be zero mean, so mη = 0; what is important for us is then only ση², and using this ση² we can go for optimum reconstruction of the degraded image.
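The following sketch assumes NumPy, with flat_patch being a sub image taken from a roughly uniform region of the degraded picture; it estimates these noise parameters and the ||η||² term used in the constraint.

import numpy as np

def estimate_noise_params(flat_patch, image_shape):
    """Estimate the noise mean, variance, and ||eta||^2 = M*N*(sigma^2 + m^2)
    from a roughly uniform-intensity patch of the degraded image."""
    m_eta = float(np.mean(flat_patch))         # mean of the patch (background level plus noise mean)
    var_eta = float(np.var(flat_patch))        # noise variance
    M, N = image_shape
    eta_norm2 = M * N * (var_eta + m_eta**2)   # for zero-mean noise this reduces to M*N*sigma^2
    return m_eta, var_eta, eta_norm2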
Now, in many cases it is also possible that the image is contaminated with periodic noise. So how do we remove the periodic noise present in the image? If you take the Fourier transform of the noisy image and display it, then because the noise is periodic, at the corresponding (u,v) locations in the Fourier transform plane you will get very bright dots.
These dots indicate the frequencies of the periodic noise present in the image. Then we can follow a very simple approach: once I know those frequency components, I can apply band reject filtering, just to remove that part of the Fourier coefficients, and if we take the inverse Fourier transform of whatever Fourier coefficients remain, we get the reconstructed image.
As has been shown here, we are taking the same image that we have used a number of times earlier. If you look closely at the rightmost image, you find that it is contaminated with periodic noise, and if I take its Fourier transform you find that there are a few bright dots, one here, one here, one here, and so on at several locations.
All these bright dots tell us the frequencies of the periodic noise which contaminates the image. Once I have this information, I can apply an appropriate band reject filter to filter out that region of the Fourier transform, that part of the Fourier coefficients. That is what has been shown next: a band reject filter.
(Refer Slide Time: 24:40)
This is the perspective plot of an ideal band reject filter, with the band reject filters shown superimposed on the frequency plane. On the left is an ideal band reject filter, and on the right is the corresponding Butterworth band reject filter. By using such a band reject filter we remove a band of frequencies from the Fourier coefficients corresponding to the frequencies of the noise.
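A small sketch of this step is shown below, assuming NumPy; the ring radius D0 and width W are free parameters. It builds an ideal band reject filter on the centred frequency plane and applies it to the noisy image.

import numpy as np

def band_reject_restore(noisy, D0, W):
    """Remove a ring of frequencies (radius D0, width W) around the origin of the
    centred frequency plane -- an ideal band reject filter for periodic noise."""
    M, N = noisy.shape
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    U, V = np.meshgrid(u, v, indexing="ij")
    D = np.sqrt(U**2 + V**2)
    H_br = np.where(np.abs(D - D0) <= W / 2.0, 0.0, 1.0)   # reject the band, keep the rest
    G = np.fft.fftshift(np.fft.fft2(noisy))
    return np.real(np.fft.ifft2(np.fft.ifftshift(H_br * G)))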
863
(Refer Slide Time: 25:40)
After removal of these frequency components, if I take the inverse Fourier transform then I get back my reconstructed image, and that is what we get in this particular case. Here you find that on the top left is again the original image, and on the top right is the noisy image contaminated with periodic noise. If I apply an ideal band reject filter and then reconstruct, the image I get is on the bottom left, and if I apply the Butterworth band reject filter then the reconstructed image I get is on the bottom right.
So we have talked about restoration of images using various operations, and the last one we discussed is that, if we have an image contaminated with periodic noise, then we can employ a band reject filter in the frequency domain to remove those frequency components and then take the inverse Fourier transform to reconstruct the image. You find that the quality of the reconstructed image is quite good where we used this band reject filter in the frequency domain. With this we complete our discussion on image restoration. Now let us have some questions on today's lecture.
864
(Refer Slide Time: 26:55)
Our first question is: what is the advantage of the Wiener filter over the inverse filter? The second question: what is the drawback of the Wiener filter? Third: under what condition do the Wiener filter and the inverse filter become identical? Fourth: what is the difference between the Wiener filter and the constrained least square error filter? And the last question: how can you estimate the noise parameters from a given noisy or blurred image? Thank you.
865
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 10 Lecture Number 48
Image Registration - 1
Hello, welcome to the video lecture series on Digital Image Processing. For the last few classes we have talked about different types of image restoration techniques. In today's lecture we are going to talk about another topic, called Image Registration.
In our last few lectures we have talked about restoration of degraded images. We have seen different techniques for estimation of the degradation model: estimation of the degradation model by observation, estimation by experimentation, and mathematical modeling of the degradation.
Once you have the degradation model, we have talked about the restoration techniques for restoring a degraded image. Among the different restoration techniques we have talked about inverse filtering, the minimum mean square error or Wiener filtering technique, and the constrained least square filtering approach. We have also talked about restoration of an image contaminated by periodic noise.
In that case we have seen that if we take the Fourier transform of the degraded image, and the image is actually degraded by a periodic noise or a combination of periodic noises, then those noise components, the noise frequencies, appear as very bright dots in the Fourier transform, in the frequency plane. There we can apply a band reject filter, or sometimes a notch filter, to remove those particular frequency components of the Fourier transform, and after performing the band reject operation, if we take the inverse Fourier transform of whatever remains, then we get the restored image, which is free from the periodic noise.
So in today's lecture we will talk about image registration techniques. We will see what image registration is; then, when we go for image registration, we have to think of mismatch or match measures, so we will talk about the different mismatch or match measures, the similarity measures. We will see what the cross correlation between two images is, and whether this cross correlation can be used as a similarity measure when we go for image registration. Then we will talk about some applications of these image registration techniques, with examples.
867
(Refer Slide Time: 03:25)
So by image registration what we mean is that registration is a process which makes the pixels in two images precisely coincide with the same points in the scene. That is, if we have two images of the same scene, maybe the two images are acquired with different sensors located at different positions, or maybe the images are acquired using the same sensor but at different instants of time. In such cases, if we can find out, for every pixel in one image, the corresponding pixel in the other image or images, that is the process of registration or the process of matching. This has various applications that we will talk about a bit later. So once registered, the images can be combined or fused; this is called fusion or combination.
So once we have the images from different sensors, maybe located at different locations, or, in the case of remote sensing images taken through a satellite, images taken in different bands of frequencies, then if we register all those different images they can be combined or fused, so that the fused image becomes richer in information content. So once registered, the images can be combined or fused in a way that improves the information extraction process, because the fused image combines the information from different images or different frequency bands and therefore carries more information.
So this image registration technique has many applications. The first application, as we have already said, is stereo imaging. In stereo imaging we take images of the same scene or of the same object with two cameras which are slightly displaced, and we assume that the cameras are otherwise identical; that is, apart from the displacement along a particular axis, the y axis or the x axis, the features of the cameras are identical: they have the same focal length, the same view angle, and so on.

Once I acquire these images, one of them we call the left image and the other one the right image. Then if I go for point by point correspondence, that is, for a particular point in the left image I can find out the corresponding point in the right image, then from these two I can find out the disparity for that particular point location. And if this disparity is obtained for all the points in the images, then from the disparity we can find out the depth or the distance of the different object points from the camera. So in the case of stereo imaging we have to find out this point correspondence, or point registration; this is also called point matching.
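Although the stereo geometry itself was covered in earlier lectures, a tiny sketch may help connect registration to depth recovery. It assumes the standard parallel-axis stereo setup, and the focal length, baseline and disparity values used here are purely hypothetical.

```python
def depth_from_disparity(focal_length, baseline, disparity):
    """Standard parallel-camera stereo relation: once registration gives
    the disparity d for a matched point, its depth is Z = f * B / d."""
    if disparity == 0:
        raise ValueError("zero disparity corresponds to a point at infinity")
    return focal_length * baseline / disparity

# Hypothetical numbers: focal length in pixels, baseline in metres,
# disparity in pixels for one registered point pair
print(depth_from_disparity(800.0, 0.12, 16.0))   # depth = 6.0 metres
```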
The other application, as we have just said, is in remote sensing, where the images can be taken by different sensors working in different bands, and the sensors may even be located at different geometrical locations. So the images of the same scene are taken by different sensors working in different bands and placed at different locations. If we then go for image registration, that is point by point correspondence among the different images, we can fuse or combine those different images so that the fused image becomes richer in terms of information content, and information extraction from such a fused image becomes much easier.
The other application is when the images are taken at different times. If the images are taken at different times and we can register them, that is, for a particular point in a given image we can find out the corresponding point in the other image taken at some other time instant, then from this registration we can find out the variation at different point locations in the scene. And from this variation we can extract information like vegetation growth, land erosion, deforestation, or the occurrence of a fire. So all this different information can be obtained when we register images which are taken at different instants of time.
There are other applications, like finding the place in a picture where it matches a given pattern. Here we have a small image which is known as a pattern or a template, and we want to find out where, in another image which is usually of bigger size, this template matches the best. This has various applications like automated navigation, where we want to find out the location of a particular object with respect to a map.
(Refer Slide Time: 10:00)
Now to explain the registration techniques, let us take the first example, the example of template matching. For this template matching we take a template of a smaller size. We call the template f; say f is a two dimensional image of a smaller size, and we have an image g which is of a bigger size. So the problem of template matching is to find out where this template f matches best in the given image g.

So this f is called a template, or it is also called a pattern. Our aim is that, given a very big two dimensional image g and a template f which is usually of size smaller than that of g, we want to find out where this template f matches best in the image g. To find out where the template f matches best in the given image g, we have to have some measure of matching, which may be called a mismatch measure, or the opposite of this, that is the match measure or the similarity measure. So we have to take different match or similarity measures, call them match measures or similarity measures, to find out where this template f matches the best in the given image g. There are various such match or mismatch measures, and let us see what the different measures are that can be used.
(Refer Slide Time: 11:55)
So we have the given image g and we have the template f. We have to find out the measure of similarity between a region of the image g and the template f. There are various ways in which this similarity can be measured so that we can find out the match between f and g over a certain region, say a region A. One of the simplest measures is that we take the difference of f and g, that is, the absolute difference between f and g, and find out the maximum of this absolute difference, with the maximum computed over the region A: max over A of |f - g|. The other similarity measure can be that we again take the absolute difference of f and g and then integrate this absolute difference over the same region A, that is, ∫∫_A |f - g| dx dy. The other similarity measure can be the squared difference, ∫∫_A (f - g)^2 dx dy.

So you find that whenever I talk about the difference between f and g, it is the pixel by pixel difference. The first measure takes the difference between f and g, takes the absolute value, and takes the maximum of that computed over the given region A. In the second case it is ∫∫_A |f - g| dx dy; this is the analog case, and if I convert it to the digital form it takes the form Σ_(i,j)∈A |f(i,j) - g(i,j)|, which is nothing but what is called the sum of absolute differences, the sum of the absolute difference between the image g and the template f over the region A. And if I convert the third expression, ∫∫_A (f - g)^2 dx dy, into the digital form, it becomes Σ_(i,j)∈A [f(i,j) - g(i,j)]^2. This is nothing but the sum of the squared differences, so if I say that f(i,j) - g(i,j) is the difference between the two images, or the error, then this last expression is equivalent to the sum of squared errors.
Now out of these three different measures, it is the last one, ∫∫_A (f - g)^2, which is very, very interesting. If I expand this term ∫∫_A (f - g)^2, it becomes

∫∫_A (f - g)^2 = ∫∫_A f^2 + ∫∫_A g^2 - 2 ∫∫_A f·g,

and all these double integrations are to be taken over the given region A.

Now you find that for a given template ∫∫_A f^2 is fixed, and also, for a given image over the region A, ∫∫_A g^2 is fixed. Now, what is this ∫∫_A (f - g)^2? This is nothing but the sum of the squared differences, which means it gives the degree of mismatch; it is nothing but the mismatch measure. So wherever ∫∫_A (f - g)^2 is minimum, because this is the mismatch measure, at that particular location f matches the best over that particular region of g.
Now when I expand this, it becomes ∫∫_A (f - g)^2 = ∫∫_A f^2 + ∫∫_A g^2 - 2 ∫∫_A f·g, and as I said, for a given template ∫∫_A f^2 is fixed, and for a given image and a given region ∫∫_A g^2 is also fixed. That means ∫∫_A (f - g)^2 will be minimum when the term ∫∫_A f·g is maximum.

So whenever the mismatch measure is minimum, the double integration of f·g over the region A will be maximum. So when ∫∫_A (f - g)^2 is taken as the measure of mismatch, we can take ∫∫_A f·g over the given region A to be the match measure or the similarity measure. This means that wherever the given template matches the best in a particular portion of the given image g, ∫∫_A f·g over the region A will have the maximum value, and we take this as the similarity measure or the match measure.
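To make these measures concrete, here is a minimal sketch, not taken from the lecture itself, that evaluates the maximum absolute difference, the sum of absolute differences, the sum of squared errors and the correlation term Σ f·g for a template placed at one position of an image; the function name and the NumPy-based implementation are my own assumptions.

```python
import numpy as np

def match_measures(g, f, top, left):
    """Compare template f with the patch of image g whose top-left
    corner is (top, left); all quantities are computed over the
    template support, which plays the role of the region A."""
    h, w = f.shape
    patch = g[top:top + h, left:left + w].astype(float)
    f = f.astype(float)

    diff = np.abs(patch - f)
    return {
        "max_abs_diff": diff.max(),            # max over A of |f - g|
        "sum_abs_diff": diff.sum(),            # sum of absolute differences
        "sum_sq_error": (diff ** 2).sum(),     # sum of squared errors
        "correlation":  (patch * f).sum(),     # the term sum of f.g over A
    }

# Example: when the template is cut out of the image itself, the three
# mismatch measures are zero at the true location.
g = np.random.randint(0, 10, size=(6, 6))
f = g[2:5, 1:4].copy()                         # template cut out of g
print(match_measures(g, f, 2, 1))              # zero error at (2, 1)
```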
(Refer Slide Time: 20:05)
Now this conclusion can also be drawn from what is called the Cauchy-Schwarz inequality. The Cauchy-Schwarz inequality says that

∫∫ f·g ≤ (∫∫ f^2)^(1/2) · (∫∫ g^2)^(1/2),

and these two terms will be equal only when g = cf. So the left hand side and the right hand side will be equal whenever g = cf; otherwise the left hand side will always be less than the right hand side.

So this also says that whenever f, the template, is similar to a region of the given image g within a multiplicative factor, a constant c, then ∫∫ f·g will take on its maximum value; otherwise it will be less. If I convert this into the digital case, the same expression becomes

Σ f(i,j)·g(i,j) ≤ (Σ f^2(i,j))^(1/2) · (Σ g^2(i,j))^(1/2),

with i and j belonging to the region A on both sides. The left hand side and the right hand side will be equal only when g(i,j) = c·f(i,j), and this has to be true for all values of (i,j) within the given region. So for this template matching problem we have assumed that f is the given template, g is the given image, and also that the size of f is less than the size of the given image g.
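As a quick numerical check of this inequality, the following sketch, written under the assumption that the region A is simply the full extent of two small arrays, compares the two sides for an arbitrary g and for g = c·f, where equality should hold up to floating point error.

```python
import numpy as np

def cauchy_schwarz_sides(f, g):
    """Return (left, right) of the digital Cauchy-Schwarz inequality
    sum f*g <= sqrt(sum f^2) * sqrt(sum g^2) over the whole array."""
    f = f.astype(float)
    g = g.astype(float)
    left = (f * g).sum()
    right = np.sqrt((f ** 2).sum()) * np.sqrt((g ** 2).sum())
    return left, right

f = np.random.rand(3, 3)

# Arbitrary g: the left hand side is strictly smaller
print(cauchy_schwarz_sides(f, np.random.rand(3, 3)))

# g = c*f: the two sides coincide (up to rounding)
print(cauchy_schwarz_sides(f, 2.5 * f))
```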
(Refer Slide Time: 23:45)
Now, again from the Cauchy-Schwarz inequality, if I go back to it, what we get is

∫∫_A f(x,y)·g(x+u, y+v) dx dy ≤ [∫∫_A f^2(x,y) dx dy · ∫∫_A g^2(x+u, y+v) dx dy]^(1/2).

Now the reason we are introducing these two variables u and v is that whenever we try to match the given pattern f against the given image g, we have to find out the match measure or the similarity measure at different locations of g. So f has to be shifted to all possible locations in the given image g, and the amount of shift that has to be given to the pattern f to find out the similarity measure at a particular location is represented by these two shift components u and v. The shift along the x direction is u and the shift along the y direction is v. So in this expression the similarity between the given template f and the image g, with shift u and v, is computed, and it is computed over the given region A.

Now because f(x,y) is small and the value of f(x,y) is zero outside the region A, we can replace the left hand side by ∫∫ f(x,y)·g(x+u, y+v) dx dy; that is, because f(x,y) is zero outside the region A, the definite integral over A can be replaced by the integral over the whole plane. So this is what we get from the left hand side of the expression.
Now if you look at this particular expression, ∫∫ f(x,y)·g(x+u, y+v) dx dy, this is nothing but the cross correlation between f and g. Then if you look at the right hand side, ∫∫_A f^2(x,y) dx dy is a constant for a given template, whereas the other component, ∫∫_A g^2(x+u, y+v) dx dy, is not a constant, because its value depends upon the shift u and v.

So though from the left hand side we have got something equivalent to the cross correlation between the functions f and g, this cross correlation cannot be used directly as a similarity measure, because the right hand side is not fixed: ∫∫_A f^2(x,y) dx dy is fixed, but the integral ∫∫_A g^2(x+u, y+v) dx dy is not fixed, as it depends upon the shift u and v. Because of this, the cross correlation measure cannot be directly used as a similarity measure or a match measure. So what we have to go for is what is called the normalized cross correlation.
(Refer Slide Time: 29:30)
So if we call this cross correlation measure ∫∫ f(x,y)·g(x+u, y+v) dx dy the cross correlation Cfg, then the normalized cross correlation will be given by

Cfg / [∫∫_A g^2(x+u, y+v) dx dy]^(1/2).

So what we have said is that the normalized cross correlation is Cfg divided by the square root of ∫∫_A g^2(x+u, y+v) dx dy. And you see that once I take this normalized cross correlation, it will take a maximum possible value which is given by [∫∫_A f^2(x,y) dx dy]^(1/2), and because this quantity is fixed, the region of integration is not very important.

This square root is the maximum value which will be attained by the normalized cross correlation, and it is attained for the particular value of u and v for which the function g becomes a constant c times f. So for that particular shift (u, v), where g = cf, the normalized cross correlation will take its maximum value, and the maximum value of the normalized cross correlation is given by [∫∫_A f^2(x,y) dx dy]^(1/2). Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 10 Lecture Number 49
Image Registration - 2
Hello, welcome to the video lecture series on Digital Image Processing. Say here this is the image g, which is given in the form of a 2 dimensional matrix, a 2 dimensional array of size 6x6, and our template is a 3x3 matrix which is given on the right hand side. Now, if I want to calculate Cfg for this, what I have to do to find out the match location is to take the template, shift it to all possible locations in the given image g, and find out the cross correlation or the similarity measure for each particular shift.
So initially let us put this template at the left most corner. Our template is

f = [ 3 3 2
      3 2 2
      2 2 2 ].

Let us place the center of the template at the location at which we want to find out the similarity measure. Because of this, this 3 will be placed over here, this particular element will be placed over here, this 2 will go here, this 2 will come here, and this 2 will come here. And on the left hand side the other part of the template will be like this: 2, 3, 3, 3, 2.

So this will be the position of the template, and at this location we have to find out the similarity value. For that, let us find out Cfg, the cross correlation between f and g for this particular shift. Now, here if you compute, you find that some of these elements of the template go beyond the image. So if I assume that the image components are zero beyond the boundary of the image, those elements will not take part in the computation of the cross correlation. The cross correlation will be computed considering only these 4 elements, and here if you compute you will find that Cfg for this particular position, for this particular shift, attains a value of 47, because here it is 40 + 2 + 2 + 3, so this becomes 47.
Similarly, if I want to find out the cross correlation at this particular location over here, where the center of the template is placed over here, the other components of the template will come like this; you will find that the cross correlation at this location is given by Cfg = 56.

So like this, for all possible shifts within this particular image, I have to find out the cross correlation value. And if I complete this particular cross correlation computation, then finally I get a cross correlation matrix which is like this. So this gives the complete cross correlation matrix when this template is shifted to all possible locations in the given image and the cross correlation value is computed for all such possible shifts.
Now from this you find that the maximum cross correlation value comes at this particular location, which is given by 107, and if I take this 107, that is the cross correlation value, to be the similarity measure, that means this is where I get the maximum similarity. And because of this it gives a false match: it appears that the template matches best in the region shown by this red rectangle. But that is not the case, because just by checking visually we can see that the template matches best at this location. So that is why we said that the cross correlation measure cannot be used directly as a similarity measure.
So over here, as we have said, we cannot use the cross correlation measure directly as a similarity measure. What we have to do is compute the normalized cross correlation value. For computation of the normalized cross correlation we have to compute this particular component, that is [∫∫_A g^2(x+u, y+v) dx dy]^(1/2). This is the quantity that we have to compute for all possible shifts in the given image g, and we have to normalize the cross correlation with the help of this quantity.

So if I compute this, again let us take the same location. If I compute ∫∫_A g^2(x+u, y+v) dx dy, we will find that this particular value comes out to be 20^2 + 1^2 + 1^2 + 1^2, because the other elements for this shift within the 3x3 window are equal to zero. I then have to take the square root of this, and if I compute it I get the value 20.07. So this is how I compute this quantity, [∫∫_A g^2(x+u, y+v) dx dy]^(1/2), over the region A for all possible u and v, that is for all possible shifts; finally the normalization component that I get is given by this particular matrix. So this cross correlation normalization coefficient has been computed for all possible values of u and v, and then what we have to do is normalize the cross correlation with these normalization factors.
(Refer Slide Time: 08:15)
So if I do that normalization, this is my original cross correlation coefficient matrix that I computed earlier, and after the normalization what I get is the normalized cross correlation coefficient matrix, which comes out like this. And now you find the difference: in the original cross correlation matrix the maximum was at this location, which is 107, whereas in the normalized cross correlation matrix the maximum comes at this location, which is 7.48. So now let us see, using this normalized cross correlation as the similarity measure, what happens to our matching.
(Refer Slide Time: 08:20)
So this is the same normalized cross correlation matrix; the maximum, 7.48, comes at this location, and for this maximum the corresponding matched location of the template within the image is shown over here by this red rectangle. So now you find that this is a perfect matching, where the similarity measure gives exactly the location where the template matches. So that is possible with the normalized cross correlation, but it is not possible using the simple cross correlation. The simple cross correlation cannot be used as a similarity measure, but we can use the normalized cross correlation as a similarity measure.
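The small sketch below reproduces this idea in code; it is only an illustration, with a brute-force double loop rather than an optimized routine, and it assumes the template fits entirely inside the image at every tested position so that no zero padding is needed.

```python
import numpy as np

def normalized_cross_correlation(g, f):
    """Slide template f over image g and return the matrix of
    normalized cross correlation values C_fg / sqrt(sum of g^2 over A)."""
    g = g.astype(float)
    f = f.astype(float)
    H, W = g.shape
    h, w = f.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for u in range(H - h + 1):
        for v in range(W - w + 1):
            patch = g[u:u + h, v:v + w]
            cfg = (f * patch).sum()                 # cross correlation
            norm = np.sqrt((patch ** 2).sum())      # normalization factor
            out[u, v] = cfg / norm if norm > 0 else 0.0
    return out

g = np.random.randint(0, 10, size=(6, 6))
f = g[1:4, 2:5].copy()                              # template cut from g
ncc = normalized_cross_correlation(g, f)
print(np.unravel_index(ncc.argmax(), ncc.shape))    # best match: expected (1, 2)
```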
(Refer Slide Time: 09:20)
Now coming to the application of this, here we have shown the application on a real image. This is an aerial image taken from a satellite, and the smaller image on the left hand side is a part which has been cut out from the original image; we are using this smaller image as a template. So this is our template f, which we want to match against this image g; we want to find out where in this given image g this template f matches the best.

For doing that, as we have said earlier, what we have to do is shift this template f to all possible locations in the given image g, and for each such location we have to find out the normalized cross correlation. Wherever we get the normalized cross correlation to be maximum, that is the location where the template matches the best. So let us place this template at different locations in the image and find out what corresponding similarity measure we get.
(Refer Slide Time: 10:45)
So this is the template and this is the given image. If I place this template over here, we call this location 1; this location is given by this red rectangle, and the similarity measure that we get is a value of 431.05. If I place it at location 2, which is over here, the similarity measure, or the normalized cross correlation, we get is 462.17. If I place it at location 3, the similarity measure is 436.94. If I place the template at location 4, the corresponding similarity measure comes out to be 635.69. If I place it at location 5, the corresponding similarity measure comes out to be 417.1. And if I place it at location 6, the corresponding similarity measure comes out to be 511.18.

So you see these similarity measure values; of course, I cannot show the computation for all possible locations on the screen, but if I compute this for all possible locations you will find that the similarity measure value is maximum, 635.69, for location 4 in the given image. And exactly this is the location from where this particular template f was cut out. If you look at this picture after placing the template, you find that there is an almost perfect match. So for a given image and a given template, if I find out the normalized cross correlation for all possible shifts u and v, then the shift (u, v) at which the normalized cross correlation is maximum is the location where the template matches the best. So obviously this is a registration problem, where we want to find out where the given template matches the best in a given image.
Now to come to the other application of registration: you find that registration is also applicable to the image restoration problem. Earlier we talked about the image restoration problem, where we have to estimate the degradation model which degrades the image, and by making use of this degradation model we can restore the image by different types of operations like inverse filtering, Wiener filtering, constrained least square estimation and all those different kinds of techniques.

Now here we are talking about a kind of degradation where the degradation is given in the form of a geometric distortion, a distortion which is introduced by the optical system of the camera. If you take an image of a very large area, you might have noticed that as you move away from the center of the image, the points tend to become closer to each other. That is something which leads to a distortion in the image as a point moves away from the center of the image.
So here we have shown one such distortion. Suppose this is the figure for which we want to take the image, but the image actually comes out in this particular form. Then the distortion which is introduced in the image can be corrected by applying this image registration technique. For doing this, what we have to do is register different points in the expected image with the corresponding points in the degraded or distorted image.

Once I can do that kind of registration, so in this case this is a point which corresponds to this particular point. If somehow we can register these two, that is, we can establish the correspondence between this point and this point, between this point and this point, between this point and this point, and between this point and this point, then it is possible to estimate the degradation model.
So here you find that for estimating this degradation we have to go for registration. This registration is therefore very, very important for restoring a degraded image where the degradation is introduced by the camera optical system. The kind of restoration that can be applied here is something like this.

Say I have an original point (x, y) in the original image, and this point after distortion is mapped to location (x', y'). Then what we can do is go for estimation of a polynomial degradation model, in the sense that I estimate x' as a polynomial function of x and y. So I write it in this form: x' = k1x + k2y + k3xy + k4. Similarly, y' can be written as y' = k5x + k6y + k7xy + k8.

From this you find that if we can estimate these constant coefficients k1 to k8, then for any given point in the original image we can estimate what the corresponding point in the degraded image will be. For computing these constant coefficients k1 to k8, because there are 8 unknowns, I have to have 8 equations, and those equations can be obtained from 4 pairs of corresponding points in the 2 images.

So, looking at this figure, I have to get 4 such correspondence pairs; once I have 4 such correspondence pairs I can generate 8 equations, and using those 8 equations I can solve for all the constant coefficients k1 to k8. Once I get that, what I can do is take a particular point in the undistorted image, apply the estimated distortion to find the corresponding point in the distorted image, and whatever intensity value is at that location in the distorted image I simply copy to my estimated location in the original, undistorted image.
So I can find out a restored image from the distorted image. Obviously, while doing this we will find that there will be some locations where I don't get any information; that is, for a particular location, say (p, q), in the estimated undistorted image, when I apply the distortion I don't get any point at exactly that location in the distorted image. In such cases we have to go for interpolation techniques to estimate what the intensity value at that point location in the distorted image will be, and for that the different interpolation operations that we discussed earlier can be used.
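The following sketch shows, under the assumption of the bilinear model just described, how the coefficients k1 to k8 could be estimated from 4 correspondence pairs by solving a small linear system; the correspondence values used here are made up purely for illustration.

```python
import numpy as np

def estimate_distortion(src_pts, dst_pts):
    """Estimate k1..k8 of the model
        x' = k1*x + k2*y + k3*x*y + k4
        y' = k5*x + k6*y + k7*x*y + k8
    from 4 pairs of corresponding points (src -> dst)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        A.append([x, y, x * y, 1, 0, 0, 0, 0])   # row for the x' equation
        A.append([0, 0, 0, 0, x, y, x * y, 1])   # row for the y' equation
        b.extend([xp, yp])
    return np.linalg.solve(np.array(A, float), np.array(b, float))

def apply_distortion(k, x, y):
    """Map an undistorted point (x, y) into the distorted image."""
    xp = k[0] * x + k[1] * y + k[2] * x * y + k[3]
    yp = k[4] * x + k[5] * y + k[6] * x * y + k[7]
    return xp, yp

# Hypothetical correspondences between undistorted and distorted corners
src = [(0, 0), (0, 100), (100, 0), (100, 100)]
dst = [(2, 3), (1, 95), (97, 2), (90, 92)]
k = estimate_distortion(src, dst)
print(apply_distortion(k, 50, 50))   # where the image centre lands
```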
So this is how the image registration technique also plays a major role in the restoration of distorted images. This image registration technique is also very, very useful, as I said, in image fusion or combining different images, and for that also we have to go for image registration. Say for example, here we have shown 2 types of images, magnetic resonance (MR) images and CT scan images.

Now MR images give you a measure of water content, whereas CT X ray images give you the brightest regions for the bone regions. So if I can combine the magnetic resonance image with the CT X ray image, then in the fused image you get both kinds of information, that is the water content as well as the nature of the bone, in the same image. Naturally the information extraction is much easier in the fused image.

Again, for doing this, the first operation has to be image registration, because the MR images and the CT images, even if they are of the same region, may not be properly aligned, may not be properly scaled, and there may be some distortion between the images. So the first operation we have to do is registration; using registration we have to find what transformation can be applied to properly align the 2 images, and only after applying that transformation and aligning the 2 images can they be fused properly.

So here there are 2 MR images and 2 CT images, and obviously you find that these MR images and these CT images, though they are of the same region, are not properly aligned. Similarly, on the bottom row, these MR images and these CT images are not properly aligned. So the first operation we have to do is alignment.

So in this slide, what we have got is the result after alignment, and on the right hand side is the result after fusion. In this fused image you find that the green regions show you the bone structure, and this bone structure has been obtained from the CT image, while the other regions get their information from the MR image. So this is much more convenient to interpret than looking at the MR image and the CT image separately.
(Refer Slide Time: 22:30)
The other application of this is image mosaicking. Normally cameras have a very narrow field of view, so using a camera with a narrow field of view you cannot image a very large area. What can be done is to take smaller images of different regions of the scene and then stitch those smaller images together to give you a large field of view image. That is what is shown here: on the top there are two smaller images, and these 2 images are combined to give a bigger image, as shown at the bottom. This is a problem which is called image mosaicking, and here again, because these are 2 different images, they may be scaled differently and their orientations may be different, so first we have to go for normalization and alignment, and for this normalization and alignment again the first operation has to be image registration.
(Refer Slide Time: 23:25)
This shows another mosaicking example, where the bottom image has been obtained from the top 8 images. All these top 8 images have been combined properly to give you the bottom image; this is the mosaic that we get. So with this we complete our discussion on image registration. Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 10 Lecture Number 50
Color Image Processing: Color Fundamentals
Hello, welcome to the video lecture series on Digital Image Processing. In our last lecture we talked about the image registration problem. We talked about different image mismatch or match measures, about the cross correlation between 2 images, and we also saw some applications of the registration technique. Now, including this image registration technique, whatever we have done till now in our digital image processing course, you have seen that all our discussion was mainly based on black and white images; that is, we have not considered any color image during our discussion. Starting from today and over the coming few lectures, we will talk about color image processing.
(Refer Slide Time: 01:25)
So today, what we are going to do is introduce the concept of color image processing. We are going to see what primary and secondary colors are, we are going to talk about color characteristics, and then we will see the chromaticity diagram and how the chromaticity diagram can be used to specify a color. We will see 2 color models, one of them the RGB or red, green and blue color model, and the other the HSI color model. And we will also see how we can convert from one color model to another color model.
Now first let us talk about why we want color image processing when we can get information from black and white images themselves. The reason is that color is a very, very powerful descriptor, and using the color information we can extract the objects of interest from an image very easily, which is not so easy in some cases using black and white or simple grey level images. The second motivation for color image processing is that human eyes can distinguish between thousands of colors and color shades, whereas in a black and white or grey scale image we can distinguish only about two dozen different grey shades.

So that is the reason color image processing is a very, very important topic: firstly because we can distinguish between a larger number of colors, and secondly because we can identify some objects in a color image very easily which may otherwise be difficult to identify in a simple intensity or gray level image. Now, coming to color image processing, there are two major areas. One of the areas we call full color image processing.
We say full color processing, and the other area is pseudo color processing. Now, what is meant by full color processing or pseudo color processing? When we talk about full color processing, the images are acquired by a full color TV camera or a full color scanner, and you find that almost all the colors that you can perceive are present in the images. That is what is meant by a full color image, and when we try to process such a full color image we take into consideration all the colors which are present in the image.

Whereas when we talk about pseudo color processing, pseudo color processing is a problem where we try to assign certain colors to ranges of gray levels. When we take an intensity image, or simply a black and white image, which has intensity levels from say 0 to 255, what we can do is divide this entire intensity range into a number of sub ranges. For example, intensity levels 0 to 50 may be in one range and intensity levels 50 to 100 in another range; to the first range I can assign one particular color, whereas to the range 50 to 100 I can assign another particular color.
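Here is a minimal sketch of that idea; the particular ranges and the RGB colors assigned to them are arbitrary choices for illustration, not values from the lecture.

```python
import numpy as np

def pseudo_color(gray):
    """Map a gray level image (values 0..255) to an RGB image by
    assigning one fixed color to each range of intensities."""
    # (low, high, (R, G, B)) -- purely illustrative ranges and colors
    table = [
        (0,   50,  (0,   0,   255)),   # dark pixels shown in blue
        (50,  100, (0,   255, 0)),     # next range shown in green
        (100, 180, (255, 255, 0)),     # mid intensities in yellow
        (180, 256, (255, 0,   0)),     # bright pixels in red
    ]
    rgb = np.zeros(gray.shape + (3,), dtype=np.uint8)
    for low, high, color in table:
        mask = (gray >= low) & (gray < high)
        rgb[mask] = color
    return rgb

gray = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
print(pseudo_color(gray)[0, 0])   # RGB triple assigned to the first pixel
```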
And pseudo color image processing is mostly useful for human interpretation. As we said, we can hardly distinguish around two dozen gray shades, so it may not be possible for us to distinguish between two gray regions whose intensity values are very near to each other. In such cases, if we go for a pseudo coloring technique, that is, we assign different colors to different ranges of intensity values, then from the same intensity or black and white image we can extract the information much more easily. And this is mainly useful, as I said, for human interpretation purposes.
Now, what is the problem with color image processing? The interpretation of color, as color is interpreted by human beings, is a psycho-physiological problem, and we have not yet fully understood the mechanism by which we really interpret color. So though the mechanism is not fully understood, the physical nature of color can be expressed formally, and this formal expression is supported by experimental results.

Now the concept of color is not very new. You know from your school level physics, from school level optics, that way back in 1666 it was Newton who discovered the color spectrum. What he did is this; his experimental set up was something like the following.
You have an optical prism and you pass white light through this optical prism. As the white light passes through the optical prism, on the other side, when this light comes out, it does not remain white anymore; it is broken into a number of color components, which is known as a spectrum. As has been shown in this particular diagram, at one end of the spectrum we have violet and at the other end we have the red color. So the color components vary from violet to red, and this was discovered by Newton way back in 1666.

Now the thing is, how do we perceive color, or how do we say that an object is of a particular color? We have seen earlier that we see an object because light falls on the object, or the object is illuminated by a certain source of light; the light gets reflected from the object and reaches our eye, and only then can we see the object. Similarly, we perceive the color depending upon the nature of the light which is reflected by the object surface. Because we have to perceive this nature of the light, we have to see what the spectrum of the light, or of the energy, is in the visible range, because it is only in the visible range that we are able to perceive any color.
So if you consider the electromagnetic spectrum, as shown here, the complete electromagnetic spectrum ranges from gamma rays to radio frequency waves, and you find that the visible light spectrum occupies only a very narrow range of frequencies in this entire electromagnetic spectrum. Here you find that the wavelength of the visible spectrum roughly varies from 400 nanometers to 700 nanometers; at one end it is around 400 nanometers in wavelength and at the other end it is around 700 nanometers in wavelength.

So whenever light falls on an object, if the object reflects light of all wavelengths in this visible spectrum in a balanced manner, that is, all the wavelengths are reflected in the appropriate proportion, then that object will appear as a white object. Otherwise, depending upon the dominant wavelength within this visible spectrum, the object will appear to be a colored object, and the object's color will depend upon the wavelength of light that is predominantly reflected by that particular object surface.
Now coming to the attributes of light: if we have an achromatic light, that means a light which does not contain any color component, the only attribute which describes that particular light is its intensity. Whereas if it is a chromatic light, then, as we have just seen, the wavelength of the chromatic light within the visible range can vary roughly from 400 nanometers to 700 nanometers. Now there are basically three quantities which describe the quality of light.
So what are those quantities? One of the quantities is what is called radiance, the second quantity is called luminance, and the third quantity is called brightness. So we have these three quantities, radiance, luminance and brightness, which basically describe the quality of light. Now, what is radiance? Radiance is the total amount of energy which comes out of a light source, and as it is a total amount of energy, radiance is measured in units of watts. Luminance, on the other hand, is the amount of energy that is perceived by an observer. So you find the difference between radiance and luminance: radiance is the total amount of energy which comes out of a light source, whereas luminance is the amount of energy which is perceived by an observer. As radiance is measured in units of watts, luminance is measured in units of what is called the lumen. The third quantity, brightness, is actually a subjective measure, and it is practically not possible to measure the amount of brightness. So though we can measure radiance and luminance, we practically cannot measure brightness.
Now again coming to our color images, or colored lights. Most of you must be aware that when we consider colors, or when we talk about colors, we normally talk about three primary colors, and we say that the three primary colors are red, green and blue.
So we consider the primary colors of light, of colored light, to be red, green and blue, and normally we represent them as R, G and B. Now, in the spectrum which was discovered by Newton there were actually 7 different colors, but out of those 7 colors we have chosen only these three, red, green and blue, to be the primary colors, and we assume that by mixing these primary colors in different proportions we can generate all other colors.
Now why do we choose these three colors to be the primary colors? The reason is something like this: there are cone cells in our eyes which are responsible for color sensation. There are around 6 to 7 million cone cells which are really responsible for color sensation. Out of these 6 to 7 million cone cells, around 65% of the cone cells are sensitive to red light, 33% of the cone cells sense green light, and roughly 2% of the cone cells sense blue light.

So because of the presence of these three different types of cone cells in our eyes, which sense the red, green and blue color components, we consider red, green and blue to be our primary colors and assume that by mixing these primary colors in appropriate proportions we are able to generate all other colors. As per the CIE standard, three different wavelengths are specified for these three colors: CIE specified red to have a wavelength of 700 nanometers, green to have a wavelength of 546.1 nanometers, and blue to have a wavelength of 435.8 nanometers.
(Refer Slide Time: 18:01)
But the experimental result is slightly different from this. Let us see how the experimental result looks. This diagram shows the sensitivity of the three different types of cones in our eyes that we have just mentioned. You find that the cones which are sensitive to blue light actually respond to wavelengths ranging from around 400 nanometers to 550 nanometers, whereas the cones which are sensitive to green light are sensitive to wavelengths ranging from slightly higher than 400 nanometers to a wavelength of around 650 nanometers. The cones which are sensitive to red light are sensitive to wavelengths starting from 450 nanometers to around 700 nanometers. Each type of cone has a wavelength at which its sensitivity is maximum; for example, the blue cone is maximally sensitive at a wavelength of 445 nanometers, as is shown in this diagram.
(Refer Slide Time: 19:15)
So as is shown in this diagram, you find that the blue cones are most sensitive to a wavelength of 445 nanometers, the green cones are most sensitive to a wavelength of 535 nanometers, and the red cones are most sensitive to a wavelength of 575 nanometers. These experimental figures are slightly different from what was specified by CIE. And one point has to be kept in mind: though the CIE standard specifies red, green and blue to be of certain wavelengths, no single wavelength can specify any particular color.

In fact, from the visible domain of the spectrum that we have just seen, it is quite clear that when we consider two adjacent spectrum colors there is no clear cut boundary between them; rather, one color slowly, smoothly merges into the other.
(Refer Slide Time: 20:40)
So as you can see from the same diagram, whenever we have a transition from, say, green to yellow, you find that we don't have any clear cut boundary between green and yellow. Similarly, whenever there is a transition from, say, yellow to red, the boundary is not clearly defined; we have a smooth transition from one color to another. That clearly says that no single wavelength may be called red, green or blue; rather, it is a band of wavelengths which gives you a green color sensation, a band of wavelengths which gives you a red color sensation, and, at the same time, a band of wavelengths which gives you a blue color sensation.

So having specific wavelengths as the standard does not mean that these fixed RGB components alone, when mixed properly, will generate all other colors. Rather, we should have the flexibility to allow the wavelengths of these three colors to change as well, because, as you have just seen, green actually corresponds to a band of wavelengths, red corresponds to a band of wavelengths, and similarly blue also corresponds to a band of wavelengths. So to generate all possible colors we should allow the wavelengths of red, green and blue also to change. Now, given these primary colors red, green and blue, mixing the primary colors generates the secondary colors. So suppose I mix red and blue.
(Refer Slide Time: 22:45)
If we mix red and blue, both of which are primary colors, they generate a color called magenta, which is a secondary color. Similarly, if we mix green and blue, this generates a color which is called cyan, and if we mix red and green, these two generate the color yellow. So, as we have said, red, green and blue are considered the primary colors, and by mixing the primary colors we generate the secondary colors. These three colors, magenta, cyan and yellow, are called the secondary colors of light.

Now here another important concept is that of pigments. As we have said, red, green and blue are the primary colors of light, and if we mix these colors we generate the secondary colors of light, namely magenta, cyan and yellow. When it comes to pigments, a primary color of a pigment is defined as one which absorbs a primary color of light and reflects the other two.

So the primary colors of a pigment are the opposite of the primary colors of light. As red, green and blue are the primary colors of light, magenta, cyan and yellow are the primary colors of pigments. So when it comes to pigments we will consider magenta, cyan and yellow to be the primary colors; these are the primary colors for pigments. And in the same manner, red, green and blue, which are the primary colors of light, will be the secondary colors for pigments.
And as we have seen for the colors of light, if we mix the primary colors red, green and blue in appropriate proportions, we generate white light. Similarly, for pigments, if we mix cyan, magenta and yellow in appropriate proportions, we generate black. So for pigments, appropriate mixing of the primary colors generates black, whereas for light, appropriate mixing of the primary colors generates white.

Now, what we have discussed so far, the primary colors of light, which are red, green and blue, or the primary colors of pigments, which are magenta, cyan and yellow, are the color components we consider when we talk about color reproduction; these are from the hardware point of view. That is, for a camera, a display device, a scanner or a printer we talk about these primary color components.

But when we perceive a color as human beings, when we look at a color, we don't really think of how much red, how much blue or how much green that particular color has. The way we distinguish colors is based on the characteristics called brightness, hue and saturation. So for perception purposes the color characteristics we consider are brightness, hue and saturation, instead of red, green and blue or cyan, magenta and yellow.
Now, let us see what these three attributes mean. What is brightness? Brightness is nothing but the achromatic notion of intensity. As we have seen, in the case of a black and white image we talk about intensity; similarly, for a color image there is an achromatic notion of intensity, and that is what we call brightness. Hue represents the dominant wavelength in a mixture of colors. So when you look at a secondary color, which is a mixture of different primary colors, there will be one dominant wavelength, and the overall sensation of that particular color will be determined by the dominant wavelength. So this attribute, hue, indicates the dominant wavelength present in a mixture of colors. Similarly, for the other term, saturation: you find that whenever we talk about a particular color, say red, there may be various shades of it. So saturation indicates the purity of that particular color, or in other words the amount of white light which has been mixed with a particular color to make it a diluted one. These are basically the three different attributes which we normally use to distinguish one color from another. Now coming to the spectrum colors: because the spectrum colors are not diluted, there is no white component added to a spectrum color, so the spectrum colors are fully saturated.
Whereas if we take any other color which is not a spectrum color, say for example pink: pink is nothing but a mixture of white and red. So red plus white makes pink. Red is a fully saturated color because it is a spectrum color and there is no white light mixed in it, but if we mix white light with red, the color generated is pink, so pink is not fully saturated while red is fully saturated.
(Refer Slide Time: 30:05)
So we have these three concepts for color perception, that is, hue and saturation, and the other one is brightness. And as we said, brightness indicates the achromatic notion of intensity, whereas hue and saturation give you the color sensation. So we say that hue and saturation together indicate the chromaticity of the light, whereas brightness gives you the sensation of intensity. So using hue, saturation and intensity, what we are trying to do is separate the brightness part and the chromaticity part.

So whenever we try to perceive a particular color, we normally perceive it in the form of hue, saturation and brightness, whereas from the point of view of hardware it is red, green and blue, or magenta, cyan and yellow, which are more appropriate to describe the color. Now, the amounts of red, green and blue light which are required to form any particular color are called the tristimulus; we call it the tristimulus.
(Refer Slide Time: 31:45)
Obviously, because this indicates the amounts of red light, green light and blue light which are to be mixed to form any particular color, it will have three components, an X component, a Y component and a Z component. And a color is normally specified by what are called chromatic coefficients. So we call them chromatic coefficients, and the chromatic coefficient for red is given by x = X / (X + Y + Z), where capital X is the amount of red light, capital Y is the amount of green light and capital Z is the amount of blue light which are to be mixed to form a particular color.

So the chromatic coefficient for red, given by lower case x, is computed like this. Similarly, the chromatic coefficient for green is computed as y = Y / (X + Y + Z), and similarly for blue it is z = Z / (X + Y + Z). These lower case x, y, z are called the chromatic coefficients of a particular color. So whenever we want to specify a color, we can specify it by its chromatic coefficients. And from here you find that the sum of the chromatic coefficients is x + y + z = 1, so they are represented in normalized form.
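As a small worked example of these formulas, the sketch below converts assumed tristimulus values into the normalized chromatic coefficients; the numbers are arbitrary and only meant to show that x + y + z always comes out as 1.

```python
def chromatic_coefficients(X, Y, Z):
    """Normalize tristimulus values (X, Y, Z) into chromatic
    coefficients x = X/(X+Y+Z), y = Y/(X+Y+Z), z = Z/(X+Y+Z)."""
    total = X + Y + Z
    return X / total, Y / total, Z / total

# Arbitrary tristimulus values, purely for illustration
x, y, z = chromatic_coefficients(30.0, 45.0, 25.0)
print(x, y, z)          # 0.3 0.45 0.25
print(x + y + z)        # 1.0 -- the normalized form
```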
As any color can be specified by its chromatic coefficients, in the same manner there is another way in which a color can be specified, namely with the help of what is known as the CIE chromaticity diagram. So a color can be specified both by its chromatic coefficients and with the help of the chromaticity diagram. Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 11 Lecture Number 51
Color Model
So, let us see what this chromaticity diagram is; here we have shown the chromaticity diagram. You find that it is a color diagram in a 2 dimensional space: we have the horizontal axis, which represents x, and the vertical axis, which represents y. That means the chromatic coefficient for red is along the horizontal axis and the chromatic coefficient for green is along the vertical axis.

If we want to specify any particular color, say for example I take this particular point and want to find out how this particular color can be specified, then, as we have said, we can specify it by its chromatic coefficients. Two of the components of the chromatic coefficients, x and y, that is the red component and the green component, we can get from the horizontal axis and the vertical axis, and the third component, z, in this case will obviously be given by z = 1 - (x + y).
So x and y I obtain from this chromaticity diagram, and I can get the chromatic coefficient z simply by using the relation x + y + z = 1. If you study this chromaticity diagram, you find that all the spectral colors are represented along the boundary of the chromaticity diagram. In this chromaticity diagram there is a point which is marked as the point of equal energy; that means the red, green and blue components are mixed in equal proportions, and this is the CIE standard for white. And, regarding the notion of saturation, you find that all the points on the boundary, because they are the spectrum colors, are fully saturated, and as we move inside the chromaticity diagram, away from the boundary, the colors become less and less saturated.

So one use of this chromaticity diagram is that we can specify a color using it: we can find out the chromatic coefficients x, y and z from this chromaticity diagram. And not only that, this chromaticity diagram is also useful for color mixing.
Let us see how. Say for example within this I take two color points. Say I take one color point
somehow here and I take; I take one point somewhere here and I consider another point somewhere
here. And if I join these two points by a straight line, say like this in that case this straight line
indicates that what are the different colors that I can generate by mixing the color present at this
913
location and the color present at this location. So all possible mixture of these two colors can create
all the colors which are lying on this straight line segments connecting this two color points.
And the same is true for three points. Instead of just taking these two points, if I take a third point somewhere here, then we form a triangle connecting these three color points. Then by taking the color present at this location, the color present at this location and the color specified at this location, and mixing these three colors in different proportions, I can generate all the colors lying within this triangular region. So this chromaticity diagram is also very helpful for color mixing operations.
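To make the arithmetic concrete, here is a minimal Python sketch (an illustration, not taken from the lecture) of computing the third chromatic coefficient from x and y, and of sampling the straight line joining two chromaticity points; the simple linear interpolation used for the mixing line is an assumption for illustration, since the exact mixing proportions also depend on the luminances of the two sources.

import numpy as np

def z_coefficient(x, y):
    """Return the blue chromatic coefficient z = 1 - (x + y)."""
    return 1.0 - (x + y)

def mix_line(p1, p2, steps=5):
    """Sample chromaticity points on the straight line joining p1 and p2."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * p1 + t * p2

print(z_coefficient(0.45, 0.40))                 # approximately 0.15
print(mix_line((0.45, 0.40), (0.20, 0.30)))      # colors obtainable by mixing the two points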
For example, we can get some other information from this chromaticity diagram. As we have said, we have this point of equal energy which is the CIE standard of white. Now, if I take any color point on the boundary of the chromaticity diagram, which we said is nothing but a fully saturated color, we mentioned that as we move away from this boundary the colors we get become less and less saturated.
And the saturation at the point of equal energy, which is the CIE standard of white, as we have just said, is zero; that point is not saturated at all. Now if I draw a straight line joining a boundary point to this point of equal energy, like this, then this line indicates the different shades of this particular saturated color that we can obtain by mixing white light with the saturated color.
So as we have said, as you mix white light the saturation goes on decreasing, that is, we can generate different shades of any particular color. So for this straight line which connects the point of equal energy, which is nothing but the CIE standard of white, with a color on the boundary, which is a fully saturated color, all the shades of that particular color lie on this straight line joining the boundary point to the point of equal energy.
So you find that using this chromaticity diagram we can generate different colors; we can find out in what proportions red, green and blue must be mixed so that we can generate any particular color, and this we can do for mixing of two colors, for mixing of three colors and so on.
Now, just to mention, as we have said we have taken red, green and blue as primary colors; in this chromaticity diagram, say I consider this point to be the green point, this point to be the red point, and the blue point to be somewhere here. If I join these three points by straight lines, what I get is a triangle, so by using this red, green and blue I can generate, as we have just discussed, all the colors which are present within this triangular region. But this triangular region does not encompass the entire chromaticity diagram, because the chromaticity diagram is not really a triangular diagram.
So, as we have just said, using three fixed wavelengths as red, green and blue we cannot generate all the colors in the feasible region, and that is also quite obvious from this chromaticity diagram, because if we consider only three fixed wavelengths to represent the red, green and blue points, using those three wavelengths we can only form a triangular region, and no single triangular region can fully encompass the chromaticity diagram.
So as a result of that, by using fixed wavelengths for red, green and blue as primary colors we cannot generate all the colors given in this chromaticity diagram. But still, using the chromaticity diagram we can obtain much useful information: as we have said, we can go for color mixing, we can find out different shades of colors, we can specify a color and so on.
Now, coming to the next topic, that is the color model: we need a color model to specify a particular color. A color model helps us to specify a particular color in a standard way. Now what is a color model? A color model is actually a space, or we can represent it as a coordinate system, within which any specified color is represented by a single point. Now as we have said, we have two ways of describing a color. One is by using the red, green and blue components or the cyan, magenta and yellow components, which is the hardware point of view.
And similarly, from the perception point of view, we have to consider the hue, saturation and brightness. So considering these two aspects we can generate two types of color models. One type of color model is oriented toward hardware, that is, the color display device, color scanner or color printer. And the other type of color model caters for the human perception aspect, and we will see that this will not only take care of the human perception aspect, it is also useful for application purposes.
So accordingly we can have a number of different color models. One of the color models is called the RGB model or red, green and blue model. Another color model is CMY, that is cyan, magenta and yellow, and there is an extension of this, CMYK, that is cyan, magenta, yellow and black. The RGB color model is useful for image displays like monitors, while the CMY or CMYK color models are useful for image printers. And you find that both these color models are hardware oriented because both of them try to specify a color using primary color components, either red, green and blue or cyan, magenta and yellow. As we have said, red, green and blue are the primary colors of light whereas cyan, magenta and yellow are the primary colors of pigments.
And the other color model that we will also consider is the HSI color model, that is hue, saturation and intensity or brightness. This HSI color model is application oriented as well as perception oriented, that is, it follows how we perceive a particular color, and we have also discussed that the HSI color model actually decouples the color from the gray scale information. So the I part gives you the gray scale information, whereas hue and saturation taken together give the chromatic information.
So because in the HSI model we are decoupling the intensity information from the chromatic information, the advantage we get is that many of the algorithms which are actually developed for gray scale images can also be applied to color images, because here we are decoupling the color from the intensity. So all the intensity-oriented algorithms which are developed for gray scale images can also be applied on the intensity component or I component of the color image.
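As an illustration of this decoupling idea, the sketch below applies a standard gray-scale operation, histogram equalization, to the I component only, leaving H and S untouched. The helper names rgb_to_hsi and hsi_to_rgb are placeholders for the conversion formulas derived later in this module, assumed here to operate on whole image arrays with values in [0, 1].

import numpy as np

def equalize_intensity(rgb, rgb_to_hsi, hsi_to_rgb):
    # Decouple the chromatic information (H, S) from the intensity (I).
    h, s, i = rgb_to_hsi(rgb)                       # i expected as a 2-D array in [0, 1]
    hist, _ = np.histogram(i, bins=256, range=(0.0, 1.0))
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]                                  # normalized cumulative histogram
    # Map each intensity through the CDF (classic histogram equalization).
    i_eq = np.interp(i.ravel(), np.linspace(0.0, 1.0, 256), cdf).reshape(i.shape)
    # Recombine the equalized intensity with the untouched H and S planes.
    return hsi_to_rgb(h, s, i_eq)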
Now first let us discuss the RGB model, so we will talk about the RGB color model first. As we have said, in case of RGB a color image is represented by three primary components, and the primary components are red, green and blue. These are the three color components which, when mixed in appropriate proportions, generate all possible colors.
(Refer Slide Time: 16:20)
So the RGB color model is based on a Cartesian coordinate system, and this diagram shows diagrammatically the RGB color model that we normally use. You find that this RGB color model is based on the Cartesian coordinate system where the red, green and blue components, R, G and B, are placed along the coordinate axes.
And the red component is placed at location (1,0,0) as shown in this diagram. So this is the location
which contains the red, this is the location (0,1,0) which contains, which is the green point and
(0,0,1) this is the blue point. So we have this red point, green point and blue point. They are along
three corners of a cube in this RGB in this Cartesian coordinate system. Similarly cyan, magenta
and yellow they are at other three corners in this cube.
Now, here let me mention that this model is represented in normalized form, that is, all three color components red, green and blue vary within the range 0 to 1. So all these color components are represented in a normalized form. Similarly, cyan, magenta and yellow are also represented in normalized form. Now the origin of this color model, that is location (0,0,0), represents black, and the farthest vertex, that is (1,1,1), represents white.
And you find that if I draw a straight line connecting the origin to this white point, this straight line actually represents the gray scale, and we also call it the intensity axis. So as you move from the origin, which is black, to the farthest vertex of this cube, which is white, what we generate is different intensity values. As a result of this we also call it the intensity axis or gray scale axis. So we stop our discussion today; we have just introduced the RGB color model and we will continue with our discussion in our next class. Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 11 Lecture Number 52
Conversion of one Color Model to another – 1
In our last class we have started our discussion, we have covered the fundamentals of color
image processing. We have seen what is a primary color and what is secondary color? We have
seen the characteristics of different colors. We have seen the chromaticity diagram and the use of
the chromaticity diagram. And we have started our discussion on color models and there we just
started the discussion on RGB color model.
(Refer Slide Time: 01:05)
Today we will continue our discussion on the color models. We will complete our discussion of the RGB color model. We will also talk about the HSI or hue, saturation and intensity color model. We will see how we can convert colors from one color model to another, that is, given a color in the RGB space, how we can convert it to a color in the HSI space, and similarly, given a color in the HSI space, how we can convert that to the RGB space.
Then we will start our discussion on color image processing techniques. We will talk about pseudo color image processing, and there mainly we will talk about two techniques: one is called intensity slicing and the other one is gray level to color image transformation. So let us just briefly recapitulate what we did in the last class.
(Refer Slide Time: 02:15)
In the last class we have mentioned that all the colors of the visible light or the visible spectrum,
color spectrum occupies a very narrow spectrum in the total electromagnetic band of frequency
or band of spectrum and the visible spectrum the wavelength normally varies, from 400
nanometers to 700 nanometers. So at one end we have the violet and in the other end we have the
red color. And out of this we normally take three color components.
That is, red, green and blue as the primary color components, because we have mentioned that in our eye there are three types of cone cells which are responsible for color sensation. Some cone cells sense light in the red wavelengths, some cone cells sense the green light, and some cone cells sense the blue light.
And this light is mixed together in different proportions in an appropriate way so that we can
have the sensation of different colors. And this is the main reason why we say that red, green and
blue they are the primary colors and by mixing these three primary color in different proportions
we can generate almost all the colors in the visible spectrum.
Then we have talked about two types of colors: one is the color of light and the other one is the color of a pigment. For the color of light, when we see any particular object we see the color which is reflected from the object, because of the wavelength of the light which gets reflected from the object surface. When it comes to a pigment, when light falls on it, the pigment absorbs a particular wavelength out of the three primary colors and reflects the other wavelengths.
So the primary colors of light are really the secondary colors of pigments and the secondary
colors of light they are the primary colors of pigments. And because of this the colors of light
they are called additive primaries, whereas the colors of the pigments they are called subtractive
primaries.
(Refer Slide Time: 04:50)
And here you can see in this particular slide that when the three primaries of light, red, green and blue, are mixed together, red and green mixed together form what is called yellow light. Green and blue mixed together form cyan, red and blue mixed together form magenta, and red, green and blue, all three colors together, form white light.
Similarly, when it comes to the pigment primaries, yellow, which is a secondary color of light, is a primary color of pigments. Magenta, which is a secondary color of light, is also a primary color of pigments, and cyan, which is a secondary color of light, is a primary color of pigments. And here you find that when these pigment primaries are mixed together, they form what are the primary colors of light.
So yellow and magenta together form red. Yellow and cyan mixed together form green. And magenta and cyan mixed together form blue. However, all three pigment primaries, that is yellow, magenta and cyan, mixed together form black. So by mixing different primary colors of light or different primaries of the pigments we can generate all the different colors in the visible spectrum.
(Refer Slide Time: 06:50)
Then we have also seen what is the chromaticity diagram and we have seen the usefulness of the
chromaticity diagram. So the chromaticity diagram is useful mainly to identify that in which
proportions different primary colors are to be mixed together to generate any color, so if I take
three points in this chromaticity diagram. So one corresponding to green, one for the primary red,
and other for the primary blue.
Then, given any point within this chromaticity diagram, I can find out in which proportions red, green and blue are to be mixed. So here you find the horizontal axis gives the red component and the vertical axis gives us the green component; if I write these as x and y, then the blue component is z = 1 - (x + y).
So I can find out that how much of red, how much of green and how much of blue these three
components are to be mixed to generate a color which is at this particular location in this
chromaticity diagram. It also tells us that what are all different possible shades of any of the pure
color which are available in the light spectrum that can be generated by mixing different amount
of white light to it.
So you find that we have the point of equal energy, which we mentioned in the last class, in this chromaticity diagram, and it is white as per the CIE standard. If I take any pure color on the boundary of this chromaticity diagram and join it with this white point, then all the colors along this line tell us what different shades of this color can be generated by mixing different amounts of white light with the pure color.
Then we have started our discussion on the color model and we have said that color model is
very, very useful to specify any particular color. And the, we have said we started our discussion
on RGB color model and we have discussed in our last class that RGB color model is basically
represented by a Cartesian coordinate system.
Where the three primary colors of light that is red, green and blue they are represented along
three Cartesian coordinate axes. So we have as per this diagram, we have this red axis, we have
the green axis and we have the blue axis. And in this Cartesian coordinate system the colors,
colors space is represented by a unit cube. So when I say it is unit that means the colors are
represented in a normalized form.
So in this unit cube you find that at the origin of the cube, R, G and B are all equal to zero, so this point represents black. Similarly, at the farthest vertex from this black point or origin, R = 1, G = 1 and B = 1; that means all three primary colors are mixed in equal proportions, and this point represents white.
The red color is placed at location (1,0,0), where R = 1, G = 0 and B = 0. Green is located at (0,1,0), where both the R and B components are equal to 0 and G = 1. And blue is located at the vertex (0,0,1), where both the red and green components are equal to 0 and B = 1. So (1,0,0), (0,1,0) and (0,0,1) are the locations of the three primary colors of light, that is red, green and blue.
And you find that in this cube we have also placed the secondary colors of light which are
basically the primary colors of pigment that is cyan, magenta and yellow. So these three colors
cyan, magenta and yellow they are placed in other three corners, other three vertices of this unit
cube. Now find that from this diagram if I joined these two points that is black at location (0,0,0),
with white at location (1,1,1) then the line joining this two points black and white this represents
what is called a gray scale.
So all the points on this particular line will have different gray shades they will not exhibit any
colors component. Now given any specific colors having some proportions of red, green and blue
that colors will be represented by a single point in this unit cube, in a normalized form or we can
also say that, that colors will be represented by a vector, or vector is drawn from the origin to the
point representing that particular colors having a specific proportion of red, green and blue.
So this is the RGB color model, and you find that from this RGB color model we can also obtain the cyan, magenta and yellow components by a simple transformation. Given any point in the RGB color space, if I look at the different color shades on different faces of this color cube, you find that the shades appear like this. In this color cube we have said that the point (1,0,0) represents red, and you find that along the horizontal axis the color varies from red to yellow. Similarly, this point, which is (1,1,1), represents the white color.
And in this particular case each of the color components, that is red, green and blue, is represented by 8 bits, which means we have altogether 24 bits per pixel in this particular color model. So the total number of colors that can be generated is 2^24. You can easily imagine that this is a huge number of colors which can be generated if we assign 8 bits to each of the color components red, green and blue.
But in most cases what is useful is what is called the safe RGB model. In the safe RGB model we do not consider all possible colors, that is, all 2^24 different colors; rather, the number of different colors used in such cases is 216. These 216 colors can be generated by having 6 different shades in red, 6 different shades in green and 6 different shades in blue.
On the right-hand side we have drawn a safe RGB color cube. Here you find that we have six different shades of each of the colors red, green and blue, and using these six shades we can generate up to 216 different colors. These 216 colors are known as safe RGB colors because they can be displayed on any type of color monitor. You should remember that in case of true RGB, though we can have a total of 2^24 different colors, all color displays may not have the provision of displaying all 2^24 colors, but we can display 216 colors on almost all color displays. So this is what is called the safe RGB color model, and the corresponding cube is the safe RGB color cube.
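A quick sketch of the safe RGB palette just described: six normalized shades per channel (the evenly spaced levels used below are a common choice, assumed for illustration) give 6 x 6 x 6 = 216 colors.

import itertools

levels = [k / 5.0 for k in range(6)]             # 6 shades per primary, normalized to [0, 1]
safe_rgb = list(itertools.product(levels, repeat=3))
print(len(safe_rgb))                             # -> 216 safe RGB colors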
So it is quite obvious from this discussion that any color image will have three different color
components one component color for red, one color component for green and one color
component for blue.
(Refer Slide Time: 16:55)
So if I take this particular color image, you find that the top left image is a color image and the
other three are the three planes of it. So the red color component of this color image is
represented in red, the green color component is represented in green and the blue color
component is represented in blue. So here you find that though here represented these three
different components in different colors that is red, green and blue but they are actually
monochrome images. And these monochrome images or black and white images and these black
and white images are used to excite the corresponding phosphor dot on the color screen.
So this, the red component will activate the red dot, the green component will activate the green
dots and the blue component will activate the blue dots and when these three dots are activating,
activated together with different intensities that gives you different color sensations. So
obviously for any type color image like this we will have three different planes one plane
corresponding to the red component the other plane corresponding to the green component and a
plane corresponding to the blue component.
Now, as we said that this red, green and blue they are mostly useful for the display purpose. But
when it comes to color printing the model which is used is the CMY model or cyan, magenta and
yellow model. So for the image, color image printing purpose we have to talk about the CMY
model. However, the CMY can be very easily generated from the RGB model.
As is obvious from the RGB cube that we have drawn and the way the CMY colors, cyan, magenta and yellow, are placed at different vertices of that RGB cube, given any color specified in the RGB model we can very easily convert it to the CMY model.
The conversion is simply like this: given the red, green and blue components of a particular color, we want to convert them into the CMY space, and the conversion from RGB to CMY is very simple. What we have to do is simply apply
[C, M, Y]ᵀ = [1, 1, 1]ᵀ - [R, G, B]ᵀ,
that is, C = 1 - R, M = 1 - G and Y = 1 - B. Here we remember that the RGB components are represented in normalized form, and similarly, by this expression, the CMY components that we get will also be represented in normalized form.
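A minimal Python sketch of this conversion, assuming the R, G and B values are already normalized to the range 0 to 1:

import numpy as np

def rgb_to_cmy(rgb):
    """C = 1 - R, M = 1 - G, Y = 1 - B (all values in [0, 1])."""
    return 1.0 - np.asarray(rgb, dtype=float)

def cmy_to_rgb(cmy):
    """The inverse conversion has exactly the same form."""
    return 1.0 - np.asarray(cmy, dtype=float)

print(rgb_to_cmy([1.0, 0.0, 0.0]))   # pure red -> [0. 1. 1.], i.e. no cyan, full magenta and yellow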
And as we have said earlier, equal amounts of cyan, magenta and yellow should give us the black color. So if we mix the three pigment primaries cyan, magenta and yellow in equal proportions, we should get black. But in practice what we get is not a pure black; this generates a muddy black. So to take care of this problem, along with C, M and Y, that is cyan, magenta and yellow, another component is also specified, which is the black component.
In that case we get another color model, which is the CMYK model. Cyan, magenta and yellow are the same as in the CMY model, but we are specifying an additional color, black, giving us the CMYK model. So you find that in case of the CMYK model we actually have four different components: cyan, magenta, yellow and black. However, given the RGB components we can very easily convert them to CMY; similarly, the reverse is also true, given a color specified in the CMY space we can very easily convert it to a color in the RGB space. Thank you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 11 Lecture Number 53
Conversion of one Color Model to another – 2
Now the next color model that we will consider is the HSI color model, that is the Hue, Saturation and Intensity model. As we have mentioned in our last class, both the RGB and the CMY or CMYK models are actually hardware oriented. The RGB color model is oriented towards the color display or color monitor; similarly, the CMY and CMYK models are oriented toward color printers. Whereas when it comes to human interpretation, we said that we do not really think, given any particular color, of how much of red, how much of green and how much of blue is contained within that particular color.
What we really think of is the prominent color in that particular specified color, which is what is known as Hue. Similarly, we have the Saturation; it indicates how much a pure spectrum color is diluted by mixing white light with it. So, if you mix white light with a pure spectrum color in different amounts, what we get is different shades of that particular spectrum color. And as we said, I, the Intensity, is actually the achromatic notion of brightness, as in a black and white image.
So what we have is Hue, which tells us that what is the prominent color, prominent primary color
or spectrum color in that particular specified color, we have the Saturation, which indicates that
how much white light has been added to a pure spectrum color to dilute it and we have this
component Intensity, which is actually achromatic notion of the brightness, ok.
Now the problem is: given a color in RGB space, can we convert it to HSI space? This HSI model has other importance in addition to human interpretation, because we find that the HSI model decouples the Intensity information from the color information. So I gives you the Intensity information, whereas H and S, the Hue and Saturation together, give you the chromatic information.
Since we can decouple the chromatic information from the Intensity information, many of the image processing algorithms which are developed for black and white or gray scale images can be applied to images specified in the HSI space. So conversion of an image from the RGB space to the HSI space is very, very important.
Now, let us see that how we can convert an image specified or a color specified in RGB space to
the, to a color in the HSI space. Now in order to do this what we can do is we can reorient the
RGB cube, the RGB space. In such a way that the black point or the origin in the RGB cube is
kept at the bottom and the white comes directly above it, so as shown here.
(Refer Slide Time: 04:45)
So find that it is the same RGB cube and what we have done is we have simply reoriented this
RGB color cube. So that the black comes at the bottom, so this is the black one, the black comes
at the bottom, and the white comes directly above this black point. So naturally as before the line
joining black and white this represents the Intensity axis ok. Which show, any point on this
particular line which joins black and white, they will not show any color information but they
will have different intensities or different gray shades.
Now once we have, once we reorient this RGB cube like this. Now suppose we have a color
point. We have any color point specified within this RGB cube. So, I have this color point
specified in the RGB space. Now for this color point, now our aim is how we can convert this
RGB specification into HSI specification. So as we said that the line joining black and white, this
line is the Intensity axis. So in the HSI space, I can very easily compute the Intensity component
because for this point say, I say this is a point say x. I can represent this point as a vector joining
from black to this particular point, and the Intensity component that is associated with this RGB
value is nothing but projection of this vector on the Intensity axis.
So if I project this vector on the Intensity axis, then the length of this projection tells us the Intensity component associated with this particular RGB-specified color point. To get this, what we can do is pass a plane which is perpendicular to the Intensity axis and contains this particular point x. The point at which this plane cuts the intensity axis represents the intensity associated with the RGB components specified for this particular point x. So this is how we can compute the intensity component, and then you find the next component, that is the Saturation.
Now how can we compute the Saturation? The line joining black and white is the intensity axis, and any point on the Intensity axis has only gray shades; it does not have any color component. So we can say that the Saturation associated with all the RGB points lying on this Intensity axis is equal to zero, and the Saturation will increase as the point moves away from the Intensity axis.
Keeping that in mind, we can say that the distance of the point x from this Intensity axis tells us the Saturation associated with the RGB components of point x. So we can very easily compute the Intensity and Saturation corresponding to any RGB point given in the space. The next question is computation of the Hue component; out of Hue, Saturation and Intensity we have been able to compute the Saturation and Intensity very easily, and the component which is left is the Hue.
Now for Hue component computation the concept is slightly more complicated. Now we find
that in this diagram, we have shown a plane passing through the points black, white and cyan. So
as we said that the line joining black and white is the intensity axis so Intensity at point black is
equal to zero and Intensity at point white is maximum that is equal to one and the other point,
which defines this particular plane is the cyan, that is this particular point. Now we will
appreciate that for any points in this particular plane define by this 3 points black, white and cyan
we will have the same Hue, because as we said that Hue indicates that what is the prominent
wavelength of light present in any particular color.
And from our earlier discussion we can, you can also very easily verify that for any point given
on this particular plane define by this 3 points cyan, white and black. For any points, the color
components can be specified by linear combination of this 3 points cyan, white and black. Now
because white is a balanced color which contains all the primary component in equal proportion
and black does not contain any color component, so this 2 points white and black cannot
contribute to the Hue components associated with this point lying in this plane. So the only point
which can contribute to the Hue component is the cyan.
So for all the points lying in this plane the Hue will be same and it will be same as the Hue
associated with this point cyan. So here we find that if I rotate this particular plane around, this
black and around the Intensity axis by an angle of 360 degrees then I trace all the possible points
that can be specified in the RGB color space. And by tracing, by rotating this plane by 360
degrees around the Intensity axis, I can generate all possible Hues that can be, all possible Hues
corresponding to every possible RGB point in the RGB color cube.
(Refer Slide Time: 12:12)
Now, in order to do that, what I do is like this. Suppose I take the projection of this RGB cube on a plane which is perpendicular to the Intensity axis. If I take the projection, then the different vertices of the cube will be projected on a hexagon as shown in this particular figure. The cube vertices corresponding to white and black will be projected at the center of the hexagon, so this point will be the projected point for both white and black, and the primary colors of light and the primary colors of pigments will be projected at different vertices of the hexagon.
So here we find that if I draw a vector or if I draw a lines joining the center of the hexagon to all
the vertices of the hexagon then red and green they will be separated by an angle of 120o.
Similarly, green and blue they will be separated by an angle of 120o, similarly blue and red they
will be also separated by an angle of 120o. In the same manner for the secondary colors yellow
and cyan, they will be separated an angle of 120o, cyan and magenta will be separated by angle
of 120o and similarly magenta and yellow they will be also separated by an angle of 120o.
However, the angular separation between red and yellow this is equal to 60o. So if I take the
projection of the RGB cube on a plane, which is perpendicular to the Intensity axis then this is
how the projection is going to look like.
Now, along with this, we find that the projection of the shaded plane that we have seen in the previous slide will be a straight line like this. For any point specified in the RGB color space there will be a corresponding point in our projected plane. The plane on which this color point lies, that is the plane defined by the color point together with the black point and the white point, will be projected as a straight line on this projection plane.
So as we rotate the plane by 360o around the Intensity axis this particular straight line will also
be rotated by an angle of 360o around the center of the hexagon. So if I rotate this particular
shaded plane by an angle of 360o around this black and white axis of the Intensity axis. It is
projection on to the plane, on to this perpendicular plane which is a straight line will also be
rotated by an angle of 360o around the center of this hexagon.
Now this gives us a hint as to how we can find out the Hue associated with a particular color point specified in the RGB color space. The Hue can be computed as the angle between the straight line which is the projection of this shaded plane and one of the primary colors; normally this primary color is taken to be red, and the angle is normally measured in the anti-clockwise direction. Using this convention, red will have a Hue of 0°, and as we rotate the shaded plane around the Intensity axis, the straight line which is its projection will also rotate by 360°, and as it rotates the Hue increases.
So the Hue is normally given by the angle between the red axis and the line which is the projection of the shaded plane on the plane of the hexagon. Given these concepts, that is how we can obtain the Hue, Saturation and Intensity components for any color specified in the RGB color space. Now, if we follow the geometry of this formulation, we can derive very simple relations to compute the H, S and I components from the R, G and B components.
(Refer Slide Time: 17:55)
So here the Hue component H will be simply given by an angle θ, and as we said, this θ is measured anti-clockwise from the direction of red. So H will be equal to θ if B ≤ G, and it will be 360° - θ if B > G.
This is how we can compute the Hue component in the HSI model, where the value of θ is given by
θ = cos⁻¹ { (1/2)[(R - G) + (R - B)] / [(R - G)² + (R - B)(G - B)]^(1/2) },
where R is the red component, G is the green component and B is the blue component of the color specified in the RGB space.
So from the RGB components we can compute the value of θ following this expression, and from this θ we can find the Hue component in the HSI space as H = θ if B ≤ G, and H = 360° - θ if B > G.
Similarly, following the same geometry we can find that
S = 1 - [3 / (R + G + B)] min(R, G, B)
and
I = (1/3)(R + G + B).
So from the red, green and blue components we can very easily find out the Hue, Saturation and Intensity components.
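A minimal per-pixel Python sketch of these RGB to HSI relations (R, G and B normalized to [0, 1], H returned in degrees; the small epsilon guarding the division is an implementation assumption):

import numpy as np

def rgb_to_hsi(r, g, b, eps=1e-12):
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps   # eps avoids division by zero
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    h = theta if b <= g else 360.0 - theta                  # H = theta if B <= G, else 360 - theta
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i           # equals 1 - 3*min(R,G,B)/(R+G+B)
    return h, s, i

print(rgb_to_hsi(1.0, 0.0, 0.0))    # pure red -> approximately (0, 1, 0.333)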
Just as we have converted from the RGB space to the HSI space, we should also be able to convert any color specified in the HSI space into components in the RGB space. To do the inverse conversion, the corresponding expressions can be found as follows. Whenever we want to convert from HSI to RGB there are 3 regions of interest.
One region is called the RG region, in which H lies between 0° and 120°. The second region is called the GB region, in which H lies between 120° and 240°. And the third region is called the BR region, in which H lies between 240° and 360°. In the RG region you find that getting the red, green and blue components is very easy.
What we do is simply this: the blue component is given by B = I(1 - S), where I is the Intensity and S is the Saturation. The red component is given by R = I[1 + S cos H / cos(60° - H)], and the green component is given by G = 3I - (R + B). Similarly, in the GB region the first operation that we have to do is modify H as H = H - 120°. Once we do this, we get the R, G and B components as R = I(1 - S), G = I[1 + S cos H / cos(60° - H)], and the blue component, in the same manner, is given by B = 3I - (R + G).
And in the third sector, that is the BR region, we first have to modify H as H = H - 240°. Once we do this modification, G = I(1 - S), the blue component is given by B = I[1 + S cos H / cos(60° - H)], and the R component is given by R = 3I - (B + G). So you find that using these simple expressions we can convert a color specified in the RGB space to the color components in the HSI space; similarly, a color specified in the HSI space can easily be converted to color components in the RGB space.
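A minimal per-pixel Python sketch of the HSI to RGB relations above (H in degrees, S and I in [0, 1]); the "remaining" component in each sector is 3I minus the other two, which follows from I = (R + G + B) / 3.

import numpy as np

def hsi_to_rgb(h, s, i):
    h = h % 360.0
    if h < 120.0:                                   # RG sector
        b = i * (1.0 - s)
        r = i * (1.0 + s * np.cos(np.radians(h)) / np.cos(np.radians(60.0 - h)))
        g = 3.0 * i - (r + b)
    elif h < 240.0:                                 # GB sector
        h -= 120.0
        r = i * (1.0 - s)
        g = i * (1.0 + s * np.cos(np.radians(h)) / np.cos(np.radians(60.0 - h)))
        b = 3.0 * i - (r + g)
    else:                                           # BR sector
        h -= 240.0
        g = i * (1.0 - s)
        b = i * (1.0 + s * np.cos(np.radians(h)) / np.cos(np.radians(60.0 - h)))
        r = 3.0 * i - (b + g)
    return r, g, b

print(hsi_to_rgb(0.0, 1.0, 1.0 / 3.0))   # -> approximately (1, 0, 0), i.e. pure red

Round-tripping a color through rgb_to_hsi and then hsi_to_rgb should recover the original R, G, B values up to numerical error, which is a convenient sanity check on both sketches.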
So here in this diagram we have shown the effects of the different components H, S and I on the colors. In the 1st row you find that the first rectangle is a color for which H = 0, I = 128 and S = 255. As we said, our Hue is measured from the red axis, so H = 0 indicates that it is a red color. In this case S = 255, which is the maximum, meaning it is the pure red color, and the Intensity here is 128.
For the other rectangles in the same row, what we have done is keep Hue and Intensity constant while the Saturation is decreased. Here we find that as we move from left to right, the red appears to become gradually milky. With S = 200, which is less than 255, it appears that some amount of white light has been added to this particular red component, and that is very prominent when S = 100 or even S = 50, where a large amount of white light has been added to the red component.
In the 2nd row, what we have done is we have kept is Hue and Saturation constant that is H=0
and S = 255 and what we have varied is the Intensity component or the I component. So here we
find that as we decrease the I component the different rectangles still show the red color as we
move from the left to right in the second row, they are still red but the Intensity of the red goes
on decreasing.
If you just note the difference between the 1st row and the 2nd row: in the first row it appears that some white light has been mixed with red, whereas in the 2nd row there is no such appearance of mixing of white light, but it is the Intensity which is being decreased. If you look at the 3rd row, what we have done there is keep the Intensity and the Saturation constant, but it is the Hue component which has been changed.
We have started with H = 0, which is red, and here you really find that as we change the Hue component it is the color itself which gets changed. Unlike the previous 2 rows, where in the first row it is the Saturation, that is more and more white light being added to the pure color, and in the 2nd row it is the Intensity that gets changed, in the third row, keeping the Intensity and Saturation the same and changing the Hue component, it is the color itself that gets changed.
So here you find that when we have H = 100 it is the green color, when we have H = 150, it is
the blue color. Whereas when H = 50, it is a color between a yellow and green. So with this, we
have introduced the various color models we have introduced the RGB color space. We have
introduced CMY or cyan, magenta and yellow color space and also CMY K that is cyan,
magenta, yellow and black color space. And we have also introduced the HSI color space and we
have seen that given any color, given the specification of any color in any of the spaces we can
convert from one space to another, that is from RGB to CMY, the conversion is very easy. We
can also convert from RGB to HSI, where the conversion is slightly more complicated.
(Refer Slide Time: 30:15)
Now, with this introduction to the color spaces, what we will talk about next is color image processing. So far what we have discussed is the representation of a color, or, when we take an image, the representation of the colors present in the image; we discussed how to represent those colors in either the RGB space, the CMY or CMYK space or the HSI space, and images represented in any of these models can be processed.
So in color image processing we basically have 2 types of processing, one kind of processing is
called Pseudo color processing. This is also sometimes known as False color and the other kind
of processing is what is called Full color processing. In Pseudo color processing as the name
implies that these colors are not the real colors of the image but we try to assign different colors
to different Intensity values.
So Pseudo color processing actually what it does is, it assigns colors to different ranges of gray
values based on certain criteria. Now what is purpose of assigning colors to different ranges of
gray values, as we have mentioned earlier that if we have a simply black and white image, we
can distinguish hardly 2 dozen of the gray shades, whereas in color we can distinguish thousands
of color shades.
So given a gray scale image or simply a black and white image, if we can assign different colors to different ranges of gray values, then the interpretation of the different ranges of gray values is much easier in those Pseudo color images than in the gray scale images. So we will discuss how we can go for Pseudo coloring of a black and white image.
So as we said that this coloring has to be done following some criteria. So with this introduction
and of course in case of full color processing as the name indicate, the images are represented in
full color and the processing will also be used, processing will also be done in the full color
domain. So given this introduction of the color processing techniques, the 2 types of color
processing, Pseudo color processing and full color processing. We finish our lecture today, thank
you.
Digital Image Processing
Prof. P. K. Biswas
Department of Electronics and Electrical Communications Engineering
Indian Institute of Technology, Kharagpur
Module 11 Lecture Number 54
Pseudo Color Image Processing
We have said that when we talk about color image processing techniques, generally we have two categories of color image processing: one is called Pseudo color image processing, which is also known as False color processing, and the other category is what is known as Full color processing. We have just said that the basic purpose of the Pseudo color image processing technique is to assign different colors to different intensity ranges in a black and white image.
The purpose is, as we have said earlier, that given a black and white image the human eye can distinguish only around 2 dozen black and white or intensity shades, whereas given a color image we can distinguish among thousands of color shades. So, given a black and white or intensity image, if we go for Pseudo color processing techniques, that is, assign different colors to different ranges of intensity values, then interpretation of such an intensity image is more convenient than the interpretation of an ordinary or simple intensity level image.
Now, for the basic way in which Pseudo coloring can be done: as we said, the purpose of the Pseudo coloring technique is to assign different colors to different ranges of intensity values, and the simplest approach in which the Pseudo coloring can be done is by making use of intensity slices. What we can do is consider an intensity image to be a 3D surface.
So as shown in this particular slide that given an intensity image say f(x,y) which is a function of
x and y. So different intensity values at different locations of x and y, if we consider them to be a
3D surface then what we can do is we can place planes which are parallel to the image plane that
is parallel to the x-y plane. So as shown in this particular diagram if I place such a plane at some
intensity value say li, so at this intensity value, say li, we have placed a plane which is parallel to
x-y plane.
Now we find that this particular plane, which is parallel to the x-y plane, slices the intensities into 2 different halves. Once I get these 2 halves, what I can do is assign different colors to the two different sides of this plane: on this side I can assign one particular color, whereas on the other side I can assign another color. So this is the basic technique of Pseudo coloring, that is, you slice the intensity levels and to different slices you assign different colors.
In our case, let us assume that the discrete intensity values in our black and white image vary from 0 to L-1, so we have a total of L intensity values in the image. We assume that the intensity value l0 = 0 represents black, that is, the corresponding pixels f(x,y) with this intensity are black.
Similarly, the level L-1 is assumed to be white, that means the corresponding pixels f(x,y) have a value equal to L-1. Let us also assume that we have drawn P planes perpendicular to the intensity axis. Perpendicular to the intensity axis means they are parallel to the image plane, and these planes will be placed at the intensity values given by l1, l2, up to lP.
So the first plane will be placed at intensity value l1, the second plane at intensity l2, and in this way the Pth plane will be placed at intensity value lP. Obviously, in this case P, the number of planes, has to lie between 0 and L-1, where L is the number of grey level intensities that we have. Once we place these P planes perpendicular to the intensity axis, they divide the intensities into P + 1 intervals.
Once I divide the intensity range into P + 1 intervals, our color assignment approach will be as follows: the color assigned to location (x,y), call it some function h, so h(x,y), will be Ck if the corresponding intensity value at that location, f(x,y), lies in the range Vk, where Vk is the intensity interval defined by the planes placed at levels lk and lk+1.
As we said, there are P planes, and these P planes divide our intensity range into P + 1 ranges or intervals, which we call intervals V1, V2, up to VP+1. So we assign color Ck to a particular location (x,y), that is, we write h(x,y) = Ck if the intensity value at the corresponding location, given by f(x,y), lies in the interval Vk.
Now, by using this simple concept, that is, you divide your intensity range into a number of intervals and to a particular location in the intensity image you assign a color determined by which interval the intensity of the image at that location belongs to, what we get is a Pseudo colored image.
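A minimal Python sketch of this intensity slicing scheme; the slicing levels and the colors assigned to the P + 1 intervals below are arbitrary illustrative choices.

import numpy as np

def intensity_slice(gray, levels, colors):
    """gray: 2-D uint8 image; levels: P slicing thresholds; colors: (P+1) RGB triples."""
    colors = np.asarray(colors, dtype=np.uint8)
    k = np.digitize(gray, levels)          # interval index 0..P for every pixel
    return colors[k]                       # pseudo-colored image, shape (H, W, 3)

gray = np.arange(256, dtype=np.uint8)[None, :].repeat(32, axis=0)     # a simple gray ramp
levels = [64, 128, 192]                                               # P = 3 slicing planes
colors = [(0, 0, 128), (0, 128, 0), (255, 255, 0), (255, 0, 0)]       # P + 1 = 4 colors
pseudo = intensity_slice(gray, levels, colors)
print(pseudo.shape)                        # -> (32, 256, 3)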
(Refer Slide Time: 10:40)
So let us see some example of this Pseudo colored image, here we have say said that on the left-
hand side we have an intensity image of black and white image. If I apply Pseudo coloring if I go
for Pseudo coloring techniques, then the Pseudo colored image is as shown on the right-hand
side. Similarly, the bottom one this is an image which is an enhanced version of this and if I
apply Pseudo coloring technique to this particular black and white image then the corresponding
Pseudo colored image is given on the right-hand side.
So here you find that interpretation in the Pseudo colored image or the distinction between
intensity levels in the Pseudo colored image is much easier than the distinction in corresponding
intensity image or the grey scale image.
(Refer Slide Time: 11:35)
Now this particular application will be more prominent in this particular diagram. Here again you find that on the left-hand side we have an intensity image or grayscale image, and in these regions the intensity values appear to be more or less flat, which means I cannot distinguish between the different intensity levels present in this particular image. Whereas on the right-hand side, if I go for Pseudo coloring, you find that different colors are assigned to different intensity levels in this particular black and white image, and this clearly tells us what the different regions of different intensity values in the black and white image are. Another application of the Pseudo coloring technique is grey to color transformation.
(Refer Slide Time: 13:10)
So here what we have shown is that we have assigned different colors to two different intensity intervals. Now, when we go for grayscale to color transformation, what we have to do is this: if I have an intensity image or grayscale image, which corresponds to a single plane, I have to convert that to three different planes, that is the red, green and blue planes, and those red, green and blue planes, when they are combined together, give you the interpretation of a color image.
So that kind of color, grey to color transformation can be done by using this type of
transformation function. So here you find that our input image f(x, y) this is a intensity image or
grayscale image. Then what we are doing is, this grayscale image is transformed by three
different transformations, one corresponds to the red transformation, the other corresponds to the
green transformations and the third one corresponds to the blue transformation.
This red transformation generates the red plane of this image, which is given by fR(x,y) the green
transformation generates the fG(x,y) or the green plane corresponding to this intensity image
f(x,y) and the blue transformation generates fB(x,y), which is the blue plane corresponding to this
Intensity image f(x,y). So when these three images that is fR(x,y), fG(x,y) and fB(x,y), the red,
green and blue planes, they are combined together and displayed on a color display, what we get
is a Pseudo colored image.
But in this case you find that the color is not assigned to different intensity ranges but that color
is decided, the color of the entire image is decided by the corresponding transformation
functions. So the color content of the color image that we will generate that is determined by the
transformation functions that we use. Now let us see that, what are the kind of color images that
we can obtain using this grayscale to color transformation.
So in this diagram, as it is shown on the left-hand side we have an intensity image or a black and
white image, which is transformed into a color image. So on the right hand side is the
corresponding color image and the color transformations that has been used are like this, here we
have used that fR(x,y) = f(x,y) that means whatever is the black and white intensity image that is
simply copied to the red plane. The green plane, the fG(x,y) is generated by 0.33f(x,y) that means
the intensity values at any location in the original black and white image is divided by 3 and
whatever value we get that is copied to the corresponding location in the green plane.
Similarly, fB(x,y), the blue plane, is generated by multiplying the intensity image by the value 0.11, or equivalently dividing the intensity image by 9. So by these transformation functions we have generated fR(x,y), the red component, fG(x,y), the green component, and fB(x,y), the blue component, and when we combine this red component, green component and blue component, the corresponding color image which is generated is like this.
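A minimal Python sketch of this gray-to-color transformation, using the same three transformation functions described above (red = f, green = 0.33 f, blue = 0.11 f):

import numpy as np

def gray_to_color(f, t_red, t_green, t_blue):
    """Push the single gray plane f(x, y) through three transformations to get an RGB image."""
    f = np.asarray(f, dtype=float)
    return np.dstack([t_red(f), t_green(f), t_blue(f)])   # shape (H, W, 3)

f = np.linspace(0.0, 1.0, 256)[None, :].repeat(64, axis=0)            # a gray ramp in [0, 1]
pseudo = gray_to_color(f, lambda x: x, lambda x: 0.33 * x, lambda x: 0.11 * x)
print(pseudo.shape)        # -> (64, 256, 3), with a reddish overall appearance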
Now, here you should remember one point that this color image that is being generated it is
Pseudo colored image. Obviously, it is not a full color image or the color of the original image is
not generated in this manner. So the only purpose is the different intensity, the different intensity
regions will appear as different colors in our colored image. So this coloring is again a Pseudo
coloring it is not the real coloring.
Now we have another example of Pseudo coloring. Here it is a natural scene, where again on the left-hand side we have the intensity image or black and white image, and when we go for grey scale to color transformation, the transformation is like this: here the green component is the same as the original intensity image, so we have taken fG(x,y) = f(x,y), the red component is generated as (1/3) f(x,y) and the blue component is generated as (1/9) f(x,y). By generating the red, green and blue planes from the original f(x,y) in this manner and combining them, the corresponding Pseudo colored image that we get is given on the right-hand side. So, here you find that we can compare the earlier image with this.
(Refer Slide Time: 18:33)
In our earlier case, the colored image was showing more of the red component because in that case fR(x,y) was the same as f(x,y), whereas green and blue were scaled-down versions of f(x,y). In this particular case our Pseudo colored image appears green because here the green plane is the same as f(x,y), whereas red and blue are taken as scaled-down versions of f(x,y). So if we change the weightage of the different red, green and blue planes, the color appearance will again be different. Thus a grey scale image can be converted to a Pseudo colored image by this kind of conversion, by applying different transformations for the red, green and blue planes.
Now many of you might have seen the X-ray security machine like what is used in airports. Here
you find that this is an X-ray image on the left hand side of a baggage, which is screened by an
X-ray machine. If you have looked at the screen, which the security people checks, on the screen
this image appears in this particular form. Where you find that the background has appeared as
red, the different garment bags they have appeared as blue of course there are different shades,
whereas there is a particular region over here which is appeared as again red.
Again, this is a Pseudo coloring technique which is applied to obtain this kind of image, and the purpose is that if you have a Pseudo colored image like this, you can distinguish between the different objects present in the image. In this particular case, the kind of transformation functions for red, green and blue which are normally used are given like this.
(Refer Slide Time: 20:50)
The transformation functions are usually sinusoidal functions. Along the horizontal axis we have the intensity values of the grey scale image, which vary from 0 to the maximum value L-1. The top sinusoidal curve shows the red transformation, the middle one shows the green transformation and the last one shows the blue transformation, and here you find that these different curves appear to be fully rectified sinusoids shifted from one another by a certain amount, as if we have given some phase shift to the different sinusoidal curves.
Now when the transformation are given like this, so if you have an intensity values is somewhere
here, then the corresponding red component will be generated as this value, and the
corresponding green components will be generated as this value, and the corresponding blue
component will be generated and as this value. So this particular intensity level will be coded as
a color point, as a color pixel having red component given by this much, green component given
by this much and blue component given by this much.
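A minimal sketch of this kind of phase-shifted, rectified sinusoidal mapping is given below, again in Python/NumPy as an assumption of mine; the number of cycles and the phase values are illustrative guesses, not the actual parameters of the security machine's transformation.

import numpy as np

def sinusoidal_pseudo_color(f, L=256, cycles=2.0,
                            phase=(0.0, np.pi / 3, 2 * np.pi / 3)):
    """Pseudo color a gray scale image with full-wave rectified sinusoids
    that are phase shifted with respect to one another (illustrative values)."""
    f = f.astype(np.float64)
    theta = 2 * np.pi * cycles * f / (L - 1)             # map [0, L-1] onto an angle
    planes = [np.abs(np.sin(theta + p)) for p in phase]  # rectified, shifted sinusoids
    rgb = np.dstack(planes) * (L - 1)
    return rgb.astype(np.uint8)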
958
(Refer Slide Time: 22:20)
Now, what is done for this pseudo coloring purpose is that you define different bands of intensity values, and the different bands correspond to different objects. For example, a band somewhere here is for identification of, say, an explosive, a band somewhere here is for identification of garment bags, and so on. So here you find that if this is the band which is used to detect the explosive, the amount of red component which is generated by this particular band is the maximum one.
So an explosive will appear to be red, whereas for this particular band, which is for the garment bags, the red component is not as high, so a garment bag will not appear as red as an explosive. So different bands of intensity values are identified or specified to identify different types of items, and by using this kind of transformation we can distinguish between the different objects which are there in the bags.
So by using the pseudo coloring techniques, we can give different colors to different intensity ranges, and as we have just seen, we can convert a gray scale or intensity image to a color image, where the color image is a pseudo colored image. It will not really have the exact color components, but this pseudo colored image gives us the advantage that we can distinguish between different objects present in the image from their color appearance. Thank you.
959
Digital image processing.
Professor P. K. Biswas.
Department of electronics and electrical communication engineering.
Indian institute of technology, Kharagpur.
Lecture-55.
Full color image processing.
Hello, welcome to the video lecture series on digital image processing. What we will discuss today is full color image processing. As we have said, unlike in the case of pseudo color techniques, in full color image processing we will consider the actual colors present in the image. And since there are different color models, a color image can be specified in different color models; for example, a color image can be specified in RGB color space, and a color image can also be specified in HSI color space.
Now, because we have these different color components for any particular color pixel, we can have two different categories of color image processing. One category is per color plane processing: in this category, you process every individual color component of the color image and then combine these different processed components together to give you the processed color image. The other type of processing uses the concept of vectors.
As we have said, every color pixel has three color components, so any color can be considered as a vector. If the color is specified in RGB space, then it is a vector drawn from the origin of the RGB color space to the point which specifies the color. So, there are two kinds of processing: one is per color plane processing, in which case every plane is
960
processed independently and then the processed planes are combined together to give you the processed color output; in the other category of processing, all the color components are processed together and the different colors are considered as vectors.
So, obviously the color at a particular point (x, y) is c(x, y); if we are using the RGB color space, the point (x, y) will have three color components. One is the red component at location (x, y), given by r(x, y), another is the green component at location (x, y), given by g(x, y), and the other is the blue component at location (x, y), given by b(x, y), so that c(x, y) = [r(x, y), g(x, y), b(x, y)]. So, every color is represented by a vector and the processing is done by considering these vectors; that means all the color components are considered together for processing purposes.
So, accordingly we will have two types of color processing techniques. The first kind of processing that we will consider is what we call color transformation. Now, you may recall from our discussion of gray scale or black and white images that we defined a number of transformations for enhancement purposes, of the form s = T(r), where r is an intensity value at a location in the input image f(x, y) and s is the transformed intensity value at the corresponding location of the processed image g(x, y); so the transformation function was given by s = T(r).
961
Now, we can extend the same idea to our color processing techniques. The extension is like this: in the case of an intensity image we had only one component, that is the intensity component. In the case of a color image, we have more than one component; that may be the RGB components if the color is specified in RGB space, or the HSI components if the color is specified in HSI space. Correspondingly, we can extend the transformation in the case of color as si = Ti(r1, r2, ..., rn), for i = 1, 2, ..., n.
So, here we assume that every color is specified by an n-component vector having values r1 to rn. si is a color component in the processed image g(x, y) and ri is a color component in the input color image f(x, y); n is the number of components in the color specification; and Ti, that is T1 to Tn, is actually the set of transformations or color mapping functions that operate on the ri to produce the si.
Now, if we are going for the RGB color space or the HSI color space, then the value of n = 3 because in these cases we have three different components. The first application of this color transformation that we will consider is intensity modification. Now, a color can be represented in different color models or different color spaces.
962
So, theoretically it is possible that every kind of color processing can be done in any of those color spaces or using any of these color models. However, some kinds of operations are more convenient in one color space and less convenient in another. In such cases we also have to consider the cost of converting the colors from one color model to another. For example, in this particular case you find a color image given in RGB color space and the different color planes of the same image: this is the red color plane, this is the green color plane and this is the blue color plane.
So, this color image has these three different color planes in the RGB model. Similarly, the same image can also be represented in HSI color space, where the leftmost image gives you the hue component, this gives the saturation component and this gives the intensity component. Now, from this figure it is quite apparent, as we claimed earlier, that it is the intensity component in the HSI model which carries the achromatic notion of brightness of an image.
So, here you find that this actually indicates what the corresponding black and white image for this color image should be. As we can represent a color image in these different models, it is theoretically possible that any kind of operation can be performed in any of these models.
963
(Refer Slide Time: 10:05)
Now, as we said, the first application that we are talking about is intensity modification. This intensity modification transformation is simply g(x, y) = k f(x, y), where f(x, y) is the input image and g(x, y) is the processed image, and if we are going for intensity reduction the value of k lies between 0 and 1.
Now, as we said, this operation can be done on the different color planes. So, if we consider the RGB color space, then our transformation will be si = k ri, for i = 1, 2 and 3, where index 1 is used to indicate the red
964
component, index 2 is used to indicate the green component and index 3 is used to indicate the blue component.
So, this indicates that all the different color planes, the red plane, green plane and blue plane, are to be scaled by the same scale factor k. Whereas, if I do the same transformation in HSI space, then, as we said, the intensity information is contained only in I, so the only transformation that will be needed in this particular case is s3 = k r3, whereas the other two components corresponding to hue and saturation can remain the same.
So, we will have s1 = r1, that is, the hue of the processed image will remain the same as the hue of the input image. We have s2 = r2, that is, the saturation of the processed image will remain the same as the saturation of the input image; only the intensity component will be scaled by the scale factor k. If we perform a similar operation in CMY space, then the equivalent operation in CMY space will be given by si = k ri + (1 - k), and this has to be done for all i, that is, all the three components.
So, if I compare the operation in RGB space, the operation in HSI space and the operation in CMY space, you find that the operation in HSI space requires the minimum computation of these three spaces, because here only the intensity value is to be scaled while the hue and saturation values remain unchanged, whereas in both the RGB and CMY spaces you have to scale all three planes. However, though the transformation has minimum complexity in HSI space, we also have to consider the complexity of converting from RGB to HSI or from CMY to HSI, because that conversion also has to be taken into consideration.
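The three equivalent scalings can be sketched roughly as follows, assuming the image is already available as planes in the corresponding color space (the RGB-HSI or CMY-HSI conversion itself, whose cost was just discussed, is not shown); the function names and the use of NumPy are my own assumptions.

import numpy as np

def scale_intensity_rgb(rgb, k):
    """RGB space: s_i = k * r_i for all three planes."""
    return np.clip(k * rgb.astype(np.float64), 0, 255).astype(np.uint8)

def scale_intensity_hsi(h, s, i, k):
    """HSI space: only the intensity plane (assumed in [0, 1]) is scaled;
    hue and saturation remain unchanged."""
    return h, s, np.clip(k * i, 0.0, 1.0)

def scale_intensity_cmy(cmy, k):
    """CMY space (planes assumed normalised to [0, 1]): s_i = k * r_i + (1 - k)."""
    return np.clip(k * cmy + (1.0 - k), 0.0, 1.0)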
965
(Refer Slide Time: 14:26)
Now, if I apply this kind of transformation, the transformed image that we get is something like this. Here the operation has been done in the HSI space: on the left hand side we have the input image and on the right hand side we have the intensity modified image.
So, this is the image for which the intensity has been modified by a scale factor of around 0.5. We find that both the saturation and the hue appear to be the same, but only the intensity value has been changed. Of course, as we said, this equivalent operation can also be obtained in the RGB plane as well as in the CMY plane, but there the transformation will take more computation than in the HSI plane, where we have to scale only the intensity component, keeping the other components intact.
966
(Refer Slide Time: 15:38)
The next application of this full color image processing that we will consider is color complements. To define color complements, let us first look at the color circle. In this particular color circle, if I take the color at any point on the circle, the color which is located at the diametrically opposite location on the circle is the complement of that color.
So, as shown in this figure, if I take a color on this color circle, its complement is given by the color on this side, and similarly, in reverse, the color on this side has its complement in the color on the other side. This simply says that hues which are directly opposite to one another on the color circle are complements of each other.
967
(Refer Slide Time: 17:03)
Now, this color complement, as we have said, is analogous to the gray scale negative. When we talked about gray scale or intensity image processing, we also talked about the negative operation.
This color complement is analogous to that gray scale negative operation. So, if I take the same transformation which we had used in the case of a gray scale image to obtain its negative and apply it to all the R, G and B planes of a color image represented in RGB space, then what I get is the complement of the color image, which is really the negative of the color image.
968
So, those color complements can be obtained by a transformation function of this form. In the case of an intensity image we had a single transformation, but in the case of a color image I have to apply the same transformation on all the color planes, that is si = T(ri) = L - 1 - ri, for all values of i; that means here i runs over 1, 2 and 3, that is, for all the color planes red, green and blue I have to apply this same transformation.
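A minimal sketch of this complement operation, assuming an 8-bit RGB image held in a NumPy array, might look like this:

import numpy as np

def color_complement(rgb, L=256):
    """Apply s_i = (L - 1) - r_i to every color plane (red, green and blue)."""
    return (L - 1 - rgb.astype(np.int32)).astype(np.uint8)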
So, by applying this I get an image like this: on the left hand side I have a color image, and on the right hand side, by applying the same transformation on all three planes, that is the red, green and blue planes, I get a complement image, and you find that this is the same as the photographic negative of my color image. In the same manner, this is another color image, and
969
if I apply the same transformation to the red, green and blue components of this particular color image, then I get the corresponding negative or complement color image as shown on the right hand side.
The next application that we will consider of this full color image processing is color slicing. You will recall that in the case of a gray scale image, the application of intensity slicing is to highlight regions within certain intensity ranges. In the same manner, the application of color slicing in the case of a color image is to highlight certain color ranges, and this is useful for identifying objects of a certain color against the background or for differentiating objects of one color from objects of some other color.
The simplest form of color slicing is to assume that all the colors of interest lie within a cube of width, say, w and that this cube is centered at a prototypical color whose components are given by a vector, say (a1, a2, a3), as shown in this particular diagram. So, here I assume that I have this cube of width w, that the colors of interest are contained within this cube, and that the center of this cube is at a prototypical color given by the color components (a1, a2, a3).
970
(Refer Slide Time: 21:04)
And the simplest type of transformation that we can apply is of this form: si = 0.5 if |rj - aj| > w/2 for any j in [1, 3], and si = ri otherwise; this computation has to be done for all values of i, i = 1, 2 and 3.
So, what it means is that all those colors which lie outside this cube of width w centered at location (a1, a2, a3) will be represented by some insignificant color in which the red, green and blue components all attain a value of 0.5, but inside the cube I will retain the original color.
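A rough sketch of this cube-based color slicing, assuming an RGB image with values normalised to [0, 1] and the use of NumPy, could be:

import numpy as np

def color_slice_cube(rgb, a, w, neutral=0.5):
    """Keep colors inside a cube of width w centred at the prototype color a;
    replace everything else by an insignificant neutral color (0.5, 0.5, 0.5).
    rgb is assumed to be an M x N x 3 array with values in [0, 1]."""
    a = np.asarray(a, dtype=np.float64)
    outside = np.any(np.abs(rgb - a) > w / 2.0, axis=2)   # any plane outside the cube
    out = rgb.copy()
    out[outside] = neutral
    return out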
971
So, by using this transformation, you find that from this color image, if I want to extract the regions which are near to red, then I get all those red regions extracted in the right image, and all other points, where the color is away from red, have got a gray shade.
Now, for this kind of application, instead of considering all the colors of interest to lie within a cube, we can also consider them to lie within a sphere centered at location (a1, a2, a3). It is needless to say that the central location (a1, a2, a3) tells you what the color of interest is, and the width of the cube or the radius of the sphere, whichever may be the case, tells us what variation from this prototype color we still consider to be of interest.
The other kind of application of this full color image processing is correction of tones, or tone correction. Again, I can find an analogue in a simple black and white intensity image, where we have said that an image may be dark, it may be light or bright, or it may be low contrast, depending upon the distribution of the intensity values. In the same manner, for color images we define the tone.
So, a color image may have a flat tone, it may have a light tone or it may have a dark tone, and these tones are determined by the distribution of the intensity values of the different RGB components within the image. So, let us see how these images look in the case of a color image.
972
(Refer Slide Time: 24:49)
So, here you find that on the left we have shown an image which is flat in nature, in the middle we have an image which has a light tone, and on the extreme right we have an image which has a dark tone.
Now, the question is how we can correct the tone of such a color image. Again, we can apply similar types of transformations as we did for contrast enhancement of intensity images. If an image is flat, the kind of transformation function that we can use for it is of this form; here the horizontal axis goes up to L-1 and the vertical axis also goes up to L-1. So, if you apply this type of transformation to all the red, green and blue components of this flat image, what we get is a corrected image.
Similarly, for an image whose tone is light, we can also apply a transformation; what is needed here is that the image should be made to appear darker to give the corrected image. The kind of transformation that we can apply is something like this, again with L-1 on both axes. Here, a wide range of intensity values in the input image is mapped to a narrow range of intensity values in the output image, and that gives you the tonal correction for an image which is light.
Similarly, for an image which is dark, the kind of transformation that can be applied is just the reverse of this. The transformation that we apply in this case has this type of nature; here we have L-1, that is the maximum intensity value, and here also we have
973
L-1, the maximum intensity value. So, the kind of operation that we are doing here is that a narrow range of intensities in the input image is mapped to a wide range of intensities in the output image.
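The exact transformation curves on the slides are not reproduced here; as an illustrative stand-in, a power-law (gamma) curve applied to every plane can realise both kinds of mapping, with gamma greater than 1 compressing a wide input range into a narrow one (for light images) and gamma less than 1 expanding a narrow input range (for dark images). This is my own substitution, not the exact function used in the lecture.

import numpy as np

def tone_correct(rgb, gamma, L=256):
    """Apply the power-law curve s = (L-1) * (r / (L-1))**gamma to every plane.
    gamma > 1 darkens (compresses) a light image; gamma < 1 brightens a dark one."""
    r = rgb.astype(np.float64) / (L - 1)
    return (np.clip(r, 0.0, 1.0) ** gamma * (L - 1)).astype(np.uint8)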
So, by applying these types of transformations we can also go for tonal correction of color images. Of course, the other kind of transformation, that is histogram based processing, can be applied to color images as well, where histogram equalization or histogram matching techniques can be applied on the different color planes, the red, green and blue planes, of the input color image. In many such cases it is necessary that after the processing, the processed image that you get is balanced in terms of colors.
All these different color image processing techniques that we have discussed till now are equivalent to the point processing techniques that we discussed in connection with our intensity or black and white images. Now, in the case of intensity images we also discussed another kind of processing technique, that is the neighborhood processing technique. A similar neighborhood processing technique can also be applied in the case of color images, where for processing a pixel we consider not only the color at that particular pixel location but also the colors at the neighboring pixel locations.
So, we will talk about two such processing operations; the first one that we will consider in this category is the smoothing operation. For this smoothing, in the
974
smoothed image the color component C̄(x, y) will be given by
C̄(x, y) = (1/K) Σ C(s, t),
where the sum is taken over all locations (s, t) in the neighborhood Nxy of the point (x, y) and K is the number of such locations. Here C(x, y) is actually a vector having three components, which in RGB space are the red, green and blue components, and this averaging is carried out over the whole neighborhood of the point (x, y).
So, here I can simply do this operation in a plane-wise manner, where we can write
R̄(x, y) = (1/K) Σ R(s, t),
Ḡ(x, y) = (1/K) Σ G(s, t), and
B̄(x, y) = (1/K) Σ B(s, t),
with each summation carried out over the same neighborhood of (x, y); the vector formed by these averages gives us what is called the smoothed image.
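A minimal per-plane smoothing sketch, assuming NumPy/SciPy and a simple box (uniform) average as the neighborhood operation, is given below; the neighborhood size is a parameter of mine, not necessarily the value used on the slide.

import numpy as np
from scipy.ndimage import uniform_filter

def smooth_color_image(rgb, size=5):
    """Per-plane neighborhood averaging: each of the R, G, B planes is replaced
    by its local mean over a size x size neighborhood, then the planes are
    recombined into the smoothed color image."""
    rgb = rgb.astype(np.float64)
    smoothed = np.dstack([uniform_filter(rgb[..., c], size=size) for c in range(3)])
    return smoothed.astype(np.uint8)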
So, for the smoothed image in this particular case, we find that on the left hand side we have the original color image and on the right hand side we have the smoothed image, where the smoothing is carried out over a square neighborhood. Just as we have done the smoothing operation, in the same manner we can also go for a sharpening operation, and we have discussed in connection with our intensity images that an image can be sharpened by using second derivative operators like the Laplacian operator.
975
So, here again, if I apply the Laplacian operator on all three planes, the red plane, green plane and blue plane, separately and then combine those results, what I get is a sharpened image. By applying that kind of sharpening operation, a sharpened image can appear something like this: here on the left hand side we have shown the original image, and on the right hand side you find that the image is much sharper than the image on the left.
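A rough sketch of this per-plane Laplacian sharpening, again assuming NumPy/SciPy (scipy.ndimage.laplace uses a negative-centre Laplacian mask, so the result is subtracted from the original plane), could be:

import numpy as np
from scipy.ndimage import laplace

def sharpen_color_image(rgb):
    """Per-plane Laplacian sharpening: apply the Laplacian to each of the
    R, G and B planes separately, subtract it from the original plane, and
    recombine the three sharpened planes into the output color image."""
    rgb = rgb.astype(np.float64)
    planes = [rgb[..., c] - laplace(rgb[..., c]) for c in range(3)]
    return np.clip(np.dstack(planes), 0, 255).astype(np.uint8)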
Now, when we go for these neighborhood operations like image smoothing or image sharpening, the type of operation that we have discussed is the per color plane operation, that is, every individual color plane is operated on individually and then those processed color planes are combined to give you the processed color image. Now, as we said, the same operation can also be done by considering the vectors, or I can do the same operation in the HSI color space, where we modify only the intensity component, keeping the H and S components unchanged. In such cases the result that you obtain in the RGB planes and the result that you obtain in the HSI planes may be different, and I give it to you as an exercise to find out why this difference should come. So, with this we finish our discussion on color image processing. Thank you.
976
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-56.
Different Approaches for Image Segmentation.
Hello, welcome to the video lecture series on digital image processing. Today, as we start our discussion on image segmentation, we are going to start our discussion on another domain of image processing which is called image analysis. Here our aim will be to extract some information from the images so that this information can be used for high level image understanding operations.
In today's discussion we will see what image segmentation is, we will talk about the different approaches to image segmentation, and we will see that image segmentation is mainly categorized into one of two categories: the segmentation is either discontinuity based or region based. Then we will talk about different edge detection operations, which are useful for the discontinuity based image segmentation technique. Then we will see how to link the edge points which are extracted through different edge detection operators, so that we can get a meaningful edge.
And under this linking of edge points we will talk about 2 specific techniques: one is the local processing technique, the other one is the global processing or Hough transformation based technique. Now, let us see what is meant by image segmentation. By image segmentation,
977
what we mean is a process of subdividing an image into the constituent parts or objects in the image.
The main purpose of subdividing an image into its constituent parts or objects is that we can further analyze each of these constituents or objects once they have been identified or subdivided. So, each of these constituents can be analyzed to extract some information, so that this information is useful for high level machine vision applications.
Now, once you say that segmentation is nothing but a process of subdivision of an image into its constituent parts, a question naturally arises: at which level should this subdivision stop? That is, what is our level of segmentation? Naturally, the level of subdivision or the level of segmentation is application dependent. Say, for example, we are interested in detecting the movement of vehicles on a road; on a busy road we want to find out the movement pattern of different vehicles, and the image that is given is an aerial image taken either from a satellite or from a helicopter.
Because in this particular case our interest is to detect the moving vehicles on the road, the first level of segmentation or subdivision should be to extract the road from those aerial images, and once we identify the roads, we have to go for further analysis of the road so that we can identify every individual vehicle on the road, and once we have identified the vehicles, we can go for vehicle motion analysis.
So, here you find that in this particular application, though an aerial image will cover a large area, many of the regions will contain information about residential complexes, many will contain information about water bodies, say a sea, a river or a pond, and many will contain information about agricultural land. But our application says that we are not interested in water bodies, we are not interested in residential areas, nor are we interested in agricultural land; we are only interested in the road segment, and once we identify the road segment, we have to go for further subdivision of the road so that we can identify each and every vehicle on the road.
So, as I said, our subdivision of the image at the first level should stop once we are able to extract or identify the road segments; after that we have to subdivide the road component to identify the vehicles, and we need not go for segmentation of a vehicle into its constituent parts because that is not of our interest.
978
Similarly, we need not segment or analyze the residential complexes, water bodies or agricultural lands for further subdivision into their constituent parts.
So, as we said, this segmentation or level of subdivision is application dependent. For any automated system, what we should have are automatic processes which are able to subdivide or segment an image to our desired level. You will appreciate that image segmentation is one of the most important tasks in machine vision applications; at the same time, image segmentation is also one of the most difficult tasks in this image analysis process, and you will easily appreciate that the success of the image analysis operations or machine vision applications is highly dependent on the success of the autonomous segmentation of objects or segmentation of an image.
So, though this image segmentation is very difficult, it is a very important task, and every machine vision application software or system should have a very robust image segmentation algorithm. Now let us see what different image segmentation algorithms or techniques we can have. As we have just mentioned, image segmentation approaches are mainly of 2 different types.
So, we have 2 different approaches to image segmentation. One of the approaches, as we have just said, is the discontinuity based approach, and the second is what is called the similarity based approach. In the discontinuity based approach, the partition or subdivision of an image is carried out based on abrupt changes in the intensity levels or gray levels of the image.
979
So, under the discontinuity based approach we are mainly interested in the identification of isolated points, identification of lines present in the image, or identification of edges. In the similarity based approach the idea is slightly different: here what we try to do is group those pixels in an image which are similar in some sense.
The simplest approach under this similarity based technique is what is called the thresholding operation. By thresholding, what we mean is this: as we have already said, if we have images where every pixel is coded with 8 bits, then we can have intensities varying from 0 to 255, and we can decide on a threshold following some criterion, say a threshold level of 128. We then decide that all the pixels having an intensity value greater than 128 will belong to one region, whereas all the pixels having an intensity value less than 128 will belong to some other region.
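As a tiny sketch of this idea, assuming an 8-bit gray scale image held in a NumPy array and the example threshold of 128:

import numpy as np

def threshold_segment(f, T=128):
    """Split a gray scale image into two regions: pixels with intensity greater
    than T get the label 1, all other pixels get the label 0."""
    return (f > T).astype(np.uint8)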
So, this is the simplest thresholding operation that can be used for image segmentation purposes. The other kind of segmentation under this similarity based approach can be a region growing based approach. The way this region growing works is: suppose we start from any particular pixel in an image, then we group all the other pixels which are connected to this particular pixel, that means the pixels which are adjacent to this particular pixel and which are similar in intensity value.
So, our approach is that you start from a particular pixel and group all other pixels which are adjacent to this particular pixel and which are similar in some sense; in the simplest case, similar in some sense means that the intensity value of the adjacent pixel is almost the same as the intensity value of the pixel from where we have started growing the region.
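A minimal region growing sketch under these assumptions (4-adjacency, similarity measured as closeness of intensity to the seed pixel, NumPy for the image array) is given below; the tolerance value is illustrative.

import numpy as np
from collections import deque

def region_grow(f, seed, tol=10):
    """Grow a region from a seed pixel: repeatedly add 4-connected neighbors
    whose intensity is within tol of the seed intensity (a simple similarity
    criterion; other criteria are possible)."""
    rows, cols = f.shape
    region = np.zeros((rows, cols), dtype=bool)
    seed_val = int(f[seed])
    region[seed] = True
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # 4-adjacency
            nx, ny = x + dx, y + dy
            if 0 <= nx < rows and 0 <= ny < cols and not region[nx, ny]:
                if abs(int(f[nx, ny]) - seed_val) <= tol:
                    region[nx, ny] = True
                    queue.append((nx, ny))
    return region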
So, starting from this particular pixel, you try to grow the region based on connectivity, or based on adjacency and similarity. This is the region growing based approach. The other approach under this similarity based technique is called region splitting and merging. Under region splitting and merging, what is done is this: first you split the image into a number of different components following some criterion, and after you have split the image into a number of smaller sized sub-images or components, you try to merge some of those sub-images which are adjacent and which are similar in some sense.
980
So, the first operation is that you split the image into smaller images and then try to merge those smaller sub-images, wherever possible, to have a larger segment. These are the different segmentation approaches that we can have, and in today's discussion and in subsequent discussions we will try to see the details of these different techniques. First, we will start our discussion with the discontinuity based image segmentation approach.
As we have already said, in the discontinuity based image segmentation approach our interest is mainly to identify isolated points, to identify the edges present in the image, or to identify the lines present in the image, and for detection of these kinds of discontinuities, that is detection of points, lines or edges, the kind of approach that we will take is the use of a mask.
981
So, using the masks we will try to detect isolated points, or we will try to detect the lines present in the image, or we will try to detect the edges in the image. Now, the use of masks we have discussed earlier in connection with image processing operations like image smoothing, image sharpening, image enhancement and so on. There we have said that we consider a 3x3 neighborhood like this and take a mask of size 3x3.
So, here on the right hand side is a mask of size 3x3 having coefficient values given as W-1,-1, W-1,0, W-1,1 and so on, with the center coefficient of the mask having the value W0,0. Now, in this mask processing operation what is done is that you shift this mask over the entire image to calculate a weighted sum of pixels at each location.
Say, for example, if I place this mask at a location (x, y) in our original image, then using the different mask coefficients we find a weighted sum
R = Σ Σ Wi,j f(x+i, y+j), where the sums run over i = -1 to 1 and j = -1 to 1,
and this quantity we call the value R. The use of this mask, as I have said, we have seen in connection with image sharpening, where we have taken different values of the coefficients; in the case of image smoothing we have taken the mask coefficients to be all 1s, so that it leads to an image averaging operation.
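The weighted-sum response R can be sketched directly from the formula above; the border handling and the use of NumPy are my own simplifications.

import numpy as np

def mask_response(f, w, x, y):
    """Compute R = sum over i, j in {-1, 0, 1} of w[i, j] * f(x+i, y+j) for a
    3x3 mask w centred at pixel (x, y); image borders are ignored here."""
    f = f.astype(np.float64)
    R = 0.0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            R += w[i + 1, j + 1] * f[x + i, y + j]
    return R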
So, depending upon the coefficient values of the mask that we choose, we can have different types of image processing operations. Here you find that when I use this mask, then depending upon the nature of the image around the point (x, y), I will have different values of R. When it comes to isolated point detection, we can use a mask with coefficient values like this: the center coefficient of the mask will have a value equal to 8 and all the other coefficients will have a value of -1.
982
(Refer Slide Time: 17:11)
Now, we say that a point is detected at a location (x, y) in the image where the mask is centered if the corresponding value of R, computed at that location, has an absolute value greater than a certain threshold T, where T is a non-negative threshold value.
So, if the absolute value of R computed at the location (x, y) where this mask is centered is greater than T, where T is a non-negative threshold, then we say that an isolated point is detected at the corresponding location (x, y). Similarly, for detection of lines the masks can be something like this. For detection of horizontal lines, you find that we have used a mask where the middle row has all its values equal to 1 and the top row and the bottom row have all their coefficient values equal to -1, and by moving this mask over the entire image it detects all those points which lie on a horizontal line.
983
(Refer Slide Time: 19:20)
Similarly, the other mask, which is marked here as 45, if you move it over the entire image, will help to detect all the points in the image which lie on a line inclined at an angle of 45°. Similarly, this mask will help to detect all the points which lie on a vertical line, and this mask will detect all the points lying on a line inclined at an angle of -45°.
Now, for line detection, what is done is that you apply all these four masks on the image. If I take a particular mask, say the i-th mask, and any other mask, say the j-th mask, and I find that the absolute value of Ri, computed with the i-th mask, is greater
984
than the absolute value of Rj, the value computed with the j-th mask, for all j ≠ i, this says that the corresponding point is more likely to be associated with a line in the direction of mask i.
So, as we said, what we are doing is taking all the four masks, applying them on the image and computing the value of R for each of these masks. If for the i-th mask I find that the absolute value of Ri is greater than the absolute value of Rj for all j not equal to i, then we can conclude that the point at which this is true is more likely to lie on a line in the direction of mask i.
So, these are the 2 approaches: first we have given a mask which is useful for identification of isolated points, and second a set of masks which is useful for detection of points lying on a straight line. Now, let us see how we can detect an edge in an image. Edge detection is the most commonly used approach for detection of discontinuities in an image.
We say that an edge is nothing but a boundary between 2 regions having distinct intensity levels or distinct gray levels. So, it is the boundary between 2 regions in the image, where these 2 regions have distinct intensity levels.
So, as is shown in this particular slide. So, here you find that on the top, we have taken 2
typical cases. In the first case we have shown a typical image region, where we have a
transition from a dark region to a brighter region and then again to a dark region.
985
So, as you move from left to right, you find that you have transitions from dark to bright and then again to dark, and in the next one, as we move from left to right in the horizontal direction, there is a transition of intensity levels from bright to dark and again to bright. These are the typical scenarios in any intensity image, where we have different regions having different intensity values and an edge is the boundary between such regions.
Now, in this particular case, if I try to draw the intensity profile along a horizontal line, you find that the intensity profile will be something like this: we have a transition from a dark region to a bright region and then from the bright region to a dark region, whereas in the second case the transition will be in the other direction, from bright to dark and again to bright.
Here you find that we have modeled this transition as a gradual transition, not as an abrupt one; the reason is that because of quantization and sampling, almost all the abrupt transitions in the intensity levels are converted to such gradual transitions. So, this is our intensity profile along a horizontal scan line. Now, let us see what happens if I differentiate this. If I take the first derivative of this intensity profile, then in the first case the first derivative will be something like this, and the first derivative of the second profile will be something like this.
So, we find that the first derivative responds whenever there is a discontinuity in intensity levels, that is, whenever there is a transition from a brighter intensity to a darker intensity or from a darker intensity to a brighter intensity. This is what we get from the first derivative. Now, if I take the second derivative, it will appear something like this, and in the second case the second derivative will be just the opposite; it will be of this form.
So, you find that the first derivative is positive at the leading edge, whereas it is negative at the trailing edge, and similarly here. You also find that the second derivative is positive on the darker side of the edge and negative on the brighter side of the edge, and that can be verified in both situations: the second derivative becomes positive on the darker side of the edge but negative on the brighter side of the edge.
However, we will appreciate that this second derivative is very sensitive to the noise present in the image, and that is the reason the second derivative operators are not usually used
986
for edge detection. But, as the nature of the response shows, we can use these second derivative operators for extraction of some secondary information: we can use the sign of the second derivative to determine whether a point is lying on the darker side of the edge or on the brighter side of the edge. Not only that, you find that there are zero crossings in the second derivative.
This zero crossing information can be used to identify exactly the location of an edge whenever there is a gradual transition of the intensity from dark to bright or from bright to dark. So, we have seen earlier that the derivative operators are used for image enhancement, to enhance the details present in the image; now we see that these derivative operations can also be used for detection of the edges present in the image.
Now, how do we apply these derivative operations? If I want to apply the first derivative, then the first derivative can be computed by using the gradient operation. When I have an image, say f(x, y), I can define the gradient of this image in this form: the gradient ∇f = [Gx, Gy]^T, which is obviously a vector, where Gx = ∂f/∂x and Gy = ∂f/∂y.
So, Gx is the partial derivative of f along the x direction and Gy is the partial derivative of f along the y direction, and we can find the gradient of the image f by doing this operation. Now, for
987
edge detection, what we are interested in is the magnitude of the gradient. The magnitude of the gradient we write as |∇f|, which is nothing but the magnitude of the vector ∇f, that is |∇f| = (Gx² + Gy²)^(1/2). You find here that computation of the magnitude involves squaring the 2 components Gx and Gy, adding them, and then finally taking the square root of this sum.
Obviously, squaring and computing the square root are computationally intensive operations, so an approximation of the magnitude of the gradient is taken as the sum |Gx| + |Gy|, where Gx is the gradient in the x direction and Gy is the gradient in the y direction. This magnitude of the gradient, whether I take the exact expression or the approximation, tells us the strength of the edge at location (x, y).
It does not tell us anything about the direction of the edge at the point (x, y). So, we also have to compute the direction, that is, the direction of ∇f, and the direction of the gradient vector ∇f at location (x, y) we can write as α(x, y) = tan⁻¹(Gy / Gx), where Gy, as we have said, is the gradient in the y direction and Gx is the gradient in the x direction.
Now, we find that this α(x, y) tells us the direction of the gradient vector; the edge direction is actually perpendicular to the direction of ∇f. So, we have the first derivative operators or the gradient operators, and using the gradient operators we can find out the strength of an edge at a particular location (x, y) in the image, and
988
we can also find out the direction of the edge at that particular location (x, y), and there are various ways in which these first derivative operators can be implemented. Here we will show some operators, some masks, which can be used to compute the image gradient.
So, the first one that we are showing is called the Prewitt edge operator. You will find that in the case of the Prewitt edge operator we have 2 masks: one mask identifies the horizontal edges and the other mask identifies the vertical edges. The mask which finds the horizontal edges is equivalent to taking the gradient in the vertical direction, and the mask which finds the vertical edges is equivalent to taking the gradient in the horizontal direction.
By passing these 2 masks over the intensity image, we can find the Gx and Gy components at different locations in the image, and once we compute Gx and Gy, we can find out the strength of an edge at that particular location and the direction of the edge at that particular location. The second mask, which is also a first derivative mask, is called the Sobel operator.
989
(Refer Slide Time: 33:53)
So, here again you find that we have 2 different masks: one mask is responsible for computation of the horizontal edges, the other mask is responsible for computation of the vertical edges. Now, if you try to compare the Prewitt edge operator and the Sobel edge operator, you will find that the Sobel edge operator gives an averaging effect over the image. Because of this averaging effect, the effect of the spurious noise present in the image is taken care of to some extent by the Sobel operator, but it is not taken care of by the Prewitt operator.
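A minimal sketch of the Sobel gradient computation, using the |Gx| + |Gy| approximation for the magnitude and arctan2 (rather than a bare tan⁻¹(Gy/Gx), which fails when Gx = 0) for the direction, is given below; the use of SciPy's correlate to slide the masks is my own assumption.

import numpy as np
from scipy.ndimage import correlate

# Sobel masks: SOBEL_X gives the gradient in the x (horizontal) direction and so
# responds to vertical edges; SOBEL_Y gives the gradient in the y direction.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)

def sobel_gradient(f):
    """Return the edge strength |grad f| (approximated as |Gx| + |Gy|) and the
    gradient direction alpha at every pixel of a gray scale image."""
    f = f.astype(np.float64)
    gx = correlate(f, SOBEL_X)
    gy = correlate(f, SOBEL_Y)
    magnitude = np.abs(gx) + np.abs(gy)
    direction = np.arctan2(gy, gx)
    return magnitude, direction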
990
Now, let us see what kind of result we can have by using these edge detection operators; here we have shown the result on a particular image. On the top left is our original image; on the top right is the edge information obtained using the Sobel operator, where the edge components in this case are the horizontal components.
The third image is again obtained using the Sobel operator, but here the edge components are the vertical edge components, and the fourth one is the result obtained by combining the vertical components and the horizontal components. So, if you compare this image with the original image, you find that the different edges present in the original image have been extracted by using this Sobel edge operator, and by combining the output of the vertical mask and the output of the horizontal mask we can identify the edges which lie in various directions.
So, that is what we have got in the fourth image. The Prewitt operator and the Sobel operator, as we have said, are basically first derivative operators, and as we have already mentioned, for edge detection the derivative operators which are used are mainly the first derivative operators; out of these two, the Prewitt and the Sobel operator, it is the Sobel operator which is generally preferred, because the Sobel operator also gives a smoothing effect by which we can reduce the spurious edges which can be generated because of the noise present in the image. We have also mentioned that we can use the second derivative operator for edge detection, but the disadvantages of the second derivative operator are that it is sensitive to noise and, secondly, as we have seen, that it gives us double edges.
For every transition we have double edges generated by the second derivative operators. These are the reasons why second derivative operators are not normally preferred for edge detection. But the second derivative operators can be used to extract secondary information: as we have said, by looking at the polarity of the second derivative output we can determine whether a point lies on the darker side of the edge or on the brighter side of the edge, and the other information that we can obtain from the second derivative operator comes from the zero crossing.
991
We have seen that the second derivative always has a zero crossing between its positive side and its negative side, and the zero crossing points accurately determine the location of an edge whenever the edge is a smooth edge. So, though second derivative operators are not normally used for edge detection, they can be used for such secondary information extraction.
So, one such second derivative operator is what is called the Laplacian operator. We have seen the use of the Laplacian operator for enhancement of image details. Now, let us see how the Laplacian operator can be used to help in edge detection. As you already know, the Laplacian of the function f is given by ∇²f = ∂²f/∂x² + ∂²f/∂y², where ∂²f/∂x² is the second derivative along the x direction and ∂²f/∂y² is the second derivative along the y direction. We have also seen earlier that a mask which implements this second derivative operator is given by this, where we are considering only the horizontal direction and the vertical direction for computation of the second derivative, and we have also discussed earlier that if, in addition to the horizontal and vertical directions, we also consider the diagonal directions for computation of the second derivative, then the center element will be equal to 8 and all the diagonal elements will also be equal to -1.
So, this is the mask that we get if we consider the diagonal directions in addition to the horizontal and vertical directions for computation of the second derivative, and we can
992
also have the inverse of this, where all the negative signs become positive and the positive signs become negative.
So, this is how we can have a mask for computation of the second derivative, or the Laplacian, of a function f. But as we have said, this Laplacian operator is normally not used for edge detection because it is very sensitive to noise and, secondly, it leads to double edges at every transition; it plays a secondary role, to determine whether a point lies on the brighter side or on the darker side of an edge, and it is also used to accurately locate the position of an edge.
Now, as we said, the Laplacian operator is very sensitive to noise. To reduce the effect of noise, what is done is that the image is first smoothed using a Gaussian operator, and that smoothed image is then operated on by the Laplacian operator; these 2 operations can be combined into a single operator which is called the Laplacian of Gaussian or LoG operator.
So, the essence of the LoG or Laplacian of Gaussian operator is this: we take a Gaussian operator, which can be represented as h(x, y) = exp(-(x² + y²)/(2σ²)). This is our Gaussian operator having a standard
993
deviation of σ. Now, if we let x² + y² = r², then the Laplacian of this Gaussian is ∇²h = ((r² - σ²)/σ⁴) exp(-r²/(2σ²)).
So, as we said, our operation is: first we want to smooth the image using the Gaussian operator, and that smoothed image has to be operated on by the Laplacian operator; if these 2 operations are done one after another, this reduces the effect of the noise present in the image. However, these 2 operations can be combined into a Laplacian of Gaussian operation, which means we can operate on the image directly with the Laplacian of a Gaussian.
Applying the Laplacian of a Gaussian to the image gives us an equivalent result. In this slide we have shown the Laplacian operator, a Gaussian mask in 2 dimensions, and the Laplacian of the Gaussian, which appears as shown here. This Laplacian of Gaussian can again be represented in the form of a mask, which is called a Laplacian of Gaussian mask.
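A rough sketch of building and applying a LoG mask from the expression above is given below; sign conventions for the LoG differ between texts, and only the zero crossings matter for edge location, so the sampled kernel is simply forced to sum to zero. The 5x5 size and the sigma value are illustrative choices of mine, not necessarily the ones behind the mask on the slide.

import numpy as np
from scipy.ndimage import convolve

def log_kernel(size=5, sigma=1.0):
    """Sample the Laplacian of Gaussian
    ((r^2 - sigma^2) / sigma^4) * exp(-r^2 / (2 sigma^2)), with r^2 = x^2 + y^2,
    on a size x size grid centred at the origin."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    kernel = ((r2 - sigma**2) / sigma**4) * np.exp(-r2 / (2 * sigma**2))
    return kernel - kernel.mean()        # force the coefficients to sum to zero

def log_filter(f, size=5, sigma=1.0):
    """Gaussian smoothing and Laplacian combined into a single convolution."""
    return convolve(f.astype(np.float64), log_kernel(size, sigma))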
994
So, if I represent this Laplacian of Gaussian in the form of a 2-dimensional mask, the Laplacian of Gaussian mask appears like this. Here you find that the Laplacian of Gaussian or LoG mask that we have shown is a 5x5 mask, and if you compare this with the Laplacian of Gaussian expression and its surface, you find that at x = 0 the Laplacian of Gaussian is positive, then it becomes negative, reaches its maximum negative value, and then moves towards the value 0; the same behavior is obtained using this particular mask, where you find that at the center the value is the maximum positive one, which is 16, just away from this it becomes -2, and then it goes towards zero, becoming -1.
995
So, if I apply this Laplacian of Gaussian mask to an image, I can detect the location of the edge points. Here we have shown an image, and on the right hand side we have shown the output that is obtained using the Sobel operator; the bottom one shows the output of the LoG operator. You find that all these bright edges are actually the locations of the edges present in the original image.
So, this establishes, as we said earlier, that the LoG operator, the Laplacian of Gaussian operator, can determine the location of an edge present in an image. Now, whichever operator we use for detection of edges, whether the first derivative operators or the second derivative operators, as we said, the second derivative operators are not normally used for edge detection because of the problems mentioned, but they are used to extract the secondary information.
The first derivative operators like the Sobel operator should ideally give us all the edge points, that is, every transition from a bright region to a darker region or from a darker region to a brighter region. But you find that when you take an image, maybe because of noise or maybe because of non-uniform illumination of the scene, and apply the Sobel operator to it, the edge points that you get are not always connected. So, what we need to do is link the edge points to get some meaningful edges, to extract some meaningful edge information. There are usually 2 approaches in which this linking can be done: one is the local processing approach and the other one is the global processing approach.
996
So, whether we are going for local processing or for global processing, our aim is that we want to link all those edge points which are similar in some sense, so that we can get a meaningful edge description. First we will talk about the local processing approach for edge linking. In the local processing technique, what is done is that you take an image which is already edge operated.
For the edge operation, suppose the image has already been operated on by the Sobel edge operator; then we consider each and every point in that edge image. Let us take a point (x, y) in the image which has already been operated on by the
997
Sobel edge operator; then we will link all other points in that edge image which are in some neighborhood of (x, y) and which are similar to (x, y). So, when I say that 2 points are similar, we must have some similarity measure.
So, for this similarity measure, what we use is the first one is the strength of the gradient
operator and we also use the direction of the gradient. So, these 2 together are taken as
similarity measure to consider whether we will say that 2 points are similar or not.
So, our operation will be something like this: we take a point (x', y') which is in the neighborhood of some point (x, y) in the image, and we say that these 2 points (x', y') and (x, y) are similar if the strengths of the gradient operator at locations (x, y) and (x', y') are very close, that is |∇f(x, y) - ∇f(x', y')| ≤ T for some nonnegative threshold T, and if the directions are also similar, that is |α(x, y) - α(x', y')| ≤ A for some angle threshold A.
So, whenever we have a point (x', y') , which is in some neighborhood of (x, y) and the points
are similar that means they have the similar gradient magnitude value and the similar angle
for the edge orientation, we say that these 2 points are similar and those points will be linked
together and such operation has to be done for each and every other point in the edge detected
image to give us some meaningful edge description.
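(As a rough illustration of the local linking rule just described, here is a minimal Python sketch. The gradient magnitude `mag`, the gradient direction `ang`, the binary edge mask, the thresholds `T` and `A`, and the 3x3 neighbourhood are all illustrative assumptions, not values from the lecture.)

```python
# Illustrative sketch of the similarity-based local edge linking described above.
import numpy as np

def link_similar_neighbours(mag, ang, edge_mask, T=25.0, A=np.deg2rad(15)):
    """Return pairs of neighbouring edge points whose gradient strength and
    direction are similar (|difference in magnitude| <= T, |difference in angle| <= A)."""
    rows, cols = mag.shape
    links = []
    ys, xs = np.nonzero(edge_mask)
    for y, x in zip(ys, xs):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols and edge_mask[ny, nx]:
                    if (abs(mag[y, x] - mag[ny, nx]) <= T and
                            abs(ang[y, x] - ang[ny, nx]) <= A):
                        links.append(((x, y), (nx, ny)))
    return links
```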
So, let us stop our discussion at this point today, we will continue with our discussion in our
next class. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-57.
Image Segmentation: Global Processing (Hough Transform).
Hello, welcome to the video lecture series on digital image processing. What we will be discussing today is a global processing approach, which is also called the Hough transformation. So, after today's lecture the student will be able to explain and implement the local processing technique for linking the edge points and also the global processing technique; that is, the student will be able to implement the Hough transformation to link the edge points.
So, let us just see what we have seen in the last class. We have said that ideally an edge detection technique should identify the pixels lying on the boundary between the regions. We say it is the boundary between the regions because we assume that it is a transition region, from a low intensity region to a high intensity region or from a high intensity region to a low intensity region. But while trying to
implement this, it has been found that the edge points which we expect to be continuous to
give us a meaningful boundary description of a segment that cannot be achieved in practice.
Now, this is mainly due to two reasons. The first one is non-uniform illumination of the scene: if the scene is not uniformly illuminated, that leads to detection of edge points where the boundary points will not be continuous. The second reason for getting these non-continuous boundary points is the presence of noise: if the image is noisy then, after doing the edge detection operation, either the boundary points will not be continuous or there may be some spurious edge points which are not actually edge points or boundary points of any of the regions.
So, to tackle this problem we have to go for linking of the edge points so that after linking we
will get a meaningful description of the boundary of a particular segment. So, we have said
that there are mainly two approaches for edge linking operation. The first approach is the
local processing approach and the second approach is the global processing approach. In the
local processing approach what we do is we take an edge detected image, that is the image
that we have as an input, this is an image containing only the edge pixels.
So, we assume that edge points will be white and all the non edge points will be black, and in this edge detected image we analyze each pixel in a small neighborhood. So, for every point (x, y), if that is an edge pixel, we take a small neighborhood of point (x, y) and we link the other edge points within this neighborhood with the point (x, y) if they are similar in nature. So, whenever we find that within the neighborhood we have two edge points which are similar in nature then we link these edge points, and after linking all such edge points, we get a boundary of pixels that are similar in nature.
So, basically what we get is a well-defined boundary of a particular segment. So, when we
say that we have to link the edge points which are similar in nature, then we have to have
some similarity measure. So, remember that after edge detection operation for every edge
point we get two quantities, one is the boundary strength at that particular edge point and the
second quantity that is the direction of edge at that particular edge point.
So, by comparing the boundary strength as well as the direction of boundary at point (x, y) and at a point which is in the neighborhood of (x, y), we try to find out whether these two points are similar or not. So, if these two points are similar we simply link them.
(Refer Slide Time: 5:21)
So, for this what we have to do is: we take an edge point (x, y) and we find out a point (x', y'), which is in the neighborhood of (x, y). So, what we are doing is, we are taking this point (x, y) and considering the point (x', y'), which is in the neighborhood of (x, y), and we find out the difference of edge strength. We know that f(x, y) is the image function, giving the intensity value at location (x, y) in the image, and ∇f(x, y) gives the strength of the gradient at that location.
So, we compute the gradient at location (x, y) and also the gradient at location (x', y') and if
the difference between these two is less than or equal to certain threshold T, where T is a
non-negative threshold; this gives you whether the strength is similar or not. At the same time we also have to check the direction of the edge, which is given by α(x, y) at location (x, y) and α(x', y') at location (x', y'). So if the
orientation or the direction of the edge is also similar, that is the difference is less than or
equal to some angle threshold value A, then we consider these two points (x, y) and (x', y') to
be linked together.
So, in this particular case our (x', y') has to be in the neighborhood of (x, y); that is what is represented by (x', y') ∈ N_xy. So, this is the local processing technique, but as we said, we are going for linking of the edge points because the edge points are discontinuous, and normally the neighborhood size that is taken is a small one. So, if (x', y') is not within the neighborhood of (x, y), for a given definition of the neighborhood, in that case (x', y') cannot be linked with the edge point (x, y).
Now, the two edge points (x, y) and (x', y') can be far apart depending upon the amount of noise that you have or depending upon the lighting condition, but we should be able to link those points as well. So, in such cases the local
processing technique does not help to link the edge points. What we have to go for is the
global processing technique and the global processing technique that we will discuss today is
called the Hough transformation. So, it is Hough transform. So, what is this Hough
Transform?
The Hough transform is a mapping from the spatial domain to a parameter space. So, let us
take an example, suppose I have this x, y co-ordinate system and I have a single straight line
in this x, y co-ordinate system and we know that in the slope intercept form, this straight line
is described by an equation which is given by y = mx + c , where m is the slope of the
straight line and c is the intercept value.
Now, for a particular straight line the values of m and c will be constant. So, I represent them
by m1, c1 indicating that these two values, the slope and the intercept are constant for a
particular given straight line in the x-y plane. So, you find that this particular straight line is
now defined, is now specified by two parameters, one is one of the parameters is m1, which is
the slope of the straight line and the other parameter is c1, which is intercept.
Now, if I map this straight line to the parameter space: because I have two parameters m and c, that is slope and intercept, our parameter space will also be a 2 dimensional space. So, what I am trying to do is to map this straight line into the parameter space. So, I draw this mc plane; I will have the slope m along one direction and the intercept c along another direction, and since for this given straight line y = m1x + c1, m1, that is the slope, and c1, that is the intercept, are constants, this particular straight line will be represented by a single point in the mc plane, and this point is at location (m1, c1). So, we find that when I map a given straight line in the spatial domain to the parameter space, a straight line gets mapped to a single point. Now, let
us see what happens if we are given a point in the spatial domain that is in the x-y plane, we
are given a particular point let us see the situation, what will happen in this case.
So, now what I have is, I have again this x-y plane and in the x-y plane, I have a single point
and let us assume the co-ordinate of this point is (x1, y1). Now, you find that equation of any
straight line in the x-y plane as we have seen earlier in the slope-intercept form is given by y
is equal to mx plus c. Now, if this straight line y is equal to mx plus c has to pass through this
given point (x1, y1), then (x1, y1) must satisfy this equation.
So, in effect what I will get is, I will get an equation that is y1 = mx1 + c , because this line
y = mx + c is passing through the given point (x1, y1) and this is the equation that has to be
satisfied by all the straight lines that passes through this point (x1, y1). Now, we find that
ideally I can have infinite number of straight lines passing through this given point (x1, y1).
So, there will be infinite number of straight lines like this and for each of these straight lines
the value of the slope that is m and the intercept c it will be different.
So, if I now go to our parameter space, that is the mc plane, you find that m and c become the variables whereas y1 and x1 are the constants. So, now what I can do is rewrite this equation y1 = mx1 + c in this way: I can write it as c = -mx1 + y1. So, here what I have is x1 and y1, these two are constants, and c and m are variables.
So, if I map this point (x1, y1) into our parameter space, what I will have now is this mc-plane, and in the mc-plane you find that c = -mx1 + y1 is now the equation of a straight line. So, effectively what I get is a straight line in the mc-plane following the equation c = -mx1 + y1. So, we have seen two cases: in one case a straight
line in the x-y plane is mapped to a point in the mc-plane and in the other case if we have a
point in the x-y plane that is mapped to a straight line in the mc-plane. And this is the basis
of the Hough transformation by using which we can link the different edge points which are
present in the image domain which is nothing but the spatial domain or we can say that this is
nothing but our x-y plane.
So, now let us see that, what happens if I have two points in our spatial domain or the x-y
plane. So, again I go to our spatial domain or x-y plane. So, this is my x-axis and this is my y-
axis and suppose I have two points, one is say (xi, yi) and the other point I have in this spatial
domain is (xj, yj). Now, I draw a straight line passing through these points (xi, yi) and (xj, yj). So, this is the straight line which passes through these two points (xi, yi) and (xj, yj), and we know that this straight line will have an equation of the form y = m'x + c'.
So, what we have to do by using the Hough transformation is that, given these two points (xi,
yi) and (xj, yj), we have to find out the parameters or the equation of the straight line which is
passing through these two points (xi, yi) and (xj, yj). Now, as we have seen earlier that a point
in the x-y plane is mapped to a straight line in the mc-plane or in the parameter space. So,
here in this case since we have two points in the x-y plane this will be mapped to two
different straight lines in the mc-plane.
So, if I find the mapping in the mc-plane, it will be something like this. So, I have this parameter space or mc-plane. The first point (xi, yi) will be mapped to a straight line like this, where the equation of this straight line will be given by c = -mxi + yi, and the second point will be mapped to another straight line, say something like this, where the equation of this straight line is given by c = -mxj + yj, and
you find that the point at which these two straight lines meet, that is this particular point this
is the one which gives me the values of m and c.
So, this will give me the value of m' and c' , and this m' and c' are nothing but the
parameters of the straight line in the x-y plane, which passes through these two given points
(xi, yi) and (xj, yj). Now, if I consider that there are infinite numbers of points or there are
large number of points lying on the same straight line in the x-y plane, each of these points
will be mapped to a particular straight line in the mc-plane like this. But each of these straight
lines will pass through this single point (m', c') in the parameter space.
So, by this, what we have seen is: a single point in the spatial domain, or in the x-y plane, is mapped to a single straight line in the parameter space, that is the mc-plane. So, if I have a set of collinear points in the x-y plane, each of these collinear points will be mapped to a single straight line in the parameter space or in the mc-plane. But all these straight lines, corresponding to the points which are collinear in the x-y plane, will intersect at a single point, and this point in this case is (m', c'), the values of which, that is m' and c', give us the slope and intercept values of the straight line on which all those collinear points in the spatial domain lie. And this is the basic essence
of using Hough transformation for linking the edge points.
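(A tiny numeric check of this duality, with arbitrarily chosen coordinates, may make it concrete; the numbers below are purely illustrative and do not come from the lecture.)

```python
# Two collinear points give two lines c = -m*x + y in the m-c plane; their
# intersection (m', c') is the slope and intercept of the line joining them.
xi, yi = 1.0, 3.0          # illustrative point (xi, yi)
xj, yj = 4.0, 9.0          # illustrative point (xj, yj)

m_prime = (yi - yj) / (xi - xj)        # intersection of c = -m*xi + yi
c_prime = -m_prime * xi + yi           # and c = -m*xj + yj
print(m_prime, c_prime)                # 2.0 1.0
print(yj == m_prime * xj + c_prime)    # True: (xj, yj) lies on y = m'x + c'
```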
Now, how do you compute this Hough transformation? So far, what we have discussed is in the continuous domain; that is, our assumption is that the values of m and c are continuous variables. But in our implementation, since we are considering the digital case, we cannot have continuous values of m and c.
So, we have to see how this Hough transformation can really be implemented. For implementation, what we have to do is subdivide this entire mc space into a number of accumulator cells, as has been shown in this particular figure. So, here you find that what we have done is this mc space, the mc-plane, is divided into a number of smaller accumulator cells. So, here we have a range of slopes, which is the expected range of slopes in a particular application, and the range is from a minimum slope to a maximum slope.
So, this is the minimum slope value mmin and this is the maximum slope value, mmax.
Similarly, for c this is also sub-divided and the total range is within an expected maximum
value and a minimum value. So, cmin is the expected minimum value of the intercept c and
cmax is the expected maximum value of the intercept c. So, within this minimum and
maximum, within this range, this space is divided into a number of accumulator cells. And this 2 dimensional array, let us name it, say, A, and each of these accumulator cell locations can be indexed by, say, indices i and j.
So, consider a cell at location (i, j); I call this location cell location (i, j), and the corresponding accumulator cell will have a value, say, A(i, j). This particular (i, j)th cell corresponds to the parameter values, let us say, mi and cj. So, an accumulator cell (i, j) having an accumulated value A(i, j) corresponds to the parameter values mi and cj, and for implementation what we do is initialize each of these accumulators to 0.
So, initially A(i, j) is set to 0 and that is our initialization. Okay. So, once we have this array
of accumulator cells then what we do in the spatial domain we have a set of boundary points
and in the parameter space we have a 2 dimensional array of accumulator cells.
So, what we have to do is, we have to take a boundary point, say (xk, yk), a single boundary point in the spatial domain, and we have seen earlier that this boundary point (xk, yk) is mapped to a straight line in the parameter space, that is in the mc-plane, and the equation of this straight line is given by c = -mxk + yk. Okay.
So, what we have to do is find out the values of m and c from this particular equation. Now, in our case the values of m and c are not continuous but discrete, and as we have said, an (i, j)th accumulator cell corresponds to the parameter values mi and cj. So, to solve for the values of m and c, our basic equation is c = -mxk + yk.
So, here what we do is allow the value of m to vary from the minimum to the maximum, as we have said we have chosen a range mmin to mmax. So, we allow the value of m to take all possible or all allowed values as specified in our accumulator cells, ranging from mmin to mmax, and for each of these values of m we solve for the corresponding value of c, following this particular equation c = -mxk + yk.
Now, the value of c that you get by solving this particular equation for a specific value of m may be a real number, whereas we have to deal with the discrete case. So, whenever we get a value of c which is not allowed as per our definition of the accumulator cells, what we have to do is round off this value of c to the nearest allowed value of c as specified in the accumulator cells.
So, if I have, say, M possible values of m, I get M corresponding values of c by solving this equation c = -mxk + yk. So, suppose, following this, for a particular choice of m (we have already used k for something else, so instead of calling it mk let us call this particular value of m, say, mp), when I put this value of mp in the equation, the corresponding value of c I get after solving is, say, cq. And you remember that we have initialized our accumulator cells so that all the cells have a value of 0.
So, whenever for a particular value of mp I get an intercept value cq, the operation I do is to increment the corresponding accumulator cell A(p, q) by 1, that is, I make A(p, q) = A(p, q) + 1. So, this I have to do for all the boundary points in our spatial domain, that is in the x-y plane, and for each of these boundary points I have to compute it for every possible or every allowed value of m in our parameter space.
So, because for every computed pair of values (mp, cq) I am incrementing the corresponding accumulator cell by 1, where the accumulator cells were initialized to 0, consider what happens at the end of the process, after we have considered all the boundary points in the spatial domain: for each of these points, for all allowed values of m, I find out the corresponding allowed values of c, and for each such pair (mp, cq) I do this increment operation on the accumulator cell.
So, at the end, if an accumulator cell, say A(i, j), contains a value of say Q, this indicates that there are Q points lying on a straight line whose equation is given by y = mi x + cj, because, as we said, this accumulator cell (i, j) corresponds to the slope value mi and the intercept value cj, and for every point, wherever I get a corresponding value of m and c, the corresponding accumulator cell is incremented by 1. So, at the end of the process, if a particular accumulator cell A(i, j) contains a value Q, this is an indication that in the spatial domain I have Q points, or boundary points, which are lying on the straight line y = mi x + cj. Now, the question is, what is the accuracy of this particular procedure, that is, how accurate is this estimation of mi and cj? That depends upon how many accumulator cells I have in the accumulator array.
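(The accumulator-cell procedure just described could be sketched in Python as below; the slope and intercept ranges, the number of cells and the list of boundary points are illustrative assumptions, not values from the lecture.)

```python
# Illustrative sketch of the slope-intercept (m-c) Hough accumulator.
import numpy as np

def hough_mc(points, m_range=(-2.0, 2.0), c_range=(-100.0, 100.0),
             n_m=200, n_c=200):
    m_values = np.linspace(m_range[0], m_range[1], n_m)
    c_values = np.linspace(c_range[0], c_range[1], n_c)
    A = np.zeros((n_m, n_c), dtype=int)              # accumulator, initialised to 0
    for (x, y) in points:
        for i, m in enumerate(m_values):
            c = -m * x + y                            # c = -m*xk + yk
            if c_range[0] <= c <= c_range[1]:
                j = int(np.argmin(np.abs(c_values - c)))  # round to nearest allowed c
                A[i, j] += 1                          # A(i, j) = A(i, j) + 1
    return A, m_values, c_values

# A cell A[i, j] holding a value Q says that Q of the input points lie
# (approximately) on the line y = m_values[i] * x + c_values[j].
```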
So, if I have a very large number of accumulator cells in the accumulator array then the accuracy of the computed m and c will be quite high, whereas if I have a small number of accumulator cells in our accumulator array then the accuracy of these computed values will be quite low. Now, the question is how we can use this to detect the number of straight lines
present in the spatial domain. Let us consider a case like this.
So, I have an image in the spatial domain and the boundary points of this image are something like this. So, these are the boundary points lying on straight lines. So, when I compute this Hough transformation I get an accumulator array. So, you find that on this particular straight line there are 1, 2, 3, 4, 5, 6, 7, 8, eight points; on this straight line there are 1, 2, 3, 4, 5, five points; on this part of the straight line there are again five points; on this part of the straight line there are four points. And if I consider that this part is also a straight line, then on this straight line there are three points.
So, in our accumulator array, at the end of this Hough transformation operation, I will get one cell with value equal to 8, another cell with value equal to 5, one more cell with value equal to 5, one cell with value equal to 4 and another cell with value equal to 3. Now, if I say that I will consider a straight line to be significant if the corresponding accumulator cell contains a value greater than or equal to 3, then by this process I will be able to detect all five of these straight lines. But if I say that I consider only those straight lines to be significant where the number of collinear points lying on the straight line is greater than or equal to 4, then I will be detecting only this straight line, this straight line, this straight line and this straight line, that is, the four lines containing at least four points.
So, again here you find that, by choosing how many points should lie on a straight line for the straight line to be considered significant, that is, to be considered a boundary straight line, this can also be varied or tuned depending upon the application, by choosing the threshold on the number of points lying on the straight line.
Now, here you find that in the mc-plane, though we are able to find out the straight line segments, this particular formulation of the Hough transformation, that is, the mapping from the x-y domain to the parameter domain, the mc-plane, has a serious problem. The problem is that in the mc-plane what we are trying to do is find out the slope and intercept values of the straight line in the spatial domain. Now, the problem comes when this straight line tends to be vertical, that is, parallel to the y-axis. If the straight line is parallel to the y-axis, then the slope of the straight line, that is the value of m, tends to ∞, and in this formulation we cannot tackle a value of m which becomes very large or which tends to ∞. So, what should we do to solve this problem? To solve this problem, instead of considering the slope intercept form, what we can do is make use of the normal representation of a straight line.
(Refer Slide Time: 38:52)
So, the normal representation of the straight line is given by this, the formula is
ρ = x cosθ + ysinθ and what do we get in case of this normal representation? The line that we
get is something like this. So, here again, I have this straight line in the x-y plane but instead
of taking the slope intercept form, where the equation was given by y = mx + c , I take the
normal representation, where the equation of the straight line is given by ρ = x cosθ + ysinθ .
What is ρ? ρ is the length of the perpendicular dropped on the straight line from the origin of the x-y plane, and θ is the angle made by this
perpendicular with the x-axis. So, you find that the parameters of the straight line which is
defined in this normal form, ρ = x cosθ + ysinθ . The parameters are ρ, which is the length of
the perpendicular drawn from the origin to the given straight line and θ, which is the angle
formed by this perpendicular with the x-axis.
So, unlike in the previous case where the parameters are the slope m and c now my
parameters become ρ and θ, and when I have these two parameters ρ and θ then the situation
is quite manageable that is I do not have the situation of leading to a parameter which can
take an infinite value. So, we find that in this particular case what can be the maximum value
of ρ and the maximum value of θ.
(Refer Slide Time: 41:23)
We consider the value of θ to range within ±90° and the value of ρ, that is the length of the perpendicular to the straight line from the origin, to be at most √(M² + N²), where M×N is the image size. And this is quite obvious, because if I have an image of dimension M×N, say with M rows and N columns, and this is the origin of the image plane, then you can find that I cannot draw any straight line for which the value of θ will be beyond the range ±90° or for which the value of ρ will exceed √(M² + N²). But what is the difference between our earlier formulation and this formulation?
Now, in this particular case our equation of the straight line is taken as ρ = x cosθ + y sinθ, whereas in our earlier case our equation was y = mx + c. So, here you find that, given a single point say (x1, y1) in the spatial domain, in the earlier parameter domain, the mc-plane, the corresponding equation becomes c = -mx1 + y1, which is again a straight line; in this case the corresponding equation becomes ρ = x1 cosθ + y1 sinθ, where x1 and y1 are constants for the given point (x1, y1), whereas ρ and θ are variables.
So, a particular point in the spatial domain is now mapped to a sinusoidal curve in the parameter domain, or in the ρ-θ space. We have seen that if I have Q collinear points in the x-y plane, they will be mapped to Q straight lines in the mc-plane, but all those straight lines will pass through a single point. In this case, the same Q collinear points in the x-y plane will be mapped to Q sinusoidal curves in the ρ-θ plane, but all those sinusoidal curves will intersect at a single point, which gives us the values of ρ and θ, which are the parameters of the straight line on which all those Q collinear points lie.
Now, this is the only difference between the mc-plane and the ρ-θ plane; apart from this, the formulation is exactly the same as before. So, for computation, again what we have to do is divide the ρ-θ space into a number of accumulator cells. So, the accumulator cells are given like this, as in this figure. Here again, an (i, j)th accumulator cell, which will have an accumulator value of say A(i, j), corresponds to our parameters θi and ρj.
So, again, as we have done in the previous formulation, for a given point, as we have seen, our equation becomes ρ = x cosθ + y sinθ, and for a given point, say (xk, yk), in the spatial domain our equation becomes ρ = xk cosθ + yk sinθ. What we do is allow the value of this variable θ to assume any of the allowed values as given in the accumulator cells.
So, the θ can assume any of these allowed values between this given maximum and minimum
and we solve for the corresponding value of the ρ. And because the solution of ρ that you get
may not be one of the allowed values. So, what we have to do is, we have to round off the
value of ρ to one of the nearest allowed values in our ρ axis.
So, again as before, if at the end of the process an accumulator cell, say A(i, j), contains a value equal to Q, this means that there are Q collinear points in the spatial domain lying on the straight line which satisfies the equation ρj = x cosθi + y sinθi. So, again as before, depending upon the number of points, by putting a threshold on the number of points required for a straight line to be considered significant, I can determine how many straight lines I will extract from the given boundary image, which will give me a meaningful boundary description.
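(A minimal Python sketch of this ρ-θ accumulator is given below; the one-degree and one-pixel cell resolutions are illustrative choices, and `edge_mask` is an assumed binary edge image, not data from the lecture.)

```python
# Illustrative sketch of the rho-theta (normal form) Hough transform.
import numpy as np

def hough_rho_theta(edge_mask):
    rows, cols = edge_mask.shape
    rho_max = int(np.ceil(np.hypot(rows, cols)))       # sqrt(M^2 + N^2)
    thetas = np.deg2rad(np.arange(-90, 90))            # theta in [-90, 90) degrees
    rhos = np.arange(-rho_max, rho_max + 1)            # 1-pixel rho cells
    A = np.zeros((len(rhos), len(thetas)), dtype=int)  # accumulator, initialised to 0
    ys, xs = np.nonzero(edge_mask)
    for x, y in zip(xs, ys):
        for j, theta in enumerate(thetas):
            rho = x * np.cos(theta) + y * np.sin(theta)
            i = int(round(rho)) + rho_max              # round to nearest allowed rho
            A[i, j] += 1
    return A, rhos, thetas

# Cells of A with large counts correspond to lines rho = x*cos(theta) + y*sin(theta)
# supported by many edge points; thresholding the counts keeps the significant lines.
```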
(Refer Slide Time: 50:02)
Now, let us see, by applying this technique, what kind of result we can get. Here, what we have shown is, as we have said, that every point in the spatial domain is mapped to a sinusoidal curve in the parameter domain, the ρ-θ plane. So, you find that this point 1 over here has been mapped to a straight line, the point 2 has been mapped to a sinusoidal curve as given by this, the point 3 again has been mapped to this particular sinusoidal curve, 4 has been mapped to this particular sinusoidal curve and 5 has been mapped to this particular sinusoidal curve. And now, if I want to find out the equation of the straight line passing through, say, the three points 2, 3 and 4, you find that these three sinusoidal curves for 2, 3 and 4 meet at this particular point in the ρ-θ plane.
So, the corresponding cell will have a value equal to 3, indicating that there are three points
lying on the straight line, which satisfy this particular value of θ and this particular value of ρ.
So, from here I can get the parameters of the straight line on which these three points 2,3,4
they are lying and same is true for other cases as well, for example, 1, 3 and 5, So, we find
that this is curve for 1, this is the curve for 3 and this is the curve for 5 and all of them are
meeting at this particular point.
(Refer Slide Time: 51:53)
So, whatever values of θ and ρ I get here, those are the parameters of the straight line passing through the points 1, 3 and 5. So, by applying this, you find that in one of our previous
classes we had shown this image, and after edge detection operation what we get is the edge
points as given on this right hand side. And now, if I apply the Hough transformation and try to detect the four most significant straight lines, I find these four straight lines which are most significant and the boundary which is specified by these four straight lines. So, I can always find out what these vertex locations are, and this is a rectangular region
which is actually the boundary of this particular object region. So, with this we come to the
end of our today’s discussion that is the global edge linking operation where the global edge
linking is done by using Hough transformation and as we have said that this Hough
transformation is nothing but a process of mapping from the spatial domain to the parameter
space. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-58.
Region Based Segmentation Operations. Thresholding Techniques.
Hello, welcome to the video lecture series on digital image processing. For the last few lectures, we have been discussing image segmentation operations and image analysis operations. In our last lecture we talked about discontinuity based image segmentation; we have discussed earlier that there are mainly two approaches to image segmentation. One is discontinuity based segmentation and the other one is similarity based image segmentation.
For last two classes, we have talked about the discontinuity based image segmentation and in
discontinuity based image segmentation we have seen, that the segmentation is done using the
characteristics of variation of intensity values when there is a variation of intensity from say
background to a foreground object. So, under this we have seen various point and the line and
edge detection operations which are used in this segmentation process.
Here the basic purpose was that an object is to be described by its boundary or its enclosing
boundary which are to be obtained using one of this discontinuity based operations. And we
have discussed that though we want that the object boundary should be continuous or it should
have a complete definition, but because of noise, or maybe because of non-uniform illumination, after performing these different edge detection operations the edge points that we get are not normally continuous.
So, to take care of this problem after this edge detection operation, the edge points that we get
they are to be linked. So, for that we have discussed about two different approaches. One is
local linking operation, where the edge points in the neighborhood are linked together if we
find that those two edge points are similar in nature, and for that as similarity criteria we have
taken the strength of the gradient operator or strength of the edge operator as well as the
direction of the edge at those points.
So, if we find that within a neighborhood two edge points have the similar edge strength and
also they have similar edge direction in that case those two points are linked together to be part
of the same edge. Now, here again the problem is that if the points are not in the small
neighborhood which is defined but the points are at a larger distance, in that case this local
edge linking operation does not help.
So, in such cases, what we have to go for is the global edge linking operation. So, we have
discussed a technique that is Hough transform. So, using Hough transform we have been able
to link the distant edge points and this is an operation called global edge linking operation or it
is the global processing technique.
Now, today we will start our discussion on the other type of segmentation, which is the
similarity based segmentation. So, under similarity based segmentation there are mainly three
approaches. One is called thresholding technique, the second approach is region growing
technique and the third approach is region splitting and merging technique.
Under thresholding technique again, we have four different types of thresholding, one is called
global threshold, the other type of thresholding is called dynamic or adaptive thresholding,
there is something called optimal thresholding and there is also a thresholding operation which
is called local thresholding. So, we will discuss about these different region based segmentation
operations either thresholding or region growing and the region splitting and merging
techniques one after another.
Now, let us first start our discussion with the thresholding technique. Thresholding is one of the simplest approaches to segmentation. Suppose we have an image; as we have said earlier, an image is described or represented by a 2 dimensional function f(x, y), and let us assume that this image contains a dark object against a light background.
So, in such cases, if there is a dark object against a light background, or even the reverse, that is a light object against a darker background, then you will find that the intensity values are mainly concentrated near two regions, or we call them two modes. One of the modes will be towards the darker side, towards the lower intensity values, and the other mode will be towards the brighter side, towards the higher intensity values.
So, if you plot the histogram of such an image, and here we are assuming that we have one object, that the object is brighter and that the background is dark, the histogram will appear something like this. On this side we put the intensity value z and on this side we have the histogram of z. As we said, because we have one object and we are assuming that the object is bright and is placed against a dark background, the histogram will be a bimodal histogram, where the intensities are concentrated on the dark side as well as on the brighter side. So, for such a bimodal histogram you find that there are two peaks, one peak here and the other peak here, and these two modes, or these two peaks, are separated by a deep valley. So, this is the valley, this is one peak and this is the other peak, and as we have assumed that our object is bright and the background is dark, all the pixels which are grouped in the lower intensity region belong to the background and the other group of pixels belongs to the object.
Now, the simplest form of segmentation is this: we choose a threshold value, say T, in this valley region and we take the decision that if a pixel at location (x, y) has the intensity value f(x, y) > T, then we say this pixel belongs to the object, whereas if f(x, y) ≤ T then this pixel belongs to the background. So, this is our simple decision rule which is to be used for the thresholding purpose. So, what we have to do is choose a threshold in the valley region and then check the image. The segmentation is simply testing each and every pixel to check whether its intensity value is less than the threshold or greater than the threshold. So, if the intensity value is greater than the threshold then we say that the pixel belongs to an object, whereas if the intensity value is less than or equal to the threshold we say that the pixel belongs to the background.
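(This decision rule is essentially a one-liner in Python; the toy array `f` and the threshold `T` below are illustrative assumptions only.)

```python
# Illustrative sketch of the simple thresholding rule above.
import numpy as np

f = np.array([[10, 200], [30, 180]], dtype=np.uint8)   # toy image
T = 100
g = (f > T).astype(np.uint8)   # 1 = object (bright side), 0 = background
```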
(Refer Slide Time: 10:37)
Now, the situation can be even more general: instead of having a bimodal histogram we can have multimodal histograms, that is, a histogram can even be of this form. So, this is our pixel intensity z and on this side is the histogram. So, here you find that the histogram has three different modes which are separated by two different valleys. So, now what we can do is choose one threshold, say T1, in the first valley region and the other threshold, T2, in, say, the second valley region. So, what this histogram indicates is that there are three different intensity regions, which are separated by some other intensity bands, and those three different intensity regions give rise to these three different peaks in the histogram.
So, here our decision rule can be something like this: if we find that the intensity value f(x, y) at a pixel location (x, y) is greater than threshold T2, then we say that the point (x, y) belongs to, say, object O2. So, all the pixels having intensity values greater than T2 we will say belong to the object O2. In the other case, if a pixel has an intensity value in this region, that is greater than T1 and less than or equal to T2, then we will say that this particular pixel belongs to object O1. So, our decision rule will be that if T1 < f(x, y) ≤ T2, this indicates that the corresponding pixel (x, y) belongs to object O1, and obviously the third condition will be that if f(x, y), the intensity value at a location (x, y), is less than or equal to threshold T1, in that case we say that the corresponding pixel (x, y) belongs to the background.
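(For the two-threshold case the same idea gives a three-way labelling; the values of `f`, `T1` and `T2` below are illustrative assumptions.)

```python
# Illustrative sketch of the two-threshold rule: 2 = object O2, 1 = object O1,
# 0 = background.
import numpy as np

f = np.array([[20, 120, 240], [90, 150, 60]])
T1, T2 = 100, 200
labels = np.zeros(f.shape, dtype=np.uint8)
labels[(f > T1) & (f <= T2)] = 1    # T1 < f(x, y) <= T2 -> object O1
labels[f > T2] = 2                  # f(x, y) > T2       -> object O2
```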
So, we can even have histograms with more peaks, more than three peaks. In such cases also a similar classification is possible, but what we have to do for this thresholding based segmentation technique is choose proper threshold values.
Now, this threshold value, or the thresholding operation, can be considered as an operation that involves testing against a function T, where this function T is of the form T = T[(x, y), p(x, y), f(x, y)]. So, this thresholding operation can be viewed as an operation to test the image pixels against a function T, where this function T is a function of (x, y), which is nothing but the pixel location in the image; of f(x, y), which is nothing but the intensity value at location (x, y), that is the pixel intensity at location (x, y); and of p(x, y), which is some local property in a neighborhood centered at (x, y). So, in general, this threshold T can be a function of the pixel location, the pixel value as well as the local property within a neighborhood around the pixel location (x, y); this neighborhood property can even be the average intensity value within a neighborhood around the pixel (x, y). So, T can be a function of any combination of these three terms, and depending upon this combination T can be either a global threshold or a local threshold, or it can even be an adaptive threshold.
So, in case the threshold T is only a function of f(x, y), we say that the threshold is a global threshold, whereas if T is a function of f(x, y) and the local property p(x, y), then we say that the threshold T is a local threshold. And if, in addition to all these, T is also a function of the location of the pixel, that is, in the more general case where T is a function of (x, y), f(x, y) as well as p(x, y), then we say that this threshold T is an adaptive or dynamic threshold.
Now, whatever the nature of the threshold T, whether it is local or global or adaptive, our thresholding operation is this: using this threshold we want to create a thresholded image, say g(x, y), from our input image f(x, y), and we set the value g(x, y) = 1 if the intensity of the image at that location satisfies f(x, y) > T. Now, this threshold T can be either global or local or adaptive, and we set g(x, y) = 0 if f(x, y) ≤ T. So, you find that the basic aim of this thresholding operation is to create a thresholded image g(x, y), which will be a binary image containing pixel values either 0 or 1, and this value will be set to 0 or 1 depending upon whether the intensity f(x, y) at location (x, y) is greater than T or less than or equal to T.
So, if we have a bright object against a dark background, then g(x, y) = 1 indicates that the corresponding pixel is an object pixel, whereas g(x, y) = 0 indicates that the corresponding pixel is a background pixel. On the contrary, if we have dark objects against a bright background, in that case what we will do is set g(x, y) = 1 if f(x, y) ≤ T, again indicating that in the thresholded image a pixel having an intensity value of 1 belongs to the object, and in such a case we will put g(x, y) = 0 if f(x, y) > T, again indicating that a pixel having a value of 0 in the thresholded image belongs to the background.
Now, the question is how to choose this threshold value? Now, for that let us come to the case
again considering the histogram, we have said that if my histogram is a bimodal histogram of
this form then what I can do is by looking at the histogram. So, this is our intensity value z and
on this we have h(z). By inspecting this histogram we can choose a threshold in this deep valley
region and using this threshold I can go for the segmentation operation.
(Refer Slide Time: 21:47)
Now, by doing this, I will show you one particular result, for example, in this particular case. Here you find that we have an image where the objects are dark whereas the background is bright. So, naturally, in this case I will have a histogram which is a bimodal histogram; the nature of the histogram will be like this. So, here, if I choose a threshold T in this region and, using this threshold, I segment this image, then the kind of segmentation that we get is as given here. So, here you find, in this second image, the segmented image, that the background and object regions have been clearly separated; even the shadow which is present in the original image has been removed in the segmented image.
So, though this segmentation is a very simple operation, if you choose the threshold in the valley region between the two modes of a bimodal histogram then this simple segmentation operation can clearly take out the object regions from the background. But here what we have done is use the histogram to choose the threshold, that is, you inspect the histogram and from inspection of the histogram you choose the threshold value. But is it possible to automate this process? That is, instead of finding the threshold value by looking at the histogram, can we automatically determine what threshold value should be used for segmenting an image?
So, this operation can be done by using an iterative procedure, giving an automatic threshold. So, here again, for detecting this threshold automatically, what we can do is first choose an initial value of the threshold, arbitrarily or by some other means, and using this initial value of the threshold we can obtain a segmentation of the image. When we segment the image using this initial value of the threshold, the segmentation operation will basically partition your histogram into two partitions, or the image will be divided into two groups of pixels. So, we can say that one group of pixels we term group G1 and the other group of pixels we term group G2. So, the pixel intensity values in group G1 will
be similar and the pixel intensity values in group G2 will also be similar but these two groups
will be different.
Now, once I partition the image intensities into these groups G1 and G2, the next step is to compute the means, the average intensity value μ1 for group G1 and the average intensity value μ2 for the group of pixels G2. So, once I get μ1 and μ2, that is the average intensity value in the group of pixels G1 and also the average intensity value for the group of pixels G2, then in the fourth step what I do is choose a new threshold T, which is T = (μ1 + μ2)/2. And after doing this you go back to step two and perform the thresholding operation once again.
So, what we are doing is, we are choosing an initial value of the threshold and, using that initial value of the threshold, we are thresholding the image. By thresholding, we are separating the intensity values into two groups G1 and G2. For group G1 I find out the average intensity value μ1, and for group G2 I also find out the average intensity value μ2; then I find out a new threshold which is the mean of these two averages, that is T = (μ1 + μ2)/2, and using this new threshold I threshold the image again. So, thereby these groups G1 and G2 will be modified, and I repeat this process, that is thresholding, then grouping, then finding out the intensity averages in the two separate groups and recalculating the threshold. This entire process will be repeated until I find that the variation of the computed value of T in two successive iterations is less than some pre-specified value.
So, this operation has to continue until you find that, for one iteration and the next, that is the threshold value Ti in the ith iteration and Ti+1 in the (i+1)th iteration, the difference |Ti+1 - Ti| is less than or equal to some pre-specified value, say T'. So, when I attain this condition, I stop my thresholding operation. So, here you find that we do not have to go to the histogram to choose the threshold; rather, what we do is choose some initial value of the threshold,
then go on modifying this threshold value iteratively until finally you converge; you come to a situation where you find that in two subsequent iterations the value of the threshold does not change much, and at that point whatever thresholded image you have got is your final result.
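(The iterative procedure just described could be sketched as follows; choosing the image mean as the initial threshold and the tolerance `eps` are illustrative assumptions, not choices stated in the lecture.)

```python
# Illustrative sketch of the iterative (automatic) global threshold selection.
import numpy as np

def iterative_threshold(f, eps=0.5):
    T = float(f.mean())                       # step 1: initial threshold (assumed)
    while True:
        g1 = f[f > T]                         # step 2: split into two groups
        g2 = f[f <= T]
        mu1 = g1.mean() if g1.size else T     # step 3: group means
        mu2 = g2.mean() if g2.size else T
        T_new = 0.5 * (mu1 + mu2)             # step 4: T = (mu1 + mu2) / 2
        if abs(T_new - T) <= eps:             # stop when T barely changes
            return T_new
        T = T_new
```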
So, using this kind of automatic threshold selection, the kind of result that can be obtained is something like this. So, here you find that this is one input image, and you can identify that this is a fingerprint image; this is the histogram of that particular image. So, obviously, from this histogram also I can choose a threshold here. But this thresholded output has been obtained not by choosing a threshold from the histogram but by the automatic threshold selection process, that is, by doing this iterative process. And it can be observed from this histogram that whatever threshold you choose by this automatic process will be similar to that. And here you find that, since the threshold that you have chosen does not consider the pixel location or the local neighborhood of the pixel intensity values, the threshold is a global one; that is, for the entire image you choose one particular threshold, and using that threshold you go for segmenting the image.
So, the kind of thresholding operation that we have done in this particular case this is called a
global thresholding operation. Now, you find that in this particular case this global thresholding
will give you very good result if the intensity or the illumination of the scene is uniform. But
there may be cases where the scene illumination is non uniform and in case of such non-uniform
illumination getting a global threshold which will be applicable over the entire image is very,
very difficult.
(Refer Slide Time: 31:11)
So, let us take one particular example, in this particular case. On the top we have an image, and you can easily find out that if I plot the histogram for this image, the histogram will be as shown on the right hand side. Clearly, this histogram is a bimodal histogram and there is a valley in between the two modes; these two modes are separated by a deep valley. So, obviously, for such a kind of histogram I can always choose a threshold inside the valley and segment this image successfully.
But what happens if the illumination is not proper? If the background illumination is not uniform, then because of this non-uniform illumination the image may turn out to be an image like this. And whenever I have such an image with poor illumination, you find that the histogram of this image appears as given on the right hand side, and here you find that though the histogram appears to be a bimodal one, the valley is not well defined.
So, this simple kind of thresholding operation, or the global thresholding operation, is likely to fail in this particular case. So, what should we do for segmenting this kind of image using a thresholding operation? Now, one approach is to subdivide this image into a number of smaller sub images, assuming that in each of these sub images the intensity, or the illumination, is more or less uniform; then for each of the sub images we can find out a threshold value. And using this threshold value you can threshold the sub images, and then the combination of all of them, or the union of all of them, will give you the final thresholded output.
(Refer Slide Time: 33:09)
So, let us see what we get in this case. As we said, for this kind of image where the illumination is non-uniform, if I apply a single global threshold then the kind of thresholded output that we are going to get is something like this. So, here you find that the thresholding has failed miserably. Whereas, if I subdivide this image into a number of sub images as given on the left hand bottom, and then for each of these sub images I identify the threshold and, using that threshold, go for segmenting that particular sub image, the thresholded output that you get is given on the right hand side. Here, you find that, except for these two, the rest of the sub images have been thresholded properly. So, at least this result is better than what you get with a global threshold operation. So, now, because we are going for a selection of a threshold which is position dependent, as every sub image has a particular position, this becomes an adaptive thresholding operation. Now, let us try to analyze why this adaptive threshold has not been successful for these two sub regions.
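(The sub-image idea could be sketched as below; the tile size and the use of the per-tile mean as the per-tile threshold are illustrative assumptions — the iterative selection sketched earlier could equally be used per tile.)

```python
# Illustrative sketch of block-wise (position dependent) thresholding.
import numpy as np

def blockwise_threshold(f, tile=64):
    g = np.zeros(f.shape, dtype=np.uint8)
    rows, cols = f.shape
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            sub = f[r:r + tile, c:c + tile]
            T = sub.mean()                    # per-tile threshold (an assumption)
            g[r:r + tile, c:c + tile] = (sub > T).astype(np.uint8)
    return g
```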
(Refer Slide Time: 34:43)
So, if I look at the nature of the image, here, if you look at this top image, you find that in this top image here is a boundary, where this small portion belongs to the background and this large portion of the image belongs to the object. Now, if I plot the histogram of this, the histogram will be something like this: because the number of pixels in the background is very small, the contribution of those pixels to the histogram, that is within this region, is almost negligible. So, instead of becoming a bimodal histogram, the histogram is dominated by a single peak, and that is the reason why this thresholding operation has not given a good result for this particular sub region.
So, how to solve this problem? Again, our solution approach is the same: you subdivide this image into smaller subdivisions. So, you go for subdividing further, and for each of these smaller subdivisions you try to find out the threshold and segment each of these sub subdivisions using its particular threshold. So, if I do that, you find that the kind of result that we get here and here in the segmentation output is quite satisfactory.
So, if the scene illumination is non-uniform then a global threshold is not going to give us a
good result. So, what we have to do is? We have to subdivide the image into a number of sub
regions and find out the threshold value for each of the sub regions and segment that sub regions
using this estimated threshold value. And here because your threshold value is position
dependent, it depends up on the location of the sub region, so the kind of thresholding that we
are applying in this case is an adaptive thresholding.
Now, in all the thresholding operations that we have discussed so far, whether global thresholding or adaptive thresholding, in none of these cases have we talked about the accuracy of the thresholding, that is, how accurate it is or what error is involved in this thresholding process. So, we can go for a kind of thresholding, by making use of some statistical property of the image, where the mean error of the thresholding operation will be minimum. That is a kind of thresholding operation which is called optimal thresholding. So, what is this optimal thresholding? Again, let us assume that the image contains two principal gray levels, two intensity regions.
One intensity region corresponds to the object and the other intensity region corresponds to the background. And we assume that these intensity values can be modeled as a random variable, and this random variable is represented by, say, z. Now, once we represent the random variable by this z, then the histogram of this particular image, or the normalized histogram, can be viewed as a probability density function of this random variable z. So, the normalized histogram can be viewed as a probability density function p(z) of this random variable z. Now, as we have assumed that the image contains two major intensity regions, two dominant intensity values.
So, our histogram is likely to be a bimodal histogram. So, the kind of histogram that we will get for this image is a bimodal histogram; it will be something like this. And, as we said, the histogram we are assuming to be a density function of the intensity variable z, so this bimodal histogram can be considered as a combination of two probability density functions, a combination of two pdfs. Okay. So, one of them is, say, the probability density function p1(z), and the other one is the probability density function, say, p2(z). So, p1(z) indicates the probability density function of the intensities of the pixels which belong to, say, the background, and p2(z) is the probability density function of the pixel intensity values which belong to, say, the object. Now, this overall histogram, that is p(z), can be represented as the combination of p1(z) and p2(z). So, this overall p(z) we can write as p(z) = P1·p1(z) + P2·p2(z), where P1 indicates the probability that a pixel belongs to the background and P2 indicates the probability that a pixel belongs to an object.
So, obviously $P_1 + P_2 = 1$. These are the probabilities that a pixel belongs to either the background or the foreground. And here our assumption is that we have a bright object against a dark background, because we are saying that $P_1$ is the probability that a pixel belongs to the background and $P_2$ is the probability that a pixel belongs to the foreground or the object.
Now, what is our aim in this particular case? Our aim is to determine a threshold T which will minimize the average segmentation error. Now, since this overall density is modeled as a combination of two different densities, it is something like this: I have one probability density function which is given by this, and the other probability density function is given by this, so that my overall probability density function is of this form.

This is my overall probability density function. The blue curve indicates $p_2(z)$, the pink curve indicates $p_1(z)$ and the yellow curve indicates my overall probability density function, that is p(z). Okay. So, in this particular case, suppose I choose a threshold T somewhere here.
So, this is my threshold T, and I say that if $f(x, y) > T$, then (x, y) belongs to the object. Okay. Now, here you find that though we are taking a hard decision that if $f(x, y) > T$ then (x, y) belongs to the object, a background pixel also has a finite probability, given by the area under $p_1(z)$ beyond T, of having an intensity value greater than T. So, while taking this decision we are incorporating some error: a background point may be classified as an object point, and in the same way an object point whose intensity falls below T may be classified as a background point.

So, the error of classifying an object pixel as a background pixel is threshold dependent, and we write it as $E_1(T) = \int_{-\infty}^{T} p_2(z)\,dz$. Similarly, if a background pixel is classified as an object pixel, then the corresponding error will be given by $E_2(T) = \int_{T}^{\infty} p_1(z)\,dz$. So, these give you the two error values: one of them gives the error that you encounter if you classify an object pixel as a background pixel, and the other one the error if you classify a background pixel as an object pixel.
(Refer Slide Time: 46:01)
So, from these two error expressions, the overall error probability can now be represented as $E(T) = P_2 E_1(T) + P_1 E_2(T)$. You find that $E_1(T)$ is the error of classifying an object pixel as a background pixel, $E_2(T)$ is the error of classifying a background pixel as a foreground pixel, $P_1$ is the probability that a pixel belongs to the background and $P_2$ is the probability that a pixel belongs to the object. So, the overall probability of error is given by the expression $E(T) = P_2 E_1(T) + P_1 E_2(T)$.
Now, for minimization of this error, what we have to do is take the derivative $\frac{dE(T)}{dT}$ and equate it to 0. So, whatever value of T you get from this is the value that is going to give you the minimum error. If I put this restriction on the above expression then, without going into the details of the mathematical derivation, the final result can be given by $P_1\,p_1(T) = P_2\,p_2(T)$.
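For completeness, the one step that is skipped here follows directly from differentiating the two integrals with respect to their limit:

$$\frac{dE(T)}{dT} = P_2\,\frac{d}{dT}\int_{-\infty}^{T} p_2(z)\,dz + P_1\,\frac{d}{dT}\int_{T}^{\infty} p_1(z)\,dz = P_2\,p_2(T) - P_1\,p_1(T) = 0 \;\Rightarrow\; P_1\,p_1(T) = P_2\,p_2(T).$$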
So, we are going to get an expression of this form, and the solution of this equation gives the value of T. Now, to solve this, what I need is the knowledge of the probability density functions $p_1(z)$ and $p_2(z)$. As we know, in most of the cases we normally assume a Gaussian probability density function. So, if I assume Gaussian probability density functions, then the overall probability density p(z) is represented by

$$p(z) = \frac{P_1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(z-\mu_1)^2}{2\sigma_1^2}} + \frac{P_2}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(z-\mu_2)^2}{2\sigma_2^2}},$$

where $\mu_1$ is the average intensity value of the background region, $\mu_2$ is the average intensity value of the object region, and $\sigma_1$ and $\sigma_2$ are the standard deviations of the intensity values in the background region and in the object region respectively.
So, by assuming this Gaussian probability density function, we get the overall probability density function as given by this expression. And from this particular expression, the value of T can now be found out as the solution of the quadratic equation $AT^2 + BT + C = 0$, where $A = \sigma_1^2 - \sigma_2^2$, $B = 2(\mu_1\sigma_2^2 - \mu_2\sigma_1^2)$ and $C = \sigma_1^2\mu_2^2 - \sigma_2^2\mu_1^2 + 2\sigma_1^2\sigma_2^2\,\ln\dfrac{\sigma_2 P_1}{\sigma_1 P_2}$.

And here, if we assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then the value of the threshold T comes out to be $T = \dfrac{\mu_1 + \mu_2}{2} + \dfrac{\sigma^2}{\mu_1 - \mu_2}\,\ln\dfrac{P_2}{P_1}$. So, this is a simple expression for the value of the threshold that you can obtain in this optimal thresholding operation. And this is optimal in the sense that this value of the threshold gives you the minimum average error. Here again you find that if the probabilities $P_1$ and $P_2$ are the same, then the value of T will simply be $T = \dfrac{\mu_1 + \mu_2}{2}$, that is, the mean of the average intensities of the foreground region and the background region. Okay.
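Putting these formulas to work is straightforward once the two sets of class parameters are available. The sketch below assumes the parameters $(\mu_1, \sigma_1, P_1)$ for the background and $(\mu_2, \sigma_2, P_2)$ for the object have already been estimated, for example by fitting a two-component Gaussian mixture to the normalized histogram; the function name and the way the root is picked are illustrative choices, not part of the lecture.

import numpy as np

def optimal_threshold(mu1, sigma1, P1, mu2, sigma2, P2):
    if np.isclose(sigma1, sigma2):
        # Equal-variance special case: T = (mu1 + mu2)/2 + sigma^2/(mu1 - mu2) * ln(P2/P1)
        return 0.5 * (mu1 + mu2) + (sigma1**2 / (mu1 - mu2)) * np.log(P2 / P1)
    A = sigma1**2 - sigma2**2
    B = 2.0 * (mu1 * sigma2**2 - mu2 * sigma1**2)
    C = (sigma1**2 * mu2**2 - sigma2**2 * mu1**2
         + 2.0 * sigma1**2 * sigma2**2 * np.log(sigma2 * P1 / (sigma1 * P2)))
    roots = np.roots([A, B, C])
    # Keep the real root that lies between the two class means; that is the usable threshold.
    real = roots[np.isreal(roots)].real
    lo, hi = sorted((mu1, mu2))
    inside = [t for t in real if lo <= t <= hi]
    return inside[0] if inside else real[0]

# Example with made-up parameters: dark background around 60, bright object around 160.
T = optimal_threshold(mu1=60, sigma1=12, P1=0.7, mu2=160, sigma2=15, P2=0.3)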
So, as we said, if we segment the image using a threshold estimated by this process, then the average error of segmentation will be minimum; that is, a minimum number of background pixels will be classified as object pixels and a minimum number of object pixels will be classified as background pixels. Now, let us see an example of how this optimal thresholding can give us good results.
Let us take a very complicated case like this. This is a cardiac angiogram, in which the purpose is to detect the ventricle boundaries. Okay. You find that the image given here is very, very complex, and though we can somehow figure out that there is a boundary somewhere here, it is not very clear. So, the approach that was taken is this: the image was divided into a number of sub images, for every sub image the optimal threshold was estimated, and then the thresholding was done.
So, for this optimal thresholding, what was done is that the image was divided into a number of sub images like this. For each sub image the histogram was computed, and a threshold was computed only for those sub images which show a bimodal histogram like this one; whereas, you find that if I take a sub image here, it normally shows a histogram with a single mode, as given here.

So, for these sub images no threshold was detected; the threshold was detected only for those sub images which showed a bimodal histogram. The thresholds for the other sub images were then estimated by interpolation from the thresholds of the regions having bimodal histograms. Then a second level of interpolation was done to estimate the threshold value at each pixel location, and after doing that, for each pixel location, using that particular threshold, the decision was taken whether the corresponding value in the thresholded image should be equal to 0 or equal to 1.
So, using this, the thresholded image was obtained, and when the boundary of the thresholded image is superimposed on this particular image, you find that this curve shows the boundary of the thresholded image. So, the boundary points are quite well estimated in this particular case. With this we stop this particular lecture on thresholding operations. Thank you.
Digital Image Processing.
Professor P. K. Biswas.
Department of Electronics and Electrical Communication Engineering.
Indian Institute of Technology, Kharagpur.
Lecture-59.
Region Splitting and Merging Technique.
Hello, welcome to the video lecture series on digital image processing. We are discussing the image segmentation operations. In similarity based image segmentation there are mainly three approaches; one of them is the thresholding based technique, where you can go for global thresholding, dynamic or adaptive thresholding, optimal thresholding or local thresholding.
So, in our last class we have discussed the global thresholding operation, the dynamic or adaptive thresholding operation, and the optimal thresholding operation. And we have seen that in case of global thresholding, a single threshold value is selected, where the threshold value depends only on the pixel intensities in the image; whereas in case of dynamic or adaptive thresholding it depends not only upon the intensity values of the pixels in the image, it also depends upon the position of the pixel in the image.
So, the threshold for different pixels in the image will be different. In case of optimal thresholding, we have tried to find out a threshold by assuming that the histogram of the image is a representative of the probability density function of the pixel intensity values. So, if you have a bimodal histogram, the bimodal histogram is considered as a combination of two probability density functions, and from these probability density functions we have tried to estimate the error incurred by performing the threshold operation, when a pixel is decided to belong to an object or decided to belong to the background.
So, because of the overlap of the probability density functions of the different intensity values, it is possible that a pixel which actually belongs to the background may be decided to belong to an object, or a pixel which actually belongs to an object may, after thresholding, be classified to belong to the background. Because of this, there is a certain amount of error which is incorporated by this thresholding operation.

So, in case of optimal thresholding, what we have done is to estimate how much error is incorporated if we choose a particular threshold, and then choose that value of the threshold for which the average error is minimized. There is another kind of thresholding operation, which is the local thresholding operation, that we will be discussing today; and we have said that the local thresholding operation takes care of the neighborhood property, the pixel intensity values in the neighborhood of a particular location (x, y).
We will also discuss the other two similarity based segmentation operations, that is, the region growing technique and the region splitting and merging technique.
So, our today's discussion will be concentrated on the local threshold operation, where, in addition to the intensity value of a pixel and its location, we will also consider the local neighborhood property; and on the other two similarity based segmentation techniques, that is, the region growing technique and the region splitting and merging technique.
So, first of all, let us concentrate on the local thresholding operation. It is now clear that selection of a good threshold value is very simple if the histogram of the particular image is a bimodal histogram where the modes are tall and narrow, separated by a deep valley, and in addition the modes are symmetric. That means, if we have a histogram like this, where on this side we put the pixel intensity values and on this side we put the histogram count.
So, if a histogram is of this form, then we can very easily choose a threshold within this valley region. These are the two histogram modes, or two histogram peaks, which are separated widely by a valley, and within this valley region we can choose a threshold. By using this threshold we can segment the image properly; but what happens in most of the cases is that the histogram is not so clear.
It is not so clearly bimodal. Threshold selection also becomes easy if the histogram is symmetric, that means the area occupied by the object and the area occupied by the background pixels are more or less the same. The problem occurs if I have an image like this: I have an image, and within this image a very small number of pixels actually belong to the object and a large number of pixels belong to the background. When I have an image like this, the resulting histogram will be something like this; these may be the object pixels, and the background pixels give rise to a histogram of this form. And here you find that the contribution to the histogram by the object pixels is almost negligible, because the number of pixels belonging to the object is very small compared to the number of pixels belonging to the background.

So, the bimodal nature of the histogram is not very visible; rather, the histogram is dominated by a single mode contributed by the pixels which belong to the background. Now, how to solve this problem?
So, this problem can be solved if, instead of considering all the pixels in the image to produce the histogram, we can somehow identify the pixels which are either on the boundary or near the boundary between the object and the background. In a sense, given an image with an object inside, what we are trying to do is to identify the pixels in a very small, narrow strip around this boundary.

So, if we consider only the pixels around the boundary to form the histogram, the advantage in this case is that the histogram will be symmetric. That is, the number of pixels within the object region and the number of pixels within the background region which are being considered to form the histogram, the number of pixels belonging to the object and the number of pixels belonging to the background, will be more or less the same.
So, our histogram will be symmetric and it will not depend upon the relative sizes of the object and the background. The second advantage is that the probability of a pixel belonging to the object and the probability of a pixel belonging to the background within this narrow strip are almost equal. Because if I consider the entire image, and in the image the object region is a very small region, then the probability of a pixel belonging to the object is small compared to the probability of a pixel belonging to the background; whereas if I consider only the pixels within a narrow strip around the object boundary, then the probability of a pixel belonging to the background and the probability of a pixel belonging to the object are almost the same.

So, by considering only the pixels within this narrow strip I get two advantages. One is that the probability of a pixel belonging to the background and the probability of a pixel belonging to the object are nearly equal; and at the same time the area of the foreground or object region and the area of the background region which are used for computation of the histogram are also nearly the same, making the histogram a symmetrical histogram. And once I have this kind of histogram, then the thresholding operation is very, very simple.
Now, the question is, if I simply use this kind of approach, then I have to know the object boundary, that is, the boundary between the object region and the background region. But this is not easily obtained, because the basic purpose of segmentation is precisely to find out the boundary between the object and the background.

So, this simple approach, as it has been presented, that we want to consider the pixels lying on the boundary or around the boundary, cannot be used in this simple form, because the boundary itself is not known; that is the very thing we are trying to determine. Then what is the solution? How do we solve this particular problem?
(Refer Slide Time: 12:21)
Now, the solution is to use the image gradient and the image Laplacian. We know that if I have a region something like this, I can plot the variation of the intensity values; so, this is the pattern of intensity values in the image. Obviously, we are putting it in one dimension, the two-dimensional case is now mapped to one dimension. So, this is my pixel location, say x, and this is f(x).

So, this is the variation of intensity along the x direction. If I take the gradient of this, and as you know the gradient is a first order derivative operation, the gradient will appear something like this. So, again this is my x direction and on this side what I am putting is $\dfrac{\partial f}{\partial x}$, that is the gradient. And if I take the Laplacian, which you know is the second order derivative operator, the Laplacian will appear in this form.

So, this is the second order derivative; again on this direction we are putting x and on this direction we are putting $\dfrac{\partial^2 f}{\partial x^2}$. So, this is f(x), this is the gradient and this is the Laplacian. So, we have seen earlier that an estimate of the edge points can be obtained from the gradient operator and from the Laplacian operator, and we have discussed earlier that the Laplacian operator is affected to a large extent by the presence of noise.

So, the output of the Laplacian operator is not directly used for edge detection; instead it is used to provide secondary information. So, what we do is we use the gradient operator output to determine the position of the edge points, and the output of the Laplacian operator is used to determine whether a point is lying on the darker side of the edge or on the brighter side of the edge.
So, as shown here, coming to this intensity distribution, you find that this is the bright side and this is the dark side; and if I compare this with the Laplacian, you find that on the bright side of the edge the Laplacian becomes negative, whereas on the dark side of the edge the Laplacian becomes positive.
So, by making use of this information, we can say whether a point is lying on the dark side of the edge or on the bright side of the edge. Okay. So, our approach is this: we have said that we want to consider, for generation of the histogram, only those pixels which are lying either on the boundary or near the edge between the object and the background. That information can be obtained from the output of the gradient, because for all the pixels which are lying on the boundary or near the boundary, the gradient magnitude will be quite high. And then, to decide which of these points lie on the dark side and which points lie on the bright side, we can make use of the Laplacian output, where the Laplacian will be negative if a point is lying on the bright side of the edge and the Laplacian will be positive if the point lies on the dark side of the edge.
And you have seen earlier that in case of an image, where the image is modeled as a two dimensional function f(x, y), the gradient magnitude of this image is given by $\nabla f \approx |G_x| + |G_y|$ or $\nabla f = \sqrt{G_x^2 + G_y^2}$, where $G_x = \dfrac{\partial f(x, y)}{\partial x}$ and $G_y = \dfrac{\partial f(x, y)}{\partial y}$. Similarly, the Laplacian of this image is $\nabla^2 f = \dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2}$. And we have seen earlier that to implement these operations in case of a digital image we can have different types of differential operators: one operator can compute $\nabla f$, and another operator can compute the Laplacian $\nabla^2 f$ of the given image f(x, y).
So, here what we are trying to do is to estimate whether a point is lying on the edge, or within a small region near the edge, and then whether the point is lying on the dark side of the edge or on the bright side of the edge. Suppose we have an image where we have a dark object against a bright background. In that case, for the object pixels the Laplacian near the edge will be positive, and for the background pixels the Laplacian near the edge will be negative.
So, simply by making use of this property, what we can do is compute, from f(x, y), the gradient $\nabla f$ and the Laplacian $\nabla^2 f$. From these three I can create an image, say s(x, y), and we will put $s(x, y) = 0$ if $\nabla f < T$. As we have said, on the edge points or the points near the edge the gradient value will be high; so, if the $\nabla f$ value is less than some threshold T, we assume that this point does not belong to an edge and is not even within a region near the edge, and for such a point we make $s(x, y) = 0$. We will put $s(x, y) = +$ if $\nabla f \geq T$ and at the same time $\nabla^2 f \geq 0$, which indicates that this is an edge point, or a point near the edge, lying on the dark side of the edge; that means, in this particular case, since we are assuming dark objects against a bright background, this is a point on the object side, or an object point near the object-background boundary. And we will put $s(x, y) = -$ if it is an edge point or a point near the edge for which again $\nabla f \geq T$ but the Laplacian $\nabla^2 f < 0$. So, what we are doing is creating an image s(x, y) which will have values either '0' or '+' or '-'.
Now, for implementation, what we can do is represent these three symbols '0', '+' and '-' by three distinct intensity values. Say, for example, '0' may be represented by 0, '+' may be represented by an intensity value of say 128, and '-' may be represented by an intensity value of say 255. So, three distinct intensity values will represent these three different symbols '0', '+' and '-', and then what we have to do is process this intermediate image s(x, y) to find out the object boundaries or the object regions.

So, you find that in this representation, if $s(x, y) = 0$, the point does not belong to the boundary between the object and the background; if it is '+', then the pixel belongs to the object region; and if it is '-', then the pixel belongs to the background region. So, by using this kind of processing, the intermediate image that you get will be something like this.
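A minimal sketch of building this three-level label image, assuming a grayscale image stored as a floating point NumPy array, is given below. The Sobel masks for the gradient and the 4-neighbour mask for the Laplacian are common choices, but the lecture leaves the exact operators open, so treat them, along with the label values 0, 128 and 255 chosen above, as illustrative.

import numpy as np
from scipy.ndimage import convolve

def label_image(f, grad_thresh):
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)   # Sobel mask for Gx
    sy = sx.T                                                          # Sobel mask for Gy
    lap = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)    # Laplacian mask
    gx, gy = convolve(f, sx), convolve(f, sy)
    grad = np.sqrt(gx**2 + gy**2)                  # gradient magnitude
    lap_f = convolve(f, lap)                       # Laplacian response
    s = np.zeros(f.shape, dtype=np.uint8)          # 0 : gradient below threshold, not near an edge
    near_edge = grad >= grad_thresh
    s[near_edge & (lap_f >= 0)] = 128              # '+': near an edge, dark (object) side
    s[near_edge & (lap_f < 0)] = 255               # '-': near an edge, bright (background) side
    return s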
(Refer Slide Time: 23:33)
So, here you find that we have an image which contains one of these three symbols, either '0', '+' or '-'. And here, what we have is a dark object against a bright background, maybe some handwritten characters with an underline. This intermediate image can now be processed to find out the object region and the background region.
So, once I get an image of this form, you find that if I scan the image either along a horizontal direction or along a vertical direction, then I am going to get a pattern of these three symbols. Now, what will be the nature of this pattern? Say, for example, whenever there is an edge: I have this intermediate image and I scan the image along a horizontal line from left to right.
(Refer Slide Time: 24:56)
Now, while scanning this, since I have assumed that I have dark objects against a bright background, whenever there is a transition from the background region to the object region I will get a situation something like this: I will get a point having a '-' label followed by a point having a '+' label.

So, a '-' followed by a '+' indicates that I have a transition from background to object. Similarly, when I am scanning and I move from the object to the background region, the combination of these two symbols will be just the opposite. Here, because I am moving from the object region, which is dark, to the background region, which is bright, the combination of symbols that I will get is a '+' followed by a '-'.
(Refer Slide Time: 26:46)
So, whenever I get this kind of transition, that is from a '+' to a '-', it indicates that I have a transition from object to background. So, by making use of this observation, if I scan a particular horizontal line or a vertical line, then I get a sequence of symbols, where the sequence will be something like this: a run of *'s, followed by a '-' followed by a '+', then a run of '0' or '+' symbols, then a '+' followed by a '-', and then again a number of *'s.

So, if I scan this intermediate image either along a horizontal line or along a vertical line, and if that particular scan line contains a part of the object, then my scan pattern will be something like this, where each * indicates any combination of '0', '+' or '-'. Okay. So, here you find that first I can get any combination of '0', '+' or '-'; then, whenever I have a transition from the background region to the object region, I will have a '-' followed by a '+'.

Then, within the object region, I can have either '0' or '+' symbols; when I move from the object region to the background region I will have a transition from '+' to '-'; and then, on the rest of this scan line, I can again have any combination of '0', '+' or '-'. And you find that this is exactly what is represented in this particular image.
When I move along any scan line, say, for example, this particular scan line, you will find that initially I have all '0's, then I have a '-' symbol followed by a '+' symbol. Then, within the object, it is either '0' or '+'; then again I will have a transition from '+' to '-'; then again I will have a number of '0's, and this is how it continues.
So, by making use of this particular pattern, I can identify which portion of a scan line belongs to the object and which portion belongs to the background. So, the kind of symbol pattern that we obtain on a scan line is like this: first any combination of '+', '0' or '-'; then a '-' followed by a '+'; then either '0' or '+'; then a '+' followed by a '-'; and then again any combination of '0', '+' or '-'. And here you find that the transitions from '-' to '+' and from '+' to '-' indicate the occurrence of edge points, and the inner parenthesis, where I have only '0' or '+' symbols, actually indicates the object region.

So, for segmentation purposes, what I can do is this: when I scan the intermediate image s(x, y) either along a horizontal line or along a vertical line, all the points in the part of the scan line represented by this inner parenthesis I make equal to 1, and the rest of the points on the scan line I make equal to 0. That gives me a segmented output where, in the output image, the part of every scan line belonging to the object is represented by a pixel value equal to 1 and all the background regions are represented by a pixel value equal to 0.
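The sketch below turns the label image s(x, y) from the earlier sketch into such a binary output by scanning each row; object runs are taken to start at a '-' immediately followed by a '+' and to end at a '+' immediately followed by a '-'. Column scans could be added in exactly the same way. This is an illustrative reading of the scan-line rule under the label values assumed earlier, not a tested reference implementation.

import numpy as np

PLUS, MINUS = 128, 255   # label values chosen earlier for '+' and '-'

def segment_by_scanlines(s):
    out = np.zeros(s.shape, dtype=np.uint8)
    for r in range(s.shape[0]):
        row, inside = s[r], False
        for c in range(1, row.shape[0]):
            if row[c - 1] == MINUS and row[c] == PLUS:
                inside = True                 # background-to-object transition
                out[r, c] = 1
            elif row[c - 1] == PLUS and row[c] == MINUS:
                inside = False                # object-to-background transition
            elif inside and row[c] in (0, PLUS):
                out[r, c] = 1                 # interior object pixel
    return out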
So, if I apply this technique on an image, you find what kind of result we get. In this particular case, the top part is the scanned image of a bank cheque. You find that the signature and the other figures appear against the background, and it is not very easy to distinguish which part is the object, that is the signature or figure part, and which part is really the background. By making use of this kind of processing, and filling the object regions with one and the background with zero, we can clearly segment out the signature part and the figure part.
And here, you find that this kind of output we possibly cannot get by making use of any global thresholding approach. We call this local thresholding because, to find out the threshold, what we have made use of is the gradient of the image and the Laplacian of the image, and the gradient and the Laplacian are local properties, local to a particular pixel location.
So, the kind of thresholding which is inbuilt in this kind of segmentation operation is what we call local thresholding. With this we have discussed the different kinds of thresholding operations. In our earlier class we have discussed global thresholding, dynamic or adaptive thresholding and optimal thresholding, and now what we have discussed is local thresholding, where the local thresholding operation makes use of the image gradient and the image Laplacian; and, as we said, the gradient and the Laplacian are properties local to a particular pixel location.
So, the kind of thresholding which is embedded in this application is nothing but a local thresholding operation. Though this segmentation is obtained by scanning the intermediate image that is generated, and there is no direct thresholding operation involved in it, the kind of operation that is embedded in this approach is nothing but what we call a local thresholding operation. Now, let us go to the other approaches of segmentation; we have said there are two other approaches of similarity based segmentation. One of them is region growing segmentation and the other one is called splitting and merging segmentation.
So, first let us talk about the region growing operation. Now, what is this region growing segmentation? It is like this: suppose we consider all the pixels belonging to the image as a set of pixels, say R. Okay. What the region growing, or segmentation, operation does is partition this set of pixels R into a number of sub regions, say $R_1, R_2, R_3$ and so on up to, say, $R_n$. So, the segmentation operation is actually partitioning the set of pixels R, which represents the entire image, into n partitions $R_1$ to $R_n$.
Now, when I partition my original set R into n such partitions $R_1$ to $R_n$, this partitioning should follow certain properties. The first property is that if I take the union of all these regions, $\bigcup_{i=1}^{n} R_i$, this should give me the original image R. That means none of the pixels in the image should be left out; it should not happen that some pixel is not part of any of the partitions.
So, every pixel in the image should be a part of one of the partitions. Okay. The second property is that each region $R_i$ should be connected. We have defined earlier what we really mean by a connected region: we have said that a region will be called connected if, for any two points in the region, I can find a path between these two points considering only points which already belong to this region $R_i$.

So, if every pair of points in the region $R_i$ is connected in this sense, then we say that the region $R_i$ is connected. Okay. So, the second property that this segmentation or partition must follow is that every partition $R_i$ should be connected. The third property that must be followed is $R_i \cap R_j = \emptyset$ for $i \neq j$. That means, if I take two partitions, say $R_1$ and $R_2$, then $R_1$ and $R_2$ should be disjoint; there should not be any common pixels, any common points, in the two partitions $R_1$ and $R_2$. Then, if I define a predicate, say P, over a region $R_i$, it should be true, where this P is a logical predicate defined over the points in the partition $R_i$.
So, for a single partition $R_i$, this logical predicate P should be true, that is, $P(R_i) = \text{TRUE}$. And the last property that must be followed is that the predicate over $R_i \cup R_j$ must be equal to FALSE for $i \neq j$. So, what does this mean? It means that if I define a predicate for the pixels or points belonging to a particular region, then the predicate must be true for all the points belonging to that region; and if I take points belonging to two different regions $R_i$ and $R_j$, then the predicate over the combined set $R_i \cup R_j$ must be equal to FALSE.

So, this is what expresses the similarity: all the points belonging to a particular region must be similar, and the points belonging to two different regions are dissimilar. So, what does region growing actually mean? Region growing, as the name implies, is a procedure which groups pixels or sub-regions into a larger region based on some pre-defined criterion, and in our case this pre-defined criterion is the defined predicate. So, we start from a single point and try to find out what other points can be grouped into the same group, that is, all the points for which the predicate is true.
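For reference, the five conditions stated above can be written compactly as

$$\bigcup_{i=1}^{n} R_i = R, \qquad R_i \text{ is connected}, \qquad R_i \cap R_j = \emptyset \ (i \neq j), \qquad P(R_i) = \text{TRUE}, \qquad P(R_i \cup R_j) = \text{FALSE} \ (i \neq j).$$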
(Refer Slide Time: 41:37)
So, this region growing operation works like this: I have an image, and in this image I select a set of points; each of these points I call a seed point. What the region growing operation then tries to do is grow a region starting from each seed point by incorporating all the points which are similar to the seed point.

Now, this similarity measure can be of different types. For example, we can say that two points are similar if their intensity values are very close, and the points are dissimilar if their intensity values are widely different. And one of the conditions, as we have said, is that the points must be connected. That means, coming to this image again, say I have this big image, and for region growing what I have to do is choose a seed point, and the region growing operation will start from that seed point.
(Refer Slide Time: 42:51)
So, for this purpose, what I will do is take a 3x3 neighborhood around this seed point. And since one of the properties that these partitions have to follow is that every region or every partition has to be connected, when I start to grow the region from the seed point, all the points which I include in the same group or in the same partition have to be connected. That means I have to start growing this region from the points which are connected to this seed point.

So, if I use the concept of 8-connectivity, then the points which are to be put in the same group as the seed point must belong to this 3x3 neighborhood of the seed point. Effectively, once I choose a seed point, I check the points in its 3x3 neighborhood, and all the points which I find to be similar to the seed point are put in the same group. Then again I start growing the region from all these new points which have been put in the same group.
So, effectively, if I call the selected seed point $s_0$, then from its neighborhood I may find that the other points which can be put in the same group as the initial seed point $s_0$ are, say, $s_1$, $s_2$ and $s_5$. Next I start growing the region from $s_1$ itself: within the 3x3 neighborhood of $s_1$, following the same 8-connectivity, I find the points which are similar to $s_1$, or the points which are similar to the seed point. And this similarity can be based on the intensity difference: if the intensity difference is small, I say that they are similar; if the intensity difference is high, I say that they are not similar.
So, in this way I again start growing the region from $s_1$, from $s_2$, from $s_5$ and so on, and this process will stop when no new point can be included in the group. Okay. So, effectively what we are doing is selecting a number of seed points in the image following some criterion; the seed point selection is application dependent. Once we select the seed points, then from the seed points we start growing the regions in different directions by incorporating more and more points which are connected as well as similar. And at the end what we have is a number of regions which have grown around these seed points. Okay.
So, this is the basic region growing operation, and you find that it can be implemented very easily using a simple recursive algorithm; a sketch along these lines is given below. Now, let us see what kind of output or result we can get by using this region growing segmentation operation.
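A minimal region growing sketch, assuming a grayscale image stored as a NumPy array and a list of seed coordinates, is given here. Similarity means the absolute intensity difference from the seed is below a tolerance, and connectivity is 8-connectivity, both as described above; the exact criterion is application dependent, and an explicit stack is used in place of recursion simply to avoid deep recursion on large regions.

import numpy as np

def region_grow(image, seeds, intensity_tol):
    h, w = image.shape
    labels = np.zeros((h, w), dtype=np.int32)            # 0 = not yet assigned to any region
    neigh = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for k, (sr, sc) in enumerate(seeds, start=1):
        if labels[sr, sc] != 0:
            continue                                      # this seed was absorbed by an earlier region
        seed_val = float(image[sr, sc])
        stack = [(sr, sc)]
        labels[sr, sc] = k
        while stack:
            r, c = stack.pop()
            for dr, dc in neigh:                          # examine the 8-connected neighbours
                rr, cc = r + dr, c + dc
                if (0 <= rr < h and 0 <= cc < w and labels[rr, cc] == 0
                        and abs(float(image[rr, cc]) - seed_val) <= intensity_tol):
                    labels[rr, cc] = k                    # similar and connected: same region
                    stack.append((rr, cc))
    return labels                                         # region index per pixel, 0 for ungrown pixels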
(Refer Slide Time: 47:17)
So, here is an example. This is an X-ray image taken of a weld. In case of such an X-ray image, there may be some cracks or some other faults within the welded region, and these faults can be captured in the X-ray image. So, the top left one is the X-ray image of the welded part. The nature of the problem is such that whenever there is a fault, the faulty regions in the X-ray image are going to have very high intensity values. So, on the left hand side, first a simple thresholding based segmentation is applied, which picks out the regions having pixel values near an intensity value of 255, that is, the maximum intensity level. Since, as we said, the faults usually appear with higher intensity values in the X-ray image, the seed points are selected as all the points in this thresholded image having a value of 255 after the thresholding operation. Then you start the region growing operation around each of these seed points.
So, I grow the region around each of the seed points. Now, when you go for this region growing operation, the region growing has to be done on the original image, not on the thresholded image; the thresholding operation is done simply to select the seed points. Once you get the seed points, you come to the corresponding seed point locations in the original X-ray image and grow the regions starting from those seed locations within the original X-ray image. This image shows the grown regions; each of these is a grown region.
So, if I superimpose the boundaries of these grown regions on the original X-ray image, the superposition output is shown in the bottom right image. Here, you find that these are the region boundaries superimposed on the original image, and the segmentation in this particular case is quite satisfactory. So, by using this similarity measure and incorporating it into the region growing operation, we can obtain a quite satisfactory segmentation. Okay.
So, the next type of segmentation that we said we will discuss is the splitting and merging operation. Here again, what we are trying to do is form a segment of all the pixels which are similar in intensity value, or similar in some other sense. Our approach in this particular case will be as follows: if I have an image, say R, first you try to find out whether this entire image region is uniform, that is, whether the intensity values are similar. If they are not similar, then you break this image into quadrants; so, just make 4 partitions of this image. Then you check each and every partition: if all the pixels within a partition are similar, you leave it as it is, and if it is not similar, then again you partition that particular region.
So, initially suppose this was region $R_0$, this was region $R_1$, this was region $R_2$ and this was region $R_3$. Now, this $R_1$ is non-uniform, so I partition it again into $R_{10}$, $R_{11}$, $R_{12}$ and $R_{13}$, and you go on doing this partitioning until you either come to the smallest permissible partition size, or you come to a situation where all the partitions have become uniform, so that you cannot partition them any more. And in the process of doing this, what I am getting is a quadtree representation of the image.
So, in the quadtree representation, the root node is R, my initial partition gives me 4 nodes $R_0$, $R_1$, $R_2$ and $R_3$, and then $R_1$ is partitioned again into $R_{10}$, $R_{11}$, $R_{12}$ and $R_{13}$. Once such partitioning is complete, what you do is check all the adjacent partitions to see if they are similar; if they are similar, you merge them together to form a bigger segment. So, this is the concept of the splitting and merging technique for segmentation.
Now, let us see this with the help of an example. Say I have an image of this form. When you come to this original image, you find that here I have the background and on this background I have the object region. This is obviously non-uniform, so I partition it into 4 quadrants; each of them is still non-uniform, so I have to partition them again. Let us take one particular partition as an example: I partition it into 4 again, and here you find that this particular partition is uniform, so I do not partition it any more. The rest of the partitions I have to go on sub-partitioning like this.
Let us take one of them: this is partitioned again, this is partitioned again, this is partitioned again and so on. Now, at the end, when I find that I cannot do any more partitioning, either because I have reached the minimum partition size or because every partition has become uniform, I have to look for adjacent partitions which can be combined together to give me a bigger segment. So, that is what I do in this case: here you find that this partition, this partition, this partition and this partition can be grouped together.
So, then again this particular group can be combined with this particular partition, it can be combined with this partition, it can be combined with this partition and so on. So, finally what I get is that, after the splitting operation, the entire object is broken into a number of smaller partitions, and then, in the merging operation, I try to find out the partitions which can be merged together to give me a bigger segment.
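A minimal split-and-merge sketch along these lines, assuming a square grayscale image whose side is a power of two, is given below. The uniformity predicate (block standard deviation below a tolerance) and the simplified merge rule (4-adjacent leaf blocks with close mean intensities receive the same label) are illustrative choices, since the lecture only requires some predicate P that is TRUE inside a region and FALSE across regions.

import numpy as np

def split(image, r, c, size, min_size, tol, leaves):
    block = image[r:r + size, c:c + size]
    if size <= min_size or block.std() <= tol:           # uniform enough: this is a quadtree leaf
        leaves.append((r, c, size, float(block.mean())))
        return
    half = size // 2                                      # otherwise split into the 4 quadrants
    for dr in (0, half):
        for dc in (0, half):
            split(image, r + dr, c + dc, half, min_size, tol, leaves)

def split_and_merge(image, min_size=4, tol=10.0):
    leaves = []
    split(image.astype(float), 0, 0, image.shape[0], min_size, tol, leaves)
    # Merge phase: join 4-adjacent leaves whose block means are within tol (union-find).
    parent = list(range(len(leaves)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, (r1, c1, s1, m1) in enumerate(leaves):
        for j, (r2, c2, s2, m2) in enumerate(leaves):
            if j <= i or abs(m1 - m2) > tol:
                continue
            rows_overlap = r1 < r2 + s2 and r2 < r1 + s1
            cols_overlap = c1 < c2 + s2 and c2 < c1 + s1
            side_by_side = rows_overlap and (c1 + s1 == c2 or c2 + s2 == c1)
            stacked = cols_overlap and (r1 + s1 == r2 or r2 + s2 == r1)
            if side_by_side or stacked:
                parent[find(i)] = find(j)
    labels = np.zeros(image.shape, dtype=np.int32)
    for i, (r, c, s, _) in enumerate(leaves):
        labels[r:r + s, c:c + s] = find(i) + 1            # one label per merged group of leaves
    return labels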
So, by doing this, at the end of the splitting and merging operation, the different objects can be segmented out from the background. So, in brief, we have discussed the different segmentation operations. Initially we started with discontinuity based segmentation, where we have gone for different edge detection or line detection operations followed by linking; and then we have discussed similarity based segmentation, under which we have discussed the various thresholding operations, the region growing operation and, lastly, the splitting and merging operation. Thank you.