8 Image Processing Fundamentals Full
An image of dimensions 32×21 (i.e., image width = 32 pixels, image height = 21 pixels)
{desmond,kccecia}@ust.hk COMP 2211 (Fall 2022) 7 / 100
Image Coordinate System
A specific pixel is specified by its coordinates (x,y) where x is increasing from left to right,
and y is increasing from top to bottom.
The origin (0,0) is in the top-left corner.
The following shows the coordinate system of digital images:
(0,0)         (1,0)         (2,0)         (3,0)         ···  (width-1,0)
(0,1)         (1,1)         (2,1)         (3,1)         ···  (width-1,1)
  ⋮             ⋮             ⋮             ⋮                   ⋮
(0,height-1)  (1,height-1)  (2,height-1)  (3,height-1)  ···  (width-1,height-1)
where width and height are the image width and image height, respectively.
Black: gray-level = 0
Dark gray: gray-level = 64
Medium gray: gray-level = 127
Light gray: gray-level = 190
White: gray-level = 255
imread() parameters:
fname: The image file to read: a filename or a file-like object opened in read-binary mode.
format: The image file format assumed for reading the data. If format is not given, the format is deduced from the
filename. If nothing can be deduced, PNG is tried.
Return value: numpy.array.
(M,N) for grayscale images
(M,N,3) for RGB images
(M,N,4) for RGBA images
PNG images are returned as float arrays (0-1). All other formats are returned as int arrays, with a bit depth
determined by the file’s contents.
URL: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imread.html
Show Images in Colab
imshow() displays data as an image.
Return value: AxesImage, an image attached to an Axes.
URL: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html
imsave() saves an array as an image file.
Parameters:
fname: a path or a file-like object to store the image in.
arr: The image data. The shape can be one of M×N (luminance), M×N×3 (RGB) or M×N×4 (RGBA).
The first two dimensions (M,N) define the rows and columns of the image.
URL: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imsave.html
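A minimal round-trip sketch of the three functions above, assuming only matplotlib and numpy are available. To stay self-contained it builds a small synthetic image in memory and uses a file-like object instead of a filename:

```python
import io
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend, e.g. for scripts outside Colab
import matplotlib.pyplot as plt

img = np.zeros((21, 32, 3))       # height=21, width=32, RGB floats in [0,1]
img[:, :16] = [1.0, 0.0, 0.0]     # left half red
img[:, 16:] = [0.0, 0.0, 1.0]     # right half blue

ax_img = plt.imshow(img)          # returns an AxesImage attached to the Axes

buf = io.BytesIO()                # file-like object in place of a filename
plt.imsave(buf, img, format='png')

buf.seek(0)
img2 = plt.imread(buf)            # PNG comes back as a float array in [0,1]
print(img2.shape)
```

In a Colab notebook the `Agg` backend line is unnecessary; `plt.imshow` renders inline.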
Image Processing
OpenCV (Open Source Computer Vision Library) is an open source computer vision and
machine learning software library.
OpenCV was built to provide a common infrastructure for computer vision applications
and to accelerate the use of machine perception in commercial products.
The library has more than 2500 optimized algorithms, which includes a comprehensive set
of both classic and state-of-the-art computer vision and machine learning algorithms.
OpenCV supports a wide variety of programming languages such as Python, C++, Java,
etc.
To perform the above using OpenCV, you need to first import cv2
import cv2
Then use cvtColor() method of the cv2 module.
Syntax
cv2.cvtColor(image, code)
Parameters:
image: Image to be processed in n-dimensional array
code: Conversion code for the colorspace. For converting RGB to grayscale, we use cv2.COLOR_RGB2GRAY
Return value: Converted image.
URL: https://docs.opencv.org/3.4/d8/d01/group__imgproc__color__conversions.html#ga397ae87e1288a81d2363b61574eb8cab
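Internally, cv2.cvtColor performs a weighted sum of the channels. As a rough NumPy sketch of what the RGB-to-grayscale conversion computes (the Rec.601 weights below are the ones the OpenCV documentation gives for COLOR_RGB2GRAY):

```python
import numpy as np

def rgb_to_gray(img):
    """Weighted sum of the R, G, B channels (Rec.601 weights, as used by
    cv2.COLOR_RGB2GRAY): Y = 0.299*R + 0.587*G + 0.114*B."""
    return img[..., 0]*0.299 + img[..., 1]*0.587 + img[..., 2]*0.114

rgb = np.array([[[1.0, 1.0, 1.0],    # white pixel
                 [1.0, 0.0, 0.0]]])  # red pixel
gray = rgb_to_gray(rgb)
print(gray)   # white -> 1.0, red -> 0.299
```

Note how green contributes most to perceived brightness, which is why its weight is largest.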
cv2.warpAffine() parameters:
src: input image
M: 2 × 3 transformation matrix
dsize: size of the output image
flags: combination of interpolation methods
borderMode: pixel extrapolation method
borderValue: value used in case of a constant border; by default, it is 0
Return value: output image that has the size dsize and the same type as src
URL:
https://docs.opencv.org/3.4/da/d54/group__imgproc__transform.html#ga0203d9ee5fcd28d40dbc4a1ea4451983
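To see how the 2×3 matrix M acts, here is a small sketch that applies an affine matrix to a single coordinate by hand (warpAffine does this mapping for every pixel; the helper name is our own):

```python
import numpy as np

def apply_affine(M, x, y):
    """Map the point (x, y) through a 2x3 affine matrix, as
    cv2.warpAffine does per pixel coordinate."""
    return M @ np.array([x, y, 1.0])

tx, ty = 5, 3                      # shift right by 5, down by 3
M = np.float64([[1, 0, tx],
                [0, 1, ty]])
print(apply_affine(M, 2, 1))       # -> [7. 4.]
```

The appended 1 in [x, y, 1] is what lets a pure matrix product express the additive translation.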
x′ = x + tx
y′ = y + ty
In matrix form:
[x′]   [1 0 tx] [x]
[y′] = [0 1 ty] [y]
                [1]
x′ = (x − x0)cosθ + (y − y0)sinθ + x0
y′ = −(x − x0)sinθ + (y − y0)cosθ + y0
In matrix form:
[x′]   [ cosθ   sinθ   −x0·cosθ − y0·sinθ + x0] [x]
[y′] = [−sinθ   cosθ    x0·sinθ − y0·cosθ + y0] [y]
                                                [1]
If θ > 0, the rotation is anti-clockwise; if θ < 0, it is clockwise.
(x0, y0) is the point to rotate the image about, normally the centre of the image.
URL:
https://docs.opencv.org/3.4/da/d54/group__imgproc__transform.html#gafbbc470ce83812914a70abfb604f4326
URL:
https://docs.opencv.org/3.4/da/d54/group__imgproc__transform.html#ga47a974309e9102f5f08231edc7e7529d
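As a sanity check, the 2×3 matrix form can be verified against the direct rotation formulas. The sketch below builds the matrix from the equations above (same structure as cv2.getRotationMatrix2D with scale 1, except that we work in radians) and confirms both give the same point:

```python
import numpy as np

def rotation_matrix(theta, x0, y0):
    """2x3 matrix for rotation by theta about (x0, y0), taken from the
    equations above (theta in radians; helper name is our own)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.float64([[ c, s, -x0*c - y0*s + x0],
                       [-s, c,  x0*s - y0*c + y0]])

theta, x0, y0 = np.pi/2, 1.0, 1.0   # 90 degrees about (1, 1)
M = rotation_matrix(theta, x0, y0)
x, y = 2.0, 1.0
xp, yp = M @ np.array([x, y, 1.0])  # matrix form

# Direct formulas for comparison
xd = (x - x0)*np.cos(theta) + (y - y0)*np.sin(theta) + x0
yd = -(x - x0)*np.sin(theta) + (y - y0)*np.cos(theta) + y0
print((xp, yp))   # matches (xd, yd): (1.0, 0.0)
```

A matrix of this shape can be passed straight to cv2.warpAffine; in practice cv2.getRotationMatrix2D builds it for you from an angle in degrees.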
Answer:
Point operation! Since the output value at a specific coordinate of the grayscale image is
dependent only on the input value at the same coordinate of the color image.
# Import all the required libraries
import cv2
import numpy as np
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
# Perform thresholding
processedImg = grayImgUint > 128
We need a way to automatically determine the threshold value T so that the result of
thresholding is reproducible.
A well-known approach is Otsu's method:
1. Select an initial estimate of the threshold T. A good initial value is the average intensity of the image.
2. Partition the image into two groups, R1 and R2, using the threshold T.
3. Calculate the mean gray values µ1 and µ2 of the partitions R1 and R2.
4. Compute a new threshold:
   T = (µ1 + µ2)/2
5. Repeat steps 2-4 until the mean values µ1 and µ2 in successive iterations do not change.
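The iterative procedure above fits in a few lines of NumPy. This is a sketch of the listed steps only (function and variable names are our own, and eps is an assumed convergence tolerance):

```python
import numpy as np

def iterative_threshold(img, eps=0.5):
    """Steps 1-5 above: start from the mean intensity, then repeatedly
    split the pixels at T and move T to the midpoint of the two means."""
    T = img.mean()                                 # step 1: initial estimate
    while True:
        r1, r2 = img[img <= T], img[img > T]       # step 2: partition at T
        if len(r1) == 0 or len(r2) == 0:
            return T                               # degenerate: one-sided split
        mu1, mu2 = r1.mean(), r2.mean()            # step 3: partition means
        T_new = (mu1 + mu2) / 2                    # step 4: new threshold
        if abs(T_new - T) < eps:                   # step 5: stop when stable
            return T_new
        T = T_new

img = np.array([10, 12, 10, 200, 205, 198], dtype=np.float64)
print(iterative_threshold(img))   # settles between the two intensity clusters
```

On this bimodal toy image the threshold lands roughly halfway between the dark cluster (~11) and the bright cluster (~201).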
Parameters:
source: input image array (must be grayscale)
thresholdValue: value of threshold below and above which pixel values will change accordingly
maxVal: maximum value that can be assigned to a pixel
thresholdingTechnique: the type of thresholding to be applied
(For Otsu's, we pass cv2.THRESH_BINARY + cv2.THRESH_OTSU)
Return values: the threshold value used, and the thresholded image.
URL: https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html
Recall, local operations refer to those where the output value at a specific coordinate
depends on the input values in the neighborhood of that same coordinate.
Some of the most common neighborhoods are 4-connected neighborhood and the
8-connected neighborhood.
Image smoothing: It removes noise and softens edges and corners of the image. It is also
called blurring.
Image edge detection: It detects the boundaries (edges) of objects, or regions within an
image.
Image sharpening: It removes blur, enhances details, and dehazes.
=K (−1, −1)I (2, 2) + K (−1, 0)I (2, 1) + K (−1, 1)I (2, 0)+
K (0, −1)I (1, 2) + K (0, 0)I (1, 1) + K (0, 1)I (1, 0)+
K (1, −1)I (0, 2) + K (1, 0)I (0, 1) + K (1, 1)I (0, 0)
=(−1)(9) + (−1)(5) + (−1)(3) + (0)(7) + (0)(3) + (0)(1) + (1)(8) + (1)(4) + (1)(10)
= − 9 − 5 − 3 + 8 + 4 + 10 = 5
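The arithmetic above can be checked in NumPy by flipping the image patch in both directions (equivalently, flipping the kernel) and summing the products; the patch and kernel values are the ones from the worked example:

```python
import numpy as np

# Image patch I, with I[a][b] = I(a, b) from the worked example
I = np.array([[10, 4, 8],
              [ 1, 3, 7],
              [ 3, 5, 9]])
# Kernel K, with rows m = -1, 0, 1 and columns n = -1, 0, 1
K = np.array([[-1, -1, -1],
              [ 0,  0,  0],
              [ 1,  1,  1]])

# Convolution at the centre: sum of K(m, n) * I(1-m, 1-n), i.e.
# multiply K element-wise with the patch flipped in both directions.
out = np.sum(K * np.flip(I))
print(out)   # 5, matching the worked example
```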
Steps
1. Invert the kernel, i.e., flip the kernel in both
the horizontal and vertical directions about its
center.
Original kernel:   Flipped horizontally:   Flipped vertically:
-1 0 1             1 0 -1                  1 0 -1
-1 0 1             1 0 -1                  1 0 -1
-1 0 1             1 0 -1                  1 0 -1
2. Slide the inverted kernel over the image,
centered at the point of interest.
3. Multiply the inverted kernel values with the
overlapped area.
4. Sum and accumulate the output.
Image Convolution Again
URL: https://docs.opencv.org/4.x/d4/d86/group__imgproc__filter.html#ga27c049795ce870216ddfb366086b5a04
Image Convolution
Note
filter2D does not mirror the kernel for you. You will need to flip the kernel before applying
cv2.filter2D.
Applying a small kernel several times is often better than applying one big kernel once.
[Figure: original image and the effects of kernels of different sizes. (What do you observe?)]
Original image + detail (edge) = sharpened image. [Color flipped for clarity]
Edge image (vertical edges): |Gx|
Edge image (horizontal edges): |Gy|
Edge image (magnitude): √(Gx² + Gy²)
Pixels of the processed images are inverted (i.e., black to white, white to black) to make them more visible.
3 7 6
2 4 6
4 7 2
(3 + 7 + 6 + 2 + 4 + 6 + 4 + 7 + 2)/9 = 41/9
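The averaging above is exactly what a mean (box) filter computes at the centre of the neighbourhood, as this one-liner confirms:

```python
import numpy as np

patch = np.array([[3, 7, 6],
                  [2, 4, 6],
                  [4, 7, 2]])
# Mean (box) filter output at the centre: the average of the 3x3 neighbourhood
print(patch.sum(), patch.mean())   # 41, 41/9 ~= 4.556
```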
Sometimes, you may want to crop the region of interest (ROI) for further processing.
For instance, in a face detection application, you may want to crop the face from an
image.
To crop an image, you can use the same method as numpy array slicing.
To slice an array, you need to specify the start and end index of the first as well as the
second dimension.
Syntax
croppedImg = sourceImg[start_row:end_row, start_col:end_col]
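A quick sketch of the slicing syntax on a stand-in array (the row/column numbers here are arbitrary illustration values):

```python
import numpy as np

# A stand-in 'image': 21 rows (height) x 32 columns (width)
sourceImg = np.arange(21*32).reshape(21, 32)

# Crop rows 5..14 and columns 10..19 (end indices are exclusive)
croppedImg = sourceImg[5:15, 10:20]
print(croppedImg.shape)   # (10, 10)

# Slicing returns a view; use .copy() if the crop must not alias the source
roi = sourceImg[5:15, 10:20].copy()
```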
Zero padding (cv2.BORDER_CONSTANT):
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 1 2 3 0 0
0 0 4 5 6 0 0
0 0 7 8 9 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0

Reflect (cv2.BORDER_REFLECT):
5 4 4 5 6 6 5
2 1 1 2 3 3 2
2 1 1 2 3 3 2
5 4 4 5 6 6 5
8 7 7 8 9 9 8
8 7 7 8 9 9 8
5 4 4 5 6 6 5

Replicate (cv2.BORDER_REPLICATE):
1 1 1 2 3 3 3
1 1 1 2 3 3 3
1 1 1 2 3 3 3
4 4 4 5 6 6 6
7 7 7 8 9 9 9
7 7 7 8 9 9 9
7 7 7 8 9 9 9
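The three border styles can be reproduced with np.pad. One mapping caveat worth noting as an assumption: OpenCV's edge-inclusive BORDER_REFLECT corresponds to np.pad's 'symmetric' mode, not its 'reflect' mode (which excludes the edge row, like BORDER_REFLECT_101):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

const = np.pad(a, 2, mode='constant')    # zero padding (BORDER_CONSTANT)
refl  = np.pad(a, 2, mode='symmetric')   # edge-inclusive reflect (BORDER_REFLECT)
repl  = np.pad(a, 2, mode='edge')        # replicate border pixels (BORDER_REPLICATE)

print(refl[0])   # [5 4 4 5 6 6 5]
print(repl[0])   # [1 1 1 2 3 3 3]
```

The printed first rows match the reflect and replicate grids shown above.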
Parameters:
src: Source image
top: The border width in number of pixels in top direction
bottom: The border width in the number of pixels in bottom direction
left: The border width in the number of pixels in left direction
right: The border width in the number of pixels in the right direction
borderType: The kind of border to be added
cv2.BORDER_CONSTANT
cv2.BORDER_REFLECT
cv2.BORDER_REPLICATE
value (optional): The color of border if border type is cv2.BORDER CONSTANT
Returns the resulting image
Parameters:
image: Image of type uint8 or float32 represented as “[img]”
channels: the index of the channel for which we calculate the histogram. For a grayscale image, pass [0]; for a color
image, pass [0], [1], or [2] to calculate the histogram of each channel respectively.
mask: mask image. To find the histogram of the full image, it is given as None.
histSize: This represents the number of bins. For full scale, we pass [256]
ranges: This is the range of intensities. Normally, it is [0,256].
Return value: Histogram of the image.
# Calculate histogram
hist = cv2.calcHist([grayImgUint], [0], None, [256], [0,256])
plt.figure()
plt.plot(hist) # Plot and show the histogram
plt.show()
Brightness Adjustment
To adjust the brightness of an image using OpenCV, you need to first import cv2
import cv2
Then use convertScaleAbs() method of the cv2 module.
Syntax
cv2.convertScaleAbs(image, alpha = 1, beta = 0)
Parameters:
image: Image to be processed in n-dimensional array
alpha: The scale factor. It is 1 by default
beta: The delta added to the scaled values. It is 0 by default.
Return value: Converted image.
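Per the OpenCV documentation, convertScaleAbs computes roughly |alpha·src + beta| per pixel and saturates the result to 8 bits. A NumPy sketch of that behaviour (rounding details may differ slightly from OpenCV's saturate_cast; the function name is our own):

```python
import numpy as np

def adjust_brightness(img, alpha=1.0, beta=0.0):
    """Per-pixel |alpha*img + beta|, saturated to [0, 255], in the
    spirit of cv2.convertScaleAbs."""
    out = np.abs(alpha * img.astype(np.float64) + beta)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

img = np.array([[0, 100, 250]], dtype=np.uint8)
print(adjust_brightness(img, alpha=1.2, beta=30))   # [[ 30 150 255]]
```

Note the saturation: 250·1.2 + 30 = 330 clips to 255 rather than wrapping around.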
Without gamma, shades captured by digital cameras would not appear as they did to our
eyes (on a standard monitor).
Gamma is also referred to as gamma correction, gamma encoding or gamma compression,
but these all refer to a similar concept.
A gamma encoded image has to have “gamma correction” applied when it is viewed –
which effectively converts it back into light from the original scene.
Gamma correction can be performed by adjusting gamma value (γ).
γ < 1 will make the image appear darker
γ > 1 will make the image appear lighter
γ = 1 will have no effect on the input image
# Assume Google Drive has been mounted & the path has been added for interpreter to search
# Import all the required libraries
import cv2; import numpy as np
import matplotlib.image as mpimg; import matplotlib.pyplot as plt
img = mpimg.imread('snorlax.png') # Read the image
# Convert the color image to gray and show it
grayImg = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
plt.figure(); plt.imshow(grayImg, cmap='gray', vmin=0, vmax=1)
# Convert pixel values from [0,1] to [0,255]
grayImgUint = grayImg*255; grayImgUint = grayImgUint.astype(np.uint8)
# Prepare look-up-table and perform gamma correction
gamma = 0.5; invGamma = 1/gamma
table = [((i / 255) ** invGamma) * 255 for i in range(256)]
table = np.array(table, np.uint8)
processedImg1 = cv2.LUT(grayImgUint, table)
plt.figure(); plt.imshow(processedImg1, cmap='gray', vmin=0, vmax=255)
# Prepare look-up-table and perform gamma correction
gamma = 2.2; invGamma = 1/gamma
table = [((i / 255) ** invGamma) * 255 for i in range(256)]
table = np.array(table, np.uint8)
processedImg2 = cv2.LUT(grayImgUint, table)
plt.figure(); plt.imshow(processedImg2, cmap='gray', vmin=0, vmax=255)
Histogram Equalization
Histogram equalization is another technique used to improve contrast of images.
The idea is to spread out the most frequent intensity values.
Algorithm
1. Compute the histogram, H, of the image
2. Compute the cumulative histogram, C, of
the image:
   C(i) = H(0) + H(1) + ··· + H(i)
3. Map each pixel value through the cumulative histogram: Inew = C(I)
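The algorithm above can be sketched in NumPy. One detail the slide leaves implicit, which we add here as an assumption: the cumulative histogram is rescaled to [0, 255] before being used as the mapping, so outputs remain valid 8-bit intensities.

```python
import numpy as np

def equalize(img):
    """Histogram equalization per the steps above (uint8 input).
    Assumption: C is rescaled to [0, 255] before mapping."""
    hist = np.bincount(img.ravel(), minlength=256)   # step 1: histogram H
    C = hist.cumsum()                                # step 2: cumulative C
    lut = np.round(255.0 * C / C[-1]).astype(np.uint8)
    return lut[img]                                  # step 3: Inew = C(I)

img = np.array([[50, 50, 51], [51, 52, 200]], dtype=np.uint8)
out = equalize(img)
print(out.min(), out.max())   # intensities spread towards the full 0..255 range
```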
In fact, histogram equalization can be performed using an OpenCV function. To do so, you
need to first import cv2
import cv2
Then use equalizeHist() method of the cv2 module.
Syntax
equalizeHist(source)
Parameters:
source: input image array
Return value: Equalized image
Shifted identity kernel:
0 0 0
1 0 0
0 0 0
Convolving the original image with this kernel 5 times gives back the image shifted by 5 pixels.
Non-linear Filtering
cv2.medianBlur() parameters:
src: input image that you want to process
kernelSize: The size of the kernel
Return value: filtered image.
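What the median filter does at each pixel is easy to show on a single neighbourhood: sort the values under the kernel and keep the middle one. Unlike the mean, the median simply discards an extreme impulse ("salt") value:

```python
import numpy as np

# One 3x3 neighbourhood containing an impulse ('salt') noise value 255
window = np.array([[12, 11, 13],
                   [10, 255, 12],
                   [11, 13, 10]])

# Median filter output at the centre: sort the 9 values, take the middle one
print(np.median(window))   # 12.0 -- the outlier 255 is discarded entirely
```

A mean filter over the same window would instead smear the outlier into the result (the average here is about 38.6).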
To demonstrate how morphological filters work, let us create two adjacent circles with
random noise on the background.
from skimage.draw import disk
import numpy as np
Erosion
Dilation
Opening
Closing
Erosion is used for shrinking elements in the input image using the structuring element.
A pixel value is retained only when the structuring element is completely contained
inside the image object. Otherwise, it gets deleted, or eroded.
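For binary images with a 3×3 all-ones structuring element, erosion reduces to a minimum filter: a pixel survives only if its entire 3×3 neighbourhood is 1. A sketch (not cv2.erode itself; the helper name and zero-padded border are our own choices):

```python
import numpy as np

def erode3x3(img):
    """Binary erosion with a 3x3 all-ones structuring element: a pixel
    survives only if its whole 3x3 neighbourhood is 1 (a min filter)."""
    p = np.pad(img, 1)              # zero border, so edge pixels always erode
    out = np.ones_like(img)
    for dr in range(3):
        for dc in range(3):
            out &= p[dr:dr+img.shape[0], dc:dc+img.shape[1]]
    return out

img = np.zeros((5, 5), dtype=int)
img[1:4, 1:4] = 1                   # a 3x3 square of ones
img[0, 0] = 1                       # plus one isolated noise pixel
out = erode3x3(img)
print(out)                          # only the centre of the square survives
```

Note the two effects at once: the isolated noise pixel vanishes, and the 3×3 square shrinks to its single centre pixel.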
Parameters:
src: input image that you want to erode
kernel: A structuring element used for erosion
dst: Output image
anchor: the anchor point; its default value (-1,-1) means that the anchor is at
the kernel center.
borderType: cv2.BORDER_CONSTANT, cv2.BORDER_REFLECT, etc.
iterations: Number of times erosion is applied.
borderValue: It is border value in case of a constant border.
Return value: filtered image.
Dilation is used for expanding elements in the input image using the structuring element.
A pixel value is turned "on" when the structuring element overlaps the input image
object. Otherwise, the pixel value is "off".
Parameters:
src: input image that you want to dilate
kernel: A structuring element used for dilation
dst: Output image
anchor: the anchor point; its default value (-1,-1) means that the anchor is at
the kernel center.
borderType: cv2.BORDER_CONSTANT, cv2.BORDER_REFLECT, etc.
iterations: Number of times dilation is applied.
borderValue: It is border value in case of a constant border.
Return value: filtered image.
Parameters:
src: input image that you want to process
op: the operation (cv2.MORPH_ERODE, cv2.MORPH_DILATE, cv2.MORPH_OPEN, cv2.MORPH_CLOSE)
kernel: a structuring element used for the operation
dst: Output image
anchor: the anchor point; its default value (-1,-1) means that the anchor is at
the kernel center.
iterations: number of times the operation is applied (e.g., with iterations = 2: erode×2, dilate×2).
borderType: cv2.BORDER_CONSTANT, cv2.BORDER_REFLECT, etc.
borderValue: It is border value in case of a constant border.
Return value: filtered image.
img = plt.imread('input-open.png')
Closing filter removes small holes while also maintaining the original shape of the object.
Closing is done by applying the dilation first, and then applying erosion.
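Closing can be sketched for binary images by composing a max filter (dilation) and a min filter (erosion), each with a 3×3 all-ones structuring element. This is an illustration of the definition, not cv2.morphologyEx; the helper names and zero-padded borders are our own:

```python
import numpy as np

def shift_stack(img):
    """Stack the nine 3x3-shifted copies of img (zero-padded border)."""
    p = np.pad(img, 1)
    return np.stack([p[r:r+img.shape[0], c:c+img.shape[1]]
                     for r in range(3) for c in range(3)])

def dilate3x3(img):   # max over the 3x3 neighbourhood
    return shift_stack(img).max(axis=0)

def erode3x3(img):    # min over the 3x3 neighbourhood
    return shift_stack(img).min(axis=0)

def close3x3(img):    # closing = dilation followed by erosion
    return erode3x3(dilate3x3(img))

img = np.zeros((7, 7), dtype=int)
img[2:5, 2:5] = 1     # a 3x3 square...
img[3, 3] = 0         # ...with a one-pixel hole in the middle
out = close3x3(img)
print(out[3, 3])      # 1 -- the hole has been filled
```

The dilation fills the hole; the subsequent erosion shrinks the expanded shape back, leaving the original square intact but without the hole.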
img = plt.imread('input-close.png')
Reference Materials
imread(): https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imread.html
imshow(): https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html
imsave(): https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imsave.html
cvtColor(): https://docs.opencv.org/3.4/df/d9d/tutorial_py_colorspaces.html
warpAffine(), getRotationMatrix2D(), resize():
https://docs.opencv.org/3.4/da/d6e/tutorial_py_geometric_transformations.html
copyMakeBorder(): https://docs.opencv.org/3.4/dc/da3/tutorial_copyMakeBorder.html
calcHist(): https://docs.opencv.org/3.4/dd/d0d/tutorial_py_2d_histogram.html
convertScaleAbs():
https://docs.opencv.org/3.4/d2/de8/group__core__array.html#ga3460e9c9f37b563ab9dd550c4d8c4e7d
threshold(): https://docs.opencv.org/3.4/d7/d4d/tutorial_py_thresholding.html
equalizeHist(): https://docs.opencv.org/3.4/d5/daf/tutorial_py_histogram_equalization.html
filter2D(), medianBlur(): https://docs.opencv.org/3.4/d4/d13/tutorial_py_filtering.html
erode(), dilate(), morphologyEx():
https://docs.opencv.org/3.4/d9/d61/tutorial_py_morphological_ops.html