Image Processing
Clara Gonçalves
March 2024
1 What is an image?
An image can be represented by a matrix whose values encode the intensity of the color at each point (also called a pixel).
If we think of a grayscale image as a function, then f(x, y) is the intensity of the pixel at a given point in the image space.
An image processing operator is a function that takes one or more input images and returns an image resulting from those operations. Note that usually each function has two inputs: the pixel value and its location.
g(x) = h(f(x))  →  g(i, j) = h(f(i, j)),   where x = (i, j)     (1)
A common function used in point processing is g(x) = a·f(x) + b; it allows us to change the contrast (through the term a·f(x)) and the brightness (through the term b).
Increasing the gamma value leads to a darker image, while decreasing the gamma value leads to a lighter image.
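As an illustration, a minimal sketch of these point operations, assuming an 8-bit grayscale image stored as a NumPy array (function names and parameter values are illustrative):

import numpy as np

def point_op(f, a=1.2, b=10):
    # Contrast/brightness adjustment g(x) = a*f(x) + b, clipped back to the valid range.
    return np.clip(a * f.astype(np.float64) + b, 0, 255).astype(np.uint8)

def gamma_op(f, gamma=2.0):
    # Gamma adjustment on a normalized image: gamma > 1 darkens, gamma < 1 lightens.
    normalized = f.astype(np.float64) / 255.0
    return (255.0 * normalized ** gamma).astype(np.uint8)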
The frequency spectrum is symmetric about the amplitude (vertical) axis, so it averages to zero.
The intensity histogram is the histogram that represents the pixel intensities throughout an image. The distribution of pixel intensities of an image, in other words the probability that a value r is present in a given pixel, is the following:
p(r) = (number of pixels with intensity r) / (total number of pixels)     (3)
This is important because when you calculate p(r) for the different values of r and plot the result, you get the normalized histogram, or probability density function (PDF), of the image's intensity values. This histogram provides information about the distribution of intensity in the image.
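A short sketch of how p(r) can be computed, assuming an 8-bit grayscale image stored as a NumPy array of unsigned integers:

import numpy as np

def normalized_histogram(image):
    # p(r) = (number of pixels with intensity r) / (total number of pixels)
    counts = np.bincount(image.ravel(), minlength=256)  # one bin per intensity 0..255
    return counts / image.size                          # bins now sum to 1 (a PDF)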
A high-contrast image is one whose histogram is spread roughly uniformly across the whole intensity range. With this distribution, the image is more detailed.
3 Filtering of an Image
Filtering an image starts with an image and a filter. The filter is applied to the image and performs mathematical operations on it, modifying the pixel values. The main goal of this process is to emphasize or suppress certain features of the image, which in the end allows information to be retrieved from the original image.
In the context of image processing, we have different types of filters that retrieve different information:
• Noise Reduction Filters: Also called smoothing filters (blurring filters), they are used, as the name implies, to reduce the noise of an image and create a smoother version of it. Examples of filters that perform this operation are the Gaussian filter and the median filter.
• Sharpening Filters: These filters are used to enhance the details of an image, for example by enhancing the edges. To get this result, we can use Laplacian or Sobel filters.
• Edge Detection Filters: Edge detection filters, as the name suggests, filter an image in such a way that only the edges remain visible. One example is the Canny edge detector.
• Frequency Domain Filters: Filters applied in the frequency domain, for example via the Fourier transform, can be used to apply a low-pass or high-pass filter to an image.
Filtering an image involves manipulating the pixel values of the original image in such a way as to enhance or suppress certain features of the image. The manipulation of the pixel values is based on some function (which depends on the filter being applied) evaluated on a local neighborhood of each pixel.
1. Image Restoration
Denoising: The process of removing noise from an image, where noise refers to unwanted random variations in brightness or color.
Deblurring: Addressing blurred images, which can result from factors like motion during image capture or limitations of the imaging system.
2. Image Compression
This involves reducing the file size of an image while maintaining its quality. The listed standards (JPEG, HEIF, MPEG) are examples of widely used image compression algorithms.
3. Computing Field Properties
Optical Flow: Analysing the motion of objects across a sequence of images. It helps understand how pixels move from one frame to another.
Disparity: Disparity in the context of image processing is all about figuring out how far away
different objects are in a pair of images. Imagine you take two pictures of the same scene from
slightly different angles, like your left and right eye might see a scene. Disparity helps you
understand the differences between these two images.
So, when you’re talking about disparity in image processing, you’re essentially looking at the
variations in the apparent position of objects between those two images. It's like understanding the "shift" or "offset" of objects as seen from different viewpoints. This information
is useful in tasks like creating 3D models of a scene or understanding the depth in a stereoscopic
image.
3.2 Noise reduction
Noise reduction, or denoising, as mentioned above, is the ability to reduce noise in an image.
One of the ways this can be done is temporal averaging: many images of the same scene are taken and the intensity values are averaged across them.
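A minimal sketch of temporal averaging, assuming a list of aligned grayscale frames of the same scene stored as NumPy arrays:

import numpy as np

def temporal_average(frames):
    # Averaging aligned frames: zero-mean noise cancels out as the number of frames grows.
    stack = np.stack([f.astype(np.float64) for f in frames], axis=0)
    return stack.mean(axis=0)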
Another way is to use linear filtering, in which the pixels of an image are modified based on a given function of the local neighborhood of each pixel.
3.3 Cross-Correlation
Cross-correlation in image processing is a technique used to measure the similarity between two signals
or images by sliding one over the other and computing the sum of products at each position, providing
information about spatial relationships and pattern matching.
With F representing the image, H the kernel of size (2k + 1) × (2k + 1), and G the output image, we can describe this operation by the following equation:
G[i, j] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u, v] F[i + u, j + v]     (4)
H[u, v] is the prescription for the weights in the linear combination. This operation can also be denoted as the "dot product" of the kernel and the image:
G = H ⊗ F     (5)
Cross-correlation is neither associative nor commutative.
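A direct, unoptimized sketch of equation (4), assuming a grayscale NumPy image F and an odd-sized (2k + 1) × (2k + 1) kernel H; zero padding at the borders is an illustrative choice, not the only one:

import numpy as np

def cross_correlate(F, H):
    # G[i, j] = sum over u, v in [-k, k] of H[u, v] * F[i + u, j + v]
    k = H.shape[0] // 2
    Fp = np.pad(F.astype(np.float64), k)          # zero-pad so every neighborhood exists
    G = np.zeros(F.shape, dtype=np.float64)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            window = Fp[i:i + 2 * k + 1, j:j + 2 * k + 1]   # neighborhood centered on (i, j)
            G[i, j] = np.sum(H * window)
    return G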
3.4 Convolution
Convolution is very similar to cross-correlation; the difference lies in the handling of the kernel during the operation. In convolution the kernel is "flipped" both vertically and horizontally.
Taking that into account, we can describe the convolution equation as:
G[i, j] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} H[u, v] F[i − u, j − v]     (6)
Unlike cross-correlation, convolution is associative and commutative, which in a way turns convolution into a multiplication-like operation with all of its properties:
• commutative
• associative
• distributes over addition
• scalars factor out
• identity: use with unit impulse
This also means that we can apply multiple layers of convolution calculations and not worry about the
order in which they are applied.
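A small numerical check of the associativity property, assuming SciPy is available; the image and kernel values are illustrative. Filtering with two kernels in sequence gives the same result as filtering once with the two kernels convolved together:

import numpy as np
from scipy.signal import convolve2d

F = np.random.rand(64, 64)                    # placeholder image
H1 = np.ones((3, 3)) / 9.0                    # box-blur kernel
H2 = np.array([[0., 1., 0.],
               [1., -4., 1.],
               [0., 1., 0.]])                 # discrete Laplacian kernel

lhs = convolve2d(convolve2d(F, H1), H2)       # (F * H1) * H2
rhs = convolve2d(F, convolve2d(H1, H2))       # F * (H1 * H2)
print(np.allclose(lhs, rhs))                  # True: the order of combination does not matter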
Building on these properties, CNNs have proven successful in various computer vision applications, including image classification, object detection, and semantic segmentation.
In Convolutional Neural Networks (CNNs), padding and stride are two important concepts associated with the convolutional layers:
(Footnote: Pooling layers in neural networks, often used in conjunction with convolutional layers, downsample the spatial dimensions of feature maps, reducing their resolution while retaining essential information for efficient processing.)
Padding:
• Definition: Padding refers to the addition of extra pixels (usually zero-valued) around the input image
or feature map before applying the convolution operation.
• Purpose: Padding helps to retain spatial information at the edges of the image and mitigates the
problem of shrinking feature maps as they pass through convolutional layers.
• Types: Common padding types include zero-padding (adding zeros around the image), valid padding
(no padding), and reflective or symmetric padding.
Stride:
• Definition: Stride is the step size with which the convolutional filter moves across the input image or
feature map.
• Purpose: A larger stride reduces the spatial dimensions of the output feature map, effectively downsampling it. This can be useful for reducing computational complexity and memory requirements.
• Impact: Smaller strides provide higher spatial resolution in the output, but may increase computational cost. Larger strides result in more aggressive downsampling.
In summary, padding and stride are parameters that influence the spatial dimensions of the feature maps produced by convolutional layers in CNNs. Padding helps maintain information at the edges, while stride controls the step size of the filter, affecting the downsampling or upsampling of the feature maps. These concepts play a crucial role in determining the architecture and performance of a CNN in various computer vision tasks.
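The resulting spatial size follows the usual relation out = ⌊(in + 2·padding − kernel) / stride⌋ + 1; a small sketch with illustrative values:

def conv_output_size(in_size, kernel, padding=0, stride=1):
    # Spatial size of the convolution output: floor((in + 2p - k) / s) + 1
    return (in_size + 2 * padding - kernel) // stride + 1

print(conv_output_size(32, kernel=3, padding=1, stride=1))  # 32: "same" padding keeps the size
print(conv_output_size(32, kernel=3, padding=0, stride=1))  # 30: valid padding shrinks the map
print(conv_output_size(32, kernel=3, padding=1, stride=2))  # 16: stride 2 halves the resolution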
Here, F (i, j) represents the original pixel value, and k is the size of the filter/kernel.
Mean filtering is a straightforward and computationally efficient method for basic image smoothing, but
it may not be suitable for preserving fine details or edges in the image. More advanced filtering techniques,
such as Gaussian filtering, are often employed when a smoother yet more visually appealing result is desired.
3.8 Separable filters
Separable filters are a type of filter in image processing that can be decomposed into two one-dimensional
filters, often applied in succession along different axes (horizontal and vertical). This decomposition simplifies
the computation and makes the overall filtering process more efficient.
1. Filter Decomposition: A separable filter can be expressed as the outer product of two vectors:
one for the rows and one for the columns. Mathematically, if F is a 2D filter, it can be expressed as F = A B^T, where A is a vector representing the horizontal filter, and B is a vector representing the vertical filter.
2. Separable Convolution: Instead of applying the 2D filter directly to the image, the separable
approach involves applying the horizontal filter along the rows and then applying the vertical filter
along the columns (or vice versa). This is computationally more efficient than applying the full 2D
filter in a single step.
3. Efficiency Benefits: The main advantage of separable filters lies in the reduced computational
complexity. Convolution with a separable filter takes fewer operations compared to the equivalent
non-separable filter. This can lead to significant speed improvements in image processing algorithms,
especially for large images and filters.
4. Examples: Common examples of separable filters include Gaussian filters and Sobel filters. These
filters can be decomposed into horizontal and vertical components, making them separable.
5. Implementation: When implementing separable filters, the filter kernel is often factored into two 1D
vectors. The image is convolved first with the horizontal vector, and then the result is convolved with
the vertical vector. This reduces the overall computational cost, especially for larger filter sizes.
Separable filters are widely used in real-time image processing applications, computer vision, and other
domains where efficiency is crucial. They offer a balance between computational savings and filter expres-
siveness, making them a valuable tool in image filtering algorithms.
If the image has M × N pixels and the filter kernel has size L × L:
• What is the cost of convolution with a non-separable filter? L² × M × N
• What is the cost of convolution with a separable filter? 2 × (L × M × N)
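A sketch of separable filtering, assuming SciPy is available: a 2D Gaussian blur is obtained by convolving with a 1D Gaussian along the rows and then along the columns, which is where the 2 × (L × M × N) cost comes from (the σ and radius values are illustrative):

import numpy as np
from scipy.ndimage import convolve1d

def gaussian_kernel_1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()                        # normalize so the weights sum to 1

def separable_gaussian_blur(image, sigma=2.0):
    g = gaussian_kernel_1d(sigma, radius=int(3 * sigma))
    rows = convolve1d(image.astype(np.float64), g, axis=1)   # horizontal pass
    return convolve1d(rows, g, axis=0)                       # vertical pass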
This process effectively reduces high-frequency noise (acting as a low-pass filter) and smooths transitions between different image regions. The Gaussian filter is applied to the image to achieve smoothing or blurring. Selecting an appropriate value for σ is crucial for achieving the desired level of smoothing without overly blurring the image or losing important details.
The Gaussian filter, while highly effective at smoothing and blurring images, has some limitations:
• Loss of detail - Since the Gaussian filter applies blur to the image, it can result in a loss of fine details (sharp edges or small features).
• Border effects - With the Gaussian filter, near the edges of an image the pixels outside the image boundary are typically ignored or handled in a special way (e.g., by zero-padding or by mirroring). This can lead to artifacts or border effects in the filtered image, such as halos or ringing around edges.
• Computationally Intensive - It can be computationally demanding to convolve with a Gaussian kernel that has a large standard deviation. However, the separability of the Gaussian kernel can help with this.
• Parameter Sensitivity - The performance of the Gaussian filter can be sensitive to the choice of
parameters, particularly the standard deviation (σ).
• Linear Nature - Gaussian filtering is linear, meaning that it treats all image regions equally regardless
of their content.
• Trade-off Between Noise Reduction and Detail Preservation - There is often a trade-off be-
tween noise reduction and detail preservation when applying Gaussian filtering. Increasing the standard
deviation (σ) leads to stronger smoothing and noise reduction but may also result in more loss of detail.
• Filter Principle
Mean Filter: Replaces each pixel value with the average value of its pixel neighborhood. It computes a simple average, giving every pixel the same weight.
Gaussian Filter: Replaces each pixel value with a weighted average of its neighborhood computed with the Gaussian kernel; the weights are determined by the Gaussian distribution, with more weight given to nearby pixels and less weight to distant pixels.
• Smoothing Effect
Mean Filter: Performs uniform smoothing and blurring throughout the image. The amount of blurring is determined by the size of the box kernel.
Gaussian Filter: Performs weighted smoothing and blurring. The amount of smoothing can be controlled by adjusting the standard deviation (σ) of the Gaussian kernel.
• Edge Preservation
Mean Filter: Blurred edges
Gaussian Filter: Edges are not as blurred
• Computational Complexity
Mean Filter: simple computation
Gaussian Filter: more complex computation
• Parameter Sensitivity
Mean Filter: few parameters to adjust
Gaussian Filter: a small change in the parameters can result in much more blurring.
While both the Mean and Gaussian filters are used for image blurring, the Gaussian filter tends to preserve more details in an image and is also better at removing noise.
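A brief sketch contrasting the two filters, assuming SciPy is available; the kernel size and σ are illustrative:

import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

image = np.random.rand(128, 128)                     # placeholder image

mean_blurred = uniform_filter(image, size=5)         # box kernel: every neighbor has the same weight
gauss_blurred = gaussian_filter(image, sigma=1.5)    # Gaussian kernel: nearby pixels weigh more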
1. Blur the original image - Apply a smoothing filter (e.g., a Gaussian filter) to the original image to obtain a blurred version of it.
2. Subtract the blurred image - In order to get the high-frequency component of the image, we subtract the blurred image from the original image.
3. Enhance the original image - Finally, we take the original image and the high-frequency image and add them together.
Mathematically, the process can be represented as follows:
Sharpened Image = Original Image + Amount × (Original Image − Blurred Image)
Where:
"Original Image" is the input image. "Blurred Image" is the result of applying a smoothing filter to the input image. "Amount" is a parameter that controls the strength of the sharpening effect.
It’s worth noting that while sharpening filters can enhance image details, they can also amplify noise and
artifacts in the image.
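A minimal unsharp-masking sketch following the steps above, assuming an 8-bit grayscale NumPy image and a Gaussian blur from SciPy (the σ and amount values are illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image, sigma=2.0, amount=1.0):
    # Sharpened = Original + amount * (Original - Blurred)
    image = image.astype(np.float64)
    blurred = gaussian_filter(image, sigma=sigma)
    high_freq = image - blurred                      # high-frequency component
    return np.clip(image + amount * high_freq, 0, 255)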
3.14 The bilateral filter
Bilateral filtering is a non-linear, edge-preserving, and smoothing filter used in image processing to reduce
noise while preserving edges in the image. It differs from linear smoothing filters like the Gaussian filter in
that it considers both spatial proximity and intensity similarity when performing the filtering operation.
The bilateral filter operates by applying a weighted average to each pixel in the image, where the weights
are determined by both spatial distance and intensity difference between neighboring pixels. Pixels with
similar intensities and spatial proximity are given higher weights, while pixels with larger intensity differences
or greater spatial distances are given lower weights.
The formula for bilateral filtering can be expressed as:
h[m, n] = (1 / W_mn) Σ_{k,l} g[k, l] r_mn[k, l] f[m + k, n + l]     (10)
Where:
Normalization factor: 1 / W_mn
Spatial weighting: g[k, l]
Intensity range weighting: r_mn[k, l]
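A direct (and deliberately slow) sketch of equation (10), assuming an 8-bit grayscale NumPy image; the window radius and the two σ values are illustrative:

import numpy as np

def bilateral_filter(f, radius=3, sigma_s=2.0, sigma_r=25.0):
    f = f.astype(np.float64)
    padded = np.pad(f, radius, mode='edge')
    out = np.zeros_like(f)
    offsets = np.arange(-radius, radius + 1)
    # Spatial weights g[k, l] depend only on the offset, so they are precomputed once.
    g = np.exp(-(offsets[:, None]**2 + offsets[None, :]**2) / (2 * sigma_s**2))
    for m in range(f.shape[0]):
        for n in range(f.shape[1]):
            window = padded[m:m + 2 * radius + 1, n:n + 2 * radius + 1]
            r = np.exp(-(window - f[m, n])**2 / (2 * sigma_r**2))   # intensity range weights
            w = g * r
            out[m, n] = np.sum(w * window) / np.sum(w)              # 1/W_mn normalization
    return out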
4 Sampling and Aliasing
4.1 Sampling an image
Sampling an image involves capturing or representing it in a digital form. This process can include resizing,
sub-sampling, undersampling, and upsampling.
Image Resizing Image resizing refers to changing the dimensions of an image. This can involve either
increasing (upsampling) or decreasing (downsampling) its size while trying to preserve its visual quality.
Undersampling Undersampling is a specific form of sub-sampling where the image is reduced in size
by capturing fewer samples than necessary. It can lead to aliasing artifacts if not properly handled.
Upsampling Upsampling, also known as interpolation, is the process of increasing the resolution of
an image by adding new pixels between existing ones. Various interpolation techniques, such as nearest-
neighbor, bilinear, or bicubic interpolation, can be used to estimate the values of the new pixels.
Image Sub-sampling Image sub-sampling involves reducing the resolution of an image by keeping only
a subset of its original pixels. This is typically done by discarding alternate rows and columns of pixels,
resulting in a smaller image.
These techniques are fundamental in image processing and are used in various applications such as image
resizing for display or printing, compression, and digital image analysis.
4.2 Aliasing
Aliasing refers to the distortion that occurs when continuous signals, such as images, are sampled at too low a rate or when high-frequency information is not adequately represented. In other words, aliasing occurs when the sampled image does not allow us to recreate the original image.
In computer vision, aliasing can occur during image acquisition, processing, or display. It can manifest
as distortion or loss of detail in images, particularly when resizing, rotating, or transforming images. Proper
anti-aliasing techniques, such as low-pass filtering or supersampling, are often employed to mitigate aliasing
effects in computer vision applications.
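A short sketch of this low-pass approach, assuming SciPy is available: blurring before discarding rows and columns removes the frequencies that would otherwise alias (the σ value is illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def subsample_naive(image, factor=2):
    # Keep every factor-th row and column; high frequencies alias into the result.
    return image[::factor, ::factor]

def subsample_antialiased(image, factor=2, sigma=1.0):
    # Low-pass filter first, then decimate, so content above the new Nyquist limit is removed.
    return gaussian_filter(image.astype(np.float64), sigma=sigma)[::factor, ::factor]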
Aliasing can also affect neural networks, particularly in tasks involving image recognition or classifica-
tion. If not properly addressed, aliasing in training data or during the image preprocessing stage can degrade
the performance of neural networks. Techniques such as data augmentation, filtering, or higher-resolution
input images can help reduce aliasing effects in neural network training and inference.
Understanding and mitigating aliasing is essential to ensure accurate representation and analysis of
images.
5 Gaussian Pyramid
A Gaussian pyramid is a type of image pyramid used in image processing and computer vision tasks. It is
constructed by iteratively applying Gaussian smoothing and downsampling operations to an input image,
resulting in a series of images at different scales or resolutions. Each level of the pyramid represents a
smoothed and downsampled version of the original image, with progressively lower resolutions.
2. Gaussian Smoothing:
• The input image is convolved with a Gaussian kernel to blur or smooth the image, reducing high-frequency noise.
• The amount of smoothing applied is determined by the standard deviation (σ) of the Gaussian
kernel. A larger σ results in more smoothing.
3. Downsampling:
• The smoothed image is then down-sampled or subsampled to reduce its size, typically by a factor
of 2 in each dimension (halving the width and height).
• Down-sampling is achieved by discarding every other row and column of pixels in the image,
effectively reducing its resolution.
• The process continues until a stopping criterion is reached, such as reaching a predefined number
of levels or reaching a minimum image size.
The resulting Gaussian pyramid consists of a series of images arranged in a hierarchical structure, with
the base level containing the original input image and subsequent levels containing progressively smoothed
and down-sampled versions of the input. The pyramid enables multi-scale analysis of the image, allowing
algorithms to operate at different resolutions for tasks such as image blending, image alignment, and scale-
invariant feature detection.
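A compact sketch of the construction loop, assuming SciPy is available; the number of levels and σ are illustrative:

import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(image, levels=4, sigma=1.0):
    # Base level is the input; each next level is blurred and halved in each dimension.
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma=sigma)   # smooth to limit aliasing
        pyramid.append(blurred[::2, ::2])                     # discard every other row and column
    return pyramid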
6.1 Bilinear Interpolation
Bilinear interpolation is a simple and efficient method that estimates the value of a pixel by interpolating
between its four nearest neighbors in a 2x2 pixel grid. The interpolated value is calculated as a weighted
average of the values of these neighboring pixels, where the weights are determined by the distances between
the target position and the neighboring pixels.
The formula for bilinear interpolation is given by:
f(x, y) ≈ (1 − α)(1 − β) f(i, j) + α(1 − β) f(i, j + 1) + (1 − α)β f(i + 1, j) + αβ f(i + 1, j + 1)
where:
• f(i, j), f(i, j + 1), f(i + 1, j), and f(i + 1, j + 1) are the four nearest neighboring pixels, with i = ⌊y⌋ and j = ⌊x⌋.
• α and β are the fractional parts of the target position (x, y) within the grid (α = x − j, β = y − i).
Bilinear interpolation provides a good balance between computational efficiency and visual quality, mak-
ing it suitable for real-time applications such as image resizing and texture mapping.
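A small sketch of bilinear sampling at a single non-integer position, assuming a grayscale NumPy image and the (α, β) convention above:

import numpy as np

def bilinear_sample(image, x, y):
    image = image.astype(np.float64)
    i, j = int(np.floor(y)), int(np.floor(x))            # top-left neighbor (row i, column j)
    i = min(i, image.shape[0] - 2)
    j = min(j, image.shape[1] - 2)
    alpha, beta = x - j, y - i                            # fractional offsets inside the 2x2 cell
    return ((1 - alpha) * (1 - beta) * image[i, j]
            + alpha * (1 - beta) * image[i, j + 1]
            + (1 - alpha) * beta * image[i + 1, j]
            + alpha * beta * image[i + 1, j + 1])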
6.3 Comparison
Bilinear interpolation is simpler and faster, making it suitable for real-time applications where computational
efficiency is critical. On the other hand, bicubic interpolation produces higher-quality results at the cost of
increased computational complexity, making it more suitable for offline image processing tasks where visual
quality is paramount.
• Averaging: Simple averaging of pixel values across multiple images to reduce noise and enhance image
quality. This method assumes that the low-resolution images are aligned and have similar content.
• Interpolation: Interpolation techniques such as bilinear or bicubic interpolation can be used to
estimate high-resolution details between pixels in the low-resolution images. These methods provide
smoother transitions between neighboring pixels and can generate visually appealing results.
• Learning-based Approaches: Deep learning techniques, such as convolutional neural networks
(CNNs), can be trained to learn the mapping between low-resolution and high-resolution images using
a large dataset of paired images. These models can capture complex relationships in the data and
generate high-quality super-resolved images.
8 Image Derivatives
8.1 Partial Derivatives with Convolution
Partial derivatives are a fundamental concept in calculus that describe how a function changes with respect to
its input variables. In this context we see images as functions and in image processing, partial derivatives are
used to quantify the rate of change of pixel values in an image along different directions, typically horizontal
and vertical.
Gradient
The gradient of an image represents the rate of change of pixel values in both the horizontal and vertical
directions. It is computed using partial derivatives with convolution operations. The gradient of an image
can be described as follow:
∇f = [ df/dx , df/dy ]
where the fraction on the left represents the x derivative and the fraction on the right represents the y derivative of the image.
Horizontal Derivative
The horizontal derivative, also known as the x-derivative or the derivative with respect to the x-axis, is
computed by convolving the image with a derivative filter such as the Sobel filter:
Gx = I ∗ [ −1  0  1
           −2  0  2
           −1  0  1 ]
where I is the input image and Gx is the horizontal derivative.
Vertical Derivative
The vertical derivative, also known as the y-derivative or the derivative with respect to the y-axis, is computed
by convolving the image with a derivative filter similar to the horizontal derivative but rotated by 90 degrees:
Gy = I ∗ [ −1  −2  −1
            0   0   0
            1   2   1 ]
where I is the input image and Gy is the vertical derivative.
Magnitude and Direction
The magnitude of the gradient, often denoted as |∇I|, represents the overall rate of change of pixel values in the image and is computed as:
|∇I| = √(Gx² + Gy²)
The direction of the gradient, often denoted as θ, represents the orientation of the edges in the image
and is computed as:
θ = arctan(Gy / Gx)
In summary, partial derivatives with convolution are used to compute the gradient of an image, which
provides information about the rate of change of pixel values in different directions. This information is
useful for various image processing tasks such as edge detection, feature extraction, and image enhancement.
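A short sketch that computes Gx, Gy, the gradient magnitude, and the gradient direction with the Sobel kernels given above, assuming SciPy is available:

import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
sobel_y = sobel_x.T                                   # the vertical kernel equals the transpose of the horizontal one

def image_gradient(image):
    image = image.astype(np.float64)
    Gx = convolve2d(image, sobel_x, mode='same', boundary='symm')
    Gy = convolve2d(image, sobel_y, mode='same', boundary='symm')
    magnitude = np.sqrt(Gx**2 + Gy**2)                # |gradient|
    direction = np.arctan2(Gy, Gx)                    # arctan(Gy / Gx), robust when Gx = 0
    return magnitude, direction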
9 Edge Detection
Edges can be characterized by sudden changes or discontinuities in the intensity or color values of neighboring
pixels. These changes can occur across different directions, such as horizontal, vertical, or diagonal.
Images as Functions
In image processing, an image can be viewed as a function f (x, y), where x and y are spatial coordinates,
and f (x, y) represents the intensity or color value at each point in the image.
Effects of Noise
Noise in images can interfere with edge detection algorithms by introducing false edges or reducing the
contrast between true edges and background regions.
To mitigate the effects of noise on edge detection, a common approach is to apply a smoothing or blurring
filter to the image before performing edge detection. Smoothing filters help to reduce high-frequency noise
while preserving the overall structure of the image.
This theorem is useful in image processing and signal processing for computing the derivative of a
smoothed image or signal efficiently.
Application
In image processing, the derivative theorem of convolution is often applied in edge detection algorithms. By
convolving an image with a Gaussian kernel and then taking the derivative of the resulting smoothed image,
the edges in the image can be detected more effectively.
Derivative
The derivative of the Gaussian filter with respect to x can be computed analytically. The derivative of the
Gaussian function is given by:
g′(x) = −(x / σ²) · (1 / √(2πσ²)) · e^(−x² / (2σ²))
This derivative represents the rate of change of the Gaussian function with respect to distance x from
the center of the kernel.
Application
The derivative of the Gaussian filter is often used in edge detection algorithms, such as the Canny edge
detector. By convolving an image with the derivative of the Gaussian filter, the gradient magnitude and
direction at each pixel can be computed, which are used to detect edges in the image effectively.
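A small sketch that samples g′(x) into a 1D kernel (the radius choice is illustrative); convolving an image row-wise with it smooths and differentiates in a single pass:

import numpy as np

def gaussian_derivative_kernel(sigma, radius):
    # Sampled g'(x) = -(x / sigma^2) * (1 / sqrt(2*pi*sigma^2)) * exp(-x^2 / (2*sigma^2))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return -(x / sigma**2) * g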
12 Laplace Filter
The Laplace filter, also known as the Laplacian filter, is a kernel used for edge detection and image enhance-
ment in image processing. It computes the second derivative of the image intensity function, highlighting
regions of rapid intensity change.
The Laplace filter is based on the Laplacian operator, defined as:
∇²f(x, y) = ∂²f(x, y)/∂x² + ∂²f(x, y)/∂y²
where ∇2 represents the Laplacian operator and f (x, y) is the intensity function of the image.
In edge detection, the Laplace filter is convolved with the image to highlight regions of rapid intensity
change, which typically correspond to edges in the image. The Laplace filter is particularly sensitive to noise,
so it is often applied after smoothing the image with a Gaussian filter to reduce noise.
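A common 3×3 discrete approximation of the Laplacian, sketched with SciPy (assumed available); other stencil variants exist:

import numpy as np
from scipy.signal import convolve2d

laplacian_kernel = np.array([[0.,  1., 0.],
                             [1., -4., 1.],
                             [0.,  1., 0.]])

def laplacian(image):
    # Second-derivative response; large magnitudes mark rapid intensity changes.
    return convolve2d(image.astype(np.float64), laplacian_kernel, mode='same', boundary='symm')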
13 Laplacian of Gaussian (LoG) Filter
The Laplacian of Gaussian (LoG) filter is a combination of the Laplace filter and the Gaussian filter. It first
applies Gaussian smoothing to the image to reduce noise and then computes the Laplacian of the smoothed
image to detect edges more effectively.
The LoG filter kernel is obtained by convolving the Laplacian filter kernel with the Gaussian filter kernel.
14 Zero-Crossing
In image processing, a zero-crossing refers to the point in an image where the intensity changes sign along a
particular direction. Zero-crossings often occur at edges in the image, where the intensity transitions from
dark to light or vice versa.
Zero-crossings can be detected by examining the signs of neighboring pixel intensities. A zero-crossing is
identified when the intensity values on opposite sides of a pixel transition from negative to positive or from
positive to negative.
It is commonly used in edge detection algorithms to identify the location and orientation of edges in
an image. By detecting zero-crossings, edge pixels can be accurately located, allowing for precise edge
localization and segmentation.
15 Derivative Filters
Derivative filters are convolution kernels used to compute the derivative of an image function with respect
to spatial coordinates. They are commonly used in edge detection and feature extraction tasks in image
processing.
Types
There are several types of derivative filters, including the Sobel filter, Prewitt filter, and Roberts filter.
These filters compute the first-order derivative of the image intensity function along horizontal and vertical
directions.
Application
Derivative filters are widely used in edge detection algorithms to compute the gradient of the image intensity
function. By convolving the image with derivative filters, the rate of change of intensity in different directions
can be measured, allowing for the detection of edges and other features in the image.
16.2 Reconstruction from 2D Derivatives
The gradient magnitude and direction can be reconstructed from the derivatives computed using the Sobel
filter. The gradient magnitude represents the strength of the edge, while the gradient direction indicates the
orientation of the edge in the image.
17 Edge Detector
An edge detector is an image processing algorithm used to identify and localize the boundaries of objects or
regions within an image. It works by detecting sharp changes in intensity, which often correspond to edges
or boundaries between different objects or textures in the image.
3. Compute Gradient Magnitude and Orientation at each pixel. This is usually done using derivative filters like Sobel or Prewitt.
4. Non-Maximum Suppression to thin the edges detected in the previous step. This involves sup-
pressing all gradient values except for the local maxima along the direction of the edge.
5. Apply Hysteresis Thresholding to classify the edges as strong, weak, or non-edges based on their
gradient magnitudes. This involves using two thresholds: a high threshold to identify strong edges and
a low threshold to identify weak edges.
6. Edge Tracking by Hysteresis Connect the strong edges identified by the high threshold with neigh-
boring weak edges above the low threshold. This helps ensure that edges are continuous and not
fragmented.
7. Output the Detected Edges - Finally, output the detected edges as a binary image, where edge
pixels are marked as white and non-edge pixels as black.
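The whole pipeline above is available in libraries such as OpenCV; a minimal usage sketch (the file names, blur parameters, and thresholds are illustrative):

import cv2

image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input file
blurred = cv2.GaussianBlur(image, (5, 5), 1.4)           # smooth first to suppress noise
edges = cv2.Canny(blurred, 50, 150)                      # low and high hysteresis thresholds
cv2.imwrite("edges.png", edges)                          # white edge pixels on a black background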