
COMPUTER VISION - EC01 & EC02

Ex 2.1: Least squares intersection point and line fitting


1. If you are given more than two lines and want to find a point x̃ that minimizes the sum
of squared distances to each line, how can you compute this quantity? (Hint: Write the
dot product as x̃ᵀlᵢ and turn the squared quantity into a quadratic form, x̃ᵀA x̃.)
2. To fit a line to a bunch of points, you can compute the centroid (mean) of the points
as well as the covariance matrix of the points around this mean. Show that the line
passing through the centroid along the major axis of the covariance ellipsoid (largest
eigenvector) minimizes the sum of squared distances to the points.
3. These two approaches are fundamentally different, even though projective duality tells
us that points and lines are interchangeable. Why are these two algorithms so apparently
different? Are they actually minimizing different objectives?

Answer
2.1.1
Given the sum of squared distances is

D = Σᵢ dist(x̃, lᵢ)² = Σᵢ (x̃ · lᵢ)²,

assuming each line lᵢ = (aᵢ, bᵢ, cᵢ) is normalized so that aᵢ² + bᵢ² = 1, and the point is written in homogeneous form x̃ = (x, y, 1).

The dot product can be rewritten as x̃ᵀlᵢ, so each squared term becomes (x̃ᵀlᵢ)(lᵢᵀx̃) and the sum turns into the quadratic form

D = x̃ᵀ A x̃

Where

A = Σᵢ lᵢ lᵢᵀ

The obtained quadratic form of D can be minimized by taking the gradient of D with respect to x = (x, y)
and setting it equal to zero. Thus,

∂D/∂x = ∂D/∂y = 0  ⇒  A₁₁ (x, y)ᵀ = −a₁₂,

where A₁₁ is the top-left 2 × 2 block of A and a₁₂ is the top two entries of its last column. Solving this 2 × 2 linear system gives the required point.
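A minimal NumPy sketch of this computation (the function name intersect_lines_lsq and the example lines are illustrative; the lines are assumed to be given as 3-vectors (a, b, c)):

import numpy as np

def intersect_lines_lsq(lines):
    # Least-squares intersection point of 2D lines given as rows (a, b, c).
    lines = np.asarray(lines, dtype=float)
    # Normalize each line so (a, b) is a unit normal; then x~ . l is a true signed distance.
    lines = lines / np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    A = lines.T @ lines                      # A = sum_i l_i l_i^T  (3x3)
    # Setting the gradient of x~^T A x~ w.r.t. (x, y) to zero gives A11 (x, y)^T = -a12
    return np.linalg.solve(A[:2, :2], -A[:2, 2])

# Example: three lines that nearly meet at (1, 1)
print(intersect_lines_lsq([(1, 0, -1), (0, 1, -1), (1, 1, -2.1)]))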

2.1.2
Fitting a Line to a Set of Points Using the Covariance Matrix
Consider a set of n points in ℝ², given by x₁, x₂, …, xₙ.
The centroid (mean) c of these points is given by:

c = (1/n) Σᵢ xᵢ

The covariance matrix Σ of the points is defined as:

Σ = (1/n) Σᵢ (xᵢ − c)(xᵢ − c)ᵀ
Eigenvectors and Eigenvalues of the Covariance Matrix


The covariance matrix Σ can be diagonalized as:

Σ = V Λ Vᵀ

where V is a matrix whose columns are the eigenvectors of Σ, and Λ is a diagonal matrix whose
diagonal elements are the corresponding eigenvalues λ1 ≥ λ2.
The eigenvector corresponding to the largest eigenvalue λ1 is called the major axis of the
covariance ellipsoid. This eigenvector represents the direction of maximum variance in the data.
Fitting a Line to the Points
We want to find the line l(t) = c + tv that minimizes the sum of squared
distances to the points, where v is a unit vector in the direction of the line, and
t is a scalar parameter.
The squared distance from a point xᵢ to the line is given by the squared
norm of the projection of xᵢ − c onto the direction orthogonal to v:

dᵢ² = ‖xᵢ − c‖² − (vᵀ(xᵢ − c))²

The total sum of squared distances is

D(v) = Σᵢ ‖xᵢ − c‖² − Σᵢ (vᵀ(xᵢ − c))²

This simplifies to

D(v) = Σᵢ ‖xᵢ − c‖² − n vᵀ Σ v

Minimization of the Sum of Squared Distances

The term Σᵢ ‖xᵢ − c‖² is independent of the choice of v. Hence, minimizing the total distance is equivalent to
maximizing the term Σᵢ (vᵀ(xᵢ − c))². This can be rewritten as:

n vᵀ Σ v, subject to ‖v‖ = 1

The vector v that maximizes this expression is the eigenvector corresponding to the largest
eigenvalue λ1 of the covariance matrix Σ. Therefore, the line that minimizes the sum of squared
distances is the line passing through the centroid c and in the direction of the eigenvector
associated with the largest eigenvalue λ1.

In conclusion it can be said that the line passing through the centroid of the points along the
direction of the largest eigenvector of the covariance matrix (the major axis of the covariance
ellipsoid) indeed minimizes the sum of squared distances from the points to the
line.
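A small NumPy sketch of this fitting procedure (fit_line_pca and the sample points are illustrative):

import numpy as np

def fit_line_pca(points):
    # Fit a line c + t*v to 2D points by minimizing squared orthogonal distances (PCA).
    X = np.asarray(points, dtype=float)
    c = X.mean(axis=0)                       # centroid
    cov = np.cov((X - c).T)                  # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    v = eigvecs[:, -1]                       # major axis = eigenvector of the largest eigenvalue
    return c, v

pts = [[0, 0.1], [1, 0.9], [2, 2.1], [3, 3.0]]
c, v = fit_line_pca(pts)
print("centroid:", c, "direction:", v)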

2.1.3
Although points and lines are interchangeable under projective duality, the two algorithms look
different because one minimizes the distances from a single point to a set of lines, while the other
minimizes the distances from a set of points to a single line. They are indeed minimizing different objectives.
ALGORITHM 1 = Fitting a Line to Points (Principal Component Analysis - PCA)
Objective: Find a line that minimizes the sum of squared perpendicular (orthogonal) distances
from a set of points to the line.
Approach:
● Compute the centroid (mean) of the points.
● Compute the covariance matrix of the points around this centroid.
● The line of best fit is along the eigenvector of the covariance matrix corresponding to the
largest eigenvalue.
What is Minimized: This approach minimizes the sum of squared orthogonal distances of the
points to the line. It is fundamentally about capturing the direction of maximum variance among
the points. In other words, it's about finding the line that best represents the spread of the data.
ALGORITHM 2 = Fitting a Point to Lines
Objective: Find a point that minimizes the sum of squared perpendicular (orthogonal) distances
to a set of lines.
Approach:
● The lines are described by direction vectors and points on each line.
● The optimal point is found by solving a system of equations derived from minimizing the
sum of squared distances to each line.
What is Minimized: This approach minimizes the sum of squared distances from a single point
to multiple lines. It's about finding the point that is closest to all the lines in the least squares
sense.
In the PCA-based problem, duality might transform the problem into finding a point that
maximizes some projection or minimizes some different functional, but it wouldn’t directly map
onto the point-to-lines problem without changing the nature of what’s being minimized.
In the point-to-lines problem, projective duality would relate this to some line-to-point problem,
but this wouldn’t map directly to PCA since PCA is fundamentally about variance, not distance
minimization per se.

Ex 2.2: 2D transform editor. Write a program that lets you interactively create a set of
rectangles and then modify their “pose” (2D transform). You should implement the
following
steps:
1. Open an empty window (“canvas”).
2. Shift drag (rubber-band) to create a new rectangle.
3. Select the deformation mode (motion model): translation, rigid, similarity, affine, or
perspective.
4. Drag any corner of the outline to change its transformation.
This exercise should be built on a set of pixel coordinate and transformation classes,
either
implemented by yourself or from a software library. Persistence of the created
representation
(save and load) should also be supported (for each rectangle, save its transformation).

ANSWER

Python Code:

import tkinter as tk
import numpy as np

class TransformableCanvas:
    def __init__(self, root):
        self.root = root
        self.canvas = tk.Canvas(root, width=800, height=600, bg="black")
        self.canvas.pack(side=tk.LEFT)
        self.start_x = None
        self.start_y = None
        self.rect = None
        self.rect_coords = None
        self.canvas.bind("<Button-1>", self.start_draw)
        self.canvas.bind("<B1-Motion>", self.draw_rect)
        self.canvas.bind("<ButtonRelease-1>", self.finish_draw)
        self.create_buttons()

    def create_buttons(self):
        button_frame = tk.Frame(self.root, bg="black")
        button_frame.pack(side=tk.RIGHT, fill=tk.Y, padx=10)
        button_style = {"fg": "red", "bg": "black", "font": ("Helvetica", 12)}

        translate_button = tk.Button(button_frame, text="Translate",
                                     command=self.translate, **button_style)
        translate_button.pack(pady=5)

        rigid_button = tk.Button(button_frame, text="Rigid Transform",
                                 command=self.rigid_transform, **button_style)
        rigid_button.pack(pady=5)

        similarity_button = tk.Button(button_frame, text="Similarity Transform",
                                      command=self.similarity_transform, **button_style)
        similarity_button.pack(pady=5)

        affine_button = tk.Button(button_frame, text="Affine Transform",
                                  command=self.affine_transform, **button_style)
        affine_button.pack(pady=5)

        perspective_button = tk.Button(button_frame, text="Perspective Transform",
                                       command=self.perspective_transform, **button_style)
        perspective_button.pack(pady=5)

    def start_draw(self, event):
        self.start_x = event.x
        self.start_y = event.y

    def draw_rect(self, event):
        if self.rect:
            self.canvas.delete(self.rect)
        self.rect = self.canvas.create_rectangle(self.start_x, self.start_y, event.x, event.y,
                                                 outline="#FF0000")  # red outline

    def finish_draw(self, event):
        # Store the rectangle corners as homogeneous coordinates (x, y, 1).
        coords = self.canvas.coords(self.rect)
        self.rect_coords = np.array([
            [coords[0], coords[1], 1],
            [coords[2], coords[1], 1],
            [coords[2], coords[3], 1],
            [coords[0], coords[3], 1]
        ])

    def apply_transformation(self, matrix):
        if self.rect_coords is None:
            return
        transformed_coords = np.dot(self.rect_coords, matrix.T)
        # Normalize homogeneous coordinates (only matters for the perspective case).
        transformed_coords = transformed_coords / transformed_coords[:, 2:3]
        # Canvas rectangles are axis-aligned, so redraw using the bounding box
        # of the transformed corners.
        x_min, y_min = transformed_coords[:, 0].min(), transformed_coords[:, 1].min()
        x_max, y_max = transformed_coords[:, 0].max(), transformed_coords[:, 1].max()
        self.canvas.coords(self.rect, x_min, y_min, x_max, y_max)

    def translate(self):
        dx = 20
        dy = 20
        matrix = np.array([
            [1, 0, dx],
            [0, 1, dy],
            [0, 0, 1]
        ])
        self.apply_transformation(matrix)

    def rigid_transform(self):
        angle = np.radians(15)
        dx = 20
        dy = 20
        matrix = np.array([
            [np.cos(angle), -np.sin(angle), dx],
            [np.sin(angle), np.cos(angle), dy],
            [0, 0, 1]
        ])
        self.apply_transformation(matrix)

    def similarity_transform(self):
        angle = np.radians(15)
        scale = 1.2
        dx = 20
        dy = 20
        matrix = np.array([
            [scale * np.cos(angle), -scale * np.sin(angle), dx],
            [scale * np.sin(angle), scale * np.cos(angle), dy],
            [0, 0, 1]
        ])
        self.apply_transformation(matrix)

    def affine_transform(self):
        matrix = np.array([
            [1.2, 0.2, 20],
            [0.1, 1.1, 20],
            [0, 0, 1]
        ])
        self.apply_transformation(matrix)

    def perspective_transform(self):
        matrix = np.array([
            [1, 0.2, 0],
            [0.2, 1, 0],
            [0.0005, 0.0005, 1]
        ])
        self.apply_transformation(matrix)

if __name__ == "__main__":
    root = tk.Tk()
    app = TransformableCanvas(root)
    root.mainloop()
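The exercise also asks for persistence (save and load) of each rectangle's transformation, which the class above does not implement. A minimal sketch of how this could be added with JSON; save_state and load_state are illustrative helper names that would need to be wired into TransformableCanvas (e.g., as extra buttons):

import json
import numpy as np

def save_state(path, rect_coords, matrix):
    # Store the rectangle corners and its current 3x3 transform as plain lists.
    state = {
        "rect_coords": np.asarray(rect_coords).tolist(),
        "transform": np.asarray(matrix).tolist(),
    }
    with open(path, "w") as f:
        json.dump(state, f, indent=2)

def load_state(path):
    # Read them back as NumPy arrays.
    with open(path) as f:
        state = json.load(f)
    return np.array(state["rect_coords"]), np.array(state["transform"])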
Images (screenshots not reproduced here): original rectangle, and results after the Translate, Rigid, and Similarity transforms.
Ex 2.3: 3D viewer. Write a simple viewer for 3D points, lines, and polygons. Import a set
of point and line commands (primitives) as well as a viewing transform. Interactively
modify the object or camera transform. This viewer can be an extension of the one you
created in Exercise 2.2. Simply replace the viewing transformations with their 3D
equivalents. (Optional) Add a z-buffer to do hidden surface removal for polygons.
(Optional) Use a 3D drawing package and just write the viewer control.
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D, art3d
import numpy as np

class Simple3DViewer:
    def __init__(self):
        self.fig = plt.figure()
        self.ax = self.fig.add_subplot(111, projection='3d')
        self.points = []
        self.lines = []
        self.polygons = []
        self.view_transform = np.eye(4)
        self.ax.set_xlabel('X')
        self.ax.set_ylabel('Y')
        self.ax.set_zlabel('Z')

    def add_point(self, x, y, z, color='b'):
        self.points.append((x, y, z, color))

    def add_line(self, start, end, color='r'):
        self.lines.append((start, end, color))

    def add_polygon(self, vertices, color='g'):
        self.polygons.append((vertices, color))

    def plot(self):
        for point in self.points:
            x, y, z, color = point
            self.ax.scatter(x, y, z, c=color)
        for line in self.lines:
            start, end, color = line
            xs, ys, zs = zip(*[start, end])
            self.ax.plot(xs, ys, zs, color=color)
        for polygon in self.polygons:
            vertices, color = polygon
            vertices_2d = [(x, y) for x, y, z in vertices]
            poly = plt.Polygon(vertices_2d, color=color, alpha=0.5)
            self.ax.add_patch(poly)
            art3d.pathpatch_2d_to_3d(poly, z=0, zdir="z")
        plt.show()

viewer = Simple3DViewer()
viewer.add_point(1, 2, 3)
viewer.add_line((1, 2, 3), (4, 5, 6))
viewer.add_polygon([(1, 1, -1), (1, 2, 1), (2, 2, 1), (2, 1, 1), (1, 1, 1)])
viewer.plot()
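The exercise also asks for interactive modification of the object or camera transform. self.view_transform is stored above but never applied; a sketch of how a 4 × 4 homogeneous transform could be applied to the stored points before plotting (apply_view_transform is an assumed helper, not part of the class above):

import numpy as np

def apply_view_transform(points_xyz, view_transform):
    # Apply a 4x4 homogeneous transform to an (N, 3) array of points.
    pts = np.asarray(points_xyz, dtype=float)
    homo = np.hstack([pts, np.ones((pts.shape[0], 1))])   # to homogeneous coordinates
    transformed = homo @ view_transform.T
    return transformed[:, :3] / transformed[:, 3:4]       # back to 3D

# Example: rotate 30 degrees about the z-axis before plotting
theta = np.radians(30)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0, 0],
               [np.sin(theta),  np.cos(theta), 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])
print(apply_view_transform([[1.0, 2.0, 3.0]], Rz))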
Q2) Ex 2.4: Focus distance and depth of field. Figure out how the focus distance and
depth of field indicators on a lens are determined.
1. Compute and plot the focus distance zo as a function of the distance
traveled from the focal length ∆zi = f − zi for a lens of focal length f (say,
100mm). Does this explain the hyperbolic progression of focus distances
you see on a typical lens (Figure 2.20)?
2. Compute the depth of field (minimum and maximum focus distances) for
a given focus setting zo as a function of the circle of confusion diameter c
(make it a fraction of the sensor width), the focal length f, and the f-stop
number N (which relates to the aperture diameter d). Does this explain
the usual depth of field markings on a lens that bracket the in-focus
marker, as in Figure 2.20a?
3. Now consider a zoom lens with a varying focal length f. Assume that as you zoom, the
lens stays in focus, i.e., the distance from the rear nodal point to the sensor plane zi
adjusts itself automatically for a fixed focus distance zo. How do the depth of field
indicators vary as a function of focal length? Can you reproduce a two-dimensional plot
that mimics the curved depth of field lines seen on the lens in Figure 2.20b?

Solution:

2.4.1) The focus distance zo is related to the image distance zi and the focal length f of the lens
by the lens equation:

1/zo + 1/zi = 1/f

Given the difference Δzi = f − zi, we can rewrite the image distance zi as:

zi = f − Δzi

Substituting this into the lens equation, we get:

zo = f·zi / (zi − f) = f − f²/Δzi
We'll plot zo as a function of Δzi for a focal length f = 100 mm. The plot illustrates how the focus
distance changes with the distance travelled from the focal length.
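A short matplotlib sketch of this plot (the Δzi range is illustrative; note that Δzi = f − zi is negative for real objects, since zi > f):

import numpy as np
import matplotlib.pyplot as plt

f = 100.0                                  # focal length in mm
delta_zi = np.linspace(-5.0, -0.05, 400)   # zi = f - delta_zi, so zi > f for real objects
zo = f - f**2 / delta_zi                   # focus distance from the lens equation above

plt.plot(-delta_zi, zo / 1000.0)           # plot |delta_zi| in mm against zo in metres
plt.xlabel('|Δzi| = |f − zi| (mm)')
plt.ylabel('focus distance zo (m)')
plt.title('Focus distance vs. lens extension (f = 100 mm)')
plt.grid(True)
plt.show()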
Observations:

● The relationship between zo and Δzi is hyperbolic, which explains the hyperbolic
progression of focus distances you typically see on a lens.
● As Δzi approaches zero (i.e., the image distance zi approaches the focal length), the focus
distance zo tends to infinity, which corresponds to focusing at infinity.
● When Δzi is positive (i.e., zi < f), zo becomes negative, indicating that no real object can
be brought into focus (the corresponding image would be virtual).

The hyperbolic progression of focus distances on a typical lens can be explained by the
relationship derived from the lens equation above. When you rearrange this equation to express
the focus distance zo as a function of the image distance zi (or of the distance travelled from the
focal length, Δzi = f − zi), it shows that as zi (or Δzi) changes, zo varies in a hyperbolic manner.
Specifically, when zi is very close to the focal length f, the focus distance zo becomes large and
changes rapidly. As zi moves further away from f, the rate of change of zo slows down, which
leads to the hyperbolic progression seen on lens markings.

This hyperbolic nature explains why the focus distances marked on lenses are not equally
spaced. Instead, they cluster closer together at shorter distances and spread out more as the
focus distance increases, which is a characteristic of the hyperbolic function. This progression is
what you typically observe on lens distance scales.

2.4.2) Here the parameters given are:

Circle of confusion (c): The maximum diameter of the image of a point light source that is
still perceived as a point by the human eye.

Focal length (f): The distance between the lens's optical center and the image sensor when
focused at infinity.

f-stop number (N): The ratio of focal length to aperture diameter (N = f/d).

Focus setting (zo): The distance between the lens and the subject in focus.

Depth of field is the range of distances from the camera where objects are acceptably sharp.

Here, we need to calculate Depth of field from these parameters

Aperture diameter (d) = f / N


Hyperfocal distance (H) = f² / (Nc)
Near focus limit (Dn) = (H * zo) / (H + zo - f)

Far focus limit (Df) = (H * zo) / (H - zo + f)

We find Depth of Field (DOF) which is the difference between the far and near focus limits
(Df-Dn)

We can approximate Dn ≈ zo / (1 + N·c·zo/f²)

Df ≈ zo / (1 − N·c·zo/f²)

Hence the Depth of Field is approximated as

DOF = Df − Dn ≈ 2·N·c·zo² / f²

Yes, the depth of field markings on a lens that bracket the in-focus marker can be explained by
these calculations.
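A small sketch of these depth of field calculations (the example numbers are illustrative):

def depth_of_field(f_mm, N, c_mm, zo_mm):
    # Near/far focus limits and DOF using the hyperfocal-distance formulas above.
    H = f_mm**2 / (N * c_mm)                # hyperfocal distance
    Dn = H * zo_mm / (H + zo_mm - f_mm)     # near focus limit
    Df = H * zo_mm / (H - zo_mm + f_mm)     # far focus limit (diverges once zo passes H)
    return Dn, Df, Df - Dn

# f = 100 mm, f/8, circle of confusion 0.03 mm, focused at 5 m
Dn, Df, dof = depth_of_field(100.0, 8.0, 0.03, 5000.0)
print(f"near {Dn/1000:.2f} m, far {Df/1000:.2f} m, DOF {dof/1000:.2f} m")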

As seen in Figure 2.20a, the depth of field markings typically consist of two sets of numbers on
either side of the focus distance. These numbers represent the near and far focus limits for a
specific aperture setting.
Therefore, the markings on the lens effectively visualize the depth of field calculations we've
done, providing a practical way for photographers to estimate the acceptable sharpness range
without complex calculations. In essence, the depth of field markings are a visual representation
of the mathematical relationships between aperture, focal length, focus distance, and circle of
confusion.

2.4.3) For a zoom lens with a varying focal length f, the depth of field (DOF) indicators will
change dynamically as the focal length is adjusted. The key point is that as we zoom in or out,
the focal length f changes while the focus distance zo remains fixed, meaning the lens
automatically compensates by adjusting the distance zi from the rear nodal point to the sensor
plane.

● The depth of field decreases as the focal length f increases (for a fixed focus distance and
f-stop it falls off roughly as 1/f²). As f increases (zooming in), the DOF becomes shallower,
meaning the range of distances over which objects appear sharp is reduced. Conversely, as f
decreases (zooming out), the DOF becomes broader.
● The depth of field for different focal lengths can be computed using the hyperfocal distance
H(f) = f² / (N·c).
● The near and far limits of the DOF can be expressed as:

Dn(f) ≈ zo / (1 + N·c·zo/f²),  Df(f) ≈ zo / (1 − N·c·zo/f²)

The following plot shows the near and far depth of field (DOF) limits as functions of the focal
length for a fixed focus distance. The plot mimics the curved depth of field lines seen on a lens,
as illustrated in Figure 2.20b.
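A sketch that generates a plot of this kind (the focal length range, f-stop, and circle of confusion values are illustrative):

import numpy as np
import matplotlib.pyplot as plt

zo = 5000.0        # fixed focus distance (mm)
N = 8.0            # f-stop
c = 0.03           # circle of confusion (mm)
f = np.linspace(35.0, 200.0, 300)            # focal length range (mm)

a = N * c * zo / f**2
Dn = zo / (1 + a)                            # near DOF limit
Df = np.where(a < 1, zo / (1 - a), np.inf)   # far limit diverges past the hyperfocal point

plt.plot(f, Dn / 1000.0, label='near limit')
plt.plot(f, np.minimum(Df, 50000.0) / 1000.0, label='far limit (clipped)')
plt.fill_between(f, Dn / 1000.0, np.minimum(Df, 50000.0) / 1000.0, color='grey', alpha=0.3)
plt.xlabel('focal length f (mm)')
plt.ylabel('distance (m)')
plt.legend()
plt.show()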
Explanation:

● Near DOF Limit: The yellow curve represents the closest distance at which objects
appear acceptably sharp.
● Far DOF Limit: The orange curve represents the farthest distance where objects remain
sharp.
● Depth of Field: The shaded grey area represents the range where objects are in focus,
between the near and far DOF limits.

As the focal length increases, the depth of field narrows, which is why telephoto lenses have a
more restricted range of acceptable focus compared to wide-angle lenses. This variation in
depth of field is why the curved depth of field lines appear as they do on zoom lenses.

This plot effectively demonstrates how depth of field indicators change with varying focal
lengths, helping to visualize the concept.

The depth of field indicators on a zoom lens change as a function of the focal length f. As you
zoom in, the indicators move closer together, indicating a reduced depth of field. Conversely, as
you zoom out, the indicators spread apart, indicating a greater depth of field.
2.5) F-numbers and shutter speeds. List the common f-numbers and shutter speeds that
your camera provides. On older model SLRs, they are visible on the lens and shutter
speed dials. On newer cameras, you have to look at the electronic viewfinder (or LCD
screen/indicator) as you manually adjust exposures.

1. Do these form geometric progressions; if so, what are the ratios? How do these relate
to exposure values (EVs)?

2. If your camera has shutter speeds of 1/60 and 1/125, do you think that these two
speeds are exactly a factor of two apart or a factor of 125/60 = 2.083 apart?

3. How accurate do you think these numbers are? Can you devise some way to measure
exactly how the aperture affects how much light reaches the sensor and what the exact
exposure times actually are?
Ans)

Common F-numbers:

F-numbers represent the aperture settings on a camera lens. Common f-numbers are:

● f/1.4, f/2, f/2.8, f/4, f/5.6, f/8, f/11, f/16, f/22

Common Shutter Speeds:

Shutter speeds are typically measured in seconds or fractions of a second. Common shutter
speeds are:

● 1/1000 s, 1/500 s, 1/250 s, 1/125 s, 1/60 s, 1/30 s, 1/15 s, 1/8 s, 1/4 s, 1/2 s, 1 s
1. Geometric Progressions and Ratios

Geometric Progression:

F-numbers: The sequence of f-numbers forms a geometric progression. The ratio
between consecutive f-numbers is approximately √2 (about 1.41). This ratio reflects
the fact that the area of the aperture (and thus the amount of light passing through)
is halved as we move to the next higher f-number.

Shutter Speeds: The sequence of shutter speeds also forms a geometric progression, where
the ratio between consecutive shutter speeds is 2. Each step represents either a doubling or
halving of the exposure time, which directly affects the exposure value (EV).

Relation to Exposure Values (EVs):

Each step (stop) in either the f-number or shutter speed sequence represents a change of 1 EV,
where EV = log2(N²/t) for f-number N and exposure time t in seconds. A higher EV means less
exposure (a darker image), and a lower EV means more exposure (a brighter image). Moving to
the next higher f-number or halving the exposure time each reduces the exposure by 1 EV.
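A quick numerical check of these ratios and of the EV definition (values are illustrative):

import math

f_numbers = [1.4, 2, 2.8, 4, 5.6, 8, 11, 16, 22]
ratios = [b / a for a, b in zip(f_numbers, f_numbers[1:])]
print(ratios)   # each ratio is close to sqrt(2) ≈ 1.414

def exposure_value(N, t):
    # EV = log2(N^2 / t) for f-number N and exposure time t in seconds
    return math.log2(N**2 / t)

print(exposure_value(8, 1/125))    # ~12.97
print(exposure_value(5.6, 1/125))  # ~11.94, i.e. opening up one stop lowers the EV by about 1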

2. Shutter Speeds of 1/60 and 1/125

If our camera has shutter speeds of 1/60 s and 1/125 s, they are not exactly a factor
of two apart. The factor is approximately 125/60 ≈ 2.083. However, in practice,
these values are chosen because they are close to a doubling in exposure time. The
discrepancy is small, but not exactly 2.

3. Accuracy of F-numbers and Shutter Speeds

Accuracy: The f-numbers and shutter speeds are typically quite accurate, but there can be
slight variations due to manufacturing tolerances and camera calibration. Modern cameras,
especially digital ones, tend to be very precise, but older mechanical cameras might have small
discrepancies.

Measuring Aperture and Exposure Times:

Aperture: We can measure the effect of aperture on light exposure using a light meter. By
taking a series of exposures at different f-numbers with the same shutter speed, we can
measure the intensity of light that reaches the sensor and compare it to the expected values
based on the f-number sequence.
Exposure Time: We can measure the exact exposure times using an oscilloscope or a high-
speed light sensor to capture the duration of the shutter opening. Comparing this with the
camera's indicated shutter speed gives you an idea of its accuracy.
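A hedged sketch of the photometric check described above, assuming a series of images shot at consecutive full stops with fixed shutter speed and ISO (the file names are placeholders):

import numpy as np
import cv2

# Images shot at f/2.8, f/4, f/5.6, f/8 with identical shutter speed and ISO
paths = ['f2.8.jpg', 'f4.jpg', 'f5.6.jpg', 'f8.jpg']
means = []
for p in paths:
    img = cv2.imread(p, cv2.IMREAD_GRAYSCALE).astype(np.float64)
    means.append(img.mean())

# Each full stop should roughly halve the captured light. Note that JPEGs are
# gamma-compressed, so for a stricter test linearize the values first or use RAW data.
for a, b in zip(means, means[1:]):
    print(f"brightness ratio between consecutive stops: {a / b:.2f}")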

Q6) Ex 2.6: Estimate the amount of noise in your camera by taking repeated shots of a
scene with the camera mounted on a tripod. (Purchasing a remote shutter release is a
good investment if you own a DSLR.) Alternatively, take a scene with constant color
regions (such as a color checker chart) and estimate the variance by fitting a smooth
function to each color region and then taking differences from the predicted function.

1. Plot your estimated variance as a function of level for each of your color channels
separately.
2. Change the ISO setting on your camera; if you cannot do that, reduce the overall light
in your scene (turn off lights, draw the curtains, wait until dusk). Does the amount of
noise vary a lot with ISO/gain?
3. Compare your camera to another one at a different price point or year of make. Is
there evidence to suggest that “you get what you pay for”? Does the quality of digital
cameras seem to be improving over time?

Sol:

1.

import numpy as np
import cv2
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Load multiple images (replace 'image1.jpg', 'image2.jpg', etc., with your actual image paths)
image_paths = ['image1.jpg', 'image2.jpg', 'image3.jpg']
images = [cv2.imread(image) for image in image_paths]

# Convert images to RGB and stack them for analysis
images_rgb = [cv2.cvtColor(image, cv2.COLOR_BGR2RGB) for image in images]
stacked_images = np.stack(images_rgb, axis=-1)

# Define a smooth function (e.g., polynomial function)
def smooth_func(x, a, b, c):
    return a * x**2 + b * x + c

# Extract color channels
color_channels = ['Red', 'Green', 'Blue']
channel_data = [stacked_images[:, :, i, :] for i in range(3)]  # 3 channels (R, G, B)

# Placeholder for variance and intensity data
variances = []
intensity_levels = []

# Iterate through each channel
for channel_index, channel_name in enumerate(color_channels):
    # Flatten the channel data
    flattened_data = channel_data[channel_index].reshape(-1, stacked_images.shape[-1])

    # Estimate intensity levels (mean across all images for each pixel)
    mean_intensity = np.mean(flattened_data, axis=1)

    # Fit the smooth function to the mean intensity data
    popt, _ = curve_fit(smooth_func, np.arange(len(mean_intensity)), mean_intensity)

    # Compute the predicted values from the smooth function
    predicted_intensity = smooth_func(np.arange(len(mean_intensity)), *popt)

    # Calculate differences and estimate variance
    differences = flattened_data - predicted_intensity[:, None]
    variance = np.var(differences, axis=1)

    # Store the variance and intensity level data for plotting
    variances.append(variance)
    intensity_levels.append(mean_intensity)

    # Plot the estimated variance as a function of intensity level
    plt.figure()
    plt.scatter(mean_intensity, variance, alpha=0.5)
    plt.title(f'Estimated Variance vs Intensity Level ({channel_name} Channel)')
    plt.xlabel('Intensity Level')
    plt.ylabel('Estimated Variance')
    plt.grid(True)
    plt.show()
2. Noise Estimation in Photography

When capturing images with a digital camera, you may notice some undesired speckles or
graininess, known as 'noise.' Understanding and estimating this noise is essential for
photographers aiming to enhance the quality of their images. Noise can affect the sharpness
and detail in photos, and it becomes particularly noticeable under certain conditions, such as
low light.

To estimate noise, a practical exercise involves taking multiple shots of a steady scene with the
camera fixed on a tripod. A scene with consistent color regions, like a color checker chart,
provides a clear basis for spotting variations caused by noise. By comparing these images,
especially focusing on areas that should have uniform color, you can measure the variance
which indicates the level of noise across the image.

Variance Analysis in Color Channels

Digital images are typically composed of three color channels: red, green, and blue. Each
channel represents the intensity of the respective colors across the image. Analyzing the
variance in these channels helps pinpoint the noise level in each. The process starts with fitting
a smooth function to the pixel values within a uniform color region and then calculating the
differences between the actual pixel values and the predicted values obtained from the function.
Visualizing Noise Variance

By plotting these differences for all the pixels in a channel, one can visualize a variance plot.
This plot represents how noise levels fluctuate within that channel. A higher variance typically
implies greater noise, which can lead to a grainy appearance in the photo, especially noticeable
in darker or more uniform areas.

ISO Settings Impact on Noise

ISO settings on a camera adjust the sensor's sensitivity to light: a higher ISO setting boosts this
sensitivity, allowing for clearer pictures in lower light. However, an increased ISO also amplifies
the signal noise. Thus, understanding the relationship between ISO and noise is crucial,
especially when shooting in challenging light conditions.

Testing ISO's Effect

By altering the ISO settings or adjusting the scene's lighting if ISO can't be changed, and then
analyzing variance plots, one can observe distinct changes in noise levels. Higher ISO settings
generally result in more noise. This relationship underscores the trade-off photographers must
consider between image brightness and noise levels.

3.

Digital Camera Quality Over Time

As technology progresses, we observe a general trend of improvement in digital camera quality.


Cameras with higher price points or more recent production dates often exhibit lower noise
levels, thanks to advancements in sensor design and image processing algorithms.

Comparing cameras across different generations or price brackets provides insights into these
technological advances. A more expensive or newer model is likely to handle noise more
effectively, producing clearer images with fine details even in low light conditions.

● Older cameras may show higher noise levels at the same ISO settings compared to newer models.
● Investment in a higher-quality camera can yield a noticeable reduction in image noise.
● Continuous improvements in camera technology suggest an ongoing trend toward higher image quality over time.
These observations underline the benefits of calibrating and understanding your camera's noise
performance, guiding purchases and usage for optimal photo outcomes.

Q7) Ex 2.7: Gamma correction in image stitching. Here’s a relatively simple puzzle.
Assume you are given two images that are part of a panorama that you want to stitch
(see Section 8.2). The two images were taken with different exposures, so you want to
adjust the RGB values so that they match along the seam line. Is it necessary to undo the
gamma in the color values in order to achieve this?

When stitching images with different exposures, it is generally necessary to undo the gamma
correction to achieve accurate color matching along the seam line. Gamma correction is a
nonlinear process that affects how luminance and color values are stored. Most images are in a
gamma-corrected space, where pixel values do not directly correspond to actual light intensity.

To blend or average pixel values correctly, you need to work in a linear color space where the
relationship between pixel values and light intensity is linear. This involves undoing the gamma
correction, performing the necessary adjustments, and then reapplying the gamma correction
afterward. This approach ensures that the adjustments are based on true light intensity, leading
to smoother and more natural transitions between the stitched images.
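A minimal sketch of this linearize → adjust → re-encode workflow, assuming images as float arrays in [0, 1], seam_a/seam_b as boolean masks selecting the overlap region, and a simple power-law gamma of 2.2 (real images often use the sRGB transfer curve instead, so this is an approximation):

import numpy as np

GAMMA = 2.2

def to_linear(img):
    # Undo gamma compression (values in [0, 1]).
    return np.power(img, GAMMA)

def to_gamma(img):
    # Reapply gamma compression.
    return np.power(img, 1.0 / GAMMA)

def match_exposure(img_a, img_b, seam_a, seam_b):
    # Scale image B in linear space so its seam region matches image A's.
    lin_a, lin_b = to_linear(img_a), to_linear(img_b)
    gain = lin_a[seam_a].mean() / lin_b[seam_b].mean()   # exposure ratio along the seam
    return to_gamma(np.clip(lin_b * gain, 0.0, 1.0))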

The accuracy of aperture and shutter speed values on cameras can vary slightly due to
mechanical tolerances or electronic timing limitations, but modern cameras are generally quite
precise. However, to measure their exact accuracy:

Aperture Accuracy:

● Light Meter Method: Use a handheld light meter to measure the light intensity at
different f-stops. This will allow you to verify whether each f-stop change results in the
expected doubling or halving of light. Any deviations from the theoretical values indicate
aperture inaccuracy.
● Photometric Testing: Capture a series of images at different apertures under the same
lighting conditions. Use image processing software to analyze the brightness of the
photos. A consistent 50% reduction in exposure for each full stop indicates accurate
aperture control.

Shutter Speed Accuracy:

● High-Speed Light Sensor Method: Connect a high-speed light sensor to a microcontroller
(e.g., Arduino). The sensor detects changes in light intensity as the shutter opens and closes,
and the microcontroller measures the exact duration. This will help you determine if the actual
shutter speed matches the camera's indicated values.
● Audio/Visual Method: Record the shutter's operation using a high-frame-rate camera or
microphone. By analyzing the video frames or audio waveform, you can determine the
exact time the shutter remains open, comparing this with the stated shutter speed.
Q8) Ex 2.8: White point balancing—tricky. A common (in-camera or post-processing)
technique for performing white point adjustment is to take a picture of a white piece of
paper and to adjust the RGB values of an image to make this a neutral color.

1. Describe how you would adjust the RGB values in an image given a sample “white
color” of (Rw, Gw, Bw) to make this color neutral (without changing the exposure too
much).

2. Does your transformation involve a simple (per-channel) scaling of the RGB values or
do you need a full 3 × 3 color twist matrix (or something else)?

3. Convert your RGB values to XYZ. Does the appropriate correction now only depend on
the XY (or xy) values? If so, when you convert back to RGB space, do you need a full 3 ×
3 color twist matrix to achieve the same effect?

4. If you used pure diagonal scaling in the direct RGB mode but end up with a twist if you
work in XYZ space, how do you explain this apparent dichotomy? Which approach is
correct? (Or is it possible that neither approach is actually correct?)

1. Adjusting RGB Values to Make the Sample White Color Neutral

Given a sample white color with RGB values (Rw,Gw,Bw), the objective is to adjust these values
so that the color appears as a neutral white, typically (255,255,255) in 8-bit color.

Steps:

● Calculate Scaling Factors: To achieve neutrality, each RGB channel can be scaled to
bring the sample white to the desired value:

sR = 255 / Rw,  sG = 255 / Gw,  sB = 255 / Bw

(Scaling relative to the green channel instead, sR = Gw/Rw, sG = 1, sB = Gw/Bw, keeps the
overall exposure closer to the original.)

● Apply Scaling: The RGB values of every pixel in the image are then adjusted using
these scaling factors. For any pixel with original RGB values (R, G, B), the new values are
calculated as (see the sketch after this list):

R' = sR·R,  G' = sG·G,  B' = sB·B
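A short NumPy sketch of this per-channel scaling (using the green-relative gains to preserve exposure; white_balance and the sample values are illustrative):

import numpy as np

def white_balance(img, white_rgb):
    # Scale R and B so the sampled white patch becomes neutral (img as float, channels last).
    Rw, Gw, Bw = white_rgb
    gains = np.array([Gw / Rw, 1.0, Gw / Bw])
    return img * gains                      # broadcasts over the last (channel) axis

patch_white = (180.0, 200.0, 230.0)         # sampled "white" under a bluish illuminant
img = np.random.rand(4, 4, 3) * 255
balanced = white_balance(img, patch_white)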
2. Scaling vs. 3×3 Color Twist Matrix

Simple Per-Channel Scaling:

● The method described above involves simple per-channel scaling. This approach
adjusts each channel independently, which works well if the color imbalance is uniform
across the entire image.

3×3 Color Twist Matrix:

● In more complex scenarios, where the relationship between RGB channels is not
straightforward (for example, due to non-uniform lighting or color distortions), a 3×3
color twist matrix might be required. This matrix allows for adjustments that account for
the interaction between channels, correcting colors in a more sophisticated manner.

● Simple scaling is sufficient for straightforward white balance correction.


● A 3×3 matrix is needed for more complex color corrections, where the required correction
mixes the channels rather than acting on each channel independently.

3. Converting RGB Values to XYZ and Correcting Based on XY Values

Converting to XYZ:

The RGB values can be converted to the XYZ color space using the standard transformation
matrix (for linear sRGB primaries with a D65 white point):

[X]   [0.4124  0.3576  0.1805] [R]
[Y] = [0.2126  0.7152  0.0722] [G]
[Z]   [0.0193  0.1192  0.9505] [B]

Correction in XYZ Space:

● In XYZ space, chromaticity is often described using x and y values, where:

x = X / (X + Y + Z),  y = Y / (X + Y + Z)


● The correction might aim to adjust x and y to match the chromaticity coordinates of a
standard white point (like D65 for daylight).

● When converting back to RGB after correcting in XYZ, you may find that a simple per-
channel scaling in XYZ does not map neatly back to RGB. A 3×3 color twist matrix may
be necessary to maintain color accuracy after conversion, reflecting the complex
interdependencies in color perception.
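A sketch of the conversion and chromaticity computation (assumes linear sRGB input in [0, 1]; rgb_to_xy is an illustrative helper):

import numpy as np

# Standard linear sRGB -> XYZ (D65) matrix
RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])

def rgb_to_xy(rgb):
    # Return the (x, y) chromaticity of a linear RGB colour.
    X, Y, Z = RGB_TO_XYZ @ np.asarray(rgb, dtype=float)
    s = X + Y + Z
    return X / s, Y / s

print(rgb_to_xy([1.0, 1.0, 1.0]))   # sRGB white maps to roughly the D65 white point (0.3127, 0.3290)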

4. Explaining the Dichotomy: Scaling in RGB vs. Twist in XYZ

Pure Diagonal Scaling in RGB:

● When you apply pure diagonal scaling in RGB space, each color channel is adjusted
independently. This works under the assumption that color channels are linearly
independent, which is often valid for simple cases.

Twist in XYZ:

● When you correct colors in XYZ space, the relationship between color channels is more
complex due to how human vision perceives color. Simple scaling in XYZ may not
produce the correct results when converted back to RGB, requiring a twist matrix to
properly map the adjustments back to RGB.

Which Approach is Correct?

● Neither approach is universally correct. The best method depends on the specific
scenario:
○ RGB Scaling: Works well for straightforward, uniform color corrections.
○ XYZ with Twist Matrix: Necessary for complex color corrections, where
chromaticity adjustments need to be accurately mapped back to the RGB space.

● RGB Scaling is simple and effective for basic white balancing.


● XYZ Correction with a potential twist matrix is more suitable for complex adjustments,
ensuring accurate color reproduction when converting back to RGB. The approach
chosen should match the complexity of the color imbalance in the image.
Q9) Ex 2.9: In-camera color processing—challenging. If your camera supports a RAW
pixel mode, take a pair of RAW and JPEG images, and see if you can infer what the
camera is doing when it converts the RAW pixel values to the final color-corrected and
gamma-compressed eight-bit JPEG pixel values.

1. Deduce the pattern in your color filter array from the correspondence between co-
located RAW and color-mapped pixel values. Use a color checker chart at this stage if it
makes your life easier. You may find it helpful to split the RAW image into four separate
images (subsampling even and odd columns and rows) and to treat each of these new
images as a “virtual” sensor.

2. Evaluate the quality of the demosaicing algorithm by taking pictures of challenging


scenes which contain strong color edges (such as those shown in in Section 10.3.1).

3. If you can take the same exact picture after changing the color balance values in your
camera, compare how these settings affect this processing.

4. Compare your results against those presented in (Chakrabarti, Scharstein, and Zickler
2009), Kim, Lin et al. (2012), Hasinoff, Sharlet et al. (2016), Karaimer and Brown (2016),
and Brooks, Mildenhall et al. (2019) or use the data available in their database of color
images.

Sol:

1. Learning About the Color Filter Array (CFA): The Bayer pattern, which consists of a grid of
alternating red, green, and blue filters, is the most often used CFA in digital cameras. In this
configuration, half of the sensors are covered in green filters arranged in a checkerboard
pattern, while the remaining sensors are covered in red and blue filters. The rationale for the
higher number of green filters is that the human visual system is more sensitive to luminance,
which is strongly related to green light, than to chrominance.

How to deduce the pattern: Capture JPEG and RAW Images: Using your camera, capture a matched
pair of RAW and JPEG images of the same scene. Make sure the scene offers a variety of colors and
lighting situations in order to provide a rich dataset.

Analyze the RAW Data: Open the RAW image in MATLAB, Python (using OpenCV or PIL), or
any other image processing software that allows you to inspect individual pixel values. The pixel
values will reflect the corresponding light intensity of the red, green, or blue filters because the
RAW image is a perfect reproduction of the sensor data.

Use a Color Checker Chart: To make your analysis easier, include a color checker chart in your
scenario. When a color checker chart is accessible as a trusted reference, it is easier to
ascertain how each color in the chart is represented in the RAW data.

Subsample the RAW Image: To simplify the analysis, you can subsample the RAW image by
dividing it into four separate images:
One image with only even rows and even columns.

One with even rows and odd columns.

One with odd rows and even columns.

One with odd rows and odd columns.

Each of these images can be viewed as a "virtual sensor," a simplified version of the original
image, where each pixel represents one of the three color filters (red, green, or blue).
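A sketch of this subsampling, assuming the RAW mosaic has already been loaded into a 2D NumPy array (for example with the rawpy package); raw_mosaic and split_bayer are placeholders:

import numpy as np

def split_bayer(raw_mosaic):
    # Split a Bayer mosaic into its four "virtual sensor" sub-images.
    return {
        "even_rows_even_cols": raw_mosaic[0::2, 0::2],
        "even_rows_odd_cols":  raw_mosaic[0::2, 1::2],
        "odd_rows_even_cols":  raw_mosaic[1::2, 0::2],
        "odd_rows_odd_cols":   raw_mosaic[1::2, 1::2],
    }

# Comparing the mean of each sub-image over a known patch (e.g., a red patch on a
# color checker) reveals which sub-image corresponds to which filter color.
raw_mosaic = np.random.randint(0, 4096, size=(8, 8))   # placeholder for real RAW data
for name, sub in split_bayer(raw_mosaic).items():
    print(name, sub.shape, sub.mean())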

Link the RAW Data with the Color-Mapped Values: Compare the pixel values of the JPEG image
with the co-located values in the RAW image. The JPEG image has been processed in-camera using
techniques like demosaicing, which fills in the color values that are missing at each pixel. This
correspondence allows you to deduce how the camera's processing pipeline produces the full-color
image from the RAW data.

Determine the Bayer Pattern: The Bayer pattern, which is distinguished by a 2:1 ratio of green
to red or blue pixels, should be easy to identify with the help of this comparison. Green pixels
are arranged in a checkerboard pattern, while red and blue pixels are arranged in alternating
rows and columns.

Draw the Pattern: You may deduce the exact arrangement of the color filters in the CFA by
using this type of RAW data analysis. You'll likely see a pattern that resembles the Bayer
pattern, in which each color filter contributes to the final image's overall color mapping by
covering a certain set of sensors.

2. Test for the effectiveness of your camera's demosaicing algorithm by taking pictures of
scenes with strong color edges, which pose a big problem for color interpolation. A crucial stage
in digital photography is called demosaicing, in which the camera uses the sensor's incomplete
color data to recreate a full-color image. The majority of cameras include red, green, or blue
filters covering every pixel on the sensor, thus each pixel can only record one of these hues. To
create a full RGB image, the demosaicing algorithm must then interpolate each pixel's missing
color information. The resulting image's sharpness, color correctness, and general integrity are
greatly influenced by the quality of this method.

Choose or create scenes with high contrast color borders to start your analysis. Perfect test
scenarios may include high-contrast writing, such as black text on a colored background, or
checkerboard patterns with alternating red, green, blue, and white squares. They could also
have sharp color transitions, like a vividly colored object on a contrasting background. These
kinds of pictures are especially hard since the demosaicing algorithm needs to precisely
recreate abrupt color changes without causing visual artifacts. These scenes are a great way to
test how well the algorithm can preserve image quality in difficult situations because of their
complexity.
Once you've chosen your scenes, use your camera to record them in RAW format. The
camera's raw sensor data is preserved by the RAW format, making it possible to assess the
demosaicing procedure directly. After obtaining your photos, use RAW data handling image
processing software to examine them. You may evaluate the effectiveness of various
demosaicing algorithms using programs like MATLAB, OpenCV, or dedicated RAW processing
tools like RawTherapee. To compare the efficacy of several algorithms, attempt applying them
to the same image if at all possible.

Pay close attention to a few crucial areas of image quality when you analyze the data. Examine
the color edges' crispness first. A good demosaicing method should prevent blurring and
maintain the sharpness of color transitions. Next, assess the color accuracy, paying special
attention to the edges where several hues converge. A good algorithm will replicate the colors of
the original scene precisely, without adding any color changes or mixing. Lastly, search for
frequent artifacts linked to subpar demosaicing, such as the "zipper" effect, which is characterized
by alternating lines of different colors appearing at edges, or moiré patterns, which are wavy,
artificial patterns that were not in the original scene. Color fringing, an artifact where undesirable
colors emerge around the borders of things, is another one to look out for.

For a comprehensive assessment, check your results against reference photos that have been
processed using well-known or high-quality demosaicing techniques, or refer to the examples
given in your textbook's Section 10.3.1. By using these references as benchmarks, you can
assess how well the algorithm in your camera performs in comparison to industry norms. You
can learn a lot about your camera's demosaicing process and its shortcomings by carrying out
this experiment. This will give you important information about how well your camera performs
when producing high-quality photographs, particularly in situations with complex color dynamics.

3. The camera's color balance settings can have a significant impact on how your image is
processed and presented, including how the colors look. Adjusting the intensity of colors like
red, green, and blue (RGB) has an effect on how these colors are reproduced and captured in
the image. This process is known as color balance.

You may modify the color balance settings to change how the camera reacts to various light
wavelengths. To boost warmth and make the image appear more lively, you can adjust the red
balance of your image to make the reds more prominent. On the other hand, lowering the red
balance can make the image appear colder and the reds appear less intense, giving the image
a sometimes more muted tone.

In a similar vein, changing the green and blue balances will change the picture's overall color
and tone. Raising the green balance can make foliage and other green elements appear more vivid,
improving the image's clarity and the richness of its greens. Conversely, depending
on how the green balance interacts with other color choices, lowering it can produce a more
neutral or even reddish tone. The photograph's atmosphere can be changed by adjusting the
blue balance; a higher value can emphasize the yellow and red tones and create a calmer,
colder feeling, while a lower value can warm the image and create a warmer feeling.
In general, adjusting the color balance values modifies how the camera records and reproduces
color, which has an impact on image processing. This adjustment has the power to significantly
alter the image's visual result, impacting not just the color representation but also the image's
mood, depth, and perceived quality. Photographers can enhance the mood or ambiance they
want to portray in their photos and generate a variety of creative effects by experimenting with
different color balance settings.

4. Following the trials, compare your results with the conclusions drawn from
relevant research papers to see how your camera's color processing, demosaicing algorithms,
and color balancing settings affect the final photographs. Chakrabarti, Scharstein, and Zickler
(2009), Kim, Lin et al. (2012), Hasinoff, Sharlet et al. (2016), Karaimer and Brown (2016), and
Brooks, Mildenhall et al. (2019) are a few of the studies that provide insights into different facets
of color image processing, such as color correction, demosaicing techniques, and the effects of
various processing pipelines.

Examine the approaches and findings of these investigations first. Your findings on edge
sharpness and color correctness could be directly compared to Chakrabarti et al. (2009)'s
exploration of advanced demosaicing approaches, which aim to increase the accuracy of color
interpolation, especially at high-frequency edges. Comparing your findings on how color balance
settings affect picture processing with those of Kim, Lin et al. (2012) may provide insights into
how color balance and correction algorithms are tuned for various lighting circumstances.

The performance of several demosaicing algorithms under various settings is documented by
Hasinoff, Sharlet et al. (2016) and Karaimer and Brown (2016). These studies could be used as
a benchmark to evaluate the caliber of the processing algorithms in your camera. Last but not
least, Brooks, Mildenhall, et al. (2019) may provide sophisticated data-driven approaches to
color processing that may clarify how contemporary methods surpass more conventional ones.

You can determine how well your camera performs in comparison to state-of-the-art research by
comparing your results with those of these studies. Additionally, you can compare processed
photos directly or examine particular metrics like color fidelity, edge sharpness, or artifact
prevalence using any datasets that are made available by these papers. This comparison helps
you better understand how various algorithms and processing methods impact image quality in
real-world situations while also validating your findings.
