
Computer Graphics Notes

Ray Casting
Rendering
Two steps to rendering

1. Visibility (Ray casting)


2. Light transport / Shading (ray tracing)

Ray Casting

Cast a ray from q into the scene; if its first intersection point is p, then p is visible from q.

Direct illumination: rays that go from the light source directly to the surface.

Indirect illumination: rays that go from surface to surface (reflected light).

Primary rays solve visibility and start at the sensor.

Secondary rays compute light transport (colour/illumination) and do not start or end at the sensor.

Shadow rays start or end at the light source.

Ray
Rays are represented in parametric form

r(t) = o + t d

where o is the origin and d is the direction.

All intersections between the ray and the scene's objects are found, and the closest one is kept.

Implicit Surfaces
Implicit surfaces are surfaces defined implicitly by an equation, e.g. a plane.

A plane can be represented with a normal n and a point r on the plane.

A point p lies on the plane if it satisfies

n · (p − r) = 0

p − r is the vector from r to p, and it is perpendicular to the normal (the dot product is zero when two vectors are perpendicular).

Intersection with ray


Substituting the ray into the plane equation gives n · (o + t d − r) = 0, which can be solved for

t = n · (r − o) / (n · d)

t is not the intersection point itself but the parameter at which r(t) is the intersection.
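A minimal sketch of this test, assuming numpy vectors for o, d, n and r (names chosen here to match the notation above):

import numpy as np

def intersect_plane(o, d, n, r, eps=1e-9):
    # Solve n . (o + t d - r) = 0 for t; return None if there is no hit in front of the origin.
    denom = np.dot(n, d)
    if abs(denom) < eps:          # ray is parallel to the plane
        return None
    t = np.dot(n, r - o) / denom
    return t if t >= 0 else None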

Quadrics
Quadrics are implicit surfaces defined by quadratic equations.

E.g. Spheres, cones, hyperboloids

Sphere Example
Substitute the ray into the sphere equation, splitting it into x, y, z components (or keeping it in vector form).

Derive the coefficients of the resulting quadratic in t and solve it. The coefficients are scalars because all the vector terms appear inside dot products.

Thus, in a scene of quadrics you can test each quadric against each ray, and each ray-quadric test takes constant time.
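A minimal sketch for a sphere with centre c and radius R, assuming numpy vectors (the quadratic form follows from substituting r(t) = o + t d into ||p − c||² = R²):

import numpy as np

def intersect_sphere(o, d, c, R):
    # Coefficients of the quadratic a t^2 + b t + cc = 0
    oc = o - c
    a = np.dot(d, d)
    b = 2.0 * np.dot(d, oc)
    cc = np.dot(oc, oc) - R * R
    disc = b * b - 4.0 * a * cc
    if disc < 0:
        return None                       # ray misses the sphere
    t1 = (-b - np.sqrt(disc)) / (2.0 * a) # nearer intersection first
    t2 = (-b + np.sqrt(disc)) / (2.0 * a)
    for t in (t1, t2):
        if t >= 0:
            return t
    return None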

Parametric Surfaces
Partial objects

Combined Objects
Add, intersect, subtract (constructive solid geometry).

Implementing intersection

- Compute the intersections with all component objects
- Determine which objects each intersection lies inside

Triangles
Why triangles

- They are simple

How to represent

- Parametric, barycentric coordinates

Intersection

- Check for intersection with the plane of the triangle
- Then check the barycentric parameters: the hit is inside the triangle if they all lie between 0 and 1
Axis aligned boxes
Encapsulate complex geometries in simple shapes that have cheap intersection tests.

Intersection test

- A box is characterized by 3 slabs (an x, y and z slab), where a slab is just the two parallel planes enclosing the box along that axis
- Check the ray's intersections with the slabs. There are two intersections per slab, and if these intervals overlap then the ray passes through the box (see the sketch below)
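A minimal sketch of the slab test in plain Python, assuming the box is given by its min and max corners:

def intersect_aabb(o, d, box_min, box_max):
    # Intersect the ray with the x, y, z slabs and check that the t-intervals overlap.
    t_near, t_far = -float('inf'), float('inf')
    for axis in range(3):
        if abs(d[axis]) < 1e-12:
            # Ray parallel to this slab: it must already lie between the two planes.
            if o[axis] < box_min[axis] or o[axis] > box_max[axis]:
                return False
            continue
        t1 = (box_min[axis] - o[axis]) / d[axis]
        t2 = (box_max[axis] - o[axis]) / d[axis]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far and t_far >= 0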

Boxes can form hierarchies (boxes within boxes); if the parent box is intersected, you then check the child boxes and finally the primitives.

The speed of a ray tracer is often governed by the speed of its intersection
tests.

Iso-surfaces
Ray casting fluids

Explicit surface

Density mapping
todo
Shading
Context
Shading is the colour and/or intensity computation at a surface position.

Radiance is photons per unit time per unit area per unit solid angle, i.e. how much light is transported along a ray.

Directions
l: direction towards the light source

n: surface normal

v: direction towards the sensor

All directions point away from the surface and are normalized.

Colour
Coloured light is typically represented as a 3D vector of RGB colour intensities, with all components normalised:

L = (L_red, L_green, L_blue)^T

e.g. blue light = (0, 0, 1)^T

Surfaces also have a colour and are characterized by a reflectance coefficient (the object colour):

ρ = (ρ_red, ρ_green, ρ_blue)^T

It describes how much of each colour channel is reflected rather than absorbed.


L(p, ω_o): light leaving position p in direction ω_o

Ω: all visible directions from p

Phong Illumination model


L^light: light emitted from the source

L^surf: surface illumination caused by L^light

- Calculated from the angle between l and n

L^refl: how much light is reflected in any direction

- Governed by ρ

L^cam: how much of L^refl goes in the direction of the sensor

- Governed by the material property shininess

Surface Illumination
Light arriving at an angle appears less bright due to Lambert's cosine law: the illumination is scaled by the cosine of the angle between l and n.

Since the vectors are normalized, cos θ is simply the dot product of n and l, and so L^refl can be calculated from it; for diffuse shading this gives L^refl = ρ L^light max(n · l, 0).

Diffused Light shading

We assume matte surfaces reflect light equally in all directions.

Specular Light shading

An object colour of white is used, as the specular reflection always converges to the light colour.

m is the shininess of the surface; the higher m, the shinier the surface. It governs the size of the highlight, not the intensity.

r is the reflection of l with respect to n.
Reflection
The reflection vector can be derived as r = 2 (n · l) n − l.

Blinn-Phong illumination model

Uses the half-vector between the light and view directions, h = (l + v) / ||l + v||, and compares it with the normal instead of comparing r with v. Sometimes faster to compute.

Energy Conservation
The Phong and Blinn-Phong illumination models do not ensure energy conservation: depending on m, the reflected light may exceed the total light arriving at the surface.

Ambient light
Simply add another term, L^indirect, to the illumination equation and apply the object colour to it.

Final Equation

The terms are often summed with weights α, β, γ. These weights are often user-defined and independent, since energy is not preserved anyway.

Depending on the coefficients you can get different results.
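A minimal sketch of the summed Blinn-Phong terms, assuming normalized direction vectors, a single point light, and using the light colour as a stand-in for the indirect (ambient) term; alpha, beta, gamma are the user-defined weights mentioned above:

import numpy as np

def normalize(x):
    return x / np.linalg.norm(x)

def blinn_phong(n, l, v, light_rgb, rho, m, alpha=0.1, beta=0.6, gamma=0.3):
    # Ambient + diffuse + specular, each weighted by a user-defined coefficient.
    n, l, v = normalize(n), normalize(l), normalize(v)
    h = normalize(l + v)                                  # half vector (Blinn-Phong)
    ambient  = rho * light_rgb                            # stand-in for L^indirect
    diffuse  = rho * light_rgb * max(np.dot(n, l), 0.0)   # Lambert's cosine law
    specular = light_rgb * max(np.dot(n, h), 0.0) ** m    # white specular colour
    return alpha * ambient + beta * diffuse + gamma * specular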


Variants
To do

Limitations
- Approximate
- Limited to certain surfaces: no transparency or subsurface scattering
- Local computation
- Less realistic
- Only point light sources

Extensions
Light attenuation
Light further away is less bright; illumination decreases with the inverse square law.

r = ||p^light − p||

where r is the distance between the light and the surface.

To do: explain variant

Fog
Light converges to the colour of the fog with distance.

The final colour is a simple linear combination of the computed light and the fog colour.


Shading Models
A fragment is the information of the area projected onto a pixel.

A primitive is the face formed by three or more vertices.

Vertices are the corners of a model.

Flat shading
The fragments of each primitive are all shaded the same, with the light computed from one arbitrarily picked vertex.

- One evaluation per primitive (at a single vertex)

Gouraud Shading
The fragments of the primitive are interpolated from the vertices of that primitive.

- Evaluation per vertex
- The highlight can be lost when it falls between the vertices

Mach band effects can occur: the perceived contrast at edges between primitives is exaggerated, producing visible banding artifacts.
Phong Shading
Light is computed for each fragment.

Surface normals are interpolated from the vertices to the fragments.


Homogeneous Notation
Motivation
Affine transformations have the form p′ = A p + t, where

A: the transformation matrix
p: the position vector
t: the translation vector

Positions and Vectors


Homogeneous Notation
Homogeneous coordinates of Positions
3-D positions are represented with an additional homogeneous component w, where w ≠ 0.

The 3-D coordinate can be recovered from this notation by dividing by w: (x/w, y/w, z/w).

Generally w = 1, as there is no real advantage to using a different w. However, some transformations may change w.

Thus, there are really an infinite number of representations of a 3-D coordinate in homogeneous notation.

Homogeneous coordinates of Vectors


Same thing as a position, but with w = 0.

This allows the relations between vectors and positions to be preserved (e.g. position − position gives a vector).


Homogeneous coordinates of Linear Transformations

Transformations
Translation

Translating a vector changes nothing (its w component is 0).

Inverse

Translate by −t.


Rotation

where φ is the rotation angle (in degrees/radians). Each axis has its own rotation matrix.

Inverse
The inverse is simply the transpose, since rotation matrices are orthogonal (R R^T = I).
Reflection
Reflecting with respect to an axis: simply negate the x, y or z component.

Inverse
The inverse is the transpose, which here is the matrix itself (it is orthogonal and symmetric).

Scaling
You scale with respect to each axis.

To get a uniform scale, set s_x = s_y = s_z.


You can also do

Inverse
Scale by the reciprocals: 1/s_x, 1/s_y, 1/s_z.

Shear

Moving a point along one axis in proportion to its coordinate along another axis.

Inverse
Simply negate the shear factor.

Applications
Compositing Transformations
All transformations can be multiplied together to form one transformation
matrix.

However, order matters: the matrix closest to the point (the rightmost factor in the product) is applied first.
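A minimal numpy sketch illustrating that the rightmost matrix acts first (translation and rotation_z are helper names chosen here for illustration):

import numpy as np

def translation(t):
    M = np.eye(4)
    M[:3, 3] = t
    return M

def rotation_z(phi):
    c, s = np.cos(phi), np.sin(phi)
    M = np.eye(4)
    M[:2, :2] = [[c, -s], [s, c]]
    return M

p = np.array([1.0, 0.0, 0.0, 1.0])                    # homogeneous position, w = 1
M1 = rotation_z(np.pi / 2) @ translation([1, 0, 0])   # translate first, then rotate
M2 = translation([1, 0, 0]) @ rotation_z(np.pi / 2)   # rotate first, then translate
print(M1 @ p)   # approximately (0, 2, 0, 1)
print(M2 @ p)   # approximately (1, 1, 0, 1)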


Rigid Body Transformation
A transformation that keeps the shape the same: a rotation plus a translation.

Thus the inverse can be derived as a rotation by R^T followed by a translation by −R^T t (the inverse of [R | t] is [R^T | −R^T t]).

Planes and Normals


To do

Basis Transformation
A 3D coordinate system can be defined by an origin and 3 basis axis vectors.

To move an object from one coordinate system to another:

1. Translate the object by T(−t)
2. Then rotate according to the basis vectors

The basis vectors should be orthonormal.

The object should be rotated by the negative angle (i.e. with the inverse rotation).


Projection
Motivation
When rendering a 3D scene, we want to capture depth: objects further away should appear smaller.

Homogeneous Equations
Homogeneous equations are used for projection transformations.

The last row of the matrix allows us to realize divisions by a linear combination of p_x, p_y, p_z and w.

Projection in 2D
Terms
v – viewpoint

p – point to be projected

p′ – projection of p (its intersection with the y-axis)

d – the viewpoint's distance from the origin

We can derive the intersection p′ like so, where the view line is the y-axis and the viewpoint is on the x-axis.

Matrix Form
Matrix M allows us to calculate p′ from p with one matrix multiplication.

In matrix form it looks like this, where you can see how the projection components are used to realize the division.
General case
The general case in 2D can be defined like so.

A point can be projected even if it is behind the viewpoint or the view line, as long as it is not exactly on the viewpoint.

In the general form, the matrix M can be derived like so.

Example
When we apply this to the previous example, you can see that it works.

Parallel Projection
We can write this matrix in a different but equivalent form.

This allows for parallel projection: as d approaches infinity, the y and w components remain unchanged.
Projection in 3D
General Case

Same derivation as before, but the view line is now a view plane n.

Example

The matrix now makes the z position zero, while the x and y positions depend on z.
Parallel Case

Clip space
Clip space is where all vertex positions are normalized so that they lie within a canonical view volume, which is a cube.

NDC space is clip space where the homogeneous component is normalized to 1.

- The projection transform is used to transform an object from view space to clip/NDC space.
- A parallel projection can then be used for viewport mapping, and relative z values are preserved.

This allows for simplified and unified implementations of:

- Culling
- Clipping
- Visibility
- Viewport mapping

Derivation

We can derive y_p by looking at similar triangles.

The same applies for x_p.

We know that y_n is a linear transformation of y_p: y_n = α y_p + β.

We can find α and β by substituting in: when y_p = b, y_n = −1; when y_p = t, y_n = 1. The same process gives x_n.

For the z component, we set up a corresponding expression and solve for A and B in the same way by substituting in:
z_v = n, w_v = 1, z_n = −1
z_v = f, w_v = 1, z_n = 1

Matrix Form
We can then form the matrix.

Symmetric
When the view frustum is symmetric, the matrix simplifies considerably.
Variants
One common variant has the camera looking down the negative z direction, which can be written like so (see the sketch below).
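A hedged numpy sketch of the frustum matrix for this negative-z convention, assuming the parameter order (left, right, bottom, top, near, far) as in the derivation above; the exact matrix on the slides may differ in convention:

import numpy as np

def frustum(l, r, b, t, n, f):
    # Maps the view frustum (camera looking down -z) to the canonical [-1, 1]^3 cube.
    return np.array([
        [2*n/(r-l), 0.0,        (r+l)/(r-l),   0.0],
        [0.0,       2*n/(t-b),  (t+b)/(t-b),   0.0],
        [0.0,       0.0,       -(f+n)/(f-n),  -2*f*n/(f-n)],
        [0.0,       0.0,       -1.0,           0.0],
    ])

# A point on the near plane (z = -n) maps to NDC z = -1, a point on the far plane to NDC z = +1
# after the perspective divide by w = -z.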

Non-linear mapping
Depth values are mapped non-linearly.

This means that objects in the canonical view volume are skewed towards the far plane.

This squashing can cause z-fighting, where two depth values are too close together to differentiate, which can cause flickering.

- To reduce this, make the near plane as far from the camera as possible
- Make the far plane as close to the near plane as possible
Orthographic Projection Matrix

Involves simply a scale and a translation.

Typical use case


Local space is simply the description of the object.

Global space is the world coordinate system.

View space is the space where the camera is the origin.

Clip space is the space normalised to the canonical view frustum.

The full transform is P R_cam^T T_cam^{-1} M_i, where

P: projection matrix
R_cam^T T_cam^{-1}: inverse view matrix (a.k.a. V^{-1}), see slides for the derivation
M_i: model transform matrix of model i

Rasterization
Context
Rasterization is the computation of the pixel positions in the image plane that represent a projected primitive.

The term rasterization is usually used for the pixel-position approximation process, but can also refer to the whole pipeline.

Overlap

Overlapping primitives are resolved by comparing depth values.

Implemented in clip space.

Rasterization based rendering pipeline

For light transport Phong Illumination is used.

Definitions
Vertices and primitives, as above.

A fragment is a possible pixel; e.g. a fragment that is occluded by another fragment is still a fragment.

The framebuffer is a set of pixels.

Pixels store a colour, a position, and other attributes.

Vertex Processing
View transforms.

Colour computations, texture computations, and other per-vertex operations.

Transform all vertices into the canonical view volume; the GPU assumes vertices are in the canonical volume.

The z component in NDC space is the depth value.

Colour can be computed here if we have the necessary information such as the normal, colour and material properties.

Texture coordinates and other extra data are used if stored at the vertex.

Rasterization
Input: vertices and faces (connectivity information).

Creates primitives.

Generates a fragment for each covered pixel position.

Attributes are interpolated from the vertex attributes; depth values are interpolated from the known vertex depth values.

Line rasterization

The algorithm is designed for slopes between 0 and 1, but it can be used for any slope by exploiting symmetry.

At each step only two candidate pixels are considered: (x+1, y) or (x+1, y+1).

Bresenham Line Algorithm


Represent the line with an implicit function F = …

Check whether the midpoint between E and NE is above or below the line.

If above, go with E; otherwise go with NE. We are checking which candidate is closer to the line.

Generate a fragment at the chosen point.

Initialization
We multiply F by two so that only integers are used. We can do this because we only care about the sign of F.
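A minimal sketch of the integer midpoint/Bresenham loop for slopes in [0, 1] (other slopes follow by symmetry):

def midpoint_line(x0, y0, x1, y1):
    dx, dy = x1 - x0, y1 - y0
    d = 2 * dy - dx            # decision variable: F at the first midpoint, scaled by 2
    y = y0
    points = []
    for x in range(x0, x1 + 1):
        points.append((x, y))   # generate a fragment at (x, y)
        if d > 0:               # the line passes above the midpoint: step to NE
            y += 1
            d += 2 * (dy - dx)
        else:                   # otherwise step to E
            d += 2 * dy
    return points

# midpoint_line(0, 0, 8, 3) returns the fragments approximating that line segment.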

Polygon rasterization
For closed shapes.

Use a horizontal scan line; at each intersection with an edge we toggle between inside and outside.

Attribute Interpolation
Linear interpolation from vertices to fragments.

Linear interpolation in view space requires a non-linear interpolation in NDC space.

The homogeneous value in clip space is the z value from view space, so it is sometimes best to do the interpolation in clip space rather than NDC space, because the non-linear interpolation in NDC space needs the z value from view space. Maths explained in the slides.

Fragment processing
Occlusion and so on

Processing
Combination of fragment colour and texture colour

Fog colour combined with fragment colour

Anti-aliasing

Texturing:
Vertices are assigned a texture coordinate, which is a relative coordinate in the texture.

Texture coordinates are interpolated; the interpolated coordinate is then used to look up the corresponding pixel data in the texture.

Textures do not have to store colour; they can store other data.

Testing
Scissor test

- Check if the fragment is inside a user-defined area

Alpha test

- Check if the alpha value is above a user-defined threshold

Stencil test

- A scalar used for specific effects
- Shadows and so on

Depth test

- Used when multiple fragments land on one pixel
- Fragment depth is compared with the framebuffer depth value
- Discarded if further away, replaces the stored value if closer

The framebuffer holds attributes at each pixel; these attributes are separated into different buffers:

- Depth
- Colour
- Stencil (texture)
- accumulation

Framebuffer update
Update the framebuffer attributes (colour and so on) and the depth value stored in the framebuffer.

Blending
Blending can be done as a linear interpolation according to the alpha value: c = α c_src + (1 − α) c_dst.

Overview

We want parallelization; this process is well suited to hardware.

It also helps explain parts of OpenGL.

Parametric Curves
Application
We want to be able to make curves for modelling, creating paths for animation, and so on.

A curve can be defined by a function with some coefficients, but the coefficients c are not intuitive.

So we create a method of describing curves that is more understandable: we use control points to build intuition.

Polynomial Curves

Bezier curves
Bezier curves are polynomial curves represented by control points.

- This allows for intuitive usage.

Control points are points that describe the curve, written p_i.

- A polynomial curve of degree n has n + 1 control points.

Interpolation
Interpolation finds the intermediate points between two points. It is calculated as

(1 − t) p_i + t p_{i+1}

Constant

Line

Interpolation between two points

Quadratic

Interpolation between the interpolation results of two points

Cubic

Interpolation of interpolation of interpolation…

De Casteljau Algorithm
This type of repeated interpolation is called the De Casteljau algorithm. Play with it here:

https://www.geogebra.org/m/gfve79ad
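A minimal sketch of the De Casteljau evaluation, assuming numpy control points:

import numpy as np

def de_casteljau(points, t):
    # Repeatedly interpolate adjacent control points until a single point remains.
    pts = list(points)
    while len(pts) > 1:
        pts = [(1 - t) * p + t * q for p, q in zip(pts, pts[1:])]
    return pts[0]

ctrl = [np.array([0.0, 0.0]), np.array([1.0, 2.0]),
        np.array([3.0, 2.0]), np.array([4.0, 0.0])]
mid = de_casteljau(ctrl, 0.5)   # point on the cubic Bezier curve at t = 0.5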
Bernstein Polynomials
The coefficients can be derived from binomial coefficients and are written as B_i^n(t).

The Bernstein polynomials show the influence of each control point at a certain parameter value.

The (1 − t) terms come from expanding (a + b)^n with a = 1 − t and b = t.
Matrix Representation
We want to represent this with matrices so that it is easy to work with on computers.

In the cubic case:

You can also have an inverse spline (the transpose of the spline).

The control points are unknowns and are usually given as inputs.

The geometry matrix contains the control points; the spline matrix together with the basis forms the Bernstein polynomials.

You can replace the spline matrix with different matrices to obtain different types of coefficients, i.e. different strategies for creating curves.

You can convert from control points and a spline matrix to the canonical coefficients.
Curve Subdivision
Bezier curves can be split into two curves, each with the same number of control points as the original.

After choosing a split parameter t_split, we can derive the new control points as shown below.

The first and last new control points are trivial: they are the start and end of the original curve and the point at t_split.

The ones in between can be derived algebraically with matrices, but they are just the intermediate interpolations between the original control points.

Look at the slides for the derivation.
Tangent

x(t) = (x(t), y(t))^T

The transpose here just means it is written as a column vector.

t(t) = dx(t)/dt

The tangent vector t(t) is the derivative of the curve and is also the direction of the curve at t.

Velocity
The tangent can be used as a velocity if we interpret the parameter as time, and its magnitude gives the speed.

Acceleration
Acceleration is the derivative of the velocity function.
Continuity

Piecewise Polynomial Curves

Higher-degree Bezier curves are unwieldy and hard to work with: change one control point and the whole curve changes.

When connecting two curves, the endpoint of one and the start point of the other should be the same to achieve C^0 continuity.

To achieve C^1 continuity, the connecting point and its two adjacent control points should form a line, so that the tangents are equal.

Cubic Hermite
This curve explicitly states the velocity at the two end points.

They can be written as

The coefficients can be derived from the end points and the respective velocities; check the slides for the derivation.

p are the end points and m the velocities.


Catmull Rom Spline
A variant of the Hermite spline, where you are given the points but the derivatives at the end points are derived from the neighbouring control points.

The 1st and 3rd points give the first endpoint's velocity, and the 2nd and 4th points give the second endpoint's velocity.

The matrices can be derived from the Hermite spline formulation.

Catmull-Rom splines are C^1 continuous and can be extended piecewise, the tangent at each point equaling the direction formed by the previous and following control points.
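A minimal sketch of evaluating one Catmull-Rom segment via the cubic Hermite basis, assuming numpy control points p0..p3; the 0.5 tangent scaling is the common convention, the slides may use a different factor:

import numpy as np

def catmull_rom(p0, p1, p2, p3, t):
    # The segment runs from p1 to p2; tangents come from the neighbouring control points.
    m1 = 0.5 * (p2 - p0)        # velocity at p1 (1st and 3rd points)
    m2 = 0.5 * (p3 - p1)        # velocity at p2 (2nd and 4th points)
    t2, t3 = t * t, t * t * t
    # Cubic Hermite basis functions
    h00 = 2*t3 - 3*t2 + 1
    h10 = t3 - 2*t2 + t
    h01 = -2*t3 + 3*t2
    h11 = t3 - t2
    return h00 * p1 + h10 * m1 + h01 * p2 + h11 * m2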
Particle Fluids
Particle simulation
The fluid is subdivided into small volumes.

Parcels/particles are not just spheres; they have a shape and volume that we do not know exactly.

Surfaces are rendered with a triangle mesh.

Definitions
Particle motion, quantities

Mass, m

Position, x, 3D vector

Velocity, v, 3D vector

Force, F, 3D vector
Acceleration, a = F/m, 3D vector

Density, ρ

Pressure, p

All quantities are given at time t; h is a time step.

We want to calculate the quantities at t + h when we know the quantities at t.

Governing Equations

The problem we want to solve: we know the initial values x_t and v_t, and we know the governing equations.

What are x_{t+h} and v_{t+h}?

Updating
Explicit Euler
Use the Taylor series with a simple linear assumption: velocity is assumed constant over the step.

Verlet
Use further derivatives of the Taylor expansion: acceleration is assumed constant over the step.
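A minimal sketch of the two update rules, assuming numpy-style arrays for x, v, a; the exact Verlet variant used in the lecture may differ (e.g. it may also use the previous position):

def explicit_euler(x, v, a, h):
    # x_{t+h} = x_t + h v_t ;  v_{t+h} = v_t + h a_t
    return x + h * v, v + h * a

def taylor_second_order(x, v, a, h):
    # One order further: x_{t+h} = x_t + h v_t + 0.5 h^2 a_t, acceleration assumed constant
    return x + h * v + 0.5 * h * h * a, v + h * a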

Governing Equation: Navier-Stokes Equation

Pressure acceleration

- Pressure is represented in each x, y, z direction
- Accelerates particles from high pressure to low pressure

Viscosity acceleration

- Looks at the change in velocity around the particle, i.e. how velocity changes as position changes

Other forces, e.g. gravity.

Smoothed Particle Hydrodynamics (SPH)
Particle positions are updated according to their velocity, and their velocity is updated according to this equation:

W_ij is a kernel which weights the impact of neighbour j based on its distance from particle i.

It is usually some piecewise function; it looks roughly Gaussian.

The quantities from the last time step are used and then updated accordingly.

Update with Verlet or explicit Euler.

SPH incorporation into fluid simulation


We want the density of the fluid to be constant, like real fluids.

So as the density of particles changes, the pressure changes, and the acceleration from pressure ensures that the density of a particle does not deviate too much from the rest density ρ_0.
Step By Step of SPH Fluid Solver
1. For all particles i, find the neighbours j
2. Compute density and pressure (see the sketch below):
   ρ_i = Σ_j m_j W_ij
   p_i = k (ρ_i / ρ_0 − 1)
3. Compute accelerations
4. Update velocity and position according to explicit Euler or Verlet
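A minimal sketch of steps 1-2, assuming a precomputed neighbour list and a kernel function W(x_i, x_j) are available (names chosen here for illustration):

import numpy as np

def sph_density_pressure(positions, masses, neighbors, W, k, rho0):
    # rho_i = sum_j m_j W_ij ;  p_i = k (rho_i / rho0 - 1)
    n = len(positions)
    rho = np.zeros(n)
    for i in range(n):
        for j in neighbors[i]:          # neighbour indices of particle i (including i itself)
            rho[i] += masses[j] * W(positions[i], positions[j])
    p = k * (rho / rho0 - 1.0)
    return rho, p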


Neighbour Search
Uniform grid
Particles are stored in a uniform grid.

The grid cell size is determined by the kernel support: the largest distance at which two particles are still given a non-zero weight by the kernel.

Implementation
Each particle is given a unique cell identifier according to its position.

The identifiers are ordered along a space-filling curve so that all particles can be stored in a list where spatially close particles are also close to each other in that list.
Boundary Handling

Possible Implementation
Boundary particles are simply treated as fluid particles that do not move.

The pressure of a boundary particle can be copied from a nearby fluid particle whose pressure is known.

Visualization
Iso surface
Image Processing

Noise and Filters


Noise
Noise is disturbance during acquisition or transmission of data

Additive noise

I_ij = I*_ij + n_ij

where I is the observed pixel, I* is the clean pixel, and n is the noise at position i, j.

Noise Distribution
Noise can be distributed with different distributions.

Gaussian Distribution
Multiplicative Noise
I_ij = I*_ij (1 + n_ij)

Impulse Noise
Certain pixels are randomly replaced by one (unipolar) or two (bipolar) fixed values.

Uniform Noise
Certain pixels are replaced by uniformly distributed random values.
SNR – signal-to-noise ratio

The variance of the ground-truth image over the variance of the noise.

PSNR – peak signal-to-noise ratio

The range of possible pixel values relative to the total error.

A PSNR of around 40 is okay; a PSNR of 7 is almost unrecognizable.

Point Operations
Perform a function on each pixel independently.

Examples:
Brightness
u(x, y) = I(x, y) + b
Contrast
u(x, y) = a · I(x, y)

Gamma

Makes the sensitivity more similar to a person's: dark areas become brighter without losing saturation in brighter areas.

Normalize, apply the transformation, then denormalize.
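A minimal sketch of the gamma point operation, assuming 8-bit images and an illustrative gamma value:

import numpy as np

def gamma_correct(I, gamma=2.2, max_val=255.0):
    # Normalize to [0, 1], apply the power law, then denormalize.
    u = (I / max_val) ** (1.0 / gamma)   # exponent < 1 brightens dark values
    return u * max_val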

Gray value histogram

Histogram equalization makes all gray values equally frequent.

Difference image
Shows which pixels moved; a threshold is used to simply find all pixels that changed by more than a certain amount.

Background Subtraction
Same idea, but comparing a static background image with an image containing another object.

Use the average of the three colour channels.

Filtering
F(f) · F(h) = F(f ∗ h)

The convolution of filter and image corresponds to a multiplication of their transforms.

For a smoothing filter, high-frequency information is removed while low-frequency information is preserved.
Linear Filters
Gaussian Filter
1. Create a Gaussian kernel or mask, which is a box of coefficients calculated from the Gaussian distribution centred on the current pixel.
2. For each pixel, calculate the sum of all pixel values covered by the mask, multiplied by the corresponding coefficients.
3. The new value of the current pixel is this sum.

The x and y directions can be filtered separately (the Gaussian is separable).

Todo: explain convolution

Complexity: O(NMσ)
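A minimal separable-Gaussian sketch using numpy; the kernel radius of 3σ and the mirrored boundary are assumptions of this sketch, not prescribed by the notes:

import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()                       # normalize so the weights sum to 1

def gaussian_filter(img, sigma):
    # Filter rows, then columns; same result as applying the full 2D mask.
    k = gaussian_kernel_1d(sigma)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode='reflect')     # mirrored (Neumann-style) boundary
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, padded)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, rows)
    return out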

Boundary Conditions
When the mask extends past the boundary of the image you can either use:

Dirichlet boundary conditions: fix the pixel values outside the boundary, usually to 0.

Homogeneous Neumann boundary conditions: mirror the image along the boundary.

Box filter
Like the Gaussian filter, but the mask simply averages all pixels within the mask.

Disadvantages:
- The result is not as smooth (differentiability is increased only by one order)
- Not rotationally invariant

Complexity: O(NM)

Repeated application of the box filter converges to the Gaussian filter.

Edge Preserving Filters


Bilateral Filter
Idea: pixels on different sides of an edge should not be summed together.

The kernel coefficients are calculated from both spatial similarity and intensity similarity.

The spatial weights are the same as the Gaussian values.

The intensity weights are also calculated with a Gaussian function, but applied to the difference in intensity between the neighbouring pixel and the current pixel.

h[m, n] = the output filtered pixel value

1/W_mn = the normalization factor

k, l = the indices of the kernel/mask

g = the spatial weight

r = the intensity weight given the current and neighbouring pixel intensities

f = the neighbouring pixel intensity

Derivative Filters
A filter to find the gradient at each point in an image, in order to find edges.

To ensure that the image is differentiable, a small Gaussian filter is applied to the image first.

Sampling the derivative of the Gaussian mask gives an implementation for calculating the derivative.

Laplace Filter
Uses the second derivative to find edges; zero crossings are treated as edges.

Gradient Magnitude
The magnitude of the gradient is calculated and tested against a threshold to find edges.
Energy Minimization and Optimization
We can write our problems in the form below and then solve them for the minimum.

The function E(x) is often called an energy. In machine learning it is called the loss function.

Image Denoising
This method of energy minimization can be used for denoising.

We assume that the output should be:

1. Similar to the input image
2. Similar to neighbouring values

Similarity to input term

Smoothness term

We simply compare with two adjacent neighbours.

Final Problem

Evaluation of minimization approach


Advantages
- All model assumptions clearly stated (transparency)
- All variables are optimized jointly, interdependencies not lost by
intermediate decisions (optimality)
- Can be analyzed

Disadvantages
- Formalizing assumptions can be arbitrary(?)
- Choosing the weight parameters is hard
- Global optimization is hard

Convexity
For a unique solution, we require the energy to be convex.

A function f is convex if the line segment between any two points on f lies above f (apart from the endpoints).

The sum of convex functions is also convex.

Our energy function is a sum of strictly convex (quadratic) functions, thus it has a unique solution.
Process for Minimization
To minimize, the first derivative must be 0.

1. For every u_{i,j} we find its partial derivative.

This is our energy function for denoising:

When differentiating with respect to u_{i,j}, we must also take the terms from the other sums, and thus for each linear equation we get four terms from the smoothness factor.

For all pixels not on the boundary, the linear equation to solve looks like this:

2. We can write this as a large system of linear equations.

The 2-dimensional pixel grid can be written as one long vector.

The −I_{i,j} terms can be moved to the other side.

And the coefficients of the pixels can be written separately. (The boundary pixels, as you can see, have different coefficients.)
Jacobi Method
3. We then form an iterative solution by decomposing the system matrix into its diagonal part D and off-diagonal part M:

A = D + M

And so we can form an iterative equation:

A x = b
(D + M) x = b
D x = b − M x
x = D^{-1}(b − M x)

D^{-1} is simple to compute since D is a diagonal matrix: just replace each diagonal entry by its reciprocal.

We start with any x^0, usually the input image, and solve iteratively, substituting in the new value:

x^{k+1} = D^{-1}(b − M x^k)

4. Iterate until the change in the solution (x^{k+1} − x^k)^2 becomes smaller than some threshold, or until the residual r^k = A x^k − b does.

The Jacobi method is guaranteed to converge if each diagonal entry dominates the sum of the off-diagonal entries in its row, which is always true for the denoising system of coefficients.
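A minimal matrix-free sketch of the Jacobi update for this denoising system, assuming a smoothness weight called alpha here and keeping the boundary pixels fixed for simplicity (the slides treat the boundary coefficients properly):

import numpy as np

def jacobi_denoise(I, alpha, iterations=100):
    u = I.astype(float).copy()
    for _ in range(iterations):
        u_new = u.copy()
        # Each interior pixel satisfies (1 + 4 alpha) u_ij = I_ij + alpha * (sum of its 4 neighbours),
        # so the Jacobi step divides by the diagonal entry and uses old neighbour values.
        nb = u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
        u_new[1:-1, 1:-1] = (I[1:-1, 1:-1] + alpha * nb) / (1.0 + 4.0 * alpha)
        u = u_new
    return u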

Pros and Cons


- Simple
- Can be implemented in parallel (pixels are independent)
- Slow convergence
- Convergence for k -> infinity
- Computation not in place (you need to hold both the hold and new
values)
Gauss-Seidel method
We split M further into an upper triangular part U and a lower triangular part L:

A = D + M = D + (U + L)

And so the final iterative equation is written as

x^{k+1} = D^{-1}(b − L x^{k+1} − U x^k)

As you work through the pixels, the new value can simply replace the old pixel value, since the next pixel will use the new value. Thus the computation is in place.

For the first pixel, all other pixels are in the upper triangle, so its new value uses only old values; that new value then sits in L and can be used by the following pixels.

This allows for quicker convergence; however, it hinders parallelization, as the next pixel relies on the new values of the previous pixels.
Successive over-relaxation (SOR)
Simply weight the new and old values so that the new value has more influence, for faster convergence:

x^{k+1} = (1 − ω) x^k + ω D^{-1}(b − L x^{k+1} − U x^k)
Conjugate Gradient
Two vectors are conjugate with respect to A if their inner product vanishes:

⟨u, v⟩_A := u^T A v = 0

i.e. the two vectors are perpendicular in the space transformed by A.

The n conjugate vectors {p_k} form a basis of R^n.

Thus the solution x* of A x = b can be expanded as

x* = α_1 p_1 + … + α_n p_n

a combination of the weighted basis vectors.

The coefficients can be derived like so:

So after n computations, one for each coefficient, we get the exact solution x*. If we choose good basis vectors, we may stop in fewer than n steps once the solution has converged enough.

Steps
1. Start with p_0 equal to the residual r_0 = b − A x_0. The residual measures how far the current solution is from satisfying A x = b.
2. Then iterate by solving these equations, where α_k is the size of the step to move in direction p_k and p_{k+1} = r_{k+1} + β_k p_k. Do this n times. The coefficient β_k ensures that the new direction is conjugate.
3. Repeat until the residual is small; convergence is guaranteed after n iterations.

Preconditioning
- n is the number of pixels, so it is not feasible to converge fully (too many pixels)
- Pre-conditioners are used to give A a smaller condition number
- They are applied to both sides and then undone once the solution is found
- P^{-1} is easy to compute when P is diagonal, as in this case

Multigrid methods
Motivation
Previous solvers are good at optimizing locally, but it may take many iterations for local information to reach far-away pixels.

Multigrid methods switch between fine and coarse grids so that information can travel further in fewer iterations than with traditional methods.

Unidirectional
Simply downsample the image and use that as the initial guess x_0. Downsampling simply means taking a lower-resolution image, with whatever method you choose.

Run several iterations, upsample the result back to the finer grid, and use the approximate solution as the initial guess there. Upsampling can involve interpolating from the coarse solution to the original resolution.

Advantages: simple implementation and fast.

Disadvantage: the coarse level does not approximate the solution well, so multiple coarse iterations do not help much.

Correcting (Bidirectional) Multigrid

Downsample the error rather than the image.

Full Multigrid

Start with a downsampled image.

Upsample and run the bidirectional multigrid method, solving for the error.

At each finer level apply a W-cycle.

Motion Estimation
Definitions
Motion estimation involves tracking a pixel's movement across the screen.

I(x, y, t) defines an image: the intensity at position x, y at time t.

Motion estimation seeks a vector field tracking the movement of each pixel. It is defined as

(u, v)(x, y, t): the offset of the pixel at position x, y at time t

u, v give the offset of the pixel from frame t to frame t + 1.

A vector field can be visualized like so, where the background is moving to
the left and the car to the right.
Motivation
- Segmentation
- Label propagation
- Scene flow, 3D motion of objects
- Structure from motion

Ambiguities – Aperture Problem


There is a difference between actual motion and apparent motion.

Optical flow can have multiple solutions.

The movement of stripes in such a shape is ambiguous with respect to the actual movement of pixels: the black pixel could be moving down, left, diagonally, etc.

Prior assumptions are needed for a unique solution.

This is the ambiguity (aperture) problem.

Constraint
Within this course, a constraint is assumed for motion estimation problems: all gray values of pixels are preserved.

The same pixel, after the offset, has the same intensity/colour/gray value as before.

We look for u and v.

But since the problem is non-linear, to simplify it we expand with the Taylor series.

We assume that u and v are small, so we can drop O(u², v²). After simplifying, we get the final problem (the linearized constraint):

I_x u + I_y v + I_t = 0

I_x is the change in intensity along the horizontal axis

I_y is the change in intensity along the vertical axis

I_t is the change in intensity across time

The approximation is only sufficiently precise as long as the images are smooth and the flow is small (u, v are not too big).
Methods
Lucas-Kanade Method
We build a system of linear equations by assuming that pixels close to each other move the same way (neighbouring pixels share the same u, v), and then find the u, v that minimize the error.

When minimizing, we can use a weighted (Gaussian) window.

If the gradients in the window are parallel, the ambiguity returns.

Assumes locally constant motion.

No direct smoothness constraints.
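A minimal sketch of the per-window least-squares solve, assuming precomputed gradient images Ix, Iy, It cropped to the window (names chosen here for illustration):

import numpy as np

def lucas_kanade_window(Ix, Iy, It, weights=None):
    # Solve the 2x2 normal equations for (u, v) in one window, optionally Gaussian-weighted.
    Ix, Iy, It = Ix.ravel(), Iy.ravel(), It.ravel()
    w = np.ones_like(Ix) if weights is None else weights.ravel()
    A = np.array([[np.sum(w * Ix * Ix), np.sum(w * Ix * Iy)],
                  [np.sum(w * Ix * Iy), np.sum(w * Iy * Iy)]])
    b = -np.array([np.sum(w * Ix * It), np.sum(w * Iy * It)])
    if np.linalg.matrix_rank(A) < 2:     # parallel gradients: the aperture problem returns
        return None
    return np.linalg.solve(A, b)         # estimated (u, v) for this window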

Horn-Schunck Model
Add a smoothness term: pixels should move similarly to close pixels.

Make an energy function, then use some fancy maths (a variational approach) to solve for it.

Benchmarking
Testing needs ground truth.

Average angular error: the angle between the true and estimated flow vectors.

End-point error: the distance between the true and estimated end points.

Middlebury flow benchmark: example ground truths to test on.

Open-source movie.

Deep Learning
Neural networks are now used, trained on synthetic data.

FlowNet issues: low movement and zero motion.

Matching and Local descriptors
Concept
We want to match pixels/points between two slightly different images of the same scene.

To do this we want to detect features of an image, like corners, blobs, shapes and so on, and we want a way to capture the same features in two different images.

Then we want to describe each feature in a descriptor.

Then we need a way to compare the similarity between two descriptors, to figure out whether they point to the same point in the scene.

Then we can compare all the features in one image to the features of another and see which are the same, usually with Euclidean distance.

Match two different pictures of the same local structure.

Block matching

Patches should have enough information to be useful.

There is a way to find corners: check every pixel and apply a threshold (p. 10).

Histogram of gray values: good for rotation and blurring.

Histogram of gradients (p. 23).

Example: CNN blur.

Feature detection
Sparse matching

Block matching

Good interest points and regions

Corner detection, with the structure tensor

Maths: eigenvalue decomposition

Scale invariance

SIFT is scale-invariant blob matching

Affine region detector

Regions encircled by large gradients; assign an ellipse to the area

Local descriptor

Gray value histogram: normalise the peaks

Gradient direction histogram (orientation histogram)

Sift descriptor computation


The SIFT descriptor is a histogram of gradients within the region.

Normalise the peaks so that the main peak comes first; if there are multiple peaks, create multiple descriptors.

6 steps:

1. Calculate the gradient orientation and magnitude at each pixel
2. Initialise a histogram with 8 bins at each pixel; its first entry is the pixel's own orientation and magnitude
3. Every pixel's histogram is smoothed with its neighbours' so that the histogram is filled according to the values in the surrounding pixels. The local area is the area designated by the feature
4. The histogram itself is smoothed
5. Sample feature vectors from the histogram image
   a. Used to have a unique histogram per 4-pixel spacing
6. Normalise the vectors (todo)
Manual descriptor

Define descriptor (todo)

Feature learning

We start with the problem of classifying (naming objects); that way we can indirectly learn feature recognition and descriptor creation.

We can take out the name of the object and simply train a network to learn that some object is the same as another object: without a name, rather saying it has the same class.

Siamese networks

Learn a metric.

Contrastive learning

Have positive and negative samples; the positive sample should be close in embedding space and the negative far.
3-D Reconstruction
Depth reconstruction
The hard part is depth.

Depth from multiple views

One solution: have two cameras.

Camera calibration is stored in a matrix P, so you know the position of the camera and how points project onto the sensor pixels.

Problems:

- Finding corresponding points
- Occluded points
- Accuracy of the pixel grid

See last week's topic, where you match descriptors.

Infrared: project a pattern, and then you can see how that pattern changes due to depth.

There will be noise due to pixel misalignment, lighting and so on, and occluded points.

Structure from Motion


Structure from one moving camera: you know the movement and rotation of the camera, or you know it is not too far off from frame to frame. The scene must be static.

Camera motion is called egomotion; we want to estimate this.

Projection matrix

P = KM

K: camera matrix with focal length α and principal point (centre of the sensor): the intrinsics.

M: position of the camera in the world, rotation + translation: the extrinsics, the pose.

Loop closing

When the camera returns to the starting position, you can correct the estimated (accumulated) motion.

Features that are tangential to the ray are not used.

Shape from silhouette


Estimate the background, then find where the object starts and ends.

With multiple cameras you create a visual hull (a border around the object), the maximal volume consistent with the silhouettes.
Reconstruction from single image
Shape from shading
Bidirectional reflectance distribution function (BRDF)

A Lambertian surface emits light equally in all directions, like diffuse lighting, so radiance only depends on the angle between the surface and the light source direction.

Albedo (Greek ρ) is the non-absorbed light, a material property. We assume constant albedo, i.e. the same material and reflectance properties everywhere.

Horn's method needs the material properties and the light source direction to be known, so it is less practical. We solve for the surface normal: we get two unknowns and use variational methods to solve for them.

We know the light intensity, the material properties and the light source direction; we just solve for p and q, the surface normal direction. The final energy equation assumes that the intensity we estimate should be close to the actual intensity and that the surface should be smooth.

How do we distinguish patterns on the surface from lighting?

Four good prior assumptions, found by learning:

1. Reflectance is usually smooth, modelled with a gamma distribution: the surface has smooth changes in material properties and not too many different types of materials
2. Shape is mostly smooth
3. Normals usually point towards the camera, rarely away or backwards: isotropy of shapes
4. Natural lighting: assumptions about what lighting usually looks like

(Move it away from the albedo?)

Jon Barron

Shape from defocus


Simply change the focus from one view and estimate the blur; from the blur you can tell how far away a point is.

Shape from Texture

If the texture pattern is known, you can reconstruct the shape from the deformation of the texture.

If the texture is not known, you assume the pattern is regular/homogeneous.

Deep learning
Deep learning: estimate depth from an RGB image.

A depth map visualization can look good due to the visualization technique, but when actually compared to the true depth map it can be very bad.

Limited by the quality of the training data.

The most prominent methods are still stereo and motion.

Object Recognition and deep learning
Nearest neighbour classifier

- Assign the label of the most similar training point
- Downside: must store and search through all the training data

Instance recognition

- Similar to image matching, see class 6

Class recognition

- The regime of convolutional networks (a type of neural network)

Regime of convolutional networks

Multiple layers are better, as they generalize better.

Dropout: randomly remove some nodes during training.

The networks are constrained, otherwise there are too many parameters.

Feature map: the output of the first layer of neurons.

ReLU activation function.

Localization
Segmentation
Definition
An image can be split into regions with generally no overlap, and the regions add up to the original image.

Semantic segmentation: try to group objects together, things that make sense as a group to us.

Bottom-up segmentation: split according to homogeneous pixel regions; combine pixels into superpixels.

Bottom-up segmentation cues
- Intensity/colour
- Texture
- Motion

Thresholding
- Simply define a limit: pixels above it are included, others not

Clustering
- K is the total number of regions
- Move pixels between regions until the total distance is reduced
- Distance being the dissimilarity between pixels in the same region

Greedy heuristics
- Region growing
  o Start from seed points and consider neighbouring pixels; if they are similar enough, add them to the region
- Region merging
  o All pixels start in their own region
  o Merge the two most similar regions
  o Repeat until a similarity threshold is no longer reached or a target number of regions is reached
Dissimilarity criteria

- Euclidean distance between the features' means (descriptors)

Contours
Contours, along a plane.

Region-based bottom-up segmentation

Energy function: the dissimilarity of pixels within a region and the length of the contour should both be minimized.
Min Cut
Graph cuts for segmentation

Min cut for the contour: edge weights according to some likeness criterion between pixels, with artificial source and sink nodes to indicate the two regions.

Panoptic segmentation is a combination of semantic segmentation and instance segmentation.

Learning for semantic segmentation

The image is downsampled to abstract features and then upsampled again, with some kind of upsampling method such as up-convolution.
