Computer Graphics Notes
Ray Casting
Rendering
Two steps to rendering
Ray Casting
Cast a ray from q into the scene; if its first intersection point is p, then p is visible from q.
Direct illumination: rays that travel directly from the light source to the surface.
Ray
Rays are represented in parametric form:
r(t) = o + t·d
where o is the origin and d the direction.
All intersections between the ray and the objects are found, and the closest one is kept as the visible point.
Implicit Surfaces
Implicit surfaces are surfaces defined by a mathematical equation, e.g. a plane:
n · (p − r) = 0
p − r is the vector from r to p; it lies in the plane and is therefore perpendicular to the normal (the dot product of perpendicular vectors is zero).
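A minimal sketch of a ray-plane intersection under these definitions (names are illustrative):
```python
import numpy as np

def intersect_plane(o, d, r, n):
    """Intersect the ray r(t) = o + t*d with the plane n . (p - r) = 0.
    Returns the smallest t >= 0, or None if there is no hit."""
    denom = np.dot(n, d)
    if abs(denom) < 1e-9:          # ray is parallel to the plane
        return None
    t = np.dot(n, r - o) / denom   # solve n . (o + t*d - r) = 0 for t
    return t if t >= 0 else None   # ignore hits behind the ray origin
```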
Quadrics
Quadrics are implicit surfaces defined by quadratic equations.
Sphere Example
Substitute the ray into the sphere equation, splitting it into x, y, z components.
Derive the coefficients of the resulting quadratic in t and solve. The coefficients are scalar because the vector terms all reduce to dot products.
Thus, in a scene of quadrics, each quadric can be checked against each ray in constant time.
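A sketch of the sphere case, assuming a sphere centred at c with the given radius:
```python
import numpy as np

def intersect_sphere(o, d, c, radius):
    """Substitute r(t) = o + t*d into ||p - c||^2 = radius^2; the vector
    terms reduce to dot products, leaving a scalar quadratic in t."""
    oc = o - c
    a = np.dot(d, d)
    b = 2.0 * np.dot(oc, d)
    k = np.dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * a * k
    if disc < 0.0:                     # no real roots: the ray misses
        return None
    sq = np.sqrt(disc)
    for t in sorted(((-b - sq) / (2 * a), (-b + sq) / (2 * a))):
        if t >= 0.0:                   # closest hit in front of the origin
            return t
    return None
```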
Parametric Surfaces
Partial objects
Combined Objects
Objects can be combined by adding (union), intersecting, and subtracting simpler objects.
Implementing intersection
Triangles
Why triangles
How to represent
- Parametric, barycentric
Intersection
Intersection test
- Check the ray's intersection with the slabs. There will be two intersections per slab; if the intersection intervals of all slabs overlap, the ray passes through the box
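A sketch of the slab test for an axis-aligned box (box_min/box_max are the box corners):
```python
import numpy as np

def intersect_aabb(o, d, box_min, box_max):
    """Slab test: each axis gives an entry/exit pair; the ray hits the
    box iff the [entry, exit] intervals of all three slabs overlap."""
    inv = 1.0 / d                        # zero components give +/-inf, which min/max handle
    t0 = (box_min - o) * inv
    t1 = (box_max - o) * inv
    t_near = np.max(np.minimum(t0, t1))  # latest entry into any slab
    t_far = np.min(np.maximum(t0, t1))   # earliest exit from any slab
    return t_near <= t_far and t_far >= 0.0
```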
Boxes can form hierarchies (boxes within boxes): if the parent box is intersected, you check the child boxes and then the primitives.
The speed of a ray tracer is often governed by the speed of its intersection
tests.
Iso-surfaces
Ray casting fluids
Explicit surface
Density mapping
todo
Shading
Context
Shading is the colour and/or intensity computation at a surface position.
Radiance is energy (photons) per unit time, per unit area, per unit solid angle; it measures how much light is transported along a ray.
Directions
l - Direction towards the light source
n - Surface normal
All directions point away from the surface and are normalized.
Colour
Coloured light is typically thought of as a 3D vector of RGB colour intensities, with all components normalized to [0, 1].
L = (L_red, L_green, L_blue)^T
For example, pure blue light: L = (0, 0, 1)^T
The surface colour (reflectance) is likewise a vector: ρ = (ρ_red, ρ_green, ρ_blue)^T
Surface Illumination
Light arriving at an angle seems less bright, due to Lambert's cosine law.
When n and l are normalized, the cosine of θ is just the dot product of n and l, and so L_refl can be calculated as
L_refl = ρ L (n · l), applied componentwise in RGB.
Blinn-Phong uses the halfway vector h = (l + v)/||l + v|| between the light and view directions instead of the reflection vector, which can sometimes be faster to compute.
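A sketch of the diffuse plus Blinn-Phong specular computation (parameter names are illustrative):
```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def blinn_phong(n, l, v, light_rgb, rho_diffuse, rho_specular, m):
    """Lambert diffuse term plus Blinn-Phong specular term. All direction
    vectors point away from the surface; m is the shininess exponent."""
    n, l, v = normalize(n), normalize(l), normalize(v)
    h = normalize(l + v)                         # halfway vector
    diffuse = rho_diffuse * light_rgb * max(0.0, np.dot(n, l))
    specular = rho_specular * light_rgb * max(0.0, np.dot(n, h)) ** m
    return diffuse + specular
```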
Energy Conservation
The Phong and Blinn-Phong illumination models both do not ensure energy conservation: reflected light may exceed the total light shone on the surface, depending on the shininess exponent m.
Ambient light
Simply add another term L_indirect to the illumination equation, modulated by the object colour.
Final Equation
The ambient, diffuse, and specular terms are often summed with weights α, β, γ. These weights are often user-defined and independent, since energy is not conserved anyway.
Limitations
- Approximate
- Limited surface types: no transparency or subsurface scattering
- Local computation
- Less realistic
- Only point light sources
Extensions
Light attenuation
Light further away is less bright: illumination decreases according to the inverse-square law, attenuating L by a factor of 1/r².
r = ||p_light − p||
r is the distance between the light and the surface.
TODO: explain the variant.
Fog
Light converges to the colour of the fog
Flat shading
The fragments of each primitive are shaded the same with the light
computed from one arbitrarily picked vertex.
Gouraud Shading
The fragments of the primitive are interpolated from the vertices of that
primitive.
Mach band effects can occur: the perceived contrast at edges is exaggerated, producing visible banding artifacts.
Phong Shading
Light is computed for each fragment.
Transformations
Translation
Inverse
Rotation
Where φ is the angle rotated (in degrees or radians). Each axis has its own rotation matrix.
Inverse
The inverse is simply the transpose, since rotation matrices are orthogonal (R Rᵀ = I).
Reflection
Reflecting with respect to an axis: simply negate the x, y, or z component.
Inverse
The inverse is the transpose, or equivalently the matrix itself: reflection matrices are orthogonal.
Scaling
You scale with respect to an axis.
Inverse
1/s – scale by the reciprocal along each axis.
Shear
Inverse
Simply negate the shear factor.
Applications
Compositing Transformations
All transformations can be multiplied together to form one transformation
matrix.
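A sketch of compositing in homogeneous coordinates; the composite below applies scaling first, then rotation, then translation (the rightmost matrix acts first):
```python
import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def scaling(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

def rotation_z(phi):
    c, s = np.cos(phi), np.sin(phi)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

M = translation(1, 2, 0) @ rotation_z(np.pi / 4) @ scaling(2, 2, 2)
p = np.array([1.0, 0.0, 0.0, 1.0])    # homogeneous point, w = 1
print(M @ p)                           # one matrix applies all three
```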
Basis Transformation
A 3D coordinate system can be defined as an origin and 3 basis axis vectors.
Homogeneous Equations
Homogeneous Equations are used for projection transformations.
The use of the last row allows divisions by a linear combination of p_x, p_y, p_z, w to be realized.
Projection in 2D
Terms
v - Viewpoint
p - Point
p′ - Projection of p, the intersection with the y-axis
d - The viewpoint's distance from the origin
We can derive the intersection p' like so, where the view line is the y-axis and
the viewpoint is on the x axis.
Matrix Form
Matrix M allows us to calculate p' from p with one matrix multiplication.
Where you can see how the projection components are used to realize
divisions.
General case
The general case in 2D can be defined like so.
Parallel Projection
We can write this matrix in different forms that are equivalent.
Same derivation as before but the view line is now view plane n.
Example
The matrix now makes the z position zero, while the x and y positions depend on z.
Parallel Case
Clip space
Clip space is where all vertices' positions are normalized so that they lie within a canonical view volume, which is a cube. This enables:
- Culling
- Clipping
- Visibility
- Viewport mapping
Derivation
We create this and solve for A and B in the same way, by substituting z_v = n, w_v = 1, z_n = −1.
Matrix Form
We can then form the matrix.
Symmetric
When the view frustum is symmetrical it simplifies things a lot more.
Variants
Some variants are that the camera views down the negative z direction,
which can be written like so.
Non-linear mapping
Depth values are mapped non-linearly.
This means that objects in the canonical view volume are skewed towards the far plane.
This squashing can cause Z-fighting, where two depth values are too close together to differentiate, which causes flickering.
- To stop this, make the near plane as far from the camera as possible
- Make the far plane as close to the near plane as possible
Orthographic Projection Matrix
View space is the space in which the camera is at the origin.
P – projection matrix
R_cam^T T_cam^{-1} – inverse view matrix, see slides for derivation; a.k.a. V^{-1}
The term rasterization usually refers to the pixel-position approximation step, but it can also refer to the whole pipeline.
Overlap
Definitions
Vertices and primitives
Vertex Processing
View transforms
Colour can be computed here if we have the other required information, such as normals and material properties.
Rasterization
Input: Vertices and faces (connectivity information)
Creates primitives
Gets fragments
Line rasterization
The algorithm is designed for slopes from 0 to 1, but it can be used for any slope by exploiting symmetry.
At each step, only decide between (x+1, y) (E) and (x+1, y+1) (NE).
If the midpoint lies above the line, go with E; otherwise go with NE. We are checking which pixel is closer to the line.
Initialization
We multiply F by two so that only integer arithmetic is needed.
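A sketch of the midpoint algorithm for integer endpoints with slope in [0, 1]:
```python
def rasterize_line(x0, y0, x1, y1):
    """Midpoint line rasterization. The decision variable d tracks twice
    the implicit line function F at the next midpoint, so all arithmetic
    stays integer. Other slopes are handled by symmetry."""
    dx, dy = x1 - x0, y1 - y0
    d = 2 * dy - dx                 # 2*F at the first midpoint
    y = y0
    pixels = []
    for x in range(x0, x1 + 1):
        pixels.append((x, y))
        if d > 0:                   # line passes above the midpoint: NE
            y += 1
            d += 2 * (dy - dx)
        else:                       # line passes below the midpoint: E
            d += 2 * dy
    return pixels
```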
Polygon rasterization
For closed shapes
Attribute Interpolation
Linear interpolation from vertex to fragment
The homogeneous value in clip space is the z value from view space, so it is sometimes best to do interpolation in clip space rather than in NDC space, because perspective-correct interpolation needs the view-space z value. The maths is explained in the slides.
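A sketch of perspective-correct attribute interpolation between two vertices, where w is the clip-space homogeneous value:
```python
def perspective_correct_interp(a0, a1, w0, w1, t):
    """Interpolate a/w and 1/w linearly in screen space, then divide;
    plain linear interpolation of a would be wrong under perspective.
    t is the screen-space interpolation parameter in [0, 1]."""
    num = (1 - t) * (a0 / w0) + t * (a1 / w1)   # linear in a/w
    den = (1 - t) * (1 / w0) + t * (1 / w1)     # linear in 1/w
    return num / den
```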
Fragment processing
Occlusion and so on
Processing
Combination of fragment colour and texture colour
Anti-aliasing
Texturing:
Vertices are assigned texture coordinates, i.e. relative coordinates within the texture.
Testing
Scissor tests
Alpha test:
- Check if the alpha is above a user defined value
Stencil Test
Depth test
Framebuffer holds attributes at that pixel, these attributes are separated into
different buffers:
- Depth
- Colour
- Stencil
- Accumulation
Framebuffer update
Update framebuffer attributes colour and so on
Blending
Blending can be done as a linear interpolation according to the alpha value: C = α C_src + (1 − α) C_dst.
Overview
We want parallelization, and this process maps well onto hardware.
Bezier curves
Bezier Curves are polynomial curves represented by Control Points.
Control Points are points that describe the curve, denoted p_i.
Interpolation
Interpolation is when you try to find the intermediary points between two points. It is often calculated as:
(1 − t) p_i + t p_{i+1}
Constant
Line
Cubic
De Casteljau Algorithm
This type of interpolation is called De Casteljau algorithm. Play with it here:
https://www.geogebra.org/m/gfve79ad
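A sketch of the algorithm, repeatedly interpolating between adjacent control points:
```python
import numpy as np

def de_casteljau(points, t):
    """Evaluate a Bezier curve at parameter t: each level replaces n
    points by n-1 interpolated points until one point remains."""
    pts = np.array(points, dtype=float)
    while len(pts) > 1:
        pts = (1 - t) * pts[:-1] + t * pts[1:]   # one interpolation level
    return pts[0]

# Cubic example with four control points:
point_on_curve = de_casteljau([(0, 0), (1, 2), (3, 2), (4, 0)], 0.5)
```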
Bernstein Polynomials
The coefficients can be derived from binomial coefficients, written as:
B_i^n(t) = (n choose i) t^i (1 − t)^(n−i)
The (1 − t) terms come from expanding (a + b)^n with a = 1 − t and b = t.
Matrix Representation
We want to represent this in matrices so that it is easy to work with in
computers.
In Cubics:
You can also have an inverse spline, the transpose of the spline.
The geometry matrix contains the control points; the spline matrix applied to the basis vector produces the Bernstein polynomials.
When splitting a curve at t_split, the new start and end control points are trivial: they are the start and end of the original curve and the point at t_split.
The ones in between can be derived algebraically with matrices, but they are just the intermediate interpolation points between the first and last control points.
Velocity
The tangent can be used as velocity if we interpret the parameter t as time.
Acceleration
Acceleration is the derivative of the velocity function.
Continuity
Higher degree Bezier curves are ugly and hard to work with. Change one
control point and everything changes.
When connecting two curves, the endpoint of one and the start point of the other should be the same to achieve C^0 continuity.
To achieve C^1 continuity, the connecting point and the two adjacent control points should form a line; the tangents should be equal.
Cubic Hermite
This curve explicitly states the velocity at the two end points.
Where the coefficients can be derived from the end points and respective
velocities. Check slides for derivation.
Here you are given the endpoints, but the velocities at the endpoints are derived from given control points (as in Catmull-Rom splines).
The 1st and 3rd points determine the first endpoint's velocity, and the 2nd and 4th points determine the second.
They are C^1 continuous and can be extended piecewise, with the tangent at each point equal to the direction formed by the previous and following control points.
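A sketch of a cubic Hermite segment and the Catmull-Rom tangent choice (the 0.5 tension factor is one common convention):
```python
import numpy as np

def hermite(p0, p1, v0, v1, t):
    """Cubic Hermite curve: endpoints p0, p1 (numpy arrays) with end
    velocities v0, v1, blended by the Hermite basis polynomials."""
    h00 = 2*t**3 - 3*t**2 + 1
    h10 = t**3 - 2*t**2 + t
    h01 = -2*t**3 + 3*t**2
    h11 = t**3 - t**2
    return h00*p0 + h10*v0 + h01*p1 + h11*v1

def catmull_rom(p_prev, p0, p1, p_next, t):
    """Segment from p0 to p1 with velocities derived from the neighbours."""
    v0 = 0.5 * (p1 - p_prev)   # tangent from the points around p0
    v1 = 0.5 * (p_next - p0)   # tangent from the points around p1
    return hermite(p0, p1, v0, v1, t)
```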
Particle Fluids
Particle simulation
The fluid is subdivided into small volumes.
Parcels/particles are not just spheres: they have a shape and volume that we do not know.
Definitions
Particle motion quantities:
- Mass m
- Position x, 3D vector
- Velocity v, 3D vector
- Force F, 3D vector
- Acceleration a = F/m, 3D vector
- Density ρ
- Pressure p
All quantities are taken at time t; h is the time step.
Governing Equations
What are x_{t+h} and v_{t+h}, i.e. how do we update?
Explicit Euler
Use Taylor series
Verlet
Uses further derivatives of the Taylor expansion.
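A sketch of both update rules (velocity Verlet is one common Verlet variant):
```python
def explicit_euler(x, v, a, h):
    """First-order Taylor step for position and velocity."""
    return x + h * v, v + h * a

def velocity_verlet(x, v, a_func, h):
    """Uses the acceleration at both ends of the step, giving a more
    accurate position update than explicit Euler."""
    a0 = a_func(x)
    x_new = x + h * v + 0.5 * h * h * a0
    v_new = v + 0.5 * h * (a0 + a_func(x_new))
    return x_new, v_new
```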
Pressure acceleration
Viscosity acceleration
W_ij is a kernel that weights the impact of neighbour j based on its distance from particle i; its shape resembles a Gaussian.
The quantities from the last time step are used and then updated accordingly.
p_i = k (ρ_i / ρ_0 − 1)
3. Compute Accelerations
Grid size is determined by the kernel support: the largest distance at which the kernel still gives two particles a non-zero weight.
Implementation
A particle is given a unique cell identifier, according to its position.
The identifiers are ordered according to a space filling line so that they can
all be represented in a list where close particles are close to each other on
that list.
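A sketch of the idea using a hash map from cell id to particle indices (a simpler variant of the sorted space-filling ordering described above):
```python
import numpy as np
from collections import defaultdict

def build_grid(positions, cell_size):
    """cell_size equals the kernel support, so all neighbours of a
    particle lie in its own cell or one of the adjacent cells."""
    grid = defaultdict(list)
    for i, p in enumerate(positions):
        grid[tuple((p // cell_size).astype(int))].append(i)
    return grid

def neighbour_candidates(p, grid, cell_size):
    """Particles in the 27 cells around the cell containing position p."""
    cx, cy, cz = (p // cell_size).astype(int)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                out.extend(grid.get((cx + dx, cy + dy, cz + dz), []))
    return out
```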
Boundary Handling
Possible Implementation
Boundary particles are simply computed as fluid particles that do not move.
The pressure of a boundary particle can just be copied from a nearby particle whose pressure is known.
Visualization
Iso surface
Image Processing
Additive noise
I_ij = I*_ij + n_ij
Where I is the observed pixel, I* is the clean pixel, and n is the noise at position (i, j).
Noise Distribution
Noise can be distributed with different distributions.
Gaussian Distribution
Multiplicative Noise
I_ij = I*_ij (1 + n_ij)
Impulse Noise
Certain pixels are randomly replaced by one (unipolar) or two (bipolar) fixed values.
Uniform Noise
Certain pixels are replaced by uniformly distributed random values.
SNR – signal-to-noise ratio
A PSNR of about 40 dB is considered okay.
Point Operations
Perform a function on each pixel independently
Examples:
Brightness
u(x, y) = I(x, y) + b
Contrast
u(x, y) = a · I(x, y)
Gamma
u(x, y) = I(x, y)^γ
Difference
Difference image: figure out which pixels moved.
A threshold is used to simply find all pixels that changed by more than a certain amount.
Background Subtraction
The same idea, but comparing a static background image against an image containing an object.
Filtering
F(f) · F(h) = F(f ∗ h)
(The convolution theorem: convolution in the spatial domain is multiplication in the frequency domain.)
1. Centre the mask on the current pixel.
2. Calculate the sum of all pixel values covered by the mask, multiplied by the corresponding coefficients.
3. The new value of the current pixel is this sum.
Complexity: O(NMσ)
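A direct (unoptimized) sketch of this masking procedure, using edge replication at the boundary:
```python
import numpy as np

def convolve2d(image, mask):
    """For each pixel, sum the products of the mask coefficients with
    the pixel values under the mask (the mask is flipped, as convolution
    requires)."""
    kh, kw = mask.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="edge")
    flipped = mask[::-1, ::-1]
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y+kh, x:x+kw] * flipped)
    return out
```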
Boundary Conditions
When the mask exits the boundary of the image, common options include: padding with a constant (e.g. zeros), replicating the edge pixels, mirroring the image at the border, or skipping pixels where the mask does not fit.
Box filter
Like the Gaussian filter, but the mask is simply an average of all pixels within the mask.
Disadvantages:
- Result not as smooth (differentiability is increased only by one order)
- Not rotationally invariant
Complexity: O(NM)
Repeated application of the box filter converges to the Gaussian filter.
Bilateral Filter
The kernel coefficients are calculated from both spatial similarity and intensity similarity.
The intensity weights are calculated with a Gaussian function of the difference in intensity between the neighbouring pixel and the current pixel.
r = the intensity weight given the current and neighbouring pixel intensities
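A sketch of the bilateral filter with Gaussian spatial and intensity (range) weights:
```python
import numpy as np

def bilateral_filter(image, radius, sigma_s, sigma_r):
    """Each neighbour is weighted by spatial distance (sigma_s) and by
    intensity difference (sigma_r), so strong edges are preserved."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    padded = np.pad(image.astype(float), radius, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            region = padded[y:y + 2*radius + 1, x:x + 2*radius + 1]
            rng = np.exp(-(region - padded[y + radius, x + radius])**2
                         / (2 * sigma_r**2))    # intensity weights r
            w = spatial * rng
            out[y, x] = np.sum(w * region) / np.sum(w)
    return out
```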
Derivative Filters
A filter that finds the gradient at each point in an image, used to find edges.
Laplace Filter
Uses the second derivative to find edges
Gradient Magnitude
The magnitude of the gradient is calculated and tested against a threshold to detect edges.
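A sketch using Sobel masks as one common choice of derivative filter, reusing the convolve2d sketch from the Filtering section:
```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def edge_map(image, threshold):
    """Per-pixel gradient magnitude, thresholded to a binary edge map."""
    gx = convolve2d(image, SOBEL_X)   # horizontal derivative
    gy = convolve2d(image, SOBEL_Y)   # vertical derivative
    return np.sqrt(gx**2 + gy**2) > threshold
```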
Energy Minimization and Optimization
As a general tool, we can write our problems in the form below and then solve them for the minimum.
The function E(x ) is often called energy. In machine learning it is called the
loss function.
Image Denoising
This method of energy minimization can be used for denoising.
Smoothness term
Disadvantages
- Formalizing assumptions can be arbitrary?
- Choosing the weight parameters is hard
- Global optimization is hard
Convexity
For a unique solution, we require the energy to be convex.
When differentiating with respect to u_{i,j}, we must also take the terms from the other sums, and thus each linear equation gets four terms from the smoothness term.
For all pixels not on the boundary the linear equation to solve will
look like this:
And the coefficients of the pixel can be written separately. (The boundary
pixels as you can see have different coefficients)
Jacobi Method
3. We then form an iterative solution by decomposing the system of coefficients into its diagonal D and off-diagonal M components:
A = D + M
We start with any x^0, usually the input image, and solve iteratively, substituting in the new value:
x^{k+1} = D^{-1} (b − M x^k)
4. Iterate until the change in the solution (x^{k+1} − x^k)^2 becomes smaller than some threshold, or the residual r^k = A x^k − b becomes small.
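A minimal sketch with A as a dense numpy array (in practice the denoising system is sparse):
```python
import numpy as np

def jacobi(A, b, x0, tol=1e-8, max_iter=1000):
    """Split A = D + M and iterate x <- D^{-1} (b - M x)."""
    D = np.diag(A)               # diagonal entries as a vector
    M = A - np.diag(D)           # off-diagonal part
    x = x0.astype(float)
    for _ in range(max_iter):
        x_new = (b - M @ x) / D
        if np.linalg.norm(x_new - x) < tol:   # change below threshold
            return x_new
        x = x_new
    return x
```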
Gauss-Seidel Method
The off-diagonal part is split further into upper and lower triangular parts: A = D + M = D + (U + L).
As you work through the pixels, the new value can simply replace the old pixel value, since the next pixel will use the new value. This allows in-place computation.
Conjugate Gradient Method
⟨u, v⟩_A := u^T A v = 0
Two vectors are conjugate if they are perpendicular in the space transformed by A.
So, after n steps (one per basis direction) we get the exact solution x*. If we choose good basis vectors, we may stop in fewer than n steps once the solution has converged enough.
Steps
1. Start with p^0 equal to the residual r^0 = b − A x^0. The residual is how far the current solution is from satisfying Ax = b.
2. Then iterate by solving these equations, where α_k is the size of the step to move in direction p_k, and p_{k+1} = r_{k+1} + β_k p_k. The coefficient β_k ensures that the new direction is conjugate to the previous ones.
3. Repeat until the residual is small; convergence is guaranteed after n iterations.
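A sketch of the standard conjugate gradient loop for symmetric positive definite A:
```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-8):
    x = x0.astype(float)
    r = b - A @ x                  # residual
    p = r.copy()                   # first search direction
    rs = r @ r
    for _ in range(len(b)):        # at most n iterations
        Ap = A @ p
        alpha = rs / (p @ Ap)      # step size along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:  # residual small enough: stop early
            break
        beta = rs_new / rs         # keeps the new direction conjugate
        p = r + beta * p
        rs = rs_new
    return x
```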
Preconditioning
- n is the number of pixels, so it is not feasible to converge fully (too many pixels)
- Pre-conditioners are used to make A have a smaller condition number
- They are applied to both sides and then undone once the solution is found
- P^{-1} is easy to compute as P is diagonal in this case
Multigrid methods
Motivation
Previous solvers are good at optimizing locally, but it may take many
iterations for local information to reach far neighbors.
Multigrid methods swap from fine to coarse grids so that information can
travel faster in fewer iterations than traditional methods.
Unidirectional
Simply down-sample the image and use that as the initial guess x^0. Down-sampling simply means taking a lower-resolution image, with whatever method you choose.
Run several iterations, then up-sample back to the finer grid and use the approximate solution as the initial guess there. Up-sampling can involve interpolating from the coarse solution to the original resolution.
Up-sample and run the bidirectional multigrid method, solving for the error.
Motion Estimation
Definitions
Motion Estimation involves tracking a pixel's movement across the screen.
Motion Estimation seeks a vector field (u, v), tracking the movement of each pixel.
A vector field can be visualized like so, where the background is moving to
the left and the car to the right.
Motivation
- Segmentation
- Label propagation
- Scene flow, 3D motion of objects
- Structure from motion
Constraint
Within this course, a constraint is given for motion estimation problems: all grey values of pixels are preserved. The pixel, offset by the motion, has the same intensity/colour/grey value as before:
I(x + u, y + v, t + 1) = I(x, y, t)
But since this constraint is non-linear, to simplify the problem we expand it with the Taylor series. We assume that u and v are small, so we can drop the O(u², v²) terms. After simplifying we get the final, linearized problem:
I_x u + I_y v + I_t = 0
No direct constraints: this is one equation with two unknowns per pixel, so the flow is not uniquely determined.
Horn-Schunck Model
Horn and Schunck add a smoothness term to the linearized constraint and minimize the resulting energy over the whole flow field.
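A sketch of the classic Horn-Schunck iteration, assuming the image derivatives Ix, Iy, It are precomputed and alpha is the smoothness weight:
```python
import numpy as np
from scipy.ndimage import convolve

AVG = np.array([[0, 0.25, 0], [0.25, 0, 0.25], [0, 0.25, 0]])

def horn_schunck(Ix, Iy, It, alpha, n_iter=100):
    """Each step pulls the flow towards its neighbourhood average
    (smoothness term) and corrects it towards the linearized constraint
    Ix*u + Iy*v + It = 0 (data term)."""
    u = np.zeros_like(Ix, dtype=float)
    v = np.zeros_like(Ix, dtype=float)
    for _ in range(n_iter):
        u_avg = convolve(u, AVG)
        v_avg = convolve(v, AVG)
        resid = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * resid
        v = v_avg - Iy * resid
    return u, v
```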
Benchmarking
Testing
Synthetic data
And then we need a way to compare the similarity between two descriptors
to figure out if they are pointing to the same point in the scene.
Then we can compare all the features in one image to the features of another and see which are the same, usually with the Euclidean distance.
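A sketch of brute-force descriptor matching with the Euclidean distance:
```python
import numpy as np

def match_features(desc_a, desc_b):
    """For every descriptor in image A, return the index of the closest
    descriptor in image B and their distance."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j = int(np.argmin(dists))
        matches.append((i, j, float(dists[j])))
    return matches
```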
Block matching
Histogram of gradients
p.23
Feature detection
sparse matching
block matching
scale invariance
local descriptor
Normalise to the dominant peak so that it comes first; if there are multiple strong peaks, create multiple descriptors.
6 steps
feature learning
We start with the classification problem (naming objects); that way we can indirectly learn feature recognition and descriptor creation.
We can then take out the name of the object and simply train a network to learn that one object is the same as another, without a name, i.e. only that they share a class.
Siamese networks
Learns metric
Contrastive learning
Use positive and negative samples: the positive sample should be close in embedding space and the negative far.
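A sketch of a contrastive loss on an embedding distance d (the margin value is illustrative):
```python
def contrastive_loss(d, is_positive, margin=1.0):
    """Positive pairs are pulled together (any distance is penalized);
    negative pairs are pushed apart until at least `margin` away."""
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2
```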
3-D Reconstruction
Depth reconstruction
The hard part is depth.
Camera calibration is stored in matrix P, so you know the position of the camera and how points project onto the sensor pixels.
Problems:
- Corresponding points
- Occluded points
- Accuracy of pixel grid
An infrared projector casts a pattern, and you can then see how that pattern deforms due to depth.
There will be noise due to pixel misalignment, lighting, occluded points, and so on.
Projection matrix
P = KM
K – camera matrix containing the intrinsics: focal length α and the principal point (the centre of the sensor)
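A sketch of one common form of K, assuming a single focal length α in pixels and principal point (cx, cy):
```python
import numpy as np

def intrinsics(alpha, cx, cy):
    """Camera matrix K; P = K @ M, where M = [R | t] holds the extrinsics."""
    return np.array([
        [alpha, 0.0,   cx],
        [0.0,   alpha, cy],
        [0.0,   0.0,   1.0],
    ])
```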
When the camera returns to its starting position, you can correct the estimated, accumulated motion (loop closure).
With multiple cameras you create a visual hull (a border around the object), the maximal volume consistent with the silhouettes.
Reconstruction from single image
Shape from shading
Bidirectional reflectance distribution function (BRDF)
A Lambertian surface emits light equally in all directions, like diffuse lighting, so the radiance only depends on the angle between the surface and the light source direction.
Horn's method needs the material properties and the light source direction to be known, so it is less practical. We solve for the surface normal: we get two unknowns and use variational methods to solve for them.
Jon Barron
Deep learning
Depth can also be estimated from an RGB image with deep learning.
A depth-map visualization can look good because of the visualization technique, but when actually compared to the true depth map it can be very bad.
Instance recognition
Class recognition
Localization
Segmentation
Definition
An image can be split into regions with generally no overlap.
Semantic segmentation: try to group objects together, things that make sense as a group to us.
Bottom-up segmentation
- Intensity/colour
- Texture
- Motion
Thresholding
- Simply define a limit: pixels above it are included, others are not
Clustering
- K is the total number of regions
- Move pixels between regions until the total distance no longer decreases
- Distance being the dissimilarity between pixels in the same region (see the k-means sketch below)
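A sketch of k-means on per-pixel feature vectors (e.g. colour):
```python
import numpy as np

def kmeans(pixels, k, n_iter=20, seed=0):
    """Assign each pixel to the nearest centre, then move each centre to
    the mean of its pixels; repeating reduces the total dissimilarity."""
    rng = np.random.default_rng(seed)
    centres = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)          # assignment step
        for j in range(k):                          # update step
            if np.any(labels == j):
                centres[j] = pixels[labels == j].mean(axis=0)
    return labels, centres
```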
Greedy heuristics
- Region growing
o Start from seed points; consider neighbouring pixels and, if they are similar enough, add them to the region
- Region merging
o All pixels start in their own region
o Merge the two most similar regions
o Repeat until a similarity threshold or a target number of regions is reached
Dissimilarity criteria
Contours
Contours, along a plane
Energy function: the dissimilarity of pixels within a region and the length of the contour should both be minimized.
Min Cut
Graph cuts for segmentation
Min cut finds the contour: edge weights are set according to some similarity criterion between pixels, and artificial source and sink nodes indicate the two regions.