Computer Vision Panikzettel

Hans Wurst
September 3, 2021

panikzettel.philworld.de

Contents

1 Image Processing
  1.1 Image Formation
  1.2 Linear Filters
  1.3 Nonlinear Filters
  1.4 Multi-Scale Representations
  1.5 Edge Detection
    1.5.1 Filters as Templates
    1.5.2 Image Gradients
    1.5.3 2D Edge Detection Filters
    1.5.4 Canny Edge Detector
  1.6 Fitting Techniques
    1.6.1 Hough Transform
    1.6.2 RANSAC (RANdom SAmple Consensus)
2 Segmentation
  2.1 Segmentation as Clustering
    2.1.1 k-Means
    2.1.2 Probabilistic Clustering
    2.1.3 Model-free Clustering
  2.2 Graph-Theoretic Segmentation
    2.2.1 Segmentation as Energy Minimization
    2.2.2 Graph Cuts for Image Segmentation
3 Object Recognition and Categorization
  3.1 Sliding-Window Object Detection
  3.2 Gradient-based Representation
  3.3 Classifier Construction
    3.3.1 Classification with SVMs (Support Vector Machines)
    3.3.2 Classification with Boosting
4 Local Features and Matching
  4.1 Local Features - Detection and Description
    4.1.1 Local Invariant Features
    4.1.2 Keypoint Localization
    4.1.3 Scale Invariant Region Selection
    4.1.4 Local Descriptors
  4.2 Recognition with Local Features
    4.2.1 Finding Consistent Configurations
    4.2.2 Affine Estimation
    4.2.3 Homography Estimation
5 Deep Learning
  5.1 Neural Networks
    5.1.1 Background: Deep Learning
  5.2 Convolutional Neural Networks (CNNs)
  5.3 CNN Architectures
    5.3.1 LeNet
    5.3.2 AlexNet
    5.3.3 VGGNet
    5.3.4 GoogLeNet
    5.3.5 ResNet
    5.3.6 Transfer Learning with CNNs
  5.4 Practical Advice on CNN Training
    5.4.1 Data Augmentation
    5.4.2 Initialization
    5.4.3 Batch Normalization
    5.4.4 Dropout
    5.4.5 Learning Rate Schedules
  5.5 CNNs for Object Detection
    5.5.1 R-CNN
    5.5.2 Fast R-CNN
    5.5.3 Faster R-CNN
    5.5.4 Mask R-CNN
    5.5.5 YOLO/SSD
  5.6 CNNs for Segmentation
    5.6.1 Fully Convolutional Networks (FCN)
    5.6.2 Encoder-Decoder Architecture
    5.6.3 Transpose Convolutions
    5.6.4 Skip Connections
    5.6.5 Extensions
    5.6.6 Examples
  5.7 CNNs for Human Body Pose Estimation
  5.8 CNNs for Matching
    5.8.1 Siamese Networks
    5.8.2 Triplet Loss
  5.9 Recurrent Networks
6 3D Reconstruction
  6.1 Epipolar Geometry and Stereo Basics
    6.1.1 Calibrated Case: Essential Matrix
  6.2 Stereopsis and 3D Reconstruction
  6.3 Stereo Image Rectification
  6.4 Disparity
    6.4.1 Dense Correspondence Search
    6.4.2 Sparse Correspondence Search
  6.5 Camera Calibration
    6.5.1 Camera Models/Parameters
    6.5.2 Calibration Procedure
  6.6 Uncalibrated Reconstruction
    6.6.1 Triangulation
    6.6.2 Uncalibrated Case: Fundamental Matrix
    6.6.3 Stereo Pipeline with Weak Calibration
    6.6.4 Extension: Epipolar Transfer
  6.7 Structure-from-Motion (SfM)
1 Image Processing

1.1 Image Formation

Lenses: Increasing the pinhole size of a pinhole camera to increase the amount of light causes blur. Lenses keep the image in sharp focus while gathering light from a larger area.

Thin Lens Model: valid if the lens thickness is small compared to the radius of curvature. Thin lens equation:

    1/z′ − 1/z = 1/f

In a thin lens, scene points at distinct depths come into focus at different image planes.

Depth of Field: Distance between image planes where blur is tolerable; a smaller aperture (Blende) increases the range in which the object is approximately in focus.

Field of view depends on the focal length f:
• f ↓: image becomes more wide angle, more world points project into the finite image plane
• f ↑: image becomes more telescopic, a smaller part of the world projects onto the finite image plane

1.2 Linear Filters

Types of noise (i.i.d. = "independent, identically distributed"):
• Salt and pepper noise: random occurrences of black and white pixels
• Impulse noise: random occurrences of white pixels
• Gaussian noise: variations in intensity drawn from a Gaussian distribution

    f(x, y) = f̄(x, y) + η(x, y),   η(x, y) ∼ N(µ, σ)

with f̄ the ideal image and η the noise process.

Correlation Filtering: Replace each pixel by a weighted combination of its neighbors.

    G[i, j] = ∑_{u=−k}^{k} ∑_{v=−k}^{k} H[u, v] F[i + u, j + v] = H ⊗ F

Convolution Filtering: Flip the filter in both dimensions, then apply correlation.

    G[i, j] = ∑_{u=−k}^{k} ∑_{v=−k}^{k} H[u, v] F[i − u, j − v] = H ⋆ F

with averaging window size (2k + 1) × (2k + 1), input image F, output image G, and kernel/mask with non-uniform weights H. Convolution is separable into row and column 1D filters (linear).
(If H[u, v] = H[−u, −v], then correlation = convolution.)

Gaussian Smoothing: Weigh nearby pixels more than distant ones (→ "fuzzy blob") using a Gaussian kernel with variance σ²:

    G_σ(x, y) = 1/(2πσ²) · e^{−(x² + y²)/(2σ²)}

Box Filter: For every pixel, average every neighbor over the number of neighbors.
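To make the filtering formulas concrete, here is a minimal NumPy sketch of correlation filtering with a Gaussian kernel. It is only an illustration of the definitions above (zero padding at the border is an assumption, not something this summary specifies); convolution would simply use the flipped kernel, which for the symmetric Gaussian gives the same result.

```python
import numpy as np

def gaussian_kernel(sigma, k):
    """(2k+1) x (2k+1) Gaussian kernel G_sigma, normalized to sum to 1."""
    u, v = np.mgrid[-k:k + 1, -k:k + 1]
    g = np.exp(-(u**2 + v**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()

def correlate2d(F, H):
    """G[i, j] = sum_{u,v} H[u, v] * F[i+u, j+v], with zero padding at the border."""
    k = H.shape[0] // 2
    Fp = np.pad(F.astype(float), k)              # pad so every window fits
    G = np.zeros(F.shape, dtype=float)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            G[i, j] = np.sum(H * Fp[i:i + 2 * k + 1, j:j + 2 * k + 1])
    return G

# usage sketch: smoothed = correlate2d(img, gaussian_kernel(sigma=1.0, k=2))
```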
1.3 Nonlinear Filters

Median Filter

Replace each pixel by the median of its neighbors. Properties:
• doesn't introduce new pixel values
• removes spikes: good for impulse and salt-and-pepper noise
• non-linear
• edge preserving
• better results than the Gaussian, even with a small kernel

1.4 Multi-Scale Representations

Fourier
• Map a function onto a frequency spectrum.
• The better a function is localized in one domain, the worse it is localized in the other ("compression" ↔ "stretching", and vice versa).
• Convolving in the image domain corresponds to a product in the frequency domain:

    f ⋆ g  ↔  F · G

Noise introduces high frequencies; the Gaussian is a convenient choice for a low-pass filter.

Nyquist theorem: In order to recover a certain frequency f, we need to sample with at least 2f. This corresponds to the point at which the transformed frequency spectra start to overlap (Nyquist limit).

Image pyramids provide an efficient representation for scale-invariant processing.

Gaussian Pyramid
• Create each level from the previous one with the "smooth and sample" principle.
• Smooth with Gaussians, in part because G(σ₁) ∗ G(σ₂) = G(√(σ₁² + σ₂²)).
• Gaussians are low-pass filters, so the representation is redundant once smoothing is performed: no need to store smoothed images at the full original resolution.

Laplacian Pyramid
• Laplacian ∼ Difference of Gaussians (DoG): cheap approximation

1.5 Edge Detection

1.5.1 Filters as Templates

Think of filters as a dot product of the filter vector with the image region. The angle (similarity) between the two vectors can be measured by normalizing the length of each vector to 1 and taking the dot product: cos θ = (a · b) / (|a| |b|). Filters look like the effects they are intended to find, and they find the effects that look like them.

1.5.2 Image Gradients

For partial derivative filters we get Hx = (1, 0, −1) = Hyᵀ. The images of the partial derivatives are slightly shifted; therefore, they shouldn't be used on their own.

• ∇f = [∂f/∂x, ∂f/∂y]: gradient, points in the direction of the most rapid intensity change
• θ = tan⁻¹(∂f/∂y / ∂f/∂x): gradient direction (orientation of the edge normal)
• ‖∇f‖ = √((∂f/∂x)² + (∂f/∂y)²): gradient magnitude (edge strength)
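A minimal sketch of these gradient quantities in NumPy, using the (1, 0, −1) derivative filter from above; the Gaussian pre-smoothing and σ = 1 are assumptions chosen for illustration.

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

def image_gradients(img, sigma=1.0):
    """Gradient magnitude and orientation using the simple (1, 0, -1) derivative filter."""
    smoothed = gaussian_filter(img.astype(float), sigma)   # suppress noise first
    dx = np.array([[1.0, 0.0, -1.0]])                      # Hx = (1, 0, -1)
    dy = dx.T                                               # Hy = Hx^T
    Ix = convolve(smoothed, dx)
    Iy = convolve(smoothed, dy)
    magnitude = np.hypot(Ix, Iy)                            # ||grad f||
    orientation = np.arctan2(Iy, Ix)                        # gradient direction
    return magnitude, orientation
```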
1.5.3 2D Edge Detection Filters

The gradient filter amplifies noise, so smooth with a Gaussian first.

Derivative of Gaussian (DoG): ∂/∂x h_σ(u, v)

    ∂/∂x (h ⋆ f) = (∂/∂x h) ⋆ f = ((1, 0, −1) ⋆ h) ⋆ f

Laplacian of Gaussian (LoG): ∇² h_σ(u, v), with ∇²f = ∂²f/∂x² + ∂²f/∂y²

    ∂²/∂x² (h ⋆ f) = (∂²/∂x² h) ⋆ f

Image Derivatives

G is a 1D Gaussian filter, D is a 1D derivative-of-Gaussian filter, and I is the image:

    Ix  = Gᵀ ⋆ (D ⋆ I)
    Iy  = Dᵀ ⋆ (G ⋆ I)
    Ixx = Gᵀ ⋆ (D ⋆ Ix)
    Iyy = Dᵀ ⋆ (G ⋆ Iy)
    Ixy = Dᵀ ⋆ (G ⋆ Ix)

1.5.4 Canny Edge Detector

An "optimal" edge detector should have good detection, good localization, and a single response.

Primary edge detection steps:
1. Smoothing: suppress noise.
2. Edge enhancement: filter for contrast.
3. Edge localization:
   • Determine which local maxima from the filter output are actually edges vs. noise.
   • Thresholding, thinning.

Scale: Use the σ of the Gaussian to set the scale at which edges will later be extracted.

Sensitivity: threshold the filter output with a threshold t:

    F_T[i, j] = 1, if F[i, j] ≥ t (on);  0, otherwise (off)

where t is the threshold and F[i, j] the pixel value.

Canny Edge Detector steps:
1. Filter the image with a derivative of Gaussian.
2. Find the magnitude and orientation of the gradient.
3. Non-maximum suppression: thin multi-pixel wide "ridges" down to single-pixel width. Check whether a pixel is a local maximum along the gradient direction, i.e. select a single maximum across the width of the edge.
   a) Compute the interpolated pixels p and r.
   b) Keep q iff Mag(q) > Mag(p) and Mag(q) > Mag(r).
4. Linking and (hysteresis) thresholding:
   • Define two thresholds: k_low and k_high (k_high / k_low = 2).
   • Use the high threshold to start edge curves and the low threshold to continue them, until no pixel along the edge is above the low threshold.

1.6 Fitting Techniques

1.6.1 Hough Transform

Many objects are characterized by the presence of straight lines. The Hough Transform is a voting technique that answers the three main questions of line fitting:
• Given points that belong to a line, what is the line?
• How many lines are there?
• Which points belong to which lines?

Idea:
1. Vote for all possible lines on which each edge point could lie.
2. Look for line candidates that get many votes.
3. Noise features will cast votes too, but their votes should be inconsistent.
Hough Space
• set of points (x, y) ↦ (m, b) such that y = mx + b
  (a line in the image corresponds to a point in Hough space)
• point (x₀, y₀) ↦ (m, −x₀·m + y₀) = (m, b)
  (a point in the image corresponds to a line in Hough space)
• two points (x₀, y₀), (x₁, y₁) correspond to b = −x₀·m + y₀ = −x₁·m + y₁
  (two points in the image correspond to the intersection of the two lines, which is a point in Hough space)

Polar Representation for lines (the (m, b) representation is undefined for vertical lines: infinite values):

    x · cos θ + y · sin θ = d

with d the perpendicular distance from the line to the origin and θ the angle d makes with the x-axis. A point in image space corresponds to a sinusoid segment in Hough space.

Hough Transform Algorithm

Let each edge point in image space vote for a set of possible parameters in Hough space. The Hough transform subdivides the Hough space into a discrete set of bins; increase the vote count in each bin that the line passes through. Find peaks as local maxima of the Hough space (→ non-maximum suppression filter: is the value in the center larger than the values of its 8 neighbors?).

1. Init: H[d, θ] = 0
2. For each edge point (x, y) in the image:
     For θ = 0 to 180:
       d = x · cos θ + y · sin θ
       H[d, θ] += 1
3. Find the value(s) of (d̂, θ̂) where H[d̂, θ̂] is maximal.
4. The detected line in the image is given by d̂ = x · cos θ̂ + y · sin θ̂.

Noise makes the maximum point in Hough space spread over a larger area.

Extensions

1) Use the image gradient instead of iterating over all possible directions θ:

    θ = gradient direction at (x, y) = tan⁻¹(∂f/∂y / ∂f/∂x)

   → reduces the degrees of freedom of the voting space.
2) Give more votes to stronger edges (use the magnitude of the gradient).
3) Change the sampling of (d, θ) to give more/less resolution.
4) The same procedure can be used with circles, squares, or any other shape.

Extension to Circles

Circle equation with center (a, b) and radius r: (xᵢ − a)² + (yᵢ − b)² = r²

Algorithm:

    For every edge pixel (x, y):
      For each possible radius value r:
        For each possible gradient direction θ:   // or use the estimated gradient
          a = x − r cos θ
          b = y + r sin θ
          H[a, b, r] += 1
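Here is a minimal NumPy sketch of the line-voting algorithm above. It is only illustrative: θ is sampled in 1° steps and d is rounded to integer bins and shifted so negative distances fit into the accumulator; peak selection is reduced to a single argmax instead of full non-maximum suppression.

```python
import numpy as np

def hough_lines(edge_mask, n_theta=180):
    """Vote in (d, theta) space for every edge pixel of a boolean H x W mask."""
    H_img, W_img = edge_mask.shape
    thetas = np.deg2rad(np.arange(n_theta))            # theta = 0 .. 179 degrees
    d_max = int(np.ceil(np.hypot(H_img, W_img)))       # largest possible |d|
    acc = np.zeros((2 * d_max + 1, n_theta), dtype=int)
    ys, xs = np.nonzero(edge_mask)
    for x, y in zip(xs, ys):
        ds = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[ds + d_max, np.arange(n_theta)] += 1       # one vote per (d, theta) bin
    d_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
    return d_idx - d_max, np.rad2deg(thetas[t_idx]), acc   # strongest (d, theta)
```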
Generalized Hough Transform

– For each boundary point, store its displacement vector to a chosen reference point as r := (r_k, α_k), where r_k is the vector's length and α_k the vector's orientation.
– Use the retrieved r vectors to vote for the position of the reference point. The accumulator array has the two coordinates of the unknown object center a as axes:

    x_c = x_i ± r_k cos α_k
    y_c = y_i ± r_k sin α_k
    A(x_c, y_c) += 1

• The peak in this Hough space is the reference point with the most supporting edges.

1.6.2 RANSAC (RANdom SAmple Consensus)

Alternative strategy for line fitting: sample points and fit a line to them using least-squares regression. RANSAC only returns a "good" result with a certain probability, but this probability increases with the number of iterations.

RANSAC loop:
1. Randomly select a seed group of points on which to base the transformation estimate (e.g., a group of matches).
2. Compute the transformation from the seed group.
3. Find inliers to this transformation (inlier = point within a certain distance to the transformation/line).
4. If the number of inliers is sufficiently large, re-compute the least-squares estimate of the transformation on all of the inliers.
5. Repeat until some termination criterion is met (e.g. #iterations).
→ Keep the transformation with the largest number of inliers!

Improve this initial estimate with an estimation over all inliers (e.g. with standard least-squares minimization). But this may change the inliers, so alternate fitting with reclassification as inlier/outlier.

RANSAC is not limited to fitting lines; it can also be applied to arbitrary transformation models. Then the inliers are those points whose transformation error (distance of the transformed point to its corresponding point in the other image) is below a certain threshold.

In many practical situations the percentage of outliers is very high (≥ 90%), but RANSAC is only applicable with < 50% outliers. In this case, use the Generalized Hough Transform instead.

2 Segmentation

Gestalt factors (make intuitive sense, but are very difficult to translate into algorithms):
• proximity, similarity, common fate, common region
• parallelism, symmetry, continuity, closure

2.1 Segmentation as Clustering

The best cluster centers are those that minimize the SSD (Sum of Squared Distances) between all points and their nearest cluster center c_i:

    ∑_{clusters i} ∑_{points p in cluster i} ‖p − c_i‖²

2.1.1 k-Means

1. Randomly initialize the cluster centers.
2. Determine the points in each cluster: for each point p, find the closest c_i and put p into cluster i.
3. Set c_i to be the mean of the points in cluster i.
4. If any c_i has changed, repeat from Step 2.

k-Means++ (prevents arbitrarily bad local minima):
1. Randomly choose the first center.
2. Pick a new center with probability proportional to ‖p − c_i‖² (distance to the already chosen centers c_i, i ∈ [1, k − 1]).
3. Repeat until there are k centers.
→ expected error = O(log k) · optimal

Feature Space

The feature space determines what the pixels are grouped by:
• intensity similarity (1D intensity value as feature space)
• color similarity (3D color value as feature space)
• texture similarity (24D filter bank responses as feature space)
• intensity + position similarity (simple way to encode both similarity and proximity)

k-Means for Clustering:
1. Collect feature vectors for all pixels in an image.
2. Apply k-Means with a predefined number k of segments/clusters on those vectors.
3. Assign one segment per cluster.

Pros and Cons:
+ simple, fast to compute
+ converges to a local minimum of the within-cluster squared error (always finds some local minimum)
- setting k
- sensitive to initial centers and outliers
- detects spherical clusters only
- assumes means can be computed
- NP-hard, even with k = 2
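A minimal NumPy sketch of k-Means used for color-based segmentation, following the steps above. The random initialization and the fixed iteration cap are illustrative choices, not part of this summary.

```python
import numpy as np

def kmeans(features, k, n_iter=100, rng=np.random.default_rng(0)):
    """Plain k-Means on an (N, d) feature matrix; returns labels and centers."""
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(n_iter):
        # step 2: assign each feature vector to the closest center
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 3: recompute each center as the mean of its cluster (keep old if empty)
        new_centers = np.array([features[labels == i].mean(axis=0)
                                if np.any(labels == i) else centers[i]
                                for i in range(k)])
        if np.allclose(new_centers, centers):   # step 4: stop when nothing changed
            break
        centers = new_centers
    return labels, centers

# usage sketch for color similarity (3D feature space), img being an H x W x 3 array:
# labels, _ = kmeans(img.reshape(-1, 3).astype(float), k=3)
# segmentation = labels.reshape(img.shape[:2])
```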
2.1.2 Probabilistic Clustering

Instead of treating the data as a bunch of points, assume that they are all generated by sampling a continuous function. This function is called a generative model; it is defined by a vector of parameters θ, which we want to compute.

Mixture of Gaussians (MoG)

K Gaussian blobs (ellipses) with means µ_j, covariance matrices Σ_j, dimensionality D:

    p(x | θ_j) = 1 / ((2π)^{D/2} |Σ_j|^{1/2}) · exp{ −(1/2)(x − µ_j)ᵀ Σ_j⁻¹ (x − µ_j) }

    p(x | θ) = ∑_{j=1}^{K} π_j p(x | θ_j),   θ = (π_1, µ_1, Σ_1, ..., π_K, µ_K, Σ_K)

Expectation Maximization (EM)

Goal: Find the blob parameters θ that maximize the likelihood function

    p(data | θ) = ∏_{n=1}^{N} p(x_n | θ)

Idea: Given the Gaussian shapes, assign points to clusters; given the assigned points, approximate the Gaussian shapes.

Approach:
1. Randomly initialize the shapes of the Gaussian blobs.
2. E-step: Given the current guess of the blobs, compute the ownership of each point.
3. M-step: Given the ownership probabilities, update the blobs to maximize the likelihood function.
4. Repeat until convergence.

MoG Color Models for Segmentation:
1. The user marks two regions for foreground and background.
2. Learn a MoG model for the color values in each region.
3. Use those models to classify all other pixels.

Pros and Cons:
+ probabilistic interpretation
+ soft assignments between data points and clusters
+ generative model, can predict novel data points
+ relatively compact storage
- local minima
- initialization (often a good idea to start with some k-means iterations)
- need to know the number of components (solution: model selection)
- numerical problems are often a nuisance (Ärger)

2.1.3 Model-free Clustering

Mean-Shift Algorithm
1. Initialize a random seed and window W.
2. Calculate the center of gravity (the "mean") of W: ∑_{x∈W} x H(x) (often with a Gaussian profile). Here H is the height of the corresponding histogram bin.
3. Shift the search window to the mean.
4. Repeat from Step 2 until convergence.

To use mean-shift for clustering:
• Cluster: all data points in the attraction basin of a mode (= local maximum of the density of a given distribution).
• Attraction basin: the region for which all trajectories (Flugbahnen) lead to the same mode.

1. Find features (color, gradients, texture, etc.).
2. Initialize windows at individual pixel locations.
3. Perform mean shift for each window until convergence.
4. Merge windows that end up near the same "peak" or mode (for plateaus).

Speed-ups to mitigate the computational complexity:
• Assign all points within radius r of the end point to the mode.
• Assign all points within radius r/c of the search path to the mode.

Pros and Cons:
+ model free: does not assume any prior shape of the data clusters
+ single parameter h (window size)
+ variable number of modes
+ robust to outliers of the feature space
- output depends on the window size
- window size selection is not trivial
- computationally expensive
- does not scale well with dimension
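A minimal sketch of mean-shift mode seeking on a set of feature vectors, using a flat window of radius h instead of the Gaussian profile mentioned above (that simplification, as well as the tolerance and iteration cap, are assumptions for illustration).

```python
import numpy as np

def mean_shift_mode(points, seed, h=1.0, n_iter=100, tol=1e-5):
    """Shift a window of radius h from `seed` until it converges to a density mode."""
    mean = seed.astype(float)
    for _ in range(n_iter):
        in_window = np.linalg.norm(points - mean, axis=1) < h
        if not np.any(in_window):
            break
        new_mean = points[in_window].mean(axis=0)   # center of gravity of the window
        if np.linalg.norm(new_mean - mean) < tol:   # converged to a mode
            break
        mean = new_mean
    return mean

# clustering sketch: run from every point and merge modes that end up closer than h/2
# modes = np.array([mean_shift_mode(feats, p, h=0.5) for p in feats])
```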
2.2 Graph-Theoretic Segmentation

2.2.1 Segmentation as Energy Minimization

    E(x) = ∑_i φ(x_i) + ∑_{i,j} ψ(x_i, x_j)

with
• E: energy function
• φ: single-node/unary potentials
  – Encode information about a given pixel/patch: how likely is a pixel/patch to belong to a certain class?
• ψ: pairwise potentials
  – Encode neighborhood information: how different is a pixel/patch's label from that of its neighbor?

2.2.2 Graph Cuts for Image Segmentation

Graph-Cuts Energy Minimization

Solve an equivalent graph cut problem:
1. Introduce extra nodes: source s and sink t.
2. Weigh the connections to source/sink (t-links) by φ(x_i = s) and φ(x_i = t), respectively.
3. Weigh the connections between nodes (n-links) by ψ(x_i, x_j).
4. Find the minimum cost cut that separates source from sink.
⇒ The solution is equivalent to the minimum of the energy.
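To make the energy concrete, here is a tiny sketch that evaluates E(x) with unary costs and a Potts-style pairwise penalty on a 4-connected grid, and minimizes it by brute-force enumeration. This is only to illustrate what the s-t min-cut computes; the Potts form of ψ, the 4-connectivity, and the exhaustive search (feasible only for a handful of pixels) are assumptions, not the method described above.

```python
import numpy as np
from itertools import product

def mrf_energy(labels, unary, w):
    """E(x) = sum_i phi(x_i) + sum_(i,j) psi(x_i, x_j); unary is H x W x 2, labels H x W."""
    H, W = labels.shape
    E = unary[np.arange(H)[:, None], np.arange(W)[None, :], labels].sum()
    E += w * np.sum(labels[1:, :] != labels[:-1, :])   # vertical n-links
    E += w * np.sum(labels[:, 1:] != labels[:, :-1])   # horizontal n-links
    return E

def minimize_by_enumeration(unary, w):
    """Try every binary labeling of a tiny image and keep the lowest-energy one."""
    H, W = unary.shape[:2]
    best, best_E = None, np.inf
    for bits in product([0, 1], repeat=H * W):
        labels = np.array(bits).reshape(H, W)
        E = mrf_energy(labels, unary, w)
        if E < best_E:
            best, best_E = labels, E
    return best, best_E
```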
s-t-Mincut Algorithm

Solve the dual maximum flow problem: the maximum flow equals the cost of the s-t-mincut that we want to minimize (Min-cut/Max-flow Theorem). The cost of an s-t-cut is the sum of the costs of all edges going from set S to set T.

When applicable? Assuming non-negative capacities, we get the algorithm:
1. Find a path from source to sink with positive capacity.
2. Push the maximum possible flow through this path.
3. Adjust the capacities of the used edges and record "residual flows" (backwards flow).
4. Repeat until no path can be found.

α-Expansion

Algorithms for non-binary cases are no longer guaranteed to return the globally optimal result, but an approximation; the problem is NP-hard with 3 or more labels.
Idea: Break the multi-way cut computation into a sequence of binary s-t cuts.
1. Start with any initial solution.
2. For each label "α" in any order:
   a) Compute the optimal α-expansion move (s-t graph cuts).
   b) Decline the move if there is no energy decrease.
3. Stop when no expansion move would decrease the energy.

Pros and Cons:
+ powerful technique, based on a probabilistic model (MRF)
+ applicable for a wide range of problems
+ very efficient algorithms available
+ becoming a de-facto standard
- graph cuts can only solve a limited class of models: submodular energy functions; they capture only part of the expressiveness of MRFs
- only approximate algorithms available for multi-label cases

3 Object Recognition and Categorization

Idea:
• Represent each object (view) by a global descriptor (= feature vector).
• For recognizing objects, just match the descriptors.
• Some modes of variation are built into the descriptor; the others have to be incorporated into the training data.

Identification: Find a particular object. Categorization: Recognize any object of a class.

3.1 Sliding-Window Object Detection

Idea: If the object may be in a cluttered scene, slide a window around looking for it. Search over space and scale with the help of a binary classifier.

Therefore, we need to:
1. Obtain training data.
2. Define features.
3. Define a classifier.

Pros and Cons:
+ simple detection protocol to implement
+ good feature choices critical
+ past successes for certain classes
- high computational complexity
- with so many windows, the false positive rate better be low
- non-rigid, deformable objects are not captured well with representations assuming a fixed 2D structure
- objects with less-regular textures are not captured well with holistic appearance-based descriptions
- if considering windows in isolation, context is lost
- not all objects are "box" shaped

3.2 Gradient-based Representation

Idea: Consider edges, contours, and (oriented) intensity gradients. Summarize the local distribution of gradients with histograms.

Histograms of Oriented Gradients (HoG)
Divide the image window into small spatial regions ("cells") and map each grid cell in the input window to a histogram counting the matching gradients per orientation.

3.3 Classifier Construction

Learn a decision rule (classifier) assigning image features to different classes.
Line equation for a linear classifier, which separates positive and negative examples:

    w₁x₁ + w₂x₂ + b = 0   ⟺   wᵀx + b = 0

with w being the normal of the line. Then x_n is classified as positive if wᵀx_n + b ≥ 0, and as negative if wᵀx_n + b < 0.

3.3.1 Classification with SVMs (Support Vector Machines)

Idea: The original input space can be mapped to some higher-dimensional feature space where the training set is separable (x ↦ φ(x)) — for datasets which cannot be separated by a linear hyperplane in 2D.

Kernel Trick: Instead of explicitly computing the lifting transformation φ(x), define a kernel function K(x_i, x_j) = φ(x_i)ᵀ · φ(x_j). This gives a nonlinear decision boundary in the original feature space:

    ∑_n a_n t_n K(x_n, x) + b

Since the optimization formulation uses the data points only in the form of inner products φ(x_n)ᵀφ(x_m), we never need to actually compute the lifting transformation. That's because we choose an already known kernel function, as seen below:

    linear:                   K(x_i, x_j) = x_iᵀ x_j
    polynomial of power p:    K(x_i, x_j) = (1 + x_iᵀ x_j)^p
    Gaussian (radial basis function):   K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))

SVMs for Recognition:
1. Define your representation for each example.
2. Select a kernel function.
3. Compute pairwise kernel values between the labeled examples.
4. Pass this "kernel matrix" to SVM optimization software to identify the support vectors and weights.
5. To classify a new example: compute the kernel values between the new input and the support vectors, apply the weights, and check the sign of the output.

Assume a linear SVM classification function y(x) = wᵀx + b. Then x is the HOG feature map and w the template obtained by the SVM.

After a multi-scale dense scan, we want to suppress non-maximum detections. First, we clip the detection scores (the negative ones). We map each detection to the 3D [x, y, scale] space. Subsequently, we apply a robust mode detection, e.g. mean shift.
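A minimal sketch of the kernel decision function ∑_n a_n t_n K(x_n, x) + b with the Gaussian kernel from above. The support vectors, weights, and bias in the usage comment are hypothetical placeholders (in practice they come from the SVM optimizer, step 4 above).

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian (radial basis function) kernel K(xi, xj)."""
    return np.exp(-np.linalg.norm(xi - xj, axis=-1) ** 2 / (2.0 * sigma**2))

def svm_decision(x, support_vectors, alphas, targets, b, kernel=rbf_kernel):
    """Evaluate sum_n a_n t_n K(x_n, x) + b and classify by its sign."""
    score = np.sum(alphas * targets * kernel(support_vectors, x)) + b
    return np.sign(score), score

# hypothetical values, only to show the call:
# sv = np.array([[0.0, 1.0], [1.0, 0.0]]); a = np.array([0.7, 0.7]); t = np.array([1, -1])
# label, _ = svm_decision(np.array([0.2, 0.9]), sv, a, t, b=0.0)
```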
Basic steps: Detailed training algorithm:
1. In each iteration, AdaBoost trains
a new weak classifier hm (x) based
on the current weighting coefficients
W(m) .
2. Adapt the weighting coefficients for
each point: Increase wn if xn was mis-
classified by hm (x), else decrease.
3. Make predictions using the final com-
bined model
!
M
H (x) = sign ∑ αm hm (x)
m =1
Deformation cost can be directly computed
on top of the part filter responses through Recognition:
Generalized Distance Transform. Results • Evaluate all selected weak classifiers on test data: h1 (x), · · · , hm (x)
into transformed responses. • Final classifier is weighted combination of selected weak classifiers, as seen in
step 3 basic steps.
Viola-Jones Face Detection
Introduce a set of “rectangular” Haar filters: Subtract the
3.3.2 Classification with Boosting pixels in the white region from the pixels in the black region.
Efficiently computable with integral image, which is com-
AdaBoost
puted once per image: In every pixel ( x, y) in the integral
image we store the sum of pixels over the rectangle spanned
by the pixel and the top left image corner.
Idea: Build a strong classifier H by com-
bining a number of ”weak classifiers” Using AdaBoost for informative feature
h1 , · · · , h M , which need only be better than and classifier selection, we want to select
chance. At each iteration, add a weak clas- the single rectangle feature and thresh-
sifier (sequential learning process). old that best separates positive (faces) and
negative (non-faces) training examples, in
terms of weighted error.
Given: training set X = {x1 , · · · , x N } with target values T = {t1 , · · · , t N }, tn ∈ {−1, 1},
associated weights W = {w1 , · · · , w N } for each training point
Even if the filters are fast to compute, each
new image has a lot of possible windows
to search. For efficiency, in a cascade fashion
apply less accurate but faster classifiers
first to immediately discard windows that
clearly appear to be negative.
21 22
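A minimal sketch of the integral image and a two-rectangle Haar feature as described above; the specific feature geometry (two side-by-side blocks of size h × w) is just an example.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of all pixels in the rectangle spanned by (0, 0) and (y, x)."""
    return img.astype(float).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] from four integral-image lookups."""
    total = ii[y1, x1]
    if y0 > 0:
        total -= ii[y0 - 1, x1]
    if x0 > 0:
        total -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

def haar_two_rect(ii, y, x, h, w):
    """Two-rectangle Haar feature: white (left) block minus black (right) block."""
    white = box_sum(ii, y, x,     y + h - 1, x + w - 1)
    black = box_sum(ii, y, x + w, y + h - 1, x + 2 * w - 1)
    return white - black
```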
Extension: Integral Channel Features

Generalization of the Haar integral image idea from Viola-Jones: instead of only considering intensities, also take other feature channels into account (gradient orientations, color, texture).

Also generalize the block computation:
• 1st-order features: sum of pixels in a rectangular region (integral over a certain region)
• 2nd-order features: Haar-like difference of sums-over-blocks (difference between the integrals over two regions)
• Generalized Haar: more complex combinations of weighted rectangles (higher-order designs with multiple such blocks)
• Histograms: computed by evaluating local sums on quantized images (multiple orientation channels, build a histogram over the corresponding regions, similar to HOG)

Classifier construction: e.g. the VeryFast detector.

4 Local Features and Matching

4.1 Local Features - Detection and Description

4.1.1 Local Invariant Features

Requirements:
• Region extraction needs to be repeatable and accurate:
  – invariant to translation, rotation, scale changes
  – robust or covariant to out-of-plane (≈ affine) transformations
  – robust to lighting variations, noise, blur, quantisation
• Locality: features are local, therefore robust to occlusion and clutter.
• Quantity: we need a sufficient number of regions to cover the object.
• Distinctiveness: the regions should contain "interesting" structure.
• Efficiency: close to real-time performance.

4.1.2 Keypoint Localization

Look for two-dimensional signal changes: in the region around a corner, the image gradient has two or more dominant directions.

Harris Detector

Properties:
• R is invariant to image rotation (the ellipse keeps its shape, therefore the eigenvalues stay the same)
• not invariant to image scale (due to the fixed window size)

Algorithm:
1. Compute the second moment matrix (autocorrelation matrix):

    M(σ_I, σ_D) = g(σ_I) ⋆ ( I_x²(σ_D)      I_x I_y(σ_D)
                             I_x I_y(σ_D)   I_y²(σ_D) )
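A minimal NumPy/SciPy sketch of the Harris corner response built from the second moment matrix M above. The response formula R = det M − k·(trace M)² and the constant k ≈ 0.05 are the standard choices and are assumptions here, since they are not spelled out in this summary; thresholding R and taking local maxima then gives the corners.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(img, sigma_d=1.0, sigma_i=2.0, k=0.05):
    """Harris corner response from the entries of M(sigma_I, sigma_D)."""
    img = img.astype(float)
    Ix = sobel(gaussian_filter(img, sigma_d), axis=1)   # derivatives at scale sigma_D
    Iy = sobel(gaussian_filter(img, sigma_d), axis=0)
    # entries of M, averaged with the Gaussian window g(sigma_I)
    Sxx = gaussian_filter(Ix * Ix, sigma_i)
    Syy = gaussian_filter(Iy * Iy, sigma_i)
    Sxy = gaussian_filter(Ix * Iy, sigma_i)
    det_M = Sxx * Syy - Sxy ** 2
    trace_M = Sxx + Syy
    return det_M - k * trace_M ** 2
```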
4.1.3 Scale Invariant Region Selection

Extract regions which are "scale invariant" and rescale each region to a fixed size.
Signature function: Laplacian-of-Gaussian detector (LoG).

4.1.4 Local Descriptors

SIFT Descriptor Computation:
• Divide the patch into 4x4 sub-patches: 16 cells.
• Compute a histogram of gradient orientations (8 reference angles) for all pixels inside each sub-patch.
• Resulting descriptor: 4x4x8 = 128 dimensions.

SIFT aims to achieve robustness to lighting variations and small positional shifts. For rotation-invariant descriptors, find the local orientation (dominant direction of the gradient for the image patch) and rotate the patch according to this angle; this puts the patches into a canonical orientation.

Computation:
1. Compute the orientation histogram.
2. Select the dominant orientation.
3. Normalize: rotate to a fixed orientation.

SIFT is an extraordinarily robust matching technique:
+ can handle changes in viewpoint up to 60° out-of-plane rotation
+ can handle significant changes in illumination
+ fast and efficient, can run in real time

• SURF is a fast approximation of the SIFT idea.

4.2 Recognition with Local Features

Image content is transformed into local features that are invariant to translation, rotation, and scale.
Goal: Verify whether they belong to a consistent configuration.
• Warping: Given a source image and the transformation, what does the transformed output look like?
• Alignment: Given two images with corresponding features, what is the transformation between them?

4.2.1 Finding Consistent Configurations

3D rotation around the coordinate axes:

    R_x(α) = ( 1  0       0
               0  cos α  −sin α
               0  sin α   cos α )

    R_y(β) = ( cos β  0  sin β
               0      1  0
              −sin β  0  cos β )

    R_z(γ) = ( cos γ  −sin γ  0
               sin γ   cos γ  0
               0       0      1 )

Affine transformations are combinations of linear transformations and translations.

4.2.2 Affine Estimation

In alignment, we fit the parameters of some transformation according to a set of matching feature pairs ("correspondences"). An affine model approximates the perspective projection of planar objects. Assuming we know the correspondences, how do we get the transformation?

Least Squares Estimation

Given a set of data points (X_i, X_i′), find a linear function to predict the X′'s from the X's, i.e. Xa + b = X′:

    ( X_1  1          ( X_1′
      X_2  1   (a       X_2′
      ...  ... )b)  =   ...  )   ⟺   Ax = B

With least squares estimation we get, for the 2D affine case x′ = Mx + t,

    ( x_i  y_i  0    0    1  0 )   (m_1 m_2 m_3 m_4 t_1 t_2)ᵀ  =  ( x_i′
      0    0    x_i  y_i  0  1 )                                    y_i′ )

for every correspondence i. This is an overconstrained problem:

    min ‖Ax − B‖²   →   least-squares minimization

4.2.3 Homography Estimation

A projective transform is a mapping between any two perspective projections with the same center of projection. Under a projective transform, a rectangle maps to an arbitrary quadrilateral; parallel lines are not preserved, but straight lines remain straight.

The simplest way to estimate a homography H from feature correspondences is the Direct Linear Transformation (DLT) method:
The solution is the null-space vector of A; it corresponds to the smallest singular vector. If v₉₉ may be zero, normalize the vector length instead:

    h = [v₁₉, ..., v₉₉] / |[v₁₉, ..., v₉₉]|

5 Deep Learning

5.1 Neural Networks

5.1.1 Background: Deep Learning

Generalized Linear Discriminants: y_k(x) > 0 if the input belongs to target class k. The φ(x_i) can be seen as features.

Multi-Layer Perceptrons (MLP): (In the usual network diagram, the black node introduces an "offset", the so-called bias term.) An MLP with 1 hidden layer can implement any function (universal approximator). However, if the function is deep, a very large hidden layer may be required.

If we leave out the non-linearity g(·), the layers collapse into a single linear function. Therefore, the non-linearities are what make the multi-layer representation more powerful.

Nonlinearities
Training requires computing the gradients for each weight.
Idea: Compute the gradient layer by layer; each layer below builds upon the results of the layer above.
⇒ Backpropagation algorithm

    w_kj^(τ+1) = w_kj^(τ) − η · ∂E(w)/∂w_kj |_{w^(τ)}

with learning rate/step size η and time index τ. This is applied on minibatches of data.

Vanishing gradients problem

In multi-layer nets, gradients need to be propagated through many layers. The magnitudes of the gradients are often very different for the different layers, especially if the initial weights are small. Gradients can therefore get very small in the early layers of deep nets.

When designing deep networks, we need to make sure gradients can be propagated throughout the network:
• restricting the network depth
• very careful implementation
• choosing suitable nonlinearities
• performing proper initialization

Weights Initialization

Best practice is to use a zero-mean distribution for sampling the initial weights, e.g. a Gaussian or uniform distribution. Compute the variance according to Glorot or He, and plug it into your chosen distribution.

5.2 Convolutional Neural Networks (CNNs)

Convolutional Layers

Feed-forward feature extraction, with a classification layer at the end:
1. Convolve the input with learned filters
2. Non-linearity
3. Spatial pooling
4. (Normalization)

The convolutional filters are learned through supervised training by back-propagating the classification error.

But why do we want to use CNNs?
• To avoid huge amounts of parameters, use convolutions with learned kernels: the neurons of one layer share the same parameters (the kernels) across different locations.
• All neural net activations are arranged in 3 dimensions. Multiple neurons all look at the same input region, stacked in depth. (Naming convention: depth →, width ↗, height ↑)
• Convolutional layers can be stacked. The filters of the next layer then operate on the full activation volume. Filters are local in (x, y), but densely connected in depth.
• Each activation map is a depth slice through the output volume.
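A minimal sketch of a convolutional layer's forward pass, to make the weight sharing and the depth slices concrete. The "valid" (no-padding) convolution and the ReLU at the end are illustrative choices.

```python
import numpy as np

def conv_layer_forward(x, w, b):
    """x: H x W x C_in input, w: K x K x C_in x C_out kernels, b: C_out biases.
    Every output location reuses the same kernels (weight sharing); each output
    channel is one activation map, i.e. one depth slice of the output volume."""
    H, W, _ = x.shape
    K, _, _, C_out = w.shape
    H_out, W_out = H - K + 1, W - K + 1          # "valid" convolution, no padding
    out = np.zeros((H_out, W_out, C_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = x[i:i + K, j:j + K, :]       # local receptive field
            for c in range(C_out):
                out[i, j, c] = np.sum(patch * w[..., c]) + b[c]
    return np.maximum(out, 0.0)                  # ReLU non-linearity
```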
Pooling Layers

By pooling filter responses at different spatial locations we gain robustness to the exact spatial location of features (robustness to translations). It makes the representation smaller without losing too much information. Pooling happens independently across each slice, preserving the number of slices.

5.3 CNN Architectures

5.3.1 LeNet
• 2x conv-conv-pool blocks for feature representation
• FC layers for classification at the end
⇒ successfully used for handwritten digit recognition
• For other tasks, the two layers of feature computation were not sufficient; it was also computationally expensive.

5.3.2 AlexNet
Similar to LeNet, but
• bigger model (7 hidden layers, 650k units, 60M parameters), more training data (10⁶ images instead of 10³)
• receptive field in the first layer: 11 × 11, stride 4
⇒ AlexNet almost halved the error rate of previous approaches.

5.3.3 VGGNet
• deeper network, more convolutional layers with smaller filters (+ nonlinearity), with a similar amount of FC layers and a softmax layer at the end
• receptive field in the first layer: 3 × 3, stride 1; stacking such layers yields effective receptive fields of 5 × 5, 7 × 7, etc.
• 138M parameters, but most of them in the FC layers
⇒ same receptive field as AlexNet, but much fewer parameters: 3 · 3² = 27 vs. 7² = 49

5.3.4 GoogLeNet
• Main ideas: "inception" module as a modular component; learns filters at several scales within each module; 1 × 1 convolutions ("bottleneck layers") for dimensionality reduction
• several inception modules in the net, with auxiliary classification outputs for training the lower layers (deprecated); 22 layers, no FC layers, only 5M parameters
• VGGNet and GoogLeNet perform at a similar level

5.3.5 ResNet
Core component:
• skip connections bypassing each layer
• better propagation of gradients to the deeper layers

5.3.6 Transfer Learning with CNNs
Transfer learning is a machine learning method where a model developed for one task is reused as the starting point for a model on a second task.
1. Train the net on ImageNet.
2. If you have a small dataset: fix all weights (treat the CNN as a fixed feature extractor), retrain only the classifier.
3. If you have a medium-sized dataset, "finetune" instead: use the old weights as initialization, train the full network or only some of the higher layers with a smaller learning rate.

5.4 Practical Advice on CNN Training

5.4.1 Data Augmentation

Augment (cropping, zooming, flipping, color PCA) the original data with synthetic variations to reduce overfitting. This results in a much larger training set and in robustness against expected variations.
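A minimal sketch of two of the augmentations listed above (random crop and horizontal flip); the crop size and probabilities are arbitrary example values.

```python
import numpy as np

def augment(img, crop=28, rng=np.random.default_rng(0)):
    """Random crop plus random horizontal flip of an H x W x C image."""
    H, W, _ = img.shape
    y = rng.integers(0, H - crop + 1)
    x = rng.integers(0, W - crop + 1)
    out = img[y:y + crop, x:x + crop]
    if rng.random() < 0.5:
        out = out[:, ::-1]          # horizontal flip
    return out
```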
Overfitting happens when the model fits too well to the training set: the model starts to recognize specific images in the training set instead of general patterns.

5.4.2 Initialization

When initializing the weights:
• Draw them randomly from a zero-mean distribution.
• Common choices in practice: Gaussian or uniform.
• Common trick: add a small positive bias (+ε) to avoid units with ReLU nonlinearities getting stuck at zero.

When sampling weights from a uniform distribution [a, b]:
• The variance is σ² = (1/12)(b − a)².
• Glorot initialization with a uniform distribution:

    W ∼ U[ −√6 / √(n_in + n_out),  +√6 / √(n_in + n_out) ]

5.4.3 Batch Normalization

Optimization works best if all inputs of a layer are normalized.
Introduce an intermediate layer that centers the activations of the previous layer per minibatch, resulting in mean 0 and variance 1. I.e., perform transformations on all activations and undo those transformations when backpropagating gradients.
Centering and normalization also need to be done at test time, but minibatches are no longer available at that point. Therefore, learn the normalization parameters to compensate for the expected bias of the previous layer (usually with a simple moving average).

5.4.4 Dropout

Reduce the reliance on individual units by randomly switching off units during training. This changes the net for each data point, effectively training many different variants of the network.
To compensate for the much larger output responses at test time, multiply the activations with the probability that the unit was kept active during training, which reduces the magnitude of the activations. This results in improved performance.

5.4.5 Learning Rate Schedules

Final improvement step after convergence is reached:
1. Reduce the learning rate η by a factor of 10.
2. Continue training for a few epochs.
3. Do this 1-3 times, then stop training.

Turning down the learning rate reduces the random fluctuations in the error due to different gradients on different minibatches. When turning down the learning rate too soon, further progress will be much slower or impossible after that.

5.5 CNNs for Object Detection

Region Proposal Based Detectors

Avoid a dense sliding window by using region proposals.
• R-CNN: Selective Search + CNN classification / regression
• Fast R-CNN: swap the order of convolution and region extraction
• Faster R-CNN: compute region proposals within the network
• Mask R-CNN: detection + instance segmentation + pose estimation

Anchor Box Based Detectors

Perform detection in a single step using a grid of anchor boxes.
• YOLO, SSD

5.5.1 R-CNN

Cut off the feature extractor and replace it with a trained CNN.
Problems:
- ad hoc training objectives:
  – fine-tuned net with softmax classifier (log loss)
  – post-hoc trained linear SVMs (hinge loss)
  – post-hoc trained bounding-box regressors (squared loss)
- training (2 days) and testing (47 s/image) are slow
- takes a lot of disk space: all precomputed CNN features for training the classifier need to be stored

5.5.2 Fast R-CNN

Instead of running a ConvNet on every region proposal, apply the ConvNet once to the entire image and use RoI pooling.

Pipeline:
1. Forward the whole image through the ConvNet.
2. Extract RoIs from a proposal method.
3. "RoI Pooling" layer, which warps each RoI to a fixed size.
4. Feed the pooled features into FC layers.
5. Linear + softmax classifier, bounding-box regressors.
6. Feed the linear + softmax and linear outputs into a multi-task loss (log loss + smooth L1 loss).

5.5.3 Faster R-CNN

Remove the dependence on an external region proposal algorithm; instead, infer the region proposals from the same CNN. This results in feature sharing and makes object detection in a single pass possible.
Faster R-CNN = Fast R-CNN + RPN (Region Proposal Network).
One network, four losses (joint training).

5.5.4 Mask R-CNN

For detection + instance segmentation and detection + pose estimation.

5.5.5 YOLO/SSD

Go directly from the image to detection scores, which allows a very lightweight backbone. Subdivide the image into a grid. Within each grid cell:
1. Start from a set of anchor boxes.
2. Regress from each of the B anchor boxes to a final box.
3. Predict scores for each of C classes (including background).

5.6 CNNs for Segmentation

For semantic segmentation, label each pixel in the image with a category label. For instance segmentation, additionally give an instance label per pixel.

5.6.1 Fully Convolutional Networks (FCN)

Design the network as a sequence of convolutional layers to make predictions (for segmentation) for all pixels at once.
In FCNs, all operations are formulated as convolutions; fully-connected layers become 1x1 convolutions. The advantage of using convolutions is that FCNs can process arbitrarily sized images.
Think of FCNs as performing a sliding-window classification. The computation is more efficient, since computations are reused between windows. On the other hand, convolutions at the original image resolution are very expensive.

5.6.2 Encoder-Decoder Architecture

Design the net as a sequence of convolutional layers, with downsampling and upsampling inside the network.
• Downsampling: pooling, strided (stride > 1) convolution.
• Upsampling:
  – Nearest-Neighbor: spread the value of a pixel to a larger patch; results in a blocky output structure.
  – "Bed of Nails": keep one pixel value as the original and pad the other ones in the patch with zeros.
  – Max Unpooling: remember which element was the maximum after max-pooling, and use this position for "Bed of Nails" upsampling.
  – Strided Transpose Convolution

5.6.5 Extensions

Dilated Convolutions (Atrous Convolutions)

Sample the input at every r-th pixel for the convolution. This increases the receptive field without increasing the computation. With dilation factor l:

    y[i] = ∑_{k=1}^{K} x[i + r·k] w[k],      (F ⋆_l k)(p) = ∑_{s + l·t = p} F(s) k(t)
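A minimal 1D sketch of the dilated convolution formula above (the kernel index runs from 0 in the code, a small shift from the k = 1..K convention in the formula).

```python
import numpy as np

def dilated_conv1d(x, w, r):
    """y[i] = sum_k x[i + r*k] * w[k] -- sample the input at every r-th position."""
    K = len(w)
    n_out = len(x) - r * (K - 1)                 # keep only fully valid positions
    return np.array([np.dot(x[i:i + r * K:r], w) for i in range(n_out)])

# usage sketch: dilated_conv1d(np.arange(10.0), np.array([1.0, 1.0, 1.0]), r=2)
```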
5.7 CNNs for Human Body Pose Estimation

Setup:
1. Annotate images with keypoints for the skeleton joints.
2. Define a target disk around each keypoint with radius r.
3. Set the ground-truth label to 1 within each such disk.
4. Infer heatmaps for the joints as in semantic segmentation.

5.8 CNNs for Matching

Types of models used for matching tasks: Siamese networks and triplet-loss networks.

5.8.1 Siamese Networks

Present the two stimuli to two identical copies of a network (with shared parameters). Train them to output similar values iff the inputs are (semantically) similar.

5.8.2 Triplet Loss

Offline Hard Triplet Mining

Mining hard triplets becomes crucial for learning: a triplet is prone to confusion when the anchor and the negative example are close to each other. A popular solution is offline hard triplet mining:
1. Process the dataset to find hard triplets, in the form of mini-batches.
2. Use those for learning.
3. Iterate.

This needs considerable effort, but it is a very wasteful design, because a minibatch of the mined triplets potentially contains many more hard triplets than we actually mined.

Online Hard Triplet Mining

Use online hard triplet mining instead:
• Each member of another triplet becomes an additional negative candidate. But we need both hard negatives and hard positives.
• An even better design is to sample K images from P classes for each minibatch. For one anchor, the previous positive and negative examples become positives, and every other sample in the minibatch (whether a previous anchor, positive, or negative) becomes a negative example. The triplets are then constructed only within the minibatch.

5.9 Recurrent Networks
The training is more challenging, however, since unrolled nets are deep.
We want to train a language model with p(next word | previous words), where p should be high for the observed word sequences. (In the usual unrolled diagram, pink marks the recurrent connections between hidden layers.)
After training the RNN, we get a language model which predicts the next word by sampling the output (posterior) of the previous one: from each output y_i, sample the next input word x_{i+1}, until the end of the sequence.

Applications:
• Image tagging: simple combination of CNN and RNN. Use the CNN to define the initial state h₀ of an RNN; use the RNN to produce a text description of the image. Trained on a corpus of images with textual descriptions.
• Video-to-text description

6 3D Reconstruction

To reconstruct a 3D structure we need multi-view geometry, because the structure from one image is inherently ambiguous.
Given several images of the same object or scene, compute a representation of its 3D shape. In particular, given a calibrated binocular stereo pair, fuse it to produce a depth image.

6.1 Epipolar Geometry and Stereo Basics

The epipolar geometry is the intrinsic projective geometry between two views. It is independent of the scene structure and only depends on the cameras' internal parameters and relative pose.

Principle: Triangulation, which gives the reconstruction as the intersection of two rays. This requires camera calibration and point correspondences.

Parameters for camera calibration:
• Extrinsic: rotation matrix and translation vector (camera frame ↔ reference frame)
• Intrinsic: focal length, pixel sizes, image center point, radial distortion parameters (image coordinates relative to the camera ↔ pixel coordinates)

Parallel Optical Axes

Assume these parameters are given and fixed.
Task of depth estimation: estimate a disparity map D(x, y) from a set of images. From the similar triangles (p_l, P, p_r) and (O_l, P, O_r) we get the depth associated with point p:

    (T − (x_r − x_l)) / (Z − f) = T / Z   ⟺   Z = f · T / (x_r − x_l)

where x_r − x_l is the disparity, the horizontal motion. We can then get the corresponding point in the other image through the disparity map D(x, y) := f·T/Z:

    (x′, y′) = (x + D(x, y), y)

Stereo Correspondence Constraints

In general, two cameras do not need to have parallel optical axes. The geometry of the two views allows us to constrain where the corresponding pixel for some image point in the first view must occur in the second view. Here the epipolar constraint is useful, because it reduces the correspondence problem to a 1D search along conjugate epipolar lines.

The epipolar geometry between two views is essentially the geometry of the intersection of the image planes with the epipolar planes, which have the baseline as (fixed) axis.
• Baseline: line joining the camera centers.
• Epipole e: point of intersection of the baseline with the image plane. All epipolar lines intersect at the epipole.
• Epipolar line l: intersection of an epipolar plane with the image plane.
• Epipolar plane Π: plane containing the baseline and a world point. An epipolar plane intersects the left and right image planes in epipolar lines.

Potential matches for p have to lie on the corresponding epipolar line l.
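Going back to the parallel-axes case above, here is a minimal sketch of turning a disparity map into depth via Z = f·T/(x_r − x_l); treating zero/negative disparities as invalid is an assumption for illustration.

```python
import numpy as np

def depth_from_disparity(disparity, f, T):
    """Z = f * T / (x_r - x_l); f: focal length in pixels, T: baseline length."""
    Z = np.full(disparity.shape, np.inf, dtype=float)
    valid = disparity > 0                      # skip pixels without a valid match
    Z[valid] = f * T / disparity[valid]
    return Z
```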
Some useful stuff for computation:
• normalize a homogeneous point: (x₁, x₂, x₃)ᵀ → (x₁/x₃, x₂/x₃, 1)ᵀ
• normalize a homogeneous line: (a, b, c)ᵀ → (a/√(a²+b²), b/√(a²+b²), c/√(a²+b²))ᵀ
• perpendicular distance from a point to a (normalized) line: d = l · p (dot product)

6.2 Stereopsis and 3D Reconstruction

Main steps:
1. Calibrate the cameras.
2. Rectify the images.
3. Compute the disparity.
4. Estimate the depth.

Dense Correspondence Search

We want a window large enough to have sufficient intensity variation, yet small enough to contain only pixels with about the same disparity.

Sparse Correspondence Search

Restrict the search to a sparse set of detected features. Rather than pixel values (or lists of pixel values), use a feature descriptor and an associated feature distance. Still narrow the search further by epipolar geometry.
Pros and Cons:
+ efficiency
+ can have more reliable feature matches, less sensitive to illumination than raw pixels
- have to know enough to pick good features
- sparse information

Possible Sources of Error
• low-contrast/textureless image regions
• occlusions
• camera calibration errors
• violations of brightness constancy (e.g. specular reflections)
• large motions

6.5 Camera Calibration

Recap: To solve Ax = 0, apply the SVD

    A = U D Vᵀ

with D the diagonal matrix of singular values d₁₁, ..., d_NN and V = (v₁, ..., v_N) the singular vectors.
• The singular values of A are the square roots of the eigenvalues of AᵀA.
• The solution of Ax = 0 is the null-space vector of A.
• This corresponds to the smallest singular vector of A.

6.5.1 Camera Models/Parameters

In the following we look at camera models, which are matrices with particular properties that represent the camera mapping.

Pinhole Model with Origin not in the Principal Point

The origin of the image coordinate system may not lie in the principal point (e.g. it is in the corner), so we define a more general mapping with a calibration matrix K. For pixel size 1/m_x × 1/m_y, where m_x, m_y are the pixels per meter in horizontal and vertical direction respectively, we get the calibration matrix

    K = ( m_x f  0      p_x       ( α_x  0    x_0
          0      m_y f  p_y    =    0    α_y  y_0
          0      0      1   )       0    0    1   )

Camera Rotation and Translation

In general, the camera coordinate frame will be related to the world coordinate frame by a rotation and a translation.
• X̃_cam: coordinates of a point in the camera frame
• X̃: coordinates of a point in the world frame (non-homogeneous)
• X: coordinates of a point in the world frame (homogeneous)
• C̃: coordinates of the camera center in the world frame
• Camera projection matrix P = K[R | t] (R and t both relative to the world coordinate system)

Summary (degrees of freedom):
⇒ general pinhole camera: 9 DoF
⇒ CCD camera with non-square pixels: 10 DoF
⇒ general camera: 11 DoF

6.5.2 Calibration Procedure

Compute the intrinsic and extrinsic parameters using observed camera data.
Given n points with known 3D coordinates X_i and known image projections x_i, estimate the camera parameters such that x_i = P X_i, where
• x_i: point in the image
• X_i: point in world coordinates
• λ: unknown scaling factor depending on the unknown depth

Main idea:
1. Place a calibration object with known geometry in the scene.
2. Get correspondences.
3. Solve for the mapping from scene to image: estimate P = P_int P_ext.

DLT Algorithm (Direct Linear Transform)

Idea: Get rid of the unknown scaling factor λ.

Notes:
• 5½ correspondences are needed for a minimal solution.
• For coplanar points that satisfy ΠᵀX = 0 for a plane Π, we will get degenerate solutions (Π, 0, 0), (0, Π, 0), or (0, 0, Π). We need calibration points in more than one plane!

To figure out the intrinsic and extrinsic parameters after recovering the numerical form of the camera matrix, use matrix decomposition.

For best results, the calibration points need to be measured with subpixel accuracy (depending on the exact pattern).

Algorithm for a checkerboard pattern:
1. Perform Canny edge detection.
2. Fit straight lines to the detected linked edges.
3. Intersect the lines to obtain corners. If sufficient care is taken, the points can then be obtained with a localization accuracy < 1/10 pixel.

Rule of thumb: The number of constraints should exceed the number of unknowns by a factor of 5; thus, at least 28 points are necessary.

6.6 Uncalibrated Reconstruction

3 main questions for two-view geometry:
• Scene geometry (structure): Where is the pre-image of the points in 3D? (→ triangulation)
• Correspondence (stereo matching): How does a point in one image constrain the position of the corresponding point x′ in another image? (→ epipolar constraint)
• Camera geometry (motion): What are the cameras for the two views? (→ SfM)

The fundamental matrix and the essential matrix are the algebraic representations of epipolar geometry; they satisfy an epipolar constraint for any pair of corresponding points x ↔ x′ in the two images.

6.6.1 Triangulation

1) Geometric Approach
Find the shortest segment connecting the two viewing rays and let X be the midpoint of that segment.
2) Linear Algebraic Approach

3) Nonlinear Approach
Most accurate, but unlike the other two methods it does not have a closed-form solution (i.e. it can't be expressed with a finite number of standard operations).

6.6.2 Uncalibrated Case: Fundamental Matrix

Using x̂ and x̂′ in matrix space, we apply a normalized coordinate system to get the pixel coordinates x and x′.

To estimate the fundamental matrix F, use the eight-point algorithm.

Eight-Point Algorithm

8 points are sufficient, because F has rank 2. The eight-point algorithm has poor numerical conditioning, which can be fixed by rescaling the data. On noisy data, the solution will usually not fulfill the constraint that F has rank 2; this means that there will be no epipoles through which all epipolar lines pass.

Normalized Eight-Point Algorithm
1. Center the image data at the origin, and scale it so the mean squared distance between the origin and the data points is 2 pixels.
2. Use the eight-point algorithm to compute F from the normalized points.
3. Enforce the rank-2 constraint using SVD. (Geometrically, F represents a mapping from the 2D projective plane of the first image to the pencil of epipolar lines through the epipole e. Thus, it represents a mapping from a 2D onto a 1D projective space, and hence must have rank 2.)
4. Transform the fundamental matrix back to the original units: if T and T′ are the normalizing transformations in the two images, then the fundamental matrix in the original coordinates is TᵀFT′.

Nonlinear Least-Squares (Refinement Approach)

    ∑_{i=1}^{N} (x_iᵀ F x_i′)²   ⇒   ∑_{i=1}^{N} ( d²(x_i, F x_i′) + d²(x_i′, F x_i) )

(replace the algebraic error by the symmetric geometric distances d to the epipolar lines)
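A minimal NumPy sketch of the normalized eight-point algorithm described above. The convention x₂ᵀ F x₁ = 0 and the exact normalization (mean distance √2 to the origin, i.e. mean squared distance ≈ 2) are implementation choices of this sketch.

```python
import numpy as np

def normalize_points(pts):
    """Translate to zero mean and scale so the mean distance to the origin is sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def fundamental_matrix(x1, x2):
    """Normalized eight-point algorithm for x2^T F x1 = 0, given >= 8 correspondences."""
    p1, T1 = normalize_points(x1)
    p2, T2 = normalize_points(x2)
    # each correspondence gives one row of the linear system A f = 0
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                   # null-space vector of A
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt    # enforce the rank-2 constraint
    return T2.T @ F @ T1                       # undo the normalization
```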
6.6.3 Stereo Pipeline with Weak Calibration

We want to estimate the world geometry without requiring calibrated cameras.
Main Idea: Estimate the epipolar geometry from a (redundant) set of point correspondences between two uncalibrated cameras.

Procedure to find F and the correspondences:
1. Find interest points in both images (e.g., Harris corners).
2. Compute correspondences using only proximity.
3. Compute the epipolar geometry.
4. Refine.

RANSAC for robust estimation of F:
1. Select a random sample of correspondences.
2. Compute F using them. This determines the epipolar constraint.
3. Evaluate the amount of support: the number of inliers within a threshold distance of the epipolar line.
4. Iterate until a solution with sufficient support has been found (or the maximum number of iterations is reached).
5. Choose the F with the most support.

6.6.4 Extension: Epipolar Transfer

Given points x₁ and x₂ in the two images with known epipolar geometries F₁₃, F₂₃, the point x₃ in the third image is the intersection of l₃₁ and l₃₂:

    l₃₁ = F₁₃ᵀ x₁,    l₃₂ = F₂₃ᵀ x₂

This can't be applied if the motion is strictly in the same plane.

6.7 Structure-from-Motion (SfM)

Given: m images of n fixed 3D points x_ij = P_i X_j, i = 1, ..., m, j = 1, ..., n.

Structure-from-Motion Ambiguity

If we transform the scene using a transformation Q (similarity, affine, projective) and apply the inverse transformation to the camera matrices, then the images don't change:

    x = PX = (PQ⁻¹)(QX)

Idea: With no constraints on the camera calibration we get a projective reconstruction. Add information to upgrade the reconstruction to affine, similarity, or Euclidean.

Hierarchy of 3D Transformations

From the most unconstrained at the top to the most constrained at the bottom: projective → affine → similarity → Euclidean.

Projective Structure from Motion

Given are m images of n fixed 3D points; in the two-camera case with depths z and z′:

    z_ij x_ij = P_i X_j,   i = 1, ..., m,  j = 1, ..., n

We want to estimate the m projection matrices P_i and the n 3D points X_j from the mn correspondences x_ij. With no calibration information, the cameras and points can only be recovered up to a 4 × 4 projective transformation Q:

    X → QX,   P → PQ⁻¹

We can solve for structure and motion when 2mn ≥ 11m + 3n − 15. For two cameras, at least 7 points are needed.

Decomposing the fundamental matrix in the two-camera case means: if we can compute the fundamental matrix between two cameras, we can directly estimate the two projection matrices from F. Once we have the projection matrices, we can compute the 3D position of any point X by triangulation.

To obtain both kinds of information at the same time, use projective factorization.
Projective Factorization
Bundle Adjustment
Non-linear method for refining structure and motion. Minimize the mean-square reprojection error

    E(P, X) = ∑_{i=1}^{m} ∑_{j=1}^{n} D(x_ij, P_i X_j)²
Idea: Seek the Maximum Likelihood (ML) solution assuming the measurement noise is
Gaussian. It involves adjusting the bundle of rays between each camera center and the
set of 3D points.
Considerably improves the results and allows assignment of individual covariances
to each measurement. However, it needs a good initialization, and can become an
extremely large minimization problem.
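A minimal sketch of the reprojection error E(P, X) that bundle adjustment minimizes; D is taken as the Euclidean image distance, and minimizing over P and X (e.g. with scipy.optimize.least_squares) would be the actual bundle adjustment step starting from a good initialization such as the triangulated points.

```python
import numpy as np

def reprojection_error(P_list, X, x_obs):
    """E(P, X) = sum_i sum_j ||x_ij - proj(P_i X_j)||^2.
    P_list: m camera matrices (3x4), X: n x 4 homogeneous 3D points,
    x_obs: m x n x 2 observed image points."""
    E = 0.0
    for i, P in enumerate(P_list):
        proj = (P @ X.T).T                     # n x 3 homogeneous image points
        proj = proj[:, :2] / proj[:, 2:3]      # divide by the third coordinate
        E += np.sum((x_obs[i] - proj) ** 2)    # squared reprojection distances
    return E
```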