
Computer Vision 15 Exam Q and A

The document outlines the grading criteria and exam structure for a Computer Vision course at Utrecht University, emphasizing the importance of both practical assignments and written exams. It details the exam criteria, including theoretical and conceptual knowledge, types of questions, and preparation guidelines. Additionally, it lists key topics covered in the course, exam logistics, and encourages student feedback for course improvement.


COMPUTER VISION

2024 - 2025
EXAM Q&A
UTRECHT UNIVERSITY
RONALD POPPE
GRADING
Practical assignments: 60%, Written exam: 40%
• Retake only if exam grade is >= 4
• No assignment retakes!

To pass the course:


• Final score must be at least 5.5 to pass, and
• Minimum grade 4 for weighted average of assignments, and
• Minimum grade 4 for the exam.
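The pass rules above can be checked with a small Python sketch. It assumes the final score is the unrounded weighted average 0.6 × assignments + 0.4 × exam; the slide does not say how or when grades are rounded, so borderline cases may differ in practice.

```python
def final_score(assignment_avg: float, exam: float) -> float:
    """Weighted final score: 60% practical assignments, 40% written exam."""
    return 0.6 * assignment_avg + 0.4 * exam


def passes_course(assignment_avg: float, exam: float) -> bool:
    """All three pass conditions from the slide must hold."""
    return (
        final_score(assignment_avg, exam) >= 5.5  # final score at least 5.5
        and assignment_avg >= 4.0                 # weighted assignment average at least 4
        and exam >= 4.0                           # exam grade at least 4
    )


if __name__ == "__main__":
    print(passes_course(7.0, 4.0))  # True: 0.6*7 + 0.4*4 = 5.8 and both minima are met
    print(passes_course(9.0, 3.5))  # False: final score 6.8, but the exam is below 4
```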
EXAM CRITERIA
You will be graded on:
• Theoretical knowledge
• Conceptual knowledge/insight

Open questions
• Open questions to test understanding, often cross-topic
• Some explanation questions, some development questions

Multiple-choice questions
• Focus on insights
• Each question may have multiple correct options (or none)
EXAM CRITERIA (2)
Theoretical knowledge. Be able to explain:
• How a method works (dropout, voxel reconstruction)
• Different steps
• Input/output of each step
• Relevance of each step
• (Dis)advantages/limitations of the method

• Differences between methods (1- vs. 2-stage object detection)


• Relative (dis)advantages
EXAM CRITERIA (3)
Conceptual knowledge/insight:
• Why are things the way they are (why use batch norm, why use HSV)?
• Explain (dis)advantages/limitations
• Combinations/parallels between topics

• How would you address a certain problem?


• Step-by-step process
• Explain (pseudo-code or brief sentences) how it works
• You might be asked to write the pseudo-code for a problem
EXAM CRITERIA (4)
Typically:
• 4 MC questions (multiple answers possible)
• 5 open questions

How to answer:
• Concise: longer is not needed. But make sure all criteria are covered.
• Specific: I need to be sure (not have to guess) that you understood
EXAM CRITERIA (5)
I should be able to understand your answer just from the text
• No links, no references to slides, knowledge clips etc.

If you add irrelevant or incorrect information, I might deduct points


• Avoid “hitting all buttons”

Don’t use “vague” terms


• “much more”, “sometimes”, “almost always”, “better”
• “use an algorithm”
PREPARATION
Slides of the knowledge clips and lectures

Additional reading (for your own understanding):


• Links to books
• Links to websites
• Links to lectures

Insights that were gained while working on the assignments


PREPARATION (2)
In general, you should be able to:
• Understand each statement in a slide
• Explain it
• Give an example of how something should be applied
• Give an example of a case in which something does/doesn’t work

If you cannot do this, use the additional reading material!


You can also post your questions on Teams, in the exam preparation channel
TOPICS
1. Pixels, images, video
2. Camera geometry & Image formation
3. 3D computer vision
4. Learning-based computer vision
5. Performance measures
6. Neural networks & Backpropagation
7. Convolutional neural networks
8. Training CNNs
9. CNN architectures
10. Object detection and segmentation
11. Vision transformers
12. Vision language models
13. Image and video generation
14. Computer vision challenges
15. Exam Q&A
1. PIXELS, IMAGES, VIDEO
General background:
• Applications of CV
• Challenges in CV (in which applications are these important)
• Image/video data structure
• Color spaces and distances
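To see why the choice of color space matters for distances, the sketch below uses Python's standard `colorsys` module, assuming RGB channels as floats in [0, 1]. It compares Euclidean RGB distance with a hue-only distance that wraps around the color circle; that hue distance is one illustrative choice, not necessarily the distance used in the course.

```python
import colorsys
import math


def rgb_distance(c1, c2):
    """Euclidean distance between two RGB colors with channels in [0, 1]."""
    return math.dist(c1, c2)


def hue_distance(c1, c2):
    """Distance between the hues of two RGB colors, in [0, 0.5].

    Hue is circular (0 and 1 are both red), so the shorter way
    around the circle is used.
    """
    h1 = colorsys.rgb_to_hsv(*c1)[0]
    h2 = colorsys.rgb_to_hsv(*c2)[0]
    d = abs(h1 - h2)
    return min(d, 1.0 - d)


bright_red = (1.0, 0.0, 0.0)
dark_red = (0.5, 0.0, 0.0)     # same hue, half the brightness
pinkish_red = (1.0, 0.0, 0.1)  # hue just below 1.0, across the wrap-around

print(rgb_distance(bright_red, dark_red))     # 0.5: large change in RGB
print(hue_distance(bright_red, dark_red))     # 0.0: hue ignores brightness
print(hue_distance(bright_red, pinkish_red))  # about 0.0167 thanks to the wrap-around
```

A darker version of the same color is far away in RGB but identical in hue, which is one reason HSV is often preferred when illumination varies.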
2. CAMERA GEOMETRY AND IMAGE FORMATION
Camera geometry:
• Intrinsics/extrinsics/camera matrix: how to calculate (equations), what is each element
• Calibration: how does it work (algorithm), which are the important parameters, which
are the assumptions
• Experience from Assignment 1
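A minimal NumPy sketch of pinhole projection with the camera matrix P = K [R | t]. It assumes zero skew, ignores lens distortion, and uses made-up intrinsics (`fx`, `fy`, `cx`, `cy`) purely for illustration.

```python
import numpy as np


def intrinsic_matrix(fx, fy, cx, cy, skew=0.0):
    """3x3 intrinsic matrix K (focal lengths and principal point in pixels)."""
    return np.array([[fx, skew, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])


def camera_matrix(K, R, t):
    """3x4 camera matrix P = K [R | t]; the extrinsics map world to camera coordinates."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(P, points_world):
    """Project (N, 3) world points to (N, 2) pixel coordinates."""
    homogeneous = np.hstack([points_world, np.ones((len(points_world), 1))])
    x = homogeneous @ P.T          # (N, 3) homogeneous image coordinates
    return x[:, :2] / x[:, 2:3]    # divide by depth to get pixels


K = intrinsic_matrix(fx=800, fy=800, cx=320, cy=240)
R = np.eye(3)                      # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 5.0])      # world origin 5 units in front of the camera
P = camera_matrix(K, R, t)
print(project(P, np.array([[1.0, 0.5, 0.0]])))  # [[480. 320.]]
```

Calibration estimates exactly these quantities (K, and R, t per view) from known correspondences such as checkerboard corners.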

Camera radiometry:
• Sensors: how do they work, how do we measure color?
• Distortions: what are they and how/when do they occur?
3. 3D COMPUTER VISION
Depth from images:
• Which ways are there to get depth/3D from images?
• 3D reconstruction: Voxel vs. mesh models: (dis)advantages
• Silhouette-based reconstruction: how does it work (algorithm), look-up table,
what can we model (limitations), how to improve speed/memory
requirements, how to obtain a mesh model (algorithm)
• Experience from Assignment 2
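The core of silhouette-based (voxel) reconstruction can be sketched as follows. The look-up table stores, for every voxel and every camera, the pixel that voxel projects to, so it is computed once and reused for every frame; a voxel is kept only if it lands on the foreground in all views. This sketch assumes 3x4 camera matrices, binary silhouette masks, and voxels in front of every camera; the function names are illustrative.

```python
import numpy as np


def build_lookup_table(voxels, cameras, image_shape):
    """Precompute the pixel each voxel projects to in each camera.

    voxels: (N, 3) voxel centers in world coordinates.
    cameras: list of 3x4 camera matrices.
    Returns one (u, v, inside) triple per camera.
    """
    height, width = image_shape
    homogeneous = np.hstack([voxels, np.ones((len(voxels), 1))])
    table = []
    for P in cameras:
        x = homogeneous @ P.T
        u = np.round(x[:, 0] / x[:, 2]).astype(int)
        v = np.round(x[:, 1] / x[:, 2]).astype(int)
        inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
        table.append((u, v, inside))
    return table


def carve(table, silhouettes):
    """Keep voxels that project onto the foreground in every view."""
    on = np.ones(len(table[0][0]), dtype=bool)
    for (u, v, inside), mask in zip(table, silhouettes):
        foreground = np.zeros_like(on)  # projecting outside the image counts as background
        foreground[inside] = mask[v[inside], u[inside]]
        on &= foreground
    return on


# Toy example: a 4x4x4 grid and two affine "cameras" looking along z and along x.
grid = np.array([(x, y, z) for x in range(4) for y in range(4) for z in range(4)], dtype=float)
front = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)  # (u, v) = (x, y)
side = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)   # (u, v) = (z, y)
mask_front = np.zeros((4, 4), dtype=bool)
mask_front[1, 2] = True  # foreground pixel at v=1, u=2
mask_side = np.zeros((4, 4), dtype=bool)
mask_side[1, 3] = True   # foreground pixel at v=1, u=3

table = build_lookup_table(grid, [front, side], (4, 4))
print(grid[carve(table, [mask_front, mask_side])])  # [[2. 1. 3.]]
```

Because only the table lookups depend on the frame, the per-frame cost is a few array lookups per voxel per camera; memory grows with the number of voxels times the number of cameras.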

Background subtraction
• How does it work, equation, assumptions, challenges
• Experience from Assignment 2
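One common formulation, sketched here under the assumption of a per-pixel Gaussian background model (mean and standard deviation learned from frames without foreground), marks a pixel as foreground when |I(x, y) − μ(x, y)| > k · σ(x, y) in any channel. The threshold `k` and the lower bound `min_std` are illustrative choices, and the frames could be converted to HSV first. The underlying assumptions are a static camera and a background that does not change.

```python
import numpy as np


def learn_background(frames):
    """Per-pixel, per-channel mean and standard deviation of background frames."""
    stack = np.stack(frames).astype(np.float64)  # (T, H, W, C)
    return stack.mean(axis=0), stack.std(axis=0)


def foreground_mask(frame, mean, std, k=2.5, min_std=1.0):
    """Foreground where |frame - mean| > k * std in at least one channel.

    min_std avoids a zero threshold for pixels that never changed.
    """
    diff = np.abs(frame.astype(np.float64) - mean)
    threshold = k * np.maximum(std, min_std)
    return np.any(diff > threshold, axis=-1)


background = [np.full((4, 4, 3), 100), np.full((4, 4, 3), 102)]  # mean 101, std 1
mean, std = learn_background(background)

frame = np.full((4, 4, 3), 101)
frame[2, 2] = (200, 101, 101)  # a "person" appears at one pixel
print(np.argwhere(foreground_mask(frame, mean, std)))  # [[2 2]]
```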
4. LEARNING-BASED COMPUTER VISION
Common vision tasks: image classification vs. object detection
• Role of image descriptors, intra-class vs. inter-class
• Supervised classification: generalization, overfitting
• Unsupervised classification: clustering, K-means (algorithm)
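The K-means algorithm alternates two steps until the assignments stop changing: assign each sample to its nearest cluster center, then move each center to the mean of its assigned samples. A compact NumPy sketch, assuming random initialization from the data points and Euclidean distance:

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k clusters; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init: k random samples
    for _ in range(n_iter):
        # Assignment step: nearest center for every sample.
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned samples
        # (an empty cluster keeps its old center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels


points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans(points, k=2)
print(labels)  # the first three and the last three points share a label
```

The result depends on the initialization, which is why K-means is often run several times with different seeds.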

Object detection
• Sliding window, image pyramid, Selective Search
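Sliding windows and an image pyramid are usually combined: a fixed-size window is slid over progressively downscaled copies of the image, so the same window covers larger objects at coarser levels. This sketch only enumerates the candidate boxes, mapped back to original image coordinates; the window size, stride, and scale factor are illustrative parameters.

```python
def pyramid_windows(height, width, window=64, stride=32, factor=2.0):
    """Enumerate (x, y, w, h) boxes in original image coordinates.

    At each pyramid level the image is conceptually downscaled by `factor`;
    a fixed window on a smaller image corresponds to a larger box
    in the original image.
    """
    boxes = []
    scale = 1.0
    while height * scale >= window and width * scale >= window:
        level_h, level_w = int(height * scale), int(width * scale)
        for y in range(0, level_h - window + 1, stride):
            for x in range(0, level_w - window + 1, stride):
                # Map the window back to the original resolution.
                boxes.append((x / scale, y / scale, window / scale, window / scale))
        scale /= factor
    return boxes


boxes = pyramid_windows(128, 128, window=64, stride=64)
print(len(boxes))  # 5: four 64x64 windows at full resolution, one 128x128 at half
print(boxes[-1])   # (0.0, 0.0, 128.0, 128.0)
```

Each box would then be passed to the classifier; Selective Search replaces this exhaustive enumeration with a much smaller set of proposals.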
5. TRAINING, TESTING, AND PERFORMANCE MEASURES
Training
• Splits: training, validation, test sets
• Cross-validation, parameter tuning
• Hard negative mining (how to use), data augmentation (options, risks), data synthesis
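A minimal sketch of k-fold cross-validation index splitting: the training data is divided into k folds, and each fold serves once as the validation set while the model is trained on the rest. The held-out test set stays outside this loop entirely. Shuffling with a fixed seed is an assumption made for reproducibility.

```python
import numpy as np


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(indices, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


for train_idx, val_idx in k_fold_splits(10, k=5):
    print(len(train_idx), len(val_idx))  # 8 2 for each of the five folds
```

For parameter tuning, the validation score is averaged over the k folds for each setting, and only the best setting is evaluated once on the test set.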

Experiment design: parameter search: grid search, evolutionary optimization
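Grid search evaluates every combination of candidate hyperparameter values and keeps the best one. In this sketch, `evaluate` stands in for training a model and returning its validation score (e.g. averaged over cross-validation folds); the toy scoring function is hypothetical and made up for illustration.

```python
import itertools


def grid_search(param_grid, evaluate):
    """Return (best_params, best_score) over all combinations in param_grid."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical validation score that peaks at lr=0.01, batch_size=32.
def toy_validation_score(lr, batch_size):
    return -abs(lr - 0.01) * 100 - abs(batch_size - 32) / 32


grid = {"lr": [0.1, 0.01, 0.001], "batch_size": [16, 32, 64]}
best, score = grid_search(grid, toy_validation_score)
print(best)  # {'lr': 0.01, 'batch_size': 32}
```

The number of evaluations is the product of the list lengths (9 here), so it grows exponentially with the number of hyperparameters; this is what motivates alternatives such as evolutionary optimization.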

Performance measures:
• Precision/recall, F1, PR-curve, average precision
• Single vs. multiclass: confusion matrix
• Detection: IOU, non-maximum suppression, AP
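Several of these measures fit in a few lines. Boxes are assumed to be `(x1, y1, x2, y2)` corner tuples, and the 0.5 IoU threshold is a common convention rather than a fixed rule. Greedy NMS keeps the highest-scoring box and discards the remaining boxes that overlap it by more than the threshold, then repeats.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(iou(boxes[0], boxes[1]))  # 81 / 119, about 0.68
print(nms(boxes, scores))       # [0, 2]: box 1 overlaps box 0 too much
print(precision_recall_f1(tp=8, fp=2, fn=4))  # precision 0.8, recall about 0.667, F1 about 0.727
```

For detection, a prediction typically counts as a true positive only if its IoU with an unmatched ground-truth box exceeds the threshold; AP then summarizes the precision-recall curve obtained by varying the score threshold.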
6. NEURAL NETWORKS
Neurons:
• Activation functions, perceptrons, limitations, concepts

Networks:
• Feed-forward, hidden units, limitations, challenges, low vs. high-level
features, non-linearity
• Training neural networks: backpropagation (no equations)
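The classic perceptron limitation can be shown in a few lines: the perceptron learning rule finds a separating line for AND, but no single linear threshold unit can represent XOR, which is why hidden layers with non-linear activations are needed. A sketch assuming a step activation, a learning rate of 1 and a fixed number of epochs:

```python
import numpy as np


def train_perceptron(X, y, epochs=20):
    """Perceptron learning rule with a step activation; returns (w, b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            prediction = 1 if xi @ w + b > 0 else 0
            w += (target - prediction) * xi  # changes only on mistakes
            b += (target - prediction)
    return w, b


def accuracy(X, y, w, b):
    return np.mean([(1 if xi @ w + b > 0 else 0) == t for xi, t in zip(X, y)])


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(accuracy(X, [0, 0, 0, 1], *train_perceptron(X, np.array([0, 0, 0, 1]))))  # AND: 1.0
print(accuracy(X, [0, 1, 1, 0], *train_perceptron(X, np.array([0, 1, 1, 0]))))  # XOR: below 1.0
```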
7. CONVOLUTIONAL NEURAL NETWORKS
Overall architecture: layers, inputs, outputs
• Convolution, pooling, fully connected, flatten layer, output layer

Receptive field

For all layers:


• Number of connections, parameters, activation volume size, how to connect them
• Experience from Assignment 3
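The per-layer bookkeeping for a convolution layer follows a standard recipe. In this sketch the notation is an assumption: input W×H×D, `num_kernels` kernels of size F×F×D, stride S and zero padding P. The output width is (W − F + 2P)/S + 1 (likewise for height), the output depth equals the number of kernels, the parameters are one F×F×D kernel plus one bias per kernel, and connections are counted without the bias terms.

```python
def conv_layer_stats(W, H, D, F, num_kernels, S=1, P=0):
    """Output volume, trainable parameters and connections of a conv layer."""
    if (W - F + 2 * P) % S or (H - F + 2 * P) % S:
        raise ValueError("kernel does not tile the input evenly")
    W_out = (W - F + 2 * P) // S + 1
    H_out = (H - F + 2 * P) // S + 1
    params = num_kernels * (F * F * D + 1)  # shared weights + one bias per kernel
    connections = W_out * H_out * num_kernels * F * F * D  # each output sees F*F*D inputs
    return (W_out, H_out, num_kernels), params, connections


# Example: 32x32x3 input, 16 kernels of 5x5, stride 1, no padding.
print(conv_layer_stats(32, 32, 3, F=5, num_kernels=16))  # ((28, 28, 16), 1216, 940800)
```

For comparison, a fully connected layer between the same 3072 inputs and 12544 outputs would need 3072 × 12544 + 12544 ≈ 38.5 million parameters; local connectivity and weight sharing explain the difference.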
8. TRAINING CNNS
Loss functions: all those discussed
Learning rate: role, schedules
Optimizers: properties, (mini-batch) (stochastic) gradient descent, momentum
Regularization: L1/L2, batch normalization, dropout
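Several of these training components are short enough to sketch directly in NumPy: a step learning-rate schedule, mini-batch SGD with momentum and L2 weight decay, inverted dropout, and batch normalization in training mode. The constants (momentum 0.9, decay by 0.1 every 10 epochs) are illustrative, and libraries differ slightly in how they formulate momentum and weight decay.

```python
import numpy as np


def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Step schedule: multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, weight_decay=0.0):
    """One update; an L2 penalty 0.5*wd*||w||^2 adds wd*w to the gradient."""
    grad = grad + weight_decay * w
    velocity = momentum * velocity - lr * grad  # decaying sum of past steps
    return w + velocity, velocity


def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, rescale the rest by 1/(1-p)."""
    if not training:
        return x  # no dropout at test time
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch (rows), then scale and shift."""
    mean, var = x.mean(axis=0), x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


# Minimize f(w) = ||w||^2 (gradient 2w) with momentum and a constant learning rate.
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(100):
    w, v = sgd_momentum_step(w, 2 * w, v, lr=0.05)
print(np.linalg.norm(w))                               # close to 0 after 100 steps
print([step_lr(0.1, epoch) for epoch in (0, 10, 20)])  # about 0.1, 0.01, 0.001
```

At test time, batch normalization normally uses running averages of the mean and variance collected during training instead of the statistics of the current batch; `gamma` and `beta` are trainable parameters.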

Initialization: role (no equations)


Pretraining, fine-tuning
9. CNN ARCHITECTURES
Understand how the network works:
• AlexNet
• VGG

Inception: 1x1 convolution, auxiliary classifiers

ResNet/DenseNet: vanishing gradient problem, skip connections

Two-stream network: video input, late/mid-level fusion
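The parameter savings from the 1x1 convolutions in Inception can be checked with simple arithmetic: a 1x1 convolution first reduces the channel depth, so the expensive 5x5 convolution operates on fewer channels. The layer sizes below (256 channels, 5x5 kernels, a 64-channel bottleneck) are illustrative, not taken from a specific network.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per output channel for a kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


direct = conv_params(5, 256, 256)                               # 5x5 directly on 256 channels
bottleneck = conv_params(1, 256, 64) + conv_params(5, 64, 256)  # 1x1 reduce, then 5x5
print(direct, bottleneck)  # 1638656 426304
```

The bottleneck version needs roughly 3.8 times fewer parameters (and proportionally fewer multiply-adds) for the same output shape.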


10. OBJECT DETECTION AND SEGMENTATION
Architectures:
• R-CNN: architecture, way of training
• SPP: spatial pyramid pooling
• Fast R-CNN: way of training
• Faster R-CNN: anchor boxes, end-to-end training
• YOLOv1: architecture, output, one-stage vs. two-stage object detection
• Mask R-CNN: additional segmentation head

Multi-task learning: concept, architecture, assumptions, multi-dataset, adversarial outputs

Insights from working on Assignment 4


11. VISION TRANSFORMERS
Vision transformer:
• Global processing steps, inductive bias
• Input preparation: patchify, embedding, class token, position encoding
• Encoder: architecture, self-attention mechanism
• Output processing: role of class token output
• Training

No Swin transformer
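The ViT input preparation and a single self-attention head can be sketched with NumPy. All weights here are random placeholders (in a real ViT they are trained), the image size, patch size and embedding dimension are illustrative, and layer norm, the MLP, residual connections and multiple heads are left out.

```python
import numpy as np

rng = np.random.default_rng(0)


def patchify(image, patch):
    """Split an (H, W, C) image into (N, patch*patch*C) flattened patches."""
    H, W, C = image.shape
    patches = image.reshape(H // patch, patch, W // patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return patches.reshape(-1, patch * patch * C)


def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over the token rows of X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax per query token
    return weights @ V


image = rng.random((32, 32, 3))
patch, dim = 8, 16
tokens = patchify(image, patch)                    # (16, 192): a 4x4 grid of patches
E = rng.normal(size=(tokens.shape[1], dim))        # patch embedding matrix
class_token = rng.normal(size=(1, dim))            # class embedding
pos = rng.normal(size=(tokens.shape[0] + 1, dim))  # learned position embeddings

X = np.vstack([class_token, tokens @ E]) + pos     # (17, 16) encoder input
W_q, W_k, W_v = (rng.normal(size=(dim, dim)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape, out[0].shape)  # (17, 16) (16,): row 0 is the class token output
```

After the encoder blocks, only the class token's output row is passed to the MLP head for classification; every token attends to every other token from the first layer on, which reflects the weak inductive bias compared with a CNN.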
12. VISION-LANGUAGE MODELS
Aligning text and image:
• Importance and consequences

CLIP:
• Way of training, why to use it, zero-shot learning capabilities

Decoder:
• Architecture, process of outputting tokens, training options
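For the CLIP part of this topic, a NumPy sketch of the symmetric contrastive training loss and of zero-shot classification. In a batch of N matching image-text pairs, each image embedding should be most similar to its own caption's embedding and dissimilar to the other N − 1 captions, and vice versa. It assumes the embeddings are already computed by the two encoders and uses an illustrative fixed temperature (CLIP learns it during training).

```python
import numpy as np


def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cross_entropy(logits, targets):
    """Mean cross-entropy where row i should predict class targets[i]."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; pair i (image i, text i) is the positive."""
    logits = normalize(image_emb) @ normalize(text_emb).T / temperature  # (N, N)
    targets = np.arange(len(logits))
    return (cross_entropy(logits, targets) + cross_entropy(logits.T, targets)) / 2


def zero_shot_predict(image_emb, class_text_emb):
    """Pick the class whose text embedding (e.g. of 'a photo of a dog') is most similar."""
    return (normalize(image_emb) @ normalize(class_text_emb).T).argmax(axis=1)


rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
aligned_images = text + 0.01 * rng.normal(size=(4, 8))  # images close to their captions
shuffled_images = aligned_images[[1, 2, 3, 0]]           # images paired with wrong captions
print(clip_loss(aligned_images, text) < clip_loss(shuffled_images, text))  # True
print(zero_shot_predict(aligned_images, text))           # [0 1 2 3]
```

Zero-shot classification needs no extra training: the class names are turned into text prompts, embedded once, and compared with each image embedding.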
PRACTICE EXAMS
Five test “exams” online:
• 2015: NOT 4-6
• 2016: NOT 3-7
• 2018 test: Answers at the end (NOT 2, this is also not a complete exam)
• 2018: NOT 5-9
• 2021 test: NOT 2, 3, 6
• Last one is the most representative one
• A new one will be provided

Don’t rely on these materials to “guess” which questions will be asked


REQUESTED TOPICS
TRAINABLE PARAMETERS
Trainable parameters are learned during training

All neural networks:


• Fully connected layer: one weight between each input neuron and each output neuron, plus one bias per output neuron

Convolutional neural networks:


• Convolution layer: kernel weights + bias
TRAINABLE PARAMETERS (2)
Vision transformers:
Per image:
• Patch embedding matrix
• Positional encoding (when using learned encodings) (matrix)
• Initial class embedding (matrix)
• FC layers in the final MLP head

Per transformer block:


• Weight matrices WK, WQ, WV
• FC layers in MLP
• Weight matrix WO (for multi-head self-attention)
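Counting the trainable parameters makes this list concrete. The sketch assumes a standard ViT encoder block with embedding dimension `D`: WQ, WK, WV and WO as D×D matrices with biases (the heads together span D dimensions, so the head count does not change the total), an MLP with hidden size 4D, and two layer norms with a scale and a shift vector each. Layer norm is not on the slide's list, so treat that term as an assumption.

```python
def vit_block_params(D, mlp_ratio=4):
    """Trainable parameters in one (assumed standard) ViT encoder block."""
    attention = 3 * (D * D + D)  # WQ, WK, WV for all heads together, plus biases
    output_proj = D * D + D      # WO combining the heads, plus bias
    hidden = mlp_ratio * D
    mlp = (D * hidden + hidden) + (hidden * D + D)  # two FC layers
    layer_norms = 2 * (2 * D)    # two layer norms, scale and shift each
    return attention + output_proj + mlp + layer_norms


def vit_input_params(patch, channels, D, num_patches):
    """Patch embedding (+ bias), initial class embedding and learned position encodings."""
    patch_embedding = patch * patch * channels * D + D
    class_token = D
    positions = (num_patches + 1) * D  # one per patch plus one for the class token
    return patch_embedding + class_token + positions


# ViT-Base-like sizes: D = 768, 16x16 patches on a 224x224 RGB image (196 patches).
print(vit_block_params(768))              # 7087872 per block
print(vit_input_params(16, 3, 768, 196))  # 742656
```

With 12 such blocks the transformer blocks alone hold about 85 million parameters, which illustrates why a larger embedding size or more blocks quickly makes a ViT bigger.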
TRAINABLE PARAMETERS (3)
We distinguish between parameters and hyperparameters:
• Parameters: learned during training; they are part of the model
• Hyperparameters: govern the training process (e.g., learning rate)

Typically, we set hyperparameters ourselves before training

Bigger models usually have more parameters


• CNNs: more layers
• ViTs: larger embedding space, more transformer blocks
CONVOLUTION LAYERS
What are the inputs, parameters and outputs of a convolution layer?

Input: a volume WxHxD


• Each element in the input is a neuron

Output: a volume W’xH’xD’


• Again, each element is a neuron
CONVOLUTION LAYERS (2)
Trainable parameters are in the convolution kernels
• These determine what kind of patterns are extracted
• All kernel elements + a bias term

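Number of parameters is small relative to the input size
• It depends on the kernel size, input depth and number of kernels, not on the input width or height
• Consequence of weight sharing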
CONVOLUTION LAYERS (3)
Neurons in input volume are connected to neurons in output volume
• Not fully connected but locally connected

Example:
• Output s2 depends on x1-x3
• Multiplied by the kernel weights (yellow, black, blue in the figure)
• Output is weighted sum

• Output s3 depends on x2-x4
• Multiplied by the same weights (yellow, black, blue)
• Output is weighted sum
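The s2/s3 example can be written out directly as a 1D convolution with a 3-element kernel: the kernel values below are arbitrary stand-ins for the yellow, black and blue weights. Each output is a weighted sum of three neighbouring inputs, and the same kernel is reused at every position. As in CNN layers, this is technically cross-correlation (the kernel is not flipped).

```python
def conv1d(x, kernel):
    """'Valid' 1D convolution (cross-correlation, as in CNN layers), no padding.

    Output i is a weighted sum of x[i], x[i+1], x[i+2] using the SAME kernel
    weights at every position (weight sharing).
    """
    k = len(kernel)
    return [sum(w * x[i + j] for j, w in enumerate(kernel)) for i in range(len(x) - k + 1)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # x1 ... x5
kernel = [0.5, 1.0, -1.0]      # "yellow", "black", "blue" (arbitrary values)
print(conv1d(x, kernel))       # [-0.5, 0.0, 0.5]: s2 (from x1-x3), s3 (x2-x4), s4 (x3-x5)
```

Only 3 weights are trained no matter how long the input is, whereas a fully connected layer from 5 inputs to 3 outputs would need 15 weights.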
CONVOLUTION LAYERS (4)
Input and output layers are connected with shared weights
• The same weight values are reused for many connections
BACKPROPAGATION
The goal of backpropagation is to calculate the gradient of the loss for each parameter
• Gradient determines (with optimizer and learning rate) the change in
the value of the parameter (the weight update)

Developed for feed-forward neural networks


• Each output can be described as a (non-linear) function of the inputs
BACKPROPAGATION (2)
Simple mathematical formulation for a regular neural network at layer L:
• Input $a^L$
• Output $a^{L+1}$
• Weights between layer L and layer L+1: $W^L$
• Activation function $f^L$

Output calculated as: $a^{L+1} = f^L(W^L a^L)$

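BACKPROPAGATION (3)
At the last layer N, the output is $a^N$
During training, output $a^N$ is compared to the actual output $y$
• Loss function C used: $C(a^N, y)$
• C is higher when $a^N$ deviates further from $y$
[Figure: neurons $a_1^{N-1}$ and $a_2^{N-1}$ feed into the output $a^N$, which is compared with $y$ to give the loss $C$]

But $a^N$ is calculated from $W^{N-1}$ and $a^{N-1}$!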

BACKPROPAGATION (4)
Each input into $a^N$ contributes to the loss
• We can calculate the partial derivative of C with respect to each input of $a^N$

Determine how $W^{N-1}$ and $a^{N-1}$ should change to move in the right direction
• But $a^{N-1}$ depends on $W^{N-2}$ and $a^{N-2}$, so we also need to change those
• And so on, until we reach the input
BACKPROPAGATION (5)
In this backward pass, we have visited each parameter
• Each parameter received a “weight update” to move in the correct direction

But many parameters connect to multiple neurons
• Weight updates typically accumulated per parameter
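The backward pass can be made concrete for a tiny network following $a^{L+1} = f^L(W^L a^L)$ (biases omitted, as in that formulation). This NumPy sketch assumes a sigmoid hidden layer, a linear output layer and a squared-error loss $C = \frac{1}{2}\lVert a^N - y \rVert^2$. It computes the gradients from the output back to the input and checks them against central finite differences; the sizes and random values are arbitrary.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(W1, W2, x):
    a2 = sigmoid(W1 @ x)  # hidden layer: a2 = f1(W1 a1), with a1 = x
    a3 = W2 @ a2          # output layer with identity activation
    return a2, a3


def loss(W1, W2, x, y):
    return 0.5 * np.sum((forward(W1, W2, x)[1] - y) ** 2)


def backward(W1, W2, x, y):
    """Gradients of C with respect to W1 and W2, from the output back to the input."""
    a2, a3 = forward(W1, W2, x)
    dC_da3 = a3 - y                    # derivative of the squared-error loss
    dC_dW2 = np.outer(dC_da3, a2)      # each weight: upstream gradient times its input
    dC_da2 = W2.T @ dC_da3             # pass the gradient back through W2
    dC_dz2 = dC_da2 * a2 * (1.0 - a2)  # through the sigmoid: f'(z) = f(z)(1 - f(z))
    dC_dW1 = np.outer(dC_dz2, x)
    return dC_dW1, dC_dW2


def numeric_grad_W1(W1, W2, x, y, i, j, eps=1e-6):
    """Central finite-difference estimate of dC/dW1[i, j], for checking."""
    W_plus, W_minus = W1.copy(), W1.copy()
    W_plus[i, j] += eps
    W_minus[i, j] -= eps
    return (loss(W_plus, W2, x, y) - loss(W_minus, W2, x, y)) / (2 * eps)


rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=2)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
g1, g2 = backward(W1, W2, x, y)
print(g1[0, 0], numeric_grad_W1(W1, W2, x, y, 0, 0))  # the two values should agree closely
```

With a mini-batch, these per-sample gradients would be summed (or averaged) per parameter before the optimizer applies the update.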
BACKPROPAGATION (6)
For convolution layers, weights are shared
• Weight updates are accumulated over (many) different paths
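For a 1D convolution with output $s_i = \sum_j w_j x_{i+j}$, weight sharing means the gradient of one kernel weight is the sum of its contributions at every output position where it was used: $\partial C/\partial w_j = \sum_i (\partial C/\partial s_i)\, x_{i+j}$. A NumPy sketch, assuming the upstream gradients `dC_ds` are given:

```python
import numpy as np


def conv1d(x, w):
    """Valid 1D cross-correlation: s[i] = sum_j w[j] * x[i + j]."""
    n_out = len(x) - len(w) + 1
    return np.array([np.dot(w, x[i:i + len(w)]) for i in range(n_out)])


def conv1d_kernel_grad(x, dC_ds, kernel_size):
    """dC/dw[j] = sum_i dC/ds[i] * x[i + j]: contributions accumulated over all positions."""
    grad = np.zeros(kernel_size)
    for i, g in enumerate(dC_ds):         # every output position that used the kernel
        grad += g * x[i:i + kernel_size]  # accumulate, since the weights are shared
    return grad


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.5, 1.0, -1.0])
dC_ds = np.ones(len(x) - len(w) + 1)          # e.g. C = sum(s), so dC/ds = 1 everywhere
print(conv1d_kernel_grad(x, dC_ds, len(w)))   # [ 6.  9. 12.]
```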
BACKPROPAGATION (7)
For pooling layers, how the gradient is passed back to preceding layers depends on the pooling type

For max-pool, only a single value is selected


• Gradient of 1 for the selected (maximum) input
• Gradient of 0 for all other inputs

For average pooling, all inputs are used but gradient divided by the number of inputs
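The gradient routing for both pooling types can be sketched for non-overlapping 1D windows (window size equal to stride, an assumption made for simplicity): max pooling passes each upstream gradient only to the position that held the maximum, while average pooling spreads it evenly over the window.

```python
import numpy as np


def max_pool_backward(x, dC_dout, size):
    """Route each upstream gradient to the argmax of its window; others get 0."""
    grad = np.zeros_like(x, dtype=float)
    for k, g in enumerate(dC_dout):
        window = x[k * size:(k + 1) * size]
        grad[k * size + np.argmax(window)] = g  # local gradient 1 for the maximum
    return grad


def avg_pool_backward(x, dC_dout, size):
    """Every input in a window receives upstream gradient / window size."""
    return np.repeat(np.asarray(dC_dout, dtype=float) / size, size)


x = np.array([1.0, 3.0, 2.0, 0.0])  # two windows of size 2: [1, 3] and [2, 0]
upstream = np.array([10.0, 20.0])
print(max_pool_backward(x, upstream, 2))  # [ 0. 10. 20.  0.]
print(avg_pool_backward(x, upstream, 2))  # [ 5.  5. 10. 10.]
```

Pooling layers have no trainable parameters themselves; they only determine how gradients reach the layers before them.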
FINALLY…
ASSIGNMENT
Assignment 4:
• Deadline: Sunday March 30, 23:00
• Don’t underestimate the time required to prepare the outputs and implement the loss function

Need help?
• Use Teams for questions
EXAM
Monday April 7, 13:30-15:30, EDUC-Alfa
• Two hours (plus 20 minutes if you’re eligible for extra time)
• No materials or calculator allowed
• Just a pen and food/drinks

For people with the “minder massaal” (smaller exam room) provision
• Ruppert-029, 13:30-15:30
COURSE EVALUATION
I hope you have enjoyed the course!

Please give us feedback by filling in the Caracal course evaluation form. We always like to improve the course:
• If you have suggestions
• If you thought something was bad
• If you enjoyed something
FINALLY…
Good luck with the exam and final assignment!

And thanks for your enthusiasm!
