CS445: Computational Photography: Programming Project #5: Video Stitching and Processing
Overview
In this project, you will experiment with interest points, image projection, and videos. You will
manipulate videos by applying several transformations frame by frame. In doing so, you will explore
correspondence using interest points, robust matching with RANSAC, homography, and background
subtraction. You will also apply these techniques to videos by projecting and manipulating individual
frames. You can also investigate cylindrical and spherical projection and other extensions of photo
stitching and homography as bells and whistles.
The starter package includes an input video, extracted frames, and utilities for extracting frames and
creating videos from saved frames or numpy arrays.
To stitch two overlapping video frames together, you first need to map one image plane to the other.
To do that, you need to identify keypoints in both images, match between them to find point
correspondences, and compute a projective transformation, called a homography, that maps from one
set of points to the other. Once you have recovered the homography, you can use it to project all
frames onto the same coordinate space, and stitch them together to generate the video output. The
starter notebook includes most of auto_homography, which performs the steps of extracting SIFT
features, matching, and setting up RANSAC. You need to provide the parameters, the score function,
https://courses.engr.illinois.edu/cs445/fa2020/projects/video/ComputationalPhotograph_ProjectVideo.html 1/5
11/16/2020 https://courses.engr.illinois.edu/cs445/fa2020/projects/video/ComputationalPhotograph_ProjectVideo.html
and the homography estimation function. You may also want to experiment with the threshold used
for RANSAC and/or recompute the homography based on all inliers after RANSAC.
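As a rough illustration of the two pieces you are asked to supply, here is a minimal sketch of a homography estimator and an inlier-counting score function. The function names (compute_homography, project_points, score_homography) and the 1-pixel default threshold are assumptions made for this example, not the interface of the starter notebook. The estimator uses the direct linear transform, which is one standard choice, without the coordinate normalization you may want for better conditioning.

```python
import numpy as np

def compute_homography(pts_src, pts_dst):
    """Estimate a 3x3 H with pts_dst ~ H @ pts_src using the direct linear transform.

    pts_src, pts_dst: (N, 2) arrays of matching points, N >= 4.
    """
    pts_src = np.asarray(pts_src, dtype=float)
    pts_dst = np.asarray(pts_dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(pts_src, pts_dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The least-squares solution is the right singular vector
    # associated with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the scale; assumes H[2, 2] is not close to zero.
    return H / H[2, 2]

def project_points(H, pts):
    """Apply H to (N, 2) points using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def score_homography(H, pts_src, pts_dst, threshold=1.0):
    """Return (number of inliers, boolean inlier mask) for a candidate H."""
    err = np.linalg.norm(project_points(H, pts_src) - np.asarray(pts_dst, dtype=float), axis=1)
    inliers = err < threshold
    return int(inliers.sum()), inliers
```

Inside a RANSAC loop, you would call compute_homography on a random sample of four matches and keep the candidate with the highest score. Calling compute_homography once more on all inliers of the best candidate gives the recomputed estimate mentioned above.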
Check that your homography is correct by plotting four points that form a square in frame 270 and
their projections in each image, like this:
Include those images in your project report. Note that cv2.warpPerspective() takes the output size, not
coordinates, as input, i.e.:
img_warped = cv2.warpPerspective(img, H, (output_width, output_height))
To blend your images, you need to create a canvas (a blank image) that is large enough to display the
warped pixels of each original image. Then, for each pixel in the canvas, you apply your
homographies to find the corresponding coordinate in the source image and retrieve the color (or
interpolated color). Some of the pixels in frame 270 will correspond to negative coordinates in your
canvas. See the Tips document for a suggestion on how to deal with this.
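One common way to handle negative canvas coordinates (not necessarily the one in the Tips document) is to compose the homography with a translation that shifts everything into the positive quadrant. The sketch below assumes the reference image sits at the origin of its own coordinate system. The helper name canvas_transform is hypothetical.

```python
import numpy as np

def canvas_transform(H, src_w, src_h, ref_w, ref_h):
    """Return (T, (canvas_w, canvas_h)).

    H maps source-image coordinates into the reference image's coordinates.
    T is a translation such that T @ H (for the source image) and T (for the
    reference image) both land inside a canvas with non-negative coordinates.
    """
    # Project the four corners of the source image with H.
    corners = np.array(
        [[0, 0, 1], [src_w, 0, 1], [src_w, src_h, 1], [0, src_h, 1]], dtype=float
    )
    warped = corners @ H.T
    warped = warped[:, :2] / warped[:, 2:3]

    # The canvas must contain the warped source corners and the reference image.
    ref_corners = np.array([[0, 0], [ref_w, 0], [ref_w, ref_h], [0, ref_h]], dtype=float)
    all_pts = np.vstack([warped, ref_corners])
    x_min, y_min = np.floor(all_pts.min(axis=0))
    x_max, y_max = np.ceil(all_pts.max(axis=0))

    # Shift so that the top-left corner of the bounding box becomes (0, 0).
    T = np.array([[1, 0, -x_min], [0, 1, -y_min], [0, 0, 1]], dtype=float)
    return T, (int(x_max - x_min), int(y_max - y_min))
```

With T and the canvas size in hand, the source frame can be warped with cv2.warpPerspective(img, T @ H, canvas_size) and the reference frame with cv2.warpPerspective(ref, T, canvas_size), so both end up in the same canvas before blending.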
We provide a very simple blending function in utils.py that replaces zero pixels in one image with non-
zero pixels in another image of the same size. Better blending methods are bells and whistles. In this
part you will map frame 270 onto the reference image and produce an output like the
following.
The images and videos on this project page are down-sampled, but you need to produce a
full-resolution version.
As an example, to the right is a central portion of the background image produced by my code.
Add an unexpected object to the movie. Label the pixels in each frame as foreground or background.
An inserted object must appear in front of the background but behind the foreground. Also note that an
inserted object must appear fixed on the ground. Create a video that looks like the original video, with
the small difference that some objects have been inserted.
You can apply the seven parts of the main project on two other videos. You get 20 points for
processing one additional video and 40 points for processing two. If you do two additional videos, one
of them must be your own. You get full points if you produce the results from parts 4-6 for your
video.
Note that the camera position should not move in space, but it can rotate. You also need to have some moving
objects in the camera's view. Try to use your creativity to deliver something cool.
Mapping frame 90 to frame 450 directly is difficult because they share very little area. Therefore, you need to
perform a two-stage mapping using frame 270 as a guide. Compute one projection from 90 to 270 and one
from 270 to 450, and multiply the two homography matrices (the order of multiplication matters). This produces a
projection from 90 to 450 even though these frames have very little area in common.
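The order of multiplication follows from how the matrices act on column vectors: the transformation applied first sits on the right. A minimal sketch, in which the helper name chain_homographies is hypothetical and the matrices in the usage comment are placeholders for your own estimates:

```python
import numpy as np

def chain_homographies(H_a_to_b, H_b_to_c):
    """Compose two homographies into one that maps frame a directly to frame c.

    A point p in frame a is first mapped to frame b (H_a_to_b @ p) and then
    to frame c, so the combined matrix is H_b_to_c @ H_a_to_b.
    """
    H = H_b_to_c @ H_a_to_b
    # Rescale so the bottom-right entry is 1 (assumes it is not close to zero).
    return H / H[2, 2]

# Usage with your own estimates:
# H_90_to_450 = chain_homographies(H_90_to_270, H_270_to_450)
```

Swapping the arguments generally gives a different matrix, which is an easy mistake to spot by projecting a few test points through both versions.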
[Figure: Jaipur - Aligned]
In the video you produced in part 3, each pixel appears in several frames. You
need to estimate which of the many colors corresponds to the background. We
take advantage of the fact that the background color is fixed while the
foreground color changes frequently (because the foreground moves). For example,
a pixel on the street is gray. It can become red, green, white, or black,
each for a short period of time. However, it appears gray more often than any other
color.
For each pixel in the sequence of part 3, determine all valid colors (colors that
come from all frames that overlap that pixel). You can experiment with different
methods for determining the background color of each pixel, as discussed in class. Perform the same
procedure for all pixels and generate output. The output should be a completed panorama showing
only pixels of background or non-moving objects.
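As one possible method (an assumption for this example, not necessarily the one discussed in class), the per-pixel median over all valid observations tends to pick the color that occurs most of the time. The sketch below assumes the warped frames are already stacked in an array together with a boolean validity mask. The function name median_background is hypothetical.

```python
import warnings
import numpy as np

def median_background(frames, valid):
    """Estimate the background as the per-pixel median over valid observations.

    frames: (T, H, W, 3) array of frames warped into the panorama canvas.
    valid:  (T, H, W) boolean array, True where a frame actually covers a pixel.
    Pixels that no frame covers are set to 0.
    """
    # Mark invalid observations as NaN so that nanmedian ignores them.
    stack = np.where(valid[..., None], frames.astype(float), np.nan)
    with warnings.catch_warnings():
        # nanmedian warns about pixels with no valid observation; those stay NaN.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        background = np.nanmedian(stack, axis=0)
    return np.nan_to_num(background, nan=0.0)
```

A full-resolution stack of every frame may not fit in memory. Processing the panorama in horizontal strips, or using only every few frames, keeps the same logic with a smaller footprint.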
Your background image and foreground videos won't be perfect if you use a very simple method to
determine background color and to select foreground pixels. That's ok for the core project, but you
could try to do better. One solution is to model the color distributions of each background pixel and all
the foreground pixels. E.g. each background pixel has a Gaussian distribution with its own mean and a
small variance, and all the foreground pixels are modeled with a histogram or mixture of Gaussians.
Then, you can solve for the probability that the pixel in each frame belongs to foreground and
background and use that to create your foreground/background videos. Your final result will be new
foreground and background videos.
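A simplified sketch of this idea follows. It assumes an isotropic Gaussian in RGB for each background pixel and, as a stand-in for the histogram or mixture of Gaussians, a uniform distribution over all RGB colors for the foreground. The variance, the foreground prior, and the function name foreground_probability are illustrative assumptions that you would need to tune or replace.

```python
import numpy as np

def foreground_probability(frame, bg_mean, bg_var=25.0, prior_fg=0.1):
    """Posterior probability that each pixel is foreground, via Bayes' rule.

    frame, bg_mean: (H, W, 3) float arrays with values in [0, 255].
    bg_var:   assumed per-channel variance of the background color.
    prior_fg: assumed prior probability that a pixel is foreground.
    """
    diff2 = np.sum((frame - bg_mean) ** 2, axis=-1)
    # Log-likelihood under an isotropic 3D Gaussian centered on the background color.
    log_bg = -0.5 * diff2 / bg_var - 1.5 * np.log(2 * np.pi * bg_var)
    # Uniform foreground likelihood over the 256^3 RGB cube (a simplification).
    log_fg = -3.0 * np.log(256.0)
    p_bg = (1.0 - prior_fg) * np.exp(log_bg)
    p_fg = prior_fg * np.exp(log_fg)
    return p_fg / (p_fg + p_bg)
```

Thresholding the result at 0.5 gives a foreground mask for the frame. Replacing the uniform term with a histogram built from pixels already labeled as foreground would bring the sketch closer to the model described above.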
In Part 5 you created a background movie by projecting back the panorama background to each
frame plane. If you map a wider area you will get a wider background movie. You can use this
background movie to extend the borders of your video and make it wider. The extended video must
be at least 50% wider. You can keep the same height.
You can track camera orientation using the homography matrices for each frame. This allows you to
estimate and remove camera shake. Please note that camera shake removal for moving cameras is a
more difficult problem and is an active area of research in computational photography. One idea
(which we haven't tried and might not work) is to assume that camera parameters change smoothly
and obtain a temporally smoothed estimate for each camera parameter. A better but more
complicated method would be to solve for camera angle and focal length and smooth estimates for
those parameters.
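To make the first idea concrete, here is a minimal sketch that treats the nine entries of each normalized homography as the "camera parameters" and smooths them with a moving average over time. This is an assumption made for illustration only and, like the idea itself, is untested on real footage. The function name smooth_homographies and the window size are hypothetical.

```python
import numpy as np

def smooth_homographies(Hs, window=5):
    """Temporally smooth a sequence of homographies with a moving average.

    Hs: (N, 3, 3) array, one homography per frame.
    window: odd number of frames to average over.
    """
    Hs = np.asarray(Hs, dtype=float)
    # Normalize the scale so that entries are comparable across frames.
    Hs = Hs / Hs[:, 2:3, 2:3]
    pad = window // 2
    # Repeat the first and last matrices so the output has N entries.
    padded = np.pad(Hs, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    flat = padded.reshape(len(padded), 9)
    kernel = np.ones(window) / window
    smoothed = np.stack(
        [np.convolve(flat[:, k], kernel, mode="valid") for k in range(9)], axis=1
    )
    return smoothed.reshape(-1, 3, 3)
```

Averaging matrix entries is only a crude approximation of smoothing the underlying camera motion, which is why solving for camera angle and focal length is described as the better option.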
You can use the techniques from the first bells and whistles task to add more people to the street. You
can sample people from other frames that are a few seconds apart. You can alternatively show two
copies of yourself in a video. Please note that your camera needs some rotation.
Important Files
Starter code and materials
Tips and Python Samples
Report Template
Deliverables
To turn in your assignment, download/print your Jupyter Notebook and your report to PDF, and ZIP
your project directory including any supporting media used. See project instructions for details. The
Report Template (above) contains rubric and details of what you should include.