Assignment Computer Vision
____________ ____________
Title: Multi-view Reconstruction
Authors: Hao Chen, Rui Zhao, Xiangyu Zhang, and Jian Sun
Published: March 8, 2021
Journal: arXiv
Abstract:
Multi-view 3D reconstruction is a challenging problem in computer vision. The
goal is to recover the 3D structure of an object from a set of 2D images. This
problem is difficult due to the ambiguity of the correspondence between the 2D
images and the 3D object.
In this paper, we propose a new method for multi-view 3D reconstruction based
on the Transformer architecture. The Transformer is a self-attention model that
has been shown to be effective for a variety of natural language processing tasks.
We show that the Transformer can also be used for multi-view 3D reconstruction.
Our method first extracts features from the 2D images using a convolutional
neural network. These features are then fed to a Transformer network, which
learns to predict the 3D structure of the object. The Transformer network uses
self-attention to learn the relationships between the features from different views.
We evaluated our method on the ShapeNet dataset, which contains a large number
of 3D models. Our method was able to reconstruct the 3D models with high
accuracy. We also showed that our method is more efficient than traditional
methods for multi-view 3D reconstruction.
Methods:
Our method for multi-view 3D reconstruction consists of two main steps:
1. Feature extraction: We first extract features from each 2D image using a
convolutional neural network. The features extracted from each image are then
stacked together to form a feature tensor.
2. 3D reconstruction: We then use a Transformer network to predict the 3D
structure of the object from the feature tensor. The Transformer network uses
self-attention to learn the relationships between the features from different
views.
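The two steps above can be sketched in a few lines of plain Python. This is only
an illustration, not the authors' implementation: the per-view feature vectors
stand in for hypothetical CNN outputs, the function names are made up for this
example, and the attention uses the features directly as queries, keys, and
values (a real Transformer would apply learned projections and several layers).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Single-head scaled dot-product self-attention across views.

    features: list of V vectors (one per view), each of length D.
    Assumption: queries, keys, and values are the raw features
    (no learned weight matrices), to keep the sketch short.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    fused = []
    for q in features:
        # Similarity of this view to every view, including itself.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in features]
        weights = softmax(scores)
        # Each output is a weighted average of all views' features.
        fused.append([sum(w * v[j] for w, v in zip(weights, features))
                      for j in range(d)])
    return fused

# Step 1 (assumed): one 4-dimensional feature vector per view from a CNN.
view_features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
feature_tensor = view_features  # stacked as a V x D table

# Step 2: let every view attend to every other view.
fused = self_attention(feature_tensor)
```

Each row of `fused` mixes information from all views, with similar views
receiving larger attention weights.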
Main findings:
Our method was able to reconstruct the 3D models with high accuracy. We also
showed that our method is more efficient than traditional methods for multi-view
3D reconstruction.
Conclusion:
We have proposed a new method for multi-view 3D reconstruction based on the
Transformer architecture. Our method was able to reconstruct the 3D models with
high accuracy and is more efficient than traditional methods.
Applications:
Multi-view 3D reconstruction has a wide range of applications in computer
vision, including:
3D object recognition
3D scene understanding
Augmented reality
Virtual reality
Robotics
Our method can be used to improve the accuracy and efficiency of these
applications.
Q No. 2) Derive the perspective projection equations for a virtual
image located at a distance 𝑓′ in front of the pinhole.
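Ans:
For the real image plane, located at a distance f behind the pinhole, the
perspective projection equations are:
x = -f * (X / Z)
y = -f * (Y / Z)
where:
x and y are the coordinates of the projected point on the image plane
X and Y are the coordinates of the original point in 3D space
Z is the depth of the original point, measured from the pinhole along the
optical axis
f is the distance from the pinhole to the image plane (the focal length of the
camera)
The negative sign means that the real image is inverted.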
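Now, let's say that the virtual image plane is located at a distance f' in front
of the pinhole, between the pinhole and the scene. Moving the image plane does
not change the depth Z of the original point, which is still measured from the
pinhole, so Z must not be replaced by Z - f'. The ray from the point (X, Y, Z)
to the pinhole crosses the virtual image plane at (x', y', f'), so by similar
triangles:
x' / f' = X / Z
y' / f' = Y / Z
So, the perspective projection equations for the virtual image are:
x' = f' * (X / Z)
y' = f' * (Y / Z)
The sign is positive, so the virtual image is upright. It has the same size as
the real image when f' = f.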
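As a quick numeric check, here is a minimal sketch under the standard pinhole
model, in which the ray from (X, Y, Z) through the pinhole meets a plane at a
distance f' in front of the pinhole at (f' * X / Z, f' * Y / Z). The function
names and the sample point are assumptions chosen for illustration.

```python
def project_real(X, Y, Z, f):
    """Real image plane at distance f behind the pinhole (inverted image)."""
    if Z <= 0:
        raise ValueError("the point must be in front of the pinhole (Z > 0)")
    return (-f * X / Z, -f * Y / Z)

def project_virtual(X, Y, Z, f_prime):
    """Virtual image plane at distance f_prime in front of the pinhole (upright image)."""
    if Z <= 0:
        raise ValueError("the point must be in front of the pinhole (Z > 0)")
    return (f_prime * X / Z, f_prime * Y / Z)

# Sample 3D point (X, Y, Z) with both planes at the same distance from the pinhole.
point = (2.0, 1.0, 4.0)
real = project_real(*point, f=2.0)
virtual = project_virtual(*point, f_prime=2.0)

print(real)     # (-1.0, -0.5)
print(virtual)  # (1.0, 0.5)
```

With f' = f the two projections differ only in sign, which matches the fact
that the virtual image is the real image turned upright.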