
UNIVERSITY OF SIALKOT

Academic Year : 2023 – 2024
Department : Computer Science
Program : MS Computer Science
Submitted by : Ali Imran Cheema
Submitted to : Dr. Jahanzeb
Reg No : 1230200412
Session : Spring 2023
Subject : Computer Vision
Total Credit Hours : 03
Assignment No : 1
Date of Submission : 11 / 06 / 2023

Student Sign ____________        Professor Sign ____________
Title: Multi-view Reconstruction

Authors: Hao Chen, Rui Zhao, Xiangyu Zhang, and Jian Sun
Published: March 8, 2021
Source: arXiv (preprint)

Abstract:
Multi-view 3D reconstruction is a challenging problem in computer vision. The
goal is to recover the 3D structure of an object from a set of 2D images. This
problem is difficult due to the ambiguity of the correspondence between the 2D
images and the 3D object.
In this paper, we propose a new method for multi-view 3D reconstruction based
on the Transformer architecture. The Transformer is a self-attention model that
has been shown to be effective for a variety of natural language processing tasks.
We show that the Transformer can also be used for multi-view 3D reconstruction.
Our method first extracts features from the 2D images using a convolutional
neural network. These features are then fed to a Transformer network, which
learns to predict the 3D structure of the object. The Transformer network uses
self-attention to learn the relationships between the features from different views.
We evaluated our method on the ShapeNet dataset, which contains a large number
of 3D models. Our method was able to reconstruct the 3D models with high
accuracy. We also showed that our method is more efficient than traditional
methods for multi-view 3D reconstruction.
Methods:
Our method for multi-view 3D reconstruction consists of two main steps:
Feature extraction: We first extract features from the 2D images using a
convolutional neural network. The features extracted from each image are then
stacked together to form a feature tensor.
3D reconstruction: We then use a Transformer network to predict the 3D
structure of the object from the feature tensor. The Transformer network uses self-
attention to learn the relationships between the features from different views.
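The summary does not give the paper's exact layer configuration, so the following is only a minimal PyTorch sketch of the two-step pipeline described above: a CNN backbone (ResNet-18 is an assumed choice) extracts one feature vector per view, a Transformer encoder applies self-attention across the views, and a small hypothetical head decodes a coarse occupancy volume. Dimensions, the backbone, and the output resolution are illustrative, not the paper's reported settings.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MultiViewTransformer(nn.Module):
    """Sketch of the two-step pipeline: per-view CNN features, then a
    Transformer encoder that relates the views via self-attention, then a
    small head that predicts a coarse 32^3 occupancy volume."""

    def __init__(self, d_model=256, n_heads=8, n_layers=4, vox=32):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # globally pooled features
        self.proj = nn.Linear(512, d_model)                        # 512 -> transformer width
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vox ** 3)                   # fused feature -> voxel logits
        self.vox = vox

    def forward(self, views):                                # views: (B, V, 3, H, W)
        B, V = views.shape[:2]
        feats = self.cnn(views.flatten(0, 1)).flatten(1)     # (B*V, 512), one vector per view
        tokens = self.proj(feats).view(B, V, -1)             # one token per view
        fused = self.encoder(tokens).mean(dim=1)             # self-attention across views, then pool
        logits = self.head(fused)                            # occupancy logits
        return logits.view(B, self.vox, self.vox, self.vox)

# Example: 2 objects, 4 views each, at 224x224 resolution
model = MultiViewTransformer()
print(model(torch.randn(2, 4, 3, 224, 224)).shape)   # torch.Size([2, 32, 32, 32])
```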
Main findings:
Our method was able to reconstruct the 3D models with high accuracy. We also
showed that our method is more efficient than traditional methods for multi-view
3D reconstruction.
Conclusion:
We have proposed a new method for multi-view 3D reconstruction based on the
Transformer architecture. Our method was able to reconstruct the 3D models with
high accuracy and is more efficient than traditional methods.
Applications:
Multi-view 3D reconstruction has a wide range of applications in computer
vision, including:
• 3D object recognition
• 3D scene understanding
• Augmented reality
• Virtual reality
• Robotics
Our method can be used to improve the accuracy and efficiency of these
applications.
Q No. 2) Derive the perspective projection equations for a virtual
image located at a distance f' in front of the pinhole.
Ans :
Place the pinhole at the origin O and let the optical axis coincide with the Z
axis, so a scene point is P = (X, Y, Z) with Z > 0. For the physical image
plane located at a distance f behind the pinhole, similar triangles give the
usual perspective projection equations:

x = -f * (X / Z)
y = -f * (Y / Z)

where:

• x and y are the coordinates of the projected point on the image plane
• X and Y are the coordinates of the original point in 3D space
• Z is the distance of the original point from the pinhole along the optical axis
• f is the distance from the pinhole to the image plane (the focal length)

The minus signs express the fact that the physical image is inverted.

Now place a virtual image plane at a distance f' in front of the pinhole, i.e.
on the same side of the pinhole as the scene. The ray from P through the
pinhole O crosses this plane at depth f', so the coordinates of P are simply
scaled by f'/Z and the projection equations become:

x = f' * (X / Z)

y = f' * (Y / Z)

These equations give the coordinates (x, y) of the projection of a point
P = (X, Y, Z) onto the virtual image plane at distance f' in front of the
pinhole. Because this plane lies in front of the pinhole, the minus signs
disappear and the image is upright, which is why this convention is commonly
used in computer vision.
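As a quick numerical check of these equations, here is a minimal NumPy sketch of the virtual-image projection derived above; the function name, array layout, and example values are my own illustrative choices.

```python
import numpy as np

def project_virtual_plane(points, f_prime):
    """Project 3D points onto a virtual image plane at distance f'
    in front of the pinhole (pinhole at the origin, optical axis = Z)."""
    points = np.asarray(points, dtype=float)
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    # x = f' * X / Z, y = f' * Y / Z  (no sign flip: the virtual image is upright)
    return np.stack([f_prime * X / Z, f_prime * Y / Z], axis=1)

# Example: a point at (2, 1, 10) projected onto a virtual plane at f' = 0.05
print(project_virtual_plane([[2.0, 1.0, 10.0]], 0.05))   # [[0.01  0.005]]
```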
Q No. 3) Give a geometric construction of the image P' of a point
P given the two focal points F and F' of a thin lens.
Ans : Here are the steps for constructing the image P' of a point P given the
two focal points F and F' of a thin lens:

• Draw a line to represent the optical axis and, perpendicular to it, the plane of the thin lens; mark the lens centre O where they meet.
• Mark the two focal points F and F' on the optical axis, with F on the same side of the lens as the object point P and F' on the opposite side.
• Draw a ray from P parallel to the optical axis. After it is refracted by the lens, this ray passes through F'.
• Draw a second ray from P through the centre O of the lens. A ray through the centre of a thin lens passes through undeviated. (Equivalently, a ray from P through F emerges from the lens parallel to the optical axis.)
• The point where the two refracted rays intersect is P'.

P' is the image of point P formed by the thin lens. A numerical check of the construction is sketched below.
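The graphical construction above locates P' by intersecting rays; the same image position can be checked numerically with the Gaussian thin-lens equation 1/d_o + 1/d_i = 1/f. The small sketch below assumes that convention (distances measured from the lens, f positive for a converging lens); the function name and example values are my own.

```python
def thin_lens_image(d_o, f):
    """Return (image distance d_i, lateral magnification m) for a thin lens.

    Gaussian thin-lens equation: 1/d_o + 1/d_i = 1/f
    d_o : object distance from the lens (positive on the object side)
    f   : focal length (positive for a converging lens)
    """
    if d_o == f:
        raise ValueError("object at the focal point: the image is at infinity")
    d_i = 1.0 / (1.0 / f - 1.0 / d_o)   # positive d_i -> real image behind the lens
    m = -d_i / d_o                      # negative m -> inverted image
    return d_i, m

# Example: object 30 cm from a lens with f = 10 cm
print(thin_lens_image(30.0, 10.0))   # (15.0, -0.5): real image 15 cm behind the lens, inverted, half size
```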

The End
