0% found this document useful (0 votes)

31 views

Deep Learning Based Monocular Depth Estimation For Object Distance Inference in 2D Images

Monocular depth estimation, a process of predicting depth from a single 2D image, has seen significant advancements due to the proliferation of deep learning techniques. This research focuses on leveraging deep learning for monocular depth estimation to infer object distances accurately in 2D images. We explore various convolutional neural network (CNN) architectures and transformer models to analyze their efficacy in predicting depth information.

Uploaded by

International Journal of Innovative Science and Research Technology

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

31 views

Deep Learning Based Monocular Depth Estimation For Object Distance Inference in 2D Images

Uploaded by

International Journal of Innovative Science and Research Technology

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 5

Volume 9, Issue 4, April – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24APR1431

Deep Learning Based Monocular

Depth Estimation for Object Distance
Inference in 2D Images
G. Victor Daniel1 (Assistant Professor); Koneru Gnana Shritej2; Kosari Hemanth Sai3; Sunkara Namith4
1
Department of Artificial Intelligence, Anurag University, Hyderabad, India
2,3,4
U.G. Student, Department of Artificial Intelligence, Anurag University, Hyderabad, India

Abstract:- Monocular depth estimation, a process of transformer-based architectures, have demonstrated

predicting depth from a single 2D image, has seen remarkable capabilities in extracting intricate features from
significant advancements due to the proliferation of deep images, enabling significant advancements in monocular
learning techniques. This research focuses on leveraging depth estimation. These models can learn to infer depth by
deep learning for monocular depth estimation to infer recognizing patterns and contextual cues within the image,
object distances accurately in 2D images. We explore such as shading, texture gradients, and object relationships.
various convolutional neural network (CNN) The primary objective of this research is to investigate and
architectures and transformer models to analyze their compare the performance of various deep learning models in
efficacy in predicting depth information. Our approach the context of monocular depth estimation. We aim to
involves training these models on extensive datasets determine how different architectures and training strategies
annotated with depth information, followed by rigorous impact the accuracy and reliability of depth predictions. To
evaluation using standard metrics. The results achieve this, we utilize large-scale datasets annotated with
demonstrate substantial improvements in depth depth information, enabling the models to learn and
estimation accuracy, highlighting the potential of deep generalize effectively. This study also explores the
learning in enhancing computer vision tasks such as importance of training data diversity and augmentation
autonomous driving, augmented reality, and robotic techniques in enhancing model performance. By varying the
navigation. This study not only underscores the datasets and introducing different augmentation strategies,
importance of model architecture but also investigates the we seek to understand how these factors contribute to the
impact of training data diversity and augmentation robustness of depth estimation models. In the following
strategies. The findings provide a comprehensive sections, we provide a comprehensive review of related work,
understanding of the current state-of-the-art in detailing the evolution of monocular depth estimation
monocular depth estimation, paving the way for future techniques and the role of deep learning in this domain. We
innovations in object distance inference from 2D images. then describe our experimental setup, including the datasets
By providing a detailed analysis of various models and used, model architectures, and evaluation metrics. The results
their performance, this research contributes to a better section presents a detailed analysis of model performance,
understanding of monocular depth estimation and its highlighting key findings and insights. Finally, we discuss the
potential for real-world applications, paving the way for implications of our results for future research and practical
future advancements in object distance inference from 2D applications, and conclude with a summary of our
images. contributions and potential directions for further study.

Keywords:- Monocular Depth Estimation, Deep Learning, II. LITERATURE SURVEY

Convolutional Neural Network (CNN), Computer Vision,
Augmented Reality, Robotic Navigation. Masoumian et al. [1] conducted a comprehensive review
of monocular depth estimation using deep learning. The
I. INTRODUCTION authors discussed the advancements in this field and
highlighted the potential of deep learning models in
Monocular depth estimation, the task of determining accurately estimating depth from single images.
depth information from a single 2D image, is a fundamental
problem in computer vision with wide-ranging applications Höllein et al. [2] introduced Text2Room, a method for
in fields such as autonomous driving, augmented reality, and extracting textured 3D meshes from 2D text-to-image
robotics. Traditionally, depth estimation relied on stereo models. While not directly related to depth estimation, this
vision or multiple camera setups, which can be cost- work highlighted the significance of 3D representation for
prohibitive and complex to implement. However, the advent scene understanding, which is closely tied to monocular depth
of deep learning has opened new avenues for solving this estimation.
problem using a single camera, making it more feasible for a
variety of applications. Deep learning models, particularly
convolutional neural networks (CNNs) and more recently,

IJISRT24APR1431 www.ijisrt.com 3096

Volume 9, Issue 4, April – 2024 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24APR1431

Wang et al. [3] proposed a monocular 3D object A. Existing Systems

detection framework with depth from motion. The study Monocular depth estimation has been a subject of
demonstrated the potential of leveraging motion cues to extensive research over the years, with various approaches
improve depth estimation, indicating a direction for future developed to tackle the challenge of inferring depth from a
research in incorporating dynamic information for depth single 2D image. The existing systems can be broadly
inference. categorized into traditional methods and deep learning-based
methods.
Lian et al. [4] proposed MonoJSG, a joint semantic and
geometric cost volume for monocular 3D object detection.  Traditional Methods
This work highlighted the synergy between semantic and
geometric information in depth estimation, suggesting a  Structure from Motion (SfM):
multi-modal approach for enhancing depth prediction SfM techniques reconstruct 3D structures by analyzing
accuracy. the motion of objects across multiple frames of a video. By
tracking feature points across these frames, the relative
Sharma et al. [5] conducted a review of deep learning- motion between the camera and the objects can be used to
based human activity recognition on benchmark video estimate depth. While effective, these methods require
datasets. Although the focus was on activity recognition, the multiple images and are computationally intensive.
review shed light on the potential of leveraging temporal
information for depth estimation, offering a direction for  Shape from Shading (SfS):
future research in spatiotemporal modeling. SfS methods infer depth by analyzing the shading
patterns in an image, assuming known lighting conditions.
Samant et al. [6] presented a framework for deep These methods rely on the reflectance properties of surfaces
learning-based language models using multi-task learning in and often require complex optimization techniques to resolve
natural language understanding. While seemingly unrelated, ambiguities in depth perception.
this work provided insights into multi-task learning
paradigms, which could be adapted for jointly learning depth  Stereo Vision:
estimation along with related vision tasks. Stereo vision involves using two or more cameras to
capture different perspectives of the same scene. The
Chen et al. [7] discussed representation learning in disparity between the images is then used to compute depth.
multi-view clustering. The study emphasized the importance Although stereo vision can provide accurate depth estimates,
of holistic scene understanding through multi-view it necessitates precise camera calibration and
information, advocating for the integration of multi-view synchronization, increasing system complexity and cost.
cues in monocular depth estimation for comprehensive spatial
perception.  Deep Learning-Based Methods

III. PROBLEM STATEMENT  Convolutional Neural Networks (CNNs):

CNNs have been widely used for monocular depth
Accurately estimating depth from single 2D images estimation due to their ability to capture spatial hierarchies
using deep learning techniques is a fundamental challenge in and learn complex features. Pioneering works like Eigen et
computer vision with significant implications for various al.'s multi-scale deep network laid the foundation by
real-world applications. Traditional depth estimation predicting depth at multiple scales to capture both global and
methods, often reliant on stereo vision or multi-camera local features.
setups, pose inherent limitations in terms of complexity, cost,
and scalability. These constraints hinder the widespread  Encoder-Decoder Networks:
adoption of depth estimation technology in domains such as These networks, such as U-Net and Fully Convolutional
autonomous navigation, augmented reality, and robotics. Networks (FCNs), encode the input image into a latent
Addressing these challenges requires the development of a representation and then decode it to produce a dense depth
deep learning-based monocular depth estimation system map. They have shown significant improvements in depth
capable of achieving high accuracy, real-time performance, estimation accuracy.
and robustness across diverse environmental conditions.
Additionally, there is a pressing need for resource-efficient  Vision Transformers (ViTs):
models suitable for deployment on resource-constrained ViTs have been applied to depth estimation tasks,
platforms, such as embedded systems or mobile devices. demonstrating their ability to achieve competitive
Furthermore, ensuring the generalization of trained models to performance by capturing both local and global features in
unseen data and their adaptability to novel environments is the image.
critical for practical deployment in real-world scenarios.
 Hybrid CNN-Transformer Models:
These models combine the strengths of CNNs (for local
feature extraction) and transformers (for global context
modeling), resulting in robust depth estimation performance.

IJISRT24APR1431 www.ijisrt.com 3097

Volume 9, Issue 4, April – 2024 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24APR1431

B. Proposed System The proposed methodology involves several key steps

The proposed system leverages deep learning-based to achieve real-time object distance inference using deep
object detection for real-time monocular depth estimation to learning-based monocular depth estimation. The steps are as
infer object distances using a single 2D image from a follows:
webcam. The system integrates the YOLO (You Only Look
Once) object detection model with a web-based interface,  Model Initialization:
enabling real-time monitoring and interaction. The key The system initializes the YOLOv8 model, which has
components of the system include the YOLOv8 model for been pre-trained on a large dataset to recognize a variety of
object detection, a live video capture module using OpenCV, objects. The YOLOv8 model is selected for its balance
and a Flask-based web application for displaying the between speed and accuracy, making it suitable for real-time
processed video feed. The proposed system is designed to applications.
provide accurate and efficient object distance inference,
addressing the limitations of traditional depth estimation  Video Capture:
methods that often require stereo vision or multiple cameras. The system uses Open CV to access the default webcam
By using a single monocular camera, the system simplifies (denoted as 0) and capture live video frames. The video
the hardware requirements and broadens the range of capture runs continuously, providing a real-time feed to the
potential applications, including surveillance, autonomous object detection pipeline.
navigation, and augmented reality.
 Object Detection:
IV. PROPOSED METHODOLOGY Each frame captured from the webcam is processed by
the YOLO model to detect objects. The model outputs
bounding boxes, class labels, and confidence scores for the
detected objects. This step leverages the YOLOv8 model's
capability to perform rapid and accurate object detection.

 Distance Estimation:
For each detected object, the system calculates an
approximate distance based on the size of the bounding box
relative to the frame dimensions. The width of the bounding
box is used as an inverse indicator of the distance to the
camera. The approximate distance is computed using a
heuristic approach:

apx_distance=(1-width/frame_width)2

This approach assumes that larger objects in the frame

are closer to the camera, providing a simple yet effective
means of distance estimation.

 Frame Annotation:
The detected objects are annotated on the video frame
with bounding boxes, class labels, confidence scores, and
estimated distances. This information is overlaid on the video
feed, enabling real-time visualization of the detected objects
and their distances.

 Web Application:
A Flask web application serves the annotated video feed
to users. The web interface allows users to start and stop the
webcam feed and view the list of detected objects along with
their estimated distances. This interface provides an
accessible and interactive means of monitoring the system's
output.

 Thread Synchronization:
Threading is employed to handle video capture and
object detection concurrently, ensuring that the web interface
remains responsive. A threading lock is used to synchronize
access to shared resources, such as the list of detected objects,
to prevent race conditions and ensure data consistency.
Fig 1 Flowchart

IJISRT24APR1431 www.ijisrt.com 3098

Volume 9, Issue 4, April – 2024 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24APR1431

 REST API Endpoints:

The web application includes several REST API
endpoints:

 /video_feed: Streams the processed video feed with object

annotations.

 /toggle_webcam: Toggles the webcam feed on or off.

 /show_results: Returns a JSON response containing the

list of detected objects and their estimated distances.

These endpoints enable dynamic interaction with the

system, allowing users to control the webcam feed and access
detection results programmatically. Fig 4 Sample Output 3

V. RESULTS

The developed system was rigorously tested to evaluate

its performance in real-time object detection and distance
estimation using a monocular camera. The results
demonstrate the system's capability to accurately detect
objects and infer their distances, which are displayed through
a user-friendly web interface. The primary interface of the
system displays the live video feed from the webcam with
real-time annotations for detected objects. Each detected
object is enclosed in a bounding box, and relevant
information such as the object class and confidence score are
displayed. The approximate distance to each object,
Fig 5 Sample Output 4
calculated based on the size of the bounding box relative to
the frame dimensions, is also overlaid on the video feed. This
VI. CONCLUSION
setup provides immediate visual feedback on object detection
and distance estimation, making it useful for applications like
In this research, we have developed a deep learning-
surveillance and autonomous navigation.
based system for monocular depth estimation, enabling
accurate object distance inference from single 2D images.
The proposed system utilizes the YOLOv8 model for real-
time object detection, integrated with a robust methodology
for estimating distances based on bounding box dimensions.
By leveraging a single monocular camera, the system offers
a cost-effective and scalable solution suitable for various
applications, including autonomous navigation, augmented
reality, and surveillance. The experimental results
demonstrate the system's effectiveness in real-time scenarios,
showcasing its ability to detect multiple objects and
accurately estimate their distances. The user-friendly web
interface enhances accessibility and usability, providing clear
Fig 2 Sample Output 1 visual and textual feedback on detected objects and their
distances. This dual-mode presentation ensures that users can
easily interpret and utilize the information for practical
applications. Several key challenges were addressed in the
development of this system, including the need for high
accuracy, real-time performance, robustness across different
environments, and resource efficiency. The system's ability
to generalize well to diverse datasets and conditions
highlights its potential for deployment in a wide range of real-
world scenarios. Future work will focus on refining the
distance estimation algorithms, exploring the integration of
additional sensors to enhance accuracy, and optimizing the
system for deployment on mobile and embedded devices.
Additionally, extending the system to handle more complex
Fig 3 Sample Output 2

IJISRT24APR1431 www.ijisrt.com 3099

Volume 9, Issue 4, April – 2024 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24APR1431

scenes and dynamic environments will be a critical area of

further research. In conclusion, this research advances the
state-of-the-art in monocular depth estimation, providing a
viable solution for real-time object distance inference from
2D images. The developed system has significant potential to
enhance various applications in computer vision, contributing
to the development of smarter, more responsive technologies
in numerous fields.

REFERENCES

[1]. Masoumian, Armin., Rashwan, Hatem A.., Cristiano,

Julián., Asif, M. Salman., & Puig, D.. (2022).
Monocular Depth Estimation Using Deep Learning:
A Review. Sensors (Basel, Switzerland), 22.
http://doi.org/10.3390/s22145353
[2]. Höllein, Lukas., Cao, Ang., Owens, Andrew.,
Johnson, Justin., & Nießner, M.. (2023). Text2Room:
Extracting Textured 3D Meshes from 2D Text-to-
Image Models. 2023 IEEE/CVF International
Conference on Computer Vision (ICCV), 7875-7886.
http://doi.org/10.1109/ICCV51070.2023.00727
[3]. Wang, Tai., Pang, Jiangmiao., & Lin, Dahua. (2022).
Monocular 3D Object Detection with Depth from
Motion. ArXiv, abs/2207.12988. http://doi.org/
10.48550/arXiv.2207.12988
[4]. Lian, Qing., Li, Peiliang., & Chen, Xiaozhi. (2022).
MonoJSG: Joint Semantic and Geometric Cost
Volume for Monocular 3D Object Detection. 2022
IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), 1060-1069.
http://doi.org/10.1109/CVPR52688.2022.00114
[5]. Sharma, Vijeta., Gupta, Manjari., Pandey, A.., Mishra,
Deepti., & Kumar, Ajai. (2022). A Review of Deep
Learning-based Human Activity Recognition on
Benchmark Video Datasets. Applied Artificial
Intelligence, 36. http://doi.org/10.1080/08839514.
2022.2093705
[6]. Samant, R.., Bachute, M.., Gite, Shilpa., & Kotecha,
K.. (2022). Framework for Deep Learning-Based
Language Models Using Multi-Task Learning in
Natural Language Understanding: A Systematic
Literature Review and Future Directions. IEEE
Access, 10, 17078-17097. http://doi.org/10.1109/
ACCESS.2022.3149798
[7]. Chen, Mansheng., Lin, Jia-Qi., Li, Xiang-Long., Liu,
Bao-Yu., Wang, Changdong., Huang, Dong., & Lai,
J.. (2022). Representation Learning in Multi-view
Clustering: A Literature Review. Data Science and
Engineering, 7, 225-241. http://doi.org/10.1007/s
41019-022-00190-8

IJISRT24APR1431 www.ijisrt.com 3100

Crochet Axolotl Pattern
100% (8)
Crochet Axolotl Pattern
9 pages
As NZS 3679.1 (2016)
100% (1)
As NZS 3679.1 (2016)
57 pages
Test Bank For Introductory Statistics 9th Edition Prem S Mann
100% (56)
Test Bank For Introductory Statistics 9th Edition Prem S Mann
9 pages
Montvalley Short-Haul Lines - Case Study
No ratings yet
Montvalley Short-Haul Lines - Case Study
6 pages
Monocular Depth Estimation Based On Deep Learning An Overview
No ratings yet
Monocular Depth Estimation Based On Deep Learning An Overview
16 pages
Image caption Generation Research Paper-
No ratings yet
Image caption Generation Research Paper-
8 pages
A Survey On Image Denoising Techniques
No ratings yet
A Survey On Image Denoising Techniques
2 pages
3 - Deep Learning For Vision-Based
No ratings yet
3 - Deep Learning For Vision-Based
10 pages
Review of deep learning methods for remote sensing satellite images classification experimental survey and comparative analysis
No ratings yet
Review of deep learning methods for remote sensing satellite images classification experimental survey and comparative analysis
24 pages
TETC 1 Deep_Learning_for_Visual_Localization_and_Mapping_A_Survey
No ratings yet
TETC 1 Deep_Learning_for_Visual_Localization_and_Mapping_A_Survey
21 pages
Automatic Image Captioning Combining Natural Language Processing and
No ratings yet
Automatic Image Captioning Combining Natural Language Processing and
14 pages
Feart 11 1288003
No ratings yet
Feart 11 1288003
18 pages
Monocular Depth Estimation Based On Deep Learning: An Overview
No ratings yet
Monocular Depth Estimation Based On Deep Learning: An Overview
14 pages
JCTN Avinash Rohini 417 425
No ratings yet
JCTN Avinash Rohini 417 425
10 pages
Visualsentimentanalysis - Deeplearning - Applsci 12 01030 With Cover
No ratings yet
Visualsentimentanalysis - Deeplearning - Applsci 12 01030 With Cover
24 pages
Computer Vision 3
No ratings yet
Computer Vision 3
38 pages
Deep Feature Learning and Classification of Remote Sensing Images
No ratings yet
Deep Feature Learning and Classification of Remote Sensing Images
19 pages
Group 09
No ratings yet
Group 09
9 pages
Deep Learning in Object Detection: A Review: August 2020
No ratings yet
Deep Learning in Object Detection: A Review: August 2020
12 pages
Deep Residual Learning for Image Recognition 2
No ratings yet
Deep Residual Learning for Image Recognition 2
26 pages
Displays: Shaohua Qi, Xin Ning, Guowei Yang, Liping Zhang, Peng Long, Weiwei Cai, Weijun Li
No ratings yet
Displays: Shaohua Qi, Xin Ning, Guowei Yang, Liping Zhang, Peng Long, Weiwei Cai, Weijun Li
12 pages
E3sconf Iconnect2023 04032
No ratings yet
E3sconf Iconnect2023 04032
11 pages
2802 8020 1 PB
No ratings yet
2802 8020 1 PB
3 pages
10 1016@j Isprsjprs 2019 04 016
No ratings yet
10 1016@j Isprsjprs 2019 04 016
19 pages
Object Detection With Deep Learning: A Review
No ratings yet
Object Detection With Deep Learning: A Review
21 pages
A Survey of Multi-View Representation Learning
No ratings yet
A Survey of Multi-View Representation Learning
21 pages
2 - Multi Modality Medical Image Fusion Technique Using Multi Objective Differential Evolution Based Deep Neural Networks
No ratings yet
2 - Multi Modality Medical Image Fusion Technique Using Multi Objective Differential Evolution Based Deep Neural Networks
11 pages
Symmetry 14 02657
No ratings yet
Symmetry 14 02657
14 pages
Image Caption Generation Research Paper
No ratings yet
Image Caption Generation Research Paper
9 pages
Zhong Et Al. - 2017 - Learning To Diversify Deep Belief Networks For Hyperspectral Image Classification
No ratings yet
Zhong Et Al. - 2017 - Learning To Diversify Deep Belief Networks For Hyperspectral Image Classification
15 pages
SegContrast 3D Point Cloud Feature Representation Learning Through Self-Supervised Segment Discrimination
No ratings yet
SegContrast 3D Point Cloud Feature Representation Learning Through Self-Supervised Segment Discrimination
8 pages
A Deep Learning Paradigm For Medical Imagin - 2024 - Expert Systems With Applica
No ratings yet
A Deep Learning Paradigm For Medical Imagin - 2024 - Expert Systems With Applica
9 pages
1 s2.0 S0893608021002707 Main
No ratings yet
1 s2.0 S0893608021002707 Main
13 pages
Research Progress On Few-Shot Learning For Remote Sensing Image Interpretation
No ratings yet
Research Progress On Few-Shot Learning For Remote Sensing Image Interpretation
16 pages
IJRAR1DUP001
No ratings yet
IJRAR1DUP001
3 pages
A_Survey_of_Deep_Learning-Based_Object_Detection_M
No ratings yet
A_Survey_of_Deep_Learning-Based_Object_Detection_M
17 pages
Zhang 2018
No ratings yet
Zhang 2018
12 pages
Unet Segmentation Related Works Fin
No ratings yet
Unet Segmentation Related Works Fin
5 pages
Research Article
No ratings yet
Research Article
13 pages
23PD3 Ieee Paper
No ratings yet
23PD3 Ieee Paper
6 pages
smtggg
No ratings yet
smtggg
24 pages
Remote Sensing Urban Green Space Layout and Site Selection Based On Lightweight Expansion Convolutional Method
No ratings yet
Remote Sensing Urban Green Space Layout and Site Selection Based On Lightweight Expansion Convolutional Method
13 pages
Project List
No ratings yet
Project List
20 pages
gong2016
No ratings yet
gong2016
14 pages
Remote-Sensing_Image_Segmentation_Based_on_Implicit_3-D_Scene_Representation
No ratings yet
Remote-Sensing_Image_Segmentation_Based_on_Implicit_3-D_Scene_Representation
5 pages
Attention_and_Feature_Fusion_SSD_for_Remote_Sensing_Object_Detection
No ratings yet
Attention_and_Feature_Fusion_SSD_for_Remote_Sensing_Object_Detection
9 pages
1-s2.0-S0020025523003894-main
No ratings yet
1-s2.0-S0020025523003894-main
14 pages
A_Review_on_Classifications_of_Tracking_Systems_in
No ratings yet
A_Review_on_Classifications_of_Tracking_Systems_in
14 pages
Efficient Hybrid Tree-Based Stereo Matching With Applications To Postcapture Image Refocusing
No ratings yet
Efficient Hybrid Tree-Based Stereo Matching With Applications To Postcapture Image Refocusing
15 pages
Environmental Exploration and Monitoring of Vegetation Cover Using Deep Convolutional Neural Network in Gombe State
No ratings yet
Environmental Exploration and Monitoring of Vegetation Cover Using Deep Convolutional Neural Network in Gombe State
8 pages
3D CNN For Human Action Recognition: Conference Paper
No ratings yet
3D CNN For Human Action Recognition: Conference Paper
7 pages
Chen 2016
No ratings yet
Chen 2016
20 pages
Key paper_Robust Feature Matching for Remote Sensing Image Registration via Linear Adaptive Filtering
No ratings yet
Key paper_Robust Feature Matching for Remote Sensing Image Registration via Linear Adaptive Filtering
15 pages
DW & Caption Generator - Paper 1
No ratings yet
DW & Caption Generator - Paper 1
6 pages
Deepgaze Iii: Modeling Free-Viewing Human Scanpaths With Deep Learning
No ratings yet
Deepgaze Iii: Modeling Free-Viewing Human Scanpaths With Deep Learning
27 pages
Object Detection With DL
No ratings yet
Object Detection With DL
17 pages
Ijrpr Paper Templatev1
No ratings yet
Ijrpr Paper Templatev1
17 pages
SPIN-2021 Paper 610
No ratings yet
SPIN-2021 Paper 610
8 pages
Remotesensing 15 01187 v2
No ratings yet
Remotesensing 15 01187 v2
21 pages
High-Resolution Remote Sensing Image Captioning Based On Structured Attention
No ratings yet
High-Resolution Remote Sensing Image Captioning Based On Structured Attention
14 pages
fnins-18-1448517
No ratings yet
fnins-18-1448517
2 pages
Depth Estimation From A Single Image Using Deep Learned Phase Coded Mask
No ratings yet
Depth Estimation From A Single Image Using Deep Learned Phase Coded Mask
12 pages
Convolutional Neural Networks For Image Classification
No ratings yet
Convolutional Neural Networks For Image Classification
5 pages
Object Detection: Advances, Applications, and Algorithms
From Everand
Object Detection: Advances, Applications, and Algorithms
Fouad Sabry
No ratings yet
UML Modeling and Full-Stack Implementation of a Teleconsultation Platform with Real-Time Management of Patients and Medical Procedures
No ratings yet
UML Modeling and Full-Stack Implementation of a Teleconsultation Platform with Real-Time Management of Patients and Medical Procedures
13 pages
Comparative Study of Practical and Theoretical Approach in Teaching and Learning of Chemistry A Case Study of Government Technical College Kano
No ratings yet
Comparative Study of Practical and Theoretical Approach in Teaching and Learning of Chemistry A Case Study of Government Technical College Kano
6 pages
The Evolution of Luxury Tourism in Thailand: Trends and Consumer Behavior in the Hotel Industry by 2030
No ratings yet
The Evolution of Luxury Tourism in Thailand: Trends and Consumer Behavior in the Hotel Industry by 2030
12 pages
Investment Feasibility of Hydroponic Farming: Analysing the Return on Investment (ROI) Compared to Traditional Farming
No ratings yet
Investment Feasibility of Hydroponic Farming: Analysing the Return on Investment (ROI) Compared to Traditional Farming
4 pages
Recognizing and Addressing Mental Health Comorbidities in Hypertension Care Strategies: A Narrative Review
No ratings yet
Recognizing and Addressing Mental Health Comorbidities in Hypertension Care Strategies: A Narrative Review
9 pages
Mathematical Optimization of Vertical Farm Locations Advancing Sustainable Agriculture in Saudi Arabia
No ratings yet
Mathematical Optimization of Vertical Farm Locations Advancing Sustainable Agriculture in Saudi Arabia
26 pages
Exploratory Data Analysis for Banking
No ratings yet
Exploratory Data Analysis for Banking
5 pages
Computer-Assisted Lung Cancer Diagnosis through Morphological Analysis & CNN
No ratings yet
Computer-Assisted Lung Cancer Diagnosis through Morphological Analysis & CNN
7 pages
Machine Learning-Enhanced Models in Brain Tumors: A Mathematical and Computational Perspective
No ratings yet
Machine Learning-Enhanced Models in Brain Tumors: A Mathematical and Computational Perspective
4 pages
Conceptual Model on the Effect of Axial Load on Shallow Isolated Footings Resting on Clay Soil
No ratings yet
Conceptual Model on the Effect of Axial Load on Shallow Isolated Footings Resting on Clay Soil
6 pages
A Effectiveness of Multi-Intervention Programme Combining Benson's Relaxation Therapy and Counseling on Perceived Stress among Stroke Victims
No ratings yet
A Effectiveness of Multi-Intervention Programme Combining Benson's Relaxation Therapy and Counseling on Perceived Stress among Stroke Victims
7 pages
Predicting Employee Attrition using Machine Learning Techniques
No ratings yet
Predicting Employee Attrition using Machine Learning Techniques
10 pages
Corporate Social Responsibility as a Strategic Tool for Organisaeional Success in Corpoarate Financial Intermediation: Empirical Evidence from Rivers State, Nigeria
No ratings yet
Corporate Social Responsibility as a Strategic Tool for Organisaeional Success in Corpoarate Financial Intermediation: Empirical Evidence from Rivers State, Nigeria
9 pages
Assessment Tools and Gap Analysis on the Competencies Covered in Mathematics in Tupi Secondary High School
No ratings yet
Assessment Tools and Gap Analysis on the Competencies Covered in Mathematics in Tupi Secondary High School
12 pages
AI-Powered Inventory Management System: Revolutionizing Stock Monitoring with Real-Time Alerts & Visual Recognition
No ratings yet
AI-Powered Inventory Management System: Revolutionizing Stock Monitoring with Real-Time Alerts & Visual Recognition
12 pages
Real-Time Sign Language to Speech Translation using Convolutional Neural Networks and Gesture Recognition
No ratings yet
Real-Time Sign Language to Speech Translation using Convolutional Neural Networks and Gesture Recognition
5 pages
Learning-Based Intrusion Detection and Prevention System (LIDPS)
No ratings yet
Learning-Based Intrusion Detection and Prevention System (LIDPS)
10 pages
Phacoemulsification vs. Manual SICS: Which Poses a Higher Risk for Postoperative Dry Eye?
No ratings yet
Phacoemulsification vs. Manual SICS: Which Poses a Higher Risk for Postoperative Dry Eye?
5 pages
Comparative Study of Formulated Herbal Lozenges and AYURTUSS Lozenges
No ratings yet
Comparative Study of Formulated Herbal Lozenges and AYURTUSS Lozenges
6 pages
Exploring The Skin Lightening Potential of PADMAKA (Prunus cerasoides) In A Novel Face Serum
No ratings yet
Exploring The Skin Lightening Potential of PADMAKA (Prunus cerasoides) In A Novel Face Serum
8 pages
Mechanical Performance and Durability Evaluation of Self-Healing Polymers
No ratings yet
Mechanical Performance and Durability Evaluation of Self-Healing Polymers
5 pages
AI-Powered Local Crime Prediction
No ratings yet
AI-Powered Local Crime Prediction
6 pages
Design and Economic Analysis of Boil-Off Gas Recovery in LNG Facilities
No ratings yet
Design and Economic Analysis of Boil-Off Gas Recovery in LNG Facilities
11 pages
An EOQ Model for Deteriorating Item with Preservation Technology, Linear Holding Cost, and Multi-Variate Demand
No ratings yet
An EOQ Model for Deteriorating Item with Preservation Technology, Linear Holding Cost, and Multi-Variate Demand
6 pages
Extraction of Cu(II) Ions Using Chloroform Solution of 4,4 ́-(1E,1E ́)-1,1 ́-(Ethane-1,2- Diylbis(Azan-1-YL- 1ylidene))BIS(5-Methyl-2- Phenyl-2,3-Dihydro-1H-Pyrazol-3-OL) (H2BuEtP) Under the Influence of Acids, Anions and Complexing Agents
No ratings yet
Extraction of Cu(II) Ions Using Chloroform Solution of 4,4 ́-(1E,1E ́)-1,1 ́-(Ethane-1,2- Diylbis(Azan-1-YL- 1ylidene))BIS(5-Methyl-2- Phenyl-2,3-Dihydro-1H-Pyrazol-3-OL) (H2BuEtP) Under the Influence of Acids, Anions and Complexing Agents
10 pages
Impact of Nurse-Patient Ratios on Patient Outcomes in Acute Care Settings in Mogadishu, Somalia
No ratings yet
Impact of Nurse-Patient Ratios on Patient Outcomes in Acute Care Settings in Mogadishu, Somalia
7 pages
Cardio-Eye Connection: Retinal Eye Imaging for Heart Attack Risk Prediction
No ratings yet
Cardio-Eye Connection: Retinal Eye Imaging for Heart Attack Risk Prediction
6 pages
Evaluating The Impact of Partially Replacing Cement with Rice Husk Ash and Metakaolin on the Rheological Behavior and Mechanical Strength of Self-Compacting Concrete
No ratings yet
Evaluating The Impact of Partially Replacing Cement with Rice Husk Ash and Metakaolin on the Rheological Behavior and Mechanical Strength of Self-Compacting Concrete
19 pages
Promoting Sustainable Development through Waste Recycling: A Case Study of Green Entrepreneurship in Bo City, Sierra Leone
No ratings yet
Promoting Sustainable Development through Waste Recycling: A Case Study of Green Entrepreneurship in Bo City, Sierra Leone
11 pages
Machine Learning Approaches to Classification of Online Users by Exploiting Information Seeking Behaviours
No ratings yet
Machine Learning Approaches to Classification of Online Users by Exploiting Information Seeking Behaviours
6 pages
Besard-Ballet Violao Solo
No ratings yet
Besard-Ballet Violao Solo
9 pages
Tugas Uji Tarik Fix
No ratings yet
Tugas Uji Tarik Fix
255 pages
IBADAN MEDICAL FITNESS
No ratings yet
IBADAN MEDICAL FITNESS
1 page
feasib reviewer
No ratings yet
feasib reviewer
8 pages
ICOI Fellowship App 0418 RDR
No ratings yet
ICOI Fellowship App 0418 RDR
4 pages
MarcoCerri FloodsimulationusingHEC-RASmodel
No ratings yet
MarcoCerri FloodsimulationusingHEC-RASmodel
72 pages
The Capital Gate
No ratings yet
The Capital Gate
17 pages
Genetically Modified Organisms
No ratings yet
Genetically Modified Organisms
6 pages
A Study of Olmec Iconography
100% (5)
A Study of Olmec Iconography
98 pages
IB Biology Topic 5.1 Evidence For Evolution Flashcards - Quizlet
No ratings yet
IB Biology Topic 5.1 Evidence For Evolution Flashcards - Quizlet
6 pages
Off Grid Solar Systems
No ratings yet
Off Grid Solar Systems
11 pages
The PMO Provides Guidance and Structure To The Execution of Projects
No ratings yet
The PMO Provides Guidance and Structure To The Execution of Projects
2 pages
Cultures and disasters understanding cultural framings in disaster risk reduction Fred KrüGer - The ebook in PDF format is ready for immediate access
100% (1)
Cultures and disasters understanding cultural framings in disaster risk reduction Fred KrüGer - The ebook in PDF format is ready for immediate access
68 pages
Tos 2022-2023 Final G-7
No ratings yet
Tos 2022-2023 Final G-7
2 pages
What Tuberculosis Did For Modernism The Influence of A Curative Environment On Modernist Design and Ar
No ratings yet
What Tuberculosis Did For Modernism The Influence of A Curative Environment On Modernist Design and Ar
26 pages
Copy Control Delivery To Bill
No ratings yet
Copy Control Delivery To Bill
3 pages
Book 1
No ratings yet
Book 1
63 pages
Brand Guide Format
No ratings yet
Brand Guide Format
36 pages
2009 (DUHS-JSMU) Question Keys Research
No ratings yet
2009 (DUHS-JSMU) Question Keys Research
5 pages
Charger 2006
No ratings yet
Charger 2006
6 pages
The Partnership For Market Readiness: Shaping The Next Generation of Carbon Markets
No ratings yet
The Partnership For Market Readiness: Shaping The Next Generation of Carbon Markets
4 pages
index
No ratings yet
index
6 pages
Huawei Certification Written Exam Guide
No ratings yet
Huawei Certification Written Exam Guide
19 pages
Assignment 1 HRM
No ratings yet
Assignment 1 HRM
8 pages
Exercise 13-Chi Square-Assoc Group 6
No ratings yet
Exercise 13-Chi Square-Assoc Group 6
4 pages
PHILO Summative Module 3
No ratings yet
PHILO Summative Module 3
2 pages

Deep Learning Based Monocular Depth Estimation For Object Distance Inference in 2D Images

Uploaded by

Deep Learning Based Monocular Depth Estimation For Object Distance Inference in 2D Images

Uploaded by

Volume 9, Issue 4, April – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24APR1431

Deep Learning Based Monocular

Abstract:- Monocular depth estimation, a process of transformer-based architectures, have demonstrated

Keywords:- Monocular Depth Estimation, Deep Learning, II. LITERATURE SURVEY

IJISRT24APR1431 www.ijisrt.com 3096

Wang et al. [3] proposed a monocular 3D object A. Existing Systems

III. PROBLEM STATEMENT  Convolutional Neural Networks (CNNs):

IJISRT24APR1431 www.ijisrt.com 3097

B. Proposed System The proposed methodology involves several key steps

This approach assumes that larger objects in the frame

IJISRT24APR1431 www.ijisrt.com 3098

 REST API Endpoints:

 /video_feed: Streams the processed video feed with object

 /toggle_webcam: Toggles the webcam feed on or off.

 /show_results: Returns a JSON response containing the

These endpoints enable dynamic interaction with the

The developed system was rigorously tested to evaluate

IJISRT24APR1431 www.ijisrt.com 3099

scenes and dynamic environments will be a critical area of

[1]. Masoumian, Armin., Rashwan, Hatem A.., Cristiano,

IJISRT24APR1431 www.ijisrt.com 3100

You might also like