Project Report
Course Code: CSE278
VisionSense: Real-Time Object Recognition on Android
By
Student Name: Bondhon Das
ID: 20CSE016
Session: 2020-2021
(This report is submitted in fulfilment of the requirements for the project of “Second Year Second Semester” in Computer Science and Engineering.)
By
Bondhon Das
ID: 20CSE016
(Second Year Second Semester)
Session: 2020-2021
Supervised By
Contents
1 Introduction
1.1 Project Objectives
1.2 Project Overview
1.3 Scope
2 Methodology
2.1 Camera Integration
2.2 TensorFlow Lite Model
2.3 Object Detection
2.4 Visualization
2.5 Real-Time Rendering
3 Implementation
3.1 Permissions
3.2 Camera Initialization
3.3 Model Loading
3.4 Real-Time Inference
3.5 Annotation Overlay
3.6 Display
4 Results
4.1 Performance
4.2 Accuracy
4.3 Application Screenshots
5 Conclusion
6 References
Abstract
The project “VisionSense: Real-Time Object Recognition on Android” leverages deep learning and mobile computing to enable real-time object detection on smartphones. It integrates the efficient and accurate SsdMobilenetV1 model using TensorFlow Lite, ensuring real-time performance without compromising accuracy. The user-friendly interface provides instant feedback on detected objects, enhancing user interaction. The project’s practical applicability spans augmented reality, image recognition, and security systems, offering innovative solutions to real-world challenges. By democratizing advanced computer vision capabilities on Android smartphones, the project empowers users and opens new avenues for intelligent applications. In summary, “VisionSense” marks a significant step forward in mobile computing and AI, redefining what is possible for object recognition on Android through its efficiency, accuracy, and practicality.
1 Introduction
This section introduces the motivation behind VisionSense and outlines the scope of the project.
1.3 Scope
In recent years, the field of computer vision has witnessed significant advancements, particularly
in the domain of object recognition. With the proliferation of smartphones equipped with high-
performance processors, cameras, and machine learning frameworks, there is immense potential
to bring these capabilities to handheld devices. VisionSense seeks to harness this potential by
developing a real-time object recognition system for Android smartphones. By enabling users
to detect and classify objects in real-time, VisionSense opens up new possibilities for intelligent
applications across diverse domains.
2 Methodology
The core methodology of VisionSense involves several key components, described in the following subsections.
2.2 TensorFlow Lite Model
The SsdMobilenetV1 model is loaded into the application using TensorFlow Lite, enabling efficient inference on mobile devices. TensorFlow Lite is optimized specifically for on-device machine learning (Edge ML), which makes it suitable for deployment to resource-constrained edge devices. Moving deep learning tasks such as object detection and image recognition onto the device itself (edge intelligence) avoids round trips to a server and keeps the captured frames on the phone.
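As an illustration, a minimal Kotlin sketch of loading the model with the TensorFlow Lite Task Library is shown below. The asset file name ssd_mobilenet_v1.tflite, the maximum result count, and the score threshold are assumptions for illustration, not values taken from the actual project.

import android.content.Context
import org.tensorflow.lite.task.vision.detector.ObjectDetector
import org.tensorflow.lite.task.vision.detector.ObjectDetector.ObjectDetectorOptions

// Load an SSD MobileNet V1 model bundled in the app's assets.
// File name and thresholds are illustrative assumptions.
fun createDetector(context: Context): ObjectDetector {
    val options = ObjectDetectorOptions.builder()
        .setMaxResults(5)          // keep only the five highest-scoring detections
        .setScoreThreshold(0.5f)   // discard low-confidence boxes
        .build()
    return ObjectDetector.createFromFileAndOptions(
        context, "ssd_mobilenet_v1.tflite", options
    )
}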
2.4 Visualization
Detected objects are visually represented on the live video feed using bounding boxes and
corresponding labels.
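A possible sketch of this step, assuming the Task Library detector from Section 2.2, is shown below: each detection’s bounding box and top label are drawn onto a copy of the frame with a Canvas. Paint colours and text sizes are illustrative choices rather than the project’s actual styling.

import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.detector.ObjectDetector

// Run the detector on one frame and draw boxes plus labels onto a mutable copy.
fun annotateFrame(detector: ObjectDetector, frame: Bitmap): Bitmap {
    val results = detector.detect(TensorImage.fromBitmap(frame))

    val annotated = frame.copy(Bitmap.Config.ARGB_8888, true)
    val canvas = Canvas(annotated)
    val boxPaint = Paint().apply {
        style = Paint.Style.STROKE
        strokeWidth = 4f
        color = Color.GREEN
    }
    val textPaint = Paint().apply {
        textSize = 36f
        color = Color.GREEN
    }

    for (detection in results) {
        val box = detection.boundingBox                     // RectF in input-image coordinates
        val top = detection.categories.firstOrNull() ?: continue // highest-scoring label
        canvas.drawRect(box, boxPaint)
        canvas.drawText(
            "${top.label} ${"%.2f".format(top.score)}",
            box.left, box.top - 8f, textPaint
        )
    }
    return annotated
}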
2.5 Real-Time Rendering
The processed video stream with overlaid annotations is rendered in real time on the device’s screen.
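One way to drive this per-frame loop, sketched below under the assumption that the preview is shown in a TextureView, is to react to onSurfaceTextureUpdated, which fires each time the camera delivers a new frame; the processFrame callback stands in for the app’s own detection-and-overlay pipeline.

import android.graphics.SurfaceTexture
import android.view.TextureView

// Build a listener that triggers the detection-and-overlay pipeline whenever the
// camera pushes a new preview frame into the TextureView.
fun makeFrameListener(processFrame: () -> Unit) = object : TextureView.SurfaceTextureListener {
    override fun onSurfaceTextureAvailable(surface: SurfaceTexture, width: Int, height: Int) {
        // the camera2 preview session can be attached to the surface here
    }
    override fun onSurfaceTextureSizeChanged(surface: SurfaceTexture, width: Int, height: Int) = Unit
    override fun onSurfaceTextureDestroyed(surface: SurfaceTexture): Boolean = true
    override fun onSurfaceTextureUpdated(surface: SurfaceTexture) = processFrame()
}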
3 Implementation
The implementation section provides an overview of the steps involved in developing the VisionSense application.
3.1 Permissions
The application requests the camera permission from the user to access the device’s camera hardware. The Android framework supports capturing images and video through the android.hardware.camera2 API or a camera Intent.
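A minimal sketch of the runtime permission check is given below, assuming an AndroidX Activity and the ActivityResultContracts API; the manifest must also declare the android.permission.CAMERA permission. startCamera() is a placeholder for the app’s own camera2 setup.

import android.Manifest
import android.content.pm.PackageManager
import android.os.Bundle
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat

class MainActivity : AppCompatActivity() {

    // Ask for the CAMERA permission and only start the camera once it is granted.
    private val cameraPermission =
        registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
            if (granted) startCamera() else finish() // bail out if the user declines
        }

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
            == PackageManager.PERMISSION_GRANTED
        ) {
            startCamera()
        } else {
            cameraPermission.launch(Manifest.permission.CAMERA)
        }
    }

    private fun startCamera() {
        // placeholder: open the camera via android.hardware.camera2 (omitted)
    }
}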
3.6 Display
The annotated video stream is displayed on the device’s screen using a combination of a TextureView and an ImageView.
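A sketch of this display step is shown below, reusing the hypothetical annotateFrame() helper from Section 2.4: the current preview frame is read from the TextureView, annotated, and then posted to the overlaying ImageView on the UI thread.

import android.view.TextureView
import android.widget.ImageView
import org.tensorflow.lite.task.vision.detector.ObjectDetector

// Grab the latest preview frame, annotate it, and show it in the overlay ImageView.
fun renderFrame(textureView: TextureView, overlay: ImageView, detector: ObjectDetector) {
    val frame = textureView.bitmap ?: return            // current frame, ARGB_8888
    val annotated = annotateFrame(detector, frame)      // sketch from Section 2.4
    overlay.post { overlay.setImageBitmap(annotated) }  // update on the UI thread
}

This is the kind of work the processFrame callback sketched in Section 2.5 would perform for each new frame.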
4 Results
VisionSense successfully achieves real-time object recognition on Android devices, providing
users with an intuitive interface for identifying objects in their environment. The application
demonstrates high performance and accuracy in detecting various objects, making it suitable
for a wide range of practical applications. Screenshots of the application are shown at the end of this section.
4.1 Performance
In terms of performance, VisionSense outperforms existing object recognition applications on
Android. Through efficient implementation and optimization techniques, VisionSense achieves
real-time object detection with minimal latency. The application utilizes the device’s hardware
resources effectively, ensuring a smooth and responsive user experience even on lower-end devices.
4.2 Accuracy
The accuracy of VisionSense’s object recognition capabilities is commendable. Leveraging state-
of-the-art machine learning models, VisionSense consistently achieves high detection accuracy
across various object categories. The application’s ability to accurately identify objects in
diverse environments enhances its utility for users across different use cases.
4.3 Application Screenshots
(a) Detecting a bicycle; (b) detecting a book; (c) detecting a laptop and keyboard.
5 Conclusion
In conclusion, “VisionSense” represents a significant advancement in the realm of mobile-based real-time object recognition, demonstrating the fusion of cutting-edge machine learning techniques with the convenience and ubiquity of Android devices. By harnessing the power of TensorFlow Lite and deploying the SsdMobilenetV1 model directly on mobile hardware, VisionSense showcases the practicality and efficiency of on-device AI inference. This approach not only reduces reliance on cloud-based processing but also enhances user privacy by keeping data localized.
The successful implementation of VisionSense underscores its potential to revolutionize various domains, including accessibility, augmented reality, and computer vision. In the realm of accessibility, VisionSense has the capacity to empower visually impaired individuals by providing them with instant object recognition capabilities, thereby enhancing their independence and quality of life. Moreover, in augmented reality applications, VisionSense can serve as a cornerstone for creating immersive experiences that seamlessly integrate virtual and real-world elements, opening up new avenues for entertainment, education, and commerce.
Furthermore, VisionSense holds immense promise in the field of computer vision, offering a powerful tool for tasks such as object tracking, scene understanding, and automated content tagging. Its ability to perform real-time inference directly on Android devices enables applications to respond swiftly to dynamic environments, making it suitable for a wide range of use cases, from industrial automation to interactive gaming.
6 References
• https://developer.android.com/media/camera/camera2
• https://www.tensorflow.org/lite/android/tutorials/object_detection