
Bangabandhu Sheikh Mujibur Rahman
Science and Technology University, Gopalganj

Project Report
Course Code: CSE278
VisionSense: Real-Time Object Recognition on
Android
By
Student Name: Bondhon Das
ID: 20CSE016
Session: 2020-2021

Department of Computer Science and Engineering


Bangabandhu Sheikh Mujibur Rahman Science and
Technology University
VisionSense: Real-Time Object Recognition on Android

(This report is submitted in fulfilment of the requirements for the project
of “Second Year Second Semester” in Computer Science and Engineering.)

By

Bondhon Das
ID: 20CSE016
(Second Year Second Semester)
Session: 2020-2021

Supervised By

Abu Bakar Muhammad Abdullah


Assistant Professor

Department of Computer Science and Engineering


Bangabandhu Sheikh Mujibur Rahman Science and Technology
University
Declaration

The project work entitled “VisionSense: Real-Time Object Recognition on Android” has
been carried out in the Department of Computer Science and Engineering, Bangabandhu
Sheikh Mujibur Rahman Science and Technology University; it is original and conforms
to the regulations of this University.
I understand the University’s policy on plagiarism and declare that no part of this project has
been copied from other sources or been previously submitted elsewhere for the award of any
degree or diploma.

Signature of the Candidate Signature of the Supervisor


Bondhon Das Abu Bakar Muhammad Abdullah
ID: 20CSE016 Assistant Professor
Date: 12.05.2024
Contents
Abstract

1 Introduction
1.1 Project Objectives
1.2 Project Overview
1.3 Scope

2 Methodology
2.1 Camera Integration
2.2 TensorFlow Lite Model
2.3 Object Detection
2.4 Visualization
2.5 Real-Time Rendering

3 Implementation
3.1 Permissions
3.2 Camera Initialization
3.3 Model Loading
3.4 Real-Time Inference
3.5 Annotation Overlay
3.6 Display

4 Results
4.1 Performance
4.2 Accuracy
4.3 Application Screenshots

5 Conclusion

6 References
Abstract
The project “VisionSense: Real-Time Object Recognition on Android” leverages deep learning
and mobile computing to enable real-time object detection on smartphones. It integrates the
efficient and accurate SsdMobilenetV1 model using TensorFlow Lite, ensuring real-time perfor-
mance without compromising accuracy. The user-friendly interface provides instant feedback
on detected objects, enhancing user interaction. The project’s practical applicability spans
augmented reality, image recognition, and security systems, offering innovative solutions to
real-world challenges. By democratizing advanced computer vision capabilities on Android
smartphones, the project empowers users and opens new avenues for intelligent applications.
In summary, “VisionSense” marks a significant step forward in mobile computing and AI,
redefining object recognition possibilities on Android with its efficiency, accuracy, and practicality.

1 Introduction
This section introduces the VisionSense project, outlining its objectives, overall design, and scope.

1.1 Project Objectives


The VisionSense project aims to develop a sophisticated Android application capable of real-
time object recognition using the device’s camera. Leveraging machine learning algorithms,
specifically TensorFlow Lite, the application can detect objects in real-time, annotate them
with bounding boxes and labels, and display the results to the user.

1.2 Project Overview


This report provides a detailed overview of the VisionSense project, including its objectives,
architecture, functionality, implementation details, challenges faced, and future prospects.

1.3 Scope
In recent years, the field of computer vision has witnessed significant advancements, particularly
in the domain of object recognition. With the proliferation of smartphones equipped with high-
performance processors, cameras, and machine learning frameworks, there is immense potential
to bring these capabilities to handheld devices. VisionSense seeks to harness this potential by
developing a real-time object recognition system for Android smartphones. By enabling users
to detect and classify objects in real-time, VisionSense opens up new possibilities for intelligent
applications across diverse domains.

2 Methodology
The core methodology of VisionSense involves several key components:

2.1 Camera Integration


VisionSense integrates with the Android camera API to capture live video frames from the
device’s camera. The Camera2 API provides ways to query the available extensions, configure
an extension camera session, and communicate with the Camera Extensions OEM library,
which allows the application to use extensions such as Night, HDR, Auto, Bokeh, or Face Retouch.

2.2 TensorFlow Lite Model
The SsdMobilenetV1 model is loaded into the application using TensorFlow Lite, enabling
efficient inference on mobile devices. TensorFlow Lite is specially optimized for on-device
machine learning (Edge ML), making it suitable for deployment to resource-constrained edge
devices. Edge intelligence, the ability to move deep learning tasks (object detection, image
recognition, etc.) from the cloud onto the device itself, reduces latency and keeps user data local.

2.3 Object Detection


Each frame captured from the camera is processed using the loaded model to detect objects
within the scene. Given an image or a video stream, an object detection model can identify
which of a known set of objects might be present and provide information about their positions
within the image.
The standard SSD MobileNet model, trained on the COCO dataset, can detect 80 object
categories, and these existing pretrained models can be used inside Android applications for
custom use cases to build smart mobile applications. The categories are:
1. Person 21. Elephant 41. Wine glass 61. Dining table
2. Bicycle 22. Bear 42. Cup 62. Toilet
3. Car 23. Zebra 43. Fork 63. TV
4. Motorcycle 24. Giraffe 44. Knife 64. Laptop
5. Airplane 25. Backpack 45. Spoon 65. Mouse
6. Bus 26. Umbrella 46. Bowl 66. Remote
7. Train 27. Handbag 47. Banana 67. Keyboard
8. Truck 28. Tie 48. Apple 68. Cell phone
9. Boat 29. Suitcase 49. Sandwich 69. Microwave
10. Traffic light 30. Frisbee 50. Orange 70. Oven
11. Fire hydrant 31. Skis 51. Broccoli 71. Toaster
12. Stop sign 32. Snowboard 52. Carrot 72. Sink
13. Parking meter 33. Sports ball 53. Hot dog 73. Refrigerator
14. Bench 34. Kite 54. Pizza 74. Book
15. Bird 35. Baseball bat 55. Donut 75. Clock
16. Cat 36. Baseball glove 56. Cake 76. Vase
17. Dog 37. Skateboard 57. Chair 77. Scissors
18. Horse 38. Surfboard 58. Couch 78. Teddy bear
19. Sheep 39. Tennis racket 59. Potted plant 79. Hair drier
20. Cow 40. Bottle 60. Bed 80. Toothbrush
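This report does not include the application’s source code, but the post-processing step that follows inference can be sketched in plain Python. The class IDs, scores, and the small label map below are illustrative stand-ins, not the model’s actual output tensors:

```python
# Illustrative sketch of SSD-style detector post-processing: keep detections
# whose confidence exceeds a threshold and map numeric class IDs to labels.
# The label map here is a small hypothetical subset of the full COCO list.

COCO_LABELS = {1: "person", 2: "bicycle", 3: "car", 44: "bottle", 64: "laptop"}

def filter_detections(class_ids, scores, threshold=0.5, labels=COCO_LABELS):
    """Return (label, score) pairs for detections above the threshold."""
    results = []
    for class_id, score in zip(class_ids, scores):
        if score >= threshold:
            results.append((labels.get(class_id, "unknown"), score))
    return results

# Example: three raw detections, one of which falls below the threshold.
print(filter_detections(class_ids=[1, 3, 44], scores=[0.91, 0.32, 0.75]))
# [('person', 0.91), ('bottle', 0.75)]
```

In the Android application, the same filtering is performed on the detector’s output before any annotations are drawn.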

2.4 Visualization
Detected objects are visually represented on the live video feed using bounding boxes and
corresponding labels.
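SSD-style models typically return bounding boxes in normalized [ymin, xmin, ymax, xmax] form, which must be scaled to the preview’s pixel dimensions before the boxes can be drawn. A minimal Python sketch of this conversion (the box and frame sizes are hypothetical, not taken from the application):

```python
# Illustrative sketch: convert a normalized [ymin, xmin, ymax, xmax] box
# (values in 0..1) into pixel coordinates for drawing on a video frame.

def to_pixel_box(norm_box, frame_width, frame_height):
    """Return (left, top, right, bottom) pixel coordinates for a frame."""
    ymin, xmin, ymax, xmax = norm_box
    return (round(xmin * frame_width), round(ymin * frame_height),
            round(xmax * frame_width), round(ymax * frame_height))

# A box covering the central quarter of a 640x480 preview frame:
print(to_pixel_box([0.25, 0.25, 0.75, 0.75], 640, 480))
# (160, 120, 480, 360)
```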

2.5 Real-Time Rendering
The processed video stream with overlaid annotations is rendered in real time on the device’s
screen.

3 Implementation
The implementation section provides an overview of the steps involved in developing the Vi-
sionSense application.

3.1 Permissions
The application requests camera permission from the user to access the device’s camera
hardware. The Android framework supports capturing images and video through the
android.hardware.camera2 API or a camera Intent.

3.2 Camera Initialization


Upon permission approval, VisionSense initializes the camera hardware and sets up a camera
capture session.

3.3 Model Loading


The SsdMobilenetV1 model and label file are loaded into memory using TensorFlow Lite.

3.4 Real-Time Inference


As each frame becomes available from the camera, VisionSense performs object detection in-
ference using the loaded model. Real-time inference refers to the process of running predictions
or making decisions using a machine learning model with minimal delay, typically within mil-
liseconds or microseconds.
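The real-time constraint can be made concrete with a small back-of-the-envelope sketch: at a given camera frame rate, each frame has a fixed time budget, and the average inference time must fit within it. A minimal Python illustration (the timing values below are hypothetical, not measurements from VisionSense):

```python
# Illustrative sketch: at a target frame rate, each frame has a budget of
# 1000 / fps milliseconds; if average inference time exceeds it, frames
# must be dropped or the input downscaled to stay real-time.

def frame_budget_ms(fps):
    """Per-frame time budget in milliseconds for a target frame rate."""
    return 1000.0 / fps

def meets_realtime(inference_times_ms, fps=30):
    """True if the average measured inference time fits the frame budget."""
    avg = sum(inference_times_ms) / len(inference_times_ms)
    return avg <= frame_budget_ms(fps)

# 30 fps leaves roughly 33.3 ms per frame; these sample timings fit.
print(meets_realtime([22.0, 25.5, 28.1], fps=30))
# True
```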

3.5 Annotation Overlay


Detected objects are annotated with bounding boxes and labels, which are overlaid onto the
live video feed.

3.6 Display
The annotated video stream is displayed on the device’s screen using a TextureView and Im-
ageView combination.

4 Results
VisionSense successfully achieves real-time object recognition on Android devices, providing
users with an intuitive interface for identifying objects in their environment. The application
demonstrates high performance and accuracy in detecting various objects, making it suitable
for a wide range of practical applications. Representative screenshots are shown in Section 4.3.

4.1 Performance
In terms of performance, VisionSense compares favourably with existing object recognition
applications on Android. Through efficient implementation and optimization techniques,
VisionSense achieves real-time object detection with minimal latency. The application utilizes
the device’s hardware resources effectively, ensuring a smooth and responsive user experience
even on lower-end devices.

4.2 Accuracy
The accuracy of VisionSense’s object recognition capabilities is commendable. Leveraging state-
of-the-art machine learning models, VisionSense consistently achieves high detection accuracy
across various object categories. The application’s ability to accurately identify objects in
diverse environments enhances its utility for users across different use cases.

4.3 Application Screenshots

(a) Detect Bicycle (b) Detect Book (c) Detect Laptop and Keyboard

5 Conclusion
In conclusion, “VisionSense” represents a significant advancement in the realm of mobile-based
real-time object recognition, demonstrating the fusion of cutting-edge machine learning tech-
niques with the convenience and ubiquity of Android devices. By harnessing the power of
TensorFlow Lite and deploying the SsdMobilenetV1 model directly on mobile hardware, Vi-
sionSense showcases the practicality and efficiency of on-device AI inference. This approach
not only reduces reliance on cloud-based processing but also enhances user privacy by keeping
data localized.
The successful implementation of VisionSense underscores its potential to revolutionize var-
ious domains, including accessibility, augmented reality, and computer vision. In the realm of
accessibility, VisionSense has the capacity to empower visually impaired individuals by pro-
viding them with instant object recognition capabilities, thereby enhancing their independence
and quality of life. Moreover, in augmented reality applications, VisionSense can serve as a

cornerstone for creating immersive experiences that seamlessly integrate virtual and real-world
elements, opening up new avenues for entertainment, education, and commerce.

Furthermore, VisionSense holds immense promise in the field of computer vision, offering a
powerful tool for tasks such as object tracking, scene understanding, and automated content
tagging. Its ability to perform real-time inference directly on Android devices enables applica-
tions to respond swiftly to dynamic environments, making it suitable for a wide range of use
cases, from industrial automation to interactive gaming.

Looking ahead, VisionSense is poised to inspire further innovation in mobile-based AI
applications, spurring the development of even more sophisticated models and intelligent systems.
As the capabilities of mobile hardware continue to evolve, VisionSense stands at the forefront
of a new era in which AI-driven solutions are seamlessly integrated into everyday mobile ex-
periences, enriching lives and transforming industries. With its robust methodology, tangible
outcomes, and far-reaching implications, VisionSense represents a paradigm shift in how we
perceive and interact with technology, setting the stage for a future defined by intelligent,
responsive, and accessible mobile applications.

6 References
• https://developer.android.com/media/camera/camera2
• https://www.tensorflow.org/lite/android/tutorials/object_detection
