
Object Detection using Machine Learning

A PROJECT REPORT

Submitted by:
Shubh Pratap Singh (221B377)

Tanishq Saxena (221B408)

Tanmay Kushwaha (221B410)

Under the guidance of Dr. Amit Rathi (Supervisor)

May – 2025

Submitted in partial fulfillment for the award of the degree


of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING
Department of Computer Science & Engineering
JAYPEE UNIVERSITY OF ENGINEERING &
TECHNOLOGY, A-B ROAD, RAGHOGARH, DT. GUNA -
473226, M.P., INDIA
Declaration by the Students

We hereby declare that, to the best of our knowledge and belief, the work
reported in the B.Tech project entitled “Object Detection using Machine
Learning”, submitted in partial fulfillment for the award of the degree of
B.Tech (CSE) at Jaypee University of Engineering and Technology, Guna,
involves no infringement of intellectual property rights or copyright. In case
of any violation, we will be solely responsible.

Shubh Pratap Singh (221B377)


Tanishq Saxena (221B408)
Tanmay Kushwaha (221B410)

Department of Computer Science and Engineering


Jaypee University of Engineering and
Technology,
Guna, M.P., India

Date : 09/05/25
JAYPEE UNIVERSITY OF ENGINEERING & TECHNOLOGY
Accredited with Grade-A+ by NAAC & Approved U/S 2(f) of the UGC Act, 1956
A.B. Road, Raghogarh, District Guna (MP), India, Pin-473226
Phone: 07544 267051, 267310-14, Fax: 07544 267011
Website: www.juet.ac.in

CERTIFICATE

This is to certify that the work titled “Object Detection using Machine
Learning” submitted by Shubh Pratap Singh (221B377), Tanishq
Saxena (221B408) and Tanmay Kushwaha (221B410) in partial
fulfillment for the award of the degree of B.Tech of Jaypee University of
Engineering & Technology, Guna has been carried out under my
supervision. To the best of my knowledge and belief, there is no
infringement of intellectual property rights or copyright. Also, this
work has not been submitted partially or wholly to any other University
or Institute for the award of this or any other degree or diploma. In case
of any violation, the concerned students will be solely responsible.

Signature of Supervisor

Dr. Amit Rathi


Assistant Professor, Dept. of CSE
Date : 09/05/25
ACKNOWLEDGEMENT

We thank the Almighty for giving us the courage and perseverance to
complete this project. This project is itself an acknowledgement of all
those who extended their heartfelt cooperation in making it a success.

We are thankful to the project coordinator, Dr. Amit Rathi, for his
sincere and heartfelt guidance throughout this project work. Without his
supervision, guidance, and stimulating, constructive criticism, this
project would never have taken its present form. It is a pleasure to
express our deep and sincere gratitude to our project guide, Dr. Amit
Rathi, and we are profoundly grateful for his unmatched help.

Last but not least, we would like to express our deep sense of gratitude
and earnest thanks to our dear parents for their moral support and
heartfelt cooperation. We would also like to thank our friends, whose
direct or indirect help enabled us to complete this work successfully.

Shubh Pratap Singh (221B377)


Tanishq Saxena (221B408)
Tanmay Kushwaha (221B410)

Date : 09/05/25
Table of Contents
Title Page…............................................................................................................................................................i
Declaration of the Student....................................................................................................................................ii
Certificate of the Guide….....................................................................................................................................iii
Abstract..................................................................................................................................................................iv
Acknowledgement…..............................................................................................................................................v

Chapter – 1 Introduction
1.1 Overview of Budget Management
1.2 Problem Statement
1.3 Objectives
1.4 Scope of the Project
1.5 Features and Functionalities
1.6 Organization of the Report
1.7 Conclusion of Chapter 1

Chapter – 2 Literature Survey


2.1 Existing Expense Tracking Apps
2.2 Comparative Analysis
2.3 Gaps Identified
2.4 Conclusion of Chapter 2

Chapter – 3 Requirement Analysis


3.1 Functional Requirements
3.2 Non-Functional Requirements
3.3 Software and Hardware Requirements
3.4 Use-Case Diagram
3.5 Sequence Diagram
3.6 Class Diagram
3.7 Activity Diagram
3.8 Conclusion of Chapter 3

Chapter – 4 Design and Implementation


4.1 System Architecture
4.2 Technology Stack Overview
4.3 Frontend and Backend Design
4.4 Database Schema and Structure
4.5 Expense Management Flow
4.6 Data Visualization and Notifications
4.7 Testing Methodologies
4.8 Conclusion of Chapter 4
Chapter - 5 Feature Implementation and Functionalities
5.1 User Authentication and Dashboard
5.2 Image Input and Preprocessing
5.3 Object Detection
5.4 Text Extraction
5.5 Data Extraction and Structuring
5.6 Model Confidence and Accuracy
5.7 Real-Time Object Detection
5.8 Interactive Visualization
5.9 Training the Model
5.10 Summary of Implemented Features

Chapter – 6 Results and Conclusion


6.1 Key Features Achieved
6.2 Screenshots and Output Samples
6.3 Challenges Faced
6.4 Future Enhancements
6.5 Conclusion of Chapter 6

References
 Research papers, articles, websites, and other sources used in the project.
Chapter 1: Introduction

1.1 Overview of Budget Management


Budget management for the "Object Detection Using Machine Learning" project is a critical aspect
that ensures the project is delivered within financial constraints while meeting its objectives.
Effective budget planning and monitoring help allocate resources efficiently, anticipate financial
risks, and maintain project timelines.
1.1.1 Budget Planning and Allocation
The budget is initially planned based on the project scope, timeline, and resource requirements.
Key cost components include:
 Personnel Costs: Salaries or stipends for data scientists, machine learning engineers, software
developers, and project managers.
 Software and Tools: Licenses for machine learning frameworks, cloud computing services (e.g.,
AWS, Google Cloud), and annotation tools.
 Hardware Infrastructure: Costs for GPUs, storage, or other computing equipment if not using
cloud services.
 Data Acquisition and Annotation: Expenses related to collecting and labeling datasets,
especially if using third-party services or purchasing proprietary datasets.
 Training and Development: Workshops or training programs to upskill the team on advanced
object detection techniques.
 Contingency Fund: Reserved funds for unexpected expenses or delays.
1.1.2 Cost Optimization
To stay within budget, cost optimization strategies are implemented:
 Leveraging open-source tools and pre-trained models (e.g., YOLO, Faster R-CNN).
 Utilizing cloud credits or academic licenses for computing resources.
 Outsourcing non-core tasks to reduce personnel costs.

1.2 Problem Statement

In many real-world applications, such as autonomous driving, surveillance systems, medical


diagnostics, and industrial automation, the ability to accurately detect and identify objects within
images or video streams is crucial. Traditional computer vision techniques often struggle with
complex environments, varied lighting conditions, occlusions, and scalability issues. These
limitations hinder their effectiveness in dynamic and real-time settings.

The problem this project aims to address is the lack of an efficient, accurate, and scalable
solution for detecting and classifying objects in visual data using machine learning
techniques. Despite recent advancements in artificial intelligence, many existing object detection
systems are either computationally expensive, lack real-time capability, or perform poorly in
diverse and unstructured environments.

This project seeks to develop a robust machine learning-based object detection model that can
accurately identify and locate multiple objects within images, with a focus on improving
performance, reducing computational cost, and enhancing real-world applicability.

1.3 Objectives
 Develop an Accurate Object Detection Model
To design and train a machine learning model capable of accurately detecting and classifying
multiple objects within static images or video frames.

 Utilize State-of-the-Art Algorithms

To implement and evaluate modern object detection algorithms such as YOLO (You Only Look

Once), SSD (Single Shot Detector), and Faster R-CNN, and determine the most effective approach

based on accuracy and performance.

 Create a Robust and Scalable Solution

To build a model that maintains high performance across diverse datasets, including variations in

lighting, background, scale, and occlusion.

 Optimize for Real-Time Performance

To reduce inference time and computational overhead, enabling the system to operate in near real

time for applications such as surveillance or autonomous navigation.


 Automate Data Preparation and Annotation

To streamline the dataset preparation process by incorporating semi-automated data labeling tools

and preprocessing techniques to improve efficiency.

 Evaluate Model Performance Using Standard Metrics

To assess the model using metrics like Precision, Recall, mAP (mean Average Precision), and FPS

(Frames Per Second) to ensure its effectiveness in practical applications.

 Deploy a Usable Prototype

To develop a prototype application or interface that demonstrates the object detection system in a

real-world scenario, such as object tracking in video feeds or real-time image analysis.

1.4 Scope of the Project


The project "Object Detection Using Machine Learning" focuses on the design, development,
and evaluation of an intelligent system capable of identifying and localizing multiple objects
within images or video streams using advanced machine learning techniques. The scope of this
project includes the following key areas:

1) Data Collection and Preparation


a. Gathering image datasets from publicly available sources or through custom data
collection.
b. Preprocessing and annotating data to ensure it is suitable for training object
detection models.
2) Model Selection and Implementation
a. Researching and selecting appropriate object detection algorithms such as YOLO,
SSD, or Faster R-CNN.
b. Implementing these models using popular machine learning frameworks like
TensorFlow or PyTorch.
3) Training and Optimization
a. Training the selected model(s) using labeled datasets.
b. Performing hyperparameter tuning and optimization to improve accuracy and
reduce latency.
4) Testing and Evaluation
a. Evaluating model performance using standard metrics such as precision, recall,
mean Average Precision (mAP), and inference time.

b. Testing the model on unseen data to assess generalization and robustness.
5) Prototype Development
a. Creating a user-friendly prototype or interface that demonstrates real-time or batch
object detection capabilities.
b. Ensuring the prototype is deployable on standard computing hardware or cloud-
based platforms.

1.5 Features and Functionalities


Feature – Functionality Description

 Multi-Object Detection – Detects and classifies multiple objects within a single image or video frame.

 Real-Time Processing – Enables fast inference for real-time applications like surveillance and live monitoring.

 Bounding Box Visualization – Displays bounding boxes with labels and confidence scores for detected objects.

 Model Training & Customization – Allows training with custom datasets and fine-tuning of pre-trained models.

 Data Annotation & Management – Supports annotation tools and dataset preprocessing (e.g., augmentation, class balancing).

 Performance Metrics – Evaluates the model with metrics like Precision, Recall, mAP, and FPS.

 User Interface / Dashboard – Provides a simple interface for uploading data and viewing results in real time.

 Deployment Ready – Supports deployment on local systems or the cloud, and integration via APIs.

1.6 Organization of the Report

This report is structured into the following chapters:

 Chapter 2: Literature Survey – A review of existing personal finance tools and analysis of
their limitations.

 Chapter 3: Requirement Analysis – A detailed analysis of functional, non-functional,


software, and hardware requirements.

 Chapter 4: System Design and Implementation – An overview of the architecture,


technology stack, frontend/backend logic, and UI components.

 Chapter 5: Feature Implementation and Functionalities – Describes in detail all key


features implemented, including advanced tools like the chatbot, receipt scanning, and
predictive analytics.

 Chapter 6: Results and Conclusion – Presents system outcomes, screenshots, user


interface samples, challenges, and future improvement areas.

 References – A compilation of all scholarly articles, websites, and tools cited, in IEEE format.

1.7 Conclusion of Chapter 1

The project "Object Detection Using Machine Learning" addresses the growing need for
intelligent, automated systems capable of accurately identifying and locating objects within images
and video streams. By leveraging advanced machine learning algorithms such as YOLO, SSD, and
Faster R-CNN, the project aims to develop a robust and scalable object detection model that
performs well in real-world scenarios.

Through clearly defined objectives, the project focuses on building an accurate and efficient
detection system, optimizing it for real-time applications, and ensuring it is adaptable through
custom training and dataset management. Effective budget management ensures the optimal use of
resources, while the project scope outlines practical boundaries to keep the development focused
and achievable.

Chapter 2: Literature Survey

2.1 Existing Expense Tracking Apps


In the context of managing the budget for the "Object Detection Using Machine Learning"
project, several expense tracking applications can support effective financial planning, monitoring,
and reporting. These tools are essential for keeping the project within financial constraints, avoiding
overspending, and maintaining transparency.
1. Microsoft Excel / Google Sheets
 Type: Manual tracking
 Use Case: Widely used for creating custom budget templates, tracking daily expenses, and
generating financial reports.
 Advantages:
o Highly customizable
o Free and accessible
o Easy to use for small-to-medium-sized projects
 Limitations:
o Manual data entry
o No automation or real-time alerts
2. Trello with Budget Plugins
 Type: Project management tool with budget tracking integrations
 Use Case: Useful for combining task management with financial tracking by integrating
plugins like Costello.
 Advantages:
o Visual board-style planning
o Budget tracking tied to specific project tasks
 Limitations:
o Requires third-party integrations
o Less financial detail compared to dedicated expense tools
3. Expensify

 Type: Dedicated expense tracking app

 Use Case: Tracks individual and team expenses, ideal for recording equipment purchases,
software subscriptions, and travel.
 Advantages:
o Receipt scanning
o Real-time syncing and approvals
o Mobile app availability
 Limitations:
o Paid plans for full features
o May be more suitable for individual expense tracking than full project budgeting

2.2 Comparative Analysis

Table 2.1 Comparative Analysis of Existing Applications

From this comparison, it is evident that while several applications offer comprehensive features,
many are region-specific, overly complex for average users, or lack essential features such as
data export or customized alerts.

2.3 Gaps Identified


 Limited Access to Domain-Specific Annotated Data

The project lacks high-quality, labeled datasets tailored to the specific objects or environment it
targets.

 Insufficient Computational Resources

Lack of access to high-end GPUs or cloud infrastructure can hinder model training and testing
efficiency.

 Performance Limitations for Real-Time Applications

Some models (e.g., Faster R-CNN) may not meet real-time speed requirements without
optimization.

 No Integrated Budget Tracking System

Project expenses are not tracked in real-time, increasing the risk of overspending or poor financial
visibility.

 Lack of User Interface (UI)

Absence of a user-friendly interface limits usability for non-technical stakeholders or broader


demonstration.

 No Defined Deployment Strategy

The project does not outline how the model will be deployed (e.g., on web, mobile, or embedded
systems).

 Manual Data Annotation

Annotation of training data is time-consuming and may lead to inconsistency without automation
tools.
 Security and Privacy Concerns

No clear strategy for managing sensitive image data or ensuring compliance with privacy
standards like GDPR.

 Limited Cross-Platform Compatibility

The solution may not be optimized for multiple platforms, reducing its accessibility and
scalability.

2.4 Conclusion of Chapter 2

The development of the "Object Detection Using Machine Learning" project demonstrates
strong potential in addressing real-world challenges through intelligent automation. However, to
ensure the project is successful, scalable, and sustainable, a number of operational and technical
gaps must be addressed.

The review of existing expense tracking applications highlights that while there are several tools
available—such as Google Sheets, Expensify, Zoho Expense, and QuickBooks—each comes with
its own strengths and limitations. The comparative analysis shows that for research-driven, small-
to-medium-scale technical projects, a combination of customizable tools like Google Sheets and
automated platforms like Zoho Expense can provide an effective balance between control and
efficiency in budget management.

Despite the project’s strong technical foundation, several gaps have been identified, such as
limited access to domain-specific datasets, lack of real-time model performance, absence of a user
interface, manual data annotation, and no integrated expense tracking system. These issues can
negatively impact the accuracy, usability, and scalability of the final solution if not addressed.

Overall, bridging these gaps—through data automation, performance optimization, UI


development, and financial tracking integration—will not only enhance the project’s quality but
also ensure its readiness for real-world deployment and further innovation.

Chapter 3: Requirement Analysis

The requirement analysis for the project "Object Detection Using Machine Learning"
involves identifying and defining the technical, functional, and resource needs essential for
successful implementation. At the core, the project requires a clearly defined dataset
containing labeled images for training and validation, ideally sourced from real-world
environments relevant to the intended application (e.g., surveillance, traffic, industrial
settings). Software requirements include machine learning frameworks like TensorFlow or
PyTorch, along with tools for data annotation, model training, and visualization. Hardware
requirements may involve access to high-performance GPUs or cloud computing platforms to
handle intensive training workloads.
3.1 Functional Requirements

 Image and Video Input Handling

The system shall allow users to upload static images or video files for object detection.

The system shall support real-time video stream input (e.g., from a webcam or IP camera).

 Object Detection

The system shall detect and classify multiple objects in each image or video frame using a
trained machine learning model.

The system shall provide bounding boxes around detected objects along with the object class
and confidence score.

 Model Training and Evaluation

The system shall enable training of object detection models using labeled datasets.

The system shall evaluate model performance using metrics such as Precision, Recall, and
mAP (mean Average Precision).

 Dataset Management

The system shall allow users to import, manage, and annotate datasets.
The system shall support dataset preprocessing such as resizing, augmentation, and
normalization.

 User Interface

The system shall provide a graphical user interface (GUI) for image/video upload, detection
visualization, and result display.

The system shall display detection results with class labels and confidence scores in real
time.

 Result Export and Storage

The system shall allow users to export detection results (e.g., as images with bounding boxes
or text-based output).

The system shall store processed data and model outputs for later review.
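To illustrate the result-export and storage requirement above, the following minimal Python sketch writes detections to a JSON file for later review; the field names (label, confidence, box) are illustrative assumptions rather than a fixed output schema.

```python
# Hedged sketch: exporting detection results to JSON for later review.
# The detection dictionary layout is an assumption, not a required format.
import json
from datetime import datetime

def export_detections(detections, out_path):
    """Write a list of detections to a JSON file.

    Each detection is assumed to be a dict like:
    {"label": "person", "confidence": 0.91, "box": [x1, y1, x2, y2]}
    """
    record = {
        "exported_at": datetime.now().isoformat(),
        "num_objects": len(detections),
        "detections": detections,
    }
    with open(out_path, "w") as f:
        json.dump(record, f, indent=2)

# Example usage with a single illustrative detection
export_detections(
    [{"label": "car", "confidence": 0.88, "box": [34, 50, 210, 180]}],
    "detections.json",
)
```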

3.2 Non-Functional Requirements


 Performance
 The system shall provide object detection results with minimal latency, aiming for at
least 15–30 FPS in real-time mode.
 The model inference time should not exceed 200 milliseconds per frame for real-time
applications.
 Scalability
 The system shall be scalable to handle larger datasets and support retraining with
additional classes or data without significant redesign.
 It should support deployment on both local machines and cloud platforms (e.g., AWS,
Google Cloud).
 Usability
 The system shall provide a user-friendly interface accessible to users with minimal
technical background.
 It should include tooltips, documentation, or help guides for ease of use.
 Reliability and Availability
 The system shall maintain stable performance during long-running processes such as
training or real-time detection.

 It should include error handling to notify users of failures (e.g., failed uploads, model
loading errors).
 Maintainability
 The system codebase shall be modular and well-documented to allow for future updates,
bug fixes, or enhancements.
 Model configurations and parameters should be stored in an easily editable format (e.g.,
YAML or JSON).
 Portability
 The system shall be operable across different operating systems (e.g., Windows, Linux).
 It should support deployment in both GUI-based and command-line environments.
 Security and Privacy
 The system shall ensure that all user-uploaded data is stored securely and not shared
without consent.
 For deployments involving sensitive data, compliance with data protection standards
(e.g., GDPR) shall be enforced.

3.3 Software and Hardware Requirements


Software Requirements
 Operating System:
o Windows 10 or higher / Ubuntu 20.04+ / macOS (optional)
 Programming Language:
o Python 3.8 or above
 Libraries and Frameworks:
o TensorFlow or PyTorch
o OpenCV
o NumPy, Pandas, Matplotlib
o Scikit-learn (for additional ML tasks)
 Development Tools:
o Jupyter Notebook
o Visual Studio Code or PyCharm

 Annotation Tools:
o LabelImg, CVAT, or Roboflow for dataset labeling
 Package Managers:
o pip or conda
 Visualization & UI Tools:
o Streamlit, Flask, or Dash (for building web interfaces)
 Version Control:
o Git (with GitHub or GitLab)
 Optional:
o SQLite/PostgreSQL (for storing results or tracking data)
o Google Colab, AWS, or Google Cloud for training at scale
o Google Sheets API / Zoho Expense API (for expense tracking)

Hardware Requirements
 Processor (CPU):
o Minimum: Intel i5 or AMD Ryzen 5
o Recommended: Intel i7 or AMD Ryzen 7 (or higher)
 RAM:
o Minimum: 8 GB
o Recommended: 16 GB or more
 Storage:
o Minimum: 256 GB SSD
o Recommended: 512 GB SSD or more
 Graphics Processing Unit (GPU):
o Minimum: NVIDIA GTX 1050 or equivalent
o Recommended: NVIDIA RTX 2060 / RTX 30-series or better (for faster
training)
 Display & Peripherals:
o Standard monitor, keyboard, and mouse
o Optional webcam (for real-time video input testing)

3.4 Use-Case Diagram
Actors:
 User – The person uploading or providing images/videos for detection.
 System Administrator – Manages models and system configurations (optional).
 ML Model – The object detection model (considered as a system actor).

- Use Cases and Their Descriptions

Use Case – Description

 Upload Image/Video – User uploads image or video files that need to be processed by the object detection system.

 Detect Objects – The system uses the trained ML model to identify and label objects within the uploaded media.

 Display Results – Detected objects are displayed visually to the user, often with bounding boxes and labels.

 Download Results – User can download the output, including images/videos with annotations or a report of detected objects.

 Retrain Model – System Administrator can initiate retraining of the object detection model with new or updated datasets.

 View Logs – System Administrator reviews logs of user activities, model performance, and system health.

 Manage Dataset – Admin adds, removes, or updates the dataset used for training or evaluating the ML model.

Table 3.1 Use-Cases and their Descriptions

Figure 3.1 Use-Case Diagram of the System

3.5 Sequence Diagram

- Actors & Components:

 Mobile App/UI

 Camera Module

 Image Pre-Processing Module

 YOLO Object Detection Model

 Audio Generator

 Speaker

Figure 3.2 Sequence Diagram of the System

3.6 Class Diagram

Components of the Class Diagram


1. User Class
 Attributes:
o userID: String
o name: String
o email: String
 Methods:
o uploadMedia(): Uploads image or video to the system.
o viewResults(): Views the detection results.
o downloadResults(): Downloads the detection output.
2. Media Class
 Attributes:
o mediaID: String
o mediaType: String (e.g., image, video)
o uploadTime: DateTime
o filePath: String (location of the media)
o userID: String (link to User)
 Methods:
o getMediaInfo(): Returns metadata about the media.

3. DetectionResult Class
 Attributes:
o resultID: String
o mediaID: String (link to Media)
o detectedObjects: List of ObjectLabel (objects detected)
o detectionTime: DateTime
 Methods:
o generateReport(): Creates a summary of the detection results.

4. ObjectLabel Class
 Attributes:
o labelID: String
o labelName: String (e.g., "Car", "Person")
o confidenceScore: Float (probability of correct detection)
o boundingBox: String (coordinates of the detected object)

5. MLModel Class
 Attributes:
o modelID: String
o modelVersion: String

o accuracy: Float (model's detection accuracy)
 Methods:
o detectObjects(): Detects objects in an image/video.
o retrainModel(): Retrains the model with new data.

6. Admin Class
 Attributes:
o adminID: String
o name: String
o email: String
 Methods:
o manageDataset(): Modifies and manages the dataset for training.
o retrainModel(): Triggers the model retraining process.
o viewLogs(): Views logs of system activities and performance.

Figure 3.3 Class Diagram of the System

3.7 Activity Diagram


Components of the Activity Diagram

1. Start (Initial Node)

 The entry point of the workflow.

2. Data Collection

 Gather image/video data from various sources (datasets, sensors, cameras, etc.).

3. Data Preprocessing

 Resize images

 Normalize pixel values

 Annotate images (bounding boxes/labels)

 Data augmentation (optional)

4. Dataset Splitting

 Split data into:

 Training set

 Validation set

 Test set

5. Model Selection

 Choose a model architecture (e.g., YOLO, SSD, Faster R-CNN)

6. Model Training

 Train the model using training data

 Monitor loss and accuracy

7. Model Evaluation

 Evaluate performance on the validation/test data

 Metrics: mAP (mean Average Precision), IoU, accuracy

8. Model Optimization (Optional)

 Fine-tuning

 Hyperparameter tuning

 Pruning or quantization (if needed for deployment)

9. Model Deployment

 Integrate the model into an application (web app, mobile app, etc.)

 Backend deployment (cloud, edge devices)

10. Real-time Object Detection (Optional)

 Detect objects in real-time from camera input or video feed

11. End (Final Node)

 Marks the completion of the process.

Figure 3.4 Activity Diagram of the System

3.8 Conclusion of Chapter 3

This project focuses on building an efficient object detection system using machine learning,
illustrated through a detailed activity diagram. The process begins with data collection and
preprocessing, ensuring high-quality, annotated data for training. The workflow continues with
model selection, training, and evaluation, using advanced architectures like YOLO, SSD, or
Faster R-CNN to achieve accurate object recognition.

The activity diagram provides a clear visual representation of the project's lifecycle, including key
decision points for model optimization and real-time detection deployment. This structured
approach helps streamline development, improve team collaboration, and ensure robust
performance across different environments.

In conclusion, the combination of a well-defined activity diagram and systematic implementation


makes this object detection project practical, scalable, and adaptable to real-world applications such
as surveillance, autonomous vehicles, and smart devices.

Chapter 4: Design and Implementation

4.1 System Architecture

1. Data Collection Layer

 Image/Video Sources:

o Datasets (e.g., COCO, Pascal VOC, custom data)

o Cameras (for real-time detection)

o Web scraping / APIs

 Storage:

o Cloud storage (AWS S3, Google Cloud Storage)

o Local storage (file system)

2. Data Preprocessing Layer

 Image Annotation:

o Tools: LabelImg, CVAT, VIA

o Output format: COCO JSON, Pascal VOC XML, YOLO TXT

 Data Augmentation:

o Techniques: Rotation, flipping, scaling, cropping, color jitter

 Normalization:

o Resizing to fixed input shape

o Pixel normalization

3. Model Training Layer

 Model Selection:
o Pre-trained models: YOLOv5/8, SSD, Faster R-CNN, EfficientDet

o Frameworks: TensorFlow, PyTorch, Keras, OpenCV

 Training Pipeline:

o Dataset splitting: Train/Validation/Test

o Loss functions: Classification + Localization loss

o Hyperparameter tuning

 Compute Infrastructure:

o GPU/TPU clusters

o Cloud ML platforms (AWS SageMaker, Google AI Platform)
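As a concrete illustration of this training pipeline, the sketch below fine-tunes a pre-trained YOLOv8 model with the ultralytics package; it assumes a dataset configuration file (here called data.yaml) describing the train/validation paths and class names, and the hyperparameters shown are example values only.

```python
# Minimal fine-tuning sketch for the training layer, assuming the `ultralytics`
# package is installed and `data.yaml` points at the custom dataset.
from ultralytics import YOLO

# Start from pre-trained COCO weights and fine-tune on the custom dataset
model = YOLO("yolov8n.pt")
model.train(
    data="data.yaml",   # assumed dataset config: split paths + class names
    epochs=50,          # illustrative hyperparameters
    imgsz=640,
    batch=16,
)

# Validate on the held-out split; reports precision, recall and mAP
metrics = model.val()
print(metrics.box.map)  # mAP averaged over IoU thresholds 0.5:0.95
```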

4. Evaluation Layer

 Metrics:

o mAP (mean Average Precision)

o IoU (Intersection over Union)

o Precision, Recall

 Visualization Tools:

o Bounding box rendering

o Confusion matrices
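Of the metrics listed above, IoU is the building block for the others; a minimal sketch of its computation is shown below, with boxes assumed to be in (x1, y1, x2, y2) pixel coordinates.

```python
# Sketch of the IoU metric used in the evaluation layer.
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 100, 100], [50, 50, 150, 150]))  # ~0.143
```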

5. Inference/Prediction Layer

 Deployment Options:

o On-device (Edge AI, mobile)

o Server-based REST API

o Web interface

 Optimization:

o Model quantization, pruning, conversion (ONNX, TensorRT, TFLite)

 Serving Frameworks:

o TensorFlow Serving, TorchServe, FastAPI, Flask

6. Monitoring & Feedback Layer

 Monitoring Tools:

o Logging predictions, latency, accuracy drift

o Tools: Prometheus + Grafana, custom dashboards

 User Feedback Loop:

o Manual correction for false predictions

o Retraining dataset with corrected samples

7. Optional Enhancements

 Real-time Processing:

o Stream data from camera feeds

o Use Kafka, MQTT, or WebSockets

 AutoML Integration:

o Use AutoML for model architecture search or hyperparameter tuning

 CI/CD for ML (MLOps):

o Automate data validation, model training, testing, and deployment

o Use DVC, MLflow, Kubeflow

4.2 Technology Stack Overview
1. Data Collection & Storage
 Data Sources:
o Camera streams (IP/USB), video files, image datasets (e.g., COCO, Pascal VOC)
 Web Scraping & APIs:
o requests, BeautifulSoup, Selenium, public APIs (e.g., Google Images, Open
Images Dataset)
 Storage:
o Local: HDD/SSD file systems
o Cloud: AWS S3, Google Cloud Storage, Azure Blob Storage
o Database (if metadata storage needed): PostgreSQL, MongoDB

2. Data Annotation & Preprocessing


 Annotation Tools:
o LabelImg (Pascal VOC, YOLO format)
o CVAT (COCO, YOLO, VOC)
o VIA (VGG Image Annotator)
 Preprocessing Libraries:
o OpenCV: Image processing (resize, crop, color correction)
o Albumentations: Advanced image augmentation
o Pillow: Image I/O and manipulation
o NumPy: Array operations
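A short, hedged sketch of how these libraries could be combined in the preprocessing step is given below; the specific transforms and the pascal_voc box format are illustrative choices, not requirements of the pipeline.

```python
# Illustrative preprocessing/augmentation sketch using OpenCV, Albumentations
# and NumPy; the sample file name, box and label are assumptions.
import cv2
import numpy as np
import albumentations as A

transform = A.Compose(
    [
        A.Resize(640, 640),                  # resize to fixed input shape
        A.HorizontalFlip(p=0.5),             # augmentation: flipping
        A.RandomBrightnessContrast(p=0.3),   # augmentation: color jitter
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

image = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB)
out = transform(image=image, bboxes=[[30, 40, 200, 220]], labels=["person"])

normalized = out["image"].astype(np.float32) / 255.0   # pixel normalization
print(normalized.shape, out["bboxes"])
```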

4.3 Frontend and Backend Design

 Frontend Design:

1. Tech Stack

o HTML/CSS + JavaScript
Frameworks:

o React.js or Vue.js (modern, component-based)

Styling:

o Tailwind CSS or Bootstrap

Chart/Visualization Libraries:

o Chart.js, D3.js (for metrics display)

o OpenCV.js or custom canvas logic (for bounding box overlays)

2. Responsibilities

User Interface

o Upload image or video

o Show prediction results with bounding boxes

o Display class labels and confidence scores

Visualization

o Canvas overlay on image/video with detection results

o Zoom/pan support (for detailed inspection)

Backend Design:

1. Tech Stack

o Programming Language: Python

o Web Framework: FastAPI or Flask (FastAPI preferred for speed and async
support)

o Model Serving: TorchServe, TensorFlow Serving, or a custom inference script

o Database (optional for logging/results): PostgreSQL or MongoDB

o Storage: Local file system or cloud (AWS S3, GCP Storage)

o Containerization: Docker

2. Responsibilities

Model Inference API

o Endpoint: POST /predict

o Accepts: Image/video input (as file or base64)

o Returns: Detected object labels, bounding box coordinates, confidence


scores

Model Management

o Load and manage ML model(s) during app startup

o Support multiple models (YOLOv5, Faster R-CNN, etc.)
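A minimal sketch of the POST /predict endpoint described above is shown below, assuming FastAPI; run_model is a hypothetical placeholder for whichever detector (YOLOv5, Faster R-CNN, etc.) is loaded at startup, and the response fields are illustrative.

```python
# Hedged FastAPI sketch of the inference API; run_model and its output format
# are placeholders standing in for the real detector loaded at startup.
import io

from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()

def run_model(image):
    # Hypothetical placeholder for the loaded detection model.
    return [{"label": "person", "confidence": 0.92, "box": [12, 30, 180, 240]}]

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    data = await file.read()
    image = Image.open(io.BytesIO(data)).convert("RGB")
    detections = run_model(image)
    return {"filename": file.filename, "detections": detections}
```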

4.4 Database Schema and Structure

The database structure is normalized to avoid data redundancy:

o Users Collection/Table: Fields include userID, username, email, hashedPassword, and createdAt.

o Transactions Collection/Table: Contains transactionID, userID, type (income/expense), amount, category, description, and timestamp.

o Budgets Collection/Table: Contains userID, category, limit, and month.

Indexes are used on the userID and timestamp fields to improve query performance.
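A hedged sketch of this schema, created here with the standard-library sqlite3 module (PostgreSQL would use equivalent DDL), is shown below; the column types and constraints are assumptions rather than the final design.

```python
# Illustrative creation of the users/transactions/budgets tables with sqlite3.
import sqlite3

conn = sqlite3.connect("expenses.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS users (
    userID         INTEGER PRIMARY KEY AUTOINCREMENT,
    username       TEXT NOT NULL UNIQUE,
    email          TEXT NOT NULL UNIQUE,
    hashedPassword TEXT NOT NULL,
    createdAt      TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS transactions (
    transactionID  INTEGER PRIMARY KEY AUTOINCREMENT,
    userID         INTEGER REFERENCES users(userID),
    type           TEXT CHECK (type IN ('income', 'expense')),
    amount         REAL NOT NULL,
    category       TEXT,
    description    TEXT,
    timestamp      TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS budgets (
    userID   INTEGER REFERENCES users(userID),
    category TEXT NOT NULL,
    "limit"  REAL NOT NULL,
    month    TEXT NOT NULL,
    PRIMARY KEY (userID, category, month)
);
-- index on userID and timestamp to speed up per-user, time-ranged queries
CREATE INDEX IF NOT EXISTS idx_tx_user_time ON transactions (userID, timestamp);
""")
conn.commit()
```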

4.5 Expense Management Flow


1. Receipt Upload/Input
 User uploads image or scanned copy of receipt via the frontend (web/mobile UI).
 Accepted formats: JPG, PNG, PDF (converted to image)
2. Object Detection + OCR Pipeline
 Object Detection (e.g., using YOLOv8 or Faster R-CNN):
o Detect bounding boxes for key fields: Merchant, Date, Total Amount, Items,
Taxes.
 OCR (Text Extraction):
o Apply Tesseract, EasyOCR, or PaddleOCR on detected regions.
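The following minimal sketch shows the OCR step applied to one detected field region, assuming pytesseract and the Tesseract binary are installed; the crop coordinates stand in for a bounding box returned by the detector.

```python
# Hedged OCR sketch: extract text from a detected "Total Amount" region.
import cv2
import pytesseract

image = cv2.imread("receipt.jpg")

# Example bounding box for the field, as if returned by the detector
x1, y1, x2, y2 = 120, 840, 460, 900
crop = image[y1:y2, x1:x2]

# Light preprocessing usually helps OCR on receipts
gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

text = pytesseract.image_to_string(binary, config="--psm 7")  # single text line
print(text.strip())
```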

4.6 Data Visualization and Notifications
Once receipts are processed and expense data is extracted, users are presented with an
interactive dashboard that displays their financial activity through various visual components.
This includes pie charts showing expense distribution by category, bar or line graphs for
monthly spending trends, and tables listing individual transactions. Users can filter these views
by date range, vendor, or payment method, allowing for deep insights into their spending
behavior. Visualization tools such as Chart.js, Recharts, or D3.js are integrated into the
frontend (built using React.js or Vue.js), providing a responsive and engaging interface. Each
visual element is clickable, enabling users to review detailed receipt data, make corrections, or
drill down into specific expense groups.

4.7 Testing Methodologies


1. Unit Testing
Unit testing verifies that individual functions or modules work as intended. This includes:
 Backend logic: validating image upload, file handling, and API endpoints using tools
like Pytest or unittest.
 Frontend components: testing UI components and state management with Jest and
React Testing Library.
 Utility functions: parsing, string handling, and data formatting for expenses and
receipts.

2. Integration Testing

Integration testing ensures that multiple components work together seamlessly:


o Test the complete receipt flow: image upload → object detection → OCR →
categorization → database storage.
o Simulate real-world inputs (e.g., blurry images, rotated receipts) and verify the
data pipeline.

o Use tools like Postman or Supertest to simulate API calls from the frontend to
the backend.

3. Machine Learning Model Testing

ML-specific testing focuses on the performance and accuracy of the object detection
and OCR modules:
o Object Detection Evaluation:

o Use metrics like mAP (mean Average Precision), IoU (Intersection over Union),
and Precision/Recall.
o Test on a diverse dataset of receipts with varying formats, lighting conditions,
and languages.
o OCR Testing:

o Compare extracted text against labeled ground truth.

o Calculate Character Error Rate (CER) and Word Error Rate (WER).
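For reference, a small sketch of how CER (and, analogously, WER) can be computed from the Levenshtein edit distance is shown below; it is a plain-Python illustration, not tied to any particular OCR library, and the example strings are assumptions.

```python
# CER = edit distance between prediction and ground truth / length of truth.
# WER is the same computation over word lists instead of characters.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(predicted, truth):
    return edit_distance(predicted, truth) / max(len(truth), 1)

def wer(predicted, truth):
    pred_words, true_words = predicted.split(), truth.split()
    return edit_distance(pred_words, true_words) / max(len(true_words), 1)

print(cer("TOTAL 123.45", "TOTAL 128.45"))  # one wrong character -> ~0.083
```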

4. Functional Testing

This verifies that the system behaves as expected from an end-user perspective:
o Upload receipts and validate correct extraction of date, merchant, amount, etc.

o Ensure that the visualizations update dynamically with new expense data.

4.8 Conclusion of Chapter 4

In conclusion, the Expense Management System leveraging Object Detection and OCR is a
sophisticated tool designed to automate and streamline expense tracking, enhance financial
management, and provide valuable insights to users. By combining machine learning
technologies, such as object detection for receipt field recognition and OCR for text extraction,
with intuitive data visualization and real-time notifications, this system empowers users to
manage their finances with minimal effort.

The various testing methodologies implemented—from unit and integration testing to machine
learning model evaluation—ensure that every component functions correctly, that the system is
reliable and secure, and that it delivers an accurate and smooth user experience. The use of

diverse testing strategies also guarantees that potential issues, from functional bugs to
performance bottlenecks, are identified and resolved early in the development process.

Ultimately, this system offers not only a robust technical foundation but also a user-centric
interface that brings automation and accuracy to everyday expense tracking. By making
financial data more accessible, organized, and actionable, it enables individuals and businesses
to stay on top of their spending, plan their budgets effectively, and make informed financial
decisions with ease.

This integration of cutting-edge technologies and best practices in software and machine
learning testing ensures the creation of a powerful, scalable, and user-friendly solution that
meets the needs of modern financial management.

Chapter 5: Features Implementation and Functionalities

5.1 User Authentication and Dashboard

User Authentication and the Dashboard are key components that ensure a secure,
personalized, and user-friendly experience. These elements not only safeguard
sensitive financial data but also present it in an organized and actionable way.
User authentication is a critical aspect of ensuring that only authorized individuals
have access to personal financial data. This process involves verifying the identity of
users, securely storing their credentials, and providing role-based access to different
parts of the system.

5.2 Image Input and Preprocessing


 Image Upload: Users can upload receipt images or scanned documents (JPG, PNG,
PDF).
 Preprocessing: The uploaded image is preprocessed for better detection (e.g.,
resizing, normalization, color correction).
 Noise Reduction: Filters such as Gaussian blur may be used to reduce noise for
better object detection accuracy.

5.3 Object Detection


 Bounding Boxes: Identifies and draws bounding boxes around detected objects
(e.g., merchant name, date, total amount on receipts).
 Class Labels: Assigns labels to identified objects (e.g., "Date", "Amount",
"Vendor").
 Localization: Not only detects but also locates the objects in the image (the position
of the bounding box coordinates).
 Multiple Object Detection: Detects and distinguishes multiple objects in a single
image (e.g., identifying several items or text sections in a receipt).

5.4 Text Extraction (OCR)


 Optical Character Recognition (OCR): After detecting the bounding boxes around
text areas, OCR (such as Tesseract, EasyOCR, or PaddleOCR) extracts the text
content from the identified areas.
 Text Cleaning: Post-processing of extracted text to remove noise, correct errors, and
format it for use (e.g., removing special characters or correcting OCR mistakes).

5.5 Data Extraction and Structuring


 Field Extraction: Extracts key financial fields such as merchant name, date,
amount, tax, and itemized lists from the detected objects.
 Data Structuring: Organizes the extracted data into a structured format (JSON or
database records) for easy use and analysis.

5.6 Model Confidence and Accuracy


 Confidence Scores: For each detected object, the model provides a confidence
score to indicate how likely the object is correctly identified.
 Thresholding: Confidence scores help filter out false positives. A threshold is set
(e.g., > 70%) to ensure only highly confident detections are used.
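A tiny sketch of this thresholding step is shown below; the 0.70 cut-off mirrors the example above, and the detection dictionary format is an assumption.

```python
# Confidence-based filtering of detections (illustrative format).
def filter_detections(detections, threshold=0.70):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]

raw = [
    {"label": "Amount", "confidence": 0.93, "box": [120, 840, 460, 900]},
    {"label": "Vendor", "confidence": 0.41, "box": [60, 40, 380, 110]},  # dropped
]
print(filter_detections(raw))  # only the high-confidence "Amount" box remains
```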

5.7 Real-Time Object Detection


 Live Image Processing: Supports real-time processing of images or video streams
(e.g., in an application where users upload receipts continuously).
 Integration with Camera/Scanner: Allows integration with camera or scanner
inputs for live scanning and detection.
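A hedged sketch of live detection from a webcam is shown below, assuming an ultralytics YOLO model and OpenCV for capture and display; the weights file is an example, and pressing 'q' stops the loop.

```python
# Illustrative real-time detection loop: webcam frames -> YOLO -> annotated view.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")     # or a fine-tuned checkpoint
cap = cv2.VideoCapture(0)      # default camera

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]   # one result object per frame
    annotated = results.plot()                 # boxes + labels + confidences
    cv2.imshow("Live detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```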

5.8 Interactive Visualization


 Bounding Box Display: On the frontend, detected objects are highlighted with
bounding boxes and labels overlaid on the original image.
 Error Display and Correction: If an object is detected incorrectly, users can
manually adjust the bounding box or text fields to improve accuracy.

5.9 Training the Model


 Dataset Collection: Gather a large dataset of annotated images, such as receipts or
invoices, where the locations of objects (merchant, amount, etc.) are labeled.
 Model Selection: Use a pre-trained model like YOLOv5, Faster R-CNN, or SSD
for object detection.
 Transfer Learning: Fine-tune the pre-trained model on the specific dataset
(receipts, invoices, etc.) to adapt it to the problem domain.
5.10 Summary of Implemented Features

Feature – Functionality

 Image Input and Preprocessing – Upload images (JPG, PNG, PDF); preprocess for better detection (resizing, noise reduction, normalization).

 Object Detection – Detect and draw bounding boxes around objects (e.g., merchant, amount), classify detected objects, and localize them in the image.

 Text Extraction (OCR) – Apply Optical Character Recognition (OCR) to extract text from identified objects such as the merchant name, date, and total amount.

 Data Extraction and Structuring – Extract key financial fields (e.g., merchant name, date, amount) and structure them into a readable format (JSON, database).

 Model Confidence and Accuracy – Calculate and display confidence scores for object detection, filter out low-confidence detections, and ensure accurate detection.

 Real-Time Object Detection – Detect objects in real time for live image processing or video streams, with camera/scanner integration for continuous scanning.

 Interactive Visualization – Visualize detected objects with bounding boxes and labels on the uploaded images, with options for error correction and user interaction.

 Training the Model – Train the object detection model on labeled datasets using pre-trained models (e.g., YOLO, Faster R-CNN) and fine-tune it for domain-specific use.

 Model Inference – Perform inference on new images to detect objects, apply OCR, and extract text.

 Handling Different Input Formats – Support for processing single images or multi-page documents (e.g., invoices, receipts).

 Accuracy Evaluation and Tuning – Evaluate model accuracy with metrics like mAP, IoU, and Precision/Recall, and fine-tune the model with hyperparameter adjustments.

 Post-Processing – Apply non-maximum suppression (NMS) to remove duplicate, overlapping detections of the same object.
Chapter 6: Results and Conclusion

6.1 Key Features Achieved

Object detection using machine learning with YOLO is fast, accurate, and capable of real-time
performance. It detects multiple objects in a single pass by treating detection as a single
regression problem. YOLO is efficient, generalizes well to new data, and supports lightweight
versions for mobile deployment. Its open-source nature and broad applicability make it popular
in fields like surveillance, healthcare, and assistive technology.

 Real-Time Performance – YOLO processes images quickly, making it ideal for real-
time applications.

 Single-Pass Detection – It detects objects and predicts bounding boxes in one forward
pass.

 Multi-Object Capability – Can detect multiple objects in a single image accurately.

 Lightweight Variants – Versions like YOLOv4-tiny allow deployment on mobile and


edge devices.

 Wide Applicability – Used in fields like security, autonomous driving, and assistive
technologies.

6.2 Screenshots and Output Samples

Output Layers

Confidence Function Logic

Input

Output

6.3 Challenges Faced

Challenges in object detection using YOLO include difficulty detecting small or overlapping
objects due to its grid-based approach. It may also struggle with objects at extreme angles or in
low-light conditions, and requires large labeled datasets for accurate training.

 6.3.1 Poor Detection of Small Objects – YOLO may miss small objects due to its grid-based
division of images.

 6.3.2 Overlapping Objects – Struggles to distinguish between closely packed or overlapping


objects.

 6.3.3 Complex Backgrounds – Accuracy drops when objects appear in cluttered or dynamic
scenes.

 6.3.4 Angle and Scale Variability – Limited performance when objects are rotated or vary
greatly in size.

 6.3.5 Requires Large Labeled Datasets – Needs extensive, annotated training data for good
accuracy.

 6.3.6 Hardware Constraints – High-performance models demand significant computational resources.

Each challenge contributed significantly to team learning and improved the system’s reliability.

6.4 Future Enhancements

Here are some future enhancements in object detection using machine learning with YOLO:

6.4.1 Improved Small Object Detection – Enhancing YOLO's architecture to better detect
small or distant objects.
6.4.2 Integration with Transformers – Combining YOLO with vision transformers for better
contextual understanding.

6.4.3 Edge AI Optimization – Further model compression for faster inference on mobile
and embedded devices.

6.4.4 3D Object Detection – Expanding YOLO to detect objects in 3D space for AR/VR
and autonomous systems.

6.4.5 Self-Supervised Learning – Reducing dependence on large labeled datasets by using


self-learning techniques.

6.4.6 Self-Explanatory Robustness Improvements – Enhancing model robustness across different environments and lighting conditions.

6.5 Conclusion of Chapter 6

In conclusion, YOLO (You Only Look Once) has transformed object detection by offering a fast
and efficient solution for real-time applications. Its single-stage architecture allows for real-time
processing, making it ideal for scenarios like autonomous vehicles, security, and robotics.
YOLO’s ability to detect objects in a single pass reduces computational costs while maintaining
impressive speed. While it performs well in detecting clearly defined objects, it may struggle
with small or overlapping objects. Despite these challenges, YOLO’s versatility and continuous
evolution—through models like YOLOv5 and YOLOv7—make it adaptable across various
industries. Although YOLO excels in speed and efficiency, optimizing performance requires
careful tuning and a well-curated dataset. Overall, YOLO remains a powerful tool for object
detection, though its limitations should be considered for specific tasks.

References

[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection.”

This paper introduces YOLO, a fast, real-time object detection method that processes images in a single pass, enabling high frame rates such as 45 FPS with the base model and 155 FPS with Fast YOLO.

[2] J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger.”

This paper presents YOLO9000, an enhanced version of YOLO capable of detecting over 9000 object categories. It introduces a joint training method that allows YOLO9000 to predict detections for object classes without labeled detection data, achieving state-of-the-art performance on standard detection tasks such as PASCAL VOC and COCO.

Personal Details

Name : Shubh Pratap Singh


Er. No. : 221B377
Course: Bachelor in Technology
Branch: Computer Science and Engineering
Email ID : [email protected]
Contact : +91-9897827275

Name : Tanishq Saxena


Er. No. : 221B408
Course: Bachelor in Technology
Branch: Computer Science and Engineering
Email ID :[email protected]
Contact : +91-9019315540

Name : Tanmay Kushwaha


Er. No. : 221B410
Course: Bachelor in Technology
Branch: Computer Science and Engineering
Email ID : [email protected]
Contact : +91-7477048198

