P2 Report
A PROJECT REPORT
Submitted by:
Shubh Pratap Singh (221B377)
Tanishq Saxena (221B408)
May 2025
We hereby declare that the work reported in the B.Tech. project entitled
“Object Detection using Machine Learning”, submitted in partial fulfillment for
the award of the degree of B.Tech (CSE) at Jaypee University of Engineering and
Technology, Guna, is our own. To the best of our knowledge and belief, there is
no infringement of intellectual property rights or copyright. In case of any
violation, we will be solely responsible.
Date: 09/05/25
JAYPEE UNIVERSITY OF ENGINEERING & TECHNOLOGY
Accredited with Grade-A+ by NAAC & Approved U/S 2(f) of the UGC Act, 1956
A.B. Road, Raghogarh, District Guna (MP), India, Pin-473226
Phone: 07544 267051, 267310-14, Fax: 07544 267011
Website: www.juet.ac.in
CERTIFICATE
This is to certify that the work titled “Object Detection using Machine
Learning”, submitted by Shubh Pratap Singh (221B377), Tanishq Saxena (221B408),
and Tanmay Kushwaha (221B410) in partial fulfillment for the award of the degree
of B.Tech at Jaypee University of Engineering & Technology, Guna, has been
carried out under my supervision. To the best of my knowledge and belief, there
is no infringement of intellectual property rights or copyright. Also, this work
has not been submitted, partially or wholly, to any other University or Institute
for the award of this or any other degree or diploma. In case of any violation,
the concerned student will be solely responsible.
Signature of Supervisor
Date: 09/05/25
Table of Contents
Title Page............................................................................................................................................................i
Declaration of the Student....................................................................................................................................ii
Certificate of the Guide.......................................................................................................................................iii
Abstract................................................................................................................................................................iv
Acknowledgement.................................................................................................................................................v
Chapter – 1 Introduction
1.1 Overview of Budget Management
1.2 Problem Statement
1.3 Objectives
1.4 Scope of the Project
1.5 Features and Functionalities
1.6 Organization of the Report
1.7 Conclusion of Chapter 1
References
Research papers, articles, websites, and other sources used in the project.
Chapter 1: Introduction
1.2 Problem Statement
The problem this project aims to address is the lack of an efficient, accurate, and scalable
solution for detecting and classifying objects in visual data using machine learning
techniques. Despite recent advancements in artificial intelligence, many existing object detection
systems are either computationally expensive, lack real-time capability, or perform poorly in
diverse and unstructured environments.
This project seeks to develop a robust machine learning-based object detection model that can
accurately identify and locate multiple objects within images, with a focus on improving
performance, reducing computational cost, and enhancing real-world applicability.
1.3 Objectives
Develop an Accurate Object Detection Model
To design and train a machine learning model capable of accurately detecting and classifying
multiple objects within static images or video frames.
To implement and evaluate modern object detection algorithms such as YOLO (You Only Look
Once), SSD (Single Shot Detector), and Faster R-CNN, and determine the most effective approach.
To build a model that maintains high performance across diverse datasets, including variations in
lighting, scale, and background.
To reduce inference time and computational overhead, enabling the system to operate in near real
time.
To streamline the dataset preparation process by incorporating semi-automated data labeling tools.
To assess the model using metrics like Precision, Recall, mAP (mean Average Precision), and FPS
(frames per second).
To develop a prototype application or interface that demonstrates the object detection system in a
real-world scenario, such as object tracking in video feeds or real-time image analysis.
b. Testing the model on unseen data to assess generalization and robustness.
5) Prototype Development
a. Creating a user-friendly prototype or interface that demonstrates real-time or batch
object detection capabilities.
b. Ensuring the prototype is deployable on standard computing hardware or cloud-
based platforms.
Bounding Box Visualization: Displays bounding boxes with labels and confidence scores for
detected objects.
Model Training & Customization: Allows training with custom datasets and fine-tuning of
pre-trained models.
Data Annotation & Management: Supports annotation tools and dataset preprocessing (e.g.,
augmentation, class balancing).
1.6 Organization of the Report
Chapter 2: Literature Survey – A review of existing personal finance tools and analysis of
their limitations.
References – A compilation of all scholarly articles, websites, and tools cited, in IEEE format.
The project "Object Detection Using Machine Learning" addresses the growing need for
intelligent, automated systems capable of accurately identifying and locating objects within images
and video streams. By leveraging advanced machine learning algorithms such as YOLO, SSD, and
Faster R-CNN, the project aims to develop a robust and scalable object detection model that
performs well in real-world scenarios.
Through clearly defined objectives, the project focuses on building an accurate and efficient
detection system, optimizing it for real-time applications, and ensuring it is adaptable through
custom training and dataset management. Effective budget management ensures the optimal use of
resources, while the project scope outlines practical boundaries to keep the development focused
and achievable.
Chapter 2: Literature Survey
Use Case: Tracks individual and team expenses, ideal for recording equipment purchases,
software subscriptions, and travel.
Advantages:
o Receipt scanning
o Real-time syncing and approvals
o Mobile app availability
Limitations:
o Paid plans for full features
o May be more suitable for individual expense tracking than full project budgeting
From this comparison, it is evident that while several applications offer comprehensive features,
many are region-specific, overly complex for average users, or lack essential features such as
data export or customized alerts.
The project lacks high-quality, labeled datasets tailored to the specific objects or environment it
targets.
Lack of access to high-end GPUs or cloud infrastructure can hinder model training and testing
efficiency.
Some models (e.g., Faster R-CNN) may not meet real-time speed requirements without
optimization.
Project expenses are not tracked in real-time, increasing the risk of overspending or poor financial
visibility.
The project does not outline how the model will be deployed (e.g., on web, mobile, or embedded
systems).
Annotation of training data is time-consuming and may lead to inconsistency without automation
tools.
Security and Privacy Concerns
No clear strategy for managing sensitive image data or ensuring compliance with privacy
standards like GDPR.
The solution may not be optimized for multiple platforms, reducing its accessibility and
scalability.
The development of the "Object Detection Using Machine Learning" project demonstrates
strong potential in addressing real-world challenges through intelligent automation. However, to
ensure the project is successful, scalable, and sustainable, a number of operational and technical
gaps must be addressed.
The review of existing expense tracking applications highlights that while there are several tools
available—such as Google Sheets, Expensify, Zoho Expense, and QuickBooks—each comes with
its own strengths and limitations. The comparative analysis shows that for research-driven, small-
to-medium-scale technical projects, a combination of customizable tools like Google Sheets and
automated platforms like Zoho Expense can provide an effective balance between control and
efficiency in budget management.
Despite the project’s strong technical foundation, several gaps have been identified, such as
limited access to domain-specific datasets, lack of real-time model performance, absence of a user
interface, manual data annotation, and no integrated expense tracking system. These issues can
negatively impact the accuracy, usability, and scalability of the final solution if not addressed.
Chapter 3: Requirement Analysis
The requirement analysis for the project "Object Detection Using Machine Learning"
involves identifying and defining the technical, functional, and resource needs essential for
successful implementation. At the core, the project requires a clearly defined dataset
containing labeled images for training and validation, ideally sourced from real-world
environments relevant to the intended application (e.g., surveillance, traffic, industrial
settings). Software requirements include machine learning frameworks like TensorFlow or
PyTorch, along with tools for data annotation, model training, and visualization. Hardware
requirements may involve access to high-performance GPUs or cloud computing platforms to
handle intensive training workloads.
3.1 Functional Requirements
The system shall allow users to upload static images or video files for object detection.
The system shall support real-time video stream input (e.g., from a webcam or IP camera).
Object Detection
The system shall detect and classify multiple objects in each image or video frame using a
trained machine learning model.
The system shall provide bounding boxes around detected objects along with the object class
and confidence score.
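As a minimal sketch of this requirement, the snippet below runs a pre-trained detector over one
image and prints each detection's class, confidence, and box corners. It assumes the open-source
ultralytics package with its stock yolov8n.pt COCO weights; the image path is a placeholder.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # small pre-trained COCO model
results = model("sample.jpg")   # inference on a single image

for box in results[0].boxes:
    cls_name = model.names[int(box.cls)]    # class label
    conf = float(box.conf)                  # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding-box corners
    print(f"{cls_name}: {conf:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")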
The system shall enable training of object detection models using labeled datasets.
The system shall evaluate model performance using metrics such as Precision, Recall, and
mAP (mean Average Precision).
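These metrics rest on Intersection over Union (IoU): a detection counts as a true positive when
its IoU with a ground-truth box exceeds a threshold (0.5 is common). A self-contained sketch,
with boxes given as (x1, y1, x2, y2):

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # ~0.14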
Dataset Management
The system shall allow users to import, manage, and annotate datasets.
The system shall support dataset preprocessing such as resizing, augmentation, and
normalization.
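One possible preprocessing pipeline for this requirement, sketched with torchvision; the 640x640
resize target and the ImageNet normalization constants are illustrative choices. (For detection
tasks, geometric augmentations must also transform the bounding boxes; libraries such as
Albumentations handle this pairing.)

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((640, 640)),            # uniform input size
    transforms.RandomHorizontalFlip(p=0.5),   # simple augmentation
    transforms.ColorJitter(brightness=0.2),   # robustness to lighting changes
    transforms.ToTensor(),                    # PIL image -> CHW float tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])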
User Interface
The system shall provide a graphical user interface (GUI) for image/video upload, detection
visualization, and result display.
The system shall display detection results with class labels and confidence scores in real
time.
The system shall allow users to export detection results (e.g., as images with bounding boxes
or text-based output).
The system shall store processed data and model outputs for later review.
It should include error handling to notify users of failures (e.g., failed uploads, model
loading errors).
Maintainability
The system codebase shall be modular and well-documented to allow for future updates,
bug fixes, or enhancements.
Model configurations and parameters should be stored in an easily editable format (e.g.,
YAML or JSON).
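For illustration, such a configuration might be loaded as below, assuming the PyYAML package;
the keys shown are examples rather than a fixed schema.

import yaml

CONFIG = """
model:
  name: yolov8n
  confidence_threshold: 0.25
training:
  epochs: 50
  batch_size: 16
  learning_rate: 0.001
"""

cfg = yaml.safe_load(CONFIG)
print(cfg["training"]["epochs"])  # 50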
Portability
The system shall be operable across different operating systems (e.g., Windows, Linux).
It should support deployment in both GUI-based and command-line environments.
Security and Privacy
The system shall ensure that all user-uploaded data is stored securely and not shared
without consent.
For deployments involving sensitive data, compliance with data protection standards
(e.g., GDPR) shall be enforced.
Annotation Tools:
o LabelImg, CVAT, or Roboflow for dataset labeling
Package Managers:
o pip or conda
Visualization & UI Tools:
o Streamlit, Flask, or Dash (for building web interfaces)
Version Control:
o Git (with GitHub or GitLab)
Optional:
o SQLite/PostgreSQL (for storing results or tracking data)
o Google Colab, AWS, or Google Cloud for training at scale
o Google Sheets API / Zoho Expense API (for expense tracking)
Hardware Requirements
Processor (CPU):
o Minimum: Intel i5 or AMD Ryzen 5
o Recommended: Intel i7 or AMD Ryzen 7 (or higher)
RAM:
o Minimum: 8 GB
o Recommended: 16 GB or more
Storage:
o Minimum: 256 GB SSD
o Recommended: 512 GB SSD or more
Graphics Processing Unit (GPU):
o Minimum: NVIDIA GTX 1050 or equivalent
o Recommended: NVIDIA RTX 2060 / RTX 30-series or better (for faster
training)
Display & Peripherals:
o Standard monitor, keyboard, and mouse
o Optional webcam (for real-time video input testing)
3.4 Use-Case Diagram
Actors:
User – The person uploading or providing images/videos for detection.
System Administrator – Manages models and system configurations (optional).
ML Model – The object detection model (considered as a system actor).
Upload Image/Video: User uploads image or video files that need to be processed by the object
detection system.
Detect Objects: The system uses the trained ML model to identify and label objects within the
uploaded media.
Display Results: Detected objects are displayed visually to the user, often with bounding boxes
and labels.
Table 3.1 Use-Cases and their Descriptions
Figure 3.2 Sequence Diagram of the System (participants: Mobile App/UI, Camera Module, Audio
Generator, Speaker)
3. DetectionResult Class
Attributes:
o resultID: String
o mediaID: String (link to Media)
o detectedObjects: List of ObjectLabel (objects detected)
o detectionTime: DateTime
Methods:
o generateReport(): Creates a summary of the detection results.
4. ObjectLabel Class
Attributes:
o labelID: String
o labelName: String (e.g., "Car", "Person")
o confidenceScore: Float (probability of correct detection)
o boundingBox: String (coordinates of the detected object)
5. MLModel Class
Attributes:
o modelID: String
o modelVersion: String
o accuracy: Float (model's detection accuracy)
Methods:
o detectObjects(): Detects objects in an image/video.
o retrainModel(): Retrains the model with new data.
6. Admin Class
Attributes:
o adminID: String
o name: String
o email: String
Methods:
o manageDataset(): Modifies and manages the dataset for training.
o retrainModel(): Triggers the model retraining process.
o viewLogs(): Views logs of system activities and performance.
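As an illustration, the DetectionResult and ObjectLabel classes above map naturally onto Python
dataclasses; the method body below is a placeholder sketch, since the design fixes only the
signatures, and the MLModel and Admin classes would follow the same pattern.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class ObjectLabel:
    labelID: str
    labelName: str          # e.g., "Car", "Person"
    confidenceScore: float  # probability of a correct detection
    boundingBox: str        # coordinates of the detected object

@dataclass
class DetectionResult:
    resultID: str
    mediaID: str            # link to the Media object
    detectedObjects: List[ObjectLabel] = field(default_factory=list)
    detectionTime: datetime = field(default_factory=datetime.now)

    def generateReport(self) -> str:
        # creates a one-line summary of the detection results
        return f"{len(self.detectedObjects)} objects detected in media {self.mediaID}"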
2. Data Collection
Gather image/video data from various sources (datasets, sensors, cameras, etc.).
3. Data Preprocessing
Resize images
4. Dataset Splitting (see the split sketch after this list)
Training set
Validation set
Test set
5. Model Selection
6. Model Training
7. Model Evaluation
Metrics: mAP (mean Average Precision), IoU, accuracy
8. Model Optimization
Fine-tuning
Hyperparameter tuning
9. Model Deployment
Integrate the model into an application (web app, mobile app, etc.)
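The split in step 4 can be sketched with scikit-learn's train_test_split: an 80/10/10 division
into training, validation, and test sets. The file names are placeholders for annotated image
paths.

from sklearn.model_selection import train_test_split

images = [f"img_{i:04d}.jpg" for i in range(1000)]  # annotated image paths

# hold out 20%, then split that half-and-half into validation and test
train, rest = train_test_split(images, test_size=0.2, random_state=42)
val, test = train_test_split(rest, test_size=0.5, random_state=42)
print(len(train), len(val), len(test))  # 800 100 100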
Figure 3.4 Activity Diagram of the System
3.8 Conclusion of Chapter 3
This project focuses on building an efficient object detection system using machine learning,
illustrated through a detailed activity diagram. The process begins with data collection and
preprocessing, ensuring high-quality, annotated data for training. The workflow continues with
model selection, training, and evaluation, using advanced architectures like YOLO, SSD, or
Faster R-CNN to achieve accurate object recognition.
The activity diagram provides a clear visual representation of the project's lifecycle, including key
decision points for model optimization and real-time detection deployment. This structured
approach helps streamline development, improve team collaboration, and ensure robust
performance across different environments.
Chapter 4: Design and Implementation
Image/Video Sources:
Storage:
Image Annotation:
Data Augmentation:
Normalization:
o Pixel normalization
Model Selection:
o Pre-trained models: YOLOv5/8, SSD, Faster R-CNN, EfficientDet
Training Pipeline (a fine-tuning sketch follows this subsection):
o Hyperparameter tuning
Compute Infrastructure:
o GPU/TPU clusters
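As referenced above, a fine-tuning run might be sketched as follows, assuming the ultralytics
API; "dataset.yaml" is a placeholder config listing image paths and class names, and the
hyperparameters are example starting points, not tuned values.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # start from pre-trained COCO weights
model.train(
    data="dataset.yaml",     # dataset paths and class names (placeholder)
    epochs=50,
    imgsz=640,               # input resolution
    batch=16,
    lr0=0.001,               # initial learning rate, subject to tuning
)
metrics = model.val()        # evaluate on the validation split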
4. Evaluation Layer
Metrics:
o Precision, Recall
Visualization Tools:
o Confusion matrices
5. Inference/Prediction Layer
Deployment Options:
o Web interface
Optimization:
o Serving Frameworks:
Monitoring Tools:
7. Optional Enhancements
Real-time Processing:
AutoML Integration:
4.2 Technology Stack Overview
1. Data Collection & Storage
Data Sources:
o Camera streams (IP/USB), video files, image datasets (e.g., COCO, Pascal VOC)
Web Scraping & APIs:
o requests, BeautifulSoup, Selenium, public APIs (e.g., Google Images, Open
Images Dataset)
Storage:
o Local: HDD/SSD file systems
o Cloud: AWS S3, Google Cloud Storage, Azure Blob Storage
o Database (if metadata storage needed): PostgreSQL, MongoDB
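As a small illustration of programmatic collection, the sketch below saves images from a list
of URLs with requests; the URL and target directory are hypothetical placeholders.

import os
import requests

urls = ["https://example.com/sample1.jpg"]  # hypothetical source list
os.makedirs("data/raw", exist_ok=True)

for i, url in enumerate(urls):
    resp = requests.get(url, timeout=10)
    if resp.ok:
        # save the raw bytes under a sequential file name
        with open(f"data/raw/img_{i:04d}.jpg", "wb") as f:
            f.write(resp.content)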
Frontend Design:
Tech Stack:
o HTML/CSS + JavaScript
Frameworks:
Styling:
Chart/Visualization Libraries:
2. Responsibilities
User Interface
Visualization
Backend Design:
1. Tech Stack
o Web Framework: FastAPI or Flask (FastAPI preferred for speed and async
support)
o Containerization: Docker
2. Responsibilities
Model Management
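One possible shape for the backend's detection endpoint, sketched with FastAPI; the route name,
temporary-file handling, and response fields are illustrative assumptions rather than a
finalized contract.

from fastapi import FastAPI, File, UploadFile
from ultralytics import YOLO

app = FastAPI()
model = YOLO("yolov8n.pt")  # loaded once at startup, not per request

@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    # persist the upload so the detector can read it (simplistic approach)
    data = await file.read()
    with open("upload.jpg", "wb") as f:
        f.write(data)
    results = model("upload.jpg")
    boxes = results[0].boxes
    return {
        "detections": [
            {"label": model.names[int(c)], "confidence": round(float(s), 3)}
            for c, s in zip(boxes.cls, boxes.conf)
        ]
    }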
4.3 Data Visualization and Notifications
Once receipts are processed and expense data is extracted, users are presented with an
interactive dashboard that displays their financial activity through various visual components.
This includes pie charts showing expense distribution by category, bar or line graphs for
monthly spending trends, and tables listing individual transactions. Users can filter these views
by date range, vendor, or payment method, allowing for deep insights into their spending
behavior. Visualization tools such as Chart.js, Recharts, or D3.js are integrated into the
frontend (built using React.js or Vue.js), providing a responsive and engaging interface. Each
visual element is clickable, enabling users to review detailed receipt data, make corrections, or
drill down into specific expense groups.
2. Integration Testing
o Use tools like Postman or Supertest to simulate API calls from the frontend to
the backend.
3. Machine Learning Model Testing
ML-specific testing focuses on the performance and accuracy of the object detection
and OCR modules:
o Object Detection Evaluation:
o Use metrics like mAP (mean Average Precision), IoU (Intersection over Union),
and Precision/Recall.
o Test on a diverse dataset of receipts with varying formats, lighting conditions,
and languages.
o OCR Testing:
o Calculate Character Error Rate (CER) and Word Error Rate (WER).
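CER reduces to the Levenshtein edit distance over characters, divided by the reference length
(WER is the same computation over words). A self-contained sketch with a hypothetical receipt
field:

def edit_distance(ref, hyp):
    # classic dynamic-programming Levenshtein distance
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)]

def cer(reference, hypothesis):
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

print(cer("TOTAL 42.50", "TOTAL 4Z.50"))  # one substitution -> ~0.09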
4. Functional Testing
This verifies that the system behaves as expected from an end-user perspective:
o Upload receipts and validate correct extraction of date, merchant, amount, etc.
o Ensure that the visualizations update dynamically with new expense data.
In conclusion, the Expense Management System leveraging Object Detection and OCR is a
sophisticated tool designed to automate and streamline expense tracking, enhance financial
management, and provide valuable insights to users. By combining machine learning
technologies, such as object detection for receipt field recognition and OCR for text extraction,
with intuitive data visualization and real-time notifications, this system empowers users to
manage their finances with minimal effort.
The various testing methodologies implemented—from unit and integration testing to machine
learning model evaluation—ensure that every component functions correctly, that the system is
reliable and secure, and that it delivers an accurate and smooth user experience. The use of
diverse testing strategies also guarantees that potential issues, from functional bugs to
performance bottlenecks, are identified and resolved early in the development process.
Ultimately, this system offers not only a robust technical foundation but also a user-centric
interface that brings automation and accuracy to everyday expense tracking. By making
financial data more accessible, organized, and actionable, it enables individuals and businesses
to stay on top of their spending, plan their budgets effectively, and make informed financial
decisions with ease.
This integration of cutting-edge technologies and best practices in software and machine
learning testing ensures the creation of a powerful, scalable, and user-friendly solution that
meets the needs of modern financial management.
Chapter 5: Features Implementation and Functionalities
User Authentication and the Dashboard are key components that ensure a secure,
personalized, and user-friendly experience. These elements not only safeguard
sensitive financial data but also present it in an organized and actionable way.
User authentication is a critical aspect of ensuring that only authorized individuals
have access to personal financial data. This process involves verifying the identity of
users, securely storing their credentials, and providing role-based access to different
parts of the system.
Image Input and Preprocessing: Upload images (JPG, PNG, PDF) and preprocess them for better
detection (resizing, noise reduction, normalization).
Object Detection: Detect and draw bounding boxes around objects (e.g., merchant, amount),
classify detected objects, and localize them in the image.
Data Extraction and Structuring: Extract key financial fields (e.g., merchant name, date,
amount) and structure them into a readable format (JSON, database).
Model Confidence and Accuracy: Calculate and display confidence scores for object detection,
filter out low-confidence detections, and ensure accurate detection.
Real-Time Object Detection: Detect objects in real time for live image processing or video
streams, with integration to a camera/scanner for continuous scanning.
Interactive Visualization: Visualize detected objects with bounding boxes and labels on the
uploaded images, with options for error correction and user interaction.
Model Inference: Perform inference on new images to detect objects, apply OCR, and extract
text.
Handling Different Input Formats: Support for processing single images or multi-page documents
(e.g., invoices, receipts).
Accuracy Evaluation and Tuning: Evaluate model accuracy with metrics like mAP, IoU,
Precision/Recall, and fine-tune the model with hyperparameter adjustments.
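The low-confidence filtering described above might look like the sketch below, assuming
ultralytics result objects; the 0.5 threshold and input path are example values.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model("invoice.jpg")    # placeholder input image

CONF_THRESHOLD = 0.5
kept = [
    (model.names[int(c)], float(s))
    for c, s in zip(results[0].boxes.cls, results[0].boxes.conf)
    if float(s) >= CONF_THRESHOLD  # drop low-confidence detections
]
print(kept)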
Chapter 6: Results and Conclusion
Object detection using machine learning with YOLO is fast, accurate, and capable of real-time
performance. It detects multiple objects in a single pass by treating detection as a single
regression problem. YOLO is efficient, generalizes well to new data, and supports lightweight
versions for mobile deployment. Its open-source nature and broad applicability make it popular
in fields like surveillance, healthcare, and assistive technology.
Real-Time Performance – YOLO processes images quickly, making it ideal for real-
time applications.
Single-Pass Detection – It detects objects and predicts bounding boxes in one forward
pass.
Wide Applicability – Used in fields like security, autonomous driving, and assistive
technologies.
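As an illustration of this single-pass, real-time behavior, a webcam loop might be sketched
as follows, assuming OpenCV and the ultralytics package; press 'q' to stop.

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture(0)            # default camera

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)           # one forward pass per frame
    annotated = results[0].plot()    # draw boxes, labels, confidences
    cv2.imshow("YOLO detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()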
[Implementation screenshots omitted: output layers, confidence function logic, sample input,
and detection output.]
6.3 Challenges Faced
Challenges in object detection using YOLO include difficulty detecting small or overlapping
objects due to its grid-based approach. It may also struggle with objects at extreme angles or in
low-light conditions, and requires large labeled datasets for accurate training.
6.3.1 Poor Detection of Small Objects – YOLO may miss small objects due to its grid-based
division of images.
6.3.3 Complex Backgrounds – Accuracy drops when objects appear in cluttered or dynamic
scenes.
6.3.4 Angle and Scale Variability – Limited performance when objects are rotated or vary
greatly in size.
6.3.5 Requires Large Labeled Datasets – Needs extensive, annotated training data for good
accuracy.
Each challenge contributed significantly to team learning and improved the system’s reliability.
6.4 Future Enhancements
Some promising future enhancements in object detection using machine learning with YOLO include:
6.4.1 Improved Small Object Detection – Enhancing YOLO's architecture to better detect
small or distant objects.
6.4.2 Integration with Transformers – Combining YOLO with vision transformers for better
contextual understanding.
6.4.3 Edge AI Optimization – Further model compression for faster inference on mobile
and embedded devices.
6.4.4 3D Object Detection – Expanding YOLO to detect objects in 3D space for AR/VR
and autonomous systems.
In conclusion, YOLO (You Only Look Once) has transformed object detection by offering a fast
and efficient solution for real-time applications. Its single-stage architecture allows for real-time
processing, making it ideal for scenarios like autonomous vehicles, security, and robotics.
YOLO’s ability to detect objects in a single pass reduces computational costs while maintaining
impressive speed. While it performs well in detecting clearly defined objects, it may struggle
with small or overlapping objects. Despite these challenges, YOLO’s versatility and continuous
evolution—through models like YOLOv5 and YOLOv7—make it adaptable across various
industries. Although YOLO excels in speed and efficiency, optimizing performance requires
careful tuning and a well-curated dataset. Overall, YOLO remains a powerful tool for object
detection, though its limitations should be considered for specific tasks.
References
Personal Details