Minor Project Report
CROWD COUNTING
Submitted in partial fulfillment of the requirements
for the award of the degree of
Bachelor of Technology
Computer Science & Engineering
DECLARATION
I/We, the undersigned student(s) of B.Tech. (CSE), hereby declare that the minor project titled
CROWD COUNTING submitted to the Department of Computer Science & Engineering, at Dr.
Akhilesh Das Gupta Institute of Professional Studies, Delhi, affiliated with Guru Gobind Singh
Indraprastha University, Dwarka, New Delhi, in partial fulfillment of the requirements for the
degree of Bachelor of Technology in Computer Science & Engineering, has not previously been
used as the basis for the award of any degree, diploma, or similar title or recognition. The list of
members involved in the project is as follows:
1 Sagar 35496202721
4 Priyanshu 00296202721
Place: Delhi
Date:
ACKNOWLEDGEMENT
We begin by thanking the Almighty, who made this work possible and helped us when things were not easy for us.
We are very grateful and indebted to our faculty guide, Dr. Rakesh Kumar Arora, who helped us immensely and rendered his valuable advice, precious time, knowledge, and relevant information regarding the collection of material. He has been a major source of inspiration throughout the project, as he not only guided us throughout the Minor Project Report on Crowd Counting but also encouraged us to solve the problems that arose during this work.
His guidance and suggestions on this minor project report have truly enlightened us, and it has been a great help and support to have him around.
Finally, we would like to express our appreciation to our parents and friends, who have been instrumental throughout this period by providing unrelenting encouragement.
1 Sagar 35496202721
4 Priyanshu 00296202721
Place: Delhi
Date:
CERTIFICATE
We hereby certify that the work presented in the project report entitled Crowd Counting, submitted in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in Computer Science & Engineering from Dr. Akhilesh Das Gupta Institute of Professional Studies, New Delhi, is an authentic record of our own work carried out during the period from August 2024 to November 2024 under the guidance of Dr. Rakesh Kumar Arora, Professor, Department of CSE.
The matter presented in this project has not been submitted by us for the award of any other degree
elsewhere.
This is to certify that the above statement made by the candidates is correct to the best of our
knowledge.
ABSTRACT
Crowd counting is a crucial task in computer vision with applications across various domains,
including public safety, urban planning, and event management. Traditional methods for crowd
counting, such as manual counting or simple detection-based algorithms, are often inadequate in
densely populated or dynamic environments. This project proposes a deep learning-based approach
to accurately estimate the number of individuals in diverse crowd scenes. Leveraging
convolutional neural networks (CNNs) and density map estimation, the model addresses
challenges like scale variation, occlusions, and cluttered backgrounds. By training on large-scale
datasets with varying crowd densities and environmental contexts, our model aims to achieve
robust generalization and adaptability to real-world conditions. Experimental results indicate a
significant improvement in counting accuracy and computational efficiency over existing methods.
This research contributes to the advancement of crowd analytics by providing an automated,
scalable solution suitable for real-time applications in video surveillance and event monitoring
systems.
This project addresses these challenges by developing a robust and scalable crowd counting system
using advanced deep learning techniques, specifically designed to handle diverse and challenging
crowd conditions. Our approach leverages convolutional neural networks (CNNs) to generate
density maps, which represent the spatial distribution of individuals across a scene. These density
maps are then used to accurately estimate crowd size, providing a more nuanced view of crowd
dynamics beyond simple headcounts. To address the problem of scale variation in high-density
settings, the model employs multi-scale feature fusion, which enables it to capture fine-grained
details as well as broader context, enhancing performance in both sparse and dense crowd scenes.
The results indicate that the proposed deep learning-based crowd counting system not only
achieves high accuracy in varied scenarios but is also capable of operating in near real-time,
making it suitable for deployment in video surveillance and live monitoring systems. By advancing
the state-of-the-art in automated crowd counting, this research provides a valuable tool for data-
driven decision-making in scenarios where understanding crowd behavior and size is essential,
contributing significantly to the fields of crowd analytics, computer vision, and artificial
intelligence.
List of Figures
Chapter 1: Introduction
The practice of determining how many people are present in a specific place at a given time is known as crowd counting. Due to the complexity of real-world settings and unpredictable crowd behavior, this is a difficult task. Owing to its many uses across a variety of industries, including security, public safety, marketing, and urban planning, crowd counting has emerged as an important research area in computer vision and image processing. Crowd management, crowd safety, and resource utilization can all be improved with the development of effective crowd counting techniques.
Crowd counting methods range from conventional hand-crafted feature-based methods to more
sophisticated deep learning-based methods. To estimate crowd density, hand-crafted feature-based algorithms frequently use attributes such as color and texture. These techniques often require manual parameter tuning and may not work well in complex scenes.
Multi-scale networks, attention mechanisms, and adversarial training are among the most recent developments in crowd counting research. Multi-scale networks increase accuracy by estimating crowd density at several visual scales. Attention mechanisms allow the model to
concentrate on the portions of the image that are most important and dismiss the rest. For better
training of the crowd counting models, adversarial training is employed to produce more realistic
crowd images.
In this project, we propose a deep learning-based approach to crowd counting that uses CNNs to
generate density maps and accurately estimate crowd size. The model is trained on large, diverse
datasets that reflect real-world variability, making it capable of generalizing across different crowd
types and environments. We evaluate the model’s performance on well-established benchmark
datasets, and the results show that it consistently outperforms traditional methods in terms of
accuracy, efficiency, and adaptability.
By providing an accurate, scalable, and efficient solution, this project aims to advance the field of
crowd analytics, with the potential for real-time applications in video surveillance and event
monitoring. Our approach addresses key challenges in crowd counting and provides a foundation for
future work in both the technical and practical aspects of crowd management.
Detection-based approach: train a classifier for the full body [5], or develop a classifier for specific body parts [6].
Key Objectives
Applications
Public Transportation: Monitor passenger flow to manage peak hours and reduce
congestion.
Smart Retail: Analyze foot traffic patterns to optimize store layouts and promotions.
Large Events: Ensure crowd control at concerts, sports events, and festivals.
Disaster Management: Aid in evacuation planning by assessing real-time crowd densities.
Deep Learning Models: Convolutional neural networks (CNNs) trained on diverse datasets
to recognize and count people in images or video streams.
Computer Vision: Techniques such as object detection, density map generation, and
segmentation for analyzing visual data.
Edge and Cloud Computing: Real-time processing and scalability to handle large-scale
deployments.
Data Analytics: Extracting actionable insights from crowd statistics for decision-makers.
Fig. 1. Density map estimation for a sample image, shown with the ground truth (GT) and the predicted density map
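For illustration, the following Python sketch (using NumPy and SciPy) shows one common way such ground-truth density maps are produced: each annotated head position is placed as an impulse and blurred with a Gaussian kernel, so that the map integrates to the head count. A fixed sigma is assumed here for simplicity; many benchmark pipelines use geometry-adaptive kernels instead.

import numpy as np
from scipy.ndimage import gaussian_filter

def generate_density_map(image_shape, head_points, sigma=4.0):
    """Create a ground-truth density map from annotated head positions.

    image_shape: (height, width) of the image.
    head_points: list of (x, y) head coordinates.
    sigma: spread of the Gaussian placed at each head (fixed here for simplicity).
    """
    density = np.zeros(image_shape, dtype=np.float32)
    h, w = image_shape
    for x, y in head_points:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < h and 0 <= col < w:
            density[row, col] += 1.0
    # Smearing each impulse with a Gaussian keeps the integral close to the head count.
    density = gaussian_filter(density, sigma=sigma)
    return density

# Example: the sum of the density map approximates the annotated count.
# dmap = generate_density_map((480, 640), [(100, 200), (105, 210)], sigma=4.0)
# print(dmap.sum())  # ~2.0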
Problem Statement
Early crowd counting research relied on detection-based techniques. These methods typically slide a detection window over the image to act as a person or head detector. Recently, various novel object detectors have
been introduced, including R-CNN, YOLO, and SSD, which can
significantly improve detection accuracy in sparse settings. However, they produce inadequate results in crowded environments with heavy occlusion and background noise.
To reduce the above problems, some works introduce regression-based approaches which directly
learn the mapping from an image patch to the count. They usually first extract global features
(texture, gradient, edge features), or local features (SIFT, LBP, HOG,
GLCM). Regression techniques such as linear regression are then used to learn a mapping function from these features to the crowd count. These methods cope better with occlusion and background clutter, but they largely ignore spatial information.
Chapter 2: System Requirements and Analysis
This chapter provides a detailed analysis of the system's requirements, initial investigation, feasibility assessment, and technical specifications. It includes data flow representations to clarify the overall functionality and data management within the crowd counting system.
2.1 General
In general, crowd counting is a significant technology used to estimate the number of people in a
given area through images or video. It has widespread applications across fields like public safety,
event management, urban planning, and smart city development. Accurately counting people in real-
world environments is a complex task, especially in crowded settings where individuals may be
obscured or closely packed together. Traditional counting methods, which often rely on manual
counting or basic image detection algorithms, are limited in densely populated or dynamic scenes due
to issues such as occlusion (when people are partially or fully blocked from view), perspective
distortion, and scale variations.
Below are the main points to consider during the initial investigation phase:
Determine what the crowd counting model will be used for (e.g., safety monitoring, event
management, traffic analysis, marketing insights).
Specify accuracy requirements, update frequency, and acceptable error tolerance.
Decide if real-time or batch processing is needed.
Image and Video Feeds: Decide on the sources for data, such as CCTV cameras, drones, or
still images.
Pre-existing Datasets: Research available datasets (e.g., ShanghaiTech, UCF_CC_50, Mall
Dataset) that are commonly used for crowd counting model training.
Data Acquisition: Plan for collecting real-world data if there are no existing datasets specific
to your scenario.
Traditional Methods:
o Detection-based: Uses object detection algorithms like Faster R-CNN to count
individuals, which works best in sparse crowds.
o Regression-based: Predicts the count by learning the relationship between image
features and crowd density.
Modern Methods:
o Density Estimation: Uses CNNs to estimate a density map, which is then integrated to
get the count. This approach is widely used for dense crowd counting.
o Deep Learning-based Models: CNN-based models like CSRNet, MCNN, or newer
transformer-based models for high-density crowds.
o Computer Vision with AI: Models that use a mix of computer vision and AI techniques
like YOLO, SSD, or crowd-sourced segmentation.
Convolutional Neural Networks (CNNs): CNN-based models such as CSRNet, MCNN, and
CANNet.
Transformer-based Models: Models using attention mechanisms, which have shown
promise in handling complex spatial relationships in dense crowds.
Hybrid Models: Combining CNNs and RNNs or using GANs (Generative Adversarial
Networks) to improve prediction accuracy and handle occlusions.
Mean Absolute Error (MAE) and Mean Squared Error (MSE): Commonly used metrics
in crowd counting to measure the difference between the predicted count and the actual count.
Root Mean Squared Error (RMSE): To assess how well the model handles larger errors.
Density Map Quality: Evaluate the quality of density maps if using a density estimation
approach.
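As a concrete illustration of these metrics, the following minimal Python sketch computes MAE, MSE, and RMSE from lists of predicted and ground-truth counts; the sample numbers in the usage comment are hypothetical.

import numpy as np

def counting_metrics(pred_counts, true_counts):
    """Compute MAE, MSE, and RMSE between predicted and ground-truth crowd counts."""
    pred = np.asarray(pred_counts, dtype=np.float64)
    true = np.asarray(true_counts, dtype=np.float64)
    errors = pred - true
    mae = np.mean(np.abs(errors))   # average absolute counting error
    mse = np.mean(errors ** 2)      # penalizes large errors more heavily
    rmse = np.sqrt(mse)             # same units as the count itself
    return {"MAE": mae, "MSE": mse, "RMSE": rmse}

# Example usage with hypothetical counts from a validation set:
# print(counting_metrics([105, 48, 320], [110, 50, 300]))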
Create a small prototype using a subset of data to validate the feasibility and accuracy of the
chosen approach.
Experiment with multiple models to understand which is the most suitable for the project
requirements.
The feasibility study for the crowd counting project assesses whether a deep learning-based system
can realistically meet the requirements of accurate, real-time crowd estimation in various
environments.
1. Technical Feasibility
Data Availability: Assess the availability and quality of data sources (e.g., CCTV feeds,
aerial drones, pre-existing datasets) to ensure they can support reliable model training and
testing.
Algorithm and Model Feasibility: Determine the models that can be employed, such as
CNN-based models (e.g., CSRNet, MCNN) or transformer-based models, and assess their
suitability for the specific crowd conditions (e.g., sparse vs. dense).
Computational Resources:
o Processing Power: Check if the available processing power (e.g., GPUs, cloud
computing resources) can handle the model training and inference load, especially for
real-time applications.
o Storage Requirements: Evaluate the data storage needs for raw data, processed
frames, and model checkpoints.
o Scalability: Assess if the current infrastructure can scale to meet potential future
needs, such as adding more cameras or supporting additional sites.
Latency Requirements: For real-time crowd counting applications, analyze the maximum
allowable latency and determine if the selected models and hardware can meet this latency
requirement.
2. Operational Feasibility
Staffing and Expertise: Ensure the availability of a team with expertise in machine learning,
computer vision, and software development. Assess the potential need for ongoing training or
the involvement of external consultants.
System Integration: Evaluate how the crowd counting system will integrate with existing
monitoring and operational systems (e.g., security or event management platforms).
Data Collection and Labeling:
o For custom model training, there may be a need for labeled data. Determine if manual
or semi-automated labeling processes are needed and feasible.
Maintenance and Updates: Determine the resources required for ongoing model retraining,
maintenance, and updates to adapt to new environments, data, or changes in crowd patterns.
4. Legal and Ethical Feasibility
Data Privacy and Protection: Assess the legal implications of collecting crowd images or
videos, particularly in public spaces. Evaluate compliance with relevant privacy laws such as
GDPR or CCPA.
Ethical Considerations:
o Address concerns related to surveillance and privacy for individuals in crowded areas.
Ensure that the technology respects personal boundaries and legal requirements.
Permission and Access:
o Obtain any necessary permits for video recording in public areas, and confirm that
cameras or drones comply with local regulations.
Bias and Fairness: Evaluate any potential biases in the dataset that may affect the model’s
accuracy across different demographics and settings. Implement checks and safeguards to
mitigate these biases.
5. Market Feasibility
Demand Analysis: Assess the demand for crowd counting technology in various sectors
(e.g., event management, urban planning, retail, public safety).
Competitive Analysis:
o Investigate existing crowd counting solutions and identify differentiators for your
project.
o Consider features, accuracy levels, pricing, and unique capabilities that competitors
may offer.
Potential Applications: Identify and analyze specific use cases to see if there’s a good
market fit for the technology you’re developing.
6. Risk Analysis
Technical Risks:
o Challenges such as occlusions, variable lighting, and crowd density can affect
accuracy. Include potential mitigations, such as advanced models or data
preprocessing techniques.
Financial Risks: Consider economic risks, like cost overruns due to unexpected model
retraining or high hardware costs.
Operational Risks: Assess risks of system failures, integration issues, or maintenance
difficulties, and prepare contingency plans.
Legal Risks: Address potential legal repercussions associated with data privacy violations or
misuse of data, and implement necessary precautions.
Based on findings, create a summary to highlight the feasibility status, potential bottlenecks,
estimated costs, expected benefits, and recommended approach.
Outline the next steps, which could involve prototyping, additional data collection, or
technical trials.
The technical feasibility of the crowd counting project examines whether current technology, tools,
and resources can support the development and deployment of an automated crowd counting system.
This evaluation covers the ability of deep learning techniques, hardware requirements, and available
datasets to meet the project’s goals of accuracy, real-time processing, and adaptability across varied
environments.
Model Complexity:
o Simple Models: Evaluate if lightweight models like single-shot detectors (SSD) or
MobileNet can meet the accuracy needs in sparse or moderate crowd scenarios.
o Complex Models: For dense crowds or environments with frequent occlusion,
consider more advanced models like CSRNet, MCNN, or transformer-based
architectures, which may require significant computational resources.
Crowd Density Requirements:
o Detection-based models may perform well in sparse settings but may struggle in dense
environments.
o Density estimation models (e.g., CSRNet) can better handle dense crowds by
generating density maps, but they can be computationally expensive.
Testing Algorithms: Conduct preliminary tests with multiple models on sample data to
assess accuracy and performance and determine which models are best suited for the task.
Hardware:
o Edge Devices: For on-site, real-time processing (e.g., cameras with embedded GPUs
or mobile devices), determine if edge computing can support model inference.
o Centralized Servers: For applications where video data can be streamed to a
centralized server, check if GPUs or cloud resources are available for training and
deploying heavier models.
o Cloud-Based Solutions: Assess if cloud providers (e.g., AWS, Google Cloud) can
offer sufficient computational resources and scalability for model training and
inference.
Latency Requirements: Evaluate the processing speed requirements for real-time
applications. Certain high-density models may introduce latency, which is a challenge in
time-sensitive scenarios.
Storage: Estimate the data storage needs for raw video or image frames, model checkpoints,
and results. For large-scale applications, consider data compression and storage optimization
techniques.
Occlusion Handling: For dense crowds, evaluate algorithms’ ability to manage occlusion
where individuals overlap. Advanced models like transformer-based architectures or density
maps might offer solutions, but their effectiveness should be tested.
Perspective Variation: Analyze the models’ capability to account for varying scales due to
different camera perspectives. Some models use multi-column CNNs (e.g., MCNN) or dilated convolutions (e.g., CSRNet) to handle perspective and scale variation.
Lighting and Environmental Changes: Consider the performance of models under
changing lighting conditions or weather variations, especially if outdoor surveillance is
needed. Augmentation techniques during training can sometimes help models generalize to
different conditions.
Data Quality Requirements: Low-quality or blurry data can hinder model performance.
Conduct tests to see if preprocessing techniques, such as noise reduction or resolution
adjustments, are necessary to maintain accuracy.
Data Security: Implement data encryption for both stored and transmitted data, especially if
video streams are sent over a network.
Privacy Controls: Ensure that the system complies with privacy laws, such as GDPR or
CCPA, if handling personally identifiable information (PII). Techniques like blurring faces or
anonymizing data may be necessary.
8. Prototype Testing
The economic feasibility of the crowd counting project assesses whether the financial investment
required to develop and deploy the system is justified by the expected benefits, both in terms of
initial costs and long-term returns. This analysis considers the costs of development, hardware,
software, and ongoing maintenance, as well as the potential savings and revenue generation
associated with automated crowd management.
Hardware Costs:
o Cameras and Sensors: If additional hardware is required (e.g., high-resolution
CCTV, thermal cameras, or drones), estimate purchase, installation, and setup costs.
o Edge Devices: For on-site processing, edge devices with GPUs (e.g., Nvidia Jetson,
Google Coral) or high-performance cameras may be necessary.
o Servers and GPUs: If centralized or cloud-based processing is required, consider
costs for servers or cloud GPU instances for model training and inference.
Software and Licensing:
o AI Frameworks and Libraries: Most frameworks (e.g., TensorFlow, PyTorch) are
open-source, but some pre-trained models or enterprise-level platforms may have
licensing fees.
o Data Annotation Tools: For projects that require extensive data labeling, consider the
costs of annotation software and, if necessary, outsourcing labeling services.
Cloud Storage and Processing:
o Cloud Services: Estimate costs for data storage, processing, and possibly real-time
data streaming on cloud platforms (e.g., AWS, Google Cloud, Azure). Factors include
data transfer, storage, and compute usage.
o Data Transmission: For applications that rely on real-time video feeds, account for
potential costs of data transmission and bandwidth.
Revenue Generation:
o New Services or Insights: For commercial applications, crowd counting insights
could be sold as a service to other businesses (e.g., real-time event monitoring for
security, customer traffic data for retailers).
o Operational Efficiency: Automating crowd counting can reduce manual labor costs,
particularly for large-scale events or venues that currently rely on manual methods.
Cost Savings:
o Reduced Staffing Needs: In cases where crowd counting is currently done manually,
automation reduces the need for staffing, leading to direct cost savings.
o Enhanced Safety and Compliance: Preventing overcrowding through real-time
monitoring can reduce liabilities, potentially lowering insurance costs.
Improved Resource Allocation:
o Having accurate crowd data can help optimize resources such as staffing, security, or
transportation, leading to cost savings in labor and logistics.
Net Present Value (NPV): Calculate the NPV of expected cash flows from cost savings and
potential revenue over a set period (e.g., 3-5 years), minus initial investments and operational
costs.
ROI Formula: ROI = (Net Profit / Total Costs) × 100, where Net Profit = Total Benefits − Total Costs.
Payback Period: Determine how long it will take for the project to "break even" (i.e., when
the benefits match the initial investment).
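For illustration only, the following Python sketch applies the ROI and payback-period formulas above to hypothetical figures (a total cost of 50,000 and an annual benefit of 30,000 over three years); these numbers are not project estimates.

def roi_and_payback(total_costs, annual_benefit, years=3):
    """Illustrative ROI and payback calculation for the crowd counting project.

    total_costs: one-time plus operating costs over the evaluation period.
    annual_benefit: expected yearly savings/revenue from automation.
    """
    total_benefits = annual_benefit * years
    net_profit = total_benefits - total_costs
    roi_percent = (net_profit / total_costs) * 100   # ROI = Net Profit / Total Costs x 100
    payback_years = total_costs / annual_benefit     # time until benefits match the investment
    return roi_percent, payback_years

# Hypothetical figures: 50,000 total cost, 30,000 benefit per year over 3 years.
# roi, payback = roi_and_payback(50_000, 30_000, years=3)
# -> ROI = 80.0 %, payback ~ 1.7 years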
Financial Risks:
o Cost Overruns: Include a contingency budget for potential overruns in hardware,
cloud resources, or staffing.
o Model Accuracy and Retraining Needs: If the model does not achieve the desired
accuracy or needs frequent retraining, this could increase operational costs.
Legal and Compliance Costs:
o Compliance with privacy laws, such as GDPR, may require additional costs for data
protection measures (e.g., anonymization), which should be accounted for in the
budget.
o If storing or processing personally identifiable information (PII), there could be fines
or legal expenses in case of non-compliance.
6. Cost-Benefit Analysis
Summarize all findings and provide a clear economic analysis showing the financial viability
of the project.
Recommendations:
o Identify whether the project should proceed, based on the cost-benefit analysis and
ROI.
o Suggest any cost-saving alternatives (e.g., starting with a smaller pilot project, using
cloud solutions initially before committing to more hardware) if the initial costs seem
high.
1. Software Specifications
Programming Languages
Python: For implementing machine learning models, data preprocessing, and inference
pipelines.
C++: For performance-critical components, such as real-time video processing with OpenCV.
JavaScript: If integrating with web-based dashboards for visualization.
Operating System
Ubuntu Linux (20.04 or higher): Recommended for most deep learning and computer vision
tasks.
Windows 10/11: If using frameworks compatible with Windows, especially for development.
macOS: Suitable for development but not ideal for GPU-heavy tasks.
Version Control
Git: For code management and collaboration.
Platforms: GitHub, GitLab, or Bitbucket.
2. Hardware Specifications
Development Environment
CPU:
o Intel Core i7/i9 or AMD Ryzen 7/9: High-performance processors for training and
development.
GPU:
o NVIDIA GPUs with CUDA support:
Development: RTX 3060/3070/3080.
Training: NVIDIA A100, V100, or RTX 3090 for large-scale models.
o Minimum: GTX 1660/RTX 2060 for smaller models or testing.
RAM:
o Minimum: 16 GB.
o Recommended: 32 GB or higher for handling large datasets and models.
Storage:
o SSD (Solid State Drive):
Minimum: 512 GB.
Recommended: 1 TB or more for datasets and model checkpoints.
Deployment Environment
Network
High-Speed Internet:
o Necessary for live video streaming or cloud-based deployments.
LAN/WAN Support:
o For closed-network, real-time processing in secure environments.
Peripherals
Cameras:
o High-Resolution Cameras (e.g., 1080p or higher) for accurate crowd detection.
o IP Cameras: For real-time data streaming.
o Recommended: Cameras with a wide field of view for large areas.
Monitoring Displays:
o High-resolution monitors for visualizing heatmaps and real-time analytics.
3. Scalability Considerations
Distributed Systems:
o Use clusters with tools like Kubernetes for load balancing and scalability.
Storage:
o If datasets are large, consider using distributed storage like HDFS or Amazon S3.
A Data Flow Diagram (DFD) visually represents how data flows through the crowd counting system,
showing how input is processed, stored, and output. Below is a description of the various levels in the
DFD for a typical crowd counting system:
At this level, the DFD provides a high-level view of the system’s major components and their
interactions with external entities. The focus is on how the crowd counting system interacts with
external data sources (e.g., video streams, user interfaces) and outputs (e.g., crowd counts, alerts).
External Entities:
o Video Feed Source: Represents cameras capturing video footage of crowds (e.g.,
CCTV cameras, IP cameras).
o System User: The operator or system administrator who interacts with the system to
configure settings, view results, or manage alerts.
o Cloud/External Storage: For storing historical data or video footage for later
retrieval and analysis.
Processes:
o Crowd Counting System: The main system responsible for processing video feeds,
detecting and counting people, and generating results (crowd count, density map, etc.).
Data Stores:
o Crowd Count Database: Stores real-time and historical crowd counting data (e.g.,
total count, timestamped results).
o Video Data: (Optional) Stores captured video for future analysis or evidence
purposes.
Data Flows:
o Video Feed: Continuous input from cameras to the system.
o Processed Count Data: The output, which includes crowd count results, density
maps, and alerts sent to users or stored in databases.
o User Commands: Inputs from the user for system configuration or monitoring.
o Video Storage/Backup: (Optional) Stores footage for analysis or backup.
This level delves deeper into the processes and sub-processes within the crowd counting system.
Here, we break down the main processes into detailed steps that show how data flows through
different parts of the system.
External Entities:
o Video Feed Source: Continuous stream of video data from cameras.
o System User: Interacts with the system to monitor crowd statistics, set thresholds for
alerts, and access historical data.
o Cloud/External Storage: Stores processed data and video backups for redundancy or
future use.
Processes:
o Video Pre-processing:
Input: Raw video feed.
Actions: Convert video stream into frames, resize, normalize, and enhance the
image for better feature extraction (e.g., noise removal, lighting correction).
Output: Pre-processed video frames ready for further analysis.
o Data Storage:
Input: Crowd count results, video data, alerts.
Actions: Store processed crowd count data and metadata (e.g., timestamp) in
the database. Optionally, video footage is stored in a separate system.
Output: Stored data in the database for later access.
Data Stores:
o Crowd Count Database: Stores real-time crowd counts, alerts, and historical data.
o Video Storage: Optionally stores video footage for further analysis.
Data Flows:
o Raw Video Stream: From cameras to the pre-processing step.
o Pre-Processed Video Frames: Sent from video pre-processing to crowd detection.
o Crowd Count & Density Map: Generated from the crowd detection and counting
process.
o Alerts: Generated based on crowd thresholds and sent to the system user.
o Stored Data: Sent to the Crowd Count Database or Cloud/External Storage for long-
term storage.
o User Commands: From the system user to the system for configuration or data
requests.
If further detail is needed, the DFD can be expanded into more granular steps within each process.
This may include additional sub-processes, like video frame extraction, model training, or real-time
inference using edge devices. However, for most systems, the Level 1 DFD is sufficient to convey
the major processes and data flows.
Chapter 3: System Design
The System Design phase focuses on translating the requirements and specifications into a blueprint
for the final implementation. This includes the overall architecture of the system, components and
modules, how they interact with each other, and detailed design decisions for hardware, software, and
network integration.
The design methodology for the crowd counting system focuses on developing a robust, efficient,
and scalable solution to accurately detect and count crowds in real-time from video feeds. The
methodology follows an iterative and structured approach, with a clear focus on problem-solving,
system architecture, model development, testing, and refinement.
The Data Processing Layer plays a central role in the crowd counting system. This layer is
responsible for transforming raw data (video streams or images) into meaningful insights such as
crowd counts, density maps, and trends over time. It includes several stages of data handling, starting
from pre-processing the raw input data to running inference with deep learning models, followed by
post-processing the results for final display or analysis.
1. Data Acquisition
Data acquisition refers to the collection of input data, typically from surveillance cameras, sensors, or
real-time video streams. This is the first step in the data pipeline and includes:
Image or Video Collection: Continuous video feed from high-resolution cameras or static
images.
Sensor Integration (optional): If applicable, data from additional sensors like LiDAR or
thermal cameras can be integrated for enhanced crowd detection, especially in low-light
conditions.
2. Preprocessing
Before the raw input data can be fed into machine learning models, it must undergo several pre-
processing steps to standardize and optimize it for model input:
In a real-world environment, image data can often be noisy due to factors like lighting variations, occlusions, and camera imperfections. Noise reduction methods, such as Gaussian blurring or median filtering, can be applied before the frames are passed to the model.
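As a hedged illustration of these pre-processing steps, the sketch below uses OpenCV to denoise, resize, and normalize a frame; the target resolution is an assumed model input size, not a fixed requirement.

import cv2
import numpy as np

def preprocess_frame(frame, target_size=(640, 480)):
    """Prepare a raw video frame for the counting model.

    target_size is an assumed model input size; adjust it to the network being used.
    """
    # Reduce sensor noise while preserving edges.
    denoised = cv2.GaussianBlur(frame, (3, 3), 0)
    # Resize to the resolution expected by the model.
    resized = cv2.resize(denoised, target_size, interpolation=cv2.INTER_AREA)
    # Scale pixel values to [0, 1] for more stable network inputs.
    normalized = resized.astype(np.float32) / 255.0
    return normalized

# Example: read frames from a camera feed and pre-process each one.
# cap = cv2.VideoCapture(0)
# ok, frame = cap.read()
# if ok:
#     model_input = preprocess_frame(frame)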
Crowd counting models usually begin by detecting individuals or groups of people in images:
Object Detection:
o Techniques like YOLO (You Only Look Once) or Faster R-CNN are used to detect
people within the image. These methods create bounding boxes around the detected
people.
o For videos, optical flow or motion tracking may be employed to detect and track
moving individuals.
Segmentation:
o In some cases, semantic segmentation (e.g., using FCN or Mask R-CNN) is applied
to label each pixel, allowing a more granular identification of people in highly dense
crowds.
o Density Maps: For very crowded or occluded scenes, density map generation can be
employed, where a heatmap is created to represent crowd density (instead of detecting
each individual).
This stage is where the core crowd counting model comes into play. The processing layer leverages
trained deep learning models to estimate the crowd size:
Regression Models: For counting crowds where individuals are not easily distinguishable,
density-based regression models (like CSRNet, MCNN) are employed. These models predict
a density map and aggregate the values to estimate the total count.
Tracking: In video feeds, tracking algorithms (e.g., Kalman filters, SORT, or DeepSORT)
can help maintain identity across frames, improving the overall accuracy of counting in
dynamic environments.
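To make the density-map idea concrete, the following deliberately simplified PyTorch sketch stands in for regressors such as MCNN or CSRNet (it is not their actual architecture); the key point is that the estimated count is obtained by summing the predicted density map.

import torch
import torch.nn as nn

class TinyDensityNet(nn.Module):
    """A deliberately small stand-in for density-map regressors such as MCNN or CSRNet."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=9, padding=4), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 16, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, kernel_size=1),  # single-channel density map
        )

    def forward(self, x):
        return self.features(x)

# The estimated count is simply the integral (sum) of the predicted density map.
# model = TinyDensityNet().eval()
# with torch.no_grad():
#     density = model(torch.rand(1, 3, 480, 640))   # (batch, 1, H, W)
#     estimated_count = density.sum().item()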
6. Post-Processing
After the crowd count is estimated, additional processing steps might be required:
Smoothing:
o Apply temporal smoothing for video streams to reduce fluctuations in crowd count,
particularly in noisy or fast-moving environments.
Data Aggregation:
o For large areas or multiple camera feeds, aggregate counts from different regions or
cameras to provide a total count for a given location.
Visualization:
o Overlay the crowd count on the original video frames or images. For instance, display
bounding boxes or heatmaps to show areas of higher density, or display the estimated
count number in real-time.
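A minimal sketch of the temporal smoothing step, using an exponential moving average over per-frame counts; the smoothing factor alpha is an assumed tuning parameter.

class CountSmoother:
    """Exponential moving average over per-frame counts to suppress frame-to-frame jitter."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha        # smaller alpha -> smoother but slower to react
        self.smoothed = None

    def update(self, raw_count):
        if self.smoothed is None:
            self.smoothed = float(raw_count)
        else:
            self.smoothed = self.alpha * raw_count + (1 - self.alpha) * self.smoothed
        return self.smoothed

# Example: feed the per-frame estimates from the model into the smoother.
# smoother = CountSmoother(alpha=0.2)
# for count in [120, 134, 98, 127]:
#     print(round(smoother.update(count)))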
For systems that require real-time crowd counting (e.g., surveillance cameras or public events),
efficient processing is essential:
Edge Computing: In some systems, especially those deployed in remote locations or with
limited connectivity, edge devices (e.g., NVIDIA Jetson, Google Coral) may be used to
process video streams locally before sending the results to a central server.
Cloud Processing: For more complex systems that require more computational power, cloud-
based servers can process data from multiple cameras and return results in real-time.
After processing, the crowd count and any other insights derived from the data (e.g., crowd density,
movement patterns) must be stored and integrated for further analysis:
Database Storage: Store processed data in databases (e.g., SQL, NoSQL) for historical
analysis and reporting.
Data Syncing: Sync data between edge devices and central servers for centralized data
storage and analysis.
This section focuses on the architecture, selection of models, training, and evaluation methods used
for crowd detection and counting, which are crucial for ensuring the system’s accuracy and
efficiency.
The User Interface (UI) Layer serves as the bridge between the crowd counting system and its end-
users. It is the front-end part of the system that displays real-time data and insights, allowing users to
interact with the system and make decisions based on the processed information. The UI layer must
be intuitive, responsive, and present the most relevant data in a clear, easily understandable format,
ensuring that the users can monitor crowd density and respond effectively in real-time.
This section discusses the design, functionality, and components of the User Interface Layer of the
crowd counting system, focusing on the different user needs and how the interface facilitates
interaction with the underlying system.
The User Interface (UI) Design for a crowd counting system plays a crucial role in ensuring that
users can interact with the system efficiently, interpret the data quickly, and make informed
decisions. The UI must be intuitive, visually appealing, and provide all the necessary tools and
information in an easily accessible and understandable way. This section outlines the design
principles, layout, and key components for the user interface of the crowd counting system.
1. Real-Time Crowd Data
The primary purpose of the home screen is to present real-time crowd data at a glance. This data
should be visually highlighted and clearly visible so that users can quickly assess the situation.
Crowd Count:
o The total number of people detected in the monitored area should be prominently
displayed in a large, bold number. This could be presented in a central location on the
screen for immediate visibility.
Crowd Density:
o Crowd density should be shown either as a numerical value (e.g., people per square
meter) or a color-coded heatmap that indicates the level of crowding in different parts
of the monitored area. Areas with high density could be shown in red or orange, and
areas with lower density in green.
3. Alert/Notification Center
The alert and notification center is essential for ensuring users are immediately aware of important
events or changes in the crowd situation.
Critical Alerts:
o Alerts for overcrowding, high crowd density, or potential safety issues should be
displayed in real-time on the home screen. Alerts should be color-coded to indicate
severity (e.g., red for urgent, orange for medium priority, and yellow for low priority).
Recent Alerts:
o A list of recent alerts can be shown beneath the main dashboard. Each alert should
include a timestamp, the severity level, and a brief description of the issue (e.g.,
“Crowd density exceeded threshold at entrance”).
Alert Icons:
o A small alert icon in the header or sidebar should indicate the number of unresolved
alerts. Clicking on the icon should open a detailed alert log, where users can see all
active and past alerts.
The home screen should also provide quick navigation to other parts of the system for users to access
more detailed information and perform management tasks.
Navigation Sidebar:
o The sidebar should be easily accessible on the left (or as a collapsible menu) with
icons or links to key sections of the system:
Dashboard: Real-time crowd monitoring view.
Camera Feeds: Access and manage individual camera feeds.
Analytics: View historical data, trends, and reports.
Alerts: Manage active alerts and thresholds.
Settings: Adjust system settings, including camera configuration and user
management.
It is essential that the home screen also indicates the overall health and status of the system to ensure
that all components (e.g., cameras, data processing) are functioning correctly.
Warning Icons:
o If there are issues with any part of the system (e.g., camera malfunctions or processing
delays), warning icons (e.g., yellow or red triangles) should alert users to potential
problems.
6. User Profile and Settings
At the top of the home screen, there should be a user profile section that allows for quick access to
personal settings and account management.
Profile Icon:
o The user profile icon should be clickable, opening a dropdown or pop-up with options
such as:
View Profile: Allows users to see and update their account details.
Settings: Access detailed system settings or personal preferences.
Log Out: Log out of the system securely.
Role-Specific Features:
o Depending on the user’s role (e.g., admin, security personnel, event coordinator),
certain options may be visible or hidden. For example, administrators may have access
to the settings menu, while regular users might only see the real-time dashboard and
alerts.
For users who need to analyse trends or track long-term crowd behavior, the home screen could offer
a snapshot of historical data.
Crowd Trends:
o A mini graph or chart could show the trend of crowd density over time (e.g., crowd
count over the past hour or day). This allows users to track whether the crowd is
increasing or decreasing and helps with early intervention for crowd management.
Upcoming Events:
o If the system is used for large events, the home screen could show an event calendar
or upcoming event countdown, letting users anticipate crowd density at specific times
or locations.
Time-based Analytics:
o Provide access to a time slider or interactive graph that allows users to zoom into
specific time periods to view crowd density trends or particular crowd management
actions.
Chapter 4: Testing
Testing is a critical phase in the development of any software system, especially for applications
like crowd counting systems that require accuracy, real-time performance, and reliable data
handling. The goal of testing is to ensure that the system functions as expected under different
conditions, performs well, and delivers accurate results. The following testing methods should be
applied to ensure the success and robustness of the crowd counting system.
In a crowd counting system, accuracy, real-time processing, and reliability are paramount. To ensure
the system functions properly and efficiently under various conditions, employing the right testing
techniques and strategies is crucial. These techniques and strategies are designed to address
different aspects of the system, such as performance, accuracy, usability, security, and integration.
2. OpenCV:
- OpenCV (Open-Source Computer Vision Library) will be used for image processing tasks. It
provides tools for reading, manipulating, and analyzing images and video streams. This includes pre-
processing steps such as resizing, normalization, and augmentation, which are crucial for preparing
data for the CNN.
3. Python:
- Python is the primary programming language for this project due to its extensive libraries and
ease of use. It will be used to implement the CNN model, perform image processing with OpenCV,
and manage data flow and training processes.
In the development of a crowd counting system, ensuring the accuracy and efficiency of the code
is crucial for its real-time performance and scalability. Debugging helps identify and fix errors in the
system, while code improvement enhances the overall quality, maintainability, and performance.
Below are the strategies and techniques for debugging and code improvement in a crowd counting
system.
4.2.1. Debugging Techniques
Debugging techniques are essential for identifying, analysing, and resolving issues in software
development. Here’s an overview of widely used debugging techniques:
1. Print Statement Debugging and Logging
Description: Insert print() statements or logging messages into the code to trace the
program's flow and examine variable states.
Usage:
o Place print statements at critical points to understand how data and control flow
through the program.
o Use logging libraries (e.g., Python’s logging module) for more detailed and
configurable output, allowing different logging levels (e.g., INFO, DEBUG,
WARNING).
Best Practice: Remove or comment out excessive print statements once debugging is
complete or adjust logging levels to avoid cluttering output.
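As an illustrative example in the context of this project, the snippet below replaces bare print statements with Python's logging module; the alert threshold used here is hypothetical.

import logging

# Configure once at program start; DEBUG can be switched to INFO/WARNING in production.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("crowd_counter")

def count_frame(frame_id, density_sum):
    logger.debug("Processing frame %d", frame_id)             # detailed trace output
    logger.info("Frame %d estimated count: %.1f", frame_id, density_sum)
    if density_sum > 500:                                      # illustrative threshold
        logger.warning("Frame %d: count exceeds alert threshold", frame_id)

# count_frame(42, 512.3)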
3. Rubber Duck Debugging
Description: This involves explaining your code line-by-line to a "rubber duck" (or any
object) to help clarify logic and find errors.
Purpose: Speaking through your code often helps you spot logical errors or misconceptions,
as it forces you to consider each line’s purpose.
Effective Practice: Explain assumptions, expected outcomes, and any observed issues as if
teaching someone else.
4. Unit Testing
Description: Write test cases for individual functions or components to verify that each part
of the code performs as expected.
Usage:
o Automated testing frameworks like JUnit (Java), Pytest (Python), or Mocha
(JavaScript) can run tests and report any failures.
o Use test-driven development (TDD), where tests are written before the code itself,
which can prevent many bugs from appearing later.
Benefits: Unit tests serve as both a debugging and regression testing tool, making it easier to
locate issues by isolating failed test cases.
5. Binary Search Debugging
Description: When a bug’s origin is unclear, especially in larger codebases, binary search
debugging involves systematically narrowing down the code region causing the issue.
Usage:
o Comment out or disable half of the code or sections to identify if the problem lies
within the disabled section. Repeat this process until the error source is found.
Best Practice: This technique is particularly useful in combination with version control,
where you can use git bisect to identify the commit that introduced the bug.
6. Code Reviews and Pair Programming
Description: Code reviews involve having another developer inspect your code, while pair
programming pairs two developers to work on the code simultaneously.
Benefits:
o Fresh eyes often spot overlooked issues.
o Developers can catch logical and structural issues before the code is even run.
Best Practice: Establish a structured code review process or pairing rotation in your team for
continuous improvement and bug prevention.
7. Static Analysis and Linting Tools
Description: Tools that analyze code for errors, vulnerabilities, and code style issues without
executing it.
Examples: Linting tools like ESLint for JavaScript, Flake8 for Python, or SonarQube for a
variety of languages.
Usage:
o Identify syntax errors, potential bugs, and bad practices.
o Most IDEs integrate linting and static analysis tools, providing feedback as you write
code.
Benefits: Helps identify issues early, especially syntax and style errors, which can prevent
bugs from appearing during runtime.
8. Memory Debugging
Description: For languages like C/C++ with manual memory management, memory
debugging tools help identify leaks, uninitialized variables, and other memory-related issues.
Examples:
o Valgrind for memory leaks and profiling.
o IDE-based tools for memory profiling (e.g., Visual Studio’s memory diagnostics).
Best Practice: Regular memory checks can help prevent memory-related crashes and
performance issues, especially in long-running applications.
9. Reverse Debugging
Description: Reverse debugging allows you to "step back" through code to see how a
program reached its current state.
Examples:
o gdb supports reverse execution, though it can be slow.
o rr (a tool for Linux) records program execution so you can rewind during debugging.
Usage: Useful for complex bugs where tracking the program's history is necessary to
understand the bug's origin.
10. Profiling
Description: Profiling helps identify performance bottlenecks and areas of high resource
consumption that may lead to unexpected behavior.
Examples:
o Tools like cProfile for Python, Chrome DevTools for JavaScript, or perf for
Linux can be used for profiling.
Usage: Profiling is useful for debugging performance issues and determining if slow code
paths are affecting application stability.
11. Event Logging and Tracepoints
Description: Event logs and tracepoints record specific events without interrupting program
flow, useful for debugging concurrent or multi-threaded applications.
Usage:
o Insert tracepoints in places where breakpoints would disrupt the flow, especially in
real-time or performance-sensitive applications.
o Enable and examine event logging in multi-threaded applications to trace deadlocks or
race conditions.
Example: System trace tools like strace for Linux, and Tracepoints in Visual Studio for
non-intrusive debugging.
12. Automated and AI-Assisted Debugging
Description: Automated tools and AI-assisted debugging can help identify potential issues or
even suggest fixes.
Examples:
o Tools like DeepCode, GitHub Copilot, or Snyk provide insights, error suggestions,
and sometimes code fixes based on learned patterns.
Usage: Use these tools for quick feedback on potential code issues, especially useful for large
codebases.
Each debugging technique has its strengths and is best suited for specific types of issues, so using a
combination of techniques is often the most effective approach to identify and fix bugs efficiently.
Improving the code quality of a crowd counting project requires specific techniques tailored to its
focus areas, such as machine learning, computer vision, data handling, and real-time performance.
Below are code improvement techniques specific to this type of project:
Model Optimization:
o Use quantization, pruning, or knowledge distillation to reduce model size and
inference time without significantly compromising accuracy.
o Convert models to formats like TensorRT, ONNX, or TFLite for deployment.
Efficient Architectures:
o Adopt lightweight models like MobileNet, EfficientNet, or YOLO-tiny for real-time
applications.
o For density map-based crowd counting, explore efficient architectures such as
CSRNet or MCNN.
Batch Inference:
o Process frames in small batches to improve inference throughput when working with
streams.
2. Optimize Real-Time Performance
Pipeline Efficiency:
o Use asynchronous pipelines for video processing, leveraging tools like OpenCV’s
threading, GStreamer, or multithreading libraries.
o Minimize I/O bottlenecks by preloading or caching frames when possible.
GPU/Hardware Acceleration:
o Leverage GPUs or TPUs for faster inference.
o Optimize OpenCV operations with CUDA or other hardware-specific backends.
Efficient Pre-processing:
o Resize and normalize images efficiently, ensuring batch-wise operations instead of
individual frame handling.
o Use libraries like Albumentations for data augmentation to enhance training datasets.
Dataset Versioning:
o Use tools like DVC (Data Version Control) to manage changes in datasets and
maintain consistency across experiments.
Annotation Tools:
o Integrate or create efficient annotation tools to ensure high-quality labeled data for
supervised training.
Configurable Pipelines:
o Use YAML or JSON files for model and training configurations, allowing for easy
hyperparameter tuning and experimentation.
o Implement modular training scripts that separate data loading, model definition,
training, and evaluation.
Checkpointing:
o Save model checkpoints during training and enable early stopping based on validation
loss.
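A small illustrative example of such a configurable pipeline, assuming PyYAML is available; the field names and values are placeholders, not the project's actual configuration.

import yaml  # PyYAML

# Illustrative training configuration normally kept outside the code (e.g., config.yaml).
EXAMPLE_CONFIG = """
model:
  name: csrnet
  pretrained: true
training:
  batch_size: 8
  learning_rate: 0.0001
  epochs: 100
  checkpoint_dir: checkpoints/
  early_stopping_patience: 10
"""

def load_config(text=EXAMPLE_CONFIG):
    """Parse the YAML configuration into a plain dictionary."""
    return yaml.safe_load(text)

# cfg = load_config()
# lr = cfg["training"]["learning_rate"]   # hyperparameters are tuned here, not in code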
5. Code Modularity and Maintainability
Reusable Utilities:
o Create reusable functions for tasks like frame extraction, bounding box generation,
and heatmap overlay.
Configuration Management:
o Store project configurations in centralized files and avoid hardcoding paths or
parameters.
6. Optimize Post-Processing
Parallelization:
o Leverage libraries like Numba or Cython to speed up computationally intensive
tasks, such as density map generation.
Error Handling:
o Implement robust error handling for scenarios like failed frame processing, corrupted
video files, or inference errors.
Serverless Architecture:
o For cloud-based deployments, explore serverless frameworks (e.g., AWS Lambda,
Google Cloud Functions) for scalability.
API Integration:
o Use frameworks like FastAPI or Flask to expose crowd counting functionalities as
REST APIs.
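A minimal sketch of such an API using FastAPI is shown below; the endpoint name is illustrative and estimate_count is a placeholder for the trained density-map model.

from fastapi import FastAPI, File, UploadFile
import numpy as np
import cv2

app = FastAPI(title="Crowd Counting API")

def estimate_count(image) -> float:
    """Placeholder for running the trained density-map model on an image."""
    return 0.0  # replace with real inference

@app.post("/count")
async def count_people(file: UploadFile = File(...)):
    data = await file.read()
    image = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)
    return {"filename": file.filename, "estimated_count": estimate_count(image)}

# Run locally with:  uvicorn app:app --reload
# then POST an image to http://127.0.0.1:8000/count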
9. Documentation
Inline Documentation:
o Document the preprocessing steps, model configurations, and post-processing logic
within the code.
User Guides:
o Create deployment and usage guides, including installation instructions and system
requirements.
Automated Documentation:
o Use tools like Sphinx to auto-generate documentation from docstrings.
Data Privacy:
o Anonymize video data used for training and inference to comply with data privacy
laws like GDPR.
Bias Mitigation:
o Ensure models are trained on diverse datasets to avoid biases based on crowd
composition or region.
Dataset Table for Crowd Counting
Fig. 5 Testing
Chapter 5: Implementation
The implementation of a crowd counting system involves translating the design and requirements
into working code that can accurately and efficiently count people in real-time, typically using video
surveillance or camera feeds. The process requires integrating multiple components such as image
processing, machine learning algorithms, video capture, real-time analytics, and user interfaces.
Below is a step-by-step guide for implementing a crowd counting system.
Objective: Specify if you want to count people in real-time or analyze recorded footage.
Data sources: Choose video feeds from cameras (CCTV, drones, or mobile cameras).
Accuracy Requirements: Decide on the level of accuracy required and set performance
goals.
Data Source: Collect videos or images of crowded scenes relevant to your use case (public
places, stadiums, shopping malls).
Dataset: You can use datasets like:
o ShanghaiTech: For dense crowd counting.
o UCF-QNRF: Contains very large crowds.
o Mall Dataset: Includes CCTV camera footage.
Annotation: Label each image with crowd counts or density maps if using density-based
methods.
4. Data Preprocessing
Image Scaling: Resize images for efficient processing, maintaining aspect ratio.
Normalization: Normalize image data for better model convergence.
Augmentation: Add variations (rotation, flip, brightness adjustments) to improve model
generalization.
5. Model Training
Split Dataset: Use an 80-20 split for training and testing or cross-validation for evaluation.
Training Settings: Use common optimizers (Adam, SGD), and monitor metrics like Mean
Absolute Error (MAE) and Mean Squared Error (MSE) on validation sets.
Hyperparameter Tuning: Adjust batch size, learning rate, and other parameters for optimal
results.
For the software implementation of a crowd counting project, you’ll want a well-organized pipeline
that handles data ingestion, pre-processing, model training, and deployment. Here’s a step-by-step
guide on implementing it in Python using popular machine learning and deep learning libraries.
First, ensure you have the necessary libraries installed. You can use a Python environment manager
(like virtualenv or conda) to keep dependencies organized.
The dataset should contain images along with ground truth density maps or annotations for training.
Here’s how to load and pre-process the data.
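A minimal sketch of such a data loader is given below, assuming a PyTorch pipeline in which each image has a pre-computed ground-truth density map stored as a .npy file; the directory layout, file format, and class name are assumptions made for this illustration.

import os
import numpy as np
import torch
from torch.utils.data import Dataset
from PIL import Image
from torchvision import transforms

class CrowdDataset(Dataset):
    """Pairs each image with a pre-computed ground-truth density map (.npy).

    Assumed layout:  root/images/xxx.jpg  and  root/density_maps/xxx.npy
    Images are assumed to be pre-resized to a common size so they can be batched.
    """

    def __init__(self, root):
        self.image_dir = os.path.join(root, "images")
        self.map_dir = os.path.join(root, "density_maps")
        self.names = sorted(os.listdir(self.image_dir))
        self.to_tensor = transforms.ToTensor()  # HWC uint8 -> CHW float in [0, 1]

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        density = np.load(os.path.join(self.map_dir, os.path.splitext(name)[0] + ".npy"))
        return self.to_tensor(image), torch.from_numpy(density).unsqueeze(0).float()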
Use Mean Squared Error (MSE) as the loss function to compare the predicted and ground truth
density maps.
Implement the training loop, which feeds images and their corresponding density maps to the model.
After training, evaluate the model using metrics like Mean Absolute Error (MAE) and Mean Squared
Error (MSE).
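Putting these pieces together, the following sketch shows a simplified training and evaluation loop with an MSE loss on density maps and MAE/MSE computed on the integrated counts; it assumes the model outputs a density map at the same resolution as the ground truth, which real pipelines may relax.

import torch
from torch.utils.data import DataLoader

def train_and_evaluate(model, train_set, val_set, epochs=10, lr=1e-4, device="cuda"):
    # Use device="cpu" if no GPU is available.
    model = model.to(device)
    criterion = torch.nn.MSELoss()                       # pixel-wise loss on density maps
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=1)

    for epoch in range(epochs):
        model.train()
        for images, gt_maps in train_loader:
            images, gt_maps = images.to(device), gt_maps.to(device)
            pred_maps = model(images)
            loss = criterion(pred_maps, gt_maps)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Validation: compare integrated counts, not individual pixels.
        model.eval()
        abs_err, sq_err = 0.0, 0.0
        with torch.no_grad():
            for images, gt_maps in val_loader:
                pred = model(images.to(device)).sum().item()
                true = gt_maps.sum().item()
                abs_err += abs(pred - true)
                sq_err += (pred - true) ** 2
        n = len(val_loader)
        print(f"epoch {epoch + 1}: MAE={abs_err / n:.2f}  MSE={sq_err / n:.2f}")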
Export the Model: Save the trained model to a file for deployment
Inference: Load the model and run inference on new images in real-time or in a production
environment.
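A hedged sketch of saving and reloading the trained weights with PyTorch; TinyDensityNet refers to the illustrative model class sketched earlier and stands in for whichever architecture is actually trained.

import torch

def save_model(model, path="crowd_model.pth"):
    """Persist only the learned weights (state dict), not the whole model object."""
    torch.save(model.state_dict(), path)

def load_model(model_class, path="crowd_model.pth", device="cpu"):
    """Rebuild the architecture and load the trained weights for inference."""
    model = model_class()
    model.load_state_dict(torch.load(path, map_location=device))
    model.eval()  # inference mode: disables dropout / batch-norm updates
    return model

# save_model(trained_model)
# model = load_model(TinyDensityNet)   # TinyDensityNet is the illustrative class defined earlier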
For real-time applications, load the saved model and use it with a video stream or camera feed.
Use matplotlib or OpenCV to display the density map over the original image.
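The following sketch combines these last two steps: it reads frames from a camera or video file with OpenCV, runs the loaded model, and overlays the predicted density map as a heatmap together with the estimated count. Pre-processing is kept minimal here and is assumed to match the model's training setup; the model is assumed to already be on the given device and in eval mode.

import cv2
import numpy as np
import torch

def run_live_counting(model, source=0, device="cpu"):
    """Run the loaded model on a video stream and overlay the density heatmap."""
    cap = cv2.VideoCapture(source)          # 0 = default camera, or a video file path
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Minimal pre-processing: BGR -> RGB, [0, 1], CHW tensor with a batch dimension.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
        tensor = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0).to(device)
        with torch.no_grad():
            density = model(tensor)[0, 0].cpu().numpy()
        count = density.sum()
        # Turn the density map into a color heatmap and blend it over the frame.
        heat = cv2.normalize(density, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        heat = cv2.applyColorMap(cv2.resize(heat, (frame.shape[1], frame.shape[0])), cv2.COLORMAP_JET)
        overlay = cv2.addWeighted(frame, 0.6, heat, 0.4, 0)
        cv2.putText(overlay, f"Count: {count:.0f}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        cv2.imshow("Crowd Counting", overlay)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()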
Conclusion & Future Scope
Conclusion
In this crowd counting project, we successfully developed a system capable of accurately estimating
the number of people in a given area using computer vision and machine learning techniques. By
leveraging advanced methods such as deep learning models (e.g., Convolutional Neural Networks, or
CNNs), we were able to analyse images and videos to identify and count individuals in various
environments.
By leveraging high-quality datasets, cutting-edge algorithms, and real-time data processing methods,
the system is capable of accurately estimating crowd sizes in a variety of settings, from crowded
urban areas and transport hubs to large-scale events and public gatherings. This capability is
invaluable for a wide range of applications, including crowd control, disaster management, retail
analytics, and smart city initiatives.
One of the key advantages of this project is its ability to scale. Whether deployed on edge devices for
localized processing or in the cloud for large-scale analysis, the system provides flexible deployment
options that can meet the needs of both small and large operations. The combination of object
detection, density map generation, and advanced tracking methods ensures that even in highly
crowded or complex environments, the system can deliver reliable and actionable insights.
Moreover, the ethical considerations embedded in the design, such as anonymizing data and
complying with privacy regulations, reflect the commitment to responsible AI usage. By ensuring
that privacy is safeguarded while providing real-time crowd monitoring capabilities, the project
balances technological innovation with social responsibility.
Looking ahead, the crowd counting system can be enhanced further by incorporating more
sophisticated models, improving real-time accuracy, and expanding its applicability to a wider array
of scenarios, such as traffic flow analysis, emergency response, and social distancing monitoring.
Additionally, the integration of multi-modal sensors, such as LiDAR or thermal cameras, could
improve the system's robustness in low-visibility environments.
In conclusion, the Crowd Counting Project has not only demonstrated its technical feasibility but
also highlighted its real-world value in improving safety, efficiency, and decision-making in crowded
environments. With continued development and refinement, this technology has the potential to
revolutionize how we understand and manage human activity in public spaces, contributing to
smarter, safer, and more efficient cities worldwide.
1. Model Performance: Our crowd counting model demonstrated high accuracy, even in dense
and varied crowds. We used a combination of state-of-the-art pre-trained models and custom
adjustments to optimize performance for the specific conditions of our dataset.
2. Challenges: The system faced difficulties in cases of occlusion, overlapping individuals, and
varying lighting conditions, which affected the accuracy in certain scenarios. Future
improvements could involve using more advanced techniques like attention mechanisms or
temporal analysis for videos to enhance accuracy in challenging conditions.
3. Applications: The potential applications for this system are vast. It can be used in areas like
public safety, event management, transportation, and urban planning, where understanding
crowd size is crucial for decision-making. Real-time crowd counting could be integrated into
surveillance systems to monitor crowd behavior or optimize resource allocation.
Overall, this project demonstrated the feasibility and practical utility of machine learning for crowd
counting tasks, with significant potential for real-world implementation in public safety and crowd
management applications.
This is the landing page of our website; clicking the website link takes the visitor directly to this page. The landing page is designed to immediately convey the site's purpose and the value it offers, focusing on clarity, engagement, and visual impact.
Fig. 7 About us
This page presents "Why Us". The "Why Us" section of the crowd counting website is essential for differentiating our solution and persuading potential clients or users to choose our service. It highlights our unique value propositions, benefits, and competitive advantages, building trust and emphasizing the value of the solution so that visitors choose our service over competitors.
Fig. 8 Team members
This page introduces the team members who worked hard to make the project a reality. The Team Members page of our website is an opportunity to showcase the expertise, diversity, and dedication of the people behind the project, making the team relatable and credible to visitors by highlighting their skills, achievements, and roles in the project.
Sagar- Frontend
Pratham- Frontend
Jatin- Code
Priyanshu- Code
Future work
One of the primary challenges in crowd counting is accurately estimating the number of people across
varying scales and densities within a single image.
Transformer-based Architectures for Better Contextual Understanding
Recently, Vision Transformers (ViTs) have gained popularity in many computer vision tasks due to
their ability to capture long-range dependencies and global context more effectively than traditional
CNNs.
Crowd counting models often perform poorly when applied to datasets or environments that differ
significantly from their training data.
Data annotation for crowd counting is labor-intensive and expensive. As a result, self-supervised
learning and few-shot learning are gaining traction as promising techniques to reduce reliance on
large-scale labeled datasets.
While most crowd counting research has focused on static images, real-world applications often
involve analysing video streams.
For real-world applications like surveillance and event monitoring, real-time performance is crucial.
Future research will therefore emphasize lightweight architectures and efficient inference that can run in real time, including on edge devices.
Heavy occlusions in densely packed crowds make it difficult for traditional models to count
accurately. Using depth information from stereo cameras or LiDAR sensors can help separate
overlapping individuals and improve counting performance.
Background clutter remains a significant challenge in real-world scenarios, where non-human objects
or complex scenes can lead to false positives. Future models will likely incorporate mechanisms that better separate people from cluttered backgrounds.
As crowd counting becomes more widely used, ethical concerns related to privacy and surveillance
are gaining attention. Future research will explore techniques to ensure privacy while maintaining
accuracy.
Bias in training data can lead to models that underperform for certain demographic groups or
environmental conditions. Addressing fairness and bias in crowd counting involves curating more diverse, representative training data and evaluating models across different groups and settings.
References
[1] Kefan, X., Song, Y., Liu, S., & Liu, J. (2018). Analysis of crowd stampede risk mechanism:
A systems thinking perspective. Kybernetes.
[2] Tomar, A., Kumar, S., & Pant, B. (2022, March). Crowd Analysis in Video Surveillance: A
Review. In 2022 International Conference on Decision Aid Sciences and Applications
(DASA) (pp. 162-168). IEEE.
[3] Shi, X., Li, X., Wu, C., Kong, S., Yang, J., & He, L. (2020, May). A real-time deep network
for crowd counting. In ICASSP 2020-2020 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP) (pp. 2328-2332). IEEE.
[4] Fu, M., Xu, P., Li, X., Liu, Q., Ye, M., & Zhu, C. (2015). Fast crowd density estimation with
convolutional neural networks. Engineering Applications of Artificial Intelligence, 43, 81-88.
[5] Tuzel, O., Porikli, F., & Meer, P. (2008). Pedestrian detection via classification on
riemannian manifolds. IEEE transactions on pattern analysis and machine intelligence,
30(10), 1713-1727.
[6] Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection
with discriminatively trained part-based models. IEEE transactions on pattern analysis and
machine intelligence, 32(9), 1627-1645.
[7] Chan, A. B., & Vasconcelos, N. (2011). Counting people with low-level features and
Bayesian regression. IEEE Transactions on image processing, 21(4), 2160-2177.
[8] Lempitsky, V., & Zisserman, A. (2010). Learning to count objects in images. Advances in
neural information processing systems, 23.
[9] Wang, Y., & Zou, Y. (2016, September). Fast visual object counting via example-based
density estimation. In 2016 IEEE international conference on image processing (ICIP) (pp. 3653-3657). IEEE.
[10] Jeong, J., Choi, J., Jo, D. U., & Choi, J. Y. (2022). Congestion-Aware Bayesian Loss for
Crowd Counting. IEEE Access, 10, 8462-8473.
[11] Ma, Z., Wei, X., Hong, X., & Gong, Y. (2019). Bayesian loss for crowd count estimation
with point supervision. In Proceedings of the IEEE/CVF International Conference on
Computer Vision (pp. 6142-6151).
[12] Hafeezallah, A., Al-Dhamari, A., & Abu-Bakar, S. A. R. (2021). U-ASD Net: Supervised
Crowd Counting Based on Semantic Segmentation and Adaptive Scenario Discovery. IEEE
Access, 9, 127444-127459.
[13] Huang, L., Zhu, L., Shen, S., Zhang, Q., & Zhang, J. (2021). SRNet: Scale-Aware
Representation Learning Network for Dense Crowd Counting. IEEE Access, 9, 136032-136044.
[14] Elharrouss, O., Almaadeed, N., Abualsaud, K., Al-Maadeed, S., Al-Ali, A., & Mohamed, A.
(2022). FSC-Set: Counting, Localization of Football Supporters Crowd in the Stadiums.
IEEE Access, 10, 10445-10459.
[15] Liu, N., Long, Y., Zou, C., Niu, Q., Pan, L., & Wu, H. (2019). Adcrowdnet: An attention-
injective deformable convolutional network for crowd understanding. In Proceedings of the
IEEE/CVF conference on computer vision and pattern recognition (pp. 3225-3234).
[16] Wang, Q., Gao, J., Lin, W., & Yuan, Y. (2019). Learning from synthetic data for crowd
counting in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8198-8207).
[17] Xie, Y., Lu, Y., & Wang, S. (2020, October). RSANet: Deep recurrent scale-aware network for
crowd counting. In 2020 IEEE International Conference on Image Processing (ICIP) (pp. 1531-1535). IEEE.
[18] Liu, Y., Wen, Q., Chen, H., Liu, W., Qin, J., Han, G., & He, S. (2020). Crowd counting via
cross-stage refinement networks. IEEE Transactions on Image Processing, 29, 6800-6812.
[19] Liu, L., Qiu, Z., Li, G., Liu, S., Ouyang, W., & Lin, L. (2019). Crowd counting with deep
structured scale integration network. In Proceedings of the IEEE/CVF international
conference on computer vision (pp. 1774-1783).
[20] Sindagi, V. A., & Patel, V. M. (2019). Multi-level bottom-top and top-bottom feature fusion
for crowd counting. In Proceedings of the IEEE/CVF international conference on computer
vision (pp. 1002-1012).
[21] Zhang, Y., Zhou, D., Chen, S., Gao, S., & Ma, Y. (2016). Single-image crowd counting via
multi-column convolutional neural network. In Proceedings of the IEEE conference on
computer vision and pattern recognition (pp. 589-597).
[22] Chan, A. B., Liang, Z. S. J., & Vasconcelos, N. (2008, June). Privacy preserving crowd
monitoring: Counting people without people models or tracking. In 2008 IEEE conference on
computer vision and pattern recognition (pp. 1-7). IEEE.
[23] Idrees, H., Saleemi, I., Seibert, C., & Shah, M. (2013). Multi-source multi-scale counting in
extremely dense crowd images. In Proceedings of the IEEE conference on computer vision
and pattern recognition (pp. 2547-2554).
[24] Idrees, H., Tayyab, M., Athrey, K., Zhang, D., Al-Maadeed, S., Rajpoot, N., & Shah, M.
(2018). Composition loss for counting, density map estimation and localization in dense
crowds. In Proceedings of the European conference on computer vision (ECCV) (pp. 532-546).
[25] Chen, K., Loy, C. C., Gong, S., & Xiang, T. (2012, September). Feature mining for localised
crowd counting. In BMVC (Vol. 1, No. 2, p. 3).
[26] Zhang, C., Li, H., Wang, X., & Yang, X. (2015). Cross-scene crowd counting via deep
convolutional neural networks. In Proceedings of the IEEE conference on computer vision
and pattern recognition (pp. 833-841).