Social Distance


ABSTRACT

In the fight against COVID-19, social distancing has proven to be a very effective measure to
slow down the spread of the disease. People are asked to limit their interactions with each other,
reducing the chance of the virus spreading through physical or close contact. AI and deep
learning have also shown promising results on a number of everyday problems. In the proposed
system we use Python, computer vision, and deep learning to monitor social distancing in public
places and workplaces. The social distancing detection tool monitors whether people are keeping
a safe distance from each other by analyzing real-time video streams from a camera. By
integrating the tool with the security camera systems already installed at workplaces, factories,
and shops, authorities can monitor whether people are maintaining a safe distance from one
another.
INTRODUCTION

The pandemic has taken over the world and worsened conditions everywhere; as of now there is
no vaccine for the contagious disease, so social distancing has emerged as one of the best
methods to prevent the spread of COVID-19. As the name suggests, social distancing implies
that people should physically distance themselves from one another. Cases have been escalating
at a very fast rate all over the world, which makes social distancing all the more important. This
survey paper provides a targeted solution for monitoring social distancing in public places.
Using CCTV cameras and drones, we can track human activity in public places, compute and
summarize the distances between people, and monitor social distancing violations across a city.
The proposed system can also discourage people from coming together and prevent social
gatherings; people who gather in massive numbers at religious places can make conditions
worse. Most countries have been through a lockdown period that confined citizens to their
homes, but as time passes people will again visit public places, religious sites, and tourist
destinations; in those circumstances this system for monitoring social distancing will be
beneficial all around the world. With the help of computer vision, deep learning, and installed
CCTV cameras, we can track people, compute the distance between them in pixels, compare it
against a standard minimum distance, identify people violating the rule, and let the concerned
authorities take action accordingly.
LITERATURE SURVEY

[1] A. Agarwal, S. Gupta, and D. K. Singh, “Review of optical flow technique for moving
object detection,” in 2016 2nd International Conference on Contemporary Computing and
Informatics (IC3I). IEEE, 2016, pp. 409–413.

Object detection in a video is a challenging task in the field of image processing. Some
applications of the domain are Human Machine Interaction (HMI), Security and Surveillance,
Supplemented Authenticity, Traffic Monitoring on Roads, Medicinal Imaging, etc. A number of
methods are available for object detection, and each method has some constraints on the kind of
application it can be used for. This paper presents one such method, termed the optical flow
technique. This technique is found to be more robust
and efficient for moving object detection and the same has been shown by an experiment in the
paper. Applying optical flow to an image gives flow vectors of the points corresponding to the
moving objects. Next part of marking the required moving object of interest counts to the post
processing. Post processing is the legitimate contribution of the paper for moving object
detection problem. This here is discussed as Blob Analysis. It is tested on datasets available
online, real time videos and also on videos recorded manually. The results show that the moving
objects are successfully detected using optical flow technique and the required post processing.

[2] Q. Zhao, P. Zheng, S.-t. Xu, and X. Wu, “Object detection with deep learning: A
review,” IEEE transactions on neural networks and learning systems, vol. 30, no. 11, pp.
3212–3232, 2019.
Due to object detection's close relationship with video analysis and image understanding, it has
attracted much research attention in recent years. Traditional object detection methods are built
on handcrafted features and shallow trainable architectures. Their performance easily stagnates
by constructing complex ensembles that combine multiple low-level image features with high-
level context from object detectors and scene classifiers. With the rapid development in deep
learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are
introduced to address the problems existing in traditional architectures. These models behave
differently in network architecture, training strategy, and optimization function. In this paper, we
provide a review of deep learning-based object detection frameworks. Our review begins with a
brief introduction on the history of deep learning and its representative tool, namely, the
convolutional neural network. Then, we focus on typical generic object detection architectures
along with some modifications and useful tricks to improve detection performance further. As
distinct specific detection tasks exhibit different characteristics, we also briefly survey several
specific tasks, including salient object detection, face detection, and pedestrian detection.
Experimental analyses are also provided to compare various methods and draw some meaningful
conclusions. Finally, several promising directions and tasks are provided to serve as guidelines
for future work in both object detection and relevant neural network-based learning systems.

[3] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object
detection with region proposal networks,” in Advances in neural information processing
systems, 2015, pp. 91–99.

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize
object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these
detection networks, exposing region proposal computation as a bottleneck. In this work, we
introduce a Region Proposal Network (RPN) that shares full-image convolutional features with
the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-
convolutional network that simultaneously predicts object bounds and objectness scores at each
position. RPNs are trained end-to-end to generate high-quality region proposals, which are used
by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can
be trained to share convolutional features. For the very deep VGG-16 model, our detection
system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art
object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using
300 proposals per image.

[4] N. S. Punn and S. Agarwal, “Crowd analysis for congestion control early warning
system on foot over bridge,” in 2019 Twelfth International Conference on Contemporary
Computing (IC3). IEEE, 2019, pp. 1–6.

Crowds occur in a variety of situations like concerts, rallies, marathons, stadiums, railway
stations, etc. Crowd analysis is essential from the point of view of safety and surveillance,
abnormal behavior detection and thereby reducing the chance of a mishap. Generally, congestion
in the crowd can lead to severe problems like a stampede. This congestion is due to increasing
crowd count; thereby increasing the crowd density in regions and abnormal crowd motion. Most
of the congestion control approaches follow a hardware-oriented approach. This paper proposes
a software-oriented approach, Congestion Control Early Warning System (CCEWS), for
congestion control with the help of object detection and object tracking technique. Object
detection is performed by following the faster R-CNN architecture in which Google inception
model is used as a pre-trained CNN model and with the help of proposed object tracking
technique the crowd abnormality is analyzed. The proposed congestion control technique
exhibits quite significant results on the proposed dataset made from the virtual simulation of
FOB (foot over bridge) scenario.

[5] A. Brunetti, D. Buongiorno, G. F. Trotta, and V. Bevilacqua, “Computer vision and
deep learning techniques for pedestrian detection and tracking: A survey,”
Neurocomputing, vol. 300, pp. 17–33, 2018.

Pedestrian detection and tracking have become an important field in the computer vision
research area. This growing interest, started in the last decades, might be explained by the
multitude of potential applications that could use the results of this research field, e.g. robotics,
entertainment, surveillance, care for the elderly and disabled, and content-based indexing. In this
survey paper, vision-based pedestrian detection systems are analysed based on their field of
application, acquisition technology, computer vision techniques and classification strategies.
Three main application fields have been individuated and discussed: video surveillance, human-
machine interaction and analysis. Due to the large variety of acquisition technologies, this paper
discusses both the differences between 2D and 3D vision systems, and indoor and outdoor
systems. The authors reserved a dedicated section for the analysis of the Deep Learning
methodologies, including the Convolutional Neural Networks in pedestrian detection and
tracking, considering their recently exploding adoption for such kinds of systems. Finally, focusing
on the classification point of view, different Machine Learning techniques have been analysed,
basing the discussion on the classification performances on different benchmark datasets. The
reported results highlight the importance of testing pedestrian detection systems on different
datasets to evaluate the robustness of the computed groups of features used as input to classifiers.

[6] Y. Xu, J. Dong, B. Zhang, and D. Xu, “Background modeling methods in video analysis:
A review and comparative evaluation,” CAAI Transactions on Intelligence Technology,
vol. 1, no. 1, pp. 43–60, 2016.

Foreground detection methods can be applied to efficiently distinguish foreground objects,
including moving or static objects, from the background, which is very important in the
application of video analysis, especially video surveillance. An excellent background model can
yield good foreground detection results. A lot of background modeling methods have been
proposed, but few comprehensive evaluations of them are available. These methods suffer from
various challenges such as illumination changes and dynamic background. This paper first
analyzed advantages and disadvantages of various background modeling methods in video
analysis applications and then compared their performance in terms of quality and the
computational cost. The Change detection.Net (CDnet2014) dataset and another video dataset
with different environmental conditions (indoor, outdoor, snow) were used to test each method.
The experimental results sufficiently demonstrated the strengths and drawbacks of traditional
and recently proposed state-of-the-art background modeling methods. This work is helpful for
both researchers and engineering practitioners.

[7] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, “Behavior recognition via sparse
spatio-temporal features,” in 2005 IEEE International Workshop on Visual Surveillance
and Performance Evaluation of Tracking and Surveillance. IEEE, 2005

A common trend in object recognition is to detect and leverage the use of sparse,
informative feature points. The use of such features makes the problem more manageable
while providing increased robustness to noise and pose variation. In this work we develop
an extension of these ideas to the spatio-temporal case. For this purpose, we show that the
direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and
we propose an alternative. Anchoring off of these interest points, we devise a recognition
algorithm based on spatio-temporally windowed data. We present recognition results on a
variety of datasets including both human and rodent behavior.

Figure: Visualization of cuboid based behavior recognition. Spatiotemporal volume of mouse footage
shown at top. We apply a spatiotemporal interest point detector to find local regions of interest in space
and time (cuboids) which serve as the substrate for behavior recognition.

[8] M. Piccardi, “Background subtraction techniques: a review,” in 2004 IEEE
International Conference on Systems, Man and Cybernetics (IEEE Cat. No. 04CH37583),
vol. 4. IEEE, 2004

Background subtraction is a widely used approach for detecting moving objects from static
cameras. Many different methods have been proposed over the recent years and both the
novice and the expert can be confused about their benefits and limitations. In order to
overcome this problem, this paper provides a review of the main methods and an original
categorization based on speed, memory requirements and accuracy. Such a review can
effectively guide the designer to select the most suitable method for a given application in a
principled way. Methods reviewed include parametric and non-parametric background
density estimates and spatial correlation approaches.
[9] H. Tsutsui, J. Miura, and Y. Shirai, “Optical flow-based person tracking by multiple
cameras,” in Conference Documentation International Conference on Multisensor Fusion
and Integration for Intelligent Systems. MFI 2001 (Cat. No. 01TH8590). IEEE, 2001

This paper describes an optical flow-based person tracking method using multiple cameras
in indoor environments. There are usually several objects in indoor environments which
may obstruct a camera view. If we use only one camera, tracking may fail when the target
person is occluded by other objects. This problem can be solved by using multiple cameras.
In our method, each camera tracks the target person independently. By exchanging
information among cameras, the three dimensional position and the velocity of the target
are estimated. When a camera loses the target by occlusion, the target position and velocity
in the image are estimated using information from other cameras which are tracking the
target.
[10] S. A. Niyogi and E. H. Adelson, “Analyzing gait with spatiotemporal surfaces,” in
Proceedings of 1994 IEEE Workshop on Motion of Nonrigid and Articulated Objects.
IEEE, 1994

Human motions generate characteristic spatiotemporal patterns. We have developed a set
of techniques for analyzing the patterns generated by people walking across the field of
view. After change detection, the XYT pattern can be fit with a smooth spatiotemporal
surface. This surface is approximately periodic, reflecting the periodicity of the gait. The
surface can be expressed as a combination of a standard parameterized surface-the
canonical walk-and a deviation surface that is specific to the individual walk.

[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep
convolutional neural networks,” in Advances in neural information processing systems,
2012

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images
in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1
and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-
art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five
convolutional layers, some of which are followed by max-pooling layers, and three fully-connected
layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a
very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-
connected layers we employed a recently-developed regularization method called “dropout” that
proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and
achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best
entry.
Figure: An illustration of the architecture of our CNN, explicitly showing the delineation of
responsibilities between the two GPUs. One GPU runs the layer-parts at the top of the figure while the
other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network’s
input is 150,528-dimensional, and the number of neurons in the network’s remaining layers is given by
253,440–186,624–64,896–64,896–43,264– 4096–4096–1000.

[12] N. S. Punn and S. Agarwal, “Crowd analysis for congestion control early warning
system on foot over bridge,” in 2019 Twelfth International Conference on Contemporary
Computing (IC3). IEEE, 2019, pp. 1–6.

Crowds occur in a variety of situations like concerts, rallies, marathons, stadiums, railway
stations, etc. Crowd analysis is essential from the point of view of safety and surveillance,
abnormal behavior detection and thereby reducing the chance of a mishap. Generally,
congestion in the crowd can lead to severe problems like a stampede. This congestion is
due to increasing crowd count; thereby increasing the crowd density in regions and
abnormal crowd motion. Most of the congestion control approaches follow a hardware-
oriented approach. This paper proposes a software-oriented approach, Congestion Control
Early Warning System (CCEWS), for congestion control with the help of object detection
and object tracking technique. Object detection is performed by following the faster R-CNN
architecture in which Google inception model is used as a pre-trained CNN model and with
the help of proposed object tracking technique the crowd abnormality is analyzed. The
proposed congestion control technique exhibits quite significant results on the proposed
dataset made from the virtual simulation of FOB (foot over bridge) scenario.
Fig.: Faster R-CNN architecture with GoogLeNet as the pre-trained CNN model

[13] Pias, “Object detection and distance measurement,”
https://github.com/paul-pias/Object-Detection-and-Distance-Measurement, 2020.

Visual field occlusion is one of the causes of urban traffic accidents in the process of reversing. In order
to meet the requirements of vehicle safety and intelligence, a method of target distance measurement
based on deep learning and binocular vision is proposed. The method first establishes binocular stereo
vision model and calibrates the intrinsic and extrinsic parameters, uses the Faster R-CNN algorithm to
identify and locate obstacle objects in the image, then substitutes the obtained matching points into a
calibrated binocular stereo model for spatial coordinates of the target object. Finally, the obstacle
distance is calculated by the formula. In different positions, take pictures of obstacles from different
angles to conduct physical tests. Experimental results show that this method can effectively achieve
obstacle object identification and positioning, and improve the adverse effect of visual field blindness on
driving safety.
Fig.: Object detection model based on deep learning

[14] A. Brunetti, D. Buongiorno, G. F. Trotta, and V. Bevilacqua, “Computer vision and
deep learning techniques for pedestrian detection and tracking: A survey,”
Neurocomputing, vol. 300, pp. 17–33, 2018.

Three main application fields have been individuated and discussed: video surveillance, human-
machine interaction and analysis. Due to the large variety of acquisition technologies, this paper
discusses both the differences between 2D and 3D vision systems, and indoor and outdoor
systems. The authors reserved a dedicated section for the analysis of the Deep Learning
methodologies, including the Convolutional Neural Networks in pedestrian detection and
tracking, considering their recently exploding adoption for such kinds of systems. Finally, focusing on
the classification point of view, different Machine Learning techniques have been analyzed,
basing the discussion on the classification performances on different benchmark datasets. The
reported results highlight the importance of testing pedestrian detection systems on different
datasets to evaluate the robustness of the computed groups of features used as input to classifiers.

AIM

Recently, the outbreak of Coronavirus Disease (COVID-19) has spread rapidly across the world,
and social distancing has become one of the mandatory preventive measures to avoid physical
contact. Our project aims to implement a surveillance method that uses OpenCV, computer
vision, and deep learning to keep track of pedestrians, avoid overcrowding, and maintain a
sufficient distance between them.
Social distancing aims to decrease or interrupt transmission of COVID-19 in a population by
minimizing contact between potentially infected individuals and healthy individuals, or
between population groups with high rates of transmission and population groups with no or
low levels of transmission.

OBJECTIVE

One way of limiting the spread of an infectious disease such as COVID-19 is to practice social
distancing.

The objective is to reduce transmission, delay the epidemic peak, reduce the size of the
epidemic peak, and spread cases over a longer time to relieve pressure on the healthcare
system.

Social distancing is one of the non-pharmaceutical infection control actions that can stop or
slow down the spread of a highly contagious disease.

MOTIVATION

As the pandemic has taken over the world, social distancing is one of the major precautions
that needs to be taken. As people come together in crowds, they are more likely to come into
close contact with someone who has COVID-19, and hence the World Health Organization has
recommended maintaining a physical distance of at least 1 metre (3 feet) between any two
people. Thus, to keep track of social distancing among the public, the idea of a social distancing
detector emerged.

PROPOSED SYSTEM

The proposed system identifies people in an image or video stream and checks whether social
distancing is maintained, with the help of computer vision and deep learning algorithms built
on the OpenCV and TensorFlow libraries.

Approach

1. Detect humans in the frame with YOLOv3.

2. Calculate the distance between every pair of humans detected in the frame.

3. Show how many people are at high risk, low risk, and not at risk.
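The risk grading in step 3 can be sketched in a few lines of Python. The pixel thresholds below are hypothetical placeholders, since real values depend on camera placement and calibration:

```python
import numpy as np

# Hypothetical pixel thresholds; real values depend on camera calibration.
HIGH_RISK_DIST = 50
LOW_RISK_DIST = 80

def risk_levels(centroids, high=HIGH_RISK_DIST, low=LOW_RISK_DIST):
    """Label each detected person 'high', 'low', or 'safe' based on
    pairwise Euclidean distances between their centroids."""
    pts = np.asarray(centroids, dtype=float)
    n = len(pts)
    labels = ["safe"] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(pts[i] - pts[j])
            if d < high:
                labels[i] = labels[j] = "high"
            elif d < low:
                # Only downgrade to 'low' if not already flagged 'high'.
                for k in (i, j):
                    if labels[k] != "high":
                        labels[k] = "low"
    return labels

risk_levels([(0, 0), (30, 0), (200, 200)])  # → ['high', 'high', 'safe']
```

Counting each label then gives the "High", "Low", and "Not at risk" totals shown on screen.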
METHODOLOGY

Fig.: Detection pipeline: input image or frame → object detection → compute pairwise
distance between centroids → check distance matrix for people < N pixels apart → show result

We use computer vision, and deep learning to implement social distancing detectors.

The steps to build a social distancing detector include:

1. Apply object detection to detect all people (and only people) in a video stream

2. Compute the pairwise distances between all detected people

3. Based on these distances, check to see if any two people are less than N pixels apart
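Steps 2 and 3 can be sketched with plain Euclidean distances; `n_pixels` here is a placeholder threshold to be tuned per camera:

```python
from itertools import combinations
import math

def violating_pairs(centroids, n_pixels):
    """Return index pairs of detected people whose centroids are
    closer than n_pixels (Euclidean distance)."""
    pairs = []
    for (i, p), (j, q) in combinations(enumerate(centroids), 2):
        if math.dist(p, q) < n_pixels:
            pairs.append((i, j))
    return pairs

violating_pairs([(0, 0), (3, 4), (100, 100)], 10)  # → [(0, 1)]
```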

Process

 We will use YOLO for object detection. Once the objects (people) are detected, we will
then draw a bounding box around each of them.
 Using the centroids of the boxes, we then measure the distances between them.
 For the distance measure, Euclidean distance is used.
 A box is colored RED if unsafe and GREEN if safe.
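A minimal sketch of the bullets above, assuming the detector returns boxes as `(x, y, w, h)` tuples; the colors are BGR tuples, the channel order OpenCV's drawing functions expect:

```python
def centroid(box):
    """Centre point of a bounding box given as (x, y, w, h)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def box_color(is_safe):
    """BGR color for drawing: GREEN if safe, RED if unsafe."""
    return (0, 255, 0) if is_safe else (0, 0, 255)

# Typical use when drawing on a frame:
# cv2.rectangle(frame, (x, y), (x + w, y + h), box_color(is_safe), 2)
centroid((10, 20, 40, 60))  # → (30.0, 50.0)
```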
YOLO (You Only Look Once) is a clever convolutional neural network (CNN) for doing object
detection in real-time. The algorithm applies a single neural network to the full image, and then
divides the image into regions and predicts bounding boxes and probabilities for each region.
These bounding boxes are weighted by the predicted probabilities.
YOLO is popular because it achieves high accuracy while also being able to run in real-time.
The algorithm “only looks once” at the image in the sense that it requires only one forward
propagation pass through the neural network to make predictions. After non-max suppression
(which makes sure the object detection algorithm only detects each object once), it then outputs
the recognized objects together with their bounding boxes.
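To illustrate the non-max suppression step described above, here is a greedy NMS sketch in plain NumPy (in a real pipeline, OpenCV's `cv2.dnn.NMSBoxes` provides the same service); boxes are assumed to be `(x1, y1, x2, y2)` corners:

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_thresh=0.3):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap
    it by more than iou_thresh, and repeat on the remainder."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by score, best first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # Intersection of the best box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # keep only weak overlaps
    return keep

# Two near-duplicate boxes and one distant box: the duplicate is suppressed.
non_max_suppression([[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 60, 60]],
                    [0.9, 0.8, 0.7])  # → [0, 2]
```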

REQUIREMENT SPECIFICATION

Software Requirements

 OpenCV
 Keras
 TensorFlow
 imutils
 NumPy
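Assuming a standard Python environment, the libraries above can be installed from PyPI under the package names published there:

```shell
# Recent TensorFlow releases bundle Keras, but keras is listed
# separately here to match the software requirements above.
pip install opencv-python keras tensorflow imutils numpy
```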

Hardware Requirements

 Minimum Intel i3 processor, 4 GB RAM, 80 GB HDD
 Camera

APPLICATION BENEFIT
 Reducing spread of the disease: The coronavirus is thought to spread mainly from person
to person. This can happen between people who are in close contact with one another.
Droplets that are produced when an infected person coughs or sneezes may land in the
mouths or noses of people who are nearby, or possibly be inhaled into their lungs.
 Tracking people to know about the actions performed by individuals in a society.
 Activity recognition aims to recognize the actions and goals of one or more persons from a
series of observations of the agents' actions and the environmental conditions.
 Pedestrian detection is an essential and significant task in any intelligent video
surveillance system, as it provides the fundamental information for semantic understanding
of the video footage. It has an obvious extension to automotive applications due to the
potential for improving safety systems.

CONCLUSION

As we envision the world after the COVID-19 pandemic, the need for self-responsibility emerges
irrefutably. The focus will mostly be on accepting and obeying the precautions and rules that the
WHO has imposed, as responsibility will rest on individuals rather than on the government.
Social distancing will undoubtedly be the most important factor, as COVID-19 spreads through
close contact with infected people. An effective solution for supervising large crowds is therefore
important, and that is the focus of our project. Using cameras, authorities can keep track of
human activity, prevent large crowds from gathering, and detect violations. As long as people
maintain a safe distance they are indicated with a green bounding box; if not, a red one.
REFERENCES

[1] A. Agarwal, S. Gupta, and D. K. Singh, “Review of optical flow technique for moving object
detection,” in 2016 2nd International Conference on Contemporary Computing and Informatics
(IC3I). IEEE, 2016, pp. 409–413.

[2] Q. Zhao, P. Zheng, S.-t. Xu, and X. Wu, “Object detection with deep learning: A review,”
IEEE transactions on neural networks and learning systems, vol. 30, no. 11, pp. 3212–3232,
2019.

[3] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with
region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–
99.

[4] N. S. Punn and S. Agarwal, “Crowd analysis for congestion control early warning system on
foot over bridge,” in 2019 Twelfth International Conference on Contemporary Computing (IC3).
IEEE, 2019, pp. 1–6.

[5] A. Brunetti, D. Buongiorno, G. F. Trotta, and V. Bevilacqua, “Computer vision and deep
learning techniques for pedestrian detection and tracking: A survey,” Neurocomputing, vol. 300,
pp. 17–33, 2018.

[6] Y. Xu, J. Dong, B. Zhang, and D. Xu, “Background modeling methods in video analysis: A
review and comparative evaluation,” CAAI Transactions on Intelligence Technology, vol. 1, no.
1, pp. 43–60, 2016.
[7] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, “Behavior recognition via sparse spatio-
temporal features,” in 2005 IEEE International Workshop on Visual Surveillance and
Performance Evaluation of Tracking and Surveillance. IEEE, 2005, pp. 65–72.

[8] M. Piccardi, “Background subtraction techniques: a review,” in 2004 IEEE International
Conference on Systems, Man and Cybernetics (IEEE Cat. No. 04CH37583), vol. 4. IEEE, 2004,
pp. 3099–3104.

[9] H. Tsutsui, J. Miura, and Y. Shirai, “Optical flow-based person tracking by multiple cameras,”
in Conference Documentation International Conference on Multisensor Fusion and Integration
for Intelligent Systems. MFI 2001 (Cat. No. 01TH8590). IEEE, 2001, pp. 91–96.

[10] S. A. Niyogi and E. H. Adelson, “Analyzing gait with spatiotemporal surfaces,” in
Proceedings of 1994 IEEE Workshop on Motion of Nonrigid and Articulated Objects. IEEE,
1994, pp. 64–69.

[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep
convolutional neural networks,” in Advances in neural information processing systems, 2012,
pp. 1097–1105.

[12] N. S. Punn and S. Agarwal, “Crowd analysis for congestion control early warning system on
foot over bridge,” in 2019 Twelfth International Conference on Contemporary Computing (IC3).
IEEE, 2019, pp. 1–6.

[13] Pias, “Object detection and distance measurement,”
https://github.com/paul-pias/Object-Detection-and-Distance-Measurement, 2020.

[14] A. Brunetti, D. Buongiorno, G. F. Trotta, and V. Bevilacqua, “Computer vision and deep
learning techniques for pedestrian detection and tracking: A survey,” Neurocomputing, vol. 300,
pp. 17–33, 2018.
