Conference Paper · January 2015
DOI: 10.1109/EESCO.2015.7253863
A Computer Vision based Framework for Visual Gun Detection using SURF

Rohit Kumar Tiwari
Department of Computer Engineering
National Institute of Technology
Kurukshetra, India
[email protected]

Gyanendra K. Verma
Department of Computer Engineering
National Institute of Technology
Kurukshetra, India
[email protected]

Abstract—Today, automatic visual surveillance is a prime need for security, and this paper presents a first step toward automatic visual gun detection. The objective of this paper is to develop a framework for visual gun detection for automatic surveillance. The proposed framework exploits color-based segmentation, using the K-means clustering algorithm, to eliminate unrelated objects from an image. The speeded-up robust features (SURF) interest point detector is then used to locate the object (gun) in the segmented images. The framework is robust in terms of scale, rotation, affine variation and occlusion. We implemented and tested the system on a set of gun images collected by us and obtained promising detection performance. Further, the system performs well under different appearances of the gun; thus it is rotation, scale and shape invariant.

Keywords—Gun Detection, Video Surveillance, SURF, Point Detector

I. INTRODUCTION

Object detection in general is an extensively studied field of image processing; its main task is to find a particular object of interest in a dynamic environment. Applications of object detection in images include security [1] and legal evidence [2]. Nowadays, deploying CCTV operators in control rooms to monitor CCTV footage has become a very common method for detecting suspicious activities. However, it is not possible for an operator to monitor the activities effectively all the time. There is a possibility of missing some abnormal activity, and this miss rate increases with the number of video streams running on the same screen and with the working hours of the CCTV operator [3]. In such a situation an automatic visual object detector can play a crucial role; for example, it can raise an alert signal if any abnormal activity takes place under CCTV surveillance.
In the recent past there have been incidents that have brought the demand for an automated visual weapon detection system into the picture. The detection of a visible gun is of particular interest because guns are broadly available and commonly used as weapons. According to one study, 72% of crimes in the USA involved the use of a gun in 2011 [4]. In some countries, such as Ireland, carrying a gun in a public place is either prohibited or restricted, so gun detection is of high importance for safety. Our work focuses on automatic detection of guns in images, which can further be used for gun tracking. As electronic devices become cheaper day by day, it is increasingly practical to deploy an automated Gun Detection System (GDS) for public surveillance using CCTV. An automatic GDS raises an alarm signal if it detects a gun in a frame extracted from the video feed; the CCTV operator can then direct his attention to the scene and either accept or discard the detection.
Our proposed scheme for automatic gun detection uses a hybrid approach combining color-based segmentation and the SURF interest point detector. Color-based segmentation is performed using the k-means clustering algorithm to eliminate unrelated colors and objects in the image. Morphological processing is then applied to each object to bridge small gaps and to extract boundaries. SURF features of each object boundary are extracted and matched against a stored descriptor to measure similarity to a gun. If the similarity score is greater than 50%, the system raises an alarm signal.
Our visual gun detection framework makes two contributions: first, to the best of the authors' knowledge, there has been no other research work on visual gun detection; second, it can detect multiple objects in an image, unlike point detectors such as SIFT [5], SURF [6] and BRISK [7], which detect only one object even when multiple objects of the same kind are present. This paper is organized as follows: Section 2 discusses the challenges of visual gun detection, followed by the weapon detection literature in Section 3. Section 4 presents our gun detection approach, followed by the experimental setup and the experimental results and discussion in Sections 5 and 6 respectively, and the conclusion in Section 7.
II. CHALLENGES

Guns are widely available in different colors, such as black and silver, which makes their detection in images a challenging task. Shape is another issue, because one model of gun varies from another due to variation in body parts such as the hammer, grip safety, rear sight and front sight, and due to different surface characteristics. Two different gun shapes are shown in Fig. 1(a) and Fig. 1(b). Other issues are scale, rotation and view variation of the gun in images. Scale variation arises from the changing distance between the gun and the CCTV camera as it captures the video. Rotation and view variation occur due to deviation in the orientation of the gun

978-1-4799-7678- 2/15/$31.00 2015 IEEE

and of its plane, respectively. There are many techniques to deal with scale and rotation variations, but view variation remains the most challenging problem in object detection even today. Fig. 1(c) and Fig. 1(e) illustrate scale, rotation and view variation. Other challenges that make the problem more complex are partial or full occlusion of the gun, deformation, loss of information in the transformation from the 3-D world to a 2-D image, illumination change, shadow, noise in the image, and real-time processing requirements [8]. Some of these are shown in Fig. 1. Partial or full occlusion occurs because guns are mostly carried in the hand or in a holster. Equipment used to target long-distance objects deforms the gun's appearance. Illumination change and shadow arise from changing lighting conditions in the operational environment. Noise is introduced during acquisition and transmission of the image; common types include salt-and-pepper noise, Gaussian noise and Rayleigh noise, each arising under different acquisition and transmission conditions. The real-time processing requirement implies low space and response-time demands; since the GDS is a real-time application, it must have a low response time.
III. RELATED WORK

The use of CCTV has become common for monitoring various activities. Typical applications include traffic monitoring (traffic-flow statistics, traffic rule violations, people counting) and intruder or suspicious-activity detection in automated surveillance [9]. All of these applications utilize the information carried in consecutive frames of video. For gun detection, we focus on detecting the gun in a single frame, then using an optical-flow method to track it in later frames and performing gun detection again to reduce the chance of false detection.
Although, to our knowledge, no work has been done on visual gun detection, there is a lot of research in the field of concealed weapon detection and some in visual knife detection. Concealed weapon detection (CWD) involves detecting weapons concealed underneath a person's clothing. It only detects whether a person is carrying an object under his clothes; it does not decide whether the object is a weapon or which type of weapon the person carries. Most of the

Figure 1. Illustration of different issues of gun detection: (a) partial occlusion, (b) different shape, (c) view change, (d) illumination change, (e) scale and rotation variation, (f) noisy image.

methods are based on imaging techniques such as infrared imaging and millimeter-wave imaging. We discuss some of them here.
David M. Sheen et al. [10] proposed a CWD method for airports and other secure locations based on a three-dimensional millimeter-wave (mm-wave) imaging technique. Millimeter-wave imaging is derived from microwave holography, which uses phase and amplitude information recorded over two-dimensional apertures to reconstruct a focused image of the target. Millimeter waves are non-ionizing; they easily penetrate common clothing materials and reflect from the human body and from weapons. Zhiyun Xue et al. [11] proposed a CWD method based on fusion of infrared (IR) and color visual images; the fusion is designed to maintain the natural color of the original image. R. Blum et al. [12] developed a CWD method based on fusion of IR or mm-wave images with visual images using a multiresolution mosaic technique, where the concealed weapon is first detected by fuzzy k-means clustering from the IR or mm-wave image. E. M. Upadhyay et al. [13] proposed a CWD method using image fusion; they used a homomorphic filter, block entropy and a blending approach to generate a multi-exposure, multi-modal image from a set of visual and IR images with multiple exposures.
Marcin Kmiec et al. [14] proposed a method for visual knife detection based on an active appearance model and the Harris corner detector. The limitation of this method is its dependence on the accuracy of the Harris corner detector, because the approach first finds the tip of the knife with the corner detector and uses it to initialise the active appearance model.
IV. METHODOLOGY

In this section we explain our approach to visual gun detection. Fig. 2 shows the block diagram of the proposed solution. The detection approach consists of several steps, discussed in the following subsections.
A. System Initialization
System initialization loads the stored SURF descriptor of the gun, which is used to measure the similarity score against the SURF features extracted from each blob. The SURF descriptor used for finding similarity is shown in Fig. 3(a).
B. Preprocessing
Preprocessing involves removing the various kinds of noise that arise during acquisition or transmission of the image. As the images are acquired by a high-resolution camera, image resizing is performed to make them appropriate for processing; in our system we used images resized to 400×300 pixels.
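As a hedged illustration of the resizing step, the following Python/NumPy sketch shrinks an image by integer-factor block averaging. The paper's system was implemented in MATLAB, and a production pipeline would use proper interpolation; the small synthetic frame here is an assumption standing in for a real camera image.

```python
import numpy as np

def downsample(img, fy, fx):
    """Shrink a 2-D image by integer factors fy, fx via block averaging.

    A simple stand-in for the resizing step; a real system would use
    bilinear or bicubic interpolation.
    """
    h = img.shape[0] - img.shape[0] % fy   # crop so dimensions divide evenly
    w = img.shape[1] - img.shape[1] % fx
    img = img[:h, :w]
    # Group pixels into fy x fx blocks and average each block.
    return img.reshape(h // fy, fy, w // fx, fx).mean(axis=(1, 3))

# Small synthetic frame; the same factor-of-10 reduction maps the paper's
# 4000x3000 camera images down to 400x300.
frame = np.arange(30 * 40, dtype=float).reshape(30, 40)
small = downsample(frame, 10, 10)   # 30x40 -> 3x4
```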
C. Color based segmentation
Color-based segmentation using k-means clustering [15] is applied to the preprocessed image to extract the color related to the gun (e.g., black if the gun is black), eliminating colors and objects that do not resemble a gun.

[Figure 2. Block Diagram of Proposed Method: System Initialization → Input Image → Preprocessing → Blob Extraction → Morphological Closing and Boundary Extraction → SURF Feature Extraction → if similarity > 50%, store the bounding box of the blob; repeat while more blobs remain; finally, insert a rectangle in the input image for each stored bounding box.]

The k-means clustering algorithm takes an image and a number of clusters k as input and quickly approximates k clusters of image colors such that the sum of squared distances between each point and its closest cluster center is minimized. After running the algorithm on the input image, the system keeps the cluster whose colors are similar to those of the gun. The reference image and the color-based segmentation result are shown in Fig. 3(b) and Fig. 3(c) respectively, where the green-colored part marks the black-colored part of the original image.
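As a minimal sketch of this clustering step, the following pure-NumPy implementation of Lloyd's k-means groups pixel colors and keeps the darkest cluster. This is an illustration, not the authors' MATLAB implementation; the toy six-pixel "image", the deterministic seeding and the darkest-cluster selection are assumptions for demonstration.

```python
import numpy as np

def kmeans_colors(pixels, k, iters=20):
    """Cluster N x 3 color vectors into k clusters (Lloyd's algorithm).

    Centers are seeded along the brightness ordering so the demo is
    deterministic; library k-means would use random restarts instead.
    """
    order = pixels.sum(axis=1).argsort()
    seeds = order[np.linspace(0, len(pixels) - 1, k).astype(int)]
    centers = pixels[seeds].astype(float)
    for _ in range(iters):
        # Assign each pixel to its nearest center (squared Euclidean distance).
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Re-estimate each center as the mean of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return centers, labels

# Toy 6-pixel "image": dark gun-colored pixels vs. a bright background.
pixels = np.array([[10, 10, 10], [12, 11, 9], [245, 240, 235],
                   [250, 248, 246], [8, 9, 11], [240, 244, 242]], dtype=float)
centers, labels = kmeans_colors(pixels, k=2)
dark_cluster = int(centers.sum(axis=1).argmin())   # keep the cluster nearest black
mask = labels == dark_cluster
```

After clustering, `mask` selects exactly the pixels that belong to the gun-colored cluster, which is the input to blob extraction.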
D. Blob Extraction
This step extracts the blobs of area greater than 1000 pixels from the segmented image, since many small blobs arise during segmentation due to noise. A blob is a connected component of pixels within which some property is constant or varies only slightly. There are three blobs of area greater than 1000 pixels in the segmented image shown in Fig. 3(c).
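The connected-component labeling with an area filter can be sketched as follows (a plain-Python illustration, not the paper's MATLAB code; the toy mask and the `min_area=2` threshold are assumptions — the paper uses 1000 pixels on full-size images).

```python
from collections import deque

def extract_blobs(mask, min_area):
    """Label 4-connected components in a binary mask (list of lists of 0/1)
    and keep those with at least min_area pixels. Returns a list of pixel sets."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                q, blob = deque([(r, c)]), set()
                seen[r][c] = True
                while q:                      # breadth-first flood fill
                    y, x = q.popleft()
                    blob.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(blob) >= min_area:     # area filter drops noise blobs
                    blobs.append(blob)
    return blobs

mask = [[1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
blobs = extract_blobs(mask, min_area=2)   # keeps the 4-pixel blob, drops the 1-pixel one
```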
E. Morphological Closing and Boundary Extraction
Morphological closing is performed to bridge small gaps in the blob, after which the boundary of the blob is extracted. The boundary of the object is used for matching because the inner texture varies from object to object of the same kind, while the outline remains constant or varies only slightly. The idea of matching on boundaries is inspired by the human visual system, which can recognize an object from its boundary structure alone. Morphological closing is achieved by dilation of the image with a structuring element followed by erosion with the same element, as shown in (1). The boundary is obtained by subtracting the erosion of the image from the original image, as shown in (2).

Ic = (I ⊕ SE) ⊖ SE        (1)

B = I − (I ⊖ SE)          (2)

where I is the original image, Ic is the output of closing, SE is the structuring element and B is the boundary of the image.
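Equations (1) and (2) can be sketched with binary dilation and erosion built from array shifts (a NumPy illustration under the assumption of a 3×3 square structuring element and a small toy blob; the paper's system used MATLAB's morphology routines).

```python
import numpy as np

def dilate(img):
    """Binary dilation with a 3x3 square structuring element (OR over the neighborhood)."""
    out = np.zeros_like(img)
    padded = np.pad(img, 1)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def erode(img):
    """Binary erosion with a 3x3 square structuring element (AND over the neighborhood)."""
    out = np.ones_like(img)
    padded = np.pad(img, 1)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

# Toy blob with a one-pixel gap at (1, 3).
blob = np.array([[0, 0, 0, 0, 0, 0],
                 [0, 1, 1, 0, 1, 0],
                 [0, 1, 1, 1, 1, 0],
                 [0, 1, 1, 1, 1, 0],
                 [0, 0, 0, 0, 0, 0]], dtype=np.uint8)

closed = erode(dilate(blob))           # Eq. (1): closing bridges the small gap
boundary = closed - erode(closed)      # Eq. (2): boundary = I - (I eroded)
```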
F. SURF Feature Extraction
SURF features are used in our methodology because they are invariant to scale and rotation, invariant to affine change up to some extent, and faster to compute than other interest point detectors such as SIFT and Harris [16]. SURF is scale invariant because it identifies interest points at different scales. It is rotation invariant due to its descriptor, which is built from the neighboring pixels around each interest point. It is faster than other point detectors because it uses the integral image for identifying interest points and creating their descriptors; SURF is about three times faster than the SIFT detector. SURF feature extraction involves two steps: identification of interest points and creation of a descriptor for each interest point. Interest points are points such as corners and blobs in the image that are invariant to scale, rotation and affine changes. SURF interest points are identified using the Hessian matrix:
           | Lxx(X, σ)   Lxy(X, σ) |
H(X, σ) =  |                       |        (3)
           | Lxy(X, σ)   Lyy(X, σ) |

where Lxx(X, σ) is the convolution of the Gaussian second-order derivative with respect to x with the image I at point X = (x, y), and σ is the scale of the Gaussian. A point is classified as an interest point if the determinant of the Hessian matrix at the current scale is a local maximum within its 3×3 neighborhood across the current, higher and lower scale levels, and is above a specified threshold. For each identified interest point, SURF then creates a feature descriptor of length 64 that is invariant to scale and rotation, and to affine change up to some extent.
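The determinant-of-Hessian response of Eq. (3) can be sketched numerically (a NumPy illustration using finite differences on a synthetic Gaussian blob; SURF itself approximates these second derivatives with box filters on an integral image, which this sketch does not implement).

```python
import numpy as np

# Synthetic 64x64 image containing a single Gaussian blob centered at (32, 32).
y, x = np.mgrid[0:64, 0:64]
img = np.exp(-((x - 32.0) ** 2 + (y - 32.0) ** 2) / (2 * 4.0 ** 2))

# Second-order derivatives approximated with finite differences.
Ly, Lx = np.gradient(img)      # first derivatives along rows, columns
Lyy, _ = np.gradient(Ly)
Lxy, Lxx = np.gradient(Lx)

# Determinant of the Hessian, Eq. (3): a strong positive response marks blob centers.
det_h = Lxx * Lyy - Lxy ** 2

peak = np.unravel_index(det_h.argmax(), det_h.shape)   # should land on the blob center
```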
G. Matching
Matching is performed to calculate the similarity score between the stored gun descriptor and each blob. SURF features of the object boundary are used for matching because the inner texture varies between objects of the same kind, which would produce different SURF features at the same interest point in different images, whereas the outline remains constant or varies only slightly. The similarity score of the matched boundary features indicates to what extent the shape of the object is similar to the shape of a gun. The nearest neighbour algorithm [5] is used to match two descriptors, with the sum of squared differences (SSD) as the metric:
SSD = Σ_{i=1}^{n} (x_i − y_i)²        (4)

where x and y are two feature vectors. Two feature vectors are said to match if the ratio of the SSD of the first nearest neighbour to that of the second is less than 0.6; otherwise the match is ambiguous. Matching results for each blob against the gun descriptor are shown in Fig. 3(d). A blob is categorized as a gun if at least half of the gun's features match the blob.
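The nearest-neighbour ratio test over the SSD metric of Eq. (4) can be sketched as follows (a NumPy illustration with tiny made-up 4-D "descriptors"; real SURF descriptors are 64-dimensional and this is not the authors' MATLAB implementation).

```python
import numpy as np

def match_descriptors(query, stored, ratio=0.6):
    """Match each query descriptor against stored descriptors via SSD,
    keeping only matches that pass the nearest-neighbour ratio test."""
    matches = []
    for i, q in enumerate(query):
        ssd = ((stored - q) ** 2).sum(axis=1)   # Eq. (4) against every stored vector
        order = np.argsort(ssd)
        best, second = ssd[order[0]], ssd[order[1]]
        if best < ratio * second:               # unambiguous match only
            matches.append((i, int(order[0])))
    return matches

# Toy descriptors: query 0 has one clear match; query 1 matches two stored
# vectors almost equally well, so the ratio test rejects it as ambiguous.
stored = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.9, 0.1, 0.0]])
query = np.array([[1.0, 0.1, 0.0, 0.0],
                  [0.0, 0.95, 0.05, 0.0]])

matches = match_descriptors(query, stored)
similarity = len(matches) / len(query)   # fraction of features matched
```

With the 50% matching threshold of the paper, a blob whose `similarity` reaches 0.5 would be categorized as a gun.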


V. EXPERIMENTAL SETUP

For the implementation of our system we used MATLAB 2013a on a computer with an Intel Core i3 processor and 3 GB RAM. As there is no standard dataset of gun images, we created our own dataset of 15 positive images in which a gun is present and 10 negative images in which no gun is present. The dataset was prepared so that it contains images of different types of gun at different scales, rotations and orientations. In some images the gun is partially occluded by a hand or another object, and some images contain multiple guns. Each image also contains other objects besides the gun, against different backgrounds. The resolution of each dataset image is 4000×3000 pixels.
Matching ratio and matching threshold are the two parameters considered in our experiment. The matching ratio is the ratio of the SSD of the first best match of a feature to that of the second best match. A matching ratio greater than 0.6 means one gun feature matches two features of the other object almost equally well, so the match is ambiguous, because one feature cannot be similar to two features of another object. A matching ratio below 0.6 means the first and second best matches are clearly different and the match is reliable. The matching threshold is the percentage of gun features that match the other object's features. We set the matching threshold to 50% because gun shapes vary slightly between models and because of possible partial occlusion of the gun.
Our approach makes two assumptions: uniform gun color and limited orientation variation. The gun should be of uniform color because color-based segmentation splits the image by color; a gun made up of several colors would be split into different clusters, each containing only part of the gun. If a single image contains several guns of different uniform colors, the algorithm must be run on each color cluster to detect each gun. There should also not be too much variation in the orientation of the gun with respect to the CCTV camera, because that changes the view of the gun and hence its visual shape.

Figure 3. (a) SURF descriptor of the visual gun (b) Reference image (c) Color-based segmentation of the reference image (d) Matching result for each blob of the image (e) Final gun detection result.

The matching threshold is 50% because partial occlusion and slight variation in gun shape must both be accommodated.
After applying all of the above procedures, we have the bounding boxes of the blobs categorized as guns. A bounding box is the smallest rectangle that covers a blob. A rectangle is drawn in the input image for each stored bounding box; the result is shown in Fig. 3(e).
VI. EXPERIMENTAL RESULTS AND DISCUSSION

In this section we discuss the results as well as the characteristics of the system. We used two metrics to evaluate the system, namely the true positive rate and the false positive rate. Table 1 shows the performance of the visual gun detection system in terms of these rates: the true positive rate of the system is 86.67% whereas the false positive rate is 0%, which corresponds to an overall accuracy of 92% (23 of 25 images classified correctly).
Fig. 4 shows the detection of the gun in different images.

TABLE I. PERFORMANCE OF VISUAL GUN DETECTION SYSTEM

  Number of images in positive dataset:        15
  Number of correctly detected positive images: 13
  True positive rate:                          86.67%
  Number of images in negative dataset:        10
  Number of wrongly detected negative images:  0
  False positive rate:                         0%
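The rates in Table I follow directly from the raw counts; a minimal sketch of the calculation (the counts are taken from the table, everything else is standard definitions):

```python
# Raw counts from Table I.
tp, pos_total = 13, 15    # correctly detected positive images, positive dataset size
fp, neg_total = 0, 10     # wrongly detected negative images, negative dataset size

tpr = tp / pos_total                                    # true positive rate
fpr = fp / neg_total                                    # false positive rate
accuracy = (tp + (neg_total - fp)) / (pos_total + neg_total)

print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}, accuracy = {accuracy:.2%}")
```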

Figure 4. Results of visual gun detection: (a) detection under a different viewing angle, (b) and (c) scale-invariant detection, (d) rotation-invariant detection, (e) detection of multiple guns, (f), (g) and (h) partially occluded gun detection.

These results show that the system is robust to scale, rotation and viewing angle up to some extent. The last three detection results of Fig. 4 demonstrate the robustness of the system against partial occlusion. Fig. 4(e) shows the detection of multiple guns in the same image, which distinguishes our system from interest point detectors such as SURF and SIFT, which detect only one object.


VII. CONCLUSION
In this paper, we have proposed a method for detecting visible guns in images using color-based segmentation and the SURF interest point detector. Color-based segmentation eliminates unrelated colors and objects that are not of interest. SURF features are then used to measure the similarity of each segmented object to the gun descriptor; if more than 50% of the features of the gun descriptor match the SURF features of an object, that object is labeled as a gun. The novelty of our approach is that it is robust to partial occlusion and to scale, rotation and affine variation, and it can detect the presence of multiple guns in an image. There remain many ways to improve the system, such as making it robust to illumination change and reducing its time and space requirements for real-time processing.


REFERENCES
[1] Y.M. Liang, S.W. Shih, A.C. Shih, "Human action segmentation and classification based on the isomap algorithm," Multimedia Tools and Applications, pp. 1-20, Springer, 2011.
[2] A. Maalouf, M.C. Larabi, D. Nicholson, "Offline quality monitoring for legal evidence images in video-surveillance applications," Multimedia Tools and Applications, pp. 1-30, Springer, 2012.
[3] A. Glowacz, M. Kmiec, A. Dziech, "Visual detection of knives in security applications using active appearance models," Multimedia Tools and Applications, Springer, 2013.
[4] Global impact of gun violence. [Online]. Available: http://www.gunpolicy.org/firearms/region/
[5] D.G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[6] H. Bay, T. Tuytelaars, L.V. Gool, "SURF: speeded up robust features," 9th European Conference on Computer Vision, pp. 7-13, 2006.
[7] S. Leutenegger, M. Chli, R.Y. Siegwart, "BRISK: binary robust invariant scalable keypoints," IEEE International Conference on Computer Vision, Barcelona, pp. 2548-2555, 2011.
[8] A. Yilmaz, O. Javed, M. Shah, "Object tracking: a survey," ACM Computing Surveys, vol. 38, no. 4, 2006.
[9] S. Shantaiya, K. Verma, K. Mehta, "A survey on approaches of object detection," International Journal of Computer Applications, vol. 65, no. 18, pp. 14-20, 2013.
[10] D.M. Sheen, D.L. McMakin, T.E. Hall, "Three-dimensional millimeter-wave imaging for concealed weapon detection," IEEE Transactions on Microwave Theory and Techniques, vol. 49, no. 9, pp. 1581-1592, Sep. 2001.
[11] Z. Xue, R.S. Blum, Y. Li, "Fusion of visual and IR images for concealed weapon detection," Proceedings of the Fifth International Conference on Information Fusion, pp. 1198-1205, 2002.
[12] R. Blum, Z. Xue, Z. Liu, D.S. Forsyth, "Multisensor concealed weapon detection by using a multiresolution mosaic approach," Proceedings of the Vehicular Technology Conference, pp. 4597-4601, 2004.
[13] E.M. Upadhyay, M.K. Rana, "Exposure fusion for concealed weapon detection," 2nd International Conference on Devices, Circuits and Systems, pp. 1-6, 2014.
[14] M. Kmiec, A. Glowacz, A. Dziech, "Towards robust visual knife detection in images: active appearance models initialised with shape-specific interest points," 5th International Conference on Multimedia Communications, Services and Security, pp. 148-158, 2012.
[15] M.-N. Wu, C.-C. Lin, C.-C. Chang, "Brain tumor detection using color-based k-means clustering segmentation," International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 245-250, 2007.
[16] C. Harris, M. Stephens, "A combined corner and edge detector," Proceedings of the 4th Alvey Vision Conference, pp. 147-151, 1988.

