Development of Framework for Detecting Smoking Scenes
Corresponding Author:
Poonam G,
Department of Computer Science and Engineering,
Rashtreeya Vidyalaya College of Engineering (RVCE), Bengaluru, India.
Email: [email protected]
1. INTRODUCTION
In this age of online and social media, cinema remains a prominent form of entertainment and a strong influence on youth. India produces approximately 800 to 1000 movies a year, which creates the requirement of displaying a warning message whenever a smoking scene is shown. At present, these warnings are inserted manually.
This proposed work describes the development of a framework that automatically detects smoking scenes using a neural network model and then displays the required warning message. The challenge in detecting smoking scenes in video clips is that only a small portion of the smoking event may be shown, and it may appear for only a fraction of a second. To overcome this challenge, object detection methods are used to detect different kinds of cigarettes, which may vary in shape, color and size. A warning message such as “Smoking Kills” or “Smoking is injurious to health” is then displayed in the video clip.
Google’s Object Detection API is built on top of TensorFlow. Several pre-trained TensorFlow models are available for object detection, such as Single Shot Multibox Detector (SSD) with MobileNet, SSD with Inception v2, Region-Based Fully Convolutional Networks (R-FCN) with ResNet-101, Faster R-CNN with ResNet-101, and Faster R-CNN with Inception ResNet v2. In our proposed approach, Faster R-CNN with Inception ResNet v2 is chosen. Faster R-CNN achieved much better speed and accuracy than its predecessors, and later models, though they followed various approaches, could not outperform it by a significant margin. Faster R-CNN may not be the simplest or fastest method for object detection, but it remains one of the best performing. At present, the Faster R-CNN with Inception ResNet model in TensorFlow is the slowest but most accurate of these models.
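For illustration, the following minimal sketch (TensorFlow 1.x style) shows how such a pre-trained model can be loaded for inference; the frozen-graph path is an assumption and corresponds to a model downloaded from the TensorFlow detection model zoo.

    import tensorflow as tf

    # Assumed path to a frozen graph from the TensorFlow detection model zoo
    # (Faster R-CNN with Inception ResNet v2); adjust to the local download.
    PATH_TO_FROZEN_GRAPH = 'faster_rcnn_inception_resnet_v2/frozen_inference_graph.pb'

    detection_graph = tf.Graph()
    with detection_graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
            graph_def.ParseFromString(fid.read())
            # Import the pre-trained network into the current graph
            tf.import_graph_def(graph_def, name='')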
2. RELATED WORKS
Research work carried out in [1] exploits the Region Proposal Network (RPN) of the Faster R-CNN model to detect pedestrians. Even though the RPN performs well on its own, the results can be degraded by the downstream classifier. Two main reasons may lead to this situation: small instances are handled poorly owing to the insufficient resolution of the feature maps, and mining hard negative examples is difficult because bootstrapping strategies are nearly absent.
Another research work [2] concentrates on the impact of datasets on deep learning and on the application and importance of deep learning through Faster R-CNN. The work summarizes deep learning algorithms and the common datasets used in the field of computer vision. Additionally, the study builds a new dataset modeled on the previously available and commonly used datasets, and Faster R-CNN is then applied to this newly built dataset.
The application of Faster R-CNN is explored in [3] on various benchmarks, ranging from object detection to face recognition, on which the algorithm has shown improved results. The work reports the results of training a Faster R-CNN model on the large-scale face dataset WIDER [4]. It also explains the results on the WIDER dataset along with two more state-of-the-art and widely used datasets, FDDB and IJB-A.
Online handwritten graphics may contain mathematical expressions and diagrams, and existing methods for detecting these symbols are typically designed for a single graphic type [5]. In that work, the Faster R-CNN object detection algorithm is evaluated as a general method for detecting symbols in handwritten graphics. Different configurations of the Faster R-CNN method are evaluated, and issues related to the handwritten nature of the data are pointed out. Considering the online recognition context, the efficiency and accuracy trade-offs of using deep neural networks of different complexities as feature extractors are also evaluated.
Fast R-CNN addresses the drawbacks of R-CNN and SPP-Net, with higher detection quality being the main improvement over both. All the layers can be updated during the training process, and the features do not need to be stored on disk. Fast R-CNN training is 9 times faster than R-CNN and 3 times faster than SPP-Net, while testing is 213 times faster than R-CNN and 10 times faster than SPP-Net. Along with the decrease in training time, there is an increase in accuracy.
4. PROPOSED METHODOLOGY
The following subsections give the details of the methodology used in the proposed work.
4.2 Dataset
Hundreds of images are required to train the classifier for good detection. Videos containing smoking scenes were collected; these videos contain cigarettes of different shapes, sizes and colors. From these videos, 5 frames are extracted per second. All these frames are converted to 200 x 200 JPEG images, which form our training dataset of 660 images. The cigarettes in the images appear under a variety of lighting conditions and backgrounds, and in some images the cigarette is only partially visible. Using a labeling tool, the location of the cigarette in each image is marked by drawing a rectangle. The location is stored in an XML file that contains the image height and width and the coordinates of the bounding rectangle. Each image is kept below 100 KB, since the time required for training grows considerably with image size.
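A minimal sketch of the frame-extraction step is given below, assuming OpenCV is available; the function and directory names are illustrative, not part of the original implementation.

    import cv2  # OpenCV, used here for video decoding and resizing

    def extract_frames(video_path, out_dir, target_fps=5, size=(200, 200)):
        """Sample roughly target_fps frames per second from a video and
        save them as 200 x 200 JPEG images for the training dataset."""
        cap = cv2.VideoCapture(video_path)
        native_fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unknown
        step = max(int(round(native_fps / target_fps)), 1)
        index = saved = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:
                resized = cv2.resize(frame, size)
                # Moderate JPEG quality keeps each file well under 100 KB
                cv2.imwrite('%s/frame_%05d.jpg' % (out_dir, saved),
                            resized, [int(cv2.IMWRITE_JPEG_QUALITY), 90])
                saved += 1
            index += 1
        cap.release()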
4.3 Training
Training the model to detect cigarettes can be done on Google cloud services, a CPU or a GPU. During training, TensorFlow stores checkpoints at regular intervals, and the loss is reported at each training step. The loss reported by the model is a combination of classification loss and regression loss. Training the classifier is stopped when the loss drops to 0.04, and the latest checkpoint created by TensorFlow at that loss is used for detecting cigarettes.
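With the TensorFlow Object Detection API, training and checkpoint export are typically driven from the command line, as sketched below; the config file name, directory layout and the checkpoint step (XXXX) are placeholders, not values from the original work.

    # Launch training; checkpoints are written periodically to the model directory
    python model_main.py \
        --pipeline_config_path=training/faster_rcnn_inception_resnet_v2.config \
        --model_dir=training/

    # Export the chosen checkpoint as a frozen graph for inference
    python export_inference_graph.py \
        --input_type image_tensor \
        --pipeline_config_path training/faster_rcnn_inception_resnet_v2.config \
        --trained_checkpoint_prefix training/model.ckpt-XXXX \
        --output_directory inference_graph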
5. RESULTS
The model accepts input in three different forms: an image, a video or a live webcam feed. The system results are evaluated using images and videos. First, the system was evaluated using different image datasets. Dataset 1 contains 10 images, among which 5 have cigarettes; Dataset 2 contains 20 images, among which 10 have cigarettes; and Dataset 3 contains 30 images, among which 15 have cigarettes. Accuracy, sensitivity and specificity are the performance measures considered to evaluate the
model with these datasets. The results are shown in Table 2. The accuracy is reduced for Datasets 2 and 3 because they contain images in which the cigarette is only partially visible, as well as images with illumination changes.
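For reference, the three measures follow the standard confusion-matrix definitions, where a true positive (TP) is an image correctly flagged as containing a cigarette, a true negative (TN) is an image correctly flagged as not containing one, and FP and FN are the corresponding errors:

    \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
    \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
    \text{Specificity} = \frac{TN}{TN + FP}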
Next, the model is evaluated using a video dataset containing 10 videos. The results are displayed in Table 3. The proposed approach gives an average accuracy of 94.08% for the video dataset considered. Our results are compared with the smoking event detection ratio-histogram method [13]; the comparison is shown in Table 4.
These results demonstrate that object detection with Faster R-CNN can be used to detect smoking scenes by treating the cigarette as an object.
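A minimal sketch of the video annotation stage is given below; detect_cigarette is a hypothetical wrapper around the trained detector that returns the highest cigarette detection score for a frame, and the confidence threshold is an assumed value rather than one reported in the paper.

    import cv2

    WARNING = 'Smoking is injurious to health'

    def annotate_video(in_path, out_path, detect_cigarette, threshold=0.6):
        """Overlay the statutory warning on frames where the detector
        reports a cigarette with sufficient confidence."""
        cap = cv2.VideoCapture(in_path)
        fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*'mp4v'),
                                 fps, (width, height))
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if detect_cigarette(frame) >= threshold:
                # Draw the warning text near the bottom-left of the frame
                cv2.putText(frame, WARNING, (10, height - 20),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
            writer.write(frame)
        cap.release()
        writer.release()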
6. CONCLUSION
The results show that the proposed method can be adopted for displaying warning messages during smoking scenes. Our proposed work displays a warning message by detecting cigarettes. This work can be extended to detect smoking scenes that do not contain a visible cigarette but only exhaled smoke. In Indian movies and television shows, similar messages are displayed during scenes of alcohol consumption, and the proposed work can be extended to such scenes as well.
REFERENCES
[1] Zhang, Liliang, et al. “Is faster R-CNN doing well for pedestrian detection?”. European Conference on Computer
Vision. Springer, Cham, 2016; 443-457.
[2] Zhou, Xinyi, Wei Gong, WenLong Fu, Fengtong Du. “Application of deep learning in object detection”. 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), IEEE. 2017; 631-634.
[3] H. Jiang and E. Learned-Miller. “Face Detection with the Faster R-CNN”. 2017 12th IEEE International
Conference on Automatic Face & Gesture Recognition (FG 2017). Washington, DC, 2017; 650-657.
[4] S. Yang, P. Luo, C. C. Loy, and X. Tang. “WIDER FACE: A face detection benchmark”, CVPR, 2016.
[5] F. D. Julca-Aguilar and N. S. T. Hirata, “Symbol Detection in Online Handwritten Graphics Using Faster R-CNN”,
2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna, Austria, 2018, 151-156.
[6] Xu, P. “Study on Moving Objects by Video Monitoring System of Recognition and Tracing Scheme”. Indonesian Journal of Electrical Engineering and Computer Science (IJEECS). 2013; 11(9), 4847-4854.
[7] Mengxin Li, Jingjing Fan, Ying Zhang, Rui Zhang, Weijing Xu, Dingding Hou. “Moving Object Detection and Tracking Algorithm”. Indonesian Journal of Electrical Engineering and Computer Science (IJEECS). 2013; 11(10), 5539-5544.
[8] A. Krizhevsky, I. Sutskever, and G. Hinton. “ImageNet classification with deep convolutional neural networks”. NIPS, 2012.
[9] M. D. Zeiler and R. Fergus. “Visualizing and understanding convolutional networks”. ECCV, 2014.
[10] K. Simonyan and A. Zisserman. “Very deep convolutional networks for large-scale image recognition”. ICLR, 2015.
[11] K. He, X. Zhang, S. Ren, and J. Sun. “Deep residual learning for image recognition”. CVPR, 2016.
[12] Ren, Shaoqing, Kaiming He, Ross Girshick, Jian Sun. “Faster R-CNN: towards real-time object detection with region proposal networks”. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017; 39(6), 1137-1149.
[13] Wu, Pin, Jun-Wei Hsieh, Jiun-Cheng Cheng, Shyi-Chyi Cheng, Shau-Yin Tseng. “Human smoking event detection
using visual interaction clues”. 20th International Conference on Pattern Recognition (ICPR) IEEE, 2010;
4344-4347.