
A Water Behavior Dataset for an Image-Based Drowning Solution
1st Saifeldin Hasan, Electrical Engineering Department, Rochester Institute of Technology, Dubai, UAE ([email protected])
2nd John Joy, Electrical Engineering Department, Rochester Institute of Technology, Dubai, UAE ([email protected])
3rd Fardin Ahsan, Electrical Engineering Department, Rochester Institute of Technology, Dubai, UAE ([email protected])
4th Huzaifa Khambaty, Electrical Engineering Department, Rochester Institute of Technology, Dubai, UAE ([email protected])
5th Manan Agarwal, Electrical Engineering Department, Rochester Institute of Technology, Dubai, UAE ([email protected])
6th Jinane Mounsef, Electrical Engineering Department, Rochester Institute of Technology, Dubai, UAE ([email protected])

2021 IEEE Green Energy and Smart Systems Conference (IGESSC) | 978-1-6654-3456-0/21/$31.00 ©2021 IEEE | DOI: 10.1109/IGESSC53124.2021.9618700

Abstract—Drowning is responsible for an estimated 320,000 deaths annually worldwide; roughly 25% of those deaths occur in swimming pools. This is probably due to the fact that a drowning person, to the untrained eye, will appear to be playing or floating normally in the water. While drowning, a person is unable to call for help, as the nervous system focuses on gathering oxygen for the lungs. To assist lifeguards with their rescue mission, we propose a water behavior dataset curated to support the design of image-based methods for drowning detection. The dataset includes three major water activity behaviors (swim, drown, idle) that have been captured by overhead and underwater cameras. Moreover, we develop and test two methods to detect and recognize the drowning behavior using the proposed dataset. Both methods use deep learning and aim to support a fast and smart pool rescue system by watching for the early signs of drowning rather than looking for a drowned person. The results show a high performance of the presented methods, validating our dataset, which is the first public water behavior dataset and the main contribution of this work.

Index Terms—water behavior dataset, drowning, computer vision, early rescue

I. INTRODUCTION

The burden of drowning for children has become a leading public health problem [1]. The high rates of drowning are an impediment to achieving reductions in early childhood mortality [2]. Many of these death incidents are attributed to poor adult supervision. Various drowning scenarios involve newly mobile toddlers who wander off, preschoolers who discover ungated swimming pools, and older children and adults at unguarded public swimming areas, such as beaches, rivers, and residential swimming pools. Effective interventions that mitigate drowning risk will improve health outcomes. Yet, interventions such as fencing around pools, lifeguards, and flotation devices are not always feasible. A recent study shows a small but alarming number of drowning deaths at public swimming areas that are guarded by professional lifeguards [3]. There is strong evidence that humans simply are not very good at noticing rare events while completing a boring, repetitive task [4], [5]. Although lifeguards are usually highly alert, their task is extremely difficult, resulting in egregious examples of inattention. The aquatics industry is acutely aware of the challenges it faces to prevent drownings in lifeguarded swimming areas. Addressing the question of how one can help lifeguards complete the highly challenging task of identifying rare events (drownings), while completing a repetitive scanning task, is a multifaceted problem that requires a multifaceted solution.

Recently, many research works have been devoted to understanding the signs of drowning behavior. Lu and Tan [6] presented a vision-based approach to detecting drowning incidents in swimming pools at the earliest possible stage, using a number of video clips of simulated drowning. The approach detects and tracks swimmers and parses observation sequences of swimmer features for possible drowning behavioral signs. In [7], a real-time drowning detection method uses an HSV thresholding mechanism along with contour detection to detect a drowning person in indoor swimming pools, and sends an alarm to the lifeguard if the previously detected person is missing for a specific amount of time. A real-time vision system operating at an outdoor swimming pool is presented in [8]. The system is designed to automatically recognize different swimming activities and to detect the occurrence of early drowning incidents. To learn unique traits of different swimming behaviors, the authors simulate and collect unique traits of early drowning behaviors and numerous swimming styles.

The industry has shifted in the past few years to address the drowning crisis by moving toward more safeguarded pools. This has also brought in a generation of drowning prevention products. Many of these products provide the lifeguard with deeper vision through a 3D monitoring screen [9], or the swimmer with a wearable that tracks how long the swimmer's face is submerged [10]. Other surveillance technologies have advanced to the point of being able to answer a distress call, using artificial intelligence software to monitor the swimmer's activity and detect potential problems when the swimmer has been struggling for more than seven seconds [11].

According to the previous surveys, most of the research has focused on detecting the swimmer's location in order to identify submerged swimmers with no movement. It is not applicable to water activity recognition, which in our case involves activities that describe specific behaviors of swimmers. On the other hand, these recognition works applied the model either to simulated swimming scenarios or to their own private real-time video frames. Based on these considerations, this paper first proposes a water activity behavior dataset of videos that are captured above water and under water. The videos illustrate three types of water activities that describe the behaviors of swimming, drowning and staying idle (resting or playing). Moreover, we present two image-based methods that train a deep learning model on the proposed dataset.

The remainder of the paper is organized as follows. Section II presents the two drowning recognition methods. The experimental setup describing the proposed dataset and the results are presented in Section III. Finally, the conclusions are drawn in Section IV.

II. DROWNING RECOGNITION

We utilize existing deep neural networks (DNNs) pre-trained on the large ImageNet dataset [12] and adapt them for water behavior recognition with the sole purpose of identifying the early signs of drowning. The pre-trained feature representations provide a starting point for creating robust classifiers for drowning detection. We consider two scenarios for incorporating pre-trained neural networks. First, we re-train pre-trained DNNs by fine-tuning their parameters. Second, we detect the different body keypoints with a Deep High-Resolution Network (HRNet) [13] and use them to train a generic DNN. All models are trained on an NVIDIA GTX 1070 GPU.

A. Method 1: Scene Classification

The first method performs standard video scene classification on the proposed dataset. It trains a deep neural network (DNN) using transfer learning in order to perform predictive labelling on the video frames and classify the different human activities in the water. We test three DNN architectures: ResNet50 [14], VGG16 [15], and MobileNet [16]. These networks have been pre-trained on the ImageNet dataset, which includes over 1.2 million images across 1,000 object classes. We then re-train the DNNs by fine-tuning their parameters. Fine-tuning is essentially training the network for several more iterations on a new dataset; this process adapts the generic filters learned on ImageNet to the drowning recognition problem. We train the networks for 15 epochs using stochastic gradient descent. At around 15 epochs, all of the architectures achieve near 100% accuracy on the training set, so no further improvement in training can be achieved.
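As an illustration, the following sketch shows how such fine-tuning could be set up in Keras. The paper specifies only the architectures, the 15 epochs, and the SGD optimizer; the directory layout, batch size, and learning rate here are assumptions.

```python
# Hypothetical sketch of the fine-tuning step, assuming frames are stored
# as dataset/train/<swim|drown|idle>/*.jpg.
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

IMG_SIZE = (144, 640)   # frames are resized to 640x144 for this method
NUM_CLASSES = 3         # swim, drown, idle

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train", image_size=IMG_SIZE, batch_size=32)

# Start from ImageNet weights and fine-tune all layers on the new data.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=IMG_SIZE + (3,))
base.trainable = True

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# SGD for 15 epochs, per the text; the learning rate is an assumption.
model.compile(optimizer=optimizers.SGD(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=15)
```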
B. Method 2: Body Pose Estimation

While the features from the pre-trained DNNs can be useful for human activity recognition in the water, another network, the High-Resolution Network (HRNet), can be trained to detect and compute the body pose keypoints. A BGR-to-RGB conversion is first applied to the frames before passing them to the DNN to improve the network's performance. The HRNet consists of parallel high-to-low resolution subnetworks with repeated information exchange across multi-resolution subnetworks (multi-scale fusion). The model has been pre-trained on the COCO train2017 dataset [17], which contains over 50,000 images and 150,000 person instances labeled with 17 keypoints. Finally, the detected keypoints are classified using a DNN of 5 layers, with (50, 50, 50) neurons in the hidden layers, to classify the three human water activities (swim, drown, idle).
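A minimal sketch of this keypoint-classification stage is shown below, assuming HRNet has already reduced each frame to 17 (x, y) keypoints. The scikit-learn MLP stands in for the 5-layer DNN (input, three hidden layers of 50 neurons, output); flattening the keypoints into a 34-dimensional feature vector is an assumption.

```python
# Hypothetical sketch of the keypoint-classification stage.
import cv2
from sklearn.neural_network import MLPClassifier

def to_rgb(frame_bgr):
    """BGR-to-RGB conversion applied before pose estimation."""
    return cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)

def train_keypoint_classifier(keypoints, labels):
    """keypoints: (n_frames, 17, 2) array of HRNet detections;
    labels: 0=swim, 1=drown, 2=idle."""
    X = keypoints.reshape(len(keypoints), -1)  # 17 (x, y) pairs -> 34 features
    # Three hidden layers of 50 neurons each, matching the 5-layer DNN
    # (input + 3 hidden + output) described above.
    clf = MLPClassifier(hidden_layer_sizes=(50, 50, 50), max_iter=500)
    clf.fit(X, labels)
    return clf
```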
III. EXPERIMENTAL SETUP AND RESULTS

A. Water Behavior Dataset

To train and test the previously described models, a dataset of short videos displaying human water activities is curated. For this purpose, two different cameras are used above water and under water. A DJI Osmo+ is used to film above water, at a resolution of 1920x1080 and 30 fps, and a GoPro Hero 7 is used for the underwater scenes, at a resolution of 1920x1440 and 30 fps.
For practicality purposes, every video shows only one person performing one of three activities: swimming, drowning, or staying idle. Every video in the dataset is named "activity_id_person.mp4". The underscore ('_') delimiter separates the fields of interest for better readability: the first field is the activity type (swim, drown, idle), followed by 'id' to indicate the number of the video, and finally 'person' to indicate that we are labelling videos of a person. Individual frames are generated every 0.0333 seconds (i.e., every frame at 30 fps) from their respective videos. The video frames are resized to 640x144 for the scene classification method and to 640x368 for the pose estimation method. The dataset includes a total of 91 videos of an average of 57 seconds each. They are split into overhead and underwater videos, with sub-categories for the three activities: there are 47 videos for the overhead scenes and 44 videos for the underwater scenes. The subjects in the dataset are all males ranging from 18 to 21 years of age, mostly of Middle Eastern ethnicity. Fig. 1 and Fig. 2 show sample images from the overhead and underwater videos for each water activity. Table I presents a summary of the dataset characteristics, displaying the number of videos and frames in the overhead and underwater scenes for each water activity.

TABLE I: Summary of the overhead and underwater video characteristics for each of the three water activities.

Category    Videos (overhead)    Frames (overhead)    Videos (underwater)    Frames (underwater)
Swim        19                   9,384                19                     8,388
Drown       18                   6,080                14                     7,363
Idle        10                   9,265                11                     6,259
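The sketch below illustrates how the naming convention and the frame-extraction step could be implemented; the output directory layout and JPEG format are assumptions.

```python
# Hypothetical sketch of frame extraction under the naming scheme above.
# A file name such as "drown_07_person.mp4" yields the activity "drown".
import os
import cv2

def extract_frames(video_path, out_dir, size=(640, 144)):
    """Save one resized frame per video frame (every 0.0333 s at 30 fps)."""
    activity = os.path.basename(video_path).split("_")[0]  # swim/drown/idle
    target = os.path.join(out_dir, activity)
    os.makedirs(target, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # cv2.resize takes (width, height): 640x144 suits scene classification,
        # while 640x368 would be used for the pose estimation method.
        frame = cv2.resize(frame, size)
        cv2.imwrite(os.path.join(target, f"{idx:05d}.jpg"), frame)
        idx += 1
    cap.release()
```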

[Fig. 1: Sample images from the overhead videos. From top to bottom, images show the swim, drown and idle cases for the same person, respectively. Panels: (a) Overhead - Swim, (b) Overhead - Drown, (c) Overhead - Idle.]

[Fig. 2: Sample images from the underwater videos. From top to bottom, images show the swim, drown and idle cases for the same person, respectively. Panels: (a) Underwater - Swim, (b) Underwater - Drown, (c) Underwater - Idle.]
B. Experimental Setup

We split the dataset into training and testing sets for the overhead and underwater cases separately. We select the number of frames in each category (overhead and underwater) to be the same, to avoid an unbalanced dataset in which one category is more represented than the other. The training set consists of 90% of the images and the testing set includes the remaining 10%. For each of the overhead and underwater cases, we need to classify the testing set into one of three classes corresponding to swim, drown and idle.

For performance evaluation, we use the accuracy, f1-score, precision and recall recognition rates for both applied methods, scene classification and pose estimation.
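As an illustration, the snippet below sketches the 90/10 split and the metric computation with scikit-learn; the placeholder features and the MLP classifier are assumptions standing in for the actual frame data and models.

```python
# Hypothetical sketch of the 90/10 split and the evaluation metrics.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# Placeholder features/labels standing in for frame data (0=swim, 1=drown, 2=idle).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 34))
y = rng.integers(0, 3, size=1000)

# 90% training / 10% testing, stratified so no class is under-represented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(50, 50, 50), max_iter=300)
clf.fit(X_train, y_train)

# Accuracy plus per-class precision, recall and f1-score,
# as reported in Tables II through VIII.
print(classification_report(y_test, clf.predict(X_test),
                            target_names=["swim", "drown", "idle"]))
```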

C. Scene Classification Results

For the scene classification method, we evaluate the performance of the three DNN architectures. We construct our experiments as described in Section III.B. Table II shows the recognition accuracies of the three models when they are trained on the frames of the three activity classes, as described in Table I. Our results show that ResNet50 significantly outperforms MobileNet for the overhead and underwater cases, while it performs slightly better than VGG16.

TABLE II: Accuracy (%) of models trained and tested on the proposed water behavior dataset for the three activity classes.

DNN          Overhead Accuracy (%)    Underwater Accuracy (%)    Average Accuracy (%)
ResNet50     98.70                    95.00                      96.85
MobileNet    89.90                    76.60                      83.25
VGG16        98.30                    95.10                      96.70

We also analyze the results of our fine-tuning procedure on all three DNN networks for the drowning activity class in particular, to evaluate the performance of this method at detecting a distress call. Table III shows the f1-score values of the three models when they are tested on the drowning class frames, as described in Table I. The f1-score, in this case, accounts for false positives and false negatives, instead of only assessing the performance of the models at detecting true positives and negatives, as represented by the recognition accuracy. The results show that ResNet50 and VGG16 significantly outperform MobileNet.

TABLE III: F1-score of models tested on the proposed water behavior dataset for the drowning activity class.

DNN          Overhead F1-Score    Underwater F1-Score    Average F1-Score
ResNet50     0.93                 0.97                   0.96
MobileNet    0.68                 0.83                   0.76
VGG16        0.95                 0.97                   0.96

Finally, we evaluate the performance of ResNet50 at identifying each of the three activity classes. Table IV and Table V display the f1-score, precision and recall values that describe the ResNet50 performance for the overhead and underwater cases, respectively. The results show a very good recognition of all three activities. Fig. 3 shows two sample frames where the recognition outcome is displayed for the swim and drown activity cases using ResNet50.

TABLE IV: Performance evaluation of ResNet50 tested on the proposed water behavior dataset for the three activity classes above water.

Class    F1-Score    Precision    Recall
swim     0.99        0.98         0.99
drown    0.97        0.97         0.97
idle     0.98        1.00         0.99

TABLE V: Performance evaluation of ResNet50 tested on the proposed water behavior dataset for the three activity classes under water.

Class    F1-Score    Precision    Recall
swim     0.95        0.98         0.97
drown    0.93        0.94         0.94
idle     0.97        0.91         0.94

[Fig. 3: Water activity recognition using video scene classification (Method 1) with ResNet50. (a) Swim activity. (b) Drown activity.]

D. Pose Estimation Results

For the pose estimation method, the HRNet detects the pose keypoints in the frames of the testing set, as shown in Fig. 4. The method shows a fast response of less than half a second at detecting the different keypoints across the swimmer's body.

[Fig. 4: Water activity recognition using pose estimation classification (Method 2). (a) Above water. (b) Under water.]

We evaluate the performance of the HRNet architecture in Table VI, which shows the recognition accuracies of the pose estimation method when it is applied to the frames of the three activity classes above water and under water. Moreover, Table VI shows the f1-score values for the particular case of the drown activity.

TABLE VI: Performance evaluation of the pose estimation method tested on the proposed water behavior dataset.

Category      Accuracy (%)    F1-Score (Drown)
Overhead      99.1            0.98
Underwater    97.7            0.97
Average       98.4            0.98

TABLE VII: Performance evaluation of the pose estimation method tested on the proposed water behavior dataset for the three activity classes above water.

Activity Class    F1-Score    Precision    Recall
swim              0.99        0.99         0.99
drown             0.98        0.98         0.99
idle              0.99        0.99         0.99

TABLE VIII: Performance evaluation of the pose estimation method tested on the proposed water behavior dataset for the three activity classes under water.

Activity Class    F1-Score    Precision    Recall
swim              0.98        0.97         0.99
drown             0.97        0.98         0.96
idle              0.98        0.98         0.98

Our results show that the pose estimation method performs well for the three activity classes, for both the overhead and underwater cases, slightly outperforming the scene classification method (Table II and Table III).

Next, we compute the f1-score, precision and recall for each of the three activity classes, for both above water and under water, as shown in Table VII and Table VIII. Here too, the pose estimation method proves to be effective at recognizing the different water behavior activities. Again, comparing Table VII and Table VIII to Table IV and Table V, respectively, we notice that the pose estimation method performs better than the video scene classification method.

We observe that the pose estimation method is independent of the scene variations and relies solely on the body pose and the swimmer's behavior in the water. This is in contrast to the video scene classification method, which relies on the features of the overall frame to learn the water activity class and might therefore be affected by several scene factors that constrain the classification, resulting in a less robust recognition method compared to the pose estimation method.

IV. CONCLUSION

We proposed a water behavior dataset of videos that were captured above water and under water for drowning recognition. The videos illustrated three types of water activities to describe the behaviors of swimming, drowning and staying idle.
tion. The videos illustrated three types of water activities to
describe the behaviors of swimming, drowning and staying
idle. Moreover, we presented two image-based methods that
trained different deep learning models on the proposed dataset.
Both methods, the scene classification and pose estimation
methods, proved to be efficient at recognizing each of the wa-
ter behavior activities. In the first method, ResNet50 showed to
perform the best. However, the pose estimation method slightly
outperformed the scene classification method, knowing that the
former depended less on the scene variations. The recognition
process relied on keypoint features that described the body
pose, which better related to the concept of human behavior
in the water.
ACKNOWLEDGMENT
The authors would like to thank Luke Cunningham and Kim Beasley of BlueGuard - Al Wasl Swimming Academy, who supported the creation of the proposed water behavior dataset.
REFERENCES

[1] D. You, G. Jones, K. Hill, T. Wardlaw and M. Chopra, "Levels and trends in child mortality, 1990-2009," Lancet, vol. 376, no. 9745, pp. 931–933, September 2010.
[2] M. Peden, K. Oyegbite, J. Ozanne-Smith and A. A. Hyder, "World Report on Child Injury Prevention: Summary," Geneva, Switzerland: World Health Organization, 2008.
[3] Redwoods Group, "Teen dies in accident at YMCA," available at: http://www.redwoodsgroup.com/YMCAs/RiskManagement/AquaticsAlerts.html. Accessed July 23, 2020.
[4] J. Duncan and G. W. Humphreys, "Visual search and stimulus similarity," Psychological Review, vol. 96, pp. 433–458, July 1989.
[5] J. M. Wolfe, T. S. Horowitz and N. M. Kenner, "Rare items often missed in visual searches," Nature, vol. 435, pp. 439–440, May 2005.
[6] W. Lu and Y. P. Tan, "A vision-based approach to early detection of drowning incidents in swimming pools," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 2, pp. 159–178, March 2004.
[7] N. Salehi, M. Keyvanara and S. A. Monadjemmi, "An automatic video-based drowning detection system for swimming pools using active contours," International Journal of Image, Graphics and Signal Processing, vol. 8, no. 8, p. 1, August 2016.
[8] H. L. Eng, K. A. Toh, W. Y. Yau and J. Wang, "DEWS: A live visual surveillance system for early drowning detection at pool," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 2, pp. 196–210, Mar 2008.
[9] https://www.aqua-conscience.com/
[10] https://www.wavedds.com/
[11] https://www.angeleye.tech/en/en-lifeguard/
[12] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla and M. Bernstein, "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, April 2015.
[13] K. Sun, B. Xiao, D. Liu and J. Wang, "Deep high-resolution representation learning for human pose estimation," arXiv:1902.09212 [cs], Feb. 2019. [Online]. Available: https://arxiv.org/abs/1902.09212
[14] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
[15] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in International Conference on Learning Representations, San Diego, May 2015, pp. 1–14.
[16] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, April 2017.
[17] T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision, September 2014.
