Real World Activity Summary For Senior Home Monitoring: Multimedia Tools and Applications July 2011
All content following this page was uploaded by Guo Ye on 08 April 2014.
ABSTRACT

From a senior person's daily activities, one can tell a lot about his or her health condition. We therefore believe that senior home activity analysis will play an important role in the health care of senior people. Toward this goal, we propose a senior home activity summary system. One challenging problem in such a real-world application is that the senior's activities are often accompanied by a nurse's walking, and it is impractical to predefine and label all the potential activities of all potential visitors. To address this problem, we propose a novel feature filtering technique that reduces or eliminates the effect of the interest points that belong to other people. To evaluate the proposed activity summary system, we collected a senior home activity dataset (SAR) and performed activity recognition for the eating and walking classes. The experimental results show that the proposed system provides accurate activity summaries in a real-world application scenario.
Index Terms- Activity Recognition, Senior Home Monitoring, Health Care, Activity Summary, Feature Filtering, Temporal Smoothing
1. INTRODUCTION
Fig. 1. Examples of Our Senior Activity Recognition Dataset.
The world is aging, and many countries will face severe population aging problems in the near future. In Japan, one of the fastest aging countries in the world, the ratio of the population under 20 to the population over 65 was 9.3 in 1950; by 2025, this ratio is predicted to be 0.59. In China, the percentage of elderly people over 65 is 7.6%. As the world ages, the health care of senior people is becoming a major social problem. Researchers have proposed using various types of sensors to monitor people's health conditions at home [1, 2, 3, 4, 5]. Moreover, people are investigating home design ideas to ensure that the elderly can live in their own homes comfortably [1].

We believe that human activity analysis will play an important role in the health care of senior people. From the daily activities of a senior person, such as how many meals the person eats each day, how much time the person sits on the sofa, and how much the person walks, one can tell a lot about the person's health condition. Such information is very useful for the senior person's relatives as well as for medical doctors. Hence, some work has already proposed activity recognition using on-body sensors integrated with mobile phones [3, 4, 5]. In general, those solutions can be applied both outdoors and indoors, thus providing accurate reports of the elderly person's physical activity. Moreover, their implementation seems feasible because mobile phones are integrated with many sensors, such as accelerometers, audio, GPS, etc. However, these solutions do not handle more complex activities, especially for elderly activity recognition. Our goal is to develop a senior activity summary system that automatically provides daily reports of a senior person's activities.

In the past few years, human activity analysis has attracted more and more attention from researchers in the computer vision and multimedia communities [6, 7, 8, 9, 10]. However, there has been little work on senior home activity analysis, partly because no dataset is available. We hope that our dataset will contribute to the research community and inspire more research activities in this direction. Fig. 1 shows some samples of our dataset. One practical problem that we encountered in working with the data is that when a senior person is doing non-walking activities, such as eating, or is not at home, the nurse sometimes walks around in the room. As a result, the video clip may be incorrectly classified as walking. This is a general problem of two different actions occurring at the same time, and it has not been addressed before. In this paper, we propose to use a feature filtering approach to remove the effect of the nurse.

This research was partially supported by the grant from NSFC (No. 61075045), the Program for New Century Excellent Talents in University, the National Basic Research Program of China (No. 2011CB707000), and the Fundamental Research Funds for the Central Universities (No. ZYGX2009X013). We also thank the anonymous reviewers for their valuable suggestions.
These approaches typically do not make distinctions between motions from different people. For example, it is perfectly fine to classify both senior walking and nurse walking as walking. But in our application, the system would provide a wrong activity summary of the senior if we did not distinguish between the senior and other people.

Similar to [8], we represent an action as a space-time object and characterize it by a collection of Spatio-Temporal Interest Points (STIPs) [7]. We denote a video sequence by V = {I_t} and its STIPs by Q = {d_i}. We use the NBMIM approach as the action classifier because of its efficiency; for details, refer to [8].

2.2. Feature Filtering

In our activity summary system, we are interested in classifying the activities of a senior person into four categories: 'Senior Walking', 'Senior Eating', 'Senior OtherAction', and 'Senior NoAction'. 'Senior NoAction' means the senior is not at home, which is detected when there is no motion. We collected training samples for the other three action categories: 'Senior Walking', 'Senior Eating', and 'Senior OtherAction'. The training data contains only the activities of the senior person.

Let (x_{k,i}, y_{k,i}) denote the pixel position of STIP p_{k,i}, 1 <= i <= N_k, k >= 1, and let H_{k,i} denote the HOG descriptor at p_{k,i}. We use f_k to denote the set of all the HOG descriptors of frame k, that is, f_k = {H_{k,1}, ..., H_{k,N_k}}.

Our system keeps two templates: a static template S_0 and a dynamic template S_k. The static template S_0 is created offline. It consists of the HOG descriptors of all the STIPs extracted from a small number of manually selected frames; we manually check that these STIPs belong to the senior person.

The dynamic template is maintained automatically and changes from frame to frame. Assuming the current frame is k, the dynamic template S_{k-1} is the set of the HOG descriptors of the STIPs in frame k-1 that belong to the senior person; therefore, the dynamic template at frame k is a subset of f_{k-1}. To determine which STIPs belong to the senior at each frame, our system keeps track of the center of the senior person (we describe the tracking algorithm later in this section). After we obtain the center of the senior person at frame k-1, we remove all the STIPs whose Euclidean distances from the center are larger than a pre-specified threshold. The HOG descriptors of the remaining points form the dynamic template S_{k-1}.

Given the static template S_0 and the dynamic template S_{k-1},
for each STIP in frame k we compute a weight based on the matching error between the point and the two templates. To simplify notation, we denote Omega_{k-1} = S_0 ∪ S_{k-1}. Let d(H_{k,i}, Omega_{k-1}) denote the closest distance between H_{k,i} and the vectors in Omega_{k-1}, that is,

    d(H_{k,i}, Omega_{k-1}) = min_{H ∈ Omega_{k-1}} ||H - H_{k,i}||,        (1)

where ||·|| is the L2 norm. The similarity between H_{k,i} and the templates is defined as

    t_{k,i} = exp( -d(H_{k,i}, Omega_{k-1})^2 / sigma^2 ),        (2)

where sigma is a variance parameter which is set empirically [12]. t_{k,i} is used as the contribution weight of H_{k,i} for activity classification. Let s^c_{k,i} denote the score of H_{k,i} with respect to activity class c. The total weighted score of frame k is

    s^c_k = sum_{i=1}^{N_k} t_{k,i} * s^c_{k,i}.        (3)

Next we describe how to update the center of the senior so that we can update the dynamic template to get ready for the next frame. Let P_k denote the position of the center of the senior person. When the senior person's interest points start to dominate again (e.g., when the nurse walks out of the room), the dynamic template will be corrected.

3. OUR SENIOR ACTIVITY RECOGNITION DATASET

In order to evaluate our system on real-world senior activity data, we have collected a senior activity dataset four months long (we name this dataset SAR, short for Senior Activity Recognition; it was collected at Jinrui Honghe Garden, Chengdu, China).¹ Daily activities in senior homes were recorded using one SONY DCR-SR68E camera per room. In total, 6 senior people are involved in this project, and the recording lasts for 10 days for each person. The total size of the recorded data is approximately 1.8 TB at 25 frames per second. Fig. 1 shows some example images from the recorded video dataset.

Data labeling is extremely labor intensive and is still ongoing. So far, we have finished labeling two activity categories, 'Eating' and 'Walking', for one senior person. In this paper, we report the performance of our system using the data of this senior person. Each video clip is classified as 'Senior Eating', 'Senior Walking', 'Senior OtherAction', or 'Senior NoAction'. A video clip is regarded as no action if no motion is detected from the clip.

¹ http://www.uestcrobot.net/senioractivity/
² http://www.irisa.fr/vista/Equipe/People/Laptev/interestpoints.html

4. EXPERIMENTAL RESULTS AND ANALYSIS
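Before turning to the results, the feature-filtering pipeline of Section 2.2 can be sketched in code. The following is a minimal NumPy re-implementation of the distance-based STIP filtering and of Eqs. (1)-(3), written for clarity only; the function names, array layouts, and the use of a single tracked center point are our assumptions, not the authors' code.

```python
import numpy as np

def filter_stips(points, center, radius):
    """Keep only the STIPs within `radius` of the tracked senior center.

    points: (N, 2) array of (x, y) STIP positions in the current frame.
    center: (2,) array, tracked center of the senior person.
    Returns a boolean mask over the N points.
    """
    dists = np.linalg.norm(points - center, axis=1)
    return dists <= radius

def stip_weights(descriptors, template, sigma):
    """Eqs. (1)-(2): weight each HOG descriptor by its similarity to the
    combined template Omega_{k-1} = S_0 ∪ S_{k-1}.

    descriptors: (N, D) HOG descriptors H_{k,i} of frame k.
    template:    (M, D) HOG descriptors in Omega_{k-1}.
    """
    # Eq. (1): d(H_{k,i}, Omega) = min over template vectors of the L2 distance
    diffs = descriptors[:, None, :] - template[None, :, :]   # (N, M, D)
    d = np.linalg.norm(diffs, axis=2).min(axis=1)            # (N,)
    # Eq. (2): t_{k,i} = exp(-d^2 / sigma^2)
    return np.exp(-(d ** 2) / sigma ** 2)

def frame_score(weights, point_scores):
    """Eq. (3): total weighted score of frame k for each class c.

    point_scores: (N, C) per-point classifier scores s^c_{k,i}.
    Returns a (C,) vector of weighted class scores.
    """
    return weights @ point_scores
```

A STIP whose descriptor matches a template vector exactly gets weight 1, while points far from both templates (e.g., the nurse's interest points) contribute exponentially little to the frame score.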
(Rows: classification output. Columns: ground-truth senior activity, subdivided by the 'NW'/'NO'/'NN' condition. Entries of the form a/b give the number of clips without/with feature filtering.)

             SE              SW         SO               SN
             NW   NO   NN    NW   NN    NW     NO   NN   NW    NO    NN
    SE       10   5    53    0    0     1/0    2    23   0/0   2/0   0
    SW       1    0    0     2    12    14/2   3    9    5/0   0/0   0
    SO       1    1    1     0    0     7/20   16   79   2/2   3/0   0
    SN       0    0    0     0    0     0/0    0    2    0/5   0/5   50
    # Clips  12   6    54    2    12    22     21   111  7     5     50
    Rate     83.30%  83.30%  98.15%  100%  100%  31.82%/90.91%  76.19%  71.17%  0%/71.42%  0%/100%  100%
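The per-column rates in the last row follow directly from the counts: each rate is the number of correctly classified clips divided by the number of clips in that column. As a quick arithmetic check for the 'SO'+'NW' column (the helper below is ours, not part of the paper's system):

```python
def column_rate(correct, total):
    """Recognition rate of one table column: correctly classified / # clips."""
    return 100.0 * correct / total

# 'SO'+'NW' column: 7 of 22 clips correct without feature filtering,
# 20 of 22 with it.
print(round(column_rate(7, 22), 2))   # 31.82
print(round(column_rate(20, 22), 2))  # 90.91
```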
The second integer '2' is the number of video clips classified as 'SW' with feature filtering. For those entries with just a single integer, the results are the same with or without feature filtering. For example, the nurse walking ('NW') usually does not affect the results of senior walking, so the entries in the 'SW'+'NW' column each have a single number. The final row shows the recognition rates of the activities in each column.

We summarize the results of Table 1 and Table 2 in Table 3. From Table 3, the absolute improvements of the activity recognition rates are 8.27% and 6.70% for the second day and the third day, respectively. In total, the activity recognition rate improves from 80.82% without feature filtering to 88.45% with feature filtering. Furthermore, with temporal smoothing, the recognition rate improves to 91.59%.

                Before     After      Improve (%)
    2nd Day     76.82%     85.10%     8.27%
    3rd Day     86.60%     88.52%     6.70%

Table 2. The total activity recognition performance comparison.

[3] S. Consolvo, D. W. McDonald, T. Toscos, M. Y. Chen, J. Froehlich, B. Harrison, P. Klasnja, A. LaMarca, L. LeGrand, R. Libby, et al., "Activity sensing in the wild: a field trial of UbiFit Garden," in Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems. ACM, 2008, pp. 1797-1806.

[4] T. Gu, S. Chen, X. Tao, and J. Lu, "An unsupervised approach to activity recognition and segmentation based on object-use fingerprints," Data & Knowledge Engineering, vol. 69, no. 6, pp. 533-544, 2010.

[5] E. Miluzzo, N. D. Lane, K. Fodor, R. Peterson, H. Lu, M. Musolesi, S. B. Eisenman, X. Zheng, and A. T. Campbell, "Sensing meets mobile social networks: the design, implementation and evaluation of the CenceMe application," in Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems. ACM, 2008, pp. 337-350.

[6] J. C. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision, vol. 79, no. 3, pp. 299-318, 2008.
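The results above report that temporal smoothing lifts the recognition rate from 88.45% to 91.59%, but this excerpt does not spell out the smoothing step. A common choice, shown here purely as an assumed illustration, is a sliding-window majority vote over the per-clip labels; the window size and tie-breaking rule below are our assumptions.

```python
from collections import Counter

def smooth_labels(labels, window=5):
    """Temporally smooth per-clip activity labels with a sliding-window
    majority vote (hypothetical scheme; the paper does not specify one).

    labels: list of predicted labels, e.g. ['SW', 'SE', 'SW', ...].
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        votes = Counter(labels[lo:hi])
        # Majority label in the window; on ties, most_common keeps the
        # label that appears first in the window.
        smoothed.append(votes.most_common(1)[0][0])
    return smoothed
```

With this scheme, an isolated 'SW' clip caused by a brief nurse walk inside a run of 'SE' clips is voted back to 'SE' by its neighbors.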