
AUTOMATED HUMAN ACTIVITY RECOGNITION FROM
CONTROLLED ENVIRONMENT VIDEOS

by

Pranay Mandadapu

A Thesis Submitted in
Partial Fulfillment of the
Requirements for the Degree of

Master of Science

in Computer Science

at

The University of Wisconsin-Milwaukee

December 2023
ABSTRACT

AUTOMATED HUMAN ACTIVITY RECOGNITION FROM
CONTROLLED ENVIRONMENT VIDEOS

by

Pranay Mandadapu

The University of Wisconsin-Milwaukee, 2023

Under the Supervision of Professor Rohit J. Kate

This thesis explores deep learning methods for Human Activity Recognition (HAR) from videos to automate the annotation of human activities. The research is particularly relevant for continuous monitoring in healthcare settings such as nursing homes and hospitals. The innovative part of the approach lies in using YOLO models to first detect humans in video frames and then isolate them from the rest of the image for activity recognition, which leads to an improvement in accuracy. The study employs pre-trained deep residual networks, such as ResNet50, ResNet152V2, and Inception-ResNetV2, which were found to work better than custom CNN-based models. The methodology involved extracting frames at one-minute intervals from 12-hour-long videos of 18 subjects and using this data for training and testing the models for human activity recognition. This thesis contributes to HAR research by demonstrating the effectiveness of combining deep learning with advanced image processing, suggesting new directions for healthcare monitoring applications.
© Copyright by Pranay Mandadapu, 2023
All Rights Reserved
TABLE OF CONTENTS
LIST OF FIGURES ...................................................................................................................... VI
LIST OF TABLES...................................................................................................................... VII
LIST OF ABBREVIATIONS.................................................................................................... VIII
ACKNOWLEDGEMENTS .......................................................................................................... IX
CHAPTER 1 ................................................................................................................................... 1
1 INTRODUCTION:......................................................................................................................... 1
1.1 Background and Research Challenge......................................................................... 1
1.2 Significance of Research ............................................................................................. 2
1.3 Objectives and Methodology....................................................................................... 2
1.4 Hypothesis Testing and Model Development.............................................................. 3
CHAPTER 2 ................................................................................................................................... 4
2 LITERATURE REVIEW:.......................................................................................................... 4
CHAPTER 3 ................................................................................................................................... 7
3. METHODOLOGY AND MATERIALS ............................................................................................ 7
3.1 Data Source....................................................................................................................... 7
3.2 Machine Learning and Deep Learning Techniques.......................................................... 8
3.2.1 Classification.............................................................................................................. 8
3.2.2 Neural Networks ........................................................................................................ 9
3.2.3 Convolution Neural Networks ................................................................................. 10
3.2.4 Pre-trained Image Processing Models ..................................................................... 10
3.2.4.1 ResNet50........................................................................................................... 10
3.2.4.2 ResNet152V2.................................................................................................... 11
3.2.4.3 Inception-ResNet V2......................................................................................... 12
3.2.5 Object detection system ........................................................................................... 14
3.2.5.1 YOLO V3.......................................................................................................... 14
3.2.5.2 YOLO V8.......................................................................................................... 15
3.2.6 Model Evaluation ..................................................................................................... 16
3.2.6.1 Accuracy ........................................................................................................... 16
3.2.6.2 Precision............................................................................................................ 16
3.2.6.3 Recall (or Sensitivity) ....................................................................................... 16
3.2.6.4 F1 Score ............................................................................................................ 16
3.2.7 Python Libraries ....................................................................................................... 17
3.3 Methodology.................................................................................................................... 19
3.3.1 Data Pre-processing & Selection of Frames ............................................................ 19
3.3.2 Data Distribution of Images ..................................................................................... 19
3.3.3 Pre-Trained Models Training................................................................................... 23
3.3.4 Evaluation of Machine Learning Model .................................................................. 24
CHAPTER 4 ................................................................................................................................. 26
4. RESULTS................................................................................................................................. 26
4.1 Results and Analysis ....................................................................................................... 26
4.1.1 Evaluation of Inception-ResNet V2 without YOLO image pre-processing ............ 26
4.1.2 Evaluation of Inception-ResNet V2 with YOLO image pre-processing ................. 27
4.1.3 Subject-wise evaluation without YOLO pre-processing. ........................................ 30
4.1.4 Subject-wise evaluation with YOLO pre-processing. ............................................. 34
4.2 Discussion ....................................................................................................................... 35
CHAPTER 5 ................................................................................................................................. 37
5 CONCLUSION........................................................................................................................... 37
5.1 Summary.......................................................................................................................... 37
5.2 Limitations and Future Work .......................................................................................... 37
BIBLIOGRAPHY ......................................................................................................................... 39
LIST OF FIGURES
FIGURE 3.1 COLLAGE OF DIFFERENT SUBJECTS DOING DIFFERENT ACTIVITIES............................................................................................. 8
FIGURE 3.2 YOLO MODEL OBJECT AND HUMAN DETECTION WITH PROBABILITIES ...................................................................................15
FIGURE 3.3: DATA DISTRIBUTION OF UNCROPPED IMAGES AMONG DIFFERENT CLASSES ACROSS DIFFERENT SUBJECTS ..............................21
FIGURE 3.4: FROM TOP TO BOTTOM, IMAGE FRAME FROM THE VIDEO, HUMAN DETECTED WITH YOLO V8 AND CROPPED HUMAN
SUBJECT....................................................................................................................................................................................22
FIGURE 3.6: ARCHITECTURE OVERVIEW ...............................................................................................................................................23
FIGURE 4.1: TEST SUBJECT 1031 – SITTING POSITION ..........................................................................................................................31
FIGURE 4.2: TRAIN SUBJECT 1002 – SITTING POSITION. .......................................................................................................................32
FIGURE 4.3: TEST SUBJECT 1073 – STANDING POSITION. .....................................................................................................................33
FIGURE 4.4: TRAIN SUBJECT 1025 – STANDING POSITION. ...................................................................................................................33
LIST OF TABLES
TABLE 3.1 PRE-TRAINED MODELS PERFORMANCE ON IMAGENET DATASET.............................................................................................13
TABLE 3.2 DATA DISTRIBUTION OF UNCROPPED IMAGES AMONG DIFFERENT CLASSES AND SUBJECTS.......................................................20
TABLE 4.1 CONFUSION MATRIX OF INCEPTION-RESNET V2 WITHOUT YOLO IMAGE PRE-PROCESSING....................................................26
TABLE 4.2 YOLO DETECTION RATE FROM THE ORIGINAL DATASET .........................................................................................................28
TABLE 4.3 CONFUSION MATRIX OF INCEPTION-RESNET V2 WITH YOLO IMAGE PRE-PROCESSING ..........................................................29
TABLE 4.4 SUBJECT-WISE ACCURACY WITHOUT YOLO IMAGE PRE-PROCESSING.....................................................................................30
TABLE 4.5 SUBJECT-WISE ACCURACY WITH YOLO IMAGE PRE-PROCESSING ..........................................................................................34

LIST OF ABBREVIATIONS
HAR Human Activity Recognition
CNN Convolutional Neural Network
YOLO You Only Look Once
IoHT Internet of Healthcare Things
IoT Internet of Things
ML Machine Learning
ResNet Residual Network

ACKNOWLEDGEMENTS

I extend my heartfelt thanks to my advisor, Prof. Rohit J. Kate, for his invaluable guidance and support throughout my thesis research. His endless patience, encouragement, and dedication have shaped my research journey. His mentorship has been instrumental in the completion of my work.

I am also grateful to Prof. Scott Strath and the Department of Kinesiology at the University of Wisconsin-Milwaukee for their generosity in providing the experimental data for this study. Thanks to Prof. Jun Zhang and Prof. Scott Strath for their willingness to serve on my thesis committee.

Lastly, my most profound appreciation goes to my parents. Their constant love, unwavering support, and encouragement have been the bedrock of my academic pursuits. I am eternally grateful for their guidance, faith in me, and all the sacrifices they have made on my behalf.
Chapter 1

1 Introduction:

1.1 Background and Research Challenge

This thesis explores Human Activity Recognition (HAR) [2], focusing on developing deep learning models to annotate human activities in videos automatically. The central research motivation is the inefficiency and lack of scalability of manual annotation for video datasets. For instance, in our dataset, human annotators meticulously labeled every second of 12-hour-long videos for each of the 18 subjects. These annotations span diverse activities, including sitting, walking, standing, lying, crouching/kneeling/squatting, and other less frequent postures like stepping and dark/obscured/off-frame (oof) scenarios. This manual process is time-consuming, labor-intensive, and costly, thus highlighting the need for an automated solution.

The motivation for this research is deeply rooted in the desire to enhance the efficiency and accuracy of activity recognition in settings where continuous monitoring is crucial. One of the driving inspirations behind this work is the potential application of automated HAR systems in nursing homes and hospitals [1]. In such environments, continuous monitoring is vital for patient safety and care, yet resource constraints and the impracticality of round-the-clock manual observation often hinder it. By automating the activity recognition process, this research aims to provide a scalable solution that could significantly improve patient monitoring, ensuring timely intervention and care while reducing the workload on healthcare staff.
1.2 Significance of Research

The novelty of this research is in going beyond the conventional use of Convolutional Neural Networks (CNNs) in Human Activity Recognition (HAR). While employing CNNs and pre-trained models like ResNet50 [6] in HAR is not novel, this research introduces a unique application of these deep-learning techniques. The novelty lies in the integration of advanced image processing using YOLO V8 [13] to detect and isolate humans in the video frames, and in building a separate model for activity recognition on these isolated images.

Specifically, this study innovatively employs two separate models: one trained on the original, unaltered dataset and another on a subset where humans are isolated from their environment. This bifurcated model approach is designed to enhance the accuracy and efficiency of activity recognition. The choice of model is made dynamically for each frame: if a human is detected, the model trained on the isolated images is used, allowing for a more focused and precise annotation of activities; otherwise, the model trained on the unaltered dataset is employed.

Such an approach has not been extensively explored in existing HAR research, particularly in a controlled environment like a metabolic chamber.
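The per-frame routing between the two models can be sketched as follows. This is a minimal illustration, not code from the thesis: it assumes detections arrive as (label, confidence, bounding-box) tuples from a detector such as YOLO V8, and the model names and the 0.5 confidence threshold are illustrative placeholders.

```python
# Sketch of the bifurcated-model routing: if a person is detected in
# the frame, classify the cropped person region with the model trained
# on isolated images; otherwise classify the whole frame with the
# model trained on the unaltered dataset. All names are illustrative.

def route_frame(detections, cropped_model, fullframe_model, frame):
    """detections: list of (label, confidence, (x1, y1, x2, y2)) tuples.
    frame: image as a list of pixel rows (row-major)."""
    persons = [d for d in detections if d[0] == "person" and d[1] >= 0.5]
    if persons:
        # Take the highest-confidence person box and classify its crop.
        label, conf, (x1, y1, x2, y2) = max(persons, key=lambda d: d[1])
        crop = [row[x1:x2] for row in frame[y1:y2]]
        return cropped_model(crop), "cropped"
    # No human detected: fall back to the full-frame model.
    return fullframe_model(frame), "full"
```

In a real pipeline the two classifier arguments would be the fine-tuned networks (e.g. Inception-ResNet V2 instances) and the detections would come from a YOLO inference call; the routing decision itself is the part shown here.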

1.3 Objectives and Methodology

The primary objective of this research is to develop an accurate model capable of automatically annotating human activities from video frames. This study utilizes a dataset comprising 18 subjects, each captured in an extensive 12-hour-long video session performing activities of daily living in a metabolic chamber. The methodology involves initially extracting frames from these videos at one-minute intervals. These frames are then used to train the models and to evaluate them by having the trained models label the observed activities automatically.
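As a rough illustration of the frame-sampling step, the indices of frames one minute apart can be computed from the video's frame rate. The function below is a sketch under the assumption of a constant frame rate; the 30 fps figure and the function name are illustrative assumptions, not details taken from the thesis.

```python
# Sketch of one-minute frame sampling: given a video's frame rate and
# total frame count, compute which frame indices to extract so that
# consecutive samples are 60 seconds apart. Assumes a constant frame
# rate throughout the recording.

def sample_indices(fps, total_frames, interval_s=60):
    step = int(round(fps * interval_s))  # frames between consecutive samples
    return list(range(0, total_frames, step))

# A 12-hour recording at an assumed 30 fps yields 720 sampled frames,
# i.e. one per minute.
indices = sample_indices(fps=30, total_frames=12 * 60 * 60 * 30)
```

In practice the indices would then be read from the video file (for example via a decoder that supports seeking) and saved as still images for annotation and model training.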
