Chapter 2
LITERATURE SURVEY
Human Activity Recognition (HAR) has been a prominent research domain in computer vision,
particularly with the evolution of deep learning. Traditional HAR techniques used handcrafted
features like HOG (Histogram of Oriented Gradients) and SIFT (Scale-Invariant Feature
Transform), often paired with classifiers such as Support Vector Machines (SVMs). However,
these methods had limitations in dealing with complex backgrounds and dynamic postures.
With the introduction of Convolutional Neural Networks (CNNs) and transfer learning, HAR
performance has seen remarkable improvements.
This chapter discusses the existing research approaches and highlights key findings that shaped
the direction of this project.
2.1 Existing Research
Several recent studies have investigated human activity recognition using image data:
CNN-Based Models: CNNs automatically learn spatial hierarchies from image data. Early
HAR models utilized simple CNNs to extract deep features and classify activities. While
effective, they often suffered from overfitting on small datasets.
Transfer Learning with Pretrained Models: Researchers have employed models such as
VGG16/VGG19, ResNet50, InceptionV3, MobileNetV2, and EfficientNetB7, pretrained on
ImageNet, for HAR tasks. These models provide high-level features that generalize well
across domains:
VGG16/VGG19 offer deep, uniformly structured layers well suited to static posture classification.
ResNet50 introduces skip connections, helping mitigate the vanishing gradient problem.
InceptionV3 captures multi-scale features efficiently.
EfficientNetB7 balances model accuracy and parameter size through compound scaling.
Real-Time and Video-Based HAR: Some studies extend HAR to video sequences using 3D
CNNs or LSTM networks. However, these require temporal data and high computational
power, unlike still image-based classification.
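The point made above about ResNet50's skip connections can be illustrated with a toy calculation. The sketch below is not a ResNet block: it assumes a stack of scalar linear layers (real blocks use convolutions, normalization, and nonlinearities), and the function names, weight, and depth are hypothetical values chosen only for illustration. It compares the input-to-output gradient of a plain stack, y = w * x, with that of a stack using an identity skip connection, y = x + w * x.

```python
def plain_chain_gradient(w: float, depth: int) -> float:
    """Gradient d(out)/d(in) through `depth` stacked layers y = w * x."""
    grad = 1.0
    for _ in range(depth):
        grad *= w  # each plain layer scales the gradient by w
    return grad


def residual_chain_gradient(w: float, depth: int) -> float:
    """Gradient d(out)/d(in) through `depth` stacked layers y = x + w * x."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + w  # the identity path contributes the constant 1
    return grad


# Illustrative values only (assumptions, not taken from any surveyed study).
weight = 0.1
depth = 20

plain = plain_chain_gradient(weight, depth)
residual = residual_chain_gradient(weight, depth)

print(f"plain stack gradient:    {plain:.3e}")     # shrinks towards zero
print(f"residual stack gradient: {residual:.3e}")  # does not vanish
```

With a small per-layer weight, the plain product collapses towards zero as depth grows, while the identity term keeps the residual product from vanishing. This is the intuition behind the claim that skip connections help mitigate the vanishing gradient problem.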
2.3 Summary
This literature review highlights the evolution from handcrafted feature methods to modern
deep learning-based techniques for human activity recognition. The surveyed studies
emphasize that no single model is universally optimal; hence, an ensemble of carefully selected
models can provide a more robust and accurate HAR system. These insights strongly influenced
the design of our proposed solution, where multiple deep learning architectures were trained
and integrated into an ensemble framework to enhance performance and reliability.
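One common way to combine several classifiers is soft voting, where the per-class probabilities from each model are averaged and the highest-scoring class is chosen. The sketch below assumes soft voting purely for illustration; this chapter does not state which combination rule the ensemble framework uses. The label names, the probability values, and the helper names `soft_vote` and `predict_label` are all hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical activity labels, used only for illustration.
LABELS = ["sitting", "running", "cycling"]


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return the (optionally weighted) average of per-class probabilities."""
    if not model_probs:
        raise ValueError("need probabilities from at least one model")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same set of classes")
    if weights is None:
        weights = [1.0] * len(model_probs)  # equal say for every model
    if len(weights) != len(model_probs):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    return [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def predict_label(model_probs: Sequence[Sequence[float]],
                  weights: Optional[Sequence[float]] = None) -> str:
    """Pick the label with the highest averaged probability."""
    averaged = soft_vote(model_probs, weights)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return LABELS[best]


# Made-up softmax outputs from three hypothetical models for one image.
example = [
    [0.50, 0.40, 0.10],  # model A favours "sitting"
    [0.20, 0.70, 0.10],  # model B favours "running"
    [0.30, 0.60, 0.10],  # model C favours "running"
]

print(predict_label(example))  # prints "running"
```

Passing `weights` lets a more trusted model count for more, for example weights derived from validation accuracy. Whether weighting helps depends on the models and data, so it is left optional here.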