Paper 503
Presentations
Abstract. This project integrates computer vision and gesture recognition tech-
niques to develop an interactive slideshow navigation system. The program
utilizes the OpenCV library for image processing and the CVZone library for
hand tracking. Users can control the slideshow by performing specific hand
gestures in front of a webcam. The system begins by allowing the user to select
a folder containing PNG images, which are then sequentially renamed. The
main functionality involves gesture-based control for navigating through the
images in the slideshow. Hand gestures, detected using the Hand Tracking
Module, are mapped to actions such as moving to the previous or next slide,
erasing annotations, and showing and drawing pointers on the images. Addi-
tionally, the system provides real-time feedback by displaying the webcam feed
alongside the slideshow. The interactive nature of this project makes it suitable
for presentations or educational purposes where users can dynamically interact
with the displayed content.
1 Introduction
2 Literature Survey
In their study, Devivara Prasad et al. [6] explore the significance of gesture
recognition in Human-Computer Interaction (HCI), emphasizing its practical
applications for individuals with hearing impairments and stroke patients. They used
image feature extraction tools and AI-based classifiers for 2D and 3D gesture
recognition. Their proposed system harnesses machine learning and real-time image
processing with MediaPipe and OpenCV to enable efficient and intuitive
presentation control using hand gestures, addressing the challenges of accuracy and
robustness. The research focuses on enhancing the user experience, particularly in
scenarios where traditional input devices are impractical, highlighting the potential of
gesture recognition in HCI [13][15].
Reethika et al. [7] present a study on Human-Computer Interaction (HCI) with a
focus on hand gesture recognition as a natural interaction technique. It explores the
significance of real-time hand gesture recognition, particularly in scenarios where
traditional input devices are impractical. The methodology involves vision-based
techniques that utilize cameras to capture and process hand motions, offering the
potential to replace conventional input methods. The paper discusses the advantages
and challenges of this approach, such as the computational intensity of image
processing and privacy concerns regarding camera usage. Additionally, it highlights
the benefits of gesture recognition for applications ranging from controlling computer
mouse actions to creating a virtual HCI device [16].
Hajeera Khanum [8] outlines a methodology that harnesses OpenCV and Google's
MediaPipe framework [17][18] to create a presentation control system that interprets
hand gestures. Using a webcam, the system captures and translates hand movements
into actions such as slide control, drawing on slides, and erasing content, eliminating
the need for traditional input devices. While the paper does not explicitly enumerate
its results, it offers insights into the potential for further improvement and the use of
filtering methods to mitigate the effects of poor lighting, contributing to the field of
dynamic hand gesture recognition.
Rutika Bhor et al. [13] present a real-time hand gesture recognition system for
efficient human-computer interaction. It allows remote control of PowerPoint
presentations through simple gestures, using Histograms of Oriented Gradients and
K-Nearest Neighbor classification with around 80% accuracy. The technique extends
beyond PowerPoint to potentially control various real-time applications. The paper
addresses the challenges of creating a reliable gesture recognition system and of
optimizing lighting conditions, and it hints at broader applications, such as media
control without intermediary devices, making it relevant to the human-computer
interaction field. Its references cover related topics such as gesture recognition in
diverse domains.
3 Methodology
The project's primary objective is to let the presenter deliver a presentation
comfortably by controlling it entirely through hand gestures.
The whole concept of this project is demonstrated in Fig. 1, which gives the complete
step-by-step process from uploading the files to terminating the presentation.
3.1 Data Collection
In this project the input data is provided by the user as PPT slides in image format:
the user converts the PPT slides into images, and those images are stored in a folder.
This folder of images is the data for the project, as specified in Fig. 1.
To rename and organize a set of PNG images, the initial step involves assigning
sequential numbers to them in the desired order. This can be achieved through
scripting or batch operations using programming or command-line tools. Once
renamed, the images will have consecutive identifiers, making it easier to organize
and retrieve them in a logical order.
After successfully renaming the PNG images with sequence numbers, the next step
is to sort them based on these assigned numerical values. Sorting ensures that the
images are used in the correct order, following the numerical sequence. This process
is crucial when creating presentations (PPT) or when a specific order is required for
image usage, as it ensures that the images are in the desired sequence for easy access
and presentation purposes. Overall, these procedures simplify the task of organizing
and working with PNG images in a structured and orderly manner. After the folder of
files is uploaded, data preprocessing begins immediately: the images are renamed,
sorted, and stored back in the folder, as shown in Fig. 2.
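As a concrete illustration, the renaming and numeric sorting can be scripted in a few
lines of Python. This is a minimal sketch rather than the project's actual
preprocessing code; the folder name "slides" is a hypothetical example:

import os

def rename_and_sort_pngs(folder):
    """Rename every PNG in `folder` to a sequential number (1.png, 2.png, ...)
    and return the new names in numeric order."""
    pngs = sorted(f for f in os.listdir(folder) if f.lower().endswith(".png"))
    # Two passes avoid collisions when a file is already named e.g. "1.png".
    for i, name in enumerate(pngs):
        os.rename(os.path.join(folder, name), os.path.join(folder, f"tmp_{i}.png"))
    for i in range(len(pngs)):
        os.rename(os.path.join(folder, f"tmp_{i}.png"),
                  os.path.join(folder, f"{i + 1}.png"))
    # Numeric (not lexicographic) sorting keeps 10.png after 9.png.
    return [f"{i + 1}.png" for i in range(len(pngs))]

slide_files = rename_and_sort_pngs("slides")

Sorting numerically rather than lexicographically matters once a deck exceeds nine
slides, since a plain string sort would place 10.png before 2.png.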
Hand Detection: The method recognizes and localizes a hand's position within a
video frame. Hand detection is the key objective in this research, and we employed
the Kanade-Lucas-Tomasi (KLT) algorithm to identify and locate all known
objects in a scene [14]. The algorithm starts by identifying feature points in the first
frame of a video or image sequence. These features could include corners, edges, or
any other distinguishing points in the image. The Harris corner detector [15] is com-
monly used for feature detection. It detects corners by analyzing intensity changes in
various directions. Once the features are identified in the first frame, the algorithm
attempts to track them in subsequent frames. It is assumed that the features move in
small steps between frames.
A small window is considered around each initial frame feature point. The
algorithm searches the next frame for the best window match. Feature point optical
flow is estimated using the Lucas-Kanade method [10]. The motion is assumed to be
constant in a local neighborhood around the feature point. The optical flow equation
is solved for each window pixel around the feature point. Motion parameters (w) and
spatial intensity gradients (Ix and Iy) are related by this equation. The KLT algorithm
analyzes the eigenvalues of the spatial gradient matrix, given in equation (1), to
determine feature-point tracking reliability. The matrix is built from the spatial
gradients of intensity in the window around the feature point; a feature point is
reliable for tracking if the eigenvalues of its matrix are above a threshold. Fig. 3
describes the tracking of the hand with the help of the matrix eigenvalues.
\begin{bmatrix} \sum w^2 I_x^2 & \sum w^2 I_x I_y \\ \sum w^2 I_x I_y & \sum w^2 I_y^2 \end{bmatrix} \qquad (1)
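The KLT pipeline described above maps directly onto OpenCV's built-in functions.
The following is a minimal sketch, assuming a webcam at index 0:
goodFeaturesToTrack performs the corner detection (with an optional Harris
response), calcOpticalFlowPyrLK performs the Lucas-Kanade window search, and its
minEigThreshold parameter plays the role of the eigenvalue threshold on the matrix
in equation (1):

import cv2

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Detect feature points in the first frame. qualityLevel thresholds the
# minimum eigenvalue of the gradient matrix; useHarrisDetector switches
# to the Harris corner response instead.
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                 qualityLevel=0.01, minDistance=10,
                                 useHarrisDetector=True, k=0.04)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Lucas-Kanade optical flow over a small window around each point.
    new_points, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, points, None,
        winSize=(21, 21), maxLevel=3, minEigThreshold=1e-4)
    good = new_points[status.flatten() == 1]
    if len(good) == 0:
        break  # all features lost; a real system would re-detect here
    for x, y in good.reshape(-1, 2):
        cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)
    cv2.imshow("KLT tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
    prev_gray, points = gray, good.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()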
Finger Tracking: After detecting the hand, the algorithm records the location of
individual fingers. It may entail estimating hand landmarks to pinpoint crucial spots
on the fingers, particularly the fingertips.
Finger State Classification: The algorithm defines each finger's state as "up" (1) or
"down" (0) based on its location and movement. To establish these classifications, it
most likely evaluates the angles and placements of the fingers compared to a refer-
ence hand form.
Finger State Combination: The algorithm creates a combination of finger states for
the entire hand. For instance, if all fingers are labeled "up," it may indicate "5". If all
the fingers are marked "down," it may indicate "0."
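In the implementation, these three steps are delegated to CVZone's Hand Tracking
Module, which wraps MediaPipe hand landmarks and exposes the finger states
directly. A minimal sketch, assuming a webcam at index 0 and default detector
settings:

import cv2
from cvzone.HandTrackingModule import HandDetector

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1, detectionCon=0.8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hands, frame = detector.findHands(frame)  # detects and draws landmarks
    if hands:
        # One 0/1 flag per finger: [thumb, index, middle, ring, pinky],
        # e.g. [1, 1, 1, 1, 1] when the whole hand is open.
        fingers = detector.fingersUp(hands[0])
        print(fingers)
    cv2.imshow("Hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()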
4 Results
The hand tracking mechanism, finger state classification, and combination allow each
finger to be identified and assigned to a specific task. Figure 4 depicts this classifica-
tion for the purpose of presentation. The first gesture moves to the previous slide,
the second gesture moves to the next slide, the third shows a pointer to point at
objects on the slide, the fourth deletes what was drawn with the help of the fifth
gesture, and the final gesture exits the presentation.
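Because each recognized gesture arrives as a five-element finger-state vector,
mapping gestures to slide actions reduces to a lookup table. The patterns below are
illustrative assumptions only; the paper does not specify which finger combination
triggers which action:

# Hypothetical finger patterns; the actual gesture assignments may differ.
GESTURE_ACTIONS = {
    (1, 0, 0, 0, 0): "previous_slide",
    (0, 0, 0, 0, 1): "next_slide",
    (0, 1, 0, 0, 0): "pointer",
    (0, 1, 1, 0, 0): "draw",
    (0, 1, 1, 1, 0): "erase",
    (1, 1, 1, 1, 1): "exit",
}

def dispatch(fingers):
    """Map a fingersUp vector to a slideshow action, or None if unrecognized."""
    return GESTURE_ACTIONS.get(tuple(fingers))

dispatch([1, 0, 0, 0, 0])  # -> "previous_slide"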
We carried out several experiments to assess the effectiveness of the system. The
first experiment was designed to determine how accurately the system detects and
classifies hand gestures. We found that the system was able to accurately detect and
categorize hand gestures in most situations. Figure 5 shows the system's hand
tracking and gesture accuracy with the KLT algorithm: an accuracy rate of
approximately 95%. In a second experiment we examined the system's capability of
controlling a presentation with hand gestures. We found that the system was able to
control the slides smoothly and carry out a variety of actions, such as moving
forward or going back to the previous slide.
In the current model, we simply set the gesture array using the built-in Hand
Tracking Module, saving time on training, and collecting hand gestures. Converting
PowerPoint to images and uploading them will take very little time. The accuracy of
the built-in model ranges from 95 to 97%. The previous model required more time for
hand tracking because there was no built-in model for detecting hand gestures, and
the accuracy was less than 95%.
For this project an HD camera is mandatory; the range of the typical built-in cameras
in existing laptops is about 5 meters. To achieve a longer gesture-recognition range,
an external long-range camera is needed. Once the termination gesture is used, the
files are deleted; if the user wants to use the files again, they must upload them again.
5 Conclusion
This project is an innovative and interactive presentation control system that utilizes
computer vision and gesture recognition. It offers a hands-free and engaging way to
interact with presentation slides. With the ability to control slide navigation through
specific hand gestures, such as moving to the next or previous slide, the project
provides a convenient and intuitive alternative to traditional clickers or keyboard
shortcuts. Additionally, the option to write on the slides and erase content with hand
movements enhances the interactivity of presentations. The pointer highlighter
feature lets the presenter draw attention to specific details on the slides, making it a
powerful tool for emphasizing key points.
Furthermore, the capability to terminate the presentation with a gesture provides an
efficient way to conclude a talk. Overall, the project empowers presenters to connect
with their audience more dynamically and engagingly, all while using the OpenCV
interface and hand-tracking technology. It is a valuable addition to the realm of
presentation tools, enabling more interactive and captivating communication. Future
enhancements will add voice commands alongside the hand gestures, as well as
presenter recognition.
A remaining challenge is that we currently use only six gestures to control the
presentation; this can be improved by adding more gestures. We also intend to add
speech commands to perform operations such as moving the slides back and forth.
References
1. D.O. Lawrence and M.J. Ashleigh, Impact of Human-Computer Interaction (HCI) on
Users in Higher Educational System: Southampton University as a Case Study, Vol. 6,
No. 3, pp. 1-12, September (2019)
2. Sebastian Raschka, Joshua Patterson, and Corey Nolet, Machine Learning in Python: Main
Developments and Technology Trends in Data Science, Machine Learning, and Artificial
Intelligence, (2020)
3. Xuesong Zhai, Xiaoyan Chu, Ching Sing Chai, Morris Siu Yung Jong, Andreja Istenic,
Michael Spector, Jia-Bao Liu, Jing Yuan, Yan Li, A Review of Artificial Intelligence (AI) in
Education from 2010 to 2020, (2021)
4. D. Jadhav and L.M.R.J. Lobo, Hand Gesture Recognition System to Control Slide Show
Navigation, IJAIEM, Vol. 3, No. 4 (2014)
5. Zhou Ren, et al., Robust Part-Based Hand Gesture Recognition Using Kinect Sensor, IEEE
Transactions on Multimedia 15.5, pp. 1110-1120 (2013)