
Contents

• Introduction
• Existing System
• Proposed System
• Modules
• Functional Requirements
• Non-functional Requirements
• Technical Architecture
• Block Diagram
• Flow Chart
• Computational Resources
• Algorithms
• Implementation
• Screenshots
• Test Cases
• Future Scope
• Conclusion
• References
Introduction
• The primary objective of this project is to develop a real-time system for accurately
recognizing alphanumeric characters in a three-dimensional space using hand or
finger gestures, commonly known as air writing.

• This system aims to provide a seamless and intuitive mode of interaction with
intelligent devices like smart TVs and robots.

• Leveraging advanced algorithms, including OpenCV-based hand tracking, the system will
robustly track hand movements and extract trajectories from a cost-effective web camera.

• The key focus is on achieving high accuracy in character recognition, enabling
immediate and reliable transcription of gestures.
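
To make the capture step concrete, the following is a minimal, illustrative sketch (not the project's full pipeline) of how OpenCV and MediaPipe can collect an index-fingertip trajectory from a webcam; the camera index, confidence threshold, window name and quit key are assumptions:

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)
trajectory = []  # (x, y) fingertip positions in pixel coordinates

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        h, w, _ = frame.shape
        tip = results.multi_hand_landmarks[0].landmark[8]  # index-fingertip landmark
        trajectory.append((int(tip.x * w), int(tip.y * h)))
    cv2.imshow("Air Writing", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()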
Existing System
• Existing air-writing recognition systems predominantly rely on methods such as
Hidden Markov Models (HMMs) for recognizing drawn text or symbols.

• HMMs require a significant amount of training data to accurately model different
gestures or symbols. Collecting and annotating this data is time-consuming and
expensive.

• HMMs typically consider only a limited context window, which may not capture
long-range dependencies in air-writing gestures. This limitation can reduce
accuracy, especially for complex gestures that involve intricate movements.
Proposed System

• This project proposes an innovative hand-tracking algorithm to extract air-writing
trajectories captured by a single web camera.

• The algorithm effectively addresses the "push-to-write" issue and eliminates the
need for user-imposed restrictions, such as delimiters or imaginary boundaries.

• Experimental results demonstrate that the proposed approach not only achieves
significantly higher recognition accuracy but also reduces network complexity
compared to prevalent image-based methods.


Proposed System

• A recognition system tailored for smart televisions has been built, and its
performance was benchmarked against prior research.

• Due to the absence of public datasets, the experiments used two types of datasets:
one for digits and another for directional symbols.

• The input is not limited to a particular set of characters, symbols, or letters; the
system accepts an open-ended range of hand-drawn input.


Proposed System
• The air-writing recognition experiments will be carried out in Python, which
provides the tooling needed to produce the required output.

• The program scripts will be further refined with optimization techniques and
compared against the baseline results, i.e. runs with default parameters.

• Input strokes can be replicated with precision.

• Near-limitless possibilities for input recognition.

• Recognition behaviour does not change between runs for the same input.

• The system can reproduce any scribbles that are provided as input.
Modules
After careful analysis, the system has been identified to consist of the
following modules:
1. Painter Module
2. Hand Gesture Module
3. Recognition Models
4. User Interface
5. Testing Module
1. Painter Module:
• This module will handle the functionality related to virtual painting.
Functions:
• detect_hands(): Detects hands in the video stream.
• track_hand_movements(): Tracks hand movements for painting.
• change_brush_color(color): Changes the brush color based on hand gestures.
• clear_canvas(): Clears the painting canvas.
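
The sketch below shows one possible skeleton for this module; the PainterModule class name, its internal state and the default canvas size are illustrative assumptions rather than the project's actual code:

import numpy as np
import cv2

class PainterModule:
    def __init__(self, width=1280, height=720):
        self.canvas = np.zeros((height, width, 3), np.uint8)
        self.brush_color = (0, 0, 255)

    def detect_hands(self, frame):
        """Detect hands in the current video frame (delegates to a hand tracker)."""
        raise NotImplementedError

    def track_hand_movements(self, prev_point, curr_point):
        """Draw a stroke segment from the previous to the current fingertip position."""
        cv2.line(self.canvas, prev_point, curr_point, self.brush_color, 15)

    def change_brush_color(self, color):
        """Change the brush color, e.g. after a header-selection gesture."""
        self.brush_color = color

    def clear_canvas(self):
        """Reset the canvas to black."""
        self.canvas[:] = 0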
2. Hand Gesture Module

• This module will handle the interpretation of hand gestures for various actions.

Functions:

• interpret_gestures(): Interprets hand gestures to trigger specific actions.

• detect_fingers_up(): Detects which fingers are raised.
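
A hedged sketch of how these two functions might be implemented on top of the [id, x, y] landmark list produced by the hand tracker; the gesture-to-action mapping is an illustrative assumption that mirrors the drawing/selection logic in the implementation section:

def detect_fingers_up(lm_list, tip_ids=(4, 8, 12, 16, 20)):
    """Return 1/0 flags for thumb, index, middle, ring and pinky being raised."""
    fingers = [1 if lm_list[tip_ids[0]][1] < lm_list[tip_ids[0] - 1][1] else 0]  # thumb: x comparison
    for i in range(1, 5):
        fingers.append(1 if lm_list[tip_ids[i]][2] < lm_list[tip_ids[i] - 2][2] else 0)  # other fingers: y comparison
    return fingers

def interpret_gestures(fingers):
    """Translate the raised-finger pattern into a painter action."""
    if fingers[1] and fingers[2]:
        return "select"   # index + middle finger: selection / recognition mode
    if fingers[1] and not fingers[2]:
        return "draw"     # index finger only: drawing mode
    return "idle"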


3. Recognition Models:

• This module includes the machine learning models for recognizing hand-drawn
characters.

• It loads and uses the models for alphabet and digit recognition (AlphaMODEL
and NumMODEL).
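
A hedged sketch of how the models might be loaded and queried; the model file names follow the implementation shown later, while the 28x28 grayscale input shape and the helper function name are assumptions:

import numpy as np
from tensorflow.keras.models import load_model

AlphaMODEL = load_model("bModel.h5")    # alphabet recogniser
NumMODEL = load_model("bestmodel.h5")   # digit recogniser

def recognise(image_28x28, mode="alpha"):
    """Return the predicted class index for a normalised 28x28 stroke image."""
    model = AlphaMODEL if mode == "alpha" else NumMODEL
    probs = model.predict(image_28x28.reshape(1, 28, 28, 1))
    return int(np.argmax(probs))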

4. User Interface:

• This module sets up a Flask web application, registers the Virtual Painter
blueprint, and defines routes for the different pages.
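
A minimal sketch of such an entry point, assuming a file named app.py; the index route is illustrative, while the VirtualPainter blueprint and index.html template match the implementation section below:

from flask import Flask, render_template
from virtualPainter import VirtualPainter  # blueprint defined in virtualPainter.py

app = Flask(__name__)
app.register_blueprint(VirtualPainter)

@app.route("/")
def index():
    return render_template("index.html")

if __name__ == "__main__":
    app.run(debug=True)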
5. Testing Module

• This module will contain unit tests to verify the functionality of the Virtual
Painter application.

Example Test Cases:

• test_detect_hands(): Tests hand detection functionality.

• test_track_hand_movements(): Tests hand movement tracking.

• test_change_brush_color(): Tests brush color change based on gestures.

• test_clear_canvas(): Tests canvas clearing functionality.
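
A hedged unittest sketch covering some of these cases; the painter_module import path refers to the hypothetical Painter Module skeleton shown earlier and is an assumption:

import unittest
from painter_module import PainterModule  # hypothetical module path (assumption)

class TestVirtualPainter(unittest.TestCase):
    def test_change_brush_color(self):
        painter = PainterModule()
        painter.change_brush_color((0, 255, 0))
        self.assertEqual(painter.brush_color, (0, 255, 0))

    def test_track_hand_movements(self):
        painter = PainterModule()
        painter.track_hand_movements((100, 100), (200, 200))
        self.assertTrue(painter.canvas.any())  # the stroke changed the canvas

    def test_clear_canvas(self):
        painter = PainterModule()
        painter.canvas[:] = 255
        painter.clear_canvas()
        self.assertFalse(painter.canvas.any())

if __name__ == "__main__":
    unittest.main()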


Functional Requirements:
• Gesture Interpretation
• Alphanumeric Recognition
• Real-Time Processing
• Noise Handling
• Trajectory Extraction
• Dynamic Lighting Adaptation
Non-functional Requirements
Non-functional requirements include quantitative constraints, such as
response time and accuracy.
• Portability
• Security
• Maintainability
• Reliability
• Scalability
• Performance
• Reusability
• Flexibility
Technical Architecture
System Architecture
Flow Chart
Computational Resources

Software Requirements:

Operating System : Windows 10 or later
Environment : PyCharm Community Edition
Frameworks & Libraries : OpenCV, MediaPipe, Flask, Pygame, NumPy
Deep Learning Framework : TensorFlow (Keras)

Hardware Requirements:

Processor : Intel Core i3 or higher
RAM : 4 GB
Hard Disk : 500 GB
Web Camera
Algorithms

Convolutional Neural Networks: A Convolutional Neural Network (CNN)
is a type of deep neural network designed specifically for processing
structured grid-like data, such as images or videos. It comprises several
interconnected layers that work together to analyze visual data.

Components:
Input Layer: The initial layer of the CNN receives the raw input data, which is
typically an image represented as a grid of pixel values. The input layer
doesn't perform any computations; it simply passes the data to the
subsequent layers.
Convolutional Layers: These layers are the core building blocks of a CNN.
They consist of filters (also known as kernels) that convolve across the input
image by performing element-wise multiplication and summation. This
process detects various features such as edges, textures, or patterns at
different spatial locations. Multiple filters are used to capture different
features.
Activation Functions: After the convolution operation, an activation function
is applied element-wise to the output of the convolutional layer. Rectified
Linear Unit (ReLU) is a commonly used activation function that introduces
non-linearity, allowing the network to learn more complex relationships
within the data.
Pooling Layers: These layers follow the convolutional layers and reduce the
spatial dimensions of the data while retaining the most important
information. Pooling operations like max pooling or average pooling
aggregate information within specific regions of the input, reducing
computational complexity and helping to make the network more robust to
variations in the input.
Fully Connected Layers (Dense Layers): After several convolutional and
pooling layers, the flattened output is fed into fully connected layers. These
layers are similar to those in traditional neural networks and are responsible
for making predictions based on the high-level features learned by the
preceding layers. They can perform tasks like classification or regression.
Output Layer: The final layer of the CNN produces the network's output.
Depending on the task (e.g., image classification, object detection), the
output layer might use different activation functions (e.g., softmax for
classification) and output formats (e.g., a probability distribution over
classes).
Loss Function: It's a crucial component that measures the difference
between predicted outputs and actual labels. The goal during training is to
minimize this loss by adjusting the network's parameters.
Optimizer: CNNs are trained using optimization algorithms like Stochastic
Gradient Descent (SGD) or its variants. The optimizer updates the network's
parameters based on the calculated loss to improve the model's performance.
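
A hedged Keras sketch of the kind of CNN described above, sized for the 28x28 grayscale stroke images used in the implementation; the exact layer sizes, the 26-class output and the Adam optimizer are illustrative assumptions:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # convolution + ReLU on the 28x28 input
    layers.MaxPooling2D((2, 2)),                                            # pooling reduces spatial size
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),                                   # fully connected layer
    layers.Dense(26, activation='softmax')                                  # output: one probability per class
])

# Loss function and optimizer, as described above.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()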
Implementation
handTracking.py
import cv2
import mediapipe as mp

class handDetector():
    def __init__(self, mode=False, maxHands=2, modelComplexity=1, detectionCon=0.5,
                 trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.modelComplex = modelComplexity
        self.detectionCon = detectionCon
        self.trackCon = trackCon

        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(self.mode, self.maxHands, self.modelComplex,
                                        self.detectionCon, self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]  # landmark ids of the five fingertips

    def findHands(self, img, draw=True):
        # Detect hands in a BGR frame and optionally draw the landmarks on it.
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, handNo=0, draw=True):
        # Return a list of [id, x, y] pixel coordinates for one detected hand.
        self.lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                self.lmList.append([id, cx, cy])
                if draw:
                    cv2.circle(img, (cx, cy), 5, (255, 0, 0), cv2.FILLED)
        return self.lmList

    def fingersUp(self):
        # Return 1/0 flags for thumb, index, middle, ring and pinky being raised.
        fingers = []
        # Thumb: compare x-coordinates of the tip and the joint next to it.
        if self.lmList[self.tipIds[0]][1] < self.lmList[self.tipIds[0] - 1][1]:
            fingers.append(1)
        else:
            fingers.append(0)
        # Other fingers: the tip is above the PIP joint when the finger is raised.
        for id in range(1, 5):
            if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                fingers.append(1)
            else:
                fingers.append(0)
        return fingers
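
A brief usage sketch (an assumption, not part of the original file) that could be appended to handTracking.py to exercise the class on its own; the camera index, window name and confidence value are illustrative:

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)
    detector = handDetector(detectionCon=0.75)
    while True:
        success, img = cap.read()
        if not success:
            break
        img = detector.findHands(img)
        lmList = detector.findPosition(img, draw=False)
        if lmList:
            print(detector.fingersUp())  # e.g. [0, 1, 0, 0, 0] when only the index finger is up
        cv2.imshow("Hand Tracking", img)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()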

virtualPainter.py

import cv2
import numpy as np
import os
import HandTrackingModule as htm
from flask import Blueprint, render_template
from tensorflow.keras.models import load_model
import keyboard
import pygame
import time

VirtualPainter = Blueprint("HandTrackingModule", __name__,
                           static_folder="static", template_folder="templates")


@VirtualPainter.route("/feature")
def strt():
    ############## Color Attributes ###############
    WHITE = (255, 255, 255)
    BLACK = (0, 0, 0)
    RED = (0, 0, 255)
    YELLOW = (0, 255, 255)
    GREEN = (0, 255, 0)
    BACKGROUND = (255, 255, 255)
    FORGROUND = (0, 255, 0)
    BORDER = (0, 255, 0)
    lastdrawColor = (0, 0, 1)
    drawColor = (0, 0, 255)
    BOUNDRYINC = 5

    ############## CV2 Attributes ###############
    cap = cv2.VideoCapture(0)
    width, height = 1280, 720
    cap.set(3, width)   # frame width  (640 or 1280)
    cap.set(4, height)  # frame height (480 or 720)
    imgCanvas = np.zeros((height, width, 3), np.uint8)

    ############## PyGame Attributes ###############
    pygame.init()
    FONT = pygame.font.SysFont('freesansbold.ttf', 18)
    DISPLAYSURF = pygame.display.set_mode((width, height), flags=pygame.HIDDEN)
    pygame.display.set_caption("Digit Board")
    number_xcord = []
    number_ycord = []

    ############## Header Files Attributes ###############
    folderPath = "header"
    myList = os.listdir(folderPath)
    overlayList = []
    for imPath in myList:
        image = cv2.imread(f'{folderPath}/{imPath}')
        overlayList.append(image)
    header = overlayList[0]

    ############## Prediction Model Attributes ###############
    label = ""
    PREDICT = "off"
    AlphaMODEL = load_model("bModel.h5")
    NumMODEL = load_model("bestmodel.h5")
    AlphaLABELS = {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i', 9: 'j',
                   10: 'k', 11: 'l', 12: 'm', 13: 'n', 14: 'o', 15: 'p', 16: 'q', 17: 'r', 18: 's', 19: 't',
                   20: 'u', 21: 'v', 22: 'w', 23: 'x', 24: 'y', 25: 'z', 26: ''}
    NumLABELS = {0: '0', 1: '1', 2: '2', 3: '3', 4: '4',
                 5: '5', 6: '6', 7: '7', 8: '8', 9: '9'}
    rect_min_x, rect_max_x = 0, 0
    rect_min_y, rect_max_y = 0, 0

    ############## HandDetection Attributes ###############
    detector = htm.handDetector(detectionCon=0.85)
    xp, yp = 0, 0
    brushThickness = 15
    eraserThickness = 30
    modeValue = "OFF"
    modeColor = RED

    while True:
        success, img = cap.read()
        img = cv2.flip(img, 1)

        img = detector.findHands(img)
        lmList = detector.findPosition(img, draw=False)
        cv2.putText(img, "Press A for Alphabet Recognition Mode", (0, 145), 3, 0.5,
                    (255, 255, 0), 1, cv2.LINE_AA)
        cv2.putText(img, "Press N for Digit Recognition Mode", (0, 162), 3, 0.5,
                    (255, 255, 0), 1, cv2.LINE_AA)
        cv2.putText(img, "Press O to Turn Off Recognition Mode", (0, 179), 3, 0.5,
                    (255, 255, 0), 1, cv2.LINE_AA)
        cv2.putText(img, f'{"RECOGNITION IS "}{modeValue}',
                    (0, 196), 3, 0.5, modeColor, 1, cv2.LINE_AA)

        # Keyboard shortcuts switch between alphabet, digit and off modes.
        if keyboard.is_pressed('a'):
            if PREDICT != "alpha":
                PREDICT = "alpha"
                modeValue, modeColor = "ALPHABETS", GREEN
        if keyboard.is_pressed('n'):
            if PREDICT != "num":
                PREDICT = "num"
                modeValue, modeColor = "NUMBER", YELLOW
        if keyboard.is_pressed('o'):
            if PREDICT != "off":
                PREDICT = "off"
                modeValue, modeColor = "OFF", RED
                # Reset the drawing state when recognition is switched off.
                xp, yp = 0, 0
                label = ""
                rect_min_x, rect_max_x = 0, 0
                rect_min_y, rect_max_y = 0, 0
                number_xcord = []
                number_ycord = []
                time.sleep(0.5)

        if len(lmList) > 0:
            x1, y1 = lmList[8][1:]   # index fingertip
            x2, y2 = lmList[12][1:]  # middle fingertip

            fingers = detector.fingersUp()
            # print(fingers)

            if fingers[1] and fingers[2]:
                # Selection mode: index + middle fingers up.
                number_xcord = sorted(number_xcord)
                number_ycord = sorted(number_ycord)

                if len(number_xcord) > 0 and len(number_ycord) > 0 and PREDICT != "off":
                    if drawColor != (0, 0, 0) and lastdrawColor != (0, 0, 0):
                        # Bounding box around the stroke just drawn.
                        rect_min_x, rect_max_x = max(number_xcord[0] - BOUNDRYINC, 0), min(width, number_xcord[-1] + BOUNDRYINC)
                        rect_min_y, rect_max_y = max(0, number_ycord[0] - BOUNDRYINC), min(number_ycord[-1] + BOUNDRYINC, height)
                        number_xcord = []
                        number_ycord = []

                        # Crop the stroke from the hidden pygame surface for recognition.
                        img_arr = np.array(pygame.PixelArray(DISPLAYSURF))[rect_min_x:rect_max_x, rect_min_y:rect_max_y].T.astype(np.float32)

                        cv2.rectangle(imgCanvas, (rect_min_x, rect_min_y),
                                      (rect_max_x, rect_max_y), BORDER, 3)
                        image = cv2.resize(img_arr, (28, 28))
                        # cv2.imshow("Tmp", image)
                        image = np.pad(image, (10, 10), 'constant', constant_values=0)
                        image = cv2.resize(image, (28, 28)) / 255
                        # cv2.imshow("Tmp", image)

                        if PREDICT == "alpha":
                            label = str(AlphaLABELS[np.argmax(AlphaMODEL.predict(image.reshape(1, 28, 28, 1)))])
                        if PREDICT == "num":
                            label = str(NumLABELS[np.argmax(NumMODEL.predict(image.reshape(1, 28, 28, 1)))])
                        pygame.draw.rect(DISPLAYSURF, BLACK, (0, 0, width, height))
                        cv2.rectangle(imgCanvas, (rect_min_x + 50, rect_min_y - 20),
                                      (rect_min_x, rect_min_y), BACKGROUND, -1)
                        cv2.putText(imgCanvas, label, (rect_min_x, rect_min_y - 5),
                                    3, 0.5, FORGROUND, 1, cv2.LINE_AA)
                else:
                    number_xcord = []
                    number_ycord = []

                xp, yp = 0, 0
                if y1 < 125:
                    # Fingertip in the header area: clear, pick a color/eraser or exit.
                    lastdrawColor = drawColor
                    if 0 < x1 < 200:
                        imgCanvas = np.zeros((height, width, 3), np.uint8)
                    elif 210 < x1 < 320:
                        header = overlayList[0]
                        drawColor = (0, 0, 255)
                    elif 370 < x1 < 470:
                        header = overlayList[1]
                        drawColor = (0, 255, 255)
                    elif 520 < x1 < 630:
                        header = overlayList[2]
                        drawColor = (0, 255, 0)
                    elif 680 < x1 < 780:
                        header = overlayList[3]
                        drawColor = (255, 0, 0)
                    elif 890 < x1 < 1100:
                        header = overlayList[4]
                        drawColor = (0, 0, 0)
                    elif 1160 < x1 < 1250:
                        cap.release()
                        cv2.destroyAllWindows()
                        return render_template("index.html")

                cv2.rectangle(img, (x1, y1 - 25), (x2, y2 + 25), drawColor, cv2.FILLED)

            elif fingers[1] and not fingers[2]:
                # Drawing mode: only the index finger is up.
                number_xcord.append(x1)
                number_ycord.append(y1)
                cv2.circle(img, (x1, y1 - 15), 15, drawColor, cv2.FILLED)
                if xp == 0 and yp == 0:
                    xp, yp = x1, y1
                if drawColor == (0, 0, 0):
                    cv2.line(img, (xp, yp), (x1, y1), drawColor, eraserThickness)
                    cv2.line(imgCanvas, (xp, yp), (x1, y1), drawColor, eraserThickness)
                else:
                    cv2.line(img, (xp, yp), (x1, y1), drawColor, brushThickness)
                    cv2.line(imgCanvas, (xp, yp), (x1, y1), drawColor, brushThickness)
                    pygame.draw.line(DISPLAYSURF, WHITE, (xp, yp), (x1, y1), brushThickness)
                xp, yp = x1, y1
            else:
                xp, yp = 0, 0

        # Merge the canvas strokes into the camera frame and show the result.
        imgGray = cv2.cvtColor(imgCanvas, cv2.COLOR_BGR2GRAY)
        _, imgInv = cv2.threshold(imgGray, 50, 255, cv2.THRESH_BINARY_INV)
        imgInv = cv2.cvtColor(imgInv, cv2.COLOR_GRAY2BGR)
        img = cv2.bitwise_and(img, imgInv)
        img = cv2.bitwise_or(img, imgCanvas)
        img[0:132, 0:1280] = header
        pygame.display.update()
        # cv2.imshow("Paint", imgCanvas)
        cv2.imshow("Image", img)
        cv2.waitKey(1)


if __name__ == "__main__":
    # Guarded so the camera loop only starts when the file is run directly,
    # not when the blueprint is imported by the Flask application.
    strt()
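
A note on the design: the canvas strokes are merged into the live camera frame with a threshold followed by bitwise_and/bitwise_or, while a hidden Pygame surface accumulates the same strokes in white on black; when the selection gesture is made, the bounding box of the stroke is cropped from that surface, resized to 28x28, normalised, and passed to the alphabet or digit CNN depending on the active mode.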
Screenshots

Fig. Clear Screen


Fig. Finger Recognition
Fig. Alphabet mode Activation
Fig. Alphabet Recognition
Fig. Number Recognition ‘2’
Fig. Number Recognition ‘7’
Fig. Alphabet Recognition ‘G’
Fig. Writing Alphabet
Fig. Writing Number
Test Cases
S.No | Test Case                                   | Scenario     | Expected Output                  | Actual Output                                      | Result
1    | Splitting data into classes                 | Input taken  | Data splitted into classes       | Successfully splitted                              | Pass
2    | Load datasets                               | Dataset      | Dataset loaded                   | Successfully loaded                                | Pass
3    | Splitting dataset into training and testing | Dataset      | Splitted to training and testing | Successfully splitted to train data and test data  | Pass
4    | Model creation                              | CNN model    | Model created                    | Successfully model created                         | Pass
5    | Training set evaluation                     | Train data   | Training done                    | Successfully trained                               | Pass
6    | Validating test datasets                    | Test data    | Test dataset validated           | Successfully validated                             | Pass
7    | Getting accuracy                            | Test dataset | Accuracy in percentage           | Successfully got accuracy                          | Pass
8    | Load test set                               | Test data    | Detecting                        | Successfully detected                              | Pass
9    | Print result or save in CSV file            | Test results | Printing test results            | Successfully saved                                 | Pass
10   | Testing with unknown picture                | Test result  | Printing test result             | Unable to predict                                  | Fail
Conclusion
Convolutional Neural Networks (CNNs) have revolutionized the field of air writing
recognition, allowing machines to accurately interpret gestures made in the air with
remarkable precision. By training on large datasets of hand movements captured
through sensors or cameras, CNNs can effectively learn the intricate patterns and
dynamics of gestures, enabling robust recognition even in dynamic and diverse
environments. This technology holds immense promise in various domains, from
enhancing user experience in virtual reality environments to enabling hands-free
interaction with digital devices. Moreover, CNNs have demonstrated impressive
performance in real-time applications, making them well-suited for practical
deployment in scenarios requiring quick and accurate gesture interpretation.
Future Scope
Looking ahead, the future of air writing recognition using CNNs appears promising, with
ongoing research focused on advancing the accuracy, robustness, and efficiency of
gesture recognition systems. Continued innovation in deep learning algorithms, coupled
with advancements in sensor technologies, is expected to further enhance the capabilities
of CNN- based models. Moreover, the integration of CNNs with edge computing devices
and Internet of Things (IoT) platforms holds the potential to enable seamless and intuitive
interaction between humans and smart environments. However, challenges such as
ensuring privacy and security in gesture-based interactions, as well as addressing biases
in training data, remain areas of concern that require careful attention. Nonetheless, the
combination of CNNs and air writing recognition represents a transformative technology
with wide-ranging applications across industries, paving the way for enhanced human-computer interaction.
References
1. Graves, A., & Schmidhuber, J. (2009). Offline Handwriting Recognition with Multidimensional
Recurrent Neural Networks. In Advances in Neural Information Processing Systems (pp. 545-552).

2. Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and Transferring Mid-Level Image
Representations using Convolutional Neural Networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR) (pp. 1717-1724).

3. Choi, S., & Kim, J. (2017). Vision-Based Sign Language Recognition Using Convolutional Neural
Networks. Sensors, 17(7), 1623.

4. Pu, J., Xiong, X., & Zhang, H. (2018). A Real-Time Gesture Recognition System Based on Deep
Learning. IEEE Access, 6, 17885-17894.

5. Pham, D. T., & Lee, S. (2020). Vision-Based Hand Gesture Recognition Using Deep Learning: A
Review. Sensors, 20(2), 487.
