• Introduction
• Existing System
• Proposed System
• Modules
• Functional Requirements
• Non-functional Requirements
• Technical Architecture
• Block Diagram
• Flow Chart
• Computational Resources
• Algorithms
• Implementation
• Screenshots
• Test Cases
• Future Scope
• Conclusion
• References
Introduction
• The primary objective of this project is to develop a real-time system for accurately
recognizing alphanumeric characters in a three-dimensional space using hand or
finger gestures, commonly known as air writing.
• This system aims to provide a seamless and intuitive mode of interaction with
intelligent devices like smart TVs and robots.
• Leveraging computer-vision algorithms from OpenCV together with MediaPipe hand
tracking, the system will robustly track hand movements and extract writing
trajectories from a cost-effective web camera.
Existing System
• Existing HMM-based approaches require large amounts of labeled training data for
gestures or symbols. Collecting and annotating this data can be time-consuming and
expensive.
• HMMs typically consider only a limited context window, which might not capture the
full temporal context of continuous hand movements.
Proposed System
• This algorithm effectively addresses the "push-to-write" issue and eliminates the
• Experimental results demonstrate that this proposed approach not only achieves
• The system has successfully created a recognition system tailored for smart
• Due to the absence of public datasets, the experiment utilized two types of datasets
• The input is not limited to a particular set of characters, symbols or letters.
• The program scripts will be further extended with optimization techniques and
compared with the baseline results, i.e. those obtained with the default parameters,
which never change.
• The system can replicate any scribbles that are provided as input.
Modules
After careful analysis, the system has been identified to be presented with the
following modules.
1. Painter Module
2. Hand Gesture Module
3. Recognition Models
4. User Interface
5. Testing Module
1. Painter Module:
• This module will handle the functionality related to virtual painting.
Functions:
• detect_hands(): Detects hands in the video stream.
• track_hand_movements(): Tracks hand movements for painting.
• change_brush_color(color): Changes the brush color based on hand gestures.
• clear_canvas(): Clears the painting canvas.
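A minimal sketch of how these functions could be organised is shown below. The Painter class and its internals are illustrative assumptions; only the function names follow the list above, and the hand detector comes from the project's HandTrackingModule.

import cv2
import numpy as np
import HandTrackingModule as htm

class Painter:
    # Illustrative sketch of the Painter Module; not the project's exact code.
    def __init__(self, width=1280, height=720):
        self.canvas = np.zeros((height, width, 3), np.uint8)   # black drawing canvas
        self.detector = htm.handDetector(detectionCon=0.85)    # MediaPipe-based detector
        self.brush_color = (0, 0, 255)                         # default brush colour (BGR red)

    def detect_hands(self, frame):
        # Detect hands in the video stream and draw their landmarks on the frame.
        return self.detector.findHands(frame)

    def track_hand_movements(self, prev_point, curr_point):
        # Track hand movement by drawing a stroke segment between two fingertip positions.
        cv2.line(self.canvas, prev_point, curr_point, self.brush_color, 15)

    def change_brush_color(self, color):
        # Change the brush colour (BGR tuple) selected through a hand gesture.
        self.brush_color = color

    def clear_canvas(self):
        # Clear the painting canvas back to black.
        self.canvas[:] = 0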
2. Hand Gesture Module
• This module will handle the interpretation of hand gestures for various actions.
Functions:
• fingersUp(): Reports which fingers are currently raised (implemented in the
hand-tracking module), so that gestures can be mapped to actions such as drawing,
selecting a colour, or triggering recognition.
3. Recognition Models:
• This module includes the machine learning models for recognizing hand-drawn
characters.
• It loads and uses the models for alphabet and digit recognition (AlphaMODEL
and NumMODEL).
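A minimal sketch of how the two models might be loaded and queried is shown below. The model file names and the label maps are assumptions; the 28x28 grayscale input shape matches the preprocessing performed in virtualPainter.py.

import numpy as np
from tensorflow.keras.models import load_model

# Assumed file names; the actual paths depend on how the trained models are stored.
AlphaMODEL = load_model("models/alpha_model.h5")   # alphabet recogniser
NumMODEL = load_model("models/num_model.h5")       # digit recogniser

# Assumed label maps: class index -> character.
AlphaLABELS = {i: chr(ord('A') + i) for i in range(26)}
NumLABELS = {i: str(i) for i in range(10)}

def recognize(image, mode="alpha"):
    # image: 28x28 grayscale array scaled to [0, 1], as produced by the painter.
    batch = image.reshape(1, 28, 28, 1)
    if mode == "alpha":
        return AlphaLABELS[int(np.argmax(AlphaMODEL.predict(batch)))]
    return NumLABELS[int(np.argmax(NumMODEL.predict(batch)))]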
4. User Interface:
• This module sets up a Flask web application, registers the Virtual Painter
blueprint, and defines routes for the different pages.
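A minimal sketch of this Flask wiring is shown below. The application module and template name are assumptions; the VirtualPainter blueprint and its /feature route come from virtualPainter.py.

from flask import Flask, render_template
from virtualPainter import VirtualPainter   # blueprint defined in virtualPainter.py

app = Flask(__name__)
app.register_blueprint(VirtualPainter)      # exposes the /feature route

@app.route("/")
def home():
    # Landing page; the blueprint's /feature route launches the virtual painter.
    return render_template("index.html")

if __name__ == "__main__":
    app.run(debug=True)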
5. Testing Module
• This module will contain unit tests to verify the functionality of the Virtual
Painter application.
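A minimal sketch of one such unit test is shown below. It feeds the hand detector a hand-crafted landmark list (an assumption, since the real tests are not shown) and checks that fingersUp() reports all five fingers as raised.

import unittest
from HandTrackingModule import handDetector

class TestVirtualPainter(unittest.TestCase):
    def test_fingers_up_counts_raised_fingers(self):
        detector = handDetector()
        # Fake landmark list of 21 entries [id, x, y]; fingertips are given a
        # smaller y (higher on screen) than their middle joints, and the thumb
        # tip a smaller x than the joint next to it, so all five read as "up".
        detector.lmList = [[i, 100, 400] for i in range(21)]
        for tip in (8, 12, 16, 20):
            detector.lmList[tip] = [tip, 100, 100]
        detector.lmList[4] = [4, 50, 400]
        self.assertEqual(sum(detector.fingersUp()), 5)

if __name__ == "__main__":
    unittest.main()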
Software Requirements:
• Python 3 with OpenCV, MediaPipe, NumPy, TensorFlow/Keras, Flask, pygame and the
keyboard module (the libraries imported by the implementation)
Hardware Requirements:
• RAM: 4 GB
• Web camera
Algorithms
Convolutional Neural Network (CNN)
Components:
Input Layer: The initial layer of the CNN receives the raw input data, which is
typically an image represented as a grid of pixel values. The input layer
doesn't perform any computations; it simply passes the data to the
subsequent layers.
Convolutional Layers: These layers are the core building blocks of a CNN.
They consist of filters (also known as kernels) that convolve across the input
image by performing element-wise multiplication and summation. This
process detects various features such as edges, textures, or patterns at
different spatial locations. Multiple filters are used to capture different
features.
Activation Functions: After the convolution operation, an activation function
is applied element-wise to the output of the convolutional layer. Rectified
Linear Unit (ReLU) is a commonly used activation function that introduces
non-linearity, allowing the network to learn more complex relationships
within the data.
Pooling Layers: These layers follow the convolutional layers and reduce the
spatial dimensions of the data while retaining the most important
information. Pooling operations like max pooling or average pooling
aggregate information within specific regions of the input, reducing
computational complexity and helping to make the network more robust to
variations in the input.
Fully Connected Layers (Dense Layers): After several convolutional and
pooling layers, the flattened output is fed into fully connected layers. These
layers are similar to those in traditional neural networks and are responsible
for making predictions based on the high-level features learned by the
preceding layers. They can perform tasks like classification or regression.
Output Layer: The final layer of the CNN produces the network's output.
Depending on the task (e.g., image classification, object detection), the
output layer might use different activation functions (e.g., softmax for
classification) and output formats (e.g., a probability distribution over
classes).
Loss Function: It's a crucial component that measures the difference
between predicted outputs and actual labels. The goal during training is to
minimize this loss by adjusting the network's parameters.
Optimizer: CNNs are trained using optimization algorithms like Stochastic
Gradient Descent (SGD) or its variants. The optimizer updates the network's
parameters based on the calculated loss to improve the model's performance.
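A minimal Keras sketch wiring these components together is shown below. The layer sizes and the 26-class alphabet output are illustrative assumptions rather than the project's exact model configuration.

from tensorflow.keras import layers, models

# Illustrative CNN for 28x28 grayscale character images.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # input layer: raw pixel grid
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution + ReLU activation
    layers.MaxPooling2D((2, 2)),                   # pooling reduces spatial size
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),          # fully connected layer
    layers.Dense(26, activation="softmax"),        # output: probability per class
])

# Loss function and optimizer as described above.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])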
Implementation
HandTrackingModule.py
import cv2
import mediapipe as mp

class handDetector():
    def __init__(self, mode=False, maxHands=2, modelComplexity=1,
                 detectionCon=0.5, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.modelComplex = modelComplexity
        self.detectionCon = detectionCon
        self.trackCon = trackCon
        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(self.mode, self.maxHands, self.modelComplex,
                                        self.detectionCon, self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]   # landmark ids of the five fingertips

    def fingersUp(self):
        # self.lmList (filled by findPosition) holds [id, x, y] for each landmark.
        # Return five 1/0 flags (thumb to little finger): 1 means the finger is raised.
        fingers = []
        # Thumb: compare the x-coordinate of the tip with the joint next to it.
        if self.lmList[self.tipIds[0]][1] < self.lmList[self.tipIds[0] - 1][1]:
            fingers.append(1)
        else:
            fingers.append(0)
        # Remaining fingers: the tip must be above (smaller y than) the middle joint.
        for id in range(1, 5):
            if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                fingers.append(1)
            else:
                fingers.append(0)
        return fingers
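The listing above omits the findHands() and findPosition() methods that virtualPainter.py calls on the detector. A minimal sketch of the two methods, following standard MediaPipe Hands usage and assumed to live inside the same handDetector class, is shown below.

    def findHands(self, img, draw=True):
        # Run MediaPipe hand detection on a BGR frame and optionally draw the landmarks.
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, handNo=0, draw=True):
        # Return a list of [id, x, y] pixel coordinates for each landmark of one hand.
        self.lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            h, w, _ = img.shape
            for lmId, lm in enumerate(myHand.landmark):
                cx, cy = int(lm.x * w), int(lm.y * h)
                self.lmList.append([lmId, cx, cy])
                if draw:
                    cv2.circle(img, (cx, cy), 5, (255, 0, 255), cv2.FILLED)
        return self.lmList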
virtualPainter.py
import cv2
import numpy as np
import os
import HandTrackingModule as htm
from flask import Blueprint, render_template
from tensorflow.keras.models import load_model
import keyboard
import pygame
import time
VirtualPainter = Blueprint("HandTrackingModule", __name__,
                           static_folder="static", template_folder="templates")
@VirtualPainter.route("/feature")
def strt():
    # NOTE: this listing omits the setup that the loop below relies on
    # (the webcam capture `cap`, the htm.handDetector instance `detector`,
    # the loaded AlphaMODEL/NumMODEL with their label maps, the pygame
    # DISPLAYSURF, the toolbar overlays in `overlayList`, the canvas
    # `imgCanvas` and the brush/eraser thicknesses).
    ############## Color Attributes (BGR) ###############
    WHITE = (255, 255, 255)
    BLACK = (0, 0, 0)
    RED = (0, 0, 255)
    YELLOW = (0, 255, 255)
    GREEN = (0, 255, 0)
    BACKGROUND = (255, 255, 255)
    FORGROUND = (0, 255, 0)
    BORDER = (0, 255, 0)
    lastdrawColor = (0, 0, 1)
    drawColor = (0, 0, 255)
    BOUNDRYINC = 5
    while True:
        SUCCESS, img = cap.read()
        img = cv2.flip(img, 1)
        img = detector.findHands(img)
        lmList = detector.findPosition(img, draw=False)

        # On-screen help for the recognition modes.
        cv2.putText(img, "Press A for Alphabet Recognition Mode", (0, 145), 3, 0.5,
                    (255, 255, 0), 1, cv2.LINE_AA)
        cv2.putText(img, "Press N for Digit Recognition Mode", (0, 162), 3, 0.5,
                    (255, 255, 0), 1, cv2.LINE_AA)
        cv2.putText(img, "Press O to Turn Off Recognition Mode", (0, 179), 3, 0.5,
                    (255, 255, 0), 1, cv2.LINE_AA)
        cv2.putText(img, f'{"RECOGNITION IS "}{modeValue}',
                    (0, 196), 3, 0.5, modeColor, 1, cv2.LINE_AA)
        # Keyboard shortcuts switch the recognition mode.
        if keyboard.is_pressed('a'):
            if PREDICT != "alpha":
                PREDICT = "alpha"
                modeValue, modeColor = "ALPHABETS", GREEN
        if keyboard.is_pressed('n'):
            if PREDICT != "num":
                PREDICT = "num"
                modeValue, modeColor = "NUMBER", YELLOW
        if keyboard.is_pressed('o'):
            if PREDICT != "off":
                PREDICT = "off"
                modeValue, modeColor = "OFF", RED
                # Reset the stroke state and debounce the key press
                # (the exact placement of this block is inferred from the listing).
                xp, yp = 0, 0
                label = ""
                rect_min_x, rect_max_x = 0, 0
                rect_min_y, rect_max_y = 0, 0
                number_xcord = []
                number_ycord = []
                time.sleep(0.5)
        if len(lmList) > 0:
            x1, y1 = lmList[8][1:]    # index fingertip
            x2, y2 = lmList[12][1:]   # middle fingertip
            fingers = detector.fingersUp()
            # print(fingers)

            # (The gesture check and the bounding-box computation that feed the
            #  recognition step are omitted from this listing.)
            #add
            number_xcord = sorted(number_xcord)
            number_ycord = sorted(number_ycord)
            img_arr = np.array(pygame.PixelArray(DISPLAYSURF))[
                rect_min_x:rect_max_x, rect_min_y:rect_max_y].T.astype(np.float32)
            cv2.rectangle(imgCanvas, (rect_min_x, rect_min_y),
                          (rect_max_x, rect_max_y), BORDER, 3)
            image = cv2.resize(img_arr, (28, 28))
            # cv2.imshow("Tmp",image)
            image = np.pad(image, (10, 10), 'constant', constant_values=0)
            image = cv2.resize(image, (28, 28)) / 255
            # cv2.imshow("Tmp",image)
            if PREDICT == "alpha":
                label = str(AlphaLABELS[np.argmax(
                    AlphaMODEL.predict(image.reshape(1, 28, 28, 1)))])
            if PREDICT == "num":
                label = str(NumLABELS[np.argmax(
                    NumMODEL.predict(image.reshape(1, 28, 28, 1)))])
            pygame.draw.rect(DISPLAYSURF, BLACK, (0, 0, width, height))
            cv2.rectangle(imgCanvas, (rect_min_x + 50, rect_min_y - 20),
                          (rect_min_x, rect_min_y), BACKGROUND, -1)
            cv2.putText(imgCanvas, label, (rect_min_x, rect_min_y - 5),
                        3, 0.5, FORGROUND, 1, cv2.LINE_AA)
            # (The "if" matching this else is omitted from the listing.)
            else:
                number_xcord = []
                number_ycord = []
                xp, yp = 0, 0

            if y1 < 125:
                # Fingertip is inside the toolbar strip: select a tool or colour.
                lastdrawColor = drawColor
                if 0 < x1 < 200:
                    imgCanvas = np.zeros((height, width, 3), np.uint8)   # clear canvas
                elif 210 < x1 < 320:
                    header = overlayList[0]
                    drawColor = (0, 0, 255)      # red
                elif 370 < x1 < 470:
                    header = overlayList[1]
                    drawColor = (0, 255, 255)    # yellow
                elif 520 < x1 < 630:
                    header = overlayList[2]
                    drawColor = (0, 255, 0)      # green
                elif 680 < x1 < 780:
                    header = overlayList[3]
                    drawColor = (255, 0, 0)      # blue
                elif 890 < x1 < 1100:
                    header = overlayList[4]
                    drawColor = (0, 0, 0)        # eraser
                elif 1160 < x1 < 1250:
                    # Quit button: release the camera and return to the home page.
                    cap.release()
                    cv2.destroyAllWindows()
                    return render_template("index.html")
                    quit()
            #add
            number_xcord.append(x1)
            number_ycord.append(y1)
            #addEnd
            cv2.circle(img, (x1, y1 - 15), 15, drawColor, cv2.FILLED)
            if xp == 0 and yp == 0:
                xp, yp = x1, y1
            if drawColor == (0, 0, 0):
                # Eraser: use the thicker eraser stroke.
                cv2.line(img, (xp, yp), (x1, y1), drawColor, eraserThickness)
                cv2.line(imgCanvas, (xp, yp), (x1, y1), drawColor, eraserThickness)
            else:
                cv2.line(img, (xp, yp), (x1, y1), drawColor, brushThickness)
                cv2.line(imgCanvas, (xp, yp), (x1, y1), drawColor, brushThickness)
                # Mirror the stroke onto the pygame surface used for recognition.
                pygame.draw.line(DISPLAYSURF, WHITE, (xp, yp), (x1, y1), brushThickness)
            xp, yp = x1, y1
        else:
            xp, yp = 0, 0

        # Merge the drawing canvas onto the camera frame.
        imgGray = cv2.cvtColor(imgCanvas, cv2.COLOR_BGR2GRAY)
        _, imgInv = cv2.threshold(imgGray, 50, 255, cv2.THRESH_BINARY_INV)
        imgInv = cv2.cvtColor(imgInv, cv2.COLOR_GRAY2BGR)
        img = cv2.bitwise_and(img, imgInv)
        img = cv2.bitwise_or(img, imgCanvas)
        img[0:132, 0:1280] = header    # toolbar overlay strip
        pygame.display.update()
        # cv2.imshow("Paint",imgCanvas)
        cv2.imshow("Image", img)
        cv2.waitKey(1)

strt()
Screenshots