Batch 13
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
Submitted by
1. S. Architha (H.T.No: 19N01A0597)
2. Khuteja Nazlee (H.T.No: 19N01A0563)
3. S. Pooja (H.T.No: 19N01A0598)
4. P. Laxmi (H.T.No: 19N01A0580)
5. S. Sai Surya (H.T.No: 19N01A0599)
DECEMBER 2022
SREE CHAITANYA COLLEGE OF ENGINEERING
(Affiliated to JNTUH, HYDERABAD)
THIMMAPUR, KARIMNAGAR, TELANGANA-505 527
Department of Computer Science and Engineering
CERTIFICATE
This is to certify that the mini project report entitled “Air Canvas using Python-OpenCV”
is being submitted by S. ARCHITHA (19N01A0597), KHUTEJA NAZLEE (19N01A0563),
S. POOJA (19N01A0598), P. LAXMI (19N01A0580) and S. SAI SURYA (19N01A0599) in partial
fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Computer Science and Engineering to the Jawaharlal Nehru Technological University,
Hyderabad, during the academic year 2022-2023, and that it is a bonafide work carried out by
them under my guidance and supervision.
The results embodied in this report have not been submitted to any other University or
institution for the award of any degree or diploma.
EXTERNAL EXAMINER
SREE CHAITANYA COLLEGE OF ENGINEERING
(Affiliated to JNTUH, HYDERABAD)
THIMMAPUR, KARIMNAGAR, TELANGANA-505 527
Department of Computer Science and Engineering
DECLARATION
We hereby declare that the work presented in this project report is our own. It contains no
material previously published or written by another person, nor material which has been
accepted for the award of any other degree or diploma of the university or any other institute
of higher learning, except where due acknowledgement has been made in the text.
1.S.Architha(19N01A0597)
2.Khuteja Nazlee(19N01A0563)
3.S.Pooja(19N01A0598)
4.P.Laxmi(19N01A0580)
5.S.Sai Surya(19N01A0599)
SREE CHAITANYA COLLEGE OF ENGINEERING
(Affiliated to JNTUH, HYDERABAD)
THIMMAPUR, KARIMNAGAR, TELANGANA-505 527
Department of Computer Science and Engineering
ACKNOWLEDGEMENTS
The satisfaction that accompanies the successful completion of any task would be
incomplete without the mention of the people who made it possible and whose constant
guidance and encouragement crowned all the efforts with success.
S.Architha
ABSTRACT
Writing in air has been one of the most fascinating and challenging research areas in the field
of image processing and pattern recognition in recent years. It contributes immensely to the
advancement of automation and can improve the interface between man and machine in
numerous applications. Several research works have focused on new techniques and methods
that would reduce the processing time while providing higher recognition accuracy. Object
tracking is considered an important task within the field of Computer Vision. The invention
of faster computers, the availability of inexpensive and good quality video cameras, and the
demand for automated video analysis have made object tracking techniques popular.
Generally, the video analysis procedure has three major steps: first, detecting the object;
second, tracking its movement from frame to frame; and lastly, analysing the behaviour of
that object. For object tracking, four different issues are taken into account: the selection of a
suitable object representation, feature selection for tracking, object detection and object
tracking. In the real world, object tracking algorithms are a primary part of different
applications such as automatic surveillance, video indexing and vehicle navigation. The
project takes advantage of this gap and focuses on developing a motion-to-text converter that
can potentially serve as software for intelligent wearable devices for writing in the air. This
project is a reporter of occasional gestures: it uses computer vision to trace the path of the
finger. The generated text can also be used for various purposes, such as sending messages,
emails, etc. It will be a powerful means of communication for the deaf, and an effective
communication method that reduces mobile and laptop usage by eliminating the need to
write.
CHAPTER 1
INTRODUCTION
Air Canvas, a project built in Python, is a computer vision project using MediaPipe, which is
a cross-platform framework for building multimodal applied machine learning pipelines.
Computer Vision is a field of Artificial Intelligence (AI) that enables computers and systems
to derive meaningful information from digital images, videos and other visual inputs, and to
take actions or make recommendations based on that information.
Using the tip of the pointing finger, one can interact with the canvas. The aim of building this
project is to make virtual classes effective and easy for teachers who find it difficult to draw
or write with a mouse and do not have a touchscreen laptop or any other pen-input device.
They can simply draw on the board using the webcam and the tip of their finger; no
additional hardware is required.
An air canvas is a type of interactive display that allows users to draw or paint in mid-air,
using gestures or other input devices. While it is possible to use objects, such as pens or
sticks, to create an air canvas, this would likely require some additional technology, such as a
depth-sensing camera or other input device that can detect the movement of objects in 3D
space.
This technology could then be used to track the movement of the objects and convert them
into digital brush strokes or other drawing tools. However, creating an air canvas using
objects would likely be a complex and challenging task, and there are likely to be many
technical challenges involved. It is also worth noting that using objects to create an air canvas
may not be as intuitive or user-friendly as using gestures or other input devices designed
specifically for this purpose. An air canvas is a type of interactive display that allows users to
draw or paint in mid-air, using gestures or other input devices.
Air Canvas is a software tool that allows users to create and manipulate digital images using
hand gestures. It uses a combination of computer vision and machine learning algorithms to
track hand movements and interpret them as drawing commands. Air Canvas allows users to
draw, paint, and erase with their hands, using a webcam or other video input device as the
input source. It can be used for a variety of creative and artistic purposes, as well as for
educational and research applications.
1.1 OVERVIEW
An air canvas is a virtual whiteboard that allows users to draw or write in the air using
gestures. It can be implemented using the OpenCV (Open Source Computer Vision) library,
which is a popular open-source library for computer vision tasks. To implement an air canvas
using OpenCV, you will need a camera to capture the gestures made by the user. You will
also need some way of detecting and tracking the gestures, such as using color tracking or
motion tracking algorithms.
Once the gestures have been detected and tracked, you can use OpenCV to draw the
corresponding lines or text on a virtual canvas. This can be displayed on a screen or projected
onto a surface, allowing the user to see their drawings in real-time. Overall, implementing an
air canvas using OpenCV requires a combination of computer vision techniques and
interactive graphics. It can be a challenging project, but can also be very rewarding as it
allows users to interact with technology in a natural and intuitive way.
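As a rough illustration of the colour-tracking approach described above, a minimal OpenCV skeleton is given below. This is not the project's actual code: the webcam index, the HSV range for the marker, the window names and the use of OpenCV 4.x return values are all placeholder assumptions. The detailed project code is walked through later in this report.

import cv2
import numpy as np

cap = cv2.VideoCapture(0)                               # default webcam
canvas = np.full((480, 640, 3), 255, dtype=np.uint8)    # white virtual canvas

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)                          # mirror view for the user
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Placeholder HSV range for a blue-capped fingertip
    mask = cv2.inRange(hsv, np.array([100, 150, 50]), np.array([140, 255, 255]))
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if cnts:
        (x, y), r = cv2.minEnclosingCircle(max(cnts, key=cv2.contourArea))
        cv2.circle(canvas, (int(x), int(y)), 3, (255, 0, 0), -1)   # leave a dot at the tip position
    cv2.imshow("Tracking", frame)
    cv2.imshow("Canvas", canvas)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()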
1.2 MOTIVATION
The underlying inspiration was the need for a dust-free classroom for students to study in.
There are many alternatives, such as touch screens, but what about the schools that cannot
afford to buy such huge screens and teach on them like a TV? So the question became
whether a finger could be tracked, but at a basic level, without deep learning. Consequently, it
was OpenCV that came to the rescue for this computer vision project.
The digital pen consists of a tri-axial accelerometer, a microcontroller, and an RF wireless
transmission module for sensing and collecting the accelerations of handwriting and motion
trajectories. The embedded system first extracts time- and frequency-domain features from
the acceleration signals and then transmits them using the RF transmitter. At the receiving
end, the RF signals are picked up by an RF receiver and given to a microcontroller. The
controller processes the data, and the results are finally shown on a graphical LCD.
1.Fingertip detection:
The existing system only works with your fingers, and there are no highlighters, paints, or
similar tools. Identifying and characterizing an object such as a finger from an RGB image
without a depth sensor is a great challenge.
The system uses a single RGB camera to write from above. Since depth sensing is not
possible, up and down pen movements cannot be followed. Therefore, the fingertip's entire
trajectory is traced, and the resulting image would be absurd and not recognized by the
model.
This computer vision experiment uses an Air canvas, which allows you to draw on a screen
by waving a finger equipped with a colorful tip or a basic colored cap. These computer vision
projects would not have been possible without OpenCV's help. There are no keypads,
styluses, pens, or gloves needed for character input in the suggested technique.
In this proposed framework, we are going to use a webcam and a display unit (monitor
screen). Here, a pen or hand is used to draw in front of the camera; the system captures those
movements and the drawing is shown on the display unit. Our system is suited to decoding
time-series acceleration signals into meaningful feature vectors. Users can use the pen to
compose digits or make hand motions, which can then be shown on the display unit.
Modules of Proposed System
1.Color Tracking
Understanding the HSV (Hue, Saturation, Value) colour space for colour tracking, and
tracking the small coloured object at the fingertip. The incoming image from the webcam is
converted to the HSV colour space for recognizing the coloured object at the tip of the finger.
2. Trackbars
Once the trackbars are set up, we get their real-time values and form a range. This range is a
NumPy structure which is passed to the function cv2.inRange(). This function returns the
mask of the coloured object: a black-and-white image with white pixels at the position of the
desired colour.
3. Contour Detection
Detecting the position of the coloured object at the fingertip and drawing a circle over it.
Some morphological operations are performed on the mask to free it from impurities and to
detect the contour easily. That is contour detection.
4. Frame Processing
Tracking the fingertip and drawing points at each of its positions to create the air-canvas
effect. That is frame processing.
5. Algorithmic Optimization.
Making the code efficient so that the program runs without a hitch. A short sketch of the
colour-tracking pipeline described in modules 1-3 is given below.
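The following compact sketch shows how modules 1-3 fit together. It is an illustrative outline rather than the project's exact code: the lower HSV bound and the set of trackbars shown here are assumptions.

import cv2
import numpy as np

def setValues(x):
    pass   # trackbar callback; nothing needs to happen here

cv2.namedWindow("Color detectors")
cv2.createTrackbar("Upper Hue", "Color detectors", 153, 180, setValues)
# ...similar trackbars would be created for the remaining upper/lower HSV bounds

cap = cv2.VideoCapture(0)
ret, frame = cap.read()          # Module 1: grab one frame from the webcam
cap.release()

if ret:
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)                    # convert to HSV
    u_hue = cv2.getTrackbarPos("Upper Hue", "Color detectors")      # Module 2: read trackbar value
    mask = cv2.inRange(hsv, np.array([64, 72, 49]), np.array([u_hue, 255, 255]))
    kernel = np.ones((5, 5), np.uint8)                              # Module 3: clean the mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)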
1.5 OBJECTIVE
The main objective of an air canvas using OpenCV is to allow users to draw or write in the
air using gestures, and have their drawings or writing displayed in real-time on a screen or
projected surface. This can be achieved by using a camera to capture the gestures made by
the user, and then using computer vision algorithms to detect and track the gestures.
1. Providing an intuitive and natural way for users to interact with technology.
5. Serving as a platform for further development and experimentation with computer vision
algorithms and interactive graphics.
CHAPTER – 2
LITERATURE SURVEY
1. Hardware Requirement
Dual Core CPU
Minimum 1 GB of RAM
Windows 7 or greater
Web Camera
2. Software Requirement
Python
Numpy Module
OpenCV module
Mediapipe
1.HARDWARE REQUIREMENTS:
Dual Core CPU: A dual core CPU is a type of central processing unit (CPU) that has two
independent cores, or processing units, on the same chip. It is capable of processing two
streams of instructions simultaneously, which can increase the performance of certain types
of tasks.
Minimum 1 GB of RAM: RAM (random access memory) is a type of computer memory that
is used to store data that is actively being used or processed by the system. It is volatile
memory, meaning it is wiped clean when the power is turned off.
The amount of RAM that a computer has can affect its performance. In general, having more
RAM can allow a computer to run more programs concurrently and improve the speed at
which it can complete tasks.
A minimum of 1 GB (gigabyte) of RAM is often recommended for basic tasks such as web
browsing, word processing, and email. However, the actual amount of RAM that is required
for a particular task or application can vary depending on the specific needs of the software
and the operating system. For example, more resource-intensive tasks such as video editing
or gaming may require more RAM to run smoothly.
It's worth noting that the minimum requirements for RAM can vary depending on the specific
operating system and software being used. For example, the minimum RAM requirement for
running the latest version of Microsoft Windows is 2 GB, while the minimum requirement
for running macOS is 4 GB. It is always a good idea to check the system requirements for the
specific software or operating system that you are using to ensure that your computer has
enough RAM to run it effectively.
Web Camera: A webcam, also known as a web camera, is a video camera that is used to
capture images and video for transmission over the internet. Webcams are typically small and
portable, making them convenient for use with computers and other devices.
Webcams can be used for a variety of purposes, such as video conferencing, live streaming,
and recording video for social media or other online platforms. They are often integrated into
laptops and desktop computers, but can also be purchased as standalone devices that can be
connected to a computer or other device via USB.
Most webcams have a built-in microphone, which allows them to capture audio as well as
video. Some webcams also have additional features such as a built-in LED light or the ability
to pan, tilt, and zoom to capture a wider field of view.
Webcams can vary in terms of their video quality, with higher-end models typically offering
higher resolution and a more detailed image. They can also vary in terms of their frame rate,
which is the number of frames captured per second. A higher frame rate can result in a
smoother, more realistic video, but may also require more bandwidth and processing power.
It's worth noting that webcams can be vulnerable to security risks, such as the potential for
unauthorized access or surveillance. If you are concerned about the security of your webcam,
you may want to consider using a physical cover to block the camera when it is not in use, or
disabling the webcam in your device's settings.
2.SOFTWARE REQUIREMENTS:
Python:
Python is a general-purpose interpreted, interactive, object-oriented, and high-level
programming language. It was created by Guido van Rossum during 1985- 1990. Like Perl,
Python source code is also available under the GNU General Public License (GPL).
Why to Learn Python?
Python is a high-level, interpreted, interactive and object-oriented scripting language.
Python is designed to be highly readable. It uses English keywords frequently whereas other
languages use punctuation, and it has fewer syntactical constructions than other languages.
Python is a must for students and working professionals who want to become great software
engineers, especially when they are working in the web development domain. As mentioned
before, Python is one of the most widely used languages over the web. Some of its key
advantages are listed here:
Easy-to-learn − Python has few keywords, simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.
Easy-to-read − Python code is more clearly defined and visible to the eyes.
Easy-to-maintain − Python's source code is fairly easy-to-maintain.
A broad standard library − Python's bulk of the library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
Extendable − You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more efficient.
Databases − Python provides interfaces to all major commercial databases.
GUI Programming − Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
Scalable − Python provides a better structure and support for large programs than
shell scripting.
Numpy
NumPy is the fundamental package for scientific computing in Python. It is a Python library
that provides a multidimensional array object, various derived objects (such as masked arrays
and matrices), and an assortment of routines for fast operations on arrays, including
mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier
transforms, basic linear algebra, basic statistical operations, random simulation and much
more.
At the core of the NumPy package, is the ndarray object. This encapsulates n-dimensional
arrays of homogeneous data types, with many operations being performed in compiled code
for performance. There are several important differences between NumPy arrays and the
standard Python sequences:
NumPy arrays have a fixed size at creation, unlike Python lists (which can grow
dynamically). Changing the size of an ndarray will create a new array and delete the
original.
The elements in a NumPy array are all required to be of the same data type, and thus
will be the same size in memory. The exception: one can have arrays of (Python,
including NumPy) objects, thereby allowing for arrays of different sized elements.
NumPy arrays facilitate advanced mathematical and other types of operations on large
numbers of data. Typically, such operations are executed more efficiently and with
less code than is possible using Python’s built-in sequences.
A growing plethora of scientific and mathematical Python-based packages are using
NumPy arrays; though these typically support Python-sequence input, they convert
such input to NumPy arrays prior to processing, and they often output NumPy arrays.
In other words, in order to efficiently use much (perhaps even most) of today’s
scientific/mathematical Python-based software, just knowing how to use Python’s
built-in sequence types is insufficient - one also needs to know how to use NumPy
arrays.
The points about sequence size and speed are particularly important in scientific computing.
As a simple example, consider the case of multiplying each element in a 1-D sequence with
the corresponding element in another sequence of the same length.
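For instance, a minimal comparison of the plain-Python and NumPy versions of that element-wise multiplication might look like this:

import numpy as np

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Pure-Python element-wise multiplication needs an explicit loop
c_list = [a[i] * b[i] for i in range(len(a))]

# With NumPy arrays the same operation is a single vectorised expression
c_array = np.array(a) * np.array(b)

print(c_list)    # [10, 40, 90, 160]
print(c_array)   # [ 10  40  90 160]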
OpenCV
OpenCV is a huge open-source library for computer vision, machine learning, and image
processing. OpenCV supports a wide variety of programming languages like
Python, C++, Java, etc. It can process images and videos to identify objects, faces, or even
the handwriting of a human. When it is integrated with various libraries, such as
NumPy, which is a highly optimized library for numerical operations, the number of
weapons in your arsenal increases, i.e., whatever operations one can do in NumPy can be
combined with OpenCV. This OpenCV tutorial will help you learn image processing from
basics to advanced topics, like operations on images and videos, using a huge set of
OpenCV programs and projects.
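To illustrate the point about combining NumPy and OpenCV, a small illustrative snippet is shown below; it assumes a webcam is available at index 0 and simply treats the captured frame as an ordinary NumPy array.

import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()

if ok:
    # An OpenCV frame is just a NumPy array of shape (height, width, 3)
    print(type(frame), frame.shape, frame.dtype)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)                              # OpenCV operation
    brighter = np.clip(frame.astype(np.int16) + 40, 0, 255).astype(np.uint8)    # plain NumPy arithmetic
    cv2.imwrite("frame_gray.jpg", gray)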
MediaPipe
It is a cross-platform framework for building multimodal applied machine learning
pipelines
MediaPipe is a framework for building multimodal (e.g. video, audio, any time-series data),
cross-platform (i.e. Android, iOS, web, edge devices) applied ML pipelines. With MediaPipe,
a perception pipeline can be built as a graph of modular components, including, for instance,
inference models (e.g., TensorFlow, TFLite) and media processing functions.
Cutting edge ML models
Face Detection
Multi-hand Tracking
Hair Segmentation
Object Detection and Tracking
Objectron: 3D Object Detection and Tracking
AutoFlip: Automatic video cropping pipeline
Cross Platform ML solutions
Build once, deploy anywhere. Works optimally across mobile (iOS, Android), desktop server
and the Web
Ondevice ML Acceleration
Performance optimized end-to-end ondevice inference with ML acceleration for mobile GPU
& EdgeTPU compute
How Google uses MediaPipe
MediaPipe is used by many internal Google products and teams including: Nest, Gmail, Lens,
Maps, Android Auto, Photos, Google Home, and YouTube.
MediaPipe for Hand
MediaPipe Hands is a high-fidelity hand and finger tracking solution. It employs machine
learning (ML) to infer 21 3D landmarks of a hand from just a single frame. Whereas current
state-of-the-art approaches rely primarily on powerful desktop environments for inference,
our method achieves real-time performance on a mobile phone, and even scales to multiple
hands. We hope that providing this hand perception functionality to the wider research and
development community will result in an emergence of creative use cases, stimulating new
applications and new research avenues.
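The code walked through later in this report uses colour tracking rather than MediaPipe, but for reference, a minimal MediaPipe Hands sketch that tracks the index fingertip (landmark 8) from a webcam could look like the following; the confidence threshold and key handling are assumptions.

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.flip(frame, 1)
        # MediaPipe expects RGB input
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                h, w, _ = frame.shape
                tip = hand.landmark[8]     # index fingertip, normalised coordinates
                cv2.circle(frame, (int(tip.x * w), int(tip.y * h)), 8, (0, 255, 0), -1)
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("MediaPipe Hands", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()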
CHAPTER-5
Design and implementation
5.1 Architecture of the proposed system
5.1.1 Architectures
5.1.2 Module description
1. The input module captures the user's hand gestures or movements and passes them to
the gesture recognition module.
2. The gesture recognition module interprets the input and determines the appropriate
action to take, such as moving the brush or changing the brush size. It then sends this
information to the drawing module.
3. The drawing module uses the input from the gesture recognition module to update the
canvas image, adding lines or strokes as needed.
4. The output module receives the updated canvas image from the drawing module and
displays it to the user on the screen.
5. The user interface module receives input from the user, such as button clicks or
selections, and passes this input to the appropriate module for processing. For
example, if the user selects a different brush color, the user interface module would
pass this information to the drawing module, which would update the brush color
accordingly.
6. The communication module may also be involved in the process, for example if the
user wants to save their drawing or share it with others. In this case, the
communication module would handle the transfer of data to and from external devices
or systems.
5.2 ALGORITHM
1. Start reading the frames and convert the captured frames to HSV color space (easy for
color detection).
2. Prepare the canvas frame and put the respective ink buttons on it.
3. Adjust the trackbar values for finding the mask of the colored marker.
5. Detect the contours, find the center coordinates of the largest contour and keep storing them in
the array for successive frames (arrays for drawing points on canvas).
6. Finally, draw the points stored in the arrays on the frames and the canvas.
1. Writing Mode - In this state, the system will trace the fingertip coordinates and store
them.
2. Colour Mode - The user can change the colour of the text among the various available
colours.
3. Backspace - If the user goes wrong, we need a gesture to add a quick backspace.
The application relies on several kinds of algorithms:
1. Motion tracking: This algorithm is responsible for tracking the user's hand
movements in real-time. It may use techniques such as frame differencing,
optical flow, or feature tracking to identify the location and movement of the
hand.
2. Skeleton tracking: This algorithm is responsible for detecting the bones and joints in
the user's hand and arm, and creating a skeleton model of them. This allows the
application to more accurately interpret the user's hand gestures.
3. Gesture recognition: This algorithm is responsible for interpreting the user's hand
gestures and determining the appropriate action to take, such as moving the brush or
changing the brush size. It may use techniques such as hidden Markov models,
decision trees, or machine learning algorithms to classify the gestures.
4. Drawing: This algorithm is responsible for rendering the lines or strokes on the screen
as the user moves their hand. It may use techniques such as image manipulation or
graphics rendering to create the final image.
5. Machine learning: Depending on the complexity of the gestures that the application
needs to recognize, it may also use machine learning algorithms to train a model to
classify the gestures. This may involve using a dataset of labeled gestures to train the
model, and then using the trained model to classify new gestures in real-time.
5.3 System design
Testing is the process of exercising software with the intent of finding errors and ultimately
correcting them. The following testing techniques have been used to make this project free of
errors.
Content Review
The whole content of the project has been reviewed thoroughly to uncover typographical
errors, grammatical error and ambiguous sentences.
Navigation Errors
Different users were allowed to navigate through the project to uncover the navigation
errors. The views of the user regarding the navigation flexibility and user friendliness were
taken into account and implemented in the project.
Unit Testing
Focuses on individual software units, groups of related units.
Unit – smallest testable piece of software.
A unit can be compiled /assembled / linked/loaded; and put under a test harness.
Unit testing is done to show that the unit does not satisfy the application and/or that its
implemented software does not match the intended design structure.
Integration Testing
Focuses on combining units to evaluate the interaction among them
Integration is the process of aggregating components to create larger components.
Integration testing is done to show that, even though the components were individually
satisfactory, their combination may be incorrect or inconsistent.
System testing
Focuses on a complete integrated system to evaluate compliance with specified requirements
(test characteristics that are only present when entire system is run)
A system is a big component.
System testing is aimed at revealing bugs that cannot be attributed to individual components
as such, but to inconsistencies between components or to their planned interactions.
Concern: issues, behaviors that can only be exposed by testing the entire integrated system
(e.g., performance, security, recovery). Each form encapsulates controls (labels, text boxes,
grids, etc.); hence, in the case of a V.B. project, forms are the basic units, and each form is
tested thoroughly in terms of calculation, display, etc.
Regression Testing
Each time a new form is added to the project the whole project is tested thoroughly to rectify
any side effects. That might have occurred due to the addition of the new form. Thus
regression testing has been performed.
White-Box testing
White-box testing (also known as clear box testing, glass box testing, transparent box testing
and structural testing) tests internal structures or workings of a program, as opposed to the
functionality exposed to the end-user. In white-box testing an internal perspective of the
system, as well as programming skills, are used to design test cases. The tester chooses inputs
to exercise paths through the code and determine the appropriate outputs.
Black-box testing
Black-box testing treats the software as a “black box”, examining functionality without any
knowledge of internal implementation. The tester is only aware of what the software is
supposed to do, not how it does it. Black-box testing methods include: equivalence
partitioning, boundary value analysis, all-pairs testing, state transition tables, decision table
testing, fuzz testing, model-based testing, use case testing, exploratory testing and
specification-based testing. Specification-based testing aims to test the functionality of
software according to the applicable requirements. This level of testing usually requires
thorough test cases to be provided to the tester, who then can simply verify that for a given
input, the output value (or behavior), either “is” or “is not” the same as the expected value
specified in the test case. Test cases are built around specifications and requirements, i.e.,
what the application is supposed to do. It uses external descriptions of the software, including
specifications, requirements, and designs to derive test cases. These tests can be functional or
non-functional, though usually functional.
Alpha Testing
Alpha testing is simulated or actual operational testing by potential users/customers or an
independent test team at the developers’ site. Alpha testing is often employed for off-the-
shelf software as a form of internal acceptance testing, before the software goes to beta
testing.
Beta Testing
Beta testing comes after alpha testing and can be considered a form of external user
acceptance testing. Versions of the software, known as beta versions, are released to a limited
audience outside of the programming team. The software is released to groups of people so
that further testing can ensure the product has few faults or bugs. Sometimes, beta versions
are made available to the open public to increase the feedback field to a maximal number of
future users
import numpy as np
import cv2
from collections import deque
cv2: The OpenCV (Open Source Computer Vision) library. It provides a wide
range of functions and tools for computer vision tasks, including image and
video processing, object detection, and machine learning.
deque: A double-ended queue implementation from the collections module. It
allows you to add and remove elements from both ends of the queue
efficiently.
The import numpy as np line imports the numpy library and assigns it the
alias np, which is a common convention. This allows you to refer to the library
using the shorter np name instead of typing out numpy every time.
The import cv2 line imports the cv2 module, which provides access to the
functions and tools in the OpenCV library.
The from collections import deque line imports the deque class from the
collections module. This allows you to create deque objects, which are double-
ended queues that you can add and remove elements from efficiently.
def setValues(x):
    print("")
The setValues function you have defined takes a single parameter x, but it does not do anything
with it. It simply prints an empty string.
It is likely that this function is intended to be used as a callback function for a trackbar in the
OpenCV library. A trackbar is a graphical widget that allows the user to set a value by sliding a
knob along a range of values. When the trackbar is moved, the callback function is called with the
new value of the trackbar.
## Creating the trackbars needed for adjusting the marker colour
cv2.namedWindow("Color detectors")
cv2.createTrackbar("Upper Hue", "Color detectors", 153, 180, setValues)
cv2.createTrackbar("Upper Saturation", "Color detectors", 255, 255, setValues)
bpoints = [deque(maxlen=1024)]
gpoints = [deque(maxlen=1024)]
rpoints = [deque(maxlen=1024)]
ypoints = [deque(maxlen=1024)]
This code appears to be defining four different arrays, each of which is a deque
(double-ended queue) with a maximum length of 1024. The deques are named
"bpoints", "gpoints", "rpoints", and "ypoints", and they are each associated with a
different color.
A deque is a data structure that allows you to add and remove elements from both the
front and the back of the queue. It is similar to a list, but it has more efficient insert
and delete operations for elements at the beginning and end of the queue. The
"maxlen" parameter specifies the maximum number of elements that the deque can
hold. If the deque reaches its maximum length and a new element is added, the oldest
element will be automatically removed to make room for the new one.
In this code, it looks like the four deques are being used to store points of different
colors. It is not clear from this code snippet how the deques are being used or what the
points represent. Without more context, it is difficult to provide a more detailed
explanation of this code.
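For context, a small standalone example of how such a deque behaves (illustrative only, not part of the project code):

from collections import deque

d = deque(maxlen=3)
d.appendleft((10, 20))
d.appendleft((11, 21))
d.appendleft((12, 22))
d.appendleft((13, 23))   # the oldest point, (10, 20), is dropped automatically
print(d)                 # deque([(13, 23), (12, 22), (11, 21)], maxlen=3)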
# These indexes will be used to mark the points in particular arrays of specific colour
blue_index = 0
green_index = 0
red_index = 0
yellow_index = 0
# The kernel to be used for dilation purpose
kernel = np.ones((5,5),np.uint8)

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 255, 255)]
colorIndex = 0

"kernel" is a 2D array of ones with a shape of (5, 5) and a data type of "np.uint8". It is being
used as a kernel for the purpose of dilation. Dilation is a morphological operation in image
processing that is used to increase the size of features in an image. It works by applying a
structuring element (such as the kernel defined in this code) to the input image and increasing
the size of the features by adding pixels to the boundaries of the features.
"colors" is a list of tuples that represent four different colors: blue, green, red, and yellow.
Each tuple consists of three integers that represent the blue, green, and red (BGR) values of
the color, which is the channel order OpenCV uses. "colorIndex" keeps track of the currently
selected colour, starting with blue.
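The canvas setup and rectangle-drawing code itself is not reproduced in this report. A sketch of what it might look like is given below; the canvas size is an assumption, while the button coordinates follow the ranges described later in the walkthrough.

# Hypothetical canvas: a white image with a row of buttons along the top
paintWindow = np.zeros((471, 636, 3), np.uint8) + 255
paintWindow = cv2.rectangle(paintWindow, (40, 1), (140, 65), (0, 0, 0), 2)      # CLEAR (thin black outline)
paintWindow = cv2.rectangle(paintWindow, (160, 1), (255, 65), colors[0], -1)    # blue, filled
paintWindow = cv2.rectangle(paintWindow, (275, 1), (370, 65), colors[1], -1)    # green, filled
paintWindow = cv2.rectangle(paintWindow, (390, 1), (485, 65), colors[2], -1)    # red, filled
paintWindow = cv2.rectangle(paintWindow, (505, 1), (600, 65), colors[3], -1)    # yellow, filled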
The "cv2.rectangle" function draws a rectangle on an image. It takes the image, the top-left
corner and bottom-right corner of the rectangle specified as (x, y) coordinates, the colour, and
the thickness (a negative thickness such as -1 draws a filled rectangle).
In this code, four rectangles are being drawn on the image. The first rectangle is a thin black
outline with a thickness of 2. The remaining three rectangles are filled with the colors
specified in the "colors" list. The position and size of each rectangle is determined by the
coordinates of the top-left and bottom-right corners.
It is not clear from this code snippet how the resulting image is being used or what it
represents. Without more context, it is difficult to provide a more detailed explanation of this
code.
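The corresponding text-drawing code is likewise not reproduced here; a hedged sketch of what the five labels might look like (label positions and colours are placeholders) is:

cv2.putText(paintWindow, "CLEAR", (49, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "BLUE", (185, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "GREEN", (298, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "RED", (420, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2, cv2.LINE_AA)
cv2.putText(paintWindow, "YELLOW", (520, 33), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (150, 150, 150), 2, cv2.LINE_AA)
cv2.namedWindow("Paint", cv2.WINDOW_AUTOSIZE)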
The "cv2.putText" function is used to draw text on an image. It takes several parameters:
The font to use for the text, specified using one of the constants from the
"cv2.FONT_HERSHEY_" family
The color of the text, specified as an (R, G, B) tuple The thickness of the text's
outline, specified as a positive integer
The line type, specified using one of the constants from the "cv2.LINE_" family
In this code, five lines of text are being drawn on the "paint Window" image. Each
line of text is positioned at a different (x, y) coordinate and has a different color and
thickness. The text is also being rendered using the
"cv2.FONT_HERSHEY_SIMPLEX" font and a scale of 0.5.
The final line of code in this snippet uses the "cv2.namedWindow" function to create
a window with the name "Paint" and the "cv2.WINDOW_AUTOSIZE" flag, which
indicates that the window should automatically resize to fit the displayed image. It is
not clear from this code snippet how the resulting image or window are being used or
what they represent. Without more context, it is difficult to provide a more detailed
explanation of this code.
cap = cv2.VideoCapture(0)
The "cv2.VideoCapture" function is used to open a video stream or a video file and create a
video capture object that can be used to read frames from the stream or file. It takes a single
parameter, which specifies the source of the video stream. In this case, the parameter is 0,
which indicates that the default webcam of the computer should be used as the source.
Once the video capture object has been created, it can be used to read frames from the video
stream using the "cap.read" method. The "cap.read" method returns a Boolean value
indicating whether a frame was successfully read, as well as the frame itself. The frames can
then be processed or displayed using various functions from the OpenCV library or other
image processing libraries.
It is not clear from this code snippet how the video capture object or the frames it captures are
being used or what they represent. Without more context, it is difficult to provide a more
detailed explanation of this code.
# Keep looping
while True:
    # Reading the frame from the camera
    ret, frame = cap.read()

    # Flipping the frame to see same side of yours
    frame = cv2.flip(frame, 1)
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    Upper_hsv = np.array([u_hue, u_saturation, u_value])
    Lower_hsv = np.array([l_hue, l_saturation, l_value])
The main loop of the code is an infinite while loop that reads a frame from the camera (stored
in a variable called cap) using the read() method, and then flips the frame horizontally using
the flip() method. This is done so that the output video appears as if it were being viewed
from the same side as the viewer.
Next, the code converts the frame from the BGR color space to the HSV color space using
the cvtColor() method. This is done because it is often easier to perform color-based image
processing tasks in the HSV color space, as it separates the hue (color), saturation (intensity),
and value (brightness) channels.
The code then uses the getTrackbarPos() method to get the values of several trackbars,
which are GUI widgets that allow the user to adjust a value by sliding a thumb along a track.
These trackbars are being used to set upper and lower bounds for the HSV values of the
pixels in the frame. The upper bounds are stored in the Upper_hsv array, and the lower
bounds are stored in the Lower_hsv array
Finally, the code uses these upper and lower bounds to perform some kind of color-based
image processing on the frame, although it is not clear from the code snippet exactly what
this processing entails.
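The trackbar-reading lines themselves are not shown in the snippet above. Based on the trackbars created earlier (and assuming matching "Upper Value" and "Lower ..." trackbars are created in the same way), they would look roughly like:

u_hue = cv2.getTrackbarPos("Upper Hue", "Color detectors")
u_saturation = cv2.getTrackbarPos("Upper Saturation", "Color detectors")
u_value = cv2.getTrackbarPos("Upper Value", "Color detectors")
l_hue = cv2.getTrackbarPos("Lower Hue", "Color detectors")
l_saturation = cv2.getTrackbarPos("Lower Saturation", "Color detectors")
l_value = cv2.getTrackbarPos("Lower Value", "Color detectors")

These values feed the Upper_hsv and Lower_hsv arrays shown in the loop above.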
# Adding the colour buttons to the live frame for colour access
The first line of code uses the inRange() function to create a binary mask from the
HSV frame (hsv) by thresholding the values of the pixels in the frame. This mask will
have pixels set to 255 (white) wherever the values of the pixels in the frame fall
within the specified range (Lower_hsv to Upper_hsv) and pixels set to 0 (black)
everywhere else. This mask will highlight or "mask out" the pixels in the frame that
fall within the specified range, which will be used to identify the pointer.
The next three lines of code apply morphological transformations to the mask to
refine it. The erode() function erodes away the boundaries of the white pixels in the
mask, reducing their size. The morphologyEx() function performs an "opening"
operation, which consists of an erosion followed by a dilation. This can be used to
remove small, isolated pixels from the mask. The dilate() function dilates the white
pixels in the mask, increasing their size.
It is not clear from the code snippet exactly what the "pointer" is or how it is being
used, but it appears that the mask is being used to identify and locate it in the video
frame.
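The masking and morphological code being described does not appear in the report; a sketch consistent with the description (variable names follow the rest of the walkthrough) is:

Mask = cv2.inRange(hsv, Lower_hsv, Upper_hsv)          # white where the marker colour is present
Mask = cv2.erode(Mask, kernel, iterations=1)           # shave noisy pixels off the boundaries
Mask = cv2.morphologyEx(Mask, cv2.MORPH_OPEN, kernel)  # opening removes small isolated blobs
Mask = cv2.dilate(Mask, kernel, iterations=1)          # grow the remaining region back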
# Find contours for the pointer after identifying it
cnts, _ = cv2.findContours(Mask.copy(), cv2.RETR_EXTERNAL,
                           cv2.CHAIN_APPROX_SIMPLE)
center = None
The findContours() function is used to locate the contours of the objects in the binary
mask (Mask). The first argument to this function is the mask itself, and the second
argument specifies the mode of contour retrieval. The RETR_EXTERNAL flag
specifies that only the extreme outer contours of the objects in the mask should be
returned. The third argument specifies the contour approximation method to be used.
The CHAIN_APPROX_SIMPLE flag specifies that the contours should be
approximated using a simple algorithm that removes redundant points and compresses
the contour coordinates.
The findContours() function returns a list of contours (cnts) and a hierarchy of the
contours. The hierarchy is not being used in this code snippet, so it is discarded using
the underscore character (_) as a dummy variable.
The center variable is also being initialized to None, although it is not clear from the
code snippet exactly what this variable will be used for. It may be used to store the
coordinates of the center of the pointer or some other aspect of its shape or position.
if len(cnts) > 0:
    # Sort the contours and take the largest one, assumed to be the pointer
    cnt = sorted(cnts, key=cv2.contourArea, reverse=True)[0]
    # Get the radius of the enclosing circle around the found contour
    ((x, y), radius) = cv2.minEnclosingCircle(cnt)
    cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 255), 2)
    M = cv2.moments(cnt)
    center = (int(M['m10'] / M['m00']), int(M['m01'] / M['m00']))
The code first checks if there are any contours present in the binary mask (cnts). If there
are, it selects the largest contour (cnt) using the sorted() function and the contourArea()
method. This ensures that the pointer, which is assumed to be the largest object in the mask,
is selected.
Next, the code uses the minEnclosingCircle() function to fit a circle around the contour and
get the coordinates of the center of the circle (x, y) and its radius. The code then uses the
circle() function to draw the circle around the contour in the frame.
Finally, the code calculates the center of the contour using the moments() method and the
spatial moments of the contour. The moments() method returns a dictionary of moments that
can be used to calculate various properties of the contour, such as its area, centroid, and
orientation. In this case, the centroid (center of mass) of the contour is being calculated by
dividing the first and second spatial moments by the zero-order moment (area). The resulting
coordinates are stored in the center variable.
It is not clear from the code snippet exactly what the "pointer" is or how it is being used, but
it appears that the code is using the contours and moments of the pointer to identify its
position and shape in the video frame.
# Now checking if the user wants to click on any button above the screen
if center[1] <= 65:
    if 40 <= center[0] <= 140:  # Clear button
        bpoints = [deque(maxlen=512)]
        gpoints = [deque(maxlen=512)]
        rpoints = [deque(maxlen=512)]
        ypoints = [deque(maxlen=512)]

        blue_index = 0
        green_index = 0
        red_index = 0
        yellow_index = 0

        paintWindow[67:,:,:] = 255
    elif 160 <= center[0] <= 255:
        colorIndex = 0  # Blue
    elif 275 <= center[0] <= 370:
        colorIndex = 1  # Green
    elif 390 <= center[0] <= 485:
        colorIndex = 2  # Red
    elif 505 <= center[0] <= 600:
        colorIndex = 3  # Yellow
else :
    if colorIndex == 0:
        bpoints[blue_index].appendleft(center)
    elif colorIndex == 1:
        gpoints[green_index].appendleft(center)
    elif colorIndex == 2:
        rpoints[red_index].appendleft(center)
    elif colorIndex == 3:
        ypoints[yellow_index].appendleft(center)
The code first checks if the y coordinate of the pointer's center (center[1]) is less than
or equal to 65. If it is, then the pointer is considered to be within the top 65 pixels of
the frame, which corresponds to the area where the color buttons are located. In this
case, the code checks the x coordinate of the pointer's center (center[0]) to determine
which button the pointer is hovering over.
If the pointer is over the "CLEAR" button (40 <= center[0] <= 140), the code resets
several variables that are used to store the points of the pointer's path (bpoints,
gpoints, rpoints, ypoints). These variables are lists of deques (double-ended queues)
that are used to store the coordinates of the pointer as it moves around the frame. The
code also resets several variables that are used to track the position of the pointer in
the lists (blue_index, green_index, red_index, yellow_index).
If the pointer is over one of the color buttons (160 <= center[0] <= 255, 275 <=
center[0] <= 370, 390 <= center[0] <= 485, or 505 <= center[0] <= 600), the code
sets a variable called colorIndex to the index of the selected color.
If the pointer is not within the top 65 pixels of the frame, then it is assumed to be
within the main area of the frame where the user can draw. In this case, the code
checks the value of colorIndex to determine which color the user has selected, and
then appends the coordinates of the pointer's center to the appropriate list of points
(bpoints, gpoints, rpoints, or ypoints).
It is not clear from the code snippet exactly what the "pointer" is or how it is being
used, but it appears that the code is using the position of the pointer to allow the user
to select colors and draw on the video frame. The paintWindow variable, which is
not shown in the code snippet, may be used to store the drawings made by the user.
else:
    bpoints.append(deque(maxlen=512))
    blue_index += 1
    gpoints.append(deque(maxlen=512))
    green_index += 1
    rpoints.append(deque(maxlen=512))
    red_index += 1
    ypoints.append(deque(maxlen=512))
    yellow_index += 1
Each of these lists is a list of deques (double-ended queues) that are used to store the
coordinates of the pointer as it moves around the frame. When the pointer is not detected, it is
assumed that it has stopped moving and a new deque is appended to each list to store the
coordinates of the pointer's next movement. The maxlen argument to the deque() function
specifies the maximum number of elements that the deque can hold.
It is not clear from the code snippet exactly what the "pointer" is or how it is being
used, but it appears that the code is using these lists and variables to store the
coordinates of the pointer's movements and allow the user to draw on the video frame.
points = [bpoints, gpoints, rpoints, ypoints]
for i in range(len(points)):
    for j in range(len(points[i])):
        for k in range(1, len(points[i][j])):
            if points[i][j][k - 1] is None or points[i][j][k] is None:
                continue
            cv2.line(frame, points[i][j][k - 1], points[i][j][k], colors[i], 2)
            cv2.line(paintWindow, points[i][j][k - 1], points[i][j][k], colors[i], 2)
a separate "canvas" (paintWindow) based on the coordinates of the "pointer" stored in four
lists (bpoints, gpoints, rpoints, ypoints).
The points variable is a list of the four lists, and the outer for loop iterates over the elements
of this list This code appears to be using the OpenCV library to draw lines on a video frame
and. The inner for loop iterates over the elements of each list, and the nested for loop iterates
over the elements of each deque in the list.
For each element in the deque, the code uses the line() function to draw a line on the video
frame (frame) and the canvas (paintWindow) between the current element and the previous
element in the deque. The colors list is used to specify the color of the line, and the 2
argument specifies the thickness of the line.
The continue statement is used to skip over any elements in the deque that are None, which
may occur if the pointer was not detected in the previous frame.
It is not clear from the code snippet exactly what the "pointer" is or how it is being used, but
it appears that the code is using the stored coordinates of the pointer's movements to allow
the user to draw lines on the video frame and the canvas.
# Show all the windows
cv2.imshow("Tracking", frame)
cv2.imshow("Paint", paintWindow)
cv2.imshow("mask",Mask)
This code is using the OpenCV (cv2) library to display three images in separate windows.
The first window, "Tracking", shows the frame of a video or a single image. The second
window, "Paint", shows an image called "paintWindow". The third window, "mask", shows
an image called "Mask".
cv2.imshow() is a function that takes in two arguments: the name of the window and the
image to be displayed in that window. When the code is run, it will open three windows and
display the respective images in them. The windows will remain open until the user closes
them or until the program is stopped.
The purpose of this code is likely to allow the user to view and analyze the images in each
window. The "Tracking" window may show the original video or image, while the "Paint"
and "mask" windows may show some kind of processed version of the original image, such
as a mask or an annotated version
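The closing lines of the script are not reproduced above. The cleanup they perform, typically after a key press (for example 'q') breaks the main loop, which is an assumption here, is simply:

# Release the webcam and close every OpenCV window
cap.release()
cv2.destroyAllWindows()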
The "cap" object is a video capture object that is used to capture video frames from a camera or
a video file. The cap.release() method releases the camera and frees up any resources it was
using. This is important to do when you are finished using the camera or video file, as it ensures
that the resources are released and can be used by other programs.
The cv2.destroyAllWindows() function closes all windows that were created using the
cv2.imshow() function. This is useful when you have multiple windows open and want to close
them all at once.
This code is likely being used to clean up resources and close windows at the end of the
program. It is good practice to release resources and close windows when you are finished using
them to ensure that your program is not using unnecessary resources and is not cluttering up the
user's desktop with open windows.
CHAPTER 7
Results and Output Screens
CHAPTER 8
Conclusion & Future Work
The system has the potential to challenge traditional writing methods. It eradicates the need
to carry a mobile phone in hand to jot down notes, providing a simple on-the-go way to do
the same. It will also serve a great purpose in helping specially abled people communicate
easily. Even senior citizens or people who find it difficult to use keyboards will be able to use
the system effortlessly. Extending the functionality, the system can also be used to control
IoT devices in the near future. Drawing in the air can also be made possible. The system will
be excellent software for smart wearables with which people could better interact with the
digital world, and Augmented Reality can make text come alive.
There are some limitations of the system which can be improved in the future. Firstly, using
a handwriting recognizer in place of a character recognizer would allow the user to write
word by word, making writing faster. Secondly, hand gestures with a pause could be used to
control the real-time system, as done by [1], instead of using the number of fingertips.
Thirdly, our system sometimes recognizes fingertips in the background and changes their
state; air-writing systems should only obey their master's control gestures and should not be
misled by people around. Also, we used the EMNIST dataset, which is not a proper
air-character dataset. Upcoming object detection algorithms such as YOLO v3 can improve
fingertip recognition accuracy and speed. In the future, advances in Artificial Intelligence
will enhance the efficiency of air-writing.
Future Work:
Given more time to work on this project, we would improve hand contour recognition,
investigate our original Air Canvas objectives, and attempt to understand the multicore
module. To improve hand gesture tracking, we would need to dig deeper into OpenCV. There
are various strategies for contour analysis, yet in this specific algorithm it might be
advantageous to investigate the colour histogram used to create the contours in question.
Besides, we could explore different interpolation methods. PyGame includes a line drawing
method (pygame.draw.line()) that could be valuable in creating smoother, cleaner lines. In a
similar vein, implementing a variety of brush shapes, textures, and even an eraser would
make Air Canvas more powerful as a drawing program. Permitting the user to save their final
work or watch their drawing process as a playback animation could likewise be remarkable
features that resemble genuine creative software. Perhaps there would even be a way to
interface Air Canvas with real digital drawing programs, such as Adobe Photoshop, Clip
Studio Paint, or GIMP. Finally, we could make significant strides by working out how
multicore processing works with in-order data processing.