Virtual AI Mouse

AI Based Virtual Mouse is an AI-based project that allows users to control the mouse with hand

gestures, without having to physically touch a mouse. The solution relies on a basic camera rather
than a traditional pointing device: it captures hand motions and recognizes fingertips using
computer vision and a webcam or built-in camera, cutting down on the need for hardware such as a
wireless or Bluetooth mouse. The user can perform various actions such as clicking, volume control,
brightness control and dragging with different hand gestures.

​ mp_drawing = mp.solutions.drawing_utils: This line assigns the drawing_utils module from the
Mediapipe library to the variable mp_drawing. This module is used for drawing landmarks and annotations on
images and videos.
​ mp_hands = mp.solutions.hands: This line assigns the hands module from the Mediapipe library to the
variable mp_hands. This module is used for hand tracking and landmark detection.
​ click = 0: This line initializes a variable click with a value of 0. It's likely used to keep track of whether a
click action has been performed or not.
​ video = cv2.VideoCapture(0): This line initializes a video capture object named video using OpenCV. It
opens the default camera (camera index 0) for video input.
​ volume_level = 50: This line initializes a variable volume_level with a value of 50. This may represent an
initial volume level for some audio-related functionality in the script.
​ brightness_level = 50: This line initializes a variable brightness_level with a value of 50. This may
represent an initial brightness level for some screen-related functionality in the script.
​ prev_indexfingertip_x = None: This line initializes a variable prev_indexfingertip_x with the value None. It
is used to store the previous X-coordinate of the index fingertip.
​ prev_indexfingertip_y = None: This line initializes a variable prev_indexfingertip_y with the value None. It
is used to store the previous Y-coordinate of the index fingertip.
​ prev_middlefingertip_x = 0: This line initializes a variable prev_middlefingertip_x with a value of 0. It is
used to store the previous X-coordinate of the middle fingertip.
​ indexfingertip_y = 0: This line initializes a variable indexfingertip_y with a value of 0. It is used to store
the Y-coordinate of the index fingertip.
​ middlefingertip_x = 0: This line initializes a variable middlefingertip_x with a value of 0. It is used to
store the X-coordinate of the middle fingertip.
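
Taken together, the imports and initializations described above look roughly like the sketch below. This is a reconstruction from the walkthrough rather than the original source; in particular, importing set_brightness from screen_brightness_control is an assumption based on the library list later in this document.

    import cv2
    import mediapipe as mp
    import numpy as np            # listed among the script's imports in the walkthrough
    import pyautogui
    import win32api
    from screen_brightness_control import set_brightness  # assumed provider of set_brightness

    mp_drawing = mp.solutions.drawing_utils   # drawing helpers for landmarks and annotations
    mp_hands = mp.solutions.hands              # hand tracking and landmark detection

    click = 0                        # counts pinch detections between simulated clicks
    video = cv2.VideoCapture(0)      # default webcam (camera index 0)

    volume_level = 50                # initial volume level (0-100)
    brightness_level = 50            # initial brightness level (0-100)

    prev_indexfingertip_x = None     # previous X of the index fingertip
    prev_indexfingertip_y = None     # previous Y of the index fingertip
    prev_middlefingertip_x = 0       # previous X of the middle fingertip
    indexfingertip_y = 0             # current Y of the index fingertip
    middlefingertip_x = 0            # current X of the middle fingertip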

​ with mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.8) as hands::

​ This line initializes a context manager that creates an instance of the Hands class from the mp_hands module
(Mediapipe hand tracking).
​ min_detection_confidence and min_tracking_confidence are set to 0.8, which means that only hand
detections with confidence scores greater than or equal to 0.8 will be considered for tracking.
​ while video.isOpened()::

​ This line starts a while loop that continues as long as the video capture source (represented by the video
object) is open and valid.
​ _, frame = video.read():

​ This line reads a frame from the video capture source using the video.read() method.
​ The _ is used to discard the return value that represents whether the read operation was successful or not.
​ The resulting frame is stored in the variable frame.
​ image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB):

​ This line converts the frame (which is in the BGR color format commonly used by OpenCV) into the RGB
color format.
​ This conversion is necessary because the mediapipe library expects RGB images.
​ image = cv2.flip(image, 1):

​ This line horizontally flips the image using the cv2.flip() function. This is often done to ensure that the
video feed appears as a mirror image, so that hand movements map intuitively to cursor movements.
​ imageHeight, imageWidth, _ = image.shape:

​ This line extracts the height and width of the image using the shape attribute of the NumPy array.
​ The _ is used to discard the third element in the shape tuple, which represents the number of color channels (e.g.,
3 for RGB).
​ results = hands.process(image):

​ This line processes the image using the hands object (Mediapipe hand tracking).
​ It returns the results of hand detection and tracking, including the landmarks and confidence scores.
​ image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR):

​ This line converts the image back to the BGR color format.
​ This conversion is done because the subsequent code may involve OpenCV functions that expect BGR
images.
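
As a minimal sketch, the capture-and-processing loop described above looks like the following; the setup variables come from the earlier sketch, and the gesture handling covered next goes inside the loop where indicated.

    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    video = cv2.VideoCapture(0)

    with mp_hands.Hands(min_detection_confidence=0.8,
                        min_tracking_confidence=0.8) as hands:
        while video.isOpened():
            _, frame = video.read()                          # grab the next webcam frame
            image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # Mediapipe expects RGB
            image = cv2.flip(image, 1)                       # mirror the feed
            imageHeight, imageWidth, _ = image.shape
            results = hands.process(image)                   # detect and track hands
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)   # back to BGR for OpenCV drawing

            # ...landmark drawing and gesture handling go here (see the sketches below)...

            cv2.imshow('Hand Tracking', image)
            if cv2.waitKey(10) & 0xFF == ord('q'):           # press 'q' to quit
                break

    video.release()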

​ if results.multi_hand_landmarks::

● This line checks if there are any hand landmarks detected in the results object (previously
obtained through hand tracking).
● If there are hand landmarks, it enters the block of code.
​ for num, hand in enumerate(results.multi_hand_landmarks)::

● This line iterates over each detected hand within the results.multi_hand_landmarks list.
● num represents the index of the hand (0 for the first hand, 1 for the second, if present), and hand
represents the landmarks for that hand.
​ mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS, ...):

● This line uses mp_drawing.draw_landmarks to draw the landmarks and hand connections on the
image.

● hand represents the landmarks for the current hand.

● mp_hands.HAND_CONNECTIONS specifies the connections between hand landmarks to be drawn.

● The DrawingSpec object is used to specify the color, thickness, and circle radius for drawing the
landmarks.
​ if results.multi_hand_landmarks != None::
● This line checks again whether any hand landmarks were detected; it essentially repeats the check
made at the start of this block.
​ for handLandmarks in results.multi_hand_landmarks::

● This line iterates over each set of hand landmarks (one set for each detected hand) within
results.multi_hand_landmarks.

​ for point in mp_hands.HandLandmark::

● This line iterates over each hand landmark type defined in mp_hands.HandLandmark. It represents
specific points on the hand (e.g., index finger tip).
​ normalizedLandmark = handLandmarks.landmark[point]:

● This line retrieves the normalized coordinates of the current hand landmark (point) from the
detected hand landmarks (handLandmarks).
​ pixelCoordinatesLandmark = mp_drawing._normalized_to_pixel_coordinates(...):

● This line converts the normalized landmark coordinates to pixel coordinates on the image, taking
into account the image dimensions (imageWidth and imageHeight).
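
The landmark drawing and pixel-coordinate conversion described above could be wrapped roughly as follows. The DrawingSpec colors and thicknesses are placeholders (the walkthrough elides them with "..."), and the helper name annotate_and_collect is hypothetical.

    import mediapipe as mp

    mp_drawing = mp.solutions.drawing_utils
    mp_hands = mp.solutions.hands

    def annotate_and_collect(image, results, image_width, image_height):
        """Draw detected hands on the image and yield (landmark_name, pixel_xy) pairs."""
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(
                    image, hand, mp_hands.HAND_CONNECTIONS,
                    mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2, circle_radius=3),
                    mp_drawing.DrawingSpec(color=(255, 0, 0), thickness=2))
            for handLandmarks in results.multi_hand_landmarks:
                for point in mp_hands.HandLandmark:
                    normalizedLandmark = handLandmarks.landmark[point]
                    pixelCoordinatesLandmark = mp_drawing._normalized_to_pixel_coordinates(
                        normalizedLandmark.x, normalizedLandmark.y,
                        image_width, image_height)
                    if pixelCoordinatesLandmark is None:     # landmark fell outside the frame
                        continue
                    yield str(point), pixelCoordinatesLandmark  # e.g. 'HandLandmark.INDEX_FINGER_TIP'
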
​ point = str(point):

● This line converts the point variable, which represents a hand landmark type, into a string.
​ if point == 'HandLandmark.INDEX_FINGER_TIP'::

● This line checks if the current point is equal to the string representation of the index finger tip.
​ indexfingertip_x = pixelCoordinatesLandmark[0] and indexfingertip_y =
pixelCoordinatesLandmark[1]:
● These lines extract the X and Y coordinates of the index finger tip in pixel coordinates.
​ win32api.SetCursorPos((indexfingertip_x * 4, indexfingertip_y * 5)):

● This line uses the win32api module to set the cursor position on the screen based on the detected
fingertip position. The coordinates are multiplied by 4 and 5, respectively, so that movements within
the smaller camera frame span the full screen.
​ brightness_level = int((indexfingertip_y / imageHeight) * 100):
● This line calculates a brightness level based on the vertical position of the index finger tip on the
screen. It maps the Y-coordinate to a brightness level between 0 and 100.
​ set_brightness(brightness_level):

● This line sets the screen brightness level based on the calculated brightness_level.
​ prev_indexfingertip_x = indexfingertip_x and prev_indexfingertip_y = indexfingertip_y:
● These lines update the previous X and Y coordinates of the index finger tip for future reference.
​ except::
● This is a generic exception handling block. If any errors occur in the try block (e.g., if setting cursor
position fails), they are caught here, and the code continues without raising an exception.
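
A minimal sketch of the index-fingertip handling described above, restructured as a standalone helper; the function name is hypothetical and set_brightness is assumed to come from screen_brightness_control.

    import win32api
    from screen_brightness_control import set_brightness  # assumed provider of set_brightness

    def handle_index_fingertip(indexfingertip_x, indexfingertip_y, image_height):
        """Move the cursor and adjust screen brightness from the index fingertip position."""
        try:
            # Scale camera-frame coordinates so they roughly span the full screen.
            win32api.SetCursorPos((indexfingertip_x * 4, indexfingertip_y * 5))
        except Exception:
            pass  # ignore failures when setting the cursor position
        # Map the vertical position (0..image_height) to a brightness level (0..100).
        brightness_level = int((indexfingertip_y / image_height) * 100)
        set_brightness(brightness_level)
        return brightness_level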

​ elif point == 'HandLandmark.MIDDLE_FINGER_TIP'::

● This line checks if the current point represents the middle finger tip.
​ try::
● This begins a try block to handle potential exceptions.
​ middlefingertip_x = pixelCoordinatesLandmark[0] and middlefingertip_y =
pixelCoordinatesLandmark[1]:
● These lines extract the X and Y coordinates of the middle finger tip in pixel coordinates.
​ if prev_middlefingertip_x != 0::
● This line checks if the previous X-coordinate of the middle finger tip is not zero, indicating that there
was a previous position.
​ x_diff = middlefingertip_x - prev_middlefingertip_x:
● This line calculates the difference in X-coordinates between the current and previous positions of
the middle finger tip.
​ if x_diff > 10::
● This line checks if the X-coordinate difference is greater than 10, suggesting that the hand has
moved to the right.
​ volume_level += 10 and volume_level -= 10:

● These lines increase volume_level by 10 when the hand has moved to the right and decrease it by 10 when it has moved to the left.


​ if volume_level > 100: and if volume_level < 0::

● These lines ensure that volume_level remains within the range of 0 to 100.
​ pyautogui.press('volumeup') and pyautogui.press('volumedown'):

● These lines simulate pressing the volume up and volume down keys using the pyautogui library,
respectively.
​ except::
● This is a generic exception handling block. If any errors occur in the try block, they are caught here,
and the code continues without raising an exception.
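
The middle-fingertip volume control described above amounts to roughly the following helper. The function name is hypothetical, and the negative threshold for leftward movement is an assumption mirroring the positive one quoted in the walkthrough.

    import pyautogui

    def handle_middle_fingertip(middlefingertip_x, prev_middlefingertip_x, volume_level):
        """Adjust the volume based on horizontal movement of the middle fingertip."""
        if prev_middlefingertip_x != 0:                  # only if a previous position exists
            x_diff = middlefingertip_x - prev_middlefingertip_x
            if x_diff > 10:                              # hand moved right: louder
                volume_level = min(volume_level + 10, 100)
                pyautogui.press('volumeup')
            elif x_diff < -10:                           # hand moved left: quieter
                volume_level = max(volume_level - 10, 0)
                pyautogui.press('volumedown')
        return volume_level
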
​ elif point == 'HandLandmark.THUMB_TIP'::

● This line checks if the current point represents the thumb tip.
​ try::
● This begins a try block to handle potential exceptions.
​ thumbfingertip_x = pixelCoordinatesLandmark[0] and thumbfingertip_y =
pixelCoordinatesLandmark[1]:
● These lines extract the X and Y coordinates of the thumb tip in pixel coordinates.
​ The code calculates the horizontal and vertical differences (Distance_x and Distance_y) between the index
fingertip and the thumb tip in the X and Y directions.
​ if Distance_x < 12 or Distance_x < -12: and if Distance_y < 12 or Distance_y < -12::
● These lines check if the distance between the index fingertip and thumb fingertip is within a certain
threshold in both the X and Y directions.
​ click = click + 1:

● This line increments the click counter, which counts how many times the pinch condition has been
detected.
​ if click % 5 == 0::
● This line checks if the count is a multiple of 5, so that a single pinch, which is detected across several consecutive frames, does not trigger a rapid burst of clicks.
​ pyautogui.click():

● This line simulates a mouse click using the pyautogui library.
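
The pinch-to-click logic described above could be sketched as the helper below. abs() is used here for clarity, whereas the quoted conditions compare the signed differences directly; the function name is hypothetical.

    import pyautogui

    def handle_pinch(indexfingertip_xy, thumbfingertip_xy, click):
        """Count pinches between index fingertip and thumb tip, clicking on every fifth detection."""
        distance_x = indexfingertip_xy[0] - thumbfingertip_xy[0]
        distance_y = indexfingertip_xy[1] - thumbfingertip_xy[1]
        if abs(distance_x) < 12 and abs(distance_y) < 12:    # fingertips close together
            click += 1
            if click % 5 == 0:         # debounce: one click per five consecutive detections
                pyautogui.click()
        return click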

​ prev_indexfingertip_y = indexfingertip_y:
● This line updates the prev_indexfingertip_y variable with the current value of
indexfingertip_y. This is done to store the Y-coordinate of the index fingertip for the
next iteration of the loop.
​ prev_middlefingertip_x = middlefingertip_x:
● This line updates the prev_middlefingertip_x variable with the current value of
middlefingertip_x. This is done to store the X-coordinate of the middle fingertip for the
next iteration of the loop.
​ feedback_text = f"Brightness: {brightness_level}%, Volume: {volume_level}%":
● This line creates a text string feedback_text that includes the current values of
brightness_level and volume_level. It's formatted as a string.
​ cv2.putText(image, feedback_text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0),
2):
● This line adds text to the image using OpenCV's cv2.putText function.
● image is the image to which the text is added.
● feedback_text is the text string to be added.
● (10, 30) specifies where the text is placed; OpenCV interprets this as the bottom-left corner of the text string, so the text appears near the top-left of the image.
● cv2.FONT_HERSHEY_SIMPLEX is the font type.
● 1 is the font scale.
● (0, 255, 0) specifies the color of the text in BGR format (here, it's green).
● 2 is the thickness of the text.
​ cv2.imshow('Hand Tracking', image):
● This line displays the modified image with the added feedback text in a window titled
"Hand Tracking" using OpenCV's cv2.imshow function.
​ if cv2.waitKey(10) & 0xFF == ord('q')::
● This line checks for a key press event with a wait time of 10 milliseconds. It's used to
capture user input.
● cv2.waitKey(10) waits for a key event for 10 milliseconds.
● & 0xFF is a bitwise AND operation with 0xFF to extract the lower 8 bits of the key code.
● ord('q') returns the ASCII code for the 'q' key.
● The line checks if the pressed key is 'q', and if so, it breaks out of the loop.
​ video.release():
● This line releases the video capture object (video). It's important to release the video
source when done to free up system resources.
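
The feedback overlay, display, and exit handling described above amount to roughly the following sketch; show_feedback is a hypothetical helper that returns True when the user presses 'q'.

    import cv2

    def show_feedback(image, brightness_level, volume_level):
        """Overlay the brightness/volume text, display the frame, and report whether 'q' was pressed."""
        feedback_text = f"Brightness: {brightness_level}%, Volume: {volume_level}%"
        cv2.putText(image, feedback_text, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)   # green text, thickness 2
        cv2.imshow('Hand Tracking', image)
        return cv2.waitKey(10) & 0xFF == ord('q')

    # Inside the main loop:
    #     if show_feedback(image, brightness_level, volume_level):
    #         break
    # video.release() is then called once the loop ends to free the camera.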

This code is a Python script that uses the MediaPipe library to perform hand tracking and gesture recognition using a
webcam feed. It allows you to control screen brightness and volume on your computer based on the position and
gestures of your hand in front of the camera. Here's a breakdown of what's happening in the code:
​ Importing necessary libraries:
● mediapipe is used for hand tracking and landmark detection.
● cv2 (OpenCV) is used for computer vision tasks.
● numpy is used for numerical operations.
● win32api is used for controlling the mouse cursor.
● pyautogui is used for simulating keyboard and mouse input.
● screen_brightness_control is used to adjust the screen brightness.
​ Initializing variables:
● video is used to capture the video feed from the default webcam (index 0).
● volume_level and brightness_level are used to store the current volume and screen brightness
levels.
● Various variables like prev_indexfingertip_x, prev_indexfingertip_y, prev_middlefingertip_x,
indexfingertip_y, and middlefingertip_x are used to track hand and finger positions and previous
values.
​ Setting up MediaPipe hands model:
● The script initializes the MediaPipe hands model with minimum detection and tracking confidence
thresholds.
​ Main loop:
● The script enters a continuous loop to process video frames from the webcam.
​ Hand tracking and landmark detection:
● It reads each frame and converts it to the RGB color space.
● It flips the frame horizontally to mirror the video feed.
● It uses the MediaPipe hands model to detect and track hand landmarks (such as fingertips) in the
frame.
● It draws landmarks and connections on the frame using mp_drawing.
​ Gesture recognition and control:
● For each detected hand and landmark, it checks the type of landmark (e.g., thumb tip, index finger
tip).
● Based on the position of the index finger tip, it adjusts the screen brightness and cursor position.
● Based on the position of the middle finger tip and its movement, it adjusts the volume level using
pyautogui.
● It also checks for a "pinch" gesture between the thumb and index finger and simulates a mouse
click using pyautogui.click().
​ Updating feedback text:
● The script displays feedback text on the screen, showing the current brightness and volume levels.
​ Displaying the frame:
● It displays the processed frame with annotations and feedback text.
​ Exiting the loop:
● The loop continues until the 'q' key is pressed, at which point the video feed is released, and the
program exits.
