Automated Media Player (Explanation of Code)

The document describes a Python script that utilizes OpenCV, MediaPipe, and PyAutoGUI to detect hand gestures via webcam and simulate keyboard inputs based on the number of fingers shown. It includes a function to count fingers based on hand landmarks and implements a loop to process video frames, detect hand gestures, and trigger corresponding keyboard actions. The script continuously captures video, recognizes gestures, and displays the processed frames until the user exits the program.


```python
import cv2               # OpenCV: webcam capture, image processing, display
import mediapipe as mp   # MediaPipe: hand-landmark detection
import pyautogui         # PyAutoGUI: simulated keyboard presses
import time              # standard library: timing for the gesture debounce
```

The code snippet above imports four Python libraries. Let's break down each one:

1. `cv2`: OpenCV (Open Source Computer Vision Library) is a popular library used for computer vision tasks such as image and video manipulation, object detection, and face recognition. Here it is imported under its usual module name, `cv2`.

2. `mediapipe`: MediaPipe is an open-source framework developed by Google for building cross-platform, real-time multimodal machine learning pipelines. It provides ready-to-use ML solutions for tasks such as hand tracking, pose detection, and face detection. Here it is imported with the alias `mp`.

3. `pyautogui`: PyAutoGUI is a Python library for automating keyboard and mouse input, useful for tasks like GUI automation and testing. It lets you control the mouse and keyboard programmatically. Here it is imported as `pyautogui`.

4. `time`: This standard Python library provides time-related functions for measuring intervals, adding delays, and so on. Here it is imported without an alias.

Taken together, these imports suggest a project that combines computer vision (MediaPipe for hand tracking or pose detection) with automated mouse and keyboard control (PyAutoGUI). The `time` library is likely used for introducing delays or timing operations within the code.
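
Before running such a script, it can help to confirm that the camera is actually reachable. Below is a minimal sanity check (a sketch added here for illustration; it is not part of the original script):

```python
import cv2

# Try to open the default camera and grab a single frame.
cap = cv2.VideoCapture(0)   # change the index if you have several cameras
ok, frame = cap.read()
print("Camera opened:", cap.isOpened(), "| frame captured:", ok)
cap.release()
```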
```python
def count_fingers(lst):
    cnt = 0

    # Half the vertical wrist-to-middle-knuckle distance, used as the
    # "finger is open" threshold (coordinates scaled from 0-1 to 0-100)
    thresh = (lst.landmark[0].y*100 - lst.landmark[9].y*100)/2

    # Index, middle, ring, pinky: tip sufficiently above its base joint
    if (lst.landmark[5].y*100 - lst.landmark[8].y*100) > thresh:
        cnt += 1

    if (lst.landmark[9].y*100 - lst.landmark[12].y*100) > thresh:
        cnt += 1

    if (lst.landmark[13].y*100 - lst.landmark[16].y*100) > thresh:
        cnt += 1

    if (lst.landmark[17].y*100 - lst.landmark[20].y*100) > thresh:
        cnt += 1

    # Thumb: tip far enough to the side of the index-finger base
    if (lst.landmark[5].x*100 - lst.landmark[4].x*100) > 6:
        cnt += 1

    return cnt
```

This Python function, `count_fingers(lst)`, counts the number of extended fingers in a detected hand pose. Let's break down the code:

1. **Function Definition**:
```python
def count_fingers(lst):
```
- This line defines a function named `count_fingers` that takes one argument, `lst`: a MediaPipe hand-landmarks object whose `.landmark` list holds 21 points with normalized `x` and `y` coordinates.

2. **Initialization**:
```python
cnt = 0
```
- Initializes a variable `cnt` to count the number of fingers detected.
It's set to 0 initially.

3. **Threshold Calculation**:
```python
thresh = (lst.landmark[0].y*100 - lst.landmark[9].y*100)/2
```
- Calculates a threshold as half the vertical distance between the wrist (landmark 0) and the base of the middle finger (landmark 9). A fingertip must rise at least this far above its base joint to count as open. The coordinates are multiplied by 100 because MediaPipe returns them normalized to the 0-1 range.

4. **Finger Counting**:
- The following `if` statements check certain conditions to determine
whether each finger is open or closed:
```python
if (lst.landmark[5].y*100 - lst.landmark[8].y*100) > thresh:
cnt += 1
```
- Checks whether the index fingertip (landmark 8) is higher in the frame than the index-finger base (landmark 5) by more than the threshold; note that smaller `y` values are higher up in image coordinates.
- The same logic is applied to the middle, ring, and pinky fingers using their respective tip and base landmarks.

5. **Thumb Position Check**:
```python
if (lst.landmark[5].x*100 - lst.landmark[4].x*100) > 6:
    cnt += 1
```
- Checks whether the thumb tip (landmark 4) sits far enough to the side of the index-finger base (landmark 5) along the x-axis: if the scaled horizontal gap exceeds 6, the thumb is treated as extended. This heuristic assumes a particular hand orientation in the (already flipped) frame.

6. **Return Count**:
```python
return cnt
```
- Returns the number of fingers judged to be open.

So, overall, this function counts open fingers by comparing each fingertip's position against its base joint, using a palm-height-based threshold for the four fingers and a fixed horizontal offset for the thumb.
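
A quick way to sanity-check `count_fingers` without a camera is to feed it a fake landmark object. This is a sketch: `fake_hand` is a hypothetical helper built with `SimpleNamespace` to mimic MediaPipe's landmark list (21 points with normalized `x`/`y`); it is not part of the original script:

```python
from types import SimpleNamespace

def fake_hand(raised_tips):
    # 21 landmarks, all near the bottom of the frame (a closed fist)
    pts = [SimpleNamespace(x=0.5, y=0.9) for _ in range(21)]
    pts[0].y = 0.95                  # wrist
    pts[9].y = 0.75                  # middle-finger base; makes thresh = 10
    for idx, y in raised_tips.items():
        pts[idx].y = y               # raise the chosen fingertips
    return SimpleNamespace(landmark=pts)

print(count_fingers(fake_hand({8: 0.4})))           # index only -> 1
print(count_fingers(fake_hand({8: 0.4, 12: 0.4})))  # index + middle -> 2
```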

```python
cap = cv2.VideoCapture(0)                # open the default webcam

drawing = mp.solutions.drawing_utils     # helpers for drawing landmarks
hands = mp.solutions.hands               # the MediaPipe Hands solution
hand_obj = hands.Hands(max_num_hands=1)  # track at most one hand
```

This code segment sets up webcam capture with OpenCV (`cv2`) and hand tracking with MediaPipe. Let's break down each part:

1. **Camera Capture Initialization**:
```python
cap = cv2.VideoCapture(0)
```
- This line initializes a video capture object named `cap` which will
capture video from the default camera (index 0). If you have multiple
cameras connected, you can specify a different index to capture from a
different camera.

2. **Mediapipe Setup**:
```python
drawing = mp.solutions.drawing_utils
hands = mp.solutions.hands
hand_obj = hands.Hands(max_num_hands=1)
```
- These lines grab the relevant submodules of the MediaPipe library (`mediapipe`) and set up a hand-tracking object.
- `mp.solutions.drawing_utils` provides utility functions to draw
landmarks and connections on the image.
- `mp.solutions.hands` provides a pre-trained model for hand
tracking.
- `hand_obj = hands.Hands(max_num_hands=1)` initializes the hand
tracking object, specifying that it should detect a maximum of 1 hand in
the frame.

Overall, this segment creates a video-capture object with OpenCV and initializes a MediaPipe hand-tracking object, which will then detect and track a hand in each frame captured from the webcam.
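
A slightly more defensive version of the same setup would verify that the camera actually opened and tune the detector's confidence thresholds. A sketch: `min_detection_confidence` and `min_tracking_confidence` are standard `Hands()` parameters, but the value 0.7 is illustrative, not taken from the original code:

```python
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Could not open webcam at index 0")

hands = mp.solutions.hands
hand_obj = hands.Hands(
    max_num_hands=1,
    min_detection_confidence=0.7,  # reject weak initial detections
    min_tracking_confidence=0.7,   # re-detect when tracking degrades
)
```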

```python
start_init = False
prev = -1

while True:
    end_time = time.time()
    _, frm = cap.read()
    frm = cv2.flip(frm, 1)                 # mirror the frame

    # MediaPipe expects RGB input; OpenCV captures BGR
    res = hand_obj.process(cv2.cvtColor(frm, cv2.COLOR_BGR2RGB))

    if res.multi_hand_landmarks:
        hand_keyPoints = res.multi_hand_landmarks[0]
        cnt = count_fingers(hand_keyPoints)

        if not(prev == cnt):
            if not(start_init):
                start_time = time.time()   # new count appeared; start timing
                start_init = True
            elif (end_time - start_time) > 0.2:
                # The new count has been held for 0.2 s; trigger the action
                if cnt == 1:
                    pyautogui.press("right")
                elif cnt == 2:
                    pyautogui.press("left")
                elif cnt == 3:
                    pyautogui.press("up")
                elif cnt == 4:
                    pyautogui.press("down")
                elif cnt == 5:
                    pyautogui.press("space")

                prev = cnt
                start_init = False

        drawing.draw_landmarks(frm, hand_keyPoints, hands.HAND_CONNECTIONS)

    cv2.imshow("window", frm)

    if cv2.waitKey(1) == 27:               # ESC: clean up and exit
        cv2.destroyAllWindows()
        cap.release()
        break
```

This is the main loop of the script: it simulates keyboard input based on hand gestures detected in real time from the webcam. Let's break it down:

1. **Variable Initialization**:
- `start_init` is a boolean variable initialized to `False`. It's used to
track whether the hand gesture recognition process has started.
- `prev` is a variable initialized to `-1`. It's used to track the previous
finger count.

2. **Main Loop**:
- `while True:` initiates an infinite loop to continuously process video
frames from the webcam.

3. **Capturing Video Frame**:
- `end_time = time.time()`: Records the current time at the top of this iteration; it is later compared against `start_time` to measure how long a gesture has been held.
- `_, frm = cap.read()`: Captures a frame from the webcam feed. The underscore `_` discards the boolean success flag returned by `cap.read()`, and the captured frame is stored in the variable `frm`.
- `frm = cv2.flip(frm, 1)`: Flips the frame horizontally, probably to
correct for the mirror effect of the webcam.

4. **Hand Detection and Gesture Recognition**:
- `res = hand_obj.process(cv2.cvtColor(frm, cv2.COLOR_BGR2RGB))`:
Processes the frame to detect hands using the hand tracking object
(`hand_obj`).
- `if res.multi_hand_landmarks:`: Checks if hand landmarks are
detected in the frame.
- `hand_keyPoints = res.multi_hand_landmarks[0]`: Retrieves the
landmarks of the first detected hand.
- `cnt = count_fingers(hand_keyPoints)`: Calculates the number of
fingers based on the detected hand landmarks using the `count_fingers`
function.

5. **Gesture Recognition and Keyboard Control**:
- Checks if the finger count has changed compared to the previous
count:
- If it has changed and `start_init` is `False`, it initializes the start
time (`start_time`) and sets `start_init` to `True`.
- If the time elapsed since `start_time` is greater than 0.2 seconds,
it recognizes the hand gesture based on the finger count (`cnt`) and
simulates keyboard inputs using `pyautogui.press()` accordingly. It then
updates `prev` with the current finger count and sets `start_init` back to
`False`.

6. **Drawing Hand Landmarks**:
- `drawing.draw_landmarks(frm, hand_keyPoints,
hands.HAND_CONNECTIONS)`: Draws landmarks and connections on the
frame to visualize hand tracking.

7. **Displaying the Frame**:
- `cv2.imshow("window", frm)`: Displays the processed frame in a
window named "window".

8. **Exiting the Program**:
- `if cv2.waitKey(1) == 27:`: Waits for a key press for 1 millisecond. If
the key pressed is the escape key (ESC, ASCII value 27), it destroys all
OpenCV windows, releases the webcam (`cap.release()`), and exits the
loop, terminating the program.

Overall, this code continuously captures video frames, detects hand gestures, recognizes specific gestures based on finger counts, and controls keyboard inputs accordingly. It uses OpenCV for webcam access, MediaPipe for hand tracking, and PyAutoGUI for simulating keyboard inputs.
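
The hold-for-0.2-seconds debounce and the count-to-key dispatch can also be isolated into a small helper, which makes the mapping easier to extend. This is a sketch of an equivalent formulation, not the original code: `GESTURE_KEYS` and `handle_gesture` are names introduced here, and the media-player meanings in the comments are typical bindings, not guaranteed:

```python
import time
import pyautogui

GESTURE_KEYS = {
    1: "right",   # e.g. seek forward in many media players (assumed binding)
    2: "left",    # seek backward
    3: "up",      # volume up
    4: "down",    # volume down
    5: "space",   # play / pause
}

prev = -1
start_init = False
start_time = 0.0

def handle_gesture(cnt):
    """Press a key only once the same new finger count is held for 0.2 s."""
    global prev, start_init, start_time
    if cnt == prev:
        return                         # nothing new; keep current state
    if not start_init:
        start_time = time.time()       # a new count appeared; start the timer
        start_init = True
    elif time.time() - start_time > 0.2:
        key = GESTURE_KEYS.get(cnt)
        if key:
            pyautogui.press(key)
        prev = cnt                     # remember the gesture we acted on
        start_init = False
```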
