CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION TO THE PROJECT
Football, as one of the world’s most popular sports, is not only about physical prowess
but also about strategic and tactical precision. In recent years, the importance of data analytics
in football has skyrocketed, influencing everything from player training to in-game decision-
making.
Coaches and analysts have traditionally relied on manual methods to review match
footage and derive insights about formations, tactics, and player performance. However, these
manual processes are time-consuming, labor-intensive, and prone to human error, limiting their
effectiveness in fast-paced sports like football.
YOLO is chosen due to its real-time object detection capabilities and high accuracy,
making it suitable for fast-moving sports like football. OpenCV is employed for video
processing, enabling frame-by-frame analysis and manipulation.
Together, these tools allow for the identification of key elements on the field, such as
players, the ball, and referees, to extract meaningful insights. By detecting patterns and tracking
movement, the system can aid in analyzing team formations, player strategies, and event
occurrences such as shots on goal, fouls, and passes.
The project enhances tactical insights by offering precise data on player positioning
and movements throughout the match. Additionally, it improves referee monitoring and
decision-making, while enabling teams and coaches to make informed adjustments during and
after games. Overall, it contributes to a deeper understanding of the game, benefiting both
coaching and player development.
The primary problem in football analysis lies in the manual tracking and assessment of
player movements, ball possession, and team strategies, which is time-consuming, prone to errors,
and lacks real-time precision.
The growing demand for more detailed and accurate football analysis, combined with the
limitations of manual video review, creates a need for automated solutions. Current manual
analysis methods are inefficient for tracking complex patterns such as player movements, ball
trajectories, and tactical formations over the course of an entire game. Additionally, the ability to
extract real-time insights during live matches is still underdeveloped, leaving coaches and analysts
dependent on post-match reviews. This bottleneck in real-time, data-driven decision-making has
motivated the development of automated systems for football analysis.
1. Operating System:
o Windows 10/11, Linux (Ubuntu preferred), or macOS.
2. Programming Language:
o Python 3.7 or higher.
3. Libraries and Frameworks:
o YOLO (You Only Look Once): For real-time object detection.
o OpenCV (Open-Source Computer Vision Library): For video and image
processing.
o TensorFlow or PyTorch (optional): required only if the YOLO model is trained from
scratch.
o NumPy: For numerical operations.
o Supervision: for multi-object tracking on top of the YOLO detections (e.g., ByteTrack).
o Matplotlib/Seaborn: For visualizing the results and generating plots (heatmaps,
etc.).
4. Development Environment:
o IDE/Text Editor: PyCharm, VSCode, Jupyter Notebook, or any Python-compatible
IDE.
1. CPU:
CHAPTER 2
LITERATURE SURVEY
This paper presents a method of quantifying ball possession for football sports data
analytics using object detection and object tracking. After comparing the performance of
YOLOv5 and YOLOv8, two state-of-the-art object detection models, the latter was chosen and
used along with ByteTrack for object detection and tracking. The input is a video stream of a
football game taken from a tactical camera, which is passed to the object detection module. The
detected objects are individually tracked, and ball possession is calculated per player by
assigning a unique track-id to each player. Finally, aggregating players’ individual ball
possession into their respective teams provides an estimate of each team’s ball possession.
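To make the aggregation step concrete, per-player possession counts (keyed by track-id) can be summed into team totals, as in the small sketch below; the counts and the track-id-to-team mapping are purely illustrative and are not taken from the cited work.

# Hypothetical per-player possession counts (frames in possession), keyed by track-id
player_possession_frames = {7: 820, 12: 430, 23: 610, 31: 290}
# Hypothetical mapping from track-id to team label
player_team = {7: "A", 12: "A", 23: "B", 31: "B"}

team_frames = {}
for track_id, frames in player_possession_frames.items():
    team = player_team[track_id]
    team_frames[team] = team_frames.get(team, 0) + frames

total = sum(team_frames.values())
team_possession_pct = {team: 100 * f / total for team, f in team_frames.items()}
print(team_possession_pct)  # e.g. {'A': 58.1..., 'B': 41.9...}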
This paper critically reviews existing literature relating to performance analysis (PA) in
football, arguing that an alternative approach is warranted. The paper considers the applicability
of variables analysed along with research findings in the context of their implications for
professional practice. This includes a review of methodological approaches commonly adopted
throughout PA research, including a consideration of the nature and size of the samples used in
relation to generalisability. Definitions and classifications of variables used within performance
analysis are discussed in the context of reliability and validity. The contribution of PA findings to
the field is reviewed. The review identifies an overemphasis on researching predictive and
performance controlling variables. A different approach is proposed that works with and from
performance analysis information to develop research investigating athlete and coach learning,
thus adding to applied practice. Future research should pay attention to the social and cultural
influences that impact PA delivery and athlete learning in applied settings.
Another study addresses trajectory-based ball detection and tracking in broadcast soccer video,
targeting semantic events such as ball possession, which cannot be analyzed from low-level
features alone. The authors used an iterative procedure, more reliable than the earlier naïve
procedure, to process the candidate trajectory. Moreover, they added goalmouth and ellipse
detection to improve the ball size estimation. The experimental results show that the improved
algorithm boosts the accuracy of ball detection and tracking from about 85% to above 96% for
video segments in which all frames without the field have been removed in advance.
CHAPTER 3
SYSTEM ANALYSIS
The football analysis system is designed to automate and enhance the accuracy of tracking
players, referees, and the ball during a match. This system integrates multiple technologies to
overcome the challenges posed by manual methods and traditional video analysis.
Current football evaluations rely on manual analysis of match footage, which is laborious,
time-consuming, and prone to error. Analysts often review video footage after a match to track
player movement, ball position, and overall gameplay. This process often requires a lot of
repetition and lacks the accuracy needed to measure performance, such as tracking player
movements or calculating metrics such as speed and displacement.
Existing systems do provide some automated features, such as basic replay and event
tracking systems, but these are often expensive, require complex setups, and may not be accessible
to all football teams, especially at lower levels. Furthermore, they do not focus on providing in-
depth, accurate data like individual player tracking or ball possession analytics, which are essential
for performance evaluation and strategic planning.
Despite advancements in football analysis technology, there are still several limitations that
hinder its full potential. These limitations arise from issues related to technology, cost, accuracy,
and accessibility, all of which affect the system’s overall efficiency and usability.
Without access to affordable automated solutions, lower-level teams are left with basic video
footage and little-to-no detailed analysis. This limits their ability to use data to enhance player
development or improve match strategies, creating a disparity in the level of analysis between
top-tier and grassroots teams.
By automating these analytical processes, the system generates detailed data on ball
possession, player performance metrics, and overall team dynamics, surpassing the capabilities of
manual analysis. The key functionalities of the system are outlined as follows:
Functionalities:
Beyond individual player analysis, the system provides real-time insights into team
formations and movement patterns. By monitoring positional and tactical dynamics,
coaches can evaluate how effectively their team is executing strategies and adapting to the
opponent’s formation, offering valuable insights for improving team performance.
This automated system not only simplifies football analysis but also offers actionable
insights into both individual and team performance. By incorporating object detection, team
identification, and motion analysis, it provides a powerful tool for post-match analysis and
strategic planning, enhancing the effectiveness of football analysis beyond traditional approaches.
The proposed system offers numerous advantages that significantly improve football analysis.
These include:
By analyzing team formations and movement patterns, the system offers real-time insights
into the effectiveness of strategies employed during the match. Coaches can evaluate how
well their team is adapting to the opposition's tactics and make necessary adjustments to
improve formations and overall gameplay.
6. Data-Driven Decision Making:
The system empowers teams to make more informed, objective decisions by providing a
wealth of data and insights. This supports strategic planning, player performance
assessments, and real-time tactical changes, giving teams a competitive advantage on the
field.
7. Scalability and Versatility:
Whether applied to professional matches or amateur leagues, the system is adaptable to
different levels of competition. Its flexibility also allows it to be tailored to various analysis
objectives, such as player development or team performance enhancement.
CHAPTER 4
SYSTEM DESIGN
The following is a general outline of the high-level design of the football analysis
system:
1. System Overview
• Objective: Analyze football matches using YOLO for object detection and OpenCV for
video processing.
• Components:
o Input Module
o Processing Module
o Output Module
Input Module
Processing Module
Output Module
3. Data Flow
Low-level design (LLD) involves detailing the internal workings of each component
described in the high-level design. It focuses on the implementation aspects, including data
structures, algorithms, and interactions within each module.
A more detailed look at the low-level design of the football analysis system follows:
Data Flow
• Object Tracking -> Event Detection: Analyze tracking data to detect events.
• Event Detection -> Annotate Frame: Annotate frames with tracking and event data.
• Annotate Frame -> Save Video Frame: Save annotated frames to output video.
• Annotate Frame -> Display Results: Display the annotated frames (optional).
• Event Detection -> Save Event Data: Save event data to file.
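The data flow above can be summarised as a single per-frame loop. The sketch below is a simplified illustration; the helper callables (detect_objects, update_tracks, detect_events, annotate) are placeholders standing in for the modules described in this design, not functions from the actual implementation.

import cv2

def run_pipeline(input_path, output_path, detect_objects, update_tracks, detect_events, annotate):
    cap = cv2.VideoCapture(input_path)
    writer = None
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        detections = detect_objects(frame)            # object detection (YOLO)
        tracks = update_tracks(detections)            # object tracking
        events = detect_events(tracks)                # event detection from tracking data
        annotated = annotate(frame, tracks, events)   # annotate frame with tracks and events
        if writer is None:
            h, w = annotated.shape[:2]
            writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'XVID'), 24, (w, h))
        writer.write(annotated)                       # save annotated frame to the output video
    cap.release()
    if writer is not None:
        writer.release()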
CHAPTER 5
DATA COLLECTION AND PREPROCESSING
• RoboFlow:
The dataset used for training the system in this project comes from Roboflow, a platform known
for providing high-quality, labeled datasets tailored for computer vision tasks. This dataset consists of
images related to football, which are used to train the YOLO model for detecting players, referees,
and the ball in various scenarios. The annotations provided with the dataset ensure accurate object
detection and tracking during analysis. By leveraging Roboflow's resources, the project benefits from
a well-curated and comprehensive dataset, contributing to the system's performance and accuracy.
The dataset is downloaded in YOLOv5 format using a Roboflow API key and is used to train the model.
Website link:
https://universe.roboflow.com/roboflow-jvuqo/football-players-detection-bzlaf/dataset/1
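For reference, a dataset download with the roboflow Python package typically looks like the sketch below; the API key is a placeholder, while the workspace, project, and version identifiers are taken from the link above.

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")  # placeholder; the real key is tied to the Roboflow account
project = rf.workspace("roboflow-jvuqo").project("football-players-detection-bzlaf")
dataset = project.version(1).download("yolov5")  # downloads the dataset in YOLOv5 format
print(dataset.location)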
• Kaggle:
Sample input videos titled DFL Bundesliga Shootout were downloaded from Kaggle.
These videos provided real-world football footage, which was used to test and evaluate the
performance of the developed model. The dataset was essential in validating the system's
ability to detect and track key elements such as players, referees, and the ball during actual
gameplay. By working with authentic match footage, the project effectively simulated real-life
conditions, allowing for fine-tuning of object detection and tracking algorithms. This helped
improve the system's accuracy in analyzing player movements, ball possession, and game
dynamics within a football match.
Website link: https://www.kaggle.com/competitions/dfl-bundesliga-data-shootout
The dataset is provided in YOLOv5 format, streamlining the process of training object
detection models. This format is optimized for use with YOLO models, making it easy to directly
integrate the data into the training pipeline. With the pre-annotated images and proper dataset
splitting, this setup ensures efficient and structured training, allowing for improved model
accuracy and faster development cycles.
Data cleaning and preprocessing are essential steps in ensuring that the dataset is suitable for
training machine learning models. In this project, data cleaning involved identifying and removing
any inconsistencies, duplicates, or irrelevant data from the dataset to maintain the quality of the input.
This was necessary to avoid introducing noise into the model during training.
For the Roboflow dataset, this meant reviewing the annotations for accuracy, ensuring that all
instances of the classes (players, referees, goalkeepers, and the ball) were correctly labeled and
removing any mislabeled or incomplete data points. The Kaggle videos were also reviewed to ensure
that only relevant portions of gameplay were used.
Preprocessing included converting the dataset into a format suitable for the YOLO model, such
as resizing and normalizing images to ensure uniformity. The dataset was split into training, validation,
and test sets in an 80-13-7 ratio to ensure balanced training and evaluation phases. The annotations
were converted to the YOLO format, which involved creating bounding boxes around the detected
objects in each image. The videos from Kaggle were preprocessed by extracting frames and feeding
them into the YOLO model for object detection and tracking analysis.
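For illustration, converting a pixel-space bounding box into the YOLO annotation format (class index followed by the box centre and size, all normalised by the image dimensions) can be sketched as follows; the example coordinates are illustrative.

def bbox_to_yolo(x1, y1, x2, y2, img_w, img_h, class_id):
    # YOLO label format: class x_center y_center width height (all normalised to [0, 1])
    x_center = ((x1 + x2) / 2) / img_w
    y_center = ((y1 + y2) / 2) / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a player box in a 1280x720 frame
print(bbox_to_yolo(400, 200, 460, 340, 1280, 720, class_id=2))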
These cleaning and preprocessing steps were crucial for improving the model's performance
and ensuring that it could accurately detect and track objects during a football match.
CHAPTER 6
EXPLORATORY DATA ANALYSIS
The first step in EDA involved analyzing the distribution of the different classes: players,
goalkeepers, referees, and the ball. This helped identify any imbalances in the dataset. A balanced
class distribution is essential for ensuring that the model does not overfit or underfit any specific class.
For instance, if one class, such as players, had significantly more instances than the others, the model
might perform better at detecting players but struggle with other classes.
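Class balance can be checked directly from the YOLO label files, as in the sketch below; the label directory path follows the layout of the Roboflow download, and the class-index mapping shown is an assumption that should be verified against the dataset's data.yaml.

from collections import Counter
from pathlib import Path

class_names = {0: "ball", 1: "goalkeeper", 2: "player", 3: "referee"}  # assumed index mapping
counts = Counter()

for label_file in Path("football-players-detection-1/train/labels").glob("*.txt"):
    for line in label_file.read_text().splitlines():
        if line.strip():
            class_id = int(line.split()[0])
            counts[class_names.get(class_id, class_id)] += 1

print(counts)  # e.g. Counter({'player': ..., 'referee': ..., 'goalkeeper': ..., 'ball': ...})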
The dataset contained 650 annotated images, split into 80% training, 13% validation, and 7%
testing sets. EDA explored the image dimensions and quality to ensure consistency. Variations in
image sizes can affect model performance, especially in object detection tasks, so any significant
discrepancies in image quality were identified and addressed during preprocessing.
The annotated bounding boxes in the images were inspected for size and placement accuracy.
EDA revealed the average size of bounding boxes for each class, which helped in fine-tuning the
YOLO model. Small objects like the ball might require different settings compared to larger objects
like players or goalkeepers.
EDA also examined how frequently certain objects appeared in the images and how many
objects were present in each image. For instance, most images likely contained multiple players, while
only a few had a referee or the ball. This analysis helped understand the dataset’s complexity and
allowed for adjustments in model training to ensure all objects were accurately detected.
For the videos downloaded from Kaggle (DFL Bundesliga Shootout), frame-by-frame analysis
was performed to check the distribution of events and key moments, such as when the ball was in play
or when significant player actions occurred. This step ensured that the video frames used for training
were representative of typical football matches and contained enough diverse scenarios for the model
to learn from.
CHAPTER 7
METHODOLOGIES
The objective of this project is to develop a system capable of real-time analysis of football
matches by detecting and tracking key elements such as players and the ball using YOLO (You
Only Look Once) object detection, OpenCV for video processing, and Python as the development
framework.
1. Data Collection
• Video Input:
o The system takes in either live football match footage or pre-recorded videos as
input. The video can be sourced from publicly available matches or custom
recordings.
o OpenCV is used to capture video frames, which are further processed for object
detection.
2. Pre-processing
• Frame Extraction:
o Each video is broken down into individual frames using OpenCV. This allows the
model to analyze each frame as an image for object detection.
• Frame Resizing:
o Frames are resized to fit the input dimensions required by the YOLO model (e.g.,
640x640 pixels for the YOLOv5 model used in this project), ensuring consistent object detection.
• Normalization:
o The pixel values of the frames are normalized to ensure optimal performance of the
YOLO model, which improves detection accuracy and speed.
• Object Detection:
o Pre-trained YOLO weights are used to detect players, the ball, and referees.
Bounding boxes are drawn around detected objects in each frame, and class labels
are assigned to the objects (a short sketch follows this list).
• Class Customization:
o The YOLO model is customized to focus on specific classes relevant to football
analysis, such as players, the ball, and referees. Additional classes can be added
based on the analysis needs.
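A minimal sketch of the pre-processing and detection steps above, assuming the ultralytics package and the fine-tuned weights file models/best.pt used in the appendix:

import cv2
from ultralytics import YOLO

model = YOLO("models/best.pt")  # fine-tuned weights covering player, referee and ball classes

cap = cv2.VideoCapture("input_videos/08fd33_4.mp4")
ret, frame = cap.read()   # one extracted frame; in practice every frame is processed
cap.release()

if ret:
    resized = cv2.resize(frame, (640, 640))   # resize to the model's input size
    # pixel normalization is handled internally by the ultralytics predictor
    results = model.predict(resized, conf=0.1)
    for box in results[0].boxes:
        print(box.xyxy, box.cls, box.conf)    # bounding box, class id, confidence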
2. OpenCV
• Preprocessing:
o Video Enhancement: OpenCV helps in preparing the input video for analysis by
performing tasks such as resizing to ensure consistent frame dimensions,
normalizing pixel values for uniformity, and applying filters to reduce noise. This
step is crucial for improving the performance and accuracy of the YOLOv5 model
by providing cleaner and more standardized input data.
• Post-processing:
o Detection Visualization: After YOLOv5 detects objects (players, ball) in the video
frames, OpenCV is used to draw bounding boxes, labels, and other annotations on
these frames. This visualization aids in interpreting and validating detection results.
o Tracking and Analysis: OpenCV’s tracking algorithms can be employed to follow
the movement of players and the ball across frames. This helps in analyzing
dynamic events, such as player interactions and ball trajectories, by providing
continuous tracking data and insights.
• Pixel Segmentation with K-Means: K-Means clustering is used to segment pixels based on
their RGB values. Each pixel in the video frame is treated as a data point in a 3D color space
(representing the red, green, and blue channels). By grouping similar-colored pixels together,
K-Means effectively segments the dominant colors in the frame, such as the t-shirt colors
of players.
• Clustering for T-shirt Color Detection: K-Means is applied with a chosen number of clusters (K)
corresponding to the expected number of distinct t-shirt colors on the field. This clustering
groups pixels of similar colors, identifying which pixels correspond to players’ t-shirts and
enabling clear differentiation between teams based on uniform color (a short sketch follows this list).
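A minimal sketch of this colour clustering on a cropped shirt region, using scikit-learn. The crop below is a placeholder array, and taking the larger cluster as the shirt colour is a simplification; the actual team assigner in the appendix uses a corner-pixel heuristic to separate shirt from background.

import numpy as np
from sklearn.cluster import KMeans

# Placeholder for the top half of a player crop (shirt region), shape (H, W, 3)
shirt_region = np.random.randint(0, 256, size=(40, 30, 3), dtype=np.uint8)

pixels = shirt_region.reshape(-1, 3)                  # each pixel becomes a point in 3D colour space
kmeans = KMeans(n_clusters=2, n_init=10).fit(pixels)  # two clusters: shirt colour vs. background

# The cluster centres are the two dominant colours; here the larger cluster is taken as the shirt
labels, counts = np.unique(kmeans.labels_, return_counts=True)
shirt_color = kmeans.cluster_centers_[labels[np.argmax(counts)]]
print(shirt_color)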
1. Data Preparation
o The first step in building the model involved preparing the dataset, which consisted
of annotated images of players, referees, the ball, and the goalkeeper from
Roboflow. These images were split into training, validation, and test sets in an
80:13:7 ratio, ensuring that the model had sufficient data for learning and
evaluation.
o Data augmentation techniques were applied to the training images, such as random
rotations, flips, and scaling, to improve model generalization. Augmentation helps
simulate various real-world scenarios and increases the diversity of the training
data, reducing overfitting.
2. Model Selection (YOLO)
o The core of the model was YOLO, a state-of-the-art, real-time object detection
algorithm. YOLO was chosen because of its ability to detect multiple objects
(players, ball, referees) in a single pass over the image, making it highly efficient
for analyzing fast-paced sports like football.
o YOLO works by dividing an image into a grid and predicting bounding boxes for
the objects along with their class probabilities. The model was trained to detect
specific football-related classes, including "player," "referee," "ball," and
"goalkeeper."
3. Transfer Learning
o Transfer learning was implemented to leverage a pre-trained YOLO model,
specifically YOLOv5, which had already learned basic features from a large,
generic dataset like COCO. This approach significantly reduced training time and
improved accuracy by adapting the model to the specific football dataset.
o The pre-trained YOLOv5 model was fine-tuned using the football dataset, allowing
it to specialize in detecting the defined football objects with greater accuracy.
4. Training the Model
o The model was trained using the Roboflow dataset, with the goal of minimizing the
loss function, which measures the difference between the predicted and actual
bounding boxes and class labels.
o During training, key hyperparameters such as learning rate, batch size, number of
epochs, and image size were fine-tuned to optimize model performance (a training sketch in Python follows this list).
5. Object Detection
o Once trained, the model was able to accurately detect and label players, referees,
and the ball within video frames. Bounding boxes around each object provided
exact locations of these elements, while class labels (player, ball, referee) ensured
correct identification.
o The YOLO model’s speed and accuracy were crucial for processing football videos,
where fast-moving objects and frequent player interactions demand quick and
precise detection.
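The fine-tuning and training steps described above can be sketched with the ultralytics Python API as follows. The batch size and learning rate are illustrative values not specified in this report, and the data.yaml path follows the directory layout of the Roboflow download used in the training notebook; the appendix performs the equivalent step with the yolo command-line call.

from ultralytics import YOLO

# Start from pre-trained YOLOv5 weights (COCO) and fine-tune on the football dataset
model = YOLO("yolov5x.pt")
model.train(
    data="football-players-detection-1/data.yaml",  # dataset configuration from the Roboflow download
    epochs=100,
    imgsz=640,
    batch=16,   # illustrative value
    lr0=0.01    # illustrative initial learning rate
)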
7.4 RESULTS
For this football analysis project, the following evaluation metrics were used to determine how
well the model performed in detecting and tracking players, referees, and the ball, as well as in
delivering actionable insights.
1. Precision
o Precision measures the accuracy of the positive predictions made by the model. In
object detection, precision refers to the proportion of correctly detected objects
(e.g., players, referees, or the ball) out of all the objects that the model predicted as
positive.
o Formula: Precision = TP / (TP + FP), where TP is the number of true positives and FP
the number of false positives.
o A high precision score indicates that the model has fewer false positives, meaning
it accurately detects objects without misclassifying too many non-objects.
2. Recall
o Recall measures the model's ability to correctly identify all relevant objects in the
dataset. In object detection, this means detecting all players, referees, and the ball
present in the images.
o Formula: Recall = TP / (TP + FN), where FN is the number of false negatives
(missed detections).
o A high recall score indicates that the model is able to detect most, if not all, of the
relevant objects, minimizing missed detections.
3. Mean Average Precision (mAP)
o mAP is a key metric for object detection models. It computes the average precision
at various Intersection over Union (IoU) thresholds, giving a more comprehensive
view of the model’s accuracy in detecting objects across multiple scales.
o IoU measures the overlap between the predicted bounding box and the ground truth
bounding box.
o Formula: mAP = (1/N) × Σ APi, where N is the number of classes and APi is the
average precision for class i.
o A higher mAP score indicates better overall performance of the model in detecting
and localizing objects (a small worked example follows this list).
4. Confusion Matrix
o A confusion matrix is a table used to evaluate the performance of a classification
model. It provides a visual representation of true positives, false positives, true
negatives, and false negatives.
o For this project, the confusion matrix would illustrate how well the model classified
objects into the four classes: players, referees, goalkeepers, and the ball.
o It helps identify where the model struggles, such as confusing players for referees
or misclassifying background objects as players.
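As a small worked example of these metrics (the counts and per-class average precision values below are illustrative and are not results from this project):

# Illustrative detection counts for a single class
tp, fp, fn = 90, 10, 20

precision = tp / (tp + fp)   # 0.900
recall = tp / (tp + fn)      # ~0.818

# Illustrative per-class average precision values for the four classes
ap = {"player": 0.92, "goalkeeper": 0.85, "referee": 0.88, "ball": 0.71}
map_score = sum(ap.values()) / len(ap)   # mean Average Precision = 0.84

print(f"precision={precision:.3f}, recall={recall:.3f}, mAP={map_score:.3f}")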
CHAPTER 8
TESTING
Fig. 8.1: Output screenshot showing player speed and distance covered, ball control of each team,
player numbers, team t-shirt colored bounding boxes, and ball acquisition
Fig. 8.2: Output screenshot with no ball acquisition, as no player is near the ball
CHAPTER 9
FUTURE ENHANCEMENTS
• Tactical Analysis and Formations: The system can be enhanced to automatically recognize
and analyze team formations, transitions between offensive and defensive strategies, and
positional heatmaps. This would provide coaches and analysts with detailed insights into
team dynamics and overall tactics.
• Player Performance Metrics: Additional performance metrics can be extracted, such as
player stamina, speed, and distance covered throughout the match. This data could be
valuable for individual player performance analysis.
• Pose Estimation Models: Integrating pose estimation techniques could allow for the
identification of specific player movements, such as dribbling, kicking, or defensive
postures. Pose estimation models like OpenPose could help in identifying player positions
with higher granularity.
• Action Recognition Models: Future versions of the system could detect and classify player
actions, such as passing, shooting, or tackling, using action recognition models. This would
allow for more detailed player performance metrics and tactical analysis.
Multi-Camera Integration
• Multi-Angle Video Analysis: By integrating footage from multiple camera angles, the
system could provide a 3D analysis of the game. This would improve object detection
accuracy, especially in scenarios where one camera angle might obstruct the view of
players or the ball.
• 3D Player Tracking: Multi-camera systems could enable 3D reconstruction of the football
field, allowing more accurate tracking of players' positions, distances between players, and
ball trajectories.
Cross-Sport Adaptation
• Generalizing to Other Sports: While the system is currently designed for football, the
underlying methodology can be adapted to other sports such as basketball, tennis, or
hockey. This would allow for broader applications and the ability to analyze multiple types
of team sports using a similar framework.
• Automated Referee Decisions: The system could be enhanced to assist referees in real-time
by identifying fouls, offsides, and other infringements more accurately, potentially
reducing human error in officiating decisions. This could be implemented as a tool for
Video Assistant Referee (VAR) systems.
• Instant Replay and Review System: The system could offer an instant replay function,
highlighting critical moments in the match for referees to review in real-time, providing
additional insights into their decision-making process.
CHAPTER 10
BIBLIOGRAPHY
Journal Articles:
1. Morgulev E, Azar OH, Lidor R. Sports analytics and the big-data era. International Journal of
Data Science and Analytics. 2018; 5:213-22.
2. Mackenzie R, Cushion C. Performance analysis in football: A critical review and
implications for future research. Journal of sports sciences. 2013;31(6):639-76.
3. Gong B, Cui Y, Zhang S, Zhou C, Yi Q, Gómez-Ruano MÁ. Impact of technical and
physical key performance indicators on ball possession in the Chinese Super League.
International Journal of Performance Analysis in Sport. 2021;21(6):909-21.
4. Lago-Peñas C, Dellal A. Ball possession strategies in elite soccer according to the evolution
of the match-score: the influence of situational variables. Journal of human kinetics.
2010;25(2010):93-100.
5. Bradley PS, Lago-Peñas C, Rey E, Sampaio J. The influence of situational variables on ball
possession in the English Premier League. Journal of Sports Sciences. 2014;32(20):1867-73.
6. Dimopoulos GR. Implementing Multi-Class Object Detection in Soccer Matches Through
YOLOv5.
7. Cioppa A, Giancola S, Deliège A, Kang L, Zhou X, Cheng Z, et al. SoccerNet-Tracking:
Multiple Object Tracking Dataset and Benchmark in Soccer Videos. In: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops;
2022. p. 3491-502.
8. Yu X, Xu C, Leong HW, Tian Q, Tang Q, Wan KW. Trajectory-based ball detection and
tracking with applications to semantic analysis of broadcast soccer video. In: Proceedings of
the eleventh ACM international conference on Multimedia; 2003. p. 11-20.
9. Cortes C, Vapnik V. Support-vector networks. Machine learning. 1995;20: 273-97.
10. Castellano J, Casamichana D, Lago C. The use of match statistics that discriminate between
successful and unsuccessful soccer teams. Journal of human kinetics. 2012; 31:139.
Conference Papers:
1. Iyer GN, Bala VS, Sohan B, Dharmesh R, Raman V. Automated third umpire decision
making in cricket using machine learning techniques. In: 2020 4th International Conference
on Intelligent Computing and Control Systems (ICICCS). IEEE; 2020. p. 1216-21.
Online Resources:
1. Roboflow football players detection dataset: https://universe.roboflow.com/roboflow-jvuqo/football-players-detection-bzlaf/dataset/1
2. Kaggle DFL Bundesliga Data Shootout: https://www.kaggle.com/competitions/dfl-bundesliga-data-shootout
CHAPTER 11
APPENDIX
1. Sample Source Code/Pseudo Code
i. main.py:
from utils import read_video, save_video
# Module paths below follow the file names shown in this appendix; the camera movement
# estimator module is not reproduced here.
from trackers import Tracker
from team_assigner import TeamAssigner
from player_ball_assigner import PlayerBallAssigner
from view_transformer import ViewTransformer
from speed_and_distance import SpeedAndDistance_Estimator
from camera_movement_estimator import CameraMovementEstimator
import cv2
import numpy as np

def main():
    # Read Video
    video_frames = read_video('input_videos/08fd33_4.mp4')

    # Initialize Tracker
    tracker = Tracker('models/best.pt')
    tracks = tracker.get_object_tracks(video_frames,
                                       read_from_stub=True,
                                       stub_path='stubs/track_stubs.pkl')
    tracker.add_position_to_tracks(tracks)

    # Camera movement estimation
    camera_movement_estimator = CameraMovementEstimator(video_frames[0])
    camera_movement_per_frame = camera_movement_estimator.get_camera_movement(
        video_frames,
        read_from_stub=True,
        stub_path='stubs/camera_movement_stub.pkl')
    camera_movement_estimator.add_adjust_positions_to_tracks(tracks, camera_movement_per_frame)

    # View Transformer
    view_transformer = ViewTransformer()
    view_transformer.add_transformed_position_to_tracks(tracks)

    # Interpolate missing ball detections
    tracks["ball"] = tracker.interpolate_ball_positions(tracks["ball"])

    # Speed and distance estimation
    speed_and_distance_estimator = SpeedAndDistance_Estimator()
    speed_and_distance_estimator.add_speed_and_distance_to_tracks(tracks)

    # Assign player teams from shirt colors
    team_assigner = TeamAssigner()
    team_assigner.assign_team_color(video_frames[0], tracks['players'][0])

    for frame_num, player_track in enumerate(tracks['players']):
        for player_id, track in player_track.items():
            team = team_assigner.get_player_team(video_frames[frame_num],
                                                 track['bbox'],
                                                 player_id)
            tracks['players'][frame_num][player_id]['team'] = team
            tracks['players'][frame_num][player_id]['team_color'] = team_assigner.team_colors[team]

    # Assign ball possession per frame
    player_assigner = PlayerBallAssigner()
    team_ball_control = []
    for frame_num, player_track in enumerate(tracks['players']):
        ball_bbox = tracks['ball'][frame_num][1]['bbox']
        assigned_player = player_assigner.assign_ball_to_player(player_track, ball_bbox)

        if assigned_player != -1:
            tracks['players'][frame_num][assigned_player]['has_ball'] = True
            team_ball_control.append(tracks['players'][frame_num][assigned_player]['team'])
        else:
            team_ball_control.append(team_ball_control[-1])
    team_ball_control = np.array(team_ball_control)

    # Draw output: object annotations, camera movement and speed/distance overlays
    output_video_frames = tracker.draw_annotations(video_frames, tracks, team_ball_control)
    output_video_frames = camera_movement_estimator.draw_camera_movement(output_video_frames,
                                                                         camera_movement_per_frame)
    speed_and_distance_estimator.draw_speed_and_distance(output_video_frames, tracks)

    # Save video
    save_video(output_video_frames, 'output_videos/output_video.avi')

if __name__ == '__main__':
    main()
ii. yolo_inference.py:
from ultralytics import YOLO

model = YOLO('models/best.pt')

results = model.predict('input_videos/08fd33_4.mp4', save=True)
print(results[0])
print('=====================================')
for box in results[0].boxes:
    print(box)
Utils:
iii. video_utils.py:
import cv2

def read_video(video_path):
    # Read all frames from the input video into a list
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(frame)
    cap.release()
    return frames

def save_video(output_video_frames, output_video_path):
    # Write the annotated frames back to disk at 24 fps
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(output_video_path, fourcc, 24,
                          (output_video_frames[0].shape[1], output_video_frames[0].shape[0]))
    for frame in output_video_frames:
        out.write(frame)
    out.release()
iv. bbox_utils.py:
def get_center_of_bbox(bbox):
    x1, y1, x2, y2 = bbox
    return int((x1 + x2) / 2), int((y1 + y2) / 2)

def get_bbox_width(bbox):
    return bbox[2] - bbox[0]

def measure_distance(p1, p2):
    # Euclidean distance between two points
    return ((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2)**0.5

def measure_xy_distance(p1, p2):
    return p1[0] - p2[0], p1[1] - p2[1]

def get_foot_position(bbox):
    # Bottom-centre of the bounding box, used as the player's ground position
    x1, y1, x2, y2 = bbox
    return int((x1 + x2) / 2), int(y2)
Training
football_training_yolo_v5.ipynb:
import shutil

# Move the downloaded splits into the directory layout expected by the training command
shutil.move('football-players-detection-1/test',
            'football-players-detection-1/football-players-detection-1/test')
shutil.move('football-players-detection-1/valid',
            'football-players-detection-1/football-players-detection-1/valid')

!yolo task=detect mode=train model=yolov5x.pt data={dataset.location}/data.yaml epochs=100 imgsz=640
Trackers:
trackers.py:
from ultralytics import YOLO
import supervision as sv
import pickle
import os
import numpy as np
import pandas as pd
import cv2
import sys
sys.path.append('../')
from utils import get_center_of_bbox, get_bbox_width, get_foot_position
class Tracker:
    def __init__(self, model_path):
        self.model = YOLO(model_path)
        self.tracker = sv.ByteTrack()

    def add_position_to_tracks(self, tracks):
        # Attach a reference position to every track: foot point for people, centre for the ball
        for object, object_tracks in tracks.items():
            for frame_num, track in enumerate(object_tracks):
                for track_id, track_info in track.items():
                    bbox = track_info['bbox']
                    position = get_center_of_bbox(bbox) if object == 'ball' else get_foot_position(bbox)
                    tracks[object][frame_num][track_id]['position'] = position

    def interpolate_ball_positions(self, ball_positions):
        # Fill frames with a missed ball detection by interpolating the bounding box over time
        ball_positions = [x.get(1, {}).get('bbox', []) for x in ball_positions]
        df_ball_positions = pd.DataFrame(ball_positions, columns=['x1', 'y1', 'x2', 'y2'])
        df_ball_positions = df_ball_positions.interpolate().bfill()
        return [{1: {"bbox": x}} for x in df_ball_positions.to_numpy().tolist()]

    def detect_frames(self, frames):
        # Run YOLO detection in batches to keep memory usage manageable
        batch_size = 20
        detections = []
        for i in range(0, len(frames), batch_size):
            detections += self.model.predict(frames[i:i + batch_size], conf=0.1)
        return detections

    def get_object_tracks(self, frames, read_from_stub=False, stub_path=None):
        # Reuse previously computed tracks from a pickle stub when available
        if read_from_stub and stub_path is not None and os.path.exists(stub_path):
            with open(stub_path, 'rb') as f:
                return pickle.load(f)

        detections = self.detect_frames(frames)
        tracks = {"players": [], "referees": [], "ball": []}

        for frame_num, detection in enumerate(detections):
            cls_names = detection.names
            cls_names_inv = {v: k for k, v in cls_names.items()}
            detection_supervision = sv.Detections.from_ultralytics(detection)

            # Track Objects
            detection_with_tracks = self.tracker.update_with_detections(detection_supervision)
            tracks["players"].append({})
            tracks["referees"].append({})
            tracks["ball"].append({})

            for frame_detection in detection_with_tracks:
                bbox = frame_detection[0].tolist()
                cls_id = frame_detection[3]
                track_id = frame_detection[4]
                if cls_id == cls_names_inv['player']:
                    tracks["players"][frame_num][track_id] = {"bbox": bbox}
                if cls_id == cls_names_inv['referee']:
                    tracks["referees"][frame_num][track_id] = {"bbox": bbox}

            # The ball is taken directly from the detections (it keeps a fixed id of 1)
            for frame_detection in detection_supervision:
                bbox = frame_detection[0].tolist()
                cls_id = frame_detection[3]
                if cls_id == cls_names_inv['ball']:
                    tracks["ball"][frame_num][1] = {"bbox": bbox}

        if stub_path is not None:
            with open(stub_path, 'wb') as f:
                pickle.dump(tracks, f)
        return tracks
    def draw_ellipse(self, frame, bbox, color, track_id=None):
        y2 = int(bbox[3])
        x_center, _ = get_center_of_bbox(bbox)
        width = get_bbox_width(bbox)

        # Ellipse under the player's feet, coloured with the team colour
        cv2.ellipse(
            frame,
            center=(x_center, y2),
            axes=(int(width), int(0.35 * width)),
            angle=0.0,
            startAngle=-45,
            endAngle=235,
            color=color,
            thickness=2,
            lineType=cv2.LINE_4
        )

        # Small rectangle below the ellipse holding the track id
        rectangle_width = 40
        rectangle_height = 20
        x1_rect = x_center - rectangle_width // 2
        x2_rect = x_center + rectangle_width // 2
        y1_rect = (y2 - rectangle_height // 2) + 15
        y2_rect = (y2 + rectangle_height // 2) + 15

        if track_id is not None:
            cv2.rectangle(frame,
                          (int(x1_rect), int(y1_rect)),
                          (int(x2_rect), int(y2_rect)),
                          color,
                          cv2.FILLED)

            x1_text = x1_rect + 12
            if track_id > 99:
                x1_text -= 10

            cv2.putText(
                frame,
                f"{track_id}",
                (int(x1_text), int(y1_rect + 15)),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.6,
                (0, 0, 0),
                2
            )

        return frame
    def draw_traingle(self, frame, bbox, color):
        # Small triangle above the ball (or above the player in possession)
        y = int(bbox[1])
        x, _ = get_center_of_bbox(bbox)

        triangle_points = np.array([
            [x, y],
            [x - 10, y - 20],
            [x + 10, y - 20],
        ])
        cv2.drawContours(frame, [triangle_points], 0, color, cv2.FILLED)
        cv2.drawContours(frame, [triangle_points], 0, (0, 0, 0), 2)

        return frame
    def draw_team_ball_control(self, frame, frame_num, team_ball_control):
        # Draw a semi-transparent rectangle as background for the possession figures
        overlay = frame.copy()
        cv2.rectangle(overlay, (1350, 850), (1900, 970), (255, 255, 255), -1)
        alpha = 0.4
        cv2.addWeighted(overlay, alpha, frame, 1 - alpha, 0, frame)

        team_ball_control_till_frame = team_ball_control[:frame_num + 1]

        # Number of frames in which each team had ball control up to the current frame
        team_1_num_frames = team_ball_control_till_frame[team_ball_control_till_frame == 1].shape[0]
        team_2_num_frames = team_ball_control_till_frame[team_ball_control_till_frame == 2].shape[0]
        team_1 = team_1_num_frames / (team_1_num_frames + team_2_num_frames)
        team_2 = team_2_num_frames / (team_1_num_frames + team_2_num_frames)

        cv2.putText(frame, f"Team 1 Ball Control: {team_1 * 100:.2f}%", (1400, 900),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3)
        cv2.putText(frame, f"Team 2 Ball Control: {team_2 * 100:.2f}%", (1400, 950),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3)

        return frame
    def draw_annotations(self, video_frames, tracks, team_ball_control):
        output_video_frames = []
        for frame_num, frame in enumerate(video_frames):
            frame = frame.copy()

            player_dict = tracks["players"][frame_num]
            ball_dict = tracks["ball"][frame_num]
            referee_dict = tracks["referees"][frame_num]

            # Draw Players
            for track_id, player in player_dict.items():
                color = player.get("team_color", (0, 0, 255))
                frame = self.draw_ellipse(frame, player["bbox"], color, track_id)

                if player.get('has_ball', False):
                    frame = self.draw_traingle(frame, player["bbox"], (0, 0, 255))

            # Draw Referees
            for _, referee in referee_dict.items():
                frame = self.draw_ellipse(frame, referee["bbox"], (0, 255, 255))

            # Draw Ball
            for track_id, ball in ball_dict.items():
                frame = self.draw_traingle(frame, ball["bbox"], (0, 255, 0))

            # Draw team ball control statistics
            frame = self.draw_team_ball_control(frame, frame_num, team_ball_control)

            output_video_frames.append(frame)

        return output_video_frames
Team Assigner:
team_assigner.py:
from sklearn.cluster import KMeans

class TeamAssigner:
    def __init__(self):
        self.team_colors = {}
        self.player_team_dict = {}

    def get_clustering_model(self, image):
        # Reshape the image to a 2D array of pixels and cluster into two colours
        image_2d = image.reshape(-1, 3)
        kmeans = KMeans(n_clusters=2, init="k-means++", n_init=1).fit(image_2d)
        return kmeans

    def get_player_color(self, frame, bbox):
        image = frame[int(bbox[1]):int(bbox[3]), int(bbox[0]):int(bbox[2])]
        top_half_image = image[0:int(image.shape[0] / 2), :]

        # Cluster the top half of the crop (the shirt area) into two colours
        kmeans = self.get_clustering_model(top_half_image)
        labels = kmeans.labels_.reshape(top_half_image.shape[0], top_half_image.shape[1])

        # The corner pixels are assumed to be background; the remaining cluster is the shirt
        corner_clusters = [labels[0, 0], labels[0, -1], labels[-1, 0], labels[-1, -1]]
        non_player_cluster = max(set(corner_clusters), key=corner_clusters.count)
        player_cluster = 1 - non_player_cluster

        player_color = kmeans.cluster_centers_[player_cluster]
        return player_color

    def assign_team_color(self, frame, player_detections):
        player_colors = []
        for _, player_detection in player_detections.items():
            bbox = player_detection["bbox"]
            player_color = self.get_player_color(frame, bbox)
            player_colors.append(player_color)

        # Cluster all shirt colours into two teams
        kmeans = KMeans(n_clusters=2, init="k-means++", n_init=10).fit(player_colors)
        self.kmeans = kmeans
        self.team_colors[1] = kmeans.cluster_centers_[0]
        self.team_colors[2] = kmeans.cluster_centers_[1]

    def get_player_team(self, frame, player_bbox, player_id):
        if player_id in self.player_team_dict:
            return self.player_team_dict[player_id]

        player_color = self.get_player_color(frame, player_bbox)
        team_id = self.kmeans.predict(player_color.reshape(1, -1))[0]
        team_id += 1

        # Hard-coded correction for the goalkeeper track in the sample video
        if player_id == 91:
            team_id = 1

        self.player_team_dict[player_id] = team_id
        return team_id
speed_and_distance.py:
import cv2
import sys
sys.path.append('../')
from utils import measure_distance, get_foot_position

class SpeedAndDistance_Estimator():
    def __init__(self):
        self.frame_window = 5   # number of frames between speed measurements
        self.frame_rate = 24    # frames per second of the input video

    def add_speed_and_distance_to_tracks(self, tracks):
        total_distance = {}

        for object, object_tracks in tracks.items():
            if object == "ball" or object == "referees":
                continue
            number_of_frames = len(object_tracks)
            for frame_num in range(0, number_of_frames, self.frame_window):
                last_frame = min(frame_num + self.frame_window, number_of_frames - 1)

                for track_id, _ in object_tracks[frame_num].items():
                    if track_id not in object_tracks[last_frame]:
                        continue

                    start_position = object_tracks[frame_num][track_id]['position_transformed']
                    end_position = object_tracks[last_frame][track_id]['position_transformed']
                    if start_position is None or end_position is None:
                        continue

                    distance_covered = measure_distance(start_position, end_position)
                    time_elapsed = (last_frame - frame_num) / self.frame_rate
                    speed_meteres_per_second = distance_covered / time_elapsed
                    speed_km_per_hour = speed_meteres_per_second * 3.6

                    if object not in total_distance:
                        total_distance[object] = {}
                    if track_id not in total_distance[object]:
                        total_distance[object][track_id] = 0
                    total_distance[object][track_id] += distance_covered

                    # Write the speed and cumulative distance onto every frame in this window
                    for frame_num_batch in range(frame_num, last_frame):
                        if track_id not in tracks[object][frame_num_batch]:
                            continue
                        tracks[object][frame_num_batch][track_id]['speed'] = speed_km_per_hour
                        tracks[object][frame_num_batch][track_id]['distance'] = total_distance[object][track_id]

    def draw_speed_and_distance(self, frames, tracks):
        output_frames = []
        for frame_num, frame in enumerate(frames):
            for object, object_tracks in tracks.items():
                if object == "ball" or object == "referees":
                    continue
                for _, track_info in object_tracks[frame_num].items():
                    if "speed" in track_info:
                        speed = track_info.get('speed', None)
                        distance = track_info.get('distance', None)
                        if speed is None or distance is None:
                            continue

                        bbox = track_info['bbox']
                        position = get_foot_position(bbox)
                        position = list(position)
                        position[1] += 40
                        position = tuple(map(int, position))

                        cv2.putText(frame, f"{speed:.2f} km/h", position,
                                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2)
                        cv2.putText(frame, f"{distance:.2f} m", (position[0], position[1] + 20),
                                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2)
            output_frames.append(frame)
        return output_frames
player_ball_assigner.py:
import sys
sys.path.append('../')
from utils import get_center_of_bbox, measure_distance

class PlayerBallAssigner():
    def __init__(self):
        # Maximum pixel distance between a player's foot and the ball to count as possession
        self.max_player_ball_distance = 70

    def assign_ball_to_player(self, players, ball_bbox):
        ball_position = get_center_of_bbox(ball_bbox)

        minimum_distance = 99999
        assigned_player = -1

        for player_id, player in players.items():
            player_bbox = player['bbox']

            # Distance from each foot (bottom corners of the box) to the ball centre
            distance_left = measure_distance((player_bbox[0], player_bbox[-1]), ball_position)
            distance_right = measure_distance((player_bbox[2], player_bbox[-1]), ball_position)
            distance = min(distance_left, distance_right)

            if distance < self.max_player_ball_distance and distance < minimum_distance:
                minimum_distance = distance
                assigned_player = player_id

        return assigned_player
view_transformer.py:
import numpy as np
import cv2

class ViewTransformer():
    def __init__(self):
        # Real-world dimensions (in metres) of the court area covered by the camera view
        court_width = 68
        court_length = 23.32

        # Pixel-space corners of the visible court area; the first vertex was truncated in this
        # excerpt, so an illustrative value is shown here
        self.pixel_vertices = np.array([[110, 1035],
                                        [265, 275],
                                        [910, 260],
                                        [1640, 915]])

        self.target_vertices = np.array([
            [0, court_width],
            [0, 0],
            [court_length, 0],
            [court_length, court_width]
        ])

        self.pixel_vertices = self.pixel_vertices.astype(np.float32)
        self.target_vertices = self.target_vertices.astype(np.float32)

        self.persepctive_trasnformer = cv2.getPerspectiveTransform(self.pixel_vertices,
                                                                   self.target_vertices)

    def transform_point(self, point):
        # Only transform points that fall inside the calibrated court polygon
        p = (int(point[0]), int(point[1]))
        is_inside = cv2.pointPolygonTest(self.pixel_vertices, p, False) >= 0
        if not is_inside:
            return None

        reshaped_point = point.reshape(-1, 1, 2).astype(np.float32)
        tranform_point = cv2.perspectiveTransform(reshaped_point, self.persepctive_trasnformer)
        return tranform_point.reshape(-1, 2)

    def add_transformed_position_to_tracks(self, tracks):
        # Convert each track's camera-adjusted pixel position into court coordinates (metres)
        for object, object_tracks in tracks.items():
            for frame_num, track in enumerate(object_tracks):
                for track_id, track_info in track.items():
                    position = track_info['position_adjusted']
                    position = np.array(position)
                    position_trasnformed = self.transform_point(position)
                    if position_trasnformed is not None:
                        position_trasnformed = position_trasnformed.squeeze().tolist()
                    tracks[object][frame_num][track_id]['position_transformed'] = position_trasnformed