Data Mining Methods For Performance Analysis in Youth Football - A Review of Literature
Khachik Tadevosyan
Friedrich-Alexander-University of Erlangen-Nürnberg
All content following this page was uploaded by Khachik Tadevosyan on 15 March 2021.
Ardian Palloshi
Title
Abstract
Introduction
Methodology
Results
Discussion
References
Abstract:
Data mining and machine learning methods are gaining popularity, moving from their theoretical
background to practical application in various domains. In this study, we conducted a
literature review of the data mining and machine learning methods that are being used
to analyse and improve the performance of youth football (soccer) players. We followed the
PRISMA process to perform the literature review. We searched various databases (e.g.,
Research Gate, Google Scholar) and identified a total of 58 articles relevant to the topic
that were published between 2000 and 2020. After applying the eligibility criteria and
further screening, we included 12 articles in the literature review. The topics related to
youth football player performance that we identified are injury risk assessment, small-sided
games, prediction of player skill and future success, talent identification and selection,
and passing network dynamics. The methods studied in these articles are Shapley additive
explanations (SHAP), artificial neural networks, decision trees, regression (logistic/linear),
association rules, network centrality, unsupervised learning, one-way analysis of variance
(ANOVA) and other statistical methods. We observed that artificial neural networks offer
relatively high performance and classification speed when applied to complex,
high-dimensional problems, but they are very difficult to explain. The SHAP method, on the
other hand, offers relatively high explainability and interpretability. The tools used in the
studies to analyse youth football player performance include XGBoost, IBM SPSS Statistics and
the R programming language. However, some of the studies do not explicitly state which tools
were used. We have structured the review into four broad sections: Introduction, Methods,
Results and Discussion. In the Methods section, we describe the eligibility criteria,
information sources, search technique and study selection process.
Introduction:
Association football, more commonly known as football or soccer, is the most popular sport
in the world, with around 260 million players in over 200 countries (Junge & Dvorak, 2004).
Participation in sports gives youths the opportunity to engage in physical activity, which, in
turn, leads to a number of positive health effects (Warburton, Nicol, & Bredin, 2006). Football
attracts the interest of many researchers, who investigate, for example, the factors affecting
the performance of football players, the motives for football participation, the motives of
fans who travel to big football events, and football injuries.
The aim of this literature review is to identify the different data mining methods that are
used for performance analysis in youth football. Data mining methods are already an
important source of development and growth in many industries around the world, and
football and sports in general are no exception. According to some reports, many of the
most successful coaches in Europe already trust data more than their instincts, beliefs and
feedback from players and staff. The German coach Julian Nagelsmann is reportedly so
data-driven in his decision-making that players sometimes have trouble understanding the
basis of his selections, decisions or approaches. Data mining can be widely applied in
sports because most of the components that create long-term success might be significantly
improved with knowledge derived through data analysis. Beyond football as a general
phenomenon, we concentrate on data mining methods and strategies in youth football in
particular, where they take on even greater importance. Historical examples suggest that
young players are more sensitive to the different factors that shape overall performance on
the football pitch. As a result, experience, football knowledge and development patterns
based on performance dynamics are even more unpredictable in youth football.
Methodology:
In this section, we discuss the purpose, eligibility criteria, information sources, search
technique and study selection process.
Purpose:
Eligibility criteria:
We considered the following database sources: Academia, IEEE, Research Gate, MDPI, Semantic
Scholar, Google Scholar, Springer Link, PLOS and JSSM. We considered studies published
between 2000 and 2020 (up to the date of the search), as there are not many studies
relevant to the topic before the year 2000.
Search strategy:
We systematically searched for potentially relevant articles. We created synonym lists
for the following three aspects: youth soccer players, machine learning and data mining. We
used the following combination of keywords: (Machine learning OR Data mining OR Youth
Soccer players OR Performance analysis) AND [(Youth & Soccer) OR (Soccer AND
Performance Analysis)] AND (soccer OR football). This search strategy was adapted for
Google Scholar. We combined the results of the searches and removed the duplicates.
The full search strategy is shown in the PRISMA flow diagram.
Search criteria:
All team members used the above-mentioned search terms, and we pooled all the studies they
found. In an initial screening, we removed duplicates (studies that appeared in more than one
search result). We then screened all the unique studies and excluded studies based on their
content and depth. We further checked the screened studies and excluded those not relevant
to youth football or not relevant to data mining/machine learning. We also excluded articles
if the study was not available as full text or was not published in English. We tracked
references in the full-text articles to ensure that no relevant articles were missed. Finally,
we categorized the studies as qualitative or quantitative.
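The duplicate-removal step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the review does not say how duplicates were detected, so matching on a normalized title is our assumption, and the record titles, the `normalize_title` and `deduplicate` helpers and the pooled list are all hypothetical.

```python
import re

def normalize_title(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace,
    so that near-identical copies of the same title compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records):
    """Keep the first record seen for each normalized title."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical records pooled from two databases (not the review's actual hits).
pooled = [
    {"title": "Example study A on youth football", "source": "Google Scholar"},
    {"title": "Example Study A on Youth Football.", "source": "Research Gate"},
    {"title": "Example study B on youth football", "source": "PLOS"},
]
unique = deduplicate(pooled)
print(len(pooled), "records pooled,", len(unique), "unique")
```

Matching on titles alone can miss duplicates whose titles differ between databases, so a manual check of the remaining records would still be needed.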
Table 1. Search terms
Search terms
AI AND youth football
algorithm AND youth football
association AND youth football
big data AND youth football
Black box AND youth football
classification AND youth football
data mining AND youth football
deviation AND youth football
evaluation AND youth football
injury AND youth football
interpretation AND youth football
machine learning AND youth football
methods AND youth football
models AND youth football
optimization AND youth football
patterns AND youth football
performance AND youth football
prediction AND youth football
probability AND youth football
regression AND youth football
simulation AND youth football
White box AND youth football
Results:
The literature search of the nine databases resulted in 58 hits. After duplicates had been
removed, 36 articles were screened on the basis of title and abstract. This led to 20 articles
being assessed for eligibility. Reference lists in the articles were searched for
additional relevant publications, but this reference tracking did not result in further inclusions.
Studies were predominantly excluded because they either did not include youth players
or were not performance analyses. Of the twelve included studies (displayed in table 2),
three are qualitative and nine are quantitative.
At least one included study was sourced from each of the nine databases. Two studies each
were sourced from Research Gate, Google Scholar and JSSM. The number of included studies
per database is shown below in table 2.
Database Result
Academia 1
IEEE 1
Research Gate 2
MDPI 1
Semantic scholar 1
Google scholar 2
Springer 1
PLOS 1
JSSM 2
Total 12
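As a quick consistency check on the selection counts, the short sketch below derives the number of records dropped at each stage from the figures reported above and confirms that the per-database counts add up to the twelve included studies. The variable names are ours, and the per-stage exclusion counts are simple differences rather than figures stated in the review.

```python
# Counts reported in the Results section.
identified = 58   # hits across the nine databases
screened = 36     # records left after duplicate removal
assessed = 20     # articles assessed for eligibility
included = 12     # studies included in the review

# Included studies per database, as listed in the table above.
per_database = {
    "Academia": 1, "IEEE": 1, "Research Gate": 2, "MDPI": 1,
    "Semantic scholar": 1, "Google scholar": 2, "Springer": 1,
    "PLOS": 1, "JSSM": 2,
}

# Records dropped at each stage (derived by subtraction).
duplicates_removed = identified - screened
excluded_on_title_abstract = screened - assessed
excluded_at_eligibility = assessed - included

print(duplicates_removed, excluded_on_title_abstract, excluded_at_eligibility)
print(sum(per_database.values()) == included)
```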
The PRISMA flow diagram depicts the flow of information through the different phases of a
systematic review and maps out the number of records identified, included and excluded.
The result of the search across the nine databases is shown in the PRISMA flow diagram
below (figure 1).
Figure 1. Flow diagram of the methodology for article search and selection (based on PRISMA)
Data mining methods:
Eight of the twelve included studies each used a data mining/machine learning method that no
other included study used. The remaining four studies share two methods, each used by two
studies: artificial neural networks and unsupervised learning. The data mining and machine
learning methods used in the twelve included studies are summarised in the table below.
Table 4. Summary (methods and results) of previous research/studies
Columns: Title; Year; Purpose; R. Instrument; SNA Metrics/Concepts discussed; Results; Tools

Title: A Machine Learning Approach to Assess Injury Risk in Elite Youth Football Players
Year: 2020
Purpose: To assess injury risk in elite-level youth football players based on anthropometric, motor coordination and physical performance measures with a machine learning approach
R. Instrument: Preseason anthropometric measurements (height, weight, and sitting height) were taken and test batteries to assess motor coordination and physical fitness (strength, flexibility, speed, agility, and endurance) were performed
SNA Metrics/Concepts discussed: SHAP (SHapley Additive exPlanations): a method to explain individual predictions. This approach visualises every single player or injury case and gives an overview of the variables in the model by order of importance (vertically listed features), with the top ones having a higher global impact on the model than bottom ones.
Results: During the season, half of the players (n = 368) sustained at least one injury. Of the first occurring injuries, 173 were identified as overuse and 195 as acute injuries. The machine learning algorithm was able to identify the injured players in the hold-out test sample with 85% precision.
Tools: XGBoost

Title: Effects of manipulations of player numbers vs. field dimensions on inter-individual coordination during small-sided games in youth football
Year: 2015
Purpose: The aim of this study was to extend knowledge on the functional utility of SSCG in understanding how specific manipulations of field dimensions and player numbers constrained youth football players’ performance behaviours
R. Instrument: Legal tutors provided written informed consent authorizing their participation in this study after being informed of the benefits and risks of the experiment.
SNA Metrics/Concepts discussed: The entropy S of the spatial distribution: $S = -\sum_{i=1}^{N} p(i) \log p(i)$. The entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent in the variable's possible outcomes. Longitudinal and latitudinal (spherical) coordinates were converted to Euclidean (planar) coordinates using the Haversine formula
Results: Descriptive statistics (mean ± standard deviation) of these quantities are summarized in Table 3. Differences can be observed for all variables and relative spaces per player treatments (118, 133 and 152 m²)
Tools: Global positioning tracking device (Qstarz, Model: BT-Q1000eX)

Title: Intelligent Prediction of Soccer Technical Skill on Youth Soccer Player’s Relative Performance Using Multivariate Analysis and Artificial Neural Network Techniques
Year: 2016
Purpose: This study aims to predict the potential pattern of soccer technical skill on Malaysia youth soccer players relative performance using multivariate analysis and artificial neural network techniques
R. Instrument: All study protocols, procedure, material and instrument of the research were approved by the university Human Research Ethics Committee
SNA Metrics/Concepts discussed: Artificial neural network: Determination of SRPI with fewer parameters such as R2 and root mean square error (RMSE) values. It involves three main statistical analyses, which are Principal component analysis (PCA), Hierarchical Agglomerative cluster analysis (HACA) and Artificial Neural Network (ANN), to facilitate the analysis of the model.
Results: The results projected by RMSE for training and validation are 0.017 and 0.190 respectively
Tools: XLSTAT 2016, JMP10 software

Title: Talent selection in youth football: Specific rather than general motor performance predicts future player status of football talents
Year: 2019
Purpose: The aim of the current study is a comparison of the prognostic validity of two frequently used areas within talent selection in youth football: physiologically driven general motor performance (GMP) capacities (40m sprint, agility, counter movement jump, Yo-Yo intermittent recovery test)
R. Instrument: The area under the curve (AUC) from the receiver operating characteristic was used to compare the prognostic validity of both motor performance areas
SNA Metrics/Concepts discussed: Binary logistic regression (BLR): To calculate the likelihood of each individual being categorized as a professional or non-professional player.
Results: Descriptive characteristics of the measured variables for professional and nonprofessional players. According to the results of the BLR analysis (see Table 3), only three models were significant.
Tools: Binary logistic regression (BLR) in programming language R

Title: The relative age effect in youth soccer across Europe
Year: 2005
Purpose: The aim of this study was to determine the extent of the relative age effect in male and female national youth selections across ten different European leagues. It was predicted that the relative age effect would have a clear impact on selection procedures in favour of those born early in the selection year.
R. Instrument: Samples of international youth selections were examined
SNA Metrics/Concepts discussed: Linear regression: regression analyses were used to examine the relationship between the number of players per age category for each sample and the corresponding month of birth (starting with month 1 and ending with month 12).
Results: The results showed an over-representation of players born in the first quarter of the selection year for all the national youth selections at the under-15 (U-15), U-16, U-17 and U-18 age categories, as well as for the UEFA U-16 tournaments and Meridian Cup. Players with a greater relative age are more likely to be identified as "talented"
Tools: Kolmogorov-Smirnov tests and linear regression analysis

Title: Exploring Team Passing Networks and Player Movement Dynamics in Youth Association Football
Year: 2017
Purpose: The study aims to explore how passing networks and positioning variables can be linked to the match outcome in youth elite association football.
R. Instrument: The participants included 44 male elite players from under-15 and under-17 age groups
SNA Metrics/Concepts discussed: Network mining centrality scores (betweenness and closeness): team passing dependency for a given player is expressed by lower betweenness network centrality scores and high intra-team well-connected passing relations are expressed by higher closeness network centrality scores.
Results: Results suggested that lower team passing dependency for a given player (expressed by lower betweenness network centrality scores) and high intra-team well-connected passing relations (expressed by higher closeness network centrality scores) were related to better outcomes.
Tools: SPSS Statistics is a software package used for interactive, or batched, statistical analysis. Cytoscape R software to develop passing networks

Title: Risk factors for acute knee injury in female youth football
Year: 2016
Purpose: To prospectively evaluate risk factors for acute time-loss knee injury, in particular anterior cruciate ligament (ACL) injury, in female youth football players.
R. Instrument: Hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated from individual variable and multiple Cox regression analyses
SNA Metrics/Concepts discussed: One-way analysis of variance (ANOVA): ANOVA concept for continuous variables and the Chi square test for categorical variables.
Results: 96 acute knee injuries were recorded, 21 of them ACL injuries. Multiple Cox regression showed a 4-fold higher ACL injury rate for players with familial disposition of ACL injury (HR 3.57; 95% CI 1.48-8.62)
Tools: IBM SPSS Statistics for Windows OS

Title: Automatic extraction of performance metrics from football players with data mining
Year: 2019
Purpose: To try to tackle the positional data accessibility problem. Typically, in high performance sports, the positional data comes with event data, which can support more detailed analysis of the performance of player. However, labelled football (soccer) data is hard to acquire and it usually needs humans to annotate the match events. This process makes it more expensive to be obtained by smaller clubs.
R. Instrument: The authors developed an Unsupervised Football Analytics Tool (UnFOOT). UnFOOT combines data mining techniques and basic statistics to measure the performance of players and teams only from positional data. The capabilities of the tool involve pre-processing the match data, extraction of features, visualization of player and team performance
SNA Metrics/Concepts discussed: Unsupervised learning, association rule mining, subgroup discovery and a proposed approach
Results: Six professional football games were analysed with the tool. According to some metrics obtained (Table 1), the best players of the match are usually found on the tool's top three players of the winning team. In two cases, they even had the best score overall. Even though the overall score was not originally designed to predict the best player of the match, the authors used it to validate the scoring function.
Tools: N/A

Title: A Prediction of Youth Football Players’ Future Sporting Success Using Neural Networks and Machine Learning
Year: 2018
Purpose: Through machine learning classifiers and statistics, the historic data was analysed to identify whether future success can be predicted in youth academy football players.
R. Instrument: A proof of concept was produced using a relabelled version of the historic dataset. The proof of concept confirmed that the techniques formed in this project can be utilised as tools for future prediction. A statistically significant positive relationship was identified between the length of time an athlete spent within the academy and their likelihood of future success.
SNA Metrics/Concepts discussed: Artificial Neural Networks, Decision Tree Classifier and Random Forests
Results: The project was unable to build a sufficient classifier with the historic dataset as too much class imbalance occurred. It was initially thought that the machine learning classifiers were producing relatively high classification accuracy; however, following analysis of the confusion matrices of each, it transpired that the classifiers were predicting all athletes as failures.
Tools: N/A

Title: Finding Efficient Strategies in 3-versus-2 small games of youth soccer players
Year: 2019
Purpose: Apply a new analysis approach, so far only used for 1-versus-1 analyses in soccer, to small-sided games and to identify and analyse therewith the efficiency of tactical patterns in 3-versus-2 game play
R. Instrument: Recorded by video and a position tracking system
SNA Metrics/Concepts discussed: Unsupervised machine learning routines and artificial intelligence methods
Results: Approach based on tracking data and data mining methods developed by the authors for the assessment of tactical behaviour in soccer with the aim to analyse therewith efficiency of tactical strategies in a specific type of SSG. Two types of pattern recognitions with different parameters within each were analysed to measure the converted ratio of goals from shots taken from different angles using calculations

Title: Current Approaches to Tactical Performance Analyses in Soccer Using Position Data
Year: 2016
Purpose: An overview of the current state of development of the analysis of position data in soccer; different promising approaches from the perspective of dynamic systems and neural networks
R. Instrument: Tactical Performance Analysis in Soccer by Measuring Inter-Player Coordination (variables, such as the team centroid, stretch index or surface area)
SNA Metrics/Concepts discussed: Clustering algorithms, Euclidean metrics, decision trees, mean average
Results: Different approaches from the perspective of dynamic systems and neural networks were presented. Tactical performance analysis revealed inter-player coordination, inter-team and inter-line coordination before critical events, as well as team-team interaction and compactness coefficients.

Title: A Prediction of Youth Football Players’ Future Sporting Success Using Neural Networks and Machine Learning
Year: 2018
Purpose: The current process of talent identification lacks a systematic and scientific approach; the study seeks to identify whether future success can be predicted in youth academy football players
R. Instrument: Dissertation
SNA Metrics/Concepts discussed: Random Forest, Multilayer Perceptron, Naïve Bayes, Machine Learning, Decision Trees, Artificial Neural Networks
Results: The project attempted to determine whether success can be predicted from performance test results of FVFA players and provide insight into the most indicative performance tests. At the outset, the project required a study of the Python programming language and its Pandas, NumPy and Scikit-learn libraries to be able to efficiently implement the necessary machine learning algorithms.
Our findings relate to, and are limited to, young soccer players. There has been a marked
increase in data collection for youth soccer player performance analysis in recent years.
One such area is the collection of performance testing data on individual athletes by sports
scientists and coaches, with the aim of identifying weaknesses and amplifying strengths.
Performance testing data has also caught the interest of talent identifiers as a way of
identifying future talent. The current process of talent identification lacks a systematic
and scientific approach. A review of studies of historic performance testing data can
provide some insight and help form a more methodical approach that can be used by talent
identifiers and coaches. Using machine learning classifiers and statistics, the studies
analysed whether future success can be predicted in youth academy football players.
The studies we reviewed used random forests, multilayer perceptrons, naïve Bayes, decision
trees, artificial neural networks, mean averages, unsupervised machine learning routines and
other artificial intelligence methods. One study identified a statistically significant
positive relationship between the length of time an athlete spent within the academy and
their likelihood of future success. The review helped highlight the different metrics that
have been used to analyse the performance of youth soccer players.
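One of the reviewed studies (Table 4) reports that classifiers with seemingly high accuracy were in fact predicting all athletes as failures because of class imbalance. The sketch below illustrates that effect with made-up numbers: the 5 successes among 100 athletes and the `evaluate` helper are assumptions for illustration, not data or code from the study.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, recall for the positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# Hypothetical academy cohort: 5 athletes succeed (1), 95 do not (0).
y_true = [1] * 5 + [0] * 95

# A classifier that labels every athlete as a failure.
y_pred = [0] * 100

accuracy, recall = evaluate(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, recall for success = {recall:.2f}")
```

Accuracy looks high only because failures dominate the sample; the recall for the success class shows that no future success is actually identified, which is why the confusion matrix is the more informative check.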
Small-sided and conditioned games (SSCG) are commonly considered modified games played on
reduced pitch dimensions (small-sided), often using adapted rules and involving a smaller
number of players than traditional games (Gabbett et al., 2009; Vilar et al., 2014). The
aim of the study by Pedro, S., et al. (2015) was to extend knowledge on the
functional utility of SSCG in understanding how specific manipulations of field dimensions
and player numbers constrain youth football players’ performance behaviors. The relative
space per player in small-sided and conditioned games can be manipulated
either by varying player numbers or by modifying field dimensions. For
visualization purposes only, the heat maps were spatially filtered with a Gaussian kernel with
a standard deviation of 1 (bin). Considering a performance area partitioned into N bins and
setting p_i as the measured probability of finding the player in bin i, the entropy S of the
spatial distribution is
$$S = -\sum_{i=1}^{N} p_i \log p_i$$
Normalized entropy was used to place the results within the range between 0 and 1, allowing
for comparisons between different field dimensions.
$$S_{\%} = -\frac{1}{\log N} \sum_{i=1}^{N} p_i \log p_i$$
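The entropy and normalized entropy of the spatial distribution defined above can be computed from position samples as in the minimal sketch below. The field size, the bin grid and the `spatial_entropy` helper are assumptions for illustration, and the Gaussian filtering is left out because the study applies it for visualization only.

```python
import math
from collections import Counter

def spatial_entropy(positions, field_length, field_width, bins_x, bins_y):
    """Return (S, S normalized to [0, 1]) for (x, y) positions in metres.

    Assumes 0 <= x <= field_length, 0 <= y <= field_width and more than one bin.
    """
    counts = Counter()
    for x, y in positions:
        # Clamp so that points on the far boundary fall into the last bin.
        ix = min(int(x / field_length * bins_x), bins_x - 1)
        iy = min(int(y / field_width * bins_y), bins_y - 1)
        counts[(ix, iy)] += 1
    total = sum(counts.values())
    n_bins = bins_x * bins_y
    # Empty bins are skipped: p log p tends to 0 as p tends to 0.
    s = -sum((c / total) * math.log(c / total) for c in counts.values())
    return s, s / math.log(n_bins)

# Hypothetical 40 m x 30 m pitch split into a 2 x 2 grid.
spread = [(5.0, 5.0), (35.0, 5.0), (5.0, 25.0), (35.0, 25.0)]
fixed = [(5.0, 5.0)] * 4

s_spread, s_spread_norm = spatial_entropy(spread, 40.0, 30.0, 2, 2)
s_fixed, s_fixed_norm = spatial_entropy(fixed, 40.0, 30.0, 2, 2)
print(round(s_spread_norm, 3), round(abs(s_fixed_norm), 3))
```

A player who visits every bin equally often reaches the maximum normalized entropy, while a player who stays in one bin has zero entropy, which is what makes the normalized value comparable across field dimensions.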
Figure 2 of the study (Pedro, S., et al., 2015) shows standardized mean differences between
manipulations of player numbers and field dimensions for the effective relative space per
player, radius of free movement and spatial distribution variability. Descriptive statistics
(mean ± standard deviation) of these quantities are summarized in Table 3 of the study.
Differences can be observed for all variables and relative space per player
treatments (118, 133 and 152 m²). The results showed that manipulations of player numbers
elicited more free space in the vicinity of each player (ibid.).
The study (Abdullah MR., et al., 2016) aims to predict the potential pattern of soccer
technical skill in the relative performance of Malaysian youth soccer players using
multivariate analysis and artificial neural network techniques. A total of 184 male youth
soccer players (average age = 15.2±2.0) recruited from Malaysian soccer academies underwent
physical fitness, anthropometric, maturity, motivation and soccer-related skill tests.
Unsupervised pattern recognition with principal component analysis (PCA) was used to identify
the most significant parameters in soccer for the study, and an artificial neural network
(ANN) was developed to determine its predictive ability for the soccer relative performance
index (SRPI). The prediction techniques used in the study showed a strong ability to predict
the players' performance. The study highlighted the possibility of defining an optimum
number of parameters for evaluating a player's relative performance, which in turn would
reduce the cost, energy and time of the measurement (ibid.). After applying the inclusion
and exclusion criteria, 184 youth soccer players (mean age = 15.2 ± 1.6 years) drawn from
eight Malaysian state youth soccer academies were enrolled in the study. The study involves
three main statistical analyses, principal component analysis (PCA), hierarchical
agglomerative cluster analysis (HACA) and artificial neural network (ANN) analysis, to
facilitate the analysis of the model. A back-propagation neural network (BPNN) model was
applied based on the recommendation of previous research. The performance of the ANN is
determined by the coefficient of determination (R²), the root mean square error (RMSE) and
the misclassification rate (MR). PCA and HACA were performed using XLSTAT 2016, while the
ANN was built using JMP10 software. Table 1 of the study (Abdullah MR., et al., 2016)
presents the descriptive statistics of the players’ characteristics as mean and standard
deviation (SD) values for all variables. From the PCA result, out of the twenty-six
principal components (PCs) generated, only the eight PCs with eigenvalues > 1 were selected
as input parameters for the feed-forward ANN, representing 71.68% of the total variance.
Table I of the study also highlights the factor loadings after varimax rotation in the PCA
(ibid.).
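The three measures used to judge the ANN (R², RMSE and the misclassification rate) can be written down compactly. The sketch below uses the standard textbook definitions with made-up values; the function names and the example data are ours, and the study's own JMP10 output may be computed slightly differently.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def misclassification_rate(labels_true, labels_pred):
    """Share of cases assigned to the wrong class."""
    wrong = sum(1 for t, p in zip(labels_true, labels_pred) if t != p)
    return wrong / len(labels_true)

# Hypothetical performance-index values and class labels.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
classes_true = ["high", "low", "low", "high"]
classes_pred = ["high", "low", "high", "high"]

model_rmse = rmse(observed, predicted)
model_r2 = r_squared(observed, predicted)
model_mr = misclassification_rate(classes_true, classes_pred)
print(model_rmse, model_r2, model_mr)
```

Lower RMSE and MR and a higher R² indicate a better fit; comparing the training and validation values, as the study does, shows how well the network generalizes.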
The study (Sieghartsleitner, R., 2019) compares the prognostic validity
of two frequently used areas within talent selection in youth football: physiologically driven
general motor performance (GMP) capacities (40m sprint, agility, counter movement jump,
Yo-Yo intermittent recovery test) and domain-specific motor performance (SMP) capacities
(i.e., technical skills: dribbling, passing, juggling, shooting). Table 7 of the study
presents the binary logistic regression (BLR) coefficients for the GMP model at the U17 age
group. The non-significant results for the overall model show that no single variable has a
significant impact, with the Yo-Yo intermittent recovery test showing the highest odds ratio
(OR) of 1.52 [0.80; 2.89] (p = .20) for z-standardized data, whilst the 40m sprint (p = .28),
agility (p = .64) and counter movement jump (p = .84) showed less important ORs. Finally,
Table 8 of the study presents Pearson correlations between the percentage of predicted adult
height and motor performance to examine the influence of biological maturation (ibid.).
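To make the link between an odds ratio, its confidence interval and its p-value concrete, the sketch below recovers an approximate p-value from the reported OR of 1.52 [0.80; 2.89]. It assumes a Wald-type 95% interval that is symmetric on the log-odds scale, which the review does not state explicitly; the `wald_p_from_or` helper is ours.

```python
import math

def wald_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover beta, its standard error, z and a two-sided p-value
    from an odds ratio and its confidence interval (Wald assumption)."""
    beta = math.log(odds_ratio)  # logistic regression coefficient
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p

# Values reported for the Yo-Yo intermittent recovery test.
beta, se, z, p = wald_p_from_or(1.52, 0.80, 2.89)
print(f"beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.2f}")
```

Under that assumption the recovered p-value is close to the reported p = .20, and because the interval contains 1, the effect is not significant at the 5% level.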
The aim of the study (Werner F., et al., 2005) was to determine the extent of the relative age
effect in male and female national youth selections across ten different European leagues. It
was predicted that the relative age effect would have a clear impact on selection procedures
in favor of those born early in the selection year. Players who were older and potentially
more physically developed were expected to be overrepresented in each of the age
categories and playing samples examined. The relative age effect may offer other
advantages to those who are born early in the selection year compared with those born later
in the year, such as more accumulated playing experience. The relative lack of experience is
another disadvantage for those born far from the cutoff date (see Ward & Williams, 2003;
Ward, Hodges, Williams, Starkes, 2004). Table 1 of the study (Werner F., et al., 2005)
highlights the samples of international youth selections that were examined. The birth-date
distributions for the U-15, U-16, U-17 and U-18 national selections for ten European
countries, together with the results of the Kolmogorov-Smirnov tests, are presented in
Table II of the study. Significant effects were obtained using Kolmogorov-Smirnov tests for
the U-16, U-18 and the Meridian Cup teams. Subsequent regression analyses showed a clear
relationship between month of birth and number of participants for the U-16 (r = −0.90,
P < 0.0001), U-18 (r = −0.84, P = 0.0007) and the Meridian Cup (r = −0.81, P = 0.0016)
teams (ibid.).
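The relationship between month of birth and number of selected players can be illustrated with a plain Pearson correlation. The monthly counts below are invented purely for illustration and are not the study's data; `pearson_r` is a hypothetical helper written from the standard formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Month 1 is the first month of the selection year, month 12 the last.
months = list(range(1, 13))
# Hypothetical number of selected players born in each month.
players = [30, 27, 28, 24, 22, 23, 19, 18, 15, 16, 12, 10]

r = pearson_r(months, players)
first_quarter_share = sum(players[:3]) / sum(players)
print(f"r = {r:.2f}, first-quarter share = {first_quarter_share:.1%}")
```

A strongly negative r means that fewer players are selected the later in the selection year they were born, and a first-quarter share above 25% is the over-representation the study describes.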
The study (Goncalves B., et al., 2017) explores how passing networks and positioning
variables can be linked to the match outcome in youth elite association football. The
participants were 44 male elite young Portuguese association football players from under-15
and under-17 age groups (U15: n = 22, age 13.9±0.3 years, height 1.69±0.07 m, weight
59.1±5.4 kg and playing experience 5.3±1.6 years; U17: n = 22, age 15.7±0.5 years, height
1.75±0.05 m, weight 66.4±3.5 kg and playing experience 7.4±1.3 years). A passing network
approach combined with positioning-derived variables was used to identify the contributions
of individual players to the overall team behavior outcome during a simulated match. All
network representations and centrality-based measures were calculated using Cytoscape v3.1.1
with the CentiScaPe 2.1 plugin, which was used to build the teams’ passing networks. Teams
were classified as lower or higher performance using the number of shots and the teams’
efficacy. The results suggested that lower team passing dependency on a given player
(expressed by lower betweenness network centrality scores) and high intra-team
well-connected passing relations (expressed by higher closeness network centrality scores)
were related to better outcomes. For the U15 game, the observed differences showed unclear
tendencies in the number of shots, closeness centrality and betweenness centrality (team A,
efficacy = 7.1%; team B, efficacy = 10.1%). Figs 2 and 3 of the study present the overall
representation of the passing network and positioning relations established for each team.
In the U17 game, the higher performance team presented higher centrality values distributed
among the DCM, LCM and the right central midfielder. Overall, the study emphasizes the
potential of coupling notational analyses with spatial-temporal relations to produce a more
functional and holistic understanding of teams’ sports performance, and shows that social
network analysis can reveal novel key determinants of collective performance (ibid.).
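The two centrality scores discussed above can be sketched in pure Python. This is a simplified illustration, not the Cytoscape/CentiScaPe computation used in the study: passes are treated as undirected and unweighted links, the four-player network is hypothetical, and betweenness follows the well-known Brandes accumulation scheme.

```python
from collections import deque

def closeness(graph):
    """Closeness: reachable nodes divided by the sum of shortest-path distances."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(graph):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack, dist, queue = [], {s: 0}, deque([s])
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical team in which every pass goes through the central midfielder (CM).
passes = {"CM": ["LB", "RB", "ST"], "LB": ["CM"], "RB": ["CM"], "ST": ["CM"]}
b = betweenness(passes)
c = closeness(passes)
print(b["CM"], c["CM"], c["LB"])
```

In this star-shaped example the whole team depends on one player, which shows up as a high betweenness score for CM and low closeness for everyone else; the study associates the opposite pattern with better outcomes.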
The study (Hägglund M., et al., 2016) evaluates risk factors for acute time-loss knee injury, in
particular anterior cruciate ligament (ACL) injury in female youth football players. Risk factors
were studied in 4,556 players aged 12-17 years from a randomized controlled trial during the
2009 season. 96 acute knee injuries were recorded, 21 of them ACL injuries. Multiple Cox
regression showed a 4-fold higher ACL injury rate for players with familial disposition of ACL
injury (HR 3.57; 95% CI 1.48-8.62). Significant predictor variables for acute knee injury were
age >14 years (HR 1.97; 95% CI 1.30-2.97), knee complaints at the start of the season (HR
1.98; 95% CI 1.30-3.02), and familial disposition of ACL injury (HR 1.96; 95% CI
1.22-3.16). No differences in injury rates were seen when playing on artificial turf compared with
natural grass. Female youth football players with a familial disposition of ACL injury had an
increased risk of ACL injury and acute knee injury. Table 5 of the study (Hägglund M., et al.,
2016) evaluates the model, reporting the sensitivity, specificity and predictive
values of the variables for predicting ACL injury. Older players and those with knee complaints at
preseason were more at risk for acute knee injury. Although the predictive values were low,
these factors could be used in athlete screening to target preventive interventions (ibid.).
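As a quick plausibility check on the reported hazard ratios, the sketch below assumes that the 95% confidence intervals are symmetric on the log scale, as is usual for Cox regression output (this is an assumption, not something stated in the text). Under that assumption the hazard ratio should equal the geometric mean of its interval bounds. The function names are illustrative.

```python
import math

def implied_hazard_ratio(ci_low, ci_high):
    """Geometric mean of the CI bounds, i.e. the midpoint on the log scale."""
    return math.sqrt(ci_low * ci_high)

def implied_log_se(ci_low, ci_high, z=1.96):
    """Standard error of log(HR) implied by a 95% confidence interval."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)

# (reported HR, lower bound, upper bound) as quoted in the paragraph above.
reported = {
    "familial disposition (ACL injury)": (3.57, 1.48, 8.62),
    "age >14 years": (1.97, 1.30, 2.97),
    "knee complaints at season start": (1.98, 1.30, 3.02),
    "familial disposition (acute knee injury)": (1.96, 1.22, 3.16),
}

for name, (hr, low, high) in reported.items():
    implied = implied_hazard_ratio(low, high)
    se = implied_log_se(low, high)
    print(f"{name}: reported HR {hr}, implied HR {implied:.2f}, SE(log HR) {se:.3f}")
```

The wide interval for ACL injury reflects the small number of ACL cases (21) compared with all acute knee injuries (96).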
The study (Coutinho JC., et al., 2019) provides an alternative view on performance
enhancement factors using data mining methods. The study, conducted by José Carlos
Coutinho (Leiden University), Marck de Greeff (Windesheim University of Applied Sciences),
Nicolette Schipper-van Veldhoven (Windesheim University of Applied Sciences) and Cláudio
Rebelo de Sá (University of Twente), concentrates on the automatic extraction of
performance metrics of football players using data mining. When we speak about
factors that strongly affect performance, we usually concentrate only on athletic
components, such as strength, skill set, resistance to injuries and body weight. But we
cannot underestimate the importance of psychological stability and mental health in
sports. Athletes are constantly under pressure to perform well, which
defines the essence of sports in general, so this aspect plays a crucial role too. It is
often underestimated, especially for children and youth age categories, who are even more
vulnerable psychologically. According to the study, there is recent evidence of a
positive correlation between the club atmosphere, the level of comfort and the internal
relationships between team and staff members and the overall sporting success. Organized
youth sport is seen as an important socializing context for children and adolescents. Sports,
which are also known as the third pedagogical environment (next to home and school),
contribute to a range of positive outcomes: self-esteem, social behavior and integration.
There is recent evidence that there is a positive relationship between ‘club culture and
atmosphere’ and ‘organizational performance’ (Schoot, 2016). This means that when a sport
club focuses more on the elements of club culture, financial performance, member
satisfaction and social sport performance can increase.
Although this study does not discuss only data mining in youth football, youth football is
particularly affected by psychological factors, so we decided to discuss it
too, viewed from the angle that we are interested in. Aside from those findings,
the study examines positional data, which provides substantial insight
into the performance of an athlete from many different aspects, such as tactics and
physical condition. It is no secret that this data is hard to derive and accumulate
with high accuracy (which is why human annotation is usually used instead during
games). The problem is even more severe for youth football organizations, because in
most cases they are not commercially backed and cannot afford
to hire multiple people to track, record and analyze this type of information. This
affects the overall analysis: such performance records help to
distinguish the more promising children from the less promising ones by their capabilities, and at
the same time provide vital feedback to each child about their weak and strong
points. For this purpose, the authors developed a tool called UnFooT, which combines
data mining techniques with fundamental statistical concepts in
order to record, track and analyze the performance of players and teams based only on
their positions and movement on the football pitch (Coutinho JC., et al., 2019). The tool
covers match data, feature extraction, and visualization of player and team performance.
It also has built-in
data mining techniques, such as association rule mining, subgroup discovery and a
proposed approach to look for frequent distributions (Coutinho JC., et al., 2019). The overall
processing involves three stages: processing of the data, representation of
the data and data mining.
Once all the data has been loaded, the tool first makes one pass over it and outputs a
new dataset containing the extracted features. Several features are provided, such
as the distance covered, the speed and the acceleration of the players. The dataset is divided
into time windows of the same size. For each window, several internal modules extract
different performance indicators and statistics from the positional data. One of the metrics,
pressure, uses a clustering technique (DBSCAN) from the Python package scikit-learn. With
the clusters, the authors identified moments of higher pressure on each individual player
throughout the whole game. At the end of the analysis, the overall and the more specific
results are available in the form of a .csv file, which can serve as a source for
deeper analysis afterwards (Coutinho JC., et al., 2019).
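The windowing step described above can be sketched as follows. This is not the UnFooT implementation: the input format of (time, x, y) samples for one player, the function name and the output field names are all assumptions made for illustration. Acceleration and the DBSCAN-based pressure metric are left out.

```python
import math

def window_features(samples, window_s):
    """Split (t, x, y) samples into fixed-size time windows and summarize each.

    samples: list of (time in seconds, x in metres, y in metres), sorted by time.
    Returns one dict per window with distance covered and mean speed.
    """
    windows = {}
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        idx = int(t0 // window_s)  # a step belongs to the window it starts in
        w = windows.setdefault(idx, {"distance_m": 0.0, "duration_s": 0.0})
        w["distance_m"] += math.hypot(x1 - x0, y1 - y0)
        w["duration_s"] += t1 - t0
    rows = []
    for idx in sorted(windows):
        w = windows[idx]
        speed = w["distance_m"] / w["duration_s"] if w["duration_s"] else 0.0
        rows.append({"window": idx,
                     "distance_m": w["distance_m"],
                     "mean_speed_mps": speed})
    return rows

# Hypothetical track: the player runs for two seconds, then stands still.
track = [(0, 0.0, 0.0), (1, 3.0, 4.0), (2, 6.0, 8.0), (3, 6.0, 8.0)]
features = window_features(track, window_s=2)
```

Each returned row corresponds to one line of a per-window feature table, which could then be written to a .csv file for further analysis.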
In general, some football associations, such as the Football Federation of Armenia, have already
started to use patterns and similarities to estimate the probability of future success
in professional football. Khachik Tadevosyan, one of the authors of this seminar study, has
worked as Head of Marketing and Development at the Football Federation of Armenia and has
personally witnessed the approach in practice. For example, the Federation is using the historical
data of their most decorated and successful existing professionals and trying to derive
different attributes at different ages. Such attributes might be factors like weight, height,
speed, history of injuries, dynamics of weight change, number of shots made during different
seasons, accuracy of passes and so on. Combining the statistical results for different
attributes might reveal correlations and identify a “perfect strategy” for what a
professional player's path should look like. The Federation is still considering and has yet to
commit to building a technologically backed ML model, so its analysis remains
rather primitive and cannot yet be considered fully deployable
data mining. After identifying the similarities of successful players in different age groups,
the federation is planning to create a short manual for coaches in the country and send it to
every single coach in Armenia with specific models, advice and data figures for different age
groups. They expect that there might be serious contradictions too because this strategy is
highly experimental and non-scientific, but an overall roadmap identification is still
achievable. One thing which might prevent creation of an effective model is contradiction of
so-called “success factors” or “advice” for different age groups because kids develop
differently from each other and also their dynamics and specifics of growth might change
dramatically from one year to another.
Some players might even benefit from certain things when they are between 10 and 12,
while the same things might be an obstacle for them when they are 14-17 or when they make the
step into professional football. The biggest difference when a player switches from
youth football to professional football concerns the physical workload that the body
goes through, which differs dramatically between the two, and people adapt
to those changes very differently.
Finally, let us move to one more study that is closely related to our topic. An
author from the Computing Science and Mathematics Division of the Faculty of Natural Sciences
at the University of Stirling has written a detailed study on the insights that
allow machine learning methods to predict whether a specific player will or will not be
successful in the future as a professional athlete. As an introduction to
the overall study, it is worth noting that it set out three fundamental questions
to answer for the client with whom the collaboration took place.
Speaking about achievements, the author admits that not all of the goals outlined beforehand
were reached. Nevertheless, the project was generally successful in providing
understanding and some vital insights into the relationship between the time spent
in the academy system and success. One possible reason why not all
goals were met is that a lot of technical background knowledge was
required, which the author was initially not familiar with and had to acquire quickly
in order to carry out the study (such as Python and the pandas, NumPy
and scikit-learn libraries) (Grant T., et al., 2018).
After the introductory part the author divided the study into chapters where each of them
provided the analysis of the dataset and model development.
So, what machine learning methods did the author use in his study? He
primarily concentrated on three methods: Decision Tree Classifiers and Random Forests, Artificial
Neural Networks, and Naïve Bayes Classification (Grant T., et al., 2018).
The study presents an extended description of the theoretical basis for each method and
of how those methods were applied in the study, taking into account
existing limitations and specifics. Also, since every method is to some extent
a “black box” with uncertain bias and effectiveness, the author used an
evaluation method to assess the strength of each model. He created a confusion matrix
of True Positives and Negatives and False Positives and Negatives,
which was used to calculate accuracy as in the previous study. Unlike those
authors, however, he does not go on to calculate recall and precision from
the matrix; instead he tries to obtain a precise answer about the
relationship between input variables and the output using statistical tests such as the correlation
coefficient and the chi-squared test. The latter is a very popular statistical test
that weighs two hypotheses: that the variables are
independent of each other, or that they are not.
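The chi-squared test of independence mentioned above can be sketched in a few lines. The contingency tables below are hypothetical counts chosen for illustration (rows: shorter or longer time in the academy, columns: success or failure); they are not taken from the study.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Critical value of the chi-squared distribution at the 5% level, 1 degree of freedom.
CRITICAL_5PCT_1DOF = 3.841

dependent_table = [[10, 40], [30, 20]]    # hypothetical: outcome varies by row
independent_table = [[10, 20], [20, 40]]  # hypothetical: same ratio in both rows

dep_stat, dep_dof = chi_squared(dependent_table)
ind_stat, ind_dof = chi_squared(independent_table)
reject_independence = dep_stat > CRITICAL_5PCT_1DOF
```

A statistic above the critical value leads to rejecting the hypothesis that the two variables are independent; a statistic near zero gives no evidence against it.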
The study brings together the crucial concepts on which the whole question
of identifying future success relies: if the first part, the data
about performances, is not very accurate, then the model itself will not be ideal either.
The author concludes that, despite the critical role of talent identification,
there is still a lack of a scientific and methodical approach to the testing of athletes'
performances, and there is no consensus as to which
performance tests are most useful for talent identification.
Forth Valley Football Academy (FVFA), which acted as the primary partner for the study, was
its main data source. The data collection is very similar to
that of most similar studies, especially the one conducted in cooperation with Anderlecht’s
academy. Again, the data covered youth players between 2005 and 2016. The
working data set contains data on 497 athletes, reduced from a total of 538 athletes in the
master data set. Similarly, the working data set contained more than 1700 data points,
reduced from almost 2300 data points in the master data set. Regarding the methodology,
the author adopted the Cross-Industry
Standard Process for Data Mining (CRISP-DM), since it provides a flexible and
convenient way to iterate repeatedly over larger and smaller cycles. He briefly defines the
Business Understanding and Data Understanding phases, explains the process of Data Preparation,
describes the modelling and presents the strategy for Evaluation and Deployment.
CRISP-DM is very useful when the modelling involves many “back and forth” movements, and
since the performance tests here involve a lot of uncertainty, this emphasis seems to be
a vital advantage.
Discussion:
The study (Pedro, S., et al., 2015) analyzes the influence of manipulations of field
dimensions and player numbers on the spatial-temporal characteristics of inter-individual
coordination tendencies of U-15 yr old youth football players emerging within the same
replicated dimensions of relative space per player during SSCG. Results showed that, even
though manipulations of player numbers and field dimensions may be used to set the same
relative spaces per player, emergent interpersonal coordination tendencies of players during
each constraining SSCG differed. This suggests that each player afforded more space to
play and was required to perform in more regular zones of the field than when performing in
equivalent areas set through manipulations of field dimensions. Another important aspect
from this study to retain is that the effective relative spaces per player found for all
treatments were much smaller than those theoretically set by the simple quotient of the total
field area per number of players (Tables 1 and 2 of study Pedro, S., et al., 2015). Further
studies are needed to clarify this issue considering the effective playing area rather than the
total SSCG area and using a broader participants sample (of varied ages and skills). The
current study has some limitations that should be acknowledged. Larger samples of
participants of varied ages and skill levels should be considered in future studies as well as
the manipulation of other relative space per player areas, player numbers (e.g., 3v3, 4v4,
5v5) and field dimensions (Pedro, S., et al., 2015).
The study (Abdullah MR., et al., 2016) explores ANN models to predict the two skill
levels (Novice and Elite) of Malaysian youth soccer players from their relative performance
obtained by HACA. Furthermore, the ANN showed better prediction of technical skill
performance from the players' relative performance, with a high coefficient of determination,
especially with five hidden nodes. The results showed that the model is successful in
discriminating technical skill performance according to the two different levels of skill among
youth soccer players. This SSRSP-ANN model is a very useful tool for helping
decision makers achieve better management (Abdullah MR., et al., 2016).
The study (Sieghartsleitner, R., 2019) points out that statistical considerations of common linear
methods versus non-linear alternatives turn the inclusion of several variables within
multidimensional modelling into a meaningful problem. In both statistical approaches,
extensive models with high numbers of variables increase the probability of obtaining results
that are difficult to interpret. For example, multicollinearity leads to unclear explanations of
variance, whereby the loading and weighing of single variables also becomes unclear
(Backhaus, Erichson, Plinke, & Weiber, 2018). Furthermore, a specific problem of linear
statistical models is that they are enslaved by the general relations of “the higher (or lower)
x, the higher (or lower) y” (Maszczyk et al., 2014). They may therefore fail to represent
possible interaction and compensation phenomena between different talent predictors within
developing talents (Conzelmann, Zibung, & Zuber, 2018; Meylan, Cronin, Oliver, & Hughes,
2010). Non-linear alternatives such as artificial neural networks and person-oriented
approaches also face certain problems, particularly regarding impossible comparisons of
different statistical model configurations, or difficulties in interpreting the obtained results
(Pfeiffer & Hohmann, 2012; Pion, Hohmann, Liu, Lenoir, & Segers, 2017; Zibung, Zuber, &
Conzelmann, 2016; Zuber et al., 2016). Indeed, since an artificial neural network is a kind of
black box, the process behind the emergence of its results is hidden, and imposes a
questionable blind explanation of an effect without prior insight into the processes that cause
this effect (Zhang et al., 2018). However, the BLR models for GMP in the current study did
not lead to any significant solutions, whilst descriptive values even show surprising inverse
characteristics in early adolescence (i.e., descriptive statistics indicate better values for
non-professional players in certain tests within early adolescence). By directly
comparing the prognostic validity of GMP (40 m sprint, agility, counter
movement jump, YoYo intermittent recovery test) with that of SMP (dribbling, passing, juggling,
shooting) for talent selection in youth football, this study seems to provide some evidence
that the latter is more useful for predicting future player status.
According to the study (Werner F., et al., 2005), further research is required to identify how
the relative age effect impacts upon children’s levels of self-esteem and potential to “drop out” of sport.
It has already been demonstrated that the relative age effect is correlated with a higher
incidence of suicide in school children (Thompson, Barnsley, & Stebelsky, 1991). Since
talent detection and identification procedures may be biased by these reported differences in
relative age, an examination of the specific talents that underlie sports performance is of
great importance to youth coaches and researchers.
The study (Goncalves B., et al., 2017) aimed to explore how passing networks and
positioning variables can be linked to the match outcome in youth elite association football.
The findings may provide insights to understand the reasons underpinning successful
performances. It is suggested that lower passing dependency for a given player (lower
betweenness scores) and higher intra-team well-connected passing relations (higher
closeness scores) may optimize team performance. The findings of the current study also
suggest that the team that presented higher passing density (i.e., number of passes
successfully performed) achieved more successful shots.
The study (Hägglund M., et al., 2016) was done with a large sample involving more than 4500
participants. However, low
seasonal ACL injury and acute knee injury rates meant that some analyses suffer from a
lack of power. Even so, it is likely that the neuromuscular training intervention has influenced
other potential risk factors not included in the present study, such as neuromuscular control,
and these factors may also correlate with the measured risk factor variables. Although the
predictive values were low, these factors could be used in athlete screening to target
preventive interventions (Hägglund M., et al., 2016).
Some scientists break down the components that affect future performance prediction
and tackle them one by one. If we critically evaluate which component
most strongly affects the potential of a talented child to become a successful
professional football player, the first factor that comes to the mind of
anyone involved in football is injuries. The study written by Nikki Rommers from the
University of Brussels, Roland Roessler from the University of Basel and seven other scientists
from Ghent University, published in The Orthopedic Journal of Sports Medicine, concentrates
on the prediction and assessment of injury risk in high-performance youth football
players based on many different factors, such as anthropometric, motor coordination and
physical performance measures. All of the data analysis was
carried out using machine learning techniques. The study was
based on the data of over 700 players from different age categories of
the Belgian youth football system. Players were constantly monitored by their coaching and
medical staff. All injuries that occurred were classified and labeled in order to
improve the accuracy of the data. Aside from these traditional ways of tracking
results, some technological solutions were applied too. The research team took
preseason anthropometric measurements, such as weight, height and
sitting height. Moreover, to assess motor coordination and physical
fitness (strength, flexibility, speed, agility, and endurance), the team used
test batteries (Rommers N., et al., 2020).
Finally, based on the preseason test data, an algorithm
called XGBoost was used to predict the probability of
each player being injured during the season. Injuries were also classified into
two categories: overuse or acute. Throughout the season, half of the players (368
to be precise) had one or more injuries. Of the first occurring injuries, 173 were
classified as overuse, while the other 195 were acute. The machine
learning algorithm identified the injured players in the hold-out test
sample with 85% precision, 85% recall (sensitivity) and 85% accuracy (f1-score). In addition,
it was stated that injuries can be classified as overuse or acute with 78% precision, 78%
recall and 78% accuracy (Rommers N., et al., 2020). Turning briefly to the theoretical side
of data mining, recall, precision and accuracy are standard indicators of
how well a classifier performs: precision is the share of players flagged as injured who were
actually injured, and recall is the share of actually injured players who were flagged.
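To make the relationship between these indicators concrete, the following sketch computes precision, recall and the F1-score from confusion-matrix counts. The counts are hypothetical and are not taken from the study. Note that the F1-score is the harmonic mean of precision and recall, which is not the same quantity as plain accuracy.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1-score from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 40 injured players correctly flagged, 10 healthy players
# wrongly flagged, 20 injured players missed.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
```

When precision and recall are equal, as in the figures reported above, the F1-score takes the same value as both.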
After discussing all twelve studies, we may conclude that data mining could be widely used
in youth football, especially classification models and different types of association analysis
to identify patterns and relations. The studies (Rommers N., et al., 2020) and
(Coutinho JC., et al., 2019) have shown significant achievements. In the former, the
evaluation indicators (such as recall, accuracy and precision) were
fairly high. In the latter, the relationship between the “internal
psychological satisfaction” of youth players and eventual success was reported, and a
platform with data mining logic behind it was created to improve the tracking of
players on the pitch, with some patterns derived for performance success.
Nevertheless, despite the success of the studies (Rommers N., et al., 2020) and (Coutinho
JC., et al., 2019), we have seen in the study (Grant T., et al., 2018) that the author was
ultimately unable to build an unbiased and sufficiently provable classifier, which underlines the
tremendous importance of having balanced data, which is sometimes very hard to obtain.
One of the biggest challenges that the author described in the study, which might be
a contributing cause of the imbalance, is the lack of a scientific and methodical approach to the
testing of athletes’ performances; there is also a lack of consensus as to which
performance tests are most important to consider when identifying talent. For
supervised learning methods in particular this can become a major challenge, since the inability
to create unbiased and effective performance tests might hamper the labeling needed
to train the model and conduct the study.
The study (Grant T., et al., 2018) was unable to build an unbiased and sufficiently provable
classifier from the historical dataset of the academy players because of severe class
imbalance. It was initially assumed that the machine learning classifiers were
producing relatively high classification accuracy; however, analysis of the confusion
matrix of each classifier showed that the classifiers were predicting all athletes as failures.
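The effect described here can be reproduced with a minimal sketch. The class split below (95 failures, 5 successes) is hypothetical and is not the split in the study's data; it only shows why high accuracy can hide a classifier that never predicts the minority class. Balanced accuracy, the mean of recall and specificity, is used as one possible alternative metric.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, recall, specificity and balanced accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Hypothetical imbalanced data: 1 = became professional, 0 = did not.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a classifier that predicts "failure" for every athlete

scores = evaluate(y_true, y_pred)
```

Here the accuracy is 95% although not a single successful athlete is identified, while the balanced accuracy of 0.5 shows that the classifier is no better than guessing.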
References:
Pedro, S., Vanda C., Keith D., Duarte A & Júlio G. (2015). Effects of manipulations of player
numbers vs. field dimensions on inter-individual coordination during small-sided games in
youth football. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/280232306_Effects_of_manipulations_of_player_numbers_vs_Field_dimensions_on_inter-individual_coordination_during_small-sided_games_in_youth_football/stats
Buchheit, M., & Mendez-Villanueva, A. (2014). Effects of age, maturity and body dimensions
on match running performance in highly trained under-15 soccer players, Journal of Sports
Sciences, 32(13), 1271-1278. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/280232306_Effects_of_manipulations_of_player_numbers_vs_Field_dimensions_on_inter-individual_coordination_during_small-sided_games_in_youth_football/stats
Silva, P., Aguiar, P., Duarte, R., Davids, K., Araújo, D., & Garganta, J. (2014a). Effects of
pitch size and skill level on tactical behaviors of association football players during
small-sided and conditioned games, International Journal of Sports Science & Coaching,
9(5), 993-1006.
Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/280232306_Effects_of_manipulations_of_player_numbers_vs_Field_dimensions_on_inter-individual_coordination_during_small-sided_games_in_youth_football/stats
Silva, P., Travassos, B., Vilar, L., Aguiar, P., David, K., Araújo, D., & Garganta, J. (2014e).
Numerical relations and skill level constrain co-adaptive behaviors of agents in sports teams,
PLoS ONE, 9(9), e107112. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/280232306_Effects_of_manipulations_of_player_numbers_vs_Field_dimensions_on_inter-individual_coordination_during_small-sided_games_in_youth_football/stats
Vilar, L., Duarte, R., Silva, P., Chow, J.Y., & David, K. (2014). The influence of pitch
dimensions on performance during small-sided and conditioned soccer games, Journal of
Sports Sciences, 1-9. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/280232306_Effects_of_manipulations_of_player_numbers_vs_Field_dimensions_on_inter-individual_coordination_during_small-sided_games_in_youth_football/stats
Abdullah M.R., Maliki A., Musa R., Kosni N., Juahir H. (2016). Intelligent Prediction of Soccer
Technical Skill on Youth Soccer Player’s Relative Performance Using Multivariate Analysis
and Artificial Neural Network Techniques. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/309600188_Intelligent_Prediction_of_Soccer_Technical_Skill_on_Youth_Soccer_Player%27s_Relative_Performance_Using_Multivariate_Analysis_and_Artificial_Neural_Network_Techniques
Rowat O., Fenner J. & Unnithan V. (2016). Technical and physical determinants of soccer
match-play performance in elite youth soccer players. Journal of Sports Medicine and
Physical Fitness. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/309600188_Intelligent_Prediction_of_Soccer_Technical_Skill_on_Youth_Soccer_Player%27s_Relative_Performance_Using_Multivariate_Analysis_and_Artificial_Neural_Network_Techniques
Zhang W., Wu H. & Tang J. (2015). A combined neural network approach to soccer player
prediction, Engineering and Technology, International Journal of Computer, Electrical,
Automation, Control and Information Engineering, vol. 9, pp. 510-514.
Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/309600188_Intelligent_Prediction_of_Soccer_Technical_Skill_on_Youth_Soccer_Player%27s_Relative_Performance_Using_Multivariate_Analysis_and_Artificial_Neural_Network_Techniques
Sieghartsleitner, R., Zuber, C., Zibung, M., Charbonnet, B., & Conzelmann, A. (2019). Talent
selection in youth football: Specific rather than general motor performance predicts future
player status of football talents. Current Issues in Sport Science, 4:011. doi:
10.15203/CISS_2019.011.
Retrieved on 24.08.2020 from https://webapp.uibk.ac.at/ojs2/index.php/ciss/article/view/2946/2419
Murr, D., Raabe, J., & Höner, O. (2017). The prognostic value of physiological and physical
characteristics in youth soccer: a systematic review. European Journal of Sport Science,
18(1), 1–13. doi.org/10.1080/17461391.2017.1386719. Retrieved on 24.08.2020 from
https://webapp.uibk.ac.at/ojs2/index.php/ciss/article/view/2946/2419
Sieghartsleitner, R., Zuber, C., Zibung, M., & Conzelmann, A. (2019). Beneficial collaboration
of multidimensional measurements and coach assessments for efficient talent selection in
elite youth football. Journal of Sports Science and Medicine, 18, 32–43. Retrieved on
24.08.2020 from https://webapp.uibk.ac.at/ojs2/index.php/ciss/article/view/2946/2419
Sieghartsleitner, R., Zuber, C., Zibung, M., Charbonnet, B., & Conzelmann, A. (2019). Talent
selection in youth football: Specific rather than general motor performance predicts future
player status of football talents. Current Issues in Sport Science, 4:011. doi:
10.15203/CISS_2019.011. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/7570079_The_relative_age_effect_in_youth_soccer_across_Europe
Conzelmann, A., Zibung, M., & Zuber, C. (2018). Talent identification and talent development
in sports. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/7570079_The_relative_age_effect_in_youth_soccer_across_Europe
Dodd, K. D., & Newans, T. J. (2018). Talent identification for soccer: physiological aspects.
Journal of Science and Medicine in Sport, 21, 1073-1087.
doi.org/10.1016/j.jsams.2018.01.009.
Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/7570079_The_relative_age_effect_in_youth_soccer_across_Europe
Werner F., Jan Van W., & Mark Williams A. (2005). The relative age effect in youth
soccer across Europe. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/7570079_The_relative_age_effect_in_youth_soccer_across_Europe
Rein R, Memmert D. (2016). Big data and tactical analysis in elite soccer: future challenges
and opportunities for sports science. doi: 10.1186/s40064-016-3108-2 PMID: 27610328.
Retrieved on 24.08.2020 from https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0171156
Hägglund M., Waldén M. (2016). Risk factors for acute knee injury in female youth
football. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/288073209_Risk_factors_for_acute_knee_injury_in_female_youth_football
Christensen KB, Møller M, Thorborg K (2015). Self-reported previous knee injury and low
knee function increase knee injury risk in adolescent female football. Scand J Med Sci
Sports. Doi: 10.1111/sms.12521. Retrieved on 24.08.2020 from
https://www.researchgate.net/publication/288073209_Risk_factors_for_acute_knee_injury_in_female_youth_football
Junge A, Dvorak J. Soccer injuries: a review on incidence and prevention. Sports Med 2004:
34 (13): 929–938
Leser, R., Hoch, T., Tan, X., Moser, B., Kellermayr, G. & Baca, A. (2019). Finding efficient
strategies in 3-versus-2 small-sided games of youth soccer players. Kinesiology, 51(1),
110-118. https://doi.org/10.26582/k.51.1.7
Rommers N., Rössler R., Verhagen V., Vandecasteele V., Verstockt S., Vaeyens R., Lenoir
M., D’Hondt E., Witvrouw E. (2020). A Machine Learning Approach to Assess Injury Risk in
Elite Youth Football Players. Retrieved on August 10, 2020 from
https://www.researchgate.net/publication/339402608_A_Machine_Learning_Approach_to_Assess_Injury_Risk_in_Elite_Youth_Football_Players
Grant, T. (2018). A Prediction of Youth Football Players’ Future Sporting Success Using
Neural Networks and Machine Learning, University of Stirling. Retrieved on August 10, 2020
from http://www.cs.stir.ac.uk/courses/ITNPBD5/PastProjects/exemplars/isdale.pdf
C. Chen, A. Liaw and L. Breiman, "Using Random Forest to Learn Imbalanced Data,”
University of California, Berkeley.