European Football Player Valuation: Integrating Financial Models and Network Theory
Abstract
This paper presents a new framework for player valuation in European football by fusing
principles from financial mathematics and network theory. The valuation model leverages a
“passing matrix” to encapsulate player interactions on the field, utilizing centrality measures to
quantify individual influence. Unlike traditional approaches, this model is both metric-driven
and cohort-free, providing a dynamic and individualized framework for ascertaining a player’s
fair market value. The methodology is empirically validated through a case study in European
football, employing real-world match and financial data. The paper advances the disciplines of
sports analytics and financial mathematics by offering a cross-disciplinary mechanism for player
valuation, and it links two well-known econometric methods: marginal revenue product and expected
present valuation.
Keywords: European Football Analytics, Soccer Analytics, Player Valuation, Financial Mathematics,
Network Theory, Stochastic Processes, Passing Matrix, Markov Chains, Centrality Measures,
Black-Scholes Model, Sports Economics
1 Introduction
Player valuation in European football is a complex endeavor, requiring nuanced metrics that go
beyond traditional sports statistics. Because goals are scarce in a game, other in-game events
should be used to understand the contribution of individual players to the team’s performance.
Existing methods often fail to capture the dynamic, stochastic nature of player performance and
its impact on fair market valuation. This paper introduces a multi-disciplinary approach that
integrates financial models with stochastic player performance models, borrowing from social
network theory.
The main objectives of this research are:
1. formulate a player valuation model that integrates the passing matrix with financial
mathematics, specifically stochastic asset pricing models, and
2. validate this integrated model empirically through a case study involving real-world data.
1.1 Connections with previous work
There are, in our opinion, at least three foundational works that provide in-depth analyses of
various pieces of this puzzle. The first major work to address the link between performance and
pay for athletes is the paper of Scully [8], which investigated the connection between a baseball
team’s revenue and the salaries paid to its players. Using well-known tools from labor economics,
such as the marginal revenue product of a unit of labor, the author derived a framework linking
on-field play with salary. The second work we reference, that of Tunaru, Clark, and Viney (TCV)
[10], addresses the value V of a player to a team in continuous time via stochastic modeling and
a resulting Black-Scholes partial differential equation (PDE). The solution V of that PDE also
incorporates the assumption that the contract can be sold at the end of the term. (This model has
been implemented numerically, for example in [2] to value a goalkeeper of a Serie A club.) Finally,
Rockerbie and Easton [7], in their recent book, offer a discrete, multi-period approach to contract
pricing via expected present valuation, and cohort analysis via a market beta. We note that
Rockerbie and Easton also consider the real-option value of a player re-signing between seasons,
which we do not address in this paper.
$$Z_j = p_j \cdot \frac{R}{\sum_j p_j} = \frac{p_j}{\sum_j p_j} \cdot R := \pi_j R. \qquad (1)$$
Here, the dynamic quantity $\pi_j = p_j / \sum_j p_j$ models the normalized performance share that a
player provides with their on-field play. For example, in association football, if one simplifies to
eleven key players, a uniform distribution of performance load would suggest that each player
provides $\pi_t = 1/11 \approx 0.0909$ per game. In practice this does not happen: there are injuries,
substitutions, matchups that enhance or detract from a specific player’s ability to contribute, and
even worries about upcoming contract negotiations, to name just a few factors. Finally, many
in-game decisions feed into performance estimation, and these require the special attention of
analytics providers, who may keep such metrics away from public view.
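As a minimal illustration of equation (1), the sketch below normalizes a vector of per-player performance scores $p_j$ into shares $\pi_j$ and allocates revenue $R$ accordingly. The function names, scores, and revenue figure are hypothetical; with eleven equal scores each share reduces to the uniform baseline $1/11 \approx 0.0909$ discussed above.

```python
def performance_shares(p: list[float]) -> list[float]:
    """Normalize raw performance scores p_j into shares pi_j = p_j / sum_j p_j."""
    total = sum(p)
    if total <= 0:
        raise ValueError("performance scores must have a positive sum")
    return [pj / total for pj in p]


def revenue_allocation(p: list[float], revenue: float) -> list[float]:
    """Z_j = pi_j * R: the revenue attributed to each player, as in equation (1)."""
    return [pi * revenue for pi in performance_shares(p)]


# Hypothetical example: eleven players with identical scores.
uniform = performance_shares([1.0] * 11)
print(round(uniform[0], 4))  # 0.0909

# Hypothetical uneven scores (one standout player) and revenue in arbitrary units.
scores = [3.0] + [1.0] * 10
Z = revenue_allocation(scores, revenue=130.0)
print(round(Z[0], 6), round(sum(Z), 6))
```

Because the shares sum to one, the allocations $Z_j$ always exhaust $R$; only the split between players changes.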
To address these and other issues, our model can be calibrated using publicly available data. In
doing so, we compute (augmented) passing matrices derived directly from in-game stats to apply
network analysis methodologies similar to [6] and [4].
2 Risk-Neutral Model for Player Valuation
As mentioned above, we see two main approaches to valuing performance: using the marginal
revenue product to calculate salaries, and a financial-derivative approach that extends
functionality to allow for events such as trades, early release, and contract transfer. Our work
also recognizes that the stochastic modeling in (1) provides a partial link between direct and
derived contract value.
where
• Yj,k represents the valuation of the player’s contribution to the team for season k, derived
from the athlete’s on-field play (as in Equation (2)),
• and Sj,k is the performance-linked salary to be computed for player j for season k.
In this setting, salaries can be constant throughout the term, be front-loaded, or have other
term-structures that match the expected present value linked to revenue streams on the right side of
equation (3). This balance equation represents a way for general managers to determine a player’s
contract structure before the season begins. For that reason, we assume ak is known ahead of time
to both parties.
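$$
\begin{aligned}
Y_t &= \pi_t a_t R_t,\\
d\pi_t &= \sigma_\pi \sqrt{\pi_t(1-\pi_t)}\,dW_t^\pi,\\
dR_t &= rR_t\,dt + \sigma_R R_t\,dW_t^R,\\
\rho\,dt &= \langle dW_t^\pi, dW_t^R\rangle.
\end{aligned}
\qquad (4)
$$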
Here, $W^\pi$ and $W^R$ are Brownian motions under the risk-neutral measure $\tilde{\mathbb{P}}$, with
correlation ρ. Note that R follows a geometric Brownian motion [9]. The process $\pi_t$ defined in (4)
is a special (univariate) case of the Wright-Fisher, Dirichlet, and Jacobi processes [1].
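The system (4) can be simulated with a standard Euler–Maruyama scheme. The sketch below is illustrative only: the function name, parameter values, weekly step size, constant player share $a$, and the clipping of $\pi$ to $[0, 1]$ (needed because a discretized Wright-Fisher step can overshoot the boundary) are our assumptions, not part of the model specification. Correlated increments are built from two independent standard normals in the usual way.

```python
import numpy as np


def simulate_pi_R(pi0, R0, a, r, sigma_pi, sigma_R, rho, T=1.0, n_steps=52, seed=0):
    """Euler-Maruyama paths of (pi_t, R_t) and Y_t = pi_t * a * R_t under (4).
    The player share a is held constant here (a simplifying assumption)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    pi = np.empty(n_steps + 1)
    R = np.empty(n_steps + 1)
    pi[0], R[0] = pi0, R0
    for k in range(n_steps):
        z1, z2 = rng.standard_normal(2)
        dW_pi = np.sqrt(dt) * z1
        # Correlated increment so that corr(dW_pi, dW_R) = rho.
        dW_R = np.sqrt(dt) * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)
        diffusion = sigma_pi * np.sqrt(max(pi[k] * (1.0 - pi[k]), 0.0))
        # Clip to [0, 1]: the discretized Wright-Fisher step may overshoot.
        pi[k + 1] = min(max(pi[k] + diffusion * dW_pi, 0.0), 1.0)
        R[k + 1] = R[k] + r * R[k] * dt + sigma_R * R[k] * dW_R
    Y = pi * a * R
    return pi, R, Y


# Hypothetical parameters (revenue in millions, one season of weekly steps).
pi, R, Y = simulate_pi_R(pi0=0.09, R0=500.0, a=0.6, r=0.03,
                         sigma_pi=0.03, sigma_R=0.1, rho=-0.2)
print(pi[-1], R[-1], Y[-1])
```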
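2. Define for each t and player j
$$\pi_{j,t} = \frac{P_t(A_j \mid X_\infty = S)}{\sum_i P_t(A_i \mid X_\infty = S)}, \qquad (6)$$
where the denominator sums over all players i involved in the passing matrix for game t. This is
required to ensure that $0 \le \pi_{j,t} \le 1$, and transforms $P_t(A_j \mid X_\infty = S)$ into a measure
of relative player importance. Since $P_t(A_j \mid X_\infty = S) = P_t(A_j \cap \{X_\infty = S\}) / P_t(X_\infty = S)$
and the common factor $P_t(X_\infty = S)$ cancels, an elementary computation shows
$\pi_{j,t} = P_t(A_j \cap \{X_\infty = S\}) / \sum_i P_t(A_i \cap \{X_\infty = S\})$,
which offers an alternative interpretation.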
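The normalization in (6), and the equivalent form in terms of joint probabilities, can be checked numerically. In the sketch below the joint probabilities $P_t(A_i \cap \{X_\infty = S\})$ and the absorption probability $P_t(X_\infty = S)$ are hypothetical numbers (the joint values may sum to more than $P_t(X_\infty = S)$ because several players can touch the ball in the same scoring play). Forming conditional probabilities and then normalizing gives the same shares as normalizing the joint probabilities directly.

```python
def shares_from_conditional(joint, p_score):
    """pi_{j,t} via (6): normalize P(A_j | X_inf = S) = P(A_j and X_inf = S) / P(X_inf = S)."""
    cond = [pj / p_score for pj in joint]
    total = sum(cond)
    return [c / total for c in cond]


def shares_from_joint(joint):
    """Equivalent form: normalize the joint probabilities P(A_j and X_inf = S) directly."""
    total = sum(joint)
    return [pj / total for pj in joint]


# Hypothetical joint probabilities for four players, and P(X_inf = S).
joint = [0.04, 0.02, 0.01, 0.03]
p_score = 0.05
a = shares_from_conditional(joint, p_score)
b = shares_from_joint(joint)
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))  # True
```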
3.1 Setup
We first consider a discrete-time setup with years $k = 0, 1, \dots$, since revenue data was only
available annually. Thus k = 1 corresponds to the 2018–2019 season, k = 2 to the 2019–2020 season,
and so forth. Then $R_k$ is the annual revenue, e.g. $R_1$ corresponds to earnings over 2018–2019.
Similarly, $\pi_{j,k}$, $k = 1, 2, \dots$, are defined annually for player j. The information known at
time k is tracked through the filtration $(\mathcal{F}_k)_{k=0}^\infty$. For player j, the fixed annual salary
for season k is denoted $C_{j,k}$, and the proportion of revenue allocated to players is $a_k$. We assume
that $C_{j,k}, a_k \in \mathcal{F}_{k-1}$, which is reasonable, as the players should know their salary and
revenue share at the beginning of the season, and that $\pi_{j,k}, R_k \in \mathcal{F}_k$ for all j, k. The main
quantity of interest is $S_{j,k}$, a rolling estimate of the expected salary that we recalculate every
year (i.e. Equation (3) with N = 1), simply termed the expected salary.
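Alternatively, one can note that an application of Itô’s formula to $d(\pi_t R_t)$ shows
$$\tilde{\mathbb{E}}[\pi_{t+dt} R_{t+dt} \mid \mathcal{F}_t] = \pi_t R_t \left[1 + \left(r + \rho\sigma_\pi\sigma_R\sqrt{(1-\pi_t)/\pi_t}\right) dt\right].$$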
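For readers who want the intermediate step, the Itô product rule applied to the dynamics (4), with $\langle dW_t^\pi, dW_t^R\rangle = \rho\,dt$, gives the following. The $dW$ terms have zero risk-neutral conditional expectation, and $\sqrt{\pi_t(1-\pi_t)}/\pi_t = \sqrt{(1-\pi_t)/\pi_t}$ yields the drift stated above.

```latex
\begin{aligned}
d(\pi_t R_t) &= \pi_t\,dR_t + R_t\,d\pi_t + d\pi_t\,dR_t \\
&= \pi_t\left(rR_t\,dt + \sigma_R R_t\,dW_t^R\right)
   + R_t\,\sigma_\pi\sqrt{\pi_t(1-\pi_t)}\,dW_t^\pi
   + \rho\,\sigma_\pi\sigma_R R_t\sqrt{\pi_t(1-\pi_t)}\,dt, \\
\tilde{\mathbb{E}}\left[d(\pi_t R_t)\mid\mathcal{F}_t\right]
&= \pi_t R_t\left(r + \rho\,\sigma_\pi\sigma_R\sqrt{\frac{1-\pi_t}{\pi_t}}\right)dt.
\end{aligned}
```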
Valuation then can utilize a fully discrete approximation (with dt = 1). By adopting a discrete
accounting for yearly interest rate r and risk premium λ, one obtains the expected salary computed
in equation (7) below:
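$$
\begin{aligned}
\frac{S_{j,k}}{1+r} &= \frac{1}{1+r+\lambda_j}\,\tilde{\mathbb{E}}[Y_{j,k} \mid \mathcal{F}_{k-1}] = \frac{a_k}{1+r+\lambda_j}\,\tilde{\mathbb{E}}[\pi_k R_k \mid \mathcal{F}_{k-1}] \\
\Rightarrow\; S_{j,k} &= \frac{1+r}{1+r+\lambda_j}\, a_k \pi_{k-1} R_{k-1} \left[1 + r + \rho\sigma_\pi\sigma_R \sqrt{(1-\pi_{k-1})/\pi_{k-1}}\right],
\end{aligned}
\qquad (9)
$$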
In the analysis below, we use the continuous version (8), but one can note its similarity with (9).
In particular, both depend on the previous year’s player performance $Z_{j,k-1} = \pi_{j,k-1} R_{k-1}$
and the current year’s player share $a_k$, with an accumulation factor depending on r and a discount
by the risk premium $\lambda_j$. Additionally, both contain a factor that adjusts these values in
proportion to a covariance term involving $\rho\sigma_\pi\sigma_R$ (akin to a market β).
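Equation (9) is straightforward to evaluate once its inputs are known. The function below is a minimal sketch; its name and argument names are ours, and the numbers in the usage line are hypothetical rather than estimates from this study.

```python
import math


def expected_salary_discrete(a_k, pi_prev, R_prev, r, lam, rho, sigma_pi, sigma_R):
    """S_{j,k} from equation (9):
    (1 + r) / (1 + r + lam) * a_k * pi_{k-1} * R_{k-1}
        * [1 + r + rho * sigma_pi * sigma_R * sqrt((1 - pi_{k-1}) / pi_{k-1})].
    """
    if not 0.0 < pi_prev < 1.0:
        raise ValueError("pi_prev must lie strictly between 0 and 1")
    covariance_term = rho * sigma_pi * sigma_R * math.sqrt((1.0 - pi_prev) / pi_prev)
    accumulation = 1.0 + r + covariance_term          # one-year growth incl. covariance
    discount = (1.0 + r) / (1.0 + r + lam)            # risk-premium discount
    return discount * a_k * pi_prev * R_prev * accumulation


# Hypothetical inputs: 60% player share, pi = 0.09, revenue 500 (millions), r = 3%.
S = expected_salary_discrete(a_k=0.6, pi_prev=0.09, R_prev=500.0, r=0.03,
                             lam=0.05, rho=-0.2, sigma_pi=0.02, sigma_R=0.1)
print(round(S, 3))
```

With $\rho = 0$ and $\lambda_j = 0$ the expression reduces to $a_k \pi_{k-1} R_{k-1}(1+r)$, which is a convenient sanity check.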
i.e. last year’s game performance multiplied by the player share, and last year’s revenue accumulated
at the risk-free rate, $e^{r(t-k+1)}$, over the time $t - k + 1 = t - \lfloor t \rfloor$ elapsed since the
beginning of the season (measured in years, so one week is 1/52). Since game appearance is reflected
in the $\pi^{\text{game}}$ process, we assume $\lambda_j = 0$ for $S^{\text{game}}$.
In terms of application, if $\Delta_{j,t}^{\text{game}}$ in Equation (11) widens beyond a certain threshold,
the manager may be tempted to release or trade the player during the upcoming season, if possible.
Alternatively, if there were an insurance product that compensated a team for a player’s
underperformance, payments could be triggered when $\Delta_k$ exceeds a threshold value. With granular
revenue data, such products could be valued using the $R_t$ process. For example, an option on player
underperformance could be linked to a payoff of the form $(K - Y_{j,t})^+$, where $(x)^+$ denotes the
positive part of x. This payoff is large when $Y_{j,t}$ is small relative to K, which benefits the
purchaser in the case of underperformance.
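Under the dynamics (4), such an underperformance option could in principle be priced by Monte Carlo as the discounted risk-neutral expectation $e^{-rT}\,\tilde{\mathbb{E}}[(K - Y_{j,T})^+]$. The sketch below is hypothetical throughout: the function name, strike, and parameters are made up, $a_t$ is held fixed, the paths are discretized with Euler–Maruyama, and $\pi$ is clipped to $[0, 1]$. It shows the mechanics only, not a calibrated price.

```python
import numpy as np


def underperformance_option_mc(K, T, pi0, R0, a, r, sigma_pi, sigma_R, rho,
                               n_paths=20_000, n_steps=52, seed=1):
    """Monte Carlo estimate of exp(-r T) * E~[(K - Y_T)^+] with Y_T = pi_T * a * R_T."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    pi = np.full(n_paths, pi0)
    R = np.full(n_paths, R0)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rng.standard_normal(n_paths)
        dW_pi = np.sqrt(dt) * z1
        dW_R = np.sqrt(dt) * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)  # corr = rho
        vol_pi = sigma_pi * np.sqrt(np.clip(pi * (1.0 - pi), 0.0, None))
        pi = np.clip(pi + vol_pi * dW_pi, 0.0, 1.0)
        R = R + r * R * dt + sigma_R * R * dW_R
    Y = pi * a * R
    payoff = np.maximum(K - Y, 0.0)  # (K - Y_T)^+
    return np.exp(-r * T) * payoff.mean()


# Hypothetical strike near the expected contribution (revenue in millions).
price = underperformance_option_mc(K=25.0, T=1.0, pi0=0.09, R0=500.0, a=0.6,
                                   r=0.03, sigma_pi=0.03, sigma_R=0.1, rho=-0.2)
print(price)
```

Fixing the seed makes prices for different strikes use the same simulated paths, so the estimate is monotone in K.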
Consequently, the parameters $\pi_{j,0}$, $\sigma_{\pi,j}$, and $\rho_j$ are estimated by maximum likelihood. See
Appendix C for all estimates obtained in the upcoming analysis and a related discussion.
To estimate the risk premium $\lambda_j$, let $m_{j,k}$ denote the number of games player j missed due to
injury or violations (e.g. red card suspensions), so that $n_{j,k} = N_{j,k} - m_{j,k}$ is the number of
games played in season k. Here $N_{j,k}$ is the number of games for which the player could have been
available; $N_{j,k} = 38$ is the maximum (a full season), which is the case for most players. In a few
circumstances, such as when a player is traded onto a team in the middle of a season, we may have
$N_{j,k} < 38$. Injury rates are estimated over the entire period as
$\lambda_j = \sum_k m_{j,k} / \sum_k N_{j,k}$, following an approach similar to [10, 2].
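The injury-rate estimate $\lambda_j = \sum_k m_{j,k} / \sum_k N_{j,k}$, together with $n_{j,k} = N_{j,k} - m_{j,k}$, amounts to the following pooled ratio. The function names and the season records in the example are hypothetical.

```python
def injury_rate(missed: list[int], available: list[int]) -> float:
    """lambda_j = sum_k m_{j,k} / sum_k N_{j,k}, pooled over all seasons."""
    if len(missed) != len(available):
        raise ValueError("need one (m, N) pair per season")
    total_available = sum(available)
    if total_available == 0:
        raise ValueError("player has no available games")
    return sum(missed) / total_available


def games_played(missed: list[int], available: list[int]) -> list[int]:
    """n_{j,k} = N_{j,k} - m_{j,k} for each season k."""
    return [N - m for m, N in zip(missed, available)]


# Hypothetical player: two full 38-game seasons, then a mid-season transfer (N = 19).
m = [2, 30, 4]
N = [38, 38, 19]
print(games_played(m, N))           # [36, 8, 15]
print(round(injury_rate(m, N), 4))  # 0.3789
```

Pooling over seasons (rather than averaging per-season rates) weights each season by the number of games the player could have played, so a partial season counts proportionally less.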
In terms of the player shares, $a_k$ should in practice be known by the club in question. However,
this data is unavailable, so it is estimated through
$$a_k = \left[\sum_i \frac{N_{i,k}}{38}\, C_{i,k}\right] \Big/ R_k, \qquad (16)$$
which is exactly the fraction of revenue paid out to players in season k. Note that the factor
$N_{i,k}/38$ compensates for players being traded or bought mid-season.
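Equation (16) is a prorated wage bill divided by revenue. A minimal sketch follows; the function name, the three-player squad, and the amounts are hypothetical.

```python
def player_share(salaries: list[float], games_available: list[int], revenue: float,
                 season_games: int = 38) -> float:
    """a_k = [sum_i (N_{i,k} / 38) * C_{i,k}] / R_k, as in equation (16).
    The N_{i,k}/38 factor prorates salaries of players who joined mid-season."""
    if len(salaries) != len(games_available):
        raise ValueError("need one N_{i,k} per salary C_{i,k}")
    prorated = sum(N / season_games * C for C, N in zip(salaries, games_available))
    return prorated / revenue


# Hypothetical squad of three (salaries and revenue in the same currency units);
# the third player arrived at mid-season, so only half of their salary counts.
C = [10.0, 5.0, 8.0]
N = [38, 38, 19]
print(player_share(C, N, revenue=50.0))  # 0.38
```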
Finally, in order to calibrate π, recall that $\pi_{j,k}$ depends on a passing matrix originally defined
per game. In order to obtain an annual estimate, we use a multi-step process. First, consider
$Q_t^{\text{game}}$, the augmented sample passing frequency matrix, i.e. the matrix whose (i, j) entry is
the number of passes from player i to player j in game t, augmented with columns so that
$[Q_t^{\text{game}}]_{i,U}$ is the number of unsuccessful passes from player i and
$[Q_t^{\text{game}}]_{i,S} = w_S \cdot n_{i,\text{score}} + (1 - w_S) \cdot n_{i,\text{miss}}$, where $n_{i,\text{score}}$
is the number of times player i scored (implicitly in game t), and $n_{i,\text{miss}}$ is the number of
missed shots. Initial calibrations showed that $w_S = 10/11$ works well, which counts a missed shot as
worth 1/10 of a score. The initial distributions (only required in intermediate calculations) are
similarly estimated per game using the frequencies with which a player began possession of a play
(beginning of a half, steal, etc.). Then $P_t^{\text{game}}$ is the stochastic (row-normalized) version of
$Q_t^{\text{game}}$, so that its (i, j) entry is the empirical probability that the next transfer of ball
ownership ends in state j if it begins with player i. For each player j and game t, $\pi_{j,t}^{\text{game}}$
is derived from $P_t^{\text{game}}$ using the methods described in Section 2.3 and Appendix A.1.

¹ Salary data is from https://www.sportrac.com; revenue data is from [3].
² Game-specific data is from https://www.whoscored.com; game appearance data is from https://www.transfermarkt.com/.
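The construction of the augmented matrix $Q_t^{\text{game}}$ and its row-normalized version $P_t^{\text{game}}$ can be sketched as follows. The pass counts are hypothetical, and the column ordering (players, then U, then S) and the function names are our choices; the weighting $w_S = 10/11$ follows the text, so a missed shot counts as $(1 - w_S)/w_S = 1/10$ of a goal.

```python
import numpy as np


def augmented_passing_matrix(passes, unsuccessful, goals, missed_shots, w_S=10 / 11):
    """Stack pass counts with an unsuccessful-pass column U and a scoring column S,
    where [Q]_{i,S} = w_S * goals_i + (1 - w_S) * missed_i."""
    passes = np.asarray(passes, dtype=float)
    U = np.asarray(unsuccessful, dtype=float)[:, None]
    S = (w_S * np.asarray(goals, dtype=float)
         + (1 - w_S) * np.asarray(missed_shots, dtype=float))[:, None]
    return np.hstack([passes, U, S])


def row_normalize(Q):
    """Row-stochastic version: entry (i, j) is the empirical probability that the
    next transfer of ball ownership starting with player i ends in state j."""
    row_sums = Q.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every player needs at least one recorded event")
    return Q / row_sums


# Hypothetical three-player game: passes[i][j] = completed passes from player i to j.
passes = [[0, 5, 3], [4, 0, 6], [2, 7, 0]]
Q = augmented_passing_matrix(passes, unsuccessful=[2, 1, 3], goals=[0, 1, 0],
                             missed_shots=[1, 2, 0])
P = row_normalize(Q)
print(P.shape, np.allclose(P.sum(axis=1), 1.0))  # (3, 5) True
```

To use this as a Markov chain (as in Appendix A.1), one would append absorbing rows for U and S, making the matrix square.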
Granular game data allows us to obtain, for each player j, the game process $\pi_{j,t}^{\text{game}}$, which
is observed at the per-game (weekly) level and indexed by the time t (in years) at which each game was
played. For example, t = 1/52 if a game was played in the first week of the season, and t = 38/52 for
the final game. Similarly, t = 1 + 1/52 would be the first game of the second season. Since we require
an annual estimate for valuation purposes, we estimate, for $k = 1, 2, \dots$,
$$\pi_{j,k} = \frac{1}{n_{j,k}} \sum_{k-1<t<k} \pi_{j,t}^{\text{game}}, \qquad (17)$$
i.e. the player’s average over the season, for the games in which they were present. Remark: the
individual $\pi_{j,t}^{\text{game}}$ are also used in Section 3.3.1 below.
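Equation (17) groups game-level shares by season and averages them. A minimal sketch, assuming (as in the text) that times are measured in years and that only games the player appeared in are passed, so that the per-season count equals $n_{j,k}$; the function name and data are hypothetical.

```python
import math
from collections import defaultdict


def seasonal_average(game_times, game_pis):
    """pi_{j,k} = (1 / n_{j,k}) * sum over k-1 < t < k of pi^game_{j,t}, equation (17).
    game_times are in years (1/52 = first week of season 1); only games the player
    appeared in should be passed, so the count per season equals n_{j,k}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, p in zip(game_times, game_pis):
        k = math.floor(t) + 1  # season k covers k - 1 < t < k
        sums[k] += p
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Hypothetical game-level shares across two seasons.
times = [1 / 52, 2 / 52, 38 / 52, 1 + 1 / 52, 1 + 3 / 52]
pis = [0.08, 0.10, 0.09, 0.11, 0.07]
print(seasonal_average(times, pis))
```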
Figure 1: Salary-related and π processes for Liverpool. The two dashed lines are $C_k$ (paid salary)
and $S_k$, weekly (in £). The historic $\pi_t^{\text{game}}$ is overlaid with its average $\pi_{j,k}$ from
Equation (17) (thick solid line).
These players, as prominent members of the squad, provide a practical basis for sanity-checking
the estimates. Specifically, their π values should reasonably align with 0.0909 ≈ 1/11, given that a
starting squad comprises 11 players.
Looking at the players individually, Virgil van Dijk’s compensation appears well aligned with his
contributions through the 2018–2021 period, as indicated by the close correspondence between $C_k$
and $S_k$. Although the model suggests that the pay raise for the 2021–2023 seasons is potentially
unwarranted, it is worth noting that Virgil missed most of the 2020–2021 season due to injury, which
deflates all of his salary estimates through $\hat\lambda_j$.
Looking at Mohamed Salah, his π exhibits significant variability, likely influenced by his role as a
key scorer (e.g. he is guaranteed to be involved in all successful shots if he is the only scorer in a
game, and is at the opposite extreme if another player is the only scorer). His average π is
typically higher than that of his peers, correlating with higher salary estimates. From 2018 to 2022
we observe undervaluation (accentuated from 2019 to 2021), so it is unsurprising to see a pay raise
for 2022–2023 (to approximately £350,000). Although Salah is clearly crucial to the team’s performance
and the data suggested undervaluation, the model suggests that the magnitude of the raise is
unwarranted.
Looking at Trent Alexander-Arnold, Liverpool’s salary increases (2018–19 to 2019–20, and 2020–21 to
2021–22) align with increases in our salary estimates. Indeed, his paid salary is approximately equal
to the estimate for 2021–2022. Otherwise he is generally underpaid.
First, note the interpretation of ∆: ∆ > 0 means the player is underperforming (relative to the
contract) and ∆ < 0 indicates overperformance. Since $\Delta_k$ (the thick grey line in Figure 2) is
simply the difference of $C_k$ and $S_k$, comparing it with the difference of $C_k$ and $S_k$ in
Figure 1 confirms many of the earlier observations. However, the game-to-game analysis through
$\Delta_t^{\text{game}}$ provides a more granular explanation. This is especially clear with Virgil van
Dijk, where the purely annual view suggested slight overpayment, but Seasons 2020–21 and 2022–23
specifically highlight games where Virgil generally outperforms his contract. Salah shows higher
variation than the other players, as was also apparent from $\pi_t^{\text{game}}$ in Figure 1.
we would expect quantities like $\sigma_\pi$ to have a larger effect on these contracts. The sensitivity
of contract price to $\sigma_\pi$ (i.e. $\partial/\partial\sigma_\pi$) would illustrate how player variability
influences contract price. This idea can be applied to any of the aforementioned contracts.
References
[1] J. Bakosi, J. Ristorcelli, et al. A stochastic diffusion process for the Dirichlet distribution.
International Journal of Stochastic Analysis, 2013:1–7, 2013.
[2] D. Coluccia, S. Fontana, and S. Solimene. An application of the option-pricing model to the
valuation of a football player in the ‘Serie A League’. International Journal of Sport Management
and Marketing, 18(1-2):155–168, 2018.
[3] Deloitte Sports Business Group. Annual review of football finance 2019–2023.
https://www2.deloitte.com/content/dam/Deloitte/uk/Documents/sports-business-group/deloitte-uk-annual-review-of-football-finance-2023.pdf, 2019–2023.
[5] G. Grimmett and D. Stirzaker. Probability and random processes. Oxford University Press,
2020.
[6] J. L. Pena and H. Touchette. A network theory analysis of football strategies. arXiv preprint
arXiv:1206.6904, 2012.
[7] D. W. Rockerbie and S. T. Easton. Contract Options for Buyers and Sellers of Talent
in Professional Sports. Springer Nature, Switzerland, 2020. doi: https://doi.org/10.1007/978-3-030-49513-8.
[8] G. W. Scully. Pay and performance in major league baseball. The American Economic Review,
64(6):915–930, 1974.
[9] S. E. Shreve et al. Stochastic calculus for finance II: Continuous-time models, volume 11.
Springer, 2004.
[10] R. Tunaru, E. Clark, and H. Viney. An option pricing framework for valuation of football
players. Review of Financial Economics, 14(3-4):281–295, 2005. doi: https://doi.org/10.1016/j.rfe.2004.11.002.
$$P(A_i \mid X_\infty = S) = \frac{P(A_i \cap \{X_\infty = S\})}{P(X_\infty = S)}.$$
$$
\begin{aligned}
P(A_i \cap \{X_\infty = S\}) &= \sum_j P(A_i \cap \{X_\infty = S\} \mid X_0 = j)\, P(X_0 = j) \\
&= \sum_j \left[P(X_\infty = S \mid X_0 = j) - P(\{X_\infty = S\} \setminus A_i \mid X_0 = j)\right] P(X_0 = j),
\end{aligned}
$$
where $\{X_\infty = S\} \setminus A_i$ is the set difference, i.e. the case of absorption into S without
player i ever having possession. This can be computed by considering a new Markov chain that treats
state i as absorbing, and finding the probability of reaching S before i (see [5] for details). For
the denominator,
$$P(X_\infty = S) = \sum_j P(X_\infty = S \mid X_0 = j)\, P(X_0 = j).$$
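The decomposition above can be computed with standard absorbing-chain linear algebra. The sketch below assumes a full transition matrix whose U and S rows are absorbing, and a given initial distribution over players; the function names and the two-player example are hypothetical. Absorption probabilities solve $(I - P_{TT})h = P_{TS}$ over the transient (player) states; making player i absorbing simply removes i from the transient set, so the same solver returns the probability of reaching S before i. Starting at i means i already has possession, so that probability is 0 for $X_0 = i$.

```python
import numpy as np


def absorption_into_S(P, transient, S):
    """P(absorbed in S | X_0 = j) for j in `transient`, solving (I - P_TT) h = P_TS.
    States not listed in `transient` (other than S) act as non-scoring sinks."""
    T = np.asarray(transient)
    A = np.eye(len(T)) - P[np.ix_(T, T)]
    return np.linalg.solve(A, P[T, S])


def scoring_shares(P, mu, players, S):
    """Return (pi, cond): pi_i = P(A_i, X_inf = S) / sum_l P(A_l, X_inf = S) and
    cond_i = P(A_i | X_inf = S), following the decomposition above.

    P: full transition matrix with absorbing rows for U and S.
    mu: initial distribution P(X_0 = j) over `players` (same order).
    players: list of player state indices; S: index of the scoring state."""
    mu = np.asarray(mu, dtype=float)
    h = absorption_into_S(P, players, S)  # P(X_inf = S | X_0 = j)
    p_score = mu @ h                       # P(X_inf = S)
    joint = np.empty(len(players))
    for idx, i in enumerate(players):
        others = [j for j in players if j != i]
        # g(j) = P(reach S before ever visiting i | X_0 = j); g(i) = 0.
        g = np.zeros(len(players))
        if others:
            g[[players.index(j) for j in others]] = absorption_into_S(P, others, S)
        joint[idx] = mu @ (h - g)          # P(A_i and X_inf = S)
    return joint / joint.sum(), joint / p_score


# Hypothetical game: players 0 and 1, then U = 2 (lost ball) and S = 3 (score).
P = np.array([[0.0, 0.5, 0.25, 0.25],
              [0.5, 0.0, 0.25, 0.25],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
pi, cond = scoring_shares(P, mu=[1.0, 0.0], players=[0, 1], S=3)
print(pi, cond)  # pi ≈ [2/3, 1/3], cond ≈ [1.0, 0.5]
```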
A.2 PageRank
PageRank is a well-known centrality measure that has been applied to European football (see e.g.
[6]). One way to derive PageRank in line with our methods is as follows. If P is the original
probability transition matrix, then PageRank is the stationary (i.e. long-run) distribution obtained
from the modified probability transition matrix
$$\lambda P + \frac{1-\lambda}{|P|} E, \qquad (18)$$
where |P| is the number of states, E is the |P| × |P| matrix of ones, and λ is a multiplicative
(damping) factor, commonly λ = 0.85. The transition probabilities thus change from $p_{ij}$ to
$\lambda p_{ij} + (1-\lambda)/|P|$. The interpretation is that, rather than ball possession being
determined purely by $p_{ij}$, from any given state i possession moves to a state chosen uniformly at
random (including i itself) with probability 1 − λ; otherwise the original passing probability is
used with probability λ.
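The stationary distribution of the modified matrix (18) can be found by power iteration. This is a minimal sketch, assuming a square, row-stochastic P; the function name, tolerance, and the three-player matrix are hypothetical. Because every entry of the modified matrix is positive, the iteration converges to a unique distribution.

```python
import numpy as np


def pagerank(P, lam=0.85, tol=1e-12, max_iter=10_000):
    """Stationary distribution of lam * P + (1 - lam) / n * E (equation (18)),
    found by power iteration. P must be a square row-stochastic matrix."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    G = lam * P + (1.0 - lam) / n * np.ones((n, n))
    x = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        x_next = x @ G  # row vector times transition matrix
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    raise RuntimeError("power iteration did not converge")


# Hypothetical three-player passing probabilities (rows sum to 1).
P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.9, 0.1, 0.0]])
print(pagerank(P))
```

Note that this matrix has no scoring state at all, which mirrors the limitation discussed next.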
Preliminary analysis used PageRank but ultimately found that it severely undervalued key scorers.
This makes sense if one considers the construction in Equation (18), which has no bearing on the
worth of a score. Similarly, defensive efficiency is not taken into account (whereas it is in the
initial distribution in the previous section), and neither are missed passes. This preliminary
analysis also considered an augmented matrix containing a highly weighted column for scoring, but it
offered only marginal improvement. A further modification could possibly yield usable results, but
that is left to future work.
Figure 3: Salary-related and π processes for Arsenal. The two dashed lines are $C_k$ (paid salary)
and $S_k$, weekly (in £). The historic $\pi_t^{\text{game}}$ is overlaid with its average $\pi_{j,k}$ from
Equation (17) (thick solid line).
which matches his raised salary in 2022–2023. This is somewhat foreshadowed by stellar performance
toward the end of the 2021–2022 season. In some sense, the model anticipates the salary raise,
similar to both Salah and Alexander-Arnold of Liverpool. Granit Xhaka and Rob Holding are two
interesting cases. Rob missed many games. However, even as a lower-string player, his π values,
between around 0.07 and 0.10, indicate that he is impactful. That player share is quite significant,
which suggests that additional attention could be paid to Rob. The profile of Granit Xhaka’s π values
is similar to those of van Dijk and Alexander-Arnold and generally tells a similar story to
Alexander-Arnold’s in terms of slight underpayment. However, according to his π values, his
performance declined significantly in 2022–2023.
Brighton: Looking at Figure 4, the story is similar to Liverpool’s. Pascal Gross’s valuation is as
expected. The pay raise for 2022–2023 is minimal, which could reflect a desire for a contract
extension without incurring a high expense. Solly March tells a similar story to Pascal. Lewis Dunk
has a similar profile to Mohamed Salah. In particular, he has highly fluctuating π values and an
observed underpayment until a salary increase in 2022–2023 (which, as with Salah, may have been more
than warranted, since ∆ for that year is nearly £20,000).
C Parameter Estimates
All estimation and calibration is done according to Section C. Estimates are reported in Tables
1 (player specific) and 2 (team specific). It is worth noting that the $a_k$ values differ from the
wage/revenue ratio in [3] due to various factors (including that those wages include staff, and
differences in reporting).
Somewhat surprisingly, seven out of the nine $\hat\rho$ values are negative. We attribute this to a
variety of factors, generally stemming from lack of data. From a statistical standpoint, estimating
three parameters with only ten data points is a difficult task. From a sports-analytics standpoint,
it seems hard to detect how a player’s aggregate performance in a given season will affect the
revenue for that season. The motivation for including ρ in the first place is a hypothesized effect
on game-day revenue, for example from fan-favorite players and hot streaks. In aggregate, that is
much harder to detect. There may also be a timing misalignment, in the sense that last week’s
performance would affect game-day revenue in the current week. It is worth noting that despite these
estimates being counterintuitive, their effect is relatively small; recall Equation (7), which
displays it as a multiplicative effect of (1 + g), with g depending on ρ. A quick calculation using
the estimates from Table 1 for Mohamed Salah gives this factor as (using $\hat\pi_0$ for $\pi_{k-1}$)
1 − 0.004213 ≈ 0.9958. The table also shows that the estimates $\hat\pi_0$ and $\hat\sigma_\pi$ are mostly
unaffected when moving from the estimated-ρ case to the ρ = 0 case.

Figure 4: Salary-related and π processes for Brighton. The two dashed lines are $C_k$ (paid salary)
and $S_k$, weekly (in £). The historic $\pi_t^{\text{game}}$ is overlaid with its average $\pi_{j,k}$ from
Equation (17) (thick solid line).
The $\hat\sigma_\pi$ values are generally low, which is unsurprising since they are estimated from
season-averaged data and the annual values do not fluctuate much. They would be much higher if we
considered the game process (which would also have a larger effect on the “1 + g” factor mentioned
in the last paragraph).
| Team | Player | π̂0 (ρ estimated) | σ̂π (ρ estimated) | ρ̂ | π̂0 (ρ = 0) | σ̂π (ρ = 0) | λ̂ |
|---|---|---|---|---|---|---|---|
| Liverpool | Mohamed Salah | 0.109 | 0.026 | -0.506 | 0.103 | 0.025 | 0.011 |
| Liverpool | Virgil van Dijk | 0.089 | 0.014 | -0.286 | 0.087 | 0.014 | 0.207 |
| Liverpool | Trent Alexander-Arnold | 0.084 | 0.020 | -0.070 | 0.084 | 0.020 | 0.043 |
| Arsenal | Granit Xhaka | 0.102 | 0.022 | 0.206 | 0.102 | 0.022 | 0.136 |
| Arsenal | Eddie Nketiah | 0.043 | 0.056 | -0.400 | 0.043 | 0.056 | 0.051 |
| Arsenal | Rob Holding | 0.064 | 0.060 | -0.370 | 0.064 | 0.059 | 0.203 |
| Brighton | Pascal Gross | 0.083 | 0.056 | 0.393 | 0.083 | 0.056 | 0.075 |
| Brighton | Solly March | 0.077 | 0.074 | -0.097 | 0.077 | 0.074 | 0.169 |
| Brighton | Lewis Dunk | 0.089 | 0.029 | -0.231 | 0.089 | 0.029 | 0.055 |
Table 1: Parameter estimates for each player under analysis, for the cases of ρ estimated and ρ = 0.
Table 2: Estimates of σ̂R and calibrated values for each team’s player share (ak ) process.