An In-Game Win Probability Model For Football
1 Introduction
In-game win probability models provide the likelihood that a particular team will
win a game based upon a specific game state (e.g., score, time remaining, ...).
These models have become increasingly popular in a variety of sports over the last
decade. Nowadays, in-game win probability is widely used in baseball [12], basketball
[1,8] and American football [3,16,13]. It has a number of relevant use cases
within these sports’ ecosystems. First, the win probability added (WPA) metric
computes the change in win probability between two consecutive game states.
It allows one to rate a player’s contribution to his team’s performance [17,11],
measure the risk-reward balance of coaching decisions [13,15], or evaluate in-
game decision making [14]. Second, win probability graphs can improve the fan
experience by telling the story of a game (ESPN includes win probability graphs
in its match reports for basketball, e.g.,
http://espn.com/nba/game?gameId=401071795, and American football, e.g.,
http://espn.com/nfl/game?gameId=401030972). For example, they can help identify
exciting or influential moments in the game [19], which may be useful for
broadcasters looking for game highlights or to evaluate performance in crucial game
situations [18,17]. Third, they are relevant to in-game betting scenarios. Here,
gamblers have the option to continue to bet once an event has started, and adapt
their bets depending on how the event is progressing. This has become a popular
betting service in many countries, and is estimated to account for over one-third
of online betting gross gambling yield in Britain [5].
While well established in these American sports, in-game win probability is
a relatively new concept in association football. It first emerged during the 2018
World Cup when both FiveThirtyEight and Google published such predictions.
The lack of attention to win probability in association football can probably be
attributed to its low-scoring nature and high probability of ties, which make
the construction of a good in-game win probability model significantly harder than
in the aforementioned sports. Unfortunately, FiveThirtyEight and
Google do not provide any details about how they tackled those challenges.
We present a machine learning approach for making minute-by-minute win
probability estimates for association football. By comparing with state-of-the-art
win probability estimation techniques in other sports, we introduce the unique
challenges that come with modelling these probabilities for football. These include
capturing the current game state, dealing with stoppage time, handling the
frequent occurrence of ties and accounting for changes in momentum. To
address these challenges, we introduce a Bayesian model that models the future
number of goals that each team will score as a temporal stochastic process. We
evaluate our model on event stream data from the four most recent seasons of
the major European football leagues.
2. Dealing with stoppage time. In most sports, one always knows exactly
how much time is left in the game, but this is not the case for football. Football
games rarely last precisely 90 minutes. Each half is 45 minutes long, but the
referee can supplement those allotted periods to compensate for stoppages during
the game. There are general recommendations and best practices that allow fans
to project broadly the amount of time added at the end of a half, but no one can
ever be quite certain.
3. The frequent occurrence of ties. Another unique property of football
is the frequent occurrence of ties. Football is a low-scoring game, hence the two
teams are often separated by fewer than two goals. In this setting the win-draw-loss
outcome provides essentially zero information. At each moment in time, a win
or loss could be converted to a tie, and a tie could be converted to a win or a
loss for either team.
4. Changes in momentum. Additionally, the fact that goals are scarce
in football (typically fewer than three goals per game) means that when they do
occur, they often change the subsequent ebb and flow of the game in terms of
how space opens up, who dominates the ball and where they do it. The
existing win probability models are largely unresponsive to such shifts in the tenor
of a game.
3. Contextual features
– Team Goals: The number of goals scored so far.
– Yellows: The total number of yellow cards received.
– Reds: The difference with the opposing team in number of red cards received.
– Attacking Passes: A rolling average of the number of successfully completed
attacking passes (a forward pass ending in the final third of the field) during
the previous 10 time frames.
– Duel Strength: A rolling average of the percentage of duels won in the previous
10 time frames.
The challenge here is to design a good set of contextual features. Each additional
variable multiplies the size of the state space, so that it grows exponentially with
the number of variables and makes learning a well-calibrated model significantly
harder. On the other hand, the features should accurately capture each team’s
likelihood of winning the game. The five
contextual features that we propose are capable of doing this: the number of goals
scored so far gives an indication of whether a team was able to score in the past
(and is therefore probably capable of doing it again); a difference in red cards
represents a goal-scoring advantage [4]; a weaker team that is forced to defend can
be expected to commit more fouls and incur more yellow cards; the percentage
of successful attacking passes captures a team’s success in creating goal scoring
opportunities; and the percentage of duels won captures how effective teams are
at regaining possession. Besides these five contextual features, we experimented
with a large set of additional features.
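As a rough illustration of the two rolling features, the sketch below averages a per-frame statistic over the previous 10 time frames. The function name, the decision to exclude the current frame, and the value 0.0 returned when no history exists yet are assumptions made for this example rather than details specified above.

```python
from collections import deque


def rolling_mean_previous(values, window=10):
    """Mean of up to `window` values strictly before each time frame.

    `values` holds one number per time frame, e.g. the count of completed
    attacking passes or the share of duels won in that frame.
    """
    history = deque(maxlen=window)  # oldest entries drop out automatically
    means = []
    for value in values:
        # Assumption: before any frame has been observed the feature is 0.0.
        means.append(sum(history) / len(history) if history else 0.0)
        history.append(value)
    return means


# Hypothetical per-frame counts of completed attacking passes.
passes_per_frame = [1, 2, 3, 4]
print(rolling_mean_previous(passes_per_frame, window=2))
```

Excluding the current frame keeps the feature strictly backward-looking, which seems the safer reading of "previous 10 time frames".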
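\begin{equation}
\begin{aligned}
\theta_{t,\mathrm{home}} &= \operatorname{invlogit}(\alpha_t * x_{t,\mathrm{home}} + \beta + Ha) \\
\theta_{t,\mathrm{away}} &= \operatorname{invlogit}(\alpha_t * x_{t,\mathrm{away}} + \beta) \\
\alpha_t &\sim N(\alpha_{t-1}, 2) \\
\beta &\sim N(0, 10) \\
Ha &\sim N(0, 10)
\end{aligned}
\tag{3}
\end{equation}
where $\alpha_t$ are the regression coefficients, $\beta$ is the intercept and $Ha$ models the
home advantage.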
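To make the role of the scoring intensities θ concrete, the following minimal sketch turns a pair of intensities into win, tie and loss probabilities for the home team. It assumes, purely for illustration, that each team's number of future goals is binomially distributed over the remaining time frames with per-frame success probability θ. The function names and all numeric values are hypothetical.

```python
from math import comb, exp


def invlogit(z):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + exp(-z))


def scoring_intensity(alpha, x, beta, home_advantage=0.0):
    """theta = invlogit(alpha * x + beta [+ Ha]) for one team at one time frame."""
    return invlogit(sum(a * xi for a, xi in zip(alpha, x)) + beta + home_advantage)


def binomial_pmf(n, p):
    """Probabilities of 0..n successes in n independent trials."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def outcome_probabilities(theta_home, theta_away, frames_left, home_goals, away_goals):
    """Home win, tie and home loss probabilities.

    Assumption: future goals per team are independent and binomial.
    """
    pmf_home = binomial_pmf(frames_left, theta_home)
    pmf_away = binomial_pmf(frames_left, theta_away)
    win = tie = loss = 0.0
    for extra_home, p_home in enumerate(pmf_home):
        for extra_away, p_away in enumerate(pmf_away):
            diff = (home_goals + extra_home) - (away_goals + extra_away)
            if diff > 0:
                win += p_home * p_away
            elif diff == 0:
                tie += p_home * p_away
            else:
                loss += p_home * p_away
    return win, tie, loss


# Hypothetical coefficients and features for a single time frame.
theta_h = scoring_intensity([0.1, -0.2], [1.0, 0.0], beta=-4.0, home_advantage=0.3)
theta_a = scoring_intensity([0.1, -0.2], [0.0, 1.0], beta=-4.0)
print(outcome_probabilities(theta_h, theta_a, frames_left=30, home_goals=1, away_goals=1))
```

Working with goal counts rather than with the outcome directly means a single late goal shifts probability mass between win, tie and loss in a consistent way.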
Our model was trained using PyMC3’s Automatic Differentiation Variational
Inference (ADVI) algorithm [10]. To deal with the large amounts of data, we also
take advantage of PyMC3’s mini-batch feature for ADVI.
4 Experiments
The goal of our experimental evaluation is to: (1) explore the prediction accuracy
of our model and compare it with that of the various models we introduced in the
previous section and (2) evaluate the importance of each feature.
4.1 Dataset
The time remaining and score differential could be obtained from match reports,
but the contextual features that describe the in-game situation require more
detailed data. Therefore, our analysis relies on Wyscout event stream data from
the English Premier League, Spanish LaLiga, German Bundesliga, Italian Serie A,
French Ligue 1, Dutch Eredivisie, and Belgian First Division A. For each league,
we used the 2014/15, 2015/16 and 2016/17 seasons to train our models. This
training set consists of 5967 games (some games in the 2014/15 and 2015/16
seasons were ignored due to missing events). The 2017/18 season was set aside as
a test set containing 2227 games. Due to the home advantage, the distribution
between wins, ties and losses is unbalanced. In the full dataset, 45.23% of the
games end in a win for the home team, 29.75% end in a tie and 25.01% end in a
win for the away team.
To assess the pre-game strength of each team, we scraped Elo ratings from
http://clubelo.com. In the case of association football, the single rating difference
between two teams is a highly significant predictor of match outcomes [9].
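For intuition on why a single rating difference carries so much information, the sketch below evaluates the textbook Elo expected-score formula, in which a win counts as 1 and a draw as 0.5. This is the generic formula and not necessarily the exact pre-game model used here or by ClubElo; the function name and the ratings are hypothetical.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of team A against team B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Hypothetical ratings: a 400-point favourite against an underdog.
print(elo_expected_score(1900, 1500))
print(elo_expected_score(1500, 1900))
```

The expected score depends only on the rating difference, so the two values printed above sum to one.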
game as a win (e = [1, 1, 1]), a tie (e = [0, 1, 1]) or a loss (e = [0, 0, 1]). This
metric reflects that an away win is in a sense closer to a draw than a home win.
That means that a higher probability predicted for a draw is considered better
than a higher probability for home win if the actual result is an away win.
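The ordering property described above can be checked with a small sketch of the ranked probability score, using the cumulative outcome vectors e given in the text. The function and variable names are illustrative, and the normalisation by r - 1 follows the usual definition of the RPS rather than a formula stated in this passage.

```python
from itertools import accumulate

# Cumulative outcome vectors e, ordered as (win, tie, loss).
CUMULATIVE_OUTCOME = {
    "win": [1, 1, 1],
    "tie": [0, 1, 1],
    "loss": [0, 0, 1],
}


def ranked_probability_score(probs, outcome):
    """RPS of a forecast [P(win), P(tie), P(loss)]; lower is better."""
    cumulative_probs = list(accumulate(probs))
    cumulative_outcome = CUMULATIVE_OUTCOME[outcome]
    squared_errors = (
        (p - e) ** 2 for p, e in zip(cumulative_probs, cumulative_outcome)
    )
    return sum(squared_errors) / (len(probs) - 1)


# The actual result is an away win ("loss" from the home team's view).
leaning_draw = ranked_probability_score([0.2, 0.6, 0.2], "loss")
leaning_home = ranked_probability_score([0.6, 0.2, 0.2], "loss")
print(leaning_draw < leaning_home)
```

Both forecasts give the away win the same probability, yet the one that leans towards a draw receives the lower score.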
[Figure: calibration plots of actual probability against predicted probability; panels: LR, RF, Our proposed model]
Because our model uses strength parameters which are time dependent, we
evaluate the quality of the estimated win probabilities on the external 2017/18
season. This evaluation is a challenging task. For example, when a team is given
an 8% probability of winning at a given state of the game, this essentially means
that if the game were played from that state onwards a hundred times, the team
would be expected to win roughly eight of them.
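One common way to check this kind of statement empirically is to bin the predicted probabilities and compare each bin's average prediction with the observed frequency of the outcome. The sketch below is a generic calibration table under that assumption; the function name, the number of bins and the toy data are hypothetical and do not reproduce the evaluation procedure used here.

```python
def calibration_table(predicted, occurred, n_bins=10):
    """Return (mean predicted, observed frequency, count) per non-empty bin."""
    totals = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, happened in zip(predicted, occurred):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        totals[b] += p
        hits[b] += 1 if happened else 0
        counts[b] += 1
    return [
        (totals[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Toy data: 100 game states with an 8% predicted win probability, 8 of them won.
rows = calibration_table([0.08] * 100, [True] * 8 + [False] * 92)
for mean_predicted, observed, count in rows:
    print(mean_predicted, observed, count)
```

For a well-calibrated model the first two columns should be close in every bin that contains enough game states.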
Besides the probability calibration, we also look at how the accuracy and
RPS of our predictions on the test set evolve as the game progresses (Figure 2).
To measure accuracy, we take the most likely outcome at each time frame and
compare this with the actual outcome at the end of the game. Both the RPS
and accuracy of all in-game win probability models improve over the course of
the game, as they gain more information about the final outcome. Yet, only the
Bayesian model is able to make consistently correct predictions at the end of each
game. For the first few time frames of each game, the models’ performance is
similar to a pre-game logistic regression model that uses the Elo rating difference
as a single feature. Furthermore, the Bayesian model clearly outperforms the LR,
mLR and RF models.
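The accuracy measure described above can be sketched as follows: at every time frame the most likely outcome is compared with the final result, and the share of correct calls is averaged over games. The data layout and function names are assumptions made for this example.

```python
OUTCOMES = ("win", "tie", "loss")


def most_likely_outcome(probs):
    """Outcome label with the highest predicted probability."""
    best_index = max(range(len(OUTCOMES)), key=lambda i: probs[i])
    return OUTCOMES[best_index]


def accuracy_per_frame(games):
    """Share of games whose most likely outcome matches the final result.

    `games` is a list of (frame_probs, final_outcome) pairs, where frame_probs
    holds one [P(win), P(tie), P(loss)] list per time frame.
    """
    n_frames = min(len(frame_probs) for frame_probs, _ in games)
    accuracies = []
    for t in range(n_frames):
        correct = sum(
            most_likely_outcome(frame_probs[t]) == final
            for frame_probs, final in games
        )
        accuracies.append(correct / len(games))
    return accuracies


# Two hypothetical games with two time frames each.
games = [
    ([[0.5, 0.3, 0.2], [0.7, 0.2, 0.1]], "win"),
    ([[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]], "loss"),
]
print(accuracy_per_frame(games))
```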
[Plot: accuracy (50% to 90%) and RPS (0.00 to 0.20) against time frame (0 to 100) for the pre-game, LR, mLR, RF and Bayesian models]
Fig. 2: All models’ performance improves as the game progresses, but only our
Bayesian model makes consistently correct predictions at the end of each game.
Early in the game, the performance of all models is similar to an Elo-based
pre-game win probability model.
[Plot: feature weight against time; visible panel titles: Yellows, Reds]
Fig. 3: Estimated mean weight and variance for each feature per time frame.
4 https://fivethirtyeight.com/features/what-analytics-can-teach-us-about-the-beautiful-game/
5 Use Cases
In-game win-probability models have a number of interesting use cases. In this
section, we first show how win probability can be used as a story stat to enhance
fan engagement. Second, we discuss how win probability models can be used as a
tool to quantify the performance of players in the crucial moments of a game. We
illustrate this with an Added Goal Value (AGV) metric, which improves upon
standard goal scoring statistics by accounting for the value each goal adds to the
team’s probability of winning the game.
[Plot: probabilities (20% to 80%) of a Belgium win, a draw and a Japan win against match time (15 to 90 min, with half time marked)]
Fig. 4: Win probability graph for the 2018 World Cup game between Belgium
and Japan.
Undoubtedly fans implicitly considered these win probabilities too as the game
unfolded. Where football fans and commentators have to rely on their intuition
and limited experience, win probability stats can deliver a more objective view
value as the sum of the change in win probability multiplied by three and the
change in draw probability. The result can be interpreted as the average boost
in expected league points that a team receives each game from a player’s goals.
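The value of a single goal as described above can be sketched in a few lines: three points for a win and one for a draw give expected points of 3·P(win) + P(draw), and a goal's value is the change in that quantity. The per-90-minute normalisation and all names and numbers below are assumptions for illustration, since the exact aggregation is not spelled out in this passage.

```python
def added_goal_value(before, after):
    """Change in expected league points caused by one goal.

    `before` and `after` are (P(win), P(draw)) pairs for the scoring team,
    taken just before and just after the goal.
    """
    win_change = after[0] - before[0]
    draw_change = after[1] - before[1]
    return 3 * win_change + draw_change


def agv_per_90(goal_values, minutes_played):
    """Assumed normalisation: total added goal value per 90 minutes played."""
    return sum(goal_values) * 90 / minutes_played


# Hypothetical goal: win probability rises from 20% to 60%, draw falls from 50% to 30%.
value = added_goal_value(before=(0.2, 0.5), after=(0.6, 0.3))
print(value)
print(agv_per_90([1.0, 0.5], minutes_played=180))
```

Under this reading, a goal that turns a likely draw into a likely win is worth far more than a late goal in a game that is already decided.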
Figure 5 displays the relationship between AGVp90 and goals per game for
the most productive Bundesliga, Ligue 1, Premier League, LaLiga and Serie A
players who have played at least the equivalent of 20 games and scored at least
10 goals in the 2016/2017 and 2017/2018 seasons. The diagonal line denotes the
average AGVp90 for a player with a similar offensive productivity. The players
with the highest AGVp90 are Lionel Messi, Cavani, Balotelli, Kane and Giroud.
Also, players such as Neymar, Lewandowski, Lukaku, Mbappé and Mertens have
a relatively low added value per goal, while players such as Austin, Balotelli,
Dybala, Gameiro and Giroud add more value per goal than the average player.
[Scatter plot: AGVp90 against goals per 90 minutes; labelled players include Edinson Cavani, Mario Balotelli, Harry Kane, Olivier Giroud, Dries Mertens, Kylian Mbappé, Neymar and Romelu Lukaku]
Fig. 5: The relation between goals scored per 90 minutes and AGVp90 for the
most productive Bundesliga, Ligue 1, Premier League, LaLiga and Serie A players
in the 2016/2017 and 2017/2018 seasons.
6 Conclusions
This paper introduced a Bayesian in-game win probability model for football. Our
model uses eight features for each team and models the future number of goals
that a team will score as a temporal stochastic process. Our evaluations indicate
that the predictions made by this model are well calibrated and outperform the
typical modelling approaches that are used in other sports.
Acknowledgements
Pieter Robberechts is supported by the EU Interreg VA project Nano4Sports. Jesse
Davis is partially supported by the EU Interreg VA project Nano4Sports and the KU
Leuven Research Fund (C14/17/07, C32/17/036).
References
1. Beuoy, M.: Updated NBA win probability calculator.
https://www.inpredictable.com/2015/02/updated-nba-win-probability-calculator.html
(2015), [Online; Accessed: 2019-06-12]
2. Burke, B.: Modeling win probability for a college basketball game.
http://wagesofwins.com/2009/03/05/modeling-win-probability-for-a-college-basketball-game-a-guest-post-from-brian-burke/
(2009)
3. Burke, B.: Win probability added (WPA) explained.
http://archive.advancedfootballanalytics.com/2010/01/win-probability-added-wpa-explained.html
(2010)
4. Červený, J., van Ours, J.C., van Tuijl, M.A.: Effects of a red card on goal-scoring
in world cup football matches. Empirical Economics 55(2), 883–903 (2018)
5. Gambling Commission: In-play (in-running) betting: Position paper.
https://gamblingcommission.gov.uk/pdf/In-running-betting-position-paper.pdf (2016)
6. Constantinou, A.C., Fenton, N.E.: Solving the problem of inadequate scoring rules
for assessing probabilistic football forecast models. Journal of Quantitative
Analysis in Sports 8(1) (2012)
7. Constantinou, A.C., Fenton, N.E.: Determining the level of ability of football teams
by dynamic ratings based on the relative discrepancies in scores between adversaries.
Journal of Quantitative Analysis in Sports 9(1), 37–50 (2013)
8. Ganguly, S., Frank, N.: The problem with win probability. In: Proc. of the 12th
MIT Sloan Sports Analytics Conf. (2018)
9. Hvattum, L.M., Arntzen, H.: Using Elo ratings for match result prediction in
association football. Int. J. Forecast 26(3), 460–470 (2010)
10. Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., Blei, D.M.: Automatic
differentiation variational inference. JMLR 18(1), 430–474 (2017)
11. Lindholm, S.: Using WPA to measure pitcher effectiveness.
https://www.beyondtheboxscore.com/2014/5/19/5723968/chicago-white-sox-chris-sale-wpa-major-league-baseball-pitcher-effectiveness
(2014)
12. Lindsey, G.R.: The progress of the score during a baseball game. J. Am. Stat. Assoc
56(295), 703–728 (1961)
13. Lock, D., Nettleton, D.: Using random forests to estimate win probability before
each play of an NFL game. Journal of Quantitative Analysis in Sports 10(2) (2014)
14. McFarlane, P.: Evaluating NBA end-of-game decision-making. J. Am. Stat. Assoc
5(1), 17–22 (2019)
15. Morris, B.: When To Go For 2, For Real.
https://fivethirtyeight.com/features/when-to-go-for-2-for-real/ (2017)
16. Pelechrinis, K.: iWinRNFL: A Simple, Interpretable & Well-Calibrated In-Game
Win Probability Model for NFL. arXiv:1704.00197 [stat] (2017)
17. Pettigrew, S.: Assessing the offensive productivity of NHL players using in-game
win probabilities. In: Proc. of the 9th MIT Sloan Sports Analytics Conf. (2015)
18. Robberechts, P., Bransen, L., Van Haaren, J., Davis, J.: Choke or Shine? Quantifying
Soccer Players’ Abilities to Perform Under Mental Pressure. In: Proc. of the 13th
MIT Sloan Sports Analytics Conf. (2019)
19. Schneider, T.: What Real-Time Gambling Data Reveals About Sports: Introducing
Gambletron 2000. http://www.gambletron2000.com/about (2014)
20. Torvik, B.: How I built a (crappy) basketball win probability model.
http://adamcwisports.blogspot.com/2017/07/how-i-built-crappy-basketball-win.html
(2017)