Predicting Final Result of Football Match Using Poisson Regression Model
Abstract. In any sports competition there is strong interest in knowing which team will be champion
at the end of the season, and football is no exception. Football match predictions are of great
interest to fans and the sports press, and in the last few years they have been the focus of
several studies. In this paper, the researchers propose a Poisson regression model to predict the
final result of football matches. The researchers predicted the average number of goals scored
by each team by assuming that the number of goals scored by a team in a match follows a
univariate Poisson distribution. The Poisson regression model was formulated from four covariates:
the goal average in a match, the home-team advantage, the team's offensive power, and the
opponent team's defensive power. The methodology was applied to the 2017-2018 English
Premier League. The model correctly predicted 61% of the 100 matches in rounds 29 to 38.
1. Introduction
Football is one of the most popular sports in the world, with a charm of its own compared to other
sports. Football fans are enthralled by its tense and exciting moments: goals, especially last-minute
goals, the final result, the intensity of the match, and the final ranking. For example, the dark-horse
team Leicester City won the 2015-2016 English Premier League, surprisingly finishing ahead of all the
stronger teams. Real Madrid have also won three Champions League titles in a row, which no team had
done before in the Champions League era. These events reflect one of the most charming aspects of
football, its complexity, which makes match results hard to predict.
Several papers in the literature consider football score prediction applied to championship
leagues such as the English Premier League ([1], [2], [3]), the Norwegian Elite Division [4], and the
Brazilian Championship [5]. Lee [1] considered a Poisson regression to predict the number of goals
scored by a football team, where the average reflects the strength of the team, the quality of the
opposition and the home advantage (if it is the home team). Independence between the goals scored by
the two teams was assumed, and the methodology was applied to the 1995-1996 English Premier League.
Brillinger [5] modeled the probabilities of win, tie and loss through an ordinal-valued model and applied
the model to the Brazilian Series A championship. Karlis and Ntzoufras [3] applied the Skellam
distribution to model the goal difference between home and away teams. The authors illustrated the
model using the 2006–2007 English Premier League. Koopman and Lit [6] developed a statistical model
to predict the games of the 2010–2011 and 2011–2012 English Premier Leagues, assuming a bivariate
Poisson distribution with coefficients that stochastically changed the intensity over time. An issue
with the papers cited above is that none of them considers the home-team factor when calculating the
probabilities of interest. Dixon and Coles [7] reported that 46% of the matches were won by the home
team, 27% were draws, and in 27% the home team lost.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
MISEIC 2018 IOP Publishing
IOP Conf. Series: Journal of Physics: Conf. Series 1108 (2018) 012066 doi:10.1088/1742-6596/1108/1/012066
In this paper, the researchers modeled the number of goals scored by each team in a match by a Poisson
distribution whose average reflected the attacking strength of the team, the defensive strength of the
opponent, and the effect of playing at home. The model was applied to the 2017-2018 English Premier
League. The de Finetti measure (de Finetti [8]) was used to quantify the predictive quality of the model.
2. Model Construction
In this paper, the researchers assumed that the number of goals scored by a team in each match followed
a Poisson distribution, and constructed the regression model on that assumption. With the usual log
link, which keeps every rate positive, the Poisson regression model can be written as:
$$Y \sim \mathrm{Poisson}(\lambda) \qquad (1)$$
$$\log \lambda = X\beta \qquad (2)$$
Here $Y$ is the vector of the dependent variable, consisting of the home goals and away goals in each
game, $X$ is the matrix of explanatory variables that records the home and away teams of each game,
and $\beta$ is the vector containing the offence and defense parameters of the model.
There were 20 teams participating in the 2017-2018 English Premier League, and each of the 20 teams had
its own offence parameter and defense parameter. Each team appeared in a game as either the home
team or the away team. Thus
$Y = (y_{a,b}^{1}, y_{b,a}^{1}, y_{a,b}^{2}, y_{b,a}^{2}, \ldots, y_{a,b}^{n}, y_{b,a}^{n})^T$ and
$\beta = (O_{afcbou}, O_{ars}, \ldots, O_{westham}, D_{afcbou}, D_{ars}, \ldots, D_{westham}, \delta)^T$,
where $y_{a,b}^{i}$ is the number of goals scored by team a against team b in game $i$, $O_j$ and $D_j$
stand for the offence and defense parameters of team $j$, and $\delta$ is the home advantage. Each entry
in a row of $X$ takes the value 1 if the corresponding parameter is involved in that observation and 0
otherwise: the offence column of the scoring team, the defense column of the opposing team and, for
the home team's goals, the home-advantage column. For example, consider Arsenal versus West Ham at the
Emirates Stadium with score 2-1. Then $Y = [2\ 1]^T$, the first row of $X$ is [0 1 … 0 0 0 … 1 1],
because Arsenal's score carries the home advantage, and the second row is [0 0 … 1 0 1 … 0 0].
Writing $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$, the number of parameters is $p = 41$.
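The construction of the rows of X described above can be sketched in code. The following Python snippet is only an illustration, not the authors' implementation (the paper's analysis was carried out in R). The helper name match_rows, the three-team toy league and the column ordering (offence columns, then defense columns, then the home-advantage column) are assumptions made for the example.

```python
# Hypothetical helper: builds the two design-matrix rows for one match.
# Assumed column layout: [offence of each team | defense of each team | home advantage].

def match_rows(teams, home, away):
    """Return the two rows of X for a match: home-team goals, then away-team goals."""
    k = len(teams)
    index = {name: i for i, name in enumerate(teams)}

    home_row = [0] * (2 * k + 1)
    home_row[index[home]] = 1        # offence of the scoring (home) team
    home_row[k + index[away]] = 1    # defense of the conceding (away) team
    home_row[2 * k] = 1              # home-advantage indicator

    away_row = [0] * (2 * k + 1)
    away_row[index[away]] = 1        # offence of the scoring (away) team
    away_row[k + index[home]] = 1    # defense of the conceding (home) team
    return home_row, away_row


# Three-team toy league (with the 20 teams of the paper there are p = 41 columns).
teams = ["Bournemouth", "Arsenal", "West Ham"]
home_row, away_row = match_rows(teams, home="Arsenal", away="West Ham")
print(home_row)  # [0, 1, 0, 0, 0, 1, 1]
print(away_row)  # [0, 0, 1, 0, 1, 0, 0]
```

Each home-goal row has three non-zero entries and each away-goal row has two, which mirrors the Arsenal versus West Ham example in the text.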
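Then, with $\lambda_i = \exp\left(\sum_{j=1}^{p} x_{ij}\beta_j\right)$, the log-likelihood function for $\beta$ is given as follows:
$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \sum_{j=1}^{p} x_{ij}\beta_j - \exp\left( \sum_{j=1}^{p} x_{ij}\beta_j \right) - \log(y_i!) \right]$$
where $y_i$ is the $i$-th entry of $Y$, $x_{ij}$ is the $(i,j)$ entry of $X$, and the outer sum runs over all observations (the rows of $X$).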
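As a hedged illustration of the log-likelihood, the sketch below evaluates the standard Poisson regression log-likelihood under the assumption of a log link, so that each rate is exp(x_i' beta). The function name poisson_log_likelihood and the toy data are hypothetical. The toy X uses a three-team league with the assumed column order offence, defense, home advantage.

```python
import math


def poisson_log_likelihood(beta, X, y):
    """Poisson regression log-likelihood with log link, lambda_i = exp(x_i . beta)."""
    total = 0.0
    for row, count in zip(X, y):
        eta = sum(x * b for x, b in zip(row, beta))  # linear predictor x_i^T beta
        # log P(Y_i = count) = count * eta - exp(eta) - log(count!)
        total += count * eta - math.exp(eta) - math.lgamma(count + 1)
    return total


# Toy data: one match ending 2-1, encoded for a three-team league (assumed layout).
X = [[0, 1, 0, 0, 0, 1, 1],   # home goals: offence, opposing defense, home advantage
     [0, 0, 1, 0, 1, 0, 0]]   # away goals: offence, opposing defense
y = [2, 1]
beta = [0.0] * 7              # all effects zero, so every rate equals exp(0) = 1

log_lik = poisson_log_likelihood(beta, X, y)
print(log_lik)
```

Maximizing this function over beta, for instance with a generic numerical optimizer, would give the maximum likelihood estimates. The paper does not state which routine was used.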
Similarly to Bastos and da Rosa [9] and Suzuki et al. [10], the researchers calculated the
de Finetti distance in order to measure the goodness of a prediction. This distance is the squared
Euclidean distance between the point corresponding to the real outcome and the one corresponding to
the prediction. The set of all possible forecasts is given by the simplex
$S = \{(P_w, P_d, P_l) \in \mathbb{R}^3 : P_w + P_d + P_l = 1,\ P_w \ge 0,\ P_d \ge 0,\ P_l \ge 0\}$.
The possible real outcomes, win, draw, and loss, are represented by the vertices (1,0,0), (0,1,0) and
(0,0,1), respectively. The de Finetti distance is then
$$d_f = (P_w - b_1)^2 + (P_d - b_2)^2 + (P_l - b_3)^2,$$
where $(b_1, b_2, b_3) \in \{(1,0,0), (0,1,0), (0,0,1)\}$ is the real outcome. For example, if the
prediction for the game between teams a and b is (0.2, 0.65, 0.15) and the real outcome is (0,1,0),
the de Finetti distance is $d_f = (0.2 - 0)^2 + (0.65 - 1)^2 + (0.15 - 0)^2 = 0.185$.
For the equiprobable case, $P_w = P_d = P_l = 1/3$, with a win of the home team (1,0,0), the de Finetti
measure is $d_f = (1/3 - 1)^2 + (1/3 - 0)^2 + (1/3 - 0)^2 = 2/3$. This value is used as a
threshold to classify a prediction as acceptable or not (see the example in Suzuki et al. [10]).
If $d_f < 2/3$, the prediction is considered acceptable; if $d_f > 2/3$, the prediction
is considered poor.
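The two worked values above can be checked with a few lines of Python. This is a minimal sketch; the function name de_finetti_distance and the constant THRESHOLD are illustrative names, not taken from the paper.

```python
def de_finetti_distance(forecast, outcome):
    """Squared Euclidean distance between a forecast (Pw, Pd, Pl) and an outcome vertex."""
    return sum((p - b) ** 2 for p, b in zip(forecast, outcome))


THRESHOLD = 2 / 3  # distance of the equiprobable forecast from any vertex

d = de_finetti_distance((0.2, 0.65, 0.15), (0, 1, 0))
print(round(d, 3))    # 0.185
print(d < THRESHOLD)  # True, so this forecast counts as acceptable
```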
3. Application
[Figure 1. Estimated offence and defense effects of each team.]
For the matches of the 32nd round (Table 1), the proportion of correct predictions was 80%. Every team
with an estimated probability of winning higher than 0.5 was the actual winner. Figure 1 displays the
estimated attack and defense effects of each team. For the offence effect, the more positive the value,
the stronger the attack. For the defense effect, the more negative the value, the stronger the defense.
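To make the sign interpretation concrete, the sketch below turns offence, defense and home-advantage effects into expected goals and then into win, draw and loss probabilities. It assumes a log link and independent Poisson scores for the two teams; the effect values are hypothetical and are not estimates from the paper, and the function names are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_home, lam_away, max_goals=20):
    """Home win, draw and loss probabilities for independent Poisson scores.

    The infinite sums are truncated at max_goals (an assumption that is
    harmless for typical football scoring rates).
    """
    home = [poisson_pmf(k, lam_home) for k in range(max_goals + 1)]
    away = [poisson_pmf(k, lam_away) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for i, p_home in enumerate(home):
        for j, p_away in enumerate(away):
            if i > j:
                win += p_home * p_away
            elif i == j:
                draw += p_home * p_away
            else:
                loss += p_home * p_away
    return win, draw, loss


# Hypothetical effects, chosen only for illustration.
offence_home, defence_home = 0.4, -0.3   # strong attack, strong (negative) defense
offence_away, defence_away = 0.1, 0.2    # weaker attack, weaker (positive) defense
delta = 0.3                              # home advantage

lam_home = math.exp(offence_home + defence_away + delta)  # expected home goals
lam_away = math.exp(offence_away + defence_home)          # expected away goals
p_win, p_draw, p_loss = outcome_probabilities(lam_home, lam_away)
print(lam_home, lam_away)
print(p_win, p_draw, p_loss)
```

A negative defense effect of the opponent lowers the scoring team's expected goals, which is why more negative values indicate a stronger defense.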
Table 1. Probabilities of win, draw, and loss (for the home team) for each match in the 32nd round.
                                    Probability
Home               Away             Win    Draw   Loss   Score  de Finetti  Correct
Crystal Palace     Liverpool        0.166  0.189  0.644  1-2    0.191       Yes
Brighton           Leicester City   0.401  0.302  0.297  0-2    0.746       No
Manchester United  Swansea City     0.814  0.133  0.053  2-0    0.055       Yes
Newcastle United   Huddersfield     0.607  0.238  0.154  1-0    0.235       Yes
Watford            Bournemouth      0.453  0.222  0.325  2-2    0.916       No
West Brom          Burnley          0.225  0.278  0.497  1-2    0.381       Yes
West Ham           Southampton      0.463  0.238  0.299  3-0    0.434       Yes
Everton            Manchester City  0.083  0.046  0.87   1-3    0.026       Yes
Arsenal            Stoke City       0.817  0.112  0.071  3-0    0.051       Yes
Chelsea            Tottenham        0.27   0.227  0.502  1-3    0.372       Yes
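The proportion of correct predictions in Table 1 can be recomputed from its rows. The sketch below assumes that a prediction counts as correct when the most probable outcome matches the observed one; the paper does not spell out this rule, so it is an assumption, as are the names ROUND_32, observed_index and is_correct.

```python
# (home, away, (P_win, P_draw, P_loss), home goals, away goals), copied from Table 1.
ROUND_32 = [
    ("Crystal Palace", "Liverpool", (0.166, 0.189, 0.644), 1, 2),
    ("Brighton", "Leicester City", (0.401, 0.302, 0.297), 0, 2),
    ("Manchester United", "Swansea City", (0.814, 0.133, 0.053), 2, 0),
    ("Newcastle United", "Huddersfield", (0.607, 0.238, 0.154), 1, 0),
    ("Watford", "Bournemouth", (0.453, 0.222, 0.325), 2, 2),
    ("West Brom", "Burnley", (0.225, 0.278, 0.497), 1, 2),
    ("West Ham", "Southampton", (0.463, 0.238, 0.299), 3, 0),
    ("Everton", "Manchester City", (0.083, 0.046, 0.87), 1, 3),
    ("Arsenal", "Stoke City", (0.817, 0.112, 0.071), 3, 0),
    ("Chelsea", "Tottenham", (0.27, 0.227, 0.502), 1, 3),
]


def observed_index(home_goals, away_goals):
    """Outcome from the home team's perspective: 0 = win, 1 = draw, 2 = loss."""
    if home_goals > away_goals:
        return 0
    if home_goals == away_goals:
        return 1
    return 2


def is_correct(probs, home_goals, away_goals):
    """Assumed rule: the most probable outcome must match the observed one."""
    predicted = max(range(3), key=lambda k: probs[k])
    return predicted == observed_index(home_goals, away_goals)


hits = [is_correct(p, hg, ag) for _, _, p, hg, ag in ROUND_32]
accuracy = sum(hits) / len(hits)
print(accuracy)  # 0.8
```

Under this rule the two misses are Brighton versus Leicester City and Watford versus Bournemouth, the same two rows marked No in the table.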
Stoke City 26 33 6 7 8 12 24 19
West Bromwich Albion 23 31 4 6 11 13 23 19
4. Final Remarks
In this paper, the researchers proposed a simple method with good predictive quality that is easy to
implement and requires little computational effort for predicting match outcomes. The researchers
developed a model to estimate the probabilities of win, draw and defeat in football games. To calculate
these probabilities, they proposed a Poisson regression model in which the average number of goals scored
reflects the attacking strength of the team, the defensive strength of the opposing team and the
home-team effect. The methodology was implemented in the software R. The researchers predicted 100 games
(29th to 38th rounds), and the accuracy of the model was 61%. Although the model was applied to the
2017-2018 English Premier League, it is flexible in principle and could easily be adapted to other
tournaments. It would also be interesting to compute every team's chances of winning the league or of
being relegated. This can be seen as a direct generalization of the proposed model, and it may be
investigated further by adding significant parameters that improve the accuracy of the model.
5. References
[1] Lee A J 1997 Modeling scores in the Premier League: is Manchester United really the best?
Chance 10 pp 15–19
[2] Everson P and Goldsmith-Pinkham P 2008 Composite Poisson models for goal scoring Journal of
Quantitative Analysis in Sports 4 (2)
[3] Karlis D and Ntzoufras I 2009 Bayesian modelling of football outcomes: using the
Skellam’s distribution for the goal difference IMA Journal of Management Mathematics 20 pp
133–145
[4] Brillinger D R 2006 Modelling some Norwegian soccer data Advances in Statistical
Modeling and Inference (Ed. V. J. Nair) World Scientific pp 3–20
[5] Brillinger D R 2008 Modelling game outcomes of the Brazilian 2006 Series A Championship
as ordinal-valued Brazilian Journal of Probability and Statistics 22 pp 89–104
[6] Koopman SJ and Lit R 2015 A dynamic bivariate Poisson model for analysing and
forecasting match results in the English Premier League Journal of the Royal Statistical
Society: Series A (Statistics in Society) 178 pp 167–186
[7] Dixon MJ and Coles SG 1997 Modelling association football scores and inefficiencies in the
football betting market Journal of the Royal Statistical Society: Series C (Applied Statistics)
46 pp 265-280
[8] de Finetti B 1972 Probability, Induction and Statistics John Wiley, London
[9] Bastos L S and da Rosa J M C 2013 Predicting Probabilities for the 2010 FIFA World Cup Games
Using a Poisson-Gamma Model Journal of Applied Statistics 40 pp 1533–44
[10] Suzuki AK, Salasar LEB, Leite JG, and Louzada-Neto F 2010 A Bayesian approach for
predicting match outcomes: the 2006 (Association) Football World Cup Journal of the
Operational Research Society 61 pp 1530–39
Acknowledgements
This research was funded by Directorate of Research and Community Service of Universitas Indonesia
(DRPM UI) as a grant of PITTA (Publikasi Terindeks untuk Tugas Akhir) 2018, Number:
2336/UN2.R3.1/HKP.05.00/2018.