Abstract - Predictions of the outcomes of football (soccer) matches are widely made using the if-else case based Football Result Expert System (FRES). The proposed prediction technique instead uses a neural network approach to predict the results of football matches. The neural network detects patterns from a number of factors affecting the outcome of a match, making use of historical cases for training. This paper describes the inputs and outputs of such a system and compares its results.

Keywords: Artificial Neural Networks, Back propagation, sport result prediction, pattern prediction, FRES, Activation function
where v' – normalized value, v – previous value, v̄ – average of the training set, SD – standard deviation of the set.

…average cup. Care should be taken to ensure that only records pertaining to that league are considered, as the number of matches played may otherwise tend to be different for the two teams.
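As a concrete illustration, the normalization described above (a standard z-score, assuming the usual v' = (v − v̄)/SD form implied by the definitions; the function name is ours, not the paper's) can be sketched as:

```python
import numpy as np

def normalize(column):
    """Z-score normalization of one input column over the
    training set: v' = (v - average) / SD."""
    v = np.asarray(column, dtype=float)
    return (v - v.mean()) / v.std()
```

Each input column (goals scored, league points, etc.) would be normalized independently over the training set before being presented to the network.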
2.2 UEFA coefficient

UEFA is the European football governing body that organizes the annual Champions League, in which all the top European teams appear. Even though the Champions League is not related to the German Bundesliga, this rank has been taken into consideration since both Bayern Munich and Borussia Dortmund have been regular fixtures in the league since 2005 [3]. The UEFA rank is given by the "Champions League point system": a certain number of points are awarded to a team at each stage of the Champions League, as given in Table 1.

Table 1: Champions league point system

S.no  Stage                              Points earned
1     1st qualifying round elimination   1
2     2nd qualifying round elimination   2
3     Group stage participation          4
4     Group stage win                    2
5     Group stage draw                   1
6     Round of 16 participation          4

2.5 Team cost

Each player on the team is given a figure that is indicative of how much the club would have earned from the sale of that player to another club [4]. The sum of all the individual player costs gives the team cost. These values were found to vary between 2005 and 2012, reflecting changes in the value of the Euro.

2.6 Year of match

The inclusion of the year of the match is found to improve the performance and the prediction capabilities of the neural network. This is analogous to a human being coming to the conclusion that the recent performance of a team is more important than its past performances. In simple terms, Borussia Dortmund is on a rising trend with more and more victories than Bayern Munich, while in the past Bayern Munich was the superior team [5].
…matrix into consideration. The algorithm for back propagation is as follows:

Step 1: Inputs and outputs are normalized as explained above. The total number of inputs to the network is given by l = iO×3 and the number of outputs by n = jO×3, where iO and jO are the number of inputs and outputs. The number of hidden layer neurons m equals the number of input neurons, i.e. m = l.

Step 2: Let 'nTest' be the number of training sets. This means the size of the input matrix will be l×m and the size of the output matrix will be n×m.

Step 3: The input and output patterns are stored in rows of the input and output matrices, leaving two places for the augmented neurons. Every second row of the input and output matrix is given by computing the ln (log_e) of the preceding row to give the logarithmic neuron. Every third row of both matrices is computed as the exponential of the original pattern.

Step 4: Assign the learning rate and momentum factor some initial value.

Step 5: Initialize the input layer – hidden layer weight matrix v (l×m) and the hidden layer – output layer weight matrix w (m×n) to some random values.

Step 6: Let the thresholds delv and delw both be zero matrices initially.

Step 7: The variable 'iterate' is used to store the number of iterations that training is going to take place for.

Step 8: Since the input neurons use a linear activation function, the output of the input layer 'Oi' is made equal to the input to the input layer 'Ii' for each pattern (stored as a column).

Step 9: The input to the hidden layer is calculated by multiplying the output of the input layer with the corresponding weight values. That is, Ih = v'·Oi, where the prime denotes the transpose and 'Ih' represents the input to the hidden layer, a column matrix of length m.

Step 10: The hidden layer outputs 'Oh' are calculated using the sigmoidal function:

Oh = 1 / (1 + e^(-Ih))    (2)

(The output layer outputs 'Oo' are obtained in the same way from the input to the output layer, Io = w'·Oh.)

Step 11: The target output 'To' (n×1) is calculated from the output matrix by taking the appropriate column.

Step 12: The error is calculated in two steps. First, the part error ePart is calculated as

ePart = (To - Oo)^2    (3)

The final error is then given as the Root Mean Square (RMS) value of ePart:

E_RMS = sqrt( Σ ePart / n )    (4)

Step 13: The calculated output 'Y' (m×n) is given by Y = Oh·d', where d is given by d = (To - Oo)·Oo·(1 - Oo).

Step 14: 'delW' is updated using the formula delW = (momentum × delW) + (learningRate × Y).

Step 15: The complete training set 'nSet' error is calculated as e = w·d.

Step 16: 'X' is calculated as X = Oi·dStar', where

dStar = e·Oh·(1 - Oh)    (5)

Step 17: The change in the input layer weights is given by delv = (momentum × delv) + (learningRate × X), and the weights are adjusted as v = v + delv and w = w + delw.

Step 18: Repeat steps 9 to 18 until the error rate is less than the tolerance value. Save the weights and exit.

The given algorithm explains the learning procedure used and how the weights have been adjusted.

3.5 Number of hidden nodes

The number of nodes in the hidden layer is generally kept to the minimum required to classify all input pattern areas accurately. This is done to keep the memory requirement of the nodes to a minimum. The number of separable regions in space is one less than the number of hidden nodes. It is also argued that the number of hidden nodes should be one less than the number of training sets.

Based on this, the number of training sets is found to be 13 for matches played between 2005 and 2011, meaning that the number of hidden nodes should be 12 (given as 13-1). But since the number of inputs being passed to the network is 29, or 87 taking into account logarithmic and exponential neurons, the number of hidden nodes is kept at 87 to ensure separability of each region within the training set.

A general rule of thumb dictates that the number of hidden layer nodes for a single hidden layer feed-forward network equals the mean (rounded up) of the number of input layer and output layer nodes.

3.6 Logarithmic and exponential neurons

Logarithmic and exponential neurons are generally used to augment the training capabilities of the network. For every input neuron, its logarithmic and exponential neurons are computed as shown and passed as additional inputs to the network. On the output side too, the logarithmic and exponential neurons are calculated and used for every output neuron.

These neurons are collectively termed augmented neurons. The addition of these neurons provides accurate boundary mapping and increases the speed of training. They increase the accuracy of the neural network at the cost of its generalization capabilities.
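As a compact illustration of the back propagation algorithm of Steps 1-18 above, the training loop can be sketched in NumPy as follows. This is a minimal sketch, not the paper's implementation: it omits the normalization and augmented-neuron preprocessing of Steps 1-3, the explicit output-layer pass and all identifier names are our assumptions, and the update rules follow the formulas of Steps 9-17.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(inputs, targets, learning_rate=0.7, momentum=0.9,
          tolerance=0.05, max_iter=5000):
    """One-hidden-layer back propagation with momentum.
    l input, m hidden, n output nodes; v is the input-hidden
    weight matrix, w the hidden-output weight matrix."""
    l, n = inputs.shape[1], targets.shape[1]
    m = l                                    # Step 1: m = l
    v = rng.uniform(-1, 1, (l, m))           # Step 5: random init
    w = rng.uniform(-1, 1, (m, n))
    delv = np.zeros_like(v)                  # Step 6: zero thresholds
    delw = np.zeros_like(w)
    for _ in range(max_iter):                # Step 7: iteration budget
        sq_err = 0.0
        for Oi, To in zip(inputs, targets):  # one pattern per column
            Oi, To = Oi[:, None], To[:, None]
            Oh = sigmoid(v.T @ Oi)           # Steps 9-10: hidden layer
            Oo = sigmoid(w.T @ Oh)           # output layer pass
            d = (To - Oo) * Oo * (1 - Oo)    # Step 13: output delta
            Y = Oh @ d.T
            delw = momentum * delw + learning_rate * Y   # Step 14
            e = w @ d                        # Step 15: back-propagated error
            dstar = e * Oh * (1 - Oh)        # Step 16
            X = Oi @ dstar.T
            delv = momentum * delv + learning_rate * X   # Step 17
            v, w = v + delv, w + delw
            sq_err += float(np.sum((To - Oo) ** 2))
        e_rms = np.sqrt(sq_err / (len(inputs) * n))      # Step 12: RMS error
        if e_rms < tolerance:                # Step 18: stop at tolerance
            break
    return v, w, e_rms
```

Each row of `inputs`/`targets` is one training pattern; the loop repeats Steps 9-18 until the RMS error falls below the tolerance or the iteration budget is exhausted.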
ISSN: 2367-895X 4 Volume 3, 2018
International Journal of Mathematical and Computational Methods
K. Sujatha et al. http://www.iaras.org/iaras/journals/ijmcm
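The augmented-neuron scheme of Section 3.6 can be illustrated with a short sketch. How non-positive values are guarded before taking the logarithm is not specified in the paper, so the epsilon clamp below is our assumption, as is the block ordering (the paper interleaves the logarithmic and exponential rows with the originals rather than concatenating them):

```python
import numpy as np

def augment(pattern):
    """Append, for every original neuron value x, a logarithmic
    neuron ln(x) and an exponential neuron exp(x), so that 29 raw
    inputs become 87 (Section 3.6)."""
    x = np.asarray(pattern, dtype=float)
    log_part = np.log(np.maximum(x, 1e-6))  # clamp: ln of x <= 0 is undefined
    exp_part = np.exp(x)
    return np.concatenate([x, log_part, exp_part])
```

The same transformation is applied on the output side, so each output neuron is likewise accompanied by its logarithmic and exponential copies.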
Though these neurons play no role in the actual inputs and outputs of the network, they are used to modify the interconnecting weights of the neural network and therefore improve training.

4 Performance Parameters

4.1 Input nodes

The list of inputs given to the neural network (given separately for team A and team B):

Transfer money spent A, B
UEFA ranking A, B
League position A, B
Away goals scored A, B
Home goals scored A, B
Away goals conceded A, B
Home goals conceded A, B
Player cost A, B
Year of match
Away wins A, B
Home wins A, B
Away losses A, B
Home losses A, B
Total draws A, B
League points A, B
Home ground A, B

The above 29 inputs are used with the network. Addition of the augmented neurons gives a total of 87 input neurons.

4.2 Output nodes

The following are the outputs of the neural network:

Match winner A, B
Goals scored A, B
Yellow cards A, B
Red cards A, B

The total number of bookings or cards shown for each team is given by the sum of the corresponding team's red and yellow cards.

4.3 Selection of learning rate

Experimentally, the optimal learning rate is found to be 0.7. As Figure 1 shows, a learning rate of 0.7 provides the best convergence of the error rate in 500 iterations.

Figure 1: Comparison of learning rates

4.4 Selection of momentum factor

When the momentum factor is zero, the neural network takes 854 iterations to achieve a convergence of 0.02. The most suitable momentum factor which does not make the neural network overshoot its minima is found to be 0.9. Figure 2 compares the error rate for different values of the momentum factor.

Figure 2: Comparison of various momentum factors

4.5 Generalisation

Often the problem with training for a fixed number of iterations, or training with the goal of reducing error, is that the neural network will lose its generalization capability. That is, since the error will now be small, the borders will be clearly defined, and fresh data that falls just outside a border will be disregarded even though it may belong to that data set.

The ideal solution involves classifying all training data with minimum error, while at the same time maintaining a minimum error for fresh data. In order to decide when to stop training, 3 out of the 13 training patterns are used as the control group. Training is done on the remaining 10 patterns, while the error is calculated at the end of each set of weight iterations for the 3 control patterns. It is found that the error value for the control set initially decreases along with the decrease in the overall error of the training set, but starts to increase at a certain point. This point is the minimum error for the control pattern, after which the network has started to lose its generalization capabilities.

The 3 control patterns are selected randomly from the given input test patterns; the choice of different control patterns is found not to noticeably affect the neural network.

…case, the neural network has to be trained separately for such matches using suitable historical test cases, so that the neural network makes the association for a reduced goal margin.
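The stopping rule of Section 4.5 (hold out 3 of the 13 patterns as a control group, stop once the control error turns upward) can be sketched as follows; `train_step` and `error_of` stand in for the network's training pass and RMS-error evaluation and are assumptions of this sketch:

```python
import numpy as np

def train_with_early_stopping(patterns, train_step, error_of,
                              n_control=3, seed=0, max_epochs=500):
    """Hold-out early stopping: set aside n_control patterns as the
    control group, train on the rest, and stop when the control-set
    error starts to rise.  Returns the epoch of minimum control
    error, at which the saved weights would be restored."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(patterns))
    control = [patterns[i] for i in idx[:n_control]]
    training = [patterns[i] for i in idx[n_control:]]
    best_err, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step(training)        # one set of weight iterations
        err = error_of(control)     # error on the 3 control patterns
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif err > best_err:
            break                   # control error rising: stop training
    return best_epoch, best_err
```

With 13 patterns and `n_control=3`, training runs on the remaining 10, matching the split described above.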
Figure 3: Error rate of training set for 65 iterations

Table 3: Predicted results Vs actual results

12/04/2012  44%  72%  50%    50%
20/11/2011  43%  80%  66.7%  33.3%

Note that the sum of the percentages of prediction of either team winning may not be equal to 100%. This is because the neural network has not created the connection that the sum of the returned percentages should be equal to 100%, as the training pattern makes use of a 100 - 0 relationship to indicate a winning team.

5.4 Comparison with FRES

FRES makes use of a rule-based network akin to an if-else system to predict the outcome of the match. FRES is found to be very successful in predicting the final outcome of the match closer to match completion. This is because FRES divides the match into a number of time segments and inputs are used for each given time segment.

The FRES approach involves factors such as emotional state (depending on the current score line), the teams' offensive and defensive capabilities, etc. That is, it is more of a here-and-now kind of system used to predict matches based on numerous current factors. What it lacks is the capability to make deductions based on past factors or performances of the team.

When FRES was applied to predict the result of a match, it was found that, given the first half data as input, FRES would predict the full time winner and score line very accurately. The performance of FRES is better in this regard when compared to the system of prediction using neural networks. It is also found that the use of neural networks to predict matches before the start of the match produces results more accurate than those of FRES. Another advantage is that this approach can predict matches weeks or even months in advance, whereas FRES only starts predicting matches once they have begun.

Table 5: Predicted RMS error Vs FRES RMS error

S.No.  FRES RMS error     Prediction RMS error
       Goals    Winner    Goals    Winner
1      6.593    6.108     1.500    6.403
2      5.920    5.306     1.772    9.014
3      6.809    6.358     0.761    2.280
4      5.702    6.212     0.283    5.237

6 Conclusion

From the table it is clear that this method of prediction is much better at predicting goals than FRES. This is mainly because of the vast number of factors taken into consideration that give an inkling of the defensive and offensive capabilities of the two teams in question. In the case of predicting the winner, the system is found to have an RMS error value slightly more than the FRES system of prediction. Another feature of the system of prediction is that the later the match is predicted, the better the accuracy. This clearly highlights the fact that this method of prediction is a learning based method and that the system's predictions improve with more test cases.

ACKNOWLEDGEMENTS

The author would like to thank Mr. Arun Prakash, manager and coach of Andrew football club, Hyderabad, for his valuable guidance on the factors affecting a team's performance in a match. The author also owes a debt of gratitude to a number of Bundesliga fans on the Bundesliga forums who answered questions about the league and the two teams and provided certain hard-to-find statistics.

CITATIONS

1. "The jersey colors losers wear", www.news.menshealth.com/the-jersey-colors-winners-wear/2012/05/02/
2. "Normalizing NN data", www.elitereader.com/vb/showthread.php?threadid=118179
3. "UEFA club coefficients", www.uefa.com/memberassociations/uefarankings/club/index.html
4. "Bundesliga stats", www.bundesliga.com/en/about/questions/marketing/
5. "Bayern Munich Wikipedia page", en.wikipedia.org/wiki/FC_Bayern_Munich
6. "Home advantage", en.wikipedia.org/wiki/Home_advantage
7. "How teams rise to the occasion", www.soccerhelp.com/Soccer_Tactics_Weak_Teams.shtml

REFERENCES

[1] Byungho Min, Jinhyuck Kim, Chongyoun Choe and Robert Ian, "A compound framework for sports prediction: The case study of football", Knowledge-Based Systems, Vol. 21, No. 7, pp. 551-562, 2008.
[2] J. Sindik and N. Vidal, "Uncertainty coefficient as a method for optimization of the competition systems in various sports", Sport Science, Vol. 2, No. 1, pp. 95-100, 2009.
[3] Y. Y. Petrunin, "Analysis of the football performance: from classical methods to neural networks", Conference Paper, unpublished, 2011.
[4] A. C. Constantinou, N. E. Fenton and M. Neil, "A Bayesian network model for forecasting Association Football match outcomes", Working Papers, Queen Mary University, 2012.
[5] Andreas Heuer and Oliver Rubner, "Towards the perfect prediction of soccer matches", Westfälische Wilhelms University, Germany, In press, 2012.
[6] Andrew James Moore, "Predicting football results", Unpublished, Module code: COM 3021, May 6, 2004.