League ML2
1 Introduction
Statistical analysis of sports, or sports analytics, has become an increasingly popular tool for recruitment and strategy in modern sport and competition. The popularisation of sports analytics is often attributed to Billy Beane, who famously achieved great success as the general manager of the Oakland Athletics baseball team by using a data-driven approach to evaluate and recruit players on a much lower budget than competing teams. Other teams took note of this approach and went on to achieve success through data-based decision making. This success was noticed by executives and owners of teams in other professional sports leagues, to the point where practically all modern sporting organisations now employ analytics experts or entire departments dedicated to sports analytics [12].

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2022
Published by Springer Nature Switzerland AG 2022. All Rights Reserved
M. Wölfel et al. (Eds.): ArtsIT 2021, LNICST 422, pp. 138–152, 2022.
https://doi.org/10.1007/978-3-030-95531-1_10
Statistical Models for Predicting Results in Professional League of Legends 139
The convenient nature of statistics allows managers and coaches to identify a player’s strengths and weaknesses at a glance, without having to watch every game the player competes in. The same data can be used by gambling organisations to estimate the probability of certain outcomes and assign odds to them.
In football, for example, statistics have evolved to include automated sensing technology that tracks player position, movement and other observations from fixed and mobile cameras and sensors. Several professional statistical analysis firms offer data and analysis to professional teams as a product, providing context to the data collected and helping teams make tactical decisions [2].
Since League of Legends (LoL) is a video game, an abundance of statistics can be gathered automatically, as they are tracked by the game itself. The wealth of data available provides many opportunities to perform analytics on the game. Most existing public analytics for LoL are used by journalists and fans to make comparisons and fuel narratives, while other organisations provide LoL teams with paid product packages to enhance in-house analysis and supplement coaching.
The aim of this research is to build a statistical model, using metrics from this data, that can accurately rate team and player performance, with the intention of predicting the outcomes of future games featuring those players and teams.
2 League of Legends
League of Legends was released in October 2009, and in the years since its release,
it has developed a competitive infrastructure across multiple regions that rivals
that of traditional sports [8]. Each region’s competitive league features franchised teams that compete against each other in weekly broadcasts that regularly draw thousands of viewers, and in annual inter-regional championships whose grand finals have drawn 44 million peak concurrent viewers [21]. Grand finals have been held in venues such as the Staples Center, which sold out within one hour of tickets becoming available [22], and the Beijing National Stadium, catering to live audiences in their thousands.
LoL is a team-based strategy game in which two competing teams of five players aim to destroy their opponent’s base, canonically named the Nexus. Every game of League of Legends takes place on the same map, known as Summoner’s Rift. Summoner’s Rift is split into three lanes, commonly known as Top, Middle and Bottom, which form paths leading from one team’s base to the other. The two sides of Summoner’s Rift, referred to as ‘Blue Side’ and ‘Red Side’, are separated by a River that runs from the top lane to the bottom lane, and the area between the lanes is known collectively as the Jungle. Blue team’s base and Nexus are situated in the bottom-left of the map, while red team’s base and Nexus are in the top-right. A representation of the map is shown in Fig. 1.
140 R. Jadowski and S. Cunningham
Fig. 1. Simplified version of the Summoner’s Rift Map. Original PNG version by
Raizin, SVG rework by sameboat licensed under CC BY-SA 3.0 [17] (Color figure
online)
team, there is a debate over whether blue side has an inherent advantage over red side, similar to the home advantage often seen in traditional sports. This advantage will be explored when analysing the data from competitive games and, if it exists, taken into account when making predictions.
3 Background
The use of player rankings in LoL is recognised as an important feature of the game, both for individuals and for maintaining its competitive edge [11], and this may arguably extend to a system of team rankings and statistics. Previous work has examined how well the ability of LoL players to work together in teams, and the presence of female players, predicts the competitive performance of those teams; however, this relies upon measures taken from individual players, such as collective intelligence and gender, that are not intrinsic to LoL game statistics and so require additional information to be gathered [10]. Unsurprisingly, much existing research points towards the influence that individual players, and their ability to form effective teams, can have on game outcomes [4,5]. In terms of win prediction, however, it has been shown that accuracy rates of up to 85% are possible for other Multiplayer Online Battle Arena (MOBA) games in professional contexts [9].
Win Percentage; Counter-Pick Rate; Total Kills; Total Deaths; Total Assists; Total Kill/Death/Assist Ratio; Kill Participation; Kill Share; Average Share of Team’s Deaths; First Blood Rate; Average Gold Difference at 10 min; Average Experience Difference at 10 min; Average Creep Score Difference at 10 min; Average Monsters + Minions killed per minute; Average Share of Team’s Total Creep Score post-15-minutes; Average Damage to Champions per minute; Damage Share; Average Earned Gold per minute; Gold Share; Average Wards Placed per minute; and Average Wards Cleared per minute. The players are separated by their role in the team, since different metrics can be more important to specific roles.
Metric                        Opposite
Kills                         Deaths
Gold at 15                    Opponent Gold at 15
XP at 15                      Opponent XP at 15
CS at 15                      Opponent CS at 15
Towers                        Opponent Towers
Dragons                       Opponent Dragons
Vision Score per Minute       Opponent Vision Score per Minute
Kills per Minute              Opponent Kills per Minute
Damage per Minute             Opponent Damage per Minute
Barons                        Opponent Barons
Heralds                       Opponent Heralds
Inhibitors                    Opponent Inhibitors
Wins                          Losses
$$W = \frac{S^2}{S^2 + A^2} = \frac{1}{1 + (A/S)^2} \tag{1}$$
In the original formula, W is the win percentage, S is the observed number of runs scored, and A is the observed number of runs allowed. James initially used an exponent of 2, which inspired the word ‘Pythagorean’ in the formula’s name. The formula has since been studied to identify the optimal exponent for accurate predictions. Different exponents can be calculated for each team to predict win percentages more accurately, and methods for finding those exponents have been developed, such as the Pythagenpat formula:
$$n = \left(\frac{S + A}{G}\right)^{0.287} \tag{2}$$
where n is the exponent and G is the total number of games. Though originally used for baseball, the PE formula is founded on the simple concept of one offensive and one defensive stat, which means that it can be applied to other sports [13,15].
For LoL, there are several metrics that could be used in an application of PE. The most obvious are kills and deaths: although the win condition of LoL is not a higher kill count than the other team, kills are an obvious metric that usually indicates the more dominant team. Another alternative would be turrets destroyed versus turrets lost. The planned model for rating teams will calculate an overall offensive and defensive rating for each team, so these ratings can also serve as the values used in the PE formula.
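Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ implementation: the function names are hypothetical, and the usage block applies PE to kills and deaths (one of the candidate pairs mentioned above) using made-up season totals. Note that the 0.287 power in Eq. (2) was calibrated on baseball run totals, so it may not transfer directly to LoL metrics.

```python
"""Sketch of Pythagorean expectation (Eq. 1) with a Pythagenpat exponent (Eq. 2).

Function names are illustrative; they are not taken from the paper's code.
"""


def pythagenpat_exponent(scored: float, allowed: float, games: int,
                         power: float = 0.287) -> float:
    """Eq. 2: n = ((S + A) / G) ** 0.287, using season totals S and A."""
    if games <= 0:
        raise ValueError("games must be positive")
    return ((scored + allowed) / games) ** power


def pythagorean_expectation(scored: float, allowed: float,
                            exponent: float = 2.0) -> float:
    """Eq. 1 generalised to exponent n: W = S^n / (S^n + A^n)."""
    s_n, a_n = scored ** exponent, allowed ** exponent
    if s_n + a_n == 0:
        raise ValueError("scored and allowed cannot both be zero")
    return s_n / (s_n + a_n)


if __name__ == "__main__":
    # Hypothetical season totals for one team: kills play the role of S,
    # deaths the role of A.
    kills, deaths, games = 310.0, 260.0, 18
    n = pythagenpat_exponent(kills, deaths, games)
    w = pythagorean_expectation(kills, deaths, n)
    print(f"exponent = {n:.3f}, expected win percentage = {w:.3f}")
```

The form S^n / (S^n + A^n) is used instead of 1 / (1 + (A/S)^n) so that a team with S = 0 does not cause a division by zero.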
Log5. Once the values of the PE formula for each team are known, we can
use another formula to estimate the probability of one team beating another.
James also devised Log5, a formula that uses two teams’ winning percentages to
calculate head-to-head match up probabilities [14].
$$p_{A,B} = \frac{p_A - p_A \, p_B}{p_A + p_B - 2 \, p_A \, p_B} \tag{3}$$
The Log5 formula takes the winning percentage of team A ($p_A$) and team B ($p_B$) and returns the probability that team A beats team B; the probability that team B beats team A then follows directly as $1 - p_{A,B}$. We can experiment with this formula using the values obtained from PE and compare the results with predictions from logistic regression models to see whether it offers better or worse performance.
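A minimal sketch of Eq. (3) follows. The function name `log5` and the PE values in the usage block are hypothetical; the guard reflects the fact that the denominator is zero only when both teams are at 0% or both at 100%.

```python
def log5(p_a: float, p_b: float) -> float:
    """Eq. 3: probability that team A beats team B, given win percentages p_a and p_b."""
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("win percentages must lie in [0, 1]")
    denominator = p_a + p_b - 2 * p_a * p_b
    if denominator == 0:
        # Only happens when both teams are at 0% or both are at 100%.
        raise ValueError("Log5 is undefined for two 0% or two 100% teams")
    return (p_a - p_a * p_b) / denominator


if __name__ == "__main__":
    # Hypothetical PE values for the two teams in one match (not from the paper).
    pe_blue, pe_red = 0.62, 0.48
    p_blue = log5(pe_blue, pe_red)
    print(f"P(blue wins) = {p_blue:.3f}, P(red wins) = {1 - p_blue:.3f}")
```

Because log5(a, b) + log5(b, a) = 1 for any valid inputs, only one call per match is needed.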
where N is the number of games featuring the selected team, $\mathit{OppStat}_i$ is the opponent’s opposite raw stat in row i, $\mathit{AvgStat}_M$ is the overall league average stat for metric M, and $\mathit{SideAdv}_{MT}$ is the average advantage/disadvantage for metric M on team T’s side of the map.
The adjustment to the chosen metric is made by dividing $\mathit{AdjTotal}_{MT}$ by the number of games a team has played and subtracting the result from $\mathit{RawStat}_{MT}$:

$$\mathit{AdjustedStat}_{MT} = \mathit{RawStat}_{MT} - \frac{\mathit{AdjTotal}_{MT}}{\mathit{TotalGames}_T} \tag{5}$$

where $\mathit{RawStat}_{MT}$ is the raw per-game average stat for metric M for team T and $\mathit{TotalGames}_T$ is the total number of games played by team T.
Using this information, one can calculate what a team’s adjusted stats would be for each metric and compare them to its actual performance. If a team’s adjusted stats are lower than its actual performance, this indicates that its opponents were weaker in that metric, and vice versa.
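Eq. (5) and the comparison just described can be sketched as follows. AdjTotal is passed in as an input, because its defining equation is not reproduced in this section; the function names and the numbers in the usage block are hypothetical.

```python
def adjusted_stat(raw_stat: float, adj_total: float, total_games: int) -> float:
    """Eq. 5: raw per-game average minus the per-game opponent/side adjustment."""
    if total_games <= 0:
        raise ValueError("total_games must be positive")
    return raw_stat - adj_total / total_games


def opponent_quality_hint(raw_stat: float, adjusted: float) -> str:
    """Reading of the comparison as stated in the text (adjusted vs. actual)."""
    if adjusted < raw_stat:
        return "opponents were weaker in this metric"
    if adjusted > raw_stat:
        return "opponents were stronger in this metric"
    return "opponents were about average in this metric"


if __name__ == "__main__":
    # Hypothetical team: 12.4 kills per game over 18 games, AdjTotal = 9.0.
    adj = adjusted_stat(12.4, 9.0, 18)
    print(f"adjusted kills per game = {adj:.2f}: {opponent_quality_hint(12.4, adj)}")
```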
5 Evaluation
this, there are major differences between starting on either side of the map that
could provide an advantage to a team.
It may be argued that the blue side of the map holds an inherent advantage due to several factors. These include the asymmetrical geometry of Summoner’s Rift and the isometric point of view, which favours the blue side of the map. Most importantly, a team’s pick/ban phase strategy is often dictated by the side of the map it is going to play on. Data suggests that this side advantage does exist: in 2017, professional League of Legends games saw a period in which blue side had a win rate of 64%. The imbalance was large enough that the developers of LoL have sought to reduce it through various balance updates, such as making dragons a more lucrative objective.
The dataset used in this study includes 882 games, of which blue side won 477, a 54.08% win rate for blue side. A chi-square test suggests that the side of the map does have an impact on a team’s chances of winning, χ²(1, N = 882) = 5.878, p = 0.015. This implies that blue wins can be expected to be more prevalent in the dataset, causing a slight imbalance.
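The side-advantage test can be reproduced as a chi-square goodness-of-fit test of the observed blue/red wins against an even 50/50 split. The sketch below assumes that this is the test that was used (it matches the reported one degree of freedom); the function name is hypothetical. It uses only the standard library, relying on the identity that for one degree of freedom the upper-tail probability is erfc(√(x/2)). `scipy.stats.chisquare` would give the same result.

```python
import math


def side_chi_square(blue_wins, total_games):
    """Chi-square goodness-of-fit test of blue/red wins against a 50/50 split.

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    if not 0 <= blue_wins <= total_games or total_games == 0:
        raise ValueError("need 0 <= blue_wins <= total_games and total_games > 0")
    expected = total_games / 2  # no side advantage under the null hypothesis
    observed = (blue_wins, total_games - blue_wins)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    # With 1 degree of freedom, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    stat, p = side_chi_square(477, 882)
    print(f"blue win rate = {477 / 882:.2%}")
    print(f"chi2(1, N=882) = {stat:.3f}, p = {p:.3f}")
```

With the dataset’s counts (477 blue wins out of 882), this reproduces the χ² = 5.878 and p = 0.015 reported above.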
They can also be evenly split into offensive (shown in Table 2) and defensive (Table 3) metrics, which will form the basis of offensive and defensive team ratings. The coefficient values can be used to calculate a weighting for each metric when producing a team rating. Another prediction model can also be formed by using these metrics as features, so that its results can be compared with those of the prediction models using all available metrics.
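The exact weighting scheme is not given in this section, so the following is only one plausible sketch under stated assumptions: each metric is first standardised to z-scores across teams, each logistic-regression coefficient is scaled so that the absolute values sum to 1 (keeping its sign), and a team’s rating is the weighted sum of its z-scores. All function names, coefficient values and team data below are hypothetical.

```python
from statistics import mean, pstdev


def standardise(values):
    """Convert one metric's per-team values to z-scores (population SD)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("metric has zero variance")
    return [(v - mu) / sigma for v in values]


def coefficient_weights(coefficients):
    """Scale coefficients so their absolute values sum to 1, keeping their signs."""
    total = sum(abs(c) for c in coefficients.values())
    if total == 0:
        raise ValueError("all coefficients are zero")
    return {metric: c / total for metric, c in coefficients.items()}


def weighted_rating(team_z_scores, weights):
    """Weighted sum of one team's standardised metric values."""
    return sum(weights[m] * team_z_scores[m] for m in weights)


if __name__ == "__main__":
    # Hypothetical coefficients (not the paper's values); deaths get a negative sign.
    weights = coefficient_weights({"kills": 0.6, "towers": 0.3, "deaths": -0.1})
    # Hypothetical per-game averages for three teams, one list per metric.
    raw = {"kills": [14.0, 11.5, 9.0],
           "towers": [6.2, 5.1, 3.8],
           "deaths": [9.5, 11.0, 13.2]}
    z = {m: standardise(v) for m, v in raw.items()}
    for team in range(3):
        rating = weighted_rating({m: z[m][team] for m in weights}, weights)
        print(f"team {team}: rating {rating:+.3f}")
```

Standardising first keeps metrics on different scales (kills versus towers, for example) from dominating the rating purely because of their units.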
$$\mathit{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - x_i \rvert \tag{6}$$
This is done with the intention of finding the PE exponent value that minimises the MAE. The defensive ratings were inverted and a constant of 5 was added to each, since the formula requires a positive defensive stat for which a lower value reflects a stronger team. We found 1.82 to be the most accurate single exponent for this dataset, with an MAE of 0.0397. The MAE values for this exponent range are shown in Fig. 3.
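A simple grid search over candidate exponents, scoring each with Eq. (6), is one way to carry out this search. This is a sketch under assumptions: y_i is taken to be a team’s actual win percentage and x_i its PE prediction; “inverted and added to a constant of 5” is read as 5 − rating; and the search range, step size and function names are hypothetical.

```python
def mae(actual, predicted):
    """Eq. 6: mean absolute error between paired values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - x) for y, x in zip(actual, predicted)) / len(actual)


def pythagorean(offence, defence, exponent):
    """PE with offensive rating as S and positive defensive value as A."""
    return offence ** exponent / (offence ** exponent + defence ** exponent)


def positive_defence(rating, constant=5.0):
    """Assumed reading of 'inverted and added to a constant of 5'."""
    return constant - rating


def best_exponent(teams, lo=1.0, hi=3.0, step=0.01):
    """Grid-search the exponent that minimises MAE.

    teams: list of (offensive_rating, positive_defensive_value, actual_win_pct).
    """
    actual = [w for _, _, w in teams]
    best_n, best_err = None, float("inf")
    for k in range(int(round((hi - lo) / step)) + 1):
        n = lo + k * step
        err = mae(actual, [pythagorean(o, d, n) for o, d, _ in teams])
        if err < best_err:
            best_n, best_err = n, err
    return best_n, best_err


if __name__ == "__main__":
    # Hypothetical teams: (offensive rating, raw defensive rating, actual win %).
    raw_teams = [(3.1, 1.2, 0.72), (2.4, 2.0, 0.50), (1.8, 2.6, 0.28)]
    teams = [(o, positive_defence(d), w) for o, d, w in raw_teams]
    n, err = best_exponent(teams)
    print(f"best exponent = {n:.2f}, MAE = {err:.4f}")
```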
selected by their PBCC scores per team (WT); (4) the calculated offensive and defensive ratings per team (OD); (5) a player rating for each player in both teams (PR); (6) the actual win percentages of each team (WP); and (7) the expected win percentage calculated using the Pythagorean expectation formula for both teams (PE). Approaches 1 to 5 used logistic regression to predict game outcomes, and approaches 6 and 7 used the Log5 formula.
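Approach 3 (WT) relies on metrics selected by their point-biserial correlation coefficient (PBCC) [23] with the game outcome. The selection rule is not spelled out in this section, so the following is only a sketch under assumptions: each metric is correlated with a 0/1 win indicator, metrics are ranked by absolute PBCC, and the top k are kept. `top_metrics` and `k` are hypothetical; `scipy.stats.pointbiserialr` computes the same coefficient.

```python
import math
from statistics import mean, pstdev


def point_biserial(values, outcomes):
    """Point-biserial correlation between a continuous metric and a 0/1 outcome."""
    if len(values) != len(outcomes):
        raise ValueError("values and outcomes must have equal length")
    ones = [v for v, y in zip(values, outcomes) if y == 1]
    zeros = [v for v, y in zip(values, outcomes) if y == 0]
    if len(ones) + len(zeros) != len(values):
        raise ValueError("outcomes must be coded as 0 or 1")
    if not ones or not zeros:
        raise ValueError("both outcomes must be present")
    sigma = pstdev(values)  # population SD, as in the standard r_pb formula
    if sigma == 0:
        raise ValueError("metric has zero variance")
    p = len(ones) / len(values)
    return (mean(ones) - mean(zeros)) / sigma * math.sqrt(p * (1 - p))


def top_metrics(metric_table, outcomes, k):
    """Rank metrics by |r_pb| with the outcome and keep the k strongest."""
    scores = {name: point_biserial(vals, outcomes)
              for name, vals in metric_table.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)[:k]
```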
Performance Metrics and Results. The following metrics were used to measure the performance of the approaches: Classification Accuracy (CA) [18]; F1 Score (F1) [1]; Area Under the Curve (AUC) [7]; Matthews Correlation Coefficient (MCC) [1]; Log Loss (LL) [18].
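This section does not state how these metrics were computed. Library implementations such as scikit-learn’s `accuracy_score`, `f1_score`, `roc_auc_score`, `matthews_corrcoef` and `log_loss` would normally be used. The dependency-free sketch below only shows what each metric measures. It assumes that outcomes are coded 1 for a blue win and 0 for a red win, and the `positive` argument lets F1 be computed separately for red wins, as in the results discussion.

```python
import math


def _counts(y_true, y_pred, positive=1):
    """Confusion-matrix counts (tp, fp, fn, tn) for the given positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    tp, fp, fn, _ = _counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    tp, fp, fn, tn = _counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood; probs are P(outcome == 1), clipped away from 0/1."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def auc(y_true, scores):
    """ROC AUC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```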
Following training of the logistic regression models and calculation of the Log5 outcomes, results were obtained for each approach using the test data set from the 2020 Summer Split, as shown in Table 6, where the highest-performing outcome for each metric is highlighted in bold.
The Player Rating model scores best in every performance metric, especially MCC, while all models had lower F1 scores for predicting Red wins than for predicting Blue wins. This indicates that the models have more difficulty identifying when the red team wins and appear reluctant to predict it, even though the blue side advantage was taken into account during the stat adjustments for the models. The win prediction performance of the Player Rating model is illustrated in Fig. 4.
References
1. Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21(1), 1–13 (2020)
2. Cintia, P., Giannotti, F., Pappalardo, L., Pedreschi, D., Malvaldi, M.: The harsh rule of the goals: data-driven performance indicators for football teams. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–10. IEEE (2015)
3. Costa, G.B., Huber, M.R., Saccoman, J.T.: Understanding Sabermetrics: An Introduction to the Science of Baseball Statistics. McFarland, Jefferson (2019)
4. Costa, L.M., Souza, A.C.C., Souza, F.C.M.: An approach for team composition in league of legends using genetic algorithm. In: 2019 18th Brazilian Symposium on Computer Games and Digital Entertainment (SBGames), pp. 52–61. IEEE (2019)
5. Do, T.D., Dylan, S.Y., Anwer, S., Wang, S.I.: Using collaborative filtering to recommend champions in league of legends. In: 2020 IEEE Conference on Games (CoG), pp. 650–653. IEEE (2020)
6. Fearnhead, P., Taylor, B.M.: Calculating strength of schedule, and choosing teams for March Madness. Am. Stat. 64(2), 108–115 (2010)
7. Fogarty, J., Baker, R.S., Hudson, S.E.: Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction. In: Proceedings of Graphics Interface 2005, pp. 129–136 (2005)
8. Games, R.: League of Legends. Riot Games, Garena, Santa Monica, CA, USA (2009)
9. Hodge, V.J., Devlin, S.M., Sephton, N.J., Block, F.O., Cowling, P.I., Drachen, A.: Win prediction in multi-player esports: live professional match prediction. IEEE Trans. Games 13, 368–379 (2019)
10. Kim, Y.J., Engel, D., Woolley, A.W., Lin, J.Y.T., McArthur, N., Malone, T.W.: What makes a strong team? Using collective intelligence to predict team performance in league of legends. In: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, pp. 2316–2329 (2017)
11. Kou, Y., Gui, X., Kow, Y.M.: Ranking practices and distinction in league of legends. In: Proceedings of the 2016 Annual Symposium on Computer-Human Interaction in Play, pp. 4–9 (2016)
12. Lewis, M.: Moneyball: The Art of Winning an Unfair Game. WW Norton & Company, New York City (2004)
13. Morey, D.: STATS basketball scoreboard, pp. 1–288 (1993)
14. Morey, L.C., Cohen, M.A.: Bias in the log5 estimation of outcome of batter/pitcher matchups, and an alternative. J. Sports Anal. 1(1), 65–76 (2015)
15. Oliver, D.: Basketball on paper: rules and tools for performance analysis. Potomac Books, Inc., Dulles (2004)
16. Prasetio, D., et al.: Predicting football match results with logistic regression. In: 2016 International Conference On Advanced Informatics: Concepts, Theory And Application (ICAICTA), pp. 1–5. IEEE (2016)
17. Raizin, Sameboat: Simplified version of the summoner’s rift map. CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/) (2013). https://commons.wikimedia.org/w/index.php?curid=29443207
18. Saleh, H.: Machine Learning Fundamentals: Use Python and Scikit-learn to Get Up and Running with the Hottest Developments in Machine Learning. Packt Publishing Ltd., Birmingham (2018)
19. Sevenhuysen, T.: Oracle’s elixir - LoL esports stats (2021). https://oracleselixir.com
20. Snyder, J.: What actually wins soccer matches: prediction of the 2011–2012 premier league for fun and profit. Thesis. University of Washington, WA: Department of Computer Science (2013)
21. Staff, L.E.: 2019 world championship hits record viewership. https://nexus.leagueoflegends.com/en-us/2019/12/2019-world-championship-hits-record-viewership/. Accessed 26 Mar 2021
22. Tassi, P.: League of Legends finals sells out LA’s Staples Center in an hour. Forbes (2013)
23. Tate, R.F.: Correlation between a discrete and a continuous variable. Point-biserial correlation. Ann. Math. Stat. 25(3), 603–607 (1954)