
IEEE TRANSACTIONS ON GAMES, VOL. 15, NO. 2, JUNE 2023

Machine Learning Methods for Predicting League of Legends Game Outcome

Juan Agustín Hitar-García, Laura Morán-Fernández, and Verónica Bolón-Canedo

Abstract—The video game League of Legends has several professional leagues and tournaments that offer prizes reaching several million dollars, making it one of the most followed games in the Esports scene. This article addresses the prediction of the winning team in professional matches of the game, using only pregame data. We propose to improve the accuracy of the models trained with the features offered by the game application programming interface (API). To this end, new features are built to collect interesting information, such as the skill of a player handling a certain champion, the synergies between players of the same team, or the ability of a player to beat another player. Then, we perform feature selection and train different classification algorithms aiming at obtaining the best model. Experimental results show classification accuracy above 0.70, which is comparable to the results of other proposals presented in the literature, but with the added benefit of using few samples and not requiring the use of external sources to collect additional statistics.

Index Terms—Esports, feature creation, League of Legends (LoL), model ensemble, prediction, video games.

Manuscript received 19 May 2021; revised 3 November 2021 and 18 January 2022; accepted 11 February 2022. Date of publication 23 February 2022; date of current version 16 June 2023. This work was supported in part by the National Plan for Scientific and Technical Research and Innovation of the Spanish Government under Grant PID2019-109238GB-C2; in part by the Xunta de Galicia under Grant ED431C 2018/34 with the European Union ERDF funds; and in part by CITIC, as a Research Center accredited by the Galician University System, funded by "Consellería de Cultura, Educación e Universidades from Xunta de Galicia," supported 80% through ERDF Funds (ERDF Operational Programme Galicia 2014–2020) and the remaining 20% by "Secretaría Xeral de Universidades" under Grant ED431G 2019/01. (Corresponding author: Verónica Bolón-Canedo.)
The authors are with CITIC, Universidade da Coruña, 15071 A Coruña, Spain (e-mail: [email protected]; [email protected]; veronica.bolon@udc.es).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TG.2022.3153086.
Digital Object Identifier 10.1109/TG.2022.3153086

I. INTRODUCTION

In the context of electronic entertainment, the broadcasting of professional video games, known as Esports, is gaining more and more relevance. In these games, the opponents face each other to achieve one of the important prizes at stake and the recognition of the public, reaching an audience of 495 million people and exceeding one billion dollars in revenues by 2020 [1]. In this emerging field, League of Legends (LoL) [2] from Riot Games Inc. is at the top of the most watched games with 348.8 million hours of viewing in 2019 [1], giving an idea of the popularity of this game. The creators of the game have made available to developers an application programming interface (API) that allows access to a large amount of detailed data for every game played. This makes research in this area very interesting, for example, to develop predictive models to assist teams regarding the strategy to follow. However, there are few studies focused on LoL, and there is an almost complete lack of research on the professional levels of the game.

This study focuses on using pregame data (teams, players, champions, etc.) from different LoL leagues and tournaments to predict the winning team of professional games. The main motivation of our pregame approach is its applicability. Predicting the winning team before the match starts is very valuable, for example, to make decisions regarding team composition in terms of players, champions, and roles (recommendation system), or as part of betting systems. Choosing the professional level of the game reduces considerably the number of available observations (compared to the other levels: thousands of games versus hundreds of thousands or even millions). In order to obtain robust models, it is necessary to exploit the particularities of this dataset, such as the fact that the matches are played by a limited number of teams and players. This allows the extraction of features that provide much more relevant information to the predictive model than just the fact of participating or not in the encounter, as would be the case when using the original set of features provided by the game API.

We propose an approach that, based on the analysis of the original data, detects possible new features that are important for predicting the outcome of a match. These features reflect, for example, the dexterity of a player in handling a certain champion, the synergies produced when two players are on the same team, or the ability of a player to beat another player when they face each other. Subsequently, a preprocessing step is carried out, which includes the creation of these new features and feature selection. Next, several machine learning algorithms are trained, and those that show the best performance are selected to be used in a meta-model to further increase accuracy. The main contribution of this study is the development of a robust model for winner prediction in professional LoL games. Despite having a dataset with limited information, our approach addresses this problem by creating new features, which help to select the best predictive algorithms that are then combined into a meta-model.

The remainder of this article is organized as follows. In Section II, a general description of the LoL game is given. Section III reviews the related work. Section IV describes the methods and materials we work with, including the dataset (Section IV-A), preprocessing tasks (Sections IV-B and IV-C), as well as the classification algorithms (Section IV-D). Section V

2475-1502 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.


shows the proposed approach, including the exploratory analysis performed (Section V-A), the data partitioning and preprocessing (Section V-B), and the training for model prediction (Section V-C). Then, in Section VI, the specific parameters of the implementation are presented (Section VI-A), and the solution is evaluated, presenting the results and comparing the proposed models with each other and with related works (Section VI-B). Finally, Section VII concludes this article.

Fig. 1. LoL game map "Summoner's Rift."

II. GAME DESCRIPTION

In professional LoL matches, two teams of five players, the blue team and the red team, face each other on a map called "Summoner's Rift." This map (see Fig. 1) is divided into two parts by a river, and each of these parts is assigned to a team, with the base of the blue team in the lower-left corner and the base of the red team in the upper-right corner. Three lanes (top, middle, and bottom) connect the bases, and between each of these lanes there are neutral zones, called jungles. Before the battle begins, the players of each team must decide how to distribute the five existing roles or positions: top, jungle, middle, bottom (the bottom role is also known as ADC or Attack Damage Carry), and support. These roles are named after the zone of the map they occupy, except for the support role, which helps and gives protection to the rest of the teammates [3]. Likewise, players must also choose a champion from the 139 available. These champions have unique abilities and characteristics, making the choice of the champion/role composition of the team an important strategic factor that can determine the final outcome of the match. During the course of the game, players on the same team cooperate to destroy the defensive structures of the enemy in a lane (called turrets and inhibitors) and reach the opposing base in order to destroy their main structure, the Nexus, the final goal of the game. To advance toward that goal, it is of vital importance that champions gain experience levels, which allows them to unlock and enhance their special abilities, as well as earning gold to be able to buy items that increase the characteristics of the champion. This is achieved by defeating nonplayer characters or enemy champions, and by destroying opponent structures.

III. RELATED WORK

Despite the potential that LoL can have among machine learning researchers to develop algorithms that make good predictions about the game, a review of the specialized literature reveals that this is not a widely explored topic. In the case of predicting the winning team, as discussed in this article, two groups of approaches can be distinguished depending on what information is considered. The first group takes into account the in-game information, that is, the information related to the game state generated during the course of the game. This information may include, for example, gold earned, experience gained, or turrets destroyed at a certain point in the game. Within this group, Silva et al. [4] employed recurrent neural networks to predict the winner of the game based on information related to the game state at a given minute. The dataset used by the authors, obtained from the Kaggle repository, is composed of 7621 instances of professional games. They achieve an accuracy of 63.91% using data from the first 5 min of the game, and 83.54% for the interval between the 20th and 25th min. Kim et al. [5] proposed a method which considers uncertainty to better calibrate the confidence level of neural networks. In this way, they manage to predict the winning team in real time with an accuracy of 73.81%. Kho et al. [6] presented a logic mining technique, kSATRA, to obtain the logical relationship between team tactics during the game and the game objectives. The obtained rule is used to classify the outcome of future matches. The authors extract, from the game website, 400 instances with information about ten teams from each of three leagues. A training set accuracy of 73% is obtained with this dataset. Also with in-game information, Novak et al. [7] carried out a team performance analysis. For this purpose, they use the assessment of expert coaches to determine the in-game features most related to winning or losing the match. With the selected features, they create a generalized linear mixed model with a binomial logit link function to find those contributing the most to the final result of the match. In this way they achieve an accuracy of 95.8% using a dataset of 119 match instances extracted from the official video and historical match repositories. To conclude with this group, Kang et al. [8] compared the Poisson model and the Bradley–Terry model, evaluating their performance in terms of predictive capacity for the match outcome. For this purpose, the authors consider in-game data after the end of the match from 333 games, reaching an accuracy of 69.96% with the Poisson model of Dixon and Coles.

The second group of approaches only takes into account pregame information, in other words, information that is available before the match starts, such as the players involved, the champions they will play with, or the role that each player will assume. Within this group, Gu et al. [9] presented a novel neural network, NeuralAC, to predict the outcome of a match. This neural network captures the importance of interactions between members of the same team and between members of rival teams. To evaluate their approach, the authors use a dataset with 754 700 match instances, extracted from the game API, to achieve an accuracy of 62.09%. White et al. [10] exploited the temporal data from the games to determine the psychological momentum, positive or negative, of the players. Then, they combine this information with player experience to predict the winning team. These authors used a dataset of 87 743 game instances extracted from the game API. In addition, they also obtained a summary of the profiles of the participants in

those games from the OP.GG Website [11], as well as information regarding the champions from the CHAMPION.GG Website [12]. With all this data they trained their model, based on a recursive neural network and logistic regression, to achieve an accuracy of 72.10%. Ong et al. [13] employed the k-means algorithm to cluster the behavior of players based on their statistics in played games. In this way, they obtain a series of groups that describe different playing styles of those players. Using these clusters as features to define each team, they train various classification algorithms to predict the winner, achieving an accuracy of 70.4% with support vector machines. The authors used 113 000 game instances, randomly extracted from the game API, in their study. They also used the game API to obtain the player statistics from which they perform the clustering. Finally, with the information available before the start of the match, Chen et al. [14] aimed to detect player skills that are determinant for the outcome of a match. To do so, they used several skill-based predictive models to decompose player skills into interpretive parts. The impact on game outcome of these parts is evaluated in statistical terms. This approach achieves an accuracy of 60.24% with a dataset of 231 212 game instances extracted from the game API. Note that [13] and [14] are not peer reviewed.

Other relevant articles use as a basis for study the game Defense of the Ancients 2 (DotA2), which is very similar to LoL in its concept. This game is more studied in the literature and, therefore, has a larger number of articles on winning team prediction. For this game, Conley et al. [15] proposed a hero recommendation engine. For this purpose, they train a kNN (k-nearest neighbors) algorithm using as features the heroes that compose each team. In this way, they achieve an accuracy of 70% with a dataset of 18 000 instances. Agarwala et al. [16] performed a principal component analysis to extract the interaction between the heroes. With the obtained result, they train a logistic regression algorithm with a dataset composed of 40 000 instances. Applying their approach, the authors achieve an accuracy of 62%. Hodge et al. [17] presented a study of real-time prediction of the outcome in professional DotA2 matches using in-game information. The authors used standard machine learning, feature creation, and optimization algorithms on a mixed professional and nonprofessional games dataset to create their model. They tested the obtained model in real time during a championship match, reaching an accuracy of 85% after 5 min of gameplay. Lan et al. [18] proposed a model that allows predicting the winning team using data on player behavior during the match (in-game). They first extract the features that define player behavior with a convolutional neural network. These features are modeled as a temporal sequence that is processed by a recurrent neural network. Finally, the outputs of these two networks for each team are combined to forecast the outcome. Thus, the authors use a training set of 20 000 instances to obtain, after the first 20 combats occurring during the match, an accuracy of 87.85%.

Outside the context of multiplayer online battle arena (MOBA) games, there are also articles on other Esports that attempt to predict the match outcome. Thus, Sánchez-Ruiz et al. [19] used a set of numerical matrices, which represent the units' influence on the game map, to predict the outcome of the game StarCraft. The paper by Mamulpet [20] focused on the videogame PlayerUnknown's Battlegrounds, and used various artificial neural network techniques to predict the winning player of the game by knowing their starting position. For the game Counter-Strike, Xenopoulos et al. [21] created a probabilistic model to predict the winning team at the beginning of each round. With this model as a basis, they presented a recommender system to guide teams in purchasing equipment. Meanwhile, Ravari et al. [22] studied the prediction of match outcome in the videogame Destiny, which combines the genres of first-person shooter and massively multiplayer online role-playing game. The authors create two sets of predictive models: one set predicts the match outcome for each game mode, while the other set predicts the overall match outcome, without considering the game modes. In addition, they also analyze how game performance metrics influence each of the proposed models.

As seen in the related work, in-game approaches generally perform better than pregame approaches. This is because these approaches have much more information than pregame ones (such as experience gained, turrets destroyed, or enemies defeated). Thus, as the game advances, the features allow more accurate prediction of the winning team. Pregame approaches, however, have very limited information. This makes prediction much more complex, and it is challenging to obtain models with good accuracy, but it provides useful information before the game starts, which can be exploited in recommendation or betting systems. The pregame approaches that achieve the best results, Ong et al. [13] and White et al. [10], need to obtain additional information, besides the match information, to train their models. Our proposal, which belongs to the pregame group, does not need additional data about the players or the champions.

IV. MATERIALS AND METHODS

A. Dataset

The dataset used in this article has been obtained from Kaggle [23]. It consists of observations corresponding to professional games from various leagues and championships played between 2014 and 2018. It includes details regarding both match setup and match progress. The match progress data are discarded, as this work is based on prediction before the match starts (pregame). Once cleaned and prepared, the dataset includes 241 teams, 1470 players, and 139 champions. It has 7583 instances, with a binary class variable indicating the team that wins the game, and 26 features, corresponding to the year, season, league, and type of match, as well as the name of the team playing on each side (2 features, 1 per side), its composition of champions in each of the five roles (10 features, 5 per side), and its composition of players in each of the five roles (10 features, 5 per side). Table I shows the name and content of the class variable and each of these features.

B. Preprocessing

In order to obtain data of quality, adapted to the needs of the models to be trained, thus improving their predictive capacity, a series of preprocessing tasks are applied to the dataset.

For the treatment of missing values, an imputation with kNN [24] is performed.
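As a rough sketch of this kind of imputation for a categorical feature (a hand-rolled nearest-neighbor mode imputer; the column names, the Hamming distance over categorical predictors, and k are illustrative assumptions, not the authors' exact implementation):

```python
import pandas as pd

def knn_impute_mode(df, target, predictors, k=5):
    """Fill missing values of a categorical `target` column with the mode
    of that column among the k nearest complete rows, where "nearest"
    means fewest mismatches (Hamming distance) over `predictors`."""
    out = df.copy()
    complete = out[out[target].notna()]
    for idx in out.index[out[target].isna()]:
        # Count, for each complete row, how many predictor values differ.
        dists = (complete[predictors] != out.loc[idx, predictors]).sum(axis=1)
        neighbors = dists.nsmallest(k).index
        # Impute with the most frequent target value among those neighbors.
        out.loc[idx, target] = complete.loc[neighbors, target].mode().iloc[0]
    return out

# Hypothetical toy data: the blue-side team name is missing in one match.
matches = pd.DataFrame({
    "blue_team": ["T1", "T1", "T2", None],
    "blue_player_top": ["a", "a", "c", "a"],
    "blue_player_mid": ["b", "b", "d", "b"],
})
filled = knn_impute_mode(matches, "blue_team",
                         ["blue_player_top", "blue_player_mid"], k=5)
```

Here the incomplete match is imputed with the team name whose player line-up it matches; Section V-B describes the analogous choice in this study (mode of the kNN with k = 5 over the player-name features).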

TABLE I
CLASS AND FEATURES OF THE ORIGINAL DATASET

This algorithm searches the dataset for the "k" nearest neighbors of the observation with the missing value. Once these neighbors are found, their feature values are used to impute the missing value, using, for example, the mean or the mode.

Feature generation [25] is the process of creating new features from one or multiple existing features. These new representations of the dataset make it easier for predictive models to extract useful information, improving their performance. However, the creation of new features can lead to some of them being highly correlated with each other, achieving the opposite effect to the initially pursued objective. For this reason, it is appropriate, after performing this task, to do a feature selection that avoids this problem.

The categorical features contained in the dataset must be transformed into numerical features so that some of the algorithms used can handle them correctly. Thus, we choose to one-hot encode them (replacing a categorical feature with n levels by n − 1 binary features). In this way, the binary feature corresponding to the level of the categorical feature in an observation takes the value 1, leaving the others at 0.

To avoid the influence of the scale and variance of the continuous features on their weight in the predictive models, their transformation is necessary. We used Z-score normalization, i.e., subtracting the mean from each feature value and dividing the result by the standard deviation. Therefore, these features have a mean of 0 and a standard deviation of 1.

C. Feature Selection

In order to create a robust model, it is necessary to identify those features that are highly correlated with the output, since they will be the ones that really provide valuable information to the classification algorithm. Using too many features can sometimes lead to creating a model that overfits the training data and, therefore, reduces its predictive capacity. Furthermore, since the dataset contains a limited number of instances, reducing the number of features simplifies the models, thus requiring fewer observations to achieve good results. In this study, feature selection is performed to obtain a reduced representation of the original dataset that preserves the relevant information contained in it, using a combination of the two following techniques.

1) Feature Selection Using Univariate Filtering [26]: The relevance of each feature with respect to the class is evaluated independently of the predictive models. In this case, a test is performed, using ANOVA [27] as the model, to see if the mean of each feature differs between classes. Only the set of features that shows statistically significant differences between classes is finally selected. The main advantage of this technique is its computational efficiency. However, it does not consider possible correlations between features, and the selection it performs is not directly related to the accuracy of the predictive models.

2) Feature Selection Using Wrapper Methods [26]: In this kind of selection, a model is trained, for example, using decision trees, to evaluate a subset of features using the value obtained from the chosen metric. To find the finally selected set of features, sequential backward search [28] is used. This algorithm first trains the model with the complete dataset to establish a ranking of features according to their importance. It then performs an iterative procedure in which the training is repeated at each step with a subset of features from which the least important ones have been eliminated. This technique usually obtains better results than the filtering technique, since it generates compact subsets of features, optimized for the used metric. However, there is a risk of overfitting, which can be reduced by using cross-validation, and it requires training the models numerous times, making it computationally expensive. An added disadvantage is the need to adjust the hyperparameters of the algorithms used.

D. Classification

For the modeling step, different algorithms have been evaluated that, a priori, could give good results for this binary classification problem. Finally, among all the trained models, we select those that obtain the best results (either individually or as part of meta-models). These algorithms are described as follows.

1) Extreme Gradient Boosting [29]: An efficient and scalable implementation of decision trees with gradient boosting [30]. It consists in building trees sequentially, so that in each iteration the tree that minimizes the errors of the previously generated trees is added.

2) Support Vector Machines [31]: They seek to define a set of hyperplanes in a space of a certain dimensionality that separate as well as possible the classes of the output variable. Two variants of this algorithm are applied, the

first with a linear kernel and the second with a radial basis function (Gaussian) kernel.

3) Logistic Regression [32]: A statistical model that uses a logistic function to predict the outcome of a binary nominal variable. In this case, two variants are used that provide good results: the generalized linear model with boosting [33] and the generalized linear model with elastic-net regularization [34].

4) Naive Bayes Classifier [35]: It is based on the Bayes theorem and on assuming independence between features to calculate the probability that an observation belongs to one class or another.

5) kNN Algorithm [36]: It is based on the idea of searching for the "k" observations in the training data that have the smallest Euclidean distance to a new observation (nearest neighbors), assigning it the most repeated class of those training observations.

6) Neural Networks [37]: They are based on generating an interconnected group of nodes that attempts to imitate the biological behavior of the axons of neurons. Each of these nodes receives as input the output of the nodes of the previous layers multiplied by a weight. These values are aggregated at each receiving node and, optionally, modified or limited by an activation function before propagating the new value to the next neurons. Within the broad field of neural networks, in our work we use a multilayer perceptron (MLP) [38], a kind of fully connected feedforward neural network composed of multiple layers of perceptrons.
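For illustration, the families of classifiers listed above can be instantiated with scikit-learn as follows (GradientBoostingClassifier stands in for XGBoost, and all hyperparameters shown are library defaults or placeholders rather than the tuned values used in this article):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def base_classifiers(seed=0):
    """One candidate model per family described in Section IV-D."""
    return {
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
        "svm_linear": SVC(kernel="linear", probability=True, random_state=seed),
        "svm_rbf": SVC(kernel="rbf", probability=True, random_state=seed),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": GaussianNB(),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                             random_state=seed),
    }
```

Setting probability=True on the SVMs exposes class probabilities, which a stacking meta-model can consume as input features.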
In this work, we also include two meta-models that are interesting for the results obtained. The first one is a model based on label fusion with majority voting [39], while the second one performs stacking [40], where the outputs of a set of classifiers are used as new features to train a new model.

V. METHODOLOGY

As mentioned before, in this work we consider only the state of the game before the battle starts (pregame) to perform the prediction of the winning team in professional games. With this limited information, both in terms of available features and number of instances, it is recommended, during the exploratory analysis, to guide a search process for possible new features that can provide additional relevant information about the class. In the preprocessing of the data, the new candidate features, found in the previous analysis, are created, and feature selection is performed to determine which ones are really important, discarding the rest. The final dataset is used to train a set of classifiers. Finally, voting- and stacking-based meta-models are created from the trained models to improve the final accuracy obtained. A diagram of the methodology followed is shown in Fig. 2.

Fig. 2. Schematic diagram of the steps followed in the methodology.

A. Exploratory Analysis

The analysis of the original dataset revealed the need to perform some preprocessing tasks to prepare it for the classification phase, such as the handling of missing values and the one-hot encoding of categorical features.

On the other hand, to guide the search for new features that capture useful information, a supervised learning algorithm based on decision trees is trained with the aim of determining the importance of the features. The obtained ranking (see Fig. 3) shows that the most relevant features with respect to the class correspond to the roles of the players of both teams, followed by those reflecting the selected champion, and finally those referring to the name of the team playing the match. The rest of the features are detected as not important for the outcome.

Fig. 3. Feature importance of the original dataset.

From these features detected as most relevant, a search for possible new features is carried out. First, the win ratios for each value of these features are calculated (as matches won for that

value divided by matches played for that value). The distribution of these win ratios is analyzed to determine their possible predictive ability. For example, the distribution of win ratios regarding champions (see Fig. 4) follows a normal distribution, with mean 0.5 and a small standard deviation. This indicates that selecting one champion or another, analyzed individually, does not seem to lead to winning the game.

Fig. 4. Distribution of win ratios regarding champions.

For this reason, and continuing with the search for new features, we opted to explore win ratios, but this time for the combination of those features detected as most relevant. In this case, to generate these win ratios, the number of matches won by a combination of two values from two features is divided by the number of matches played by that combination. For example, for a possible new feature that captures the champion versus champion combination, the number of matches won by a given champion when facing another champion is divided by the number of matches played between both champions. The possible relevance of these combinations with respect to the class can be observed graphically in a heat map. In this case, each axis is one of the features selected from the original dataset, and the value of the win ratio is represented by a color gradient, with blue for values close to 0 and red for values close to 1. For example, the heat map of the champion versus champion combination win ratio (see Fig. 5), which presents the ability of a champion to defeat another champion on the opposing team during the game, shows a significant number of pairs near the extremes, suggesting that it may be interesting to include this information in the dataset due to its ability to discriminate the class.

Fig. 5. Win ratio for combination of champions in opposing teams heat map.

For the sake of brevity, it is not possible to plot all the heat maps, but we concluded that the evaluation, focused on different combinations of player-related and champion-related features, produces apparently interesting features that capture the following aspects: the performance of a player controlling a certain champion, the performance of a player in a given role, the synergies between players of the same team, the ability of a player to win or lose when facing another player, the performance of a champion in a given role, the synergies between champions on a team, and the ability of a champion to defeat another champion on the opposing team.

Finally, in the case of teams, it is observed that most of them win more games when they are on the blue side (see Fig. 6). For this reason, it is additionally proposed to create a feature containing information about the performance of a team when playing on each of the sides.

Fig. 6. Win ratio for each team according to the side in which they play (blue/red).

B. Partition and Preprocess

Before performing any data preprocessing, it is important to split the data into training and test sets. In this way, the independence between the two sets is maintained, which is necessary to perform an honest evaluation of the models.

After data splitting with a 90%/10% ratio, all the preprocessing is carried out using only the data in the training set, and then it is applied to the test set. The first preprocessing task is to handle the missing values, which occur in the feature corresponding to the name of the team on the blue side. Since this feature is categorical, we impute it using the mode among its k = 5 nearest neighbors (kNN), computed from the features containing the names of the players on that side, since those players may have played together in that team on other occasions.

Once the missing values have been dealt with, the new features detected as potentially relevant for the class in the exploratory analysis are created. To do this, a set of matrices is generated, always from the training set, which collects, for each combination of original features considered, the corresponding winning ratio. For example, the player–champion matrix has as many rows as players and as many columns as champions, and contains the winning ratio for each possible pair of player and champion. Since all matrices are built based on win ratios, those combinations that do not occur in the training partition are assigned the average value between the maximum and the minimum (a ratio of 0.5). In this way, it is intended that they do not have a relevant influence on the output. It should be noted that, for the calculation of the ratios of these matrices, the formula "wins/matches" has been applied.

Using these described matrices, it is easy to create new features for an observation, since it is only necessary to extract the value of the ratio by accessing the row and column corresponding to the combined original features. For example, to create the new feature that collects the combination of a given player with the champion he controls, the value contained in the intersection of the row corresponding to that player and the column corresponding to that champion in the player–champion matrix is extracted. A diagram of the full feature creation process for this champion–player combination is shown in Fig. 7.

Fig. 7. Schematic diagram of the feature creation process for champion–player.

In this manner, the values are extracted to create the 38 new features described below. The notation used in these formulas is as follows: i = {top, jungle, middle, ADC, support}, j = {blue, red}, Bp = {players in blue team}, Rp = {players in red team}, Bc = {champions in blue team}, and Rc = {champions in red team}.

1) Ten features, one for each player present in the match, with the winning ratio of each combination of player and role played. Each of these features reflects how effective a player is in a given role:

    playerRole_ij = wins player_i / matches player_i.

2) Ten features, one for each player present in the game, with the winning ratio of the combination of each player with the champion he controls. It indicates how well a player controls a given champion:

    playerChampion_ij = wins player champion / matches player champion.

3) Two features, one for each team present in the game, calculated as the sum of the winning ratios of the ten possible combinations of each pair of players when they are on the same team. They capture the synergies between the members of each team:

    coopPlayer_blue = Σ_{m∈Bp} Σ_{n∈Bp, m≠n} wins_mn / matches_mn,
    coopPlayer_red = Σ_{m∈Rp} Σ_{n∈Rp, m≠n} wins_mn / matches_mn.

4) One feature, calculated as the sum of the winning ratios of the 25 possible combinations of each player of the blue team when facing each of the players of the red team. It determines the superiority of the blue team players against the red team players:

    vsPlayer = Σ_{m∈Bp} Σ_{n∈Rp, m≠n} wins_mn / matches_mn.

5) Ten features, one for each champion present in the game, with the winning ratio of the combination of each champion with the role played. It establishes how effective a champion is in a certain role:

    championRole_ij = wins champion_i / matches champion_i.

6) Two features, one for each team present in the game, calculated as the sum of the winning ratios of the ten possible combinations of each pair of champions when they are on the same team. They capture the synergies between the champions of the same team:

    coopChampion_blue = Σ_{m∈Bc} Σ_{n∈Bc, m≠n} wins_mn / matches_mn,
    coopChampion_red = Σ_{m∈Rc} Σ_{n∈Rc, m≠n} wins_mn / matches_mn.

7) One feature, calculated as the sum of the winning ratios of the 25 possible combinations of each champion of the blue team when facing each of the champions of the red team. It shows the superiority of the champions of the blue team against the champions of the red team:

    vsChampion = Σ_{m∈Bc} Σ_{n∈Rc, m≠n} wins_mn / matches_mn.

8) Two features, one for each team present in the game, with the winning ratio of the combination of each team with the side where it plays. It shows how well or poorly a team plays on each side:

    teamColor_j = wins team_j / matches team_j.

Once the task of creating new features has been completed, the categorical features are one-hot encoded to convert them into numerical ones.

The next preprocessing step is feature selection. As a result of the previous operations, the number of features increased to 4257. For this reason, a filter-type selection, which requires fewer computational resources, is made with the ANOVA statistical test and a p-value of 0.05 as threshold. In this way, a first subset of 705 features is obtained. Then, over this subset, a wrapper selection is applied, using random forest [41] as the training algorithm and recursive backward selection as the search algorithm. Thus, the best metric in the cross-validation with repetition is obtained for a set composed of 28 features (see Table II). All the selected features are from the group of new ones. Among these new features, only the 10 corresponding to how effective a champion is in a certain role have been discarded. Finally, the 28 numerical features are Z-score normalized.
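The summed-ratio features such as coopPlayer and vsPlayer reduce to short aggregations. An illustrative Python sketch (the study itself is implemented in R; here `ratio` stands for a lookup into the corresponding player–player win-ratio matrix, defaulting to 0.5 for unseen pairs):

```python
from itertools import combinations, product

def coop_feature(team_players, ratio):
    """coopPlayer-style feature: sum of the win ratios of the ten
    unordered pairs among the five players of one team."""
    return sum(ratio(m, n) for m, n in combinations(team_players, 2))

def vs_feature(blue_players, red_players, ratio):
    """vsPlayer-style feature: sum of the win ratios of the 25
    blue-player versus red-player combinations."""
    return sum(ratio(m, n) for m, n in product(blue_players, red_players))
```

With five players per side, an entirely uninformative matrix (every ratio equal to 0.5) yields baselines of 10 × 0.5 = 5.0 and 25 × 0.5 = 12.5, so it is the deviation from these baselines that carries the signal.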


TABLE II
FINAL SET OF SELECTED FEATURES

C. Classification

With the preprocessed data, binary classification algorithms are trained to find the patterns present in the training set, allowing generalization to new observations in order to predict their class. For this training we have selected, from the whole set of algorithms tested in the experimentation, those with the best results, either by themselves or in one of the proposed meta-models.

In order to find the hyperparameters for each algorithm, a grid search has been performed. First, a set of candidate values for the hyperparameters is created. For each possible combination of these values, the model is fitted and the accuracy is estimated using tenfold cross-validation with five repetitions. Finally, the hyperparameter values with the best metric are chosen. The hyperparameter values chosen for each algorithm can be found in the Appendix.

Among all the trained algorithms based on decision trees, extreme gradient boosting has obtained the best results in this study. This algorithm tends to overfit, so it is especially critical to correctly determine the values of its hyperparameters. Support vector machines performed well in other related work [13]; for this reason, two variants are included in this study: the first uses a linear kernel, and the second uses a Gaussian radial basis function kernel with class weights.

As in the previous case, logistic regression is also found in the literature as one of the algorithms with the best predictive capacity for the problem addressed. Here, the two variants with the best results for this dataset have been selected: the generalized linear model with boosting, and the generalized linear model with elastic-net regularization.

The Naive Bayes classifier, despite its simplicity, achieves very good results in this problem. Similarly, a simple classifier such as kNN obtains good accuracy, better than some much more complex algorithms.

Using a neural network, in this case, does not obtain results as good as the models presented above. However, its combination in a meta-model with other classifiers adds enough diversity to the prediction to improve the results obtained. A feedforward network is implemented, with a single hidden layer and with dropout to avoid overfitting. The number of hyperparameters to be adjusted is high, which makes the process of finding their optimal values difficult.

Finally, two meta-models (also known as ensembles) are created, combining the previous models and improving their results. The first one uses the models obtained with extreme gradient boosting, the Naive Bayes classifier, and the neural network to create a label fusion with majority voting. This simple meta-model has no hyperparameters. In the second meta-model, a stacking is performed, where the outputs of the Naive Bayes classifier and the neural network are used as new features to train a model based on the extreme gradient boosting algorithm.

VI. RESULTS

A. Experiments Framework

In the experimentation, a balanced random split with a 90%/10% ratio is used on the dataset. Thus, the training of the algorithms in the modeling phase is performed using 6826 instances, leaving 757 instances for testing. Tenfold cross-validation with five repetitions was chosen for training the models. These partitions were created using seeds to ensure reproducibility. The metric used for evaluating all models, including those used in the wrapper feature selection, is accuracy. It should also be noted that the experimentation of this work is done in the R programming language [42].

B. Comparative Study

As mentioned before, the test partition was not used for the preprocessing and modeling. This allows an honest estimation of the error to compare the results between classifiers. For this purpose, the accuracy obtained on the test partition for each of the algorithms is computed. Comparing the models (see the first column of Table III), it is easy to note how the meta-models improve the results of the other ones. In particular, the meta-model based on stacking has the highest predictive capacity of the models evaluated.

However, model performance in terms of computation time could be a critical factor for the practical application of the classifiers (e.g., for real-time use during the champion and role selection phase). In that case, if the computation times employed by the algorithms for this dataset with new features and feature selection are compared (see the fourth and fifth columns of Table III), a good choice would be the Naive Bayes classifier. This model has a slightly lower accuracy than the meta-models, but requires less than 1/8 of the time to compute a prediction.
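The two ensembles compared here can be sketched as follows. This is an illustrative Python sketch, not the paper's R implementation; the base-classifier outputs are stand-in lists of predicted labels:

```python
def majority_vote(model_predictions):
    """Label fusion by majority voting: each base classifier supplies one
    list of predicted labels; the most frequent label per instance wins.
    With three voters and binary labels there are no ties."""
    per_instance = zip(*model_predictions)
    return [max(set(votes), key=votes.count) for votes in per_instance]

def stacked_features(original_features, model_predictions):
    """Stacking, first step: append each base classifier's output to the
    instance's feature vector. A second-level model (extreme gradient
    boosting in the paper) is then trained on these extended vectors."""
    extended = []
    for row, votes in zip(original_features, zip(*model_predictions)):
        extended.append(list(row) + list(votes))
    return extended
```

The design difference matters: majority voting needs no further training (hence no hyperparameters), while stacking fits an extra learner on the base outputs and can therefore weight the voters, at the cost of another round of hyperparameter tuning.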

TABLE III
MODEL ACCURACY ON TEST PARTITION ACCORDING TO THE DATASET USED FOR TRAINING
NF = New features, FS = Feature selection. For the same algorithm, the accuracy test results are not significantly worse than the best. The bold values in the first three columns represent the best accuracy for each row (method); the bold values in the last two columns (training time and prediction time) are the lowest times among all the methods.

TABLE IV
COMPARISON OF THE PROPOSED MODELS RESULTS WITH THOSE EXISTING IN THE LITERATURE
For the full test partition (757 instances).

It is also interesting to evaluate the impact on model accuracy of both the new features created and the feature selection. For this purpose, the algorithms are retrained, setting again the best hyperparameters, with two new datasets. The first one is obtained by applying, on the original dataset, all the preprocessing tasks indicated in this methodology except feature selection. The second is the original dataset with only the preprocessing tasks essential for the correct working of the classification algorithms (imputation of missing values and one-hot encoding of the qualitative features). Comparing the results obtained on the test partition for each of the three datasets (see Table III), the best accuracy is obtained with the datasets containing the new features. Among them, the one with feature selection has better metrics in almost all the models (9 of the 10 finally chosen), while the one without feature selection only achieves the same results in the algorithms based on logistic regression, and only outperforms it in the neural network. This drop in accuracy can be explained because a neural network trained with irrelevant features is sometimes more flexible than the network after feature selection, which can lead to better results [43]. To verify whether there are significant differences between the accuracy values obtained on each dataset, it is necessary to carry out a statistical analysis of the results.¹ A Wilcoxon signed-rank test [44] is performed by comparing these results across all cross-validation folds and repetitions of the same algorithm trained on each of the three datasets. The statistical tests (see Table III) reveal that the datasets with the new features (with and without feature selection), besides showing the best metrics in all models, are significantly better than the original dataset. It should be noted that the majority voting meta-model has not been tested, because it is not really trained (the majority outcome is chosen from a set of classifiers) and, therefore, cross-validation folds are not available. Nevertheless, these results highlight the importance of these two preprocessing tasks and of the stacking-based meta-model, the combination of which improves the predictive capacity of this proposal, increasing it by around 5% when compared with the results for the best model trained on the original dataset.

¹ Standard deviations for each model and dataset. [Online]. Available: https://github.com/JuAGarHi/ML-methods-predicting-LoL-outcome/blob/main/models-datasets-standard-deviations.md

Among all the related literature, only Silva et al. [4] used the same dataset as the one we use. Nevertheless, their work takes into account in-game information. For this reason, and to make the comparison as honest as possible, their result for the interval [0–5] minutes of gameplay is considered. On the other hand, our work could be compared with the approaches available in the literature that address the problem knowing only the pregame information. However, for these approaches there are no standard datasets, due to the notable variations the game suffers with each update, so such a comparison is not fair. Thus, each article uses its own dataset, extracted from the source the authors have considered and with its particular features. In addition, the works that obtain the best accuracy need to extract extra information, such as statistics of the players and/or champions involved, from additional sources. For example, White et al. [10] use a dataset containing 87 743 matches from various levels of play, not only professional, obtained from the game API, but they also extract a set of statistics of all the players and champions disputing those matches from the OP.GG website [11] and the CHAMPION.GG website [12]. Comparing the results of these works with the ones obtained by our proposal (see Table IV), we can see how our models outperform the in-game model that uses the same dataset, even though it uses information from the first 5 min of the match. In addition, for the pregame approaches, our meta-models obtain an accuracy similar to the state-of-the-art, despite having a much smaller number of observations for training and without the need to use additional sources to obtain statistics of the participants.

The most important features for determining the class in the meta-model with the best accuracy in this article can be seen in Fig. 8. It seems that, in this case, the relevant features are those that determine the superiority of the players and champions of the blue team over those of the red team. These are followed by the features that indicate how well a player controls a given champion. Next are the features that capture synergies, both between champions on the same team and between players on the same team. Finally, and with similar importance to each other, are the features that indicate how effective a player is in a certain role and those that capture how well or poorly a team plays on each side.

Fig. 8. Ranking of feature importance in the stacking-based meta-model.

VII. CONCLUSION

In this article, we have seen how it is possible to create a classifier for determining the winning team in professional LoL games that, with a limited number of observations and features, obtains good results, in line with those offered by other approaches that use datasets with tens of thousands of instances and additional information on player performance. To this end, this proposal has been based on the search for and creation of new features that provide additional relevant information to the classifier. To the best of our knowledge, this is the first attempt in the literature to create new features (without relying on external sources) to improve the prediction of the winner in LoL. These new features have been subjected, together with the original ones, to a feature selection process. The result has been used as the final dataset to train different models and to assemble a selection of them into a meta-model. The best results were obtained by the meta-model based on stacking, achieving an accuracy over 70%. This result is comparable to other approaches from the state-of-the-art, but with the added benefit of using few samples and not requiring external sources to collect additional statistics.

The approach presented in this article can be considered a proof of concept for its application to other video games (not exclusively MOBAs) or even to sports. If the particularities of the dataset allow combining features to create new ones based on win ratios, our methodology can be applied to predict the match outcome.

This proposal opens the door to different lines of future development. The first one is to increase the number of observations in the dataset; getting more observations would allow studying how this impacts the predictive capacity of the models. It would also be interesting to obtain information about when each match was played. Having temporal information offers a whole range of possibilities, both for the creation of new features that exploit this information (e.g., trends in players, to detect periods in which a player performs better or worse) and for using classification algorithms that consider the time factor, such as recurrent neural networks. Another development option is the implementation of a recommendation system for professional teams. This system would advise about which team members should participate, what role they should play, or which champion they should select, according to the configuration of the opposing team.

APPENDIX
CLASSIFIER HYPERPARAMETER VALUES

Extreme Gradient Boosting:
1) Number of trees to be adjusted, nrounds = 14 000;
2) maximum depth of each tree, maxdepth = 2;
3) reduction of the step size in each update, eta = 1e−6;
4) minimum loss reduction required to perform a new partitioning of a node, gamma = 0;
5) ratio of subsample of columns for tree construction, colsamplebytree = 1e−9;
6) minimum of the sum of the instance weight required on a child node, minchildweight = 1; and
7) ratio of data to be used to generate the trees, subsample = 1e−1.

SVM Linear Kernel:
1) Cost for hyperplane margin infringement, C = 5e−5.

SVM RBF Kernel With Class Weights:
1) Radial kernel coefficient, sigma = 2.8657e−4;
2) cost for hyperplane margin infringement, C = 0.25; and
3) class weight, Weight = 3.

Generalized Linear Model With Boosting:
1) Initial number of boosting iterations, mstop = 200.

Generalized Linear Model With Elastic-Net Regularization:
1) Penalty mix ratio for elastic-net, alpha = 0.55; and
2) penalty value, lambda = 0.09.

Naive Bayes Classifier:
1) Laplace smoothing value, laplace = 0;
2) use of a function to estimate conditional densities for the class of each feature, usekernel = TRUE; and
3) adjustment value for the function of the previous hyperparameter, adjust = 8.

k-Nearest Neighbors:
1) Number of neighbors, k = 16.

Neural Network (MLP):
1) Number of neurons in the hidden layer, size = 256;
2) ratio of information to be discarded after each layer, dropout = 0.4;
3) batch size used in each iteration, batch size = 2275;
4) learning rate, lr = 3e−4;
5) gradient decay value, rho = 0.9;
6) learning rate decay value, decay = 0.2; and
7) activation function used in neurons, activation = tanh.

Meta-Model With Extreme Gradient Boosting Algorithm:
1) Number of trees to be adjusted, nrounds = 14 000;
2) maximum depth of each tree, maxdepth = 2;
3) reduction of the step size in each update, eta = 1e−6;
4) minimum loss reduction required to perform a new partitioning of a node, gamma = 0;
5) ratio of subsample of columns for tree construction, colsamplebytree = 1e−9;
6) minimum of the sum of the instance weight required on a child node, minchildweight = 1; and
7) ratio of data to be used to generate the trees, subsample = 0.15.
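The values listed above are the outcome of the grid search described in the Classification subsection. A generic sketch of that procedure (Python for illustration; the grid contents and the `evaluate` callback are hypothetical placeholders for the real cross-validated fit in R):

```python
from itertools import product

def grid_search(grid, evaluate):
    """Exhaustive grid search: score every combination of candidate
    hyperparameter values and keep the best. `grid` maps hyperparameter
    names to candidate lists; `evaluate` returns the cross-validated
    accuracy of one setting (tenfold with five repetitions in the paper)."""
    names = sorted(grid)
    best_score, best_setting = float("-inf"), None
    for values in product(*(grid[name] for name in names)):
        setting = dict(zip(names, values))
        score = evaluate(setting)
        if score > best_score:
            best_score, best_setting = score, setting
    return best_score, best_setting
```

The cost is the product of the candidate-list lengths times the cost of one cross-validated fit, which is why the paper notes that models with many hyperparameters, such as the MLP, are the hardest to tune.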

REFERENCES

[1] "Global Esports market report," NEWZOO, Amsterdam, The Netherlands, 2020.
[2] Riot Games, League of Legends. Accessed: Jan. 2022. [Online]. Available: https://www.leagueoflegends.com
[3] Riot Games, "League of Legends: How to play—Lane positions." Accessed: Jan. 2022. [Online]. Available: https://www.leagueoflegends.com/en-us/how-to-play/
[4] A. L. S. Cardoso, G. P. Lobo, and L. Chaimowicz, "Continuous outcome prediction of League of Legends competitive matches using recurrent neural networks," in Proc. SBCGames, 2018, pp. 2179–2259.
[5] D.-H. Kim, C. Lee, and K.-S. Chung, "A confidence-calibrated MOBA game winner predictor," in Proc. IEEE Conf. Games, 2020, pp. 622–625.
[6] L. C. Kho, "Logic mining in League of Legends," Pertanika J. Sci. Technol., vol. 28, no. 1, pp. 211–225, 2020.
[7] A. R. Novak, K. J. Bennett, M. A. Pluss, and J. Fransen, "Performance analysis in Esports: Modelling performance at the 2018 League of Legends world championship," Int. J. Sports Sci. Coaching, vol. 15, no. 5/6, pp. 809–817, 2020.
[8] D.-K. Kang and M.-J. Kim, "Poisson model and Bradley–Terry model for predicting multiplayer online battle games," in Proc. 17th Int. Conf. Ubiquitous Future Netw., 2015, pp. 882–887.
[9] Z. Chen, Y. Sun, M. S. El Nasr, and T.-H. D. Nguyen, "NeuralAC: Learning cooperation and competition effects for match outcome prediction," in Proc. AAAI Conf. Artif. Intell., 2021, pp. 4072–4080.
[10] A. White and D. M. Romano, "Scalable psychological momentum forecasting in Esports," in Proc. Workshop State-Based User Model., 13th ACM Int. Conf. Web Search Data Mining, 2020. [Online]. Available: https://www.k4all.org/event/wsdmsum20/
[11] LoL Stats. Accessed: Jan. 2022. [Online]. Available: https://www.op.gg
[12] LoL Champions Stats. Accessed: Jan. 2022. [Online]. Available: https://www.champion.gg
[13] H. Y. Ong, S. Deolalikar, and M. Peng, "Player behavior and optimal team composition for online multiplayer games," 2015, arXiv:1503.02230.
[14] Z. Chen, Y. Sun, M. S. El Nasr, and T.-H. D. Nguyen, "Player skill decomposition in multiplayer online battle arenas," 2017, arXiv:1702.06253.
[15] K. Conley and D. Perry, "How does he saw me? A recommendation engine for picking heroes in Dota 2," Stanford Univ., Stanford, CA, USA, 2013.
[16] A. Agarwala and M. Pearce, "Learning Dota 2 team compositions," Stanford Univ., Stanford, CA, USA, 2014.
[17] V. J. Hodge, S. D. Michael, N. S. John, F. B. Oliver, P. C. Ivan, and A. Drachen, "Win prediction in multi-player Esports: Live professional match prediction," IEEE Trans. Games, vol. 13, no. 4, pp. 368–379, Dec. 2021.
[18] X. Lan, L. Duan, W. Chen, R. Qin, T. Nummenmaa, and J. Nummenmaa, "A player behavior model for predicting win-loss outcome in MOBA games," in Proc. Int. Conf. Adv. Data Mining Appl., 2018, pp. 474–488.
[19] A. A. Sánchez-Ruiz and M. Miranda, "A machine learning approach to predict the winner in StarCraft based on influence maps," Entertainment Comput., vol. 19, pp. 29–41, 2017.
[20] M. M. Mamulpet, "PUBG winner placement prediction using artificial neural network," Int. J. Eng. Appl. Sci. Technol., vol. 3, no. 12, pp. 107–118, 2019.
[21] P. Xenopoulos, B. Coelho, and C. Silva, "Optimal team economic decisions in counter-strike," 2021, arXiv:2109.12990.
[22] Y. N. Ravari, P. Spronck, R. Sifa, and A. Drachen, "Predicting victory in a hybrid online competitive game: The case of Destiny," in Proc. AAAI Conf. Artif. Intell. Interactive Digit. Entertainment, 2017, pp. 207–213.
[23] Kaggle, "League of Legends—Competitive matches 2014–2018." [Online]. Available: https://www.kaggle.com/chuckephron/leagueoflegends
[24] O. Troyanskaya et al., "Missing value estimation methods for DNA microarrays," Bioinformatics, vol. 17, no. 6, pp. 520–525, 2001.
[25] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
[26] G. H. John, R. Kohavi, and K. Pfleger, "Irrelevant features and the subset selection problem," in Proc. 11th Int. Mach. Learn. Conf., 1994, pp. 121–129.
[27] P. Jafari and F. Azuaje, "An assessment of recently published gene expression data analyses: Reporting experimental design and statistical factors," BMC Med. Inform. Decis. Making, vol. 6, no. 1, 2006, Art. no. 27.
[28] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Mach. Learn., vol. 46, no. 1–3, pp. 389–422, 2002.
[29] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2016, pp. 785–794.
[30] J. H. Friedman, "Greedy function approximation: A gradient boosting machine," Ann. Statist., vol. 29, no. 5, pp. 1189–1232, 2001.
[31] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowl. Discov., vol. 2, no. 2, pp. 121–167, 1998.
[32] D. G. Kleinbaum, K. Dietz, M. Gail, and M. Klein, Logistic Regression. New York, NY, USA: Springer-Verlag, 2002.
[33] P. Bühlmann and B. Yu, "Boosting with the L2 loss: Regression and classification," J. Amer. Statist. Assoc., vol. 98, no. 462, pp. 324–339, 2003.
[34] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," J. Stat. Softw., vol. 33, no. 1, pp. 1–22, 2010.
[35] I. Rish, "An empirical study of the naive Bayes classifier," in Proc. Workshop Empirical Methods Artif. Intell., 2001, pp. 41–46.
[36] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967.
[37] M. H. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge, MA, USA: MIT Press, 1995.
[38] A. Pinkus, "Approximation theory of the MLP model in neural networks," Acta Numerica, vol. 8, pp. 143–195, 1999.
[39] L. Lam and C. Y. Suen, "Application of majority voting to pattern recognition: An analysis of its behavior and performance," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 27, no. 5, pp. 553–568, Sep. 1997.
[40] D. H. Wolpert, "Stacked generalization," Neural Netw., vol. 5, no. 2, pp. 241–259, 1992.
[41] L. Breiman, "Random forests," Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[42] "The R project for statistical computing," The R Foundation, Vienna, Austria. Accessed: Jan. 2022. [Online]. Available: https://www.r-project.org
[43] E. Romero and J. M. Sopena, "Performing feature selection with multilayer perceptrons," IEEE Trans. Neural Netw., vol. 19, no. 3, pp. 431–441, Mar. 2008.
[44] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics, vol. 1, no. 6, pp. 80–83, 1945.

Juan Agustín Hitar-García received the B.S. degree in industrial engineering from the Polytechnic University of Valencia, Valencia, Spain, in 2001, and the M.S. degree in artificial intelligence research from the Menéndez Pelayo International University, Madrid, Spain, in 2020. He is currently working toward the Ph.D. degree with the University of A Coruña, A Coruña, Spain, and his thesis focuses on explainable artificial intelligence. He has worked in the private sector, in recent years as chief operating officer.

Laura Morán-Fernández received the B.S. and Ph.D. degrees in computer science from the University of A Coruña, A Coruña, Spain, in 2015 and 2020, respectively. She is currently an Assistant Lecturer with the Department of Computer Science and Information Technologies, University of A Coruña. She has coauthored three book chapters and more than 15 research papers in international journals and conferences. Her research interests include machine learning, feature selection, and big data.

Verónica Bolón-Canedo received the B.S. and Ph.D. degrees in computer science from the University of A Coruña, A Coruña, Spain, in 2009 and 2014, respectively. After a postdoctoral fellowship with the University of Manchester, Manchester, U.K., in 2015, she is currently an Associate Professor with the Department of Computer Science and Information Technologies, University of A Coruña. She has authored or coauthored extensively in the area of machine learning and feature selection, including two books, seven book chapters, and more than 80 research papers in international conferences and journals on these topics.