Abstract and Figures: The Open International Soccer Database For Machine Learning
The sample database covers the storage and retrieval of data about a
soccer tournament, based on the EURO CUP series. For football enthusiasts,
it provides detailed information about a football tournament. The database
design makes it easier to answer the various questions that come to mind
about a soccer tournament.
One of the earliest studies on soccer analysis concluded that chance dominates
the game, which makes outcome prediction very difficult. Despite the relatively
simple rules and objectives governing soccer, predicting the outcome of a
soccer game is difficult. One aspect that makes soccer so popular and
unpredictable is that goals are relatively rare and the margin of victory for the
winning team is relatively low for most matches.
Another reason why predicting the outcome of a soccer game is difficult is that
goals and other game-changing circumstances (e.g., red cards, injuries,
penalties) often do not occur as a result of superior or inferior play by one team,
but are due to difficult-to-capture events, such as poor refereeing, unfortunate
deflections or bounces of the ball, weather or ground conditions, or fraudulent
match manipulation. Also, factors like political upheaval in the club’s
management, behavior of spectators, and media pressure can influence the
outcome of matches, but such events are rarely captured in databases.
To date, relatively few studies have investigated machine learning methods for
soccer outcome prediction. We speculate that one reason is the lack of readily
available open soccer data. Here, we present the Open International Soccer
Database to bridge this gap. The Database contains the most commonly and
freely available as well as consistently reported information about the outcome
of a league soccer match. This information concerns the goals scored by each
team, teams involved, league, season, and date on which the match was played.
While goals are arguably the most important match events, the drawback of
such basic data is that it lacks more “sophisticated” outcome-relevant
information, such as fouls committed, yellow and red cards, corners conceded
by each team, or data about players, teams and clubs.
Note, however, that legislation, such as the UK Data Protection Bill and the
General Data Protection Regulation by the European Union, puts legal
constraints on the disclosure of full names of players or coaches in publicly
available databases. In sports, biometric or health data could be highly sensitive.
In contrast to more sophisticated data, the beauty of simple match data is that it
can be easily understood and analyzed by any machine learning researcher, just
like the famous Iris data set. But although the data is simple to understand,
this does not mean that the scope of possible analysis is limited. On the
contrary, as the special issue Machine Learning for Soccer shows, the data set
provides considerable analytical challenges.
Researchers are welcome to freely use the Database to develop and test their
own strategies, methods, and tools.
However, the major motivation for developing the Open International Soccer
Database was not to provide yet another benchmark data set for the machine
learning community, but to build a knowledge base that can be used for the
prediction of real-world soccer matches.
Related work
soccer_country
soccer_city
soccer_venue
soccer_team
playing_position
player_mast
referee_mast
match_mast
coach_mast
asst_referee_mast
match_details
goal_details
penalty_shootout
player_booked
player_in_out
match_captain
team_coaches
penalty_gk
soccer_country:
country_id – this is a unique ID for each country
country_abbr – this is the short name (abbreviation) of each country
country_name – this is the name of each country
soccer_city:
city_id – this is a unique ID for each city
city – this is the name of the city
country_id – this is the ID of the country where the city is located; only
countries that are in the soccer_country table are allowed
soccer_venue:
venue_id – this is a unique ID for each venue
venue_name – this is the name of the venue
city_id – this is the ID of the city where the venue is located; only
cities that are in the soccer_city table are allowed
aud_capicity – this is the audience capacity of each venue
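The three tables described so far form a chain of references (venue to city, city to country). A minimal sketch of that chain, using Python's built-in sqlite3 module, might look as follows. Only the table and column names come from the descriptions above; the column types, the NOT NULL constraints, and the sample rows ("Country A", "City A", "Venue A") are assumptions made for illustration.

```python
import sqlite3

# In-memory database. Column types are assumptions; only the names come from the text.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE soccer_country (
    country_id   INTEGER PRIMARY KEY,
    country_abbr TEXT NOT NULL,
    country_name TEXT NOT NULL
);
CREATE TABLE soccer_city (
    city_id    INTEGER PRIMARY KEY,
    city       TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES soccer_country(country_id)
);
CREATE TABLE soccer_venue (
    venue_id     INTEGER PRIMARY KEY,
    venue_name   TEXT NOT NULL,
    city_id      INTEGER NOT NULL REFERENCES soccer_city(city_id),
    aud_capicity INTEGER  -- spelling kept as in the schema description
);
""")

# Hypothetical sample rows, not real tournament data.
conn.execute("INSERT INTO soccer_country VALUES (1, 'CTA', 'Country A')")
conn.execute("INSERT INTO soccer_city VALUES (1, 'City A', 1)")
conn.execute("INSERT INTO soccer_venue VALUES (1, 'Venue A', 1, 50000)")

# A city that points at a country missing from soccer_country is rejected.
try:
    conn.execute("INSERT INTO soccer_city VALUES (2, 'City B', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

venue_count = conn.execute("SELECT COUNT(*) FROM soccer_venue").fetchone()[0]
print(rejected, venue_count)
```

The foreign keys express the rule that only countries in soccer_country and only cities in soccer_city may be referenced.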
soccer_team:
team_id – this is the ID of each team. Each team represents a country,
and the ID references the country_id column of the soccer_country
table
team_group – the name of the group to which the team belongs
match_played – how many matches the team played in the group stage
playing_position:
position_id – this is a unique ID for each position where a player played
position_desc – this is the name of the position where a player played
player_mast:
player_id – this is a unique ID for each player
team_id – this is the team for which a player played; only teams that
reference the country_id column of the soccer_country table are allowed
jersey_no – the number printed on each player's jersey
player_name – name of the player
posi_to_play – the position in which a player played; it references the
position_id column of the playing_position table
dt_of_bir – date of birth of each player
age – approximate age of the player at the time of the tournament
playing_club – the name of the club for which a player was playing at the
time of the tournament
referee_mast:
referee_id – this is the unique ID for each referee
referee_name – name of the referee
country_id – the country to which a referee belongs; it references the
country_id column of the soccer_country table
match_mast:
coach_mast:
coach_id – this is the unique ID for a coach
coach_name – this is the name of the coach
asst_referee_mast:
ass_ref_id – this is the unique ID for each assistant referee, who assists the
main referee
ass_ref_name – name of the assistant referee
country_id – the country to which an assistant referee belongs; it
references the country_id column of the soccer_country table
match_details:
goal_details:
goal_id – this is the unique ID for each goal
match_no – this is the match number, which references the match_no column
of the match_mast table
player_id – this is the ID of a player selected for a team's 23-man squad
for the tournament; it references the player_id column of the
player_mast table
team_id – this is the ID of a team playing in the tournament; it
references the country_id column of the soccer_country table
goal_time – this is the time when the goal was scored
goal_type – this is the type of goal: N for a normal goal, O for an own
goal, and P for a goal from a penalty
play_stage – this is the play stage in which the goal was scored: G for
group stage, R for round of 16, Q for quarter-final, S for semi-final,
and F for the final match
goal_schedule – when the goal was scored: NT for the normal play
session, ST for stoppage time, or ET for extra time
goal_half – the half of the match in which the goal was scored
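The single-letter and two-letter codes listed for goal_details lend themselves to CHECK constraints. The sketch below, again using Python's sqlite3 module, shows one way this could be done. The column types, the reading of goal_time as a minute, and the sample row are assumptions. The foreign keys to match_mast, player_mast and soccer_country are left out so the example stays self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE goal_details (
    goal_id       INTEGER PRIMARY KEY,
    match_no      INTEGER NOT NULL,
    player_id     INTEGER NOT NULL,
    team_id       INTEGER NOT NULL,
    goal_time     INTEGER NOT NULL,  -- assumed to be the minute of play
    goal_type     TEXT CHECK (goal_type IN ('N', 'O', 'P')),
    play_stage    TEXT CHECK (play_stage IN ('G', 'R', 'Q', 'S', 'F')),
    goal_schedule TEXT CHECK (goal_schedule IN ('NT', 'ST', 'ET')),
    goal_half     INTEGER            -- assumed to be 1 or 2
)
""")

# Hypothetical row: a normal goal in the group stage, normal time, first half.
conn.execute("INSERT INTO goal_details VALUES (1, 1, 100, 10, 23, 'N', 'G', 'NT', 1)")

# A goal_type outside N / O / P violates the CHECK constraint.
try:
    conn.execute("INSERT INTO goal_details VALUES (2, 1, 101, 10, 40, 'X', 'G', 'NT', 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM goal_details").fetchone()[0]
print(rejected, stored)
```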
player_booked:
match_no – this is the match number, which references the match_no
column of the match_mast table
team_id – this is the ID of a team playing in the tournament; it
references the country_id column of the soccer_country table
player_id – this is the ID of a player selected for a team's 23-man squad
for the tournament; it references the player_id column of the
player_mast table
booking_time – this is the time when a player was booked
sent_off – this flag is Y when a player was sent off
play_schedule – when a player was booked: NT for the normal play
session, ST for stoppage time, or ET for extra time
play_half – the half in which a player was booked
player_in_out:
match_no – this is the match number, which references the match_no
column of the match_mast table
team_id – this is the ID of a team playing in the tournament; it
references the country_id column of the soccer_country table
match_captain:
match_no – this is the match number, which references the match_no
column of the match_mast table
team_id – this is the ID of a team playing in the tournament; it
references the country_id column of the soccer_country table
player_captain – the player who captains a team; it references the
player_id column of the player_mast table
team_coaches:
team_id – this is the ID of a team playing in the tournament; it
references the country_id column of the soccer_country table
coach_id – a team may have one or more coaches; this indicates the
coach(es) coaching the team and references the coach_id column
of the coach_mast table
penalty_gk:
match_no – this is the match number, which references the match_no
column of the match_mast table
team_id – this is the ID of a team playing in the tournament; it
references the country_id column of the soccer_country table
player_gk – the player who kept goal during the penalty shootout; it
references the player_id column of the player_mast table
1) Write a query in SQL to find the number of venues for EURO cup 2016.
2) Write a query in SQL to find the number of countries that participated in the
EURO cup 2016.
3) Write a query in SQL to find the number of goals scored in EURO cup 2016
within the normal play schedule.
4) Write a query in SQL to find the number of matches that ended with a result.
5) Write a query in SQL to find the number of matches that ended in a draw.
6) Write a query in SQL to find the date on which Football EURO cup 2016
began.
7) Write a query in SQL to find the number of goals scored in every match within
the normal play schedule.
8) Write a query in SQL to find the match no, date of play, and goals scored for
each match in which no stoppage time was added in the 1st half of play.
9) Write a query in SQL to find the number of matches that ended in a goalless
draw in the group stage of play.
10) Write a query in SQL to find the number of bookings that happened in each
half of play within the normal play schedule.
11) Write a query in SQL to find the name of the venue, with its city, where the
EURO cup 2016 final match was played.
12) Write a query in SQL to find the number of goals scored by each team in
every match within the normal play schedule.
13) Write a query in SQL to find the total number of goals scored by each
player within the normal play schedule and arrange the result set from the
highest to the lowest scorer.
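As an illustration, questions 3, 10 and 13 can be attempted using only the goal_details, player_booked and player_mast columns described earlier. The sketch below is one possible approach, not a reference solution. The tables are cut down to the columns the queries need, the column types are assumed, and the rows are hypothetical test data rather than real EURO 2016 results. Question 13 is read here as counting every goal credited to a player, own goals included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced tables: only the columns needed below; types are assumptions.
CREATE TABLE player_mast (player_id INTEGER PRIMARY KEY, player_name TEXT);
CREATE TABLE goal_details (
    goal_id INTEGER PRIMARY KEY, match_no INTEGER,
    player_id INTEGER, goal_schedule TEXT
);
CREATE TABLE player_booked (
    match_no INTEGER, player_id INTEGER, play_schedule TEXT, play_half INTEGER
);

-- Hypothetical rows for illustration only.
INSERT INTO player_mast VALUES (1, 'Player A'), (2, 'Player B');
INSERT INTO goal_details VALUES
    (1, 1, 1, 'NT'), (2, 1, 2, 'NT'), (3, 2, 1, 'ST'), (4, 2, 1, 'NT');
INSERT INTO player_booked VALUES
    (1, 1, 'NT', 1), (1, 2, 'NT', 2), (2, 2, 'NT', 2), (2, 1, 'ST', 2);
""")

# Question 3: goals scored within the normal play schedule.
goals_nt = conn.execute(
    "SELECT COUNT(*) FROM goal_details WHERE goal_schedule = 'NT'"
).fetchone()[0]

# Question 10: bookings in each half within the normal play schedule.
bookings_by_half = conn.execute("""
    SELECT play_half, COUNT(*)
    FROM player_booked
    WHERE play_schedule = 'NT'
    GROUP BY play_half
    ORDER BY play_half
""").fetchall()

# Question 13: goals per player in normal time, highest scorer first.
scorers = conn.execute("""
    SELECT p.player_name, COUNT(*) AS goals
    FROM goal_details AS g
    JOIN player_mast AS p ON p.player_id = g.player_id
    WHERE g.goal_schedule = 'NT'
    GROUP BY p.player_id, p.player_name
    ORDER BY goals DESC, p.player_name
""").fetchall()

print(goals_nt)          # 3
print(bookings_by_half)  # [(1, 1), (2, 2)]
print(scorers)           # [('Player A', 2), ('Player B', 1)]
```

The questions that depend on match dates, results or play stages of matches need the match_mast and match_details columns, which are not listed above, so they are not sketched here.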