Wide Open Spaces: A Statistical Technique For Measuring Space Creation in Professional Soccer
1 Introduction
Soccer analytics has long focused on the outcomes of discrete, on-ball events; however, much of the sport’s
complexity resides in off-ball events. In the words of Johan Cruyff: “it is statistically proven that players
actually have the ball 3 minutes on average. So, the most important thing is: what do you do during those
87 minutes when you do not have the ball? That is what determines whether you are a good player or
not.” The creation and closure of spaces is a recurrent subject in observation-based tactical analysis, yet it
remains largely unexplored from a quantitative perspective.
We present a method for quantifying spatial value occupation and generation during open play. Here
direct space occupation refers to space created for oneself, while space generation refers to opening up
space for teammates by attracting opponents out of position. We first build a novel parametric pitch control
model that incorporates motion information, relative distance to the ball, and player position in order
to provide a smooth surface of potential ball control. Through the mixture of all players’ control surfaces
we obtain a fuzzy degree of potential ball control at the team level in any given moment. We also con-
struct a model for the relative value of any pitch position, based on the position of the ball and using feed
forward neural networks. From all this (a player’s invested pitch zones, a team’s pitch control, and the
relative value of each zone), we employ the full spatio-temporal dynamics of each player to construct two
novel spatial value creation metrics, accounting for both occupation and generation of spaces.
Through the analysis of a first division Spanish league match, we show a handful of approaches to better
understand a missing key factor for performance analysis in soccer: off-ball attacking dynamics. The
quantification of space occupation gain and space generation allows us to observe Sergio Busquets’ high
relevance during positional attacks through his pivoting skills, the dragging power of Luis Suarez to generate
spaces for his teammates, and the capacity of Lionel Messi to occupy spaces of value with smooth
movements along the field, among many other characteristics.
The level of detail we can reach with automated quantitative analysis of space dynamics is beyond
what can be reached through observational analysis. The capacity of evaluating space occupation and
generation opens the door for new research on off-ball dynamics that can be applied in specific matches
and situations, and directly integrated into coaches’ analysis. This information can be used not only to
better evaluate players’ contributions to their teams, but also to improve their positioning and movement
through coaching, providing a key competitive advantage in a complex and dynamic sport.
We have to pass the ball, yes, but with clear intention. Pass it to drag players to one side and
creating space in the opposite side. Then, move the ball there. That’s our game.
Occupying space on the field is fundamentally about a player’s act of continually positioning himself
in an area of high value. The value of space can be defined in terms of the relative position of the ball,
the closeness to the opponent’s goal, and more specifically the level of ownership of space, regarding the
density of opponents within the given area. Furthermore, we can cluster the types of occupation of space
depending on the speed of the player. Specifically we identify two types: active occupation, when the
player moves at running speed to earn the space, and passive occupation, when the player is below running
speed (jogging or walking). For instance, if a player is closely marked and then runs towards a free
space faster than the opponent, he will obtain a gain on owned space through active space occupation. As
another example, if the player is walking towards a given area and nearby opponents move away from that
area, the player will be gaining space through passive occupation.
A more complex concept is that of space generation. We define the generation of space as the action of
dragging opponents out of certain areas to create new available space in previously covered areas. Specifically,
we identify situations where a player drags an opponent away from another teammate whom the
opponent was close to originally. The dragging concept is, at its simplest, creating space for a teammate
by pulling their defender towards oneself. Notice that unoccupied space could also be generated when
dragged players leave a clear area; however, we are not considering this case for this study. Similarly to
the Space Occupation Gain (SOG) concept above, later we also explore the concept of Space Generation
Gain (SGG). In this way, we separate out space created for oneself from space created for teammates, in
both a passive and active manner.
Figure 1 presents an example of both space occupation and space generation during an official Spanish
first division match. The three images show a process where Andrés Iniesta moves to clear up space
away from the ball and then attacks a high value space inside the box. When he moves to this space he
drags three defenders towards himself while also receiving a pass. The attraction of the three defenders
leaves open space for Lionel Messi, who in this newfound space receives a pass free of a mark, and subsequently
sends a lob pass onwards for Suarez, who meanwhile was running towards the goal line in search
of space of value to score. A more detailed video example of occupation and generation of spaces can be
seen at the following link, where players are highlighted when adding space for themselves or teammates:
http://www.lukebornn.com/sloan/space_occupation_1.mp4
Before providing explicit details on how to calculate space occupation and generation, we first need a
better notion of space ownership and value, as creating space for your own goalkeeper who is 80 meters
2018 Research Papers Competition
2 Presented by:
Figure 1: A game situation presenting both space occupation and generation. From left to right: in the
first frame Iniesta moves back to occupy a space of value with higher control. In the second frame, Iniesta
observes an open space to attack. He moves towards the space, dragging three defenders. In the third
frame, the three dragged defenders leave an open space for Messi that can now receive the ball free of
mark, while Suarez runs towards the goal line enabling him to receive a pass.
behind the location of play is worth much less than creating space in high-threat areas nearer the ball and
goal. The next two sections present a novel pitch control model for evaluating space ownership, and a
dynamical model for space value according to ball and player position.
Based on this reasoning we propose defining the player influence area through a bivariate normal distribution,
whose shape can be adjusted to account for the player’s location, velocity, and relative distance
to the ball. At any given location a degree of influence or control can be queried through the distribution’s
probability density function.
Specifically, the player’s influence I at a given location p for a given player i at time t is defined by a
bivariate normal distribution with mean µi (t) and covariance matrix Σi (t), given the player’s velocity ⃗s
and angle θ. For a given location in space p at time t, the probability density function fi of player i’s influence
area is defined by a standard multivariate normal distribution. The player’s influence likelihood is then
defined as the normalization of fi at the given location p by the value of fi at the player’s current location pi (t),
as shown in Equation 1.
$$I_i(p, t) = \frac{f_i(p, t)}{f_i(p_i(t), t)} \tag{1}$$
This formulation provides an initial model for obtaining a degree of influence within a [0, 1] range for
any given location on the field. The mean and covariance matrix can be dynamically adjusted to provide a
player dominance distribution that accounts for location and velocity. In Appendix A.1 we provide specific
details for this equation.
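As an illustration of Equation 1, the following minimal sketch evaluates the ratio of two bivariate normal densities with NumPy. The function names, the example player position and the isotropic covariance are assumptions made for this example only; the paper's own mean and covariance construction is described in the appendix.

```python
import numpy as np

def bivariate_normal_pdf(p, mean, cov):
    """Density of a 2D normal distribution evaluated at point p."""
    diff = np.asarray(p, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    norm_const = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    exponent = -0.5 * diff @ np.linalg.solve(cov, diff)
    return norm_const * np.exp(exponent)

def player_influence(p, player_pos, mean, cov):
    """Equation 1: density at p divided by the density at the player's location."""
    return (bivariate_normal_pdf(p, mean, cov)
            / bivariate_normal_pdf(player_pos, mean, cov))

# Hypothetical stationary player at (50, 30) with an isotropic spread.
player_pos = np.array([50.0, 30.0])
mean = player_pos.copy()        # no motion assumed, so the mean sits on the player
cov = np.diag([25.0, 25.0])     # variances in square meters (assumed values)

at_player = player_influence(player_pos, player_pos, mean, cov)
five_m_away = player_influence([55.0, 30.0], player_pos, mean, cov)
```

By construction the influence equals 1 at the player's own location and decays with distance; in this isotropic example a point five meters away receives an influence of exp(-0.5).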
Figure 2 presents the player influence area in two different situations regarding the player’s distance to
the ball and velocity. Here we can observe how depending on the distance to the ball the range of influence
of the player varies. Also, the distribution of player influence is reshaped to be oriented according to the
direction of movement and stretched in relation to the speed. If the player is in motion, the distribution
is translated so the higher level of influence is near points where the player can reach faster, according to
his speed. This model can easily be expanded to handle player-specific movement characteristics, such as
acceleration and maximum speed.
where σ is the logistic function. Since the pitch control model follows the definition of player influence
area in Figure 2, the model is taking into account the location of the ball, the players’ velocities and the
location of all the players on the field. Equation 2 is a simplified version of pitch control calculation based
on players’ influence areas. Note we can include a constant within σ to add more flexibility, if desired.
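The sketch below shows one way such a team-level control value could be computed. It assumes that pitch control at a location is the logistic function applied to the difference between the summed influence of one team and the summed influence of the other; this form, the function names and the sample influence values are assumptions for illustration.

```python
import numpy as np

def logistic(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def pitch_control(own_influence, opp_influence):
    """Team-level control per location.

    Both inputs are arrays of shape (n_players, n_locations) holding
    player influence values. Assumed form: logistic of the difference
    between the two teams' summed influence.
    """
    own_total = np.sum(own_influence, axis=0)
    opp_total = np.sum(opp_influence, axis=0)
    return logistic(own_total - opp_total)

# Hypothetical influence of two players per team at three pitch locations.
own = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.3, 0.0]])
opp = np.array([[0.1, 0.4, 0.0],
                [0.0, 0.9, 0.0]])
control = pitch_control(own, opp)
```

A value above 0.5 indicates that the first team has more influence at that location, a value below 0.5 favours the opponent, and a location that nobody influences sits exactly at 0.5.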
Figure 3 presents the pitch control surface in a given situation of the match. At location (82, 8), near the
ball, it can be observed clearly how the yellow team’s high density provides a lower level of control for the red
team near the ball. Also the velocity of the player in possession of the ball (red team) provides the red team
an advantage in the running direction. At location (80, 25), the red player is creating a positional advantage.
Meanwhile, at location (50, 30) the yellow player has minimal control of space because of the high density
provided by the three surrounding opponent players. For a single time frame, this pitch control model
provides a synthesis of player locations, player velocities and ball-relative positioning in one variable. Also,
by exploiting the dynamics of pitch control over time, it becomes a versatile tool for evaluating multiple types
of spatio-temporal characteristics of the game such as the creation of positional advantages, the influence
of density and pressure speed in defending situations, and the creation and generation of spaces.
Instead of defining a priori a model for space valuation we would like to extract a sense of space value
from the spatio-temporal behaviour of players during multiple matches. For this we set the following
hypothesis: considering a sufficiently high number of situations, the defending team distributes itself
throughout the field in a manner which covers high value spaces. Although it is clear that at any given
point defenders will deviate based on overloads, specific offensive player positioning, and other scenarios,
in general, most players will remain close to high value areas. An extreme example of this will be the
case where the attacking team places all players in the middle of the field. It is arguable that, although this
would impact the position of the defending team, they will most probably still keep players near the box
and their own goal. Note that similar ideas are used when identifying defensive matchups based on de-
fender locations in basketball [7].
Based on this, we propose learning the summed influence that a defensive team would have in a given
location on the field, given the location of the ball. Let Vk,l (t) be the value of location pk,l of the pitch at
time t, and let pb (t) be the location of the ball at time t. We want to learn a function f n with parameters θ
that values space as a function of the location of the ball,
Figure 4: Predicted pitch value in a [0,1] range for given ball location (white circle). Panels: (c) pitch value
for ball at the third quarter of the field on top of the left lane; (d) pitch value for ball vertically centered in
the fourth quarter of the field.
Defensive situations are found by selecting game situations where the opponent has possession of the
ball. Then, the sum of player influence is found for every location (k, l) within a 21 by 15 grid, for every
defending player i. Situations are selected so they are separated in time by at least three seconds. Here
we employ Metrica Sports tracking data from 20 matches of the first and second B Spanish divisions, consisting of
2.4 million examples. For learning the parameters we use a feed forward neural network with one hidden
layer using the Adam optimization algorithm [8]. Specifically, we aim to find the optimal parameters θ∗ that
minimize the loss function L as presented in Equation 5. We selected mean square error as loss function
L and sigmoid function as the activation function f .
$$\theta^{*} = \arg\min_{\theta} \frac{1}{n} \sum_{e=1}^{n} L(y_e, f(x_e, \theta)) \tag{5}$$
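The training setup described above can be sketched with a small NumPy network: one hidden layer, sigmoid activations and a mean squared error loss. Several choices here are assumptions made to keep the example short and self-contained: plain full-batch gradient descent is used in place of Adam, the input is taken to be the ball location plus a grid cell location, the hidden layer has 8 units, and the training targets are synthetic rather than real tracking data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_inputs, n_hidden, rng):
    """Random weights for a network with one hidden layer and one output."""
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, x):
    hidden = sigmoid(x @ params["W1"] + params["b1"])
    output = sigmoid(hidden @ params["W2"] + params["b2"])
    return hidden, output

def mse_loss(params, x, y):
    _, output = forward(params, x)
    return float(np.mean((output - y) ** 2))

def gradient_step(params, x, y, learning_rate):
    """One full-batch gradient descent update (stand-in for Adam)."""
    hidden, output = forward(params, x)
    # Gradient of the mean squared error through the output sigmoid.
    delta_out = 2.0 * (output - y) * output * (1.0 - output) / y.size
    # Back-propagate through the hidden sigmoid layer.
    delta_hidden = (delta_out @ params["W2"].T) * hidden * (1.0 - hidden)
    params["W2"] -= learning_rate * (hidden.T @ delta_out)
    params["b2"] -= learning_rate * delta_out.sum(axis=0)
    params["W1"] -= learning_rate * (x.T @ delta_hidden)
    params["b1"] -= learning_rate * delta_hidden.sum(axis=0)

# Synthetic examples: columns 0-1 are the ball location, columns 2-3 a
# grid cell location, all scaled to [0, 1]. The target mimics a value
# that decays with the distance between cell and ball (assumed shape).
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(500, 4))
ball, cell = x_train[:, :2], x_train[:, 2:]
y_train = np.exp(-4.0 * np.sum((cell - ball) ** 2, axis=1, keepdims=True))

params = init_params(n_inputs=4, n_hidden=8, rng=rng)
initial_loss = mse_loss(params, x_train, y_train)
for _ in range(300):
    gradient_step(params, x_train, y_train, learning_rate=0.5)
final_loss = mse_loss(params, x_train, y_train)
```

Because the output unit is a sigmoid, predictions always fall in the [0, 1] range used for the pitch value surfaces. Swapping the update rule for Adam would change only `gradient_step`.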
We found the best model through a 10-fold cross-validation process. In order to obtain a valuation of a
field location for a given ball location, we now query the learned model. Figure 4 shows three different
ball position scenarios and the obtained field valuation. This model has learned that locations near
the ball have increasing value for a certain range, while understanding effectively how to translate this
value depending on ball position. The model still lacks the natural intuition that space generated
at the higher valued locations of the first quarter of the field should not have the same valuation as
those of higher valued locations at the last quarter. In other words, the cumulative value of space is higher
when further up the field, closer to the opponent’s goal. In order to adapt to this intuitive thinking we
Figure 5: Predicted pitch value in a [0,1] range for given ball location (white circle) normalized by a distance
to goal model. Panels: (a) distance to goal pitch value normalization surface; (b) normalized pitch value for
ball vertically centered at the first quarter of the field; (c) normalized pitch value for ball at the center of
the field; (d) normalized pitch value for ball vertically centered in the fourth quarter of the field.
normalize the obtained pitch value by the distance to the goal of every location normalized on a [0, 1]
range. Figure 5 presents the normalization surface and three different pitch value situations, where the
results still adapt to ball location but show a more consistent valuation of the pitch which adjusts for the
threat of the ball location, according to expert analysts. We see that when one’s own goalkeeper has the
ball, the overall value of space is limited, but when in the opponent’s box, space is much more valuable
alongside the looming threat of a shot on goal.
Figure 6: Pitch control, field value and value of owned space for attacking team in red, for attacking direction
left to right. Panels: (a) pitch control surface; (b) pitch value based on ball position; (c) value of the owned
space as product of pitch control and field value.
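The combination of surfaces shown in Figure 6 can be sketched as an element-wise product on the 21 by 15 grid. The pitch dimensions, the linear shape of the distance-to-goal weight and the random control and value surfaces below are assumptions for illustration; the text only states that the pitch value is normalized by distance to goal and that owned space value is the product of pitch control and field value.

```python
import numpy as np

PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0   # assumed pitch size in meters
GRID_COLS, GRID_ROWS = 21, 15             # grid resolution used in the text

def goal_distance_weight(grid_x, grid_y, goal_xy):
    """Assumed normalization surface: 1 at the goal, 0 at the farthest cell."""
    dist = np.hypot(grid_x - goal_xy[0], grid_y - goal_xy[1])
    return 1.0 - dist / dist.max()

# Cell coordinates; the arrays have shape (GRID_ROWS, GRID_COLS).
grid_x, grid_y = np.meshgrid(np.linspace(0.0, PITCH_LENGTH, GRID_COLS),
                             np.linspace(0.0, PITCH_WIDTH, GRID_ROWS))

# Placeholder surfaces standing in for the pitch control and value models.
rng = np.random.default_rng(1)
pitch_control = rng.uniform(0.0, 1.0, size=grid_x.shape)
pitch_value = rng.uniform(0.0, 1.0, size=grid_x.shape)

# Attacking left to right, so the opponent's goal is at the right edge.
weight = goal_distance_weight(grid_x, grid_y,
                              goal_xy=(PITCH_LENGTH, PITCH_WIDTH / 2.0))
normalized_value = pitch_value * weight
owned_value = pitch_control * normalized_value
```

Since every factor lies in [0, 1], the owned value at a cell can never exceed the team's control of that cell, and it vanishes wherever either control or value is zero.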
$$SG_{i,i'}(t) = \exists j \; \big(d_{i',j}(t) \le \delta\big) \wedge \big(d_{i,j}(t+w) \le \delta\big) \wedge \big(d_{i',j}(t+w) > \delta\big) \wedge \big(d_{i,j}(t+w) - d_{i,j}(t) > \alpha\big) \tag{10}$$
Once we can identify when a space generation behaviour is occurring, we would like to focus on the
cases in which we actually have a gain in space due to the dragging effect. Analogously to the SOG definition,
we express the Space Generation Gain (SGG) as space generation situations where the gain is above
a threshold ϵ, as presented in Equation 11.
$$SGG_{ij}(t) = \begin{cases} G_j(t) & \text{if } SG_{i,j}(t) \wedge G_j(t) \ge \epsilon \\ 0 & \text{otherwise} \end{cases} \tag{11}$$
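Equations 10 and 11 can be sketched as two small functions. The first transcribes the four conditions of Equation 10 exactly as printed, including the direction of the final inequality; the second applies the threshold of Equation 11. The function names, the player coordinates, the gain value and the threshold ε = 0.05 are hypothetical, while δ = 5 m and α = 3 m follow the values used later in the match analysis.

```python
import numpy as np

def distance(a, b):
    """Euclidean distance in meters between two (x, y) positions."""
    return float(np.hypot(a[0] - b[0], a[1] - b[1]))

def space_generation(gen_t, gen_tw, rec_t, rec_tw,
                     defenders_t, defenders_tw, delta=5.0, alpha=3.0):
    """Equation 10 as printed: does some defender j meet all four conditions?

    gen_* is the generator i, rec_* the receiving teammate i'; the _t and
    _tw suffixes mark positions at time t and at time t + w.
    """
    for def_t, def_tw in zip(defenders_t, defenders_tw):
        near_receiver_before = distance(rec_t, def_t) <= delta
        near_generator_after = distance(gen_tw, def_tw) <= delta
        far_from_receiver_after = distance(rec_tw, def_tw) > delta
        change = distance(gen_tw, def_tw) - distance(gen_t, def_t)
        if (near_receiver_before and near_generator_after
                and far_from_receiver_after and change > alpha):
            return True
    return False

def space_generation_gain(is_generation, receiver_gain, epsilon):
    """Equation 11: keep the receiver's gain only above the threshold."""
    if is_generation and receiver_gain >= epsilon:
        return receiver_gain
    return 0.0

# Hypothetical situation: the generator runs away from the receiver and
# one defender follows him, leaving the receiver unmarked.
dragged = space_generation(
    gen_t=(0.0, 0.0), gen_tw=(-8.0, 0.0),
    rec_t=(3.0, 0.0), rec_tw=(3.0, 0.0),
    defenders_t=[(1.0, 0.0)], defenders_tw=[(-3.5, 0.0)])

# Same situation, but the defender stays next to the receiver.
not_dragged = space_generation(
    gen_t=(0.0, 0.0), gen_tw=(-8.0, 0.0),
    rec_t=(3.0, 0.0), rec_tw=(3.0, 0.0),
    defenders_t=[(1.0, 0.0)], defenders_tw=[(1.0, 0.0)])

gain_kept = space_generation_gain(dragged, receiver_gain=0.12, epsilon=0.05)
gain_dropped = space_generation_gain(dragged, receiver_gain=0.01, epsilon=0.05)
```

Keeping the detection step separate from the thresholding step makes it straightforward to count generation events and to sum the gained value independently.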
Essentially, we are attributing space gain to a player when a defender leaves his mark and moves to-
wards a teammate, subject to the conditions that the defender was close to the player and ended close to
the teammate during a time window. It is important to clarify that while SOG and SGG represent two
frequent and relevant cases of space gain within soccer, other types of situations and movements might
contribute as well to the total space created by a player during a match. An additional possible concept is
that of potential space, referring to a space that the player is more likely to reach, within his positioning,
but not in his immediate influence area. We will now focus on analyzing SOG and SGG within a match
context.
6 Match Analysis
Creating and occupying spaces are two commonly trained concepts in modern soccer. During
training, coaches interrupt and reshape individual drills to teach players how to orient and move toward
spaces and away from low value local zones on the field. When analyzing off-ball performance, coaches
rely on video analysis. Although elite soccer analysis staff typically have a great capacity to understand
complex concepts through match visualization, the dynamics of space creation are so frequent and hap-
pen in such short time windows, that it becomes impractical for video analysts to grasp them all, even for
a single match. However, it is important to note that no ground truth data exists regarding
the quantification of spaces in soccer. Hence we have performed an extensive validation of the developed
concepts through video and by studying individual situations within games, with the help of two expert
soccer video analysts from F.C. Barcelona, in order to fine-tune our quantitative approach. The following
videos are examples of the video-based validation tool we have used:
http://www.lukebornn.com/sloan/space_occupation_1.mp4, http://www.lukebornn.com/sloan/space_occupation_2.mp4
Based on this, we provide a complete summary of off-ball movement statistics for a specific Spanish
first division official match between F.C. Barcelona and Villareal F.C. in January 2017. Specifically, we provide
an analysis focused on the concepts of space occupation and space generation, using Metrica Sports
optical tracking data. This match ended with a 1-1 result, where the first goal was scored by Villareal
F.C. at the 49th minute (second half), and the F.C. Barcelona equalizer came at the 90th minute by Lionel
Messi. Situationally, this presents a game where F.C. Barcelona was in need of scoring during the final
minutes, and was required to occupy and generate as many spaces as possible to reach scoring chances. In
order to identify space occupation and generation actions we calculate for the attacking situations of F.C.
Barcelona all the instances where a player had controlled possession of the ball with his feet. From each
of those situations, and alongside expert football analysts from F.C. Barcelona, we define a window w of
three seconds after each of these cases, reaching a total of 845 different situations. The closeness factor δ
is set to 5 meters, based on the average minimum distance between an opponent and a player in possession of
the ball. We also set the minimum attraction distance for space generation α to 3 meters.
Table 1 presents the space occupation statistics for F.C. Barcelona, sorted in descending order by the
total amount of Space Occupation Gain (SOG). At first glance it can be seen that over 41% of gain of space
occupation was performed by Iniesta, Sergio Busquets and Lionel Messi. Notably, these three players occupy
different positions and have different roles within the team. Busquets is a pivot and has a specific
role of helping to drive the ball with controlled possession during build-ups, and to accompany the game
creation during positional attacks. Iniesta is an attacking midfielder with great control of the ball, and
special skills in moving and finding spaces between lines. Messi is an attacker but not attached to a specific
position, and is allowed to cover wide areas of the pitch to find space and request the ball. The three
players share, however, a long tradition of possession-centered play and high-quality off-ball movement during
their careers. Suarez and Neymar, two highly mobile players, appear with a lower count of situations
where space was gained. This can be associated with the tight marking these players
suffered during the match.
It is interesting to observe that for most players the active occupation of spaces is considerably more
frequent than passive occupation. This is particularly noticeable on left and right backs Digne and Sergi
Roberto, who need to cover wider spaces and show a high mean distance to ball for SOG, a characteris-
tic shared by central defenders Pique and Mascherano. A remarkable case is that of Lionel Messi, whose
passive SOG is considerably higher than the active one. The passive characteristic of SOG does not mean
the player is not occupying the space intentionally, but rather that he is not moving at running speed, but
slower. Much has been argued in recent years about several moments during matches where Messi walks
through zones of the field. However, that walking behaviour is not a detachment from the match but a
conscious action to move through empty spaces of value and claim the control of valuable space, and ul-
timately the ball. Messi does this very effectively, placing him near the top of players in terms of space
gained during the whole match, despite the lack of active gain. A relevant characteristic of this is that 71%
of the time the gain in space is done in front of the ball rather than behind. The in front and behind the ball
Table 1: Statistics of space occupation for F.C. Barcelona in an official Spanish League match against Villareal
F.C. Symbols #, ∑ and µ represent the total, sum and mean of their associated variable. SOG refers
to Space Occupation Gain, while FRT and BEH indicate the amount of times SOG occurs in front or behind
the ball. MBD represents the mean ball distance, and Active (%) and Passive (%) the player percentage of
times the space was occupied through active or passive occupation.
statistics show a clear tendency for central defenders to gain space behind the ball, while attackers show
a higher rate of space gain in front of the ball. Notably, Busquets, Iniesta and the right and left backs
(Digne and S. Roberto) have a balanced ratio of space gain behind and in front of the ball.
Table 2 presents the statistics for Space Occupation Loss (SOL) and Space Generation Gain (SGG). The
SOL statistics show a clear tendency of higher space loss for players that are more often in possession of
the ball such as Iniesta, Messi, Neymar and Suarez. The space loss can be directly associated with pressure
by the opponent, who tends to increase density near to attacking players to reduce their range of action,
especially for highly skilled players. Regarding the generation of space, we obtain a different picture from
the space occupation skills. Here, Neymar and Suarez appear to be, alongside Messi, the players that most
often drag opponents to create space. With a 4-3-3 system and high-quality players, a specific attacking
strategy is that of spreading out attacking players to drag defenders out of position and provide wider
spaces for attacking action. Busquets, a pivoting specialist, appears also at the top of the table showing his
value in supporting space creation. Notably, the left and right backs, Digne and S. Roberto, do not generate
much space. Given that they move towards the border lines of the field, it is less likely that opponents are
dragged by back defenders.
A more detailed perspective of space generators and receivers is presented in Figure 7. Here we can
observe the amount of times generators are producing space for receivers, and discover some collaborative
playing behaviour. First to observe is that Busquets receives space from most of the players at least once,
possibly showing his ability to stay at the center of play. A renowned skill of F.C. Barcelona is the third-
man pass, which consists of the following: if a player A wants to pass to player C, but is marked, he passes
to player B, dragging the opponents toward him, enabling C to receive the ball in more space. This plot
might show a third-man behaviour through Busquets. Notably, Suarez, Neymar and Messi generate space
commonly for each other, especially Suarez who provides considerable space to both. A special connection
between Suarez and Messi is also shown for this game, where both were able to generate a high amount
of space for each other.
A further vision of space gain and generation can be grasped from Figure 9. Here we present the spatial
heatmap for SOG and SGG situations. At first glance we observe the amount of space gained through
occupation is considerably higher than through generation, a more complex process. Iniesta presents an
interesting case where he can generate more space next to the left border line of the field, while he is better
at gaining spaces for himself at the interior of the field. Also, he produces a notable amount of space
near the box. Busquets shows an incredible collaborative behaviour by generating space almost anywhere
around the field. He also presents wide areas of SOG, but more intensively near the midfield, his natural
habitat. Suarez presents a notable ability to generate space within the box, where he concentrates most
of his generating contribution. Here he arises as a specialist in dragging defenders either while making
spaces for himself or while generating spaces for others. Messi also shows a great ability in generating
spaces around the attacking zones of the field, while Neymar concentrates on the left wing, focused on
high speed diagonal runs towards the box. Defenders, as expected, show very little generation of space.
7 Discussion
In a sport where the average possession of the ball by a player is 3 minutes in a 90 minute game, the analy-
sis of team-collective dynamics through off-ball movements becomes a critical element for understanding
performance. We have shown how through spatio-temporal data it is possible to extract meaningful in-
formation relating the occupation of spaces of value and the generation of spaces for teammates. Beyond
the bigger picture that overall performance statistics of multiple matches can provide, the understanding
of off-ball movements demands a more specialized per-match or even per-situation analysis.
Through the understanding of the frequencies, quality, position and effectiveness of space occupation and
generation, a coach can provide specific guidance to players to improve the team’s playing dynamics beyond
what each player can do with the ball.
Figure 7: A heatmap showing the total times space was generated by generators (y-axis) for receivers
(x-axis)
In order to provide a deeper understanding of space, we have presented two novel approaches for
pitch control and pitch value modelling. Our pitch control model takes into account critical factors when
understanding the dominance of space such as the velocity and position of the player. It also provides a
key element that was missing in previous dominant region models: the idea of a soft surface of control
where for a given location on the field, nearby players have a certain level of influence, instead of defining
strict dominance margins such as in Voronoi-based models. On the other hand, the proposed pitch value
model presents a way of quantifying the value of every location on the field in a dynamic way, relative to
the location of the ball. This way, we can account for both the control of space a team has and the value of
that space, to obtain a measure of spatial value controlled.
For future studies, the proposed pitch control and field models can be directly applied for reaching
more comprehensive pass probability and reward models, and in general to incorporate a new perspective
on dominant regions based approaches for understanding team sports. But more generally, this study sets
a base for new research on off-ball behaviour in soccer. New perspectives are still to be studied, such as
the effect of different pressure strategies, the concept of potential space and how it could be exploited, the
overall dynamic balance of space control between the two teams and its association to performance, as
well as many other research lines that address a critical question when training to succeed in soccer: what
should I do when my teammate is in possession of the ball?
[2] Robert Rein, Dominik Raabe, Jürgen Perl, and Daniel Memmert. Evaluation of changes in space con-
trol due to passing behavior in elite soccer using voronoi-cells. In Proceedings of the 10th Interna-
tional Symposium on Computer Science in Sports (ISCSS), pages 179–183. Springer, 2016.
[3] Michael Horton, Joachim Gudmundsson, Sanjay Chawla, and Joël Estephan. Classification of passes
in football matches using spatiotemporal data. arXiv preprint arXiv:1407.5093, 2014.
[4] R Masheswaran, Y Chang, Jeff Su, Sheldon Kwok, Tal Levy, Adam Wexler, and Noel Hollingsworth. The
three dimensions of rebounding. MIT SSAC, 2014.
[5] Tsuyoshi Taki, Jun-ichi Hasegawa, and Teruo Fukumura. Development of motion analysis system
for quantitative evaluation of teamwork in soccer games. In Image Processing, 1996. Proceedings.,
International Conference on, volume 3, pages 815–818. IEEE, 1996.
[6] Dan Cervone, Luke Bornn, and Kirk Goldsberry. NBA court realty. MIT Sloan Sports Analytics Con-
ference, Boston, MA, USA, 2016.
[7] Alexander Franks, Andrew Miller, Luke Bornn, and Kirk Goldsberry. Counterpoints: Advanced de-
fensive metrics for NBA basketball. In 9th Annual MIT Sloan Sports Analytics Conference, Boston, MA,
2015.
[8] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.
[9] Angel Ric, Carlota Torrents, Bruno Gonçalves, Lorena Torres-Ronda, Jaime Sampaio, and Robert Hris-
tovski. Dynamics of tactical behaviour in association football when manipulating players’ space of
interaction. PloS one, 12(7):e0180773, 2017.
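$$\Sigma = V L V^{-1} \tag{14}$$
$$\Sigma = R S S R^{-1} \tag{15}$$
$$R = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \tag{16}$$
$$S = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \tag{17}$$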
In order to find the scaling factors, we take into account both the player’s magnitude of speed Si (t) (in
meters per second), and the distance to the ball Di (t). Based on the opinion of expert soccer analysts we
have defined the range [4, 10] as the minimum and maximum distance in meters of the player’s pitch control
surface radius Ri (t), based on the distance to the ball, following the transformation function shown in
Figure 9. Setting 13 m/s as the maximum possible speed reachable, we calculate the ratio between the player’s speed
and the maximum speed, as shown in Equation 18. Then, the scaling matrix is expanded in the x direction and
contracted in the y direction by this factor, as expressed in Equation 19. Given this, we can express a function
COV for obtaining the covariance matrix as shown in Equation 21. Finally, the distribution mean value
µi (t) is found by translating the player’s location at time t by half the magnitude of the speed vector ⃗s, following
Equation 21.
$$Srat_i(s) = \frac{s^2}{13^2} \tag{18}$$
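The construction described in this appendix can be sketched as follows, using Equations 15 to 18 and the mean translation stated in the text. Two parts are assumptions: the exact entries of the scaling matrix (here the radius is split in half and stretched or shrunk by the speed ratio), and the radius itself, which is passed in directly because the distance-to-ball transformation is only given as a figure. Function names are illustrative.

```python
import numpy as np

MAX_SPEED = 13.0  # m/s, maximum reachable speed stated in the text

def speed_ratio(speed):
    """Equation 18: squared speed relative to the squared maximum speed."""
    return speed ** 2 / MAX_SPEED ** 2

def rotation_matrix(theta):
    """Equation 16: rotation by the player's direction of movement."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])

def scaling_matrix(radius, ratio):
    """Equation 17 with assumed entries: expand x, contract y by the ratio."""
    s_x = (radius + radius * ratio) / 2.0
    s_y = (radius - radius * ratio) / 2.0
    return np.diag([s_x, s_y])

def influence_covariance(velocity, radius):
    """Equation 15: covariance R S S R^-1 for a velocity vector in m/s."""
    speed = float(np.hypot(velocity[0], velocity[1]))
    theta = float(np.arctan2(velocity[1], velocity[0]))
    R = rotation_matrix(theta)
    S = scaling_matrix(radius, speed_ratio(speed))
    return R @ S @ S @ np.linalg.inv(R)

def influence_mean(position, velocity):
    """Mean: the player's location translated by half the velocity vector."""
    return (np.asarray(position, dtype=float)
            + 0.5 * np.asarray(velocity, dtype=float))

# Hypothetical player at (50, 30) with radius 4 m, the lower bound of [4, 10].
cov_static = influence_covariance(velocity=(0.0, 0.0), radius=4.0)
cov_moving = influence_covariance(velocity=(6.0, 0.0), radius=4.0)
mean_moving = influence_mean(position=(50.0, 30.0), velocity=(6.0, 0.0))
```

For a stationary player the covariance is isotropic, while a player running along the x axis obtains a larger variance in that direction and a mean shifted three meters ahead of his current position.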