Build Order Optimization in StarCraft
Abstract

In recent years, real-time strategy (RTS) games have gained interest in the AI research community for their multitude of challenging subproblems — such as collaborative pathfinding, effective resource allocation and unit targeting, to name a few. In this paper we consider the build order problem in RTS games in which we need to find concurrent action sequences that, constrained by unit dependencies and resource availability, create a certain number of units and structures in the shortest possible time span. We present abstractions and heuristics that speed up the search for approximative solutions considerably in the game of StarCraft, and show the efficacy of our method by comparing its real-time performance with that of professional StarCraft players.

Introduction

Automated planning, i.e. finding a sequence of actions leading from a start to a goal state, is a central problem in artificial intelligence research with many real-world applications. For instance, the satisfiability problem can be considered a planning problem (we need to assign values to n Boolean variables so that a given formula evaluates to true) with applications to circuit verification, solving Rubik's cube from a given start state is a challenging pastime, and building submarines when done inefficiently can easily squander hundreds of millions of dollars. Most interesting planning problems in general are hard to solve algorithmically. Some, like the halting problem for Turing machines, are even undecidable.

In this paper we consider a class of planning problems that arises in a popular video game genre called real-time strategy (RTS) games. In these games, which can be succinctly described as war simulations, players instruct units in real-time to gather resources, to build other units and structures, to scout enemy locations, and to eventually destroy opponents' units to win the game. In the opening phase of RTS games players usually don't interact with each other because their starting locations are spread over large maps and player visibility is limited to small regions around friendly units or structures. The main sub-goals in this game phase are to establish a sufficient income flow by producing workers that gather resources, to quickly build structures that are prerequisites for other structures or can produce combat units to build a minimal force for defense or early attack, and to send out scouts to explore the terrain and search for enemy bases. The order in which units and structures are produced is called a build order. RTS games are usually won by players who destroy opponents' structures first. This goal can be accomplished in various ways. For instance, one could try to surprise ("rush") the opponent by investing resources into attack forces early in the game at the cost of delaying the construction of structures that are important in later game stages. If the opponent, on the other hand, invests in technological development and disregards defense, the rushing player will win easily. Thus, at the highest adversarial strategy level, the choice of initial build orders often decides the game outcome. Therefore, like in chess, aspiring players need to study and practice executing build orders and tailor them to specific opponents. Avoiding the interesting and ambitious task of selecting good build order goals, in this paper we assume that they are given to us. Because acting fast is very important in RTS games due to the fact that players act asynchronously, what remains is finding action sequences that accomplish the given goal while minimizing the plan duration (makespan). This process is called build order optimization.

The research on this subject that is reported here was motivated by the goal of creating strong AI systems for the popular RTS game of StarCraft and the frustration of hard-coding build orders for them. In the remainder of this paper we first give a brief overview of related work on build order optimization and our application domain StarCraft. Then we describe our search algorithm, the underlying assumptions, and the abstractions we use. In the following experimental section we gauge the performance of our planner by comparing its build orders with those executed by professional players. We finish the paper with conclusions and suggestions for future work.

Background

The build order optimization problem can be described as a constraint resource allocation problem with makespan minimization, which features concurrent actions. Because of their practical relevance, problems of this kind have been the subject of study for many years, predominantly in the area of operations research.
(Buro and Kovarsky 2007) motivates research on build order problems in the context of RTS games and proposes a way of modeling them in PDDL, the language used in the automated planning competitions. In (Kovarsky and Buro 2006) the issue of concurrent execution is studied in general and efficient action ordering mechanisms are described for the RTS game build order domain.

Existing techniques for build order planning in the RTS game domain have focused mainly on the game Wargus (an open source clone of Warcraft 2), which is much simpler than StarCraft due to the limited number of possible actions and lower resource gathering complexity. Several of these techniques rely heavily on means-end analysis (MEA) scheduling. Given an initial state and a goal, MEA produces a satisficing plan which is minimal in the number of actions taken. MEA runs in linear time w.r.t. the number of actions in a plan, so it is quite fast, but the makespans it produces are often much longer than optimal.

(Chan et al. 2007b) employ MEA to generate build order plans, followed by a heuristic rescheduling phase which attempts to shorten the overall makespan. While they produce satisficing plans quite quickly, the plans are not optimal due to the complex nature of the rescheduling problem. In some cases they are able to beat makespans generated by human players, but do not mention the relative skill level of these players. This technique is extended in (Chan et al. 2007a) by incorporating best-first search in an attempt to reduce makespans further by solving intermediate goals. They admit that their search algorithm is lacking many optimizations, and their results show that it is not only slower than their previous work but still cannot produce significantly better solutions. (Branquinho and Lopes 2010) extend further on these ideas by combining two new techniques called MeaPop (MEA with partial order planning) and Search and Learning A* (SLA*). These new results improve on the makespans generated by MEA, but require much more time to compute, bringing it outside the range of real-time search. They are currently investigating ways of improving the run-time of SLA*.

These techniques, however, have only been applied to Wargus, with goals consisting of at most 5 types of resources. Interesting plans in StarCraft may involve multiple instances of up to 15 different units in a single goal and require far more workers, increasing complexity dramatically.

StarCraft

RTS games are interesting application domains for AI researchers, because state spaces are huge, actions are concurrent, and part of the game state is hidden from players — and yet, human players still play much better than machines. To spark researchers' interest in this game domain, a series of RTS game AI competitions have been organized in the past 6 years. In 2006-2009 a free software RTS game engine was used (ORTS 2010), but since the advent of the BWAPI library (BWAPI 2011), the competition focus has switched to StarCraft (by Blizzard Entertainment), the most popular RTS game in the world with over 10 million copies sold. StarCraft has received over 50 game industry awards, including over 20 "game of the year" awards. Some professional players have reached celebrity status, and prize money for tournaments totals in the millions of dollars annually.

As in most RTS games, each player starts with a number of worker units which gather resources such as minerals and gas which are consumed by the player throughout the game. Producing additional worker units early in the game increases resource income and is typically how most professional players start their build orders. Once a suitable level of income has been reached, players begin the construction of additional structures and units which grow their military strength. Each unit has a set of prerequisite resources and units which the player must obtain before beginning their construction. The graph obtained by listing all unit prerequisites and eliminating transitive edges is called a tech tree. Due to the complex balance of resource collection and unit prerequisite construction, finding good build orders is a difficult task, a skill often taking professional players years to develop.

A Build Order Planning Model for StarCraft

Build order planning in RTS games is concerned with finding a sequence of actions which satisfies a goal with the shortest makespan. It is our goal to use domain specific knowledge to limit both the branching factor and the depth of search while maintaining optimality, resulting in a search algorithm which can run in real-time in a StarCraft playing agent.

In StarCraft, a player is limited to a finite number of resources which they must both collect and produce throughout the game. All consumables (minerals, gas) as well as units (workers, fighters, buildings) are considered resources for the purpose of search. An action in our search is one which requires some type of resource, while producing another (combat actions are out of our scope). Resources which are used by actions can be of the forms Require, Borrow, Consume, and Produce (Branquinho and Lopes 2010). Required resources, which are called prerequisites, are the ones which must be present at the time of issuing an action. A borrowed resource is one which is required, used for the duration of an action, and returned once the action is completed. A consumed resource is one which is required, and used up immediately upon issue. A produced resource is one which is created upon completion of the action.

Each action a has the form a = (δ, r, b, c, p), with duration δ (measured in game simulation frames), three sets of preconditions r (required), b (borrowed), c (consumed), and one set of produced items p. For example, in the StarCraft domain, the action a = "Produce Protoss Dragoon" has δ = 600, r = {Cybernetics-Core}, b = {Gateway}, c = {125 minerals, 50 gas, 2 supply}, p = {1 Dragoon}.

States then take the form S = (t, R, P, I), where t is the current game time (measured in frames), vector R holds the state of each resource available (e.g. 2 Barracks available, one currently borrowed until time X), vector P holds actions in progress but not yet completed (e.g. a Supply Depot will finish at time X), and vector I holds worker income data (e.g. 8 workers gathering minerals, 3 gathering gas). Unlike some implementations such as (Branquinho and Lopes 2010), I is necessary due to abstractions made to facilitate search.
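For illustration, one possible C++ encoding of this action and state model is sketched below, including the Dragoon example from the text. The type names and the string-keyed count maps are editorial choices for readability, not the representation used in the authors' planner.

#include <map>
#include <string>
#include <vector>

// One possible encoding of a = (delta, r, b, c, p); resource multisets are
// represented as name -> count maps (an illustrative choice, not the paper's code).
struct Action {
    std::string name;
    int duration;                            // delta, in game simulation frames
    std::map<std::string, int> required;     // r: prerequisites (e.g. tech buildings)
    std::map<std::string, int> borrowed;     // b: in use for the action's duration
    std::map<std::string, int> consumed;     // c: spent on issue (minerals, gas, supply)
    std::map<std::string, int> produced;     // p: granted on completion
};

struct InProgress {                          // element of vector P
    std::string produces;
    int finishTime;                          // frame at which the action completes
};

struct State {                               // S = (t, R, P, I)
    int t = 0;                               // current game time in frames
    std::map<std::string, int> resources;    // R: available resource counts
    std::vector<InProgress> inProgress;      // P: actions issued but not completed
    int mineralWorkers = 0, gasWorkers = 0;  // I: worker income data
};

// The "Produce Protoss Dragoon" example from the text.
const Action produceDragoon = {
    "Produce Protoss Dragoon", 600,
    {{"Cybernetics-Core", 1}},                        // required
    {{"Gateway", 1}},                                 // borrowed
    {{"Minerals", 125}, {"Gas", 50}, {"Supply", 2}},  // consumed
    {{"Dragoon", 1}}                                  // produced
};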
Abstractions

Without having access to the StarCraft game engine source code, it was necessary to write a simulator to compute state transitions. Several abstractions were made in order to greatly reduce the complexity of the simulation and the search space, while maintaining close to StarCraft-optimal results. Note that any future use of the term 'optimal' or 'optimality' refers to optimality within these abstractions:

We abstract mineral and gas resource gathering by real-valued income rates of 0.045 minerals per worker per frame and 0.07 gas per worker per frame. These values have been determined empirically by analyzing professional games. In reality, resource gathering is a process in which workers spend a set amount of time gathering resources before returning them to a base. Although we fixed income rates in our experiments, they could be easily estimated during the game. This abstraction greatly increases the speed of state transition and resource look-ahead calculations. It also eliminates the need for "gather resource" type actions which typically dominate the complexity of build order optimization. Due to this abstraction, we now consider minerals and gas to be a special type of resource, whose "income level" data is stored in state component I.

Once a refinery location has been built, a set number of workers (3 in our experiments) will be sent to gather gas from it. This abstraction eliminates the need for worker re-assignment and greatly reduces the search space, but in rare cases is not "truly" optimal for a given goal.

Whenever a building is constructed, a constant of 4 seconds (96 simulation frames) is added to the game state's time component. This is to simulate the time required for a worker unit to move to a suitable building location within an arbitrary environment, since individual map data is not used in our search, but again could be estimated during the game.
Algorithm

We use a depth-first branch and bound algorithm to perform build order search. The algorithm takes a starting state S as input and performs a depth-first recursive search on the descendants of S in order to find a state which satisfies a given goal G. This algorithm has the advantage of using a linear amount of memory with respect to the maximum search depth. Since this is an any-time algorithm we can halt the search at any point and return the best solution so far, which is an important feature for real-time applications.

Algorithm 1 Depth-First Branch & Bound
Require: goal G, state S, time limit t, bound b
 1: procedure DFBB(S)
 2:    if TimeElapsed ≥ t then
 3:        return
 4:    end if
 5:    if S satisfies G then
 6:        b ← min(b, S_t)                ▷ update bound
 7:        bestSolution ← solutionPath(S)
 8:    else
 9:        while S has more children do
10:            S' ← S.nextChild
11:            S'.parent ← S
12:            h ← eval(S')               ▷ heuristic evaluation
13:            if S'_t + h < b then
14:                DFBB(S')
15:            end if
16:        end while
17:    end if
18: end procedure
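For concreteness, a minimal C++ skeleton of this any-time depth-first branch & bound loop is sketched below. It mirrors Algorithm 1; the PlanState interface (goal test, child expansion, heuristic) is a hypothetical stand-in, not the planner's actual code. A child is pruned whenever its game time plus heuristic lower bound cannot beat the best makespan found so far.

#include <chrono>
#include <limits>
#include <vector>

// Hypothetical planner state: childStates() is assumed to expand all legal actions
// (see "Action Legality") and heuristic() to return a lower bound on remaining time.
struct PlanState {
    int t = 0;                                   // game time (frames)
    bool satisfiesGoal() const;                  // assumed goal test
    std::vector<PlanState> childStates() const;  // assumed expansion
    int heuristic() const;                       // assumed admissible evaluation
};

// Any-time depth-first branch & bound mirroring Algorithm 1
// (a sketch under the assumptions above, not the authors' implementation).
struct BuildOrderSearch {
    std::chrono::steady_clock::time_point deadline;   // real-time limit t
    int bound = std::numeric_limits<int>::max();      // best makespan found so far
    std::vector<PlanState> bestSolution;               // any-time result

    void DFBB(const PlanState& s, std::vector<PlanState>& path) {
        if (std::chrono::steady_clock::now() >= deadline) return;  // time limit hit
        if (s.satisfiesGoal()) {
            if (s.t < bound) { bound = s.t; bestSolution = path; }  // update bound
            return;
        }
        for (const PlanState& child : s.childStates()) {
            if (child.t + child.heuristic() < bound) {              // prune by bound
                path.push_back(child);
                DFBB(child, path);
                path.pop_back();
            }
        }
    }
};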
Action Legality

In order to generate the children of a state, we must determine which actions are legal in this state. Intuitively, an action is legal in state S if simulating the game forward in time will eventually produce all required resources without issuing any further actions. Given our abstractions, an action is therefore legal in state S if and only if the following conditions hold: 1) The prerequisites required or resources borrowed are either currently available, or being created. Example: a Barracks is under construction, so fighter units will be trainable without any other actions being issued. 2) The consumed resources required by the action are either currently available or will be available at some point in the future without any other actions being taken. Example: we do not currently meet the amount of minerals required, however our workers will eventually gather the required amount (assuming there is a worker gathering minerals).
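As a rough illustration, the following self-contained C++ sketch checks these two conditions against simplified stand-in types; ActionSpec and SearchState are hypothetical and not the paper's data structures, and only minerals and gas are covered on the consumed side (supply is omitted for brevity).

#include <map>
#include <string>

struct ActionSpec {                       // hypothetical stand-in for a = (delta, r, b, c, p)
    std::map<std::string, int> required, borrowed;
    double minerals = 0, gas = 0;         // consumed amounts
};

struct SearchState {                      // hypothetical stand-in for S = (t, R, P, I)
    std::map<std::string, int> owned;     // completed units/structures (from R)
    std::map<std::string, int> building;  // units/structures in progress (from P)
    double minerals = 0, gas = 0;
    int mineralWorkers = 0, gasWorkers = 0;
};

// Condition 1: every required/borrowed resource exists or is being created.
// Condition 2: consumed minerals/gas are on hand or will be gathered eventually.
bool isLegal(const SearchState& s, const ActionSpec& a) {
    auto presentOrPending = [&](const std::map<std::string, int>& need) {
        for (const auto& [name, count] : need) {
            int have = 0;
            if (auto it = s.owned.find(name); it != s.owned.end()) have += it->second;
            if (auto it = s.building.find(name); it != s.building.end()) have += it->second;
            if (have < count) return false;
        }
        return true;
    };
    if (!presentOrPending(a.required) || !presentOrPending(a.borrowed)) return false;
    if (a.minerals > s.minerals && s.mineralWorkers == 0) return false;  // never gatherable
    if (a.gas > s.gas && s.gasWorkers == 0) return false;
    return true;
}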
Fast Forwarding and State Transition

In general, RTS games allow the user to take no action at any given state, resulting in a new state which increases the internal game clock, possibly increasing resources and completing actions. This is problematic for efficient search algorithms since it means that all actions (including the null action) must be taken into consideration in each state of the game. This results in a search depth which is linear not in the number of actions taken, but in the makespan of our solution, which is often quite high. In order to solve this problem, we have implemented a fast-forwarding simulation technique which eliminates the need for null actions.

In StarCraft, the time-optimal build order for any goal is one in which actions are executed as soon as they are legal, since hoarding resources cannot reduce the total makespan. Although resource hoarding can be a vital strategy in late-game combat, it is outside the scope of our planner. Let us define the following functions:

S' ← Sim(S, δ) – Simulate the natural progression of a StarCraft game from a state S through δ time steps given that no other actions are issued, resulting in a new state S'. This simulation includes the gathering of resources (given our economic abstraction) and the completion of durative actions which have already been issued.

δ ← When(S, R) – Takes a state S and a set of resource requirements R and returns the earliest time δ for which Sim(S, δ) will contain R. This function is typically called with action prerequisites to determine when the required resources for an action a will be ready.

S' ← Do(S, a) – Issue action a in state S assuming all required resources are available. The issuing of the action involves subtracting the consumed resources, updating actions in progress, and flagging borrowed resources as in use. The resulting state S' is the state for which action a has just been issued and has its full duration remaining.

S' = Do(Sim(S, When(S, a)), a)

now defines our state transition function which returns the state S' for which action a has been issued.
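In code, the transition is simply the composition of these three functions. The sketch below assumes Sim, When, and Do exist with the signatures just described; the stand-in State and Action types are placeholders.

// Hypothetical stand-ins for the planner's state and action types (fields omitted).
struct State  { int t = 0; /* R, P, I as in the model section */ };
struct Action { int duration = 0; /* r, b, c, p */ };

// Assumed to exist with these signatures, implementing the functions defined above.
State Sim(const State& s, int delta);         // advance delta frames, no new actions issued
int   When(const State& s, const Action& a);  // earliest delta at which a's resources are ready
State Do(const State& s, const Action& a);    // issue a, assuming resources are available

// State transition S' = Do(Sim(S, When(S, a)), a): fast-forward to the first
// frame at which action a can legally be issued, then issue it.
State applyAction(const State& s, const Action& a) {
    return Do(Sim(s, When(s, a)), a);
}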
Concurrent Actions and Action Subset Selection

A defining feature of RTS games is the ability to perform concurrent actions. For example, if a player has a sufficient amount of resources they may begin the concurrent construction of several buildings as well as the training of several units. In a general setting, this may cause an action-space explosion because a super-exponential number of possible action sequences has to be considered. Even in the common video game setting in which a game server sequentializes incoming concurrent player actions, it can be

Figure: makespan (seconds) vs. log(number of nodes expanded) [base 10] for K = 1 and K = 2.
Algorithm 2 Compare Build Order
Require: BuildOrder B, TimeLimit t, Increment Time i
 1: procedure CompareBuildOrder(B, t, i)
 2:    S ← Initial StarCraft State
 3:    SearchPlan ← DFBB(S, GetGoal(B, 0, ∞), t)
 4:    if SearchPlan.timeElapsed ≤ t then
 5:        return MakeSpan(SearchPlan) / MakeSpan(B)
 6:    else
 7:        inc ← i
 8:        SearchPlan ← ∅
 9:        while inc ≤ MakeSpan(B) do
10:            IncPlan ← DFBB(S, GetGoal(B, inc - i, inc), t)
11:            if IncPlan.timeElapsed ≥ t then
12:                return failure
13:            else
14:                SearchPlan.append(IncPlan)
15:                S ← S.execute(IncPlan)
16:                inc ← inc + i
17:            end if
18:        end while
19:        return MakeSpan(SearchPlan) / MakeSpan(B)
20:    end if
21: end procedure

Figure 2: CPU time statistics for search without (A), and with (B) macro actions at 120s increments. Shown are densities and cumulative distributions of CPU time/makespan ratios in % and percentiles for professional game data points with player makespans 0..249s (left) and 250..500s (right). E.g. the top-left graph indicates that 90% of the time, the runtime is only 1.5% of the makespan, i.e. 98.5% of the CPU time in the early game can be used for other tasks.
Stork, Kal, and White-Ra. The remaining replays were taken from high level tournaments such as World Cyber Games.

The BWAPI StarCraft programming interface was used to analyze and extract the actions performed by the professional players. Every 500 frames (21s) the build order implemented by the player (from the start of the game) was extracted and written to a file. Build orders were continually extracted until either 10000 frames (7m) had passed, or until one of the player's units had died. A total of 520 unique build orders were extracted this way. We would like to have used more data for further confidence, however the process of finding quality replays and manually extracting the data was quite time consuming. Though our planner is capable of planning from any state of the game, the beginning stages were chosen as it was too difficult to extract meaningful build orders from later points in the game due to the ongoing combat. To extract goals from professional build orders, we construct a function GetGoal(B, t_s, t_e) which, given a professional build order sequence B, a start time t_s and an end time t_e, computes a goal which contains all resources produced by actions issued in B between t_s and t_e.
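A possible implementation of such a goal-extraction helper is sketched below; the timestamped build order representation and the choice of a half-open time interval are assumptions made for illustration, not the extraction code actually used.

#include <map>
#include <string>
#include <vector>

// Hypothetical replay representation: each entry is an action issued by the
// professional player, with the frame at which it was issued and what it produces.
struct IssuedAction {
    int frame;                                // time the action was issued
    std::map<std::string, int> produced;      // units/structures it creates
};
using BuildOrder = std::vector<IssuedAction>;
using Goal = std::map<std::string, int>;      // resource name -> required count

// GetGoal(B, ts, te): accumulate everything produced by actions issued in [ts, te).
Goal GetGoal(const BuildOrder& B, int ts, int te) {
    Goal goal;
    for (const IssuedAction& a : B) {
        if (a.frame >= ts && a.frame < te) {
            for (const auto& [name, count] : a.produced) goal[name] += count;
        }
    }
    return goal;
}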
Tests were performed on each build order with the method described in Algorithm 2 with both optimal (opt) and macro action (app) search: first with t = 60s and i = 15s, second with t = 120s and i = 30s. This incremental tactic is believed to be similar in nature to how professionals re-plan at various stages of play, however it is impossible to be certain without access to professionally labeled data sets (for which none exist). We claim that build orders produced by this system are "real-time" or "online" since they consume far less CPU time than the durations of the makespans they produce: an agent can implement the current increment while it plans the next. It should be noted that this experiment is indeed biased against the professional player, since they may have changed their mind or re-planned at various stages of their build order. It is however the best possible comparison without having access to a professional player to implement build orders during the experiment.

Figures 2 (time statistics) and 3 (makespan statistics) display the results of these experiments, from which we can conclude that our planner produces build orders with comparable makespans while consuming few CPU resources. Results for 60s incremental search were similar to 120s (with less CPU usage) and were omitted for space. Results are grouped by makespan to show the effects of more complex searches.

Use in StarCraft Playing Agents

Our planner (with macro actions) was incorporated into our StarCraft playing agent (name removed, written in C++ with BWAPI), which was previously a participant in the 2010 AIIDE StarCraft AI Competition. When given expert knowledge goals, the agent was capable of planning to the goal in real time, executing the build order, and subsequently defeating some amateur level players, as well as the built-in StarCraft computer AI. The specific results are omitted since for this paper we are not concerned with the strength of the overall agent, but with showing that our build order planning system works in a real-world competitive setting, something no existing method has accomplished.

Conclusion and Future Work

In this paper we have presented heuristics and abstractions that reduce the search effort for solving build order problems
Figure 3: Makespan statistics for search without (A) and with (B) macro actions. Goals extracted by looking ahead 120s relative to professional player plan makespans. Shown are scatter plots of the makespan ratios (left), ratio densities, cumulative distributions, and percentiles for early game scenarios (pro makespan 0..249s, center) and early-mid game scenarios (250..500s). E.g. the top-middle graph indicates that 90% of the time, our planner produces makespans that match those of professionals.