COMP34120 - AI & Games Sem 2
1. Length of the report: 3-10 pages or 1000-3000 words, but it is no problem at all if your
report goes beyond this suggested length.
2. Main contents of your long report could be:
● Your approaches:
○ when you present each of your approaches, please tell us the basic idea and
related technical details. For example, if you need to learn the follower’s
reaction function, you need to give the formula of the reaction function such
as a linear function or a polynomial, and then the corresponding learning
algorithm (which can be presented in different forms, such as the formula for
calculating the model parameters, pseudocode, or a flow diagram, depending on
your preference);
○ your approaches could include one or more from our lecture notes (for
example as a basic approach before you propose your extended or new
approaches) and/or any new approach you would like to propose;
○ As the goal of the project is to maximise profits, your approaches need to
include the pricing-optimisation part rather than just the learning part.
● Your reasons and justifications of your approaches:
○ Unlike submitted code, whose implemented approaches are justified by the
empirical results of running it, the long report needs to give the
motivations/justifications/reasons behind your proposed approaches. In the
case that some of your approaches are only heuristic, a simple explanation of
the thinking behind them will be useful and helpful.
● Your system design:
○ As your proposed approaches are potential solutions, the system design can
be included to show whether these approaches can be realised as code, and
how the different methods in your approaches, such as learning the reaction
functions and pricing optimisation, can be linked together. Again, there are
different forms in which to present your designs: for example, a high-level
component diagram to show how the different parts link together to realise
your approach, or a detailed class diagram to show which objects and
supporting methods are needed to realise your approach.
- SGDRegression
- SVM
- LSTM
- GA
- GANs
Basic idea
Overview:
Regression
1-variable SGDRegression
The most basic approach that comes to mind is running an SGDRegression on the one-variable
leader choices mapped to the follower's reactions. The purpose of this is to learn the
reaction function of the follower. Once we know it, we can try to maximise the profits.
To maximise the profits, we have chosen a very greedy approach: maximising the
daily profit, i.e. choosing the leader price that maximises the profit for the current day.
Substituting this into the demand and profit functions above gives a profit function in the
leader's price alone.
In this case, we notice that choosing price_l = 1.8241 yields a daily profit of 0.6482,
which sums to 19.446 over the last 30 days.
Under our assumption that the follower reacts only to the current leader price, this
price_l stays constant throughout the game.
However, this assumption will most often not hold. Most followers will have more
complex functions for choosing today's price, probably taking inputs from past days
into account as well.
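To make the pipeline concrete, here is a self-contained sketch on synthetic data: a hand-rolled SGD fit of a linear reaction function (a minimal stand-in for the SGDRegression step above), followed by greedy daily-profit maximisation over a price grid. The reaction coefficients, the demand form 2 − price_l + 0.3·price_f, and the unit cost of 1 are illustrative assumptions, not the game's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical history: the follower reacts as price_f = 0.3 + 0.7 * price_l
# plus noise (illustrative numbers, not the actual follower).
price_l = rng.uniform(1.0, 2.0, 100)
price_f = 0.3 + 0.7 * price_l + rng.normal(0.0, 0.01, 100)

# Hand-rolled SGD fit of price_f ~ a + b * price_l
# (standing in for sklearn's SGDRegressor).
a, b = 0.0, 0.0
learning_rate = 0.05
for _ in range(20000):
    i = rng.integers(len(price_l))
    err = (a + b * price_l[i]) - price_f[i]
    a -= learning_rate * err
    b -= learning_rate * err * price_l[i]

# Greedy daily pricing: maximise (price - cost) * demand, under an assumed
# demand of 2 - price_l + 0.3 * price_f and a unit cost of 1.
def daily_profit(price):
    follower = a + b * price  # learned reaction function
    return (price - 1.0) * (2.0 - price + 0.3 * follower)

grid = np.linspace(1.0, 3.0, 2001)
best_price = grid[np.argmax(daily_profit(grid))]
```

Under these made-up coefficients the maximiser lands near 1.8; with the actually learned reaction function, the same grid search (or the closed form for a quadratic profit) applies unchanged.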
2-variable SGDRegression
The obvious next step is to add one more variable and see how the model behaves. In this
case, we will try to find:
We notice better trend-following. Obviously, the more variables we add, and the longer we
train the system (in this case 1,000,000 iterations with a tolerance of 1e-11), the greater
the risk of overfitting the solution. In this basic scenario, where the follower does not
change its strategy, overfitting is not that bad.
As above, we would like to solve our demand/profit equation for the new function found.
The weights we found for the first follower are:
A = 0.69144258
B = -0.17855508
C = 0.5251335
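As a sketch of how such weights can be recovered (the report does not state which second variable was used; here we assume it is the follower's previous price), the following generates data from a reaction p_f(t) = A + B·p_l(t) + C·p_f(t−1) with coefficients chosen near the weights above, then refits them by least squares, standing in for the SGDRegression fit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed 2-variable reaction: today's follower price depends on today's
# leader price and yesterday's follower price (coefficients picked to sit
# near the weights reported above; this is an illustration, not the game).
p_l = rng.uniform(1.0, 2.0, 101)
p_f = np.empty(101)
p_f[0] = 1.0
for t in range(1, 101):
    p_f[t] = 0.69 - 0.18 * p_l[t] + 0.53 * p_f[t - 1] + rng.normal(0.0, 0.005)

# Design matrix: intercept, current leader price, lagged follower price.
X = np.column_stack([np.ones(100), p_l[1:], p_f[:-1]])
A, B, C = np.linalg.lstsq(X, p_f[1:], rcond=None)[0]
```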
N-variable regression
However, let us move on to the first dataset, which maps far less obviously onto a
trend line.
For this, we run the polynomial regression at degrees 1, 2, 7, and finally 21.
It is easy to see what is happening: as we increase the degree, we can map the peaks and
irregularities of the dataset better and better. Again, we run into the overfitting
issue.
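The degree sweep can be reproduced on a toy series (a noisy sine standing in for the dataset): the training error only ever decreases as the degree grows, which is exactly why the overfitting risk appears.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)

# Toy series with peaks, standing in for the first dataset.
t = np.arange(100, dtype=float)
y = 1.5 + 0.3 * np.sin(t / 5.0) + rng.normal(0.0, 0.05, 100)

# Fit polynomials of increasing degree and record the training error.
train_mse = {}
for degree in (1, 2, 7, 21):
    fit = Polynomial.fit(t, y, deg=degree)  # fits in a scaled domain, so
    train_mse[degree] = np.mean((fit(t) - y) ** 2)  # high degrees stay stable
```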
LSTM
Another way to approach this project is through LSTMs.
An LSTM model is the first idea that comes to mind when discussing anything to do with
extracting information from varied batches. At its core, it is an RNN that deals with the
long-term dependency problem. Intuitively, it keeps track of which previous information is
most relevant to the current data. The benefit of using a type of RNN lies in its recurrent
(feedback) connections: unlike a regular feedforward neural network, an RNN allows training
on, and extracting information from, not only fixed chunks of data but sequences as well.
The figure above shows the follower's actual price against our prediction. We notice decent
trend-following in both the training set (orange) and the test phase (green).
# define model
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
We also used a 90-10 split for training and testing over the 100 days of data available, over
1000 epochs and with a batch size of 1.
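The windowing that feeds such a model can be sketched as follows; `create_dataset` and the placeholder series are our own illustration of the look_back mechanism, not the exact preprocessing code.

```python
import numpy as np

# Sliding-window preparation for the LSTM: each sample holds the previous
# `look_back` follower prices, and the target is the next day's price.
def create_dataset(series, look_back=1):
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    return np.array(X), np.array(y)

series = np.linspace(1.0, 2.0, 100)  # placeholder for the 100 days of prices
X, y = create_dataset(series, look_back=1)

# 90-10 train/test split, as described above.
split = int(len(X) * 0.9)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Keras' LSTM layer expects input shaped (samples, timesteps, features).
X_train = X_train.reshape((-1, 1, 1))
X_test = X_test.reshape((-1, 1, 1))
```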
GA
# Scoring (only a fragment survived; assumed shape around it): a player's
# fitness is the profit accumulated over the test window.
def calculate_score(player):
    score = 0
    for x, y in test_days:  # assumed iterable of (leader, follower) prices
        score += calculate_profit_per_day(x, y)
    return score
This means that each player from the initial set will have, at the end of a generation, a
score based on which we decide whether to kill it (or, for a PG version, whether we discard
this player).
In this way, GAs save us the extra higher-degree polynomial solving we would otherwise
have needed for the optimum solution. Of course, it is only an approximate optimum, since
we would still have to predict (estimate, approximate, find, guess) the follower's reaction
function.
GAs help with all of this by simply discarding the players that do not have the genes
required to stay on top when calculating the profits over the last 10 days (our test dataset).
    return players  # tail of the (truncated) selection step, which keeps the fittest players
3. Mutating the survivors: increases the diversity of the population and provides a
mechanism for escaping from a local optimum
import copy
import random

def mutate_survivors(players):
    print("INFO: Mutating survivors..")
    mutated_players = []
    mutation_gravity_size = len(MUTATION_GRAVITIES)
    # assumed loop body (omitted in the excerpt): one randomly chosen gene
    # per player is nudged by a randomly scaled "gravity"
    for player in players:
        mutated_player = copy.deepcopy(player)
        gene_index = random.randrange(len(mutated_player["genes"]))
        random_mutation = (MUTATION_GRAVITIES[random.randrange(mutation_gravity_size)]
                           * random.uniform(-1, 1))
        mutated_player["genes"][gene_index] += random_mutation
        mutated_players.append(mutated_player)
    return mutated_players
4. Crossing the survivors: the crossover works in a subspace, and the converged
solutions/states will remain converged
def crossover_survivors(players):
    print("INFO: Crossing survivors..")
    crossed_players = []
    players_size = len(players)
    # assumed loop body (omitted in the excerpt): each survivor is crossed
    # with a randomly chosen partner
    for player_index in range(players_size):
        partner_index = random.randrange(players_size)
        player = players[player_index]
        partner = players[partner_index]
        crossed_player = create_player()
        # assumption: each gene is inherited from one of the two parents
        crossed_player["genes"] = [random.choice(pair) for pair in
                                   zip(player["genes"], partner["genes"])]
        crossed_players.append(crossed_player)
    return crossed_players
populate_world()
# assumed generation loop (the excerpt only showed its body; re-scoring of
# the new players happens elsewhere)
for generation in range(NUMBER_GENERATIONS):
    survivors = copy.deepcopy(players[:int(MAX_POPULATION * SURVIVAL_RATE)])
    players = survivors + mutate_survivors(survivors) + crossover_survivors(survivors)
    players = sorted(players, key=lambda player: player['score'], reverse=True)

print()
for player in players[:10]:
    pretty_print(player)
You will notice a couple of functions that are not explained above; those are mainly utility
functions we created to help with starting, stopping, and resuming the simulation from a
previous stage.
The above approach has achieved a score over the last 10 days of 5.6953 for 1000
generations with the following parameters:
NUMBER_GENES = 90
MAX_POPULATION = 1000
SURVIVAL_RATE = 0.2
MUTATION_TIMES = 1
CROSSOVER_TIMES = 1
NUMBER_GENERATIONS = 1000
Basically, we keep the top 20%, cross them once (adding 20% of MAX_POPULATION), and
mutate them once (adding another 20% of MAX_POPULATION).
This means that up to 40% of the population will hold newly randomised genes, which helps
with escaping local optima.
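The whole loop can be exercised end-to-end on a shrunken toy problem. The fitness below (maximise −Σ genes²) is a stand-in for the pricing profit, and the parameters are scaled down from those above; everything else mirrors the select → mutate → crossover → sort shape of the report's code.

```python
import copy
import random

# Toy GA: score, keep the top SURVIVAL_RATE, refill via mutation/crossover.
NUMBER_GENES = 10
MAX_POPULATION = 100
SURVIVAL_RATE = 0.2
NUMBER_GENERATIONS = 50

def create_player():
    return {"genes": [random.uniform(-1, 1) for _ in range(NUMBER_GENES)],
            "score": 0.0}

def score(player):
    # stand-in fitness: drive every gene towards 0 (the real code uses profit)
    player["score"] = -sum(g * g for g in player["genes"])

def mutate(player):
    child = copy.deepcopy(player)
    child["genes"][random.randrange(NUMBER_GENES)] += random.uniform(-0.1, 0.1)
    return child

def crossover(a, b):
    child = create_player()
    child["genes"] = [random.choice(pair) for pair in zip(a["genes"], b["genes"])]
    return child

random.seed(0)
players = [create_player() for _ in range(MAX_POPULATION)]
for p in players:
    score(p)
players.sort(key=lambda p: p["score"], reverse=True)
initial_best = players[0]["score"]

for generation in range(NUMBER_GENERATIONS):
    survivors = players[:int(MAX_POPULATION * SURVIVAL_RATE)]
    children = [mutate(p) for p in survivors]
    children += [crossover(p, random.choice(survivors)) for p in survivors]
    for c in children:
        score(c)
    players = sorted(survivors + children,
                     key=lambda p: p["score"], reverse=True)[:MAX_POPULATION]

best = players[0]["score"]
```

Because the survivors stay in the pool each generation (elitism), the best score never regresses between generations.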
[{
"uuid": "ab82618e-7b70-4780-a449-03df769fea6f", "genes":
[-0.7792198507229642, 0.30721460597480005, -0.2598543130783896,
0.9145952334492288, 0.5328275743414359, -0.159232316288603,
-0.13487283973192565, 0.012163721257282966, 0.7583669102412975,
-0.49570248580450876, 0.1423512986446792, -0.838728844189889,
-1.0086698494189392, -0.04004392479425422, -0.12070829415443518,
-0.5274102150322444, 0.02464044840181804, -0.2852798954086946,
-0.17428951820470162, 0.7893986096018852, -0.7655762281481764,
0.7020015533814917, 0.08821538873021101, -0.5591705835009875,
0.5774038980224234, 0.7340864488795638, -0.5925931668045539,
-0.37564234717515044, -0.3968089736424338, -0.20671495197408946,
-0.4835483838185092, 0.1398175029856883, -0.9108301742819769,
0.9413969195436646, -0.7401476706343166, -0.8721351445125972,
-0.004809706999197347, -0.0739477939830757, 0.7295979180051123,
0.25330284185246293, -0.4238206274229064, -0.906817880548792,
0.25544828545294535, -0.7348019191989479, -0.1561664861317732,
0.6559245314292919, 0.9469850007309208, 0.3318650775845497,
0.8174930804455225, -0.7378128249590794, 0.09815230376647233,
-0.4299203901629354, -0.6053968912905773, -0.18246285480556265,
-0.0012482969240358742, 0.19082315384771759, -0.6989928296496571,
-0.7687836457029888, 0.28867978747388795, 0.5491425130956895,
-0.3475365178249934, 0.22734154026156456, 0.40350684857518127,
0.6712017218771866, 0.2741565082704225, -0.12765780175238928,
0.5557726500359381, 0.7555399045878639, 0.6635898377468185,
0.4814239471610991, 0.8068524114477722, 0.8580817382689889,
0.6630226657345927, 0.38372183494765544, 0.2693872334910478,
0.7999966093592159, 0.7445311725166657, -0.9272863101553838,
0.5578125511016296, -0.06130940823444128, -0.7633021331584189,
-0.5725003538089127, 0.09674564081818846, 0.7555328097905035,
0.3645596332472994, -0.8823951788641474, -0.21528186643075894,
-0.6573034685773174, 0.2052230669301289, -0.42094652606410976],
"score": 5.695346044666315
}, {
"uuid": "d5dcb82b-50b7-41f0-91aa-4b0b1c4099d6", "genes":
[-0.7792198507229642, 0.3061116627205185, -0.25937635397280384,
0.9167647063456378, 0.5330174678278715, -0.15926752920347528,
-0.13536574217222241, 0.012484743050605642, 0.7593792176079834,
-0.499990660188219, 0.140097889190513, -0.8386734954957673,
-1.0086698494189392, -0.04004392479425422, -0.1212313660800253,
-0.5331098729168029, 0.025598903472925445, -0.28620293787183676,
-0.173909268600903, 0.7893986096018852, -0.7675979214515911,
0.698243382509277, 0.08831827813287244, -0.5593177270368913,
0.5774038980224234, 0.7340864488795638, -0.5903264347974543,
-0.3751673287438869, -0.3968089736424338, -0.20671495197408946,
-0.4835483838185092, 0.1398175029856883, -0.9108301742819769,
0.9411958522739917, -0.7389214040676966, -0.8712414878840817,
-0.004912320634089207, -0.0739477939830757, 0.7295979180051123,
0.2519192374309078, -0.4241077498312483, -0.9078608953950568,
0.25423297118685534, -0.7348019191989479, -0.15506294116367156,
0.6561277191034333, 0.9461754130335408, 0.3318650775845497,
0.8174930804455225, -0.7378128249590794, 0.0987964084912825,
-0.4299325556953197, -0.6074539754679977, -0.1791793528574648,
0.002440283743862133, 0.19285448298458885, -0.6955306322895036,
-0.771135637881789, 0.28867978747388795, 0.5491425130956895,
-0.3478876295954686, 0.22699650724210357, 0.40339244196036583,
0.6712017218771866, 0.2754596037988818, -0.1289573793019463,
0.5557701264121518, 0.7550626696221143, 0.663135832564517,
0.48310757201344107, 0.806892759583171, 0.8582257148271366,
0.6630226657345927, 0.3859643758329698, 0.27308141142035497,
0.7957979380296368, 0.7445311725166657, -0.9272863101553838,
0.556141789909618, -0.06130940823444128, -0.7628845511085585,
-0.5708262865785914, 0.09748690461995492, 0.7558971137880808,
0.3639693678377868, -0.8774240986975811, -0.2147864723297817,
-0.6611438173086644, 0.20580567542922637, -0.42051518790354686],
"score": 5.6953459370043165
}]