
COMP34120: Project 2

1. Length of the report: 3-10 pages or 1000-3000 words, but it is no problem at all if your report goes beyond this guideline.
2. Main contents of your long report could be:
● Your approaches:
○ when you present each of your approaches, please tell us the basic idea and related technical details. For example, if you need to learn the follower’s reaction function, you need to give the formula of the reaction function, such as a linear function or a polynomial, and then the corresponding learning algorithm (which can take different forms, such as the formula for calculating the model parameters, pseudocode, or a flow diagram, depending on your preference);
○ your approaches could include one or more from our lecture notes (for example as a basic approach before you propose your extended or new approaches) and/or any new approach you would like to propose;
○ As the goal of the project is to maximise the profits, your approaches need to
include the pricing optimisation part rather than just the learning part.
● Your reasons and justifications for your approaches:
○ Unlike submitted code, whose implemented approaches are justified by the empirical results of running it, the long report needs to give the motivations/justifications/reasons behind your proposed approaches. In the case that some of your approaches are only heuristics, a simple explanation of the thinking behind them will be useful and helpful.
● Your system design:
○ As your proposed approaches are potential solutions, the system design can be included to tell whether these approaches can be realised as a piece of code, and how the different methods in your approaches, such as learning of the reaction functions and pricing optimisation, can be linked together. Again, there are different forms to present your designs: for example, a high-level component diagram to show how different parts link together to realise your approach, or a detailed class diagram to show which objects and supporting methods are needed to realise it.

- SGDRegression
- SVM
- LSTM
- GA
- GANs
Basic idea

Detailed approaches + justifications

Follower’s reaction function & learning approach

Pricing optimisation vs Learning

Overview:

We have a demand function:

demand_l(price_l, price_f) = 2 - price_l + 0.3 * price_f

We have a profit function:

profit_l = (price_l - cost_l) * demand_l(price_l, price_f)
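For reference, the two functions translate directly into Python. A minimal sketch, where cost_l = 1.0 is our assumption (consistent with the (x - 1.0) factor in the GA profit code further down):

def demand_l(price_l, price_f):
    return 2.0 - price_l + 0.3 * price_f

def profit_l(price_l, price_f, cost_l=1.0):
    # leader profit: margin times quantity demanded
    return (price_l - cost_l) * demand_l(price_l, price_f)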

Learning the follower’s reaction function

Regression

1 variable SGDRegression
The most basic approach that comes to mind is running an SGDRegression on a single variable: the leader's price choices mapped to the follower's responses. The purpose of this is to learn the follower's reaction function. Once we know it, we can try to maximise the profits.
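As a sketch of how this could look with scikit-learn (whose estimator is named SGDRegressor; the data arrays here are synthetic stand-ins for the real training data):

import numpy as np
from sklearn.linear_model import SGDRegressor

# synthetic stand-ins for the observed daily prices in the training data
rng = np.random.default_rng(0)
leader_prices = rng.uniform(1.0, 2.0, 100)
follower_prices = 0.8 * leader_prices + 0.5

X = leader_prices.reshape(-1, 1)              # single feature: price_l
y = follower_prices                           # target: price_f

model = SGDRegressor(max_iter=1_000_000, tol=1e-11)
model.fit(X, y)

a, b = model.coef_[0], model.intercept_[0]    # price_f ~= a * price_l + b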

To maximise the profits, we have chosen to go with a very greedy approach: maximising the daily profits, i.e. choosing a leader price that maximises the profit for the current day.

The charts generated for Followers 1, 2, and 3 look something like this:


Since what we are trying to do is learn the follower's response function, the fit above approximates (shown here for Follower 3) a linear reaction:

price_f = f(price_l) = a * price_l + b

Substituting this into the demand and profit functions above (with cost_l = 1), we get the following profit function:

profit(price_l) = (0.3a - 1) * price_l^2 + (-0.3a + 0.3b + 3) * price_l + (-2 - 0.3b)

In this case, we find that choosing price_l = 1.8241 gives us a daily profit of 0.6482,
which sums to 19.446 over the last 30 days.
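Since the coefficient of price_l^2 is negative whenever a < 10/3, the parabola opens downward and the maximiser is simply its vertex. A minimal sketch:

def optimal_price(a, b):
    # coefficients of profit(price_l) = alpha * p^2 + beta * p + gamma
    alpha = 0.3 * a - 1.0
    beta = -0.3 * a + 0.3 * b + 3.0
    gamma = -2.0 - 0.3 * b
    p_star = -beta / (2.0 * alpha)            # vertex of the parabola
    return p_star, alpha * p_star ** 2 + beta * p_star + gamma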

Under our assumption that the follower reacts only to the current leader price, the optimal price_l above stays constant throughout the game.
However, this assumption will most often not hold. Most followers will have more complex functions for choosing today's price, probably taking some inputs from past days into account as well.

2 variable SGDRegression
The obvious next step is to add one more feature and see how the model behaves. In this case, we will try to fit

price_f = f(price_l) = a * price_l^2 + b * price_l + c


Assuming that the follower takes into account the difference from the previous leader price, we get the following charts:

We notice better “trend following”. Obviously, the more variables we add, and the more we train the system (in this case 1,000,000 iterations with a tol of 1e-11), the more we run the risk of overfitting. In this basic scenario, where the follower does not change its strategy, overfitting is not that bad.

As above, we would like to solve our demand/profit equation for the newly found function.
The weights we found for the first follower are:

A = 0.69144258
B = -0.17855508
C = 0.5251335

which give us a daily profit of 0.7109 at a price_l of 1.9343.
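With the quadratic reaction substituted in, the profit becomes a cubic in price_l, so a bounded numeric search is the simplest optimisation route. A sketch; the [0, 3] search window is our own assumption, since the cubic is unbounded above:

from scipy.optimize import minimize_scalar

def profit(p, a, b, c, cost_l=1.0):
    price_f = a * p ** 2 + b * p + c          # learned quadratic reaction
    return (p - cost_l) * (2.0 - p + 0.3 * price_f)

def best_daily_price(a, b, c):
    # maximise profit by minimising its negation over the price window
    res = minimize_scalar(lambda p: -profit(p, a, b, c),
                          bounds=(0.0, 3.0), method='bounded')
    return res.x, -res.fun                    # (best price_l, daily profit)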

N Variable Regression
However, let's move on to the first dataset, which maps less obviously onto a trendline.

For this, we take the polynomial model through degrees 1, 2, 7, and finally 21.
It is easy to see what is happening: as we increase the degree, we are able to better map the peaks and irregularities in the dataset. Again, we run into the overfitting issue.
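As a sketch, the degree sweep could be implemented with scikit-learn as below (X and y as in the earlier snippet; the scaling step is our addition, to keep SGD numerically stable at degree 21):

from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

def fit_polynomial(X, y, degree):
    # expands price_l into [price_l, price_l^2, ..., price_l^degree]
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        SGDRegressor(max_iter=1_000_000, tol=1e-11),
    )
    return model.fit(X, y)

models = {d: fit_polynomial(X, y, d) for d in (1, 2, 7, 21)}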

LSTM
Another way to approach this project is through LSTMs.
An LSTM model is the first idea that comes to mind when discussing the extraction of information from sequential data. At its core, it is an RNN that addresses the long-term dependency problem. Intuitively, it works by keeping track of which previous information is most relevant to the current data. The benefit of using a type of RNN lies in its feedback connections. Unlike a regular feedforward neural network, an RNN allows training on, and extracting information from, not only fixed chunks of data but sequences as well.
The chart above shows the follower's actual price against our prediction. We notice decent trend-line following in both the training set (orange) and the test phase (green).

Our architecture was pretty simplistic:

from keras.models import Sequential
from keras.layers import Dense, LSTM

# define model
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

We also used a 90-10 train/test split over the 100 days of data available, training for 1000 epochs with a batch size of 1.
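A sketch of the sliding-window preparation this implies, where prices is a hypothetical 1-D array of the 100 daily follower prices and the shapes match the input_shape=(1, look_back) above:

import numpy as np

def create_dataset(series, look_back=1):
    # each sample: look_back consecutive prices; target: the next day's price
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    return np.array(X).reshape(-1, 1, look_back), np.array(y)

split = int(len(prices) * 0.9)                # 90-10 train/test split
trainX, trainY = create_dataset(prices[:split], look_back)
testX, testY = create_dataset(prices[split:], look_back)

model.fit(trainX, trainY, epochs=1000, batch_size=1, verbose=0)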

GAs: Genetic Algorithms


We have to be honest: GAs are a sweetheart of ours. The idea is pretty straightforward.
Since we cannot run our algorithm against the platform due to a missing-class error, we consider the first 90 days of our training data and validate with the remaining 10. This means that we will be trying to maximise the profits for the last 10 days of the dataset while using only the first 90 points for our training.
Of course, after each day we slide the window forward, since one more data point has been explored.

By considering the previous 90 days, we simply mean spawning a random selection of 90 genes and assigning each one to the corresponding day; hence our leader-price function find_x becomes:

def find_x(genes, day):
    # the leader price for `day`: a gene-weighted sum of the
    # previous NUMBER_GENES data points
    x = 0.0
    for i in range(NUMBER_GENES):
        x += genes[i] * trainingX[i + day - NUMBER_GENES]
    return x

And our score function and profit per day:

def calculate_profit_per_day(x, y):
    # x = leader price, y = follower price; the leader's unit cost is 1.0
    return (x - 1.0) * (2.0 - x + 0.3 * y)

def score_player(player):
    score = 0.0
    # total profit over the 10 validation days after the training window
    for i in range(10):
        x = find_x(player['genes'], NUMBER_GENES + i)
        y = trainingY[NUMBER_GENES + i]
        score += calculate_profit_per_day(x, y)
    return score

This means that each player in the population ends a generation with a score, based on which we decide whether to kill it or not (or, for a PG version, whether to discard it).

In this way, GAs spare us the extra higher-degree polynomial solver we would otherwise have needed for the optimum solution. Of course, it is only an approximate optimum, since we would still have to predict (estimate, approximate, find, guess) the follower's reaction function.

GAs help with all of this by simply discarding the players that do not have the genes required to stay on top when the profits are calculated over the last 10 days (our test dataset).

As a quick note, Genetic Algorithms have 4 main parts:


1. Populating the world:

def populate_world():
    print("INFO: Seeding..")

    seeds_needed = MAX_POPULATION - len(players)

    for i in range(seeds_needed):
        players.append(create_player())

    print(f"INFO: \t\tAdded { seeds_needed } seed players to the world")
2. Killing the underperformers:

def kill_the_masses(players):
    print("INFO: Killing the masses..")
    players = players[0 : int(MAX_POPULATION * SURVIVAL_RATE)]

    print(f"INFO: \t\tKeeping top { SURVIVAL_RATE } of the population. Remaining: { len(players) }")

    return players
3. Mutating the survivors: increases the diversity of the population and provides a mechanism for escaping from a local optimum.

def mutate_survivors(players):
    print("INFO: Mutating survivors..")

    mutated_players = []
    mutation_gravity_size = len(MUTATION_GRAVITIES)

    for mutation_pass in range(MUTATION_TIMES):
        for player in players:
            mutated_player = create_player()
            mutated_player["genes"] = copy.deepcopy(player["genes"])

            # perturb every gene by a random amount scaled by a
            # randomly chosen mutation gravity
            for gene_index in range(NUMBER_GENES):
                mutation_gravity_index = randint(0, mutation_gravity_size - 1)
                random_mutation = (random() - 0.5) * MUTATION_GRAVITIES[mutation_gravity_index]
                mutated_player["genes"][gene_index] += random_mutation

            mutated_players.append(mutated_player)

    print(f"INFO: \t\tMutated new players { len(mutated_players) }")

    return mutated_players

4. Crossing the survivors: the crossover works in a subspace, and the converged solutions/states will remain converged.

def crossover_survivors(players):
    print("INFO: Crossing survivors..")

    crossed_players = []
    players_size = len(players)

    for crossover_pass in range(CROSSOVER_TIMES):
        for player_index in range(players_size):
            # pick a random partner other than the player itself
            partner_index = player_index
            while partner_index == player_index:
                partner_index = randint(0, players_size - 1)

            player = players[player_index]
            partner = players[partner_index]

            crossed_player = create_player()

            # uniform crossover: each gene comes from either parent with probability 0.5
            for gene_index in range(NUMBER_GENES):
                crossed_player['genes'][gene_index] = (partner, player)[random() < 0.5]['genes'][gene_index]

            crossed_players.append(crossed_player)

    print(f"INFO: \t\tCrossed new players { len(crossed_players) }")

    return crossed_players

And, of course, the actual implementation of the game:

for generation_number in range(NUMBER_GENERATIONS):
    print()
    print(f"Generation { generation_number } started..")

    populate_world()

    for i in range(MAX_POPULATION):
        players[i]['score'] = score_player(players[i])

    players = sorted(players, key=lambda player: player['score'], reverse=True)
    players = kill_the_masses(players)

    survivors = copy.deepcopy(players)

    save_survivors(players, generation_number + 1)

    players.extend(mutate_survivors(survivors))
    players.extend(crossover_survivors(survivors))
    players = sorted(players, key=lambda player: player['score'], reverse=True)

    print()
    for player in players[:10]:
        pretty_print(player)

You will notice a couple of functions that are not explained above. Those are mainly utility functions we created to help with starting, stopping, and resuming the simulation from a previous stage.

//TODO: Add charts: bar chart with overall players, plot chart with highest score converging,
etc

The above approach achieved a score of 5.6953 over the last 10 days, running for 1000 generations with the following parameters:

NUMBER_GENES = 90

MAX_POPULATION = 1000
SURVIVAL_RATE = 0.2
MUTATION_TIMES = 1
CROSSOVER_TIMES = 1
NUMBER_GENERATIONS = 1000

MUTATION_GRAVITIES = [0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001,
                      0.001, 0.001, 0.001, 0.01, 0.01, 0.01, 0.1]

Basically, we are keeping the top 20%, crossing them once (adding 20% of MAX_POPULATION) and mutating them once (another 20% of MAX_POPULATION).

This means that up to 40% of the population holds newly randomised genes, which helps with escaping local optima.

An example of the logs from the running algorithm:

Generation 834 started..

INFO: Seeding..
INFO: Added 400 seed players to the world
INFO: Killing the masses..
INFO: Keeping top 0.2 of the population. Remaining: 200
INFO: Mutating survivors..
INFO: Mutated new players 200
INFO: Crossing survivors..
INFO: Crossed new players 200

{'uuid': '94a538a5-5bc0-4040-aef6-690580706fc3', 'score': 5.695345278976607}
{'uuid': 'dddebfa9-6b73-4a1e-a8e3-66efd75d2ad7', 'score': 5.6953447023642285}
{'uuid': 'e93f9a90-b184-4283-8c6a-a4db5d6bb758', 'score': 5.695344544052022}
{'uuid': '1a52fbc4-cfdc-4306-b762-680b3b9c6495', 'score': 5.6953427293005365}
{'uuid': '1c88f3c5-b16e-4a88-9bb0-66f8adf37c77', 'score': 5.695342574292074}
{'uuid': 'ce3b075d-42f6-43d3-b5b2-09f8baaf3611', 'score': 5.695342557820693}
{'uuid': '5c3e2012-9eeb-4da3-8c0c-4ab291791086', 'score': 5.695342293484843}
{'uuid': 'f74936a9-a53c-4c8e-b16c-70c0ea074676', 'score': 5.6953422857821}
{'uuid': '53b105fe-9e15-4dc3-a100-13ea5c570cfa', 'score': 5.695342275427277}
{'uuid': 'f38c459d-f61a-4d73-8f31-fc74913c9d32', 'score': 5.695342255814049}

Generation 835 started..

INFO: Seeding..
INFO: Added 400 seed players to the world
INFO: Killing the masses..
INFO: Keeping top 0.2 of the population. Remaining: 200
INFO: Mutating survivors..
INFO: Mutated new players 200
INFO: Crossing survivors..
INFO: Crossed new players 200

{'uuid': '94a538a5-5bc0-4040-aef6-690580706fc3', 'score': 5.695345278976607}
{'uuid': 'dddebfa9-6b73-4a1e-a8e3-66efd75d2ad7', 'score': 5.6953447023642285}
{'uuid': 'dd21de63-45f5-401e-897c-5a34c7faa16c', 'score': 5.695344653630906}
{'uuid': 'e93f9a90-b184-4283-8c6a-a4db5d6bb758', 'score': 5.695344544052022}
{'uuid': 'd0927de6-2f25-4451-b05d-a206e06ea13b', 'score': 5.695343625884887}
{'uuid': '1a52fbc4-cfdc-4306-b762-680b3b9c6495', 'score': 5.6953427293005365}
{'uuid': '1c88f3c5-b16e-4a88-9bb0-66f8adf37c77', 'score': 5.695342574292074}
{'uuid': 'ce3b075d-42f6-43d3-b5b2-09f8baaf3611', 'score': 5.695342557820693}
{'uuid': '5c3e2012-9eeb-4da3-8c0c-4ab291791086', 'score': 5.695342293484843}
{'uuid': 'f74936a9-a53c-4c8e-b16c-70c0ea074676', 'score': 5.6953422857821}

Generation 836 started..

INFO: Seeding..
INFO: Added 400 seed players to the world

And a sample of the survivors:

[{
"uuid": ​"ab82618e-7b70-4780-a449-03df769fea6f"​, "genes":
[-0.7792198507229642, 0.30721460597480005, -0.2598543130783896,
0.9145952334492288, 0.5328275743414359, -0.159232316288603,
-0.13487283973192565, 0.012163721257282966, 0.7583669102412975,
-0.49570248580450876, 0.1423512986446792, -0.838728844189889,
-1.0086698494189392, -0.04004392479425422, -0.12070829415443518,
-0.5274102150322444, 0.02464044840181804, -0.2852798954086946,
-0.17428951820470162, 0.7893986096018852, -0.7655762281481764,
0.7020015533814917, 0.08821538873021101, -0.5591705835009875,
0.5774038980224234, 0.7340864488795638, -0.5925931668045539,
-0.37564234717515044, -0.3968089736424338, -0.20671495197408946,
-0.4835483838185092, 0.1398175029856883, -0.9108301742819769,
0.9413969195436646, -0.7401476706343166, -0.8721351445125972,
-0.004809706999197347, -0.0739477939830757, 0.7295979180051123,
0.25330284185246293, -0.4238206274229064, -0.906817880548792,
0.25544828545294535, -0.7348019191989479, -0.1561664861317732,
0.6559245314292919, 0.9469850007309208, 0.3318650775845497,
0.8174930804455225, -0.7378128249590794, 0.09815230376647233,
-0.4299203901629354, -0.6053968912905773, -0.18246285480556265,
-0.0012482969240358742, 0.19082315384771759, -0.6989928296496571,
-0.7687836457029888, 0.28867978747388795, 0.5491425130956895,
-0.3475365178249934, 0.22734154026156456, 0.40350684857518127,
0.6712017218771866, 0.2741565082704225, -0.12765780175238928,
0.5557726500359381, 0.7555399045878639, 0.6635898377468185,
0.4814239471610991, 0.8068524114477722, 0.8580817382689889,
0.6630226657345927, 0.38372183494765544, 0.2693872334910478,
0.7999966093592159, 0.7445311725166657, -0.9272863101553838,
0.5578125511016296, -0.06130940823444128, -0.7633021331584189,
-0.5725003538089127, 0.09674564081818846, 0.7555328097905035,
0.3645596332472994, -0.8823951788641474, -0.21528186643075894,
-0.6573034685773174, 0.2052230669301289, -0.42094652606410976], "score":
5.695346044666315
}, {
"uuid": ​"d5dcb82b-50b7-41f0-91aa-4b0b1c4099d6"​, "genes":
[-0.7792198507229642, 0.3061116627205185, -0.25937635397280384,
0.9167647063456378, 0.5330174678278715, -0.15926752920347528,
-0.13536574217222241, 0.012484743050605642, 0.7593792176079834,
-0.499990660188219, 0.140097889190513, -0.8386734954957673,
-1.0086698494189392, -0.04004392479425422, -0.1212313660800253,
-0.5331098729168029, 0.025598903472925445, -0.28620293787183676,
-0.173909268600903, 0.7893986096018852, -0.7675979214515911,
0.698243382509277, 0.08831827813287244, -0.5593177270368913,
0.5774038980224234, 0.7340864488795638, -0.5903264347974543,
-0.3751673287438869, -0.3968089736424338, -0.20671495197408946,
-0.4835483838185092, 0.1398175029856883, -0.9108301742819769,
0.9411958522739917, -0.7389214040676966, -0.8712414878840817,
-0.004912320634089207, -0.0739477939830757, 0.7295979180051123,
0.2519192374309078, -0.4241077498312483, -0.9078608953950568,
0.25423297118685534, -0.7348019191989479, -0.15506294116367156,
0.6561277191034333, 0.9461754130335408, 0.3318650775845497,
0.8174930804455225, -0.7378128249590794, 0.0987964084912825,
-0.4299325556953197, -0.6074539754679977, -0.1791793528574648,
0.002440283743862133, 0.19285448298458885, -0.6955306322895036,
-0.771135637881789, 0.28867978747388795, 0.5491425130956895,
-0.3478876295954686, 0.22699650724210357, 0.40339244196036583,
0.6712017218771866, 0.2754596037988818, -0.1289573793019463,
0.5557701264121518, 0.7550626696221143, 0.663135832564517,
0.48310757201344107, 0.806892759583171, 0.8582257148271366,
0.6630226657345927, 0.3859643758329698, 0.27308141142035497,
0.7957979380296368, 0.7445311725166657, -0.9272863101553838,
0.556141789909618, -0.06130940823444128, -0.7628845511085585,
-0.5708262865785914, 0.09748690461995492, 0.7558971137880808,
0.3639693678377868, -0.8774240986975811, -0.2147864723297817,
-0.6611438173086644, 0.20580567542922637, -0.42051518790354686],
"score": 5.6953459370043165
}]
