Report: Playing Aircraft Warfare with Reinforcement Learning
Yan Zeng
ShanghaiTech University
[email protected]
Yijie Fan
ShanghaiTech University
[email protected]
Luojia Hu
ShanghaiTech University
[email protected]
Ziang Li
ShanghaiTech University
[email protected]
Chongyu Wang
ShanghaiTech University
[email protected]
Abstract
1 Introduction
Aircraft Warfare is a classic game we all enjoyed very much, and it is also perfect for practicing what we have learned in class, as shown in Figure 1. The rules of the game are rather simple. The goal of the player, an upward-facing aircraft, is to score as many points as possible. The player earns rewards by hitting enemies, which are downward-facing aircraft, using five actions: up, down, left, right, and using the bomb. The state includes the life value and the positions of the player and enemies, among other things. The game ends when the life value drops to 0.
However, playing the game well is quite tough in difficult mode, so we turn to AI for help. To the best of our knowledge, reinforcement learning has been shown to be very successful at mastering control policies in many tasks, such as object recognition and physics-based control problems. In particular, Deep Q-Networks (DQNs) have proved effective at playing games and have even defeated top human Go players. The reason they work well is that games can quickly generate large amounts of naturally self-annotated (state, action, reward) data, which are high-quality training material for reinforcement learning. That is, this property of games allows deep reinforcement learning to obtain a large number of training samples at almost no cost. Examples include DeepMind's achievements in playing Atari games with DQN, AlphaGo defeating Go masters, and AlphaZero becoming a self-taught master of Go and chess, as well as OpenAI's results in games such as Dota 2.
2 Methodology
2.1 Approximate Q-learning
In this game, it is essential to control the distance between our aircraft and the enemy aircraft and props. Thus, we use the Manhattan distance to measure the aircraft's interaction with the external environment. For the current state s, we observe the locations of the enemies and game props nearest to the aircraft. Next, we analyze the effect of action a on the Manhattan distance between the aircraft and each of these targets. The weights corresponding to these features reflect the choices and trade-offs the aircraft makes in various situations. The purposes of the plane's actions can be summarized as attacking, dodging, and picking up props; all of these affect the value of Q(s, a) and the training of approximate Q-learning. A sketch of such features and the resulting linear Q-value is given below.
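The following is a minimal sketch of what these Manhattan-distance features might look like. The entity layout (a `state` object exposing `player`, `nearest_enemy`, and `nearest_prop` positions) and the action-to-movement mapping are our assumptions for illustration, not the report's actual game API.

```python
# Hypothetical feature extraction for approximate Q-learning.
# `state.player`, `state.nearest_enemy`, `state.nearest_prop` are assumed
# (x, y) grid positions; the real game state may differ.

def manhattan(p, q):
    """Manhattan distance between two (x, y) positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Assumed effect of each action on the player's position (bomb: no move).
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0),
         "right": (1, 0), "bomb": (0, 0)}

def features(state, action):
    """f_i(s, a): distances to the nearest enemy/prop after taking `action`."""
    dx, dy = MOVES[action]
    nxt = (state.player[0] + dx, state.player[1] + dy)
    return {
        "bias": 1.0,
        "dist_to_enemy": manhattan(nxt, state.nearest_enemy),
        "dist_to_prop": manhattan(nxt, state.nearest_prop),
    }

def q_value(weights, state, action):
    """Linear approximation: Q(s, a) = sum_i w_i * f_i(s, a)."""
    return sum(weights.get(k, 0.0) * v for k, v in features(state, action).items())
```

With this linear form, learning the trade-off between attacking, dodging, and collecting props reduces to learning the sign and magnitude of each feature weight.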
2.1.2 Training
In the training phase, we train the model with multiple hyperparameter settings and update the weights according to the formulas below:
$$\text{difference} = \left[\, r + \gamma \max_{a'} Q(s', a') \,\right] - Q(s, a)$$
$$w_i \leftarrow w_i + \alpha \cdot \text{difference} \cdot f_i(s, a)$$
We use these two successive states together to obtain the difference and update the parameters of the model. Here $\gamma$ is the discount factor and $\alpha$ is the learning rate; adjusting these two parameters helps our model converge.
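A sketch of one such update step is shown below, reusing the hypothetical `features` and `q_value` helpers from the previous snippet. The values of `GAMMA` and `ALPHA` and the `actions(s)` helper enumerating legal actions are illustrative assumptions.

```python
# One approximate Q-learning update, following the two formulas above.
GAMMA, ALPHA = 0.9, 0.01  # illustrative discount factor and learning rate

def update(weights, s, a, r, s_next, actions):
    """Apply w_i <- w_i + alpha * difference * f_i(s, a) in place."""
    # max_{a'} Q(s', a'); at a terminal state this term would be taken as 0.
    best_next = max(q_value(weights, s_next, a2) for a2 in actions(s_next))
    difference = r + GAMMA * best_next - q_value(weights, s, a)
    for k, f in features(s, a).items():
        weights[k] = weights.get(k, 0.0) + ALPHA * difference * f
```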
2.2 Deep Q-Network
A Deep Q-Network (DQN) is a form of deep reinforcement learning that combines deep learning with Q-learning. Because Q-learning cannot choose the best action when the number of state-action combinations is effectively infinite, it is reasonable to use a deep neural network to approximate the Q-function.
The network is trained to minimize the Huber loss of the temporal-difference error $\delta$,
$$L(\delta) = \begin{cases} \frac{1}{2}\delta^{2} & \text{for } |\delta| \le 1, \\[2pt] |\delta| - \frac{1}{2} & \text{otherwise.} \end{cases}$$
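Below is a minimal sketch of a DQN training step. The use of PyTorch, the network sizes, and the transition-batch shapes are our assumptions; the report does not specify a framework or architecture. PyTorch's `nn.SmoothL1Loss` is exactly the Huber loss $L(\delta)$ above.

```python
# Minimal DQN sketch (PyTorch assumed; sizes are illustrative).
import torch
import torch.nn as nn

N_FEATURES, N_ACTIONS = 8, 5  # assumed state-feature and action counts

q_net = nn.Sequential(
    nn.Linear(N_FEATURES, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),          # one Q-value per action
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
huber = nn.SmoothL1Loss()              # the Huber loss L(delta) above

def dqn_step(s, a, r, s_next, gamma=0.9):
    """One gradient step on the Huber loss of the TD error.

    s, s_next: float tensors of shape (batch, N_FEATURES)
    a: long tensor of shape (batch,); r: float tensor of shape (batch,)
    """
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a)
    with torch.no_grad():                                  # bootstrapped target
        target = r + gamma * q_net(s_next).max(dim=1).values
    loss = huber(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, DQN implementations add an experience-replay buffer and a periodically synchronized target network; those are omitted here to keep the sketch focused on the loss.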
3 Results