
Playing Aircraft Warfare Game with Reinforcement

Learning

Yan Zeng
ShanghaiTech University
[email protected]

Yijie Fan
ShanghaiTech University
[email protected]

Luojia Hu
ShanghaiTech University
[email protected]

Ziang Li
ShanghaiTech University
[email protected]

Chongyu Wang
ShanghaiTech University
[email protected]

Abstract

1 Introduction

Aircraft Warfare is a classic game that we all enjoyed very much, and it is also well suited for practicing what we have learned in class, as shown in Figure 1. The rules of the game are rather simple. The goal of the player – an upward-facing aircraft – is to score as high as possible. The player earns reward by hitting enemies – downward-facing aircraft – using five actions, namely up, down, left, right, and using the bomb. The state includes the life value and the positions of the player and the enemies, among other information. The game ends when the life value drops to 0.
However, playing the game well is quite tough in difficult mode, so we turn to AI for help. To the best of our knowledge, reinforcement learning has proven very successful at mastering control policies in many tasks such as object recognition and physics-based control problems[]. In particular, Deep Q-Networks (DQN) have proven effective at playing games and can even defeat top human Go players[]. They work well because games can quickly generate large amounts of naturally self-annotated (state-action-reward) data, which are high-quality training material for reinforcement learning. That is, this property of games allows deep reinforcement learning to obtain a large number of training samples at almost no cost. Examples include DeepMind's achievements such as playing Atari games with DQN, AlphaGo defeating Go masters, and AlphaZero becoming a self-taught master of Go and Chess, as well as OpenAI's
main research, which is based on games, such as the progress made in Dota. For these reasons, we believe that training agents based on deep reinforcement learning techniques is a promising solution.
In this paper, we implement an AI agent for playing Aircraft Warfare with the Approximate Q-learning method and the Deep Q-learning method. For Approximate Q-learning, we extract the four most useful features, i.e., the interactions with the closest aircraft, the bomb supply, the double bullet, and the trade-off between movement and explosion, and we train our agent with an online learning method. For Deep Q-learning, we utilize a convolutional neural network that takes the screen patch as input and outputs, for each of the 6 legal actions, the expected sum of discounted rewards of taking that action in the current state. We also train two DQNs to guard against overfitting.
result...

2 Methodology
2.1 Approximate Q-learning

Since the number of states in the Aircraft Warfare Game is extremely large, we choose Approximate Q-learning for reinforcement learning.

2.1.1 Feature Extraction


Figure 1: The Aircraft Warfare Game

In total, we extracted four features: the interaction with the closest aircraft, the interaction with the bomb supply, the interaction with the double bullet, and the trade-off between movement and explosion. The following explains in detail how the features are extracted.

Q(s, a) = w_1 f_1(s, a) + w_2 f_2(s, a) + w_3 f_3(s, a) + w_4 f_4(s, a)

In this game, it is very important to control the distance between our aircraft and the enemy aircraft and props. Thus, we use the Manhattan distance to measure the interaction of the aircraft with the external environment. For the current state s, we observe the locations of the enemies and game props nearest to the aircraft. Next, we analyze the effect of action a on the Manhattan distance between the aircraft and each of these targets. The weights corresponding to these features reflect the choices and trade-offs made by the aircraft in various situations. The purpose of the plane's actions can be broadly summarized as attacking, dodging, and picking up props. All of these affect the value of Q(s, a) and the training of Approximate Q-learning.
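To make the feature-based Q-function concrete, the following Python sketch computes Q(s, a) as the weighted sum of the four features. The state layout (a dictionary of positions) and the feature formulas here are simplified illustrations rather than the full game logic; in particular, f4 is only a crude stand-in for the movement/explosion trade-off.

# A simplified sketch of the linear Q-function over the four features.
# The state layout and feature formulas are illustrative assumptions.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0), "bomb": (0, 0)}

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def extract_features(state, action):
    """f1..f4: how the action changes the Manhattan distance to each target."""
    dx, dy = ACTIONS[action]
    new_pos = (state["player"][0] + dx, state["player"][1] + dy)
    f1 = manhattan(state["player"], state["closest_enemy"]) - manhattan(new_pos, state["closest_enemy"])
    f2 = manhattan(state["player"], state["bomb_supply"]) - manhattan(new_pos, state["bomb_supply"])
    f3 = manhattan(state["player"], state["double_bullet"]) - manhattan(new_pos, state["double_bullet"])
    f4 = 1.0 if action == "bomb" else 0.0  # crude stand-in for the movement/explosion trade-off
    return [f1, f2, f3, f4]

def q_value(weights, state, action):
    """Q(s, a) = w1*f1(s, a) + w2*f2(s, a) + w3*f3(s, a) + w4*f4(s, a)."""
    return sum(w * f for w, f in zip(weights, extract_features(state, action)))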

2.1.2 Training
In the training phase, we train the agent with multiple hyperparameter settings and update the weights according to the formulas below.
\text{difference} = r + \gamma \max_{a'} Q(s', a') - Q(s, a)

Q(s, a) \leftarrow Q(s, a) + \alpha \, [\text{difference}]

w_i \leftarrow w_i + \alpha \, [\text{difference}] \, f_i(s, a)
Our training follows an online learning approach. After observing a new state s in each round, we choose the action a with the highest Q(s, a). After taking the action in the current round, we observe the next state s′. Then,
we use these two consecutive states to compute the difference and update the parameters of the model. Here, γ is the discount factor and α is the learning rate; adjusting these two parameters helps our model converge.
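As a concrete illustration, the following Python sketch performs one online update step following the formulas above. It reuses the q_value and extract_features helpers sketched earlier, and the environment interface (legal_actions, step) is an assumed placeholder rather than our actual game code.

# One online update step: act greedily, observe (s', r), then update the weights.
def online_update(weights, state, env, alpha=0.01, gamma=0.9):
    # Choose the action with the highest Q(s, a) under the current weights.
    action = max(env.legal_actions(state), key=lambda a: q_value(weights, state, a))
    next_state, reward, done = env.step(action)

    # difference = r + gamma * max_a' Q(s', a') - Q(s, a)
    best_next = 0.0 if done else max(
        q_value(weights, next_state, a) for a in env.legal_actions(next_state))
    difference = reward + gamma * best_next - q_value(weights, state, action)

    # w_i <- w_i + alpha * difference * f_i(s, a)
    for i, f in enumerate(extract_features(state, action)):
        weights[i] += alpha * difference * f
    return next_state, done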

2.2 Deep Q Network

Deep Q Network (DQN) is a deep reinforcement learning method that combines deep learning with Q-learning. Since tabular Q-learning cannot choose the best action when the number of state–action combinations is effectively infinite, it is reasonable to use a deep neural network to help determine the action.

2.2.1 DQN algorithm


In the Aircraft Warfare Game the environment is deterministic, so all the equations listed below are formulated deterministically for simplicity. Our aim is to train a policy that maximizes the cumulative discounted reward

R_{t_0} = \sum_{t=t_0}^{\infty} \gamma^{t-t_0} r_t,

where \gamma is the discount factor, a constant between 0 and 1 that ensures the sum converges. In the Q-learning algorithm, we build a table of Q-values over the combinations of states and actions, and then construct a policy that maximizes the reward:

\pi^*(s) = \arg\max_a Q^*(s, a)

However, since the number of state–action combinations is infinite in this setting, we use a neural network to approximate Q^*. By the Bellman equation, we have

Q^{\pi}(s, a) = r + \gamma Q^{\pi}(s', \pi(s'))

The gap between the two sides of this equation is the difference discussed in lecture:

\delta = Q(s, a) - \left( r + \gamma \max_{a'} Q(s', a') \right)
To minimize this difference, we use the Huber loss, which acts like the mean squared error when the error is small but like the mean absolute error when the error is large; this makes it more robust to outliers when the estimates of Q are very noisy. We calculate this loss over a batch of transitions B sampled from the replay memory:

L = \frac{1}{|B|} \sum_{(s, a, s', r) \in B} \mathcal{L}(\delta),
\qquad \text{where } \mathcal{L}(\delta) =
\begin{cases}
\frac{1}{2}\delta^2 & \text{for } |\delta| \le 1, \\
|\delta| - \frac{1}{2} & \text{otherwise.}
\end{cases}
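The following PyTorch-style sketch shows how this loss can be computed for a batch sampled from the replay memory. The batch format (stacked tensors of states, actions, next states, and rewards) is an assumption for illustration, and terminal states are ignored for brevity; PyTorch's smooth_l1_loss with its default threshold coincides with the Huber loss defined above.

import torch
import torch.nn.functional as F

def dqn_loss(policy_net, target_net, batch, gamma=0.99):
    """Huber loss over a batch of (s, a, s', r) transitions, as in the formula above.

    Assumed batch format: states [B, C, H, W], int64 actions [B],
    next_states [B, C, H, W], rewards [B].
    """
    states, actions, next_states, rewards = batch

    # Q(s, a) from the policy network for the actions that were actually taken.
    q_sa = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # r + gamma * max_a' Q(s', a') from the (temporarily fixed) target network.
    with torch.no_grad():
        target = rewards + gamma * target_net(next_states).max(dim=1).values

    # smooth_l1_loss with beta=1 equals the piecewise Huber loss above, averaged over B.
    return F.smooth_l1_loss(q_sa, target)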

2.2.2 DQN Structure and Training


Our network is a convolutional neural network that takes the screen patch as input and outputs, for each of the 6 legal actions, the expected sum of discounted rewards of taking that action in the current state. To prevent overfitting, we trained two Deep Q Networks: a policy network and a target network. They share the same structure but have different parameters. We obtain the best action from the policy network and compute \max_{a'} Q(s', a') from the target network for added stability. How does this help prevent overfitting? Suppose there were only one network: whenever its parameters are updated, not only does Q(s, a) change, but \max_{a'} Q(s', a') changes as well, so the loss we want to minimize keeps shifting. By introducing the target network, we can temporarily fix \max_{a'} Q(s', a'), which keeps the target stable and helps prevent overfitting. To explore the environment, we use the ε-greedy method when choosing actions, and the value of ε is decayed over time to lower the regret.
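A minimal PyTorch sketch of this setup is given below. The convolutional layer sizes, the ε-decay schedule, and the synchronization helper are illustrative assumptions, since the exact architecture and constants are not specified here.

import math
import random
import torch
import torch.nn as nn

EPS_START, EPS_END, EPS_DECAY = 0.9, 0.05, 1000  # assumed decay schedule
N_ACTIONS = 6

class QNetwork(nn.Module):
    """Maps a screen patch [B, 3, H, W] to Q-values for the 6 legal actions."""
    def __init__(self, h, w, n_actions=N_ACTIONS):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
        )
        def out(size):  # spatial size after the two conv layers
            return (((size - 5) // 2 + 1) - 5) // 2 + 1
        self.head = nn.Linear(32 * out(h) * out(w), n_actions)

    def forward(self, x):
        return self.head(self.conv(x).flatten(1))

def select_action(policy_net, state, step):
    """Epsilon-greedy selection on a single screen patch of shape [1, 3, H, W]."""
    eps = EPS_END + (EPS_START - EPS_END) * math.exp(-step / EPS_DECAY)  # decay over time
    if random.random() < eps:
        return random.randrange(N_ACTIONS)              # explore
    with torch.no_grad():
        return policy_net(state).argmax(dim=1).item()   # exploit via the policy network

def sync_target(policy_net, target_net):
    # Copy the policy network's parameters into the target network, temporarily
    # fixing max_a' Q(s', a') between synchronizations.
    target_net.load_state_dict(policy_net.state_dict())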

3 Result
