Generating Intelligent Agent Behaviors in Multi-Agent Game AI Using Deep Reinforcement Learning Algorithm
Corresponding Author:
Rosalina
Informatics Study Program, Faculty of Computing, President University
Jl. Ki Hajar Dewantara, Cikarang Baru, Bekasi, Jawa Barat, Indonesia
Email: [email protected]
1. INTRODUCTION
Technology is developed to help humans perform complicated work, whether it involves dangerous tasks or complex computation. Inventions such as computers and smartphones are good examples of technological developments that enable humans to work in a smart, simple, and efficient manner through a variety of smart programs. One example of such a program is Google Assistant, the virtual intelligence assistant developed by Google, which can recognize and process our voice as input. This program has trainable intelligence that becomes better at recognizing voices and processing tasks given a decent amount of input and training time. This man-made intelligence is called artificial intelligence (AI).
Google DeepMind and OpenAI are companies that have shown AI's potential to solve problems that can be trained in a simulated environment. Reinforcement learning (RL), one of the machine learning (ML) methods, is used by these companies to train expert agents that outperform humans in games. Games are used for training RL agents because they capture complex and high-dimensional real-world data [1]–[9]. By using games, RL researchers can avoid the high experimental cost of training an agent to perform intelligent tasks [10]. However, RL applications are still hampered by high sample complexity, especially in multi-agent systems. To address this problem, Loftin et al. [11] formally define the concept of strategically efficient exploration in Markov games and use it to create two finite Markov game learning algorithms.
The combination of RL with deep neural networks in an end-to-end framework, known as deep reinforcement learning (DRL), significantly enhances the generalization and scalability of conventional RL algorithms by enabling agents to make decisions in high-dimensional state spaces, such as playing video games, controlling robots, and making decisions in various real-world applications. The studies in [12], [13] examine the evolution of DRL research with a particular emphasis on AlphaGo and AlphaGo Zero. Li [14] provides an overview of DRL achievements, discussing key components, significant mechanisms, and a range of applications. Meanwhile, Perakam et al. [15] employ a deep-learning model that learns control policies from high-dimensional sensory input: a convolutional neural network trained with a variation of Q-learning, with raw pixels as its input and an estimate of future rewards as its output. More recently, RL algorithms were implemented in [15]–[22], mostly aiming to generate intelligent agent behavior. This research implements RL as the ML algorithm for creating an agent that can outperform humans in an Atari game, using the algorithm proposed in [23]; the agent is trained on the environment's raw pixel images and the action list information.
2. RESEARCH METHOD
2.1. The Markov property
The states in RL should satisfy the Markov property. The Markov property states that the current state completely characterizes the state of the world; in other words, given the current state, the future is conditionally independent of the past [24]. In the agent's environment, the agent transitions to another state through the action it takes. If the next state can be predicted without depending on the events that preceded the current state, the property can be expressed mathematically as in (1):

$$p(s_{t+1} \mid s_t, a_t) = p(s_{t+1} \mid s_1, a_1, \ldots, s_t, a_t) \quad (1)$$

The initial state is sampled from the initial state distribution, $s_0 \sim p(s_0)$; the action is sampled from the policy given the state, $a_t \sim \pi(\cdot \mid s_t)$; and the next state is sampled from the transition probability distribution, $s_{t+1} \sim p(\cdot \mid s_t, a_t)$.
The Q-value function at state $s$ and action $a$ is the expected cumulative (discounted) reward obtained by taking action $a$ in state $s$ and following the policy thereafter.
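A standard way to write this definition, consistent with the discounted-return setting used above, is the following sketch (the expectation is over trajectories generated by the policy $\pi$; the optimal value $Q^{*}$ is obtained by maximizing over policies):

$$Q^{\pi}(s, a) = \mathbb{E}\left[\sum_{t \geq 0} \gamma^{t} r(s_t, a_t) \,\middle|\, s_0 = s,\ a_0 = a,\ \pi\right], \qquad Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a)$$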
2.4. Q-learning
Q-learning uses an action-value function, Q, to approximate the optimal action-value function, Q*. It builds on the Bellman equation, a recursive relation that is widely used to solve optimization problems and is a central tool of dynamic programming. The form of the equation used in RL is shown in (5):

$$Q(s, a) = r(s, a) + \gamma \max_{a'} Q(s', a') \quad (5)$$
Equation (5) is the Bellman equation rearranged to take a state-action pair. The Q value, Q(s, a), is computed by adding the immediate reward r(s, a) to the highest Q value attainable from the next state s' (reached by taking action a), multiplied by a discount factor gamma (γ). The discount factor, a number between 0 and 1, controls the relative importance of short-term and long-term rewards. When short-term rewards are emphasized, the agent is compelled to act greedily and grab the largest reward as soon as possible, which leads to exploitative behavior; emphasizing long-term rewards has the opposite effect. Thus, the goal of Q-learning is to maximize the achievable future cumulative reward. This characteristic is why the algorithm is described as greedy.
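To make the update rule concrete, the following is a minimal tabular Q-learning sketch. It is illustrative only, not the implementation used in this paper (which approximates Q with a neural network); the classic Gym-style environment interface, hashable discrete states, and the learning rate `alpha` are assumptions of this sketch.

```python
import random
from collections import defaultdict

def tabular_q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Illustrative tabular Q-learning; assumes discrete, hashable states."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated action value
    n_actions = env.action_space.n

    def greedy_action(state):
        # Exploit: pick the action with the highest current Q value.
        return max(range(n_actions), key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            action = env.action_space.sample() if random.random() < epsilon else greedy_action(state)
            next_state, reward, done, _ = env.step(action)
            # Bellman-style target from (5): immediate reward plus discounted best next value.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in range(n_actions))
            target = reward + gamma * best_next
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```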
The Q-learning algorithm is a reliable solution in the field of RL, especially in unexplored environments. Known for its ease of use, Q-learning has few parameters, strong exploratory capabilities, and a convergence guarantee. One of its distinguishing characteristics is that it does not require an explicit model of the environment, which makes it especially useful in situations where an accurate model is difficult or impractical to obtain [25]. Because of its effectiveness in path planning, where outcomes must be optimized, Q-learning has attracted considerable attention in academic research. Its capacity to navigate unfamiliar environments and determine optimal policies without a pre-existing model highlights the algorithm's adaptability and usefulness in many real-world circumstances.
In the deep Q-learning setting used here, $Q(s, a; \theta)$ denotes the state-action value function approximated by a neural network with trainable weights $\theta$.
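For reference, the temporal-difference objective that is typically minimized when training such a parameterized Q-network, in the spirit of the algorithm in [23], can be sketched as:

$$L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta)\right)^{2}\right]$$

where the expectation is taken over sampled transitions, in practice drawn from the replay memory introduced below.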
A key problem in training the agent is making learning independent of the order of state transitions. An agent that learns only from the immediately preceding state risks becoming trapped in an unwanted scenario because it has no other source of experience to learn from. Experience replay is therefore introduced to handle the problem stochastically: past transitions are stored and sampled at random during training. The procedure is shown in Algorithm 1.
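The following is a minimal sketch of experience replay under common assumptions (a fixed-capacity deque buffer and uniform random mini-batch sampling); it illustrates the idea rather than reproducing Algorithm 1 exactly.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer that stores transitions and samples them uniformly."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are discarded automatically

    def push(self, state, action, reward, next_state, done):
        # Store one transition observed while interacting with the environment.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the correlation between consecutive
        # transitions, which is the problem described above.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```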
The agent's performance in the initial episode is shown in Figure 2. In the first episode, the agent is still trying to explore the surrounding area (Figure 2(a)). Its movement is not smooth, meaning the agent does not take a single action per direction it is heading in (Figure 2(b)). Instead, it tries different random actions that cause it to stutter around and eventually die with a score of 90 (Figure 2(c)).
Figure 2. The agent's performance in the initial episode: (a) the interface of the agent's initial behavior, (b) the agent's collision with a ghost, and (c) the agent out of lives
The states used in this paper are the pre-processed game frames. The state space is large, as Ms. PacMan has 1,293 distinct locations in the maze. A complete state of the Ms. PacMan model consists of the locations of Ms. PacMan, the ghosts, and the power pills, along with each ghost's previous move and whether the ghost is edible.
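As an illustration of what pre-processing game frames typically involves for Atari agents in the style of [23] (grayscale conversion, downsampling, and stacking recent frames so that motion is observable), a sketch is given below; the 84×84 resolution, the 4-frame stack, and the use of OpenCV are assumptions, not a description of the exact pipeline used here.

```python
import numpy as np
import cv2  # OpenCV, assumed available for color conversion and resizing

def preprocess_frame(frame, size=(84, 84)):
    """Convert a raw RGB Atari frame into a small, normalized grayscale image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)              # drop color information
    small = cv2.resize(gray, size, interpolation=cv2.INTER_AREA)
    return small.astype(np.float32) / 255.0                     # scale pixels to [0, 1]

def stack_frames(last_frames):
    """Stack the last few pre-processed frames so the state captures movement."""
    return np.stack(last_frames, axis=-1)                       # shape: (84, 84, num_frames)
```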
The agent has nine actions that can be performed in the game, each represented by a single integer: ['NOOP', 'UP', 'RIGHT', 'LEFT', 'DOWN', 'UPRIGHT', 'UPLEFT', 'DOWNRIGHT', 'DOWNLEFT']. Ms. PacMan's rewards are obtained by gathering food (Pac-Dots), bonus fruit, and power-up items (Power Pellets), by eating a ghost, and by chain-eating ghosts. The reward list is shown in Tables 1 and 2.
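The action list above matches the discrete action set exposed by the Gym Atari interface. The sketch below, which assumes the legacy `gym` package with the Atari extras installed and the `MsPacman-v0` environment id, shows how the action and observation spaces can be inspected; it is not taken from the paper's code.

```python
import gym

# Assumes legacy gym (< 0.26) with Atari support, e.g. `pip install gym[atari]`.
env = gym.make("MsPacman-v0")

print(env.action_space)                     # Discrete(9): one integer per action
print(env.unwrapped.get_action_meanings())  # ['NOOP', 'UP', 'RIGHT', 'LEFT', 'DOWN', ...]
print(env.observation_space.shape)          # raw RGB frame of shape (210, 160, 3)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())  # reward follows Tables 1 and 2
```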
The testing was conducted on a local machine using the PyCharm IDE. Observation was carried out by inspecting the neural network's output for each game episode, analyzing the video output, and plotting gameplay graphs. The result of the first episode shows that the agent is still trying to explore the surrounding area. Its movement is not smooth, meaning the agent does not take a single action per direction it is heading in. Instead, it tries different random actions that cause it to stutter around the hall and eventually die. By the end of its lives, the agent manages to gather a score of 220 without using any of the power pills that allow Ms. PacMan to eat the ghosts without dying.
In the default setting of the algorithm, training uses 1 epoch and a learning rate of 0.0025. The number of training iterations is set to one because the training function is called repeatedly, once every 4 timesteps. The author also experimented with two other scenarios for training the agent and compared the results. The full set of training scenarios is shown in Table 3.
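A sketch of how the default setting (1 epoch, a learning rate of 0.0025, and a training call every 4 timesteps) can be wired into the interaction loop is shown below. The `agent.act` and `agent.train_on_batch` functions are hypothetical names used only to illustrate the schedule described above, and the replay buffer follows the earlier sketch.

```python
# Hypothetical configuration mirroring the default scenario described in the text.
config = {
    "epochs_per_update": 1,   # one pass over each sampled mini-batch
    "learning_rate": 0.0025,
    "train_every": 4,         # call the training function every 4 timesteps
    "batch_size": 32,         # assumed value, not stated in the text
}

def run_episode(env, agent, replay_buffer, config):
    state, done, timestep = env.reset(), False, 0
    while not done:
        action = agent.act(state)                        # e.g. epsilon-greedy selection
        next_state, reward, done, _ = env.step(action)
        replay_buffer.push(state, action, reward, next_state, done)
        timestep += 1
        # Train only every few environment steps, as in the default setting.
        if timestep % config["train_every"] == 0 and len(replay_buffer) >= config["batch_size"]:
            for _ in range(config["epochs_per_update"]):
                agent.train_on_batch(replay_buffer.sample(config["batch_size"]),
                                     lr=config["learning_rate"])
        state = next_state
```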
The first scenario, case A, uses 1 epoch and a learning rate of 0.3. Training 100 episodes with these settings took 6 hours. By its 100th gameplay, the agent had undergone 12,595 training sessions and scored 180 at the 100th episode. The scores from episode 80 to episode 110 mostly fell within the 200–250 range; the highest score the agent reached, 590, occurred at episode 97. In this case, the agent still tends to behave as it did in the initial test.
The author uses 100 epochs to test whether the agent performs better when the amount of training is increased. The second scenario, case B, uses 100 epochs and a learning rate of 0.025. Training 100 episodes took 1 day and 2 hours, or 26 hours in total. By its 100th gameplay, the agent had undergone 13,023 × 10² training sessions. In this scenario, the agent explored more consistently, clearing much of the food previously lying along the corridors. Three high scores were achieved, at episodes 85, 100, and 104, with values of 940, 1,040, and 1,340, respectively. Given the amount of computational resources consumed, however, the author concluded that this scenario was not feasible to pursue further.
For the default case, training was conducted in three phases because the author encountered a technical error that interrupted the training session. The first phase covered 547 episodes and ran for 29 hours and 27 minutes. The second phase covered 601 episodes and ran for 9 hours and 43 minutes. The third phase covered 1,001 episodes and ran for 20 hours and 8 minutes. In total, the default scenario therefore ran through 2,149 episodes in 59 hours and 19 minutes. In this scenario, the agent can reach 100 episodes per hour. However, as training went on, the training speed declined to around 35–40 episodes per hour from the 700th to the 1,001st episode of the second phase. The speed declined because the agent consumed more and more memory and storage throughout the training session: at the start of a training session the agent needs less than 2 GB of RAM, whereas halfway through training, around the 600th episode, it consumes around 6.5–8.7 GB of RAM.
In the first phase, the agent explored whatever actions it could take, which is reflected in its low scores (around 200–400 points) and stuttering behavior. The few high scores achieved at this point therefore cannot be attributed to intelligent decisions. The average score graph in Figure 3 shows the trend of improvement in the agent's performance over the 547 episodes.
In the second phase, the agent started to achieve scores above 400 points frequently, as shown in the score history graph in Figure 4. The number of high scores also improved, with the score graph extending to more than 2,500 points. The epsilon value was carried over from the progress made in the first phase, with an additional 0.05.
In the third phase of the training session, the agent was loaded with the weights and the latest epsilon value from the second phase. Its performance at the beginning of this phase is much better than at the beginning of the first phase. By the end of the third phase, the agent reached a maximum score of 4,020 points at the 462nd episode, as shown in Figure 5. Around the 700th episode, the maximum average score peaked at 877.7, and the second evaluation agent gained 651.8 evaluation points.
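To make the phase-to-phase continuation concrete, a sketch of how weights and the exploration rate might be saved and restored between phases is given below; the file format and the `agent` attributes are hypothetical and serve only to illustrate the resumption described above.

```python
import pickle

def save_checkpoint(agent, path):
    # Persist the network weights and the current epsilon so a later phase can resume.
    with open(path, "wb") as f:
        pickle.dump({"weights": agent.weights,
                     "epsilon": agent.epsilon,
                     "episode": agent.episode}, f)

def load_checkpoint(agent, path, epsilon_bonus=0.0):
    # Restore weights and epsilon; the optional bonus mirrors the +0.05 adjustment
    # described for the second phase.
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    agent.weights = checkpoint["weights"]
    agent.epsilon = min(1.0, checkpoint["epsilon"] + epsilon_bonus)
    agent.episode = checkpoint["episode"]
```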
The agent's performance in the default case shows promising behavior, indicated by its continued improvement in reaching new high scores. The average score history graph in Figure 6 gives more detail on the agent's performance and shows a good trend of improvement throughout learning. However, after the agent passes the 500th episode, its performance stagnates and nearly begins to deteriorate.
An additional experiment was carried out in a simpler game environment, Breakout. In Breakout, a layer of bricks sits along the top third of the screen, and the goal is to destroy all of the bricks with a bouncing ball that the player deflects using a horizontally moving paddle. The agent shows similar performance in Breakout, as shown by the average score graph in Figure 7: within 1,000 episodes the agent is still in the exploration phase, as indicated by the fluctuation of the average score graph.
In this research, the author implements the algorithm featured in [23]; the results reported in that paper are shown in Figure 8. However, because the experimental parameters are a mix of experimental and recommended values, the code's performance may not be optimal. Many sources offer better ways to construct the algorithm or to set the recommended parameters, but these approaches require additional expert knowledge that takes considerable time to understand and study.
Figure 3. The first phase score graph
Figure 4. The second phase score graph
Figure 5. The third phase score graph
Figure 6. The third phase average score graph
Figure 8. The average reward per episode on Breakout and Seaquest, respectively, during training, as reported in [23]
4. CONCLUSION
Even though the agent's training progress shows a good trend of improvement, this research cannot be considered successful in creating an agent that masters and exploits Ms. PacMan. The author found that the action selection mechanism works as intended: the agent first discovers new possibilities and then exploits a promising approach to obtain better results. The agent's score stagnated after passing the 500th episode and declined in the latest episodes. This pattern is assumed to occur because the agent exploits actions that do not lead to desirable results; its performance and behavior around 1,000 episodes are still imperfect, which means the agent is still learning. Hence, by exploiting undesired actions, the agent's performance degraded.
At the end of the experiment, the agent can move with less stuttering per frame, explore a large amount of the area, and use some power pills before it runs out of lives. Looking at the results of the experiment, the author believes that RL could be a great approach to solving real-world problems and encourages other students to study this field. It should be noted that further knowledge is required, as real-world problems are more complicated than a game, which means a basic RL algorithm will only serve as a foundation for solving them.
REFERENCES
[1] I. A. Kash, M. Sullins, and K. Hofmann, “Combining No-regret and Q-learning,” ArXiv-Computing Science, 2019, doi:
10.48550/ARXIV.1910.03094.
[2] S. Levine, “Unsupervised reinforcement learning,” in Proceedings of the 19th International Conference on Autonomous Agents
and MultiAgent Systems, in AAMAS ’20. Richland, SC: International Foundation for Autonomous Agents and Multiagent
Systems, May 2020, pp. 5–6, doi: 10.5555/3398761.3398766.
[3] P. Knott et al., “Evaluating the robustness of collaborative agents,” ArXiv-Computing Science, 2021, doi:
10.48550/ARXIV.2101.05507.
[4] W. Dabney, M. Rowland, M. Bellemare, and R. Munos, “Distributional reinforcement learning with quantile regression,” in
Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, Apr. 2018, pp. 1–10, doi: 10.1609/aaai.v32i1.11791.
[5] J. Beck, K. Ciosek, S. Devlin, S. Tschiatschek, C. Zhang, and K. Hofmann, “Amrl: aggregated memory for reinforcement
learning,” in International Conference on Learning Representations, 2020, pp. 1–20.
[6] I. Szita, “Reinforcement learning in games,” in Reinforcement Learning, vol. 12, M. Wiering and M. van Otterlo, Eds., in
Adaptation, Learning, and Optimization, vol. 12, Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 539–577, doi:
10.1007/978-3-642-27645-3_17.
[7] K. Hofmann, “Minecraft as AI playground and laboratory,” in Proceedings of the Annual Symposium on Computer-Human
Interaction in Play, Barcelona Spain: ACM, Oct. 2019, pp. 1–1, doi: 10.1145/3311350.3357716.
[8] M. Babaeizadeh, I. Frosio, S. Tyree, J. Clemons, and J. Kautz, “Reinforcement learning through asynchronous advantage actor-
critic on a GPU,” arXiv:1611.06256, pp. 1–12, 2016, doi: 10.48550/ARXIV.1611.06256.
[9] L. Zintgraf et al., “Exploration in approximate hyper-state space for meta reinforcement learning,” in International Conference on
Machine Learning (ICML), vol. 139, pp. 12991–13001, 2020, doi: 10.48550/ARXIV.2010.01062.
[10] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015, doi:
10.1038/nature14539.
[11] R. Loftin, A. Saha, S. Devlin, and K. Hofmann, “Strategically efficient exploration in competitive multi-agent reinforcement
learning,” arXiv:2107.14698, 2021, doi: 10.48550/ARXIV.2107.14698.
[12] D. Zhao et al., “Review of deep reinforcement learning and discussions on the development of computer Go,” Control Theory and
Applications, vol. 33, no. 6, pp. 701–717, Jun. 2016, doi: 10.7641/CTA.2016.60173.
[13] Z. Tang, K. Shao, D. Zhao, and Y. Zhu, “Recent progress of deep reinforcement learning: from AlphaGo to AlphaGo Zero,”
Control Theory and Applications, vol. 34, no. 12, pp. 1529–1546, Dec. 2017, doi: 10.7641/CTA.2017.70808.
[14] Y. Li, “Deep reinforcement learning: an overview,” arXiv:1701.07274, 2017, doi: 10.48550/ARXIV.1701.07274.
[15] S. K. U, P. S, G. Perakam, V. P. Palukuru, J. Varma Raghavaraju, and P. R, “Artificial intelligence (AI) prediction of atari game
strategy by using reinforcement learning algorithms,” in 2021 International Conference on Computational Performance
Evaluation (ComPE), Shillong, India: IEEE, Dec. 2021, pp. 536–539, doi: 10.1109/ComPE53109.2021.9752304.
[16] B. Setiaji, E. Pujastuti, M. F. Filza, A. Masruro, and Y. A. Pradana, “Implementation of reinforcement learning in 2D based
games using open AI gym,” in 2022 International Conference on Informatics, Multimedia, Cyber and Information System
(ICIMCIS), Jakarta, Indonesia: IEEE, Nov. 2022, pp. 293–297, doi: 10.1109/ICIMCIS56303.2022.10017810.
[17] J. Nogae, K. Ootsu, T. Yokota, and S. Kojima, “Comparison of reinforcement learning in game AI,” in 2022 23rd ACIS
International Summer Virtual Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed
Computing (SNPD-Summer), Kyoto City, Japan: IEEE, Jul. 2022, pp. 82–86, doi: 10.1109/SNPD-Summer57817.2022.00022.
[18] A. Das, V. Shroff, A. Jain, and G. Sharma, “Knowledge transfer between similar atari games using deep Q-networks to improve
performance,” in 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT),
Kharagpur, India: IEEE, Jul. 2021, pp. 1–8, doi: 10.1109/ICCCNT51525.2021.9580091.
[19] C. Hu et al., “Reinforcement learning with dual-observation for general video game playing,” IEEE Transactions on Games, vol.
15, no. 2, pp. 202–216, Jun. 2023, doi: 10.1109/TG.2022.3164242.
[20] P. Xu, Q. Yin, J. Zhang, and K. Huang, “Deep reinforcement learning with part-aware exploration bonus in video games,” IEEE
Transactions on Games, vol. 14, no. 4, pp. 644–653, Dec. 2022, doi: 10.1109/TG.2021.3134259.
[21] C. Fan and Y. Hao, “Precise key frames adversarial attack against deep reinforcement learning,” in 2022 10th International
Conference on Intelligent Computing and Wireless Optical Communications (ICWOC), Chongqing, China: IEEE, Jun. 2022, pp.
1–6, doi: 10.1109/ICWOC55996.2022.9809899.
[22] Y. Zhan, Z. Xu, and G. Fan, “Learn effective representation for deep reinforcement learning,” in 2022 IEEE International
Conference on Multimedia and Expo (ICME), Taipei, Taiwan: IEEE, Jul. 2022, pp. 1–6, doi: 10.1109/ICME52920.2022.9859768.
[23] V. Mnih et al., “Playing atari with deep reinforcement learning,” arXiv:1312.5602, 2013, doi: 10.48550/ARXIV.1312.5602.
[24] S. Pitis, “Rethinking the discount factor in reinforcement learning: a decision theoretic approach,” in Proceedings of the AAAI
Conference on Artificial Intelligence, vol. 33, no. 1, pp. 7949–7956, Jul. 2019, doi: 10.1609/aaai.v33i01.33017949.
[25] A. Wong, T. Bäck, A. V. Kononova, and A. Plaat, “Deep multiagent reinforcement learning: challenges and directions,” Artificial
Intelligence Review, vol. 56, no. 6, pp. 5023–5056, Jun. 2023, doi: 10.1007/s10462-022-10299-x.