

Applied Machine Learning for Games: A Graduate School Course


Yilei Zeng, Aayush Shah, Jameson Thai, Michael Zyda
University of Southern California
{yilei.zeng, aayushsh, jamesont, zyda}@usc.edu

Abstract

The game industry is moving into an era where old-style game engines are being replaced by re-engineered systems with embedded machine learning technologies for the operation, analysis, and understanding of game play. In this paper, we describe our machine learning course designed for graduate students interested in applying recent advances in deep learning and reinforcement learning to gaming. This course serves as a bridge to foster interdisciplinary collaboration among graduate schools and does not require prior experience designing or building games. Graduate students enrolled in this course apply different fields of machine learning techniques, such as computer vision, natural language processing, computer graphics, human computer interaction, robotics, and data analysis, to solve open challenges in gaming. Student projects cover use-cases such as training AI bots in gaming benchmark environments and competitions, understanding human decision patterns in gaming, and creating intelligent non-playable characters or environments to foster engaging gameplay. Project demos can help students open doors for an industry career, aim for publications, or lay the foundations of a future product. Our students gained hands-on experience in applying state-of-the-art machine learning techniques to solve real-life problems in gaming.

Introduction

Applied machine learning in games is now a vividly expanding research field that provides a platform for novel vision, language, robotics, and online social interaction algorithms. Exposure to state-of-the-art research literature is an integral part of the course plan, in part because the research community is moving forward at an ever-increasing speed, and understanding several backbone papers will clarify the research question and enhance an understanding of the iterations and improvements made. Moreover, an emphasis on state-of-the-art research methods fosters an appreciation of research design and methodology and, more generally, of the importance of critical evaluation. Therefore, new ideas can be generated based on critical thinking.

As this course does not require prerequisites in machine learning, we encourage learning by doing. A self-proposed project enables the students to tailor themselves toward research-oriented, industry-oriented, or patent-oriented directions. The projects' difficulties are also dynamically adjustable to different students' learning curves and prior experience in machine learning. In this class, we intend to encourage further research into different gaming areas by requiring students to work on a semester-long research project in groups of up to 8. Students work on incorporating deep learning and reinforcement learning techniques into different aspects of game-play creation, simulation, or spectating. These projects are completely driven by the students along any direction they wish to explore. Giving the students an intrinsic motivation to engage with their favorite ideas not only makes teaching more time efficient but also bestows a long-term meaning on the course project, which will open doors for them. By having a semester-long project, students can dive deep into different algorithms. They also receive hands-on experience incorporating various ML algorithms for their use case.

Writing, presenting, and teamwork proficiency is a critical component of a higher education, and this course involves writing assignments, extensive team collaboration, and oral presentation to a public audience. Student performance on formal writing assignments, project actualization, and public presentation provides benchmarks for examining student progress, both within and across semesters.

This experience report describes a three-semester-long effort in an applied machine learning course with advanced research orientations in gaming. This course has withstood the test through in-person, hybrid, and completely online modalities. We contribute a new course design in line with the most recent advancements in the gaming research community. This course attracts and caters to mutual interests across engineering graduate programs. Of the 292 students enrolled in this course over 3 semesters, 1.3% major in Environmental Engineering, Physics, Chemistry, or Computer Networks, 1.3% in Software Engineering or High-Performance Computing, 2% in Game Development, 3.2% in Electrical Engineering, 4% in Intelligent Robotics, 7% in Computer Engineering, 9% in Applied Data Science, 9.2% in Data Science, and the majority of students, 63%, major in General Computer Science.

Students are expected to gain both creative and fun hands-on experience through a semester-long applied deep learning and reinforcement learning project. This course demonstrates the feasibility of teaching and conducting state-of-the-art applied machine learning research with mixed-focus engineering graduate students. This course also shows the capability to help students open doors for an industry career, aim for publications, or lay the foundations of a future product.

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Background

Aiming to approach Artificial General Intelligence (AGI), video games such as Atari, Doom, Minecraft, Dota 2¹, StarCraft, and driving games have been used extensively to test the performance and generalizability of deep learning and reinforcement learning methods. Following Google's AlphaGo (Silver et al. 2016), researchers have made steady progress in improving AI's game-playing capabilities. Besides creating intelligent non-player characters (NPCs), game testing and level generation have also seen advancement with deep learning for the gaming industry. Moreover, machine learning can unleash the power of data generated from millions of players worldwide. Gaming provides numerous behavioral data for online user profiling, advertisement recommendation, modeling social interactions, and understanding decision-making strategies. Apart from in-game trajectories, esports and streaming open new research opportunities for multi-modal machine learning that combines textual and audio natural language processing and computer vision with social media. Gaming-simulated interactive environments can extend beyond gaming and adopt practical values for robotics, health, and broader social good.

¹ OpenAI Five (Last accessed: 12/15/2020)

We cover all the following topics in our course. The cited works also serve as supplementary reading material, and these topics are exemplified in the Student Projects section.

Benchmark Environments and Competitions

For academic and individual researchers, the IEEE Conference on Games (COG), the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), the Conference on the Foundations of Digital Games (FDG), and the Conference on Neural Information Processing Systems (NeurIPS) host a series of annual competitions featuring deep learning and reinforcement learning algorithms for game-play generation or AI playing games.

Major technology companies have open-sourced a number of gaming AI environments to help push forward the boundaries of Artificial General Intelligence (AGI). OpenAI publicly released OpenAI Gym (Brockman et al. 2016), which incorporates the Arcade Learning Environment (ALE) that emulates Atari 2600 game-playing (Bellemare et al. 2013), robotics, and expanding third-party environments. Gym Retro (Nichol et al. 2018) extends the integration to 1000 retro games, including games from the Sega Genesis and Sega Master System, and Nintendo's NES, SNES, and Game Boy consoles. Facebook AI has released ELF: An Extensive, Lightweight, and Flexible Platform for Game Research (Tian et al. 2017), which provides three environments, i.e., MiniRTS, Capture the Flag, and Tower Defense. PySC2 is DeepMind's Python component of the StarCraft II Learning Environment (SC2LE) (Vinyals et al. 2017). STARDATA (Lin et al. 2017), a StarCraft: Brood War replay dataset, is published with the StarCraft II API. Microsoft announced Project Malmo (Johnson et al. 2016), which provides an open-source platform built on top of Minecraft. MineRL environments built on Malmo are released for NeurIPS competitions, and MineRL imitation learning datasets (Johnson et al. 2016) with over 60 million frames of recorded human player data are published to facilitate research. The Unity Machine Learning Agents Toolkit (ML-Agents) (Juliani et al. 2018) is an open-source project that enables games and simulations created by individuals to serve as environments for training intelligent agents. As an active research field, new environments and tasks emerge daily. We leave the constant learning to students as they progress through their projects.
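Many of these environments share the OpenAI Gym interface. As a minimal sketch (our illustration, assuming a Gym installation with the Atari extras; the environment id and episode loop are examples, not course material), a random agent interacting with an ALE game looks like:

```python
import gym

# Create an Atari 2600 environment exposed through the Arcade Learning Environment.
env = gym.make("Pong-v0")

obs = env.reset()  # initial observation: raw RGB screen pixels
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # random policy as a placeholder
    obs, reward, done, info = env.step(action)  # advance the emulator one frame
    total_reward += reward
env.close()
print("episode return:", total_reward)
```

Swapping the environment id is usually all that is needed to move between games, which is what makes these benchmarks convenient for coursework.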
Computer Vision & Natural Language Processing

Learning to play from pixels has become a widely accepted approach for training AI agents after DeepMind's paper on playing Atari with deep reinforcement learning (Mnih et al. 2013) using raw pixels as input. Vision-based user input augmented with automatic face and gesture recognition has enabled the fitness game genre to boom. With the pandemic in 2020, virtual reality devices and fitness gaming have offered a safe and entertaining indoor option. With the booming of streaming platforms, elaborate walk-throughs, strategies, and sentiments shared via videos provide a wealth of data for applied computer vision tasks such as motion analysis and activity recognition. Leveraging the information provided in YouTube videos, researchers can guide deep reinforcement learning explorations for games with sparse rewards (Aytar et al. 2018).

Understanding players' textual interactions, both in-game and on social networks, is crucial for gaming companies to prevent toxicity and increase inclusion. In gaming, language generation techniques are leveraged to generate narratives for interactive and creative storytelling. Text adventure games are an active task for reinforcement learning (RL) focused Natural Language Processing (NLP) researchers. Microsoft introduced TextWorld (Côté et al. 2018), a text-based game generator, as a sandbox learning environment for training and testing RL agents.

Recent progress on deep representations in both computer vision and natural language processing has enabled the exploration of issues of active perception, long-term planning, learning from interaction, and holding a dialog grounded in a simulated environment. Simulated housing environments such as the ALFRED (Action Learning From Realistic Environments and Directives) project (Shridhar et al. 2020) at the Allen Institute and Habitat Lab from Facebook research (Savva et al. 2019) serve for embodied AI tasks (e.g., navigation, instruction following, question answering), configuring embodied agents (physical form, sensors, capabilities), training these agents (via imitation or reinforcement learning), and benchmarking their performance on the defined tasks using standard metrics. The AI- and language-instructed MiniRTS project (Hu et al. 2019) from Facebook AI is similar to this initiative.

Player Modeling and Human AI Interactions

Social gaming, such as the Battle-Royale genre and Animal Crossing, has gained increasing popularity. Combined with heterogeneous data provided on social media and streaming platforms, understanding and predicting players' behavior patterns considering graph structures becomes increasingly important. The data provided by major AAA games offer resources for imitating and modeling human behaviors (Sapienza et al. 2018; Zeng 2020) and facilitate understanding of human collaborations (Zeng, Sapienza, and Ferrara 2019).

The gaming industry, with exuberant data on in-game human collaborations, makes suitable sand-box environments for conducting multi-agent interaction/collaboration research. For instance, multi-agent Hide-and-Seek (Baker et al. 2019), OpenAI Five (Berner et al. 2019), AlphaStar (Vinyals et al. 2019), Hanabi (Bard et al. 2020), and capture the flag (Jaderberg et al. 2019) are some initial attempts.

With detailed human behavior trajectories recorded as replays or demos, gaming environments provide data-intensive sources for human-computer interaction research. Recent advancements of AI in games have evolved human-computer interactions in gaming environments into human-bot interactions. As suggested in (Risi and Preuss 2020), with the increasing popularity of human/AI interactions, we will see more research on human-like NPCs and human-AI collaboration in the future.

Procedural Content Generation

Procedural Content Generation via Machine Learning (abbreviated PCGML) (Summerville et al. 2018) embraces a broadening scope, incorporating automatic generation of levels, gaming environments, characters, stories, music, and even game-play mechanics. In the future, more reliable and explainable machine learning algorithms will emerge in this direction.

Simulated Interactive Environments and beyond

Playtesting, matchmaking, and dynamic difficulty adaptation (DDA) are some other important tasks for the gaming industry to solve using machine learning.

Beyond gaming, interactive environments are used to mimic real-life scenes, such as training robots or autonomous vehicles. Interactive gaming environments can also serve as demonstrations for game-theoretic decision making that serves AI-for-social-good initiatives.

Course Design

The semester-long course comprises 15 lectures. The detailed course structure consists of weekly lectures on deep learning and reinforcement learning fundamentals, project demonstrations of how each technique is applied in gaming use cases, and openly available tools or environments. Upon the conclusion of the lecture, each team updates their weekly progress to the course instructors. Every alternate week, students give a PowerPoint presentation along with a demo of their team's progress to the entire class. We encourage the students to be prepared with questions before class to learn proactively rather than passively. The instructor evaluates the progress and provides either algorithmic or structural suggestions to facilitate their learning and project formulation every week.

We host the midterm and final in the 8th and 15th weeks. Each team presents PowerPoint slides and live or recorded demos of their project at both the midterm and the final. We also collect an Engineering Design Document (EDD) and a technical paper draft at both the midterm and the final to foster continuous contribution. We require each team to construct a website to present their project demos to help them on the job market. The grading weights are 20% for the mid-term EDD, 20% for the mid-term draft of the technical paper, 10% for the midterm presentation, 20% for the final EDD, 20% for the final technical paper, and 10% for the final presentation.

The learning objectives for the course are: (1) Students learn deep learning and reinforcement learning fundamentals through lectures and supplemental materials; (2) Students learn the most recent advancements, landscape, and applied use cases of machine learning for gaming; (3) Students can unleash their creativity in projects that cater to their career plans; (4) Students engage in teamwork and practice both oral and written presentation skills.

The course first introduces students to the difference between Artificial Intelligence, Machine Learning, and Deep Learning (Ongsulee 2017). We then cover the survey of deep learning applications in games (Justesen et al. 2019) to give students a tentative idea of projects they can pursue. Following the lecture, students must select a machine learning project and the game they will work on. The course instructors guide and instruct students' projects according to the sub-directions shown in the Background section, i.e., benchmark environments and competitions, computer vision and natural language processing, player modeling and human-AI interactions, procedural content generation, simulated interactive environments, etc.

Apart from building a new research project from scratch, students can choose to advance projects created in previous semesters for better algorithmic AI performance.

In the first half of the course, we introduce the fundamentals of deep learning. We start with the concept of backpropagation (Hecht-Nielsen 1992), which, along with gradient descent (Baldi 1995), is covered to solidify students' theoretical understanding of neural networks. The activation functions covered include the sigmoid, tanh (LeCun et al. 2012), and ReLU (Nair and Hinton 2010) functions. We cover a tutorial on combining neural networks with genetic algorithms in a simulated game environment for Flappy Bird. Students are then introduced to popular deep learning frameworks like TensorFlow and PyTorch.
tailed course structure consists of weekly lectures on deep We then move onto Convolutional Neural Networks
learning and reinforcement learning fundamentals, project (CNNs). Students are introduced to the convolution layer,
demonstrations of how each technique are applied in gaming pooling layer, and fully connected layer along with their re-
use cases and openly available tools or environments. Upon spective functionalities. We also cover appropriate activation
the conclusion of the lecture, each team updates their weekly functions and loss functions for CNNs. A brief overview of
progress to the course instructors. Every alternate week stu- state-of-art deep CNN based architectures for object detec-
dents conduct a power-point presentation along with a demo tion tasks are given to students. These include R-CNN (Gir-
shick et al. 2014), Fast R-CNN (Girshick 2015), Faster R- Reading Assignments
CNN (Ren et al. 2015) and YOLO (Redmon et al. 2016; Material for reading assignments primarily stems from An-
Redmon and Farhadi 2017, 2018). We cover a sample pro- drew Glassner’s textbook titled Deep Learning: From Ba-
gram on image classification tasks (Lee et al. 2018) using sics to Practice. This course is supplemented by various
Tensorflow. Students are encouraged to experiment with the sources, including articles on websites such as Medium,
source code and try different CNN configurations to improve TowardsDataScience, tutorials from GDC, TensorFlow, Py-
the classifier’s accuracy. torch, OpenAI Gym, ML-Agents, and survey papers of re-
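The kind of configuration students experiment with can be summarized in a few lines. Below is a minimal PyTorch sketch of the conv-pool-fully-connected pattern (our illustration of the lecture content, not the actual course sample program):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Conv -> pool -> conv -> pool -> fully connected, for 32x32 RGB images."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x32x32 -> 16x32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x16x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

model = SmallCNN()
logits = model(torch.randn(4, 3, 32, 32))  # a dummy batch of 4 images
print(logits.shape)                        # torch.Size([4, 10])
```

Paired with a cross-entropy loss on the logits, this is a standard loss-function choice for classification.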
Following CNNs, we explore different variants of the Recurrent Neural Network (RNN) (Graves 2012). RNNs are used for sequence tasks. Long short-term memory (LSTM) (Hochreiter and Schmidhuber 1997) overcomes the exploding and vanishing gradient problems (Hochreiter 1998; Pascanu, Mikolov, and Bengio 2012) of vanilla RNNs, which enables it to learn long-term dependencies more effectively. We explore a case study on an LSTM-based architecture implemented for the game FIFA 18². After 400 minutes of training, the LSTM-based bot scored 4 goals in 6 games of FIFA 18 on beginner difficulty.
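As a sketch of how such a sequence model is wired up (our simplified illustration; the FIFA 18 case study's actual inputs and action space differ), an LSTM that maps a window of per-frame feature vectors to a discrete action looks like:

```python
import torch
import torch.nn as nn

class ActionLSTM(nn.Module):
    """Map a sequence of per-frame feature vectors to a discrete action."""
    def __init__(self, feat_dim: int = 64, hidden: int = 128, n_actions: int = 8):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden)
        return self.head(out[:, -1])   # predict from the last time step

model = ActionLSTM()
frames = torch.randn(2, 16, 64)        # 2 clips of 16 frames, 64 features each
print(model(frames).shape)             # torch.Size([2, 8])
```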
Moving on, we introduce Generative Adversarial Networks (GANs) (Goodfellow et al. 2014) and their variations. We then give an example of using GANs to generate high-quality anime characters (Jin et al. 2017).
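The adversarial training loop itself fits in a short sketch. Below is one minimal GAN step on flat vectors (illustrative only; the anime-character work uses much larger convolutional generators):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, data_dim)  # stand-in for a batch of real samples

# Discriminator step: real samples labeled 1, generated samples labeled 0.
z = torch.randn(32, latent_dim)
fake = G(z).detach()              # detach: do not backprop into G here
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make D label generated samples as real.
z = torch.randn(32, latent_dim)
loss_g = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Alternating these two steps is the whole adversarial game; the variations covered in lecture mostly change the losses and architectures around it.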
In the second half of the course, we introduce the fundamentals of reinforcement learning. We start by answering the following questions: What is reinforcement learning? Why is it needed in games? What are its advantages in games? Why can't we use supervised learning in games? We then introduce the Markov Decision Process (MDP), the Partially Observable Markov Decision Process (POMDP) (Mnih et al. 2015; Astrom 1965), value iteration (Bellman 1957), and policy iteration.
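Value iteration is easy to demonstrate on a small MDP. The sketch below (our toy example, assuming a tabular MDP given as transition and reward arrays) repeatedly applies the Bellman optimality backup V(s) <- max_a sum_s' P(s'|s,a) [R(s,a,s') + gamma V(s')]:

```python
import numpy as np

# A toy MDP: P[s, a, s'] are transition probabilities, R[s, a, s'] rewards.
n_states, n_actions = 4, 2
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions, n_states))
gamma = 0.9

V = np.zeros(n_states)
for _ in range(1000):
    # Q[s, a] = sum_s' P[s, a, s'] * (R[s, a, s'] + gamma * V[s'])
    Q = (P * (R + gamma * V)).sum(axis=2)
    V_new = Q.max(axis=1)                 # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once values converge
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
print(V, policy)
```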
We move on to introduce Q-learning (Watkins 1989) and Deep Q-Networks (DQN) (Mnih et al. 2013). In 2013, a Deep Q-Network was applied to play seven Atari 2600 games (Mnih et al. 2013). In 2015, the same network was used to beat human-level performance in 49 games (Mnih et al. 2015). For this course, we ask students to refer to a sample program that uses a DQN for the Flappy Bird game³. Students are encouraged to tune the model's parameters and run the training scripts to get a better practical understanding of deep Q-learning.
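Before the deep variant, the tabular update Q(s,a) <- Q(s,a) + alpha [r + gamma max_a' Q(s',a') - Q(s,a)] can be tried on a small discrete Gym task. A sketch with assumed hyperparameters (our illustration; the Flappy Bird sample program replaces the table with a neural network):

```python
import gym
import numpy as np

env = gym.make("FrozenLake-v0")          # small discrete state/action spaces
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(20000):
    s = env.reset()
    done = False
    while not done:
        # Epsilon-greedy exploration.
        if np.random.rand() < eps:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q[s]))
        s2, r, done, _ = env.step(a)
        # Tabular Q-learning update.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2

print("greedy policy:", np.argmax(Q, axis=1))
```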
Lastly, we introduce students to policy gradient algorithms (Kakade 2002). Policy-gradient-based algorithms such as Actor-Critic (Konda and Tsitsiklis 2000; Fujimoto, Van Hoof, and Meger 2018; Mnih et al. 2016) and Proximal Policy Optimization (Schulman et al. 2017) have provided state-of-the-art performance on reinforcement learning tasks (Stooke and Abbeel 2018). A case analysis of playing TORCS, a racing car game, using policy gradients is covered⁴ to supplement the material covered in class. Students are given a chance to develop their own agents to play the game Dino Run⁵ and compete with the remainder of the class.

² FIFA 18 AI (Last accessed: 12/15/2020)
³ Flappy Bird AI (Last accessed: 12/15/2020)
⁴ TORCS AI (Last accessed: 12/15/2020)
⁵ Dino Run AI (Last accessed: 12/15/2020)
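The simplest member of this family, REINFORCE, makes the gradient-of-log-probability-times-return structure explicit. A compact sketch on CartPole (our illustration, not the TORCS or Dino Run reference code):

```python
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    s = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        s_t = torch.as_tensor(s, dtype=torch.float32)
        dist = torch.distributions.Categorical(logits=policy(s_t))
        a = dist.sample()
        log_probs.append(dist.log_prob(a))
        s, r, done, _ = env.step(int(a))
        rewards.append(r)
    # Discounted returns, computed backwards from the end of the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.as_tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # crude baseline
    # REINFORCE loss: maximize E[log pi(a|s) * return].
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```

Actor-Critic and PPO refine this same objective with learned baselines and clipped updates.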
Reading Assignments

Material for reading assignments primarily stems from Andrew Glassner's textbook titled Deep Learning: From Basics to Practice. This is supplemented by various sources, including articles on websites such as Medium and TowardsDataScience, tutorials from GDC, TensorFlow, PyTorch, OpenAI Gym, and ML-Agents, and survey papers on recent advancements in gaming AI research. These materials incorporate detailed information on implementing specific deep learning or reinforcement learning algorithms, step-by-step guides for implementing a gaming AI project from scratch, and state-of-the-art research papers as references.

Guest Lectures

We invited 2-3 guest lecturers every semester who were either experienced professionals from the gaming industry or Ph.D. students researching deep learning and reinforcement learning for games. These lecturers provided students with valuable insights into how machine learning is applied in different gaming research areas. Some of the topics covered in these lectures include applications of deep learning at Zynga, policy-gradient-based agents for Doom, and current research frontiers for machine learning in gaming. The lecturers also attended student presentations and provided students with feedback on technologies that they could utilize for their respective projects.

Student Projects

This section selects and summarizes 32 student course projects, covering various topics based on the different sub-domains illustrated earlier in the Background section.

Machine Learning for Playing Games

To train AI agents in League of Legends, one project used the YOLOv3 object detection algorithm to identify different champions and NPCs in League of Legends. They also trained two separate agents: one combining PPO and LSTM, and one supervised LSTM trained on keyboard and mouse presses captured from advanced League of Legends players. In a one-on-one custom game, the agents achieved first blood against amateur and average players, respectively (Lohokare, Shah, and Zyda 2020).

Tackling a tower defense game, one team focused on formulating a strategy for placing towers. The agent also had to monitor gold income from destroying monsters and determine the best locations and timing for tower placement as well as tower upgrades. Using a CNN, the agent is trained on summarized data of randomly generated tower placements, where each sample includes the placement of towers, the selling and upgrading of towers, and the last wave number achieved.

Scotland Yard and Ghostbusters are two similar projects that aim to train agents to play hide and seek. The agents use an asymmetric environment with incomplete information to either seek or hide from the other agent. There is one hiding player and five seeker players. For both games, the two teams built DQN-based agents with different reward-shaping functions for the seekers as well as the hider. Figure 1 shows the environment for training an agent in Scotland Yard.
Another DeepMind-inspired team explored the research and underlying architecture of the multi-agent system AlphaStar in the RTS environment StarCraft II. Specifically, the project aimed to utilize algorithms such as DQN with experience replay, CNNs, Q-learning, and behavior trees to model different agents against an AI. The team successfully trained four agents, where each agent played 100 games against an easy AI, with win rates of 13%, 68%, 96%, and 59%, respectively.

Figure 1: DQN based agent for Ghostbusters

An agent trained to play the online multiplayer game Slither.io aims to achieve a high score against other players. Applying a DQN and an epsilon-greedy learning strategy, the agent observed the game's current frame to determine a direction to move in.

Pokemon Showdown is an online game simulator for playing one-on-one Pokemon matches. With a predefined Pokemon set, an agent was trained using a DQN, with states incorporating the arena state and the active and reserve player states, to determine its next actions. Against a minimax agent, the DQN agent won 17 games out of 20, effectively learning super effective moves and generally avoiding minimally effective ones.

Donkey Kong is a 1983 Nintendo arcade game where Mario has to reach Donkey Kong while dodging barrels. Starting from a minimal interface, a team mapped and fed each object's bit locations to a Q-learning-based agent. This agent could be further broken down into a priority agent and a background agent. The project successfully produced an agent that can complete the first level of Donkey Kong.

Benchmark Environments and Competitions

MarioKart64 is a benchmark game for numerous tutorials and competitions. Using a CNN and the DAGGER algorithm, the team compared their agent's ability to recover from going off track or from immediately using power-ups. Moreover, the team applied transfer learning to a Donkey Kong racing game.

Two Pommerman teams worked on building an agent to play the NES game Bomberman. Both teams used PyTorch and TensorFlow but differed in that one focused on PPO and A2C, whereas the other team focused on Accelerated Proximal Policy Optimization (APPO). Along with different reward functions, the teams found that the PPO and APPO agents on average outperformed the A2C agent in exploring the game board, but not necessarily in laying bombs or winning the game.
Inspired by DeepMind's AlphaGo, one team tackled the game of Go with their agent. Despite the hardware difference, the team successfully trained an agent to win over amateur human players. Using greedy and simple neural network agents as a benchmark, the team's agent utilized both traditional and deep learning algorithms to outperform the baseline agents, achieving the same rank as an advanced amateur player (1 dan).

Computer Vision

Deep fake applications, which use deep learning to generate fake images or videos, have raised debates in the AI community. One project applied realistic and personalized head models in a one-shot setting as an overlay to video game characters. They picked the Unity3D Mario model for their experiment. Taking a video input, the machine learning system extracted facial landmarks on a person and mapped them to a specified character model. In mapping human features to the character Mario, the system primarily looked at detecting the head model as well as specific facial features: the eyes, nose, and mouth.

Counter-Strike: Global Offensive (CSGO) is a popular online first-person shooter game. Using object detection models based on Single-Shot Detection and RetinaNet, an agent was trained to play the game while distinguishing friends from foes. The agent also demonstrated visual transfer learning between the newer CSGO and the older Counter-Strike 1.6, where the model learned low-level features common to both CS 1.6 and CSGO.

Motion recognition is an important task in computer vision. One team developed a game from scratch while leveraging computer vision techniques in Unity 3D. The team created an underwater endless runner game where the agent must overcome random rock hurdles and collect money. Python's OpenCV package was used to detect human body movements and move the submarine correspondingly. As the human player moves left, right, up (jump), or down (crouch), the submarine responds in the same direction via TensorFlow's PoseNet.

A different motion capture project focused on pose estimation and accurate fidelity for weightlifting form. The team collected data of both good and bad forms of various exercises to be marked and fed into an OpenPose model. They tackled the project in three approaches: splitting the input into a series of periodic frames, summarizing frames, and feeding full frames into a Keras ResNet CNN model. The video is evaluated by a voting-system model that tells the user if the exercise had good or bad form.
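Several of these vision projects start from the same primitive: locating a face or body in each captured frame. As a minimal hedged sketch (assuming OpenCV with its bundled Haar cascade files; the student teams used landmark and pose models such as PoseNet and OpenPose rather than this simple detector), per-frame face detection from a webcam looks like:

```python
import cv2

# Haar-cascade face detector that ships with OpenCV installations.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Returns a list of (x, y, w, h) boxes, one per detected face.
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

The detected regions would then feed a landmark or pose model, whose outputs drive the game character or form evaluation.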
Natural Language Processing

Featuring text-to-speech and automatic speech recognition, MapChat helps users practice their English speaking skills through a quest-based role-playing game. Users can move around and complete objectives by speaking prompted phrases in specific locations using a simple map. The audio is recorded and processed to provide feedback to the user regarding how clear and cohesive the response is. Figure 2 shows the game interface developed by students for MapChat.

Figure 2: Game interface for MapChat: a game designed leveraging text-to-speech and automatic speech recognition to teach players English

Language generation is a challenging task in NLP. Utilizing FIFA and Pro Evolution Soccer commentaries, a team generated on-the-fly commentary for a football game. Applying NLP techniques, the team fed their TensorFlow model game prompts as seed words to produce relevant sentences that yielded coherent and realistic announcer prompts. Another team sourced their information from movie databases like IMDb and IMSDb. With the goal of dialogue generation, their system examines keywords, dialogues, and sentiment from game text. Using multiple models and frameworks such as Markovify, LSTM, and Watson, the team generated coherent character dialogues.

Data Science and Player Modeling

One CSGO project (Zeng et al. 2020) proposes a Sequence Reasoner with Round Attribute Encoder and Multi-Task Decoder to interpret the strategies behind round-based purchasing decisions. They adopt few-shot learning to sample multiple rounds in a match and modify the model-agnostic meta-learning algorithm Reptile for the meta-learning loop. They formulate each round as a multi-task sequence generation problem. The state representations combine an action encoder, a team encoder, player features, a round attribute encoder, and economy encoders to help their agent learn to reason in this specific multi-player, round-based scenario. A complete ablation study and comparison with the greedy approach certify the effectiveness of their model.

Instead of looking at the in-game content of CSGO, another team examined the mental state of the CSGO Twitch audience to detect and define a metric of audience immersion. Representations of different modalities are fed to a multi-modal fusion system. Representations learned through CNNs and RNNs cover three modalities, i.e., video recordings, commentary audio, and Twitch chat. The model assigns text inputs positive or negative connotations and later uses gameplay audio to capture and map audience immersion.

Procedural Content Generation

Part of the famous Mario genre, Super Mario Bros. is a side-scrolling game where levels are meticulously designed from scratch. However, with procedural generation, a level can be produced and deployed with minimal or no design changes. Using RNN, LSTM, and Markov chain models, the team mapped sequences of characters to objects in the Mario world that are later interpreted as a Mario level. Each generated level is evaluated by an A* agent to determine if the agent can complete the level. Ultimately, the Markov model produced the best ratio of completed to incomplete levels, followed by the RNN and LSTM models.
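The Markov chain component of such a level generator is simple enough to sketch. Assuming levels are encoded as strings of tile characters (our illustrative encoding, not the team's exact one), a first-order model learns tile-to-tile transition counts and samples new sequences:

```python
import random
from collections import Counter, defaultdict

# Tile alphabet: '-' air, 'X' ground, '?' question block, 'E' enemy, 'p' pipe.
training_levels = ["XXX-X?XX--XXpXX-EXXX", "XX--XXX?XXE-XXXXpXXX"]

# Count first-order transitions tile -> next tile.
transitions = defaultdict(Counter)
for level in training_levels:
    for cur, nxt in zip(level, level[1:]):
        transitions[cur][nxt] += 1

def sample_level(length: int = 20, seed_tile: str = "X") -> str:
    """Random walk over the learned transition distribution."""
    tiles = [seed_tile]
    for _ in range(length - 1):
        counts = transitions[tiles[-1]]
        if not counts:                      # unseen tile: fall back to ground
            tiles.append("X")
            continue
        population, weights = zip(*counts.items())
        tiles.append(random.choices(population, weights=weights)[0])
    return "".join(tiles)

print(sample_level())
```

Each sampled string would then be decoded back into level geometry and handed to the A* completability check the team describes.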
Generating creative art plays an important role in gaming. One team worked on constructing a GAN for character and asset creation in their custom Unity game. In addition to building GANs, the team used Unity's ML-Agents libraries as a framework to build offensive and defensive AIs, which were trained with different reward functions.

Using conditional GANs, one team augmented real videos with stylized game environments. A reference style image is used as an input to encode two vectors that feed a Gaussian-random-noise-based model to a video generator. SPADE ResNet blocks are then used to reinforce the segmentation mask and provide coherent and consistent video frames. The constructed standalone model could be used to apply a reference style to any real-world input video.
Simulated Interactive Environments

Motivated by the 2020 Australian wildfires, a team simulated a wildfire in a forest ecosystem through a Unity video game. The goal is to train an agent, represented by a firefighter dog, to find and save endangered animals from the fire without hurting itself. Using Unity's machine learning library, the team trained the agent using PPO, Behavioral Cloning (BC), and Generative Adversarial Imitation Learning (GAIL) to evaluate how long each agent takes to rescue an animal.

There were several self-driving student projects in this course. Students developed autonomous driving agents for the games Grand Theft Auto V, Mario Kart, TrackMania Nations Forever, TORCS, and the Live for Speed racing simulator. Different techniques were applied to each of these projects. Object detection algorithms such as AlexNet, VGG16, YOLOv3, and Hough transformations were implemented to detect the racing track and obstacles within the game and avoid collisions with other vehicles. DQN, imitation learning, policy gradients, and transfer learning were experimented with to train the agents to drive.

Another self-navigating team trained an autonomous quadcopter in a simulated 3D environment, shown on the left of Figure 3. The team implemented a unified quadrotor control policy using PPO, curriculum learning, and supervised learning to enable the drone to follow a specific path. This policy generalized the task as a single learning task, significantly reducing the amount of training needed. In addition to reward shaping, the team tested the model across different track environments, such as a 2D square, a 2D square diagonal, and a descending ellipse.

Apart from simulating drones, students also explored simulated humanoids in robotics, seen on the right of Figure 3. Using PyBullet, the agent was trained in varying environments to move a ball to a designated area. Applying reinforcement learning algorithms like PPO and A2C, the team simulated two movements, hopping and walking, in different wind environments for each movement. The team defined reward functions based on how long the agent remains alive, how close it is to the target location, and how much movement is taken to achieve the goal.

Figure 3: Example projects simulating and training robotics agents in OpenAI Gym. (a) Quad-copter Project; (b) Humanoid Project.

Table 1: Our evaluation for the class is based on two sets of surveys containing five questions in total. The detailed survey statistics are listed below each question.

Q1. Your initial learning motivation for taking this class? (Sample size 93.) 52% AI in Interactive Entertainment; 38% Computer Vision; 29% Game Theory, Multi-agent Systems; 23% NLP, Data Science, Human-Computer Interaction; 12% Procedural Content Generation; 5% Gaming Benchmark Environments and Competitions.

Q2. What have you learned most from this class? (Can choose more than one, sample size 55.) 75% Hands-on Team Project Experiences; 74% Applied Machine Learning for Gaming Tasks; 53% Deep Learning and Reinforcement Learning Theory.

Q3. What did you struggle with most in class? (Can choose more than one, sample size 55.) 38% Division of Labor within Team; 36% Applying Deep Learning Algorithms; 31% Finding the Topic; 16% Finding the Team; 9% Lost during Weekly Progress.

Q4. On a scale of 1 (low) to 5 (high), how would you recommend this class to your friends? (Sample size 55.) 1: 0%; 2: 2%; 3: 9%; 4: 29%; 5: 60%.

Q5. On a scale of 1 (low) to 5 (high), how do you think this class will help you find a full-time job or internship? (Sample size 55.) 1: 0%; 2: 5%; 3: 33%; 4: 51%; 5: 11%.

Resources

For this course, we provide each student with $50 of Google Cloud Platform (GCP) credit to be utilized for training deep learning algorithms. This sums to $300 for a team of 6 students. In addition to this, students are provided laboratory access to high-end Windows systems. These systems are equipped with NVIDIA GTX 1080 GPUs, 32GB of RAM, and Intel i7 7th-generation processors. Students may access the labs at any time to train their machine learning algorithms.

This course structure withstood the challenges of transitioning from in-person to hybrid and eventually to fully online modalities throughout the COVID-19 pandemic. The resistance to risks is mainly credited to the extensive use of Google Cloud services, GitHub for code version control, Slack and Zoom for instant communication, as well as Piazza and Blackboard for course logistics. The semester-long team project has also provided flexible but adjustable difficulties for both students and instructors.

Evaluations and Conclusion

Table 1 shows our survey results to evaluate the course. A majority of students gave high ratings for recommending this course to other students, the usefulness of this course for finding an internship or a full-time job, and learning from team projects to get hands-on applied machine learning experience. The survey results indicate positive feedback for the course.

From a teaching perspective, we encountered three challenges: how to balance theoretical deep learning and reinforcement learning lecture materials with applied environment demonstrations; how to create an adaptive learning curve for students with varying machine learning backgrounds; and how to form an innovative research pipeline at the graduate school level to facilitate publications. Throughout the three semesters, we learned that more visual aids, such as live demonstrations and videos, are needed to increase online engagement as we move into online virtual courses. Weekly project demonstrations in front of the whole class create a healthy peer effect that increases learning efficacy. Within three semesters, three research conference papers have been published, with more in preparation. From the students' self-proposed projects, we strengthened our belief that gaming, as an interdisciplinary research domain, can reach other fields, such as robotics, medical diagnosis, human-computer interaction, etc. Games are the testbeds for advancing state-of-the-art learning algorithms. In the future, the class can benefit from state-of-the-art paper reading sessions and live coding demonstrations to help graduate students build a comprehensive understanding of how a research project is built.

This report summarizes the design of our applied machine learning course for graduate students interested in applying recent machine learning advancements towards gaming. We engage students in learning deep learning and reinforcement learning through practical team projects, regardless of major and machine learning expertise level. We familiarize students with the current applied research landscape and improve students' oral and written presentation skills. Our course can help students open doors for an industry career, aim for publications, or lay the foundations of future innovative products.
References

Astrom, K. J. 1965. Optimal control of Markov decision processes with incomplete state estimation. J. Math. Anal. Applic. 10: 174–205.

Aytar, Y.; Pfaff, T.; Budden, D.; Paine, T.; Wang, Z.; and de Freitas, N. 2018. Playing hard exploration games by watching YouTube. In Advances in Neural Information Processing Systems, 2930–2941.

Baker, B.; Kanitscheider, I.; Markov, T.; Wu, Y.; Powell, G.; McGrew, B.; and Mordatch, I. 2019. Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528.

Baldi, P. 1995. Gradient descent learning algorithm overview: A general dynamical systems perspective. IEEE Transactions on Neural Networks 6(1): 182–195.

Bard, N.; Foerster, J. N.; Chandar, S.; Burch, N.; Lanctot, M.; Song, H. F.; Parisotto, E.; Dumoulin, V.; Moitra, S.; Hughes, E.; et al. 2020. The Hanabi challenge: A new frontier for AI research. Artificial Intelligence 280: 103216.

Bellemare, M. G.; Naddaf, Y.; Veness, J.; and Bowling, M. 2013. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47: 253–279.

Bellman, R. 1957. A Markovian decision process. Journal of Mathematics and Mechanics 679–684.

Berner, C.; Brockman, G.; Chan, B.; Cheung, V.; Dębiak, P.; Dennison, C.; Farhi, D.; Fischer, Q.; Hashme, S.; Hesse, C.; et al. 2019. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680.

Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; and Zaremba, W. 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540.

Côté, M.-A.; Kádár, A.; Yuan, X.; Kybartas, B.; Barnes, T.; Fine, E.; Moore, J.; Tao, R. Y.; Hausknecht, M.; Asri, L. E.; Adada, M.; Tay, W.; and Trischler, A. 2018. TextWorld: A Learning Environment for Text-based Games. CoRR abs/1806.11532.

Fujimoto, S.; Van Hoof, H.; and Meger, D. 2018. Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477.

Girshick, R. 2015. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, 1440–1448.

Girshick, R.; Donahue, J.; Darrell, T.; and Malik, J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 580–587.

Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2672–2680.

Graves, A. 2012. Supervised sequence labelling. In Supervised Sequence Labelling with Recurrent Neural Networks, 5–13. Springer.

Hecht-Nielsen, R. 1992. Theory of the backpropagation neural network. In Neural Networks for Perception, 65–93. Elsevier.

Hochreiter, S. 1998. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6(02): 107–116.

Hochreiter, S.; and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8): 1735–1780.

Hu, H.; Yarats, D.; Gong, Q.; Tian, Y.; and Lewis, M. 2019. Hierarchical decision making by generating and following natural language instructions. In Advances in Neural Information Processing Systems, 10025–10034.

Jaderberg, M.; Czarnecki, W. M.; Dunning, I.; Marris, L.; Lever, G.; Castaneda, A. G.; Beattie, C.; Rabinowitz, N. C.; Morcos, A. S.; Ruderman, A.; et al. 2019. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364(6443): 859–865.

Jin, Y.; Zhang, J.; Li, M.; Tian, Y.; Zhu, H.; and Fang, Z. 2017. Towards the automatic anime characters creation with generative adversarial networks. arXiv preprint arXiv:1708.05509.

Johnson, M.; Hofmann, K.; Hutton, T.; and Bignell, D. 2016. The Malmo Platform for Artificial Intelligence Experimentation. In IJCAI, 4246–4247.

Juliani, A.; Berges, V.-P.; Vckay, E.; Gao, Y.; Henry, H.; Mattar, M.; and Lange, D. 2018. Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627.

Justesen, N.; Bontrager, P.; Togelius, J.; and Risi, S. 2019. Deep Learning for Video Game Playing. IEEE Transactions on Games 1–1. ISSN 2475-1510. doi:10.1109/TG.2019.2896986.

Kakade, S. M. 2002. A natural policy gradient. In Advances in Neural Information Processing Systems, 1531–1538.

Konda, V. R.; and Tsitsiklis, J. N. 2000. Actor-critic algorithms. In Advances in Neural Information Processing Systems, 1008–1014.

LeCun, Y. A.; Bottou, L.; Orr, G. B.; and Müller, K.-R. 2012. Efficient backprop. In Neural Networks: Tricks of the Trade, 9–48. Springer.

Lee, S.-J.; Chen, T.; Yu, L.; and Lai, C.-H. 2018. Image classification based on the boost convolutional neural network. IEEE Access 6: 12755–12768.

Lin, Z.; Gehring, J.; Khalidov, V.; and Synnaeve, G. 2017. STARDATA: A StarCraft AI research dataset. arXiv preprint arXiv:1708.02139.

Lohokare, A.; Shah, A.; and Zyda, M. 2020. Deep Learning Bot for League of Legends. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, 322–324.

Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, 1928–1937.

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; and Riedmiller, M. 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540): 529–533.

Nair, V.; and Hinton, G. E. 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 807–814.

Nichol, A.; Pfau, V.; Hesse, C.; Klimov, O.; and Schulman, J. 2018. Gotta Learn Fast: A New Benchmark for Generalization in RL. arXiv preprint arXiv:1804.03720.

Ongsulee, P. 2017. Artificial intelligence, machine learning and deep learning. In 2017 15th International Conference on ICT and Knowledge Engineering (ICT&KE), 1–6. ISSN 2157-0981. doi:10.1109/ICTKE.2017.8259629.

Pascanu, R.; Mikolov, T.; and Bengio, Y. 2012. Understanding the exploding gradient problem. CoRR abs/1211.5063.

Redmon, J.; Divvala, S.; Girshick, R.; and Farhadi, A. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 779–788.

Redmon, J.; and Farhadi, A. 2017. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7263–7271.

Redmon, J.; and Farhadi, A. 2018. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.

Ren, S.; He, K.; Girshick, R.; and Sun, J. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, 91–99.

Risi, S.; and Preuss, M. 2020. From Chess and Atari to StarCraft and Beyond: How Game AI is Driving the World of AI. KI-Künstliche Intelligenz 34(1): 7–17.

Sapienza, A.; Zeng, Y.; Bessi, A.; Lerman, K.; and Ferrara, E. 2018. Individual performance in team-based online games. Royal Society Open Science 5(6): 180329.

Savva, M.; Kadian, A.; Maksymets, O.; Zhao, Y.; Wijmans, E.; Jain, B.; Straub, J.; Liu, J.; Koltun, V.; Malik, J.; Parikh, D.; and Batra, D. 2019. Habitat: A Platform for Embodied AI Research. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

Shridhar, M.; Thomason, J.; Gordon, D.; Bisk, Y.; Han, W.; Mottaghi, R.; Zettlemoyer, L.; and Fox, D. 2020. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). URL https://arxiv.org/abs/1912.01734.

Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587): 484–489.

Stooke, A.; and Abbeel, P. 2018. Accelerated methods for deep reinforcement learning. arXiv preprint arXiv:1803.02811.

Summerville, A.; Snodgrass, S.; Guzdial, M.; Holmgård, C.; Hoover, A. K.; Isaksen, A.; Nealen, A.; and Togelius, J. 2018. Procedural content generation via machine learning (PCGML). IEEE Transactions on Games 10(3): 257–270.

Tian, Y.; Gong, Q.; Shang, W.; Wu, Y.; and Zitnick, C. L. 2017. ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games. In Advances in Neural Information Processing Systems (NIPS).

Vinyals, O.; Babuschkin, I.; Czarnecki, W. M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D. H.; Powell, R.; Ewalds, T.; Georgiev, P.; et al. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782): 350–354.

Vinyals, O.; Ewalds, T.; Bartunov, S.; Georgiev, P.; Vezhnevets, A. S.; Yeo, M.; Makhzani, A.; Küttler, H.; Agapiou, J.; Schrittwieser, J.; et al. 2017. StarCraft II: A new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782.

Watkins, C. J. C. H. 1989. Learning from delayed rewards.

Zeng, Y. 2020. How Human Centered AI Will Contribute Towards Intelligent Gaming Systems. In The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21).

Zeng, Y.; Lei, D.; Li, B.; Jiang, G.; Ferrara, E.; and Zyda, M. 2020. Learning to Reason in Round-Based Games: Multi-Task Sequence Generation for Purchasing Decision Making in First-Person Shooters. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, 308–314.

Zeng, Y.; Sapienza, A.; and Ferrara, E. 2019. The Influence of Social Ties on Performance in Team-based Online Games. IEEE Transactions on Games.
