
Tetris Game-playing Agents in Python

Michael Dunham
University of Massachusetts, Lowell
[email protected]

Andrew Alves
University of Massachusetts, Lowell
[email protected]

ABSTRACT
Tetris is a classic tile-matching game that entertains and appeals to people across all generations. Tetris is a prime example of the set of problems where humans find solutions evident and intuitive, yet there is significant difficulty in formulating and playing the game using an artificial intelligence agent. In this project, we compare three classical approaches in artificial intelligence: state based q-learning, feature based q-learning, and reflex agents. Our experiments led to the following conclusions. State based q-learning is ineffective at playing Tetris. Well designed reflex agents and feature based q-learning agents can play Tetris well.

Author Keywords
Artificial Intelligence, algorithm, A.I., state, action, state-action pair, reflex, Q-learning, analysis

INTRODUCTION
The goal of this project was to implement as many different types of agents as possible so that their performance could be compared against both each other and human players. We were able to test a total of three different agents: reflex, q-learning, and q-learning with features. Each of these agents is more sophisticated than the last, with q-learning with features being the greatest achievement of this project.

BEHIND THE SCENES
The first challenge was to either create or find a suitable Tetris environment for testing. With the goal of creating multiple Tetris-playing agents in mind, it made sense to start with an open source implementation of Tetris to simplify the project workload and concentrate our efforts. With Python being our language of choice, Pytris seemed like the best option.

Using as much of the original code as possible, a controller class was created which is able to communicate with our agents. By inheriting information from the player, the controller keeps a dictionary of relevant state information that our agents can use to decide on proper actions.

The agent then chooses an action or series of actions and sends them back to the controller. The controller keeps these actions on a queue so that on each update the action at the front of the queue can be performed.

Though simple in theory, the implementation of the controller was non-trivial because the possible game actions were scattered throughout the starting code.

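The sketch below illustrates the controller pattern described above: a queue of pending actions that is refilled by the agent and drained one action per game update. The class and method names (choose_actions, perform, and the fields in the state dictionary) are illustrative assumptions for this sketch, not the actual Pytris or project code.

from collections import deque

class Controller:
    # Holds the agent's pending actions and exposes game state to it.
    def __init__(self, player, agent):
        self.player = player
        self.agent = agent
        self.action_queue = deque()

    def game_state(self):
        # Dictionary of state information the agent needs to decide on actions.
        return {
            "grid": self.player.grid,
            "current_piece": self.player.current_piece,
            "next_piece": self.player.next_piece,
        }

    def update(self):
        # Ask the agent for a new series of actions only when the queue is empty,
        # then perform the action at the front of the queue on this update.
        if not self.action_queue:
            self.action_queue.extend(self.agent.choose_actions(self.game_state()))
        if self.action_queue:
            self.player.perform(self.action_queue.popleft())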

PROJECT DESCRIPTION

Underlying Representation
Unlike other analyses of the Tetris problem, in which the state space has been reduced or simplified, our implementation considers the full Tetris grid of width 10 and height 24 [TODO CIT]. There exist seven possible block types:

B: {"LINE", "SQUARE", "Z", "S", "L", "REVERSE_L", "TRIANGLE"}

Additionally, each piece can be rotated into at most 4 different orientations. This leads to the state space being exceptionally large. For example, consider any grid with an S-piece about to be dropped: there are 30 different potential successor states that result from just that one piece being dropped.

Internally, each state of the Tetris game was represented as a 10x24 matrix of values {0,1} depending on whether that location in the grid was empty or not. As a new piece was introduced to the grid, the various agents were passed a list of grid states that represented all possible permutations of successor states. The agents then returned a list of actions for the game engine to execute.

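As an illustration of this representation, the sketch below shows a minimal stand-in for the game grid and a helper that counts completed lines; the reflex agent's rule set presented later relies on a countCompletedLines routine of this kind. The class itself and the y = 0 bottom-row convention are assumptions of this sketch, not the project's actual data structure.

class Grid:
    # Minimal stand-in for the game grid: a width x height matrix of 0/1 values,
    # indexed as grid[x][y], with y = 0 taken to be the bottom row.
    def __init__(self, width=10, height=24):
        self.width = width
        self.height = height
        self.cells = [[0] * height for _ in range(width)]

    def __getitem__(self, x):
        return self.cells[x]

def countCompletedLines(grid):
    # A line is complete when every column is occupied at that row.
    completed = 0
    for y in range(grid.height):
        if all(grid[x][y] for x in range(grid.width)):
            completed += 1
    return completed
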
Reflex Agent
The reflex agent was the first agent that was implemented and tested. The reflex agent acted as a baseline against which the subsequent agents could be measured. It was designed with the following informal rule set in mind: "Complete lines when possible. Otherwise attempt to make the grid as compact as possible and avoid making holes."

The first challenge was to formulate the compactness of the grid in a way that the reflex agent would treat properly. The algorithm that was chosen is as follows:

def calculateCompactness(grid, height, width):
    # Lower rows carry exponentially more weight, so states that fill in
    # lower blocks score higher than states that do not.
    weight = 1.0
    total = 0.0
    for y in range(0, height):          # y = 0 is the bottom row
        for x in range(0, width):
            if grid[x][y]:              # only occupied cells contribute
                total = total + weight
        weight = weight * .5            # each row up is worth half as much
    return total

This function has the desired property that it scores successor states that fill in lower blocks higher than those that do not.

The second function needed for the reflex agent was one that determined whether a successor state would result in more holes in the grid:

def checkUnreachable(old_grid, new_grid):
    # Heuristic count of newly created holes: for each column, compare the top
    # of the new stack against the top of the old stack.
    old_tops = [-1] * old_grid.width        # -1 marks a previously empty column
    count = 0
    for x in range(0, old_grid.width):
        for y in reversed(range(0, old_grid.height)):
            if old_grid[x][y]:
                old_tops[x] = y
                break
    for x in range(0, new_grid.width):
        for y in reversed(range(0, new_grid.height)):
            if new_grid[x][y]:
                if y > old_tops[x] + 1:     # the column's top jumped by more than one row
                    count += 1
                break
    return count

With both of those functions written, the algorithm of the reflex agent can be presented. The reflex agent works by first creating a list of all successor states that complete the greatest number of lines, and then choosing among those states the one that scores highest on a hand-tuned linear combination of the two functions above. The code is as follows:

def applyRuleSet(permutations, old_grid):
    # Keep only the successor states that complete the most lines...
    best_count = 0
    sieve_results = list()
    for p in permutations:
        count = countCompletedLines(p)
        if count > best_count:
            sieve_results = [p]
            best_count = count
        elif count == best_count:
            sieve_results.append(p)
    # ...then pick the one scoring highest on a hand-tuned linear combination
    # of compactness and newly created holes.
    choice = max(sieve_results,
                 key=lambda x: 1.0 * calculateCompactness(x, x.height, x.width)
                               - 0.7 * checkUnreachable(old_grid, x))
    return choice

Although the above algorithm seems simplistic, it actually performs well in practice. It outperforms one of the authors of this paper and often completes 10 plus lines a game.

Q Learning Agent
Implemented second was the state based q-learning agent. This agent works with no pre-knowledge of the actions it should take, or even of the model at all; instead it works by analyzing the feedback it receives through rewards and determines what actions to take based on learned expected future rewards. The q-learning agent takes three different parameters:

α - learning rate
γ - discount
ε - exploration rate

The learning rate is the rate at which new information is added to the learned values: α = 0 results in an agent that never learns, and α = 1 in an agent that never remembers old inputs. The discount is how much the agent values rewards from states in the future. The exploration rate is how often the agent chooses to take a non-optimal action in order to learn more about the world. The update function for a state-action pairing in the q-learning agent is as follows:

Q(s,a) = (1 - α) * Q(s,a) + α * (reward + γ * max(Q(s',a')))

In the above equation, Q(s,a) is the learned value for that state-action pairing. It can be seen, however, that this update function learns based on individual state-action pairings and has no concept of which states are similar to each other. As a result, the table containing state-action pairings and their values grows obtrusively large without actually providing valuable knowledge. This problem is intended to be solved by the feature function based q-learning agent.

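For concreteness, a minimal sketch of this tabular update and of ε-greedy action selection is shown below, with the Q-table kept in a dictionary keyed by (state, action) pairs; it assumes states are hashable (for example, tuples of grid rows), and the names are illustrative rather than taken from our implementation.

import random
from collections import defaultdict

qTable = defaultdict(float)     # maps (state, action) pairs to learned Q values

def qUpdate(state, action, reward, next_state, next_actions, alpha, gamma):
    # Q(s,a) <- (1 - α) * Q(s,a) + α * (reward + γ * max over a' of Q(s',a'))
    best_next = max((qTable[(next_state, a)] for a in next_actions), default=0.0)
    qTable[(state, action)] = ((1 - alpha) * qTable[(state, action)]
                               + alpha * (reward + gamma * best_next))

def chooseAction(state, actions, epsilon):
    # ε-greedy: with probability ε take a random, possibly non-optimal, action.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: qTable[(state, a)])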

Feature Based Q Learning Agent


The feature based q-learning agent works by attempting to generalize qualities of Tetris states, thereby greatly reducing the problem-solving difficulty that arises when the state space is prohibitively large for traditional q-learning. The mathematical underpinnings of this approach are as follows:

F - the set of feature functions, which map qualities of a state-action pair to real number values

Q(s,a) = Σ_i w_i * f_i(s,a) - where w_i is the weight value for feature function f_i ∈ F

Weights are updated for the feature functions with the equation:

w_i = w_i + α * (reward + γ * max(Q(s',a')) - Q(s,a)) * f_i(s,a)

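A minimal sketch of this update is given below. It assumes the feature functions have already been evaluated into a list of values for the current state-action pair, and that best_next_q stands for max(Q(s',a')) over the successor state's legal actions; the function names are illustrative, not our project's code.

def qValue(weights, feature_values):
    # Q(s,a) = Σ_i w_i * f_i(s,a), with each f_i(s,a) pre-evaluated into a list.
    return sum(w * f for w, f in zip(weights, feature_values))

def updateWeights(weights, feature_values, reward, best_next_q, alpha, gamma):
    # w_i <- w_i + α * (reward + γ * max(Q(s',a')) - Q(s,a)) * f_i(s,a)
    difference = reward + gamma * best_next_q - qValue(weights, feature_values)
    return [w + alpha * difference * f for w, f in zip(weights, feature_values)]
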
The challenge of developing the feature based q-learning agent lies in the engineering of the feature functions that are used. In developing the feature functions, we first started with qualities that an optimal Tetris board state should have.

The first component that needed to be taken into consideration when designing the feature functions was the reward function of the Tetris game. Tetris is difficult in that it can take a random agent, like a fresh q-learning agent, a long time to finally stumble onto positive rewards in the form of completing a line. In 1200 games of feature based q-learning with a high exploration rate, only 16 lines were completed. This resulted in the following reward structure: (Losing game: -200 points), (Increasing max height: -10 points), (Completing line: 1000 points per line).

As for the feature functions themselves, they were chosen in a similar way to the reflex agent's rule set, by considering what non-optimal game states and optimal game states had in common. The features were as follows:

1. Number of completed lines in the successor state
2. Number of holes created in the successor state
3. Reciprocal of the compactness of the grid
4. Reciprocal of the density of the grid
5. Maximum height of pieces in the grid

It should be evident how each of these features has the potential to separate states that are optimal from those that are sub-optimal. The reciprocals of the compactness and density of the grid were used because the vast majority of the feedback that the agent receives from the Tetris game is negative. Therefore, in an effort to speed up convergence to the optimal policy, the reciprocals were used so that the agent would value compactness and density higher in the first iterations of the game.

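To make this feature set concrete, a sketch of how such feature functions might be assembled is shown below. It reuses countCompletedLines, calculateCompactness, and checkUnreachable from earlier sections; the density measure, the maxHeight helper, and the +1 guards in the reciprocals are assumptions of this sketch rather than the exact functions we used.

def density(grid):
    # Fraction of grid cells that are occupied.
    filled = sum(grid[x][y] for x in range(grid.width) for y in range(grid.height))
    return filled / float(grid.width * grid.height)

def maxHeight(grid):
    # Height of the tallest occupied cell (y = 0 is the bottom row).
    heights = [y + 1 for x in range(grid.width)
                     for y in range(grid.height) if grid[x][y]]
    return max(heights, default=0)

def featureVector(old_grid, new_grid):
    # The five features listed above, evaluated on a candidate successor state.
    compactness = calculateCompactness(new_grid, new_grid.height, new_grid.width)
    return [
        countCompletedLines(new_grid),            # 1. completed lines
        checkUnreachable(old_grid, new_grid),     # 2. holes created
        1.0 / (1.0 + compactness),                # 3. reciprocal of compactness
        1.0 / (1.0 + density(new_grid)),          # 4. reciprocal of density
        maxHeight(new_grid),                      # 5. maximum stack height
    ]
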
Analysis of results
Tetris has two built-in score evaluators: completed lines and level. Level is not particularly relevant, since in our software system we had the Tetris game block on the agent's decision making. Therefore, in evaluating agents we considered only completed lines as a metric. The following were our results:

Reflex Agent: 12.7 lines completed per game over 250 games
State Based Q-learning Agent: < 1 line completed per game over 500 games
Feature Based Q-learning Agent: < 1 line completed per game over 1200 games

In the following paragraphs I will discuss each agent and whether we consider it a failure or a success. First, the reflex agent. The reflex agent performed better than expected, and with more hand tuning of the weights it could be expected to perform even better. Although 13 lines is not an especially large number per game, in some games it completed as many as 45 lines.

The state based q-learning agent was also a success. Although it performed worse than both of the other agents, it verified our original hypothesis that the state space was much too large for the Tetris problem. The game slowed to a crawl after 250 games, with the state-action table taking up significant amounts of memory and slowing down the entire system that the game was running on. This is consistent with other work in the field, such as [2], where state based q-learning performed sub-optimally compared to other methods of solving Tetris due to the size of the state space.

The feature based q-learning agent was a failure. Although it ran fast and implemented the weight updating algorithm correctly, several attributes of the Tetris game meant that it did not even begin to converge after 1200 games. This can be attributed to the rarity of the positive feedback that Tetris provides to an agent that does not complete lines with any frequency. This led to almost all of the initial feedback being negative, and as a result the agent actually behaved contrary to common-sense ways of playing Tetris.

A survey of feature functions used in other papers to solve Tetris shows several feature functions that I had not used or thought valuable. For example, in [3] the presence of board wells was considered, with the intent of preventing them. In observation of human players, however, board wells can be a particularly effective technique. Furthermore, the sense of wells is already contained somewhat in the density and compactness feature functions of my agent.

DISCUSSION
The purpose of this Tetris project within our Computer Science education is to analyze the behavior of q-learning for problems that have extremely large state spaces. In our original hypothesis we expected standard q-learning to be ineffective, the reflex agent to be mildly successful, and the feature based q-learning agent to outperform human players. I will now discuss each of these hypotheses.

The standard q-learning agent had to theoretically map a state space that was around 22 million states wide. Although q-learning would in theory provide the best performance given enough time and enough memory, it is evident that practical concerns make it infeasible. In our work with this agent, we confirmed the evident: large state space problems are not suited for standard q-learning.

The reflex agent, in its original conception, was to be a complex rule set that took actions based on what the current piece was and what the current grid looked like. However, we discovered that a simpler agent that used intelligent parameters to make its decisions would perform at least as well as that complex rule set. In a way, the reflex agent is similar to the feature based q-learning agent in that it uses a linear combination of state attributes to make its decisions; however, it was hand tuned rather than having learned on its own.

The feature based q-learning agent was the most complex of the agents, and therefore provided the greatest learning experience. The feature based agent sounds simple in theory: write the feature functions, and the agent will figure out the rest. However, we discovered that the larger the state space, the more important it is to be extremely careful in designing the feature functions. The feature design process has similar qualities to learning to program for the first time, in that the agent does exactly what the features tell it to do, not what the designer's intent was when writing those feature functions. Additionally, the scale of games needed to reach convergence became fully appreciated by the authors after attempting to generate a data set of 1200 games and discovering that the agent had barely learned at all.

If work were to be continued on this project, I would attempt to adapt the algorithms of [4], which speed up convergence of the feature function weights by restricting the sampling of feature functions to only a few at the start, before adding other features as the algorithm progresses. Still, I expect that the best way of speeding up convergence is to alter the reward function of Tetris.

CONCLUSIONS
In conclusion, for large state space problems, standard q-learning is entirely ineffective. Feature based q-learning is effective depending entirely on the quality of the feature functions chosen, and given sufficiently large amounts of time to converge.

ACKNOWLEDGMENTS
The work described in this paper was conducted as part of a Fall 2014 Artificial Intelligence course, taught in the Computer Science department of the University of Massachusetts Lowell by Prof. Fred Martin. Another thanks goes out to the creator of Pytris, who goes by the name Don Polettone; without his code to start with, we may not have been able to accomplish as much as we have. Finally, the util.py class of the Berkeley Pacman Project was used to implement several algorithms in our project. As for the authors, Andrew Alves implemented the agents and Michael Dunham implemented the backend and much of the glue code between Pytris and the agents.

REFERENCES
1. Pytris 1.3. http://pygame.org/project-PYTRIS-2913-4779.html
2. Driessens, K., & Džeroski, S. Integrating guidance into relational reinforcement learning.
3. Thiery, C., & Scherrer, B. Improvements on Learning Tetris with Cross Entropy. International Computer Games Association Journal, 32, 2009.
4. Loscalzo, S., Wright, R., Acunto, K., & Yu, L. Sample aware embedded feature selection for reinforcement learning. Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation, July 7-11, 2012, Philadelphia, Pennsylvania, USA.
