
Tetris Game-playing Agents in Python

Michael Dunham
University of Massachusetts, Lowell
[email protected]

Andrew Alves
University of Massachusetts, Lowell
[email protected]

ABSTRACT
Tetris is a classic tile-matching game that entertains and appeals to people across all generations. Tetris is a prime example of the set of problems where humans find solutions evident and intuitive, yet there is significant difficulty in formulating and playing the game using an artificial intelligence agent. In this project, we compare three classical approaches in artificial intelligence: state based q-learning, feature based q-learning, and reflex agents. Our experiments led to the following conclusions. State based q-learning is ineffective at playing Tetris. Well designed reflex agents and feature based q-learning agents can play Tetris well.

Author Keywords
Artificial Intelligence, algorithm, A.I., state, action, state-action pair, reflex, Q-learning, analysis

INTRODUCTION
The goal of this project was to implement as many different types of agents as possible so that their performance could be compared against both each other and human players. We were able to test a total of three different agents: reflex, q-learning, and q-learning with features. Each of these agents is more sophisticated than the last, with q-learning with features being the greatest achievement of this project.

BEHIND THE SCENES
The first challenge was to either create or find a suitable Tetris environment for testing. With the goal of creating multiple Tetris-playing agents in mind, it made sense to start with an open source implementation of Tetris to simplify the project workload and concentrate our efforts. With Python being our language of choice, Pytris seemed like the best option.

Using as much of the original code as possible, a controller class was created which is able to communicate with our agents. By inheriting information from the player, the controller keeps a dictionary of relevant state information that our agents can use to decide on proper actions.

The agent then chooses an action or series of actions and sends them back to the controller. The controller keeps these actions on a queue so that on each update the action at the front of the queue can be performed.

Though simple in theory, the implementation of the controller was non-trivial because the possible game actions were scattered throughout the starting code.

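The sketch below illustrates the controller pattern described above: a queue of pending actions that is refilled by the agent and drained one action per game update. The class and method names (choose_actions, perform, and the fields in the state dictionary) are illustrative assumptions for this sketch, not the actual Pytris or project code.

from collections import deque

class Controller:
    # Holds the agent's pending actions and exposes game state to it.
    def __init__(self, player, agent):
        self.player = player
        self.agent = agent
        self.action_queue = deque()

    def game_state(self):
        # Dictionary of state information the agent needs to decide on actions.
        return {
            "grid": self.player.grid,
            "current_piece": self.player.current_piece,
            "next_piece": self.player.next_piece,
        }

    def update(self):
        # Ask the agent for a new series of actions only when the queue is empty,
        # then perform the action at the front of the queue on this update.
        if not self.action_queue:
            self.action_queue.extend(self.agent.choose_actions(self.game_state()))
        if self.action_queue:
            self.player.perform(self.action_queue.popleft())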

PROJECT DESCRIPTION

Underlying Representation
Unlike other analyses of the Tetris problem, in which the state space has been reduced or simplified, our implementation considers the full Tetris grid of width 10 and height 24 [TODO CIT]. There exist seven possible block types:

B: {"LINE", "SQUARE", "Z", "S", "L", "REVERSE_L", "TRIANGLE"}

Additionally, each piece can be rotated into at most 4 different orientations. This leads to the state space being exceptionally large. For example, consider any grid with an S-piece about to be dropped: there are 30 different potential successor states that result from just that one piece being dropped.

Internally, each state of the Tetris game was represented as a 10x24 matrix of values {0,1} depending on whether that location in the grid was empty or not. As a new piece was introduced to the grid, the various agents were passed a list of grid states that represented all possible permutations of successor states. The agents then returned a list of actions for the game engine to execute.

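As an illustration of this representation, the sketch below shows a minimal stand-in for the game grid and a helper that counts completed lines; the reflex agent's rule set presented later relies on a countCompletedLines routine of this kind. The class itself and the y = 0 bottom-row convention are assumptions of this sketch, not the project's actual data structure.

class Grid:
    # Minimal stand-in for the game grid: a width x height matrix of 0/1 values,
    # indexed as grid[x][y], with y = 0 taken to be the bottom row.
    def __init__(self, width=10, height=24):
        self.width = width
        self.height = height
        self.cells = [[0] * height for _ in range(width)]

    def __getitem__(self, x):
        return self.cells[x]

def countCompletedLines(grid):
    # A line is complete when every column is occupied at that row.
    completed = 0
    for y in range(grid.height):
        if all(grid[x][y] for x in range(grid.width)):
            completed += 1
    return completed
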
Reflex Agent
The reflex agent was the first agent that was implemented and tested. The reflex agent acted as a baseline against which the subsequent agents could be measured. It was designed with the following informal rule set in mind: "Complete lines when possible. Otherwise attempt to make the grid as compact as possible and avoid making holes."

The first challenge was to formulate the compactness of the grid in a way that the reflex agent would treat properly. The algorithm that was chosen is as follows:

def calculateCompactness(grid, height, width):
    # Lower rows carry exponentially more weight, so states that fill in
    # lower blocks score higher than states that do not.
    weight = 1.0
    total = 0.0
    for y in range(0, height):          # y = 0 is the bottom row
        for x in range(0, width):
            if grid[x][y]:              # only occupied cells contribute
                total = total + weight
        weight = weight * .5            # each row up is worth half as much
    return total

This function has the desired property that it scores successor states that fill in lower blocks higher than those that do not.

The second function needed for the reflex agent was one that determined whether a successor state would result in more holes in the grid:

def checkUnreachable(old_grid, new_grid):
    # Heuristic count of newly created holes: for each column, compare the top
    # of the new stack against the top of the old stack.
    old_tops = [-1] * old_grid.width        # -1 marks a previously empty column
    count = 0
    for x in range(0, old_grid.width):
        for y in reversed(range(0, old_grid.height)):
            if old_grid[x][y]:
                old_tops[x] = y
                break
    for x in range(0, new_grid.width):
        for y in reversed(range(0, new_grid.height)):
            if new_grid[x][y]:
                if y > old_tops[x] + 1:     # the column's top jumped by more than one row
                    count += 1
                break
    return count

With both of those functions written, the algorithm of the reflex agent can be presented. The reflex agent works by first creating a list of all successor states that complete the greatest number of lines, and then choosing among those states the one that scores highest on a hand-tuned linear combination of the two functions above. The code is as follows:

def applyRuleSet(permutations, old_grid):
    # Keep only the successor states that complete the most lines...
    best_count = 0
    sieve_results = list()
    for p in permutations:
        count = countCompletedLines(p)
        if count > best_count:
            sieve_results = [p]
            best_count = count
        elif count == best_count:
            sieve_results.append(p)
    # ...then pick the one scoring highest on a hand-tuned linear combination
    # of compactness and newly created holes.
    choice = max(sieve_results,
                 key=lambda x: 1.0 * calculateCompactness(x, x.height, x.width)
                               - 0.7 * checkUnreachable(old_grid, x))
    return choice

Although the above algorithm seems simplistic, it actually performs well in practice. It outperforms one of the authors of this paper and often completes 10 plus lines a game.

Q Learning Agent
Implemented second was the state based q-learning agent. This agent works with no pre-knowledge of the actions it should take, or even of the model at all; instead it works by analyzing the feedback it receives through rewards and determines what actions to take based on learned expected future rewards. The q-learning agent takes three different parameters:

α - learning rate
γ - discount
ε - exploration rate

The learning rate is the rate at which new information is added to the learned values: α = 0 results in an agent that never learns, and α = 1 in an agent that never remembers old inputs. The discount is how much the agent values rewards from states in the future. The exploration rate is how often the agent chooses to take a non-optimal action in order to learn more about the world. The update function for a state-action pairing in the q-learning agent is as follows:

Q(s,a) = (1 - α) * Q(s,a) + α * (reward + γ * max(Q(s',a')))

In the above equation, Q(s,a) is the learned value for that state-action pairing. It can be seen, however, that this update function learns based on individual state-action pairings and has no concept of which states are similar to each other. As a result, the table containing state-action pairings and their values grows obtrusively large without actually providing valuable knowledge. This problem is intended to be solved by the feature function based q-learning agent.

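For concreteness, a minimal sketch of this tabular update and of ε-greedy action selection is shown below, with the Q-table kept in a dictionary keyed by (state, action) pairs; it assumes states are hashable (for example, tuples of grid rows), and the names are illustrative rather than taken from our implementation.

import random
from collections import defaultdict

qTable = defaultdict(float)     # maps (state, action) pairs to learned Q values

def qUpdate(state, action, reward, next_state, next_actions, alpha, gamma):
    # Q(s,a) <- (1 - α) * Q(s,a) + α * (reward + γ * max over a' of Q(s',a'))
    best_next = max((qTable[(next_state, a)] for a in next_actions), default=0.0)
    qTable[(state, action)] = ((1 - alpha) * qTable[(state, action)]
                               + alpha * (reward + gamma * best_next))

def chooseAction(state, actions, epsilon):
    # ε-greedy: with probability ε take a random, possibly non-optimal, action.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: qTable[(state, a)])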

Feature Based Q Learning Agent


The feature based q-learning agent works by attempting to generalize qualities of Tetris states, thereby greatly reducing the problem-solving difficulty that arises when the state space is prohibitively large for traditional q-learning. The mathematical underpinnings of this approach are as follows:

F - the set of feature functions, which map qualities of a state-action pair to real number values

Q(s,a) = Σ_i w_i * f_i(s,a) - where w_i is the weight value for feature function f_i ∈ F

Weights are updated for the feature functions with the equation:

w_i = w_i + α * (reward + γ * max(Q(s',a')) - Q(s,a)) * f_i(s,a)

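A minimal sketch of this update is given below. It assumes the feature functions have already been evaluated into a list of values for the current state-action pair, and that best_next_q stands for max(Q(s',a')) over the successor state's legal actions; the function names are illustrative, not our project's code.

def qValue(weights, feature_values):
    # Q(s,a) = Σ_i w_i * f_i(s,a), with each f_i(s,a) pre-evaluated into a list.
    return sum(w * f for w, f in zip(weights, feature_values))

def updateWeights(weights, feature_values, reward, best_next_q, alpha, gamma):
    # w_i <- w_i + α * (reward + γ * max(Q(s',a')) - Q(s,a)) * f_i(s,a)
    difference = reward + gamma * best_next_q - qValue(weights, feature_values)
    return [w + alpha * difference * f for w, f in zip(weights, feature_values)]
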
The challenge of developing the feature based q-learning agent lies in the engineering of the feature functions that are used. In developing the feature functions, we first started with qualities that an optimal Tetris board state should have.

The first component that needed to be taken into consideration when designing the feature functions was the reward function of the Tetris game. Tetris is difficult in that it can take a random agent, like a fresh q-learning agent, a long time to finally stumble onto positive rewards in the form of completing a line. In 1200 games of feature based q-learning with a high exploration rate, only 16 lines were completed. This resulted in the following reward structure: (Losing game: -200 points), (Increasing max height: -10 points), (Completing line: 1000 points per line).

As for the feature functions themselves, they were chosen in a similar way to the reflex agent's rule set, by considering what non-optimal game states and optimal game states had in common. The features were as follows:

1. Number of completed lines in the successor state
2. Number of holes created in the successor state
3. Reciprocal of the compactness of the grid
4. Reciprocal of the density of the grid
5. Maximum height of pieces in the grid

It should be evident how each of these features has the potential to separate states that are optimal from those that are sub-optimal. The reciprocals of the compactness and density of the grid were used because the vast majority of the feedback that the agent receives from the Tetris game is negative. Therefore, in an effort to speed up convergence to the optimal policy, the reciprocals were used so that the agent would value compactness and density higher in the first iterations of the game.

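To make this feature set concrete, a sketch of how such feature functions might be assembled is shown below. It reuses countCompletedLines, calculateCompactness, and checkUnreachable from earlier sections; the density measure, the maxHeight helper, and the +1 guards in the reciprocals are assumptions of this sketch rather than the exact functions we used.

def density(grid):
    # Fraction of grid cells that are occupied.
    filled = sum(grid[x][y] for x in range(grid.width) for y in range(grid.height))
    return filled / float(grid.width * grid.height)

def maxHeight(grid):
    # Height of the tallest occupied cell (y = 0 is the bottom row).
    heights = [y + 1 for x in range(grid.width)
                     for y in range(grid.height) if grid[x][y]]
    return max(heights, default=0)

def featureVector(old_grid, new_grid):
    # The five features listed above, evaluated on a candidate successor state.
    compactness = calculateCompactness(new_grid, new_grid.height, new_grid.width)
    return [
        countCompletedLines(new_grid),            # 1. completed lines
        checkUnreachable(old_grid, new_grid),     # 2. holes created
        1.0 / (1.0 + compactness),                # 3. reciprocal of compactness
        1.0 / (1.0 + density(new_grid)),          # 4. reciprocal of density
        maxHeight(new_grid),                      # 5. maximum stack height
    ]
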
Analysis of results
Tetris has two built-in score evaluators: completed lines and level. Level is not particularly relevant, since in our software system we had the Tetris game block on the agent's decision making. Therefore, in evaluating agents we considered only completed lines as a metric. The following were our results:

Reflex Agent: 12.7 lines completed per game over 250 games
State Based Q-learning Agent: < 1 line completed per game over 500 games
Feature Based Q-learning Agent: < 1 line completed per game over 1200 games

In the following paragraphs I will discuss each agent and whether we consider it a failure or a success. First, the reflex agent. The reflex agent performed better than expected, and with more hand tuning of the weights it could be expected to perform even better. Although 13 lines is not an especially large number per game, in some games it completed as many as 45 lines.

The state based q-learning agent was also a success. Although it performed worse than both of the other agents, it verified our original hypothesis that the state space was much too large for the Tetris problem. The game slowed to a crawl after 250 games, with the state-action table taking up significant amounts of memory and slowing down the entire system that the game was running on. This is consistent with other work in the field, such as [2], where state based q-learning performed sub-optimally compared to other methods of solving Tetris due to the size of the state space.

The feature based q-learning agent was a failure. Although it ran fast and implemented the weight updating algorithm correctly, several attributes of the Tetris game meant that it did not even begin to converge after 1200 games. This can be attributed to the rarity of the positive feedback that Tetris provides to an agent that does not complete lines with any frequency. This led to almost all of the initial feedback being negative, and as a result the agent actually behaved contrary to common-sense ways of playing Tetris.

A survey of feature functions used in other papers to solve Tetris shows several feature functions that I had not used or thought valuable. For example, in [3] the presence of board wells was considered, with the intent of preventing them. In observation of human players, however, board wells can be a particularly effective technique. Furthermore, the sense of wells is already contained somewhat in the density and compactness feature functions of my agent.

DISCUSSION
The purpose of this Tetris project within our Computer Science education is to analyze the behavior of q-learning for problems that have extremely large state spaces. In our original hypothesis we expected standard q-learning to be ineffective, the reflex agent to be mildly successful, and the feature based q-learning agent to outperform human players. I will now discuss each of these hypotheses.

The standard q-learning agent had to theoretically map a state space that was around 22 million states wide. Although q-learning would in theory provide the best performance given enough time and enough memory, it is evident that practical concerns make it infeasible. In our work with this agent, we confirmed the evident: large state space problems are not suited for standard q-learning.

The reflex agent, in its original conception, was to be a complex rule set that took actions based on what the current piece was and what the current grid looked like. However, we discovered that a simpler agent that used intelligent parameters to make its decisions would perform at least as well as that complex rule set. In a way, the reflex agent is similar to the feature based q-learning agent in that it uses a linear combination of state attributes to make its decisions; however, it was hand tuned rather than having learned on its own.

The feature based q-learning agent was the most complex of the agents, and therefore provided the greatest learning experience. The feature based agent sounds simple in theory: write the feature functions, and the agent will figure out the rest. However, we discovered that the larger the state space, the more important it is to be extremely careful in designing the feature functions. The feature design process has similar qualities to learning to program for the first time, in that the agent does exactly what the features tell it to do, not what the designer's intent was when writing those feature functions. Additionally, the scale of games needed to reach convergence became fully appreciated by the authors after attempting to generate a data set of 1200 games and discovering that the agent had barely learned at all.

If work were to be continued on this project, I would attempt to adapt the algorithms of [4], which speed up convergence of the feature function weights by restricting the sampling of feature functions to only a few at the start, before adding other features as the algorithm progresses. Still, I expect that the best way of speeding up convergence is to alter the reward function of Tetris.

CONCLUSIONS
In conclusion, for large state space problems, standard q-learning is entirely ineffective. Feature based q-learning is effective depending entirely on the quality of the feature functions chosen, and given sufficiently large amounts of time to converge.

ACKNOWLEDGMENTS
The work described in this paper was conducted as part of a Fall 2014 Artificial Intelligence course, taught in the Computer Science department of the University of Massachusetts Lowell by Prof. Fred Martin. Another thanks goes out to the creator of Pytris, who goes by the name Don Polettone; without his code to start with, we may not have been able to accomplish as much as we have. Finally, the util.py class of the Berkeley Pacman Project was used to implement several algorithms in our project. As for the authors, Andrew Alves implemented the agents and Michael Dunham implemented the backend and much of the glue code between Pytris and the agents.

REFERENCES
1. Pytris 1.3. http://pygame.org/project-PYTRIS-2913-4779.html
2. Driessens, K., & Džeroski, S. Integrating guidance into relational reinforcement learning.
3. Thiery, C., & Scherrer, B. Improvements on Learning Tetris with Cross Entropy. International Computer Games Association Journal, 32, 2009.
4. Loscalzo, S., Wright, R., Acunto, K., & Yu, L. Sample aware embedded feature selection for reinforcement learning. Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation, July 7-11, 2012, Philadelphia, Pennsylvania, USA.
