Dissertation
for the degree of Doktoringenieur (Dr.-Ing.)

Author: Ziqiang Diao
Supervisor: Prof. Dr. Gunter Saake
Reviewers:

February 3, 2017
This work has been completed with the help of many people. Here, I would like to
express my sincere gratitude to them.
First of all, I would like to thank my supervisor, Prof. Dr. Gunter Saake, for giving me the opportunity to pursue further studies in Germany. This work was completed under his careful guidance. From him, I came to appreciate the precision and seriousness of German research. He helped me all the time: even when his hand was injured and he was unable to write, or during the holidays, he still insisted on working in order to help me improve my dissertation. Here, I would like to once again express my respect for him. It has been my pleasure to study in his working group.

Secondly, I would like to thank Dr. Eike Schallehn, who is always ready to help others. Whenever I had difficulties in my studies or in my life, he lent me a generous hand. Eike also played a crucial role in the completion of this work: many of my ideas emerged from discussions with him. Therefore, he is a coauthor of almost all of my publications. He is also a modest man. Once I told him that he was like my second supervisor, but he replied that he was just a person who shared my research interests.

I would like to thank my family in particular. They have cared about me throughout my life and studies. Especially during my Ph.D. studies, I always had financial problems. It is thanks to my parents' support and encouragement that I could concentrate on my research. My love for you is beyond words. I could only redouble my efforts to speed up the completion of this work in order to repay your kindness as soon as possible.

I really like the atmosphere of the DBSE working group. Every colleague, whether former or current, such as Ingolf Geist, Syed Saif ur Rahman, Ateeq Khan, Siba Mohammad and many others, has been friendly and helpful. There was even a colleague who gave me many suggestions anonymously when I had just joined the group. I still do not know who he or she is, and I do not need to know. What I should do is pass this kindness on to the other group members, especially the new ones.

Finally, I would like to thank the students I supervised over these years: Shuo Wang, Pengfei Zhao and Meihua Zhao. The three important experimental parts of this work were completed with their assistance.
Contents

List of Tables

1 Introduction
  1.1 Goal of this Thesis
  1.2 Structure of the Thesis

2 Background
  2.1 Massively Multiplayer Online Role-Playing Game
    2.1.1 Brief Introduction of Massively Multiplayer Online Games
    2.1.2 Characteristics of MMORPGs
    2.1.3 Analysis of Current Data Management for MMORPGs
      2.1.3.1 A Typical Architecture of MMORPGs
      2.1.3.2 Classification of Data Sets in MMORPGs
      2.1.3.3 A Sample Session in an MMORPG
  2.2 A Typical Database System in MMORPGs
  2.3 Data Management Requirements of MMORPGs
  2.4 Limitations of RDBMS in MMORPGs
  2.5 Summary

6 Evaluation
  6.1 Experimental Infrastructure
  6.2 Experimental Proof of the System Scalability
    6.2.1 Prototype Architecture
    6.2.2 Implementation of the MMORPG Environment
      6.2.2.1 Implementation of the Game Client
      6.2.2.2 Implementation of the Game Server
    6.2.3 Experimental Setup
    6.2.4 Experiments
      6.2.4.1 Scalability of the Game Server
      6.2.4.2 Potential Scalability of Cassandra in an MMORPG
  6.3 Comparative Experiments of System Performance
    6.3.1 A Practical Game Database Case Study: PlaneShift Project
    6.3.2 Implementation of Testbeds
      6.3.2.1 Implementation of the Database using MySQL Cluster
      6.3.2.2 Implementation of the Database using Cassandra
      6.3.2.3 Related Work
    6.3.3 Experimental Setup
    6.3.4 Experiments
      6.3.4.1 Effect of Accessing the Timestamp Table (TST) in H2
      6.3.4.2 Write/Read Performance Using the Timestamp-based Model
      6.3.4.3 Read Performance under a Node Failure Environment
      6.3.4.4 Effect of Data Size
      6.3.4.5 Performance Comparison with Testbed-MySQL and Testbed-Cassandra
  6.4 Experimental Proof of the Timestamp-based Model
    6.4.1 Implementation of the Testbed
      6.4.1.1 Implementation of the Data Access Server
      6.4.1.2 Database Schema
    6.4.2 Experimental Setup
    6.4.3 Experiments
  6.5 Summary

7 Conclusion

A Appendix

Bibliography

List of Figures
Massively Multiplayer Online Games (MMOs or MMOGs) have become more and more popular over the past decade [FBS07]. In this kind of game, hundreds or thousands of players from all over the world are able to play together simultaneously. In a virtual game world, players are allowed to choose a new identity, establish a new social network, compete or cooperate with other players, or even make some of their dreams come true that cannot be fulfilled in their real life [APS08]. This kind of game is so popular and engaging that by 2013 there were already 628 million MMO players worldwide [She13], and many players are even addicted to it [HWW09]. The game industry is developing rapidly: the global PC/MMO games market is expected to grow at a compound annual growth rate of 7.9% between 2013 and 2017, and in 2014 MMOGs generated a total of $17 billion in revenue [New14].

As the most typical and famous type of MMOG, Massively Multiplayer Online Role-Playing Games (MMORPGs) are developing rapidly. Every year at the Electronic Entertainment Expo (E3), the largest annual commercial exhibition of the global video game industry, many upcoming MMORPGs are revealed by game publishers [IGN15]. Some of the most popular MMORPGs are World of Warcraft, Aion, Guild Wars and EVE Online. The difference between MMORPGs and other MMOGs, such as first-person shooters and real-time strategy games, is that MMORPGs have popularized the term Persistent World [ZKD08], which describes a virtual environment that continuously exists and changes, no matter whether millions of users, only a few users, or even none at all interact with it [ILJ10]. In addition to the account information, the state data of objects and characters must be recorded on the server side in real time, so that players can quit and continue the game at any time. Player behaviors in the game are monitored and backed up in order to maintain the order of the virtual world. Furthermore, MMORPGs usually have more concurrent players than any other MMOGs (for example, World of Warcraft has millions of concurrent players), which also exacerbates the burden of managing data.
From a computer science perspective, these Persistent Worlds represent very complex information systems consisting of multi-tiered architectures of game clients [KVK+09], game logic, and game data management, which typically implement application-specific patterns of partitioning/sharding, replication, and load balancing to fulfill the high requirements regarding performance, scalability, and availability [WKG+07, FDI+10]. For this reason, a qualified database system for data persistence in MMORPGs must guarantee data consistency and also be efficient and scalable [ZKD08]. However, existing conventional Database Management Systems (DBMSs) cannot fully satisfy all these requirements simultaneously [WKG+07, Cat10]. Therefore, with an increasing data volume, the storage system becomes a bottleneck, and solving scalability and availability issues becomes a major cost factor and a development risk.
In the last decade, Cloud data management has become a hot topic. Recently developed Cloud database management systems (CDBMSs), such as Cassandra, HBase and Riak, are designed to support highly concurrent data access and huge storage volumes, which can easily meet the challenges mentioned above (e.g., efficiency, scalability and availability), thereby becoming a possible solution [Cat10]. Nevertheless, CDBMSs are generally designed for Web applications, which have different access characteristics than MMORPGs and require lower or varying consistency levels [Aba09]. In other words, the features of CDBMSs and RDBMSs are complementary, so neither can simply replace the other. For this reason, if we hope to use a CDBMS to solve this issue, several factors must be considered.
We have classified data in MMORPGs into four data sets according to their data management requirements (e.g., data consistency, system availability, system scalability, data model, security, and real-time processing), namely account data, game data, state data, and log data. We have then proposed to apply multiple data management systems (or services) in one MMORPG and to manage the diverse data sets accordingly: data with strong requirements for consistency and security (e.g., account data) are still managed by a relational database management system (RDBMS), while data requiring scalability and availability (e.g., log data and state data) are stored in a Cloud data storage system.

To evaluate and improve the scalability and performance of the new Cloud-based architecture, we have implemented a game prototype based on an open-source MMORPG project that we ported to run on Cassandra, one of the most popular CDBMSs. Furthermore, we have developed an environment to run simulated, scripted interactions of many clients with many game logic servers as well as Cassandra nodes. Based on observations made within this environment, we illustrate approaches to efficiently use the scaling capabilities.

We have also analyzed the data consistency requirements of MMORPGs so as to achieve the required consistency level for each data set in our game prototype. We found that guaranteeing high-level consistency in Cassandra is not efficient, so we have proposed a timestamp-based model as well as a NodeAwarePolicy strategy to address this issue. Furthermore, we have implemented a testbed using a conventional RDBMS (MySQL Cluster) to compare its performance with that of our new Cloud-based prototype. The experimental results have demonstrated the feasibility of our proposals.
Chapter 7 Conclusion: In this chapter we will give a conclusion and summary of our
research.
Chapter 8 Future Work: Some limitations and possible improvements of this thesis
work are outlined in this chapter.
2. Background
This chapter shares material with the DASP article "Cloud Data Management for Online Games: Potentials and Open Issues" [DSWM13].
In this chapter, we will introduce some foundations of MMORPGs, like their typical
architectures, various data sets, data management requirements for each data set, and
the limitations of RDBMSs in terms of managing data in MMORPGs.
Most popular MMOGs: MMORPGs have the most subscribers among MMOGs, thereby occupying most of the subscription-based MMOG market share (World of Warcraft alone occupied 36% in 2013) [Sup14]. They are not only a subset of MMOGs, but also emblematic of these games. For this reason, although games like MMOSGs have many features in common with MMORPGs, we still decided to focus on the latter.
Large number of concurrent players: An MMORPG can host hundreds of thousands of players on the same server, with over 60,000 playing simultaneously at certain times [com10]. In contrast, games like MMOFPSs and MMORGs always divide concurrent players into a large number of groups with only a limited number of participants (e.g., 256 players in MAG [1UP08]), and disperse them on separate battlefields hosted by different game servers. Only players in the same group can communicate or fight with each other. This characteristic of MMORPGs requires a high-performance game server framework to handle a large number of concurrent requests.
Complex gameplay: Unlike MMOFPSs and MMORGs, which only simulate a particular scene of life (shooting or racing), MMORPGs try to create a whole new virtual world. That means a variety of scenarios from the real world are presented in the game, such as trading, chatting and combat. In other words, an MMORPG system contains many subsystems, whose data management requirements are totally different. Thus, MMORPGs are more difficult to develop, and the experience gained in their development can be applied to other MMOGs.
Persistent world: Some MMOGs (e.g., MMOFPSs, MMORGs and MMORTSs) usually divide the virtual game world into small sessions. That means the game world is reset after a time limit is reached or a system-specified target is completed. For example, in an MMORG a match is terminated after the car driven by a player has crossed the finish line. The player then needs to wait for other players to finish before starting a new match. In this case, players can neither save their records during the game, nor continue an unfinished task when they log in to the game again. The effects of players on the game world disappear with the end of a session. Therefore, only a player's account information must be stored on the server side. An MMORPG distinguishes itself from other MMOGs by keeping the virtual game environment running even when players are offline [APS08]. In other words, it provides a persistent game world. A player can save her/his records at any time, and continue the game as if she/he never left. Hence, not only the account information of a player, but also the state information of the game world, the player's avatar and non-player characters (NPCs) must be persisted on
the game server. In order to maintain such a virtual world, a complex information
system consisting of a multi-tiered architecture is required.
Big data management: As mentioned above, MMOGs host the game world on the server. Players must log in to a remote server before starting a game. Therefore, unlike local area network games and single-player games, MMOGs record game logic, game world metadata and player information on the server side. Note that the type and size of the information stored by MMOG systems differ. We find that MMORPGs are facing the management issue of big data, which can be traced back to two sources:
1) Unlike other types of MMOGs, MMORPGs must also persist the avatar's state information on the server, which could contain around one hundred attributes (e.g., the avatar's basic information, skills, inventory, social relations, equipment, tasks and position) and leads to a complex database schema. Since an MMORPG could have millions of players, the data size of such information is very large. Furthermore, the state information is modified and retrieved constantly, and the parallel requests from players must be responded to in real time during the game.
2) Moreover, player behavior analysis is essential for game publishers: it helps them understand the current game status, fix bugs in the current edition of the game, guide the development of future expansions, and detect cheating in the game [SSM11]. For instance, the average number of concurrent users is monitored to evaluate the popularity of a game; the income and expenses of players in the game are analyzed to maintain the economic equilibrium of the game world and to curb inflation; task statistics can help in setting the difficulty of a game. As a result, almost all player behaviors as well as system logs need to be persisted on the server, and this data grows continuously. For example, EverQuest 2 stores over 20 terabytes (TB) of new log data per year [ILJ10].
Overall, the development of an efficient game system to manage and analyze big data is particularly crucial for MMORPGs and has become a challenge [ILJ10].
There is no standard design of the game server, and consequently the implementation of each MMORPG system is individual. However, we can still find some common characteristics and components, namely the application of a three-tiered architecture (see Figure 2.1) [CSKW02, Bur07, WKG+07].
Figure 2.1: A Typical Three-Tiered Architecture of MMORPGs (clients 1...N, the game server tier, and the database)
The game world is divided into multiple maps, and players are assigned to the corresponding map/logic servers automatically. Accordingly, players can only communicate, interact with and influence others in their geographic vicinity [WKG+07].
Since the game server needs to handle connection requests from players around the globe, map/logic servers are deployed on a large number of machines [AT06] (e.g., World of Warcraft has over 10,000 servers [NIP+08]), which are usually distributed in several data centers (zone servers) around the world to reduce latency.
In this case, a gateway server is required to maintain the game session of a player, besides monitoring the player's requests, forwarding them to a zone server, sending results back to the player, and protecting the map/logic servers from attacks. In other words, it is an information transfer station, which is also composed of a series of servers.
A single map/logic server can usually only support around 2,000 concurrent players [KLXH04]. To spread the load and extend the number of players on each server, game developers often separate some non-core functions from the map/logic server. Therefore, servers like the login server, patch server and chat server are also necessary in a game server system. A login server is responsible for determining the validity of a player's identity. A patch server checks and updates the game data stored on the game client side. A chat server handles the chat messages between players.
Transaction management is a technical challenge in MMORPGs because data corruption cannot be handled just by ending a game session. Additionally, incomplete or failed transactions must be rolled back. MMORPGs, consequently, manage the game state in an authoritative storage system to provide persistent game worlds. For this reason, the database plays an important role in this kind of game, and MMORPGs are more like traditional database applications compared with other MMOGs [WKG+07].
From Figure 2.1 we can see that the game server does not retrieve game state data directly from a disk-resident database, but through a transactional database cache (e.g., an in-memory database or a real-time database [Hva99]). There are the following reasons [Ale03]: Firstly, a disk-resident database cannot cope with the heavy I/O workload of an MMORPG with millions of players. Even with advanced commercial databases like Oracle and SQL Server, the transaction rate cannot satisfy the game requirements, and it cannot be improved by simply adding more machines [WKG+07]. Secondly, executing transactions in a disk-resident database will inevitably pause the game. This pause may occur at any time, which cannot be tolerated by a real-time system. Finally, game providers need to invest large amounts of money in hardware in order to achieve the theoretical performance, which unfortunately can only be predicted in the late stages of game development. However, if the architecture has to be changed at that point, the development of the game will be delayed as well.
The throughput of an in-memory database satisfies the data management requirements of an MMORPG; consequently, it executes the real-time operations of the game instead of the disk-resident database. However, an in-memory database cannot guarantee data persistence when it fails, and its storage costs are high. Therefore, most MMORPGs
have a transactional database cache situated in front of the disk-resident database [WKG+07]. That means the required data needs to be loaded into a cache, which handles players' requests in the form of transactions and then writes data back into the database asynchronously (e.g., every five or ten minutes).
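As an illustration of this write-behind pattern, the following minimal sketch (class name, field names and the flush interval are assumptions for illustration, not the cache implementation of any particular game) serves reads and writes from memory and checkpoints dirty entries to the state database in the background:

```python
import threading
import time

class WriteBehindCache:
    """Illustrative write-behind cache: serves reads/writes from memory and
    periodically persists dirty entries to the backing state database."""

    def __init__(self, persist_fn, flush_interval_s=300):
        self._data = {}          # avatar_id -> state dict
        self._dirty = set()      # avatar ids changed since the last checkpoint
        self._lock = threading.Lock()
        self._persist_fn = persist_fn   # e.g., a function writing to the state DB
        threading.Thread(target=self._flush_loop,
                         args=(flush_interval_s,), daemon=True).start()

    def update(self, avatar_id, state):
        with self._lock:
            self._data[avatar_id] = state
            self._dirty.add(avatar_id)

    def read(self, avatar_id):
        with self._lock:
            return self._data.get(avatar_id)

    def _flush_loop(self, interval):
        while True:
            time.sleep(interval)
            with self._lock:
                to_flush = {k: self._data[k] for k in self._dirty}
                self._dirty.clear()
            for avatar_id, state in to_flush.items():
                self._persist_fn(avatar_id, state)   # asynchronous checkpoint
```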
Typically, each data center hosts its own unique state database. A player has to execute
a character migration operation before she/he moves to another zone server. This
complex data migration process may take a long time [BAT15].
2.1.3.2 Classification of Data Sets in MMORPGs

Not all data in an MMORPG have to be cached on the server side. Actually, MMORPGs always manage diverse data sets accordingly, thereby separating them into different places or databases. We have classified the data in the game into four sets so as to understand the process of data access in MMORPGs.
Account data: This category of data includes user account information, such as user
ID, password, recharge records and account balance. It is usually only used when
players log in to or log out of a game for identification and accounting purposes.
Compared with transferring game state, the requirement for shortening the re-
sponse time of retrieving account data is not that urgent, hence the use of a
database cache is not necessary.
Game data: Data such as the world geometry and appearance, object and NPC metadata (name, race, appearance, etc.), game animations and sounds, system logs, configuration and game rules/scripts in an MMORPG are generally only modified by game developers. The game client often keeps a copy of part of the game data (e.g., world geometry and appearance, game animations and sounds) to minimize the network traffic for unchangeable data. The game server (patch server) updates this copy when a client connects to it.
Log data: Data like player behaviors, chat histories and mails are also persisted in the database, and are rarely modified. This kind of data is not used in a game session, thereby eliminating the need to preload it. It is not advisable to store it together with state data in the same database, since collecting log data places an unnecessary burden on the database and consumes its capacity rapidly.
2.1.3.3 A Sample Session in an MMORPG

A player connects to the game server through a client, and sends a login request to the login server, which is responsible for determining its validity. The login server cooperates with an account database that stores user account information. If the validation passes, the login server encrypts the user ID, generates a token, and returns it to the player. The client then updates the game data stored on it from a patch server, which gets the data from a game database. When the update is complete, the client uses the token to communicate with a gateway server, which assigns it to a zone server. Note that only one zone server provides services throughout an entire session. The zone server loads the player's state information from the state database into the cache, determines the physical location of her/his avatar in the game world, assigns it to a logic/map server accordingly, and sends a copy of the state information to the client. Once the assignment is successful, the player can start the game, chat (through a chat server) and interact with all other players on the same server. The computation of the interactions occurs on the server or in the database cache, and the result is then rendered at the client. The updated state information is backed up to the state database periodically. The log database records all player actions (including the reset actions) and the chat history. When the player quits the game, her/his final state information is persisted in the state database and then removed from the database cache (see Figure 2.3).
Figure 2.3: Data Flow in a Game Session (the client sends write/read and quit requests to the zone server; the zone server reads state data from the state database into its cache, checkpoints the cache to the state database periodically, and persists the final state and cleans the cache when the player quits the game)
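The token step of this login flow can be sketched as follows (a hedged illustration assuming an HMAC-signed token; the chapter does not prescribe a concrete token format, so the secret, expiry and helper names are assumptions):

```python
import hashlib
import hmac
import time

SECRET = b"shared-login-secret"   # assumed to be shared by login and gateway servers

def issue_token(user_id: str) -> str:
    """Login server: sign the (user id, expiry) pair after validating credentials."""
    expiry = int(time.time()) + 3600
    payload = f"{user_id}:{expiry}"
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{signature}"

def verify_token(token: str) -> bool:
    """Gateway server: accept the session only if signature and expiry check out."""
    user_id, expiry, signature = token.rsplit(":", 2)
    payload = f"{user_id}:{expiry}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected) and int(expiry) > time.time()

token = issue_token("player42")
assert verify_token(token)
```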
2.2 A Typical Database System in MMORPGs

With increasing business requirements, centralized systems can hardly meet needs such as run-time performance and reliability. Distributed database systems can solve these problems by increasing data redundancy and the number of computers. A distributed database is distributed at the physical layer, but organized at the logical layer: each computer has a complete copy of the DBMS and a local database, while users still perceive themselves as operating a single database system.
As an example, consider MySQL Cluster [Orab] and its characteristics. MySQL Cluster adopts a shared-nothing architecture to ensure scalability. In order to balance the workload among nodes, it automatically partitions the data within a table across all nodes based on the primary key. Each node is able to help clients access the correct shards to satisfy a query or commit a transaction. To guarantee availability, data are replicated to multiple nodes. MySQL Cluster applies a two-phase commit mechanism to propagate data changes to the primary replica and a secondary replica synchronously, and then modifies the other replicas asynchronously. In this case, at least one secondary replica holds consistent and redundant data, which can be used as a fail-over server when the primary server fails. MySQL Cluster also writes redo logs and checkpoint data to disk asynchronously, which can be used for failure recovery. When MySQL Cluster maintains tables in memory, it can support real-time responses.
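As an illustration only (a sketch using the standard mysql-connector-python package; the host, credentials and table definition are assumptions, not the thesis testbed), a table is placed under MySQL Cluster's NDB storage engine, after which the cluster partitions it by the primary key automatically:

```python
import mysql.connector  # pip install mysql-connector-python

# Connect to one of the cluster's SQL nodes (host and credentials are placeholders).
conn = mysql.connector.connect(host="sql-node-1", user="game", password="secret",
                               database="game")
cur = conn.cursor()

# ENGINE=NDBCLUSTER stores the table in the data nodes; rows are partitioned
# across all data nodes by a hash of the primary key.
cur.execute("""
    CREATE TABLE IF NOT EXISTS character_state (
        character_id INT NOT NULL,
        name VARCHAR(64),
        level INT,
        position_x FLOAT,
        position_y FLOAT,
        PRIMARY KEY (character_id)
    ) ENGINE=NDBCLUSTER
""")
conn.commit()
```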
However, as we mentioned in the introduction, RDBMS cannot fulfill some requirements
(e.g., scalability) of MMORPGs well, and considerable efforts have to be spent on
fulfilling these requirements on other levels [Cat10]. In the following sections, we will
discuss this in detail.
2.3 Data Management Requirements of MMORPGs

Consistency:
In a collaborative game, players interact with each other. Changes of state data must be synchronously propagated to the relevant players within an acceptable period of time. For this purpose, we need a continuous consistency model in MMORPGs [LLL04].
Changes of the state data and account data must be recorded in the database. It is
intolerable that players find that their last game records are lost when they log in to
the game again. As a result, a strong or at least Read-Your-Writes consistency [Vog09]
is required for such data. However, strong consistency is not necessary for log data and
game data. For example, the existence of a tree in the map, the synchronization of
a bird animation, or the clothing style of a game character is allowed to be different
among client sides. Log data are generally not analyzed immediately. Hence, eventual
consistency [Vog09] is sufficient for these two classes of data.
Performance/real-time:
State data are modified constantly by millions of concurrent players, which brings significant traffic to the game servers (one player sends an average of six packets per second [CHHL06]), thereby generating thousands of concurrent database connections. These commands must be executed in real time (within 200 ms [CHHL06]) and persisted on disk efficiently, which has become a challenge for database performance.
Availability:
As an Internet/Web application, an MMORPG system should be able to respond to
the request of each user within a certain period of time. If lags or complete denial of
services appear frequently, this will significantly decrease the acceptance of the game
and will result in sinking revenues. Availability can be achieved by increasing data
redundancy and setting up fail-over servers.
Scalability:
Typically, online games start with a small or medium number of users. If the game is successful, this number can grow extremely fast. To avoid the problems of a system laid out for too few users, or the costs of a system initially laid out for too many users, data management needs to be extremely scalable [GDG08]. Furthermore, log data will be appended continually and retained in the database statically for a long time [WKG+07]. The expansion of the data scale should not affect the database performance. Hence, the database should have the ability to accommodate this growth by adding new hardware [IHK04].
Data sharding:
Performing all operations on one node can simplify the integrity control, but that may
cause a system bottleneck. Therefore, data must be divided into multiple nodes/shards
in order to balance the workload, process operations in parallel, and reduce processing
costs. Current sharding schemes are most often based on application logic, such as
partial maps (map servers). This does not easily integrate with the requirement of
scalability, i.e., re-partitioning is not trivial when new servers are added. Accordingly,
suitable sharding schemes are a major research issue.
Flexible data model:
Part of the data in the game, such as state data, does not have a fixed schema; for example, PCs have varying abilities, tasks and inventory. Additionally, MMORPGs are typically bug-fixed and extended during their run-time. Therefore, it is difficult to adopt the relational model to manage such data. A flexible data model without a fixed schema is more suitable, as illustrated by the sketch below.
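As a toy illustration (purely hypothetical records, not the thesis schema), two player characters can carry entirely different attribute sets when state data is kept as schema-free key-value documents:

```python
# Two PC state records with different attribute sets; a fixed relational schema
# would need nullable columns or extra tables for the varying parts.
pc_alex = {
    "id": 1,
    "name": "Alex",
    "level": 32,
    "skills": ["swordsmanship", "alchemy"],
    "inventory": {"sword": 1, "potion": 5},
}
pc_ann = {
    "id": 2,
    "name": "Ann",
    "guild": "Night Watch",            # attribute that Alex does not have
    "active_tasks": ["find the relic"],
}

# A schema-free store accepts both documents in the same collection/column family.
state_store = {pc["id"]: pc for pc in (pc_alex, pc_ann)}
```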
Simplified processing:
In MMORPGs, only updates of account data and part of the state data must be executed in the form of transactions. In addition, transaction processing in online game databases is different from that in business databases. For example, in MMORPGs there are many transactions, but most of them are small. Parallel operations with conflicts occur rarely, especially in the state and log databases, which are responsible for data backup. There are no write conflicts among players in these two databases because such conflicts have already been resolved in the transactional database cache. There might be a write conflict from a single player, which is usually caused by network latency or server failure, and which can be addressed by comparing the timestamps of the operations (see the sketch below). Using locks as in a traditional database increases the response time. Additionally, deadlock detection in a distributed system is not easy. Hence, a simplified data processing mechanism is required.
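A minimal last-write-wins sketch of the timestamp comparison mentioned above (field names are illustrative; the timestamp-based model proposed in this thesis is developed in a later chapter):

```python
from dataclasses import dataclass

@dataclass
class StateWrite:
    avatar_id: int
    state: dict
    timestamp: float  # set by the game/cache server when the write is issued

def resolve(stored: StateWrite, incoming: StateWrite) -> StateWrite:
    """Keep the version with the newer timestamp (last write wins)."""
    return incoming if incoming.timestamp > stored.timestamp else stored

# A delayed (stale) write arriving after a newer one is simply discarded.
newer = StateWrite(1, {"hp": 80}, timestamp=1000.2)
stale = StateWrite(1, {"hp": 95}, timestamp=1000.1)
assert resolve(newer, stale) is newer
```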
Security:
Game providers have to be concerned about data security because data breach may lead
to economic risks or legal disputes. For this reason, user-specific data, such as account
data and chat logs, must be strongly protected. Furthermore, it must be possible to
recover data after being maliciously modified.
The data management system should be easy for developers to use and applicable to various MMORPGs. Companies developing and maintaining MMORPGs should be able to re-use or easily adapt existing data management solutions to new games, similar to the now widely applied idea of separating the game engine from the game content. An interesting solution would be game data management provided as a service to various providers of MMORPGs, but this strongly depends on building up trustworthy services.
2.4 Limitations of RDBMS in MMORPGs
Complex data structure:
In an RDB, the complex state data have to be distributed over multiple tables, which is not efficient: to reduce data redundancy, an RDB normally complies with 3NF (third normal form). In this case, to obtain the result of a query, the system has to perform join operations across several tables, which might be distributed on different nodes, or to compute the values of multiple attributes, both of which increase the response time.
Strict consistency:
The RDBMS follows a transaction mechanism, thereby providing ACID guarantees (Atomicity, Consistency, Isolation and Durability) [HR83]. Furthermore, modifications of a data object need to be synchronized to all replicas, or all requests for a data object need to be processed on a master/primary replica. As a result, the workload at the master/primary replica or the network traffic between nodes becomes a bottleneck of the system. However, as discussed above, strict consistency is not necessary for all data sets in MMORPGs. For some use cases (e.g., the backup of state data and log data), the overhead of this ill-suited consistency mechanism becomes a drawback.
Low availability:
The RDBMS, nowadays, provides solutions to improve availability by replication and
fail-over within the context of distributed databases. Nevertheless, these properties are
not trivial to implement and maintain for a given application, especially considering
the large scale of MMORPGs.
High cost:
The number of mature commercial RDB products is small (e.g., Oracle, SQL Server, DB2, MySQL and Teradata). Furthermore, many of them require an expensive license (except for MySQL). For example, there are three licensing models for SQL Server 2014, namely per core ($14,256 for the enterprise edition), Server + CAL ($8,908 for the business intelligence edition) and per user ($38 for the developer edition)1. The database cluster of an MMORPG is supported by a large number of computers with multi-core processors, and is maintained by numerous game developers. Purchasing licenses increases the cost of game development and operation. Furthermore, it is difficult to predict the scale of a game database cluster, which depends on the popularity of the game, so an excessive purchase of licenses is a serious waste. Additionally, the maintenance costs of RDBMSs are high, which many enterprises cannot bear.

1 http://www.microsoft.com/en-us/server-cloud/products/sql-server/purchasing.aspx (accessed 04.11.2015)
2.5 Summary
In summary, the RDBMS is powerful, but it cannot fit all application scenarios in MMORPGs. Issues like system scalability, big data management and changes to table structures are challenges for web application developers using RDBMSs. In this context, the concept of NoSQL databases was proposed in 2009. Web sites in pursuit of high performance and high scalability have chosen NoSQL technology as a priority option. NoSQL databases come in various types, but a common feature is that they drop the relational characteristics: there are no relationships between data items, so the data are very easy to distribute and scale out. Hence, NoSQL also brings scalability to the architecture level. In the next chapter we will take a closer look at this kind of database.
3. Cloud Storage Systems
This chapter also shares the material with the DASP article “Cloud Data
Management for Online Games: Potentials and Open Issues” [DSWM13].
In this chapter, we will introduce Cloud storage systems, highlight the new challenges of managing big data, compare NoSQL stores and RDBs with respect to their application scenarios, features and data models, and finally take Cassandra as an example to show the implementation of a NoSQL store in detail.
With Cloud computing, companies can obtain computing resources on demand and even rent such resources from a Cloud provider, which reduces their operational costs [KKL10, KK10].
Cloud computing offers three kinds of service models [Kav14], namely infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS) (see Figure 3.1).

Figure 3.1: Cloud Computing Service Models: IaaS (physical computing resources, storage space, middleware, security, backup, ...; used by network architects), PaaS (database, web server, development tools, ...; used by application developers), and SaaS
IaaS: Over the Internet, consumers (e.g., network architects) obtain infrastructure services like physical computing resources, storage space, middleware, security, backup, etc.
SaaS: Consumers obtain software services through the Internet, such as e-mail, virtual desktops or games. Typically, users do not need to buy the software, but lease the right to use web-based software from the providers by receiving an account name and a password.
This project aims at building a suitable game data management system. For this reason,
in the next section we focus on introducing Cloud data management in detail.
3.2 Cloud Data Management
Cloud computing is more and more applied to the domains of Web applications, data analysis and data management.
There are two models used in Cloud databases, namely the SQL-based and the NoSQL-based model. That means users can get database services both from conventional RDBMSs (e.g., Oracle, MySQL, SQL Server and IBM DB2) and from NoSQL DBMSs like Cassandra, HBase and MongoDB. In the next section, we will explain why the appearance of NoSQL stores was inevitable.
Although the CAP theorem has been the subject of some debate, it is still followed by distributed databases like NoSQL stores. This theorem states that of consistency (C), availability (A) and partition tolerance (P), at most two can be guaranteed simultaneously in a distributed computer system.
Consistency: all replicas in a distributed system keep the same value at any time.
Availability: each request is responded to within a certain period of time (even if the returned value is not consistent across all replicas, or the response is just a message saying that the system is down).
The choice of CA can only be made when the system is deployed in a single data center, where partitions occur rarely. However, even if the probability of a partition occurring is not high, it is entirely possible, which undermines a CA-oriented design. In the case of a node failure, developers have to go back and make a trade-off between C and A.
Current network hardware cannot avoid message delays, packet loss, and so on, so in practice partition tolerance must be achieved in a cross-regional system. For this reason, developers have to make a difficult choice between data consistency and availability.
Conventional RDBMSs are designed and optimized for OLTP (Online Transaction Processing) systems like banking systems, where inconsistent data may lead to erroneous computing results or economic losses for customers. Consequently, this kind of DBMS chooses to sacrifice system availability (CP type). When there is a network partition, a write request would be blocked due to the continuous attempt to connect with the lost node.
Web 2.0 websites differ from OLTP systems in many significant ways:
Requirement for data consistency: many real-time Web systems do not require a
strict database transaction. The requirement for read consistency is low, and
in some cases the requirement for write consistency is also not high. Eventual
consistency is allowed.
Requirement for write and read in real-time: the RDB ensures that a read request can fetch a data item immediately after it has been successfully inserted. However, for many web applications such strict real-time behavior is not required. For example, it is totally acceptable on Twitter that subscribers see a new Tweet only a few seconds, or even ten seconds, after it has been posted.
Requirement for complex SQL queries, especially multi-table queries: any web
system dealing with big data avoids joining multiple large tables and creating com-
plex data analysis type of reports. Especially, SNS websites have avoided that
Moreover, users of Web 2.0 websites expect to get 7×24 uninterrupted service [BFG+08], which unfortunately cannot be fulfilled by RDBs guaranteeing strong consistency. For these reasons, website developers have abandoned the SQL model and designed alternative DBs. Some NoSQL stores were developed to provide a variety of solutions that give priority to system availability (supporting AP).
It should be noted that NoSQL DBMSs are typically designed to deal with the scaling and performance issues of conventional RDBs. In addition, their functionality highly depends on their specific application scenarios (not only Web 2.0 websites). Therefore, this does not mean that all NoSQL stores (e.g., HBase) have dropped data consistency in favour of availability.
Basically available: a NoSQL DBMS typically does not focus on isolation, but on system availability. In other words, multiple operations can modify the same data simultaneously, and the system is able to respond to any request; however, the response could reflect an inconsistent or changing state.
Soft state: the data state can be regenerated through additional computation or file I/O; this is exploited to improve performance and failure tolerance. Data are not durable on disk.
Eventual consistency: a change to a data item will be propagated to all replicas asynchronously at a more convenient time. Hence, there is a time lag, during which stale data may be seen by users. In this project, three kinds of eventual consistency will be mentioned, namely causal consistency [Lam78, Vog09], read-your-writes consistency [Vog09] and timed consistency [TRAR99, LBC12]:
BASE and ACID are actually at opposite ends of the consistency-availability spec-
trum [Bre12]. Most NoSQL stores limit ACID support [GS11]. Some of them use a
mix of both approaches. For example, Apache Cassandra introduces lightweight trans-
actions in a single partition.
Relational/SQL model: data are organized in relations (tables) with a fixed schema, and integrity constraints describe valid changes to a database. This structure is suitable for join or complex query operations across relations (tables). Figure 3.2 illustrates a sample of tables in an RDB: three tables are connected by foreign keys, and all tables have a fixed schema.
NoSQL model: NoSQL stores have simplified the relational/SQL model. Their data are typically represented as a collection of key-value pairs, and they provide a flexible/soft schema. Each key-value pair can contain diverse types and numbers of values, and each tuple (row) can increase or decrease the number of its key-value pairs as needed. NoSQL stores typically do not place constraints on values, so values can be of arbitrary format. Each tuple is identified by a primary key or a composite key. Many integrity constraints have been dropped (e.g., foreign key constraints) or weakened (e.g., transactional integrity constraints). For this reason, data partitioning is easy to achieve, and the system can scale out arbitrarily. The flexible data model makes it possible to use denormalization in place of join operations across entities, so the system performance is significantly improved. We have mapped the tables of Figure 3.2 to a NoSQL store as shown in Figure 3.3: the data from the RDB have been denormalized into one table, which has a dynamic schema. If a character has more than one item, correspondingly more key-value pairs/columns are appended to the corresponding row.
Figure 3.2: Sample tables in an RDB

Character:
ID | Name | Gender | Age
1  | Alex | male   | 32
2  | Ann  | female | null

Inventory:
ItemID | CharacterID
1      | 2

Item:
ID | Name | Description
1  | aa   | xxxx

Figure 3.3: The same data mapped to a NoSQL store (one denormalized table with a dynamic schema)

Character:
ID: 1 | Name: Alex | Gender: male | Age: 32
ID: 2 | Name: Ann | Gender: female | ItemName: aa | ItemDescription: xxxx
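As a hedged illustration (CQL3 syntax with the DataStax Python driver; the keyspace and table names merely mirror the figures and are not the thesis schema), the denormalized layout of Figure 3.3 can be approximated with a collection column, so that each row may carry a different number of item entries:

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

session = Cluster(["127.0.0.1"]).connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS game
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS game.character (
        id int PRIMARY KEY,
        name text,
        gender text,
        age int,
        items map<text, text>   -- item name to description; empty for Alex
    )
""")

# Ann carries an item, Alex does not; both rows live in the same table.
session.execute("INSERT INTO game.character (id, name, gender, age) "
                "VALUES (1, 'Alex', 'male', 32)")
session.execute("INSERT INTO game.character (id, name, gender, items) "
                "VALUES (2, 'Ann', 'female', {'aa': 'xxxx'})")
```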
NoSQL stores are implemented in significantly different ways, but they still have some common characteristics. Based on the research results of Ben Scofield, we have rated different categories of RDBMSs and NoSQL DBMSs in Table 3.1 [Sco10].

Table 3.1: A General Rating of Different Categories of Two Kinds of DBMS [Sco10]

NoSQL DBMSs are often excellent in the aspects of partition tolerance, performance, availability, scalability and development costs. However, their drawbacks are also obvious. For example, they are limited in functionality (e.g., ad-hoc queries, data analysis and transaction management) due to the lack of support for a SQL-like query language and the limitations of their underlying structures. We can even state that all functions
that NoSQL DBMSs support could also be achieved by RDBMSs. They are less mature than RDBMSs because they do not have decades of application and development experience behind them; in particular, they tend to be open source, with normally just one or two companies/communities handling the support angle. Additionally, the simple key-value pair structure fails to support values with schemas of arbitrary complexity.
In fact, NoSQL DBMSs are complementary to RDBMSs in some aspects. These two kinds of DBMSs have their own characteristics and application scenarios; hence, neither will replace the other. In the rapidly developing Web 2.0 era, we should choose the right DBMS according to the business scenario, or even combine various DBMSs in order to gain their combined advantages. That means we use RDBMSs for the functionality of the system (e.g., ad-hoc queries), and NoSQL DBMSs to persist data (e.g., fast backup and recovery of data). In this project, we have adopted this approach to manage the data in MMORPGs.
Wide Column Store: it also uses key-value pairs to store data, but in contrast one key corresponds to multiple columns (key-value pairs). Wide column stores often employ a table-like structure of rows and columns to store structured and semi-structured data. Unlike in the relational model, the number of columns is not
fixed, and the column names and their data types can vary from row to row
in one table. It has the ability to hold a large number (billions) of columns in
one row. Timestamp is recorded in each column to determine the valid content.
Google BigTable, Apache Cassandra and HBase use this data model.
Document Store: it generally uses a format such as JSON to store data, and its content is stored as documents. Hence, it is possible to build an index on certain fields to achieve some of the features of an RDBMS. CouchDB and MongoDB are based on this data model.
Others: there are still many other types of NoSQL stores, such as graph databases, multi-model databases, object databases, XML databases and so on.

5 http://nosql-database.org/ (accessed 03.12.2015)
Query language: since release 0.8, the Cassandra Query Language (CQL) has been available. The CQL syntax is similar to that of SQL, so developers who are familiar with SQL do not need to spend much time learning it. The functionality of CQL is, however, not as extensive as that of SQL. For example, when retrieving data, only columns that are part of the primary key and/or have a secondary index defined on them can be used as query criteria [Casb] (a usage sketch follows the data model description below).
Lightweight transactions: since release 2.0, Cassandra supports lightweight transactions, which are restricted to a single partition. This feature aims at supporting linearizable consistency, or isolation in ACID terms.
Column: the column is the basic unit of data in Cassandra. It consists of three components, namely a name (key), a value and a timestamp (Table 3.2). The values of these components are supplied by the client, including the timestamp (which represents when the column was last updated). The timestamp is used in Cassandra for conflict resolution: if there is a conflict between two replicas over a column value, the column with the higher timestamp replaces the other one. The timestamp cannot be used by the client application. Therefore, we can consider the column as a name/value pair.
A column value has two properties/markers, TTL (time to live) and tombstone. The TTL is an optional expiration period (set by the client application), after which the data will be marked with a tombstone. Data with a tombstone will then be removed automatically during the compaction and repair processes (we will discuss this later in detail).
Additionally, since Cassandra 1.2, collection types (e.g., the set, list and map types) are supported. That means we can store multiple elements in a single column value. Elements of the set type are sorted and contain no duplicate values. The list type keeps the insertion order and allows duplicate values. A map type contains pairs of typed (user-defined) keys and values, and its elements are sorted. Each element has an individual TTL. Furthermore, in Cassandra 2.1 and later, we can create a user-defined type to attach multiple data fields to a column or even to an element of a collection type.
Table 3.2: A Column in Cassandra
name: byte[] | value: byte[] | timestamp: Int64
"user" | "Mila" | 2015-10-10 02:22

Table 3.3: A Row in Cassandra
Row Key: byte[] | column 1 | column 2 | ... | column N (1 to 2 billion columns)

Table 3.4: A Column Family in Cassandra (rows sorted by row key)

Row: a row is a container of columns. Each row is uniquely identified by a row key supplied by the client, and consists of an ordered collection of columns related to that key (Table 3.3). In RDBs it is only allowed to store column names as
strings, but in Cassandra both row keys and column names can be any kind of byte array, such as strings, integers or UUIDs. Cassandra supports up to 2 billion columns in a single row [Casa], and is consequently called a wide-column store.
Column family: rows are ordered by their keys within column families (Table 3.4). Rows do not have to share the same set of columns. Column families are predefined, but the columns are not: users can add any new column to any column family at any time. Hence, a column family has a flexible schema.
Cluster: Cassandra is typically distributed over several machines that operate together. The cluster (sometimes also called the ring) is the outermost container of the system. Usually there is only one keyspace in a cluster. Data in Cassandra are distributed over the nodes, each of which contains at most one replica of a row. Cassandra arranges the nodes of the cluster in a ring format.
Table 3.5 helps us translate from the relational world to the Cassandra world. However, we cannot use this analogy when designing Cassandra column families. Instead, we need to consider a column family as a map of a map: the key of the outer map is the row key, and similarly the key of the inner map is the column name. Both maps are sorted by their keys [Pat12]. Because of that, we can do efficient lookups and range scans on row keys and column keys. Furthermore, a key can itself hold a value; consequently, a valueless column is supported.
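The following sketch (DataStax Python driver; keyspace, table and column names are illustrative assumptions, not the thesis schema) shows how a compound primary key maps onto this map-of-a-map view: the partition key selects the outer map entry, the clustering column orders the inner map, and CQL permits lookups and range scans on exactly these keys, plus filtering on columns with a secondary index:

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

# Assumes the keyspace 'game' from the earlier sketch already exists.
session = Cluster(["127.0.0.1"]).connect("game")

# Partition key: character_id (outer map); clustering column: item_name (inner map).
session.execute("""
    CREATE TABLE IF NOT EXISTS inventory (
        character_id int,
        item_name text,
        quantity int,
        PRIMARY KEY (character_id, item_name)
    )
""")
session.execute("INSERT INTO inventory (character_id, item_name, quantity) "
                "VALUES (1, 'potion', 5)")
session.execute("INSERT INTO inventory (character_id, item_name, quantity) "
                "VALUES (1, 'sword', 1)")

# Lookup by partition key and a range scan on the clustering column are efficient.
rows = session.execute("SELECT item_name, quantity FROM inventory "
                       "WHERE character_id = 1 AND item_name >= 'p'")
for row in rows:
    print(row.item_name, row.quantity)

# Filtering on an ordinary column requires a secondary index (or ALLOW FILTERING).
session.execute("CREATE INDEX IF NOT EXISTS inventory_quantity_idx "
                "ON inventory (quantity)")
rows = session.execute("SELECT * FROM inventory WHERE quantity = 5")
```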
3.4.2 Architecture
From this subsection, we start to discuss the internal design of Cassandra.
In Cassandra, the gossiper runs every second on a timer to communicate with other nodes. The gossiper class on each node holds a list of the state information of all nodes (alive and dead). The gossiper sends a message to a random node periodically in order to synchronize state information and detect failures. The state information in the gossiper class includes load information, migration and node status, such as bootstrapping (the node is booting), normal (the node has been added to the ring and is ready to accept reads), leaving (the node is leaving the ring) and left (the node dies or leaves, or its token has been changed manually).
3.4.2.2 Ring
Cassandra assigns data to the nodes in the cluster by arranging them in a logical ring. A token (a hash value) is used in Cassandra for data distribution. Each node holds a unique token that determines its position in the ring (ordered clockwise from small to large, see Figure 3.4) and identifies the portion of data it hosts. Each node is assigned all data whose token is smaller than its own token, but not smaller than that of the previous node in the ring (see Figure 3.5). The range of token values is determined by a partitioner. Cassandra uses the Murmur3Partitioner by default for generating tokens; consequently, the range of token values is from −2^63 to 2^63 − 1. Cassandra partitions data based on the partition key, which is mapped to a token value by a token function. Cassandra uses the primary key (row key) as the partition key. When a row is identified by a compound key (multiple columns), the first column declared in the definition is treated as the partition key.
Figure 3.4: Nodes Arranged in a Logical Ring and Ordered by Their Tokens (e.g., −9223372036854775808, 0, ...)
Data replication is typically used in Cassandra to ensure reliability and fault tolerance. The number of copies of each row of data is specified when creating a keyspace by setting the replication factor (an attribute of the keyspace). A typical setting is THREE; that means that in the ring/cluster there are three nodes hosting copies of each row. There is no primary or master replica, and the replication is transparent to clients.
Figure 3.5: Token Ranges Assigned to the Four Nodes of the Ring (−9223372036854775808 to −4611686018427387905; −4611686018427387904 to −1; 0 to 4611686018427387903; 4611686018427387904 to 9223372036854775807)
The replica placement strategy is another attribute of the keyspace, which determines how replicas are placed in the ring. SimpleStrategy is used for a single data center: it places the first replica on a node according to its token value, and places additional replicas on the next nodes clockwise in the ring (see Figure 3.6). There is another strategy named NetworkTopologyStrategy, which is recommended for multiple data centers; replicas are placed on distinct racks across the data centers.
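For illustration (a hedged sketch; the keyspace names and data center names are assumptions), the replication factor and the placement strategy are declared together when a keyspace is created:

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

session = Cluster(["127.0.0.1"]).connect()

# Single data center: three replicas per row, placed clockwise around the ring.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS game_state
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# Multiple data centers: per-data-center replica counts with NetworkTopologyStrategy
# ('DC1' and 'DC2' must match the data center names known to the cluster's snitch).
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS game_logs
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2}
""")
```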
Adding or moving a node in the ring automatically triggers a rearrangement of the token values on the relevant nodes (not all of them). A newly added node only starts to provide read services after it has obtained all required replicas.
3.4.2.3 Data Storage Mechanism
The storage mechanism of Cassandra borrows ideas from BigTable, which uses a Memtable and SSTables. Before writing data, Cassandra first records the operation in a log called the CommitLog (there are three kinds of commit logs in databases, namely undo logs, redo logs and undo-redo logs; since Cassandra uses timestamps to recognize the data version, the CommitLog is a redo log). The data are then written to a per-column-family structure called a Memtable, which is a cache of data rows. Data in a Memtable are sorted by keys. When a Memtable is full, it is flushed to disk as an SSTable. Once flushed, an SSTable file is immutable: no further writes can be done, only reads. Subsequent writes go to a new Memtable, which is later flushed to a new SSTable. Thus, we can consider that there are only sequential writes and no random writes in Cassandra, which is the primary reason why its write performance is so good.
Figure 3.6: Replica Placement with SimpleStrategy and a Replication Factor of THREE (node A holds the replicas A, D, C; node B holds B, A, D; node C holds C, B, A; node D holds D, C, B)

SSTables cannot be modified, so a column family normally corresponds to multiple SSTables. When performing a key lookup, it would increase the workload greatly if all
SSTables were scanned. To avoid scanning unnecessary SSTables, Cassandra applies Bloom filters, which map all keys contained in an SSTable to a bit array in memory. Only when the filter indicates that the required key may exist in an SSTable file is the disk accessed to retrieve it.
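A toy Bloom filter (illustrative only; Cassandra's real implementation is more elaborate) shows why a negative answer lets the read path skip an SSTable entirely, while a positive answer may still be a false positive:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions over a bit array of size m."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely not in this SSTable"; True means "maybe".
        return all(self.bits[pos] for pos in self._positions(key))

sstable_filter = BloomFilter()
sstable_filter.add("row-42")
assert sstable_filter.might_contain("row-42")
# A miss means the SSTable can be skipped without touching the disk.
print(sstable_filter.might_contain("row-99"))
```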
To bound the number of SSTable files Cassandra performs compaction regularly. Com-
paction refers to merging multiple old SSTables containing the same column family into
a new SSTable. The main tasks of compaction are:
Garbage Collection: Cassandra does not delete data directly, thereby consuming more and more disk space. Compaction removes the data marked with a tombstone from disk.
Merger of SSTables: compaction merges multiple SSTable files (including index,
data and filter) into one to improve the read performance.
Generation of MerkleTree: in the process of the merger, a new MerkleTree of the column family is generated. A MerkleTree is a hash tree representing the data in a column family. It is compared with the trees on other nodes to reconcile data (a toy sketch follows this list).
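A toy MerkleTree sketch (illustrative only; Cassandra builds its trees over token ranges during anti-entropy repair) shows how two replicas can detect divergence by comparing a single root hash:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash the leaves pairwise upwards until a single root hash remains."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

replica_1 = [b"row1:v1", b"row2:v1", b"row3:v1", b"row4:v1"]
replica_2 = [b"row1:v1", b"row2:v2", b"row3:v1", b"row4:v1"]  # one diverged row

# Equal roots mean the replicas agree; differing roots mean the nodes walk down
# the tree to find the diverging subtree and reconcile only that part of the data.
print(merkle_root(replica_1) == merkle_root(replica_2))  # False
```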
Hinted Handoff: during the processing of a write operation, if the write consistency level can be met, the coordinator (the node in the cluster handling this write request) creates a hint in its local system for a replica node that is offline due to network partitioning or some other reason. A hint is like a small reminder, which contains the information of the write operation. However, the written data are not readable on the node holding the hint. When the replica node has recovered from the failure, the node holding the hint immediately sends it a message in order to replay the write request. This mechanism keeps Cassandra always available for writes, and reduces the time until a recovered node is ready to provide read services.
Read Repair: during/after responding to a read operation, Cassandra may (with a 10% probability by default) check the data consistency on all replicas. If the data are inconsistent, repair work is launched (a detailed introduction to Read Repair in combination with consistency levels is given later in Section 3.4.4.2). The probability of Read Repair is configured when creating a column family by changing read_repair_chance.
Figure 3.7: Procedure of Writing Data (write consistency level ONE): the client sends a write request to a coordinator node, which forwards it to the replica nodes (Replica 1, 2 and 3) and returns the write response to the client.
Similar to writing data, a read request is also first sent to an arbitrary node in the cluster. Reading data is then divided into two steps.
Step one: the coordinator forwards a direct read request to the closest replica node, and a digest request to a number of replica nodes determined by the read consistency level. These nodes respond with the row or a digest of the requested data, respectively. If multiple nodes were contacted, the coordinator compares the rows in memory and sends the most recent data (based on the timestamp included in each column) back to the client. If the read consistency level cannot be fulfilled at the moment, the coordinator has to report back to the client that the read has failed.
Step two: afterwards, the coordinator may also contact the remaining replica nodes in the background. The rows from all replicas are compared to detect inconsistent data. If the replicas are not consistent, the up-to-date data are pushed to the out-of-date replicas. As introduced above, this process is called Read Repair.
Figure 3.8: Procedure of Reading Data (read consistency level is ONE )
Figure 3.8 shows an example in which the read consistency level is specified as ONE, and the up-to-date rows are held on replica nodes C and D. In the first step, only replica node B is contacted because it is the closest replica to the coordinator (that means replica node B responds to node G the fastest). The data fetched from node B are returned to the client. In the second step, the remaining two replica nodes are contacted and all rows from the replicas are compared. The coordinator finds that the replica held on node B is out-of-date, so it issues a write to that node.
Cassandra improves its read performance by maintaining a partition key cache and a row cache, which help to avoid reading from disk. The partition key cache is enabled by default. It caches the partition index so that Cassandra knows where a partition is located on disk, which decreases seek times and saves CPU time as well as memory. The row cache is similar to a traditional cache such as memcached used with MySQL: it stores the entire contents of a partition that is frequently accessed. This feature, however, consumes large amounts of memory. Thus, the official website suggests not enabling this function unless it is really needed, and typically only one of these two kinds of cache should be enabled for a column family8 .
For example, if the replication factor is specified as THREE, a quorum means that two replica nodes must respond to the write/read request.
Write consistency levels: Table 3.6 shows the possible write consistency levels and their implications for a write request. It is noteworthy that the coordinator forwards a write request to all available replica nodes in all data centers, even if a low consistency level is specified.
Read consistency levels: the read consistency levels are listed in Table 3.7. The read consistency level states how many replicas must respond to a read request, so not all replica nodes are necessarily contacted. In addition, the ANY level is not supported here.
Table 3.6: Write Consistency Levels

Level                        Description
ONE/TWO/THREE/QUORUM/ALL     A write must succeed on at least one/two/three/a quorum of/all
                             replica nodes. If there are not enough available replica nodes,
                             the write fails.
LOCAL_ONE/LOCAL_QUORUM       A write must succeed on at least one/a quorum of replica nodes
                             in the local data center. These levels are used in multi-data-center
                             clusters with the replica placement strategy NetworkTopologyStrategy.
ANY                          A write must be written on at least one node. If all replica nodes
                             are down at write time, the write can still succeed after a Hinted
                             Handoff is written. However, this write only becomes readable after
                             the replica nodes for that partition have recovered.
Table 3.7: Read Consistency Levels

Level                        Description
ONE/TWO/THREE/QUORUM/ALL     The coordinator returns the data after one/two/three/a quorum of/
                             all replica nodes have responded.
LOCAL_ONE/LOCAL_QUORUM       The coordinator returns the data after one/a quorum of replica
                             nodes in the local data center have responded.
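For illustration, the DataStax Java Driver (introduced in Section 3.4.6) lets a client set these levels per statement. The following is a minimal sketch; the keyspace, table and column names are hypothetical:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ConsistencyLevelExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("game");               // hypothetical keyspace

        // Write with a weak level: one replica acknowledgment is enough.
        SimpleStatement write = new SimpleStatement(
                "UPDATE player_account SET password = 'secret' WHERE id = 42");
        write.setConsistencyLevel(ConsistencyLevel.ONE);
        session.execute(write);

        // Read with QUORUM: a majority of the replicas must respond.
        SimpleStatement read = new SimpleStatement(
                "SELECT password FROM player_account WHERE id = 42");
        read.setConsistencyLevel(ConsistencyLevel.QUORUM);
        ResultSet rs = session.execute(read);
        System.out.println(rs.one().getString("password"));

        cluster.close();
    }
}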
this node becomes available again, it will compare its data with the other nodes. It will mistakenly conclude that all replica nodes that received the delete request have missed a write request, thereby launching a repair. As a result, the deleted data would reappear. Cassandra uses compaction to collect this garbage regularly (the default grace period is 10 days). Not only the data with a tombstone marker, but also the out-of-date values generated by updates, are removed from disk.
3.4.5 CQL
CQL is the primary language for communicating with Cassandra. Cassandra has developed rapidly in recent years, and many new features are continually added to CQL, so there are considerable differences between CQL versions. This subsection focuses on CQL v3 [Casb], which is the latest version of this language at the moment.
The syntax of CQL is close to SQL. Numerous keywords from SQL (e.g., CREATE, INSERT, UPDATE, DROP, BATCH, COUNT, TABLE and PRIMARY KEY) are reused so that users can grasp this language quickly. As a result, it appears as if data were stored in tables containing rows of columns, as in an RDB. However, as we know, Cassandra has a different data model from an RDB. We give some samples of using CQL v3 in the following.
In Listing 3.1, we create a keyspace called playerInfo. As introduced in the previous subsection, we can also specify a replica placement strategy (e.g., SimpleStrategy) and the number of replicas (e.g., THREE) when creating a keyspace.
In Listing 3.2, we show the statement for creating a column family. The syntax is similar to the SQL statement that creates a table with a number of columns and a primary key for each row. Alternatively, we can even use the CREATE TABLE statement to create a column family. Furthermore, some options, such as the chance of performing Read Repair, can be specified here.
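A minimal sketch of such statements, issued through the Java Driver; the column names and option values are illustrative and the actual Listings 3.1 and 3.2 may differ:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SchemaSetup {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Keyspace with SimpleStrategy and three replicas (cf. Listing 3.1).
        session.execute("CREATE KEYSPACE IF NOT EXISTS playerInfo WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 3}");

        // Column family with a primary key and a tuned Read Repair chance (cf. Listing 3.2).
        session.execute("CREATE TABLE IF NOT EXISTS playerInfo.character ("
                + "id int PRIMARY KEY, name text, gender text, age int) "
                + "WITH read_repair_chance = 0.1");

        cluster.close();
    }
}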
Drawbacks of CQL:
Although CQL is developing rapidly and becoming more and more powerful, it is still not SQL. Currently there are some restrictions compared with SQL. For instance, when we query data, only columns that are part of the primary key and/or have a secondary index defined on them are allowed in the WHERE clause; the IN relation is only allowed on the last column of the partition key [Casb].
Besides the interactive cqlsh shell, client drivers (e.g., the Java Driver, PHP Driver, and Python Driver) are used in production applications to pass CQL statements from the client to the Cassandra cluster. Since Cassandra and our client applications are written in Java, in the following we give a brief introduction to the Java Driver9 .
The Java Driver has a fully asynchronous, layered architecture. There are three modules in the driver, namely driver-core, driver-mapping (an object mapper) and driver-examples. Driver-core (the core layer) handles everything related to the connections to a Cassandra cluster, such as connection pooling, discovering new nodes, and automatic reconnection. Moreover, higher-level layers can be built on top of the API it provides.
The Java Driver is built on Netty, which helps to provide non-blocking I/O with Cassandra. Furthermore, it has the following main features:
UPDATE userLogs
SET actions = 'post a status'
WHERE took_at = '2016-02-16 20:20:00-0500'
IF actions = 'log in';
Automatic node discovery: the driver automatically obtains information such as the status of all nodes in the cluster.
Transparent failover: the driver automatically and transparently connects to another node if a Cassandra node is down and the client has not explicitly specified that node as a coordinator. The Java Driver automatically attempts to reconnect to the unavailable node in the background.
Cassandra trace handling: the client can trace a query by using a convenient API of the Java Driver. The tracing result includes information such as the IP address of the coordinator, the nodes performing the query, and the number of SSTables involved (a brief sketch follows below).
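A minimal sketch of this tracing API; the keyspace and table are hypothetical:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ExecutionInfo;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class TraceExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("game");      // hypothetical keyspace

        // enableTracing() asks Cassandra to record how the query was executed.
        SimpleStatement stmt = new SimpleStatement("SELECT * FROM character WHERE id = 1");
        stmt.enableTracing();

        ResultSet rs = session.execute(stmt);
        ExecutionInfo info = rs.getExecutionInfo();
        QueryTrace trace = info.getQueryTrace();

        System.out.println("Coordinator: " + trace.getCoordinator());
        for (QueryTrace.Event e : trace.getEvents()) {
            // The events list which nodes were contacted and how many SSTables were touched.
            System.out.println(e.getSource() + " - " + e.getDescription());
        }
        cluster.close();
    }
}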
3.5 Summary
In this chapter, we have introduced the foundations of rapidly developing Cloud storage systems, compared NoSQL DBMSs and RDBMSs in detail, and highlighted Cassandra. In the next chapter, we will propose a Cloud-based architecture for MMORPGs.
4. Cloud Data Management for
MMORPGs
Some ideas in this chapter are originally published in the DASP article
“Cloud Data Management for Online Games: Potentials and Open
Issues” [DSWM13] and the CLOSER paper “Towards Cloud Data Man-
agement for MMORPGs” [DS13].
Table 4.1: Data Management Requirements and Recommendations for Data Storage
Consistency vs. Availability: the inconsistency of account data might bring trouble to a player as well as to the game provider, or even lead to an economic or legal dispute. Imagine the following two scenarios: a player has changed the password successfully, but when this player logs in to the game again, the new password is still not valid; or a player has transferred money to the game account, but the account balance is not properly presented in the game system. Both cases would influence the player's experience, and might result in customer loss or economic loss for the game company. Hence, we need to access account data under strong consistency guarantees, and manage them with transactions. Availability is less important here.
Scalability: an account database manages data from millions of players, but the data volume is small. Hence, system scalability is not strictly required. Processing a large number of concurrent requests might become a challenge for the database, which could be addressed by database sharding.
In general, an RDBMS can already fulfill all management requirements of account data. On the other hand, if there are no transactions spanning multiple rows, a NoSQL DBMS (CP type, see Section 3.3.1) supporting lightweight transactions can handle them as well, because the user account and password (as well as further fields) can be treated as key-value pairs.
Consistency vs. Availability: players are not as sensitive to game data as to account
data. For example, the change of an NPC’s appearance or name, the duration of a
bird animation, and the game interface may not catch the attention of a player and
have no influence on players’ operations. On the other hand, some changes of the
game data must be propagated to all online players synchronously, for instance,
the change of the game world’s appearance, game rules as well as scripts, and
the occurrence frequency of an object during the game. The inconsistency of these data will lead to errors in the game display and logic, unfair competition among players, or even a server failure. For this reason, we also need to treat the data consistency of game data seriously.
Game data are updated or loaded from a game server to the client side when a
player logs in to or starts a game. Therefore, from a player’s point of view, a
causal consistency (see Section 3.3.2) is required. From a game server’s point of
view, as long as players in the same game world hold the same version of game
data, the game is fair (players across game worlds are not able to communicate
with each other, consequently, data inconsistency among game worlds will not
be detected or affect the gameplay). It is noted that a game world is typically
hosted by one game server. Hence, eventual consistency among game servers is
acceptable. That means, both conventional and Cloud DBMSs/file systems can
manage them.
Scalability: the game data size changes when the game is upgraded or a new game edition is launched. Their growth typically does not pose a challenge to disk space. Hence, there is no scalability requirement.
Generally, the management of game data does not pose any challenge to the file system or database. Both conventional and Cloud storage systems can handle them.
Consistency vs. Availability: state data are modified by players frequently during gameplay. A modification must be perceived by all relevant players synchronously, so that players and NPCs can respond correctly and in a timely manner. An example of the necessity of data synchronization is that players cannot tolerate a dead monster continuing to attack their characters. Note that players only access the in-memory database during gameplay. Hence, this database must ensure strong consistency, and support transaction management as well as complex queries. Moreover, requests for data processing must be answered in real time, so the availability requirement is high.
Another point about managing state data is that modified values must be backed up to a disk-resident database asynchronously. Similarly, game developers also need to take care of data consistency and durability in the disk-resident database. For instance, it is intolerable for players to find that their last game record is lost when they start the game again. In contrast to the in-memory database, we do not recommend ensuring strong consistency for state data on disk. The reason is as follows: according to the CAP theorem, we have to sacrifice either data consistency or availability in the case of network partitioning or high network latency. Obviously, it is unacceptable to block all backup operations until the system recovers, which may lead to data loss. Consequently, the level of data consistency has to be decreased. We propose to ensure read-your-writes consistency (see Section 3.3.2) for state data on disk.
Scalability: we have already discussed the scalability requirement of state data in Section 2.3. MMORPGs are having trouble dealing with the surge in the size of state data. It should be noted, however, that state data in the in-memory database do not reach a large scale, because their size is limited by the total number of concurrent players that a game server supports.
It is reasonable to manage state data in the in-memory database and in the disk-resident database using different technologies. An in-memory RDB is more suitable for caching state data because it can perform complex queries. We propose to use a NoSQL store (AP type, see Section 3.3.1) in our project to persist/back up state data on disk because it can solve the scalability issue and guarantee high availability. The challenge is that no NoSQL DBMS is designed specifically for MMORPGs. We must find a relatively appropriate one, and then improve it to fulfill all requirements (e.g., supporting read-your-writes consistency) of managing state data.
Consistency vs. Availability: log data are mainly used by game developers for fixing bugs in the game program or by data analysts for data mining purposes. These data are first sorted and cached on the server side during the game, and then bulk-loaded into a disk-resident database, thereby reducing the conflict rate as well as the I/O workload, and increasing the total simultaneous throughput [BBC+ 11]. It is noteworthy that the propagation of log data among replicas would significantly increase the network traffic and could even block the network. Moreover, log data are generally organized and analyzed only after a long time. Data analysts are concerned about the continuous sequence of log data rather than their timeliness. Hence, data inconsistency is acceptable for a period of time. For these reasons, a deadline-based consistency model, such as timed consistency (see Section 3.3.2), is more suitable for log data.
Scalability: log data are derived from millions of players and are appended continually. Accordingly, the database managing them must be scalable.
The general format of a log entry is a specific time followed by operation information, which forms a natural key-value pair. Moreover, log data have high requirements for scalability and availability, but low requirements for consistency. Therefore, managing log data with a NoSQL DBMS (AP type, see Section 3.3.1) is more sensible than using an RDBMS. We can then use tools like Apache Spark1 for data analysis.
[Figure: the proposed Cloud-based architecture. Clients connect to the zone server and the map/logic servers. Account data are kept in a Cloud database (RDB service); game data in Cloud storage (Cloud file system) and a Cloud database (RDB service); state data in a transactional database cache (in-memory RDBMS) backed by a Cloud database (NoSQL store); log data in a Cloud database (NoSQL store).]
Based on the analysis above, we have proposed a set of cooperating and composable
Cloud services to improve the existing game architecture [DSWM13].
We suggested managing game data with Cloud storage (a Cloud file system) as well as a Cloud database (RDB service), and managing the other data sets with Cloud databases. Account data must be processed with ACID transactions, so they are managed by an RDBMS service. In contrast, log data and state data are recommended to be persisted in a NoSQL store.
1
http://spark.apache.org/ (accessed 12.01.2016)
Storing data in a public Cloud allows a game company to focus on game development rather than on maintaining the data storage system. Data are stored in the Cloud provider's data center, and the provider is responsible for managing the data, which helps a game company to reduce both the time needed for developing new game editions and the costs of buying new drives. However, the drawback is that the data could be stored in an insecure environment that cannot be controlled by the game company. Furthermore, the latency of data transmission highly depends on the network traffic between the game servers and the Cloud provider's data center. For this reason, this hosting solution is suitable for a small game company or a company in its infancy; however, it is better for the game company to host the account data itself.
For a game company that already owns expensive data centers, building a private Cloud environment could be the best option. Data are transmitted across the company's intranet, hosted at internal data centers, and protected behind a firewall. The drawbacks are the costs of maintaining and upgrading the infrastructure and of buying software licenses. The game company can use the Hadoop Distributed File System (HDFS) to store game data. However, choosing a suitable NoSQL DBMS for state data and log data is a challenge. We will discuss it in the next section.
Query characteristic: covers the conditions used in queries and the range of hotspot queries. For example, the retrieval of user information is a random query, whereas the retrieval of news follows time (the latest news is normally queried more frequently).
In our case, there are two kinds of queries, namely random queries (retrieving the state of a player) based on the primary key (user ID) and range queries (retrieving all operations of a player in a certain period of time) based on composite keys (user ID and time).
A document store (implementing a document model) is designed to load the whole
document, when data stored in one document are retrieved or modified. This
feature ensures high-level consistency. However, in our case, we only need to
read/write parts of the whole row. For this reason, a NoSQL store implementing
a wide column model is a better choice because only the columns we need will be
loaded.
Data structure: Cassandra is a partitioned row store. Data are structured in column families, where each row is identified by a primary key or a composite key. This structure is similar to that in an RDBMS and therefore facilitates data mapping between the two kinds of database systems.
Cassandra offers collection columns, which are declared using a collection type (e.g., the list, set or map type) followed by another type such as int or text. This feature helps to denormalize state data from numerous relations of an RDB.
Query: although the functionality of CQL is less powerful than that of SQL, it still supports random and range queries on (compound) primary keys, as sketched below.
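For illustration, the two query shapes of our scenario could be expressed as follows through the Java Driver; the keyspace, table and column names are hypothetical:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class QueryShapes {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("game");        // hypothetical keyspace

        // Random query on the partition key: the current state of one player.
        ResultSet state = session.execute(
                "SELECT * FROM character_state WHERE user_id = 42");

        // Range query on a compound primary key (user_id, log_time):
        // all operations of one player within a certain period of time.
        ResultSet logs = session.execute(
                "SELECT * FROM player_log WHERE user_id = 42 "
                + "AND log_time >= '2016-01-01' AND log_time < '2016-02-01'");

        for (Row r : logs) {
            System.out.println(r);
        }
        cluster.close();
    }
}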
Cassandra not only complies with these criteria, but also meets requirements such as system scalability, availability and data consistency. Therefore, it is a good choice for managing state and log data in MMORPGs.
In [Muh11], the author proposed to use another NoSQL store, Riak2 , whose database model is similar to that of Cassandra, to substitute the RDBMS in MMOGs. However, after analyzing the use of databases in MMORPGs and the features of NoSQL stores, we come to the conclusion that NoSQL DBMSs cannot substitute RDBMSs in the game scenario because of their limitations regarding the support of complex queries, real-time processing, etc. [ZKD08]. Instead, we propose to let them cooperate in a new architecture and deal with different data processing requirements.
In Section 4.1, we have analyzed the consistency requirements of each data set from the storage system's perspective. Although some studies have also focused on the classification of game consistency, they generally discussed it from the players' or servers' point of view [ZK11, LLL04, PGH06, FDI+ 10]. In other words, they actually analyzed data synchronization among players. Other existing research either did not discuss the diverse data sets accordingly [DAA10, SSR08], or handled this issue based only on a rough classification of game data [ZKD08].
4.6 Summary
In this chapter, we have proposed a Cloud-based architecture for MMORPGs, and discussed the possibility of using Cassandra as a substitute for an RDBMS for managing state and log data. Cassandra is, however, not designed for online games, so it has some drawbacks when used in this application scenario. We will point out these shortcomings, and propose some solutions in the next chapter.
2
Riak: http://basho.com/riak/ (accessed 05.08.2015)
5. Using Cassandra in MMORPGs
Some ideas in this chapter are originally published in the DASP article
“Cloud Data Management for Online Games: Potentials and Open
Issues” [DSWM13], the GvD paper “Consistency Models for Cloud-based
Online Games: the Storage System’s Perspective” [Dia13], an article “Cloud-
Craft: Cloud-based Data Management for MMORPGs” [DWSS14], the
DB&IS paper “Cloud-based Persistence Services for MMORPGs” [DWS14],
and the IDEAS paper “Achieving Consistent Storage for Scalable MMORPG
Environments” [DZSM15].
In the last chapter, we proposed to take Cassandra as an example for data persistence in the Cloud-based game architecture. However, there are some issues to address when using it. In this chapter, we analyze the shortcomings of using Cassandra in the game scenario, and offer a viable solution for each of them. Issues such as efficiently guaranteeing read-your-writes consistency, optimizing the read performance, and mapping the database schema between an RDBMS and Cassandra will be discussed.
consistency (e.g., strong consistency or read-your-writes consistency for state data), the consistency level of a write operation and its subsequent read operation must meet the following prerequisite:

W + R > N

In this formula, N refers to the total number of replicas (the replication factor) for a row, while W and R represent the consistency levels of the write and the read, respectively. The formula states that Cassandra can only ensure data consistency if the total number of replicas that responded to the write and to its subsequent read exceeds the replication factor, because only in this case does at least one replica responding to the read contain the up-to-date data. For example, with N = 3, writing and reading with QUORUM (W = R = 2) satisfies the condition. However, this mechanism causes the following issues.
Issues:
1) More than half of the replicas have to participate in the process of updating and reading data, which increases the response time of both write and read operations. Particularly in multi-data-center deployments, where replicas are distributed over several locations, the influence is significant.
2) In the case of a multi-node failure with only one replica available (and a replication factor larger than one), writes and/or reads will fail because the prerequisite is not met, thereby undermining system availability. In fact, if only a fixed replica node performed the write and its subsequent read (just like a master replica), the result would be up-to-date. The other replicas would become eventually consistent through the built-in repair utilities (see Section 3.4.3).
Test   Operation   Number of nodes   Write requests   Read requests   Detection method   Inconsistent data
I      WORO        5                 10000            10000           Eager              10.43%
II     WORO        5                 10000            10000           Lazy               0%
III    WORO        3→5               10000            10000           Lazy               4.09%
IV     WORO        3→5               200000           10000           Lazy               22.16%
against this row is sent (so-called eager detection). In theory, 66.7% of the results could be stale. In our practical experiment, however, only an average of 10.43% of them is stale. If the read requests are sent after all (10000) write requests have succeeded (so-called lazy detection), no stale data are detected (see test II).
The reason is that, no matter which consistency level is specified by the client, Cassandra actually forwards the write request to all available replicas at the same time. In other words, if all replicas are available, the modification is synchronized to all of them. Since the write performance of Cassandra is high, stale data are rarely detected in our experimental environment (a single data center).
In test III we simulated a bad case: two nodes fail while the data are being updated, but all five nodes are alive when the data are read (if more than two nodes failed, all three replicas of some data objects would be unavailable; thus, write operations against these data could not be performed). In this case, reading from the two temporarily unavailable nodes should fetch stale data. However, the result (see test III) shows that only an average of 4.09% of the data is stale.
We then increased the total number of write requests to 200000 so that there would be more inconsistent data (about 300 MB) in the cluster and Cassandra would need more time to replay the writes. The experimental result shows that although the amount of stale data grows (to about 22.16%), it is not directly proportional to the increase in the number of write requests (see test IV). Furthermore, the inconsistency window in this situation is about 942296 ms (about 15 minutes 42 seconds), which is short.
[Figure: architecture excerpt — the Game Layer provides generic functionality for login, user interaction, game state, rules, maps, etc.]
Proposal:
To address this issue, the system should be able to detect stale data automatically.
Only when stale data are returned, the system increases the consistency level of read
and executes it again. To achieve this goal, the existing Cloud storage system needs to
be extended [APV07, GBS11, WPC09, Dia13].
To describe the timestamp-based detection model, we first need to outline the process of checkpointing game state data.
The DAS periodically creates a consistent snapshot of the game state from the in-memory database. The current system time of the DAS is used as a unique, monotonically increasing version ID (also called TS) for each checkpoint. The DAS executes a bulk write to Cassandra with consistency level (CL) ONE. Cassandra divides the message into several write requests based on the Id. The current state of an object and the TS are persisted together in one row. When the DAS receives a success acknowledgment, it uses the same TS to update the TST accordingly (see Algorithm 1).
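The following is a minimal, hypothetical Java sketch of this step (not the actual DAS implementation). It assumes a table game.character_state(id uuid, ts bigint, state text) and a simple TST interface; every snapshot entry is written with CL ONE, and only afterwards is the TST updated with the same TS:

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.util.Map;
import java.util.UUID;

public class Checkpointer {

    /** Hypothetical interface of the timestamp table (TST) held in H2. */
    public interface TimestampTable {
        void setTimestamp(UUID id, long ts);
    }

    private final Session session;
    private final PreparedStatement insert;

    public Checkpointer(Session session) {
        this.session = session;
        // One row per object and checkpoint: the object Id plus the checkpoint timestamp (TS).
        this.insert = session.prepare(
                "INSERT INTO game.character_state (id, ts, state) VALUES (?, ?, ?)");
    }

    /** Writes one checkpoint of all objects with CL ONE, then updates the TST. */
    public void checkpoint(Map<UUID, String> snapshot, TimestampTable tst) {
        long ts = System.currentTimeMillis();      // monotonically increasing version ID
        for (Map.Entry<UUID, String> entry : snapshot.entrySet()) {
            BoundStatement bound = insert.bind(entry.getKey(), ts, entry.getValue());
            bound.setConsistencyLevel(ConsistencyLevel.ONE);
            session.execute(bound);                // part of the bulk write to Cassandra
        }
        // Only after Cassandra has acknowledged the writes is the TST updated with the same TS.
        for (UUID id : snapshot.keySet()) {
            tst.setTimestamp(id, ts);
        }
    }

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Checkpointer cp = new Checkpointer(cluster.connect());
        // cp.checkpoint(snapshotFromInMemoryDb, tstBackedByH2);
        cluster.close();
    }
}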
When a player has quit the game and the state data of her/his avatar have been backed up to Cassandra, the log status is changed to "Logout". Then, the DAS sends a delete request to the in-memory database to remove the state data of the avatar. Both situations are shown in Figure 5.2.
When a player restarts the game, the DAS first checks the player's log status in the TST.
Read (1): If the value is "Login", the previous checkpointing has not yet been completed, so the up-to-date state data of the avatar are still held in the in-memory database. In this case, the state data do not need to be recovered, and the data are fetched directly from the in-memory database (see Figure 5.3, Read 1).
Read (2): If the value is "Logout", the DAS gets the timestamp from the TST, and then uses TS and Id as query criteria to retrieve the up-to-date checkpoint with CL ONE. When a replica in Cassandra receives the request, it compares the TS with its own TS. If they match, the state data are returned; otherwise, a null value is sent back. In the latter case, the DAS has to increase the CL and send the read request again until the up-to-date checkpoint is found or all available replicas have been queried. If the expected version still has not been found, the latest (but stale) version in Cassandra has to be used for the recovery. Finally, the player's log status in the TST is changed from "Logout" to "Login" (see Algorithm 2 and Figure 5.3, Read 2).
begin
    // get the version ID of the avatar/game object from the timestamp table (TST)
    TS ←− TST.getTS(Id)
    CL ←− ONE
    repeat
        data ←− read checkpoint (Id, TS) from Cassandra with CL; increase CL
    until data ≠ null or all available replica nodes have been queried
    if data = null then data ←− latest (stale) checkpoint of Id in Cassandra
    set log status of Id in TST to "Login"
    return(data)
end
Algorithm 2: The Process of Data Recovery
New issue description: if the first attempt of the retrieval fails, the read operation has to be executed again with a higher read consistency level, which increases the response time. Therefore, we can conclude that the success rate of the first attempt determines the read performance.
The reason for a failed first attempt is that the read request is executed by a replica node that does not host the up-to-date checkpoint. For instance, Figure 5.4 shows the process of writing a checkpoint and the subsequent operation of reading it. The up-to-date checkpoint is hosted by node B (replica 1). Unfortunately, the coordinator has forwarded the read request to node C (replica 2), which hosts a stale checkpoint. In this case, a null value is returned to the client, and the read operation has to be executed again.
To optimize our timestamp-based solution, we propose to sacrifice part of the database transparency in exchange for a higher success rate. In other words, the IP address of the replica node that performed the last checkpointing is also recorded in the TST. For subsequent read requests on this checkpoint, the DAS connects to that node (as the coordinator) directly.
Figure 5.4: Process of Executing Write and Read Operations in Cassandra Cluster
Figure 5.5: Process of Executing Write and Read Operations Using NodeAwarePolicy
In this case, the success rate is increased as long as that node is still available (see Figure 5.5).
We can understand this strategy as follows: for write operations, each replica is still identical as before, but for read operations there is a "primary" replica. For this reason, our proposal does not affect the system availability. The checkpoint can still be flushed to any replica as before, and if that replica node fails, a read request can be executed by the other replica nodes. In our project, we name this strategy NodeAwarePolicy.
It is noteworthy that the Java Driver provides a TokenAwarePolicy for load balancing (see Section 3.4.6), which has a function similar to our NodeAwarePolicy. However, there are the following differences:
1) For each write/read using the TokenAwarePolicy, only the node hosting the first replica (determined by the token value of a data object's partition key) is used as the coordinator. In other words, the replica nodes are no longer identical; instead there is a "primary" replica. The system performance will therefore suffer if the workload of the "primary" replica node is heavy, or if the physical distance between that node and the Cassandra client initiating the request is large.
2) NodeAwarePolicy needs to record the host IP address for each data object on the server side, so we have to consider the persistence of this information (we will discuss that in the next subsection).
4) The TokenAwarePolicy cannot obtain information about all replica nodes of a data object because it knows neither the replication factor nor the replica placement strategy of the ring. As a result, if the "primary" replica node is unavailable, a random node takes its place as the coordinator. In contrast, NodeAwarePolicy takes node failures into account, so all this information is collected. If a replica node fails, the other replica nodes are used first as an alternative.
In summary, our NodeAwarePolicy achieves a better write/read performance than the TokenAwarePolicy, especially in an unstable environment where node failures occur occasionally. We will give an experimental proof later in Section 6.4.
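For illustration, the following sketch shows how a load balancing policy is plugged into the Java Driver. TokenAwarePolicy and RoundRobinPolicy are part of the driver; NodeAwarePolicy stands for our custom policy and is shown only as a comment, since its coordinator selection corresponds to Algorithm 3 in the next chapter:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class PolicySetup {
    public static void main(String[] args) {
        // Built-in TokenAwarePolicy: routes each request to the node owning
        // the first replica of the statement's partition key.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
                .build();
        Session session = cluster.connect();

        // Our NodeAwarePolicy would be registered in exactly the same way:
        //   .withLoadBalancingPolicy(new NodeAwarePolicy(timestampTable))
        // where the policy looks up, per read, the IP address recorded in the TST
        // and prefers that replica node as the coordinator (cf. Algorithm 3).
        session.close();
        cluster.close();
    }
}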
the system performance. Our timestamp-based solution could guarantee data currency
in an eventually consistent environment.
The possibility of using a timestamp as a version ID to identify current data in a replicated database has already been studied. However, these solutions either focus on other storage systems or are designed for other application scenarios. For instance, in [APV07], the authors proposed a new key-based timestamping service for generating monotonically increasing timestamps, which is designed for distributed hash tables with a P2P structure. The main objective there is to synchronize timestamps in a P2P environment and to scale out to large numbers of peers. The testbed in this work, in contrast, has a client/server structure, where timestamp synchronization is not problematic; improving the efficiency and accuracy of queries in a NoSQL DBMS (e.g., Cassandra) is our research focus.
In [GBS11], the authors also record the version ID obtained from a Cloud storage system to determine data currency. However, they suggest that if a read request fetches an old version ID from the Cloud, the system should wait for some milliseconds and try again. Consequently, this approach blocks the read operation and adds unnecessary response time, which is not suitable for MMORPGs.
Issues:
1) We proposed to persist the state information of an avatar/object in a single row in Cassandra. As introduced in Section 3.4.2.3, Cassandra stores data in SSTables and Memtables. When a read request against a row arrives at a replica node, all SSTables and any Memtables that contain columns of that row must be combined. For this reason, the more fragments the row has, the more CPU and disk I/O are consumed. Unfortunately, Cassandra is used in our project to persist checkpoints, which means that the data in a row are frequently updated. If the data are not well structured in a column family, large numbers of fragments are easily produced.
2) The row cache can be used in Cassandra to avoid fetching data from disk (see Section 3.4.4.2), which works well when a partition/row is frequently accessed. In our case, however, a checkpoint is read at most once. Therefore, this built-in mechanism is useless to us.
Option I
We store the timestamp as a “regular” column (see Figure 5.6a).
Advantages: the old values are deleted when a new checkpoint is stored. That means, if a replica node has executed an update successfully, there are no stale data on this node any more. Hence, the database size does not grow unboundedly.
Disadvantages: the columns of a row become fragmented over several SSTable files because new values are stored in a new SSTable (see Section 3.4.4.1). As a result, data have to be collected from multiple SSTable files, which affects the read performance. Furthermore, a secondary index on the timestamp column has to be created if we want to use the timestamp as a criterion to look up data. However, the timestamp changes very frequently in our application scenario, and maintaining such an index affects the write performance.
Option II
The second approach is to use the avatar's/object's ID and the timestamp as a compound primary key (see Figure 5.6b). In this case, the timestamp serves as a clustering column.
Option III
We can also combine the avatar's/object's ID and the timestamp (separated by a special symbol such as @) and use the result as the primary key (see Figure 5.6c).
This design has all the advantages and disadvantages of the second option because each checkpoint is also stored in a single SSTable. The difference between them is the partition key. In option II, the avatar's/object's ID is used as the partition key, so all checkpoints of one avatar/object can be fetched from one node. In option III, however, the partition key is the whole primary key, whose value is unique. That means checkpoints of one avatar/object are distributed over different nodes (possibly all nodes in the cluster), which brings some new advantages and disadvantages.
Advantages: it offers higher fault tolerance. With option II (as well as option I), if all replica nodes hosting the state data of one avatar/object are down, checkpointing of this avatar's/object's state data has to be blocked until one of the replica nodes recovers. With option III, the replica nodes are not fixed, because the value of each checkpoint's partition key is different. So if the latest checkpoint has been successfully flushed to disk, the backup of the blocked checkpoint can be ignored.
Disadvantages: frequent changes (adding and removing) of row keys cause a heavy overhead for maintaining the primary index. Furthermore, if the timestamp table on the server side fails, fetching the latest checkpoint becomes problematic: CQL does not support a 'LIKE' query as in SQL, which means that it is currently impossible to get the checkpoint by using only an avatar's/object's ID.
From the above discussion, we can reasonably conclude that option II is more appropriate for our application scenario. In addition, this schema can also be applied to the log column family, so that all log information of a player is ordered by the timestamp and stored on the same replica nodes.
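A minimal CQL sketch of option II, issued through the Java Driver; the keyspace and the non-key columns are illustrative. The avatar's/object's ID is the partition key and the checkpoint timestamp the clustering column, so all checkpoints of one avatar/object stay on the same replica nodes and are ordered by time:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class OptionTwoSchema {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        session.execute("CREATE KEYSPACE IF NOT EXISTS game WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 3}");

        // Option II: compound primary key (id, ts); id is the partition key,
        // ts the clustering column, with the newest checkpoint first.
        session.execute("CREATE TABLE IF NOT EXISTS game.character_state ("
                + "id uuid, ts bigint, state text, "
                + "PRIMARY KEY (id, ts)) "
                + "WITH CLUSTERING ORDER BY (ts DESC)");

        // The same schema applied to the log column family, so that all log
        // entries of a player are ordered by time on the same replica nodes.
        session.execute("CREATE TABLE IF NOT EXISTS game.player_log ("
                + "id uuid, ts bigint, action text, "
                + "PRIMARY KEY (id, ts)) "
                + "WITH CLUSTERING ORDER BY (ts DESC)");

        cluster.close();
    }
}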
Issue:
The design concepts of these two kinds of databases differ. For example, tables in an RDB are often normalized in order to minimize data redundancy, so the result of a query is usually assembled from multiple tables. In contrast, tables (column families) in Cassandra are denormalized to improve the query performance, so the result is obtained from a single row. For this reason, we need a middleware on the server side to transfer data between a multi-table schema and a single-column-family schema. Consequently, the structure of a column family must be well designed, so that the middleware can work efficiently and a change in the structure of a table does not cause a wide range of modifications to the middleware program.
Solution I:
In our previous work, limited by the capabilities of early CQL and Cassandra versions, we simply proposed to design the structure of a column family as shown in Figure 5.8 [DSWM13]. Since column names in Cassandra must be unique, we had to rename the columns of the Inventory table. Although this is feasible, it brings a number of problems:
1) The middleware must be aware of the naming rules in Cassandra. As soon as the rules change, the middleware program has to be modified to adapt to them; otherwise, system errors are inevitable. Furthermore, the program is not generic and is therefore hard to reuse for other games.
2) Columns in a column family are ordered by their names. After we map multiple tables of an RDB to a single column family, all columns are reordered. That means columns that once belonged to the same table are no longer stored sequentially, which increases the processing time of the middleware when mapping results/checkpoints back to the in-memory DB.
Solution II:
With Cassandra 2.1 and later, we can use collection types and a user-defined type (UDT) to improve the structure of the column family (see Section 3.4.1). Figure 5.9 shows a sample of the new data structure of the character column family. Inventory is the name of a map column, and <quantity:10, location:1> is the value of a UDT (the implementation of this structure is illustrated later in Section 6.3). Using this structure, we can keep all columns of a table together and use their original names. Accordingly, all problems mentioned above are solved.
[Figures: the relational tables Character (ID, Name, Gender, Age) and Inventory (ItemID, CharacterID, quantity, location); in Figure 5.8 they are mapped to a single Character column family with renamed columns (e.g., Item_1, Item_1_quantity, Item_1_location); in Figure 5.9 the Character column family instead holds an Inventory map column with UDT values such as <quantity:10, location:1>.]
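A minimal sketch of how the structure of Figure 5.9 could be declared with Cassandra 2.1 or later, again issued through the Java Driver; the keyspace, type and column names follow the example above but remain illustrative:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class UdtSchema {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session setup = cluster.connect();
        setup.execute("CREATE KEYSPACE IF NOT EXISTS game WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

        Session session = cluster.connect("game");   // work inside the keyspace

        // User-defined type holding the per-item attributes.
        session.execute("CREATE TYPE IF NOT EXISTS item_info (quantity int, location int)");

        // Inventory is a map from item id to the frozen UDT value, e.g.,
        // {1: {quantity: 10, location: 1}, 3: {quantity: 5, location: 2}}.
        session.execute("CREATE TABLE IF NOT EXISTS character ("
                + "id int PRIMARY KEY, name text, gender text, age int, "
                + "inventory map<int, frozen<item_info>>)");

        cluster.close();
    }
}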
Read Repair:
Read Repair is used to guarantee eventual consistency (see Section 3.4.3). In our use case, however, we do not care whether the data in the cluster are consistent after a checkpoint has been fetched. Performing this repair consumes system resources unnecessarily, which affects the throughput of the cluster. Although the default probability of performing it was reduced to 0.1 in Cassandra 1.0 and later, we still propose to disable it.
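For example, the Read Repair chance of the checkpoint column family can be set to zero through an ALTER TABLE statement (the keyspace and table name are illustrative):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class DisableReadRepair {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Disable both the global and the data-center-local Read Repair chance.
        session.execute("ALTER TABLE game.character_state "
                + "WITH read_repair_chance = 0.0 "
                + "AND dclocal_read_repair_chance = 0.0");

        cluster.close();
    }
}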
Query capability:
CQL and Cassandra are developing rapidly, but their query capability is not as strong as that of RDBMSs. As a result, game developers have to consider all possible queries before designing a column family. This increases the difficulty of developing games and limits the expansion of the game functionality.
5.5 Summary
In this chapter, we have discussed the potential issues we face when using Cassandra for backing up checkpoints. Possible solutions, such as a timestamp-based model, have been proposed to address these issues. For some issues, however, we can only rely on the further development of Cassandra. In the next chapter, we will introduce the implementation of the Cloud-based game testbed and some other testbeds in detail, and then evaluate and compare them.
6. Evaluation
Under the CloudCraft project, we have run a number of sub-projects in order to verify our proposal of a Cloud-based MMORPG architecture. In this chapter, we classify these sub-projects into three groups, namely the potential scalability of a Cloud-based game system, the performance comparison of MySQL Cluster and Cassandra in MMORPGs, and the efficiency of guaranteeing read-your-writes consistency. Next, we introduce them one by one.
designed and implemented a prototypical game platform, which borrows its design from an open source MMORPG test environment and ports it to Cassandra [Wan13]. We have to point out that the physical resources for the experiments were limited as described below, so the focus is mostly on scaling the number of clients against a small set of up to five Cassandra servers. Nevertheless, we obtained some interesting results.
[Figure: experimental infrastructure — a five-node Cassandra cluster (node 1 to node 5) connected over the Internet via a VPN/SSH server to client computers, each running a game server.]
game client randomly issued a write/read command regarding one of those columns and sent it to the game server. Meanwhile, the response time of each command was recorded.
The evaluation focuses on the potential scalability and performance of our prototype under concurrent access by many players. In the experiment, we vary the number of nodes in the Cassandra cluster from one to five, so we keep the replication factor of the cluster at one (that means the cluster has only one copy of each row, and a write/read command succeeds as soon as one replica node responds to it). Otherwise, if the replication factor were larger than the number of nodes in the cluster (for example, a replication factor of three in a single-node Cassandra), the system would throw an exception when executing a command.
6.2.4 Experiments
We have evaluated the scalability of the game server and Cassandra cluster in an online
game scenario separately.
Figure 6.4: Scalability of the Game Server Connecting with Five-node Cassandra. (a) Average response time (calculated from 500 × number of concurrent clients commands) of a single game server connected by different numbers of concurrent clients. (b) Maximum number of concurrent clients supported by different numbers of game servers.
the acceptance of subsequent commands. (The default maximum amount of time that a transaction is permitted to run before being aborted is 100 milliseconds4 .) So the maximum number of concurrent clients in the case of a single game server is around 500. Similarly, we found that the maximum client number grows in direct proportion to the number of game servers (see Figure 6.4b). Therefore, we came to the conclusion that the total number of clients is limited by the concurrent processing capability of the game server, and that it can be raised easily by adding more servers.
Figure 6.6: Comparison of Write and Read Performance of Different Cassandra Clusters
(Node Number from One to Five) Connected by Various Number of Concurrent Clients
(from 300 to 1500)
Figure 6.5b shows that the maximum number of clients reaches 1200 when there are two nodes in the Cassandra cluster. In the case of 1500 concurrent connections, the timeout issue appears again. Therefore, we conclude that a two-node Cassandra cluster can support about 1200 clients using our prototype.
Figures 6.5c, 6.5d and 6.5e show that, when the Cassandra cluster has three or more nodes, our prototype can support at least 1500 concurrent players.
In order to examine the different results in Figure 6.5, we plot the write and read response times in Figures 6.6a and 6.6b.
3. The five-node Cassandra cluster exhibits the best and most stable performance over the whole range of client numbers. With an increasing number of clients, there is no obvious variation in the read and write response times; both fluctuate around 15 ms.
4. Generally, the system performance is improved by scaling out the Cassandra cluster. For example, the five-node Cassandra cluster has the best performance, and the three-node and four-node clusters are observably better than the two-node cluster. However, there are still some exceptions. One example is that the performance of the three-node and four-node clusters is similar. Theoretically, the four-node cluster should be better; however, our experiment shows some contrary results, such as the read response time at 1500 clients and the write response time at 900 clients. This may be caused by network latency, system configuration, or some internal processing mechanism of Cassandra. Unfortunately, our prototype cannot reveal the reason.
5. The one-node Cassandra setup shows better performance in the case of 300 or 600 clients. The reason could be that the advantage of a multi-node Cassandra cluster is not pronounced when the number of concurrent players is relatively small. In addition, the communication between nodes also consumes some time, since data are distributed over different nodes.
Based on the analysis above, we can conclude that a NoSQL DBMS like Cassandra exhibits a satisfactory scalability for typical MMORPG requirements. With increasing numbers of clients, the database performance encounters a bottleneck; however, the database throughput as well as the response time can be improved easily by scaling out the cluster. Cassandra showed a high performance in the experiment: the response time of writes and reads typically fluctuates between 10 ms and 40 ms, which fulfills the requirement of an MMOG [CHHL06]. Cassandra is a write-intensive database, and the experimental results show that its write performance is stable and excellent. This feature makes it suitable as the backend database of a multi-player online game, which has to handle more write requests than read requests.
PlaneShift is presented in Appendix A. There are in total 92 tables in the database, which can be divided into eight groups based on the game scenario, namely character tables, guild tables, NPC movement tables, NPC dialog tables, crafting tables, spell tables, item tables and mini-game tables9 . These tables are designed to manage various data sets (see Section 2.1.3.2). For example, the accounts table stores players' account data; the gm_command_log table is used to back up players' log data; state data in PlaneShift are managed in the character tables, NPC movement tables, item tables, and so on.
In our experiment, we only focus on accessing the state data of character entities. For
this reason, only nine tables are involved (see Figure 6.8):
Characters table: has in total 61 attributes, such as id, name, account_id, loc_x (a location coordinate) and bank_money_circles. It is the core table among these tables; the other tables refer to its id attribute.
Item_instances table: records the information of an item instance, such as its owner (char_id_owner), guardian (char_id_guardian), creator (creator_mark_id), location coordinates, and so on. It has 32 attributes.
Other tables: character_events, character_traits, player_spells, character_quests, trainer_skills and character_skills are associative tables (bridge tables). These tables are used to resolve many-to-many relationships between the characters table and another table in the database. For example, the character_skills table maps the characters table and the skills table together by referencing their primary keys in order to represent all skills that a character has.
Figure 6.8: Character State Data Related Tables in the PlaneShift Database
[Figure: MySQL Cluster setup — one management node and four data nodes (B, C, D, E) organized in two node groups.]
Stored procedure: a set of SQL statements with an assigned name and parameters (if any) that is stored in the database in compiled form. Business logic can be embedded in the procedure: conditional logic applied to the results of one SQL statement can determine which subsequent SQL statements are executed. Furthermore, a stored procedure can be shared by a number of applications that call it.
We will use them for all three kinds of operations, and choose the best experimental result of each operation to compare with the results from testbed-Cassandra.
Moreover, for data checkpointing, a strategy called CopyUpdated is adopted: only the changed values are written to the database. This strategy can significantly reduce the number of operations per checkpoint. In order to realize it, we use an in-memory database, H210 , in the testbed to store the information of the last checkpoint, which is compared with the current one. The comparison results determine, for example, which row/column of a table needs to be updated, which row needs to be removed, and which data need to be inserted into a table.
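A minimal sketch of the CopyUpdated comparison, assuming that the last and the current checkpoint of a character are available as column-name/value maps (the H2 access itself and the handling of removed rows are omitted):

import java.util.HashMap;
import java.util.Map;

public class CopyUpdatedDiff {

    /** Returns only the columns whose values changed since the last checkpoint. */
    public static Map<String, String> changedColumns(Map<String, String> last,
                                                     Map<String, String> current) {
        Map<String, String> delta = new HashMap<String, String>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String old = last.get(e.getKey());
            if (old == null || !old.equals(e.getValue())) {
                delta.put(e.getKey(), e.getValue());   // new or modified column
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        Map<String, String> last = new HashMap<String, String>();
        last.put("loc_x", "10");
        last.put("bank_money_circles", "500");

        Map<String, String> current = new HashMap<String, String>();
        current.put("loc_x", "12");                    // changed -> will be updated
        current.put("bank_money_circles", "500");      // unchanged -> skipped

        System.out.println(changedColumns(last, current));   // prints {loc_x=12}
    }
}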
[Figure: structure of the characters column family in testbed-Cassandra — one row per character with the columns id, account_id, loc_sector_id, racegender_id, name, col1 ... col56, followed by the denormalized columns of character_events, ..., item_instances.]
want to use the basic operations for the later comparison. For data checkpointing, we have adopted another strategy called CopyAll: the current checkpoint completely substitutes the stale one in the column family. This strategy leads to a number of repeated writes if the change between two checkpoints is small. However, it is ideal for Cassandra because in this way checkpointing involves only write operations, without delete and query operations.
A comparison of the two testbeds is shown in Table 6.2.
We have carried out two groups of experiments under different experimental environ-
ments.
6.3.4 Experiments
We will evaluate each testbed separately, and then compare their results at last.
6.3.4.1 Experimental Results from Testbed-MySQL
The experimental results shown in Figure 6.13 make it easy for us to reach the following
conclusions:
1) The best performance for inserting state data is obtained using prepared statements. For update and read operations, in both experimental environments, the use of stored procedures helps more. We will therefore use the best result for each operation in this testbed to compare with that from testbed-Cassandra.
2) The experimental results have proven that both prepared statements and stored procedures help to enhance the system performance.
3) Stored procedures are more suitable for our application scenario. Complex database operations, such as updating, inserting, querying and deleting data from multiple tables, can be packaged together as a stored procedure and managed as a transaction in the database.
4) The read performance of MySQL Cluster is high. The reason is that MySQL Cluster partitions and distributes data by hashing on keys; it then uses a hash index to query data rather than scanning all rows.
5) The performance of updates is higher than that of inserts in our experiment, because with the CopyUpdated strategy less data are modified during an update.
6) The data volume in the database does not affect the results significantly. The reason could be that, in this experiment, managing the records of one million characters is not a challenge for a five-node MySQL Cluster.
1) In contrast to the results from testbed-MySQL, the write performance of Cassandra is higher than its read performance. The reason has been discussed in Section 3.4; moreover, in this experiment the read consistency level is ALL, which is higher than the write consistency level (ONE).
2) The data volume does not affect the experimental results. A five-node Cassandra cluster can also deal with the records of one million characters.
(a) Experimental Environment I (No Character) — average running time for 10000 operations (ms):

                      Insert     Update     Read
Basic                 2983157    2034964    257146
Prepared Statement    1948707    1710949    251526
Stored Procedure      2195305    1555513    190943

(b) Experimental Environment II (One Million Characters) — average running time for 10000 operations (ms):

                      Update     Read
Basic                 1905043    297146
Prepared Statement    1709935    291786
Stored Procedure      1653074    177632

Figure 6.13: Comparison of the Performance (Average Running Time for 10000 Operations) of Different Operations of Testbed-MySQL Using Three Methods in Two Experimental Environments
(a) Experimental Environment I (No Character) — average running time for 10000 operations (ms):

                       Insert      Update      Read
Average Running Time   108776.9    110803.4    113359.9

(b) Experimental Environment II (One Million Characters) — average running time for 10000 operations (ms):

                       Update      Read
Average Running Time   101119.2    122918

Figure 6.14: Performance (Average Running Time for 10000 Operations) of Different Operations of Testbed-Cassandra in Two Experimental Environments
We have integrated the best results from the two testbeds into two diagrams (see Figure 6.15 and Figure 6.16). The comparative results are obvious and convincing. In both experimental environments, the use of Cassandra significantly improves the efficiency of data processing, even though we have not used any optimization technique. For example, in Figure 6.15 the running time for inserting, updating and reading state data is reduced by 94.4%, 92.9% and 40.6%, respectively, through using Cassandra. Besides the different ways of data processing in an RDBMS and a NoSQL DBMS, there are the following specific reasons:
1) The read performance has not been enhanced as much as the write performance. There are two reasons: MySQL Cluster caches all data in memory, whereas Cassandra only caches the latest or most frequently accessed data, so disk I/O is inevitable; moreover, the read consistency level in Cassandra was specified as ALL, whereas MySQL Cluster fetched data from only one replica. In the next section, we will focus on improving the read performance of Cassandra.
2) The running time of checkpointing is reduced considerably by using Cassandra. The benefit is that the frequency of checkpointing could be increased, which reduces the loss caused by a game server failure; alternatively, we could checkpoint more characters' state data at the original frequency.
Overall, these experimental results show that the use of an RDBMS for data checkpointing and recovery in our game scenario is not efficient. The performance can easily be improved by applying Cassandra instead.
Figure 6.15: Comparison of the Performance (Average Running Time for 10000 Operations, in ms) of the Two Testbeds in the Experimental Environment I (No Character)

          Testbed-MySQL   Testbed-Cassandra
Insert    1948707         108776.9
Update    1555513         110803.4
Read      190943          113359.9
Figure 6.16: Comparison of the Performance (Average Running Time for 10000 Operations) of the Two Testbeds in the Experimental Environment II (One Million Characters); the chart compares the Update and Read operations.
Input: an avatar/game object's UUID and the operation type (read, write, or delete)
Output: the coordinator for this operation
begin
    //*** step I ***//
    if is a read operation then
        get the host address (IP) from H2 based on the UUID
        while not yet checked all replica nodes in Cassandra do
            if a replica node with that IP is found && is up then
                return (use this replica node as the coordinator)
            end
        end
    end
    //*** step II ***//
    // is not a read operation,
    // or did not find any alive replica node with that IP
    while not yet checked all replica nodes in Cassandra do
        if a replica node is up then
            return (use this replica node as the coordinator)
        end
    end
    //*** step III ***//
    // all replica nodes are down
    while not yet checked all other nodes in Cassandra do
        if a node is up then
            return (use this node as the coordinator)
        end
    end
end
Algorithm 3: Process of the NodeAwarePolicy Class
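To make the three-step fallback of Algorithm 3 concrete, the following is a minimal Java sketch of the selection logic; the class and method names (ReplicaNode, selectCoordinator) are hypothetical and do not reproduce the prototype's actual NodeAwarePolicy code, which plugs into the Cassandra Java Driver.

import java.util.List;

// Hypothetical node abstraction; in the prototype this role is played by the
// host objects of the Cassandra Java Driver.
class ReplicaNode {
    String ip;
    boolean up;
    ReplicaNode(String ip, boolean up) { this.ip = ip; this.up = up; }
}

class CoordinatorSelector {
    /**
     * Selects a coordinator following Algorithm 3:
     * step I   - for reads, prefer the replica whose IP is recorded in the TST (H2);
     * step II  - otherwise, any live replica node;
     * step III - if all replicas are down, any live node of the cluster.
     * Returns null if no node is reachable.
     */
    static ReplicaNode selectCoordinator(boolean isRead, String preferredIp,
                                         List<ReplicaNode> replicas,
                                         List<ReplicaNode> otherNodes) {
        if (isRead && preferredIp != null) {                  // step I
            for (ReplicaNode r : replicas) {
                if (r.up && r.ip.equals(preferredIp)) {
                    return r;
                }
            }
        }
        for (ReplicaNode r : replicas) {                      // step II
            if (r.up) {
                return r;
            }
        }
        for (ReplicaNode n : otherNodes) {                    // step III
            if (n.up) {
                return n;
            }
        }
        return null;                                          // cluster unreachable
    }
}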
30 million rows have been previously inserted into the Cassandra cluster and the TST in
order to simulate a realistic number of registered players in an MMORPG. In practice,
there could be multiple TSTs working in parallel; we have used only one in order to evaluate
its performance under a heavy workload. Each row in Cassandra contains a flexible number
of columns, from 110 to 160, which simulates the varying number of properties that an
object has. Cassandra is mainly used in this scenario to back up data, so writes significantly
outnumber reads. During the experiment, we have executed 10 processes in parallel to
access Cassandra, nine for writing and one for reading, which simulates a real game
environment. We use the average running time for executing 10000 operations under this
experimental environment as the criterion to evaluate the system performance.
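The following is a minimal sketch of such a 9:1 write/read workload, using threads rather than separate processes for brevity; the class name and the writeOnce()/readOnce() placeholders are hypothetical and merely stand for the prototype's actual checkpoint and read operations.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WorkloadDriver {
    static final int OPERATIONS_PER_CLIENT = 10_000;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 9; i++) {                  // nine parallel writers
            pool.submit(() -> {
                for (int op = 0; op < OPERATIONS_PER_CLIENT; op++) writeOnce();
            });
        }
        pool.submit(() -> {                            // one parallel reader
            for (int op = 0; op < OPERATIONS_PER_CLIENT; op++) readOnce();
        });
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    static void writeOnce() { /* checkpoint one object's state data */ }
    static void readOnce()  { /* read back one object's state data  */ }
}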
6.4.3 Experiments
We will evaluate the efficiency of our timestamp-based model (TSModel) and compare
it with some built-in methods in various environments (e.g., all nodes are up, or some
nodes are down). Furthermore, there are two factors (access to the TST and the data
size in the cluster) that affect the efficiency of the new model. We will also assess their
impact through experiments.
6.4.3.1 Effect of Accessing the Timestamp Table (TST) in H2
We have introduced a TST in the new game architecture, which inevitably adds to the
running time of data processing. However, the TST has a simple structure (only four
columns) and is held in memory. Compared to accessing data in a distributed, disk-resident
database with data replication, its overhead is negligible.
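As an illustration, an in-memory TST of this kind could be created in H2 as sketched below; the four column names are hypothetical, chosen only to reflect the lookups discussed in this chapter (object UUID, latest checkpoint timestamp and the replica node holding it), and do not necessarily match the prototype's actual schema.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TstSchema {
    public static void main(String[] args) throws Exception {
        // jdbc:h2:mem:tst keeps the table purely in main memory.
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:tst");
             Statement st = con.createStatement()) {
            st.execute("CREATE TABLE tst (" +
                       "  obj_uuid  UUID PRIMARY KEY," +   // avatar/game object id
                       "  obj_type  VARCHAR(32)," +        // e.g., avatar, item, NPC
                       "  latest_ts BIGINT," +             // latest checkpoint timestamp
                       "  host_ip   VARCHAR(45))");        // replica node of that checkpoint
            // Point lookup before a read (cf. Algorithm 3, step I):
            // SELECT latest_ts, host_ip FROM tst WHERE obj_uuid = ?
        }
    }
}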
Figure 6.18 presents the running time of writes and reads with different CLs. Accessing
the TST accounts for only about 9% of the total running time for writes and about 12%
for reads. Moreover, even if the running time of H2 is included, the total running time
of querying with a low CL is still shorter than that of querying Cassandra with a high CL.
For instance, the total time of write ONE is 129231 ms, which is still shorter than
performing write TWO in Cassandra (131929 ms).
6.4.3.2 Write/Read Performance Using the Timestamp-based Model
We have proposed to use the TSModel to guarantee game consistency, and also to integrate
the TSModel with the NodeAwarePolicy strategy (N_TSModel) to improve the system
performance. In this experiment, we will evaluate the write/read performance of TSModel,
N_TSModel and T_TSModel (TSModel integrated with the built-in TokenAwarePolicy
strategy of the Java Driver), and compare them with basic operations in Cassandra. The
write/read CL of the first three methods is set to ONE, and the running time of H2 is
included in the total running time. The write/read CL of the basic operations is set to
ONE, TWO, or ALL, and the total running time only includes the running time of
Cassandra. During the experiment, all five nodes are available. The results are shown in
Figure 6.19. We can draw the following conclusions from the figures:
Figure 6.18: Running Time of H2 and Cassandra (Average Running Time for 10000 Operations, in ms) for Writes and Reads with Different Consistency Levels

                            W. One    W. Two    W. All
Running Time of H2          13088     12648     11642
Running Time of Cassandra   116143    131929    164566

                            R. One    R. Two    R. All
Running Time of H2          59879     58951     57061
Running Time of Cassandra   379952    484359    541479
1) The query performance using the TSModel lies between write/read ONE and TWO. That
means that when all nodes in the cluster are available, the write/read performance required
to guarantee read-your-writes consistency is close to CL ONE, which is efficient.
2) The query performance of both N_TSModel and T_TSModel is close to or even better
than write/read ONE. In fact, by applying the TokenAwarePolicy and NodeAwarePolicy
strategies, the write/read performance of Cassandra has been improved. The reason is that
there is less communication among the nodes in the cluster, since the coordinator is one of
the replica nodes (see the configuration sketch after this list).
Figure 6.19: Performance Comparison of the TSModel and its Derivatives (N_TSModel and T_TSModel) with Basic Operations (Write/Read with Different Consistency Levels) in Cassandra

(a) Comparison of Write Performance (Average Running Time for 10000 Operations, in ms):
                    N_TSModel   T_TSModel   TSModel   W. One   W. Two   W. All
Running time (ms)   102262      112965      131543    116143   131929   164566

(b) Comparison of Read Performance (Average Running Time for 10000 Operations, in ms):
                    N_TSModel   T_TSModel   TSModel   R. One   R. Two   R. All
Running time (ms)   384067      387952      429913    379952   484359   541479
3) The performance of N_TSModel is better than that of T_TSModel, especially for writes.
That is because the most efficient replica node is chosen as the coordinator, which shortens
the running time, and the workload of the nodes in the cluster is more evenly balanced.
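For illustration, the following is a minimal sketch of how such routing policies are registered with the DataStax Java Driver; TokenAwarePolicy and RoundRobinPolicy are the driver's built-in classes, while the commented-out NodeAwarePolicy line only indicates where our custom policy would be plugged in.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class DriverPolicyConfig {
    public static void main(String[] args) {
        // T_TSModel: the driver's built-in TokenAwarePolicy routes each request
        // to one of the replica nodes of the partition being accessed.
        Cluster tokenAware = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
                .build();

        // N_TSModel: a custom policy (our NodeAwarePolicy, cf. Algorithm 3) would be
        // registered the same way, e.g.:
        // Cluster nodeAware = Cluster.builder()
        //         .addContactPoint("127.0.0.1")
        //         .withLoadBalancingPolicy(new NodeAwarePolicy(...))
        //         .build();

        tokenAware.close();
    }
}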
Figure 6.20: Comparison of Read Performance (Average Running Time for 10000 Operations) and the Number of Invalid Results per Method

                    N_TSModel   T_TSModel   TSModel   R. One   R. Two   R. All
Running time (ms)   642208      663153      754319    515129   767003   1207829
1) Only read ALL can ensure that up-to-date data are fetched with a single retrieval.
However, its performance is the worst.
2) Although the running time of read ONE and read TWO is comparatively short, neither
of them can guarantee data currency.
4) In theory, by using the NodeAwarePolicy, no invalid data (null values) should be returned,
since the coordinator already holds the up-to-date data. In practice, however, invalid data
are still returned. By tracing the query, we found that the coordinator does not always
execute the request locally. Sometimes it forwards the request to another replica node,
which might just have recovered from a node failure and consequently does not hold the
up-to-date data. The reason could be that the workload of the coordinator is too heavy,
so that the other replica node can process the request faster. Nevertheless, the invalid data
are halved (from 24995 to 1203), and consequently the read performance is much closer
to read ONE.
Figure 6.21: Write and Read Performance (Average Running Time for 10000 Operations, in ms) of Basic Operations with Two Data Sizes (130 GB and 280 GB)

                                   W. One   W. Two   W. All
Running Time (Data size: 130 GB)   105771   117650   147317
Running Time (Data size: 280 GB)   116143   131929   164566

                                   R. One   R. Two   R. All
Running Time (Data size: 130 GB)   335105   419211   463480
Running Time (Data size: 280 GB)   379952   484359   541479
Figure 6.21 compares the performance of basic operations with two different data
sizes (130 GB and 280 GB). Both the write and the read performance are affected: time
is wasted on retrieving a large number of files (SSTables) on disk. Therefore, we conclude
that it is imperative to clean up stale data in a timely manner.
The strategies for deleting stale data can be classified into two groups, namely eager
deletion and lazy deletion. Eager deletion means deleting the stale data immediately
after flushing a new checkpoint; lazy deletion means that stale data are deleted together
asynchronously during a garbage collection at a specific time or under a certain
condition (e.g., when the cluster is idle). Cassandra does not yet support a range query
on the second column of the compound primary key (TS in our experiment). That means
the stale data can only be deleted row by row, so the same amount of time is spent
regardless of which strategy is chosen. Lazy deletion avoids adding extra workload during
peak hours. However, we then have to record the timestamp of each checkpoint on the
server side, or obtain it by executing an expensive read ALL in the cluster, and use it to
detect stale data. Overall, we need to choose the strategy based on the actual scenario.
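A minimal sketch of lazy, row-by-row deletion under these constraints is given below; the table and column names (character_state, char_id, ts) are hypothetical, and the snippet simply removes every checkpoint of one object except the latest.

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class LazyStaleDataCleanup {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("game")) {

            PreparedStatement listTs = session.prepare(
                    "SELECT ts FROM character_state WHERE char_id = ?");
            PreparedStatement deleteOne = session.prepare(
                    "DELETE FROM character_state WHERE char_id = ? AND ts = ?");

            long charId = 42L;                 // hypothetical object
            long latestTs = 0L;
            // First pass: determine the latest checkpoint timestamp of this object.
            for (Row row : session.execute(listTs.bind(charId))) {
                latestTs = Math.max(latestTs, row.getLong("ts"));
            }
            // Second pass: delete every stale checkpoint, one row at a time,
            // because a range deletion on the clustering column is not available.
            for (Row row : session.execute(listTs.bind(charId))) {
                long ts = row.getLong("ts");
                if (ts < latestTs) {
                    BoundStatement del = deleteOne.bind(charId, ts);
                    session.execute(del);
                }
            }
        }
    }
}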
Finally, we compare the update and read performance of testbed-N_TSModel with that of
the other two testbeds when guaranteeing read-your-writes consistency. The experimental
setup is the same as in experimental environment II (one million characters) (see
Section 6.3.3). There are two replicas for each row in all three testbeds. In testbed-Cassandra,
the write consistency level is ONE and the read consistency level is ALL (TWO). In
testbed-N_TSModel, if there is no node failure, both the write and the read consistency
level are ONE. The experimental results are shown in Figure 6.22.
We can conclude that by using the N_TSModel, both the read and the update performance
have been improved. The average time for processing 10000 updates and reads is reduced
by 94.9% and 57.5%, respectively, compared with testbed-MySQL, which is satisfactory.
6.5 Summary
In this chapter, we have implemented a number of testbeds for different experimental
purposes. The experimental results have shown that using a NoSQL DBMS (e.g., Cassandra)
for checkpointing and recovering state data is reasonable: it brings potential scalability
and high performance for persisting the data of MMORPGs.
7. Conclusion
In a distributed system (such as a Cassandra cluster), the implied trade-off between
consistency and run-time performance and/or availability cannot be solved easily, as we
have demonstrated. For this purpose, we introduced several concepts to manage consistency
in a multi-layered architecture. A key ingredient is a simple timestamp-based approach,
which checks for inconsistencies on the fly and only mends them if they occur. As the
check itself is quite lightweight and, as also shown, situations triggering an inconsistency
are very unlikely in our scenario, the new approach provides excellent run-time performance
compared to the strongly consistent operations provided by Cassandra itself. The average
time for processing 10000 updates and reads is reduced by 94.9% and 57.5%, respectively,
compared with the testbed using MySQL Cluster.
Overall, by using Cloud technology, we can solve the data persistence issues that
MMORPGs are facing. The new Cloud-based architecture helps to improve the scalability,
availability and performance of the game system significantly.
8. Future Work
We have carried out many experiments on system scalability, performance and game
consistency. However, limited by infrastructure and time, there are more experiments to
be done and some open issues left to address. This research could continue along the
following lines:
1) All experiments in this work were carried out in an intranet. In practice, the data of an
MMORPG are distributed and replicated across multiple data centers to avoid fetching
data from remote geographic locations. However, data synchronization across multiple
data centers increases the running time of writes. The write performance of RDBMSs
supporting strong consistency is even worse in this situation. By using a NoSQL DBMS
like Cassandra, we can restrict the write/read consistency to the LOCAL level (e.g.,
LOCAL_ONE or LOCAL_QUORUM), which could improve the write performance (see
the sketch after this list). That means that, in theory, the performance difference between
the two kinds of DBMSs should be even more pronounced in a multi-data center
environment. In the future, we would like to evaluate the performance of our prototype
in such an environment.
2) We proposed a node-aware policy to specify the coordinator for each query, which at
the moment only works with the SimpleStrategy, which places the additional replicas on
the nodes following the first replica node clockwise in the ring. In a multi-data center
environment, another replica placement strategy, NetworkTopologyStrategy, is recommended,
which places replicas across multiple data centers and on distinct racks (cf. the sketch
after this list). In this case, the method our prototype uses to obtain the information
about all replica nodes does not work. We would modify our program to adapt to this
environment in the future.
3) With the help of the node-aware policy, we can specify the replica node holding the
current checkpoint as the coordinator. We have also checked the source code of
Cassandra 2.1¹, which shows that if a coordinator holds the data locally, it returns them
directly without forwarding the request to other replica nodes. However, in practice the
coordinator sometimes still gets data from other replica nodes, which could be stale. This
is currently an open issue for us. In the future, we would apply the node-aware policy to
the latest release of Cassandra (currently version 3.6) to check whether it works. If not,
we would examine the code of Cassandra and trace a query request to find out the reason.
4) Limited by the infrastructure, we could use at most five nodes in a Cassandra cluster,
so we can only conclude that our Cloud-based prototype has potential scalability. The
number of nodes in a practical game database is far larger. Hence, we would increase the
number of nodes in the future and redo the experiments. Furthermore, we would also like
to evaluate the scalability of an RDBMS in this experimental environment. We hope that
the experimental results will show that using a NoSQL DBMS is more suitable for the
game scenario.
¹ https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/AbstractReadExecutor.java#L78 (accessed 26.11.2014)
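As an illustration of points 1) and 2) above, the following sketch shows a keyspace replicated with NetworkTopologyStrategy across two hypothetical data centers (dc1, dc2) and a write that waits only for replicas in the local data center; the keyspace, table and data center names are assumptions, not settings taken from our prototype.

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class MultiDataCenterSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            // Replicas are placed per data center instead of clockwise in the ring.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS game WITH replication = " +
                "{'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2}");
            session.execute(
                "CREATE TABLE IF NOT EXISTS game.character_state (" +
                "  char_id bigint, ts bigint, state text, PRIMARY KEY (char_id, ts))");

            PreparedStatement insert = session.prepare(
                "INSERT INTO game.character_state (char_id, ts, state) VALUES (?, ?, ?)");

            // LOCAL_QUORUM (or LOCAL_ONE) avoids waiting for the remote data center.
            BoundStatement write = insert.bind(42L, System.currentTimeMillis(), "state-blob");
            write.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            session.execute(write);
        }
    }
}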
A. Appendix

[Appendix figures: the database schema of PlaneShift¹, grouped into Account Data, Log Data, State Data, and Game Data]

¹ Database Schema of PlaneShift: https://github.com/baoboa/planeshift/blob/master/src/server/database/planeshift db rev1256.png (accessed 20.12.2015).