OpenStack Trove Essentials - Sample Chapter
Build your own cloud based Database as a Service using OpenStack Trove
Alok Shrivastwa
Sunil Sarat
of cloud services for Microland Ltd. in their Center of Innovation. He has a keen
interest in all things physical and metaphysical and is an innovator at heart. He
has worked with multiple large- and medium-sized enterprises, designing and
implementing their network security solutions, automation solutions, databases,
VoIP environments, datacenter designs, public and private clouds, and integrations.
He has also created several tools and intellectual properties in the field of
operationalization of emerging technologies. He has authored a book, Learning
OpenStack, with Packt Publishing, and has authored several whitepapers and blogs
on technology and metaphysical topics, in addition to writing poems in Hindi. Also,
he has been a guest speaker for undergraduate engineering students in Chennai.
You can connect with him at https://in.linkedin.com/in/alokas or follow
him on Twitter at @alok_as.
Sunil Sarat is the vice president of cloud and mobility services at Microland Ltd.,
an India-based global hybrid IT infrastructure services provider.
He played a key role in setting up and running the emerging technologies practice,
dealing with areas such as public/private cloud (AWS and Azure, VMware vCloud
Suite, Microsoft, and OpenStack), hybrid IT (VMware vRealize automation/
orchestration, Chef, and Puppet), enterprise mobility (Citrix Xenmobile and VMware
Airwatch), VDI /app virtualization (VMware Horizon Suite, Citrix XenDesktop/
XenApp, Microsoft RDS, and AppV), and associated transformation services.
He is a technologist and a business leader with expertise in creating new practices
and service portfolios, building and managing high-performance teams, strategy
definition, technological roadmaps, and 24/7 global remote infrastructure
operations. He has varied experience in handling diverse functions such as
innovation/technology, service delivery, transition, presales/solutions,
and automation.
He has authored whitepapers, blogs, and articles on various technologies and
service-related areas, is a speaker at cloud-related events, and reviews technical
books. He has authored Learning OpenStack and reviewed Learning AirWatch and
Mastering VMware Horizon 6, all by Packt Publishing.
He holds various industry certifications in the areas of compute, storage, and
security and holds an MBA in marketing.
Besides technology and business, he is passionate about filmmaking and is a
part-time filmmaker as well.
For more information, you can visit his LinkedIn profile at
https://www.linkedin.com/in/sunilsarat or follow him on Twitter at @sunilsarat.
Preface
Database management has come a long way over the last decade or so. The process
of provisioning databases used to start with racking and stacking a physical
server, installing and configuring an operating system, and finally, installing and
configuring a database management system. This entire process took weeks and,
in some cases, months. Once the database was provisioned, there was of course
a whole host of things to be managed, including availability, backups, security,
and performance. This provisioning and management consumed a lot of time and
resources. Over time, two trends have had a significant
impact on the way databases are provisioned and managed. First, automation eased
the management aspect and virtualization eased the provisioning, at least up to
the operating-system layer. Second, enterprises are moving away from
a single database technology model to a model
termed "polyglot persistence". This basically means adopting
multiple database technologies with the intention of storing the data in a database
that is best suited for that type of data. With multiple types of database technologies
coming into play, enterprises are finding it difficult to manage this complexity while
maintaining corporate standards and compliance.
Fortunately, over the last couple of years, cloud is another trend that has come to
our rescue. With the advent of cloud, we first saw self-service-based agile
provisioning of infrastructure take off. Termed Infrastructure as
a Service, it automated many aspects of infrastructure management and made it
easier. Building on this, we now have self-service-based agile provisioning
of multiple types of databases, popularly known as Database as a Service
(DBaaS). This has made things much easier for enterprises in terms of bringing in
efficiencies and enforcing corporate standards and compliance. Enterprises can
obtain DBaaS from a public cloud such as Amazon Web Services or Microsoft Azure.
Alternatively, they can build their own private cloud-based DBaaS, for
reasons such as data privacy and security. This
is where OpenStack and Trove come into the picture. OpenStack Trove is an open
source implementation of DBaaS. While it has been in existence for a couple of
years, it has started gaining momentum only recently, with enterprises giving
it serious thought.
The benefits of DBaaS in general and OpenStack Trove in particular are obvious.
The key challenge, however, is that beyond the documentation that is available from
the OpenStack project itself, there is not much reading material out there to help
potential DBAs and system/cloud administrators. This lack of skill and know-how
is one of the potential inhibitors to OpenStack Trove adoption.
This book is an attempt to provide all the essential information that is necessary
to kick-start your learning of OpenStack Trove and set up your own cloud-based
DBaaS. This book introduces all the major components of
OpenStack Trove. It then shows how to set
up Trove in both development and production environments, configure it, and
perform management activities such as backup and restore. It also
deals with certain advanced database features, such as replication and
clustering. This book takes a practical approach to learning: each chapter
contributes to the reader's ability to build their own private
cloud-based DBaaS by the time they finish reading this book. We hope you
will enjoy reading this book and, more importantly, find it useful in your journey
towards learning and implementing DBaaS using OpenStack Trove.
Chapter 2, Setting up Trove with DevStack in a Box, provides a list of prerequisites for
the book. This chapter also helps you understand DevStack and its components and
then helps you set up Trove with DevStack.
Chapter 3, Installing Trove in an Existing OpenStack Environment, gives you an
overview of the different available methods to deploy Trove. It covers in more
detail installing Trove from source and from the Ubuntu repository.
Chapter 4, Preparing the Guest Images, as the name implies, details how to build
production-ready images that will be required by Trove.
Chapter 5, Provisioning Database Instances, looks at creating and launching instances
using both CLI and GUI.
Chapter 6, Configuring the Trove Instances, introduces you to configuring Trove
instances and also how to make configuration changes to multiple Trove instances
using configuration groups.
Chapter 7, Database Backup and Restore, introduces the concept of Strategies and
provides an overview of how to back up and restore Trove instances.
Chapter 8, Advanced Database Features, deals with advanced features such as
replication and clustering in Trove.
Database Backup and Restore
Data is critical in every enterprise, so it needs to be protected. This protection is
done at various levels, for example, by creating a cluster/replica to ensure that
more than one hot/warm copy of the data exists.
In order to have a cold copy of the data for disaster recovery, a database backup
is normally taken. Backing up and restoring databases is possibly among the most
important operational tasks of a DBA. Trove helps automate the entire process:
backing up the database, encrypting the data at rest, and restoring the backup. Trove
also supports incremental backups of your databases and supports creating a new
instance from an existing backup.
In this chapter, we will cover the following topics:
Configuration aspects
Based on the preceding parameters, we will need to derive the following key points:
This plan is applicable to any form of backup and this will help us with scripting and
automating the backup tasks.
Multiple copies of the database independent of each other for purposes like
auditing, running reports based on old data, and so on
The Trove system internally uses backup and restore strategies to seed the
replication data (discussed in the next chapter). Let's now dive in and see how
the backup/restore methodology works in Trove.
Chapter 7
This is a fully pluggable architecture, which means that different
technologies and different code can be used to perform the same functions across
different database engines.
The concept of strategies is used for backups, restores, replication, clustering, and
storage (the storage strategy determines where the backups are stored, along with the
associated properties). Strategies are implemented in the guest agent code (they can
also be implemented for the API and task manager components), which makes the code
run closest to the place where the action has to happen.
So, effectively, each strategy needs to implement a list of functions at a minimum
(these can be seen in the base.py file for that particular strategy), which the system
can then use to call and perform the functions.
For example, each backup strategy needs to provide a command that needs to be
executed in order to take the backup, and each storage strategy needs to implement
a save function, which will allow us to save to that particular storage system.
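The contract described above can be sketched in a few lines of Python. This is only an illustration of the idea and not Trove's actual base.py: the class names BackupStrategy, StorageStrategy, DumpAllDatabases, and InMemoryStorage are hypothetical, as is the backup_user account.

```python
import abc


class BackupStrategy(abc.ABC):
    """Hypothetical base class: a backup strategy must supply a command."""

    @property
    @abc.abstractmethod
    def cmd(self):
        """Shell command the guest agent runs to produce the backup."""


class StorageStrategy(abc.ABC):
    """Hypothetical base class: a storage strategy must implement save()."""

    @abc.abstractmethod
    def save(self, filename, data):
        """Persist the backup and return its location."""


class DumpAllDatabases(BackupStrategy):
    """Illustrative full-backup strategy built around mysqldump."""

    def __init__(self, user):
        self.user = user

    @property
    def cmd(self):
        return f"mysqldump --all-databases --user {self.user} --password"


class InMemoryStorage(StorageStrategy):
    """Stand-in for a real object store; keeps backups in a dict."""

    def __init__(self):
        self.objects = {}

    def save(self, filename, data):
        self.objects[filename] = data
        return f"memory://{filename}"


backup = DumpAllDatabases(user="backup_user")
storage = InMemoryStorage()
location = storage.save("myfirstbackup.gz", b"-- dump bytes --")
print(backup.cmd)
print(location)
```

Because the caller only relies on cmd and save, a different database engine or storage system can be supported by adding another subclass, without touching the calling code.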
The following diagram shows the concept of strategies. It also shows that the control
components use an abstracted term and send the message using the message bus, say
create_backup, and the guest agent looks at the default or configured strategy for
that particular database engine and executes those commands.
[Figure: two guest instances, one running Mongo DB and one running MySQL DB; the guest agent in each receives the create_backup message.]
The concept is valid for everything that supports the strategies. Please note that
not all the control components are shown in this case and the diagram is for
representation purposes only.
3. The Guest Agent pulls the message and checks the backup and storage
strategy (configured/default) for the particular data store version.
4. The backup commands are executed by the guest agent, which gets the command
from the strategy definition. For example, if the MySQLDump strategy is
used, then the command executed is mysqldump --all-databases --user
<username> --password, along with the commands to zip and encrypt
the backup (these are all defined in the strategy files, as shown in the
next section).
5. The Guest Agent stores the backup as specified in the storage strategy.
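Steps 3 and 4 can be pictured roughly as follows. This is a sketch under stated assumptions rather than the guest agent's real code: the function build_backup_pipeline, the two lookup tables, the innobackupex command line, and the use of gzip and openssl for the zip and encrypt stages are all illustrative choices.

```python
# Assumed default strategy per data store (illustrative only).
DEFAULT_BACKUP_STRATEGY = {"mysql": "InnoBackupEx"}

# Hypothetical command templates, one per strategy name.
BACKUP_COMMANDS = {
    "MySQLDump": "mysqldump --all-databases --user {user} --password",
    "InnoBackupEx": "innobackupex --stream=xbstream /var/lib/mysql",
}


def build_backup_pipeline(datastore, user, configured=None, encrypt_key=None):
    """Pick the configured or default strategy and compose the shell pipeline."""
    strategy = configured or DEFAULT_BACKUP_STRATEGY[datastore]
    # The backup command comes from the strategy; compression always follows.
    stages = [BACKUP_COMMANDS[strategy].format(user=user), "gzip"]
    if encrypt_key:
        # Encryption is appended only when a key is supplied.
        stages.append(f"openssl enc -aes-256-cbc -salt -pass pass:{encrypt_key}")
    return strategy, " | ".join(stages)


strategy, pipeline = build_backup_pipeline(
    "mysql", "backup_user", configured="MySQLDump", encrypt_key="s3cret"
)
print(strategy)
print(pipeline)
```

The resulting stream would then be handed to the storage strategy, which corresponds to step 5.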
backup_namespace: The file to load the code for the strategies from
These configuration options are set in the trove-guestagent.conf file, which will
inject them to the guest during build time.
We don't have to configure anything additional in the guest agent configuration; this
section is purely informational.
In order to understand the different strategies available to us and the corresponding
namespaces, let us take a look at the following table, which shows the different
backup strategies that are available in Trove at the time of writing the book:
Data store name / Backup type   Strategy name             Strategy namespace
MySQL / Full                    MySQLDump                 trove.guestagent.strategies.backup.mysql_impl
MySQL / Full                    InnoBackupEx              trove.guestagent.strategies.backup.mysql_impl
MySQL / Incremental             InnoBackupExIncremental   trove.guestagent.strategies.backup.mysql_impl
Couchbase / Full                CbBackup                  trove.guestagent.strategies.backup.experimental.couchbase_impl
Mongo DB / Full                 MongoDump                 trove.guestagent.strategies.backup.experimental.mongo_impl
PostgreSQL / Full               PgDump                    trove.guestagent.strategies.backup.experimental.postgresql_impl
Redis / Full                    RedisBackup               trove.guestagent.strategies.backup.experimental.redis_impl
As we can see, at this point in time, only MySQL (and its variants, like MariaDB) can
perform incremental backups, and it offers two strategies for full backup
(if we choose not to use InnoDB, we could just use MySQLDump). Also, not all the
different data stores support full backup at this moment.
This means that we can also implement a simple backup strategy of our choice, if we
so choose, by writing a different Python class. However, in most cases, we don't have
to as the ones provided by default with Trove are sufficient.
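One way to picture how a strategy namespace and a strategy name combine is dynamic import: the namespace names a Python module and the strategy name names a class inside it. The helper load_strategy below is a hypothetical sketch of that mechanism, not Trove's own loader. Since Trove is not assumed to be installed, a standard-library module stands in for a namespace such as trove.guestagent.strategies.backup.mysql_impl.

```python
import importlib


def load_strategy(namespace, name):
    """Import the module named by `namespace` and return its attribute `name`."""
    module = importlib.import_module(namespace)
    return getattr(module, name)


# Stand-in lookup: "collections" plays the namespace, "OrderedDict" the class.
strategy_cls = load_strategy("collections", "OrderedDict")
print(strategy_cls.__name__)
```

With this kind of lookup, a custom strategy only needs to live in an importable module whose path is given as the namespace.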
There are plans to add support for other storage strategies, such as AWS S3.
But since Swift is the only storage strategy available to us at the moment, let us
take a moment to look at its sub-configuration parameters: the bucket where the
backups need to be stored, whether the backup needs to be encrypted, which key
needs to be used if so, and so on. All of these are configured using the following
configuration variables:
(default is true)
Most times, the default would work fine. But these options can be configured should
we need to tweak their values.
Backup prerequisites
The prerequisites for backup are fairly simple:
The Swift system is configured and accessible. (In DevStack, please check if
the Swift services are enabled.)
In our case, we don't have to worry about the last point as we will leave the entire
configuration to the default values. Also, we have the second and third point taken
care of. Since we are using MySQL as the database, the first point also has been
satisfied. (Please note that while creating the DIB image, we installed the InnoDB
tool by using the ubuntu-mysql element provided. So, we are good to go.)
Full backup
The Trove command line with the backup-create option helps us create a full
backup of the database.
Please note that backup/restore may turn off (or pause) the database service for a
brief moment to ensure that data is not corrupted. So, caution should be exercised
while taking backups or performing restores of production databases.
The command format is trove backup-create <instance-id> <backup-name>.
Please remember that we can check the instance ID using the trove list command.
The details of the backup are shown on the screen. The backup command also backs
up the metadata, which is especially useful while restoring or creating another
database from the current backup.
Incremental backup
Incremental backup, as we know, is only supported for MySQL at this time, and we
can perform an incremental backup of the database that we just backed up.
For the incremental backup to work, we will obviously need a parent or a full
backup to anchor the incremental backup onto, so we will need the backup-id
of a full backup to be used as a parent.
We will use the backup we just took, whose ID was e3737982-2220-42a3-8e63-52d61f73f523.
The command format to take the incremental backup is similar to that of the full
backup. The only difference is that a --parent parameter is passed, so it will
be trove backup-create <instance-id> <backup-name> --parent <parent-backup-id>.
So, we will execute the following command:
trove backup-create 723c048e-bd5b-4e1a-84cd-836be970d7db incremental-bkp \
--parent e3737982-2220-42a3-8e63-52d61f73f523
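When backups are scripted, the two command formats can be generated from one helper. The following is a minimal sketch: the function name backup_create_args is hypothetical, and only the trove backup-create arguments shown in this chapter are used.

```python
import shlex


def backup_create_args(instance_id, backup_name, parent_id=None):
    """Build the argument list for a full or incremental trove backup-create."""
    args = ["trove", "backup-create", instance_id, backup_name]
    if parent_id is not None:
        # Passing a parent turns the request into an incremental backup.
        args += ["--parent", parent_id]
    return args


instance = "723c048e-bd5b-4e1a-84cd-836be970d7db"
full = backup_create_args(instance, "myfirstbackup")
incremental = backup_create_args(
    instance, "incremental-bkp",
    parent_id="e3737982-2220-42a3-8e63-52d61f73f523",
)
print(shlex.join(full))
print(shlex.join(incremental))
```

The returned list can be passed to subprocess.run(args, check=True) on a host where the trove client is installed and the OpenStack credentials are exported.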
The output shows that incremental-bkp has myfirstbackup as its parent. The backup
is also stored in the Swift storage, so let us take a look at the Swift containers,
by using the command swift list.
As we can see, only the database_backups container is created. Please note that this
is a default name for the Swift container and can be overridden by the configuration
variables as shown in the previous sections.
If you get a warning that states UserWarning: Providing attr without
filter_value to get_urls() is deprecated as of the 1.7.0 release while executing the
Swift command, you will need to set an additional environment variable called
OS_REGION_NAME.
Please set this to the default region name of your system. You can view this
by executing the command keystone endpoint-list and then looking
under the region. For us, it was RegionOne, so we export the variable with
the following command:
export OS_REGION_NAME=RegionOne
We will then look into the container itself, by typing the command:
swift list database_backups
Restoring backups
In Trove, the restoration of the database is not done directly, but by creating a
new instance and loading the data onto it. We can restore from a full backup or an
incremental backup. If we choose to restore from an incremental backup, the entire
chain (up to the last parent full backup) is restored onto the system.
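The chain restore can be illustrated by walking the parent links from the chosen backup back to its full backup and then applying the backups oldest first. This is a sketch of the idea only: the restore_chain function, the dictionary layout, and the backup IDs are hypothetical and do not reflect how Trove stores backup metadata.

```python
def restore_chain(backups, backup_id):
    """Return backup IDs in the order they are applied, full backup first."""
    chain = []
    current = backup_id
    while current is not None:
        chain.append(current)
        # Follow the parent link; a full backup has no parent.
        current = backups[current]["parent_id"]
    chain.reverse()
    return chain


# Hypothetical metadata: one full backup followed by two incrementals.
backups = {
    "full-1": {"parent_id": None},
    "incr-1": {"parent_id": "full-1"},
    "incr-2": {"parent_id": "incr-1"},
}
print(restore_chain(backups, "incr-2"))
```

Restoring from a full backup is simply the special case where the chain has a single element.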
So, in order to create a new instance from a backup, we simply use the
trove create command, passing the --backup parameter: trove create <name>
<flavor-id> --size <volume size> --backup <backup-id>.
In our case, we will use the incremental backup to ensure that the full chain is restored:
trove create copyoftest 2 --size 1 \
--backup e8ba6800-7ff0-40c8-9dd3-e396b84dd4f1 \
--datastore mysql --datastore_version 5.6
The new instance starts building and, once it reaches the active state, we can verify
that the same databases are present.
Deleting backups
A backup can be deleted with the trove backup-delete command by passing
the backup-id as the argument. It is to be noted that we should only
delete a full backup once the incremental backups dependent on it are
no longer needed, because if the parent is deleted, Trove automatically deletes
the dependent backups as well.
So, here we delete the parent backup:
trove backup-delete e3737982-2220-42a3-8e63-52d61f73f523
When we execute a subsequent backup-list, we see that both the backups were
deleted, as the incremental backup was dependent on the parent.
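The cascading behavior seen here can be modeled as a recursive walk over the parent links. The function delete_with_dependents and the mapping of backup name to parent name are hypothetical; this is a sketch of the observed behavior, not Trove's implementation.

```python
def delete_with_dependents(backups, backup_id):
    """Remove backup_id and every backup descending from it; return removed IDs."""
    removed = [backup_id]
    # Collect direct children before anything is removed from the mapping.
    children = [b for b, parent in backups.items() if parent == backup_id]
    for child in children:
        removed.extend(delete_with_dependents(backups, child))
    backups.pop(backup_id, None)
    return removed


# Maps each backup to its parent; the full backup has no parent.
backups = {"myfirstbackup": None, "incremental-bkp": "myfirstbackup"}
removed = delete_with_dependents(backups, "myfirstbackup")
print(removed)
print(backups)
```

Deleting only the incremental backup would leave the full backup in place, since the walk goes from parent to children and never upward.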
Summary
In this chapter, we dealt with backups and restores of the database. We have learned
about the ways backup is implemented in Trove. In the next and final chapter,
we will look at more advanced features such as replication and clustering.