Cloud Computing New Unit 2
Web application design, Machine image design, Privacy design, Data management.
---------------------------------------------------------------------------------------------------------------------
We have some kind of (most often scripting) language that generates content from a combination
of templates and data pulled from a model backed by a database. The system updates the model
through actions that execute transactions against the model.
A common Java approach that works well in a single-server environment but fails in a
multiserver context might use the following code:
public void book(Customer customer, Room room, Date[] days)
        throws BookingException {
    synchronized( room ) { // synchronized "locks" the room object
        if( !room.isAvailable(days) ) {
            throw new BookingException("Room unavailable.");
        }
        room.book(customer, days);
    }
}
Because the code uses the Java locking keyword synchronized, no other threads in the current
process can make changes to the room object. If you are on a single server, this code will work
under any load supported by the server. Unfortunately, it will fail miserably in a multiserver
context.
If you had two clients making two separate booking requests against the same server, Java would
allow only one of them to execute the synchronized block at a time. As a result, you would not
end up with a double booking.
On the other hand, if you had each customer making a request against different servers (or
even distinct processes on the same server), the synchronized blocks on each server could
execute concurrently. As a result, the first customer to reach the room.book() call would lose
his reservation because it would be overwritten by the second. Figure 4-2 illustrates the
double-booking problem.
A deadlock occurs between two transactions when each transaction is waiting on the other to
release a lock. Our reservations system example is an application in which a deadlock is certainly
possible. Because we are booking a range of dates in the same transaction, poorly structured
application logic could cause two clients to wait on each other as one attempts to book a date
already booked by the other, and vice versa.
Regardless of what approach we use, however, the key point for the cloud is simply making
sure that we are not relying on memory locking to maintain application state integrity.
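One common alternative is to let the database, rather than the JVM, enforce the lock. The sketch below is a minimal illustration of that idea using JDBC row-level locking: it reuses the BookingException from the example above, but the room and room_booking tables, the configured DataSource, and the single-day booking (instead of a date range) are assumptions for illustration rather than the actual book() method.

import javax.sql.DataSource;
import java.sql.*;

public class DatabaseLockedBooking {
    private final DataSource ds;

    public DatabaseLockedBooking(DataSource ds) { this.ds = ds; }

    public void book(long customerId, long roomId, Date day)
            throws SQLException, BookingException {
        try (Connection conn = ds.getConnection()) {
            conn.setAutoCommit(false);
            // Lock the room row; every application server contends for this same
            // database lock, unlike a JVM-local synchronized block.
            try (PreparedStatement lock = conn.prepareStatement(
                    "SELECT room_id FROM room WHERE room_id = ? FOR UPDATE")) {
                lock.setLong(1, roomId);
                lock.executeQuery();
            }
            // With the room locked, check availability before inserting the booking.
            try (PreparedStatement check = conn.prepareStatement(
                    "SELECT COUNT(*) FROM room_booking WHERE room_id = ? AND booked_on = ?")) {
                check.setLong(1, roomId);
                check.setDate(2, day);
                try (ResultSet rs = check.executeQuery()) {
                    rs.next();
                    if (rs.getInt(1) > 0) {
                        conn.rollback();
                        throw new BookingException("Room unavailable.");
                    }
                }
            }
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO room_booking (room_id, customer_id, booked_on) VALUES (?, ?, ?)")) {
                insert.setLong(1, roomId);
                insert.setLong(2, customerId);
                insert.setDate(3, day);
                insert.executeUpdate();
            }
            conn.commit();
        }
    }
}

Because the lock lives in the database, two servers booking the same room serialize correctly; a unique constraint on (room_id, booked_on) would provide a further safety net.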
When Servers Fail
The ultimate architectural objective for the cloud is to set up a running environment where
the failure of any given application server ultimately doesn't matter. One way to get around the
problems described in the previous section is data segmentation, also known as sharding. Figure 4-3
shows how you might use data segmentation to split processing across multiple application
servers. In other words, each application server manages a subset of the data. As a result, there is
never any risk that another server will overwrite that data.
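As a rough sketch of the idea, the router below assigns every customer to a fixed shard by taking the customer ID modulo the number of shards; the class name and the per-shard DataSource list are assumptions for illustration, not part of the reservations system itself.

import javax.sql.DataSource;
import java.util.List;

public class ShardRouter {
    private final List<DataSource> shards;   // one DataSource per shard, assumed already configured

    public ShardRouter(List<DataSource> shards) {
        this.shards = shards;
    }

    // Every application server computes the same shard for the same customer,
    // so exactly one shard ever owns that customer's rows.
    public DataSource shardFor(long customerId) {
        int index = (int) Math.floorMod(customerId, (long) shards.size());
        return shards.get(index);
    }
}

A booking for customer 42 always runs against shardFor(42), so no other shard can overwrite that customer's data.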
--------------------------------------------------------------------------------------------
OSSEC (Open Source Security) is an open source host-based intrusion detection system that
performs log analysis, file integrity checking, policy monitoring, rootkit detection, real-time
alerting, and active response.
The services you want to run on an instance generally dictate the operating system on which
you will base the machine image. If you are deploying a .NET application, you probably will
use one of the Amazon Windows images. A PHP application, on the other hand, probably will be
targeting a Linux environment.
Hardening an operating system is the act of minimizing attack vectors into a server. Among
other things, hardening involves the following activities:
Removing unnecessary services.
Removing unnecessary accounts.
Running all services as a role account (not root) when possible.
Running all services in a restricted jail when possible.
Verifying proper permissions for necessary system services.
The best way to harden your Linux system is to use a proven hardening tool such as Bastille.
Once you have the deployment structured the right way, you will need to test it. That means
testing the system from launch through shutdown and recovery. Therefore, you need to take
the following steps:
1. Build a temporary image from your development instance.
2. Launch a new instance from the temporary image.
3. Verify that it functions as intended.
4. Fix any issues.
5. Repeat until the process is robust and reliable.
Privacy Design
It is important to consider how you approach an application architecture for systems that have a
special segment of private data, notably e-commerce systems that store credit cards and health
care systems with health data.
Privacy in the Cloud
The key to privacy in the cloud, or any other environment, is the strict separation of sensitive
data from nonsensitive data, followed by the encryption of the sensitive elements. The simplest
example is storing credit cards.
Figure 4-5 provides an application architecture in which credit card data can be securely
managed.
It's a pretty simple design that is very hard to compromise as long as you take the following
precautions:
The application server and credit card server sit in two different security zones with only
web services traffic from the application server being allowed into the credit card processor
zone.
Credit card numbers are encrypted using a customer-specific encryption key.
The credit card processor has no access to the encryption key, except for a short period of
time (in memory) while it is processing a transaction on that card.
The application server never has the ability to read the credit card number from the credit
card server.
No person has administrative access to both servers.
A couple of rules of thumb:
Make sure the two servers have different attack vectors.
Make sure that neither server contains credentials or other information that will make it
possible to compromise the other server.
With that data stored in the e-commerce system database, the system then submits the credit
card number, credit card password, and unique credit card ID from the e-commerce system to
the credit card processor.
The credit card processor does not store the password. Instead, it uses the password as salt to
encrypt the credit card number, stores the encrypted credit card number, and associates it with
the credit card ID. Figure 4-7 shows the credit card processor data model.
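The sketch below shows one way the processor side might implement this in Java, reading "password as salt" as folding the customer's card password into the key that encrypts the card number. The class name, the AES-GCM mode, the iteration count, and the in-memory map standing in for the processor's database are illustrative assumptions, not the actual processor design.

import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CardVault {
    private final Map<String, byte[]> encryptedCards = new HashMap<>(); // cardId -> IV + ciphertext
    private final SecureRandom random = new SecureRandom();

    // Derive the AES key from the card password; the password itself is never stored.
    private SecretKeySpec keyFor(char[] password, byte[] salt) throws Exception {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 256);
        byte[] keyBytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return new SecretKeySpec(keyBytes, "AES");
    }

    // Called when the e-commerce system first hands over the card.
    public void store(String cardId, String cardNumber, char[] password) throws Exception {
        byte[] iv = new byte[12];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, keyFor(password, cardId.getBytes()),
                new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(cardNumber.getBytes());
        byte[] record = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, record, 0, iv.length);
        System.arraycopy(ciphertext, 0, record, iv.length, ciphertext.length);
        encryptedCards.put(cardId, record); // only the ciphertext is kept against the card ID
    }

    // Called only while a charge is being processed; the key exists briefly in memory.
    public String load(String cardId, char[] password) throws Exception {
        byte[] record = encryptedCards.get(cardId);
        byte[] iv = Arrays.copyOfRange(record, 0, 12);
        byte[] ciphertext = Arrays.copyOfRange(record, 12, record.length);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, keyFor(password, cardId.getBytes()),
                new GCMParameterSpec(128, iv));
        return new String(cipher.doFinal(ciphertext));
    }
}

The charge flow described next would call load() with the password supplied by the e-commerce system and pass the recovered number on to the bank.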
Processing a credit card transaction
When it comes time to charge the credit card, the e-commerce service submits a request to the
credit card processor to charge the card for a specific amount. The e-commerce system refers
to the credit card on the credit card processor using the unique ID that was created when the
credit card was first inserted. It passes over the credit card password, the security code, and
the amount to be charged. The credit card processor then decrypts the credit card number for the
specified credit card using the specified password. The unencrypted credit card number,
security code, and amount are then passed to the bank to complete the transaction.
Because the objective of a privacy server is simply to physically segment out private data, you
do not necessarily need to encrypt everything on the privacy server. Figure 4-8 illustrates how
the e-commerce system might evolve into a privacy architecture designed to store all private
data outside of the cloud. Under no circumstances does the main web application have any access
to personally identifying information, unless that data is aggregated before being presented to the
web application.
Database Management
The trickiest part of managing a cloud infrastructure is the management of your persistent
data. Persistent data is essentially any data that needs to survive the destruction of your cloud
environment. Because you can easily reconstruct your operating system, software, and simple
configuration files, they do not qualify as persistent data. Only data that cannot be
reconstituted qualifies.
Your database server in the cloud will be much less reliable than a database server in a physical
infrastructure. The virtual server running your database will fail completely and without
warning. Whether physical or virtual, when a database server fails, there is the distinct possibility
that the files that comprise the database state will be corrupted.
It is much easier to recover from the failure of a server in a virtualized environment than in
the physical world: simply launch a new instance from your database machine image, mount
the old block storage device, and you are up and running.
Clustering or Replication?
The most effective mechanism for avoiding corruption is leveraging the capabilities of a
database engine that supports true clustering. In a clustered database environment, multiple
database servers act together as a single logical database server. The mechanics of this process
vary from database engine to database engine, but the result is that a transaction committed
to the cluster will survive the failure of any one node and maintain full data consistency.
Database clustering is very complicated and generally quite expensive:
Unless you have a skilled DBA on hand, you should not even consider undertaking the
deployment of a clustered database environment.
A clustered database vendor often requires you to pay for the most expensive licenses to
use the clustering capabilities in the database management system (DBMS).
Clustering comes with significant performance problems: you will pay a hefty network latency
penalty.
The alternative to clustering is replication. A replication-based database infrastructure
generally has a main server, referred to as the database master. Client applications execute
write transactions against the database master. Successful transactions are then replicated to
database slaves.
Replication has two key advantages over clustering:
It is generally much simpler to implement.
It does not require an excessive number of servers or expensive licenses.
Using database clustering in the cloud
A few guidelines:
A few cluster architectures exist purely for performance and not for availability.
Clusters designed for high availability are often slower at processing individual write
transactions, but they can handle much higher loads than standalone databases.
Some solutions, such as MySQL, may require a large number of servers to operate
effectively.
The dynamic nature of IP address assignment within a cloud environment may add new
challenges in terms of configuring clusters and their failover rules.
Using database replication in the cloud
For most non-mission-critical database applications, replication is a good enough
solution that can save you a lot of money and potentially provide you with opportunities
for performance optimization.
A MySQL replication system in the cloud can provide you with a flawless backup and
disaster recovery system.
Figure 4-9 shows a simple replication environment.
In this structure, you have a single database server of record (the master) replicating to one or
more copies (the slaves). The database on the slave should always be in an internally consistent
state (uncorrupted).
Under a simple setup, your web applications point to the database master. Consequently, your
database slave can fail without impacting the web application. To recover, start up a new
database slave and point it to the master. Recovering from the failure of a database master is
much more complicated. A database can recover using a slave in one of two ways:
Promoting a slave to database master (you will then need to launch a replacement slave)
Building a new database master and exporting the current state from a slave to the new master.
Replication for performance
Another reason to leverage replication is performance. Without segmenting your data, most
database engines allow you to write against only the master, but you can read from the master
or any of the slaves.
Figure 4-10 illustrates the design of an application using replication for performance benefits.
The rewards of using replication for performance are huge, but there are also risks. The primary
risk is that we might accidentally execute a write operation against one of the slaves. When
we do that, replication falls apart and our master and slaves end up in inconsistent states.
Two approaches to solving this problem include:
Clearly separating read logic from write logic in our code and centralizing the acquisition
of database connections (see the sketch after this list).
Making our slave nodes read-only.
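A minimal sketch of the first approach, centralizing connection acquisition so that writes can only ever reach the master, might look like the following; the class name and the round-robin slave selection are assumptions for illustration.

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ReplicatedConnectionProvider {
    private final DataSource master;
    private final List<DataSource> slaves;
    private final AtomicInteger next = new AtomicInteger();

    public ReplicatedConnectionProvider(DataSource master, List<DataSource> slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    // All writes go through here, so they can only ever hit the master.
    public Connection getWriteConnection() throws SQLException {
        return master.getConnection();
    }

    // Reads are spread across the slaves in round-robin fashion;
    // marking the connection read-only guards against accidental writes.
    public Connection getReadConnection() throws SQLException {
        DataSource slave = slaves.get(Math.floorMod(next.getAndIncrement(), slaves.size()));
        Connection conn = slave.getConnection();
        conn.setReadOnly(true);
        return conn;
    }
}

Application code then asks for getWriteConnection() only inside clearly marked write paths, and everything else reads from a slave.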
Primary Key Management
With a web application operating behind a load balancer, in which individual nodes within
the web application do not share state information with each other, cross-database
primary key generation becomes a challenge.
How to generate globally unique primary keys
We could use standard UUIDs to serve as our primary key mechanism. They have the
benefit of an almost nonexistent chance of generating conflicts, and most programming
languages have built-in functions for generating them.
A universally unique identifier (UUID) is an identifier standard used in software construction,
standardized by the Open Software Foundation (OSF) as part of the Distributed Computing
Environment (DCE).
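In Java, for instance, the standard library generates one in a single call:

import java.util.UUID;

String primaryKey = UUID.randomUUID().toString(); // a random 128-bit identifier, e.g. "f47ac10b-58cc-4372-a567-0e02b2c3d479"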
Don't use them, however, for three reasons:
They are 128-bit values and thus take more space and have longer lookup times.
Cleanly representing a 128-bit value in Java and some other programming languages is
painful.
The possibility of collisions, although not realistic, does exist.
In order to generate identifiers at the application server level that are guaranteed to be unique
in the target database, traditionally I rely on the database to manage key generation. I accomplish
this through the creation of a sequencer table.
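A sketch of that sequencer-table technique over JDBC follows; the sequencer table, its name and next_key columns, and the single-key increment are assumptions for illustration (a production version might reserve a whole block of keys per call to reduce contention).

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SequencerKeyGenerator {
    private final DataSource ds;

    public SequencerKeyGenerator(DataSource ds) { this.ds = ds; }

    // Reserves the next key for the named sequence; the row lock taken by
    // SELECT ... FOR UPDATE keeps two application servers from handing out the same value.
    public long nextKey(String sequenceName) throws SQLException {
        try (Connection conn = ds.getConnection()) {
            conn.setAutoCommit(false);
            long key;
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT next_key FROM sequencer WHERE name = ? FOR UPDATE")) {
                select.setString(1, sequenceName);
                try (ResultSet rs = select.executeQuery()) {
                    if (!rs.next()) {
                        conn.rollback();
                        throw new SQLException("No sequencer row for " + sequenceName);
                    }
                    key = rs.getLong(1);
                }
            }
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE sequencer SET next_key = next_key + 1 WHERE name = ?")) {
                update.setString(1, sequenceName);
                update.executeUpdate();
            }
            conn.commit();
            return key;
        }
    }
}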
The technique for unique key generation just described generates (more or less) sequential
identifiers. In some cases, it is important to remove reasonable predictability from identifier
generation. You therefore need to introduce some level of randomness into the equation.
Support for globally unique random keys
To get a random identifier, you need to multiply your next_key value by some power of 10 and
then add a random number generated through the random number generator of your language
of choice.
rnd = random.randint(0, 99999)       # random component (requires: import random)
next_id = (next_key * 100000) + rnd  # shift the sequencer value by a power of 10, then add the randomness
return next_id
This approach has the following challenges:
The generation of unique keys will take longer.
Your application will take up more memory.
The randomness of your ID generation is reduced.
Database Backups
Types of database backups
Most database engines provide multiple mechanisms for executing database backups. The
rationale behind having different backup strategies is to provide a trade-off between the impact
that executing a backup has on the production environment and the integrity of the data in
the backup. Typically, our database engine will offer at least these backup options (in order
of reliability):
Database export/dump backup
Filesystem backup
Transaction log backup
The most solid backup we can execute is the database export/dump. When we perform a
database export, we dump the entire schema of the database and all of its data to one or more
export files. We can then store the export files as the backup. During recovery, we can
leverage the export files to restore into a pristine install of the database engine.
The downside of the database export is that our database server must be locked against writes in
order to get a complete export that is guaranteed to be in an internally consistent state.
Unfortunately, the export of a large database takes a long time to execute.
Most databases provide the option to export parts of the database individually. If the table has
any dependencies on other tables in the system, however, we can end up with inconsistent data
when exporting on a table-by-table basis. Partial exports are therefore most useful on data from a
data warehouse.
Filesystem backups involve backing up all of the underlying files that support the database.
For some database engines, the database is stored in one big file. For others, the tables and their
schemas are stored across multiple files. Either way, a backup simply requires copying the
database files to backup media. Though a filesystem backup requires you to lock the database
against updates, the lock time is typically shorter than for a full export.
The least disruptive kind of backup is the transaction log backup. As a database commits
transactions, it writes those transactions to a transaction logfile. Because the transaction log
contains only committed transactions, we can back up these transaction logfiles without
locking the database or stopping it. They are also smaller files and thus back up quickly. Using
this strategy, we will create a full database backup on a nightly or weekly basis and then back
up the transaction logs on a more regular basis.
Restoring from transaction logs involves restoring from the most recent full database backup
and then applying the transaction logs. This approach is a more complex backup scheme than
the other two because you have a number of files created at different times that must be
managed together. Furthermore, restoring from transaction logs is the longest of the three
restore options.
Applying a backup strategy for the cloud
The best backup strategy for the cloud is a file-based backup solution. We lock the database
against writes, take a snapshot, and unlock it. It is elegant, quick, and reliable. Snapshots work
beautifully within a single cloud, but they cannot be leveraged outside our cloud provider. To
make sure our application is portable between clouds, we need to execute full database exports
regularly.
One approach is to regularly execute full database exports against a MySQL slave, as shown in
Figure 4-11.
For the purposes of a backup, it does not matter if your database slave is a little bit behind the
master. What matters is that the slave represents the consistent state of the entire database at
a relatively reasonable point in time. We can therefore execute a very long backup against the
slave and not worry about the impact on the performance of our production environment.