0% found this document useful (0 votes)
24 views5 pages

Mod 2 Continue Edited

The document discusses version stamps which are used in NoSQL databases to help ensure consistency when updates occur without transactions. It describes how different types of version stamps like counters, timestamps, and GUIDs work and compares their advantages. It also covers how version stamps are handled in distributed systems with multiple nodes rather than a single server.

Uploaded by

sirisatishakadur
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views5 pages

Mod 2 Continue Edited

The document discusses version stamps which are used in NoSQL databases to help ensure consistency when updates occur without transactions. It describes how different types of version stamps like counters, timestamps, and GUIDs work and compares their advantages. It also covers how version stamps are handled in distributed systems with multiple nodes rather than a single server.

Uploaded by

sirisatishakadur
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

NOSQL Data Base

Chapter 3.
Version Stamps

Many critics of NoSQL databases focus on the lack of support for transactions. Transactions are a
useful tool that helps programmers support consistency. One reason why many NoSQL proponents
worry less about a lack of transactions is that aggregate-oriented NoSQL databases do support
atomicupdates within an aggregate—and aggregates are designed so that their data forms a natural
unit of update. That said, it’s true that transactional needs are something to take into account when
you decide what database to use.
As part of this, it’s important to remember that transactions have limitations. Even within a
transactional system we still have to deal with updates that require human intervention and usually
cannot be run within transactions because they would involve holding a transaction open for too
long. We can cope with these using version stamps—which turn out to be handy in other situations
as well,particularly as we move away from the single-server distribution model.
3.1 Business and System Transactions

 The need to support update consistency without transactions is actually a common feature of
systems even when they are built on top of transactional databases.
 When users think about transactions, they usually mean business transactions. A business
transaction may be something like browsing a product catalog, choosing a bottle of Talisker at a
good price, filling in credit card information, and confirming the order.
 Yet all of this usually won’t occur within the system transaction provided by the database
because this would mean locking the database elements while the user is trying to find their credit
card and gets called off to lunch by their colleagues.
 Usually applications only begin a system transaction at the end of the interaction with the user, so
that the locks are only held for a short period of time.
 The problem, however, is that calculations and decisions may have been made based on data
that’s changed. The price list may have updated the price of the Talisker, or someone may have
updated the customer’s address, changing the shipping charges.
 The broad techniques for handling this are offline concurrency [Fowler PoEAA], useful in NoSQL
situations too.
 A particularly useful approach is the Optimistic Offline Lock [Fowler PoEAA], a form of
conditional update where a client operation rereads any information that the business transaction

Page 1
NOSQL Data Base

relies on and checks that it hasn’t changed since it was originally read and displayed tothe user.
 A good way of doing this is to ensure that records in the database contain some form of version
stamp: a field that changes every time the underlying data in the record changes. When you read
the data you keep a note of the version stamp, so that when you write data you can check tosee if
the version has changed.
 You may have come across this technique with updating resources with HTTP [HTTP]. One way
of doing this is to use etags.
 Whenever you get a resource, the server responds with an etag in the header. This etag is an
opaque string that indicates the version of the resource.
 If you then update that resource, you can use a conditional update by supplying the etag that you
got from your last GET. If the resource has changed on the server, the etags won’t match and the
server will refuse the update, returning a 412 (Precondition Failed) response.
 Some databases provide a similar mechanism of conditional update that allows you to ensure
updates won’t be based on stale data.
 You can do this check yourself, although you then have to ensure no other thread can run against
the resource between your read and your update. (Sometimes this is called a compare-and-set
(CAS) operation, whose name comes from the CAS operations done in processors. The
difference is that a processor CAS compares a value before setting it, while a database
conditional update compares a version stamp of the value.)
 There are various ways you can construct your version stamps. You can use a counter, always
incrementing it when you update the resource.
 Counters are useful since they make it easy to tell if one version is more recent than another. On
the other hand, they require the server to generate the counter value, and also need a single master
to ensure the counters aren’t duplicated.
 Another approach is to create a GUID, a large random number that’s guaranteed to be unique.
These use some combination of dates, hardware information, and whatever other sources of
randomness they can pick up.
 The nice thing about GUIDs is that they can be generated by anyone and you’ll never get a
duplicate; a disadvantage is that they are large and can’t be compared directly for recentness.
 A third approach is to make a hash of the contents of the resource. With a big enough hash key
size, a content hash can be globally unique like a GUID and can also be generated by anyone; the
advantage is that they are deterministic—any node will generate the same content hash for same

Page 2
NOSQL Data Base

resource data.
 However, like GUIDs they can’t be directly compared for recentness, and they can be lengthy.
 A fourth approach is to use the timestamp of the last update. Like counters, they are reasonably
short and can be directly compared for recentness, yet have the advantage of not needing a single
master.
 Multiple machines can generate timestamps—but to work properly, their clocks have to be kept in
sync. One node with a bad clock can cause all sorts of data corruptions.
 There’s also a danger that if the timestamp is too granular you can get duplicates—it’s no good
using timestamps of a millisecond precision if you get many updates per millisecond.
 You can blend the advantages of these different version stamp schemes by using more than one
of them to create a composite stamp.
 For example, CouchDB uses a combination of counter and content hash. Most of the time this
allows version stamps to be compared for recentness, even when you use peer- to-peer
replication.
 Should two peers update at the same time, the combination of the same count and different content
hashes makes it easy to spot the conflict.
 As well as helping to avoid update conflicts, version stamps are also useful for providing session
consistency.

3.2 Version Stamps on Multiple Nodes

 The basic version stamp works well when you have a single authoritative source for data, such as
a single server or master-slave replication.
 In that case the version stamp is controlled by the master. Any slaves follow the master’s stamps.
But this system has to be enhanced in a peer-to-peer distribution model because there’s no longer
a single place to set the version stamps.
 If you’re asking two nodes for some data, you run into the chance that they may give you
different answers. If this happens, your reaction may vary depending on the cause of that
difference.
 It may be that an update has only reached one node but not the other, in which case you can
accept the latest (assuming you can tell which one that is).
 Alternatively, you may have run into an inconsistent update, in which case you need to decide
how to deal with that.

Page 3
NOSQL Data Base

 In this situation, a simple GUID or etag won’t suffice, since these don’t tell you enough about the
relationships.
 The simplest form of version stamp is a counter. Each time a node updates the data, it increments
the counter and puts the value of the counter into the version stamp.
 If you have blue and green slave replicas of a single master, and the blue node answers with a
version stamp of 4 and the green node with 6, you know that the green’s answer is more recent.
 In multiple-master cases, we need something fancier. One approach, used by distributed version
control systems, is to ensure that all nodes contain a history of version stamps.
 That way you can see if the blue node’s answer is an ancestor of the green’s answer. This would
either require the clients to hold onto version stamp histories, or the server nodes to keep version
stamp histories and include them when asked for data.
 This also detects an inconsistency, which we would see if we get two version stamps and neither
of them has the other in their histories.
 Although version control systems keep these kinds of histories, they aren’t found in NoSQL
databases.
 A simple but problematic approach is to use timestamps. The main problem here is that it’s usually
difficult to ensure that all the nodes have a consistent notion of time, particularly if updates can
happen rapidly. Should a node’s clock get out of sync, it can cause all sorts of trouble.
 In addition, you can’t detect write-write conflicts with timestamps, so it would only work well for
the single-master case—and then a counter is usually better.
 The most common approach used by peer-to-peer NoSQL systems is a special form of version
stamp which we call a vector stamp.
 In essence, a vector stamp is a set of counters, one for each node. A vector stamp for three nodes
(blue, green, black) would look something like [blue: 43, green: 54, black: 12].
 Each time a node has an internal update, it updates its own counter, so an update in the green
node would change the vector to [blue: 43, green: 55, black: 12].
 Whenever two nodes communicate, they synchronize their vector stamps. There are several
variations of exactly how this synchronization is done.
 We’re coining the term “vector stamp” as a general term in this book; you’ll also come across
vector clocks and version vectors—these are specific forms of vector stamps that differ in how
they synchronize.

Page 4
NOSQL Data Base

 By using this scheme you can tell if one version stamp is newer than another because the newer
stamp will have all its counters greater than or equal to those in the older stamp. So [blue: 1, green:
2, black: 5] is newer than [blue:1, green: 1, black 5] since one of its counters is greater. If both
stamps have a counter greater than the other, e.g. [blue: 1, green: 2, black: 5] and [blue: 2, green:
1,black: 5], then you have a write-write conflict.
 There may be missing values in the vector, in which case we use treat the missing value as 0.
So[blue: 6, black: 2] would be treated as [blue: 6, green: 0, black: 2].
 This allows you to easily addnew nodes without invalidating the existing vector stamps.
 Vector stamps are a valuable tool that spots inconsistencies, but doesn’t resolve them. Any
conflictresolution will depend on the domain you are working in.
 This is part of the consistency/latency tradeoff. You either have to live with the fact that network
partitions may make your system unavailable, or you have to detect and deal with
inconsistencies.

Key Points
• Version stamps help you detect concurrency conflicts. When you read data, then update it, you can
check the version stamp to ensure nobody updated the data between your read and write.
• Version stamps can be implemented using counters, GUIDs, content hashes, timestamps, or a
combination of these.
• With distributed systems, a vector of version stamps allows you to detect when different nodes
have conflicting updates.

Page 5

You might also like