A scalable generic transaction model scenario for distributed NoSQL databases
2015, Journal of Systems and Software
https://doi.org/10.1016/J.JSS.2014.11.037
Abstract
With the development of cloud computing and the internet, revenue from e-commerce, e-business, and the corporate world is increasing at a high rate. These areas require not only scalable and consistent databases but also inter-database transaction support. In this paper, we present a scalable three-tier architecture along with a distributed middleware protocol to support atomic transactions across heterogeneous NoSQL databases. Our methodology does not rely on any assumption about the accuracy of failure modalities. Hence, it is suitable for a class of heterogeneous distributed systems. To this end, our architectural model exploits an innovative methodology for achieving distributed atomic transactions. We simulate this architectural setup with different latency tests under different environments to evaluate its impact and correctness.
Related papers
2007
A recently proposed abstraction, called e-transaction (exactly-once transaction), specifies a set of properties capturing end-to-end reliability aspects for three-tier Web-based systems. In this paper we propose a distributed protocol ensuring the e-transaction properties for the general case of multiple, autonomous back-end databases. The key idea underlying our proposal consists in distributing, across the back-end tier, some recovery information reflecting the transaction processing state.
VLDB, 2013
Web service providers have been using NoSQL datastores to provide scalability and availability for globally distributed data at the cost of sacrificing transactional guarantees. Recently, major web service providers like Google have moved towards building storage systems that provide ACID transactional guarantees for globally distributed data. For example, the newly published system, Spanner, uses Two-Phase Commit and Two-Phase Locking to provide atomicity and isolation for globally distributed data, running on top of Paxos to provide fault-tolerant log replication. We show in this paper that it is possible to provide the same ACID transactional guarantees for multi-datacenter databases with fewer cross-datacenter communication trips, compared to replicated logging. Instead of replicating the transactional log, we replicate the commit operation itself, by running Two-Phase Commit multiple times in different datacenters and using Paxos to reach consensus among datacenters as to whether the transaction should commit. Doing so not only replaces several inter-datacenter communication trips with intra-datacenter communication trips, but also allows us to integrate atomic commitment and isolation protocols with consistent replication protocols to further reduce the number of cross-datacenter communication trips needed for consistent replication; for example, by eliminating the need for an election phase in Paxos. We analyze our approach in terms of communication trips to compare it against the log replication approach, then we conduct an extensive experimental study to compare the performance and scalability of both approaches under various multi-datacenter setups.
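The idea of replicating the commit operation itself can be illustrated with a minimal sketch. It is based only on the abstract above and is not the authors' protocol: the `Datacenter` class, the `shard_votes` field, and the function names are hypothetical, messaging and locking are omitted, and the Paxos round is reduced to a simple majority count over datacenters.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Datacenter:
    """Hypothetical datacenter holding one replica of every shard."""
    name: str
    shard_votes: Dict[str, bool]  # shard name -> can this shard prepare?

    def local_two_phase_commit(self) -> bool:
        # Intra-datacenter 2PC, simplified: the datacenter accepts the
        # commit request only if every local shard votes yes.
        return all(self.shard_votes.values())


def replicated_commit(datacenters: List[Datacenter]) -> bool:
    """Commit if a majority of datacenters accept (assumed quorum rule)."""
    accepts = sum(dc.local_two_phase_commit() for dc in datacenters)
    return accepts > len(datacenters) // 2


if __name__ == "__main__":
    dcs = [
        Datacenter("dc-a", {"x": True, "y": True}),
        Datacenter("dc-b", {"x": True, "y": True}),
        Datacenter("dc-c", {"x": True, "y": False}),
    ]
    print(replicated_commit(dcs))  # True: two of three datacenters accept
```

In this sketch the only cross-datacenter step is collecting one accept or reject per datacenter; the per-shard voting stays inside each datacenter, which is the trade the abstract describes.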
Future Internet
The Internet has become so widespread that the most popular websites are accessed by hundreds of millions of people on a daily basis. Monolithic architectures, which were frequently used in the past, were mostly composed of traditional relational database management systems, but have quickly become incapable of sustaining the high data traffic that is common these days. Meanwhile, NoSQL databases have emerged to provide some properties missing from relational databases, such as schema-less design, horizontal scaling, and eventual consistency. This paper analyzes and compares the consistency model implementations of five popular NoSQL databases: Redis, Cassandra, MongoDB, Neo4j, and OrientDB. All of them offer at least eventual consistency, and some have the option of supporting strong consistency. However, imposing strong consistency will result in less availability when subject to network partition events.
2005
The e-transaction abstraction is a recent formalization of end-to-end reliability properties for three-tier systems. In this work, we present a protocol ensuring the e-transaction guarantees in case the back-end tier consists of a centralized database. Our proposal addresses the case of stateless application servers, and is both simple and effective since 1) it does not employ any distributed commit protocol and 2) it does not require coordination among the replicas of the application server.
In distributed database systems, the primary need for commit protocols is to maintain the atomicity of distributed transactions. Atomic commitment is of prime importance in a distributed system, and it becomes more pressing when some of the sites participating in the commitment of a transaction fail. Several atomic commit protocols have evolved to terminate distributed transactions. This paper presents an overview of a distributed transaction model and a description of the two-phase commit (2PC) protocol (which is blocking) and the one-phase commit (1PC) protocols (which are non-blocking). This paper further examines the assumptions of these commit protocols in their bid to address the atomic commitment issue in distributed database systems. By restricting the failures considered to site failures, drawbacks in the assumptions of these atomic commit protocols were identified, which clearly show that the non-blocking protocol studied addres...
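As a rough illustration of the two-phase commit protocol this abstract describes, the following minimal sketch runs both phases in a single process. The `Participant` class, its `can_commit` flag, and the state names are hypothetical and chosen only for this example; logging, timeouts, and site failures are left out.

```python
from enum import Enum
from typing import List


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical site taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> Vote:
        # Phase 1: vote, and enter the "prepared" state on a yes vote.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]
    decision = all(v is Vote.YES for v in votes)
    for p in participants:
        p.finish(decision)
    return decision


if __name__ == "__main__":
    sites = [Participant("site-1"), Participant("site-2", can_commit=False)]
    print(two_phase_commit(sites))   # False: one site voted no
    print([s.state for s in sites])  # ['aborted', 'aborted']
```

The blocking behaviour mentioned in the abstract corresponds to the gap between the two loops: a participant in the "prepared" state cannot decide on its own and must wait for the coordinator's decision, so a coordinator failure at that point leaves it stuck.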
Data consistency is a big issue when using NoSQL cloud data stores. They ensure scalability and high availability for web applications, but in providing these they sacrifice data consistency. Some applications cannot afford data inconsistency. To achieve data consistency in multi-item transactions on web applications, CloudTPS is a suitable solution. CloudTPS acts as a scalable transaction manager that guarantees full ACID properties for multi-item transactions on web applications. These guarantees hold even in the presence of server failures and network partitions, which do not affect the functionality of CloudTPS. HBase and Hadoop provide scalable data layers. Hence we implement this approach on top of these scalable data layers.
arXiv (Cornell University), 2021
MongoDB is a popular general-purpose, document-oriented, distributed NoSQL database. It supports transactions in three different deployments: single-document transactions utilizing the WiredTiger storage engine in a standalone node, multi-document transactions in a replica set which consists of a primary node and several secondary nodes, and distributed transactions in a sharded cluster which is a group of multiple replica sets, among which data is sharded. A natural and fundamental question about MongoDB transactions is: What transactional consistency guarantee do MongoDB transactions in each deployment provide? However, there is neither concise pseudocode of MongoDB transactions in each deployment nor a formal specification of the consistency guarantees that MongoDB claims to provide. In this work, we formally specify and verify the transactional consistency protocols of MongoDB. Specifically, we provide concise pseudocode for the transactional consistency protocols in each MongoDB deployment, namely WiredTiger, ReplicaSet, and ShardedCluster, based on the official documents and source code. We then prove that WiredTiger, ReplicaSet, and ShardedCluster satisfy different variants of snapshot isolation, namely StrongSI, RealtimeSI, and SessionSI, respectively. We also propose and evaluate efficient white-box checking algorithms for MongoDB transaction protocols against their consistency guarantees, effectively circumventing the NP-hard obstacle in theory.
VLDB, 2012
We present a framework for concurrency control and availability in multi-datacenter datastores. While we consider Google's Megastore as our motivating example, we define general abstractions for key components, making our solution extensible to any system that satisfies the abstraction properties. We first develop and analyze a transaction management and replication protocol based on a straightforward implementation of the Paxos algorithm. Our investigation reveals that this protocol acts as a concurrency prevention mechanism rather than a concurrency control mechanism. We then propose an enhanced protocol called Paxos with Combination and Promotion (Paxos-CP) that provides true transaction concurrency while requiring the same per instance message complexity as the basic Paxos protocol. Finally, we compare the performance of Paxos and Paxos-CP in a multi-datacenter experimental study, and we demonstrate that Paxos-CP results in significantly fewer aborted transactions than basic Paxos.
Many modern enterprise applications follow the three-tier architecture. Typical three-tier systems have some nice properties: clients are thin, application state is stored in the database, and the middle servers are stateless. This makes three-tier systems highly scalable and manageable. However, it is challenging to implement end-to-end reliable and safe transactions for such applications without violating their nice properties.