Serializability: Difference between revisions

m Overview: Fixed another word's capitalization.
Overview: Slightly improved readability by inserting a couple of words.
'''Distributed serializability''' is the serializability of a schedule of a transactional [[distributed system]] (e.g., a [[distributed database]] system). Such a system is characterized by ''[[distributed transaction]]s'' (also called ''global transactions''), i.e., transactions that span computer processes (a process abstraction in a general sense, depending on the computing environment; e.g., an [[operating system]] [[Thread (computer science)|thread]]) and possibly network nodes. A distributed transaction comprises several ''local sub-transactions'', each of which has states as described above for a [[Serializability#Database transaction|database transaction]]. A local sub-transaction comprises a single process, or several processes that typically fail together (e.g., in a single [[processor core]]). Distributed transactions imply a need for an [[atomic commit]] protocol, so that the local sub-transactions of each distributed transaction reach consensus on whether to commit or abort. Such protocols range from a simple (one-phase) handshake among processes that fail together to more sophisticated protocols, such as [[Two-phase commit protocol|two-phase commit]], which handle more complicated failure cases (e.g., process, node, or communication failure). Distributed serializability is a major goal of [[distributed concurrency control]] for correctness. With the proliferation of the [[Internet]], [[cloud computing]], [[grid computing]], and small, portable, powerful computing devices (e.g., [[smartphone]]s), the need for effective distributed serializability techniques to ensure correctness in and among distributed applications appears to be increasing.
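The atomic commit requirement described above can be illustrated with a minimal, in-memory sketch of two-phase commit. This is not a production protocol: there is no network, logging, timeout handling, or recovery, and the names <code>Participant</code>, <code>Vote</code>, and <code>two_phase_commit</code>, as well as the state labels, are hypothetical and chosen only for illustration. It shows only the core rule: the distributed transaction commits if and only if every local sub-transaction votes to commit.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical local sub-transaction taking part in an atomic commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: stands in for local validation
        self.state = "active"         # illustrative state labels only

    def prepare(self):
        # Phase 1: the participant votes on whether it is able to commit.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the distributed transaction commits, False if it aborts."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except Exception:
            # A participant that fails while voting is treated as a NO vote.
            votes.append(Vote.NO)

    # Phase 2 (decision): commit only if the vote was unanimous.
    decision = all(v is Vote.YES for v in votes)
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


if __name__ == "__main__":
    ok = [Participant("node-1"), Participant("node-2")]
    print(two_phase_commit(ok), [p.state for p in ok])

    bad = [Participant("node-1"), Participant("node-2", can_commit=False)]
    print(two_phase_commit(bad), [p.state for p in bad])
```

In the second run a single NO vote forces every participant to abort, which is the all-or-nothing outcome that an atomic commit protocol is meant to guarantee.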
 
Distributed serializability is achieved by implementing distributed versions of the known centralized techniques.<ref name=Bernstein87 /><ref name=Weikum01 /> Typically, all such distributed versions require utilizing conflict information (of either materialized or non-materialized conflicts, or, equivalently, transaction precedence or blocking information; conflict serializability is usually utilized) that is not generated locally, but rather in different processes and at remote locations. Thus, this information (e.g., precedence relations, lock information, timestamps, or tickets) must be distributed. When the distributed system is of a relatively small scale and message delays across the system are small, the centralized concurrency control methods can be used unchanged, with certain processes or nodes in the system managing the related algorithms. However, in a large-scale system (e.g., ''grid'' and ''cloud''), distributing such information typically incurs a substantial performance penalty, even when distributed versions of the methods (rather than the centralized ones) are used, primarily due to computer and communication [[latency (engineering)|latency]]. Also, when such information is distributed, the related techniques typically do not scale well. A well-known example with scalability problems is a [[distributed lock manager]], which distributes lock (non-materialized conflict) information across the distributed system to implement locking techniques.
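The need to distribute conflict information can be illustrated with a small sketch of a conflict-serializability check. It assumes that each site reports its local precedence relations as pairs <code>(earlier, later)</code> of transaction names; the function names and the data layout are hypothetical. Each site's local precedence graph may be acyclic while the merged, global graph contains a cycle, so no single site can detect the violation from its own information alone.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for before, after in edges:
        graph.setdefault(before, set()).add(after)
        graph.setdefault(after, set())

    state = {}  # node -> "visiting" or "done"; absent means not yet seen

    def visit(node):
        # Depth-first search; reaching a node that is still being visited
        # means a back edge, and therefore a cycle.
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


def globally_serializable(local_edges_by_site):
    """Merge per-site precedence edges and test the global graph for cycles.

    Assumption: local_edges_by_site maps a site name to an iterable of
    (earlier_txn, later_txn) precedence pairs observed at that site.
    """
    merged = set()
    for edges in local_edges_by_site.values():
        merged |= set(edges)
    return not has_cycle(merged)


if __name__ == "__main__":
    site_a = [("T1", "T2")]  # at site A, T1 precedes T2
    site_b = [("T2", "T1")]  # at site B, T2 precedes T1

    print(has_cycle(site_a), has_cycle(site_b))  # each site alone
    print(globally_serializable({"A": site_a, "B": site_b}))
```

Here both local graphs are acyclic, but merging them yields the cycle T1 → T2 → T1, so the global schedule is not conflict-serializable. Gathering the edges in one place, as this sketch does, is precisely the information distribution whose cost grows with the scale of the system.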
 
==See also==