
Introduction To Transaction Processing

Copyright
© Attribution Non-Commercial (BY-NC)

1. Introduction
CSEP 545 Transaction Processing
Philip A. Bernstein, Sameh Elnikety
Copyright 2012 Philip A. Bernstein

1/4/2012

Outline
1. The Basics
2. ACID Properties
3. Atomicity and Two-Phase Commit
4. Performance
5. Scalability

1.1 The Basics - What's a Transaction?

The execution of a program that performs an administrative function by accessing a shared database, usually on behalf of an on-line user.
Examples
  Reserve an airline seat.
  Buy an airline ticket.
  Withdraw money from an ATM.
  Verify a credit card sale.
  Order an item from an Internet retailer.
  Place a bid at an on-line auction.
  Submit a corporate purchase order.

The "ilities" are What Makes Transaction Processing (TP) Hard

Reliability - system should rarely fail
Availability - system must be up all the time
Response time - within 1-2 seconds
Throughput - thousands of transactions/second
Scalability - start small, ramp up to Internet-scale
Security - for confidentiality and high finance
Configurability - for above requirements + low cost
Atomicity - no partial results
Durability - a transaction is a legal contract
Distribution - of users and data

What Makes TP Important?

It's at the core of electronic commerce.
Most medium-to-large businesses use TP for their production systems. The business can't operate without it.
It's a huge slice of the computer system market.
It's one of the largest applications of computers.

TP System Infrastructure
User's viewpoint
  Enter a request from a browser or other display device
  The system performs some application-specific work, which includes database accesses
  Receive a reply (usually, but not always)

The TP system ensures that each transaction
  Is an independent unit of work
  Executes exactly once
  Produces permanent results

The TP system makes it easy to program transactions
The TP system has tools to make it easy to manage

TP System Infrastructure
Defines System and Application Structure
[Figure: the End-User works with a Front End Program on the Client, which sends requests to the Back-End (Server): a Request Controller (routes requests and supervises their execution), a Transaction Server, and a Database System]

System Characteristics
Typically < 100 transaction types per application
Transaction size has high variance. Typically,
  0-30 disk accesses
  10K - 1M instructions executed
  2-20 messages

A large-scale example: airline reservations
  Hundreds of thousands of active display devices
  Indirect access via Internet
  Tens of thousands of transactions per second, peak

Availability
Fraction of time system is able to do useful work
Some systems are very sensitive to downtime
  Airline reservation, stock exchange, on-line retail
  Downtime is front page news

  Downtime          Availability
  1 hour/day        95.8%
  1 hour/week       99.40%
  1 hour/month      99.86%
  1 hour/year       99.9886%
  1 hour/20 years   99.99943%

Contributing factors
  Failures due to environment, system mgmt, h/w, s/w
  Recovery time
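The availability figures on this slide follow from one formula: availability = 1 - (downtime / period). The sketch below recomputes them. The function name and the period lengths (a 30-day month, a 365-day year) are assumptions made for illustration, not part of the slides.

```python
def availability(downtime_hours: float, period_hours: float) -> float:
    """Return the fraction of the period during which the system is up."""
    return 1.0 - downtime_hours / period_hours


# Period lengths in hours. The 30-day month and 365-day year are assumptions.
PERIOD_HOURS = {
    "day": 24,
    "week": 7 * 24,
    "month": 30 * 24,
    "year": 365 * 24,
    "20 years": 20 * 365 * 24,
}

for name, hours in PERIOD_HOURS.items():
    # One hour of downtime per period, printed as a percentage.
    print(f"1 hour/{name}: {availability(1, hours) * 100:.5f}%")
```

Read the other way, the formula gives a downtime budget: a 99.9% target over a 365-day year allows about 8.76 hours of downtime.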

Application Servers
A software product to create, execute and manage TP applications
Formerly called TP monitors. Some people say App Server = TP monitor + web functionality.
Programmer writes an app to process a single request. App Server scales it up to a large, distributed system
  E.g. application developer writes programs to debit a checking account and verify a credit card purchase.
  App Server helps system engineer deploy it to 10s/100s of servers and 10Ks of displays
  App Server helps system engineer deploy it on the Internet, accessible from web browsers

Application Servers (cont'd)

Components include
  An application programming interface (API) (e.g., Enterprise Java Beans)
  Tools for program development
  Tools for system management (app deployment, fault & performance monitoring, user mgmt, etc.)

Examples: Enterprise Java Beans, IBM WebSphere, Microsoft .NET (COM+), Oracle WebLogic and Application Server

App Server Architecture, Pre-Web

Boxes below are distributed on an intranet
[Figure: boxes labeled Message Inputs, Front End Program, Queues, Request Controller, and two Transaction Servers, connected by a Network]

Automated Teller Machine (ATM) Application Example

[Figure: ATMs at Bank Branch 1, Bank Branch 2, ... Bank Branch 500; two Request Controllers; back-end systems labeled Interbank Transfer, Checking Accounts, Credit Card Accounts, and Loan Accounts]

Application Server Architecture

[Figure: boxes labeled Web Server, Message Inputs, Queues, Request Controller, and two Transaction Servers, connected by an intranet, with links to other TP systems]

Internet Retailer
[Figure: The Internet connects to a Web Server and Request Controller, with back-end systems labeled Toys, Books, Music, Electronics, and Computers]

Service Oriented Architecture (SOA)

Web services - interface and protocol standards to do app server functions over the Internet.
[Figure: the Internet retailer diagram (Web Server, Request Controller, and back-end systems labeled Toys, Books, Music, Electronics, and Computers) with two Web Service boxes added]

Enterprise Application Integration (EAI)

A software product to route requests between independent application systems. It often includes
  A queuing system
  A message mapping system
  Application adaptors (SAP, Oracle PeopleSoft, etc.)

EAI and Application Servers address a similar problem, with different emphasis
Examples
  IBM WebSphere MQ, TIBCO, Vitria, Sun SeeBeyond

ATM Example with an EAI System

[Figure: ATMs at Bank Branch 1, Bank Branch 2, ... Bank Branch 500; two EAI Routing boxes, each with Queues; back-end systems labeled Interbank Transfer, Checking Accounts, Credit Card Accounts, and Loan Accounts]

Workflow, or Business Process Mgmt

A software product that executes multi-transaction long-running scripts (e.g., process an order)
Product components
  A workflow script language
  Workflow script interpreter and scheduler
  Workflow tracking
  Message translation
  Application and queue system adaptors

Transaction-centric vs. document-centric
Structured processes vs. case management

Examples: IBM WebSphere MQ Workflow, Microsoft BizTalk, SAP, Vitria, Oracle Workflow, IBM FileNet, EMC Documentum, TIBCO

Data Integration Systems (Enterprise Information Integration)

[Figure: a Query Mediator above three data sources labeled Checking Accounts, Loan Accounts, and Credit Card Accounts]

Heterogeneous query systems (mediators).
It's database system software, but it's similar to EAI, with more focus on data transformations than on message mgmt.

Transactional Middleware
In summary, there are many variations that package different combinations of middleware features
  Application Server
  Enterprise Application Integration
  Business process management (aka Workflow)
  Enterprise Service Bus

New ones appear all the time, and they defy categorization

System Software Vendor's View

TP is partly a component product problem
  Hardware
  Operating system
  Database system
  Application Server

TP is partly a system engineering problem
  Getting all those components to work together to produce a system with all those "ilities"

This course focuses primarily on the Database System and Application Server


1.2 The ACID Properties

Transactions have 4 main properties
  Atomicity - all or nothing
  Consistency - preserve database integrity
  Isolation - execute as if they were run alone
  Durability - results aren't lost by a failure

Atomicity
All-or-nothing, no partial results
  E.g. in a money transfer, debit one account, credit the other. Either debit and credit both run, or neither runs.
  Successful completion is called Commit
  Transaction failure is called Abort

Commit and abort are irrevocable actions
An Abort undoes operations that already executed
  For database operations, restore the data's previous value from before the transaction
  But some real world operations are not undoable
    Examples - transfer money, print ticket, fire missile
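The money-transfer example can be tried with Python's standard `sqlite3` module. This is a minimal sketch, not the course's implementation: the `accounts` table, the account names, and the `transfer` helper are hypothetical. It relies on `with conn:` committing if the block succeeds and rolling back if an exception escapes.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one all-or-nothing unit of work."""
    try:
        with conn:  # Commit on success; roll back if an exception escapes.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: the debit above is undone by the rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("checking", 100), ("savings", 0)]
)
conn.commit()

committed = transfer(conn, "checking", "savings", 30)   # both updates apply
aborted = transfer(conn, "checking", "savings", 500)    # neither update applies
print(committed, aborted, balances(conn))
```

The second call debits the account, detects the overdraft, and aborts. The rollback restores the previous balance, so no partial result is visible.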

Example - ATM Dispenses Money (a non-undoable operation)

Dispense before commit:
  T1: Start . . . Dispense Money, [system crashes], Commit
  The transaction aborts, but the money is dispensed
Defer the dispense until after commit:
  T1: Start . . . Commit, [system crashes], Dispense Money
  The deferred operation never gets executed

Reading Uncommitted Output Isn't Undoable

T1: Start ... Display output ... If error, Abort
  User reads output
  User enters input ("brain transport" carries T1's uncommitted output into T2)
T2: Start, Get input from display ... Commit

Compensating Transactions
A transaction that reverses the effect of another transaction (that committed). For example,
  Adjustment in a financial system
  Annul a marriage

Not all transactions have complete compensations
  E.g., Certain money transfers
  E.g., Fire missile, cancel contract
  Contract law talks a lot about appropriate compensations

A well-designed TP application should have a compensation for every transaction type

Consistency
Every transaction should maintain DB consistency
  Referential integrity - E.g., each order references an existing customer number and existing part numbers
  The books balance (debits = credits, assets = liabilities)

Consistency preservation is a property of a transaction, not of the TP system (unlike the A, I, and D of ACID)
If each transaction maintains consistency, then serial executions of transactions do too

Some Notation
ri[x] = Read(x) by transaction Ti
wi[x] = Write(x) by transaction Ti
ci = Commit by transaction Ti
ai = Abort by transaction Ti
A history is a sequence of such operations, in the order that the database system processed them

Consistency Preservation Example

T1: Start; A = Read(x); A = A - 1; Write(y, A); Commit;
T2: Start; B = Read(x); C = Read(y); If (B - 1 > C) then B = B - 1; Write(x, B); Commit;

Consistency predicate is x > y
Serial executions preserve consistency. Interleaved executions may not.
H = r1[x] r2[x] r2[y] w2[x] w1[y]
  e.g., try it with x=4 and y=2 initially
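The "try it" suggestion can be carried out mechanically. The sketch below models T1 and T2 as Python generators, so a schedule can interleave their read and write steps in any order. The generator encoding and the `run` helper are illustrative assumptions, not notation from the course.

```python
def t1(db):
    """T1: A = Read(x); A = A - 1; Write(y, A). Each yield ends one step."""
    a = db["x"]
    yield "r1[x]"
    a = a - 1
    db["y"] = a
    yield "w1[y]"


def t2(db):
    """T2: B = Read(x); C = Read(y); if B - 1 > C then B = B - 1; Write(x, B)."""
    b = db["x"]
    yield "r2[x]"
    c = db["y"]
    yield "r2[y]"
    if b - 1 > c:
        b = b - 1
    db["x"] = b
    yield "w2[x]"


def run(schedule, x=4, y=2):
    """Run one step of T1 or T2 for each entry (1 or 2) in the schedule."""
    db = {"x": x, "y": y}
    txns = {1: t1(db), 2: t2(db)}
    history = " ".join(next(txns[i]) for i in schedule)
    return db, history


serial_12, _ = run([1, 1, 2, 2, 2])        # T1 followed by T2
serial_21, _ = run([2, 2, 2, 1, 1])        # T2 followed by T1
interleaved, h = run([1, 2, 2, 2, 1])      # the history H from the slide

for label, db in [("T1 T2", serial_12), ("T2 T1", serial_21), (h, interleaved)]:
    print(f"{label}: x={db['x']} y={db['y']} consistent={db['x'] > db['y']}")
```

Both serial orders end with x > y. The interleaved history ends with x = 3 and y = 3, which violates the predicate: T1 writes a value of y computed from the x it read before T2 decremented x.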

Isolation
Intuitively, the effect of a set of transactions should be the same as if they ran independently
Formally, an interleaved execution of transactions is serializable if its effect is equivalent to a serial one
Implies a user view where the system runs each user's transaction stand-alone
Of course, transactions in fact run with lots of concurrency, to use device parallelism

Serializability Example 1
T1: Start; A = Read(x); A = A + 1; Write(x, A); Commit;
T2: Start; B = Read(y); B = B + 1; Write(y, B); Commit;

H = r1[x] r2[y] w1[x] c1 w2[y] c2
H is equivalent to executing
  T1 followed by T2
  T2 followed by T1

Serializability Example 2
T1: Start; A = Read(x); A = A + 1; Write(x, A); Commit;
T2: Start; B = Read(x); B = B + 1; Write(y, B); Commit;

H = r1[x] r2[x] w1[x] c1 w2[y] c2
H is equivalent to executing T2 followed by T1
Note, H is not equivalent to T1 followed by T2
Also, note that T1 started before T2 and finished before T2, yet the effect is that T2 ran first

Serializability Examples
Client must control the relative order of transactions, using handshakes (wait for T1 to commit before submitting T2)
Some more serializable executions
  r1[x] r2[y] w2[y] w1[x]   equivalent to T1 T2 and to T2 T1
  r1[y] r2[y] w2[y] w1[x]   equivalent to T1 T2, not to T2 T1
  r1[x] r2[y] w2[y] w1[y]   equivalent to T2 T1, not to T1 T2
Serializability says the execution is equivalent to some serial order, not necessarily to all serial orders

Non-Serializable Examples
r1[x] r2[x] w2[x] w1[x] (race condition)
  e.g., T1 and T2 are each adding 100 to x

r1[x] r2[y] w2[x] w1[y]
  e.g., each transaction is trying to make x = y, but the interleaved effect is a swap

r1[x] r1[y] w1[x] r2[x] r2[y] c2 w1[y] c1 (inconsistent retrieval)
  e.g., T1 is moving $100 from x to y
  T2 sees only half of the result of T1

Compare to the OS view of synchronization
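One standard way to test histories like these is to build a precedence graph and look for a cycle. Add an edge Ti -> Tj whenever an operation of Ti comes before a conflicting operation of Tj, where two operations conflict if they touch the same item and at least one is a write. Note the swap in technique: the slides define serializability by equivalent effect, while this sketch checks conflict serializability, which is a sufficient condition and not the definition. The function names are hypothetical.

```python
import re
from itertools import combinations


def parse(history):
    """Turn 'r1[x] w2[y] c1' into [(op, txn, item)]; commits/aborts are skipped."""
    return [
        (op, int(txn), item)
        for op, txn, item in re.findall(r"([rw])(\d+)\[(\w+)\]", history)
    ]


def conflict_edges(history):
    """Return edges (Ti, Tj): an op of Ti precedes a conflicting op of Tj."""
    edges = set()
    # combinations() keeps history order, so the first op is always the earlier one.
    for (op1, t1, x1), (op2, t2, x2) in combinations(parse(history), 2):
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            edges.add((t1, t2))
    return edges


def is_conflict_serializable(history):
    """True if the precedence graph is acyclic (checked by peeling off sources)."""
    edges = conflict_edges(history)
    nodes = {t for edge in edges for t in edge}
    while nodes:
        sources = {
            n for n in nodes
            if not any(dst == n and src in nodes for src, dst in edges)
        }
        if not sources:
            return False  # Every remaining node has an incoming edge: a cycle.
        nodes -= sources
    return True


for h in [
    "r1[x] r2[y] w1[x] c1 w2[y] c2",              # Serializability Example 1
    "r1[x] r2[x] w1[x] c1 w2[y] c2",              # Serializability Example 2
    "r1[x] r2[x] w2[x] w1[x]",                    # race condition
    "r1[x] r2[y] w2[x] w1[y]",                    # swap
    "r1[x] r1[y] w1[x] r2[x] r2[y] c2 w1[y] c1",  # inconsistent retrieval
]:
    print(h, "->", sorted(conflict_edges(h)), is_conflict_serializable(h))
```

For Example 2 the only edge is T2 -> T1, matching the observation that H is equivalent to T2 followed by T1. Each non-serializable history produces edges in both directions between T1 and T2.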

Durability
When a transaction commits, its results will survive failures (e.g., of the application, OS, DB system, even of the disk)
Makes it possible for a transaction to be a legal contract
Implementation is usually via a log
  DB system writes all transaction updates to its log
  To commit, it adds a record "commit(Ti)" to the log
  When the commit record is on disk, the transaction is committed
  System waits for disk ack before acking to user
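The logging steps above can be sketched with an append-only file. This is an illustration under stated assumptions, not how a production DB system formats its log: the `Log` class, the text record format, and `recover` are hypothetical. `os.fsync` stands in for "wait for disk ack".

```python
import os
import tempfile


class Log:
    """Append-only log of updates and commit records (hypothetical format)."""

    def __init__(self, path):
        self.f = open(path, "a", encoding="utf-8")

    def write_update(self, txn, item, value):
        # Updates may sit in buffers until the commit forces them to disk.
        self.f.write(f"update {txn} {item} {value}\n")

    def commit(self, txn):
        self.f.write(f"commit {txn}\n")
        self.f.flush()
        os.fsync(self.f.fileno())  # Wait for the disk to acknowledge the write.
        return "committed"         # Only now is it safe to ack the user.


def recover(path):
    """Rebuild state from the log, keeping only committed transactions' updates."""
    pending, state = {}, {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "update":
                pending.setdefault(parts[1], []).append((parts[2], parts[3]))
            elif parts[0] == "commit":
                for item, value in pending.pop(parts[1], []):
                    state[item] = value
    return state


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "tp.log")
    log = Log(path)
    log.write_update("T1", "x", "5")
    ack = log.commit("T1")
    log.write_update("T2", "y", "7")  # T2 never commits.
    log.f.close()
    recovered = recover(path)

print(ack, recovered)
```

Because T2 has no commit record, recovery ignores its update. Only T1's write to x survives.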


1.3 Atomicity and Two-Phase Commit

Distributed systems make atomicity harder
Suppose a transaction updates data managed by two DB systems
One DB system could commit the transaction, but a failure could prevent the other system from committing
The solution is the two-phase commit protocol
Abstract "DB system" by resource manager (could be a SQL DBMS, message mgr, queue mgr, OO DBMS, etc.)

Two-Phase Commit
Main idea - all resource managers (RMs) save a durable copy of the transaction's updates before any of them commit
If one RM fails after another commits, the failed RM can still commit after it recovers
The protocol to commit transaction T
  Phase 1 - T's coordinator asks all participant RMs to "prepare the transaction". Each participant RM replies "prepared" after T's updates are durable.
  Phase 2 - After receiving "prepared" from all participant RMs, the coordinator tells all participant RMs to commit
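The two phases can be sketched as a coordinator loop over in-memory participants. The class and function names are hypothetical. The abort path (any participant that cannot prepare causes every participant to abort) is an assumption that goes beyond what this slide states. Failures, timeouts, and logging are left out.

```python
class ResourceManager:
    """A toy participant RM that tracks only its transaction state."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        # A real RM would make the transaction's updates durable before replying.
        if self.can_prepare:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run the coordinator's side of the protocol; return the outcome."""
    # Phase 1: ask every participant to prepare and collect the replies.
    votes = [rm.prepare() for rm in participants]

    # Phase 2: commit only if every participant replied "prepared".
    if all(votes):
        for rm in participants:
            rm.commit()
        return "commit"
    for rm in participants:
        rm.abort()
    return "abort"


ok_rms = [ResourceManager("checking"), ResourceManager("loans")]
bad_rms = [ResourceManager("checking"), ResourceManager("loans", can_prepare=False)]

print(two_phase_commit(ok_rms), [rm.state for rm in ok_rms])
print(two_phase_commit(bad_rms), [rm.state for rm in bad_rms])
```

The key property is visible in the control flow: no participant reaches "committed" until every participant has reached "prepared".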

Two-Phase Commit System Architecture

[Figure: the Application Program sends Read and Write to the Resource Manager and Start, Commit, and Abort to the Transaction Manager (TM); the TM also talks to Other Transaction Managers]

1. Start transaction returns a unique transaction identifier
2. Resource accesses include the transaction identifier. For each transaction, RM registers with TM
3. When application asks TM to commit, the TM runs two-phase commit


1.4 Performance Requirements

Measured in max transactions per second (tps) or per minute (tpm), and dollars per tps or tpm
Dollars measured by list purchase price plus 5 year vendor maintenance ("cost of ownership")
Workload typically has this profile
  10% application server plus application
  30% communications system (not counting presentation)
  50% DB system

Transaction Processing Performance Council (TPC) sets standards (http://www.tpc.org)
TPC-A & B ('89-'95), now TPC-C & E

TPC-A/B Bank Tellers

Obsolete (a retired standard), but interesting
Input is 100 byte message requesting deposit/withdrawal
Database tables = {Accounts, Tellers, Branches, History}
  Start
    Read message from terminal (100 bytes)
    Read+write account record (random access)
    Write history record (sequential access)
    Read+write teller record (random access)
    Read+write branch record (random access)
    Write message to terminal (200 bytes)
  Commit

End of history and branch records are bottlenecks

TPC-C Order-Entry for Warehouse

  Table        Rows/Whse   Bytes/row
  Warehouse    1           89
  District     10          95
  Customer     30K         655
  History      30K         46
  Order        30K         24
  New-Order    9K          8
  OrderLine    300K        54
  Stock        100K        306
  Item         100K        82

TPC-C uses heavier weight transactions

TPC-C Transactions
New-Order
  Get records describing a warehouse, customer, & district
  Update the district
  Increment next available order number
  Insert record into Order and New-Order tables
  For 5-15 items, get Item record, get/update Stock record
  Insert Order-Line record

Payment, Order-Status, Delivery, Stock-Level have similar complexity, with different frequencies
tpmC = number of New-Order transactions per min

Comments on TPC-C
Enables apples-to-apples comparison of TP systems
Does not predict how your application will run, or how much hardware you will need, or which system will work best on your workload
Not all vendors optimize for TPC-C
  Some high-end system sales require custom benchmarks

Current TPC-C Numbers

All numbers are sensitive to date submitted
Systems
  cost $60K (Dell/HP) - $12M (Oracle/IBM)
  mostly Oracle/DB2/MS SQL on Unix variants/Windows
  $0.40 - $5 / tpmC

Example of high throughput
  Oracle, 30M tpmC, $30.0M, $1/tpmC, Oracle/Solaris

Example of low cost
  HP ProLiant, 290K tpmC, $113K, $0.39/tpmC, Oracle/Linux
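The price/performance figures on this slide are total system cost divided by throughput. The sketch below reproduces the arithmetic for the two quoted results. The helper name is hypothetical, and the inputs are the rounded values from the slide, so the outputs are approximate.

```python
def dollars_per_unit(system_cost_dollars: float, throughput: float) -> float:
    """Price/performance: total system cost divided by throughput (e.g., tpmC)."""
    return system_cost_dollars / throughput


# Rounded figures quoted on the slide.
high_throughput = dollars_per_unit(30.0e6, 30e6)    # Oracle: $30.0M, 30M tpmC
low_cost = dollars_per_unit(113_000, 290_000)       # HP ProLiant: $113K, 290K tpmC

print(f"${high_throughput:.2f}/tpmC, ${low_cost:.2f}/tpmC")
```

The same division applies to dollars per tpsE for the TPC-E results.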

TPC-E
Approved in 2007
Models a stock trading app for brokerage firm
Should replace TPC-C; it's database-centric
More complex but less disk IO per transaction

TPC-E
33 tables in four sets
  Market data (11 tables)
  Customer data (9 tables)
  Broker data (9 tables)
  Reference data (4 tables)

Scale
  500 customers per tpsE

TPC-E Transactions
Activities
  Stock-trade, customer-inquiry, feeds from markets, market-analysis

tpsE = number of Trade-Result transactions per sec
Trade-Result
  Completes a stock market trade
  Receive from market exchange confirmation & price
  Update customer's holdings
  Update broker commission
  Record historical information

TPC-E Transactions

  Name                Access   Description
  Broker-Volume       RO       DSS-type medium query
  Customer-Position   RO       "What am I worth?"
  Market-Feed         RW       Processing of Stock Ticker
  Market-Watch        RO       "What's the market doing?"
  Security-Detail     RO       Details about a security
  Trade-Lookup        RO       Look up historical trade info
  Trade-Order         RW       Enter a stock trade
  Trade-Result        RW       Completion of a stock trade
  Trade-Status        RO       Check status of trade order
  Trade-Update        RW       Correct historical trade info

Current TPC-E Numbers

Systems
  Cost $60K - $2.3M
  Almost all are MS SQL on Windows
  $130 - $250 / tpsE

Example of high throughput
  IBM, 4.5K tpsE, $645K, $140/tpsE, MS SQL/Windows

Example of low cost
  IBM, 2.9K tpsE, $371K, $130/tpsE, MS SQL/Windows


1.5 Scalability
Techniques for better performance
  Textbook, Chapter 2, Section 6

Scale-up
  Caching
  Resource Pooling

Scale-out
  Partitioning
  Replication

Caching
Key idea
  Use more memory
  Keep a copy of data from its permanent home
  Accessing a cached copy is fast

Key issues
  Which data to keep
    Popular read-only data
  Cache replacement
  What if original data is updated
    Invalidations
    Timeouts

Caching
Applied at multiple levels
  Database and application server

Updates
  Write through
    Better cache coherence
  Write back
    Batching and write absorption

Example products
  Memcached, MS Velocity
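The two update policies differ in when the permanent copy is written. The sketch below is a minimal in-memory illustration. The `Cache` class and its methods are hypothetical and are not the API of Memcached or Velocity. A plain dictionary stands in for the data's permanent home.

```python
class Cache:
    """A toy cache in front of a backing store (a dict stands in for the DB)."""

    def __init__(self, store, write_through=True):
        self.store = store
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()  # Keys updated in the cache but not yet in the store.

    def read(self, key):
        if key not in self.cache:            # Miss: fetch from the permanent home.
            self.cache[key] = self.store[key]
        return self.cache[key]

    def write(self, key, value):
        self.cache[key] = value
        if self.write_through:
            self.store[key] = value          # Store is updated on every write.
        else:
            self.dirty.add(key)              # Write back: defer the store update.

    def flush(self):
        """Write back all dirty entries in one batch."""
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()

    def invalidate(self, key):
        """Drop the cached copy, e.g. after the original was updated elsewhere."""
        self.cache.pop(key, None)
        self.dirty.discard(key)


wt_store = {"seats": 10}
wt = Cache(wt_store, write_through=True)
wt.write("seats", 9)                 # wt_store is updated immediately.

wb_store = {"seats": 10}
wb = Cache(wb_store, write_through=False)
wb.write("seats", 9)
wb.write("seats", 8)                 # Absorbed: the store has not been touched.
stale = wb_store["seats"]
wb.flush()                           # One store write covers both updates.

print(wt_store, stale, wb_store)
```

Write through keeps the store current at the cost of one store write per update. Write back batches updates and absorbs repeated writes to the same key, but the store is stale until the flush.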

Resource Pooling
Key idea
  If a logical resource is expensive to create and cheap to access, then manage a pool of the resource

Examples
  Session pool
  Thread pool
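A pool creates its resources once and hands them out repeatedly. The sketch below uses a thread-safe queue from the standard library. The `Pool` class and the `make_session` factory are hypothetical names chosen for illustration.

```python
import queue
from contextlib import contextmanager


class Pool:
    """Create `size` resources up front and lend them out one at a time."""

    def __init__(self, create, size):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(create())   # Pay the creation cost once per resource.

    @contextmanager
    def acquire(self, timeout=None):
        resource = self._free.get(timeout=timeout)  # Block until one is free.
        try:
            yield resource
        finally:
            self._free.put(resource)   # Return it to the pool for reuse.


created = []


def make_session():
    """Stand-in for an expensive operation such as opening a DB session."""
    session = {"id": len(created) + 1}
    created.append(session)
    return session


pool = Pool(make_session, size=2)
used_ids = []
for _ in range(5):
    with pool.acquire() as session:
        used_ids.append(session["id"])

print(len(created), used_ids)
```

Five requests are served by two sessions, so the creation cost is paid twice instead of five times.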

Partitioning
To add system capacity, add server machines
Sometimes, you can just relocate some server processes to different machines
But if an individual server process overloads one machine, then you need to partition the process
  Example - One server process manages flights, cars, and hotel rooms. Later, you partition them in separate processes.
  We need mapping from resource name to server name

Partitioning: Routing
Sometimes, it's not enough to partition by resource type, because a resource is too popular
  Example: flights

Partition popular resource based on value ranges
  Example - flight number 1-1000 on Server A, flight number 1001-2000 on Server B, etc.
  Request controller has to direct its calls based on parameter value (e.g. flight number)
  This is called parameter-based routing
    E.g., range, hashing, dynamic
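Range-based and hash-based routing can each be written as a small lookup in the request controller. In this sketch the server names, the range boundaries, and both function names are illustrative. Assigning flight 1000 to Server A and flight 1001 to Server B is an assumption made so that the ranges do not overlap.

```python
import bisect

# Inclusive upper bound of each range, with the server that owns it.
RANGE_UPPER_BOUNDS = [1000, 2000]
RANGE_SERVERS = ["Server A", "Server B"]


def route_by_range(flight_number: int) -> str:
    """Return the server whose range contains the flight number."""
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, flight_number)
    if flight_number < 1 or index == len(RANGE_SERVERS):
        raise ValueError(f"no server for flight {flight_number}")
    return RANGE_SERVERS[index]


def route_by_hash(flight_number: int, servers=("Server A", "Server B")) -> str:
    """Spread flight numbers across servers by taking a remainder."""
    return servers[flight_number % len(servers)]


print(route_by_range(1), route_by_range(1000), route_by_range(1001))
print(route_by_hash(1000), route_by_hash(1001))
```

Range routing keeps neighboring flight numbers on the same server. Hash routing spreads them evenly, but changing the number of servers remaps most keys.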

Replication
Replication - using multiple copies of a server or resource for better availability and performance.
  Replica and Copy are synonyms

If you're not careful, replication can lead to
  worse performance - updates must be applied to all replicas and synchronized
  worse availability - some algorithms require multiple replicas to be operational for any of them to be used

Replicated Server
Can replicate servers on a common resource
  Data sharing - DB servers communicate with shared disk

[Figure: a Client calls Server Replica 1 and Server Replica 2, which share one Resource]

Helps availability for process (not resource) failure
Requires a replica cache coherence mechanism, so this helps performance only if
  Little conflict between transactions at different servers, or
  Loose coherence guarantees (e.g. read committed)

Replicated Resource
To get more improvement in availability, replicate the resources (too)
Also increases potential throughput
This is what's usually meant by replication

[Figure: two Clients, one calling Server Replica 1 and the other Server Replica 2; each server replica has its own Resource replica]


What's Next?
This chapter covered TP system structure and properties of transactions and TP systems
The rest of the course drills deeply into each of these areas, one by one.

Next Steps
We covered
  Chapter 1
  Chapter 2, Section 6

Assignment 1
Teams for the project
