Notes

... = 2^(2n). E.g., from A -> B, B -> C, . . . , J -> K, 10 FDs, get 2^20
FDs, about a million. If go up to . . . , X -> Y, Y -> Z, get 2^52 FDs,
about 4,000,000,000,000,000 FDs (four quadrillion).
Lots. Exponential explosion in number of attributes. Even if start with
a manageable number with commonsense rules, can get problem.
Really only want to arrive at a MINIMAL set of FDs, so we try to avoid
this kind of explosion. Still we need all these concepts.
Def. 6.6.10. FD Set Cover. A set F of FDs on a table T is said to
COVER another set G of FDs on T if the set G can be derived by
implication rules from the set F, i.e., if G ⊆ F+. If F covers G and G
covers F, we say the two sets of FDs are equivalent, F ≡ G.
Ex. 6.6.6. F = {B -> C D, A D -> E, B -> A} and
G = {B -> C D E, B -> A B C, A D -> E}.
Demonstrate in book how F covers G. But also G covers F. See why? B
-> C D E implies B -> C D and B -> E by decomposition rule. So have
first FD in F.
A better behaved definition from a standpoint of explosion to
characterize a set of FDs is the following.
Def. 6.6.11. Closure of a set of attributes. Given a set X of attributes
in a table T and a set F of FDs on T, we define the CLOSURE of the set
X (under F), denoted by X+, as the largest set of attributes Y such that
X -> Y is in F+.
We will study closure of set of ATTRIBUTES, not closure of set of FDs.
Algorithm to determine set closure, pg. 342. Pretty intuitive: Start
with X+ = X and just keep looping through the (hopefully small) set of
FDs as long as new attributes can be added to X+.
Pg. 341-42. Algorithm 6.6.12. Set Closure. Algorithm to determine X+,
the closure of a given set of attributes X, under a given set F of FDs.

    I = 0; X[0] = X;              /* integer I, attr. set X[0]      */
    REPEAT                        /* loop to find larger X[I]       */
        I = I + 1;                /* new I                          */
        X[I] = X[I-1];            /* initialize new X[I]            */
        FOR ALL Z -> W in F       /* loop on all FDs Z -> W in F    */
            IF Z ⊆ X[I]           /* if Z contained in X[I]         */
            THEN X[I] = X[I] ∪ W; /* add attributes in W to X[I]    */
        END FOR                   /* end loop on FDs                */
    UNTIL X[I] = X[I-1];          /* loop till no new attributes    */
    RETURN X+ = X[I];             /* return closure of X            */
Note that the step in Algorithm 6.6.12 that adds attributes to X[I] is
based on a simple inference rule that we call the Set Accumulation Rule,
stated thus: If X -> Y Z and Z -> W then X -> Y Z W.
In our algorithm we are saying that since X -> X[I] (our inductive
assumption) and X[I] can be represented as Y Z (because Z ⊆ X[I]), we can
write X -> X[I] as X -> Y Z, and since F contains the FD Z -> W, we
conclude by the set accumulation rule that X -> Y Z W or in other words
X -> X[I] UNION W.
Example 6.6.7. In Example 6.6.6, we were given the set F of FDs:
F = {B -> C D, A D -> E, B -> A} (Get B+ = B C D (1) A (3) E (2))
(Shorthand (1), (2): ORDER IMPORTANT)
Given X = B, we determine that X+ = A B C D E. In terms of Algorithm
6.6.12, we start with X[0] = B. Then X[1] = B, and we begin to loop
through the FDs. Because of B -> C D, X[1] = B C D.
The next FD, A D -> E, does not apply at this time, since A D is not a
subset of X[1]. Next because of B -> A, we get X[1] = A B C D.
Now X[0] is strictly contained in X[1] (i.e., X[I-1] ⊂ X[I]) so
X[I-1] ≠ X[I]. Thus we have made progress in this last pass of the loop
and go on to a new pass, setting X[2] = X[1] = A B C D.
Looping through the FDs again, we see that all of them can be applied
(we could skip the ones that have been applied before since they will
have no new effect), with the only new FD, A D -> E, giving us X[2] = A B
C D E.
At the end of this loop, the algorithm notes that X[1] ⊂ X[2], progress
has been made, so we go on to create X[3] and loop through the FDs
again, ending up this pass with X[3] = X[2].
Since all of the FDs had been applied already, we could omit this pass by
noting this fact. Note that a different ORDERING of the FDs in F can
change the details of execution for this algorithm.
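
A minimal Python sketch of the set closure loop, plus the cover test from Def. 6.6.10 built on top of it. The representation (FDs as pairs of sets) and the helper names closure, covers and fd are assumptions made for illustration, not anything from the text.

```python
def closure(attrs, fds):
    """Return X+ for attribute set `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)                      # X[0] = X
    changed = True
    while changed:                           # REPEAT ... UNTIL no new attributes
        changed = False
        for lhs, rhs in fds:                 # FOR ALL Z -> W in F
            if lhs <= result and not rhs <= result:
                result |= rhs                # X[I] = X[I] UNION W
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be derived from f, i.e. g is contained in f+."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def fd(lhs, rhs):
    """Hypothetical helper: build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


# The sets F and G of Example 6.6.6.
F = [fd("B", "C D"), fd("A D", "E"), fd("B", "A")]
G = [fd("B", "C D E"), fd("B", "A B C"), fd("A D", "E")]

print(sorted(closure({"B"}, F)))     # ['A', 'B', 'C', 'D', 'E']
print(covers(F, G), covers(G, F))    # True True
```

The cover test never builds F+ itself; it only needs the closure of each left side, which is why attribute closure avoids the explosion.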
Finding a Minimal Cover of a set of FDs.
Algorithm 6.6.13, pg. 369. BASIC TO NORMALIZATION! Given a set
F of FDs, construct a set M of FDs that is minimal and covers F.
Let's apply this to the (non-reduced) set of FDs above.
F: (1) A -> B, (2) C -> B, (3) D -> A B C, (4) A C -> D
                                   ^ Remember, this can come out!!
Step 1. From the set F of FDs, create an equivalent set H with only
single attributes on the right. Use decomposition rule. See step 1,
pg. 343.
H: (1) A -> B, (2) C -> B, (3) D -> A, (4) D -> B, (5) D -> C,
(6) A C -> D
Step 2. Remove inessential FDs from the set H to get the set J.
Determine inessential X -> A if A is in X+ under FDs without X -> A.
Try removing (1) A -> B, leaving only (2) C -> B, (3) D -> A, (4) D -> B,
(5) D -> C, (6) A C -> D. Take X = A in closure algorithm, clearly get
only X+ = A, because no other FD has its left side contained in X. So
need (1).
Try removing others. (2) stays, since no other C on left. Therefore if
set X = C couldn't get X+ to contain more than C. (That reasoning is
OK.)
How about (3)? Would be left with only: (1) A -> B, (2) C -> B, (4) D ->
B, (5) D -> C, (6) A C -> D. Consider X = D. Get X+ = D B (4) C (5).
Then stop. In fact A not on right of any FDs if take out (3), so (3)
needed.
Now try removing (4). Keep: (1) A -> B, (2) C -> B, (3) D -> A, (5) D ->
C, (6) A C -> D. Can we derive D -> B? X = D. Get: D A (3) C (5) B
(2). Therefore, got D -> B. So can leave (4) out.
Can't leave out (5) because C not on right of any other FD. Can't
leave out (6) because D not on right of any other. Therefore can only
reduce set to:
H = (1) A -> B, (2) C -> B, (3) D -> A, (4) D -> C, (5) A C -> D
(Renumber)
Step 3. Successively replace FDs in H with FDs that have a smaller
number of attributes on the left-hand side so long as H+ remains the
same.
Test this by successively removing single attributes from multi-
attribute left hand sides of FDs, changing X -> A to Y -> A, then
checking if Y+ under new FD set is unchanged.
(Clearly if we assume Y -> A, and Y ⊂ X, can derive everything used to
be able to: still true that X -> A. ONLY RISK is that Y -> A might imply
TOO MUCH. I.e., might have Y+ is LARGER than before!)
Only one to try is (5). If we change this to A -> D, does A+ change?
Used to be A+ = A B, now A+ = A B D C. No good. How about
changing A C -> D to C -> D? Does C+ change? Used to be C+ = C B.
Now C+ = C B D A. So no good, can't reduce. (NO NEED TO TRY STEP
2 AGAIN.)
IF WE DID REDUCE and created a new FD, let us say A -> D to replace
A C -> D, we would need to apply Step 2 again to test if A -> D could
be removed!!!
Step 4. Apply Union rules to bring things back together on the right
for common sets of attributes on the left of FDs, renamed M.
H: (1) A -> B, (2) C -> B, (3) D -> A, (4) D -> C, (5) A C -> D
M: (1) A -> B, (2) C -> B, (3) D -> A C, (4) A C -> D
This is the reduced set, above.
OK, now have algorithm to find a minimal cover from any set of FDs.
Almost ready to do Normalization. But need one more concept.
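
The four steps can be sketched in Python as below. This is an illustrative sketch, not the book's algorithm text. The function names and the single-letter attribute encoding are assumptions. Step 3 uses the equivalent test "A is already in Y+ under the current set" in place of comparing Y+ before and after the change. The result can depend on the order in which FDs are tried.

```python
def closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def drop_inessential(fds):
    """Step 2: drop X -> A whenever A is in X+ under the remaining FDs."""
    for item in list(fds):
        rest = [x for x in fds if x != item]
        if item[1] <= closure(item[0], rest):
            fds = rest
    return fds


def minimal_cover(fds):
    # Step 1: decomposition rule, one attribute on each right side.
    h = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            item = (frozenset(lhs), frozenset([a]))
            if item not in h:
                h.append(item)
    h = drop_inessential(h)
    # Step 3: try to shrink multi-attribute left sides; redo Step 2 after a change.
    reduced = True
    while reduced:
        reduced = False
        for i, (lhs, rhs) in enumerate(h):
            for b in sorted(lhs):
                smaller = lhs - {b}
                if smaller and rhs <= closure(smaller, h):
                    h[i] = (smaller, rhs)
                    reduced = True
                    break
            if reduced:
                break
        if reduced:
            h = drop_inessential(h)
    # Step 4: union rule, merge right sides that share a left side.
    m = {}
    for lhs, rhs in h:
        m[lhs] = m.get(lhs, frozenset()) | rhs
    return m


# F: A -> B, C -> B, D -> A B C, A C -> D (attributes are single letters here).
F = [(frozenset("A"), frozenset("B")), (frozenset("C"), frozenset("B")),
     (frozenset("D"), frozenset("ABC")), (frozenset("AC"), frozenset("D"))]
for lhs, rhs in minimal_cover(F).items():
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

On this F the sketch drops D -> B in Step 2, finds nothing to shrink in Step 3, and merges D -> A and D -> C in Step 4, matching the set M worked out by hand.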
Section 6.7. Lossy and Lossless decomposition. We're going to be
factoring tables into smaller tables (projecting onto two subsets of
columns that cover all columns and have some columns in common),
but it doesn't always work: when we join back, we don't always keep
all information of the original table.
Always get ALL rows back, but might get MORE. Lose Information.
See Example 6.7.1 in text, Pg. 374, a Lossy decomposition.
Ex 6.7.1. A Lossy Decomposition. Consider table, ABC:

    Table ABC
    A    B    C
    a1   100  c1
    a2   200  c2
    a3   300  c3
    a4   200  c4

If we factor this table into two parts, AB and BC, we get the following
table contents:

    Table AB          Table BC
    A    B            B    C
    a1   100          100  c1
    a2   200          200  c2
    a3   300          300  c3
    a4   200          200  c4

However, the result of joining these two tables is

    AB JOIN BC
    A    B    C
    a1   100  c1
    a2   200  c2
    a2   200  c4
    a3   300  c3
    a4   200  c2
    a4   200  c4

This is NOT the original table content for ABC! Note that the same
decomposed tables AB and BC would have arisen if the table we had
started with was ABCX, with content equal to AB JOIN BC above, or either
of two other tables, ABCY or ABCZ:

    ABCY              ABCZ
    A    B    C       A    B    C
    a1   100  c1      a1   100  c1
    a2   200  c2      a2   200  c2
    a2   200  c4      a3   300  c3
    a3   300  c3      a4   200  c2
    a4   200  c4      a4   200  c4
Since we can't tell what table content we started from, information has
been lost by this decomposition and the subsequent join.
This is known as a Lossy Decomposition, or sometimes a Lossy-Join
Decomposition.
Reason we lose information is many to many matching on join
columns. Lose which one on the left was with which one on the right.
E.g. a2, 200 on left and a4, 200 on left match with 200, c2 and 200,
c4 on right.
Would be OK if always had N-1 relationships on join columns (or 1-1).
E.g., CAP, orders can have multiple rows with same pid, but pid is
unique for products. Must ALWAYS be unique on one side, not an
accident, so need join columns to be SUPERKEY on at least ONE SIDE
of join.
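
A small Python sketch that replays the lossy decomposition of table ABC with sets of tuples. The variable names are illustrative only.

```python
# Table ABC from the example, as a set of (A, B, C) rows.
abc = {("a1", 100, "c1"), ("a2", 200, "c2"), ("a3", 300, "c3"), ("a4", 200, "c4")}

# Project onto AB and BC (set semantics remove duplicate rows).
ab = {(a, b) for a, b, c in abc}
bc = {(b, c) for a, b, c in abc}

# Natural join of AB and BC on the common column B.
joined = {(a, b, c) for a, b in ab for b2, c in bc if b == b2}

# Rows that the join produced but the original table never contained.
extra = joined - abc

print(len(abc), len(joined))   # 4 6
print(sorted(extra))           # [('a2', 200, 'c4'), ('a4', 200, 'c2')]
```

Every original row comes back, plus two spurious ones, both from the many-to-many match on B = 200.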
Theorem 6.7.3. Given a table T with a set of FDs F and a set of
attributes X in Head(T), then X is a superkey of T iff X functionally
determines all attributes in T, i.e., X -> Head(T) is in F+.
Theorem 6.7.4. Given a table T with a set F of FDs valid on T, then a
decomposition of T into two tables {T1, T2} is a lossless decomposition
if one of the following functional dependencies is implied by F:
(1) Head(T1) ∩ Head(T2) -> Head(T1), or
(2) Head(T1) ∩ Head(T2) -> Head(T2).
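
The condition in the lossless decomposition theorem can be checked mechanically with attribute closure: the common columns must functionally determine all of one side. A hedged Python sketch follows. The function names are made up for illustration, and the FD B -> C in the second call is a hypothetical constraint, not one that holds for the ABC content above.

```python
def closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def meets_lossless_condition(head1, head2, fds):
    """True if Head(T1) ∩ Head(T2) determines Head(T1) or Head(T2) under fds."""
    common_closure = closure(head1 & head2, fds)
    return head1 <= common_closure or head2 <= common_closure


ab, bc = {"A", "B"}, {"B", "C"}

# No FDs at all: B determines neither side, so the condition fails.
print(meets_lossless_condition(ab, bc, []))                 # False

# Hypothetical FD B -> C: now B is a superkey of BC, condition holds.
print(meets_lossless_condition(ab, bc, [({"B"}, {"C"})]))   # True
```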
COST_CPU(Q_k)

Similarly for I/O. Tells us how many disks to buy, how powerful a CPU.
(Costs go up approximately linearly with CPU power within small ranges.)
Ideally, the weights W1 and W2 used to calculate COST(PLAN) from I/O and
CPU costs will reflect actual costs of equipment.
Good DBA tries to estimate workload in advance to make purchases, have
equipment ready for a new application.
*** Note that one other major expense is response time. Could have poor
response time (1-2 minutes) even on lightly loaded inexpensive system.
This is because many queries need to perform a LOT of I/O, and some
commercial database systems have no parallelism: all I/Os in sequence.
But long response time also has a cost, in that it wastes user time. Pay
workers for wasted time. Employees quit from frustration and others must
be trained. Vendors are trying hard to reduce response times.
Usually there exist parallel versions of these systems as well; worth it if
extremely heavy I/O and response time is a problem, while number of
queries running at once is small compared to number of disks.
Note that in plans that follow, we will not try to estimate CPU. Too hard.
Assume choose plan with best I/O and try to estimate that. Often total cost
is proportional to I/O since each I/O entails extra CPU cost as well.
Statistics. Need to gather, put in System tables. System does not
automatically gather them with table load, index create, updates to table.
In DB2, use Utility called RUNSTATS. Fig. 9.3, pg. 538.

    RUNSTATS ON TABLE username.tablename
        [WITH DISTRIBUTION [AND DETAILED] {INDEXES ALL | INDEX indexname}]
        [other clauses not covered or deferred]

e.g.: runstats on table poneil.customers;
System learns how many rows, how many data pages, stuff about indexes,
etc., placed in catalog tables.
ORACLE uses ANALYZE command. Fig. 9.4, pg. 538.

    ANALYZE {INDEX | TABLE | CLUSTER}
        [schema.] {indexname | tablename | clustername}
        {COMPUTE STATISTICS | other alternatives not covered}
        {FOR TABLE | FOR ALL [INDEXED] COLUMNS [SIZE n]
        | other alternatives not covered}

Will see how statistics kept in catalog tables in DB2 a bit later.
Retrieving the Query Plans. In DB2 and ORACLE, perform SQL statement. In
DB2:

    EXPLAIN PLAN [SET QUERYNO = n] [SET QUERYTAG = 'string'] FOR
        explainable-sql-statement;

For example:

    explain plan set queryno = 1000 for
        select * from customers
        where city = 'Boston' and discnt between 12 and 14;

The Explain Plan statement puts rows in a "plan_table" to represent
individual procedural steps in a query plan. Can get rows back by:

    select * from plan_table where queryno = 1000;
Recall that a Query Plan is a sequence of procedural access steps that carry
out a program to answer the query. Steps are peculiar to the DBMS.
From one DBMS to another, difference in steps used is like difference
between programming languages. Can't learn two languages at once.
We will stick to a specific DBMS in what follows, MVS DB2, so we can end up
with an informative benchmark we had in the first edition.
But we will have occasional references to ORACLE, DB2 UDB. Here is the
ORACLE Explain Plan syntax.

    EXPLAIN PLAN [SET STATEMENT_ID = 'text-identifier']
        [INTO [schema.]tablename]
        FOR explainable-sql-statement;

This inserts a sequence of rows into a user created DB2/ORACLE table
known as PLAN_TABLE, one row for each access step. To learn more about
this, see ORACLE8 documentation named in text.
Need to understand what basic procedural access steps ARE in the particular
product you're working with.
The set of steps allowed is the "bag of tricks" the query optimizer can use.
Think of these procedural steps as the "instructions" a compiler can use to
create "object code" in compiling a higher-level request.
A system that has a smaller bag of tricks is likely to have less efficient
access plans for some queries.
MVS DB2 (and the architecturally allied DB2 UDB) have a wide range of
tricks, but not bitmap indexing or hashing capability.
Still, very nice capabilities for range search queries, and probably the most
sophisticated query optimizer.
Basic procedural steps covered in the next few Sections (thumbnail):

    Table Scan                        Look through all rows of table
    Unique Index Scan                 Retrieve row through unique index
    Unclustered Matching Index Scan   Retrieve multiple rows through a
                                      non-unique index, rows not same order
    Clustered Matching Index Scan     Retrieve multiple rows through a
                                      non-unique clustered index
    Index-Only Scan                   Query answered in index, not rows

Note that the steps we have listed access all the rows restricted by the
WHERE clause in some single table query.
Need two tables in the FROM clause to require two steps of this kind and
Joins come later. A multi-step plan for a single table query will also be
covered later.
Such a multi-step plan on a single table is one that combines multiple
indexes to retrieve data. Up to then, only one index per table can be used.
9.2 Table Space Scans and I/O
Single step. The plan table (plan_table) will have a column ACCESSTYPE
with value R (ACCESSTYPE = R for short).
Example 9.2.1. Table Space Scan Step. Look through all rows in table
to answer query, maybe because there is no index that will help.
Assume in DB2 an employees table with 200,000 rows, each row of 200
bytes, each 4 KByte page just 70% full. Thus 2800 usable bytes, 14
rows/pg. Need CEIL(200,000/14) = 14,286 pages.
Consider the query:

    select eid, ename from employees where socsecno = 113353179;

If there is no index on socsecno, only way to answer query is by reading in
all rows of table. (Stupid not to have an index if this query occurs with any
frequency at all!)
In Table Space Scan, might not stop when find proper row, since statistics
for a non-indexed column might not know socsecno is unique.
Therefore have to read all pages in table in from disk, and
COST_I/O(PLAN) = 14286 I/Os. Does this mean 14286 random I/Os? Maybe not.
But if we assume random I/O, then at 80 I/Os per second, need about
14286/80 = 178.6 seconds, a bit under three minutes. This dominates CPU
by a large factor and would predict elapsed time quite well.
Homework 4, non-dotted Exercises through Exercise 9.9. This is due when
we finish Section 9.6.
Assumptions about I/O
We are now going to talk about I/O assumptions a bit. First, there might be
PARALLELISM in performing random I/Os from disk.
The pages of a table might be striped across several different disks, with the
database system making requests in parallel for a single query to keep all
the disk arms busy. See Figure 9.5, pg. 542.
When the first edition came out, it was rare for most DBMS systems to make
multiple requests at once (a form of parallelism), now it's common.
DB2 has a special form of sequential prefetch now where it stripes 32 pages
at a time on multiple disks, requests them all at once.
While parallelism speeds up TOTAL I/O per second (especially if there's only
one user process running), it doesn't really save any RESOURCE COST.
If it takes 12.5 ms (0.0125 seconds) to do a random I/O, doesn't save
resources to do 10 random I/Os at once on 10 different disks.
Still have to make all the same disk arm movements, cost to rent the disk
arms is the same if there is parallelism: just spend more per second.
Will speed things up if there are few queries running, fewer than the number
of disks, and there is extra CPU not utilized. Then can use more of the disk
arms and CPUs with this sort of parallelism.
Parallelism shows up best when there is only one query running!
But if there are lots of queries compared to number of disks and accessed
pages are randomly placed on disks, probably keep all disk arms busy
already.
But there's another factor operating. Two disk pages that are close to each
other on one disk can be read faster because there's a shorter seek time.
Recall that the system tries to make extents contiguous on disk, so I/Os in
sequence are faster. Thus, a table that is made up of a sequence of (mainly)
contiguous pages, one after another within a track, will take much less time
to read in.
In fact it seems we should be able to read in successive pages at full
transfer speed, which would take about .00125 secs per page.
Used to be that by the time the disk controller has read in the page to a
memory buffer and looked to see what the next page request is, the page
immediately following has already passed by under the head.
But now with multiple requests to the disk outstanding, we really COULD get
the disk arm to read in the next disk page in sequence without a miss.
Another factor supports this speedup: the typical disk controller buffers an
entire track in its memory whenever a disk page is requested.
Reads in whole track containing the disk page, returns the page requested,
then if later request is for page in track doesn't have to access disk again.
So when we're reading in pages one after another on disk, it's like we're
reading from the disk an entire track at a time.
I/O is about TEN TIMES faster for disk pages in sequence compared to
randomly placed I/O. (Accurate enough for rule of thumb.)
PUT ON BOARD: We can do 800 I/Os per second when pages in sequence
(S) instead of 80 for randomly placed pages (R). Sequential I/O takes
0.00125 secs instead of 0.0125 secs for random I/O.
DB2 Sequential Prefetch makes this possible even if turn off buffering on
disk (which actually hurts performance of random I/O, since reads whole
track it doesn't need: adds 0.008 sec to random I/O of 0.0125 sec).
IBM puts a lot of effort into making I/O requests sequentially in a query plan
to gain this I/O advantage!
Example 9.2.2. Table Space Scan with Sequential Advantage. The
14286R of Example 9.2.1 becomes 14286S (S for Sequential Prefetch I/O
instead of Random I/O). And 14286S requires 14286/800 = 17.86 seconds
instead of the 178.6 seconds of 14286R. Note that this is a REAL COST
SAVINGS, that we are actually using the disk arm for a smaller period.
Striping reduces elapsed time but not COST.
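
The table space scan arithmetic can be replayed in a few lines of Python. The constant names are illustrative. The 80 and 800 I/Os per second figures are the rules of thumb from these notes, and a 4 KByte page is treated as 4000 bytes, as the notes' "2800 usable bytes" implies.

```python
RANDOM_IO_PER_SEC = 80        # R: randomly placed pages
SEQUENTIAL_IO_PER_SEC = 800   # S: sequential prefetch

PAGE_BYTES = 4000             # 4 KByte page, rounded as in the notes
PCT_FULL = 70                 # each page just 70% full
ROW_BYTES = 200
ROWS = 200_000

usable_bytes = PAGE_BYTES * PCT_FULL // 100      # 2800
rows_per_page = usable_bytes // ROW_BYTES        # 14
pages = -(-ROWS // rows_per_page)                # ceiling division: 14286

random_secs = pages / RANDOM_IO_PER_SEC          # cost of 14286R
sequential_secs = pages / SEQUENTIAL_IO_PER_SEC  # cost of 14286S

print(pages)                        # 14286
print(round(random_secs, 1))        # 178.6
print(round(sequential_secs, 2))    # 17.86
```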
Cover idea of List Prefetch. 32 pages, not in perfect sequence, but relatively
close together. Difficult to predict time.
We use the rule of thumb that List Prefetch reads in 200 pages per second.
See Figure 9.10, page 546, for table.
Plan table row for an access step will have PREFETCH = S for sequential
prefetch, PREFETCH = L for list prefetch, PREFETCH = blank if random I/O.
See Figure 9.10. And of course ACCESSTYPE = R when really Random I/O.
Note that sequential prefetch is just becoming available on UNIX database
systems. Often just put a lot of requests out in parallel and depend on
smart I/O system to use arm efficiently.
Class 14.
Exam 1.
Class 15.
9.3 Simple Indexed Access in DB2.
Index helps efficiency of query plan. There is a great deal of complexity
here. Remember, we are not yet covering queries with joins: only one table
in FROM clause and NO subquery in WHERE clause.
Examples with tables: T1, T2, . . , columns C1, C2, C3, . .
Example 9.3.1. Assume index C1X exists on column C1 of table T1 (always
a B-tree secondary index in DB2). Consider:

    select * from T1 where C1 = 10;

This is a Matching Index Scan. In plan table: ACCESSTYPE = I, ACCESSNAME
= C1X, MATCHCOLS = 1. (MATCHCOLS might be >1 in multiple column
index.)
Perform matching index scan by walking down B-tree to LEFTMOST entry of
C1X with C1 = 10. Retrieve row pointed to.
Loop through entries at leaf level from left to right until run out of entries
with C1 = 10. For each such entry, retrieve row pointed to. No assumption
about clustering or non-clustering of rows here.
In Example 9.3.2, assume other restrictions in WHERE clause, but matching
index scan used on C1X. Then other restrictions are validated as rows are
accessed (row is qualified: look at row, check if matches restrictions).
Not all predicates are indexable. In DB2, indexable predicate is one that can
be used in a matching index scan, i.e. a lookup that uses a contiguous
section of an index. Covered in full in Section 9.5.
For example, looking up words in the dictionary that start with the letters
'pre' is a matching index scan. Looking up words ending with 'tion' is not.
DB2 considers the predicate C1 <> 10 to be non-indexable. It is not
impossible that an index will be used in a query with this predicate:

    select * from T1 where C1 <> 10;

But the statistics usually weigh against its use and so the query will be
performed by a table space scan. More on indexable predicates later.
OK, now what about query:

    select * from T1 where C1 = 10 and C2 between 100 and 200
        and C3 like 'A%';

These three predicates are all indexable. If have only C1X, will be like
previous example with retrieved rows restricted by tests on other two
predicates.
If have index combinx, created by:

    create index combinx on T1 (C1, C2, C3) . . .

Will be able to limit (filter) RIDs of rows to retrieve much more completely
before going to data. Like books in a card catalog, looking up
authorlname = 'James' (c1 = 10) and authorfname between 'H' and 'K'
and title begins with letter 'A'.
Finally, we will cover the question of how to filter the RIDs of rows to
retrieve if we have three indexes, C1X, C2X, and C3X. This is not simple.
See how to do this by taking out cards for each index, ordering by RID, then
merge-intersecting.
It is an interesting query optimization problem whether this is worth it.
OK, now some examples of simple index scans.
Example 9.3.3. Index Scan Step, Unique Match. Continuing with
Example 9.2.1, employees table with 200,000 rows of 200 bytes and pctfree
= 30, so 14 rows/pg and CEIL(200,000/14) = 14,286 data pages. Assume in
index on eid, also have pctfree = 30, and eid||RID takes up 10 bytes, so 280
entries per pg, and CEIL(200,000/280) = 715 leaf level pages. Next level up
CEIL(715/280) = 3. Root next level up. Write on board:

    employees table: 14,286 data pages
    index on eid, eidx: 715 leaf nodes, 3 level 2 nodes, 1 root node.

Now query: select ename from employees where eid = '12901A';
Root, one level 2 node, 1 leaf node, 1 data page. Seems like 4R. But what
about buffered pages? Five minute rule says should purchase enough
memory so pages referenced more frequently than about once every 120
seconds (popular pages) should stay in memory. Assume we have done
this. If workload assumes 1 query per second of this on ename with eid =
predicate (no others on this table), then leaf nodes and data pages not
buffered, but upper nodes of eidx are. So really 2R is cost of query.
This Query Plan is a single step, with ACCESSTYPE = I, ACCESSNAME = eidx,
MATCHCOLS = 1.
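
The level-by-level node counts for the eid index follow from repeated ceiling division. A small Python sketch, assuming (as the notes do) that directory nodes hold the same number of entries per page as leaf nodes. The function name is made up for illustration.

```python
def btree_level_sizes(entries, entries_per_page):
    """Node counts from the leaf level up to the root, by repeated ceiling division."""
    sizes = []
    nodes = -(-entries // entries_per_page)      # leaf pages
    sizes.append(nodes)
    while nodes > 1:
        nodes = -(-nodes // entries_per_page)    # one directory level up
        sizes.append(nodes)
    return sizes


# eidx: 10-byte entries, 2800 usable bytes per page (pctfree = 30) -> 280 per page.
entries_per_page = 2800 // 10
print(btree_level_sizes(200_000, entries_per_page))   # [715, 3, 1]
```

The list reads leaf level first, so the eid index has 715 leaf nodes, 3 nodes at the next level, and 1 root.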
Class 16.
OK now we introduce a new table called prospects. Based on direct mail
applications (junk mail). People fill out warranty cards, name hobbies,
salary range, address, etc.
50M rows of 400 bytes each. FULL data pages (pctfree = 0) and on all
indexes: 10 rows on 4 KByte page, so 5M data pages.

    prospects table: 5M data pages

Now: create index addrx on prospects (zipcode, city, straddr) cluster . . .;
zipcode is integer or 4 bytes, city requires 12 bytes, straddr 20 bytes, RID 4
bytes, and assume NO duplicate values so no compression.
Thus each entry requires 40 bytes, and we can fit 100 on a 4 KByte page.
With 50M total entries, that means 500,000 leaf pages. 5000 directory
nodes at level 3. 50 level 2 node pages. Then root page. Four levels.
Also assume a nonclustering hobbyx index on hobbies, 100 distinct hobbies
(. . . cards, chess, coin collecting, . . .). We say CARD(hobby) = 100.
(Like we say CARD(zipcode) = 100,000. Not all possible integer zipcodes
can be used, but for simplicity say they are.)
Duplicate compression on hobbyx, each key (8 bytes?) amortized over 255
RIDs (more?), so can fit 984 RIDs (or more) per 4 KByte page, call it 1000.
Thus 1000 entries per leaf page. With 50M entries, have 50,000 leaf pages.
Then 50 nodes at level 2. Then root.
index on eid, eidx: 715 leaf nodes, 3 level 2 nodes, 1 root node.

    prospects table           addrx index             hobbyx index
    50,000,000 rows           500,000 leaf pages      50,000 leaf pages
    5,000,000 data pages      5,000 level 3 nodes     50 level 2 nodes
    (10 rows per page)        50 level 2 nodes        1 root node
                              1 root node             (1000 entries/leaf)
    CARD(zipcode) = 100,000                           CARD(hobby) = 100

Figure 9.12. Some statistics for the prospects table, page 552
Example 9.3.4. Matching Index Scan Step, Unclustered Match.
Consider the following query:

    select name, straddr from prospects where hobby = 'chess';

Query optimizer assumes each of 100 hobbies equally likely (knows there
are 100 from RUNSTATS), so restriction cuts 50M rows down to 500,000.
Walk down hobby index (2R for directory nodes) and across 500,000 entries
(1000 per page so 500 leaf pages, sequential prefetch so 500S).
For every entry, read in row -- non clustered so all random choices out of 5M
data pages, 500,000 distinct I/Os (not in order, so R), 500,000R.
Total I/O is 500S + 500,002R. Time is 500/800 + 500,002/80, about
500,000/80 = 6250 seconds. Or about 1.75 hours (2 hrs = 7200 secs).
Really only picking up 500,000 distinct rows, will lie on less than 500,000
pages (out of 5 M). Would this mean less than 500,000 R because buffering
keeps some pages around for double/triple hits?
VERY TRIVIAL EFFECT! Hours of access, 120 seconds pages stay in buffer.
Can generally assume that upper level index pages are buffer resident (skip
2R) but leaf level pages and maybe one level up are not. Should calculate
index time and can then ignore it if insignificant.
If we used a table space scan for Example 9.3.4, qualifying rows to ensure
hobby = 'chess', how would time compare to what we just calculated?
Simple: 5M pages using sequential prefetch, 5,000,000/800 = 6250 seconds.
(Yes, CPU is still ignored - in fact is relatively insignificant.)
But this is the same elapsed time as for indexed access of 1/100 of rows!!
Yes, surprising. But 10 rows per page so about 1/10 as many pages hit, and
S is 10 times as fast as R.
Query optimizer compares these two approaches and chooses the faster
one. Would probably select Table Space Scan here. But minor variation in
CARD(hobby) could make either plan a better choice.
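
A quick Python check of the two plans for the hobby = 'chess' query, using the 80 R and 800 S per second rules of thumb from these notes. Variable names are illustrative only.

```python
R_PER_SEC = 80      # random I/Os per second
S_PER_SEC = 800     # sequential prefetch I/Os per second

ROWS = 50_000_000
DATA_PAGES = 5_000_000
CARD_HOBBY = 100
ENTRIES_PER_LEAF = 1000

# Plan 1: unclustered matching index scan on hobbyx.
matching_rows = ROWS // CARD_HOBBY                 # 500,000 rows qualify
leaf_pages = matching_rows // ENTRIES_PER_LEAF     # 500 leaf pages, read as S
index_plan_secs = (2 / R_PER_SEC                   # 2R directory walk
                   + leaf_pages / S_PER_SEC        # 500S across the leaf level
                   + matching_rows / R_PER_SEC)    # 500,000R, one per row

# Plan 2: table space scan with sequential prefetch.
scan_plan_secs = DATA_PAGES / S_PER_SEC            # 5,000,000S

print(round(index_plan_secs, 2))   # 6250.65
print(scan_plan_secs)              # 6250.0
```

The two plans come out within a second of each other, which is why a small change in CARD(hobby) can tip the choice either way.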
Example 9.3.5. Matching Index Scan Step, Clustered Match.
Consider the following query:

    select name, straddr from prospects
        where zipcode between 02159 and 03158;

Recall CARD(zipcode) = 100,000. Range of zipcodes is 1000. Therefore, cut
number of rows down by a factor of 1/100. SAME AS 9.3.4.
Bigger index entries. Walk down to leaf level and walk across 1/100 of leaf
level: 500,000 leaf pages, so 5000 pages traversed. I/O of 5000S.
And data is clustered by index, so walk across 1/100 of 5M data pages,
50,000 data pages, and they're in sequence on disk, so 50,000S.
Compared to unclustered index scan of Example 9.3.4, walk across 1/10 as
many pages and do it with S I/O instead of R. Ignore directory walk.
Then I/O cost is 55,000S, with elapsed time 55,000/800 = 68.75 seconds, a
bit over 1 minute, compared with 1.75 hrs for unclustered index scan.
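
The clustered case can be worked the same way. A short Python sketch, assuming the 800 sequential I/Os per second rule of thumb stated earlier in these notes. Names are illustrative.

```python
S_PER_SEC = 800                  # sequential prefetch I/Os per second

CARD_ZIPCODE = 100_000
ZIP_RANGE = 1_000                # zipcodes 02159 through 03158
ADDRX_LEAF_PAGES = 500_000
DATA_PAGES = 5_000_000

# Fraction of the table selected is 1000/100,000 = 1/100, so divide by this.
reduction = CARD_ZIPCODE // ZIP_RANGE         # 100

leaf_io = ADDRX_LEAF_PAGES // reduction       # 5,000S across the index leaf level
data_io = DATA_PAGES // reduction             # 50,000S across clustered data pages
total_io = leaf_io + data_io                  # 55,000S, directory walk ignored

print(total_io)                  # 55000
print(total_io / S_PER_SEC)      # 68.75
```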
The difference between Examples 9.3.4 and 9.3.5 doesn't show up in the
PLAN table. Have to look at ACCESSNAME = addrx and note that this index
is clustered (clusterratio), whereas ACCESSNAME = hobbyx is not.
(1) Clusterratio determines if index still clustered in case rows exist that
don't follow clustering rule. (Inserted when no space left on page.)
(2) Note that entries in addrx are 40 bytes, rows of prospects are 400 bytes.
Seems natural that 5000S for index, 50,000S for rows.
Properties of index:
1. Index has directory structure, can retrieve range of values
2. Index entries are ALWAYS clustered by values
3. Index entries are smaller than the rows.
Example 9.3.6. Concatenated Index, Index-Only Scan. Assume (just
for this example) a new index, naddrx:

    create index naddrx on prospects (zipcode, city, straddr, name)
        . . . cluster . . .;

Now same query as before:

    select name, straddr from prospects where zipcode
        between 02159 and 03158;

Can be answered in INDEX ONLY (because find range of zipcodes and read
name and straddr off components of index). Show components:

    naddrx keyvalue: zipcodeval.cityval.straddrval.nameval

This is called an Index Only scan, and with EXPLAIN plan table gets new
column: INDEXONLY = Y (ACCESSTYPE = I, ACCESSNAME = naddrx).
Previous plans had INDEXONLY = N.
(All these columns always reported; I just mention them when relevant.)
Time? Assume naddrx takes 60 bytes instead of 40 bytes, then amount
read in index, instead of 5000S is 7500S, elapsed time 7500/800 = 9.4
seconds. Compare to 68.75 seconds (55,000S at 800 per second) with
Example 9.3.5.
Valuable idea, Index Only. Select count(*) from . . . is always index only if
index can do in a single step at all, since count entries.
But can't build index on the spur of the moment. If don't have needed one
already, out of luck. E.g., consider query:

    select name, straddr, age from prospects where zipcode
        between 02159 and 02258;

Now naddrx doesn't have all needed components. Out of luck.
If try to foresee all needed components in an index, essentially duplicating
the rows, and lose performance boost from size.
Indexes cost something. Disk media cost (not commonly crucial). With
inserts or updates of indexed rows, lot of extra I/O (not common).
With read-only, like prospects table, load time increases. Still often have
every col of a read-only table indexed.
Chapter 9.4. Filter Factors and Statistics
Recall, estimated probability that a random row made some predicate true.
By statistics, determine the fraction (FF(pred)) of rows retrieved.
E.g., hobby column has 100 values. Generally assume uniform distribution,
and get: FF(hobby = const) = 1/100 = .01.
And zipcode column has 100,000 values, FF(zipcode = const) = 1/100,000.
FF(zipcode between 02159 and 03158) = 1000 · (1/100,000) = 1/100.
How does the DB2 query optimizer make these estimates?
DB2 statistics.
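
The uniform-distribution filter factors above are simple fractions. A minimal Python sketch using exact rational arithmetic follows. The function names ff_equal and ff_value_range are invented for illustration and are not DB2 terms.

```python
from fractions import Fraction


def ff_equal(colcard):
    """FF(col = const) under the uniform assumption: 1 / number of distinct values."""
    return Fraction(1, colcard)


def ff_value_range(values_in_range, colcard):
    """FF for a range covering `values_in_range` of the column's distinct values."""
    return values_in_range * ff_equal(colcard)


ROWS = 50_000_000

print(ff_equal(100))                          # 1/100   FF(hobby = const)
print(ff_equal(100_000))                      # 1/100000  FF(zipcode = const)
print(ff_value_range(1000, 100_000))          # 1/100   zipcode between 02159 and 03158
print(ROWS * ff_value_range(1000, 100_000))   # 500000  estimated rows retrieved
```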
See Figure 9.13, pg. 558. After use RUNSTATS, these statistics are up to
date. (Next pg. of these notes.) Other statistics as well, not covered.
DON'T WRITE THIS ON BOARD -- SEE IN BOOK
    Catalog      Statistic      Default            Description
    Name         Name           Value
    ----------   ------------   ----------------   ---------------------------------
    SYSTABLES    CARD           10,000             Number of rows in the table
                 NPAGES         CEIL(1+CARD/20)    Number of data pages containing
                                                   rows
    SYSCOLUMNS   COLCARD        25                 Number of distinct values in this
                                                   column
                 HIGH2KEY       n/a                Second highest value in this
                                                   column
                 LOW2KEY        n/a                Second lowest value in this column
    SYSINDEXES   NLEVELS        0                  Number of Levels of the Index
                                                   B-tree
                 NLEAF          CARD/300           Number of leaf pages in the Index
                                                   B-tree
                 FIRSTKEYCARD   25                 Number of distinct values in the
                                                   first column, C1, of this key
                 FULLKEYCARD    25                 Number of distinct values in the
                                                   full key, all components: e.g.
                                                   C1.C2.C3
                 CLUSTERRATIO   0% if CLUSTERED    Percentage of rows of the table
                                = 'N', 95% if      that are clustered by these index
                                CLUSTERED = 'Y'    values

Figure 9.13. Some Statistics gathered by RUNSTATS used for access plan
determination
Statistics gathered into DB2 Catalog Tables named. Assume that index
might be composite, (C1, C2, C3).
Go over table. CARD, NPAGES for table. For column, COLCARD, HIGH2KEY,
LOW2KEY. For Indexes, NLEVELS, NLEAF, FIRSTKEYCARD, FULLKEYCARD,
CLUSTERRATIO. E.g., from Figure 9.12, statistics for prospects table (given
on pp. 552-3). Write these on Board.
SYSTABLES
NAME        CARD         NPAGES
. . .       . . .        . . .
prospects   50,000,000   5,000,000
. . .       . . .        . . .
SYSCOLUMNS
NAME      TBNAME      COLCARD   HIGH2KEY   LOW2KEY
. . .     . . .       . . .     . . .      . . .
hobby     prospects   100       Wines      Bicycling
zipcode   prospects   100000    99998      00001
. . .     . . .       . . .     . . .      . . .
SYSINDEXES
NAME     TBNAME      NLEVELS   NLEAF     FIRSTKEYCARD   FULLKEYCARD   CLUSTERRATIO
. . .    . . .       . . .     . . .     . . .          . . .         . . .
addrx    prospects   4         500,000   100,000        50,000,000    100
hobbyx   prospects   3         50,000    100            100           0
. . .    . . .       . . .     . . .     . . .          . . .         . . .
CLUSTERRATIO is a measure of how well the clustering property holds for an
index. With 80 or more, will use Sequential Prefetch in retrieving rows.
Indexable Predicates in DB2 and their Filter Factors
Look at Figure 9.14, pg. 560. QOPT guesses at Filter Factor. Product rule
assumes independent distributions of columns. Still no subquery predicate.
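A minimal sketch of the filter factor arithmetic, assuming uniform value distributions and independent columns as the notes do. The function names are illustrative only, not anything DB2 exposes; the numbers are the prospects statistics written on the board.

```python
# Filter factor arithmetic under the uniform-distribution and
# independence assumptions of these notes. Function names are
# illustrative, not DB2 internals.

def ff_equal(colcard):
    """FF(Col = const) = 1/COLCARD."""
    return 1.0 / colcard

def ff_in_list(list_size, colcard):
    """FF(Col in list) = (list size)/COLCARD."""
    return list_size / colcard

def ff_and(ff1, ff2):
    """Product rule: assumes the two predicates are independent."""
    return ff1 * ff2

def ff_or(ff1, ff2):
    """FF(P1 or P2) = FF(P1) + FF(P2) - FF(P1)*FF(P2), as in probability."""
    return ff1 + ff2 - ff1 * ff2

def ff_not(ff):
    """FF(not P) = 1 - FF(P)."""
    return 1.0 - ff

hobby = ff_equal(100)          # hobby has 100 distinct values
zipcode = ff_equal(100_000)    # zipcode has 100,000 distinct values
rows = 50_000_000              # CARD for prospects

# Expected rows for: zipcode = const and hobby = const
expected_rows = ff_and(zipcode, hobby) * rows
```

With these statistics the compound predicate is expected to select about 5 rows out of 50,000,000.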
Predicate Type                  Filter Factor             Notes
Col = const                     1/COLCARD                 "Col <> const" same as "not (Col = const)"
Col ∝ const                     Interpolation formula     "∝" is any comparison predicate other
                                                          than equality; an example follows
Col < const or                  Interpolation formula     LOW2KEY and HIGH2KEY are estimates for
Col <= const                                              extreme points of the range of Col values
Col between const1              Interpolation formula     "Col not between const1 and const2" same
and const2                                                as "not (Col between const1 and const2)"
Col in list                     (list size)/COLCARD       "Col not in list" same as "not (Col in list)"
Col is null                     1/COLCARD                 "Col is not null" same as "not (Col is null)"
Col like 'pattern'              Interpolation formula     Based on the alphabet
Pred1 and Pred2                 FF(Pred1) * FF(Pred2)     As in probability
Pred1 or Pred2                  FF(Pred1) + FF(Pred2)     As in probability
                                - FF(Pred1) * FF(Pred2)
not Pred1                       1 - FF(Pred1)             As in probability
Figure 9.20. Filter Factor formulas for various predicate types

Class 17.
Matching Index Scans with Composite Indexes
(Finish -> 9.6 Class 19, homework due Class 20 (Wed, April 12))
Assume new index mailx:
create index mailx on prospects (zipcode, hobby, incomeclass, age);
NOT clustered. Column incomeclass has 10 distinct values, age has 50.
FULLKEYCARD(mailx) could be as much as
CARD(zipcode) * CARD(hobby) * CARD(incomeclass) * CARD(age) =
100,000 * 100 * 10 * 50 = 5,000,000,000.
Can't be that much, only 50,000,000 rows, so assume FULLKEYCARD is
50,000,000, with no duplicate rows. (Actually, 50M darts in 5G slots. About
1/100 of slots hit, so only about 1% duplicate keyvalues.)
Entries for mailx have length: 4 (integer zipcode) + 8 (hobby) + 2
(incomeclass) + 2 (age) + 4 (RID) = 20 bytes. So 200 entries per page.
NLEAF = 50,000,000/200 = 250,000 pages. Next level up has 1,250 nodes,
next level 7, next is root, so NLEVELS = 4.
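A rough sketch of the index sizing arithmetic. It assumes about 4000 usable bytes per 4 KB page (chosen to reproduce the 200 entries per page above) and the same fanout at every level; both are simplifications, and btree_shape is a made-up helper name. Taking 200 entries per page at face value gives 250,000 leaf pages, the NLEAF value the SYSINDEXES row for mailx lists.

```python
import math

def btree_shape(num_rows, entry_bytes, usable_page_bytes=4000):
    """Return (entries_per_page, nleaf, nlevels) for a B-tree index.

    Assumes every level has the same fanout as the leaf level and that
    about 4000 bytes of each 4 KB page hold entries (an assumption
    chosen to match the 200 entries per page used in the notes).
    """
    per_page = usable_page_bytes // entry_bytes
    nleaf = math.ceil(num_rows / per_page)
    levels, nodes = 1, nleaf
    while nodes > 1:                       # climb until a single root node remains
        nodes = math.ceil(nodes / per_page)
        levels += 1
    return per_page, nleaf, levels

# mailx entry: zipcode + hobby + incomeclass + age + RID
entry = 4 + 8 + 2 + 2 + 4
per_page, nleaf, nlevels = btree_shape(50_000_000, entry)
```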
SYSINDEXES
NAME     TBNAME      NLEVELS   NLEAF     FIRSTKEYCARD   FULLKEYCARD   CLUSTERRATIO
. . .    . . .       . . .     . . .     . . .          . . .         . . .
mailx    prospects   4         250,000   100,000        50,000,000    0
. . .    . . .       . . .     . . .     . . .          . . .         . . .
Example 9.5.1. Concatenated Index, Matching Index Scan.
select name, straddr from prospects where
zipcode = 02159 and hobby = 'chess' and incomeclass = 10;
Matching Index Scan here means that the three predicates in the WHERE
clause match the INITIAL columns in concatenated mailx index.
Argue that matching means all entries to be retrieved are contiguous in the
index.
Full filter factor for three predicates given is 1/100,000 * 1/100 * 1/10 =
1/100M, with 50M rows, so only 0.5 rows selected. 0.5R.
Interpret this probabilistically, and expected time for retrieving rows is only
1/80 second. Have to add index I/O of course. 2R, .05 sec.
Example 9.5.2. Concatenated Index, Matching index scan.
select name, straddr from prospects
where zipcode between 02159 and 04158
and hobby = 'chess' and incomeclass = 10;
Now, important. Not one contiguous interval in index. There is one interval
for: z = 02159 and h = 'c' and inc = 10, and then another for z = 02160 and
h = 'c' and inc = 10, and . . . But there is stuff between them.
Analogy in telephone directory: last name between 'Sma' and 'Smz' and
first name 'John'. Lot of directory to look through, not all matches.
Query optimizer here traverses from leftmost z = 02159 to rightmost z =
04158 and uses h = 'c' and inc = 10 as screening predicates.
We say the first predicate is a MATCHING predicate (used for cutting down
interval of index considered) and other two are SCREENING predicates.
(This MATCHING predicate is what we mean by Matching Index Scan.)
So index traversed is: (2000/100,000) (filter factor) of 250,000 leaf pages,
= 5,000 leaf pages. Query optimizer actually calculates FF as
(04158 - 02159)/(HIGH2KEY - LOW2KEY) = 2000/(99998 - 00001)
= 2000/99997 or approximately 2000/100,000 = 1/50.
Have to look through 1/50 * NLEAF = 5000 pages, I/O cost is 5,000S with
elapsed time: 5,000/400 = 12.5 seconds.
How many rows retrieved? (1/50)(1/100)(1/10) = (1/50,000) with 50M rows,
so 1000 rows retrieved. Sequential, class?
No. 1000R, with elapsed time 1000/40 = 25 seconds. Total elapsed time is
37.5 secs.
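The elapsed-time estimate can be checked with a small sketch. The I/O rates are the rules of thumb used throughout these notes (random 40/sec, sequential prefetch 400/sec, list prefetch 100/sec); NLEAF = 250,000 is taken from the mailx SYSINDEXES row. The helper names are made up for the example.

```python
# Rule-of-thumb I/O rates from these notes (I/Os per second).
RATES = {"R": 40, "S": 400, "L": 100}   # random, sequential prefetch, list prefetch

def elapsed(ios, kind):
    """Seconds needed for the given number of I/Os of one kind."""
    return ios / RATES[kind]

def interpolated_ff(lo, hi, low2key, high2key):
    """Range filter factor by linear interpolation between the key extremes."""
    return (hi - lo) / (high2key - low2key)

# zipcode between 02159 and 04158, with LOW2KEY = 00001, HIGH2KEY = 99998
ff_zip = interpolated_ff(2159, 4158, 1, 99998)          # close to 1/50

index_pages = round((1 / 50) * 250_000)                 # leaf pages scanned
rows = round((1 / 50) * (1 / 100) * (1 / 10) * 50_000_000)  # rows fetched

# Index leaf pages by sequential prefetch, data rows by random I/O.
total = elapsed(index_pages, "S") + elapsed(rows, "R")
```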
Example 9.5.3. Concatenated Index, Non-Matching Index Scan.
select name, straddr from prospects where
hobby = 'chess' and incomeclass = 10 and age = 40;
Like saying First name = 'John' and City = 'Waltham' and street = 'Main'.
Have to look through whole index, no matching column, only screening
predicates.
Still get small number of rows back, but have to look through whole index.
250,000S. Elapsed time 250,000/400 = 625 seconds, about 10.5 minutes.
Number of rows retrieved: (1/100)(1/10)(1/50)(50,000,000) = 1000. 1000R
= 25 seconds.
In PLAN TABLE, for Example 9.5.2, have ACCESSTYPE = I, ACCESSNAME =
mailx, MATCHCOLS = 1; In Example 9.5.3, have MATCHCOLS = 0.
Definition 9.5.1. Matching Index Scan. A plan to execute a query where at
least one indexable predicate must match the first column of an index
(known as matching predicate, matching index). May be more.
What is an indexable predicate? Equal match predicate is one: Col = const.
See Definition 9.5.3. Pg. 565
Say have index C1234X on table T, composite index on columns (C1, C2, C3,
C4). Consider following compound predicates.
C1 = 10 and C2 = 5 and C3 = 20 and C4 = 25    (matches all columns)
C2 = 5 and C3 = 20 and C1 = 10                (matches first three: needn't be in order)
C2 = 5 and C4 = 22 and C1 = 10 and C6 = 35    (matches first two)
C2 = 5 and C3 = 20 and C4 = 25                (NOT a matching index scan)
Screening predicates are ones that match non-leading columns in index.
E.g., in first example all are matching, in second all are matching, in third,
two are matching, one is screening, and one is not in index, in fourth all
three are screening.
Finish through Section 9.6 by next class. Homework due next class
(Wednesday after Patriot's day). NEXT homework is rest of Chapter 9 non-
dotted exercises if you want to work ahead.
Definition 9.5.2. Basic Rules of Matching Predicates
(1) A matching predicate must be an indexable predicate. See pg. 560,
Table 9.14 for a list of indexable predicates.
(2) Matching predicates must match successive columns, C1, C2, . . . of an
index. Procedure: Look at index columns from left to right. If find a
matching predicate for this column, then this is a matching column. As soon
as column fails to be matching terminate the search.
Idea is that sequence of matching predicates cuts down index search to
smaller contiguous range. (One exception: In-list predicate, covered
shortly.)
(3) A non-matching predicate in an index scan can still be a screening
predicate.
Look at rule (1) again. This is actually a kind of circular definition. For a
predicate to be matching it must be indexable and:
Definition 9.5.3. An indexable predicate is one that can be used to match
a column in a matching index scan.
Calling such a predicate indexable is confusing. Even if a predicate is not
indexable, the predicate can use the index for screening.
Would be much better to call such predicates matchable, but this nomen-
clature is embedded in the field for now.
When K leading columns of index C1234X are matching for a query, EXPLAIN
into plan table, get ACCESSTYPE = I, ACCESSNAME = C1234X, MATCHCOLS
= K. When non-matching index scan, MATCHCOLS = 0.
Recall Indexable Predicates in Figure 9.14, pg. 560, and relate to telephone
directory. Does the predicate give you a contiguous range?
Col > const? between? In-list is special. like 'pattern' with no leading wild
card? Col1 like Col2? (same middle name as street name) Predicate and?
Predicate or? Predicate not?
OK, a few more rules on how to determine matching predicates. Page 566,
Def 9.5.4. Match cols. in index left to right until run out of predicates. But
(3) Stop at first range predicate (between, <, >, <=, >=, like).
(4) At most one In-list predicate.
In-list is special because it is considered a sequence of equal matching
predicates that the query optimizer agrees to bridge in the access plan.
C1 in (6, 8, 10) and C2 = 5 and C3 = 20 is like C1 = 6 and . . .;
Plan for C1 = 8 and . . .; then C1 = 10 and . . .; etc.
But the following has only two matching columns since only one in-list can
be used.
C1 in (6, 8, 10) and C2 = 5 and C3 in (20, 30, 40)
When In-list is used, say ACCESSTYPE = 'N'.
Example 9.5.4. In following examples, have indexes C1234X, C56X,
Unique index C7X.
(1) select C1, C5, C8 from T where C1 = 5 and C2 = 7 and C3 <> 9;
ACCESSTYPE = I, ACCESSNAME = C1234X, MATCHCOLS = 2
(2) select C1, C5, C8 from T where C1 = 5 and C2 >= 7 and C3 = 9;
C3 predicate is indexable but stop at range predicate.
ACCESSTYPE = I, ACCESSNAME = C1234X, MATCHCOLS = 2
(3) select C1, C5, C8 from T
where C1 = 5 and C2 = 7 and C5 = 8 and C6 = 13;
We ignore for now the possibility of combining multiple indexes.
Note, we don't know what QOPT will choose until we see plan table row.
ACCESSTYPE = I, ACCESSNAME = C56X, MATCHCOLS = 2
(4) select C1, C4 from T
where C1 = 10 and C2 in (5, 6) and (C3 = 10 or C4 = 11);
C1 and C2 predicates are matching. The "or" operator doesn't give
indexable predicate, but would be used as screening predicate (not
mentioned in plan table, but all predicates used to filter, ones that
exist in index certainly would be used when possible). ACCESSTYPE = 'N',
ACCESSNAME = C1234X, MATCHCOLS = 2. Also: INDEXONLY = 'Y'
(5) select C1, C5, C8 from T where C1 = 5 and C2 = 7 and C7 = 101;
ACCESSTYPE = I, ACCESSNAME = C7X, MATCHCOLS = 1
(Because unique match, but nothing said about this in plan table)
(6) select C1, C5, C8 from T
where C2 = 7 and C3 = 10 and C4 = 12 and C5 = 16;
Will see can't be multiple index. Either non-matching in C1234X or
matching on C56X. ACCESSTYPE = I, ACCESSNAME = C1234X,
MATCHCOLS = 2
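The left-to-right matching rules can be sketched as a small function. This is only an illustration of the rules as stated in these notes, not how DB2 implements them; the predicate kinds ("eq", "range", "in", "other") and the function name are labels made up for the example.

```python
def matchcols(index_cols, preds):
    """Count matching columns for an index, scanning left to right.

    preds maps a column name to the kind of predicate on it:
    "eq" (Col = const), "range" (between, <, >, <=, >=, like),
    "in" (in-list), or "other" (not indexable, e.g. <> or an OR).
    """
    matched, used_in_list = 0, False
    for col in index_cols:
        kind = preds.get(col)
        if kind == "eq":
            matched += 1
        elif kind == "range":
            matched += 1
            break                    # stop at first range predicate
        elif kind == "in" and not used_in_list:
            matched += 1
            used_in_list = True      # at most one in-list predicate
        else:
            break                    # no matching predicate for this column
    return matched

C1234X = ["C1", "C2", "C3", "C4"]

# Cases (1), (2) and (4) above, restricted to the columns of C1234X.
case1 = matchcols(C1234X, {"C1": "eq", "C2": "eq", "C3": "other"})
case2 = matchcols(C1234X, {"C1": "eq", "C2": "range", "C3": "eq"})
case4 = matchcols(C1234X, {"C1": "eq", "C2": "in"})
```

All three cases come out with two matching columns, agreeing with the MATCHCOLS values given for them.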
Some Special Predicates
Pattern Match Search. Leading wildcards not indexable.
"C1 like 'pattern'" with a leading '%' in pattern (or leading '_')? Like looking in
dictionary for all words ending in 'tion'. Non-matching scan (non-indexable
predicate).
Exist dictionaries that index by backward spelling, and DBA can use this
trick: look for word with match: Backwards = 'noit%'
Expressions. Use in predicate makes not indexable.
select * from T where 2*C1 <= 56;
DB2 doesn't do algebra. You can re-write: where C1 <= 28;
Never indexable if two different columns are used in predicate: C1 = C2.
One-Fetch Access. Select min/max . . .
select min(C1) from T;
Look at index C1234X, leftmost value, read off value of C1. Say have index
C12D3X on T (C1, C2 DESC, C3). Each of following qs has one-fetch access.
select min(C1) from T where C1 > 5; (NOT obviously 5)
select min(C1) from T where C1 between 5 and 6;
select max(C2) from T where C1 = 5;
select max(C2) from T where C1 = 5 and C2 < 30;
select min(C3) from T where C1 = 6 and C2 = 20 and C3 between 6 and 9;
Class 18.
9.6 Multiple Index Access
Assume index C1X on (C1), C2X on (C2), C345X on (C3, C4, C5), query:
(9.6.1) select * from T where C1 = 20 and C2 = 5 and C3 = 11;
By what we've seen up to now, would have to choose one of these indexes.
Then only one of these predicates could be matched.
Other two predicates are not even screening predicates. Don't appear in
index, so must retrieve rows and validate predicates from that.
BUT if only use FF of one predicate, terrible inefficiency may occur. Say T
has 100,000,000 rows, FF for each of these predicates is 1/100.
Then after applying one predicate, get 1,000,000 rows retrieved. If none of
indexes are clustered, will take elapsed time of: 1,000,000/40 = 25,000
seconds, or nearly seven hours.
If somehow we could combine the filter factors, get composite filter factor of
(1/100)(1/100)(1/100) = 1/1,000,000, only retrieve 100 rows.
Trick. Multiple index access. For each predicate, matching on different
index, extract RID list. Sort it by RID value.
(Draw a picture - three extracts from leaf of wedge into lists)
Intersect (AND) all RID lists (Picture) (easy in sorted order). Result is list of
RIDs for answer rows. Use List prefetch to read in pages. (Picture)
This is our first multiple step access plan.
See Figure 9.15. Put on board.
TNAME   ACCESSTYPE   MATCHCOLS   ACCESSNAME   PREFETCH   MIXOPSEQ
T       M            0                        L          0
T       MX           1           C1X          S          1
T       MX           1           C2X          S          2
T       MX           1           C345X        S          3
T       MI           0                                   4
T       MI           0                                   5
Figure 9.15 Plan table rows of a Multiple Index Access plan for Query (9.6.1)
Row M, start Multiple index access. Happens at the end, after Intersect.
Diagram steps in picture. RID lists placed in RID Pool in memory.
Note Plan acts like reverse polish calculation: push RID lists as created by
MX step; with MI, pop two and intersect, push result back on stack.
MX steps require reads from index. MI steps require no I/O: already in
memory, memory area known as RID Pool.
Final row access uses List prefetch, program disk arms for most efficient
path to bring in 32 page blocks that are not contiguous, but sequentially
listed. Most efficient disk arm movements.
(Remember that an RID consists of (page_number, slot_number), so if RIDs
are placed in ascending order, can just read off successive page numbers.)
Speed of I/O depends on distance apart. Rule of thumb for Random I/O is
40/sec, Seq. prefetch 400/sec, List prefetch is 100/sec.
Have mentioned ACCESSTYPEs = M, MX, MI. One other type for multiple
index access, known as MU (to take the Union (OR) of two RID lists). E.g.
(9.6.2) select * from T where C1 = 20 and (C2 = 5 or C3 = 11);
Here is Query Plan (Figure 9.16):
TNAME   ACCESSTYPE   MATCHCOLS   ACCESSNAME   PREFETCH   MIXOPSEQ
T       M            0                        L          0
T       MX           1           C1X          S          1
T       MX           1           C2X          S          2
T       MX           1           C345X        S          3
T       MU           0                                   4
T       MI           0                                   5
Figure 9.16 Plan table rows of a Multiple Index Access plan for Query (9.6.2)
Actually, query optimizer wouldn't generate three lists in a row if avoidable.
Tries not to have > 2 in existence (not always possible).
Figure 9.17. 1. MX C2X, 2. MX C345X, 3. MU, 4. MX C1X, 5. MI
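The reverse-polish reading of the plan rows can be sketched with a stack. This is a toy model, assuming RIDs are (page_number, slot_number) pairs and using Python sets for AND and OR where a real system would merge sorted lists; run_plan and the sample RID values are hypothetical.

```python
def run_plan(steps, rid_lists):
    """Evaluate MX/MI/MU plan steps with a stack, reverse-polish style.

    steps: list of (ACCESSTYPE, ACCESSNAME) pairs in MIXOPSEQ order.
    rid_lists: hypothetical mapping index name -> RIDs extracted by the
    matching scan, each RID a (page_number, slot_number) tuple.
    """
    stack = []
    for accesstype, name in steps:
        if accesstype == "MX":
            stack.append(sorted(rid_lists[name]))       # extract and sort RID list
        else:
            b, a = set(stack.pop()), set(stack.pop())   # pop two lists
            merged = a & b if accesstype == "MI" else a | b   # MI = AND, MU = OR
            stack.append(sorted(merged))                # push result back
    return stack.pop()

# Made-up RID lists for the three matching scans.
rids = {
    "C1X":   [(1, 2), (3, 1), (7, 4)],
    "C2X":   [(3, 1), (5, 5)],
    "C345X": [(7, 4), (9, 0)],
}

# Query (9.6.2): C1 = 20 and (C2 = 5 or C3 = 11), in the two step orderings.
plan_916 = [("MX", "C1X"), ("MX", "C2X"), ("MX", "C345X"), ("MU", None), ("MI", None)]
plan_917 = [("MX", "C2X"), ("MX", "C345X"), ("MU", None), ("MX", "C1X"), ("MI", None)]
answer = run_plan(plan_917, rids)
```

Both orderings give the same RID list; the second never holds more than two lists on the stack at once.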
Example 9.6.1. Multiple index access. prospects table, addrx index,
hobbyx index (see Figure 9.11, p 544), index on age (agex) and incomeclass
(incomex). What is NLEAF for these two, class? (Like hobbyx: 50,000.)
select name, straddr from prospects
where zipcode = 02159 and hobby = 'chess' and incomeclass = 10;
Same query as Example 9.5.1 when had zhobincage index, tiny cost, 2R in-
dex, only .5R for row. But now have three different indexes: PLAN.
TNAME   ACCESSTYPE   MATCHCOLS   ACCESSNAME   PREFETCH   MIXOPSEQ
T       M            0                        L          0
T       MX           1           hobbyx       S          1
T       MX           1           addrx        S          2
T       MI           0                                   3
T       MX           1           incomex      S          4
T       MI           0                                   5
Figure 9.18 Plan table rows of Multiple Index Access plan for Example 9.6.1
Calculate I/O cost. FF(hobby = 'chess') = 1/100, on hobbyx (NLEAF =
50,000), 500S (and ignore directory walk).
For MIXOPSEQ = 2, FF(zipcode = 02159) = 1/100,000 on addrx (NLEAF =
500,000), so 5S.
Intersect steps MI requires no I/O.
FF(incomeclass = 10) = 1/10, on incomex (NLEAF = 50,000), so 5,000S.
Still end up with 0.5 rows at the end by taking product of FFs for three
predicates: (1/100,000)(1/100)(1/10) = 1/100,000,000
Only 50,000,000 rows; ignore list prefetch of .5L.
So 5,505S, elapsed time 5,505/400 = 13.8 seconds.
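The same total, as a quick check. This sketch just multiplies NLEAF by the filter factor for each MX step and applies the 400/sec sequential prefetch rule of thumb; the variable names are made up.

```python
SEQ_PREFETCH_PER_SEC = 400   # rule of thumb from these notes

# (index, NLEAF, filter factor of its matching predicate)
mx_steps = [
    ("hobbyx",  50_000,  1 / 100),
    ("addrx",   500_000, 1 / 100_000),
    ("incomex", 50_000,  1 / 10),
]

# Leaf pages scanned by each MX step; MI steps cost no I/O.
leaf_pages = [round(nleaf * ff) for _, nleaf, ff in mx_steps]
total_s = sum(leaf_pages)
elapsed_seconds = total_s / SEQ_PREFETCH_PER_SEC
```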
List Prefetch and RID Pool
All the things we've been mentioning up to now have been relatively uni-
versal, but RID list rules are quite product specific. Probably won't see all
this duplicated in ORACLE.
-> We need RID extraction to access the rows with List Prefetch, but since
this is faster than Random I/O couldn't we always use RID lists, even in
single index scan to get at rows with List Prefetch in the end?
YES, BUT: There are restrictive rules that make it difficult. Most of these
rules arise because RID list space is a scarce resource in memory.
-> The RID Pool is a separate memory storage area that is 50% the size of
all four Disk Buffer Pools, except cannot exceed 200 MBytes.
Thus if DBA sets aside 2000 Disk Buffer Pages, will have 1000 pages (about
4 MBytes) of RID Pool. (Different memory area, though.)
Def. 9.6.1. Rules for RID List Use.
(1) QOPT, when it constructs query plan, predicts size of RIDs active at any
time. Cannot use > 50% of total capacity. If guess wrong, abort during
runtime.
(2) No Screening predicates can be used in an Index scan that extracts a
RID List.
(3) An In-list predicate cannot be used when extract a RID list.
Example 9.6.3. RID List Size Limit. addrx, hobbyx, incomex, add sexx
on column sex. Two values, 'M' and 'F', uniformly distributed.
select name, straddr from prospects where zipcode between
02159 and 02358 and incomeclass = 10 and sex = 'F';
Try multiple index access. Extract RID lists for each predicate and intersect
the lists.
But consider, sexx has 50,000 leaf pages. (Same as hobbyx, from
50,000,000 RID size entries.) Therefore sex = 'F' extracts 25,000 page RID
list. 4 Kbytes each page, nearly 100 MBytes.
Can only use half of RID Pool, so need 200 MByte RID Pool.
Not unreasonable. A proof that we should use large buffers.
Example 9.6.5. No Index Screening. Recall zhobincage index in
Example 9.3.8.
select name, straddr from prospects where zipcode between 02159
and 04158 and hobby = 'chess' and incomeclass = 10;
Two latter predicates were used as screening predicates. But screening
predicates can't be used in List Prefetch. To do List Prefetch, must resolve
two latter predicates after bring in rows.
The 'zipcode between' predicate has a filter factor of 2000/100,000 = 1/50,
so with 50M rows, get 1M rows, RID list size of 4 MBytes, 1000 buffer pages,
at least 2000 in pool. Assume RID pool is large enough.
So can do List Prefetch, 1,000,000/100 = 10,000 seconds, nearly 3 hours.
If use screening predicates, compound FF is (1/50)(1/100)(1/10) = 1/50,000.
End with 1000 rows. Can't use List Prefetch, so 1000R but
1000/40 is only 25 seconds. Same index I/O cost in both cases. Clearly
better not to use RID list.
What is reason not to allow List Prefetch with Screening Predicates? Just
because resource is scarce, don't want to tie up RID list space for a long
time while add new RIDs slowly. If refuse screening, QOPT might use other
plans without RIDs.
Point of Diminishing Returns in MX Access.
Want to use RID lists in MIX. >> Talk it
Def. 9.6.2. (1) In what follows, assume have multiple indexes, each with a
disjoint set of matching predicates from a query. Want to use RID lists.
(2) Way QOPT sees it: List RID lists from smallest size up (by increasing
Filter Factor). Thus start with smaller index I/O costs (matching only) and
larger effect in saving data page I/O.
(3) Generate successive RID lists and calculate costs; stop when cost of
generating new RID list doesn't save enough in eventual data page reads by
reducing RID set.
Example 9.6.6.
select name, straddr from prospects
where zipcode between 02159 and 02658
and age = 40 and hobby = 'chess' and incomeclass = 10;
(1) FF(zipcode between 02159 and 02658) = 500/100,000 = 1/200.
(2) FF(hobby = 'chess') = 1/100
(3) FF(age = 40) = 1/50 (Typo in text)
(4) FF(incomeclass = 10) = 1/10
Apply predicate 1. (1/200)(50M rows) retrieved, 250,000 rows, all on
separate pages from 5M, and 250,000L is 2500 seconds. Ignore the index
cost.
Apply predicate 2 after 1. Leaf pages of hobbyx scanned are (1/100)
(50,000) = 500S, taking 500/400 = 1.25 seconds. Reduce number of rows
retrieved from 250,000 to about 2500, all on separate pages, and 2500L
takes about 25 seconds. So at cost of 1.25 seconds of index I/O, reduce
table page I/O from 2500 seconds to 25 seconds. Clearly worth it.
Apply predicate 3 after 1 and 2. Leaf pages of agex scanned are (1/50)
(50,000) = 1000S, taking 1000/400 = 2.5 seconds. Rows retrieved down to
(1/50)(2500) = 50, 50L is .5 seconds. Thus at cost of 2.5 seconds of index
I/O reduce table page I/O from 25 seconds to 0.5 seconds. Worth it.
With predicate 4, need index I/O at leaf level of (1/10)(50,000) = 5000S or
12.5 seconds. Not enough table page I/O left (0.5 seconds) to pay for it.
Don't use predicate 4. We have reached point of diminishing returns.
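A sketch of the stopping rule, under the assumptions of this example: every index after the first has 50,000 leaf pages, every retrieved row sits on a separate page, and the cost of the first RID list is ignored. choose_predicates is a hypothetical helper, not an actual QOPT routine.

```python
SEQ, LIST = 400, 100          # I/Os per second, rules of thumb from the notes
ROWS = 50_000_000
NLEAF = 50_000                # hobbyx, agex, incomex: 50,000 leaf pages each

def choose_predicates(first_ff, others):
    """others: (name, ff) pairs sorted by increasing filter factor."""
    rows = ROWS * first_ff            # rows left after the first RID list
    used = []
    for name, ff in others:
        index_cost = ff * NLEAF / SEQ             # seconds to extract the RID list
        saving = (rows - rows * ff) / LIST        # seconds of data page I/O avoided
        if index_cost >= saving:
            break                                 # point of diminishing returns
        used.append(name)
        rows *= ff
    return used, rows

used, rows_left = choose_predicates(
    1 / 200, [("hobby", 1 / 100), ("age", 1 / 50), ("incomeclass", 1 / 10)])
```

The loop keeps the hobby and age RID lists and stops before incomeclass, leaving about 50 rows to fetch.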
Chapter 10
Class 20.
We have encountered the idea of a transaction before in Embedded SQL.
Def. 10.1 Transaction. A transaction is a means to package together a
number of database operations performed by a process, so the database
system can provide several guarantees, called the ACID properties.
Think of writing: BEGIN TRANSACTION op1 op2 . . . opN END TRANSACTION
Then all ops within the transaction are packaged together.
There is no actual BEGIN TRANSACTION statement in SQL. A transaction is
begun by a system when there is none in progress and the application first
performs an operation that accesses data: Select, Insert, Update, etc.
The application logic can end a transaction successfully by executing:
exec sql commit work; /* called simply a Commit */
Then any updates performed by operations in the transaction are success-
fully completed and made permanent and all locks on data items are re-
leased. Alternatively:
exec sql rollback work; /* called an Abort */
means that the transaction was unsuccessful: all updates are reversed, and
locks on data items are released.
The ACID guarantees are extremely important -- This and SQL is what
differentiates a database from a file system.
Imagine that you are trying to do banking applications on the UNIX file
system (which has its own buffers, but no transactions). There will be a
number of problems, the kind that faced database practitioners in the 50s.
1. Inconsistent result. Our application is transferring money from one
account to another (different pages). One account balance gets out to disk
(run out of buffer space) and then the computer crashes.
When bring computer up again, have no idea what used to be in memory
buffer, and on disk we have destroyed money.
2. Errors of concurrent execution. (One kind: Inconsistent Analysis.)
Teller 1 transfers money from Acct A to Acct B of the same customer, while
Teller 2 is performing a credit check by adding balances of A and B. Teller 2
can see A after transfer subtracted, B before transfer added.
3. Uncertainty as to when changes become permanent. At the very
least, we want to know when it is safe to hand out money: don't want to
forget we did it if system crashes, then only data on disk is safe.
Want this to happen at transaction commit. And don't want to have to write
out all rows involved in transaction (teller cash balance -- very popular, we
buffer it to save reads and want to save writes as well).
To solve these problems, systems analysts came up with idea of transaction
(formalized in 1970s). Here are ACID guarantees:
Atomicity. The set of record updates that are part of a transaction are
indivisible (either they all happen or none happen). This is true even in
presence of a crash (see Durability, below).
Consistency. If all the individual processes follow certain rules (money is
neither created nor destroyed) and use transactions right, then the rules
won't be broken by any set of transactions acting together. Implied by
Isolation, below.
Isolation. Means that operations of different transactions seem not to be
interleaved in time -- as if ALL operations of one Tx before or after all
operations of any other Tx.
Durability. When the system returns to the logic after a Commit Work
statement, it guarantees that all Tx Updates are on disk. Now ATM machine
can hand out money.
The system is kind of clever about Durability. It doesn't want to force all
updated disk pages out of buffer onto disk with each Tx Commit.
So it writes a set of notes to itself on disk (called logs). After crash run
Recovery (also called Restart) and makes sure notes translate into ap-
propriate updates.
What about Read-Only Tx? (No data updates, only Selects.) Atomicity and
Durability have no effect, but Isolation does.
Money spent on Transactional systems today is about SIX BILLION DOLLARS
A YEAR. We're being rigorous about some of this for a BUSINESS reason.
10.1 Transactional Histories.
Reads and Writes of data items. A data item might be a row of a table or it
might be an index entry or set of entries. For now talking about rows.
Read a data item when access it without changing it. Often a select.
select val into :pgmval1 from T1 where uniqueid = A;
We will write this as Ri(A): transaction with identification number i reads
data item A. Kind of rough -- won't always be retrieving by uniqueid
= A. But it means that we are reading a row identified as A. Now:
update T1 set val = :pgmval2 where uniqueid = B;
we will write this as Wj(B); Tx j writes B; say Update results in Write.
Can get complicated. Really reading an index entry as well to write B.
Consider:
update T1 set val = val + 2 where uniqueid = B;
Have to read an index entry, Rj(predicate: uniqueid = B), then a pair of row
operations: Rj(B) (have to read it first, then update it) Wj(B). Have to read
it in this case before can write it.
update T set val = val + 2 where uniqueid between :low and :high;
Will result in a lot of operations: Rj(predicate: uniqueid between :low and
:high), then Rj(B1) Wj(B1) Rj(B2) Wj(B2) . . . Rj(BN) Wj(BN).
>>The reason for this notation is that often have to consider complex inter-
leaved histories of concurrent transactions; Example history:
(10.1.2) . . . R2(A) W2(A) R1(A) R1(B) R2(B) W2(B) C1 C2 . . .
Note Ci means commit by Tx i. A sequence of operations like this is known
as a History or sometimes a Schedule.
A history results from a series of operations submitted by users, translated
into R & W operations at the level of the Scheduler. See Fig. 10.1.
It is the job of the scheduler to look at the history of operations as it comes
in and provide the Isolation guarantee, by sometimes delaying some
operations, and occasionally insisting that some transactions be aborted.
By this means it assures that the sequence of operations is equivalent in
effect to some serial schedule (all ops of a Tx are performed in sequence
with no interleaving with other transactions). See Figure 10.1, pg. 640.
In fact, (10.1.2) above is an ILLEGAL schedule. Because we can THINK of a
situation where this sequence of operations gives an inconsistent result.
Example 10.1.1. Say that the two elements A and B in (10.1.2) are Acct
records with each having balance 50 to begin with. Inconsistent Analysis.
T1 is adding up balances of two accounts, T2 is transferring 30 units from A
to B.
. . . R2(A, 50) W2(A, 20) R1(A, 20) R1(B, 50) R2(B, 50) W2(B, 80) C1 C2 . . .
And T1 determines that the customer fails the credit check (because under
balance total of 80, say).
But this could never have happened in a serial schedule, where all opera-
tions of T2 occurred before or after all operations of T1.
. . . R2(A, 50) W2(A, 20) R2(B, 50) W2(B, 80) C2 R1(A, 20) R1(B, 80) C1 . . .
or
. . . R1(A, 50) R1(B, 50) C1 R2(A, 50) W2(A, 20) R2(B, 50) W2(B, 80) C2 . . .
And in both cases, T1 sees total of 100, a Consistent View.
Notice we INTERPRETED the Reads and Writes of (10.1.2) to create a model
of what was being read and written to show there was an inconsistency.
This would not be done by the Scheduler. It simply follows a number of
rules we explain shortly. We maintain that a serial history is always
consistent under any interpretation.
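The interpretation in Example 10.1.1 can be replayed with a toy sketch. It assumes each write stores the value the same transaction last read plus a delta, which is just one way to model the transfer; run_history is a made-up helper, not anything a Scheduler does.

```python
def run_history(history, balances):
    """Replay a history of (op, tx, item, delta) tuples against balances.

    op is "R" or "W"; delta is the amount a write adds to the value the
    same transaction last read (this interpretation is an assumption
    made for the example). Returns the values each transaction read.
    """
    db = dict(balances)
    seen = {}
    for op, tx, item, delta in history:
        if op == "R":
            seen.setdefault(tx, {})[item] = db[item]
        else:
            db[item] = seen[tx][item] + delta
    return seen

start = {"A": 50, "B": 50}

# History (10.1.2): T2 transfers 30 from A to B while T1 adds up A and B.
interleaved = [("R", 2, "A", 0), ("W", 2, "A", -30), ("R", 1, "A", 0),
               ("R", 1, "B", 0), ("R", 2, "B", 0), ("W", 2, "B", 30)]

# Serial schedule: all of T2, then all of T1.
serial = [op for op in interleaved if op[1] == 2] + \
         [op for op in interleaved if op[1] == 1]

total_interleaved = sum(run_history(interleaved, start)[1].values())
total_serial = sum(run_history(serial, start)[1].values())
```

In the interleaved history T1 adds up 70; in the serial one it sees the consistent total of 100.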
10.2. Interleaved Read/Write Operations
Quick>>
If a serial history is always consistent, why don't we just enforce serial
histories.
The scheduler could take the first operation that it encounters of a given
transaction (T2 in the above example) and delay all ops of other Txs (the
Scheduler is allowed to do this) until all operations of T2 are completed and
the transaction commits (C2).
Reason we don't do this? Performance. It turns out that an average Tx has
relatively small CPU bursts and then I/O during which CPU has nothing to do.
See Fig 10.3, pg. 644. When I/O is complete, CPU can start up again.
What do we want to do? Let another Tx run (call this another thread) during
slack CPU time. (Interleave.) Doesn't help much if have only one disk (disk
is bottleneck). See Fig 10.4, pg. 644.
But if we have two disks in use all the time we get about twice the
throughput. Fig 10.5, pg. 645.
And if we have many disks in use, we can keep the CPU 100% occupied. Fig
10.6, pg 646.
In actuality, everything doesn't work out perfectly evenly as in Fig 10.6.
Have multiple threads and multiple disks, and like throwing darts at slots.
Try to have enough threads running to keep lots of disk occupied so CPU is
90% occupied. When one thread does an I/O, want to find another thread
with completed I/O ready to run again.
Leave this to you -- covered in Homework.
10.3 Serializability and the Precedence Graph.
We want to come up with a set of rules for the Scheduler to allow opera-
tions by interleaved transactions and guarantee Serializability.
Serializability means the series of operations is EQUIVALENT to a Serial
schedule, where operations of Tx are not interleaved.
How can we guarantee this? First notice that if two transactions never
access the same data items, it doesn't matter that they're interleaved.
We can commute ops in the history of requests permitted by the scheduler
until all ops of one Tx are together (serial history). The operations don't
affect each other, and order doesn't matter.
We say that the Scheduler is reading a history H (order operations are
submitted) and is going to create a serializable history S(H) (by delay, etc.)
where operations can be commuted to a serial history.
OK, now if we have operations by two different transactions that do affect
the same data item, what then?
There are only four possibilities: R or W by T1 followed by R or W by T2.
Consider history:
. . . R1(A) . . . W2(A) . . .
Would it matter if the order were reversed? YES. Can easily imagine an
interpretation where T2 changes data T1 reads: if T1 reads it first, sees old
version, if reads it after T2 changes it, sees later version.
We use the notation:
R1(A) <<_H W2(A)
to mean that R1(A) comes before W2(A) in H, and what we have just noticed
is that whenever we have the ordering in H we must also have:
R1(A) <<_S(H) W2(A)
That is, these ops must occur in the same order in the serializable sched-
ule put out by the Scheduler. If R1(A) <<_H W2(A) then R1(A) <<_S(H) W2(A).
Now these transaction numbers are just arbitrarily assigned labels, so it is
clear we could have written the above as follows:

    If R2(A) <<H W1(A) then R2(A) <<S(H) W1(A).

Here Tx 1 and Tx 2 have exchanged labels. This is another one of the four
cases. Now what can we say about the following?
    R1(A) <<H R2(A)

This can be commuted -- reads can come in any order since they don't affect
each other. Note that if there is a third transaction, T3, where:

    R1(A) <<H W3(A) <<H R2(A)

then the reads cannot be commuted (because we cannot commute either one
with W3(A)), but this is because of application of the earlier rules, not
because the reads affect each other.

Finally, we consider:

    W1(A) <<H W2(A)

And it should be clear that these two operations cannot commute. The
ultimate outcome of the value of A would change. That is:

    If W1(A) <<H W2(A) then W1(A) <<S(H) W2(A)
To summarize our discussion, we have Definition 10.3.1, pg. 650.

Def. 10.3.1. Two operations Xi(A) and Yj(B) in a history are said to conflict
(i.e., the order matters) if and only if the following three conditions hold:

(1) A = B (the same data item). Operations on distinct data items never
conflict.
(2) i ≠ j. Operations conflict only if they are performed by different Txs.
(3) One of the two operations X or Y is a write, W. (The other can be R or W.)

Note in connection with (2) that two operations of the SAME transaction also
cannot be commuted in a history, but not because they conflict. If the
scheduler delays the first, the second one will not be submitted.
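
The three conditions of Def. 10.3.1 translate directly into a small test. The sketch below is only an illustration: the Op tuple and the function name conflicts are hypothetical names introduced here, not notation from the text.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One operation in a history (hypothetical representation)."""
    kind: str  # "R" for read, "W" for write
    tx: int    # transaction number, the subscript in R1(A)
    item: str  # data item name, e.g. "A"


def conflicts(x: Op, y: Op) -> bool:
    """Return True when x and y conflict in the sense of Def. 10.3.1."""
    same_item = x.item == y.item            # condition (1)
    different_tx = x.tx != y.tx             # condition (2)
    one_is_write = "W" in (x.kind, y.kind)  # condition (3)
    return same_item and different_tx and one_is_write
```

For example, conflicts(Op("R", 1, "A"), Op("W", 2, "A")) holds, while two reads of A by different transactions do not conflict.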
Class 21.

We have just defined the idea of two conflicting operations. (Repeat?)

We shall now show how some histories can be shown not to be serializable.
Then we show that such histories can be characterized by an easily
identified characteristic in terms of conflicting operations.

To show that a history is not serializable (SR), we use an interpretation of
the history.
Def. 10.3.2. An interpretation of an arbitrary history H consists of 3 parts.
(1) A description of the purpose of the logic being performed.
(2) Specification of precise values for data items being read and written in
the history.
(3) A consistency rule, a property that is obviously preserved by isolated
transactions of the logic defined in (1).
Ex. 10.3.1. Here is a history, H1, that we claim is not SR.

    H1 = R2(A) W2(A) R1(A) R1(B) R2(B) W2(B) C1 C2

Here is an interpretation. T1 is doing a credit check, adding up the balances
of A and B. T2 is transferring money from A to B. Here is the consistency
rule: Neither transaction creates or destroys money. Values for H1 are:

    H1' = R2(A,50) W2(A,20) R1(A,20) R1(B,50) R2(B,50) W2(B,80) C1 C2
The schedule H1 is not SR because H1' shows an inconsistent result: T1 sees a
sum of 70 for balances A and B, though no money was destroyed by T2 in the
transfer from A to B. This could not have occurred in a serial execution.

The concept of conflicting operations gives us a direct way to confirm that
the history H1 is not SR. Note the second and third operations of H1,
W2(A) and R1(A). Since W(A) comes before R(A) in H1, we write:

    W2(A) <<H1 R1(A)

We know, since these operations conflict, that they must occur in the same
order in any equivalent serial history, S(H1), i.e.:

    W2(A) <<S(H1) R1(A)

Now in a serial history, all operations of a transaction occur together.
Thus W2(A) <<S(H1) R1(A) means that T2 <<S(H1) T1, i.e., T2 occurs before T1
in any serial history S(H1) (there might be more than one).
But now consider the fourth and sixth operations of H1. We have:

    R1(B) <<H1 W2(B)

Since these operations conflict, we also have:

    R1(B) <<S(H1) W2(B)

But this implies that T1 comes before T2, T1 <<S(H1) T2, in any equivalent
serial history S(H1). And this is at odds with our previous conclusion.

In any serial history S(H1), either T1 <<S(H1) T2 or T2 <<S(H1) T1, not both.
Since we conclude from examining H1 that both occur, S(H1) must not really
exist. Therefore, H1 is not SR.

We illustrate this technique a few more times, and then prove a general
characterization of SR histories in terms of conflicting operations.
Ex. 10.3.2. Consider the history:

    H2 = R1(A) R2(A) W1(A) W2(A) C1 C2

We give an interpretation of this history as a paradigm called a lost update.
Assume that A is a bank balance starting with the value 100, and T1 tries to
add 40 to the balance at the same time that T2 tries to add 50.

    H2' = R1(A,100) R2(A,100) W1(A,140) W2(A,150) C1 C2

Clearly the final result is 150, and we have lost the update where we added
40. This couldn't happen in a serial schedule, so H2 is non-SR.

In terms of conflicting operations, note that operations 1 and 4 imply that
T1 <<S(H2) T2. But operations 2 and 3 imply that T2 <<S(H2) T1. No SR
schedule could have both these properties, therefore H2 is non-SR.

By the way, this example illustrates that a conflicting pair of the form
R1(A) . . . W2(A) does indeed impose an order on the transactions, T1 << T2,
in any equivalent serial history.

H2 has no other types of pairs that COULD conflict and make H2 non-SR.
Ex. 10.3.3. Consider the history:

    H3 = W1(A) W2(A) W2(B) W1(B) C1 C2

This example will illustrate that a conflicting pair W1(A) . . . W2(A) can
impose an order on the transactions, T1 << T2, in any equivalent serial
history.

Understand that these are what are known as "Blind Writes": there are no
Reads at all involved in the transactions.

Assume the logic of the program is that T1 and T2 are both meant to "top
up" the two accounts A and B, setting the sum of the balances to 100.

T1 does this by setting A and B both to 50; T2 does it by setting A to 80 and
B to 20. Here is the result for the interleaved history H3.

    H3' = W1(A,50) W2(A,80) W2(B,20) W1(B,50) C1 C2

Clearly in any serial execution, the result would have A + B = 100. But with
H3' the end value for A is 80 and for B is 50.

To show that H3 is non-SR by using conflicting operations, note that
operations 1 and 2 imply T1 << T2, and operations 3 and 4 imply T2 << T1.
The argument that an interleaved history H is non-SR seems to reduce to
looking at conflicting pairs of operations and keeping track of the order in
which the transactions will occur in an equivalent S(H).

When there are two transactions, T1 and T2, we expect to find in a non-SR
schedule that T1 <<S(H) T2 and T2 <<S(H) T1, an impossibility.

If we don't have such an impossibility arise from conflicting operations in a
history H, does that mean that H is SR?

And what about histories with 3 or more transactions involved? Will we ever
see something impossible other than T1 <<S(H) T2 and T2 <<S(H) T1?

We start by defining a Precedence Graph. The idea here is that this allows
us to track all conflicting pairs of operations in a history H.

Def. 10.3.3. The Precedence Graph. A precedence graph for a history H is a
directed graph denoted by PG(H).

The vertices of PG(H) correspond to the transactions that have COMMITTED
in H: that is, transactions Ti where Ci exists as an operation in H.

An edge Ti -> Tj exists in PG(H) whenever two conflicting operations Xi and
Yj occur in that order in H. Thus, Ti -> Tj should be interpreted to mean that
Ti must precede Tj in any equivalent serial history S(H).

Whenever a pair of operations conflict in H for COMMITTED transactions, we
can draw the corresponding directed arc in the Precedence Graph, PG(H).

The reason uncommitted transactions don't count is that we're trying to
figure out what the scheduler can allow. Uncommitted transactions can
always be aborted, and then it will be as if they didn't happen.

It would be unfair to hold uncommitted transactions against the scheduler
by saying the history is non-SR because of them.
The examples we have given above of impossible conditions arising from
conflicting operations look like this:

    T1 ------------------------> T2
    T1 <------------------------ T2

Of course this is what is called a circuit in a directed graph (a digraph).
This suggests other problems that could arise with 3 or more Txs.

It should be clear that if PG(H) has a circuit, there is no way to put the
transactions in serial order so Ti always comes before Tj for all edges
Ti -> Tj in the circuit.

There'll always be one edge "pointing backward" in time, and that's a
contradiction, since Ti -> Tj means Ti should come BEFORE Tj in S(H).

How do we make this intuition rigorous? And if PG(H) doesn't have a circuit,
does that mean the history is SR?
Thm. 10.3.4. The Serializability Theorem. A history H has an equivalent
serial execution S(H) iff the precedence graph PG(H) contains no circuit.

Proof. Leave "only if" for exercises at end of chapter. I.e., will show there
that a circuit in PG(H) implies there is no serial ordering of transactions.

Here we prove that if PG(H) contains no circuit, there is a serial ordering of
the transactions so no edge of PG(H) ever points from a later to an earlier
transaction.

Assume there are m transactions involved, and label them T1, T2, . . ., Tm.
We are trying to find a reordering of the integers 1 to m, i(1), i(2), . . ., i(m),
so that Ti(1), Ti(2), . . ., Ti(m) is the desired serial schedule.
Assume a lemma to prove later: In any directed graph with no circuit there
is always a vertex with no edge entering it.

OK, so we are assuming PG(H) has no circuit, and thus there is a vertex, or
transaction, Tk, with no edge entering it. We choose Tk to be Ti(1).

Note that since Ti(1) has no edge entering it, there is no conflict in H that
forces some other transaction to come earlier.

(This fits our intuition perfectly. All other transactions can be placed after
it in time, and there won't be an edge going backward in time.)

Now remove this vertex, Ti(1), from PG(H), along with all edges leaving it.
Call the resulting graph PG1(H).

By the Lemma, there is now a vertex in PG1(H) with no edge entering it.
Call that vertex Ti(2).

(Note that an edge from Ti(1) might enter Ti(2), but that edge doesn't count
because it's been removed from PG1(H).)

Continue in this fashion, removing Ti(2) and all its edges to form PG2(H),
and so on, choosing Ti(3) from PG2(H), . . ., Ti(m) from PG(m-1)(H).

By construction, no edge of PG(H) will ever point backward in the sequence
S(H), from Ti(m) to Ti(n), m > n.

The algorithm we have used to determine this sequence is known as a
topological sort. This was a hiring question I saw asked at Microsoft.

The proof is complete, and we now know how to create an equivalent serial
schedule from a history whose precedence graph has no circuit.
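
The construction in the proof can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the text's own code: histories are written as strings such as "R1(A) W2(A) C1 C2", and the names parse, precedence_graph, and serial_order are hypothetical helpers introduced here.

```python
import re


def parse(history):
    """Turn 'R1(A) W2(A) C1' into (kind, tx, item) tuples; item is None for commits."""
    ops = []
    for kind, tx, item in re.findall(r"([RWC])(\d+)(?:\((\w+)\))?", history):
        ops.append((kind, int(tx), item or None))
    return ops


def precedence_graph(history):
    """Return (vertices, edges) of PG(H), counting committed transactions only."""
    ops = parse(history)
    committed = {tx for kind, tx, _ in ops if kind == "C"}
    edges = set()
    for a in range(len(ops)):
        ka, ta, ia = ops[a]
        for b in range(a + 1, len(ops)):
            kb, tb, ib = ops[b]
            if "C" in (ka, kb):
                continue  # commits never conflict with anything
            if ia == ib and ta != tb and "W" in (ka, kb):
                if ta in committed and tb in committed:
                    edges.add((ta, tb))  # Ti -> Tj: earlier op first
    return committed, edges


def serial_order(history):
    """Topological sort of PG(H); returns None when PG(H) has a circuit."""
    vertices, edges = precedence_graph(history)
    vertices, edges = set(vertices), set(edges)
    order = []
    while vertices:
        # Vertices with no edge entering them (the Lemma says one exists
        # whenever the remaining graph has no circuit).
        free = sorted(v for v in vertices if all(dst != v for _, dst in edges))
        if not free:
            return None
        first = free[0]
        order.append(first)
        vertices.remove(first)
        edges = {(s, d) for s, d in edges if s != first}  # drop edges leaving it
    return order
```

Applied to H1 = "R2(A) W2(A) R1(A) R1(B) R2(B) W2(B) C1 C2", serial_order returns None, since PG(H1) contains both T2 -> T1 and T1 -> T2.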
Lemma 10.3.5. In any finite directed acyclic graph G there is always a
vertex with no edges entering it.

Proof. Choose any vertex v1 from G. Either this has the desired property, or
there is an edge entering it from another vertex v2.

(There might be several edges entering v1, but choose one.)

Now v2 either has the desired property or there is an edge entering it from
vertex v3. We continue in this way, and either the sequence stops at some
vertex vm, or the sequence continues forever.

If the sequence stops at a vertex vm, that's because there is no edge
entering vm, and we have found the desired vertex.

But if the sequence continues forever, since this is a finite graph, sooner or
later in the sequence we will have to have a repeated vertex.

Say that when we add vertex vn, it is the same vertex as some previously
mentioned vertex in the sequence, vi.

Then there is a path vn -> v(n-1) -> . . . -> v(i+1) -> vi, where vi = vn. But
this is the definition of a circuit, which we said was impossible.

Therefore the sequence had to terminate with vm, and that vertex was the
one desired, with no edges entering.
Class 22.

10.4 Locking Ensures Serializability

See Fig. 10.8, pg. 609. The TM passes on calls such as fetch, select, insert,
delete, abort; the Scheduler interprets them as: Ri(A), Wj(B).

It is the job of the scheduler to make sure that no non-SR schedules get
past. This is normally done with Two-Phase Locking, or 2PL.
Def. 10.4.1. 2PL. Locks are taken and released following three rules.

(1) Before Tx i can read a data item, Ri(A), the scheduler attempts to Read
Lock the item on its behalf, RLi(A); before Wi(A), it tries a Write Lock, WLi(A).

(2) If a conflicting lock on the item exists, the requesting Tx must WAIT.
(Conflicting locks correspond to conflicting ops: two locks on a data item
conflict if they are attempted by different Txs and at least one of them is a
WL.)

(3) There are two phases to locking, the growing phase and the shrinking
phase (when locks are released: RUi(A)). The scheduler must ensure that a Tx
can't shrink (drop a lock) and then grow again (take a new lock).

Rule (3) implies a Tx can release locks before Commit. It is more usual to
release all locks at once on Commit, and we shall assume this in what follows.

Note that a transaction can never conflict with its own locks! If Ti holds RL
on A, it can get WL so long as no other Tx holds a lock (which must be RL).

A Tx with a WL doesn't need a RL (WL is more powerful than RL).
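
Rule (2), together with the remark that a transaction never conflicts with its own locks, can be sketched as a lock-compatibility test. The function name can_grant and the list-of-pairs representation of held locks are assumptions made for this illustration only.

```python
def can_grant(requested, requester, held):
    """Decide whether a lock request may be granted immediately.

    requested: "R" or "W", the mode being asked for
    requester: transaction number asking for the lock
    held:      list of (mode, tx) pairs for locks now held on the item
    """
    for mode, tx in held:
        if tx == requester:
            continue  # a Tx never conflicts with its own locks
        if "W" in (mode, requested):
            return False  # different Txs and at least one WL: must WAIT
    return True
```

With held = [("R", 1)], T1 may upgrade to a write lock, T2 may take a read lock, but T2 must wait for a write lock.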
Clearly locking is defined to guarantee that a circuit in the Precedence
Graph can never occur. The first Tx to lock an item forces any other Tx that
gets to it second to "come later" in any SG.

But what if the other Tx already holds a lock the first one now needs? This
would mean a circuit, but in the WAIT rules of locking it means NEITHER Tx
CAN EVER GO FORWARD AGAIN. This is a DEADLOCK. Example shortly.

A side effect of 2PL is that Deadlocks can occur. When a deadlock occurs, the
scheduler will recognize it and force one of the Txs involved to Abort.

(Note, there might be more than 2 Txs involved in a Deadlock.)
Ex. Here is a history that is not SR. (Error in text: this is a variant of
Ex. 10.4.1.)

    H4 = R1(A) R2(A) W2(A) R2(B) W2(B) R1(B) C1 C2

Same idea as 10.3.1 for why it is non-SR: T1 reads two balances that start
out A=50 and B=50, and T2 moves 30 from A to B. It is a non-SR history because
T1 sees A=50 and B=80. Now try locking, releasing locks at commit.
    RL1(A) R1(A) RL2(A) R2(A) WL2(A)
        (conflicting lock held by T1, so T2 must WAIT)
    RL1(B) R1(B) C1
        (now T2 can get WL2(A))
    W2(A) RL2(B) R2(B) WL2(B) W2(B) C2

Works fine: T1 now sees A=50, B=50. Serial schedule, T1 then T2.
But what if a Tx is allowed to Unlock and then acquire more locks later? We
get a non-SR schedule. This shows the necessity of 2PL Rule (3).

    RL1(A) R1(A) RU1(A) RL2(A) R2(A) WL2(A) W2(A) WU2(A)
    RL2(B) R2(B) WL2(B) W2(B) WU2(B) RL1(B) R1(B) C1 C2

So H4 above is possible. But the only 2PL rule broken is that T1 and T2
unlock rows, then lock other rows later.
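
Rule (3) is easy to check mechanically on a history extended with lock and unlock operations. The sketch below assumes the flat string notation used in these notes (RL1(A), WU2(B), and so on); is_two_phase is a hypothetical helper name, and it checks only Rule (3), not lock conflicts.

```python
import re


def is_two_phase(locked_history):
    """Return True if no transaction takes a lock after releasing one."""
    shrinking = set()  # transactions that have already released a lock
    # Match only lock/unlock ops such as RL1(, WU2(; plain R1( and W2( are skipped.
    for action, tx in re.findall(r"\b([RW][LU])(\d+)\(", locked_history):
        if action[1] == "U":
            shrinking.add(tx)
        elif tx in shrinking:
            return False  # a lock request during the shrinking phase
    return True
```

The schedule above that unlocks early fails this check (T1 releases A and later locks B), while the schedule that holds all locks until commit passes.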
The Waits-For Graph. This is how the scheduler checks if a deadlock occurs.
Vertices are currently active Txs; there is a directed edge Ti -> Tj iff Ti is
waiting for a lock held by Tj.

(Note, a Tx might be waiting for a lock held by several other Txs. And it
possibly gets in a queue for a W lock behind others who are also waiting.
Draw picture.)

The scheduler performs lock operations, and if a Tx is required to wait, it
draws the new directed edges that result, then checks for a circuit.
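
The circuit check on the Waits-For Graph can be sketched as a depth-first search. The dictionary representation and the name find_deadlock are assumptions for this illustration; a real scheduler would maintain the graph incrementally as lock requests block.

```python
def find_deadlock(waits_for):
    """Look for a circuit in a waits-for graph.

    waits_for maps each waiting Tx to the set of Txs holding locks it needs.
    Returns a list of the Txs on one circuit, or None if there is none.
    """
    path = []     # Txs on the current depth-first search path
    done = set()  # Txs already shown not to lead to a circuit

    def visit(tx):
        if tx in done:
            return None
        if tx in path:
            return path[path.index(tx):]  # found a circuit
        path.append(tx)
        for holder in sorted(waits_for.get(tx, ())):
            cycle = visit(holder)
            if cycle:
                return cycle
        path.pop()
        done.add(tx)
        return None

    for tx in sorted(waits_for):
        cycle = visit(tx)
        if cycle:
            return cycle
    return None
```

If T2 waits for T1 and T1 waits for T2, find_deadlock({2: {1}, 1: {2}}) reports a circuit containing both, and the scheduler would pick one of them as the victim.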
Ex. 10.4.2. Here is a schedule like H4 above, where T2 reverses the order in
which it touches A and B (it now touches B first), but the same example
shows it is non-SR.

    H5 = R1(A) R2(B) W2(B) R2(A) W2(A) R1(B) C1 C2

Locking result:

    RL1(A) R1(A) RL2(B) R2(B) WL2(B) W2(B) RL2(A) R2(A) WL2(A)
        (Fails: RL1(A) held, T2 must WAIT for T1 to complete and release locks)
    RL1(B)
        (Fails: WL2(B) held, T1 must wait for T2 to complete. But this is a
        deadlock! Choose T2 as victim (T1 chosen in text).)
    A2
        (now RL1(B) will succeed)
    R1(B) C1
        (start T2 over; on retry, it gets Tx number 3)
    RL3(B) R3(B) WL3(B) W3(B) RL3(A) R3(A) WL3(A) W3(A) C3

Locking serialized T1, then T2 (retried as T3).
Thm. 10.4.2. Locking Theorem. A history of transactional operations that
follows the 2PL discipline is SR.

First, Lemma 10.4.3. If H is a Locking Extended History that is 2PL and the
edge Ti -> Tj is in PG(H), then there must exist a data item D and two
conflicting operations Xi(D) and Yj(D) such that XUi(D) <<H YLj(D).

Proof. Since Ti -> Tj is in PG(H), there must be two conflicting ops Xi(D) and
Yj(D) such that Xi(D) <<H Yj(D).

By the definition of 2PL, there must be locking and unlocking ops on either
side of both ops, e.g.: XLi(D) <<H Xi(D) <<H XUi(D).

Now between the lock and unlock for Xi(D), the X lock is held by Ti, and
similarly for Yj(D) and Tj. Since X and Y conflict, the locks conflict and the
intervals cannot overlap. Thus, since Xi(D) <<H Yj(D), we must have:

    XLi(D) <<H Xi(D) <<H XUi(D) <<H YLj(D) <<H Yj(D) <<H YUj(D)

And in particular XUi(D) <<H YLj(D).
Proof of Thm. 10.4.2. We want to show that every 2PL history H is SR.

Assume in contradiction that there is a cycle T1 -> T2 -> . . . -> Tn -> T1 in
PG(H). By the Lemma, for each pair Tk -> T(k+1), there is a data item Dk
where XUk(Dk) <<H YL(k+1)(Dk). We write this out as follows:

    1.    XU1(D1) <<H YL2(D1)
    2.    XU2(D2) <<H YL3(D2)
    . . .
    n-1.  XU(n-1)(D(n-1)) <<H YLn(D(n-1))
    n.    XUn(Dn) <<H YL1(Dn)    (Note, T1 is T(n+1) too.)

Since each Tk is two-phase, its lock in line k-1 comes before its unlock in
line k, so the lines chain together: XU1(D1) <<H YL1(Dn).

But now we have (in 1.) an unlock of a data item by T1 before (in n.) a lock
of a data item by T1. So H is not 2PL after all. Contradiction.

Therefore H being 2PL implies no circuit in PG(H), and thus H is SR.
Note that not all SR schedules would obey 2PL. E.g., the following is SR:

    H7 = R1(A) W1(A) R2(A) R1(B) W1(B) R2(B) C1 C2

But it is not 2PL (T2 breaks through locks held by T1). We can optimistically
allow a Tx to break through locks in the hopes that a circuit won't
occur in PG(H). But most databases don't do that.
Class 23.

10.5 Levels of Isolation

The idea of Isolation Levels, defined in ANSI SQL-92, is that people might
want to gain more concurrency, even at the expense of imperfect isolation.

A paper by Tay showed that when there is serious loss of throughput due to
locking, it is generally not because of deadlock aborts (having to retry) but
simply because of transactions being blocked and having to wait.

Recall that the reason for interleaving transaction operations, rather than
just insisting on serial schedules, was so we could keep the CPU busy.

We want there to always be a new transaction to run when the running
transaction does an I/O wait.

But if we assume that a lot of transactions are waiting for locks, we lose
this. There might be only one transaction running even if we have 20 trying
to run. All but one of the transactions are in a wait chain!

So the idea is to be less strict about locking and let more transactions run.
The problem is that dropping proper 2PL might cause SERIOUS errors in
applications. But people STILL do it.

The idea behind ANSI SQL-99 Isolation Levels is to weaken how locks are
held. Locks aren't always taken, and even when they are, many locks are
released before EOT.

And more locks are taken after some locks are released in these schemes.
Not Two-Phase, so not perfect Isolation.

(Note in passing that ANSI SQL-92 was originally intended to define isolation
levels that did not require locking, but it has been shown that the definitions
failed to do this. Thus the locking interpretation is right.)
Define short-term locks to mean a lock is taken prior to the operation (R or
W) and released IMMEDIATELY AFTERWARD. This is the only alternative to
long-term locks, which are held until EOT.
Then ANSI SQL-92 Isolation levels are defined as follows (Fig. 10.9 -- some
difference from the text):

                      Write locks on    Read locks on     Read locks on
                      rows of a table   rows of a table   predicates are
                      are long term     are long term     long term

    Read Uncommitted  NA                No Read Locks     No Read Locks
    (Dirty Reads)     (Read Only)       taken at all      taken at all
    Read Committed    Yes               No                No
    Repeatable Read   Yes               Yes               No
    Serializable      Yes               Yes               Yes

Note that Write Predicate Locks are taken and held long-term in all
isolation levels listed. What this means is explained later.
In Read Uncommitted (RU), no Read locks are taken, so a Tx can read data on
which a Write lock exists (nothing stops you if you don't have to WAIT for
an RL).

Thus a Tx can read uncommitted data; it will be wrong if the Tx that changed
it later aborts. But RU is just to get a STATISTICAL idea of sales during the
day (say). The CEO wants to know a ballpark figure -- OK if not exact.

In Read Committed (RC), we take Write locks and hold them to EOT, and take
Read Locks on rows read and on predicates and release them immediately.
(Cover predicates below.)
The problem that can arise is a serious one, Lost Update (Example 10.3.2):

    ... R1(A,100) R2(A,100) W1(A,140) W2(A,150) C1 C2 ...

Since R locks are released immediately, nothing stops the later Writes, and
the increment of 40 is overwritten by an increment of 50, instead of the two
increments adding to give a total increase of 90.

Call this the Scholar's Lost Update Anomaly (since many people say Lost
Update only happens at Read Uncommitted).

This is EXTREMELY serious, obviously, and an example of lost update in SQL
is given in Figure 10.12 (pg. 666) for a slightly more restrictive level: Cursor
Stability. Applications that use RC must avoid this kind of update.
In Figure 10.11, we see how to avoid this by doing the Update indivisibly in a
single operation.

Not all Updates can be done this way, however, because of complex cases
where the rows to be updated cannot be determined by a Boolean search
condition, or where the amount to update is not a simple function.

It turns out that IBM's Cursor Stability guarantees a special lock will be held
on the current row under the cursor, and at first it was thought that ANSI
Read Committed guaranteed that, but it does not.

Probably most products implement a lock on the current-of-cursor row, but
there is no guarantee. NEED TO TEST if going to depend on this.
Repeatable Read Isolation is the isolation level that most people think is all
that is meant by 2PL. All data items read and written have RLs and WLs
taken and held long-term, until EOT.

So what's wrong? What can happen?
Example 10.5.3, pg. 666, Phantom Update Anomaly.

    R1(predicate: branch_id = 'SFBay')
    R1(A1,100.00) R1(A2,100.00) R1(A3,100.00)
    I2(A4, branch_id = 'SFBay', balance = 100.00)
    R2(branch_totals, branch_id = 'SFBay', 300.00)
    W2(branch_totals, branch_id = 'SFBay', 400.00) C2
    R1(branch_totals, branch_id = 'SFBay', 400.00) (Prints out error message) C1
T1 is reading all the accounts with branch_id = SFBay and testing that the
sum of balances equals the branch_total for that branch (accounts and
branch_totals are in different tables).

After T1 has gone through the rows of accounts, T2 inserts another row into
accounts with branch_id = SFBay (T1 will not see this as it has already
scanned past the point where it is inserted). T2 then updates the
branch_total for SFBay, and commits.

Now T1, having missed the new account, looks at the branch_total and sees
an error.

There is no error really, just a new account row that T1 didn't see.

Note that nobody is breaking any rules about data item locks. The insert by
T2 holds a write lock on a new account row that T1 never read. T2 locks the
branch_total row, but then commits before T1 tries to read it.

No data item lock COULD help with this problem. But we have non-SR
behavior nonetheless.

The solution is this: When T1 reads the predicate branch_id = SFBay on
accounts, it takes a Read lock ON THAT PREDICATE, that is to say a Read
lock on the SET of rows to be returned from that Select statement.

Now when T2 tries to Insert a new row in accounts that will change the set
of rows to be returned for SFBay, it must take a Write lock on that predicate.

Clearly this Write lock and Read lock will conflict. Therefore T2 will have to
wait until T1 reaches EOT and releases all locks.

So the history of Example 10.5.3 can't happen. (In reality, we use a type of
locking called Key-Range locking to guarantee predicate locks. Covered in the
Database Implementation course.)

ANSI Repeatable Read Isolation doesn't provide Predicate Locks, but ANSI
Serializable does.

Note that Oracle doesn't ever perform predicate locks. Oracle's
SERIALIZABLE isolation level uses a different approach, based on snapshot
reads, that is beyond what we can explain in this course.
10.6 Transactional Recovery.

The idea of transactional recovery is this.

Memory is "Volatile", meaning that at unscheduled times we will lose
memory contents (or become unsure of their validity).

But a database transaction, in order to work on data from disk, must read it
into memory buffers.

Remember that a transaction is "atomic", meaning that all update
operations a transaction performs must either ALL succeed or ALL fail.

If we read two pages into memory during a transaction and update them
both, we might (because of buffering) have one of the pages go back out to
disk before we commit.

What are we to do about this after a crash? An update has occurred to disk
where the transaction did not commit. How do we put the old page back in
place?

How do we even know what happened? That we didn't commit?

A similar problem arises if we have two pages in memory and after commit
we manage to write one of the pages back to disk, but not the other.

(In fact, we always attempt to minimize disk writes from popular buffers,
just as we minimize disk reads.)

How do we fix it so that the page that didn't get written out to disk gets out
during recovery?
The answer is that as a transaction progresses we write notes to ourselves
about what changes have been made to disk pages. We ensure that these
notes get out to disk to allow us to correct any errors after a crash.

These notes are called "logs", or "log entries". The log entries contain
"Before Images" and "After Images" of every update made by a Transaction.

In recovery, we can back out an update that shouldn't have gotten to disk
(the transaction didn't commit) by applying a Before Image.

Similarly, we can apply After Images to correct for disk pages that should
have gotten to disk (the transaction did commit) but never made it.

There is a "log buffer" in memory (quite long), and we write the log buffer
out to the "log on disk" every time one of the following events occurs.

(1) The log buffer fills up. We write it to disk and meanwhile continue filling
another log buffer with logs. This is known as "double buffering" and saves
us from having to wait until the disk write completes.

(2) Some transaction commits. We write the log buffer, including all logs up
to the present time, before we return from commit to the application (and
the application hands out the money at the ATM). This way we're sure we
won't forget what happened.

Everything else in the next few sections is details: what do the logs look like,
how does recovery take place, how can we speed up recovery, etc.
10.7 Recovery in Detail: Log Formats.

Consider the following history H5 of operations as seen by the scheduler:

    (10.7.1) H5 = R1(A,50) W1(A,20) R2(C,100) W2(C,50) C2 R1(B,50) W1(B,80) C1

Because of buffering, some of the updates shown here might not get out to
disk as of the second commit, C1. Assume the system crashes immediately
after. How do we recover all these lost updates?

While the transactions were occurring, we wrote out the following logs as
each operation occurred (Figure 10.13, pg. 673).
    OPERATION    LOG ENTRY            *** LEAVE UP ON BOARD ***

    R1(A,50)     (S, 1) -- Start transaction T1. No log entry is written
                 for a Read operation, but this operation is the start of T1.
    W1(A,20)     (W, 1, A, 50, 20) -- T1 Write log for update of A.balance.
                 The value 50 is the Before Image (BI) for the A.balance
                 column in row A; 20 is the After Image (AI) for A.balance.
    R2(C,100)    (S, 2), another start transaction log entry.
    W2(C,50)     (W, 2, C, 100, 50), another Write log entry.
    C2           (C, 2) -- Commit T2 log entry. (Write Log Buffer to Log File.)
    R1(B,50)     No log entry.
    W1(B,80)     (W, 1, B, 50, 80)
    C1           (C, 1) -- Commit T1. (Write Log Buffer to Log File.)
Assume that a System Crash occurred immediately after the W1(B,80)
operation.

This means that the log entry (W, 1, B, 50, 80) has been placed in the log
buffer, but the last point at which the log buffer was written out to disk was
with the log entry (C, 2).

This is the final log entry we will find when we begin to recover from the
crash. Assume that the values out on disk are A = 20 (the update to 20
drifted out to disk), B = 50 (update didn't get to disk), and C = 100 (same).

If you look carefully at the sequence, where T2 committed and T1 didn't,
you will see that the values should be: A = 50, B = 50, C = 50.
After the crash, a command is given by the system operator that initiates
recovery. This is usually called the RESTART command.

The process of recovery takes place in two phases, Roll Back and Roll
Forward. The Roll Back phase backs out updates by uncommitted
transactions, and Roll Forward reapplies updates of committed transactions.

In Roll Back, the entries in the disk log are read backward to the beginning,
System Startup, when A = 50, B = 50, and C = 100.

In Roll Back, the system makes a list of all transactions that did and did not
commit. This is used to decide what gets backed out and reapplied.
    LOG ENTRY              ROLL BACK/ROLL FORWARD ACTION PERFORMED

    1.  (C, 2)             Put T2 into "Committed List".
    2.  (W, 2, C, 100, 50) Since T2 is on "Committed List", we do nothing.
    3.  (S, 2)             Make a note that T2 is no longer "Active".
    4.  (W, 1, A, 50, 20)  Transaction T1 has never committed (its last
                           operation was a Write). Therefore, the system
                           performs UNDO of this update by writing the
                           Before Image value (50) into data item A.
                           Put T1 into "Uncommitted List".
    5.  (S, 1)             Make a note that T1 is no longer "Active". Now
                           that no transactions are active, we can end the
                           ROLL BACK phase.

    ROLL FORWARD

    6.  (S, 1)             No action required.
    7.  (W, 1, A, 50, 20)  T1 is Uncommitted -- no action required.
    8.  (S, 2)             No action required.
    9.  (W, 2, C, 100, 50) Since T2 is on Committed List, we REDO this
                           update by writing After Image value (50) into
                           data item C.
    10. (C, 2)             No action required.
    11.                    We note that we have rolled forward through all
                           log entries and terminate Recovery.

Note that at this point, A = 50, B = 50, and C = 50.
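
The Roll Back / Roll Forward table can be replayed with a short simulation. This is a simplified sketch under stated assumptions: log entries are Python tuples in the (S, tx), (W, tx, item, BI, AI), (C, tx) shapes shown above, the "disk" is a plain dictionary, and recover is a hypothetical name. Checkpoints and the "Active" bookkeeping are left out.

```python
def recover(disk, log):
    """Apply Roll Back then Roll Forward to disk using the on-disk log."""
    committed = set()

    # ROLL BACK: read the log backward. A commit entry is seen before any of
    # that transaction's writes, so membership in `committed` is reliable.
    for entry in reversed(log):
        if entry[0] == "C":
            committed.add(entry[1])
        elif entry[0] == "W" and entry[1] not in committed:
            _, tx, item, before_image, after_image = entry
            disk[item] = before_image  # UNDO an uncommitted update

    # ROLL FORWARD: read the log forward and REDO committed updates.
    for entry in log:
        if entry[0] == "W" and entry[1] in committed:
            _, tx, item, before_image, after_image = entry
            disk[item] = after_image

    return disk


# The log that reached disk before the crash, and the disk state at the crash.
log_on_disk = [("S", 1), ("W", 1, "A", 50, 20), ("S", 2),
               ("W", 2, "C", 100, 50), ("C", 2)]
disk_at_crash = {"A": 20, "B": 50, "C": 100}
recovered = recover(dict(disk_at_crash), log_on_disk)
```

Here recovered ends up with A = 50, B = 50, and C = 50, matching the table.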
Guarantees That Needed Log Entries are on Disk
How could a problem occur with our method of writing logs and recovering?
Look at the history earlier again and think what would happen if we ended
up with B = 80 because the final written value of B got out to disk.

Since we have been assuming that the log (W, 1, B, 50, 80) did NOT get out
to disk, we wouldn't be able to UNDO this update (which should not occur,
since T1 did not commit).

This is a problem that we solve with a policy that ties the database buffer
writes to the Log. The policy is called Write-Ahead Log (WAL).

It guarantees that no dirty buffer page gets written back to disk before the
Log entry that would be able to UNDO it gets written to the disk Log file.

OK, this would solve the problem of UNDOs. Do we ever have a problem
with REDOs? No, because we always write the Log buffer to the Log file as
part of the Commit. So we're safe in doing REDO for committed Txs.

The text has a "proof" that recovery will work, given these log formats, the
RESTART procedure of Roll Back/Roll Forward, and the WAL policy.
Class "(.
!I. , Ceckpoints
8! the recovery process 'e Hust covered, 'e per)or&ed R144BACP to
System Startup 3ime , 'he! 'e assu&e that all data is valid.
<e assu&e that Syste& Startup occurs i! the &or!i !#, a!d database pro F
cessi!# co!ti !ues duri!# the day a!d everythi !# co&pletes at the e!d o)
day.
(2his is a questio!able assu&pti o! !o'adays, 'ith &a!y co&pa!i es !eed F
i!# to per)or& "$9E processi!#, "$ hours a day, E days a 'eek, 'ith !o
ti &e 'he! tra!sactio!s are #uara!teed to !ot be acti ve.
Even if we have an 8-hour day of processing, however, we can run into real
problems recovering a busy transactional system (heavy update throughput)
with the approach we've outlined so far.
The problem is that it takes nearly as much processing time to RECOVER a
transaction as it did to run it in the first place.
If our system is strained to the limit keeping up with updates from 9:00 AM
to 5:00 PM, and the system crashes at 4:59, it will take nearly EIGHT HOURS
TO RECOVER.
This is the reason for checkpointing. When a "Checkpoint" is taken at a
given time (4:30 PM) this makes it possible for Recovery to limit the logs it
needs to ROLLBACK and ROLLFORWARD.
A simple type of Checkpoint, a "Commit Consistent Checkpoint," merely
duplicates the process of shutting down for the night, but then transactions
start right up again.
The problem is that it might take minutes to take a Commit Consistent
Checkpoint, and during that time NO NEW TRANSACTIONS CAN START UP.
For this reason, database systems programmers have devised two other
major checkpointing schemes that reduce the "hiccup" in transaction
processing that occurs while a checkpoint is being performed.
The Commit Consistent Checkpoint is improved on by using something
called a "Cache Consistent Checkpoint". Then an even more complicated
checkpoint called a "Fuzzy Checkpoint" improves the situation further.
So this is what we will cover now, in order (put on board):
Commit Consistent Checkpoint
Cache Consistent Checkpoint
Fuzzy Checkpoint
We define the Checkpoint Process. From time to time, a Checkpoint is
triggered, probably by a time-since-last-checkpoint system clock event.
Def 10.8.1. Commit Consistent Checkpoint steps. After the "performing
checkpoint state" is entered, we have the following rules.
(1) No new transactions can start until the checkpoint is complete.
(2) Database operation processing continues until all existing
transactions Commit, and all their log entries are written to disk. (Thus we
are Commit Consistent.)
(3) Then the current log buffer is written out to the log file, and after this
the system ensures that all dirty pages in buffers have been written out
to disk.
(4) When steps (1)-(3) have been performed, the system writes a special log
entry, (CKPT), to disk, and the Checkpoint is complete.
It should be clear that these steps are basically the same ones that would be
performed to BRING THE SYSTEM DOWN for the evening.
We allow transactions in progress to finish, but don't allow new ones, and
everything in volatile memory that reflects a disk state is put out to disk.
As a matter of fact, the Disk Log File can now be emptied. We needed it
while we were performing the checkpoint in case we crashed in the middle,
but now we don't need it any longer.
We will never need the Log File again to UNDO uncommitted transactions
that have data on disk (there are no such uncommitted transactions) or
REDO committed transactions that are missing updates on disk (all updates
have gone out to disk already).
From this, it should be clear that we can modify the Recovery approach we
have been talking about so that instead of a ROLLBACK to the Beginning of
the Log File at System Startup, we ROLLBACK to the LAST CHECKPOINT!!!
If we take a Checkpoint every five minutes, we will never have to recover
more than five minutes of logged updates, so recovery will be fast.
The problem is that the Checkpoint Process itself might not be very fast.
Note that we have to allow all transactions in progress to complete before
we can perform successive steps. If all applications we have use very short
transactions, there should be no problem.
Cache Consistent Checkpoint
But what if some transactions take more than five minutes to execute?
Then clearly we can't guarantee a Checkpoint every five minutes!!!
Worse, while the checkpoint is going on (and the last few transactions are
winding up) nobody else can start any SHORT transactions to read an
account balance or make a deposit!
We address this problem with something called the "Cache Consistent
Checkpoint". With this scheme, transactions can continue active through
the checkpoint. We don't have to wait for them all to finish and commit.
Definition 10.8.2. Cache Consistent Checkpoint procedure steps.
(1) No new transactions are permitted to start.
(2) Existing transactions are not permitted to start any new operations.
(3) The current log buffer is written out to disk, and after this the system
ensures that all dirty pages in cache buffers have been written out to
disk. (Thus, we are "Cache" (i.e., Buffer) Consistent on disk.)
(4) Finally, a special log entry, (CKPT, List), is written out to disk, and the
Checkpoint is complete. NOTE: this (CKPT) log entry contains a list of
active transactions at the time the Checkpoint occurs.
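Steps (3) and (4) of this procedure can be sketched as follows. This is a minimal sketch, assuming steps (1) and (2) are already in force when the function is called. The function name, its parameters, and the tuple layout of the log entries are hypothetical, and the starting disk values are assumed for illustration only.

```python
def cache_consistent_checkpoint(log_buffer, log_file, dirty_pages, disk, active_txs):
    """Steps (3) and (4); steps (1) and (2) are assumed to be in force already."""
    log_file.extend(log_buffer)            # step 3: log buffer goes out first ...
    log_buffer.clear()
    disk.update(dirty_pages)               # ... then every dirty page in cache
    dirty_pages.clear()
    entry = ("CKPT", sorted(active_txs))   # step 4: (CKPT, List) entry
    log_file.append(entry)
    return entry


# Illustrative state: T2 has written A = 3 in cache; T2, T3, T4 are active.
log_file = []
log_buffer = [("S", 2), ("S", 3), ("W", 2, "A", 1, 3), ("S", 4)]
disk = {"A": 1, "B": 2, "C": 5}            # assumed values before the checkpoint
dirty_pages = {"A": 3}
entry = cache_consistent_checkpoint(log_buffer, log_file, dirty_pages, disk, {2, 3, 4})
```

The log entries go out before the dirty pages, so the checkpoint itself respects the WAL policy.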
The recovery procedure using Cache Consistent Checkpoints differs from
Commit Consistent Checkpoint recovery in a number of ways.
Ex 10.8.1 Cache Consistent Checkpoint Recovery.
Consider the history H5:
H5: R1(A, 10) W1(A, 1) C1 R2(A, 1) R3(B, 2) W2(A, 3) R4(C, 5) CKPT
    W3(B, 4) C3 R4(B, 4) W4(C, 6) C4 CRASH
Here is the series of log entry events resulting from this history. The last
one that gets out to disk is the (C, 3) log entry.
(S, 1) (W, 1, A, 10, 1) (C, 1) (S, 2) (S, 3) (W, 2, A, 1, 3) (S, 4)
(CKPT, (LIST = T2, T3, T4))
(W, 3, B, 2, 4) (C, 3) (W, 4, C, 5, 6) (C, 4)
At the time we take the Cache Consistent Checkpoint, we will have values
out on disk: A = 3, B = 2, C = 5. (The dirty page in cache containing A
at checkpoint time is written to disk.)
Assume that no other updates make it out to disk before the crash, and
so the data item values remain the same.
Here is a diagram of the time scale of the various events. Transaction Tk
begins with the (S, k) log, and ends with (C, k). **LEAVE ON BOARD**
                                      Checkpoint         Crash
                                           |               |
T1  |-------|
T2            |--------------------------------------------
T3               |------------------------------|
T4                                      |---------|
Next, we outline the actions taken in recovery, starting with ROLL BACK.
ROLL BACK
1. (C, 3)                         Note T3 is a committed Tx in active list.
2. (W, 3, B, 2, 4)                Committed transaction, wait for ROLL FORWARD.
3. (CKPT, (LIST = T2, T3, T4))    Note active transactions T2, T3, and T4.
                                  T2 AND T4 HAVE NOT COMMITTED (no (C, 2) or
                                  (C, 4) logs have been encountered).
4. (S, 4)                         List of Active transactions now shorter: {T2, T3}.
5. (W, 2, A, 1, 3)                Not Committed. UNDO: A = 1.
6. (S, 3)                         List of Active Transactions shorter: {T2}.
7. (S, 2)                         List of Active Transactions empty. STOP ROLLBACK.
With a Cache Consistent Checkpoint, when ROLL BACK encounters the
CKPT log entry the system takes note of the transactions that were
active, even though we may not yet have seen any of their operations in
the log file.
We now take our list of active transactions, remove those that we have
seen committed, and have a list of transactions whose updates we need
to UNDO. Since Transactions can live through Checkpoints we have to go
back PRIOR to the Checkpoint, while UNDO steps might remain.
We continue in the ROLL BACK phase until we complete all such UNDO
actions. We can be sure when this happens because, as we encounter (S, k)
logs rolling backward, we remove Tk from the list.
When all Active Uncommitted Tk have been removed, the ROLL BACK is
complete, even though there may be more entries occurring earlier in the
log file.
ROLL FORWARD
8. (CKPT, (LIST = T2, T3, T4))    Skip forward in log file to this entry, start
                                  after this.
9. (W, 3, B, 2, 4)                Roll Forward: B = 4.
10. (C, 3)                        No Action. Last entry: ROLL FORWARD is
                                  complete.
In starting the Roll Forward Phase, we merely need to REDO all updates by
committed transactions that might not have gone out to disk.
We can jump forward to the first operation after the Checkpoint, since we
know that all earlier updates were flushed from buffers.
Roll Forward continues to the end of the Log File. Recall that the values
on disk at the time of the crash were: A = 3, B = 2, C = 5. At the end of
Recovery, we have set A = 1 (Step 5) and B = 4 (Step 9).
We still have C = 5. A glance at the time scale figure shows us that we want
updates performed by T3 to be applied, and those by T2 and T4 to be backed
out. There were no writes performed by T4 that got out to disk, so we have
achieved what is necessary for recovery: A = 1, B = 4, C = 5.
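The two phases can be traced with a short Python sketch. It is an illustration only, and it assumes that the log on disk ends at (C, 3), that the checkpoint entry lists T2, T3, and T4 (the transactions active at CKPT in H5), and that log entries are plain tuples. The function name recover and the tuple layout are hypothetical.

```python
def recover(log, disk):
    """Roll back, then roll forward, over a log holding one (CKPT, list) entry."""
    committed = set()
    to_undo = None        # active uncommitted Txs; known once CKPT is reached
    ckpt_index = None

    # ROLL BACK: read the log from its last entry toward the beginning.
    for i in range(len(log) - 1, -1, -1):
        entry = log[i]
        kind = entry[0]
        if kind == "C":
            committed.add(entry[1])
        elif kind == "W" and entry[1] not in committed:
            _, _tx, item, before, _after = entry
            disk[item] = before                  # UNDO with the Before Image
        elif kind == "CKPT":
            ckpt_index = i
            to_undo = set(entry[1]) - committed
        elif kind == "S" and to_undo is not None:
            to_undo.discard(entry[1])
        if to_undo is not None and not to_undo:
            break                                # nothing left to UNDO

    # ROLL FORWARD: REDO committed updates logged after the checkpoint.
    start = 0 if ckpt_index is None else ckpt_index + 1
    for entry in log[start:]:
        if entry[0] == "W" and entry[1] in committed:
            _, _tx, item, _before, after = entry
            disk[item] = after                   # REDO with the After Image
    return disk


log_on_disk = [
    ("S", 1), ("W", 1, "A", 10, 1), ("C", 1), ("S", 2), ("S", 3),
    ("W", 2, "A", 1, 3), ("S", 4), ("CKPT", [2, 3, 4]),
    ("W", 3, "B", 2, 4), ("C", 3),
]
result = recover(log_on_disk, {"A": 3, "B": 2, "C": 5})
print(result)
```

ROLL BACK stops at (S, 2) without reading the T1 entries, and ROLL FORWARD touches only the two entries after the checkpoint.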
Fuzzy Checkpoint
A problem can still arise that makes the Cache Consistent Checkpoint a
major hiccup in Transaction Processing.
Note in the procedure that we can't let any Active Transactions continue, or
start any new ones, until all buffers are written to disk. What if there are a
LOT of Buffers?
Some new machines have several GBytes of memory! That's probably
Minutes of I/O, even if we have a lot of disks. DISK I/O is SLOW!!!
OK, with a Fuzzy Checkpoint, each checkpoint, when it completes, makes
the PREVIOUS checkpoint a valid place to stop ROLLBACK.
Definition 10.8.3. Fuzzy Checkpoint procedure steps.
(1) Prior to Checkpoint start, the remaining pages that were dirty at the
prior checkpoint will be forced out to disk (but the rate of writes should
leave I/O capacity to support current transactions in progress; there is no
critical hurry in doing this).
(2) No new transactions are permitted to start. Existing transactions are
not permitted to start any new operations.
(3) The current log buffer is written out to disk with an appended log
entry, (CKPT_N, List), as in the Cache Consistent Checkpoint procedure.
(4) The set of pages in buffer that have become dirty since the last
checkpoint log, CKPT_N-1, is noted.
This will probably be accomplished by special flags on the Buffer
directory. There is no need for this information to be made disk resident,
since it will be used only to perform the next checkpoint, not in case of
recovery. At this point the Checkpoint is complete.
As explained above, the recovery procedure with Fuzzy Checkpoints differs
from the procedure with Cache Consistent Checkpoints only in that ROLL
FORWARD must start with the first log entry following the SECOND to last
checkpoint log. We have homework on this.
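A small Python sketch of the Fuzzy Checkpoint bookkeeping shows why the second to last checkpoint is the safe starting point. It is a sketch under stated assumptions: the class FuzzyCheckpointer and its fields are hypothetical, step (2) is assumed to be handled by the caller, and step (1) is done all at once rather than spread out over time.

```python
class FuzzyCheckpointer:
    """Toy model of fuzzy checkpoint bookkeeping (hypothetical names)."""

    def __init__(self):
        self.disk = {}                   # page -> value on disk
        self.cache = {}                  # page -> value in buffer
        self.dirty = set()               # pages changed and not yet written
        self.dirty_at_last_ckpt = set()  # in-memory flags only, never on disk
        self.log_file = []
        self.ckpt_no = 0

    def update(self, page, value):
        self.cache[page] = value
        self.dirty.add(page)

    def write_page(self, page):
        self.disk[page] = self.cache[page]
        self.dirty.discard(page)

    def checkpoint(self, log_buffer, active_txs):
        # Step 1: force out pages dirty at the prior checkpoint and still dirty.
        for page in self.dirty_at_last_ckpt & self.dirty:
            self.write_page(page)
        # Step 2 (no new transactions or operations) is assumed here.
        # Step 3: write the log buffer with an appended (CKPT_N, List) entry.
        self.ckpt_no += 1
        self.log_file.extend(log_buffer)
        log_buffer.clear()
        self.log_file.append(("CKPT", self.ckpt_no, sorted(active_txs)))
        # Step 4: note the pages that became dirty since the last checkpoint.
        self.dirty_at_last_ckpt = set(self.dirty)


fc = FuzzyCheckpointer()
fc.update("P1", 1)
fc.checkpoint([], [2])     # CKPT 1: P1 is only noted, not written
fc.update("P2", 2)
fc.checkpoint([], [])      # CKPT 2: P1 is forced out, P2 is only noted
```

After CKPT 2, everything dirtied before CKPT 1 is on disk, but P2 is not, so ROLL FORWARD has to begin just after CKPT 1.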
Class "*.
Covered last time the various Checkpoints: Commit Consistent, Cache
Consistent, and Fuzzy. Any questions?
What really happens with commercial databases. Used to be all Commit
Consistent, now often Fuzzy.
Also used to be VERY physical. BI and AI meant physical copies of the entire
PAGE. Still need to do this sometimes, but for long-term log can be more
"logical".
Instead of "Here is the way the page looked after this update/before this
update", have: this update was ADD 10 to Column A of row with RID 12345
with version number 1121.
Version number is important to keep updates idempotent.
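A minimal sketch of why the version number makes a logical REDO idempotent. It assumes that the version number in the log entry is the row's version after the update, and that a row is a plain dictionary. The function name, the field names, and the starting value A = 113 are hypothetical.

```python
def redo_logical(row, entry):
    """Apply a logical 'ADD delta' log entry at most once."""
    if row["version"] < entry["version"]:
        row[entry["column"]] += entry["delta"]
        row["version"] = entry["version"]
    return row


row = {"rid": 12345, "A": 113, "version": 1120}
entry = {"rid": 12345, "column": "A", "delta": 10, "version": 1121}
redo_logical(row, entry)   # applies ADD 10
redo_logical(row, entry)   # second REDO is skipped: version already 1121
```

Without the version check, replaying the same log entry twice during recovery would add 10 twice.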
Note that locking is intertwined with the type of recovery. It doesn't do
any good to have row-level locking if have page level recovery.
T1 changes Row 12345 column A from 123 to 124, and log gives PAGE BI
with A = 123, T1 hasn't committed yet.
T2 changes Row 12347 (same page) column B from 321 to 333 and
Commits, log gives Row 12345 with 124, Row 12347 column B with 333 on
page AI.
Transaction T2 commits, T1 doesn't, then have crash. What do we do?
Put AI of T2 in place? Gives wrong value to A. Put BI of T1 in place? Wrong
value of B.
Page level logging implies page level locking is all we can do.
Sybase SQL Server STILL doesn't have row-level locking.
10.9 Media Recovery
Problem is that Disk can fail. (Not just stop running: head can score disk
surface.) How do we recover?
First, we write our Log to TWO disk backups. Try to make sure two disks
have Independent Failure Modes (not same controller, same power supply).
We say that storage that has two duplicates is called stable storage, as
compared to nonvolatile storage for a normal disk copy.
Before System Startup run BACKUP (bulk copy of disks/databases).
Then, when perform Recovery from Media failure, put backup disk in place
and run ROLL FORWARD from that disk, as if from startup in the morning.
As if normal recovery on this disk except that all pages on this disk were
VERY popular and never got written out to disk. Don't need Roll Back except
conceptually.
RAID Disks
Something they have nowadays is RAID disks. RAID stands for Redundant
Arrays of Inexpensive Disks. Invented at Berkeley. The Inexpensive part
rather got lost when this idea went commercial.
The simplest kind, RAID 1, mirrors Writes. Could have two disks and write
everything to both, in two copies. Every time we Write, need 2 Writes.
So if one disk lost, just use other disk. Put another blank disk in for mirror,
and while operating with normal Reads and Writes do BACKUP to new disk
until have mirror.
This approach saves the time needed to do media recovery. And of course
works for OS files, where there IS no media recovery.
You can buy these now. As complex systems get more and more disks, will
eventually need RAID. The more units there are on a system, the more
frequent the failures.
Note that mirrored Writes are handled by controller. Doesn't waste time of
System to do 2 Writes.
But when Read, can Read EITHER COPY. Use disk arms independently.
So we take twice as much media, and if all we do is Writes, need twice as
many disk arms to do the same work.
But if all we do is Reads, get independent disk arm movements, so get twice
as many Reads too.
But in order for twice as many Reads to work, need warm data, where disk
capacity is not the bottleneck, but disk arm movement is.
Definitely lose the capacity in RAID 1. But if we were only going to use half
the capacity of the disk because we have too Many Reads, RAID 1 is fine.
There is an alternative form of RAID, RAID 5, that uses less capacity than
mirroring. Trick is to have 6 disks: 5 hold real data pages, one holds a
checksum.
Use XOR for Checksum. CK = D1 XOR D2 XOR D3 XOR D4 XOR D5.
If (say) D1 disappears, can figure out what it was:
D1 = CK XOR D2 XOR D3 XOR D4 XOR D5
(Prove this: if A = B XOR C, then B = A XOR C and C = A XOR B.
1 = 1 XOR 0  =>  1 = 1 XOR 0 and 0 = 1 XOR 1
1 = 0 XOR 1  =>  0 = 1 XOR 1 and 1 = 1 XOR 0
0 = 1 XOR 1  =>  etc.
0 = 0 XOR 0)
So if one disk drops out, keep accessing data on it using XOR of other 5.
Recover all disk pages on disk in same way. This takes a lot of time to
recover, but it DOES save disk media.
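The XOR trick is easy to check in Python. This is a sketch with made-up three-byte "pages"; the helper name xor_pages and the page contents are hypothetical, and a real array works on full disk blocks.

```python
from functools import reduce


def xor_pages(pages):
    """Bytewise XOR of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


# Five data pages D1..D5 (made-up contents) and their checksum page CK.
data = [bytes([i, i + 10, 255 - i]) for i in range(1, 6)]
ck = xor_pages(data)

# D1 disappears: rebuild it as CK XOR D2 XOR D3 XOR D4 XOR D5.
rebuilt_d1 = xor_pages([ck] + data[1:])
```

The same call rebuilds any one missing page from the checksum and the other four pages, which is why a single lost disk can be reconstructed page by page.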
10.10 TPC-A Benchmark
The TPC-A Benchmark is now out of date. Newer TPC-C Benchmark: more
complex.
See Fig 10.16, pg. 686. Size of tables determined by TPS.
See Fig 10.17, pg. 687. All threads do the same thing. Run into each other
in concurrency control because of Branch table and History table.
Benchmark specifies how many threads there are, how often each thread
runs a Tx, costs of terminals, etc.
On a good system, just add disks until use 95% of CPU. On a bad system,
run into bottlenecks.
Ultimate measure is TPS and $/TPS.