A Unified OLAP/OLTP Big Data Processing Framework in Telecom Industry


Xin Lu, Fei Su, Haozhang Liu, Weiwei Chen, Xingzhou Cheng

China Unicom Network Technology Research Institute, Beijing, China

Bestride.com, Gatehouse Media Ventures, Boston, United States

{lux, sufei, chenww, chengxz}@chinaunicom.cn

dliu@bestride.com

Abstract: This paper studies the system performance of SQL-on-Hadoop technology and MPP technology in a mixed OLAP and OLTP scenario, which is becoming the most common big data application scenario in the telecom industry. Until recently, most SQL-on-Hadoop systems focused only on the speed of data analysis and overlooked support for transactions. MPP columnar databases, on the other hand, can handle both analytical and transactional tasks, but tend to have poor transactional performance due to the column-store mechanism. By adopting proper optimization methods, it is possible to increase the transactional performance of MPP columnar databases. Finally, a unified data processing framework is proposed to better support hybrid OLAP/OLTP big data applications.

Keywords: big data; SQL-on-Hadoop; MPP columnar database; OLAP; OLTP

I. INTRODUCTION

Recent years have seen explosive growth of big data applications in the telecom industry. Unlike traditional ERP-like systems, which are deployed over row-based transactional databases, new technologies have been introduced to achieve a high data aggregation rate and real-time response over larger-scale datasets (TB to PB level). Hadoop-based systems (Hive, Impala […], SparkSQL […], etc.) and MPP columnar databases (Vertica […], SAP HANA […], etc.) are the two most promising technologies for enterprises, such as telecom operators, that collect a large amount of data.

While today's big data systems put great emphasis on optimizing OLAP-style workloads, traditional OLTP-style workloads receive less attention in both Hadoop-based databases and MPP columnar databases. However, in many practical scenarios, transactional tasks are widely desired even in big data applications.

In this paper, we first describe the latest big data application scenario, in which both OLTP and OLAP operations are required over large datasets. We discuss the differences between SQL-on-Hadoop systems and MPP columnar databases and compare their OLAP abilities. We then evaluate the performance of OLTP jobs on MPP columnar databases. The mechanisms behind these databases are studied further, and optimizations are introduced to demonstrate that it is possible to enhance OLTP performance on MPP columnar databases. Finally, we present a novel unified data processing framework that excels in hybrid OLAP/OLTP big data applications.

II. THE OLAP/OLTP MIXED BIG DATA SCENARIO

Big data technology was first introduced into the telecom industry for marketing and decision-making purposes. An example is customer behavior analysis, which requires fast data aggregation across the time and geography dimensions. From a database perspective, such a workload is characterized as OLAP (On-Line Analytical Processing). However, today's big data technology has evolved to a level where general staff are beginning to use big data to increase productivity in daily routine work. These new applications not only contain a high analysis workload, but also involve heavy OLTP (On-Line Transactional Processing) tasks. Examples are batch inserts of newly constructed road records into a geographic information table, or updates of a certain customer's personal data record.

This paper is inspired by the development of a big-data-based cellular network-planning system, which presents a typical hybrid OLAP/OLTP scenario. In this system, network-planning engineers should be able to add records (insert actions) of newly planned base stations into a plan table. To evaluate the validity and rationality of a base station, interactive information about construction priorities should be given, based on a real-time analysis of history data from nearby stations. Such analysis involves calculating the average number of nearby customers, the amount of voice and data traffic they consume, the satisfaction index of the customers, etc. Finally, the calculated results need to be auto-filled (update actions) into table attributes in order to generate a final report that is submitted to the next project management system.

This scenario leads to a typical design of the big data platform: a huge volume of structured data is loaded and stored in a unified system, and various applications run on top of it, answering requests from different end users. Some of them send OLAP tasks, which may consume enormous computing resources (memory, CPU, disk I/O, etc.); others send OLTP-intense tasks, which require concurrent transaction support.
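The insert, analyze, update cycle of the network-planning scenario can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for a big data platform, and the table names (`plan`, `history`), column names and the "same region means nearby" rule are hypothetical, not taken from the system described in the paper.

```python
import sqlite3


def plan_station(conn, station_id, region):
    """Insert a planned station, analyse nearby history, write the result back."""
    with conn:  # one transaction: commit on success, roll back on error
        # OLTP step: add the newly planned base station to the plan table.
        conn.execute(
            "INSERT INTO plan (station_id, region) VALUES (?, ?)",
            (station_id, region),
        )
        # OLAP step: aggregate the history of nearby stations
        # (assumption: "nearby" simply means "same region").
        avg_customers, avg_traffic = conn.execute(
            "SELECT AVG(customers), AVG(traffic_gb) FROM history WHERE region = ?",
            (region,),
        ).fetchone()
        # OLTP step: auto-fill the calculated attributes into the plan row.
        conn.execute(
            "UPDATE plan SET avg_customers = ?, avg_traffic_gb = ? "
            "WHERE station_id = ?",
            (avg_customers, avg_traffic, station_id),
        )
    return avg_customers, avg_traffic


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE plan (station_id TEXT PRIMARY KEY, region TEXT, "
        "avg_customers REAL, avg_traffic_gb REAL)"
    )
    conn.execute(
        "CREATE TABLE history (station_id TEXT, region TEXT, "
        "customers INTEGER, traffic_gb REAL)"
    )
    conn.executemany(
        "INSERT INTO history VALUES (?, ?, ?, ?)",
        [("s1", "north", 100, 10.0), ("s2", "north", 300, 30.0),
         ("s3", "south", 50, 5.0)],
    )
    conn.commit()
    plan_station(conn, "new-1", "north")
    return conn.execute("SELECT * FROM plan").fetchall()
```

Calling `demo()` returns the single plan row with the averages of the two "north" stations filled in. The point of the sketch is that one user action mixes a transactional write, an analytical read and another transactional write against the same data.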

© IEEE
III. OLAP SUPPORT

Compared to traditional RDBMS databases, the ability to run analysis tasks over large-scale data is the core feature of today's big data technology. In this section, we compare the analysis ability, namely the OLAP ability, of two major big data technologies: Hadoop systems and MPP columnar databases.

Hadoop is an open-source implementation of the MapReduce programming model. While this programming model is highly flexible, its use in structured data analysis is restricted by its steep learning curve. In the telecom industry, structured data still form the majority: billing information, phone records, MR data, Gn data, etc. Traditionally, structured data are analyzed through SQL, which provides a simple interface to describe two basic operations: aggregation and join. As a result, to deal with structured data efficiently, a number of high-level components were developed within the Hadoop stack and became the so-called SQL-on-Hadoop systems. Hive, SparkSQL and Impala all belong to this category.

MPP columnar databases, on the other hand, developed their own storage layouts to store data in a columnar format. Column storage naturally excels in OLAP operations, since it can access data vertically to perform fast aggregations and joins. Similar to classic databases, columnar database systems use the relational data model, where data is logically stored in "tables". Such systems provide a standard SQL interface and include a number of SQL-like functions that add extra analysis functionality.

A. SQL-on-Hadoop Systems

Hive is the first SQL-like system over Hadoop; it uses MapReduce to process SQL-like queries (HiveQL). The Hive execution engine first parses HiveQL statements into a Directed Acyclic Graph (DAG) of MapReduce tasks, which are then executed through the MapReduce framework. For batched, time-insensitive tasks, Hive works fine, especially for large dataset processing. However, query performance on Hive is greatly affected by the slow start of MapReduce jobs as well as the need to write intermediate results to HDFS. These shortcomings make Hive the slowest system among all systems discussed in this paper.

SparkSQL uses a distributed memory data abstraction, RDDs (Resilient Distributed Datasets), to process data in memory. A MapReduce-like DAG execution engine transforms SQL queries into individual Spark tasks and sends these tasks to worker nodes in the cluster. SparkSQL claims a query speed […]× faster than Hive […]. Closely tied to Hadoop, SparkSQL can read data directly from HDFS and can use the same cluster resource negotiator, YARN. This makes Spark (including SparkSQL, Spark Streaming and Spark ML) quite attractive to companies that have already deployed Hadoop-based big data systems.

Impala, which is an open-source version of Google Dremel, uses the concept of a multi-level serving tree instead of DAG tasks to execute queries […]. Impala implements its own long-running daemons on every HDFS DataNode. When an arbitrary node running an Impala daemon receives a SQL query, this node becomes a coordinator. The coordinator sends work to other nodes in the cluster, collects their computing results, and aggregates them into the final result set. All computing tasks run in memory on each node without writing any data to disk. In addition, Impala uses a column-store data format called Parquet to further improve its analytical performance. In an experiment conducted by the Impala development team, Impala outperforms both Hive and SparkSQL […]. Unlike SparkSQL or Hive, Impala cannot recover from mid-query failures; it needs to rerun the whole query in case of failure.

B. MPP Columnar Database

Vertica features high analytic performance and high scalability, along with full ACID support […]. Data are stored in one or more "projections", which are columns of compressed data distributed across the cluster and organized in different sort orders. The speed of analytical workloads running on Vertica is quite impressive. In our quick test on a […]-node cluster, a simple aggregation of […] TB of data finished within seconds, and a join between the fact table ([…] TB) and a dimension table ([…] MB) finished within […] minute.

The SAP HANA database adopts a multi-engine architecture to support various business applications. Structured data reside in tables in either column or row layout and are processed by a relational engine. A graph engine handles unstructured or semi-structured data such as XML and JSON files, and a text engine deals with text data analysis […]. All data are stored and processed in main memory as long as enough space is available. Data stored in a column format are heavily compressed to save as much memory as possible. Data can be unloaded from memory when the limit of available memory is reached and reloaded later when they are required.

C. Summary

Although each technology features its own computing engine, similar mechanisms can be found in both MPP and SQL-on-Hadoop systems, such as in-memory computing, data compression, column-store and late materialization. These novel systems all excel in dealing with OLAP workloads compared to traditional RDBMS databases. Multiple experiments have been conducted to compare the analytical performance of the systems listed above […][…][…][…]. Technical details can also be found in these studies.

IV. OLTP SUPPORT

As mentioned before, today's big data applications in the telecom industry can be far more complicated than simple data aggregations and table joins. In our network-planning scenario, for example, record insertions, modifications and deletions are widely desired. The ability to process both OLAP and OLTP workloads is becoming a key point when choosing a big data system.


Transactions, which were introduced by traditional RDBMS databases, can be characterized by ACID […]: Atomicity (an operation either succeeds completely or fails), Consistency (the results of an operation are visible in every subsequent operation), Isolation (operations made by one user do not influence other users until these operations have committed), and Durability (once an operation is complete, it is preserved even under system failure). Here, transactional operations usually refer to INSERT, UPDATE, DELETE, COMMIT and ROLLBACK. While ACID is a necessity in traditional RDBMS databases as part of their transaction functionality, it receives less attention in today's OLAP-centric big data systems.
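Atomicity, together with COMMIT and ROLLBACK, can be illustrated with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in for a transactional RDBMS; the `stations` table and the helper names are hypothetical. The second batch violates the primary key halfway through, so the whole batch is rolled back, including the row that had already been inserted.

```python
import sqlite3


def insert_batch(conn, rows):
    """Insert all rows atomically: either every row is stored or none."""
    try:
        with conn:  # COMMIT on success, ROLLBACK if an exception escapes
            conn.executemany(
                "INSERT INTO stations (id, name) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM stations").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

ok = insert_batch(conn, [(1, "a"), (2, "b")])              # commits two rows
failed = insert_batch(conn, [(3, "c"), (1, "duplicate")])  # id 1 already exists
```

After both calls the table still holds exactly the two rows of the first batch; row 3 does not survive because it belonged to the failed transaction.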
In Hadoop, files are written to HDFS by a single writer in an append-only pattern. Multiple in-place writes are not supported by design. Thus, SQL-on-Hadoop systems need extra processing on the data to support ACID transactional tasks. Unfortunately, until recently, most SQL-on-Hadoop systems (including Impala and SparkSQL) still lacked support for transactions. Hive partially supports transactions by allowing INSERT, UPDATE and DELETE under restricted conditions, but suffers significant query performance deterioration after record modifications. Therefore, SQL-on-Hadoop is currently not an appropriate choice for systems that require transactional operations.

MPP columnar databases, on the other hand, usually implement their own storage systems and thus do not have the problems introduced by HDFS. These systems claim full ACID support for transactions and have therefore become the first choice when building a hybrid OLAP/OLTP big data platform. Still, OLTP behavior in MPP columnar databases has rarely been discussed. In this section, we first run experiments to test the OLTP performance of Vertica, one of the best-rated MPP columnar databases; some thoughts on OLTP optimizations are then discussed.

A. Experimental setups

The following experiments were run on a […]-node Vertica cluster. Each node in the cluster has […] × Intel Xeon E[…] v[…] CPUs with […] cores ([…] cores in total), […] GB RAM and […] TB disk space. Since Vertica concurrency is restricted by CPU cores, we manually set the PLANNEDCONCURRENCY parameter to […], leaving […] cores for regular system queries.

We designed a test program in which multiple threads can be set up to simulate a real-world concurrency scenario. Each thread ran an infinite loop that sent three DML requests: inserting […] records into one table, updating these […] records, and finally deleting […] records from this table. We ran the program first with a […]-concurrency setup and then with a […]-concurrency setup in the following tests.

B. Concurrency-related bottleneck

Fig. 1 compares the performance of […]-concurrent and […]-concurrent DML operations. The […]-concurrency situation performs much worse than the […]-concurrency situation. This is because once the concurrency cap is reached ([…] in our setup), any new job received by Vertica is queued, waiting for system resources to be released. This waiting queue also has a timeout (the default is […] minutes). When the maximum queue time is reached, the request is rejected to avoid a long wait in the front-end application.

Fig. 1. Comparison of OLTP response time (in ms) under a […]-concurrency condition and a […]-concurrency condition.

We found that in the […]-concurrency situation, every request was assigned sufficient resources, so system response time was relatively low and smooth. In the […]-concurrency situation, however, query time was much longer and showed a periodic pattern that corresponded to each DML loop. During each period, query response time grew linearly at an unacceptably high rate (from […] s to […] s). Every incoming DML request had to wait for the CPU resources occupied by the previous query until the end of each loop, when all resources were released. We refer to this DML performance degradation as a concurrency-related bottleneck. It is not hard to infer that under higher concurrency our system would perform even worse.

C. Exclusive-Lock-related bottleneck

Fig. 2 compares the performance of INSERT, UPDATE and DELETE operations in the […]-concurrency situation. As discussed above, the concurrency-related bottleneck can be neglected under this condition, since the number of concurrent operations is far below our configured maximum.

First, we noticed that INSERT operations took an average of […] ms, much less than UPDATE and DELETE operations, which both took more than […] ms. This observation can be explained by Vertica's Write Optimized Store (WOS) mechanism. In columnar databases, data is compressed, ordered and organized by column, and each column is physically stored in separate files on disk. Column-store technology greatly improves the data aggregation rate, but it is awkward for DML operations. Because column files are stored separately, even a single update on a single row requires multiple I/O actions. Moreover, heavy compression on columns makes updates even harder, since data need to be decompressed, updated and recompressed before being written to disk.
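The cost of updating a single row in a compressed column store can be made concrete with a toy model. The sketch below is not Vertica's storage format: it assumes one zlib-compressed JSON blob per column as a stand-in for a column file, and the class and method names are hypothetical. It counts blob reads and writes to show that a one-row update touches every affected column twice, while an aggregation reads a single column.

```python
import json
import zlib


class CompressedColumnTable:
    """Toy column store: one compressed blob per column (stand-in for column files)."""

    def __init__(self, columns):
        self.blobs = {name: self._pack(values) for name, values in columns.items()}
        self.io_ops = 0  # counts blob reads and writes

    @staticmethod
    def _pack(values):
        return zlib.compress(json.dumps(values).encode("utf-8"))

    def _read(self, name):
        self.io_ops += 1
        return json.loads(zlib.decompress(self.blobs[name]).decode("utf-8"))

    def _write(self, name, values):
        self.io_ops += 1
        self.blobs[name] = self._pack(values)

    def update_row(self, row_index, new_values):
        """Update one row: decompress, modify and recompress every touched column."""
        for name, value in new_values.items():
            values = self._read(name)
            values[row_index] = value
            self._write(name, values)

    def column_sum(self, name):
        """Aggregate one column: only that column's blob is read."""
        return sum(self._read(name))


table = CompressedColumnTable(
    {"id": [1, 2, 3], "name": ["a", "b", "c"], "value": [10, 20, 30]}
)
table.update_row(1, {"name": "x", "value": 99})
ops_after_update = table.io_ops   # two columns, each read once and written once
total = table.column_sum("value")
```

In this model the single-row update costs four blob operations, while summing the whole `value` column costs one, which mirrors the asymmetry between DML and aggregation described above.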


Vertica handles this problem by splitting its storage system into a ROS (Read Optimized Store) and a WOS […]. While the ROS stores the major datasets on disk in column-store format, the WOS buffers small data inserts, deletes and updates, in either row or column layout, without any encoding or compression. A component called the Tuple Mover periodically moves data from the WOS to the ROS in order to avoid memory overflow. When reading from a table whose data are stored in both the WOS and the ROS, the system performs a merge between these two spaces. Both the WOS space and the Tuple Mover strategy can be tuned for better performance.
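The division of labour between a write buffer and a columnar main store can be sketched as follows. This is a simplified model, not Vertica's implementation: the class name, the `wos_limit` threshold and the choice to trigger the moveout on a size threshold (rather than periodically) are assumptions made to keep the example short.

```python
class HybridStore:
    """Toy WOS/ROS split: rows are buffered in memory, then moved to columns."""

    def __init__(self, columns, wos_limit=3):
        self.columns = list(columns)
        self.ros = {c: [] for c in self.columns}  # column-oriented main store
        self.wos = []                             # row-oriented write buffer
        self.wos_limit = wos_limit                # hypothetical buffer size
        self.moveouts = 0

    def insert(self, row):
        """Writes only touch the buffer, so they stay cheap."""
        self.wos.append(dict(row))
        if len(self.wos) >= self.wos_limit:
            self.moveout()

    def moveout(self):
        """Stand-in for the Tuple Mover: flush buffered rows into the columns."""
        for row in self.wos:
            for c in self.columns:
                self.ros[c].append(row[c])
        self.wos.clear()
        self.moveouts += 1

    def scan(self, column):
        """A read merges the column data with rows still waiting in the buffer."""
        return self.ros[column] + [row[column] for row in self.wos]


store = HybridStore(["id", "value"], wos_limit=3)
for i, v in enumerate([10, 20, 30, 40], start=1):
    store.insert({"id": i, "value": v})
```

After four inserts the first three rows have been moved into the columnar part, the fourth is still buffered, and `store.scan("value")` returns all four values because the read merges both spaces.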
Unlike traditional row-store databases, in which data is modified in place during delete and update operations, a columnar database creates a delete vector, which stores a list of the positions of rows that have been deleted. Delete vectors can be treated as regular data records: they are first inserted into the WOS and then moved to the ROS by the Tuple Mover. UPDATE is supported as a combination of DELETE and INSERT.
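A delete vector and the "UPDATE equals DELETE plus INSERT" rule can be sketched in a few lines. The class below is an illustrative model with hypothetical names, not Vertica's data structure: stored rows are never modified in place, deletions only record positions, and reads filter those positions out.

```python
class DeleteVectorTable:
    """Toy table with append-only rows and a delete vector of row positions."""

    def __init__(self):
        self.rows = []        # append-only storage; position = list index
        self.deleted = set()  # delete vector: positions of deleted rows

    def insert(self, row):
        self.rows.append(row)
        return len(self.rows) - 1

    def delete(self, predicate):
        """Mark matching live rows as deleted; the stored data is untouched."""
        hits = [pos for pos, row in enumerate(self.rows)
                if pos not in self.deleted and predicate(row)]
        self.deleted.update(hits)
        return hits

    def update(self, predicate, change):
        """UPDATE = DELETE the old versions, then INSERT the changed copies."""
        for pos in self.delete(predicate):
            self.insert(change(self.rows[pos]))

    def scan(self):
        """Reads skip every position listed in the delete vector."""
        return [row for pos, row in enumerate(self.rows)
                if pos not in self.deleted]


t = DeleteVectorTable()
t.insert({"id": 1, "value": 10})
t.insert({"id": 2, "value": 20})
t.update(lambda r: r["id"] == 1, lambda r: {**r, "value": 11})
```

After the update, three row versions are physically stored, position 0 is listed in the delete vector, and a scan returns only the two live rows. This also hints at why out-of-date data has to be cleaned up at some point.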
Back to our experiment: while INSERT operations performed quite well in general, DELETE and UPDATE operations performed much worse. The reason behind this difference is that Vertica uses different locks for inserts and deletes. To meet ACID requirements, databases use locking mechanisms to ensure data consistency under concurrency, but each database has its own implementation of locks. In Vertica, an Insert Lock (I lock) is acquired during the insertion process. The I lock is compatible with itself, enabling multiple concurrent inserts without any interference. Deletes and updates, on the other hand, take Exclusive Locks (X locks) on the table […]. Only one delete or update transaction on a table can be in progress at a time. This restriction makes X locks a scarce resource: every time a record deletion or update is requested, the system checks whether the table is locked. If it is, incoming requests have to wait for the X lock to be released. This waiting time adds up, resulting in a response time curve (in milliseconds) that grows over time, as shown in Fig. 2. We call this DML performance degradation an exclusive-lock-related bottleneck. By comparison, DML locks in Oracle, a row-store transactional database, can be applied to a specific row of data rather than locking every row in the table […].

Another interesting observation is that after about […] rounds of continuous execution, the DELETE and UPDATE response time curves began to fluctuate severely until the end of our experiment. This deterioration can be attributed to the WOS mechanism mentioned before. When small inserts (including insertions of delete vectors) are continuously stored into the WOS in memory, the Tuple Mover begins to execute a moveout operation to move data from the WOS to the ROS. The Tuple Mover also requires a special lock called the Tuple Mover Lock (T lock). This lock is compatible with every other kind of lock except the X lock, which is used in delete operations. In our experiment, the WOS was filled after […] rounds of execution, and the moveout operation was triggered along with the T lock. The conflict between the T lock and the X lock makes the delete operations even worse.

Fig. 2. Response time curve (in ms) of INSERT, UPDATE and DELETE under a […]-concurrency condition.

D. OLTP Optimizations of Vertica

Although Vertica is not good at dealing with concurrent OLTP workloads, several optimizations can be made to improve DML performance.

1) Limiting concurrent requests

As mentioned before, the concurrency-related bottleneck should be avoided during the system design phase. Optimizations can be considered from the following aspects:

a) Building the Vertica cluster using servers with more physical CPU cores.

b) Setting database connection limits and using a queue technique on the application level instead of limiting queries on the database level.

c) Assigning an independent, high-priority resource pool to DML-intense applications. Pay extra attention to these parameters: "PRIORITY", "PLANNEDCONCURRENCY" and "QUEUETIMEOUT". As a rule of thumb, assign a resource pool with high priority and high concurrency to small but numerous queries to avoid queuing. For long computing tasks, assign another resource pool with low priority, low concurrency but a high queue timeout.

2) Sorting the frequently-deleted columns in the projection

Vertica uses one or more "projections" to store all data of a table. To optimize delete actions (and updates as well), make sure the frequently deleted columns appear in the
ORDER BY clause of all projections. For example, to optimize for deletes on column "value", we can construct a projection as follows:

CREATE PROJECTION projection_t1 AS
SELECT id, name, value FROM table_1 ORDER BY value;

In general, the construction of projections is essential to a Vertica system. When projections are used properly, system performance (for both OLTP and OLAP workloads) can be increased significantly. But this also requires users to have a deep understanding of all the trade-offs behind this optimization mechanism.

3) Changing application to use INSERT-only logic

In Vertica, while concurrent UPDATEs and DELETEs lead to an exclusive-lock bottleneck, INSERTs perform quite smoothly without serious performance issues. Thus, it is good practice to apply only incremental actions. To achieve this, we can add a column to the table to mark the validity of each record. For records that need to be deleted or modified, just insert new ones with the latest timestamp and mark them as valid. Also, clean out-of-date records regularly to keep the table from getting too large. This insert-only method, however, is not universally applicable. It is a trade-off between performance and the complexity of the system.

E. The SAP HANA solution

SAP HANA, which is also a column-store, highly scalable database, adopts another approach called Record Life Cycle Management to provide efficient access for both transactional and analytical workloads […]. The idea behind this approach is to use different storage formats for the different stages of a table. A row-based data structure (called L1-delta) is first used to handle data inserts, deletes and updates; an intermediate column-store but unsorted structure (called L2-delta) is then applied to optimally support point queries and bulk loading; and finally a Main structure is used to store compressed, ordered and columnar data for OLAP workloads. Since HANA is a memory-centric database, all three data structures are stored in memory, with persistency mapping on disk.

The SAP HANA database is designed to deal with both OLTP and OLAP workloads by applying row-store and column-store data structures at the same time. When record inserts and updates are the main task of a table, it is recommended to turn off the Auto Delta Merge to ensure that data is stored in a row-oriented format. Unlike in Vertica, CPU cores and exclusive locks are no longer a problem, because for row-oriented data structures locks are row-based and the maximum concurrency equals the total number of threads on all nodes (instead of the physical cores on each node). However, high memory consumption becomes a critical problem when row-store tables get too large. In addition, the transformation from delta data to main-store columnar data is also a resource-intensive task. For big data applications, it is challenging to allocate memory for a balanced performance of both OLTP and OLAP workloads.

F. Summary

Both Vertica and SAP HANA did a remarkable job of implementing ACID transaction functionality in a distributed columnar database. Since the two systems adopt different mechanisms, their optimization methods also vary. For example, constructing sorted projections for frequently updated columns is a good method for Vertica, whereas disabling auto delta merge for tables under heavy modification can significantly speed up DML operations in SAP HANA. On the other hand, some similar ideas are shared in the optimization of transactions in MPP columnar databases. For instance, when transactional tasks are the main use of a table, it is better to keep the data in memory in a row-oriented format (WOS in Vertica and delta store in SAP HANA). To conclude, with proper optimization methods, MPP columnar databases have the potential to become an efficient platform for both OLAP and OLTP tasks.

[Figure 3: layered diagram. Applications (OLAP tasks, OLTP tasks, mixed tasks) connect through JDBC, ODBC and ad-hoc queries to the parallel data processing frame (MPP columnar database, Hadoop, etc.), which consists of a SQL interface, a high-speed analysis execution engine, a transaction manager, and in-memory and disk storage holding a column store and a row store.]

Fig. 3. A unified OLAP/OLTP architecture of big data system.

V. CONCLUSIONS

In our view, despite their excellent performance in fast data analysis, SQL-on-Hadoop systems generally lack the ability to deal with transactional tasks. They are not an appropriate solution for telecom companies that intend to build a mixed OLAP/OLTP big data system. MPP columnar databases, on the other hand, with both fast analysis ability and built-in ACID support, are currently a better choice. By testing several optimization methods on Vertica and SAP HANA, we can conclude that it is possible to further enhance the OLTP performance of MPP columnar databases. We also observe that some similar mechanisms are adopted in different MPP columnar databases. By combining these general features, we are able to design a unified data processing framework that processes analytical and highly concurrent transactional tasks efficiently within one system.

Fig. 3 shows the architecture of such a hybrid big data system. On the underlying storage layer, either row-oriented or column-oriented data are stored in memory or on disk in a distributed form to support both fast reading and fast writing. A high-speed execution engine is implemented above it to deal with OLAP-like workloads. A transaction manager is included
to support OLTP-like workloads. On the higher level, all kinds of applications run by different groups of users share a common SQL interface, which sends analytical or transactional tasks to the database and gets results from it. With the rapid development of the open-source community, we expect that mature transactional features will also be added to the Hadoop ecosystem in the future, and possibly to other big data frameworks as well. Within a few years, all big data systems will adopt a unified OLAP/OLTP data processing architecture.

REFERENCES

[1] Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandiver, Lyric Doshi and Chuck Bear, "The Vertica Analytic Database: C-Store 7 Years Later", Proceedings of the VLDB Endowment, Vol. 5, No. 12, 2012.
[2] Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker and Ion Stoica, "Shark: SQL and Rich Analytics at Scale", SIGMOD '13, June 22-27, 2013, New York, New York, USA.
[3] Marcel Kornacker, Alexander Behm, Victor Bittorf, Taras Bobrovytsky, Casey Ching, Alan Choi, et al., "Impala: A Modern, Open-Source SQL Engine for Hadoop", 7th Biennial Conference on Innovative Data Systems Research (CIDR '15), January 4-7, 2015, Asilomar, California, USA.
[4] Franz Färber, Norman May, Wolfgang Lehner, Philipp Große, Ingo Müller, Hannes Rauhe and Jonathan Dees, "The SAP HANA Database - An Architecture Overview", IEEE Data Eng. Bull. 35(1): 28-33, 2012.
[5] Shivnath Babu and Herodotos Herodotou, "Massively Parallel Databases and MapReduce Systems", Foundations and Trends in Databases, Vol. 5, No. 1 (2013) 1-104.
[6] Vishal Sikka, Franz Färber, Wolfgang Lehner, Sang Kyun Cha, Thomas Peh and Christof Bornhövd, "Efficient Transaction Processing in SAP HANA Database - The End of a Column Store Myth", SIGMOD '12, May 20-24, 2012, Scottsdale, Arizona, USA.
[7] Thomas Kyte and Darl Kuhn, Expert Oracle Database Architecture, 3rd ed., Chapter […], Apress, […].
[8] Hewlett Packard Enterprise, Vertica […].x Documentation. Available at http://my.vertica.com/docs/[…]


