Assignment of Information Technology: Submitted To: Submitted by
SUBMITTED TO: Ms. KAMAL
SUBMITTED BY: VIPAN
DATA WAREHOUSE
A data warehouse is a repository of an organization's electronically stored data. Means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.
According to Inmon, a well-known author of several books on data warehousing, "A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process."
Data warehousing arises from an organization's need for reliable, consolidated, unique and integrated reporting and analysis of its data at different levels of aggregation. The data warehousing consultant is charged with making the data appear consistent, integrated and consolidated despite the problems in the underlying source systems. The consultant achieves this by employing different data warehousing techniques and creating one or more new data repositories (i.e. the data warehouse) whose data model(s) support the needed reporting and analysis.
Key developments in the early years of data warehousing were:
1960s – General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.[3]
1970s – ACNielsen and IRI provide dimensional data marts for retail sales.[3]
1983 – Teradata introduces a database management system specifically designed for decision support.
1988 – Barry Devlin and Paul Murphy publish the article "An architecture for a business and information system" in IBM Systems Journal, where they introduce the term "business data warehouse".
1990 – Red Brick Systems introduces Red Brick Warehouse, a database management system specifically for data warehousing.
1991 – Prism Solutions introduces Prism Warehouse Manager, software for developing a data warehouse.
1991 – Bill Inmon publishes the book Building the Data Warehouse.
1994 – The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
1996 – Ralph Kimball publishes the book The Data Warehouse Toolkit.
1997 – Oracle 8, with support for star queries, is released.
1998 – Microsoft releases Microsoft Analysis Services (then OLAP Services), heavily utilizing data warehousing schemas.
Example:
Over the years, application designers in each branch have made their own decisions about how an application and its database should be built to store data. As a result, source systems differ in naming conventions, variable measurements, encoding structures, and physical attributes of data. Consider a bank that has several branches in several countries and millions of customers, and whose lines of business are savings and loans. The following example explains how data is integrated from source systems into a target system.
Example of Source Data
System Name | Attribute Name | Column Name | Data Type | Values
Source System 1 | Customer Application Date | … | … | …
Source System 2 | Customer Application Date | CUST_APPLICATION_DATE | DATE | 11012004
Source System 3 | Application Date | APPLICATION_DATE | DATE | 01NOV2004
In the example above, the attribute names, column names, data types and values differ from one source system to another. This inconsistency can be avoided by integrating the data into a data warehouse with good standards.
Example of Target Data (Data Warehouse)
Target System | Attribute Name | Column Name | Data Type | Values
Record #1 | Customer Application Date | CUSTOMER_APPLICATION_DATE | DATE | 01112004
Record #2 | Customer Application Date | CUSTOMER_APPLICATION_DATE | DATE | 01112004
Record #3 | Customer Application Date | CUSTOMER_APPLICATION_DATE | DATE | 01112004
In the above example of target data, attribute names, column names, and data types are consistent throughout the target system. This is how data from various source systems is integrated and accurately stored in the data warehouse.
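The integration step above can be sketched in Python. This is a minimal illustration, not a real ETL tool: the mapping table `SOURCE_MAPPINGS` and the function `integrate` are hypothetical, and the source date formats are assumptions inferred from the sample values (11012004 read as month-day-year, 01NOV2004 as day, month abbreviation, year). Source System 1 is left out because its column details are not given in the example.

```python
from datetime import date, datetime

# Hypothetical source-to-target mapping. The date formats are ASSUMPTIONS
# inferred from the sample values, not documented source-system formats.
SOURCE_MAPPINGS = {
    "Source System 2": ("CUST_APPLICATION_DATE", "%m%d%Y"),  # e.g. 11012004
    "Source System 3": ("APPLICATION_DATE", "%d%b%Y"),       # e.g. 01NOV2004
}
TARGET_COLUMN = "CUSTOMER_APPLICATION_DATE"


def integrate(system: str, source_row: dict) -> dict:
    """Rename the source column to the standard name and parse its value into a real date."""
    column, fmt = SOURCE_MAPPINGS[system]
    value: date = datetime.strptime(source_row[column], fmt).date()
    return {TARGET_COLUMN: value}


records = [
    integrate("Source System 2", {"CUST_APPLICATION_DATE": "11012004"}),
    integrate("Source System 3", {"APPLICATION_DATE": "01NOV2004"}),
]
for record in records:
    # Display in the DDMMYYYY form used in the target table.
    print(record[TARGET_COLUMN].strftime("%d%m%Y"))
```

Both records end up under one column name with one data type (a real date, 1 November 2004); the DDMMYYYY string is only a display choice.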
ARCHITECTURE
Data warehouse architecture is primarily based on the business processes of an enterprise. It takes into consideration data consolidation across the enterprise with adequate security, data modeling and organization, the extent of query requirements, metadata management and application, warehouse staging area planning for optimum bandwidth utilization, and full technology implementation. Data warehouse architecture has many facets. Some of these are:
Process Architecture
Data Model Architecture
Technology Architecture
Process Architecture
Process architecture describes the number of stages and how data is processed to convert raw or transactional data into information for end users. The data staging process includes three main areas of concern, or sub-processes, for planning a data warehouse architecture, namely 'Extract', 'Transform' and 'Load'. These interrelated sub-processes are sometimes referred to together as an 'ETL' process.
1) Extract: Since data for the data warehouse can come from different sources and may be of different types, planning how to extract the data, along with appropriate compression and encryption techniques, is an important requirement.
2) Transform: Transforming the data with appropriate conversion, aggregation and cleaning, as well as de-normalization and surrogate key management, is also an important process to plan when building a data warehouse.
3) Load: Planning how to load data efficiently, taking into account the multiple areas where the data will be loaded and retrieved, is also an important part of the data warehouse architecture plan.
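A toy version of the three ETL sub-processes, using Python's built-in `sqlite3` module as a stand-in warehouse, might look like the sketch below. The two source systems, their field names and the `customer_totals` table are invented for illustration, and compression and encryption during extraction are omitted.

```python
import sqlite3
from collections import defaultdict
from decimal import Decimal


def extract() -> list[dict]:
    """Extract: pull raw rows from two hypothetical source systems."""
    system_a = [{"cust": " alice ", "amount_cents": 12050}]
    system_b = [
        {"customer_name": "BOB", "amount": "75.25"},
        {"customer_name": "Alice", "amount": "10.00"},
    ]
    rows = [{"name": r["cust"], "cents": r["amount_cents"]} for r in system_a]
    # System B stores amounts as decimal strings; convert them to integer cents.
    rows += [
        {"name": r["customer_name"], "cents": int(Decimal(r["amount"]) * 100)}
        for r in system_b
    ]
    return rows


def transform(rows: list[dict]) -> list[tuple[int, str, int]]:
    """Transform: clean names, aggregate per customer, assign surrogate keys."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["name"].strip().title()] += row["cents"]  # cleaning + aggregation
    # Surrogate keys are warehouse-generated integers, independent of source IDs.
    return [
        (key, name, cents)
        for key, (name, cents) in enumerate(sorted(totals.items()), start=1)
    ]


def load(conn: sqlite3.Connection, rows: list[tuple[int, str, int]]) -> None:
    """Load: create the target table and insert all rows in one transaction."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customer_totals ("
            "customer_key INTEGER PRIMARY KEY, name TEXT NOT NULL, "
            "total_cents INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO customer_totals VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT * FROM customer_totals ORDER BY customer_key").fetchall())
```

Running it prints `[(1, 'Alice', 13050), (2, 'Bob', 7525)]`: the two spellings of Alice are cleaned into one customer, her amounts are aggregated, and each customer receives a surrogate key generated by the warehouse rather than taken from a source system.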
Data Model Architecture
3rd Normal Form: top-down architecture, top-down implementation
Federated Star Schemas: bottom-up architecture, bottom-up implementation
Data Vault: top-down architecture, bottom-up implementation
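The federated star schema style is built from star schemas: a central fact table of measures surrounded by dimension tables. A minimal SQLite sketch reusing the bank example (branches in several countries, savings and loans) might look like this; all table names, column names and balances are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the context of each fact.
    CREATE TABLE dim_branch  (branch_key  INTEGER PRIMARY KEY, country TEXT NOT NULL);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, line_of_business TEXT NOT NULL);
    -- The fact table holds the measures plus a foreign key to each dimension.
    CREATE TABLE fact_balance (
        branch_key    INTEGER NOT NULL REFERENCES dim_branch(branch_key),
        product_key   INTEGER NOT NULL REFERENCES dim_product(product_key),
        balance_cents INTEGER NOT NULL
    );
    INSERT INTO dim_branch  VALUES (1, 'India'), (2, 'UK');
    INSERT INTO dim_product VALUES (1, 'Savings'), (2, 'Loans');
    INSERT INTO fact_balance VALUES (1, 1, 50000), (1, 2, 20000), (2, 1, 30000), (2, 2, 10000);
    """
)

# A "star query": join the fact table to its dimensions, filter, and aggregate.
savings_by_country = conn.execute(
    """
    SELECT b.country, SUM(f.balance_cents)
    FROM fact_balance AS f
    JOIN dim_branch  AS b ON b.branch_key  = f.branch_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    WHERE p.line_of_business = 'Savings'
    GROUP BY b.country
    ORDER BY b.country
    """
).fetchall()
print(savings_by_country)  # [('India', 50000), ('UK', 30000)]
```

Keeping descriptive attributes in small dimension tables and measures in one large fact table is what makes such queries simple joins followed by aggregation.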
Technology Architecture
Technology (or technical) architecture is primarily derived from the process architecture, metadata management requirements based on business rules, security-level implementations, and the evaluation of specific technology tools. Besides these, technology architecture also covers the various technology implementation standards in database management, database connectivity protocols (ODBC, JDBC, OLE DB, etc.), middleware (based on ORB, RMI, COM/DCOM, etc.), network protocols (DNS, LDAP, etc.) and other related technologies.
Information Architecture
Information Architecture is the process of translating information from one form to another in a step-by-step sequence so as to manage the storage, retrieval, modification and deletion of the data in the data warehouse.
Resource Architecture
Resource architecture is related to software architecture in that many resources come from software resources. Resources are important because they help determine performance; workload is the other part of the equation. If the available resources can complete the workload within the required time, performance will be high. If there are not enough resources for the workload, performance will be low.
DATABASE
A database is an organized collection of data, managed by software (a database management system, or DBMS) that allows fast storage and retrieval of that data.
The term database was originally written as "data base", and it may have been first used in 1963 at a symposium sponsored by the System Development Corporation of Santa Monica, California. The use of "database" as a single word became popular in some European countries in the early 1970s, and it subsequently spread to the U.S.
"A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and images." A database can generally be looked at as a collection of records, each of which contains one or more fields (i.e., pieces of data) about some entity (i.e., object), such as a person, organization, city, product, work of art, recipe, chemical, or sequence of DNA. For example, the fields for a database about people who work for a specific company might include the name, employee identification number, address, telephone number, date employment started, position and salary of each worker.
TYPES
There are different types of database, but the most popular is the relational database, which stores data in tables where each row in the table holds the same sort of information.
Relational databases became popular in many applications because of their efficiency, ease of use, and ability to perform a variety of useful tasks that had not been originally envisioned.
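The employee example above maps naturally onto one relational table: each field becomes a column and each worker becomes a row. A minimal SQLite sketch is shown below; the column names, names and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One column per field from the employee example; one row per employee.
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,  -- employee identification number
        name        TEXT NOT NULL,
        address     TEXT,
        telephone   TEXT,
        start_date  TEXT,                 -- ISO-8601 string; SQLite has no dedicated date type
        position    TEXT,
        salary      INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?, ?)",
    [  # made-up sample rows
        (101, "A. Sharma", "12 Mall Road", "555-0101", "2008-04-01", "Clerk", 30000),
        (102, "R. Singh", "7 Park Street", "555-0102", "2010-07-15", "Manager", 55000),
    ],
)
# Every row has the same fields, so one query can filter across all employees.
high_earners = conn.execute(
    "SELECT name FROM employee WHERE salary > ? ORDER BY name", (40000,)
).fetchall()
print(high_earners)  # [('R. Singh',)]
```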
Figure: An accounting database
Data warehouse
A data warehouse stores data from current and previous years, extracted from the various operational databases of an organization. It becomes the central source of data that has been screened, edited, standardized and integrated so that it can be used by managers and other end-user professionals throughout an organization. Data warehouses are characterized by being slow to insert into but fast to retrieve from. Recent developments in data warehousing have led to the use of a shared-nothing architecture to facilitate extreme scaling.
common operational and common user databases, as well as data generated and used only at a user's own site.
A main memory (in-memory) database relies primarily on main memory for data storage. This contrasts with database management systems that employ a disk-based storage mechanism. Main memory databases are faster than disk-optimized databases because their internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory also provides faster and more predictable performance than disk. In applications where response time is critical, such as telecommunications network equipment that operates emergency systems, main memory databases are often used.
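SQLite can run either as a disk-based database or entirely in main memory (the special path `":memory:"`), which makes the storage difference easy to see. The sketch below only demonstrates where the data lives; it does not measure speed, and the helper names `write_one_row` and `count_rows` are hypothetical.

```python
import os
import sqlite3
import tempfile


def write_one_row(path: str) -> None:
    """Create table t (if needed) in the database at `path` and insert one row."""
    conn = sqlite3.connect(path)
    with conn:  # commit on success
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
    conn.close()


def count_rows(path: str) -> int:
    """Open a new connection to `path` and count rows in t (0 if t does not exist)."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    except sqlite3.OperationalError:  # "no such table: t"
        return 0
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    disk_path = os.path.join(tmp, "demo.db")
    write_one_row(disk_path)
    disk_rows = count_rows(disk_path)  # the row survived on disk

write_one_row(":memory:")
# Each ":memory:" connection is a fresh database held in RAM, gone once closed.
memory_rows = count_rows(":memory:")
print(disk_rows, memory_rows)  # 1 0
```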
ARCHITECTURE
A number of database architectures exist, and many databases use a combination of strategies. Databases consist of software-based "containers" that are structured to collect and store information so that users can retrieve, add, update or remove such information in an automatic fashion. Database programs are designed so that users can add or delete any information needed. The structure of a relational database is tabular, consisting of rows and columns of information. Online Transaction Processing (OLTP) systems often use a "row-oriented" or an "object-oriented" data store architecture, whereas data warehouses and other retrieval-focused applications, such as Google's Bigtable or bibliographic database (library catalog) systems, may use a column-oriented DBMS architecture.
Document-oriented, XML and knowledge-base databases, as well as frame databases and RDF stores (also known as triple stores), may also use a combination of these architectures in their implementation. Not all databases have or need a database schema ("schema-less databases"). Over many years, general-purpose database systems have dominated the database industry. These offer a wide range of functions, applicable to many, if not most, circumstances in modern data processing. They have been enhanced with extensible data types (pioneered in the PostgreSQL project) to allow development of a very wide range of applications.
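The difference between row-oriented and column-oriented storage can be illustrated with plain Python structures. This is a conceptual sketch only (real column stores also compress columns and lay them out on disk), and the account records are made up.

```python
# Hypothetical account records: (id, name, product, balance in cents).
rows = [
    (1, "Alice", "Savings", 13050),
    (2, "Bob", "Loans", 7525),
    (3, "Chen", "Savings", 4000),
]

# Row-oriented: each record's fields are kept together, which suits OLTP work
# such as reading or updating one whole account.
row_store = list(rows)

# Column-oriented: each attribute's values are kept together, which suits
# analytic queries that scan a few columns across many records.
column_store = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "product": [r[2] for r in rows],
    "balance": [r[3] for r in rows],
}

# OLTP-style lookup: one record with all its fields, found as one tuple.
account_2 = next(r for r in row_store if r[0] == 2)

# Analytic query: the total balance touches only the "balance" column.
total_balance = sum(column_store["balance"])
print(account_2, total_balance)  # (2, 'Bob', 'Loans', 7525) 24575
```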
There are also other types of databases which cannot be classified as relational databases. Most notable is the object database management system, which stores language objects natively without using a separate data definition language and without translating into a separate storage schema. Unlike relational systems, these object databases store the relationship between complex data types as part of their storage model in a way that does not require runtime calculation of related data using relational algebra execution algorithms.
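Python's standard `shelve` module is not an object database management system, but it gives a rough feel for the idea: objects are stored as they are, with no separate data definition language, and a related object comes back together with its owner without any join. The `Branch` and `Customer` classes and the account labels are hypothetical. One limitation of this analogy: `shelve` pickles each stored value separately, so object identity shared across different keys is not preserved, which a real ODBMS would handle.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field


@dataclass
class Branch:
    name: str


@dataclass
class Customer:
    name: str
    branch: Branch  # a direct object reference, not a foreign key
    accounts: list = field(default_factory=list)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "objects")
    with shelve.open(path) as db:
        # No CREATE TABLE: the object graph is stored as-is (via pickle).
        db["alice"] = Customer("Alice", Branch("Delhi"), ["SAV-1", "LOAN-7"])
    with shelve.open(path) as db:
        alice = db["alice"]

# The related Branch object was stored with the customer, so no join is computed.
print(alice.branch.name, alice.accounts)  # Delhi ['SAV-1', 'LOAN-7']
```

The classes must be importable when the shelf is read back (here they are defined in the same script), because unpickling recreates instances of them.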
Components of DBMS
According to the Wikibooks open-content textbook "Design of Main Memory Database System/Overview of DBMS", most DBMSs as of 2009 implement a relational model. Other, less-used DBMS systems, such as the object DBMS, generally operate in areas of application-specific data management where performance and scalability take higher priority than the flexibility of ad hoc query capabilities provided via the relational-algebra execution algorithms of a relational DBMS.
RDBMS components
Interface drivers: A user or application program initiates either schema modification or content modification. These drivers are built on top of SQL. They provide methods to prepare statements, execute statements, fetch results, etc. Examples include the SQL sublanguages DDL, DCL and DML, and driver APIs such as ODBC and JDBC. Some vendors provide language-specific proprietary interfaces; for example, MySQL and Firebird provide drivers for PHP, Python, etc.
SQL engine: This component interprets and executes the SQL query. It comprises three major components (compiler, optimizer, and execution engine).
Transaction engine: Transactions are sequences of operations that read or write database elements and are grouped together as a single unit.
Relational engine: Relational objects such as tables, indexes, and referential integrity constraints are implemented in this component.
Storage engine: This component stores and retrieves data records. It also provides a mechanism to store metadata and control information such as undo logs, redo logs, lock tables, etc.
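Python's `sqlite3` module can stand in for an interface driver: it sends DDL and DML to the SQL engine, prepares each parameterized statement internally, executes it and fetches the results. The sketch below also shows the transaction engine grouping two updates so that both happen or neither does. The `account` table and the `transfer` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL (schema modification), including a constraint the engine enforces.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
# DML (content modification) with parameter placeholders.
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` between accounts as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK failed, so the whole transfer was undone
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds: balances become 70 and 80
failed = transfer(conn, 2, 1, 500)  # would make account 2 negative, so it is rolled back
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)  # True False [(1, 70), (2, 80)]
```

In the failed transfer the first update (crediting account 1) had already run; the rollback removes it, which is exactly the grouping the transaction engine provides.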
ODBMS components
Language drivers: A user or application program initiates either schema modification or content modification via the chosen programming language. The drivers then provide the mechanism to manage the object life cycle, coupling the application memory space with the underlying persistent storage. Examples include C++, Java, .NET, and Ruby.
Query engine: This component interprets and executes language-specific query commands in the form of OQL, LINQ, JDOQL, JPAQL and others. The query engine returns language-specific collections of objects which satisfy a query predicate expressed with logical operators, e.g. <, >, <=, >=, AND, OR, NOT, GROUP BY, etc.
Transaction engine: Transactions are sequences of operations that read or write database elements and are grouped together as a single unit. The transaction engine is concerned with such things as data isolation and consistency in the driver cache and data volumes, which it achieves by coordinating with the storage engine.
Storage engine: This component stores and retrieves objects in an arbitrarily complex model. It also provides a mechanism to manage and store metadata and control information such as undo logs, redo logs, lock graphs,