Unit 2 (Data Warehousing)
[Figure: sample 2-D and 3-D sales data tables; the exact values are not recoverable from the scan.]

Suppose we would like to view the sales data with a fourth dimension, such as supplier. Viewing things in 4-D becomes tricky; however, we can think of a 4-D cube as being a series of 3-D cubes. Continuing in this way, we may display any n-dimensional data as a series of (n-1)-dimensional "cubes."

The data cube is a metaphor for multidimensional data storage. The actual physical storage of such data may differ from its logical representation. The important thing to remember is that data cubes are n-dimensional and do not confine data to 3-D.

In data warehousing, a data cube such as the ones above is often referred to as a cuboid. Given a set of dimensions, we can generate a cuboid for each of the possible subsets of the given dimensions. The result would form a lattice of cuboids, each showing the data at a different level of summarization, or group-by. The lattice of cuboids is then referred to as a data cube.

[Figure: lattice of cuboids making up a 4-D data cube for the dimensions time, item, location, and supplier, ranging from the 0-D (apex) cuboid, labeled "all," down to the 4-D (base) cuboid.]

Each cuboid represents a different degree of summarization. The cuboid that holds the lowest level of summarization is called the base cuboid; for the dimensions time, item, location, and supplier, this is the 4-D cuboid. The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid; it gives the total sales summarized over all four dimensions and is typically denoted by "all."
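The lattice of cuboids described above can be sketched in a few lines of Python. The dimension names follow the running example; the code simply enumerates every subset of the dimensions, giving 2^n cuboids for n dimensions:

```python
from itertools import combinations

# Enumerate the lattice of cuboids for an n-dimensional data cube.
# Each cuboid corresponds to one subset of the dimensions.
def lattice_of_cuboids(dimensions):
    cuboids = []
    for k in range(len(dimensions) + 1):
        for subset in combinations(dimensions, k):
            cuboids.append(subset)  # () is the 0-D apex cuboid
    return cuboids

dims = ["time", "item", "location", "supplier"]
cuboids = lattice_of_cuboids(dims)
print(len(cuboids))   # 2^4 = 16 cuboids
print(cuboids[0])     # () -> the apex cuboid ("all")
print(cuboids[-1])    # the 4-D base cuboid
```

The count grows exponentially with the number of dimensions, which is why real systems rarely materialize the full lattice.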
SCHEMAS FOR MULTIDIMENSIONAL DATA MODELS

The entity-relationship data model is commonly used in the design of relational databases, where a database schema consists of a set of entities and the relationships between them. Such a data model is appropriate for online transaction processing. A database schema is a logical blueprint of the entire database; it includes the names and descriptions of all record types, including all associated data items and aggregates.

A data warehouse, however, requires a concise, subject-oriented schema that facilitates online data analysis. The most popular data model for a data warehouse is a multidimensional model, which can exist in the form of a star schema, a snowflake schema, or a fact constellation schema.

1. Star Schema

- The most common modeling paradigm is the star schema, in which the data warehouse contains (1) a large central table (the fact table) containing the bulk of the data, with no redundancy, and (2) a set of smaller attendant tables (dimension tables), one for each dimension. The schema graph resembles a starburst, with the dimension tables displayed in a radial pattern around the central fact table.
- Each dimension in a star schema is represented with only one dimension table.
- This dimension table contains the set of attributes.
- The following diagram shows the sales data of a company with respect to the four dimensions, namely time, item, branch, and location.
- There is a fact table at the center. It contains the keys to each of the four dimensions. The fact table also contains the measures, namely dollars_sold and units_sold.

[Figure: star schema for sales. The sales fact table (time_key, item_key, branch_key, location_key, dollars_sold, units_sold) is surrounded by four dimension tables: time, item (item_key, item_name, brand, type, supplier_type), branch, and location (location_key, street, city, province_or_state, country).]

Note: Each dimension has only one dimension table, and each table holds a set of attributes. For example, the location dimension table contains the attribute set {location_key, street, city, province_or_state, country}. This constraint may cause data redundancy.
For example, "Vancouver" and "Victoria" are both cities in the Canadian province of British Columbia. The entries for such cities may cause data redundancy along the attributes province_or_state and country.

2. Snowflake Schema

- Some dimension tables in the snowflake schema are normalized.
- The normalization splits up the data into additional tables.
- Unlike the star schema, the dimension tables in a snowflake schema are normalized. For example, the item dimension table of the star schema is normalized and split into two dimension tables, namely the item and supplier tables. The resulting schema graph forms a shape similar to a snowflake.
- Now the item dimension table contains the attributes item_key, item_name, type, brand, and supplier_key.
- The supplier_key is linked to the supplier dimension table. The supplier dimension table contains the attributes supplier_key and supplier_type.

Note: Due to normalization in the snowflake schema, the redundancy is reduced; therefore, it becomes easy to maintain and saves storage space.
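As a rough sketch of how a star schema is queried, the following uses an in-memory SQLite database with a fact table and one dimension table from the sales example above. The row values are made up for illustration; a typical query constrains a dimension attribute and aggregates a measure from the fact table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Dimension table: one row per location, descriptive attributes.
cur.execute("CREATE TABLE location (location_key INTEGER PRIMARY KEY, "
            "city TEXT, country TEXT)")
# Fact table: foreign keys to the dimensions plus the measures.
cur.execute("CREATE TABLE sales (time_key INT, item_key INT, branch_key INT, "
            "location_key INT, dollars_sold REAL, units_sold INT)")
cur.execute("INSERT INTO location VALUES (1, 'Vancouver', 'Canada')")
cur.execute("INSERT INTO sales VALUES (10, 20, 30, 1, 605.0, 825)")

# Star-schema query: join fact to dimension, group by a dimension attribute.
row = cur.execute("""
    SELECT l.country, SUM(s.dollars_sold)
    FROM sales s JOIN location l ON s.location_key = l.location_key
    GROUP BY l.country
""").fetchone()
print(row)  # ('Canada', 605.0)
```

In a snowflake schema the same query would need one extra join per normalized dimension table, which is the usual performance argument for the plain star schema.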
[Figure: snowflake schema for sales, with the item dimension table split into separate item and supplier tables.]

3. Fact Constellation Schema

Sophisticated applications may require multiple fact tables to share dimension tables. This kind of schema can be viewed as a collection of stars, and hence it is called a galaxy schema or a fact constellation.

- The sales fact table is the same as that in the star schema.
- The shipping fact table has five dimensions, namely item_key, time_key, shipper_key, from_location, and to_location.
- The shipping fact table also contains two measures, namely dollars_cost and units_shipped.
- It is also possible to share dimension tables between fact tables. For example, the time, item, and location dimension tables are shared between the sales and shipping fact tables.

[Figure: fact constellation schema. The sales fact table (time_key, item_key, branch_key, location_key, dollars_sold, units_sold) and the shipping fact table (item_key, time_key, shipper_key, from_location, to_location, dollars_cost, units_shipped) share the time, item, and location dimension tables; the shipper dimension table holds shipper_key and shipper_type.]

NOTE

- In data warehousing, there is a distinction between a data warehouse and a data mart.
- A data warehouse collects information about subjects that span the entire organization, such as customers, items, sales, assets, and personnel, and thus its scope is enterprise-wide.
- For data warehouses, the fact constellation schema is commonly used, since it can model multiple, interrelated subjects.
- A data mart, on the other hand, is a department subset of the data warehouse that focuses on selected subjects, and thus its scope is department-wide.
- For data marts, the star or snowflake schema is commonly used, since both are geared towards modeling single subjects, although the star schema is more popular and efficient.

ROLE OF CONCEPT HIERARCHIES

A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. Consider a concept hierarchy for the dimension location. City values for location include Vancouver, Toronto, New York, and Chicago. Each city, however, can be mapped to the province or state to which it belongs. For example, Vancouver can be mapped to British Columbia, and Chicago to Illinois. The provinces and states can in turn be mapped to the country to which they belong, such as Canada or the United States.
These mappings form a concept hierarchy for the dimension location, mapping a set of low-level concepts (i.e., cities) to higher-level, more general concepts (i.e., countries).

[Figure: a concept hierarchy for location: all < country < province_or_state < city.]

Many concept hierarchies are implicit within the database schema. For example, suppose that the dimension location is described by the attributes number, street, city, province_or_state, zip_code, and country. These attributes are related by a total order, forming a concept hierarchy such as street < city < province_or_state < country.

Alternatively, the attributes of a dimension may be organized in a partial order, forming a lattice. An example for the time dimension, based on the attributes day, week, month, quarter, and year, is day < {month < quarter; week} < year.
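The location hierarchy above can be represented directly as a pair of dictionaries. A minimal sketch, with the city-to-province and province-to-country mappings taken from the running example:

```python
# Concept hierarchy for location: city -> province_or_state -> country.
city_to_province = {"Vancouver": "British Columbia",
                    "Victoria": "British Columbia",
                    "Chicago": "Illinois",
                    "New York": "New York"}
province_to_country = {"British Columbia": "Canada",
                       "Illinois": "USA",
                       "New York": "USA"}

def generalize(city):
    """Map a low-level concept (a city) up to a high-level one (a country)."""
    return province_to_country[city_to_province[city]]

print(generalize("Vancouver"))  # Canada
print(generalize("Chicago"))    # USA
```

Climbing such a mapping one level at a time is exactly what the roll-up operation does in the OLAP section below.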
A concept hierarchy that is a total or partial order among attributes in a database schema is called a schema hierarchy. Concept hierarchies that are common to many applications (e.g., for time) may be predefined in the data mining system.

Concept hierarchies may also be defined by discretizing or grouping values for a given dimension or attribute, resulting in a set-grouping hierarchy. A total or partial order can be defined among groups of values. For example, for the dimension price, an interval ($X ... $Y] denotes the range from $X (exclusive) to $Y (inclusive).

There may be more than one concept hierarchy for a given attribute or dimension, based on different user viewpoints. For instance, a user may prefer to organize price by defining ranges for inexpensive, moderately_priced, and expensive.

Concept hierarchies may be provided manually by system users, domain experts, or knowledge engineers, or may be automatically generated based on statistical analysis of the data distribution.

OLAP OPERATIONS IN THE MULTIDIMENSIONAL DATA MODEL

In the multidimensional model, the records are organized into various dimensions, and each dimension includes multiple levels of abstraction described by concept hierarchies. This organization supports users with the flexibility to view data from various perspectives. A number of OLAP data cube operations exist to demonstrate these different views, allowing interactive querying and analysis of the data at hand. Hence, OLAP supports a user-friendly environment for interactive data analysis.

Consider the OLAP operations which are to be performed on multidimensional data. The figure shows a data cube for the sales of a shop. The cube contains the dimensions location, time, and item, where location is aggregated with regard to city values, time is aggregated with respect to quarters, and item is aggregated with respect to item types.

1. Roll-Up

The roll-up operation (also known as the drill-up or aggregation operation) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. Roll-up is like zooming out on the data cube. The figure shows the result of a roll-up operation performed on the dimension location. The hierarchy for location is defined as the order street < city < province_or_state < country. The roll-up operation aggregates the data by ascending the location hierarchy from the level of the city to the level of the country.
When a roll-up is performed by dimension reduction, one or more dimensions are removed from the cube. For example, consider a sales data cube having two dimensions, location and time. Roll-up may be performed by removing the time dimension, resulting in an aggregation of the total sales by location, rather than by location and by time.

Example: consider the following cube illustrating the temperature of certain days recorded weekly:

[Table: counts of recorded temperatures in the range 64 to 85 degrees, one row per week; the exact values are not recoverable from the scan.]

Consider that we want to set up levels hot (80-85), mild (70-75), and cool (64-69) in temperature from the above cube. To do this, we have to group the columns and add up the values according to the concept hierarchy. This operation is known as a roll-up.

[Figure: the first roll-up groups the information by temperature level; a second roll-up on location aggregates from cities (Chicago, New York, Toronto, Vancouver) to countries, over the item types Mobile, Modem, Phone, and Security, per quarter.]

2. Drill-Down

The drill-down operation (also called roll-down) is the reverse operation of roll-up. Drill-down navigates from less detailed records to more detailed data. Drill-down can be realized by either stepping down a concept hierarchy for a dimension or adding additional dimensions. The figure shows a drill-down operation performed on the dimension time by stepping down a concept hierarchy which is defined as day, month, quarter, and year. Drill-down descends the time hierarchy from the level of the quarter to the more detailed level of the month.

Example: drill-down adds more details to the given data, e.g., breaking the weekly temperature readings down to individual days.

[Figure: drill-down on time (from quarters to months) for the sales cube over the cities New York, Toronto, and Vancouver.]
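Both roll-up variants described above (climbing a concept hierarchy and dimension reduction) can be sketched in plain Python. The city names follow the running example, but the sales figures are made up; drill-down would be the inverse operation and requires keeping the finer-grained base data around:

```python
from collections import defaultdict

# Base cuboid stored as {(city, quarter): units_sold} with illustrative values.
base = {("Vancouver", "Q1"): 825, ("Vancouver", "Q2"): 769,
        ("Toronto",   "Q1"): 605, ("Toronto",   "Q2"): 512}
city_to_country = {"Vancouver": "Canada", "Toronto": "Canada"}

def roll_up_location(cuboid):
    """Roll-up by climbing the location hierarchy: city -> country."""
    out = defaultdict(int)
    for (city, quarter), units in cuboid.items():
        out[(city_to_country[city], quarter)] += units
    return dict(out)

def roll_up_drop_time(cuboid):
    """Roll-up by dimension reduction: remove the time dimension entirely."""
    out = defaultdict(int)
    for (city, _quarter), units in cuboid.items():
        out[city] += units
    return dict(out)

print(roll_up_location(base))   # {('Canada', 'Q1'): 1430, ('Canada', 'Q2'): 1281}
print(roll_up_drop_time(base))  # {'Vancouver': 1594, 'Toronto': 1117}
```

Either way the operation is a group-by plus a sum; only the grouping key changes.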
3. Slice

A slice is a subset of the cube corresponding to a selection on one dimension. The slice operation performs a selection on one dimension of the given cube, resulting in a subcube. For example, a slice operation is executed when the customer wants a selection on one dimension of a three-dimensional cube, resulting in a two-dimensional slice.

The following diagram illustrates how slice works.

[Figure: slice for time = "Q1" over the cities Chicago, New York, Toronto, and Vancouver and the item types Mobile, Modem, Phone, and Security.]

- It forms a new subcube by selecting one dimension.
- Here, slice is performed on the dimension time using the criterion time = "Q1".

4. Dice

The dice operation defines a subcube by performing a selection on two or more dimensions. For example, implement the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot) on the original cube to get the diced subcube.

The dice operation on the cube based on the following selection criteria involves three dimensions:

- (location = "Toronto" or "Vancouver")
- (time = "Q1" or "Q2")
- (item = "Mobile" or "Modem")

5. Pivot

The pivot operation is also called rotate. Pivot is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data. It may, for example, turn the row dimensions into the column dimensions.

[Figure: pivot of a 2-D slice. The view with locations (cities) on the rows and item types on the columns is rotated so that the item types (Mobile, Modem, Phone, Security) appear on the rows and the cities (Chicago, New York, Toronto, Vancouver) on the columns.]
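The slice, dice, and pivot operations above reduce to simple filters and key rewrites on a cube stored as a dictionary. A minimal sketch; the cube values are made up for illustration:

```python
# Base cuboid stored as {(location, quarter, item): dollars_sold}.
cube = {("Toronto", "Q1", "Mobile"): 605, ("Toronto", "Q2", "Modem"): 512,
        ("Vancouver", "Q1", "Phone"): 825, ("Chicago", "Q1", "Mobile"): 440}

def slice_cube(cube, quarter):
    """Slice: select on ONE dimension (time), yielding a 2-D subcube."""
    return {(loc, item): v for (loc, q, item), v in cube.items() if q == quarter}

def dice_cube(cube, locations, quarters, items):
    """Dice: select on TWO OR MORE dimensions, yielding a subcube."""
    return {k: v for k, v in cube.items()
            if k[0] in locations and k[1] in quarters and k[2] in items}

def pivot(view_2d):
    """Pivot (rotate): swap the row and column axes of a 2-D view."""
    return {(col, row): v for (row, col), v in view_2d.items()}

q1 = slice_cube(cube, "Q1")
print(q1)
print(dice_cube(cube, {"Toronto", "Vancouver"}, {"Q1", "Q2"}, {"Mobile", "Modem"}))
print(pivot(q1))  # item types become rows, cities become columns
```

Note that slice and dice never aggregate anything; unlike roll-up, they only restrict which cells of the cube remain visible.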
Other OLAP Operations

Other OLAP operations may include ranking the top-N or bottom-N elements in lists, as well as computing moving averages, growth rates, interests, internal rates of return, depreciation, currency conversions, and statistical functions.

OLAP offers analytical modeling capabilities, containing a calculation engine for determining ratios, variances, etc., and for computing measures across various dimensions. It can generate summarizations, aggregations, and hierarchies at each granularity level and at every dimension intersection. OLAP also provides functional models for forecasting, trend analysis, and statistical analysis. In this context, the OLAP engine is a powerful data analysis tool.

DATA WAREHOUSE ARCHITECTURE: THREE-TIER ARCHITECTURE

A data warehouse refers to a data repository that is maintained separately from an organization's operational databases. Data warehouse architecture contains the following tiers:

- Bottom Tier
- Middle Tier
- Top Tier

[Figure: three-tier data warehouse architecture. The top tier holds the front-end query, report, analysis, and data mining tools; the middle tier holds the OLAP server; the bottom tier holds the data warehouse and data marts, fed through ETL from operational databases and external sources.]

Bottom Tier (Data sources and data storage):

1. The bottom tier usually consists of data sources and data storage.
2. It is a warehouse database server, for example an RDBMS.
3. In the bottom tier, using application program interfaces (called gateways), data is extracted from operational and external sources.
4. Application program interfaces like ODBC (Open Database Connectivity), OLE-DB (Object Linking and Embedding for Databases), and JDBC (Java Database Connectivity) are supported.

ETL stands for Extract, Transform, and Load. Several popular ETL tools include:

I. IBM InfoSphere
II. Informatica
III. Confluent
IV. Microsoft SSIS
V. SnapLogic
VI. Alooma

Middle Tier:

The middle tier is an OLAP server that is typically implemented using either

- a relational OLAP (ROLAP) model (i.e., an extended relational DBMS that maps operations on multidimensional data to standard relational operations); or
- a multidimensional OLAP (MOLAP) model (i.e., a special-purpose server that directly implements multidimensional data and operations).

OLAP server models come in three different categories:

1. ROLAP: In relational online analytical processing (ROLAP), data is effectively broken down into several dimensions. This is used when everything contained in the repository is a relational database system.
2. MOLAP: Multidimensional online analytical processing (MOLAP) includes directories and catalogs that are immediately integrated into its multidimensional database system. This is used when all that is contained in the repository is the multidimensional database system.
3. HOLAP: Hybrid online analytical processing (HOLAP) is a combination of the relational and multidimensional paradigms. HOLAP is the ideal option for a seamless functional flow across the database systems when the repository houses both a relational database management system and a multidimensional database management system.

Top Tier:

The top tier is a front-end client layer, which includes query and reporting tools, analysis tools, and/or data mining tools (e.g., trend analysis, prediction, etc.). Here are a few top-tier tools that are often used:

- SAP BW
- SAS Business Intelligence
- IBM Cognos
- Crystal Reports
- Microsoft BI Platform

Advantages of Multi-Tier Architecture:

1. Scalability: Various components can be added, deleted, or updated in accordance with the data warehouse's shifting needs and specifications.
2. Better Performance: The several layers enable parallel and efficient processing, which enhances performance and reaction times.
3. Modularity: The architecture supports modular design, which facilitates the creation, testing, and deployment of separate components.
4. Security: The data warehouse's overall security can be improved by applying various security measures to the various layers.
5. Improved Resource Management: Different tiers can be tuned to use the appropriate hardware resources, cutting overall expenses and increasing effectiveness.
6. Easier Maintenance: Maintenance is simpler because individual components can be updated or maintained without affecting the data warehouse as a whole.
7. Improved Reliability: Using many tiers can offer redundancy and failover capabilities, enhancing the data warehouse's overall reliability.

DATA WAREHOUSE MODELS

From the perspective of data warehouse architecture, we have the following data warehouse models:

- Virtual Warehouse
- Data Mart
- Enterprise Warehouse

Enterprise Warehouse:

- An enterprise warehouse collects all information on topics spread throughout the organization.
- It provides corporate-wide data integration, typically from one or several operational systems or external information providers, and is cross-functional in scope.
- It usually contains detailed data as well as summarized data and can range in size from a few gigabytes to hundreds of gigabytes, terabytes, or beyond.
- It has been implemented on traditional mainframes, computer superservers, or parallel architecture platforms. It requires extensive business modeling and may take years to design and build.

Data Mart:

- A data mart contains a subset of corporate-wide data that is important to a specific group of users.
- The scope is limited to specific selected subjects.
- For example, a marketing data mart may limit its topics to customers, goods, and sales.
- The data contained in data marts tend to be summarized.
- Data marts are typically deployed on low-cost departmental servers that are Unix/Linux or Windows-based.
- The implementation cycle of a data mart is more likely to be measured in weeks rather than months or years.
However, it may involve complex integration in the long run if its design and planning were not enterprise-wide.

Virtual Warehouse:

- A virtual warehouse is a group of views over an operational database.
- For efficient query processing, only a few of the possible summary views may be materialized.
- Creating a virtual warehouse is easy, but it requires additional capacity on the operational database servers.

WHAT ARE THE PROS AND CONS OF THE TOP-DOWN AND BOTTOM-UP APPROACHES TO DATA WAREHOUSE DEVELOPMENT?

- The top-down development of an enterprise warehouse serves as a systematic solution and minimizes integration problems. However, it is expensive, takes a long time to develop, and lacks flexibility due to the difficulty in achieving consistency and consensus on a common data model for the entire organization.
- The bottom-up approach to the design, development, and deployment of independent data marts provides flexibility, low cost, and rapid return of investment. It can, however, lead to problems when integrating various disparate data marts into a consistent enterprise data warehouse.
- A recommended method for the development of data warehouse systems is to implement the warehouse in an incremental and evolutionary manner. First, a high-level corporate data model is defined within a reasonably short period (such as one or two months) that provides a corporate-wide, consistent, integrated view of data.

[Figure: conversion of differently formatted source product numbers (e.g., "F/No. 1", "Prod. No. 2") into a single standardized code.]

The following points must be rectified in the transformation phase:

- Loose texts may hide valuable information. For example, "XYZ PVT Ltd" does not explicitly show that this is a Limited Partnership company.
- Different formats can be used for individual data. For example, a date can be saved as a string or as three integers.
- Matching that associates equivalent fields in different sources.
- Selection that reduces the number of source fields and records.
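The conversion and matching steps listed above can be sketched as small normalization helpers. The input formats and the "PROD-" target code are made up for illustration:

```python
import datetime

def to_iso_date(value):
    """Conversion: dates may arrive as a 'd/m/y' string or as three integers."""
    if isinstance(value, str):                      # e.g. "25/12/2023"
        d, m, y = (int(p) for p in value.split("/"))
    else:                                           # e.g. (25, 12, 2023)
        d, m, y = value
    return datetime.date(y, m, d).isoformat()

def normalize_product_no(raw):
    """Conversion: map source-specific product numbers to one standard code."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return f"PROD-{int(digits):04d}"

print(to_iso_date("25/12/2023"))          # 2023-12-25
print(to_iso_date((25, 12, 2023)))        # 2023-12-25
print(normalize_product_no("F/No. 12"))   # PROD-0012
print(normalize_product_no("Prod#12"))    # PROD-0012
```

Once every source emits the same canonical forms, the matching step (associating equivalent fields across sources) becomes a simple equality join on the normalized values.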
Cleansing and transformation processes are often closely linked in ETL tools.

3. Loading

It is the process of writing the data into the target database. During the load step, it is necessary to ensure that the load is performed correctly and with as few resources as possible.

Loading can be carried out in two ways:

1. Refresh: Data warehouse data is completely rewritten. This means that the older data is replaced. Refresh is usually used in combination with static extraction to initially populate a data warehouse.
2. Update: Only those changes applied to the source information are added to the data warehouse. An update is typically carried out without deleting or modifying preexisting data. This method is used in combination with incremental extraction to update data warehouses regularly.

Transforming the existing data is the process of converting, summarizing, and integrating it from different data formats into a consistent state so that it can be placed into the warehouse.

DATA QUALITY

What is Data Quality?

Data quality is defined as the degree to which data meets a company's expectations of accuracy, validity, completeness, and consistency. By tracking data quality, a business can pinpoint potential issues harming quality, and ensure that shared data is fit to be used for a given purpose.

When collected data fails to meet the company's expectations of accuracy, validity, completeness, and consistency, it can have massive negative impacts on customer service, employee productivity, and key initiatives.

Why Is Data Quality Important?

Quality data is key to making accurate, informed decisions. While all data has some level of "quality," a variety of characteristics and factors determines the degree of data quality (high-quality versus low-quality). Furthermore, different data quality characteristics will likely be more important to various stakeholders across the organization.
A list of popular data quality characteristics and dimensions includes:

- Accuracy
- Completeness
- Consistency
- Integrity
- Reasonability
- Timeliness
- Uniqueness/Deduplication
- Validity
- Accessibility

Because data accuracy is a key attribute of high-quality data, a single inaccurate data point can wreak havoc across the entire system. Without accuracy and reliability in data quality, executives cannot trust the data or make informed decisions. This can, in turn, increase operational costs and wreak havoc for downstream users. Analysts wind up relying on imperfect reports and making misguided conclusions based on those findings. And the productivity of end-users will diminish due to flawed guidelines and practices being in place.

Poorly maintained data can lead to a variety of other problems, too. For example, out-of-date customer information may result in missed opportunities for up- or cross-selling products and services. Low-quality data might also cause a company to ship its products to the wrong addresses, resulting in lowered customer satisfaction ratings, decreases in repeat sales, and higher costs due to reshipment.

And in more highly regulated industries, bad data can result in the company receiving fines for improper financial or regulatory compliance reporting.

Characteristics of Data Quality:

1. Accuracy: The collected data must be accurate; it should record real, correct values and reflect real-world objects and events. To compute an accuracy score, the values should be verified against a trusted, correct source.
2. Completeness: Completeness measures the data's ability to deliver all the mandatory values that are available.
3. Consistency: Data consistency describes the data's uniformity as it moves across applications and networks and when it comes from multiple sources.
Consistency also means that the same data kept in different places should not conflict with one another.

4. Uniqueness: Uniqueness means that no duplications or redundant records exist across the dataset. No record in the dataset should exist multiple times. Analysts use data cleansing and deduplication to help address a low uniqueness score.
5. Timeliness: The data must be up to date and available when it is expected and needed.
6. Validity: In this context, the information should conform to its accepted format, and all dataset values should fall within the proper range.
7. Relevancy: The data should be applicable and useful in the context in which it is used.
8. Accessibility: The extent to which the data can actually be retrieved easily and quickly.
9. Objectivity: The extent to which the data is unbiased.
10. Clarity: Clarity is achieved by conciseness; it helps to make the data easily understood by others.

DATA QUALITY CHALLENGES

Managing the data structure and optimization

There are many ways to process data; one is to structure it in a way that will aid your future operations. As you add more and more data to your warehouse, structuring becomes increasingly difficult and can slow down the ETL process. Also, it becomes increasingly difficult for system managers to qualify the data for advanced analytics. In terms of system optimization, it's important to carefully design and configure data analysis tools that are better suited to business needs.

Managing user expectations

As more information gets loaded into a data warehouse, management systems struggle more to find and analyze it. This means that business users expect refined and relevant results from any analysis they run. However, data warehouse performance can decrease as the data volume increases, which inevitably leads to reduced speed and efficiency. It's your job to manage the expectations of your team so that they aren't frustrated when the buffering occurs.

The costs of data warehousing

A common problem with traditional data warehouses is the high failure rate.
According to a Gartner report, more than 50% of data warehouses fail at some point, not only because of the technical challenges and complex architecture but also because the projects fail to meet user requirements.

Organizations then face the same challenges when trying to update a data warehouse to accommodate new reporting requirements or data models. Even if such projects don't fail, they have high costs and timelines. All these factors make traditional data warehouses inadequate for real-time data requirements and scalability.

On the other hand, if you go with a cloud-based data warehouse, all the maintenance rests on the cloud provider, while the cost is determined by the GBs used per month. Snowflake, for example, even has a flat rate of $23/TB/month. Google BigQuery's active storage costs $0.02 per GB per month, with the first 10 GB free each month.

Data quality

Maintaining quality data is difficult in a traditional data warehouse, where manual errors and missed updates lead to corrupt or obsolete data. This inevitably impacts business decisions and causes inaccurate data processing.

As businesses increasingly adopt digital transformation initiatives, they often run into the problem of unintended data silos. This occurs when departments heavily rely on cloud tools, accompanied by the democratization of technology, where each department is more likely to be responsible for purchasing and developing technologies for its own use.

Each of these silos represents another source system from which users need to pull, integrate, and analyze data to use it correctly in decision-making. To make matters worse, silos often don't follow the same set of business-wide standards, making data integration even more difficult.

And due to the democratization of cloud technologies, your organization might even have valuable data silos that IT doesn't know about.

Modern warehousing solutions can automate the data quality process, preventing data silos, outliers, manual errors, redundancy, and other data inconsistencies from occurring.
With an automated data warehousing solution, you are able to provide high-quality data that brings the most value to your organization.

Data Accuracy

If you want your data insights and business intelligence to be reliable, the data that is analyzed in your warehouse needs to be accurate. Traditional data warehouses often suffer from inconsistencies that lead to inaccurate data as a result of manual processing and other errors.

There are several ways to get around this challenge, but the first and most important is to ensure that the data collection and storing processes are standardized and that data passes quality checks before it enters the warehouse. Data accuracy can also be improved through regular testing.

However, with the right data warehousing solution that supports automated transfers, the chance for human error is minimal. If you use an ETL tool, not only can you prevent inaccurate data from entering your data warehouse, but you can also flag errors so that you can optimize your data accuracy at the core.

Adjusting to non-technical users

Traditional data warehouses are often complex for non-technical teams to use. Sure, everyone can master data analysis enough to be able to query data from any source and know how to use the data provided. But the reality is different. Non-technical users often need to interact with company data, which isn't very efficient if you use a traditional data warehouse: submitting a request to the data team, waiting for the data team to fulfill the request, and using the data once delivered to them.

The process might work in small teams, but for larger teams, it's time-consuming and inefficient, as data teams can quickly become saturated with requests, leading to frustration and bottlenecks.
However, with modern, self-managed data warehouses and automated ETL tools, this challenge is easy to overcome. Data transfer tools like Whatagraph allow any user to move data from disparate sources to Google BigQuery without enlisting any help from the data or developer team. With point-and-click solutions, even non-technical users can operate a data warehouse without slowing down the workflow.

Data pollution

Sometimes the data gets corrupted in the source systems. Some of the common sources of data pollution are:

- System conversions
- Data aging
- Heterogeneous system integration
- Poor database design
- Incomplete information at data entry
- Input errors
- Internationalization and localization of data
- Fraud
- Lack of policy
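Several of the data quality dimensions discussed earlier (completeness, validity, uniqueness) lend themselves to the kind of automated checks that modern warehousing tools run before data enters the warehouse. A minimal sketch, with made-up field names and a deliberately dirty sample:

```python
import re

# A tiny, deliberately dirty sample: a bad email, a missing country,
# and a duplicated id.
records = [
    {"id": 1, "email": "a@example.com", "country": "Canada"},
    {"id": 2, "email": "not-an-email",  "country": "Canada"},
    {"id": 2, "email": "b@example.com", "country": None},
]

def completeness(records, field):
    """Share of records where the mandatory field is present."""
    return sum(r[field] is not None for r in records) / len(records)

def validity(records, field, pattern):
    """Share of records whose value matches the accepted format."""
    return sum(bool(r[field] and re.fullmatch(pattern, r[field]))
               for r in records) / len(records)

def uniqueness(records, field):
    """True only if no key value occurs more than once."""
    keys = [r[field] for r in records]
    return len(keys) == len(set(keys))

print(completeness(records, "country"))                   # 2 of 3 present
print(validity(records, "email", r"[^@]+@[^@]+\.[^@]+"))  # 2 of 3 valid
print(uniqueness(records, "id"))                          # False (id 2 repeats)
```

Scores like these can be tracked over time per source system, turning the quality dimensions from abstract goals into measurable gates in the ETL pipeline.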