UNIT II
ETL and OLAP Technology

Syllabus
What is ETL - ETL Vs ELT - Types of Data Warehouses - Data Warehouse Design and Modeling - Delivery Process - Online Analytical Processing (OLAP) - Characteristics of OLAP - Online Transaction Processing (OLTP) Vs OLAP - OLAP Operations - Types of OLAP - ROLAP Vs MOLAP Vs HOLAP.

Contents
2.1 ETL (Extract, Transform, Load)
2.2 ETL Vs ELT
2.3 Types of Data Warehouses
2.4 Data Warehouse Design
2.5 Data Warehouse Modeling
2.6 Delivery Process
2.7 Online Analytical Processing (OLAP)
2.8 Online Transaction Processing (OLTP) Vs OLAP
2.9 OLAP Operations
2.10 Types of OLAP
2.11 Comparison of ROLAP and MOLAP
2.12 Comparison of ROLAP Vs MOLAP Vs HOLAP
2.13 Two Marks Questions with Answers

2.1 ETL (Extract, Transform, Load)

ETL stands for Extract, Transform, Load and is a crucial process in a data warehouse environment. ETL refers to the set of activities involved in extracting data from various sources, transforming it into a consistent and usable format and loading it into the data warehouse for analysis and reporting purposes. Here is an overview of each step in the process:

1. Extract : In the extraction phase, data is extracted from diverse sources such as databases, files, APIs, or external systems. These sources can include operational systems, transactional databases, spreadsheets, logs, social media platforms, or any other relevant data sources. The extraction can involve identifying the required data, selecting specific tables or files and capturing the data in a structured format.

2. Transform : Once the data is extracted, the transformation phase involves cleansing, filtering and restructuring the data to meet the data warehouse's requirements and standards. This step includes activities such as data cleaning, data validation, data normalization, data aggregation, data enrichment and resolving inconsistencies or data quality issues. Transformation rules and business logic are applied to ensure that the data is consistent, accurate and ready for analysis.

3. Load : The load phase involves loading the transformed data into the data warehouse. This step typically includes mapping the transformed data to the appropriate tables and columns in the data warehouse schema. The loading process may involve appending new data, updating existing records, or replacing existing data based on the data warehouse's loading strategy. The load phase ensures that the transformed data is stored in the data warehouse in a structured and optimized manner for efficient querying and reporting.

The ETL process is typically automated using ETL tools or platforms, which provide functionality to facilitate data extraction, transformation and loading tasks. These tools often include graphical interfaces, workflow management capabilities, data mapping and transformation features and connectivity options to integrate with various data sources.

The ETL process is essential for maintaining the integrity and consistency of data in a data warehouse. It enables the consolidation of data from multiple sources, ensures data quality and prepares the data for analytical processing, reporting and decision-making. ETL plays a critical role in keeping the data warehouse up-to-date and providing accurate and actionable insights to users.

2.1.1 ETL Process

(Fig. 2.1.1 : ETL process - data is extracted from sources such as an RDBMS and flat files, transformed, and loaded into the data warehouse.)

Fig. 2.1.1 shows the ETL process.
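As an illustration of the three steps, the following is a minimal, self-contained Python sketch of an ETL job. The source file name (sales.csv), the target table (fact_sales) and the cleaning rules are hypothetical and would differ in a real warehouse; SQLite simply stands in for the warehouse database.

```python
import sqlite3
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    # Extract: read raw records from a flat-file source (hypothetical sales.csv).
    return pd.read_csv(path)

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    # Transform: cleanse, standardize and derive values before loading.
    df = raw.dropna(subset=["order_id", "amount"])               # data cleaning
    df = df.drop_duplicates(subset=["order_id"])                 # remove duplicates
    df["order_date"] = pd.to_datetime(df["order_date"])          # standardize formats
    df["amount"] = df["amount"].astype(float)
    df["net_amount"] = df["amount"] - df["discount"].fillna(0)   # derived value
    return df

def load(df: pd.DataFrame, conn: sqlite3.Connection) -> None:
    # Load: write the transformed rows into a warehouse table (append strategy).
    df.to_sql("fact_sales", conn, if_exists="append", index=False)

if __name__ == "__main__":
    warehouse = sqlite3.connect("warehouse.db")   # stand-in for the data warehouse
    load(transform(extract("sales.csv")), warehouse)
    warehouse.close()
```

In practice each function would be far richer, but the separation into extract, transform and load stages mirrors the structure described above.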
The ETL (Extract, Transform, Load) process in a data warehouse involves a series of steps to extract data from various sources, transform it into a consistent and usable format and load it into the data warehouse. Here is a more detailed overview of each step in the ETL process:

1. Extract :
- Identify data sources : Determine the sources from which data needs to be extracted. These sources can include databases, files, APIs, web services, spreadsheets and other systems.
- Define extraction methods : Decide on the extraction methods to be used, such as full extraction (all data is extracted every time) or incremental extraction (only new or modified data is extracted since the last extraction).
- Extract data : Extract the identified data from the sources using appropriate techniques, such as querying databases, reading files, or utilizing APIs. Extracted data is typically stored temporarily in a staging area.

2. Transform :
- Data cleaning : Cleanse the extracted data to remove any inconsistencies, errors, or duplicates. This involves validating data, correcting inaccuracies, standardizing formats and handling missing or null values.
- Data integration : Combine data from multiple sources and integrate it into a unified format. This includes mapping and aligning data attributes across sources, resolving conflicts and ensuring data consistency.
- Data transformation : Apply various data transformation operations to convert the data into a format suitable for analysis and reporting. This can include aggregating, filtering, sorting and calculating new derived values according to business logic.

3. Load :
- Define the data warehouse schema : Define the structure of the data warehouse schema, including tables, columns, relationships and indexes, based on the organization's data modeling requirements.
- Data mapping : Map the transformed data attributes to the corresponding tables and columns in the data warehouse schema. This involves aligning the data warehouse structure with the transformed data structure.
- Load data : Load the transformed and mapped data into the appropriate tables in the data warehouse. Depending on the loading strategy, this can involve appending new data, updating existing records, or replacing existing data.
- Perform data quality checks : Validate the loaded data to ensure its accuracy, completeness and conformity to predefined data quality standards. Identify and handle any data quality issues or discrepancies.

Throughout the ETL process, it is essential to maintain data lineage and audit trails, documenting the origin and transformation history of the data. This helps ensure data traceability, compliance and transparency.

ETL processes are often automated using ETL tools or platforms, which provide features for designing workflows, defining data transformations, scheduling data loads and monitoring ETL activities. These tools help streamline the ETL process and improve its efficiency.

The ETL process is iterative and cyclical, as it may need to be repeated periodically to incorporate new data or changes from the source systems and keep the data warehouse up-to-date.

The ETL process can also make use of the pipelining model: as soon as some data is extracted, it can be transformed, and in the same period new data can be extracted. Likewise, while the transformed data is being loaded into the data warehouse, the already extracted data can be transformed. The block diagram of the pipelined ETL process is shown in Fig. 2.1.2, and a small code sketch of the idea follows.

(Fig. 2.1.2 : Pipelining of the ETL process - several Extract-Transform-Load stages operate on successive batches of data.)
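The sketch below, assuming Python generators and pandas, processes the data batch by batch so that each stage works on one chunk at a time instead of waiting for the whole data set; true overlap of the stages in time would additionally require threads or processes. The batch size, file name and table name are illustrative only.

```python
from typing import Iterator
import sqlite3
import pandas as pd

BATCH_SIZE = 10_000  # illustrative batch size

def extract_batches(path: str) -> Iterator[pd.DataFrame]:
    # Extract the source file in chunks instead of all at once.
    yield from pd.read_csv(path, chunksize=BATCH_SIZE)

def transform_batches(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    # Transform each chunk as soon as it has been extracted.
    for batch in batches:
        batch = batch.dropna(subset=["order_id"]).drop_duplicates(subset=["order_id"])
        batch["order_date"] = pd.to_datetime(batch["order_date"])
        yield batch

def load_batches(batches: Iterator[pd.DataFrame], conn: sqlite3.Connection) -> None:
    # Load each transformed chunk while later chunks are still flowing through.
    for batch in batches:
        batch.to_sql("fact_sales", conn, if_exists="append", index=False)

conn = sqlite3.connect("warehouse.db")
load_batches(transform_batches(extract_batches("sales.csv")), conn)
conn.close()
```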
Advantages of ETL Process

The ETL (Extract, Transform, Load) process in a data warehouse offers several advantages that contribute to the effectiveness and efficiency of data management and analysis. Here are some key advantages of the ETL process:

1. Data integration : The ETL process enables the integration of data from multiple disparate sources into a centralized data warehouse. It allows organizations to consolidate data from various systems, databases, files and APIs, providing a unified view of the data and facilitating cross-system analysis.

2. Data consistency and quality : Through data cleansing and transformation, the ETL process improves data consistency and quality. It helps identify and rectify errors, inconsistencies, duplicates and missing data, ensuring that the data in the data warehouse is accurate, reliable and of high quality.

3. Standardization and conformity : The ETL process allows organizations to standardize data formats, units of measure, naming conventions and other data attributes across different source systems. This ensures data conformity and consistency, enabling effective analysis and reporting.

4. Data transformation and enrichment : ETL enables the transformation and enrichment of data during the process. It allows organizations to apply business rules, calculations, aggregations and derivations to the data, creating new meaningful metrics and insights that are valuable for decision-making and analysis.

5. Historical data tracking : ETL processes often include the capture and retention of historical data. This enables organizations to track and analyze data changes over time, perform trend analysis and support historical reporting and comparison.

6. Improved performance : The ETL process can enhance the performance of data retrieval and analysis in the data warehouse. By transforming and aggregating data during the ETL process, the data warehouse can provide pre-processed data optimized for analytical queries, resulting in faster query response times.

7. Scalability and flexibility : The ETL process allows organizations to handle large volumes of data and scale their data warehousing capabilities. It supports the integration of new data sources, accommodates changing business requirements and facilitates future data expansion without disrupting existing data warehouse operations.

8. Data security and governance : ETL processes can incorporate data security measures and governance controls. This includes data encryption, access controls, data masking and compliance with regulatory requirements. The ETL process ensures that sensitive data is protected and handled according to established governance policies.

9. Automation and efficiency : ETL tools automate the extraction, transformation and loading activities, reducing manual effort and improving efficiency. Automated workflows, scheduling and monitoring features streamline the ETL process and enable organizations to handle data updates and refreshes in a timely and consistent manner.
10. Decision-making and insights : By providing a consolidated and consistent view of data, the ETL process enables organizations to make informed decisions and gain actionable insights. It supports ad-hoc queries, reporting, business intelligence and data analytics, empowering users with timely and accurate data for decision-making.

Overall, the ETL process plays a vital role in maintaining data integrity, supporting data-driven decision-making and maximizing the value of the data warehouse within an organization.

Disadvantages of ETL Process

While the ETL (Extract, Transform, Load) process offers several advantages, it also has some potential disadvantages that organizations should consider. Here are some of the common disadvantages of the ETL process:

1. Data latency : The ETL process often involves extracting, transforming and loading data in batch mode, which means that there can be a delay between the data's original source and its availability in the data warehouse. This latency can limit real-time or near real-time analysis and decision-making, especially for time-sensitive data.

2. Complex and time-consuming : ETL processes can be complex and time-consuming to design, develop and maintain. They require expertise in data integration, data transformation and data modeling. Developing and implementing the ETL process can involve significant effort, especially for organizations with large and diverse data sources.

3. Potential data loss or inconsistency : During the ETL process, there is a risk of data loss or inconsistency if not handled properly. Errors in data extraction, transformation, or loading can result in missing or incorrect data in the data warehouse. Ensuring data integrity and accuracy requires careful validation and error handling mechanisms.

4. Scalability challenges : As data volumes and complexity increase, scalability can become a challenge in the ETL process. Extracting and transforming large amounts of data within limited time windows can strain system resources, impact performance and require additional hardware or infrastructure investments.

5. Maintenance and upkeep : The ETL process requires ongoing maintenance and upkeep. As data sources change, new data elements are added, or business rules evolve, the ETL process needs to be updated and adapted. This requires regular monitoring, troubleshooting and modification of the ETL workflows, transformations and mappings.

6. Impact on source systems : Extracting data from operational systems during the ETL process can put a strain on those systems, affecting their performance and availability. Organizations need to carefully manage the impact on source systems and ensure that the extraction process does not disrupt operational processes.

7. Limited real-time analysis : The batch-oriented nature of the ETL process means that real-time analysis or processing of data is limited. For use cases that require immediate or near real-time insights, alternative approaches such as real-time data integration or streaming data processing may be more suitable.

8. Complexity of data transformations : Data transformations in the ETL process can be complex, especially when dealing with diverse data sources, varying data formats and complex business rules. Designing and implementing complex transformations can introduce potential points of failure or performance bottlenecks.
9. Dependency on ETL tools : Organizations often rely on ETL tools or platforms to automate and manage the ETL process. However, these tools can come with licensing costs, require specialized skills to operate and may limit flexibility or customization options.

10. Limited flexibility for ad-hoc analysis : The structured nature of the data warehouse schema, resulting from the ETL process, can limit the flexibility of ad-hoc analysis. Data models and transformations are typically designed to support specific reporting and analysis requirements, potentially restricting exploration beyond the predefined dimensions and hierarchies.

It is important to evaluate these disadvantages against the specific needs and requirements of the organization. Alternative approaches, such as real-time data integration, data virtualization, or ELT (Extract, Load, Transform) processes, may be more suitable in certain scenarios to address these limitations.

2.2 ETL Vs ELT

1. Data volume and processing power - ETL : Traditionally used for large-scale data integration, where the extraction and transformation processes are performed outside the data warehouse. ELT : Leverages the processing power and scalability of modern data warehouse technologies, allowing for high-volume data loading and in-database transformations.

2. Flexibility and agility - ETL : Provides flexibility and agility in data processing, since the data is shaped before it reaches the warehouse. ELT : The raw data is loaded into the data warehouse without extensive upfront transformations, allowing data analysts and data scientists to perform ad-hoc queries and explore the raw data directly, enabling faster insights and analysis.

3. Performance - ETL : Processes may involve significant data transformations before loading data into the warehouse, which can impact performance. ELT : Can provide faster loading times, as data is loaded directly and transformations are performed within the data warehouse using its parallel processing capabilities.

4. Data storage and cost - ETL : Processes often require a separate staging area or storage system to temporarily hold the extracted data during transformation. ELT : Reduces the need for additional storage systems, since data is loaded directly into the data warehouse.

5. Complexity and skill requirements - ETL : Processes require specialized skills and tools for data extraction, transformation and load. ELT : Leverages the data processing capabilities of modern data warehouses, which are often familiar to SQL-savvy data professionals, reducing the need for specialized ETL tools and skills.

6. Data lineage and auditability - ETL : Processes maintain data lineage and audit trails during the extraction and transformation stages. ELT : Data lineage and auditability are primarily focused on the transformation activities performed within the data warehouse.

7. Process - ETL : Data is transferred to the ETL server and moved back to the database, so very high bandwidth is required. ELT : Data remains in the database except for cross-database loads.

8. Analysis - The choice between ETL and ELT depends on various factors such as data volume, processing requirements, performance goals, flexibility needs and the available data integration infrastructure. Organizations should evaluate their specific use cases, requirements and data warehouse capabilities to determine which approach aligns best with their needs. A small code sketch contrasting the two approaches follows.
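To make the distinction concrete, here is a minimal sketch assuming Python with pandas and SQLite as a stand-in warehouse: the ETL path transforms the data in the client before loading, while the ELT path loads the raw rows first and then transforms them inside the database with SQL. The file, table and column names are illustrative.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("warehouse.db")
raw = pd.read_csv("sales.csv")          # illustrative source extract

# --- ETL: transform outside the warehouse, then load the result ---
etl_ready = raw.dropna(subset=["amount"]).assign(
    net_amount=lambda d: d["amount"] - d["discount"].fillna(0)
)
etl_ready.to_sql("fact_sales_etl", conn, if_exists="replace", index=False)

# --- ELT: load the raw rows first, then transform inside the warehouse ---
raw.to_sql("stg_sales_raw", conn, if_exists="replace", index=False)
conn.executescript("""
    DROP TABLE IF EXISTS fact_sales_elt;
    CREATE TABLE fact_sales_elt AS
    SELECT order_id,
           amount,
           amount - COALESCE(discount, 0) AS net_amount
    FROM stg_sales_raw
    WHERE amount IS NOT NULL;
""")
conn.close()
```

In the ELT branch the transformation runs where the data already lives, which is exactly the property that makes ELT attractive on scalable warehouse platforms.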
2.3 Types of Data Warehouses

Data warehouses can be classified into different types based on various factors. Here are some common types of data warehouses:

1. Enterprise Data Warehouse (EDW) : An enterprise data warehouse integrates data from various sources across an entire organization. It serves as a central repository for all business-related data, providing a comprehensive and unified view for analysis and reporting.

2. Operational Data Store (ODS) : An operational data store is designed to support operational reporting and real-time decision-making. It acts as an intermediate storage layer between transactional systems and data warehouses, providing near real-time data for operational reporting purposes.

3. Data mart : A data mart is a subset of a data warehouse that focuses on a specific business function, department, or user group within an organization. Data marts are typically designed to provide a more targeted and specialized view of data, making it easier for users to access and analyze information specific to their needs.

4. Virtual data warehouse : A virtual data warehouse is a concept where data is not physically stored in a central repository. Instead, it leverages virtualization techniques to provide a logical view of data from various sources, enabling users to query and analyze data as if it were stored in a traditional data warehouse. Virtual data warehouses are flexible and can integrate data from both internal and external sources.

5. Federated data warehouse : A federated data warehouse is a distributed architecture that combines data from multiple independent data sources without physically moving or consolidating the data. It allows users to access and query data from different sources transparently, providing a unified view for analysis and reporting purposes.

6. Analytical data store : An analytical data store focuses on supporting advanced analytics and data exploration. It is optimized for complex queries, data mining, predictive modeling and other advanced analytical techniques. Analytical data stores often employ specialized data models and structures to enable efficient processing of analytical workloads.

7. Real-time data warehouse : A real-time data warehouse is designed to capture, process and analyze data in near real-time or with minimal latency. It enables organizations to make timely decisions based on up-to-date information by continuously loading and processing data as it becomes available.

These are some of the common types of data warehouses, and organizations may use one or a combination of these types based on their specific requirements and use cases.

Let us delve into the different types of data warehouses in more detail:

1. Enterprise Data Warehouse (EDW) :

Description : An enterprise data warehouse serves as a central repository that integrates data from various sources across an entire organization.

Characteristics : It provides a unified view of data, consolidating information from different systems and departments. It supports complex data transformations and data cleansing processes to ensure data quality.
Benefits : EDWs enable comprehensive reporting, advanced analytics and decision-making across the organization by providing a single source of truth. They facilitate data integration and consistency, allowing for cross-functional analysis and insights.

2. Operational Data Store (ODS) :

Description : An operational data store acts as an intermediate layer between transactional systems and data warehouses. It focuses on supporting operational reporting and real-time decision-making.

Characteristics : An ODS captures and stores near real-time data from transactional systems, providing quick access to operational information. It may store raw or lightly transformed data to maintain data integrity and accuracy.

Benefits : An ODS allows organizations to monitor operational activities and make immediate decisions based on up-to-date data. It supports real-time reporting, exception handling and operational analytics.

3. Data mart :

Description : A data mart is a subset of a data warehouse that caters to the needs of specific departments, user groups, or business functions within an organization.

Characteristics : Data marts are designed to provide a focused view of data, tailored to the requirements of a particular department or user group. They often employ a dimensional modeling approach and store pre-aggregated data for faster querying and analysis.

Benefits : Data marts offer simplicity, agility and quicker implementation compared to a comprehensive enterprise data warehouse. They provide specific business insights to individual departments, enabling faster decision-making and analysis.

4. Virtual data warehouse :

Description : A virtual data warehouse is a concept where data is not physically stored in a central repository. Instead, it leverages virtualization techniques to provide a logical view of data from various sources.

Characteristics : Virtual data warehouses integrate data from multiple sources on-the-fly, without physically moving or consolidating the data. They use data virtualization technologies to create a unified view of data, which can be accessed and queried by users.

Benefits : Virtual data warehouses offer flexibility, as they can quickly incorporate new data sources without the need for extensive data integration efforts. They provide real-time or near real-time access to data from diverse sources, enabling agile data analysis and reporting.

5. Federated data warehouse :

Description : A federated data warehouse is a distributed architecture that combines data from multiple independent data sources without physically consolidating the data.

Characteristics : Federated data warehouses use a logical integration layer to access and query data from various sources. They maintain data sovereignty while enabling queries across different systems.

Benefits : Federated data warehouses allow organizations to leverage data from disparate sources without the need for data replication. They provide a unified and consistent view of data, supporting cross-system analysis and reporting while respecting data governance and security requirements.

6. Analytical data store :
Description : An analytical data store focuses on supporting advanced analytics, data mining and complex analytical queries.

Characteristics : Analytical data stores are optimized for analytical processing, using specialized data structures and indexing techniques. They store historical data, support complex aggregations and calculations and enable sophisticated data analysis and modeling.

Benefits : Analytical data stores provide fast query performance for complex analytical queries, data exploration and reporting. They facilitate advanced analytics, such as predictive modeling, machine learning and statistical analysis, enabling organizations to uncover insights and patterns from large volumes of data.

7. Real-time data warehouse :

Description : A real-time data warehouse is designed to capture, process and analyze data in near real-time or with minimal latency.

Characteristics : Real-time data warehouses employ technologies like Change Data Capture (CDC) and streaming data ingestion to capture and process data as it becomes available. They provide up-to-date information for real-time reporting and decision-making.

Benefits : Real-time data warehouses enable organizations to respond quickly to changing business conditions and make data-driven decisions in real time. They support real-time analytics, monitoring and operational reporting, particularly in use cases like fraud detection, stock trading and IoT data analysis.

These are the main types of data warehouses, each catering to specific business requirements and analytical needs. Organizations may choose one or a combination of these types based on their data integration, reporting and analytics goals.

2.4 Data Warehouse Design

Data warehouse design is the process of structuring and organizing data in a way that enables efficient querying, analysis and reporting. It involves several key steps and considerations to ensure that the data warehouse meets the needs of the organization and its users. Here are some important aspects of data warehouse design:

1. Identify business requirements : Understand the goals and requirements of the organization, such as the types of analysis and reporting needed, the data sources involved and the Key Performance Indicators (KPIs) that need to be tracked.

2. Data modeling : Design a logical data model that represents the business entities, relationships and attributes relevant to the data warehouse. Common data modeling techniques include the star schema and snowflake schema, which involve organizing data into fact tables and dimension tables.

3. Extract, Transform, Load (ETL) : Determine the processes and workflows for extracting data from various sources, transforming it into a consistent format and loading it into the data warehouse. ETL processes may involve data cleansing, aggregation, integration and enrichment.

4. Dimensional modeling : Apply dimensional modeling techniques to represent data in a way that supports efficient querying and analysis. This involves identifying key dimensions (e.g., time, geography, product) and organizing data hierarchies within each dimension.

5. Physical design : Determine the physical storage structures for the data warehouse, including decisions about hardware, Database Management Systems (DBMS), indexing, partitioning and data distribution strategies. Consider performance optimization techniques, such as data compression and indexing, to ensure fast query response times.

6. Data security : Implement appropriate security measures to protect sensitive data in the data warehouse.
This may involve access controls, data encryption and data masking techniques to ensure compliance with privacy regulations and prevent unauthorized access.

7. Data governance : Establish data governance policies and procedures to ensure data quality, consistency and integrity within the data warehouse. This includes data validation, data profiling and data stewardship practices to maintain the accuracy and reliability of the data.

8. Scalability and performance : Consider the scalability requirements of the data warehouse to accommodate future growth in data volume and user demand. Design the system to handle increasing data loads and ensure that query performance remains optimal as the data warehouse grows.

9. Metadata management : Develop a metadata management strategy to document and track the data sources, data transformations, business rules and definitions within the data warehouse. Metadata provides valuable information about the data and helps users understand its context and meaning.

10. User interface and reporting : Design intuitive user interfaces and reporting tools that enable end users to easily access and analyze the data warehouse. Consider the needs of different user roles and provide self-service capabilities wherever possible.

11. Iterative development : Data warehouse design is an iterative process that involves continuous refinement and improvement based on user feedback and changing business requirements. Plan for regular updates and maintenance to keep the data warehouse aligned with evolving needs.

Overall, successful data warehouse design requires a deep understanding of the organization's business requirements, data sources and analytical goals, along with technical expertise in data modeling, ETL processes, performance optimization and data governance. Collaboration between business stakeholders and technical experts is crucial to ensure that the data warehouse design meets the needs of the organization and provides valuable insights for decision-making.

There are two design approaches:
1. The top-down approach
2. The bottom-up approach

2.4.1 Data Warehouse Design : Top-Down Approach

The top-down approach is a common methodology used in data warehouse design. It involves designing the data warehouse from a high-level perspective and gradually drilling down into more detailed components. Here are the main steps involved in the top-down approach to data warehouse design:

1. Define the business requirements : Start by understanding the business goals, objectives and information needs of the organization. Identify the key metrics, performance indicators and dimensions that are important for decision-making.

(Fig. 2.4.1 : Top-down design approach - data from the source systems is extracted, transformed and loaded into a normalized enterprise data warehouse, from which star-schema data marts are then populated.)

2. Identify subject areas : Determine the subject areas or domains that need to be covered in the data warehouse. Subject areas are typically related to specific business functions or areas of interest, such as sales, finance, inventory, or customer data.

3. Create a conceptual model : Develop a high-level conceptual model that represents the subject areas and their relationships.
This model captures the major entities, attributes and relationships between different subject areas, providing a holistic view of the data warehouse.

4. Define the data marts : Data marts are subsets of the data warehouse that are focused on specific subject areas. Identify the key data marts based on the subject areas identified earlier. Each data mart should contain the necessary dimensions, facts and measures specific to its subject area.

5. Design the dimensional model : Apply dimensional modeling techniques to design the schemas for each data mart. This involves creating dimension tables that capture the descriptive attributes and hierarchies, as well as fact tables that store the quantitative measures or metrics.

6. Determine the integration : Decide how data will be integrated from various source systems into the data warehouse. This may involve developing Extract, Transform, Load (ETL) processes to extract data from source systems, cleanse and transform it, and load it into the appropriate data marts.

7. Develop the physical design : Specify the physical implementation details for the data warehouse, including the database platform, hardware infrastructure, partitioning strategies, indexing and data storage considerations. This step focuses on optimizing the performance, scalability and manageability of the data warehouse.

8. Implement the data warehouse : Develop and deploy the data warehouse based on the design specifications. This includes creating the necessary database structures, implementing the ETL processes and loading the data into the data marts.

9. Enable user access and reporting : Provide tools and interfaces that allow users to access and analyze the data in the data warehouse. This may involve developing reporting and visualization capabilities, implementing query and analysis tools and ensuring appropriate security measures are in place.

10. Iterate and refine : The top-down approach is an iterative process. Continuously gather feedback from users, refine the design based on changing requirements and incorporate new subject areas or data marts as needed.

The top-down approach to data warehouse design ensures that the overall architecture aligns with the organization's strategic goals and provides a holistic view of the data. It allows for better integration and consistency across subject areas and facilitates easier data analysis and reporting.

2.4.2 Data Warehouse Design : Bottom-Up Approach

(Fig. 2.4.2 : Bottom-up design approach (DW bus) - data from the source systems is extracted, transformed and loaded into individual data marts, which are tied together by a DW bus of conformed dimensions and facts.)

The bottom-up approach is an alternative methodology used in data warehouse design. It involves building the data warehouse incrementally, starting from individual data sources and gradually integrating them into a comprehensive solution. Here are the main steps involved in the bottom-up approach to data warehouse design:

1. Identify data sources : Begin by identifying the various data sources within the organization, such as transactional databases, spreadsheets, log files, external data feeds, etc. These sources typically contain valuable data that can be used for analysis and reporting.

2. Select a pilot project : Choose a specific business area or subject domain to start with, typically one that has a pressing need for data analysis or reporting.
This will serve as a pilot project for the data warehouse implementation.

3. Extract data : Develop extraction processes to extract relevant data from the selected data sources. This may involve using ETL tools or custom scripts to retrieve data from databases, files, or APIs.

4. Transform and cleanse data : Apply necessary transformations and cleansing techniques to ensure the quality and consistency of the data. This step may include data validation, standardization, normalization and data quality checks.

5. Load data into a staging area : Create a staging area where the extracted and transformed data will be temporarily stored. The staging area acts as a buffer before the data is loaded into the data warehouse.

6. Design and create a dimensional model : Apply dimensional modeling techniques to design the schema for the pilot project. This involves identifying the key dimensions, facts and measures specific to the subject area. Create dimension and fact tables based on the designed schema.

7. Load data into the data warehouse : Once the staging area is populated with transformed data, load the relevant data into the dimension and fact tables of the data warehouse. This step involves mapping the transformed data to the appropriate columns in the data warehouse schema.

8. Enable user access and reporting : Provide users with the necessary tools and interfaces to access and analyze the data in the data warehouse. This may involve developing reporting and visualization capabilities, implementing query and analysis tools and ensuring appropriate security measures are in place.

9. Incremental expansion : After the initial pilot project is implemented successfully, expand the data warehouse by repeating the same process with additional data sources and subject areas. Gradually integrate more data sources and extend the dimensional model to cover new domains.

10. Iterate and refine : Similar to the top-down approach, the bottom-up approach is an iterative process. Gather feedback from users, refine the design based on changing requirements and incorporate new data sources or subject areas as needed.

The bottom-up approach to data warehouse design allows for a more incremental and flexible implementation, focusing on specific business needs and delivering value in a phased manner. It allows organizations to start small and gradually scale up the data warehouse based on the success of individual projects.

2.4.3 Top-Down and Bottom-Up Approach Comparison

1. Top-down : Breaks the huge problem into smaller sub-problems. Bottom-up : Solves the important low-level problems and integrates them into a higher one.
2. Top-down : Inherently architected - not a union of several data marts. Bottom-up : Inherently incremental; can schedule essential data marts first.
3. Top-down : Single, central storage of information about the content. Bottom-up : Departmental information is stored.
4. Top-down : Centralized rules and control. Bottom-up : Departmental rules and control.
5. Top-down : Includes redundant information. Bottom-up : Redundancy can be removed.
6. Top-down : May see quick results if implemented with iterations. Bottom-up : Less risk of failure, favorable return on investment and proof of technique.

2.5 Data Warehouse Modeling

Data warehouse modeling involves designing the structure and organization of data within a data warehouse to support efficient querying, analysis and reporting.
The two common approaches for data warehouse modeling are dimensional modeling and Entity-Relationship (ER) modeling.

1. Dimensional modeling : Dimensional modeling is a technique that organizes data into a star schema or snowflake schema, which consists of a central fact table surrounded by dimension tables. This approach is optimized for query performance and analytical processing. The key elements of dimensional modeling are as follows.

2. Fact tables : Fact tables contain the quantitative measures or metrics that represent the business events or activities being analyzed. Each row in the fact table corresponds to a specific instance of the event and contains foreign keys referencing the associated dimension tables, along with the measured values.

3. Dimension tables : Dimension tables contain descriptive attributes that provide context and additional details about the data in the fact tables. They represent the different perspectives by which data can be analyzed, such as time, geography, product, or customer. Dimension tables typically have a primary key and are connected to the fact table through foreign keys.

4. Hierarchies : Dimension tables often have hierarchies that represent the different levels of granularity within each dimension. For example, a time dimension can have hierarchies like year, quarter, month and day, allowing data to be aggregated and analyzed at different levels of time granularity.

5. Entity-Relationship (ER) modeling : ER modeling is a technique commonly used in traditional database design and can also be applied to data warehouse modeling. It focuses on capturing the relationships between entities and the attributes associated with those entities. The key elements of ER modeling are as follows.

6. Entities : Entities represent the real-world objects or concepts that are relevant to the data being modeled. For example, in a sales data warehouse, entities can include customers, products, orders and suppliers.

7. Relationships : Relationships define the associations between entities. They indicate how entities are connected to each other and describe the cardinality and nature of the relationships, such as one-to-one, one-to-many, or many-to-many.

8. Attributes : Attributes represent the characteristics or properties of entities. They provide additional details and context about the entities. For example, attributes of a customer entity can include name, address and contact information.

9. Primary keys and foreign keys : Primary keys uniquely identify each instance of an entity, while foreign keys establish relationships between entities. Foreign keys link tables together, enabling data integration across multiple tables.

Both dimensional modeling and ER modeling have their advantages and are suitable for different scenarios. Dimensional modeling is commonly used in data warehousing as it simplifies query performance and facilitates business analysis. ER modeling is more widely used in transactional systems but can also be used in data warehouse modeling, especially when there is a need to capture complex relationships and normalize the data.

Ultimately, the choice of data warehouse modeling approach depends on the specific requirements of the organization, the nature of the data being modeled and the intended analytical use cases.
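As an illustration of these elements, the following minimal sketch creates a small star schema in SQLite from Python. The table names, columns and the year-quarter-month-day hierarchy in dim_date are hypothetical examples rather than a prescribed design.

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")
conn.executescript("""
    -- Dimension tables: descriptive attributes and hierarchy levels.
    CREATE TABLE IF NOT EXISTS dim_date (
        date_key    INTEGER PRIMARY KEY,
        full_date   TEXT,
        day         INTEGER,
        month       INTEGER,
        quarter     INTEGER,
        year        INTEGER              -- year > quarter > month > day hierarchy
    );
    CREATE TABLE IF NOT EXISTS dim_product (
        product_key INTEGER PRIMARY KEY,
        name        TEXT,
        category    TEXT
    );
    CREATE TABLE IF NOT EXISTS dim_store (
        store_key   INTEGER PRIMARY KEY,
        city        TEXT,
        region      TEXT
    );

    -- Fact table: one row per sales event, with foreign keys and numeric measures.
    CREATE TABLE IF NOT EXISTS fact_sales (
        date_key     INTEGER REFERENCES dim_date(date_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        store_key    INTEGER REFERENCES dim_store(store_key),
        quantity     INTEGER,
        sales_amount REAL
    );
""")
conn.close()
```

The fact table carries only keys and measures; all descriptive context lives in the dimension tables, which is what gives the star schema its simple join structure.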
It is important to carefully analyze the business requirements and consult with stakeholders to determine the most appropriate modeling approach for a given data warehouse project.

Data Modeling Life Cycle

(Fig. 2.5.1 : Data modeling life cycle - business requirements are refined through logical and physical data modeling into fulfilled business requirements for data storage.)

The data modeling life cycle encompasses the processes and activities involved in creating and managing data models throughout their lifecycle. It typically consists of the following stages:

1. Requirements gathering : In this initial phase, data modelers work closely with stakeholders, subject matter experts and business users to gather requirements. This involves understanding the business goals, data sources, data entities, relationships, attributes and the intended use of the data model.

2. Conceptual data modeling : In this stage, a high-level conceptual data model is created. It represents the key entities, relationships and attributes of the data without delving into the technical implementation details. The focus is on understanding the business concepts and their relationships.

3. Logical data modeling : The logical data model builds upon the conceptual model and adds more detail. It involves translating the business requirements into a model that is independent of any specific technology or database platform. The logical data model defines the entities, their attributes, relationships and the business rules that govern the data.

4. Physical data modeling : The physical data modeling phase focuses on implementing the logical data model in a specific Database Management System (DBMS) or technology platform. It involves selecting appropriate data types, defining primary and foreign keys, specifying indexes and optimizing the model for performance and storage efficiency.

5. Database implementation : Once the physical data model is defined, it is implemented by creating the database schema and tables according to the model's specifications. This includes setting up relationships, constraints, triggers and any other necessary database objects.

6. Data integration and ETL : In this stage, the data model is integrated with various data sources and transformed through Extract, Transform, Load (ETL) processes. Data integration involves extracting data from source systems, transforming and mapping it to fit the data model and loading it into the target database.

7. Data quality assurance : Throughout the data modeling life cycle, data quality is a critical consideration. Data modelers and data quality experts perform data profiling, validation and cleansing activities to ensure the accuracy, completeness and consistency of the data.

8. Maintenance and evolution : Data models are not static and require ongoing maintenance and evolution as business needs change and new data sources are added. This stage involves managing changes to the data model, incorporating feedback from users and continuously improving the model to reflect the evolving business requirements.

9. Metadata management : Metadata, which provides information about the data model and its components, is essential for understanding and managing the data model. Metadata management involves capturing and documenting metadata such as data definitions, data lineage, data transformations and the business rules associated with the data model.

10. Collaboration and documentation : Effective collaboration and documentation are crucial throughout the data modeling life cycle. Data modelers work closely with stakeholders, DBAs, developers and other team members to ensure a shared understanding of the data model. Comprehensive documentation is created to facilitate future reference and maintain a clear record of the data model's design decisions and changes.

The data modeling life cycle is iterative, meaning that it often involves revisiting and refining the models based on feedback and evolving requirements. It is an ongoing process that requires collaboration, communication and continuous improvement to ensure that the data model remains aligned with the organization's needs.
2.6 Delivery Process

The delivery process in a data warehouse refers to providing data and insights to users or downstream systems in a timely and efficient manner. It involves extracting, transforming and loading (ETL) data from various sources, performing data transformations and aggregations and delivering the processed data to end users or downstream applications. Here is a general overview of the delivery process in a data warehouse:

1. Data extraction : The process starts with extracting data from multiple source systems such as transactional databases, external APIs, flat files, or other data sources. This data is usually collected on a regular basis or in real time to ensure the warehouse has up-to-date information.

2. Data transformation : Once the data is extracted, it undergoes various transformations to make it suitable for analysis and reporting. Transformations may include cleaning and validating the data, applying business rules, aggregating or summarizing data and performing calculations or derivations. These transformations are often defined in an ETL process or data integration tool.

3. Data loading : After the transformations are applied, the data is loaded into the data warehouse. This involves storing the processed data in a structured format, such as tables or cubes, that is optimized for reporting and analysis. Loading can be performed in different ways, including full loads (reloading all data) or incremental loads (updating only the changed or new data); an incremental-load sketch is shown after this section.

4. Data integration : In addition to loading data into the data warehouse, there may be a need to integrate data from multiple sources to create a unified view. This involves mapping and aligning data from different systems, resolving any inconsistencies or conflicts and ensuring data quality and consistency across the warehouse.

5. Metadata management : Alongside the actual data, metadata plays a crucial role in the delivery process. Metadata includes information about the data's source, structure, relationships and meaning. Managing metadata helps users understand and navigate the data warehouse, enabling them to find relevant data, understand its context and make informed decisions.

6. Data access and reporting : Once the data is loaded and integrated, users can access it through various reporting and analytics tools. These tools provide a user-friendly interface for querying the data, generating reports, creating visualizations and performing data analysis. The delivery process ensures that users have the necessary permissions and access rights to retrieve the data they need.

7. Data distribution : In some cases, the processed data may need to be distributed to downstream systems or applications for further processing or consumption. This can involve exporting data in various formats (e.g., CSV, JSON) or integrating with other systems through APIs or data feeds.

8. Monitoring and maintenance : Ongoing monitoring and maintenance are essential to ensure the data warehouse's performance, reliability and accuracy. This includes monitoring data loads and processing times, identifying and resolving data quality issues, optimizing queries and transformations and applying regular updates or patches to the data warehouse infrastructure.

Overall, the delivery process in a data warehouse encompasses a series of steps, from data extraction to data access and reporting, ensuring that users have access to reliable, consistent and relevant data for analysis and decision-making.
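As an illustration of the incremental-load strategy mentioned in step 3, the following minimal sketch upserts changed rows into a warehouse table using SQLite's ON CONFLICT clause from Python (assuming a reasonably recent SQLite build). The table, key column and sample rows are hypothetical; in practice the changed rows would come from a staging area.

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS fact_sales (
        order_id   INTEGER PRIMARY KEY,   -- key used to detect already-loaded rows
        amount     REAL,
        updated_at TEXT
    )
""")

# Rows extracted since the last load (illustrative values only).
changed_rows = [
    (101, 250.0, "2024-01-05"),
    (102, 310.5, "2024-01-05"),
]

# Incremental load: insert new rows, update rows whose key already exists.
conn.executemany("""
    INSERT INTO fact_sales (order_id, amount, updated_at)
    VALUES (?, ?, ?)
    ON CONFLICT(order_id) DO UPDATE SET
        amount     = excluded.amount,
        updated_at = excluded.updated_at
""", changed_rows)
conn.commit()
conn.close()
```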
2.7 Online Analytical Processing (OLAP)

OLAP (Online Analytical Processing) is a technology that enables multidimensional analysis of large volumes of data from different perspectives. It provides a way to organize and analyze data to support complex and interactive data exploration, reporting and decision-making. Here are some key characteristics and components of OLAP:

1. Multidimensional data model : OLAP uses a multidimensional data model, typically represented in a structure called a cube. This model organizes data into dimensions and measures. Dimensions represent the different perspectives or attributes of the data (e.g., time, geography, product), while measures represent the numerical values or metrics that are being analyzed (e.g., sales revenue, quantity sold).

2. Data aggregation : OLAP allows for data aggregation across multiple dimensions. It can summarize and aggregate data at different levels of granularity, such as rolling up data from a detailed level (e.g., daily sales) to higher levels (e.g., monthly or yearly sales). Aggregations facilitate faster query performance and the ability to view data from different hierarchical levels.

3. Interactive analysis : OLAP provides an interactive and user-friendly interface for analyzing data. Users can drill down to more detailed levels, drill up to higher levels of aggregation, pivot dimensions, slice and dice data and apply various calculations and aggregations on the fly. This flexibility allows users to explore data from different angles and gain insights quickly.

4. Fast query performance : OLAP technologies are designed to deliver fast query performance, even when dealing with large volumes of data. Pre-aggregations and indexing techniques are often used to optimize query response times. OLAP databases are specifically designed for analytical workloads and are optimized for complex calculations and aggregations.

5. Complex calculations : OLAP supports advanced calculations across multiple dimensions. Users can define custom calculations, perform mathematical operations, create derived measures and apply business rules within the OLAP environment. This capability enables sophisticated analyses and supports complex decision-making processes.

6. Hierarchical navigation : OLAP enables users to navigate through hierarchical structures within dimensions. For example, a time dimension can have levels such as year, quarter, month and day. Users can drill up or down through these levels to analyze data at different levels of detail. This hierarchical navigation allows for easy exploration and comparison of data across different dimensions.

7. Data integration : OLAP systems can integrate data from various sources, including data warehouses, operational databases and external systems. Data integration ensures that all relevant data is available for analysis in a consolidated and consistent manner.

OLAP technology has found applications in various domains, such as business intelligence, financial reporting and forecasting and performance management. It empowers users to gain insights from data quickly, make informed decisions and discover patterns and trends that may not be apparent in traditional tabular representations of data.
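The following minimal sketch, assuming Python with pandas, illustrates the multidimensional idea: a small fact data set is aggregated into a cube-like view with time and region as dimensions and sales as the measure. The column names and values are invented purely for illustration.

```python
import pandas as pd

# A tiny fact table: each row is one sales event with its dimension attributes.
sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2024, 2024, 2024],
    "quarter": ["Q1", "Q1", "Q2", "Q1", "Q2", "Q2"],
    "region":  ["North", "South", "North", "South", "North", "South"],
    "product": ["A", "B", "A", "A", "B", "B"],
    "sales":   [100, 150, 120, 130, 170, 160],
})

# Aggregate the measure across two dimensions: total sales by (year, quarter)
# against region - a simple two-dimensional view of the cube.
cube = pd.pivot_table(
    sales,
    values="sales",
    index=["year", "quarter"],   # rows: the time dimension hierarchy
    columns="region",            # columns: the geography dimension
    aggfunc="sum",
)
print(cube)
```

A dedicated OLAP engine would precompute and index such aggregates, but the structure of the result (dimensions on the axes, a measure in the cells) is the same.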
OLAP Guidelines (Dr. E.F. Codd's Rules)

Dr. E.F. Codd, a pioneer in the field of relational databases, proposed a set of 12 rules known as "Codd's rules" for OLAP systems. These rules provide guidelines for designing and implementing OLAP systems to ensure their effectiveness and adherence to the principles of OLAP. Here are some of the key guidelines from Codd's rules:

1. Multidimensional conceptual view : The OLAP system should provide a multidimensional conceptual view of data, allowing users to analyze data from different dimensions simultaneously. The system should support dimensions, hierarchies and measures to represent data in a multidimensional structure.

2. Transparency : The OLAP system should be transparent to the user, abstracting the underlying data complexities. Users should be able to access and analyze data without needing to understand the technical details of data storage or retrieval.

3. Accessibility : The OLAP system should provide easy and efficient access to data for end users. It should support fast query response times, interactive analysis and flexible data exploration capabilities.

4. Consistent reporting : The OLAP system should ensure consistent reporting and analysis results across different dimensions and levels of detail. The system should handle data aggregation and drill-down operations accurately to maintain data consistency.

5. Dynamic database reorganization : The OLAP system should support dynamic reorganization of the database structure without requiring the system to be shut down. This allows for modifications to the dimensions, hierarchies and measures without disrupting user access to data.

6. Client-server architecture : The OLAP system should employ a client-server architecture to facilitate efficient data retrieval and analysis. The server component handles data storage and processing, while the client component provides a user-friendly interface for interacting with the data.

7. Generic dimensionality : The OLAP system should be able to handle dimensions of arbitrary complexity. It should support hierarchies with multiple levels, aggregations and flexible calculations.

8. Intuitive data manipulation : The OLAP system should provide intuitive operations for manipulating data. Users should be able to drill down, drill up, slice, dice and perform other analytical operations easily and in a user-friendly manner.

9. Flexibility : The OLAP system should be flexible in accommodating changes to data structures, dimensions and measures. It should support the addition or removal of dimensions, hierarchies and measures without requiring significant system modifications.

10. Multidimensional performance : The OLAP system should be optimized for multidimensional analysis. It should provide fast response times for complex queries involving aggregations, calculations and navigation across multiple dimensions.

These guidelines serve as principles for designing OLAP systems that support efficient and effective data analysis. While the original rules were proposed in the context of relational databases, they have influenced the development of OLAP technologies and continue to shape best practices in the field.

Characteristics of OLAP

(Fig. 2.7.1 : Characteristics of OLAP - Fast, Analysis, Shared, Multidimensional and Information.)

Fast : The system is intended to deliver most responses to the user within about five seconds, with elementary analyses taking little more than one second and very few taking more than 20 seconds.

Analysis : The system should handle any business logic and statistical analysis pertinent to the application and the user, while keeping it simple enough for the intended client. Users must be able to define new ad-hoc calculations as part of the analysis and report on the data in any desired way, without having to program; products (like Oracle Discoverer) that do not allow the user to define new ad-hoc calculations as part of the analysis are excluded.

Shared : Not all functions require the user to write data back, but an increasing number do, so the system should be able to handle multiple updates in a timely, secure manner. The system must meet all the security requirements for confidentiality and, if multiple write connections are needed, provide concurrent update locking at an appropriate level.

Multidimensional : This is the key prerequisite. Because this is unquestionably the most logical way to analyze businesses and organizations, OLAP systems must offer a multidimensional conceptual view of the data, including full support for hierarchies.

Information : The system should be able to hold all the data required by the applications. Data sparsity should be handled efficiently.
2. Aggregation and summarization : OLAP enables the aggregation and summarization of data, allowing users to view data at various levels of granularity. It supports roll-up and drill-down operations, which involve aggregating data to higher levels or drilling down to more detailed levels, respectively.

3. Fast query performance : OLAP systems are optimized for fast query response times. They employ techniques like indexing, precomputation and caching to ensure quick access to data, even when dealing with large volumes of information.

4. Advanced calculations : OLAP systems support complex calculations and calculations across multiple dimensions. Users can perform calculations, apply formulas, create derived measures and define custom calculations to analyze data based on specific requirements.

5. Hierarchical navigation : OLAP systems allow users to navigate hierarchical structures within dimensions. For example, a time dimension can have a hierarchy such as year, quarter, month and day. Users can drill down or roll up through these hierarchies to analyze data at different levels of detail.

6. Interactive and ad hoc analysis : OLAP systems offer interactive and ad hoc analysis capabilities, enabling users to explore data dynamically. Users can slice, dice, pivot and filter data, perform on-the-fly calculations and change dimensions to answer specific analytical questions.

7. Data consolidation and integration : OLAP systems consolidate data from various sources into a single view, ensuring consistency and coherence in the analytical process. They integrate data from data warehouses, operational databases and other systems to provide a unified and comprehensive perspective.

8. Support for data visualization : OLAP systems often include data visualization features, such as charts, graphs and dashboards. Visual representations help users understand and communicate complex data patterns and trends effectively.

9. Time intelligence : OLAP systems provide support for time-based analysis and calculations. Users can perform time-based comparisons, track trends over time, calculate year-over-year growth rates and analyze data within specific time periods.

10. Scalability and flexibility : OLAP systems are designed to handle large volumes of data and scale as the data and user demands grow. They offer flexibility to adapt to changing business requirements, allowing modifications to dimensions, measures and hierarchies without significant disruption.

• These characteristics make OLAP systems powerful tools for analytical processing, enabling users to gain insights, discover trends, perform in-depth analysis and make informed decisions based on multidimensional views of data.

Advantages of OLAP

• OLAP (Online Analytical Processing) offers several advantages that make it a valuable technology for data analysis and decision-making. Here are some key advantages of OLAP :

1. Multidimensional analysis : OLAP allows users to analyze data from multiple dimensions simultaneously. This multidimensional view enables a deeper understanding of data relationships, patterns and trends that may not be apparent in traditional two-dimensional views.
2. Faster query response times : OLAP systems are optimized for fast query performance. They employ specialized indexing, caching and pre-aggregation techniques that enable rapid retrieval of information, even with large volumes of data. This speed enables users to explore and analyze data interactively.

3. Flexible data exploration : OLAP provides interactive and ad hoc analysis capabilities, allowing users to slice, dice, pivot and drill down into data easily. Users can dynamically explore data, apply filters, perform calculations and change dimensions to answer specific analytical questions. This flexibility empowers users to gain insights quickly and make informed decisions.

4. Complex calculations : OLAP supports advanced calculations and calculations across multiple dimensions. Users can define custom calculations, apply mathematical operations, create derived measures and apply business rules within the OLAP environment. This capability enables sophisticated analysis, forecasting and modeling.

5. Hierarchical navigation : OLAP systems allow users to navigate hierarchical structures within dimensions. Users can drill down to more detailed levels or roll up to higher levels of data aggregation. This hierarchical navigation provides the ability to analyze data at different levels of granularity and view data from various perspectives.

6. Consistent and consolidated data : OLAP consolidates data from multiple sources, ensuring data consistency and coherence. By integrating data from data warehouses, operational databases and other systems, OLAP provides a unified and comprehensive view of data for analysis. This consistency helps in accurate reporting and decision-making.

7. Data visualization : OLAP systems often include built-in data visualization capabilities, such as charts, graphs and dashboards. Visual representations of data make it easier to understand complex patterns, trends and relationships. Data visualization enhances data exploration, analysis and communication of insights.

8. Scalability and adaptability : OLAP systems are designed to handle large volumes of data and scale as data and user demands grow. They can accommodate increasing data sizes, user concurrency and evolving business requirements. OLAP systems offer flexibility in modifying dimensions, measures and hierarchies to adapt to changing analytical needs.

9. Improved decision-making : By providing a multidimensional view of data, fast query response times and interactive analysis capabilities, OLAP empowers users to make data-driven decisions quickly. The ability to explore data from different angles, perform complex calculations and visualize insights enhances the decision-making process.

• Overall, OLAP technology provides powerful tools for data analysis, reporting and decision-making. Its advantages lie in its ability to handle complex analysis, support multidimensional views, enable fast queries and provide flexibility in data exploration.

Disadvantages of OLAP

• While OLAP (Online Analytical Processing) offers numerous advantages, it also has some disadvantages that organizations should consider. Here are some potential drawbacks of OLAP :

1. Data latency : OLAP systems typically rely on periodic data updates from the underlying data sources, such as data warehouses. As a result, there can be a latency period between when the data is updated in the source system and when it becomes available for analysis in the OLAP system.
This latency may impact the timeliness of insights and decision-making.

2. Data complexity : OLAP requires careful design and modeling of the underlying data structures, dimensions, hierarchies and measures. Setting up and maintaining an OLAP system can be complex, especially when dealing with large and diverse datasets. Designing and configuring the OLAP cubes and dimensions correctly requires expertise and effort.

3. Implementation and infrastructure costs : Implementing an OLAP system often involves significant upfront costs, including hardware, software licenses and specialized expertise for design and development. Additionally, OLAP systems require robust infrastructure to handle the storage and processing requirements of multidimensional data. These costs can pose challenges for organizations with limited resources.

4. Data volume and scalability : While OLAP systems are designed to handle large volumes of data, extremely large datasets may pose scalability issues. As data volumes increase, query response times may slow down and additional hardware resources may be required to maintain performance. Scaling an OLAP system to accommodate growing data sizes can be a complex and costly task.

5. Complexity of data updates : Updating data in an OLAP system can be more complicated than in transactional or operational databases. When new data is added or existing data is modified, the OLAP cubes may require rebuilding or refreshing to reflect the changes. This process can be time-consuming and resource-intensive, especially for large datasets.

6. Limited real-time analysis : OLAP systems are typically designed for analytical processing and are not well-suited for real-time or near-real-time analysis. As OLAP relies on periodic data updates, it may not be suitable for scenarios that require up-to-the-minute insights or data streaming analysis.

7. Steep learning curve : Using and leveraging the full capabilities of an OLAP system may require users to undergo training and become familiar with the system's functionalities and query languages. The learning curve for effectively utilizing OLAP tools and features can be steep, especially for users who are new to the technology.

8. Data governance and consistency : OLAP systems rely on consolidated and integrated data from various sources. Ensuring data quality, consistency and governance across multiple source systems can be a challenge. Inaccurate or inconsistent data can impact the validity and reliability of analysis results.

• While the disadvantages of OLAP exist, organizations can mitigate some of these challenges through careful planning, robust infrastructure, ongoing maintenance and user training.
It's important to evaluate the specific needs and requirements of the organization before deciding to implement an OLAP system.

2.8 Online Transaction Processing (OLTP) Vs OLAP

1. Purpose :
OLTP : OLTP systems are designed for transactional processing and handle day-to-day operational tasks, such as recording, updating and retrieving individual transactions. They focus on real-time data processing and ensuring data integrity in transactional environments.
OLAP : OLAP systems are designed for analytical processing and support complex analysis, reporting and decision-making. They provide a multidimensional view of data, enabling users to analyze large volumes of data from multiple dimensions simultaneously.

2. Data focus :
OLTP : OLTP systems focus on detailed, granular data at the transaction level. They capture and process individual transactions, typically involving small subsets of data. The emphasis is on capturing real-time operational data.
OLAP : OLAP systems focus on aggregated and summarized data. They consolidate data from multiple sources, such as data warehouses or operational databases, and organize it into a multidimensional structure. OLAP allows users to analyze data from different dimensions and levels of granularity.

3. Database design :
OLTP : OLTP databases are typically normalized to minimize redundancy and ensure data consistency in transactional operations. They are designed for efficient storage, retrieval and updating of individual records.
OLAP : OLAP databases are often designed using denormalized star or snowflake schema structures to optimize analytical performance, including fast aggregation and efficient multidimensional analysis.

4. Query patterns :
OLTP : OLTP systems primarily handle short-duration queries that retrieve or update individual records. The focus is on transactional operations, maintaining data integrity and supporting concurrent user access.
OLAP : OLAP systems handle complex analytical queries that involve aggregations, calculations and multidimensional analysis. Users typically perform ad hoc queries, slice and dice data, drill down into details and perform comparative analysis across multiple dimensions.

5. Response time :
OLTP : OLTP systems are optimized for fast response times for individual transactional operations. They are designed for low-latency access to transactional data and aim to maintain high responsiveness for concurrent user interactions.
OLAP : OLAP systems may have longer query response times compared to OLTP systems. Analytical queries involving large datasets, complex calculations and aggregations can take more time to process. However, OLAP systems are optimized to provide acceptable response times for complex queries.

6. Concurrency :
OLTP : OLTP systems handle high levels of concurrent transactions from multiple users or applications. They employ concurrency control mechanisms, such as locks and isolation levels, to ensure data consistency and transactional integrity.
OLAP : OLAP systems typically have lower concurrency requirements compared to OLTP systems. Analytical queries are often executed by a smaller number of users or dedicated analytical teams.

7. Data maintenance :
OLTP : OLTP systems focus on real-time data maintenance, ensuring the accuracy and consistency of transactional data. They prioritize immediate updates and enforce data integrity constraints during transaction processing.
OLAP : OLAP systems perform periodic data maintenance from the underlying data sources, such as data warehouses. Data maintenance involves consolidating, aggregating and summarizing data to reflect changes in the source systems. OLAP updates are often less frequent and occur in batch processes.
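To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical sales table (the table and column names are illustrative, not taken from the text). It contrasts the kind of statement each workload typically issues : an OLTP-style single-record insert and update versus an OLAP-style aggregation that scans and summarizes many rows.

```python
import sqlite3

# Hypothetical schema for illustration only: one row per sales transaction.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    sale_date TEXT, region TEXT, product TEXT,
    quantity INTEGER, amount REAL)""")

# OLTP-style work: short transactions that touch individual records.
with conn:
    conn.execute(
        "INSERT INTO sales (sale_date, region, product, quantity, amount) "
        "VALUES (?, ?, ?, ?, ?)",
        ("2023-01-15", "East", "Mobile", 2, 450.0))
    conn.execute(
        "UPDATE sales SET quantity = quantity + 1, amount = amount + 225.0 "
        "WHERE sale_id = ?", (1,))

# OLAP-style work: an ad hoc aggregation that scans and summarizes many rows
# (here, total sales by region and product for one quarter).
rows = conn.execute(
    "SELECT region, product, SUM(amount) AS total_sales "
    "FROM sales "
    "WHERE sale_date BETWEEN '2023-01-01' AND '2023-03-31' "
    "GROUP BY region, product "
    "ORDER BY total_sales DESC").fetchall()
print(rows)
```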
It allows users to focus on a : The slice operation involves selecting a specific value or range of values for specific segment of data for analysis. For example, slicing by a specific time period ora particular product category. Dice : The dice operation involves selecting specific values or ranges of values from multiple dimensions to create a sub-cube of data. It allows users to. further narrow down their analysis by considering a combination of dimension values. For example, dicing by a specific time period, product category and geographical region. : Drill-down : The drill-down operation involves navigating from a higher-level ‘summary to a more detailed-level of data. It allows users to explore data hierarchies by moving from aggregated data to finer-grained data. For example, drilling down from yearly sales-to quarterly, monthly and daily sales. Roll-up : The roll-up operation is the reverse of drill-down. It involves aggregating data from a detailed level to a higher-level’ summary. It allows users to view data at different levels of granularity. For example, rolling up monthly sales to quarterly or yearly sales. Pivot : The pivot operation involves rotating the axes of a multidimensional cube to provide alternative views of data. It enables users to change the dimensions used for analysis and explore different perspectives of the data. For exainple, pivoting sales data to analyze it by different product categories or regions, TECHNICAL PUBLICATIONS® - an up-thrust for knowledge oN 2-34 ETL and OLAP Teo 6. Grouping The grouping operation involves aggregating data based on sp. criteria or dimensions. It allows users to group data based on shared attribyt 4 form calculations on those groups. For example, grouping sales data by org for cach category. : egory to calculate total sales 7. Caleulations : OLAP systems support various calculations and Computations ; the data, Users can define custom calculations, apply formulas and Perfoy, mathematical operations on measures and dimensions. These calculations G include aggregations, 18, averages, percentages and other calculations baseg : specific analytical requirements. 8. Ranking and sor : OLAP operations enable ranking and sorting of data bas. on specific measures or dimensions. Users can determine the top or botto, performers, identify trends and analyze the relative positions of data points. 9. Forecasting : OLAP systems often provide capabilities for forecasting fui trends based on historical data. Users can apply forecasting algorithms and mode to predict future outcomes and make informed decisions. 10. Drill-across : The drill-across operation involves accessing and analyzing relate data across multiple OLAP cubes or data sources. It allows users to navigate an explore data from different perspectives, such as combining sales data wit customer data from a separate cube. © These OLAP operations provide users with flexible and powerful tools to explor analyze, and gain insights from multidimensional data. Users can manipulate data, foot on specific subsets, navigate hierarchies, perform calculations, and change dimensio to uncover valuable patterns and make informed business decisions. Roll-Up «It involves aggregating data from a detailed level to a higher-level summary. It allov users to view data at different levels of granularity. For example, rolling up month sales to quarterly or yearly sales. : Example. 
Roll-Up

• It involves aggregating data from a detailed level to a higher-level summary. It allows users to view data at different levels of granularity. For example, rolling up monthly sales to quarterly or yearly sales.

Example :

• Consider the following cube illustrating temperatures of certain days recorded weekly :

Temperature | 64 | 65 | 68 | 69 | 70 | 71 | 72 | 75 | 80 | 81 | 83 | 85
Week 1      |  1 |  0 |  1 |  0 |  1 |  0 |  0 |  0 |  0 |  0 |  1 |  0
Week 2      |  0 |  0 |  0 |  1 |  0 |  0 |  1 |  2 |  0 |  1 |  0 |  0

• Consider that we want to set up levels hot (80 - 85), mild (70 - 75) and cool (64 - 69) in temperature from the above cube.
• To do this, we have to group the columns and add up the values according to the concept hierarchy. This operation is known as a roll-up.
• By doing this, we obtain the following cube :

Temperature | Cool | Mild | Hot
Week 1      |  2   |  1   |  1
Week 2      |  1   |  3   |  1

• The roll-up operation groups the information by levels of temperature. The following diagram illustrates how roll-up works.

(Fig. 2.9.1 Roll-up : roll-up on location (from cities to countries) for a sales cube with dimensions Item (types : Mobile, Modem, Phone, Security), Time (quarters Q1 - Q4) and Location (cities such as Chicago, New York, Toronto, Vancouver).)

Drill-Down

• The drill-down operation involves navigating from a higher-level summary to a more detailed level of data. It allows users to explore data hierarchies by moving from aggregated data to finer-grained data. For example, drilling down from yearly sales to quarterly, monthly and daily sales.
• The drill-down operation (also called roll-down) is the reverse operation of roll-up.

Temperature | Cool | Mild | Hot
Day 1  | 0 | 0 | 0
Day 2  | 0 | 0 | 0
Day 3  | 0 | 0 | 1
Day 4  | 0 | 1 | 0
Day 5  | 1 | 0 | 0
Day 6  | 0 | 0 | 0
Day 7  | 1 | 0 | 0
Day 8  | 0 | 0 | 0
Day 9  | 1 | 0 | 0
Day 10 | 0 | 1 | 0
Day 11 | 0 | 1 | 0
Day 12 | 0 | 1 | 0
Day 13 | 0 | 0 | 1
Day 14 | 0 | 0 | 0

• The following diagram illustrates how drill-down works.

(Fig. 2.9.2 Drill-down : drill-down on time (from quarters to months) for the same sales cube; the Time dimension is expanded from the quarters Q1 - Q4 to the months Jan - Dec while the Item and Location dimensions stay the same.)
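The temperature roll-up above, and its drill-down reverse, can also be expressed programmatically. The following pandas sketch is only an illustration : the day-level readings are hand-entered values chosen to be consistent with the tables above, and pd.cut plays the role of the cool/mild/hot concept hierarchy.

```python
import pandas as pd

# Day-level readings for two weeks (illustrative values in the cool/mild/hot ranges).
readings = pd.DataFrame({
    "week": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "day":  [3, 4, 5, 7, 9, 10, 11, 12, 13],
    "temperature": [83, 72, 64, 68, 65, 70, 71, 75, 81],
})

# Concept hierarchy: map raw temperatures to the levels cool (64-69),
# mild (70-75) and hot (80-85).
readings["level"] = pd.cut(readings["temperature"],
                           bins=[63, 69, 75, 85],
                           labels=["cool", "mild", "hot"])

# Roll-up: aggregate from individual readings up to counts per week and level.
rollup = (readings.groupby(["week", "level"], observed=False)
                  .size()
                  .unstack(fill_value=0))
print(rollup)

# Drill-down is the reverse direction: keep the day column in the grouping
# to return to the finer day-level view.
drilldown = readings.groupby(["week", "day", "level"], observed=False).size()
```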
Slice

• The slice operation involves selecting a specific value or range of values for one or more dimensions to create a subset of data. It allows users to focus on a specific segment of data for analysis. For example, slicing by a specific time period or a particular product category.
• For example, if we make the selection Temperature = Cool, we will obtain the following cube :

Temperature | Cool
Day 1  | 0
Day 2  | 0
Day 3  | 0
Day 4  | 0
Day 5  | 1
Day 6  | 0
Day 7  | 1
Day 8  | 0
Day 9  | 1
Day 10 | 0
Day 11 | 0
Day 12 | 0
Day 13 | 0
Day 14 | 0

• The following diagram illustrates how slice works.

(Fig. 2.9.3 Slice : slicing the sales cube on Time = Q1 reduces the three-dimensional cube of Location (Chicago, New York, Toronto, Vancouver), Time (Q1 - Q4) and Item (Mobile, Modem, Phone, Security) to a two-dimensional view of Location against Item.)

Dice

• The dice operation describes a sub-cube by operating a selection on two or more dimensions.
• For example, implement the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot) on the original cube. We get the following sub-cube (still two-dimensional) :

Temperature | Cool | Hot
Day 3 | 0 | 1
Day 4 | 0 | 0

(Fig. 2.9.4 Dice : dicing the sales cube for (location = "Toronto" or "Vancouver") and (time = "Q1" or "Q2") and (item = "Mobile" or "Modem") yields a smaller sub-cube restricted to those member combinations.)

Pivot

• The pivot operation is also called rotation. Pivot is a visualization operation that rotates the data axes in view to provide an alternative presentation of the data. It can involve swapping the rows and columns, or moving one of the row dimensions into the column dimensions.

(Fig. 2.9.5 Pivot : the Time and Temperature axes of a view are interchanged.)

• Consider the following diagram, which shows the pivot operation.

(Fig. 2.9.6 Pivot operation : a view with Item (types : Mobile 605, Modem 825, Phone 14, Security 400) on one axis and Locations (cities : Chicago, New York, Toronto, Vancouver) on the other is rotated so that rows and columns are swapped; the cell values remain the same.)

2.10 Types of OLAP

• There are three main types of OLAP. Fig. 2.10.1 depicts the types of OLAP.

(Fig. 2.10.1 Types of OLAP : ROLAP, MOLAP and HOLAP servers.)

1. ROLAP stands for Relational OLAP, an application based on relational DBMS.
2. MOLAP stands for Multidimensional OLAP, an application based on multidimensional DBMS.
3. HOLAP stands for Hybrid OLAP, an application using both relational and multidimensional techniques.

ROLAP (Relational OLAP)

• Relational OLAP (ROLAP) is an approach to OLAP that leverages Relational Database Management Systems (RDBMS) for storing and managing multidimensional data. Unlike other OLAP approaches such as Multidimensional OLAP (MOLAP) or Hybrid OLAP (HOLAP), ROLAP does not require a separate multidimensional storage engine. Instead, it relies on the relational capabilities of the underlying database system.
• In ROLAP, the multidimensional data is stored in a relational database, typically using tables with rows and columns. The data is organized in a star or snowflake schema, where the fact table contains the measures or metrics and dimension tables store the hierarchies and descriptive attributes.
• Here are some key features and considerations of ROLAP :

1. Architecture : ROLAP systems utilize a three-tier architecture. The first tier consists of the Relational Database Management System (RDBMS) that stores the multidimensional data. The second tier comprises the OLAP engine, which generates SQL queries to retrieve and process data from the RDBMS. The third tier is the user interface or client application that interacts with the OLAP engine.

2. Query processing : ROLAP systems translate OLAP queries into SQL queries to access the relational database. The SQL queries retrieve the required data from relational tables, apply aggregations, calculations and filters as specified in the OLAP query, and return the results to the user. The RDBMS engine handles query optimization and execution.

3. Flexibility : ROLAP offers flexibility in terms of data modeling and schema design.
The relational database schema can be easily modified and extended to accommodate changing business requirements. New dimensions, hierarchies, or measures can be added by altering the database schema and updating the relevant tables.

4. Data integration : ROLAP allows integration with existing relational databases and data sources within an organization. It can leverage data from operational systems, data warehouses, data marts, or other data repositories. ROLAP facilitates real-time or near-real-time access to the most up-to-date data from various sources.

5. Scalability : ROLAP can handle large volumes of data as it relies on the scalability and performance optimizations of the underlying RDBMS. By utilizing indexing, partitioning and other database optimization techniques, ROLAP can efficiently process queries even with large datasets.

6. Performance considerations : While ROLAP leverages the power of the relational database engine, complex OLAP queries in ROLAP systems can sometimes result in slower query response times compared to specialized OLAP engines. Optimization techniques such as caching, aggregations and indexing can be applied to improve performance.

7. Security and access control : ROLAP systems inherit the security mechanisms provided by the underlying RDBMS. Access control, user authentication and data encryption can be enforced at the database level to ensure data security and privacy.

8. SQL expertise : Developing and maintaining ROLAP systems often requires SQL expertise. Query design, optimization and performance tuning in SQL are important skills for working with ROLAP systems.

• ROLAP combines the benefits of relational databases, including flexibility, integration and scalability, with the analytical capabilities of OLAP. It is suitable for organizations that already have a well-established relational database infrastructure and want to leverage their existing data repositories for multidimensional analysis and reporting.
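As a concrete illustration of the query-processing feature described above (point 2), the sketch below sets up a tiny, hypothetical star schema with Python's built-in sqlite3 module and shows the kind of SQL a ROLAP engine might generate for the request "total revenue by country and quarter". The schema, names and query are assumptions for illustration, not the output of any particular ROLAP tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table plus two dimension tables.
conn.executescript("""
CREATE TABLE dim_time  (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT,
                        quarter TEXT, year INTEGER);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE fact_sales (time_id INTEGER REFERENCES dim_time(time_id),
                         store_id INTEGER REFERENCES dim_store(store_id),
                         units INTEGER, revenue REAL);

-- A couple of sample rows so the query below returns something.
INSERT INTO dim_time  VALUES (1, '2023-01-15', 'Jan', 'Q1', 2023);
INSERT INTO dim_store VALUES (1, 'Toronto', 'Canada');
INSERT INTO fact_sales VALUES (1, 1, 3, 605.0);
""")

# The OLAP request "total revenue by country and quarter for 2023" is translated
# into ordinary SQL against the star schema: join the fact table to its
# dimensions, filter, group and aggregate.
query = """
SELECT s.country, t.quarter, SUM(f.revenue) AS total_revenue
FROM fact_sales AS f
JOIN dim_time  AS t ON f.time_id  = t.time_id
JOIN dim_store AS s ON f.store_id = s.store_id
WHERE t.year = 2023
GROUP BY s.country, t.quarter
ORDER BY s.country, t.quarter;
"""
for row in conn.execute(query):
    print(row)
```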
