0% found this document useful (0 votes)
48 views39 pages

DW Unit 4

Uploaded by

monster170293
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
48 views39 pages

DW Unit 4

Uploaded by

monster170293
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 39
Dimensional Modeling and Schema Sy pinensional Modeling - Multi- Dimensional Data Modeling - Data Cube - Star Schema - Snowflake “hema - Star Vs Snowflake schema - Fact constellation Schema - Schema Definition - Process cipitecture - Types of Data Base Parallelism - Datawarehouse Tools. A Contents 14 Dimensional Modeling 42. Multi-Dimensional Data Modeling 43 Data Cube 44 Star Schema 45 Snowflake Schema 46 Comparison of Star Schema and Snowflake Schema 47 Fact Constellation Schema 48 Schema Definition 49° Process Architecture 410 Types of Data Base Parallelism 411 Datawarehouse Tools 412 Two Marks Questions with Answers (4-1) — Data Warehousing 4-2 Dimensional Modeling and Schema Dimensional Modeling © Dimensional modeling is a desi structure data for effi 1 technique used in data warehousing to organize and ent analysis and reporting. It provides a way to represent the data ina dimensional format that is optimized for querying and aggregating information. © In dimensional modeling, the transaction record is separated into "facts," which are often numerical transaction data and "dimensions," which are the reference data that places the facts in context. For instance, a sale transaction can be broken down into specifics like the quantity of products ordered and the cost of those things, as well as into dimensions like the date of the purchase, the user's name, the product number, the ship - to and bill - to addresses and the salesperson who was in charge of receiving the order. In dimensional modeling, data is organized into two main types of tables = 1. Fact tables and 2. Dimension tables. e Fact tables : Fact tables contain the numeric and additive data that represents the business transactions or events being analyzed. Each row in the fact table corresponds to a specific event or transaction and contains the associated measures or metrics, such as sales amount, quantity, or revenue. Fact tables also include foreign keys that link to the dimension tables. © Dimension tables : Dimension tables provide descriptive attributes or context for the data in the fact tables. They contain the textual or qualitative data that provides additional details about the business entities involved in the transactions. Examples of dimension tables include customer, product, time, geography and other relevant dimensions. Dimension tables typically have a primary key column that uniquely identifies each row. The Key Principles of Dimensional Modeling The key principles of dimensional modeling include : 1, Star schema Dimensional models typically follow a star schema design, where the fact table is at the center and dimension tables radiate from it like the points of a star. This schema simplifies queries and enables fast aggregations. 2. Denormalization : Dimensional models denormalize data by combining multiple tables into a single table. This improves query performance by reducing the number of table joins required for analysis. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge using. me Dimensional Modeling and Schema. 5 epresent different Aan res nelude hierarchies that represent di levels o a 'y OF aggregation. For example, a time dimension may include hierarchies like year, quarter, month and dy rrogate keys : , se Bice meee oe Keys are artificially generated primary keys used in jim a Provide a stable identi each dimensi w, even if the source system's primary keys change lentifier for each dimension row, 5, Conformed dimensi Hierarchies ension tables often ns: ; S * Conformed dimensions are dimensions that are used multiple fe i across MUNUP'S Fact tables. They ensure consistency and enable data integration and comparison across different business Processes. the benefits of dimensional modelin ysis and flexibi g include improved query performance, simplified analys! fos bun ty in data exploration and reporting. It provides a user - friendly structure for business users to navigate and understand the data within a data warehouse or data mart environment. pa The Objectives of Dimensional Modeling + The objectives of dimensional modeling are as follows : 1, Simplify data analysis : Dimensional modeling aims to simplify the process of data analysis by providing a structure that is intuitive and easy to understand for business users. The design focuses on organizing data into dimensions and facts, allowing users to navigate and explore the data efficiently. 2. Optimize query performance : Dimensional models are designed to optimize query performance by reducing the complexity of joins and aggregations. The star schema design, denormalization and pre-aggregation techniques employed in dimensional modeling enable faster query execution, making it easier to retrieve insights from large volumes of data. 3. Support business reporting and analytics : Dimensional models are specifically tailored to support business reporting and analytics requirements. By structuring data into dimensions and facts, dimensional models facilitate the creation of meaningful reports, dashboards and visualizations that provide actionable insights to business users. 4. Enable Ad Hoe analysis : Dimensional models are flexible and enable ad hoe analysis, allowing users to slice and dice data along various dimensions, drill down into details and perform multidimensional analysis. The structure of dimensional models facilitates self - service exploration and empowers users to answer their own business questions. 5. Promote data consistency and integrati S odeling consistency by utilizing conformed dimensions. Conformed dimensions ensure that Dimensional modeling promotes data TECHNICAL PUBLICATIONS® - en upthrst for knowledge Date Warehousing 4-4 Dimensional Modeling and Schoma the same dimension attributes are used consistently across different facts, enabling integration and comparison of data from multiple business processes. ibility : Dimensional models are designed to 6. Facilitate scalability and exten: accommodate future growth and changes in business requirements. The modular nature of dimensional models allows for easy addition of new dimensions or facts ag the data warchouse evolves, making it a scalable and extensible solution. 7. Enhance user adoption and user experience : Dimensional models are user . centric and provide a user - friendly experience. By organizing data in a way that aligns with the users' mental model of the business, dimensional models enhance user adoption and acceptance of the data warehouse, fostering a data - driven culture within the organization. Overall, the primary objectives of dimensional modeling in a data warehouse are to simplify data analysis, optimize query performance, support business reporting and analytics, enable ad hoc analysis, promote data consistency and integration, facilitate scalability and extensibility and enhance user adoption and user experience. Elements of Dimensional Data Model 1. Fact © Facts are the measurements or metrics from the business process. For example, in sales business process, a measurement can be quarterly sales number 2. Dimension ) Dimension provides the background close to a business process event. In simple terms, they give who, what, where of a fact. In the sales business process, for the fact quarterly sales number, dimensions would be © Who - Customer names © Where - Location © What - Product name In other words, a dimension is a window to view information in the facts. 3. Attributes The attributes are the various characteristics of the dimension in dimensional data modeling. In the location dimension, the attributes can be © State © Country ° Zipcode etc. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge i worehousing e s Dimensional Modeling and Schema + Axtributes are used to search, filter or clas . Dimension tables contain Attributes ‘ Fact table «A fact table is a primary table in dimension modelin « A fact table contains : 1. Measurements / facts 2. Foreign key to dimension table 5 pimension table « Adimension table contains dimensions of a fact, « They are joined to fact table via a foreign key, + Dimension tables are de-normalized tables, « The dimension attributes are the various columns in a dimension table. « Dimensions offers descriptive characteristics of the facts with the help of their attributes. ¢ No set limit set for given for number of dimensions, ¢ The dimension can also contain one or more hierarchical relationships. [EE] steps of Dimensional Modeling Fig. 4.1.1 shows steps involved in dimensional modeling. Select the business | eWhy process, Declare the grain | @How much Identify the dimension ae Identify Build the schema Fig, 4.1.4 Stops of dimensional modeling up-thrust for knowledge pal CATION Data Warehousing 4-6 Dimensional Modeling and Schema Steps in dimensional modeling are : 4. Identify business process © Determining the precise business processes that a data warehouse should cover. Depending on the organization's requirements for data analysis, this might be marketing, sales, HR, etc, The quality of the data available for that process affects the choice of the business process as well. The data modeling process is most crucial stage, failing to which would result in cascading and disastrous flaws. To describe business process one can use plain text or simple Business Process Modeling Notation (BPMN), or Unified Modeling Language (UML) to describe the business process. 2. Identify grain (Level of detail) © The level of detail for the business problem / solution is described by the grain. Finding the lowest level of information for any table in the data warehouse is what this method entails. A table should have daily granularity if it contains daily sales data. A table has monthly granularity if it contains total sales information for each month. Example of grain : The CEO at an MNC wants to find the sales for specific products in different locations ona daily basis. So, the grain is “product sale information by location by the day.” 3. Identify dimensions e Dimensions are nouns like date, store, inventory, etc. These dimensions are where all the data should be stored. For example, the date dimension may contain data like a year, month and weekday. 4. Identify facts © This step is associated with the business users of the system because this is where they get access to data stored in the data warehouse. Most of the fact table rows are numerical values like price or cost per unit, etc. 5. Build schema ® In this step, the implementation of the dimension model is involved. A schema is the database structure (Arrangement of tables). Advantages of Dimensional Modeling Dimensional modeling offers several advantages in the context of data warehousing and analytics : * Simplicity and ease of understanding : Dimensional models are designed to be intuitive and easily comprehensible. The use of dimensions, facts and hierarchies simplifies the TECHNICAL PUBLICATIONS® - an up-thrust for knowledge ie orenusing 4-7 Dimensional Modeling and Schema 8. La making it easier for b + Flex *inizing and analyzing data in a data warehouse. It enables efficient querying, fle i business-focused analysis, empowering organizations to derive valuable insights trom their data ” to un tract . tand and navigate the data. This omotes user adoption and facilitates self - service analytics. query performance : Dimensional models are optimized for query performance. The structure allows for efficient querying and Aggregating of data, enabling faster response times for analytical queries. The use of pre - Aggregated measures and summarized data in the fact tables further enhances query performance, ility and scalability : Dimensional models. provide flexibility in adding or modifying dimensions and measures as business requirements evolve, ‘They can accommodate new attributes, hierarchies, or even entirely new dimensions without disrupting the existing structure. This scalability makes it easier to adapt to changing business needs. + Enhanced data analysis : Dimensional models facilitate easy data analysis and exploration. The clear separation of dimensions and measures allows users to slice and dice data along different dimensions, drill down into details, and perform multidimensional analysis. This enables users to gain insights and make informed decisions based on the data. + Business - focused design : Dimensional modeling aligns with business processes and user requirements. It emphasizes the representation of business entities, attributes and relationships, making it more business - centric. This ensures that the data warehouse caters to the specific needs of the business, enabling better decision-making, * Integration with reporting and visualization tools : Dimensional models are well - suited for integration with reporting and data visualization tools. The structure aligns with the way users typically analyze data, making it easier to create meaningful reports and visualizations. This integration enhances the usability and effectiveness of these tools. * Simplified ETL processes : Extract, Transform, Load (ETL) processes are typically simplified in dimensional models. The structured nature of dimensions and facts makes it easier to extract and transform data from source systems and load it into the data Warehouse. This can lead to streamlined and efficient ETL pipelines. Overall, dimensional modeling provides a robust and user - friendly framework for bility, TECHNICAL PUBLICATIONS® - an up-thrust for knowledge Data Warehousing 4-8 Dimensional Modeling and Schema Multi-Dimensional Data Modeling © The way that data is seen in a multidimensional model is as a data - cube. Data may be modeled and seen in various dimensions using a data cube. Facts and dimensions serve to define it. The views or things that an organization keeps data of are known as the dimensions. For instance, a retailer may establish a sales data warehouse to document sales for the dimensions of time, item and place. With the help of these parameters, the save may keep track of things like monthly sales of particular commodities and the places where those sales took place. A table called a dimensional table is associated with each dimension and contains additional information about the dimension. For instance, an item's dimensional table may include the attributes item_name, brand and type. Tabular representation 228 a@resao at] 1] 4 [25 uf2[ife vif 3 | a [15] 12] 1 | 1 [30 12] 2] 1 [20 12[ 3 | 1 | 50 13) 1] 118 13] 8 | 10| 10 13] 2] 1] 10 z HRS 13] 3] 1] 10 af 25[ 8 | 15 a[1] 2 [35 1.2 +3 Timeid Multidimensional representation Slice loci is shown Locid Fig. 4.2.1 Example of multi dimensional data modeling e A multidimensional data model is structured around a central theme, for example, sales. This theme is represented by a fact table. Facts are numerical measures. The fact table contains the names of the facts or measures of the related dimensional tables. In following table the 3D data of the table are represented as a series of 2D tables. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge : 8 3 3 & 3 = = $ = a OS. SEE Bh “Bes OR Sy Bh BEE SE ER CSOE SEE BEE HDS wD Or 0S HE EH GE EHS BIS BHCGE(OGHCC8Ds «hCG 06 «Oz «STE St 99 gow se Sy SBE OBE OSLER HD 09 SI 80S 0st 0c = S8EO6E ST 0c 09h Sth 0c ONE OPED. unosigy Peg AWA 839 unosig prog AN 38g unosig peg YIN 88q unosig pearg AWA 33g OWL way way way wart «!UIAC,, = UoHEI07] «fQuiny,,, = UOHBI07 LBIPYOY.,, = wOHHeIO} efBUUDYD,, = WORD] yg woronousing y | Data Warehousing 4-10 Dimensional Modeling and Schema > i & Chenna\/"z49 / 360 20 10 & Kolkata /435 7” 460 20 15 Mumbai/“99 / 385 / 20 39 Delhi 50 35 ai] 260 | 508 18 60 a8 || 48 | e 90 Ay, B aa| a0 | 256 | 20 a & | 80, 2 cs| 436 | sv | 50 | 4 [ — as] 520 | 403 | 25 | 50 Egg Milk ~=—sBread Biscuit Item (types) Fig. 4.2.2 Example of multi dimensional data modeling EER Key Components of Multi - Dimensional Data Modeling © Multi - dimensional data modeling is a technique used in data warehousing to represent and organize complex business data in a way that supports efficient querying and analysis across multiple dimensions. It extends the concept of dimensional modeling by incorporating additional dimensions, hierarchies and measures to capture the various perspectives of the data. Here are the key components and considerations in multi- dimensional data modelling : 1. Dimensions : Dimensions represent the descriptive attributes by which data is analyzed. In multi - dimensional modeling, multiple dimensions are included to capture different perspectives of the data. For example, in a sales data warehouse, dimensions could include time, product, customer, geography and organization. 2. Hierarchies : Hierarchies define the relationships between the attributes within a dimension. They represent the drill - down or roll - up paths to navigate through the data at different levels of granularity. For instance, a time hierarchy could have levels such as year, quarter, month and day, allowing users to analyze data at different time granularities. 3. Measures : Measures represent the numeric facts or metrics that are analyzed in the data warehouse. They provide quantitative data for analysis and reporting. Measures TECHNICAL PUBLICATIONS® - an up-thrust for knowledge alt. werehousing cot Dimensional Modeling and Schema can be aggregated or calculated b, ased on the dimensions, Examples of measures include sales revenue, quantity sold , profit and av 4, Cubes : In multi - dimensional data modeling, a cube is a fundamental structure that represents the combination of dimensions, hicrarchie a multi - dimensional view of the ¢; along different dimensions, age order value, s and measures. A cube provides lata, enabling users to analyze and aggregate data Each cell within the cube represents a specific intersection of dimension values and contait ns the corresponding measures). 5, OLAP (Online Analytical Processing) : , ‘ OLAP is a technology that supports multi - dimensional data modeling and enab! les fast and interactive analysis of the data warehouse. OLAP tools provide capabilities to slice, dice, drill down, roll up and pivot the data along different dimensions. They allow users to perform complex queries and calculations on the multi - dimensional data. 6. Aggregation : Aggregation is an important concept in multi - dimensional data modeling. It involves pre - calculating and storing aggregated values for measures at various levels of granularity in the cube. Aggregations speed up query performance by avoiding the need to calculate measures from the detailed data during analysis. Aggregations can be defined at different levels of the hierarchies within dimensions. 7, Data granularity : Data granularity refers to the level of detail at which data is stored in the data warehouse. It is an important consideration in multi - dimensional data modeling. Choosing the appropriate granularity ensures that the data warehouse can support the desired level of analysis and reporting. It is important to strike a balance between providing sufficient detail for analysis and maintaining acceptable query performance. 8. Complex relationships : Multi - dimensional data modeling can handle complex relationships between dimensions. For example, a many-to-many relationship between dimensions can be represented using bridge tables or factless fact tables. These relationships enable the analysis of data across multiple dimensions simultaneously. By employing multi - dimensional data modeling techniques, organizations can create a Comprehensive and flexible representation of their business data in the data warehouse. This enables in - depth analysis and reporting, supports complex queries and provides Valuable insights to support decision - making process TECHNICAL PUBLICATIONS® - an up-thrust for knowiedge “ | Modeli ¢ Data Warehousing 4-12 Dimensional Hing and Schem, EX) Data Cube © A data cube is a fundamental concept in data warehousing that represents a multi - dimensional view of data. It provides a structured and efficient way to organi. and analyze data along multiple dimensions, A data cube combines dimensions hierarchies, and measures to enable users to slice, dice, drill down and roll up data fon , analysis and reporting purposes. © Here are the key components and characteristics of'a data cube : 1. Dimensions : Dimensions represent the descriptive attributes by which data js analyzed. Examples of dimensions include time, product, customer, geography ang organization. Each dimension can have multiple levels or hicrarchies representing different levels of granularity. 2. Measures : Measures represent the numeric facts or metrics that are analyzed in the data warehouse. They provide quantitative data for analysis and reporting. Examples of measures include sales revenue, quantity sold, profit and average order value, Measures can be aggregated or calculated based on the dimensions. 3. Hierarchies : Hierarchies define the relationships between the attributes within a dimension. They represent the drill - down or roll - up paths to navigate through the data at different levels of granularity. For example, a time hierarchy could have levels such as year, quarter, month and day, allowing users to analyze data at different time granularities. iz 4. Cells : Cells represent the intersections of dimension values within the data cube. — Each cell contains the corresponding measure(s) for that specific combination of dimension values. For example, a cell in a sales data cube may represent the revenue generated by a specific product in a particular month. 5. Aggregation : Aggregation involves pre - calculating and stori for measures at various levels of granularity in the data cube. Aggregations speed Up query performance by avoiding the need to calculate measures from the detailed data during analysis. Aggregations can be defined at different levels of the hierarchies within dimensions. eregated values 6. OLAP Operations : Online Analytical Processing (OLAP) operations are performed on the data cube to analyze and report on the data, OLAP operations include slic (selecting a specific subset of data), dicing (selecting a subcube based on specific criteria), drilling down (viewing data at a more detailed level), rolling up (viewits data at a higher level of aggregation), and pivoting (rotating the dimensions analyze data from different perspectives), TECHNICAL PUBLICATIONS® - en up-thrust for knowledge i wa using ets I ehd Dimensional Modeling and Schema @ pata cubes ron’ ‘ multi - dimensional representation of data, enabling users to explore dat i TI ' \d exp ta from different angles. They facilitate efficient and interactive analyze aM gata analysis 1m data warehousing environments, supporting decision - making processes and providing insights into business performance. OLAP tools and technologies are often used to interact with data cubes and perform these multi - dimensional analysis operations. Example = In the 2-D representation, All Electronics sales data for items sold per quarter in the city of Vancouver. The measured display in dollars sold (in thousands). Location = “Vancouver” Item (type) Time (quarter) Home entertainment __ Computer Phone Security Qi 605 825 14 400 Q 680 952 31 512 @ 812 1023 30 301 Q@ 927 1038 38 380 | Fig. 4.3.1 2-D view of sales data Location = “Chicago” Location = “New York” Location = “Toronto” Item item item home qa 854 882 89 623-1087 968 38 872. 818 746 4B 591 870 1142 1091 54 984 978 864 359 784 Qs 1129. 992 63 Fig. 4.3.2 3-D view of sales data Conceptually, data in the form of 3-D data cubes, as shown in Fig. 4.3.3. = Data Warehousing 4-14 Dimensional Modeling and Schema & s Chicage/“gs4 / 882 89 623 ge NewNOO\os7 7” 968 38 / 872 Y Toronto/ aia / 740 / 43/591 Vancouver, x Q1) Gos 825 14 400 sg, Ay, g siz |PA 2 aa] sv | oz | 31 | 512 SX) & 2 o WN SY 2 as] siz | 1023 | 30 | sot e is: as} 927 | 1038 | 38 | 500 Computer Security Home Phone entertainment Item (types) Fig. 4.3.3 3-D Data cube EB star Schema Star schema is a widely used data modeling technique in the design of data warehouses. It is named so because the diagrammatic representation of this schema resembles a star shape, with a fact table at the center and multiple dimension tables radiating outwards. In the above example, SALES is a fact table having attributes i.e. (Product ID. Order ID. Customer ID, Employer ID, Total, Quantity, Discount) which references to the dimension tables. Employee dimension table contains the attributes : Employee ID. Employee Name, Title, Department and Region. Product dimension table contains the attributes : Product ID, Product Name, Product Category, Unit Price. Time dimension table contains the attributes : Order ID, Order Date, Year, Quarter, Month. Customer dimension table contains the attributes : Customer ID, Customer Name, Address, City. Zip. In a star schema, the fact table represents the central focus of the schema and contains the measurements or metrics of the busin of numerical values or facts that are additive or semi - additive, such as sales revenue, quantity sold, or units produced. The fact table is associated with one or more dimension tables through foreign key relationships. process being analyzed. It typically con: TECHNICAL PUBLICATIONS® - an up-thrust for knowledge yoronousiad a6 Dimonsional Modoling and Schema i ; Time Dimension Product Dimension Order ID Product 1D ae Product Name a Product Category pee Unit Price Month — Sales Product ID Order ID Customer ID eS a | Employer ID Total Quantity Customer Dimension Discount Emp Dimensions Customer ID Emp ID Customer Name Emp Name Address: Title City Department Zip Region Fig. 4.4.1 Star schema « Dimension tables provide descriptive attributes or context to the data in the fact table. They contain textual or descriptive information about the dimensions or characteristics that describe the business process being analyzed. For example, in a sales data warehouse, dimension tables could include information about products, customers, time, geography and so on. Each dimension table is linked to the fact table through a primary key - foreign key relationship. The star schema is characterized by its simplicity and ease of use. It provides a denormalized structure where each dimension is represented by a single table, making Queries and aggregations efficient and straightforward, The dimension tables are typically small in size, as they contain the descriptive attributes, while the fact table may be relatively larger, containing the numerical facts. This structure enables efficient Guerying and reporting capabilities for analytical purposes. ty Features of Star Schema . The star schema is a popular data modeling technique in data warehousing. It offers a features that make it suitable for analytical reporting and query performance. le ; Te ate some key features of the star schema : TECHNICAL PUBLICATIONS® - an up-thrust for knowledge ee Data Warehousing 4-16 Dimensional Modeling and Schema 1. Simplicity : The star schema has a simple and intuitive structure. It consists of a central fact table surrounded by multiple dimension tables. This simplicity makes it easy to understand, implement and query. 2. Denormalization : The star schema is denormalized, meaning that it reduces the number of joins required to retrieve data, Dimension tables contain all the descriptive attributes related to a dimension, climinating the need for complex join operations during querying. 3. Fact table measurement revenue, quantity sold, or units produced. The fact table is typically large and contains numeric facts that are additive or semi - additive. The star schema has a fact table at its core. This table holds the or metrics of the business process being analyzed, such as sales 4. Dimension tables : Dimension tables provide the descriptive context for the measurements in the fact table. Each dimension table represents a specific dimension or characteristic related to the business process, such as product, customer, time, or geography. These tables contain attributes that describe the dimension, such as product name, customer ID, or date. 5. One-to-Many relationships : The relationship between the fact table and dimension tables in the star schema is typically one-to-many. Each row in the fact table is associated with one or more rows in the dimension tables through foreign key relationships. This structure simplifies data retrieval and enables efficient query performance. 6. Query performance : The denormalized structure of the star schema facilitates fast and efficient query performance. By reducing the number of joins and providing pre-aggregated data in the fact table, analytical queries can be executed quickly and easily. 7. Analytical reporting : The star schema is designed to support analytical reporting and data analysis. It provides a structure that is optimized for querying and aggregating data for reporting purposes. Users can easily slice and dice the data along various dimensions to gain insights and make informed business decisions. 8. Scalability : The star schema is highly scalable and can handle large volumes of data without sacrificing performance. It can accommodate the addition of new dimension tables or attributes as the busines: requirements evolve. 9, Data quality : The star schema simplifies data validation and improves data quality: De: consistency of attribute values. iptive attributes are stored in dimension tables, allowing for better control and TECHNICAL PUBLICATIONS® - an up-thrust for knowiedge voonusing fenie Dimensional Modeling and Schema 7 overall, the star schema's simplicity, denormalized structure and efficient querying + aabiltes make ita widely adopted data modeling technique for data warehousing and analytical reporting. However, it's important to consider the specific requirements and de - of fs of the project when deciding whether to use a star schema. ‘Advantages of Star Schema penefits of using a star schema in a data warehouse include : mplified querying : The star schema's simple structure allows for easy and intuitive querying, as it eliminates the need for complex joins and enables faster query performance. 2. Performance optimization : The denormalized structure of the star schema enhances performance for aggregations and summaries, as it minimizes the number of joins required to retrieve data, : Star schemas are highly scalable and can accommodate large volumes 3, Scalabil of data without sacrificing performance. 4, Flexibility : The star schema provides flexibility in adding or modifying dimensions and measures as the business requirements evolve. New dimensions can be easily incorporated without affecting the existing schema. 5. Improved data quality : The star schema simplifies data validation and improves data quality by storing descriptive attributes in dimension tables separately. * Despite its advantages, the star schema may not be suitable for all scenarios. It is best suited for analytical reporting and data warehousing purposes, where read performance and ease of use are important. In more transactional systems, where data integrity and updates are critical, a more normalized schema might be preferred. [ZX Disadvantages of Star Schema * While star schemas offer numerous benefits, they also have some potential disadvantages to consider. Here are a few drawbacks associated with star schemas : 1. Data redundancy : Star schemas denormalize data by duplicating dimension attributes across multiple dimension tables. This redundancy can result in increased storage requirements, especially when dealing with large amounts of data, Redundaney also introduces the possibility of inconsistent data if updates or changes | instances of a dimension attribute. are not properly managed across all ity in adding or modifying 2. Limited flexibility : While star schemas provide flex‘ dimensions, they can be less flexible when it comes to adding new measures or TECHNICAL PUBLICATIONS® an up-thrust for knowledge Data Warehousing 4-18 Dimensional Modeling and Schema metrics. If new measures need to be added to the schema, it may require altering the structure of the fact table, which can be complex and time - consuming. This lack of flexibility can make star schemas less suitable for environments with rapidly changing and diverse analytical requirements. 3. Performance challenges with complex queries : Although star schemas are optimized for simple and straightforward queries, they may struggle to handle complex queries involving multiple dimensions and extensive aggregations. In such cases, the performance of star schemas can degrade, as they may require numerous joins and calculations across large data sets. 4. Limited support for non - additive facts : Star schemas are designed primarily for additive or semi - additive facts, which are values that can be summed up across dimensions. However, they may not be well - suited for non - additive facts, such as ratios or percentages, which require more sophisticated calculations and different aggregation methods. 5. Data risk of data integrity issues if updates or changes are not properly managed. tegrity concerns : As star schemas denormalize data, there is an increased Inconsistent or inaccurate data can occur if dimension attributes are not synchronized or if changes are not propagated correctly across the schema. 6. Difficulty in modeling complex relationships : Star schemas work best for simple one-to-many relationships between the fact table and dimension tables. However, they can be challenging to model more complex relationships, such as many-to-many or self - referencing relationships. In such cases, additional techniques like bridge tables or snowflake schemas may be required, adding complexity to the schema design. © It's important to note that while these disadvantages exist, they may not be significant concerns for all use cases. The suitability of a star schema depends on the specific requirements, data volume, query patterns and the trade - offs that need to be made in the data warehousing environment. EG snowflake Schema The snowflake schema is a data modeling technique used in data warehousing, which is an extension of the star schema. It is designed to address some of the limitations of the star schema, particularly related to data redundancy and the ability to handle complex relationships. TECHNICAL PUBLICATIONS® - an up-thnust for knowledge 4-19 Dimensional Modeling and Schema Ina snowflake schema, dimension tab) : al into multiple smaller table: ’ ; Ss each repr i within the dimension, cb ee eee le Mor relationships, formin; These smaller tables are conn: isugitand Se dy g 8 structure that resembles a sni oe, itieaatd snowflake, hence the name. les are i Normalized, meaning that they are divided Time Dimension ae Product bimensi lon ProduetCatego ore Prowse |__| —, Dimension” dit oduct ven cist » esha ProduetCategor/D mpl roductCat Name Month Toe ran : Quantity ee Discount Customer Employee Dimension [pimension, Customerid Employee! CustomerName Empleyeetame ‘Address DepartmentiD 7 as CityiD Region Dimension Dimension serrton, as Deparment cityName Leeaon Zipcode Sate Country Fig. 4.5.1 Snowflake schema «In Fig 4.5.1 the employee dimension table now contains the attributes : Employee ID. Employee Name, Department ID, Region, and Territory. The Department ID attribute links with the Employee table with the Department dimension table. The Department dimension is used to provide detail about each department, such as the Name and Location of the department. The Customer dimension table now contains the attributes : Customer ID, Customer Name, ‘Address, and City ID. The City ID attributes link the Customer dimension table with the City dimension table. The City dimension table has has ci .¢ from the star schema in that it allows for more granular By normalizing dimension tables, details about each city such ity name, Zipcode, State and Country. flake schema differs ion within dimensions. ted, resulting in reduced storage requirements compared so help improve data integrity and © The snow! levels of data representa redundant attributes are climinalees 1 to the star schema. This normalization can al consistency. TacnNIGALEUBL! sn up-thrust for knowledge Data Warehousing 4-20 Dimensional Modeling and Schema EES] Features of Snowflake Schema Here are some key features of the snowflake schema : L Normalized dimension tables : In a snowflake schema, dimension tables are normalized, meaning they are divided into multiple smaller tables. Each smaller table represents a specific level of detail or hierarchy within the dimension. This normalization reduces data redundancy by avoiding duplication of attributes. Hierarchical structure : The snowflake schema allows for a hierarchical structure within dimensions. The dimension tables are connected through one - to - many relationships, forming a snowflake - like structure. This structure enables drill - down and roll - up operations, allowing users to navigate through various levels of detail within a dimension. The snowflake schema excels at handli complex relationships, such as many - to - many or self - referencing relationships. Complex relationship handling By breaking down dimension tables into smaller tables, it becomes easier to represent and manage these relationships accurately. Improved data integrity : With normalized dimension tables, updates or changes to dimension attributes can be applied consistently across all levels of the hierarchy. This helps maintain data integrity and ensures that changes are accurately reflected throughout the schema. Reduced data redundancy : Normalizing the dimension tables in a snowflake schema eliminates redundant attributes. This reduction in redundancy leads to lower storage requirements and makes data maintenance easier, especially for large data warehouses. Flexibility in querying : The snowflake schema provides flexibility in querying by allowing drill - down and roll - up operations across different levels of detail within a dimension. Users can navigate the hierarchy and retrieve data at various levels of granularity as needed. Scalability : The snowflake schema offers scalability as the data warehouse grows. The normalized structure allows for efficient management of large volumes of data without sacrificing performance or data integrity. Enhanced data consistency : With the normalized structure of the snowflake schema, data consistency can be maintained across different levels of the dimension hierarchy. Changes made to a_ speci attribute in one level will propagate consistently to all related levels, ensuring consistent and accurate data representation. TECHNICAL PUBLIGATIONS®- an up-trust for knowiedge oronusid 4-21 Dimensional Modeling and Schema g |. important to note that whil : ane improved a = snowflake schema provides benefits such as reduced ear yces increased complexi egrity and complex relationship handling, it also fee to use a ae to the schema design and querying process. The gecisi schema should consi : on i trade - offs of the data warehousing project. sider the specific requirements and ge qhead I ae 2 redundancy : Normalizing the dimension tables reduces data redundancy by avoiding duplication of attributes. This can lead to lower storage requirements and easier data maintenance, especially for large data warehouses. 2. Improv dimension attributes can be applied consistently across all levels of the hierarchy. reflected ‘this helps maintain data integrity and ensures that changes are accurately # throughout the schema. 3, Flexibility in querying + querying by allowing drill - dvantages of Snowflake Schema vantages of snowflake schema are : ed data integrity : With normalized dimension tables, updates or changes to ¢ schema provides more flexibility in rious levels of The snowflal down and roll - up operations across va detail within a dimension. Users can navigate the hierarchy and retrieve data at different levels of granularity as needed The snowflake schema is particularly useful such as many - to - many oF self - ‘ons into multiple tables, it 4, Handling complex relationships 1 complex. relationships. By breaking down dimensi and represent these relationships accurately. when dealing with referencing relationships becomes easier to manag [BQ Disadvantages of Snowflake Schema Despite its advantages, the snowflake schema als ity : The normalization of dimen: and query formu! itiple levels of the hierarchy. which ca ‘9 has some drawbacks to consider : sion tables introduces additio Jation. More joins are typically n impact nal 1. Increased complexi complexity to the schema desig! om mu required to retrieve data fr {joins in a snowflake schema d to the star schema. The Jower query query performance. 2. Performance implications = ively impact query \d more co! == The increased number 01 performance compare’ can neg mplex query paths can lead to sl additional join operations an €xccution times. Data Warehousing 4-22 Dimensional Modeling and Schema 3. Query complexity : Querying data from a snowflake schema can be more challenging for users due (o the increased complexity of the schema. Formulating queries across multiple tables and understanding the relationships between them can be more time - consuming and error - prone. ESS comparison of Star Schema and Snowflake Schema © The star schema and snowflake schema are two common data modeling techniques used in data warehousing, While they share similarities, they also have distinct differences, Here's a comparison between the star schema and snowflake schema : to the snowflake schema. The denormalized structure requires fewer joins, leading to simpler and faster query execution, Sr.No Parameters Star schema Snowflake schema 1. Structure The star schema has a simple The snowflake schema extends structure with a single fact table atthe star schema by normalizing the center and multiple dimension ___ dimension tables. Dimension tables radiating outwards. tables are divided into multiple Dimension tables are denormalized, smaller tables, creating a containing all the attributes related snowflake - like structure with to.a dimension. additional levels of detail. 2 Data ‘The star schema tends to have more The snowflake schema reduces redundancy data redundancy as dimension tables _ data redundancy by normalizing are denormalized, resulting in dimension tables, resulting in attribute duplication across multiple smaller storage requirements. rows. This redundancy can lead to larger storage requirements 3+ Query The star schema generally offers ‘Snowflake schema may have performance better query performance compared _ slightly slower query performance due to additional joins required to navigate between the normalized dimension tables. However, with proper indexing and optimization, the performance difference can be minimized. 4 Complexity The star schema is simpler to understand, implement and query due to its denormalized structure, It is easier to maintain and modify as it requires fewer tables and relationships. Snowflake schema introduces additional complexity due to the normalized structure. It involves more tables and relationships, requiring more complex joins and query formulation. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge y- werent! Relationship handling using 4-29 pst Th rhe star schema is suitable for simpler one - to - many relationships between the fact table and dimension tables. It may not handle complex relationshif as many - to - many or self- referencing relationships, as effectively, 5 , such The star schema can be easier to maintain data integrity since it has a denormalized structure, reducing the chance of inconsistencies when updating data, Data integrity ~ The choice between the star schema and snowflake requirements, com| considerations of tl weaknesses and the decision s| the data and business requirements. i] Fact Constellation Schema « Fact constellation schema is ar ‘and business intelligence to organize an different facts an handle more complex and interrelated business scen bles are a differ multidimensional data d_represent Ina fact constellation schema, multiple fact tal of dimension tables. Each fact table represents inventory, or customer ive information re! business, such as sales, provide the context and descript geography, product and customer. * Fig. 4.7.1 Shows fact constellation Dimension Table Schema. Dimension Table| Dimension Table| act Table Dimension Table Fig. 4 re of fact constell plexity of relationships, query perfor he data warehousing project. Both schemas have the hould be based on careful analysis and w .d dimensions. It is an extension of the sta jarios. Dimonsional Modoling and Scheme Snowflake schema excels at handling complex relationships: By normalizing dimension tables- it can handle many - to - many OF self - referencing relationships more efficiently. The snowflake schema requires more attention to maintain data integrity due to the presence of normalized tables. Changes oF updates to attributes must be consistently propagated across multiple tables. schema depends on the specific mance needs and data integrity ir strengths and nderstanding of model used in data warehousing complex relationships between schema and is designed to connected to a central collection ent aspect or measurement of the interactions. The dimension tables Jated to the facts, such as time, Dimension Table] Dimension Table| yn schema 7.1 General struct! ON ea | Data Warehousing 4-24 Dimensional Modeling and Schems Example : Dimension Table Student Stud_roll Eact Table Name Fact Table = Placement CGPA Workshop, Dimension Table ineneie a ple aan stud_ro | [Training institute compan pany: Company_id Institute_id Institute_ia Company_id TPO_id TPO_id acs eewrs No, of students Dimension Table No. of stadents Full_course_fee Offer_Package [Cs a selected No. of students No. of students attented placed TPO_id Name Age Fig. 4.7.2 Example of fact constellation schema 4. Placement is a fact table having attributes like Stud_roll | Company_id, TPO_id with the facts Number of students eligible, Number of students placed. 2. Workshop is a fact table having attributes like Stud_roll Institute_id TPO_id with the facts ‘Number of students selected ‘Number of students attended the workshop. 3. Company is a dimension table having attributes Company_id Name Offer_package TECHNICAL PUBLICATIONS® - an up-thrust for knowledge eonousing 225) Dimensional Modeling and Schema es L student uudent_roll a dimension table having attributes Name caPA is a dimension table having attributes Vos 7P0 i 7Po_id Name Ase g. training institute is a dimension table having attributes Institute_id Name Full_course_fee. So, there are two fact tables namely, placement and workshop which are part of two different star schemas having dimension tables - Company, Student and TPO in Star schema with fact table Placement and dimension tables - Training Institute, Student and TPO in Star schema with fact table Workshop. Both the star schema have two dimension tables common and hence, forming a fact constellation or galaxy schema. Features of Fact Constellation Schema The key features of a fact constellation schema are : 1, Multiple fact tables : Unlike the star schema, which typically has one central fact table, the fact constellation schema includes multiple fact tables. Each fact table represents a different business process or measurement. 2. Shared dimension tables : The dimension tables are shared among the different fact tables. This means that multiple fact tables can have relationships with the same dimension tables, allowing for complex analysis and drill - down capabilities. 3. Complex relationships : The fact constellation schema allows for complex relationships between the fact tables. For example, one fact table may have a direct telationship with another fact table, enabling analysts to analyze the impact of one business process on another. 4 Increased flexibility : The fact constellation schema provides greater flexibility in analyzing data across multiple dimensions. Analysts can perform complex queries and aggregations involving multiple fact tables and dimensions, enabling them to Sain deeper insights into business performance and relationships. L tan ip tiiet for kaotwiedgé Data Warehousing 4-26 Dimensional Modeling and Schema © Overall, the fact constellation schema is interrelated business processe: seful when dealing with complex ang where multiple measurements need to be analyzed ang understood together, It provides a flexible and powerful data model for decision suppor and business intelligence applic: lions. Advantages of Fact Constellation Schema © The fact constellation schema offers several advantages over other data modeling approaches in data warehousing and business intelligence. Here are some of the key advantages = Flexibility and scalability : The fact constellation schema provides a high degree of flexibility and scalability. It can handle complex and evolving business requirements by accommodating multiple fact tables and their relationships. As new business processes or measurements emerge, additional fact tables can be added without impacting existing dimensions or facts. Enhanced analysis capabilities : The fact constellation schema enables advanced analysis capabilities by allowing analysts to explore relationships between different fact tables. Analysts can perform multidimensional analysis, drill across different dimensions and investigate the impact of one business process on another. This flexibility enhances the depth and breadth of data analysis. Comprehensive data integration : With multiple fact tables and shared dimension tables, the fact constellation schema promotes comprehensive data integration. It allows for the integration of data from diverse sources and systems, enabling a holistic view of the business. This integration supports cross - functional analysis and helps identify correlations and patterns that might not be evident in isolated data sets. Improved performance : By distributing the data across multiple fact tables, the fact constellation schema can improve query performance. Queries involving specific fact tables can be optimized, resulting in faster response times. Additionally, the schema's structure facilitates indexing and partitioning strategies, further enhancing performance. Granular level of detail : The fact constellation schema allows for a granular level of detail in data analysis. Different fact tables can capture specific measurements of events, enabling analysts to analyze data at a highly detailed level. This granularity supports accurate and precise reporting, forecasting and decision - making processes. Separation of concerns : The fact constellation schema separates the concerns of different business processes into separate fact tables while sharing common TECHNICAL PUBLICATIONS® - an up-thrust for knowledge # using o wore ; dimensions. This separati Dimensional Modeling and Schema on helps main ‘ain data integrity, as changes or updates to It also cables better data governance and model, one fact table do not impact other: i d s. simplifies the maintenance of the dat an werall, the fact constellation schema em, mplex : r ; Oo Cmpowers organizations to handle complex and . rerconnected business ‘ ied sale en Clfectively. It offers flexibility, scalability and adv F abilities, enabling de a : r decision - making. per insights and —_ informe Disadvantages of Fact Constellation Schema while the fact constellation sch A . ee ema provides several advantages, it is important to consider its Pp isadvantages as well. Here are some of the disadvantages associated with the fact constellation schema : - 1 Increased complexity The fact constellation schema introduces increased complexity compared to simpler data models such as the star schema. Managi multiple fact tables, their relationships and shared dimensions can be more challenging and require a higher level of expertise. The complexity can make the schema harder to understand, maintain and modify. 2. Higher storage requirements : Due to the presence of multiple fact tables and shared dimensions, the fact constellation schema may require more storage space compared to simpler schemas. Each additional fact table adds to the overall stora: requirements and shared dimensions need to be duplicated across different fact nd impact system performance. tables. This can increase storage costs a Integrating data from various sources and sy be more complex in a fact constellation schema. As multiple fact tables are involved, each with its own set of dimensions, aligning data across different sources can be challenging. Ensuring consistent and accurate integration of data from diverse systems requires careful planning and robust ETL (Extract. ‘Transform, Load) Processes. 4. Query performance conside improved query performance, ita i Analyzing data across multiple fact «bles 2°? im i impact quer Operations and aggregations, which can me ia p : indexing, partitioning and qu performance issues. 1 The fact constell 5. Maii Kee} intenance and upkecP keep compare’ in terms of maintenance and UP a ee tems can 3. Data integration challenges : vations + While the fact constellation schema ean oftter Iso introduces potentia ensions may involve complex join performance challenges. ry response times. Careful jmizatit required to mitigate ery optimization ! ig ‘on schema can be more demanding lati 4 to simpler schemas. As the schema Date Warehousing 4-28 Dimensional Modeling and Schema evolves with the addition of new fact tables or dimensions, ongoing management jg required to ensure data integrity and cons istency. Changes to dimensions or relationships may necessitate updates to existing reports, queries and analytical tools, which can be time - consuming. 6. Learning curve for users The complexity of the fact constellation schema may increase the learning curve for end users. Understanding the relationships between different fact tables and dimensions, as well as the appropriate usage of the schema for analysis. requires specialized knowledge and training. This can potentially limit the accessibility and usability of the data model for non - technical users. © It is crucial to weigh these disadvantages against the specific requirements and complexities of the business scenario when considering the adoption of a fact constellation schema. Depending on the nature of the data and analytical needs, the fact constellation schema may provide substantial benefits despite its drawbacks. EY schema Definition e Ina data warehouse, schema definition refers to the process of designing the structure and organization of the data within the warehouse. It involves determining how the data will be organized, the relationships between different data elements and the overall schema or structure that will be used for data storage and retrieval. © There are two common types of schema designs used in data warehousing : 1. Star schema : The star schema is a widely adopted schema design in data warehousing. It consists of a central fact table surrounded by dimension tables. The fact table contains the numerical or measurable data (e.g., sales, revenue), while the dimension tables provide descriptive attributes related to the facts (e.g., product, customer, time). The fact table and dimension tables are connected through primary - key and foreign - key relationships. The star schema simplifies queries and allows for fast aggregation of data, making it suitable for reporting and analysis. 2. Snowflake schema : The snowflake schema is an extension of the star schema, where dimension tables are further normalized into multiple levels of tables. The normalization reduces data redundancy and can improve data integrity. In a snowflake schema, the dimension tables are connected through primary - key and foreign - key relationships, similar to the star schema, However, the snowflake schema's additional normalization can lead to more complex queries and slightly slower performance compared to the star schema. . TECHNICAL PUBLICATIONS® - an up-thrust for knowledge a definition involves the following steps : ae sui : sa gentity busines requirements : Understand the busi pee cs analytical goals of the data warehouse. Identify the key entities or ¢0' represente mensions : Identify the dimensions (e.g., product, customers = rmi eit the data. Determine the attributes or characteristics attributes like pro .d in the schema, time) that provide that define duct pefine di descriptive conte) dimension. For example, the product dimension may include me, category and price. I be stored in the fact ets : Determine the numerical or measurable data that will .) or metrics that are of cach 1D. nal pefine fai tables. These are typically the Key Performance Indicators (KPIs; interest for analysis, such as sales, revenue, or inventory levels. pstablish tables. Thi correspondin, Determine granul: will be stored in the fact tal depending on the business requirements. ‘ew the schema design and ensure it ali cents or refinements as necessary 0 OP relationships : Define the relationships between the fact table and dimension 5 is done by identifying the primary keys in dimension tables and g foreign keys in the fact table. larity : Determine the level of detail or granularity at which the data ble. This could be at a daily, weekly, monthly, or any other ens with the business appropriate level, timize the schema for Validate and refine : Revi requirements. Make adjustm performance and data analysis. re and usability of Je in shaping the overall structu query d analysis, allowing Schema definition plays a crucial rol icient data storage, retrieval an the data warehouse. It enables effici formed decisions based on the de users to derive insights and make ini jata contained within the warehouse. overall design and structure of the Process Architecture cture refers t0 the a data’ warehouse. It encompasses * Data warehouse process archite’ processes involved in building and maintaining various stages and components that work together t data for analytical purposes- found ina data warel . Here are the key components typically The first step js to extract data from flat files, APIs, oF other systems. ‘This involves identifying 1g it into the data warehouse env! et for knowledge o extract, transform, load and present house process architecture : multiple sources such as 1, Data extraction : transactional databases, the relevant data and pullin ironment. Data Warehousing 4-30 Dimensional Modeling and Schema : Once the data is extracted, it often needs to be transformed to 2. Data transformation and compatibility. Data transformation may involve cleaning, ensure consistency or restructuring the data to meet the requirements filtering, aggregating, integrating, of the data warehouse schema. 3. Data loading : After transformation, the data is loaded int There are different loading strategies, including full loads (where all data is loaded) he changes since the last load are loaded). Loading depending on the 10 the data warchouse. or incremental loads (where only th can be done through batch processing or real - time streaming, requirements. : The loaded data is stored in the data warehouse in a structured 4, Data storage a Relational Database manner to support efficient querying and analysis. Typically, Management System (RDBMS) is used for storage, such as Oracle, SQL Server, or PostereSQL. Alternatively, some data warehouses employ NoSQL databases or distributed file systems for storage. 5, Data modeling : Data modeling involves designing the structure of the data warehouse, including the definition of tables, relationships and hierarchies. The most common modeling techniques are dimensional modeling (¢.g., star schema, snowflake schema) and entity - relationship modeling. 6. Metadata management : Metadata, which provides information about the data, is crucial in a data warehouse. It includes data definitions, data lineage, business rules, data quality metrics and other details. Effective metadata management ensures data accuracy, improves data understanding and supports data governance. 7. Query and analysis : Once the data is loaded and organized, users can query and analyze the data warehouse. Business Intelligence (BI) tools and analytics platforms are commonly used to create reports, dashboards, visualizations and perform advanced analytics on the data. 8. Data governance and security : Data governance ensures the quality, availability ndards and integrity of the data in the warehouse. It involves establishing policies, sta protect and procedures for data management. Security measures are implemented to the data from unauthorized access, ensuring compliance with privacy regulations and maintaining data confidentiality. : Data warehouses are not static; they require 9, Data refresh and maintenance he data regular maintenance and periodic data refreshes. This involves updating warehouse with new or modified data, performing data quality checks and optimizing performance by indexing, partitioning, or aggregating data. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge vy pata Warehousing 4-31 Dimensional Modeling and Schema 10. Monitoring and performance tuning»: Continuous monitoring of the data warchouse's performance is crucial to identify bottlenecks, optimize query execution and ensure responsiveness. Performance tuning techniques, such as indexing, caching, or parallel processing, may be applied to enhance the system's efficiency. ¢ The exact architecture of a data warehouse process may vary depending on the specific requirements, technologies used and organizational preferences. However, these components provide a general framework for understanding the key stages and activities involved in building and maintaining a data warehouse. EES Types of Process Architecture Following are the two types process architectures : 1, Centralized process architecture 2. Distributed process architecture Centralized process architecture Distributed process architecture Fig. 4.9.1 Types of process architecture BI centralized Process Architecture © Centralized process architecture refers to a | Client design approach where all processes and Client ] Client Operations are consolidated and controlled from a central location or system. In this architecture, a single central entity or system handles the core functionalities and Operations, while other entities or components interact with it to perform their tasks. Central data warehouse Here are some key characteristics and benefits of 4 centralized process architecture : * Central control : The central entity or system has control and oversight over all processes Fig. 4.9.2 Centralized process architecture TECHNICAL PUBLICATIONS® - an up-thrust for knowedge Data Warehousing 4-32 Dimensional Modeling and Schema and operations. It acts as a central hub for coordinating acti resources. © Standardization : A centralized architecture promotes standardization of processes, procedures and data formats, This ensures consistency and uniformity across the entire system, making it easier to manage and maintain. ies and managing © Data integration : Centralization facilitates data integration by consolidating data from various sources into a single repository or system. This enables better data management, analysis and reporting capabilities. © Resource optimization : By consolidating processes and resources, a centralized architecture allows for better resource allocation and optimization. It helps eliminate duplication and redundancy, leading to improved efficiency and cost savings. © Decision-making and governance : Centralization provides a centralized decision- making structure, allowing for better control, governance and enforcement of policies and guidelines across the system. Benefits of Centralized Process Architecture Following are the benefits of centralized Process Architecture : 1. Improved efficiency : Centralizing processes eliminates the need for multiple systems and redundant operations. It streamlines workflows, reduces manual intervention and enables more efficient resource utilization. 2. Enhanced data quality : Centralized architectures enable better data governance and quality control measures. With standardized processes and data integration, organizations can ensure data consistency, accuracy and integrity. 3. Increased collaboration : Centralized architectures foster collaboration and information sharing across different departments or entities. It facilitates seamless communication, data access and knowledge transfer. Scalability and flexibility : Centralization simplifies scalability as resources can be easily managed and allocated from a central point. It also offers flexibility in adapting to changing business needs and accommodating growth. 5. Simplified maintenance : Having a centralized architecture makes maintenance and updates more manageable. System upgrades, bug fixes and enhancements can be applied centrally, reducing complexity and downtime. 6. Enhanced security and compliance : Centralized architectures provide better control and security measures. Data access, user permissions, and security protocols can be enforced centrally, reducing the risk of data breaches and ensuring compliance with regulatory requirements. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge yr pata Warehousing 4-33 Dimensional Modeling and Schema ¢ However, it's important to note that centralized architectures may have some limitations. They can introduce single points of failure, and if the central entity or system experiences issues, it may impact the entire system. Additionally, implementing and managing a centralized architecture requires careful planning, coordination and investment in infrastructure and resources. Overall, a centralized process architecture offers several benefits in terms of efficiency, data management, collaboration and scalability. It can be particularly advantageous for organizations that require standardized Processes, centralized control, and streamlined operations. [22 Distributed Process Architecture Distributed process architecture, also known as decentralized process architecture, is an approach where processes and operations are spread across multiple entities or systems. In this architecture, different components or entities perform specific tasks or functions independently, yet coordinate and communicate with each other to achieve a common goal. There are some architectures of the distributed process : 4. Client-server In this architecture, the user will accomplish all the information collecting and arrangement, while the server does the processing and management of data. 2, Three-tier architecture In this architecture, the client machines is connected to a server machine, thus mandating finite states and introducing latencies and overhead in terms of record to be carried between clients and servers. 3. N-tier architecture The n-tier or multi - tier architecture is where clients, middleware, applications and servers are isolated into tiers. 4. Cluster architecture In this architecture, machines that are connected in network architecture (software or hardware) to approximately work together to process information or compute requirements in Parallel. Each device in a cluster is associated with a function that is processed locally and the "sult sets are collected to a master server that returns it to the user. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge Modeling and Sche Data Warehousing 4-34 Dimensional 9 ema 5. Peer-to-Peer Architecture This is a type of architecture where there are no dedicated servers and clients. Instead, alj the processing responsibilities are allocated among all machines, called peers. Each machine can perform the function of a client or server or just process data. Here are some key characteristics and benefits of a distributed process architecture : 1. Decentralized control : In a distributed architecture, control and decision - making are distributed among multiple entities or systems. Each component has autonomy in managing its processes and operations. 2. Task segmentation : Processes and tasks are segmented and allocated to different entities based on their capabilities, resources, or expertise. Each entity focuses on its assigned tasks, leading to specialization and efficiency. 3. Communication and coordination : Entities in a distributed architecture communicate and coordinate with each other through standardized protocols, interfaces, or messaging systems. This ensures synchronization and collaboration between different components. Fault tolerance and resilience ; Distributed architectures are designed to be resilient in the face of failures or disruptions. If one component fails, other components can continue to operate independently, reducing the impact on the overall system. 7 Scalability : Distributed architectures offer scalability by allowing additional entities or components to be added as the system grows. It enables the system to handle increased workloads and accommodate expanding requirements. EH) Benefits of Distributed Process Architecture 1. Improved performance : By distributing tasks across multiple entities, a distributed architecture can enhance performance and throughput. Each entity can focus on its specific tasks, leading to parallel processing and reduced bottlenecks, 2. Increased flexibility : Distributed architectures offer flexibility in terms of adding or removing components as needed. It allows for easy integration of new technologies, systems, or services, enabling adaptability to changing business needs. 3. Enhanced fault tolerance : The decentralized nature of a distributed architecture improves fault tolerance. If one component fails or experiences issues, other components can continue functioning, minimizing the impact on the overall system. 4. Geographic distribution : Distributed architectures can be advantageous when dealing with geographically dispersed entities or users. It enables efficient processing and data access, even across different locations or regions. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge worehousing a e Dimensional Modeling and Schema 5, Load balancing : Distribute architectures facilitate load balancing by distributing workloads across multiple entities, This helps optimize resource utilization and ensures smooth operation during peak demand ae 6, Reduced dependencies : Components in a distributed architecture can operate independently, reducing dependencies on a single system or entity. This enhances resilience and reduces the risk of a single point of failure. 7, Data locality : In some distributed architectures, data can be distributed closer to the entities or systems that need it, reducing data transfer latency and improving data access performance, Distributed process architectures offer several advantages in terms of performance, flexibility, fault tolerance and scalability. However, they also introduce complexities in terms of communication, coordination and management of distributed entities. Proper design, synchronization mechanisms and robust communication protocols are crucial to ensure the smooth functioning and effective collaboration of the distributed components. [Sd Types of Data Base Parallelism « Database parallelism refers to the ability to perform multiple tasks simultaneously in a database system to improve performance and efficiency. There are different types of parallelism techniques employed in database systems to achieve parallel processing. Here are some commonly used types of database parallelism : 1. Task parallelism : Task parallelism involves dividing a database task-or query into multiple subtasks and executing them concurrently on different processing units. Each subtask operates on a separate portion of the data, allowing for parallel execution. This parallelism technique is commonly used in database systems to improve query performance. Data parallelism : Data parallelism involves splitting a large dataset into smaller partitions and processing them concurrently across multiple processing units. Each partition is processed independently, allowing for parallel execution. Data parallelism is well-suited for operations that can be applied separately to different portions of the data, such as filtering, sorting and aggregating. 3. Pipeline parallelism : Pipeline parallelism divides @ database task into multiple stages and each stage is executed concurrently. Each stage processes a subset of the data and passes the intermediate results to the next stage. This parallelism technique is useful for tasks that involve a sequence of operations, such as data transformation or data integration. nee “ATIONS®. TECHNICAL PUBLIC Data Warehousing 4-36 Dimensional Modeling and Schema 4. Parallel query execution : Parallel query execution involves breaking down a database query into smaller units and executing them simultaneously across multiple processing units. The query optimizer determines the most efficient way to divide the query into subtasks and distribute them for parallel processing. This technique is commonly used in parallel database systems to improve query response time. 5. Parallel data loading : Parallel data loading is used when loading large amounts of data into a database. It involves dividing the data into smaller chunks and loading them simultaneously into different database partitions or tables. Parallel data loading can significantly speed up the data loading process and reduce the overall load time. 6. Parallel indexing : Parallel indexing is a technique used to create or rebuild database indexes in parallel. The index creation process is divided into multiple tasks and each task is executed concurrently on different processing units. This approach accelerates the index creation process and reduces the time required for indexing large volumes of data. 7. Parallel backup and recovery : Parallel backup and recovery techniques involve performing backup and recovery operations simultaneously across multiple database components or partitions. This approach reduces the backup or recovery time, especially for large databases, by distributing the workload across multiple resources. © These are some of the commonly used types of database parallelism. The specific parallelism techniques employed depend on the database system, hardware infrastructure and the nature of the tasks or queries being executed. Effective utilization of parallelism can significantly improve the performance and scalability of database systems. Datawarehouse Tools There are several data warehouse tools available in the market that facilitate the development, management and analysis of data warehouses. These tools offer various functionali ies, such as data Extraction, Transformation, Loading (ETL), data modeling, reporting and analytics. Here are some popular data warehouse tools : 1. Oracle Data Integrator (ODI) : ODI is an ETL tool provided by Oracle that enables data extraction, transformation and loading into the data warehouse. It supports both batch and real - time data integration and provides a graphical interface for designing and managing data integration workflows. 2. Informatica PowerCenter : Informatica PowerCenter is a widely used ETL tool that offers comprehensive data integration capabilities. It supports data extraction TECHNICAL PUBLICATIONS® - an up-thrust for knowledge bh TECHNICAL PUBLICATIONS® - 1g and Schema 7 Dimensional Model to data warehouse: ty from various sources, data transformation and data loading powerCenter provides a visual development environment and extensive connec! options: Microsoft SQL Server Integration Services (SSIS) : SSIS is an ETL tool included in Microsoft SQL Server. It allows users to create data integration workflows, extract data from various sourees, transform data and load it into the SQL Server database or data warehouse. SSIS provides a visual development environment within SQL Server Management Studio. IBM InfoSphere DataStage : InfoSphere DataStage is an ETL tool provided by IBM that facilitates data integration and transformation. It offers a graphical interface for designing and managing ETL processes. DataStage supports parallel processing and provides a scalable platform for data integration in data warehouse environments. Talend Data Integration : Talend Data Integration is an open-source ETL tool that enables data integration and transformation across various sources and targets. including data warehouses. It offers a comprehensive set of data integration and quality features and supports both batch and real - time data integration. . MicroStrategy : MicroStrategy is a business intelligence platform that provides robust reporting and analytics capabilities for data warehouses. It offers a suite of tools for data visualization, ad-hoc querying, dashboards and advanced analytics. MicroStrategy supports integration with various data warehouse platforms. . Tableau : Tableau is a popular data visualization and analytics tool that connects to data warehouses and allows users to create interactive visualizations and reports. It supports data exploration, ad-hoc analysis and collaborative analytics. Tableau integrates with multiple data warehouse systems and provides easy-to-use drag-and- drop functionality. QlikView / Qlik Sense : QlikView and Qlik Sense are data discovery and Visualization tools that enable users to create interactive dashboards and reports from data warehouses, They offer associative data modeling, powerful visualizations and self-service analytics capabilities, Qlik tools support integration with various data Warehouse platforms. . Amazon Redshift : Amazon Redshift is a fully managed data warehousing service Provided by Amazon Web Services (AWS). It offers scalable and high - performance ties in the cloud. Redshift provides data loading, querying ration options with other AWS services, data warehousing capab nd analytics functionalities, along with integr ‘an upethnust for knowiedge Data Warehousing 4-38 Dimensional Modeling and Schema 10. Google BigQuery : Google BigQuery is a serverless data warehousing and analytics platform offered by Google Cloud. It allows users to analyze large datasets using SQL queries and supports fast and scalable data processing, BigQuery integrates with various data sources and offers built-in machine learning capabilities. © These are just a few examples of data warehouse tools available in the market. The choice of tool depends on factors such as specific requirements, budget, scalability needs, integration capabilities and existing technology stack within an organization. Review Questions 1. Explain the key principles of dimensional modeling. (Refer section 4.1.1) FE 2. Discuss the objectives of dimensional modeling. (Refer section 4.1.2) F=za 3. Explain elements of dimensional data model. (Refer section 4.1.3) Eezg 4 = Fag |. Explain in detail steps of dimensional modeling. (Refer section 4.1.4) w Explain multi-dimensional data modeling. (Refer section 4.2) 6. Explain key components of multi - dimensional data modeling. (Refer section 4.2.1) hee 7. Explain data cube. (Refer section 4.3) 8. Explain star schema. (Refer section 4.4) fea 9. Explain snowflake schema. (Refer section 4.5) ReoG YZ] Two Marks Questions with Answers Q.1 What is dimensional modeling ? Ans. : Dimensional modeling is a design technique used in data warehousing to organize and structure data for efficient analysis and reporting. It provides a way to represent the data in a dimensional format that is optimized for querying and aggregating information. Q.2 Whatare the elements of dimensional data model ? Ans. : 1. Fact 2. Dimension 3. Attributes 4. Fact table 5. . Dimension table TECHNICAL PUBLICATIONS® - an up-thrust for knowledge y we wereho! what is data cube ? using ae 39 Dimensional Modeling and Schema pe! data cube is a fundamental Concept in data warehousing that represents a multi- jimensional oy of Cae ht Provides a structured and efficient Way to organize and analyze gata 1008 multiple dimensions. A data cube combines dimensions, hierarchies, and measures rn enable users to slice, dice, drill down, and roll up data for analysis and reporting purposes. 4 What is star schema ? ans. : Star schema is a widely used data modeling technique in the design of data warehouses. wis named so because the diagrammatic representation of this schema resembles a star shape, with a fact table at the center and multiple dimension tables radiating outwards. 5 Whatis snowflake schema ? Ans. ¢ anextension of the star schema. It is designed to address some of the limitations of the star he snowflake schema is a data modeling technique used in data warehousing, which is schema, particularly related to data redundancy and the ability to handle complex relationships. Q6 What is fact constellation schema ? Ans. : Fact constellation schema is a multidimensional data model used in data warehousing and business intelligence to organize and represent complex relationships between different facts and dimensions. It is an extension of the star schema and is designed to handle more complex and interrelated business scenarios. gaa

You might also like