Data Warehousing and Online Analytical Processing
Data Warehousing is a technology that aggregates structured data from multiple sources
into a centralized repository for analysis and reporting. It is designed to support business
decision-making by providing a consolidated view of the organization’s data.
1. Subject-Oriented:
○ Explanation: The data warehouse is organized around key business
subjects such as customers, sales, products, etc., rather than individual
transactions or processes. This orientation makes it easier for businesses to
analyze data from a particular perspective.
○ Example: If a company wants to analyze customer purchasing behavior, a
data warehouse might contain data specifically related to customers, such as
demographics, purchase history, and preferences.
2. Integrated:
○ Explanation: Data from different sources (e.g., CRM, ERP, financial
systems) is combined and standardized in the data warehouse. This
integration ensures that data from various sources can be used together for
comprehensive analysis.
3. Non-Volatile:
○ Explanation: Once data is entered into the data warehouse, it is not altered.
This ensures that historical data remains intact for long-term analysis.
○ Example: Sales data for the year 2020 will remain unchanged in the
warehouse, even if there are changes in the operational systems in 2021.
4. Time-Variant:
○ Explanation: Data warehouses store historical data, enabling analysis of
changes and trends over time.
○ Example: A retail company can analyze monthly sales data over the past
five years to identify seasonal trends and make future sales forecasts.
1. Data Sources
● Operational Databases:
database to complement its sales data.
2. Data Staging Area
● ETL Process:
○ Extract:
■ Explanation: Data is collected from multiple sources, which might
have different formats and structures.
■ Example: Extracting customer data from a CRM system and sales
data from an ERP system.
○ Transform:
■ Explanation: The extracted data is cleaned and transformed into a
consistent format suitable for analysis.
○ Load:
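■ Explanation: The transformed data is written into the target tables of the
data warehouse.
■ Example: The cleaned customer and sales records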
are loaded into the customer and sales tables in the data warehouse.
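The three ETL steps above can be sketched in Python using only the standard-library sqlite3 module. This is a minimal illustration under stated assumptions, not a production pipeline: the CRM and ERP extracts are hard-coded hypothetical records, and their field names (CustID, customer, amount, date) as well as the customer and sales table layouts are invented for the example.

```python
import sqlite3

# Hypothetical extracts: a CRM export and an ERP export that use
# different field names, key casing, and value formats.
CRM_ROWS = [
    {"CustID": "c1", "Name": " alice smith ", "Country": "usa"},
    {"CustID": "c2", "Name": "Bob Jones", "Country": "Canada"},
]
ERP_ROWS = [
    {"customer": "C1", "amount": "120.50", "date": "2024-01-15"},
    {"customer": "C2", "amount": "80.00", "date": "2024-01-16"},
    {"customer": "C1", "amount": "42.25", "date": "2024-02-03"},
]


def extract():
    """Extract: collect raw records from each source system."""
    return CRM_ROWS, ERP_ROWS


def transform(crm_rows, erp_rows):
    """Transform: standardize keys, clean text, and convert types."""
    customers = [
        (r["CustID"].upper(), r["Name"].strip().title(), r["Country"].strip().upper())
        for r in crm_rows
    ]
    sales = [(r["customer"].upper(), float(r["amount"]), r["date"]) for r in erp_rows]
    return customers, sales


def load(conn, customers, sales):
    """Load: write the cleaned rows into the warehouse tables."""
    conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT, country TEXT)")
    conn.execute("CREATE TABLE sales (customer_id TEXT, amount REAL, sale_date TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, *transform(*extract()))
```

After loading, grouping sales by customer name returns one total per customer (162.75 for Alice Smith, 80.0 for Bob Jones). That join only works because the transform step unified the key formats ("c1" versus "C1").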
3. Data Storage
4. Data Presentation
● OLAP (Online Analytical Processing) Tools:
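○ Explanation: These tools let users analyze data interactively across multiple
dimensions, offering different perspectives on the same data.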
○ Example: A sales manager could use an OLAP tool to view sales data by
region, product, and time period, identifying trends and outliers.
● Reporting Tools:
○ Explanation: These tools generate reports that summarize the data stored
in the warehouse. They can produce both standard reports (e.g., monthly
sales reports) and ad-hoc reports tailored to specific queries.
○ Example: A financial analyst might generate a report showing quarterly
revenue growth across different product lines.
sensitive data.
5. Administrability: The data warehouse should be easy to manage and maintain,
with tools for monitoring performance, managing storage, and ensuring data
quality.
Types of Data Warehouse Architectures
1. Single-Tier Architecture
● Explanation: This architecture minimizes the amount of data stored by creating a
virtual data warehouse, where data is processed on the fly rather than being stored
in a central repository.
2. Two-Tier Architecture
● Explanation: This architecture separates the data warehouse from the source
systems, with an ETL process used to extract, cleanse, and integrate data before
loading it into the warehouse.
● In this setup, the data warehouse serves as the central repository, and data marts
can be created for specific departments.
3. Three-Tier Architecture
● Explanation: This architecture adds a reconciled data layer between the source
systems and the data warehouse. The reconciled layer standardizes data across the
enterprise, providing a consistent data model that feeds the data warehouse and
data marts.
● Example: A large enterprise might use a three-tier architecture to ensure that data
from different departments is standardized before being loaded into the data
warehouse. The reconciled layer helps manage the complexity of integrating data
from multiple sources.
Query-Driven Approach:
● Explanation: In a heterogeneous DBMS, the integration is typically done
on-demand. When a user submits a query at a client site, the system uses a
meta-dictionary to translate this query into formats understandable by the various
heterogeneous databases involved. The results from these different systems are
then integrated to form a global answer set.
● Challenge: This approach can be resource-intensive, as it requires real-time
querying and integration across multiple systems, which can lead to complex
information filtering and competition for computational resources.
Example:
● Suppose a company has sales data stored in a SQL database and customer
Update-Driven Approach:
● Explanation: Information from multiple heterogeneous sources is integrated in
advance, preprocessed, and loaded into the warehouse, making it available for direct
querying and analysis without the need for real-time data integration.
Example:
● A retail company may consolidate its sales data from multiple regional databases
into a central data warehouse. This centralized data can be queried for
comprehensive analysis, such as monthly sales trends across different regions.
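To make the contrast between the two approaches concrete, here is a minimal Python sketch. The two in-memory sources (a relational-style list of sales tuples and a document-style list of store records) and all of their field names are hypothetical. The query-driven function integrates the sources every time a query runs; the update-driven variant integrates them once in advance and then answers queries from the pre-integrated table.

```python
# Hypothetical heterogeneous sources with different shapes.
SALES_SOURCE = [("S1", "widget", 100.0), ("S1", "gadget", 25.0), ("S2", "gadget", 40.0)]
STORE_SOURCE = [{"store_id": "S1", "region": "East"}, {"store_id": "S2", "region": "West"}]


def query_driven_revenue_by_region():
    """Query-driven: at query time, pull from each source, translate the
    formats, and merge the partial results into one global answer."""
    region_of = {doc["store_id"]: doc["region"] for doc in STORE_SOURCE}
    totals = {}
    for store, _product, revenue in SALES_SOURCE:
        region = region_of[store]
        totals[region] = totals.get(region, 0.0) + revenue
    return totals


def load_warehouse():
    """Update-driven, step 1: integrate the sources ahead of time
    (for example in a scheduled ETL run) into one uniform table."""
    region_of = {doc["store_id"]: doc["region"] for doc in STORE_SOURCE}
    return [
        {"store": store, "region": region_of[store], "revenue": revenue}
        for store, _product, revenue in SALES_SOURCE
    ]


WAREHOUSE = load_warehouse()


def update_driven_revenue_by_region():
    """Update-driven, step 2: queries read only the pre-integrated table."""
    totals = {}
    for row in WAREHOUSE:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue"]
    return totals
```

Both functions return the same totals; the difference is when the integration cost is paid: on every query, or once per warehouse refresh.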
● Explanation: OLAP systems are designed for complex queries that involve data
analysis and decision-making processes. These systems are typically used in data
warehouses where historical and consolidated data is analyzed to support strategic
decisions.
● Example: A financial analyst uses OLAP tools to analyze historical sales data to
forecast future sales trends.
4. Distinct Features: OLTP vs. OLAP
● User and System Orientation:
○ OLTP: Customer-oriented; focuses on managing current transactions and
ensuring data integrity.
○ OLAP: Market-oriented; focuses on analyzing data to derive insights and
support decision-making.
● Data Contents:
○ OLTP: Contains current, detailed data used for routine operations.
● View:
● Access Patterns:
○ OLTP: Designed for frequent updates and short, simple queries.
○ OLAP: Supports complex, mostly read-only queries that may require
extensive computation.
Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) are
two fundamental paradigms within data management systems, each serving distinct
purposes and operational requirements. Below is a detailed explanation of their
purposes and characteristics.
1. OLTP (Online Transaction Processing)
Purpose:
● OLTP systems are designed to manage and facilitate the day-to-day operations of
an organization.
Characteristics:
● Response Time: OLTP systems prioritize fast response times, often requiring
sub-second response times to ensure smooth operation, particularly in
customer-facing applications.
● Examples:
○ Banking Systems: Transactions like deposits, withdrawals, and transfers
are processed instantly, with the database reflecting changes in account
balances immediately.
○ Retail Systems: Point-of-sale systems that process sales transactions,
update inventory levels, and generate receipts in real-time.
2. OLAP (Online Analytical Processing)
Purpose:
● OLAP systems are designed for complex data analysis and decision-making
processes. These systems enable users to query and analyze large volumes of
historical and aggregated data, helping them identify trends, patterns, and insights
that inform strategic business decisions.
Characteristics:
● Data Handling: OLAP systems handle large volumes of data that have been
aggregated and summarized from various sources. The focus is on read-heavy
operations, where users execute complex queries that may span large portions of
the database.
● Data Structure: OLAP databases often use multidimensional models like star
schemas or snowflake schemas, where data is organized around central facts and
related dimensions. This structure is optimized for fast retrieval and analysis.
● Response Time: While OLAP systems may not require the sub-second response
times of OLTP systems, they are designed to handle complex queries efficiently,
even those that involve large datasets.
● Examples:
○ Business Intelligence Tools: A company might use an OLAP system to
● Data Contents:
○ OLTP: Contains current, detailed data essential for running the operational
aspects of a business.
○ OLAP: Contains historical, consolidated data that supports analytical tasks
and strategic planning.
● Database Design:
○ OLTP: Employs an ER model with highly normalized tables, which helps
in maintaining data integrity and optimizing transactional operations.
○ OLAP: Utilizes a star schema or other multidimensional models, which
simplifies complex queries and enhances analytical capabilities.
● View:
○ OLTP: Provides a real-time view of current data, reflecting the latest
transactional activities.
○ OLAP: Offers an evolutionary, integrated view of data over time, making it
ideal for trend analysis and historical comparisons.
● Access Patterns:
○ OLTP: Designed for frequent updates and simple queries, ensuring the
quick processing of transactions.
○ OLAP: Supports complex, read-only queries that may involve large
datasets and require significant computation.
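The contrast in access patterns can be sketched with Python's built-in sqlite3 module. A single in-memory database plays both roles here purely for illustration; the inventory and sale tables, their columns, and the sample values are hypothetical. record_sale is an OLTP-style atomic transaction that touches a few rows, while monthly_revenue is an OLAP-style read-only aggregation over the accumulated history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0));
    CREATE TABLE sale (item TEXT, qty INTEGER, amount REAL, sold_on TEXT);
    INSERT INTO inventory VALUES ('TV', 10), ('Phone', 5);
    """
)


def record_sale(conn, item, qty, amount, sold_on):
    """OLTP-style: a short, atomic read-write transaction on a few rows."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("UPDATE inventory SET qty = qty - ? WHERE item = ?", (qty, item))
        conn.execute("INSERT INTO sale VALUES (?, ?, ?, ?)", (item, qty, amount, sold_on))


def monthly_revenue(conn):
    """OLAP-style: a read-only query that scans and aggregates history."""
    return conn.execute(
        "SELECT substr(sold_on, 1, 7) AS month, item, SUM(amount) "
        "FROM sale GROUP BY month, item ORDER BY month, item"
    ).fetchall()
```

If an OLTP update would violate a constraint (for example, selling more units than are in stock), the `with conn:` block rolls the whole transaction back, so neither the inventory change nor the sale row is kept.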
Separating a data warehouse from operational databases is crucial for maintaining high
performance and ensuring the efficient execution of both Online Transaction Processing
(OLTP) and Online Analytical Processing (OLAP) tasks. Here’s why this separation is
necessary:
● OLTP Systems:
○ Optimization for Transactions: OLTP systems are specifically tuned for
fast transaction processing. This includes optimized access methods,
indexing, concurrency control, and recovery mechanisms that ensure quick
and reliable processing of high volumes of simple transactions, such as
inserting, updating, or deleting records.
○ Real-Time Operations: These systems are designed to support real-time
operations, where the focus is on handling a large number of short, atomic
transactions efficiently. Any delay or performance issue in these systems
can directly impact day-to-day business operations.
● Data Warehouse (OLAP Systems):
○ Optimization for Analytical Queries: Data warehouses, on the other
hand, are tuned for complex OLAP queries that involve large-scale data
analysis. These systems support multidimensional views of data and are
designed to handle complex aggregations, summarizations, and
consolidations of data from multiple sources.
○ Batch Processing and Historical Data: Unlike OLTP systems, data
warehouses are optimized for batch processing and handling large datasets.
They are built to store and process historical data, which is crucial for trend
analysis, forecasting, and decision support.
● Data Consolidation:
The multidimensional data model is central to Online Analytical Processing (OLAP) and
is designed to enable efficient querying and analysis of data. This model structures data in
a way that allows users to view and interact with it from multiple perspectives. The
primary components of the multidimensional model are data cubes, star schemas,
snowflake schemas, and fact constellation schemas.
1. Data Cubes
Definition:
● A data cube is a multidimensional array of values, typically used to represent data
along multiple dimensions. It allows users to view and analyze data from various
perspectives, such as by time, location, product, etc.
Components:
● Dimensions: These are the perspectives or angles from which the data is analyzed.
● Measures: These are quantitative data points stored in the cube, such as sales
revenue, quantities sold, or profit margins.
● Cells: Each cell in the cube represents a unique combination of dimension values
and holds the measure values for that combination.
Example:
● Consider a retail company that wants to analyze sales data. A data cube could have
dimensions such as Time (years, quarters, months), Location (regions, stores), and
Product (categories, individual items). Each cell in the cube might contain the total
sales revenue for a specific product in a specific region during a specific time
period.
Benefits:
● Speed: Data cubes allow for fast querying and retrieval of aggregated data.
● Multidimensional Analysis: Users can perform complex queries and analyze data
across multiple dimensions.
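A data cube can be sketched as a mapping from one combination of dimension values to a measure. The minimal Python example below, which uses hypothetical transactions and quarter/region/product values, builds each cell by summing revenue over the transactions that share that combination. It stores only non-empty cells in a dictionary (a sparse representation); that is a design choice for this sketch, and OLAP engines may instead use dense arrays or relational tables.

```python
from collections import defaultdict

# Hypothetical raw transactions: (quarter, region, product, revenue).
TRANSACTIONS = [
    ("Q1", "East", "TV", 300.0),
    ("Q1", "East", "TV", 200.0),
    ("Q1", "West", "Phone", 150.0),
    ("Q2", "East", "Phone", 100.0),
    ("Q2", "West", "TV", 250.0),
]


def build_cube(rows):
    """Return {(time, location, product): total revenue}.

    Each key is one cell of the cube, i.e. one combination of dimension
    values; the value is the measure aggregated over matching rows.
    Combinations with no sales are simply absent (a sparse cube).
    """
    cube = defaultdict(float)
    for quarter, region, product, revenue in rows:
        cube[(quarter, region, product)] += revenue
    return dict(cube)


cube = build_cube(TRANSACTIONS)
```

Looking up `cube[("Q1", "East", "TV")]` answers "total TV revenue in the East region in Q1" directly, without rescanning the transactions.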
A data cube is a fundamental concept in the multidimensional data model used for Online
Analytical Processing (OLAP). It represents data in a multidimensional format, allowing
users to analyze and explore data from various perspectives efficiently. Here's a detailed
breakdown of the data cube concept:
1. Definition
A data cube organizes data along multiple dimensions so that users can
perform sophisticated queries and analyses. The cube structure facilitates the exploration
of data across multiple dimensions and hierarchies.
2. Key Components
● Dimensions:
○ Definition: Dimensions are perspectives or categories by which data is
analyzed. They represent different angles or attributes from which data can
be viewed and queried.
○ Examples: Common dimensions include Time (year, quarter, month),
Location (country, city, store), and Product (category, brand, item).
● Measures:
○ Definition: Measures are quantitative data points stored in the cube. They
are the numerical values that users analyze and aggregate.
○ Example: In a sales data cube, a cell might contain the total sales revenue
for a specific product in a particular region during a specific month.
● Multidimensional Array:
○ A data cube is essentially a multidimensional array where each dimension
corresponds to one axis of the array.
1. Dimensions:
○ Along the edges of the cube, you can see sum labels. These represent
aggregated values. For example:
■ Total sales of all products across all quarters in the USA.
■ Total sales of TVs across all quarters and countries.
○ These aggregate values help provide summarized information for quicker
analysis.
4. Highlighted Total:
○ The diagram highlights Total annual sales of TV in U.S.A., which is the
sum of all sales of TVs in all quarters in the USA. This is shown as a
specific part of the cube that has been summed over a particular axis
(quarters).
5. Concepts Involved:
○ Slicing: Looking at one specific slice of the cube, for example, just the
sales for TV products or just the data for the USA.
○ Dicing: Examining a more specific sub-cube by selecting values on two or more
dimensions, such as sales of TVs in the USA for the 1st and 2nd
quarters.
○ Roll-up: Summing data across a particular dimension, such as getting total
sales across all products or all countries.
○ Drill-down: Breaking the aggregated data into finer levels, for example,
breaking down total sales by quarters.
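The slice, dice, and roll-up operations described above can be sketched over a small dictionary-based cube. The item/quarter/country cells and their sales values are hypothetical, and the function names are illustrative only (`slice_` carries a trailing underscore so it does not shadow Python's built-in `slice`). Roll-up here is the dimension-reduction form, summing over every value of one dimension.

```python
# Hypothetical cube cells keyed by (item, quarter, country) -> sales.
DIMS = ("item", "quarter", "country")
CUBE = {
    ("TV", "Q1", "USA"): 100,
    ("TV", "Q2", "USA"): 120,
    ("TV", "Q1", "Canada"): 40,
    ("PC", "Q1", "USA"): 200,
    ("PC", "Q2", "Canada"): 90,
}


def _drop(key, i):
    """Return the cell key without its i-th dimension value."""
    return key[:i] + key[i + 1:]


def slice_(cube, dim, value):
    """Slice: fix one dimension to a single value; the result has one fewer dimension."""
    i = DIMS.index(dim)
    return {_drop(k, i): v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Dice: keep cells whose values fall in the allowed set for each named dimension."""
    limits = {DIMS.index(d): set(vals) for d, vals in allowed.items()}
    return {k: v for k, v in cube.items() if all(k[i] in s for i, s in limits.items())}


def roll_up(cube, dim):
    """Roll-up by dimension reduction: sum the measure over every value of `dim`."""
    i = DIMS.index(dim)
    out = {}
    for k, v in cube.items():
        out[_drop(k, i)] = out.get(_drop(k, i), 0) + v
    return out
```

For example, `roll_up(CUBE, "quarter")[("TV", "USA")]` gives 220, the analogue of the "total sales of TV in U.S.A." cell. Drill-down is the reverse move back to `CUBE`, which is only possible because the finer-grained cells are kept.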
This table and diagram represent a 3D view of sales data for AllElectronics, according
to the three dimensions time, item, and location.
The values represent dollars sold (in thousands) for each combination of these three
dimensions.
Explanation:
● The table shows the sales values broken down by item, time (quarters), and
location.
This diagram builds on the previous data cube by introducing an additional dimension:
Supplier. It now represents a 4D data cube with the following four dimensions: time,
item, location, and supplier.
The measure displayed is dollars sold (in thousands), but for simplicity, only some
of the values are shown.
Explanation:
○ The diagram shows three separate cubes, one for each supplier (SUP1,
SUP2, SUP3). Each cube is similar to the 3D data cube seen before but
now represents the sales data for a particular supplier.
○ For example:
■ In the SUP1 cube, during Q1, in Chicago, sales of computers were
$825,000, and sales of security items were $400,000.
○ The other cubes (SUP2, SUP3) would contain similar data but for the
corresponding supplier.
● Purpose:
○ By adding the supplier dimension, we can analyze how different suppliers
perform across the other dimensions (time, location, and item). This allows
for a deeper understanding of supplier-specific performance.
1. Roll-up:
● Example in the image: The cube at the top right shows a roll-up on the location
dimension from cities (Chicago, New York, Toronto) to the country level (USA,
Canada). This reduces the granularity of the data.
2. Drill-down:
etc.).
3. Slice:
● Definition: Cutting out a single layer of data from the cube by fixing one
dimension at a particular value.
● Example in the image: A slice is applied by selecting data for a specific item type
(e.g., "home entertainment"), which results in a 2D matrix for location and time.
The original cube is "sliced" along the item dimension.
4. Dice:
● Definition: Selecting a subcube by specifying a range or specific values for
multiple dimensions.
● Example in the image: A dice operation is applied by selecting data for certain
locations (Chicago, New York) and certain item types (home entertainment,
security), resulting in a subcube.
5. Pivot (Rotate):
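● Definition: Rotating the data axes to provide an alternative presentation of the
same data, for example by swapping dimensions.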
● Example in the image: The pivot operation rotates the data cube so that different
dimensions (e.g., item types and location) are placed on different axes, providing a
different view of the same data.
Summary:
These operations allow for flexible and dynamic data analysis in OLAP systems.
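Roll-up along a concept hierarchy (city to country, as in the example) and pivot can be sketched on a two-dimensional view of the cube. The city/quarter values and the city-to-country mapping below are hypothetical, and `roll_up_location` and `to_table` are illustrative names, not part of any OLAP tool's API.

```python
# Hypothetical 2-D view: (city, quarter) -> dollars sold (in thousands).
CELLS = {
    ("Chicago", "Q1"): 400,
    ("New York", "Q1"): 1500,
    ("Toronto", "Q1"): 350,
    ("Chicago", "Q2"): 500,
    ("Toronto", "Q2"): 300,
}
# Assumed concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Chicago": "USA", "New York": "USA", "Toronto": "Canada"}


def roll_up_location(cells):
    """Roll-up by climbing the hierarchy from city to country, summing the measure."""
    out = {}
    for (city, quarter), value in cells.items():
        key = (CITY_TO_COUNTRY[city], quarter)
        out[key] = out.get(key, 0) + value
    return out


def to_table(cells, rows_first=True):
    """Lay out 2-D cells as {row: {column: value}}.

    rows_first=False pivots the view: the second dimension becomes the rows.
    """
    table = {}
    for (a, b), value in cells.items():
        row, col = (a, b) if rows_first else (b, a)
        table.setdefault(row, {})[col] = value
    return table
```

Rolling up reduces granularity (three cities become two countries), while pivoting changes only the presentation: the same five values appear, with quarters as rows instead of columns.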
A data warehouse model defines how data is structured, stored, and accessed in a data
warehouse. The modeling process involves designing an efficient and scalable
architecture to support querying, analysis, and reporting on large amounts of data. Data
warehouse modeling provides a structured way to manage complex data and is crucial for
data integration, storage, and retrieval.
Model: In the context of data warehouses, a model is an abstract representation that
defines the organization and relationships of data. A model provides a blueprint for how
data is to be arranged in the data warehouse, ensuring it is structured for easy retrieval,
querying, and analysis. The model allows for the seamless integration of data from
various sources and helps in decision-making.
There are three primary types of models used in data warehouse design:
1. Conceptual Model
○ The conceptual model is a high-level view of the data warehouse that
focuses on defining the entities and relationships that exist within the data
warehouse.
○ At this stage, no technical details are included, such as how the data will be
stored.
2. Logical Model
○ The logical model builds upon the conceptual model and specifies the
logical structure of the data. This includes fact tables, dimension tables,
and the relationships between them.
1. Star Schema
Definition:
● The star schema is a type of multidimensional database schema that organizes data
into facts and dimensions. It is characterized by a central fact table surrounded by
dimension tables.
Components:
● Fact Table: Contains quantitative data (measures) and foreign keys to dimension
tables. Examples include sales figures, order quantities, or financial metrics.
Structure:
● The fact table is at the center of the schema, and dimension tables radiate outwards
from it, giving the diagram its star shape.
Example:
● For a sales analysis, the fact table might include columns for sales amount,
quantity sold, and foreign keys linking to dimension tables like Time, Product, and
Store. Each dimension table provides descriptive attributes for each dimension.
Benefits:
1. Fact Table:
dimension tables using foreign keys (e.g., time_key, item_key, branch_key,
location_key).
○ Users can run queries like "What are the total sales for a specific product in
a particular location during a specific time period?" by joining the sales fact
table with relevant dimension tables.
4. Advantages:
○ The structure simplifies data queries and analysis, as users can easily
retrieve relevant information by connecting the measures in the fact table
with detailed attributes in the dimension tables.
○ This schema is ideal for data warehousing and reporting, enabling
businesses to extract insights from large datasets.
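The star schema and the kind of query described above ("total sales for a specific product in a particular location during a specific time period") can be sketched with Python's sqlite3 module. Key and measure names follow the diagram where possible (time_key, item_key, location_key, units_sold, dollars_sold), but the _dim table suffixes, the sample rows, and the brand values are assumptions, and the branch dimension is omitted to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one surrogate key each.
    CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
    CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
    -- Central fact table: one foreign key per dimension plus numeric measures.
    CREATE TABLE sales_fact (
        time_key     INTEGER REFERENCES time_dim (time_key),
        item_key     INTEGER REFERENCES item_dim (item_key),
        location_key INTEGER REFERENCES location_dim (location_key),
        units_sold   INTEGER,
        dollars_sold REAL
    );
    INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
    INSERT INTO item_dim VALUES (1, 'TV', 'BrandA'), (2, 'Phone', 'BrandB');
    INSERT INTO location_dim VALUES (1, 'Chicago', 'USA'), (2, 'Toronto', 'Canada');
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 3, 900.0), (2, 1, 1, 2, 600.0),
        (1, 2, 2, 5, 1500.0), (2, 1, 2, 1, 300.0);
    """
)

# Join the fact table to each dimension the question constrains.
TOTAL_SALES_SQL = """
    SELECT SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i     ON f.item_key = i.item_key
    JOIN location_dim AS l ON f.location_key = l.location_key
    JOIN time_dim AS t     ON f.time_key = t.time_key
    WHERE i.item_name = ? AND l.city = ? AND t.year = ?
"""


def total_sales(item_name, city, year):
    """Return total dollars sold, or None if no fact rows match."""
    return conn.execute(TOTAL_SALES_SQL, (item_name, city, year)).fetchone()[0]
```

Each dimension is one join away from the fact table, so adding another analysis angle usually means one more join while the fact table stays unchanged.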
The star schema in the diagram represents a way to organize data for efficient analysis.
The fact table in the center stores measurable data like sales (e.g., units sold, dollars
sold). Around it, there are dimension tables (e.g., time, item, branch, location) that
describe the context of the sales, like when and where the sales happened or which
item was sold.
2. Snowflake Schema
Definition:
● The snowflake schema is a variation of the star schema where dimension tables are
normalized into multiple related tables.
Components:
● Fact Table: Similar to the star schema, it contains measures and foreign keys to
dimension tables.
● Normalized Dimension Tables: Dimension tables are decomposed into multiple
related tables to reduce redundancy and improve data integrity.
Structure:
● The snowflake schema features a more complex structure with normalized tables
connected by relationships, resembling a snowflake.
Example:
● In a snowflake schema for sales analysis, the Product dimension table might be
split into sub-tables for Product Category and Product Subcategory. The Store
dimension might be split into City and State tables.
Benefits:
● Normalization: Reduces redundancy and improves data integrity by organizing
data into normalized tables.
● Space Efficiency: More efficient in terms of storage compared to the star schema.
Drawbacks:
● Complexity: The schema is more complex, which can make querying and design
more challenging.
Key Components:
identified by branch_key.
○ Location: Stores street and links to City via city_key.
○ City: Contains city information like province_or_state, country, etc.
3. Normalization:
○ In the snowflake schema, dimension tables like item and location are
normalized into sub-dimensions (supplier and city, respectively). This
reduces redundancy but increases the number of joins required for
querying.
This structure is efficient in terms of storage, but may result in slower query performance
due to the multiple table joins required.
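A minimal sqlite3 sketch of the normalized location dimension described above, assuming hypothetical street addresses and sales values. City attributes (province_or_state, country) are stored once per city instead of being repeated on every location row, and reaching country now takes one extra join (fact to location to city) compared with a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized sub-dimension: city details are stored once per city.
    CREATE TABLE city (
        city_key INTEGER PRIMARY KEY, city TEXT, province_or_state TEXT, country TEXT
    );
    -- Location keeps only the street and a foreign key to city.
    CREATE TABLE location (
        location_key INTEGER PRIMARY KEY, street TEXT,
        city_key INTEGER REFERENCES city (city_key)
    );
    CREATE TABLE sales_fact (
        location_key INTEGER REFERENCES location (location_key), dollars_sold REAL
    );
    INSERT INTO city VALUES (1, 'Chicago', 'Illinois', 'USA'), (2, 'Toronto', 'Ontario', 'Canada');
    INSERT INTO location VALUES (10, '1 Main St', 1), (11, '9 Lake Rd', 1), (20, '5 King St', 2);
    INSERT INTO sales_fact VALUES (10, 500.0), (11, 250.0), (20, 400.0);
    """
)

# Two joins are needed: fact -> location -> city.
SALES_BY_COUNTRY_SQL = """
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location AS l ON f.location_key = l.location_key
    JOIN city AS c     ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
"""
rows = conn.execute(SALES_BY_COUNTRY_SQL).fetchall()
```

The trade-off from the text is visible here: "USA" is stored once for Chicago rather than once per Chicago street, at the cost of the extra join.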
3. Fact Constellation Schema
Definition:
● The fact constellation schema, also known as a galaxy schema, is a more complex
schema that includes multiple fact tables sharing dimension tables. It supports
analysis of several related business processes within one schema.
Components:
● Fact Tables: Multiple fact tables that store measures related to different processes
(e.g., sales and shipping).
Structure:
Example:
● For a retail company, a fact constellation schema might include fact tables for
Sales, Inventory, and Purchasing, all linked to common dimension tables like
Time, Product, and Store.
Benefits:
Drawbacks:
The image represents a Fact Constellation Schema (also known as a Galaxy Schema).
This schema is a collection of multiple fact tables that share common dimension tables. It
covers two business processes, sales and shipping, each with its own fact table.
Key Components:
3. Shared Dimensions:
○ Time: Tracks time-related data (e.g., day_of_the_week, month, year),
shared by both fact tables.
○ Item: Contains item-specific details like item_name, brand, supplier_type,
shared by both fact tables.
○ Location: Represents geographical details (city, province_or_state,
country), linked to both sales and shipping processes.
4. Unique Dimensions:
○ Branch: Linked to the Sales Fact Table, contains branch-specific details
(branch_name, branch_type).
○ Shipper: Linked to the Shipping Fact Table, contains details about shippers
(shipper_name, shipper_type).
Key Points:
● Fact Constellation allows analysis of multiple business processes (e.g., sales and
shipping) in the same schema.
● Shared dimensions reduce redundancy and ensure consistency across the
processes.
● Multiple fact tables increase complexity but allow more comprehensive reporting
and querying.
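The sketch below, again using Python's sqlite3 module with hypothetical data, models two fact tables (sales and shipping) that share the time and item dimensions. To compare the two processes, each fact table is first aggregated separately at the grain of the shared dimensions and the results are then joined; aggregating before joining avoids double-counting when one fact table has several rows for the same time and item. The table names and the units_shipped measure are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Shared dimensions used by both fact tables.
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
    -- Two fact tables, one per business process.
    CREATE TABLE sales_fact    (time_key INTEGER, item_key INTEGER, units_sold INTEGER);
    CREATE TABLE shipping_fact (time_key INTEGER, item_key INTEGER, units_shipped INTEGER);
    INSERT INTO time_dim VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO item_dim VALUES (1, 'TV'), (2, 'PC');
    INSERT INTO sales_fact VALUES (1, 1, 10), (1, 1, 5), (1, 2, 4), (2, 1, 7);
    INSERT INTO shipping_fact VALUES (1, 1, 12), (1, 2, 4), (2, 1, 6);
    """
)

# Aggregate each process separately, then line the results up
# through the shared time and item dimensions.
SOLD_VS_SHIPPED_SQL = """
    WITH s AS (
        SELECT time_key, item_key, SUM(units_sold) AS sold
        FROM sales_fact GROUP BY time_key, item_key
    ), sh AS (
        SELECT time_key, item_key, SUM(units_shipped) AS shipped
        FROM shipping_fact GROUP BY time_key, item_key
    )
    SELECT t.quarter, i.item_name, s.sold, sh.shipped
    FROM s
    JOIN sh         ON sh.time_key = s.time_key AND sh.item_key = s.item_key
    JOIN time_dim t ON t.time_key = s.time_key
    JOIN item_dim i ON i.item_key = s.item_key
    ORDER BY t.quarter, i.item_name
"""
rows = conn.execute(SOLD_VS_SHIPPED_SQL).fetchall()
```

Because this uses inner joins, a time/item combination that appears in only one of the two processes is dropped; an outer join would be needed to keep it.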
1. Concept of Cuboids
In the context of data warehousing and OLAP (Online Analytical Processing), a cuboid
is a sub-cube that represents data aggregated along specific dimensions. Each cuboid can
be seen as a multidimensional slice of the larger data cube, capturing data at various
levels of granularity.
2. Lattice of Cuboids
The lattice of cuboids refers to the hierarchical structure formed by all possible cuboids
within a data cube. This structure is organized based on the levels of aggregation for each
dimension.
● Granularity Levels:
○ Base Cuboid: Contains data at the finest level of granularity, with no
aggregation. For example, sales data for each individual transaction.
○ Aggregated Cuboids: Represent data aggregated along different
dimensions or hierarchies. For instance, total sales by month or by city.
● Lattice Structure:
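○ The lattice is hierarchical, where each level represents different degrees of
data aggregation. The base cuboid is at the bottom, while higher-level
cuboids hold increasingly aggregated data, up to the apex cuboid at the top,
which summarizes all data into a single value.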
Consider a data cube with dimensions for Time (Year, Month), Location (Country, City),
and Product (Category, Item). The lattice of cuboids for this cube would include:
Base Cuboid: Sales data for each individual combination of Time, Location, and Product
(e.g., sales for each item in each city for each month).
Example:
| Time | Location | Product | Sales  |
|------|----------|---------|--------|
| Jan  | New York | Widget  | $1,000 |
■ Total sales for each product item (ignoring Time and Location).
Example:
| Time | Location   | Product | Sales  |
|------|------------|---------|--------|
| Jan  | All Cities | Widget  | $1,500 |
| Jan  | All Cities | Gadget  | $500   |
● Combinations of Aggregations:
○ Total sales for each combination of Year and City (e.g., total sales for each
city in 2024).
○ Total sales for each combination of Product and Month (e.g., total sales for
each product in January).
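The lattice can be generated mechanically: every subset of the dimensions defines one cuboid (one group-by). The Python sketch below ignores concept hierarchies and uses hypothetical base rows; the two Boston rows are invented so that the Jan/Widget and Jan/Gadget totals match the example table ($1,500 and $500).

```python
from itertools import combinations

DIMS = ("time", "location", "product")
# Hypothetical base-cuboid rows: (time, location, product, sales).
BASE = [
    ("Jan", "New York", "Widget", 1000),
    ("Jan", "Boston", "Widget", 500),
    ("Jan", "Boston", "Gadget", 500),
    ("Feb", "New York", "Gadget", 300),
]


def cuboid(rows, group_dims):
    """Sum sales grouped by group_dims, aggregating away all other dimensions."""
    idx = [DIMS.index(d) for d in group_dims]
    out = {}
    for row in rows:
        key = tuple(row[i] for i in idx)
        out[key] = out.get(key, 0) + row[-1]
    return out


def lattice(rows):
    """Every cuboid, from the base cuboid (all dimensions) to the apex (no dimensions)."""
    return {
        group: cuboid(rows, group)
        for k in range(len(DIMS), -1, -1)
        for group in combinations(DIMS, k)
    }
```

With n dimensions and no hierarchies this yields 2^n cuboids (8 here). If dimension i has L_i hierarchy levels, the count becomes the product of (L_i + 1), so the Time/Location/Product example above, with two levels each, would have 3 × 3 × 3 = 27 cuboids.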
4. Importance and Use Cases
● Efficient Querying: The lattice structure allows for efficient querying and data
retrieval at various levels of aggregation. Users can drill down into more detailed
data or roll up to higher-level summaries.
● Data Analysis: The lattice of cuboids supports complex data analysis by
providing different views of data. Users can analyze trends, compare performance
across dimensions, and identify patterns.
● Performance Optimization: Pre-aggregating data into cuboids can improve
query performance by reducing the amount of computation required at query time.
5. Challenges
manage.
DMQL is a high-level query language designed for data mining, particularly for defining
data warehouses and data marts.
1. Cube Definition (Fact Table)
To define a data cube (the fact table that stores measures), we use the following syntax:
define cube <cube_name> [<dimension_list>]: <measure_list>
Example:
define cube sales_cube [time, location, product]:dollars_sold, units_sold
In this example:
● The cube sales_cube is defined with three dimensions: time, location, and product.
● The measures tracked are dollars_sold and units_sold.
2. Dimension Definition (Dimension Table)
To define a dimension (which is the dimensional table associated with a cube), the
following syntax is used:
define dimension <dimension_name> as (<attribute_or_subdimension_list>)
● <attribute_or_subdimension_list>: The attributes or subdimensions that belong
to the dimension.
Example:
define dimension time as (day, month, year)
In this example:
● The time dimension is defined with attributes day, month, and year.
If a dimension is shared across multiple cubes, it is only defined fully the first time it is
used. When referenced by subsequent cubes, we use the shared dimension syntax:
define dimension <dimension_name> as <dimension_name_first_time> in cube
<cube_name_first_time>
● <dimension_name_first_time>: The name of the shared dimension.
● <cube_name_first_time>: The name of the cube where the dimension was first
defined.
Example:
define dimension time as time in cube sales_cube
In this example:
● The time dimension is shared between the sales_cube and another cube, so it is
referred to from its first definition.
In a star schema, we define a central fact table that is connected to various dimension
tables. The fact table contains the quantitative data (measures), and the dimension tables
hold descriptive data that helps in slicing and dicing the measures.
Below is the breakdown of the DMQL (Data Mining Query Language) syntax to
define a Star Schema:
1. Cube Definition (Fact Table)
The central fact table is defined using the define cube statement. In this case, the cube
sales_star has four dimensions: time, item, branch, and location. The cube also defines
measures that aggregate data.
define cube sales_star [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars),
avg_sales = avg(sales_in_dollars),
units_sold = count(*)
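DMQL statements are not executed here, but what the three measure definitions in sales_star compute can be sketched in Python. For each (time, item, branch, location) cell, dollars_sold sums sales_in_dollars, avg_sales averages it, and units_sold counts the fact rows, assuming each row records one sale. The sample rows and the branch codes B1 and B2 are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (time, item, branch, location, sales_in_dollars).
ROWS = [
    ("Q1", "TV", "B1", "Chicago", 400.0),
    ("Q1", "TV", "B1", "Chicago", 600.0),
    ("Q1", "PC", "B2", "Toronto", 900.0),
]


def sales_star_measures(rows):
    """Mirror the three DMQL measure definitions for each cube cell."""
    groups = defaultdict(list)
    for *dims, dollars in rows:
        groups[tuple(dims)].append(dollars)
    return {
        dims: {
            "dollars_sold": sum(vals),           # sum(sales_in_dollars)
            "avg_sales": sum(vals) / len(vals),  # avg(sales_in_dollars)
            "units_sold": len(vals),             # count(*)
        }
        for dims, vals in groups.items()
    }
```

Under this reading, the Q1/TV/B1/Chicago cell aggregates two fact rows, so its units_sold is 2 and its avg_sales is the mean of the two sale amounts.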
Next, we define each dimension table that connects to the fact table. These dimensions
store the descriptive attributes related to each aspect of the sales data.
Dimension: Item
define dimension item as (item_key, item_name, brand, type, supplier_type)
Dimension: Branch
define dimension branch as (branch_key, branch_name, branch_type)
Dimension: Location
define dimension location as (location_key, street, city, province_or_state, country)
● Attributes: location_key, street, city, province_or_state, country.
● This dimension captures the geographical location of the branch or customer.
The following defines a Fact Constellation schema using DMQL (Data Mining Query Language) with
two cubes: sales and shipping. The Fact Constellation schema allows for multiple fact
tables (cubes) that share dimensions, enabling complex queries and analysis. Here is a
structured representation of the schema:
Fact Constellation Schema
define cube sales [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars),
avg_sales = avg(sales_in_dollars),
units_sold = count(*)