Schemas For Multidimensional Databases
STAR SCHEMA:
A star schema classifies the attributes of an event into facts (measured numeric or time data) and descriptive dimension attributes (product ID, customer name, sale date) that give the facts context. A fact record is the nexus between specific dimension values and the recorded facts. Facts are grouped by grain (level of detail) and stored in the fact table. Dimension attributes are organized into affinity groups and stored in a minimal number of dimension tables. For example, a star schema that records weather data may have facts such as temperature, barometric pressure, wind speed, precipitation, and cloud cover, and dimensions such as location, date/time, and reporter. Star schemas are designed to optimize ease of use and retrieval performance by minimizing the number of tables that must be joined to materialize a transaction. The star schema is so called because it resembles a constellation of stars: generally several bright stars (facts) surrounded by dimmer ones (dimensions).
The fact table holds the metric values recorded for a specific event. Because the data is held at the atomic level, fact tables generally contain a very large number of records (potentially billions). Special care is taken to minimize the number and size of attributes in order to constrain the overall table size and maintain performance. Fact tables generally come in three flavors: transaction (facts about a specific event, e.g. a sale), snapshot (facts recorded at a point in time, e.g. account details at month end), and accumulating snapshot (e.g. month-to-date sales for a product). Dimension tables usually have few records compared to fact tables, but may have a very large number of attributes that describe the fact data.
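The difference between a transaction fact table and a snapshot fact table can be sketched in a few lines of Python. The account movements below are made-up illustration data, and `month_end_snapshot` is a hypothetical helper name, not part of any library; the sketch only shows how atomic transaction rows might be rolled up to one balance per account per month.

```python
from collections import defaultdict

# Hypothetical transaction-grain facts: one row per account movement
# (date, account, amount). The values are invented for illustration.
transactions = [
    ("2024-01-05", "acct-1", 100.0),
    ("2024-01-20", "acct-1", -30.0),
    ("2024-01-11", "acct-2", 50.0),
    ("2024-02-03", "acct-1", 10.0),
]

def month_end_snapshot(rows):
    """Roll transaction facts up to one balance per (month, account)."""
    balance = defaultdict(float)   # running balance per account
    snapshot = {}                  # (YYYY-MM, account) -> last balance seen
    for date, account, amount in sorted(rows):
        balance[account] += amount
        # Later rows in the same month overwrite earlier ones, so the
        # stored value is the balance after the month's last movement.
        snapshot[(date[:7], account)] = balance[account]
    return snapshot

snapshot = month_end_snapshot(transactions)
for key in sorted(snapshot):
    print(key, snapshot[key])
```

In this simplified version an account with no movement in a month gets no snapshot row; a production snapshot load would typically carry the previous balance forward.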
Star Schema DMQL
define cube sales_star [dim_date, dim_product, dim_store]:
define dimension dim_store as (id, store_number, state_province, country)
define dimension dim_product as (id, EAN_code, Product_Name, Brand, Product_category)
define dimension Dim_Date as (id, Date, Day, Day_of_Week, Month, Month_Name, Quarter, Quarter_Name, Year)
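As a rough relational counterpart to the DMQL definition, the sketch below builds a small star schema in an in-memory SQLite database and runs a typical star join. The fact table name `fact_sales`, its measure columns `units_sold` and `sales_in_dollars`, the trimmed-down dimension columns, and all row values are assumptions made for illustration; the DMQL above does not specify them.

```python
import sqlite3

def build_star(conn):
    """Create one fact table surrounded by three single-table dimensions."""
    conn.executescript("""
        CREATE TABLE dim_date (id INTEGER PRIMARY KEY, full_date TEXT,
                               month INTEGER, year INTEGER);
        CREATE TABLE dim_product (id INTEGER PRIMARY KEY, ean_code TEXT,
                                  product_name TEXT, brand TEXT,
                                  product_category TEXT);
        CREATE TABLE dim_store (id INTEGER PRIMARY KEY, store_number TEXT,
                                state_province TEXT, country TEXT);
        -- Fact table: foreign keys to each dimension plus numeric measures.
        CREATE TABLE fact_sales (
            date_id INTEGER REFERENCES dim_date(id),
            product_id INTEGER REFERENCES dim_product(id),
            store_id INTEGER REFERENCES dim_store(id),
            units_sold INTEGER,
            sales_in_dollars REAL);

        INSERT INTO dim_date VALUES (1, '2024-01-15', 1, 2024),
                                    (2, '2024-02-10', 2, 2024);
        INSERT INTO dim_product VALUES (1, '0001', 'Widget', 'Acme', 'Tools'),
                                       (2, '0002', 'Gadget', 'Bolt', 'Tools');
        INSERT INTO dim_store VALUES (1, 'S1', 'Ontario', 'Canada'),
                                     (2, 'S2', 'Texas', 'USA');
        INSERT INTO fact_sales VALUES (1, 1, 1, 3, 30.0), (1, 2, 2, 1, 25.0),
                                      (2, 1, 2, 2, 20.0), (2, 1, 1, 5, 50.0);
    """)

def sales_by_brand_and_country(conn, year):
    """Star join: the fact table joins directly to each dimension it needs."""
    sql = """
        SELECT p.brand, s.country,
               SUM(f.units_sold), SUM(f.sales_in_dollars)
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_id = p.id
        JOIN dim_store   AS s ON f.store_id   = s.id
        JOIN dim_date    AS d ON f.date_id    = d.id
        WHERE d.year = ?
        GROUP BY p.brand, s.country
        ORDER BY p.brand, s.country
    """
    return conn.execute(sql, (year,)).fetchall()

conn = sqlite3.connect(":memory:")
build_star(conn)
rows = sales_by_brand_and_country(conn, 2024)
print(rows)
```

Every dimension is exactly one join away from the fact table, so the query needs no knowledge of how the dimension attributes relate to each other.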
Benefits
The primary benefit of a star schema is its simplicity, both for users writing queries and for databases processing them: queries are written with simple inner joins between the fact table and a small number of dimensions. Star joins are simpler than the joins required in a snowflake schema. WHERE conditions need only filter on the desired attributes, and aggregations are fast.
The star schema is a way to implement multidimensional database (MDDB) functionality using a mainstream relational database: given most organizations' commitment to relational databases, a specialized multidimensional DBMS is likely to be both expensive and inconvenient.
SNOWFLAKES
In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity-relationship diagram resembles a snowflake in shape. The snowflake schema is represented by centralized fact tables connected to multiple dimensions. It is similar to the star schema; however, in the snowflake schema dimensions are normalized into multiple related tables, whereas the star schema's dimensions are denormalized, with each dimension represented by a single table. A complex snowflake shape emerges when the dimensions are elaborate, having multiple levels of relationships, and the child tables have multiple parent tables ("forks in the road"). The "snowflaking" effect applies only to the dimension tables, not to the fact tables.
Star and snowflake schemas are most commonly found in dimensional data warehouses and data marts, where speed of data retrieval is more important than the efficiency of data manipulation. As such, the tables in these schemas are not heavily normalized, and are frequently designed at a level of normalization short of third normal form.
Deciding whether to employ a star schema or a snowflake schema should involve considering the relative strengths of the database platform in question and the query tool to be employed. Star schemas should be favored with query tools that largely expose users to the underlying table structures, and in environments where most queries are simple in nature. Snowflake schemas are often better with more sophisticated query tools that create a layer of abstraction between the users and the raw table structures, in environments having numerous queries with complex criteria.
Normalization splits up data to avoid redundancy (duplication) by moving commonly repeating groups of data into new tables. It therefore tends to increase the number of tables that must be joined to perform a given query, but reduces the space required to hold the data and the number of places where the data must be updated if it changes. From a storage point of view, the dimension tables are typically small compared to the fact tables. This often removes the storage benefit of snowflaking the dimension tables, as compared with a star schema.
Snowflake DMQL
define cube sales_snowflake [dim_date, dim_product, dim_store]:
define dimension dim_store as (id, store_number, Geography_Id(id, state_province, country))
define dimension dim_product as (id, EAN_code, Product_Name, Brand_id(id, Brand), Product_category(id, product_category))
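To make the effect of snowflaking concrete, the following sketch normalizes the store dimension into a separate geography table, mirroring the `Geography_Id(...)` nesting in the DMQL. The table names `dim_geography` and `fact_sales`, the single `units_sold` measure, and the sample rows are assumptions for illustration only.

```python
import sqlite3

def build_snowflake(conn):
    """Store dimension normalized: geography attributes live in their own table."""
    conn.executescript("""
        CREATE TABLE dim_geography (id INTEGER PRIMARY KEY,
                                    state_province TEXT, country TEXT);
        -- dim_store keeps only a key into dim_geography, so each
        -- state/country pair is stored once rather than once per store.
        CREATE TABLE dim_store (id INTEGER PRIMARY KEY, store_number TEXT,
                                geography_id INTEGER REFERENCES dim_geography(id));
        -- The fact table is unchanged by snowflaking.
        CREATE TABLE fact_sales (store_id INTEGER REFERENCES dim_store(id),
                                 units_sold INTEGER);

        INSERT INTO dim_geography VALUES (1, 'Ontario', 'Canada'),
                                         (2, 'Texas', 'USA');
        INSERT INTO dim_store VALUES (1, 'S1', 1), (2, 'S2', 2), (3, 'S3', 2);
        INSERT INTO fact_sales VALUES (1, 3), (2, 1), (3, 4), (1, 5);
    """)

def units_by_country(conn):
    """Reaching 'country' now takes two joins: fact -> store -> geography."""
    sql = """
        SELECT g.country, SUM(f.units_sold)
        FROM fact_sales AS f
        JOIN dim_store     AS s ON f.store_id     = s.id
        JOIN dim_geography AS g ON s.geography_id = g.id
        GROUP BY g.country
        ORDER BY g.country
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
build_snowflake(conn)
totals = units_by_country(conn)
print(totals)
```

In the equivalent star schema, `country` would be a column of `dim_store` and the second join would disappear, at the cost of repeating the state and country values on every store row.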
FACT Constellation
A fact constellation schema describes the logical database structure of a data warehouse or data mart. It can be designed as a collection of denormalized fact tables together with shared, conformed dimension tables. It is an extended and decomposed star schema, and it is a complicated database design in which data is difficult to summarize. A fact constellation can be implemented between aggregate fact tables, or elsewhere to decompose a complex fact table into independent, simpler fact tables. Sophisticated applications may require multiple fact tables to share dimension tables. This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.
Fact constellation DMQL
define cube sales [Dim_date, Dim_product, Dim_Store]:
    dollars_sold = sum(sales_in_dollars), units_sold = count(*)
define dimension Dim_Date as (id, Date, Day, Day_of_Week, Month, Month_Name, Quarter, Quarter_Name, Year)
define dimension dim_store as (id, store_number, Geography_Id(id, state_province, country))
define dimension dim_product as (id, EAN_code, Product_Name, Brand_id(id, Brand), Product_category(id, product_category))
define cube Transport [time_key, item_key, transport_key, from_location, to_location]:
    dollars_cost = sum(cost_in_dollars), units_shipped = count(*)
define dimension Dim_Tport as (transport_key, location_key, Transport_name, Store_number)
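The idea of several fact tables sharing a dimension can be sketched with two small fact tables, one for sales and one for shipping, that both reference the same product dimension. The table names `fact_sales` and `fact_shipping`, the reduced column sets, and the sample rows are assumptions for illustration. One common way to combine such facts, used here, is to aggregate each fact table to the shared grain first and only then join on the shared dimension, so that rows from one fact table do not multiply rows from the other.

```python
import sqlite3

def build_constellation(conn):
    """Two fact tables (two 'stars') sharing one product dimension."""
    conn.executescript("""
        CREATE TABLE dim_product (id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE fact_sales (product_id INTEGER REFERENCES dim_product(id),
                                 sales_in_dollars REAL);
        CREATE TABLE fact_shipping (product_id INTEGER REFERENCES dim_product(id),
                                    cost_in_dollars REAL);

        INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
        INSERT INTO fact_sales VALUES (1, 30.0), (1, 50.0), (2, 25.0);
        INSERT INTO fact_shipping VALUES (1, 4.0), (2, 3.0), (2, 2.0);
    """)

def sales_and_shipping_by_product(conn):
    """Aggregate each fact table separately, then join on the shared dimension."""
    sql = """
        SELECT p.product_name, s.dollars_sold, t.dollars_cost
        FROM dim_product AS p
        JOIN (SELECT product_id, SUM(sales_in_dollars) AS dollars_sold
              FROM fact_sales GROUP BY product_id) AS s
          ON s.product_id = p.id
        JOIN (SELECT product_id, SUM(cost_in_dollars) AS dollars_cost
              FROM fact_shipping GROUP BY product_id) AS t
          ON t.product_id = p.id
        ORDER BY p.product_name
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
build_constellation(conn)
report = sales_and_shipping_by_product(conn)
print(report)
```

Joining `fact_sales` directly to `fact_shipping` on `product_id` would pair every sale row with every shipping row for the same product and inflate both sums, which is one reason summarizing across a constellation takes more care than summarizing a single star.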