Cube Implementations
pm jat @ daiict
21-09-2023
Figure Source: Data Mining Textbook [9] Data Cube Implementation 3
Data Cube as “lattice of Cuboids”
• Data Cube is represented as a “lattice of Cuboids”, where
• Node: Cuboid
– that contains “aggregated values” (called measures) for “a
dimension (attribute) combination”
• Edge: Parent-Child Relationship, where
– child node has exactly one extra attribute (dimension) compared with its parent
– parent node cuboid can be computed from the child cuboid by aggregating over that extra attribute
• Data cube is a set of “cuboids” for “each possible subset of the given dimensions”.
Correct?
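One way to check the claim is to enumerate the lattice directly. The sketch below is illustrative only: the dimension names are hypothetical, and it assumes no dimension hierarchies, in which case n dimensions give 2^n cuboids.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every subset of the dimensions, from 0-D (apex) to n-D (base)."""
    return [frozenset(c)
            for k in range(len(dimensions) + 1)
            for c in combinations(dimensions, k)]

def lattice_edges(nodes):
    """Pair each cuboid with every cuboid that has exactly one extra dimension."""
    return [(parent, child)
            for parent in nodes
            for child in nodes
            if parent < child and len(child) == len(parent) + 1]

dims = ["time", "item", "location"]  # hypothetical dimension names
nodes = cuboids(dims)
edges = lattice_edges(nodes)
print(len(nodes))  # 8 cuboids: 2 ** 3
print(len(edges))  # 12 edges: 3 * 2 ** 2
```

With dimension hierarchies each dimension contributes more than two choices, so the count grows beyond 2^n.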
Figure Source: Data Mining Textbook [9]
Querying Lattice
• Each node is a view.
• “Dimension hierarchies”
• Rollup and drill down?
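A roll-up along a dimension hierarchy can be sketched as follows. The city-to-country mapping, the sales figures, and the SUM measure are all assumptions made for illustration; they are not taken from the figure.

```python
from collections import defaultdict

# Hypothetical hierarchy on the location dimension: city -> country.
CITY_TO_COUNTRY = {"Surat": "India", "Pune": "India", "Lyon": "France"}

# Hypothetical cuboid at city level: city -> total sales.
sales_by_city = {"Surat": 10, "Pune": 5, "Lyon": 7}

def roll_up(cuboid, parent_of):
    """Aggregate a cuboid one level up a dimension hierarchy (SUM measure)."""
    coarser = defaultdict(int)
    for member, value in cuboid.items():
        coarser[parent_of[member]] += value
    return dict(coarser)

sales_by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
print(sales_by_country)  # {'India': 15, 'France': 7}
```

Drill down is the reverse navigation. It cannot be derived from the coarser view alone, so the finer view (here, sales_by_city) has to be stored or recomputed from the data.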
Figure Source: Data Mining Textbook [9]
“Cube” “location” “Data Hierarchy”
Lattices can even be maintained for specific attributes, taking the cube to a still finer granularity.
Questions remain: How efficient would it be? Can the resulting cube be materialized?
If yes, where do we store it? Can we have an index on dimensions?
A Simple Spark solution for Lattice Computation
Identify:
• The base cuboid: what are its dimensions?
• The 2-D, 1-D, and 0-D cuboids
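To make the base, 2-D, 1-D, and 0-D cuboids concrete, here is a small plain-Python sketch. It is not Spark and not the solution on the slide; the fact rows, dimension names, and SUM measure are hypothetical. Each cuboid is a group-by over one subset of the dimensions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (time, item, location, sales).
DIMS = ("time", "item", "location")
FACTS = [
    ("Q1", "phone", "Surat", 4),
    ("Q1", "phone", "Pune", 6),
    ("Q2", "laptop", "Surat", 3),
    ("Q2", "phone", "Surat", 2),
]

def compute_cuboid(facts, group_by):
    """SUM the measure over all rows that share the same values on group_by."""
    idx = [DIMS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)

def compute_cube(facts):
    """Compute one cuboid per subset of DIMS, keyed by the dimension tuple."""
    return {group_by: compute_cuboid(facts, group_by)
            for k in range(len(DIMS), -1, -1)
            for group_by in combinations(DIMS, k)}

cube = compute_cube(FACTS)
print(cube[()])         # 0-D apex cuboid: {(): 15}
print(cube[("item",)])  # 1-D cuboid: {('phone',): 12, ('laptop',): 3}
print(len(cube[DIMS]))  # base (3-D) cuboid has 4 cells
```

For simplicity every cuboid here is computed from the fact rows. A more economical variant would derive each coarser cuboid from an already computed finer one.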
https://kylin.apache.org/docs/index.html
Apache Kylin Rationale
• Kylin’s core idea is the precomputation of result sets
• It calculates all possible query results in advance according to the specified
dimensions and indicators (measures), which speeds up OLAP queries with fixed query patterns.
https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
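The precomputation idea can be illustrated with a toy sketch. This is not Kylin's API or storage format; the rows, dimension names, and the `precompute` and `query` helpers are hypothetical. Every group-by result is built once, and a fixed-pattern query then becomes a lookup.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("year", "region")  # hypothetical dimensions
ROWS = [("2022", "east", 5), ("2022", "west", 7), ("2023", "east", 4)]

def precompute(rows):
    """Build every group-by result once, ahead of query time."""
    store = {}
    for k in range(len(DIMS) + 1):
        for group_by in combinations(range(len(DIMS)), k):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[i] for i in group_by)] += row[-1]
            store[tuple(DIMS[i] for i in group_by)] = dict(totals)
    return store

STORE = precompute(ROWS)

def query(group_by, key):
    """Answer a fixed-pattern query by lookup; no scan of ROWS is needed."""
    return STORE[tuple(group_by)][tuple(key)]

print(query(["year"], ["2022"]))  # 12
print(query([], []))              # 16
```

The trade-off is the usual one for materialization: storage and build time grow with the number of dimension combinations, in exchange for cheap lookups at query time.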
Apache Kylin - Architecture