Data Analysis in Data Warehouses Group 6
ABOUT US
Members:
HOÀNG MẠNH QUÂN
BUASY SYDAVONG
MDX & DAX
MDX
MDX (MultiDimensional eXpressions) was defined by Microsoft in 1997 and was soon adopted by many OLAP tool providers, becoming a de facto standard. Despite the success of Analysis Services and MDX, many users claimed that multidimensional cubes were hard to understand and manipulate.
DAX
In 2012 Microsoft introduced the tabular model and its associated language DAX, which have become widely popular since then. From the users' perspective, the underlying concepts of the tabular model are simpler than those of the multidimensional model, both when designing models and when using them for analysis and reporting.
MDX
(MultiDimensional
eXpressions)
MDX
Tuples and Sets
(Customer.City.Paris)
(Product.Category.Seafood, Customer.City.Paris)
A set is a collection of tuples defined over the same dimensions, written between braces { }.
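To make the distinction concrete, here is a minimal Python sketch of tuples and sets. It is only an illustration of the idea, not how an OLAP engine stores them: the `Member` type and the helpers `dimensionality` and `make_set` are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class Member(NamedTuple):
    """One member of a dimension level, e.g. Customer.City.Paris."""
    dimension: str
    level: str
    name: str


paris = Member("Customer", "City", "Paris")
seafood = Member("Product", "Category", "Seafood")
beverages = Member("Product", "Category", "Beverages")

# A tuple lists one member from one or several dimensions.
tuple_one = (paris,)
tuple_two = (seafood, paris)


def dimensionality(t):
    """Return the dimensions a tuple is defined over, in order."""
    return tuple(m.dimension for m in t)


def make_set(*tuples):
    """Build a set: an ordered collection of tuples over the same dimensions."""
    if len({dimensionality(t) for t in tuples}) > 1:
        raise ValueError("all tuples in a set must share the same dimensions")
    return list(tuples)


cell_set = make_set((seafood, paris), (beverages, paris))
```

Mixing `tuple_one` and `tuple_two` in one call to `make_set` fails, because the two tuples are not defined over the same dimensions.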
MDX
Basic Queries
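The syntax of a typical MDX query is as follows:
SELECT <axis specification> ON COLUMNS,
<axis specification> ON ROWS
FROM <cube>
[ WHERE <slicer specification> ]
Examples of slicers:
FROM Sales
WHERE (Customer.Country.Belgium)
WHERE (Customer.Country.Belgium,
Product.Categories.Beverages)
WHERE ({Customer.Country.Belgium,
Customer.Country.France},
Product.Categories.Beverages)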
MDX
Navigation
CHILDREN: returns only the immediate child members of a specific parent member in a dimension, excluding the grandchildren and deeper levels of the hierarchy.
DESCENDANTS: returns all members below a specific parent member in a dimension, including the children of children and so on down the hierarchy.
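The difference between returning only the immediate children and returning every level below a member can be sketched in Python. The hierarchy below is invented for the illustration (Lyon and Brussels are hypothetical members), and the sketch leaves the starting member out of the result, following the description above.

```python
# Hypothetical parent -> children hierarchy (continent, country, city).
hierarchy = {
    "Europe": ["Belgium", "France"],
    "Belgium": ["Brussels"],
    "France": ["Paris", "Lyon"],
}


def children(member):
    """Immediate children only."""
    return list(hierarchy.get(member, []))


def descendants(member):
    """All members below `member`, depth first, at every level."""
    result = []
    for child in hierarchy.get(member, []):
        result.append(child)
        result.extend(descendants(child))
    return result


print(children("Europe"))     # ['Belgium', 'France']
print(descendants("Europe"))  # ['Belgium', 'Brussels', 'France', 'Paris', 'Lyon']
```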
MDX
CROSS JOIN
Purpose: combine multiple dimensions into a single axis, often used to create matrix
representations.
Using the CROSSJOIN function:
SELECT Product.Category.MEMBERS ON COLUMNS,
CROSSJOIN(Customer.Country.MEMBERS,
[Order Date].Calendar.Quarter.MEMBERS) ON ROWS
FROM Sales
WHERE Measures.[Sales Amount]
Using the cross join operator (*):
SELECT Product.Category.MEMBERS ON COLUMNS,
Customer.Country.MEMBERS *
[Order Date].Calendar.Quarter.MEMBERS ON ROWS
FROM Sales
WHERE Measures.[Sales Amount]
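Conceptually, a cross join produces every combination of the members of its input sets, in order. A small Python sketch using `itertools.product` shows the shape of the resulting row axis; the member lists are shortened, made-up samples rather than the contents of the Sales cube.

```python
from itertools import product

# Hypothetical, shortened member lists.
countries = ["Belgium", "France"]
quarters = ["Q1", "Q2"]

# Each row of the axis is a tuple (country, quarter).
rows = list(product(countries, quarters))

for row in rows:
    print(row)
# ('Belgium', 'Q1')
# ('Belgium', 'Q2')
# ('France', 'Q1')
# ('France', 'Q2')
```

Passing a third list to `product` corresponds to cross-joining a third dimension onto the same axis.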
MDX
CROSS JOIN
More than two dimensions can be combined by applying successive cross joins.
WHERE clause: a dimension that is filtered in the slicer cannot also be used on the axes.
Subquery: the filtered dimensions can still be used on the axes of the main query.
MDX
Calculated members and named sets
Calculated members are new members in a dimension or new measures that are
defined using the WITH clause in front of the SELECT statement:
MDX
Calculated members and named sets
For example, a measure that calculates the percentage profit of sales:
FORMAT_STRING = '#0.00%'
Customer.Country ON ROWS
FROM Sales
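As a rough illustration of what a calculated measure does, the Python sketch below derives a percentage from two stored measures and formats it in the style of '#0.00%'. The formula (sales amount minus freight, divided by sales amount), the `Freight` measure, and all figures are assumptions made for this example, not taken from the slides.

```python
# Hypothetical stored measures per country.
cube = {
    "Belgium": {"Sales Amount": 1000.0, "Freight": 50.0},
    "France": {"Sales Amount": 2000.0, "Freight": 500.0},
}


def profit_pct(cell):
    """Assumed definition: (Sales Amount - Freight) / Sales Amount."""
    return (cell["Sales Amount"] - cell["Freight"]) / cell["Sales Amount"]


# The calculated measure is computed on the fly for every row of the axis.
report = {country: f"{profit_pct(cell):.2%}" for country, cell in cube.items()}

print(report)  # {'Belgium': '95.00%', 'France': '75.00%'}
```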
MDX
Time-related calculations
PARALLELPERIOD function: returns the member that occupies the same relative position in an earlier period, so that a value can be compared with that of the corresponding period (e.g., the same quarter of the previous year).
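The idea of a parallel period can be sketched in Python: map a quarter to the same quarter one year earlier, then compare the two values. The function names, the `lag_years` parameter, and the sales figures are hypothetical and serve only this illustration.

```python
# Hypothetical sales per (year, quarter).
sales_by_quarter = {
    (2023, 1): 100.0,
    (2023, 2): 120.0,
    (2024, 1): 130.0,
    (2024, 2): 90.0,
}


def parallel_period(year, quarter, lag_years=1):
    """Same quarter, `lag_years` years earlier."""
    return (year - lag_years, quarter)


def growth(year, quarter):
    """Difference with the same quarter of the previous year, or None if unknown."""
    previous = sales_by_quarter.get(parallel_period(year, quarter))
    if previous is None:
        return None
    return sales_by_quarter[(year, quarter)] - previous


print(growth(2024, 1))  # 30.0
print(growth(2024, 2))  # -30.0
print(growth(2023, 1))  # None
```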
MDX
Sorting
MDX provides a function for sorting the members of a set in a specific order, with the following syntax:
ORDER(Set, Expression [, ASC | DESC | BASC | BDESC])
Example:
SELECT Measures.MEMBERS ON COLUMNS,
ORDER(Customer.Geography.Country.MEMBERS,
Measures.[Sales Amount], BDESC) ON ROWS
FROM Sales
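The flags of ORDER differ in whether the hierarchy is respected: ASC and DESC sort members within their parents, while BASC and BDESC break the hierarchy and sort all members together. The Python sketch below approximates the two descending variants on invented data; it is a simplified model of the behaviour, with hypothetical function names.

```python
# Hypothetical rows: (continent, country, sales amount).
sales = [
    ("Europe", "Belgium", 300),
    ("Europe", "France", 900),
    ("America", "USA", 700),
    ("America", "Canada", 200),
]


def order_bdesc(rows):
    """Break the hierarchy: sort all countries by amount, descending."""
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    return [country for _, country, _ in ranked]


def order_desc(rows):
    """Keep the hierarchy: sort continents by total, then countries within each."""
    totals = {}
    for continent, _, amount in rows:
        totals[continent] = totals.get(continent, 0) + amount
    ranked = sorted(rows, key=lambda r: (-totals[r[0]], -r[2]))
    return [country for _, country, _ in ranked]


print(order_bdesc(sales))  # ['France', 'USA', 'Belgium', 'Canada']
print(order_desc(sales))   # ['France', 'Belgium', 'USA', 'Canada']
```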
MDX
Top and Bottom Analysis
Top and Bottom Analysis is a special case of sorting where users are interested in only a
small number of elements with the highest or lowest values in a set.
TOPCOUNT(Set, Count, Expression)
BOTTOMCOUNT(Set, Count, Expression)
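A minimal Python sketch of the same idea: sort the set by the expression and keep only the first few elements. The city figures are invented, and the two helper functions are hypothetical stand-ins that only mimic the argument order (Set, Count, Expression).

```python
# Hypothetical (city, sales amount) pairs.
city_sales = [("Paris", 500), ("Berlin", 300), ("Madrid", 100), ("Rome", 400)]


def topcount(rows, count, expression):
    """The `count` rows with the highest value of `expression`."""
    return sorted(rows, key=expression, reverse=True)[:count]


def bottomcount(rows, count, expression):
    """The `count` rows with the lowest value of `expression`."""
    return sorted(rows, key=expression)[:count]


def amount(row):
    return row[1]


print(topcount(city_sales, 2, amount))     # [('Paris', 500), ('Rome', 400)]
print(bottomcount(city_sales, 2, amount))  # [('Madrid', 100), ('Berlin', 300)]
```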
MDX
Functions
Some aggregation functions in MDX:
SUM: Calculates the sum of the values.
AVG: Calculates the average of the values.
MEDIAN: Calculates the median of the values.
MAX: Finds the maximum value.
MIN: Finds the minimum value.
VAR: Calculates the variance of the values.
STDDEV: Calculates the standard deviation of the values.
Other useful functions:
ITEM: Returns the member at a given position in a tuple (or the tuple at a given position in a set).
NAME: Returns the name of the specified member.
DAX
(Data Analysis
Expressions)
DAX
Expressions
DAX supports the following data types: integer numbers, floating-point numbers, currency, datetimes, Boolean, string, and binary large object.
Functions are used to perform calculations on a data model.
DAX provides several types of operators, namely, arithmetic (+, -, *, /), comparison (=, <>, >, etc.), text concatenation (&), and logical operators (&& and ||, corresponding to logical AND and logical OR).
Expressions are constructed with elements of a data model (such as tables, columns, or measures), functions, operators, or constants.
Measures are used to aggregate values from multiple rows in a table.
DAX
Expressions
Calculated columns are derived by an expression and can be used like any other
column.
Example: Employee[Age] = INT(YEARFRAC(Employee[BirthDate], TODAY()))
Variables can be used to avoid repeating the same subexpression
Example:
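Customer[Class] =
VAR TotalSales = CALCULATE( SUM( Sales[SalesAmount] ) ) -- assumed definition: total sales of the current customer
RETURN
SWITCH( TRUE, TotalSales > 1000, "A", TotalSales > 100, "B", "C" )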
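The SWITCH( TRUE, ... ) pattern returns the result paired with the first condition that holds, and the last argument is the default. The equivalent logic, sketched in Python with the thresholds from the example (the function name `customer_class` is hypothetical):

```python
def customer_class(total_sales):
    """Classify a customer by total sales; conditions are tested in order."""
    if total_sales > 1000:
        return "A"
    if total_sales > 100:
        return "B"
    return "C"  # default branch


print(customer_class(1500))  # A
print(customer_class(1000))  # B  (not strictly greater than 1000)
print(customer_class(100))   # C  (not strictly greater than 100)
```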
DAX
Evaluation Context
A DAX expression is evaluated inside a context, which is the environment under which it is evaluated.
The filter context is the set of filters that identifies the active rows in a table, while the row context is the single row that is active in a table for evaluating column references.
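The two contexts can be imitated in plain Python. In this simplified sketch, a filter context is a dictionary of column = value conditions that selects the active rows before aggregating, and a row context is the current row of an iteration. The table contents and function names are invented for the illustration, and the model ignores relationships and context transition.

```python
# Hypothetical Sales table.
sales = [
    {"Country": "Belgium", "Quantity": 2, "UnitPrice": 10.0},
    {"Country": "France", "Quantity": 1, "UnitPrice": 30.0},
    {"Country": "France", "Quantity": 3, "UnitPrice": 5.0},
]


def total_amount(rows, filter_context):
    """Measure-like: aggregate over the rows left active by the filter context."""
    active = [
        r for r in rows
        if all(r[col] == val for col, val in filter_context.items())
    ]
    return sum(r["Quantity"] * r["UnitPrice"] for r in active)


def line_amounts(rows):
    """Calculated-column-like: one value per row, using that row's own columns."""
    return [r["Quantity"] * r["UnitPrice"] for r in rows]


print(total_amount(sales, {"Country": "France"}))  # 45.0
print(total_amount(sales, {}))                     # 65.0
print(line_amounts(sales))                         # [20.0, 30.0, 15.0]
```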
DAX
Evaluation Context
Filter context:
Row context:
DAX
Hierarchy Handling
DAX does not currently support parent-child hierarchies.
DEFINE
    MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
    MEASURE Sales[Sales Amount %] =
        SUM( Sales[SalesAmount] ) /
        CALCULATE( SUM( Sales[SalesAmount] ), ALL( Customer[City] ) )
EVALUATE
SUMMARIZECOLUMNS(
    Customer[City], Customer[Country],
    "Sales Amount", [Sales Amount],
    "Sales Amount %", FORMAT( [Sales Amount %], "Percent" ) )
ORDER BY [Country], [City]
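In the query above, ALL( Customer[City] ) removes the filter on the city while the filter on the country stays in place, so the denominator is the total of the city's country. A Python sketch of that percentage-of-parent computation, on invented figures (Lyon and the amounts are hypothetical):

```python
# Hypothetical rows: (country, city, sales amount).
rows = [
    ("France", "Paris", 300.0),
    ("France", "Lyon", 100.0),
    ("Belgium", "Brussels", 200.0),
]

# Denominator: total per country, i.e. the city filter removed.
country_total = {}
for country, _, amount in rows:
    country_total[country] = country_total.get(country, 0.0) + amount

# Share of each city in its country, formatted as a percentage.
share = {
    city: f"{amount / country_total[country]:.2%}"
    for country, city, amount in rows
}

print(share)  # {'Paris': '75.00%', 'Lyon': '25.00%', 'Brussels': '100.00%'}
```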
DAX
Hierarchy Handling
DEFINE
    MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
    MEASURE Sales[Sales Amount %] =
        SUM( Sales[SalesAmount] ) /
        CALCULATE( SUM( Sales[SalesAmount] ), ALLSELECTED( Customer[City] ) )
EVALUATE
SUMMARIZECOLUMNS(
    Customer[City], Customer[Country],
    FILTER( Customer, Customer[Continent] = "Europe" ),
    "Sales Amount", [Sales Amount],
    "Sales Amount %", FORMAT( [Sales Amount %], "Percent" ) )
ORDER BY [Country], [City]
DAX
Time-Related Calculations
DAX provides a set of functions, referred to as time intelligence functions, that enable time-related calculations such as year to date, same period last year, period growth, etc.
PARALLELPERIOD: returns the dates of a period parallel to the current one, shifted a given number of intervals back or forward; useful for comparing the same period at different points in time.
STARTOFQUARTER: returns the first date of the quarter.
ENDOFMONTH: returns the last date of the month.
DATESBETWEEN: returns the dates between two given dates.
TOTALYTD/TOTALQTD/TOTALMTD: calculate the total of an expression from the beginning of the year, quarter, or month up to the current point in time.
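A year-to-date total is the sum of all values dated from the start of the year up to a given date. The Python sketch below shows that logic on a few invented daily figures; `total_ytd` is a hypothetical helper named after the DAX function, not a reimplementation of it.

```python
from datetime import date

# Hypothetical sales per day.
daily_sales = {
    date(2023, 12, 31): 999.0,
    date(2024, 1, 15): 100.0,
    date(2024, 2, 10): 50.0,
    date(2024, 3, 5): 25.0,
}


def total_ytd(sales, as_of):
    """Sum of sales from 1 January of `as_of`'s year up to `as_of` inclusive."""
    return sum(
        value for day, value in sales.items()
        if day.year == as_of.year and day <= as_of
    )


print(total_ytd(daily_sales, date(2024, 2, 29)))   # 150.0
print(total_ytd(daily_sales, date(2024, 12, 31)))  # 175.0
```

The quarter-to-date and month-to-date variants follow the same pattern, with the start of the quarter or month as the lower bound.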
DAX
Top and Bottom Analysis
The TOPN function returns a given number of top rows according to a specified expression.
Example:
DEFINE
    MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
EVALUATE
TOPN( 3,
    SUMMARIZECOLUMNS( Customer[City], "Sales Amount", [Sales Amount] ),
    [Sales Amount], DESC )
ORDER BY [Sales Amount] DESC
The RANKX function is used for ranking purposes.
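Ranking assigns each row its position in the sorted order. The Python sketch below assumes a descending order and skip-style handling of ties (two rows tied at rank 2 are followed by rank 4); the figures and the function name `rank_skip` are hypothetical.

```python
# Hypothetical sales per city, with a tie between Rome and Berlin.
city_sales = {"Paris": 500, "Rome": 400, "Berlin": 400, "Madrid": 100}


def rank_skip(values):
    """Rank = 1 + number of strictly larger values, so ties share a rank."""
    return {
        name: 1 + sum(1 for other in values.values() if other > value)
        for name, value in values.items()
    }


print(rank_skip(city_sales))
# {'Paris': 1, 'Rome': 2, 'Berlin': 2, 'Madrid': 4}
```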
DAX
Table Operations
The function TREATAS modifies the data lineage and makes customer countries
behave as supplier countries when calculating the measure.
Example:
DEFINE
MEASURE Sales[Nb Customers] = COUNT(Customer[CompanyName])
MEASURE Sales[Nb Suppliers] = CALCULATE(COUNT(Supplier[CompanyName]),
TREATAS (VALUES( Customer[Country] ), Supplier[Country] ))
EVALUATE
SUMMARIZECOLUMNS(Customer[Country], "Nb Customers",
[Nb Customers], "Nb Suppliers", [Nb Suppliers] )
ORDER BY [Country]
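The effect of the query above can be imitated in Python: for each customer country, count the customers, then apply that same country value as a filter on the supplier table, even though no relationship links the two. The two tables are invented for this sketch, and the function name is hypothetical.

```python
# Hypothetical (company, country) tables with no relationship between them.
customers = [("C1", "Belgium"), ("C2", "France"), ("C3", "France")]
suppliers = [("S1", "France"), ("S2", "Germany"), ("S3", "Belgium"), ("S4", "France")]


def counts_by_customer_country():
    """Per customer country: (number of customers, number of suppliers)."""
    result = {}
    for country in sorted({c for _, c in customers}):
        nb_customers = sum(1 for _, c in customers if c == country)
        # The customer country is treated as a supplier country here.
        nb_suppliers = sum(1 for _, c in suppliers if c == country)
        result[country] = (nb_customers, nb_suppliers)
    return result


print(counts_by_customer_country())
# {'Belgium': (1, 1), 'France': (2, 2)}
```

Germany does not appear in the result because the grouping is driven by the customer countries only.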
DAX
Table Operations
DAX provides the following functions for performing various kinds of joins: NATURALINNERJOIN, NATURALLEFTOUTERJOIN, and CROSSJOIN.
Example:
DEFINE
    MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
EVALUATE
ADDCOLUMNS(
    CROSSJOIN( VALUES( Customer[Country] ), VALUES( Supplier[Country] ) ),
    "Sales Amount", [Sales Amount] )
ORDER BY Customer[Country], Supplier[Country]
KEY PERFORMANCE INDICATORS
Concept
KPIs are complex measurements used to estimate the effectiveness of an organization in carrying out its activities, and to monitor the performance of its processes and business strategies.
Purpose
KPIs deliver a global overview of the company's status.
Application
KPIs are often used in dashboards and reports to track detailed information on specific areas.
KPIs
Defining Key Performance Indicators
In order to define an appropriate set of indicators for an organization, we need to identify the sources from which
we can obtain relevant information.
1. Assemble a team
Form a team that can communicate and collaborate effectively. It is essential to include individuals with diverse expertise and perspectives to ensure comprehensive coverage of the problem domain.
6. Set targets
This is a crucial step, since it is one of the biggest challenges in KPI definition. Historical information can be used as a guide, against which the core team can consider industry benchmarks and economic conditions.