
Data Analysis in Data Warehouses Group 6

Data Analysis in Data Warehouses
GROUP 6
members

ABOUT US
HOÀNG MẠNH QUÂN

BÙI THẾ THUẬT

NGUYỄN THỊNH THUẬN

ĐINH THẾ VƯƠNG

LẠI ĐỨC THẮNG

BUASY SYDAVONG
MDX & DAX

MDX
MDX (MultiDimensional eXpressions) was defined by Microsoft in 1997 and was soon adopted by many OLAP tool providers, becoming a de facto standard. Despite the success of Analysis Services and MDX, many users claimed that multidimensional cubes were hard to understand and manipulate.

DAX
Microsoft introduced the tabular model and its associated language DAX in 2012, and both have become widely popular since then. From the users' perspective, the underlying concepts of the tabular model are simpler than those of the multidimensional model, both when designing models and when using them for analysis and reporting.
MDX (MultiDimensional eXpressions)
MDX
Tuples and Sets
• A tuple is defined by stating one member from one or several dimensions of the cube.
• Example:
(Product.Category.Seafood, Date.Quarter.Q1, Customer.City.Paris)
(Customer.City.Paris)
(Product.Category.Seafood, Customer.City.Paris)
MDX
Tuples and Sets
• A set is a collection of tuples defined using the same dimensions.
• Example:
{
(Product.Category.Seafood, Date.Quarter.Q1, Customer.City.Paris),
(Product.Category.Seafood, Date.Quarter.Q2, Customer.City.Paris)
}
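The two definitions can be illustrated outside MDX with a small Python sketch. It models a tuple as a mapping from dimension to member and a set as a list of tuples; the helper `is_valid_set` is hypothetical and only mirrors the rule that all tuples in a set use the same dimensions.

```python
# Hypothetical model: a tuple maps each dimension to exactly one member.
t1 = {"Product": "Seafood", "Date": "Q1", "Customer": "Paris"}
t2 = {"Product": "Seafood", "Date": "Q2", "Customer": "Paris"}
t3 = {"Customer": "Paris"}


def is_valid_set(tuples):
    """A set is valid when all of its tuples are defined on the same dimensions."""
    dims = [frozenset(t) for t in tuples]  # frozenset of a dict keeps its keys
    return all(d == dims[0] for d in dims)


print(is_valid_set([t1, t2]))  # both tuples use Product, Date, Customer
print(is_valid_set([t1, t3]))  # t3 uses only Customer
```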
MDX
Basic Queries
• The syntax of a typical MDX query is as follows:
SELECT <axis specification>
FROM <cube>
[WHERE <slicer specification>]
• The first five axes have predefined names, namely, COLUMNS, ROWS, PAGES, CHAPTERS, and SECTIONS.
• Example:
SELECT [Measures].MEMBERS ON COLUMNS,
  [Customer].[City].MEMBERS ON ROWS
FROM Sales
MDX
Slicing
• The query shows all measures by year:
SELECT Measures.MEMBERS ON COLUMNS,
  [Order Date].Year.MEMBERS ON ROWS
FROM Sales
• To restrict the result to Belgium, we can write:
WHERE (Customer.Country.Belgium)
• Alternative slicers:
WHERE (Customer.Country.Belgium, Product.Categories.Beverages)
WHERE ({Customer.Country.Belgium, Customer.Country.France}, Product.Categories.Beverages)
• If a dimension appears in a slicer, it cannot be used in any axis in the SELECT clause.
MDX
Navigation
Used to navigate and access elements within a multidimensional data cube. It allows users to filter, sort, and select data flexibly from the dimensions and members of the cube.

Drilling Down with Children

With MEMBERS (shows the All member):
SELECT [Order Date].Year.MEMBERS ON COLUMNS,
  Customer.[Company Name].MEMBERS ON ROWS
FROM Sales
WHERE Measures.[Sales Amount]

With CHILDREN (does not show the All member):
SELECT [Order Date].Year.MEMBERS ON COLUMNS,
  Customer.[Company Name].CHILDREN ON ROWS
FROM Sales
WHERE Measures.[Sales Amount]

MDX
Navigation
Drilling Down with Descendants

With CHILDREN:
SELECT [Order Date].Year.MEMBERS ON COLUMNS,
  Customer.[Company Name].CHILDREN ON ROWS
FROM Sales
WHERE Measures.[Sales Amount]
• Returns only the immediate child members of a specific parent member in a dimension, excluding the grandchildren and deeper levels of the hierarchy.

With DESCENDANTS:
SELECT [Order Date].Year.MEMBERS ON COLUMNS,
  Customer.[Company Name].DESCENDANTS ON ROWS
FROM Sales
WHERE Measures.[Sales Amount]
• Returns all child members of a specific parent member in a dimension, including the children of children and so on down the hierarchy.
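The difference between children and descendants can be sketched in Python on a toy hierarchy. The hierarchy contents and the helper names `children` and `descendants` are assumptions made for illustration; they are not MDX functions.

```python
# Hypothetical customer hierarchy: each parent maps to its direct children.
hierarchy = {
    "All": ["Europe"],
    "Europe": ["Belgium", "France"],
    "Belgium": ["Brussels"],
    "France": ["Paris", "Lyon"],
}


def children(member):
    """Immediate children only, one level below `member`."""
    return list(hierarchy.get(member, []))


def descendants(member):
    """Every member below `member`, at any depth (pre-order traversal)."""
    result = []
    for child in children(member):
        result.append(child)
        result.extend(descendants(child))
    return result


print(children("Europe"))     # ['Belgium', 'France']
print(descendants("Europe"))  # ['Belgium', 'Brussels', 'France', 'Paris', 'Lyon']
```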
MDX
CROSS JOIN
Purpose: combine multiple dimensions into a single axis, often used to create matrix representations.

With the CROSSJOIN function:
SELECT Product.Category.MEMBERS ON COLUMNS,
  CROSSJOIN(Customer.Country.MEMBERS,
  [Order Date].Calendar.Quarter.MEMBERS) ON ROWS
FROM Sales
WHERE Measures.[Sales Amount]

With the cross join operator (*):
SELECT Product.Category.MEMBERS ON COLUMNS,
  Customer.Country.MEMBERS *
  [Order Date].Calendar.Quarter.MEMBERS ON ROWS
FROM Sales
WHERE Measures.[Sales Amount]
MDX
CROSS JOIN
More than two sets can be combined in a cross join:
SELECT Product.Category.MEMBERS ON COLUMNS,
  Customer.Country.MEMBERS * [Order Date].Calendar.Quarter.MEMBERS *
  Shipper.[Company Name].MEMBERS ON ROWS
FROM Sales
WHERE Measures.[Sales Amount]
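A cross join is a Cartesian product of member sets. As a rough sketch, Python's `itertools.product` builds the same kind of combined axis; the member lists below are made-up sample data.

```python
from itertools import product

# Hypothetical members of three dimensions.
countries = ["Belgium", "France"]
quarters = ["Q1", "Q2"]
shippers = ["Speedy Express"]

# Two sets: every (country, quarter) combination becomes one row header.
rows = list(product(countries, quarters))

# Three sets: the product simply gains one more component per tuple.
rows3 = list(product(countries, quarters, shippers))

print(rows)
print(rows3)
```

The number of resulting tuples is the product of the set sizes, which is why cross joins over large dimensions grow quickly.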


MDX
SUBQUERIES
Purpose: Allow data filtering before using it in the main query

WHERE clause:
SELECT Measures.[Sales Amount] ON COLUMNS,
  [Order Date].Calendar.Quarter.MEMBERS ON ROWS
FROM Sales
WHERE { Product.Category.Beverages, Product.Category.Condiments }
• The WHERE clause does not allow the use of filtered dimensions on the axes.

Subquery:
SELECT Measures.[Sales Amount] ON COLUMNS,
  [Order Date].Calendar.Quarter.MEMBERS ON ROWS
FROM ( SELECT { Product.Category.Beverages, Product.Category.Condiments } ON COLUMNS
  FROM Sales )
• A subquery allows the use of filtered dimensions on the axes of the main query.
MDX
Calculated members and named sets
• Calculated members are new members in a dimension or new measures that are defined using the WITH clause in front of the SELECT statement:
WITH MEMBER Parent.MemberName AS <expression>
• Similarly, named sets are used to define new sets as follows:
WITH SET SetName AS <expression>

MDX
Calculated members and named sets
• For example, a measure that calculates the percentage profit of sales:
WITH MEMBER Measures.[Profit%] AS (Measures.[Sales Amount] -
  Measures.[Freight]) / (Measures.[Sales Amount]),
  FORMAT_STRING = '#0.00%'
SELECT {[Sales Amount], Freight, [Profit%]} ON COLUMNS,
  Customer.Country ON ROWS
FROM Sales
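The arithmetic behind the Profit% member can be checked with a short Python sketch. The function name `profit_pct` and the sample figures are assumptions; returning `None` for a zero sales amount is a choice of this sketch, not a statement about how MDX handles that case.

```python
def profit_pct(sales_amount, freight):
    """(Sales Amount - Freight) / Sales Amount, or None when sales are zero."""
    if sales_amount == 0:
        return None  # sketch-level choice to avoid dividing by zero
    return (sales_amount - freight) / sales_amount


# Hypothetical figures: 1000 in sales with 50 in freight.
print(f"{profit_pct(1000.0, 50.0):.2%}")  # 95.00%
```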
MDX
Time-related calculations
• PARALLELPERIOD function: returns the member at the same relative position in an earlier period, so that a value can be compared with that of the corresponding period (e.g., same quarter of the previous year).
WITH MEMBER Measures.[Previous Year] AS (Measures.[Net Amount],
  PARALLELPERIOD([Order Date].Calendar.Quarter, 4)),
  FORMAT_STRING = '$###,##0.00'
MEMBER Measures.[Net Amount Growth] AS
  Measures.[Net Amount] - Measures.[Previous Year],
  FORMAT_STRING = '$###,##0.00; $-###,##0.00'
SELECT {[Net Amount], [Previous Year], [Net Amount Growth]} ON COLUMNS,
  [Order Date].Calendar.Quarter ON ROWS
FROM Sales
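The idea of going back four quarters and computing growth can be sketched in Python. The quarterly figures and the helper names `parallel_period` and `growth` are hypothetical; the sketch returns `None` when no earlier quarter exists, which is a simplification of what the cube would do.

```python
# Hypothetical quarterly net amounts, listed in chronological order.
net_amount = {
    "2016-Q1": 100.0, "2016-Q2": 120.0, "2016-Q3": 90.0, "2016-Q4": 150.0,
    "2017-Q1": 130.0, "2017-Q2": 110.0,
}
quarters = list(net_amount)  # dicts preserve insertion order


def parallel_period(quarter, lag=4):
    """Quarter located `lag` positions earlier, or None if there is none."""
    i = quarters.index(quarter) - lag
    return quarters[i] if i >= 0 else None


def growth(quarter):
    """Net amount minus the net amount of the same quarter one year before."""
    prev = parallel_period(quarter)
    return None if prev is None else net_amount[quarter] - net_amount[prev]


print(parallel_period("2017-Q1"))  # 2016-Q1
print(growth("2017-Q1"))           # 30.0
```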
MDX
Filter
• A common form of filtering is to eliminate rows or columns with no values using the NON EMPTY clause.
• In MDX, the Filter function has the following structure:
FILTER (Set_Expression, Logical_Expression)
• Example:
SELECT Product.Category.MEMBERS ON COLUMNS,
  FILTER(Customer.City.MEMBERS, (Measures.[Sales Amount],
  [Order Date].Calendar.[2017]) > 25000) ON ROWS
FROM Sales
WHERE (Measures.[Sales Amount], [Order Date].Calendar.[2017])
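FILTER keeps the members of a set for which a logical expression holds. A minimal Python sketch of that behavior, with made-up 2017 sales per city and a hypothetical helper `mdx_filter`:

```python
# Hypothetical 2017 sales amount per city.
sales_2017 = {"Paris": 31000.0, "Brussels": 18000.0, "Lyon": 26500.0}


def mdx_filter(members, predicate):
    """Keep the members for which the predicate is true, preserving order."""
    return [m for m in members if predicate(m)]


# Cities whose 2017 sales amount exceeds 25000.
rows = mdx_filter(sales_2017, lambda city: sales_2017[city] > 25000)
print(rows)  # ['Paris', 'Lyon']
```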


MDX
Sorting
• Sorting is an operation that users often perform on sets to arrange elements in a specific order.
• MDX provides a function for sorting with the following syntax:
ORDER (Set, Expression [, ASC | DESC | BASC | BDESC])
• For example, to sort the countries by sales amount in descending order, we can proceed as follows:
SELECT Measures.MEMBERS ON COLUMNS,
  ORDER(Customer.Geography.Country.MEMBERS,
  Measures.[Sales Amount], BDESC) ON ROWS
FROM Sales
MDX
Top and Bottom Analysis
• Top and Bottom Analysis is a special case of sorting where users are interested in only a small number of elements with the highest or lowest values in a set.
TOPCOUNT(Set, Count, Expression)
BOTTOMCOUNT(Set, Count, Expression)
• Example:
SELECT Measures.MEMBERS ON COLUMNS,
  TOPCOUNT(Customer.Geography.City.MEMBERS, 3,
  Measures.[Sales Amount]) ON ROWS
FROM Sales
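Sorting and top/bottom selection can be mimicked in Python to make the semantics concrete. The sales figures and the helper names `top_count` and `bottom_count` are assumptions for illustration; `heapq.nlargest` and `heapq.nsmallest` are standard-library functions.

```python
import heapq

# Hypothetical sales amount per city.
sales = {"Paris": 31000.0, "Brussels": 18000.0, "Lyon": 26500.0, "Graz": 42000.0}

# Full descending sort, comparable to ordering a flat set by a measure.
order_desc = sorted(sales, key=sales.get, reverse=True)


def top_count(members, count, value):
    """The `count` members with the highest value, highest first."""
    return heapq.nlargest(count, members, key=value)


def bottom_count(members, count, value):
    """The `count` members with the lowest value, lowest first."""
    return heapq.nsmallest(count, members, key=value)


print(order_desc)                       # ['Graz', 'Paris', 'Lyon', 'Brussels']
print(top_count(sales, 3, sales.get))   # ['Graz', 'Paris', 'Lyon']
print(bottom_count(sales, 2, sales.get))  # ['Brussels', 'Lyon']
```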
MDX
Aggregation Functions
• Some aggregation functions in MDX:
• SUM: Calculates the sum of the values.
• AVG: Calculates the average of the values.
• MEDIAN: Calculates the median of the values.
• MAX: Finds the maximum value.
• MIN: Finds the minimum value.
• VAR: Calculates the variance of the values.
• STDDEV: Calculates the standard deviation of the values.
• ITEM: Retrieves a member from the specified tuple by position; ITEM(0) returns the first member.
• NAME: Returns the name of the specified member.
DAX (Data Analysis Expressions)
DAX
Expressions
• DAX supports the following data types: integer numbers, floating-point numbers, currency, datetimes, Boolean, string, and binary large object.
• Functions are used to perform calculations on a data model.
• DAX provides several types of operators, namely, arithmetic (+, -, *, /), comparison (=, <>, >, etc.), text concatenation (&), and logical operators (&& and ||, corresponding to logical and and logical or).
• Expressions are constructed with elements of a data model (such as tables, columns, or measures), functions, operators, or constants.
• Measures are used to aggregate values from multiple rows in a table.
DAX
Expressions
• Calculated columns are derived by an expression and can be used like any other column.
Example: Employee[Age] = INT(YEARFRAC(Employee[BirthDate], TODAY()))
• Variables can be used to avoid repeating the same subexpression.
Example:
Customer[Class] =
-- CALCULATE turns the current Customer row into a filter, so the sum is per customer
VAR TotalSales = CALCULATE( SUM( Sales[SalesAmount] ) )
RETURN
SWITCH( TRUE(), TotalSales > 1000, "A", TotalSales > 100, "B", "C" )
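The SWITCH pattern tests its conditions in order and returns the result of the first one that holds, with a final default. A small Python sketch of the same classification, where the function name `customer_class` is hypothetical:

```python
def customer_class(total_sales):
    """Return 'A', 'B', or 'C' using the thresholds from the slide."""
    if total_sales > 1000:
        return "A"
    if total_sales > 100:  # reached only when total_sales <= 1000
        return "B"
    return "C"             # default branch


for amount in (1500, 1000, 101, 100):
    print(amount, customer_class(amount))
```

Because the conditions are checked in order, the boundary values 1000 and 100 fall into the lower class.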
DAX
Evaluation Context
• A DAX expression is evaluated inside a context, which is the environment under which the formula is evaluated.
• The filter context is the set of filters that identifies the active rows in a table, while the row context is the single row that is active in a table for evaluating column references.
DAX
Evaluation Context
• Filter context:
• Row context:
Sales[TotalSalesValue] = Sales[Quantity] * Sales[UnitPrice]
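The two contexts can be contrasted with a rough Python analogy. The sample rows and the helper `total_sales` are assumptions; the sketch only imitates the idea of per-row evaluation versus aggregation over the rows left active by a set of filters.

```python
# Hypothetical Sales rows; Quantity and UnitPrice follow the slide.
sales = [
    {"Country": "Belgium", "Quantity": 2, "UnitPrice": 10.0},
    {"Country": "France", "Quantity": 1, "UnitPrice": 30.0},
    {"Country": "Belgium", "Quantity": 3, "UnitPrice": 5.0},
]

# Row context: the expression is evaluated once per row with that row's values.
for row in sales:
    row["TotalSalesValue"] = row["Quantity"] * row["UnitPrice"]


def total_sales(filters):
    """Filter context: aggregate only the rows that satisfy every active filter."""
    active = [r for r in sales if all(r[col] == val for col, val in filters.items())]
    return sum(r["TotalSalesValue"] for r in active)


print(total_sales({}))                      # 65.0, no filter so all rows are active
print(total_sales({"Country": "Belgium"}))  # 35.0, only the Belgian rows are active
```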


DAX
Queries
• A query is an expression that returns a table.
• Example:
DEFINE
MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
MEASURE Sales[Min Amount] = MIN( Sales[SalesAmount] )
MEASURE Sales[Max Amount] = MAX( Sales[SalesAmount] )
EVALUATE
SUMMARIZECOLUMNS( Customer[Country], Product[CategoryName],
  "Sales Amount", [Sales Amount], "Min Amount", [Min Amount],
  "Max Amount", [Max Amount] )
ORDER BY [Country]
DAX
Filtering
• Filtering is an essential operation in data analysis, and this is especially the case in data warehouses, where we need to obtain insight from huge volumes of data.
• Example:
DEFINE
MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
EVALUATE
SUMMARIZECOLUMNS(
  Product[CategoryName], 'Date'[Year], 'Date'[Quarter],
  "Sales Amount", [Sales Amount] )
ORDER BY [CategoryName], [Year], [Quarter]
DAX
Filtering
• Example, restricting the result to European customers with FILTER:
DEFINE
MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
EVALUATE
SUMMARIZECOLUMNS(
  Product[CategoryName], 'Date'[Year], 'Date'[Quarter],
  FILTER( Customer, Customer[Continent] = "Europe" ),
  "Sales Amount", [Sales Amount] )
ORDER BY [CategoryName], [Year], [Quarter]
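Grouping with and without a filter can be sketched in Python to show the effect on the totals. The rows and the helper `summarize` are hypothetical; the sketch imitates only the group-by and filter idea, not the full behavior of SUMMARIZECOLUMNS.

```python
from collections import defaultdict

# Hypothetical denormalized sales rows.
rows = [
    {"CategoryName": "Beverages", "Year": 2017, "Quarter": "Q1", "Continent": "Europe", "SalesAmount": 100.0},
    {"CategoryName": "Beverages", "Year": 2017, "Quarter": "Q1", "Continent": "Asia", "SalesAmount": 40.0},
    {"CategoryName": "Beverages", "Year": 2017, "Quarter": "Q2", "Continent": "Europe", "SalesAmount": 60.0},
    {"CategoryName": "Seafood", "Year": 2017, "Quarter": "Q1", "Continent": "Europe", "SalesAmount": 25.0},
]


def summarize(rows, group_by, keep=lambda r: True):
    """Sum SalesAmount per group, over the rows accepted by `keep`, sorted by key."""
    totals = defaultdict(float)
    for r in rows:
        if keep(r):
            totals[tuple(r[c] for c in group_by)] += r["SalesAmount"]
    return dict(sorted(totals.items()))


keys = ["CategoryName", "Year", "Quarter"]
unfiltered = summarize(rows, keys)
europe = summarize(rows, keys, keep=lambda r: r["Continent"] == "Europe")

print(unfiltered[("Beverages", 2017, "Q1")])  # 140.0
print(europe[("Beverages", 2017, "Q1")])      # 100.0
```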
DAX
Hierarchy Handling
• DAX does not currently support parent-child hierarchies.
DEFINE
MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
MEASURE Sales[Sales Amount %] = SUM( Sales[SalesAmount] ) /
  CALCULATE( SUM( Sales[SalesAmount] ), ALL( Customer[City] ) )
EVALUATE
SUMMARIZECOLUMNS( Customer[City], Customer[Country],
  "Sales Amount", [Sales Amount], "Sales Amount %",
  FORMAT( [Sales Amount %], "Percent" ) )
ORDER BY [Country], [City]
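The percentage measure divides a city's sales by a total from which the city filter has been removed. The Python sketch below assumes that the denominator still keeps the Country grouping, so each city is compared with its own country's total; the data and variable names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (city, country, sales amount) rows.
sales = [
    ("Paris", "France", 300.0),
    ("Lyon", "France", 100.0),
    ("Brussels", "Belgium", 200.0),
]

# Denominator: total per country, i.e. the city is ignored but the country is kept.
country_total = defaultdict(float)
for city, country, amount in sales:
    country_total[country] += amount

# Share of each city within its country.
pct = {city: amount / country_total[country] for city, country, amount in sales}

for city, share in pct.items():
    print(city, f"{share:.0%}")
```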
DAX
Hierarchy Handling
DEFINE
MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
MEASURE Sales[Sales Amount %] = SUM( Sales[SalesAmount] ) /
  CALCULATE( SUM( Sales[SalesAmount] ),
  ALLSELECTED( Customer[City] ) )
EVALUATE
SUMMARIZECOLUMNS( Customer[City], Customer[Country],
  FILTER( Customer, Customer[Continent] = "Europe" ),
  "Sales Amount", [Sales Amount], "Sales Amount %",
  FORMAT( [Sales Amount %], "Percent" ) )
ORDER BY [Country], [City]
DAX
Time-Related Calculations
• DAX provides a set of functions, referred to as time intelligence functions, that enable time-related calculations such as year to date, same period last year, period growth, etc.
• PARALLELPERIOD: useful when you want to compare data within the same time period but at different points in time.
• STARTOFQUARTER: returns the beginning date of the quarter.
• ENDOFMONTH: returns the end date of the month.
• DATESBETWEEN: returns the dates between two given dates.
• TOTALYTD/TOTALQTD/TOTALMTD: calculate the total of an expression from the beginning of the year, quarter, or month up to the current point in time.
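A year-to-date total sums everything from the start of the year up to a given date. The following Python sketch shows that logic on made-up dated amounts; the helper `total_ytd` is hypothetical and is not the DAX function.

```python
from datetime import date

# Hypothetical dated sales amounts.
sales = [
    (date(2016, 12, 31), 999.0),
    (date(2017, 1, 15), 100.0),
    (date(2017, 2, 3), 50.0),
    (date(2017, 7, 1), 25.0),
]


def total_ytd(rows, as_of):
    """Sum of amounts dated in the year of `as_of`, up to and including `as_of`."""
    return sum(amount for day, amount in rows
               if day.year == as_of.year and day <= as_of)


print(total_ytd(sales, date(2017, 2, 28)))   # 150.0
print(total_ytd(sales, date(2017, 12, 31)))  # 175.0
```

Quarter-to-date and month-to-date totals follow the same pattern with a different start of period.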
DAX
Top and Bottom Analysis
• The TOPN function returns a given number of top rows according to a specified expression.
• Example:
DEFINE
MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
EVALUATE
TOPN( 3, SUMMARIZECOLUMNS( Customer[City],
  "Sales Amount", [Sales Amount] ), [Sales Amount], DESC )
ORDER BY [Sales Amount] DESC
• The RANKX function is used for ranking purposes.
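Ranking can be sketched in Python as counting how many values are strictly greater. The sketch assumes descending order with ties sharing a rank and the following rank being skipped, which corresponds to the default RANKX settings as far as this editor is aware; the data and the helper `rank_desc` are hypothetical.

```python
# Hypothetical sales amount per city, with a tie between Paris and Lyon.
sales = {"Paris": 300.0, "Lyon": 300.0, "Brussels": 100.0}


def rank_desc(values):
    """Rank 1 is the highest value; tied values share a rank and the next is skipped."""
    return {key: 1 + sum(1 for other in values.values() if other > value)
            for key, value in values.items()}


print(rank_desc(sales))  # {'Paris': 1, 'Lyon': 1, 'Brussels': 3}
```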
DAX
Table Operations
• The function TREATAS modifies the data lineage and makes customer countries behave as supplier countries when calculating the measure.
• Example:
DEFINE
MEASURE Sales[Nb Customers] = COUNT( Customer[CompanyName] )
MEASURE Sales[Nb Suppliers] = CALCULATE( COUNT( Supplier[CompanyName] ),
  TREATAS( VALUES( Customer[Country] ), Supplier[Country] ) )
EVALUATE
SUMMARIZECOLUMNS( Customer[Country], "Nb Customers",
  [Nb Customers], "Nb Suppliers", [Nb Suppliers] )
ORDER BY [Country]
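The effect of the query can be imitated in Python: for each customer country, the same country value is used to filter the suppliers. The company lists and the helper `counts_by_customer_country` are made up for illustration and say nothing about how TREATAS is implemented.

```python
# Hypothetical (company name, country) pairs.
customers = [("Alfreds", "Germany"), ("Blauer See", "Germany"), ("Bon app", "France")]
suppliers = [("Heli", "Germany"), ("Aux joyeux", "France"), ("Tokyo Traders", "Japan")]


def counts_by_customer_country():
    """For each customer country: (number of customers, number of suppliers there)."""
    result = {}
    for country in sorted({c for _, c in customers}):
        nb_customers = sum(1 for _, c in customers if c == country)
        # The customer country is applied as if it were a supplier country.
        nb_suppliers = sum(1 for _, c in suppliers if c == country)
        result[country] = (nb_customers, nb_suppliers)
    return result


print(counts_by_customer_country())  # {'France': (1, 1), 'Germany': (2, 1)}
```

Japan does not appear in the result because the rows are driven by customer countries only.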
DAX
Table Operations
• DAX provides the following functions for performing various kinds of joins: NATURALINNERJOIN, NATURALLEFTOUTERJOIN, and CROSSJOIN.
• Example:
DEFINE
MEASURE Sales[Sales Amount] = SUM( Sales[SalesAmount] )
EVALUATE
ADDCOLUMNS(
  CROSSJOIN( VALUES( Customer[Country] ), VALUES( Supplier[Country] ) ),
  "Sales Amount", [Sales Amount] )
ORDER BY Customer[Country], Supplier[Country]
KEY PERFORMANCE INDICATORS

Concept
KPIs are complex measurements used to estimate the effectiveness of an organization in carrying out its activities, and to monitor the performance of its processes and business strategies.

Purpose
KPIs deliver a global overview of the company status.

Application
KPIs are often used in dashboards and reports to track detailed information of specific fields.
KPIs
Defining Key Performance Indicators
In order to define an appropriate set of indicators for an organization, we need to identify the sources from which we can obtain relevant information.

Primary sources
• Front-line employees
• Managers
• Board
• Suppliers and customers

Secondary sources
• Strategic development plan
• Annual business/strategic plan
• Annual reports
• Internal operational reports
• Competitor review reports

External sources
• Social media and discussion forums
• Expert advice
• Information about related organizations and competitors
KPIs
Define the indicators
When the sources have been identified, we can follow the steps below in order to define the indicators for the problem at hand:

1. Assemble a team
Form a team to ensure effective communication and collaboration. It is essential to include individuals with diverse expertise and perspectives to ensure comprehensive coverage of the problem domain.

2. Categorize potential metrics
This makes it possible to assess the business from many different perspectives. For example, we may want to define metrics that capture how the organization is performing from a financial perspective, from a customer's perspective, and with respect to employees' expectations.

3. Brainstorm
Brainstorm possible metrics to discuss many possible measures before deciding the final set.

4. Prioritize the initially defined metrics
- Give each metric a precise definition.
- Define if the indicator is leading or lagging.
- Assess the impact.
- Verify alignment with business processes.
- Ensure a sufficient quantity of indicators.

5. Perform a final filter
This consists in checking if the metric definition is unambiguous and clear to people not on the core team, if we have credible data to compute the metric, and making sure that achieving the metrics will lead to achieving our goals.

6. Set targets
This is a crucial step since it is one of the biggest challenges in KPI definition. For this, historical information can be used as a guide against which the core team can look at industry benchmarks and economic conditions.
THANK YOU
