Unit-Iv XML and Datawarehouse

UNIT IV XML AND DATAWAREHOUSE

XML Database: XML – XML Schema – XML DOM and SAX Parsers – XSL – XSLT – XPath
and XQuery – Data Warehouse: Introduction – Multidimensional Data Modeling – Star and
Snowflake Schema – Architecture – OLAP Operations and Queries

XML DATABASE: XML – XML SCHEMA

 XML stands for Extensible Markup Language.


 A markup language is a set of codes, or tags, that describes the text in a digital
document. ...
 XML, a more flexible cousin of HTML, makes it possible to conduct complex business
over the Internet.

XML – DATABASES

 An XML database is used to store large amounts of information in XML format.

 As XML is used in more and more fields, a secure place to store XML documents is
required.

 The data stored in the database can be queried using XQuery, serialized, and exported
into a desired format.

XML Database Types

There are two major types of XML databases −

 XML- enabled
 Native XML (NXD)

XML - ENABLED DATABASE

 An XML-enabled database is a relational database with an extension that converts XML
documents to and from its relational form.

 Data is stored in tables consisting of rows and columns.

 The tables contain sets of records, which in turn consist of fields.

NATIVE XML DATABASE

 A native XML database is based on containers rather than on a table format.
 It can store large amounts of XML documents and data.
 A native XML database is queried by XPath expressions.
 A native XML database has an advantage over an XML-enabled database:
it is more capable of storing, querying and maintaining XML documents than an XML-enabled
database.

<?xml version = "1.0"?>


<contact-info>
<contact1>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact1>

<contact2>
<name>Manisha Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 789-4567</phone>
</contact2>
</contact-info>
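
The notes above say that a native XML database is queried with XPath expressions. As a rough sketch of what such a query looks like, the contact document can be loaded with Python's standard xml.etree.ElementTree module. This is not a database, and the module supports only a limited subset of XPath. The names CONTACTS_XML, names and phones are illustrative and not part of the notes.

```python
import xml.etree.ElementTree as ET

# The contact document from the notes, held as a string for illustration.
CONTACTS_XML = """<contact-info>
   <contact1>
      <name>Tanmay Patil</name>
      <company>TutorialsPoint</company>
      <phone>(011) 123-4567</phone>
   </contact1>
   <contact2>
      <name>Manisha Patil</name>
      <company>TutorialsPoint</company>
      <phone>(011) 789-4567</phone>
   </contact2>
</contact-info>"""

root = ET.fromstring(CONTACTS_XML)  # root is the <contact-info> element

# "*/name" selects the name child of every child element of the root.
names = [n.text for n in root.findall("*/name")]

# ".//phone" selects phone elements at any depth below the root.
phones = [p.text for p in root.findall(".//phone")]

print(names)
print(phones)
```

A real native XML database would typically evaluate full XPath or XQuery on the server; the sketch only shows the shape of a path-based query.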

XML SCHEMA

 XML Schema is commonly known as XML Schema Definition (XSD).


 It is used to describe and validate the structure and the content of XML data.
 XML schema defines the elements, attributes and data types. Schema element supports
Namespaces.
 It is similar to a database schema that describes the data in a database.

SYNTAX

You need to declare a schema with an xs:schema root element bound to the XML Schema
namespace, as in the first lines of the example that follows.

EXAMPLE

The following example shows how to use schema −


<?xml version = "1.0" encoding = "UTF-8"?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
   <xs:element name = "contact">
      <xs:complexType>
         <xs:sequence>
            <xs:element name = "name" type = "xs:string" />
            <xs:element name = "company" type = "xs:string" />
            <xs:element name = "phone" type = "xs:int" />
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>

The basic idea behind XML Schemas is that they describe the valid format that an XML
document can take.

ELEMENTS

 An element can be defined within an XSD as follows −


<xs:element name = "x" type = "y"/>

DEFINITION TYPES

You can define XML schema elements in the following ways −

SIMPLE TYPE

 A simple type element can contain only text. It cannot contain child elements or
attributes.
 Some of the predefined simple types are: xs:integer, xs:boolean, xs:string, xs:date. For
example −
<xs:element name = "phone_number" type = "xs:int" />

COMPLEX TYPE

 A complex type is a container for other element definitions.
 This allows you to specify which child elements an element can contain and to provide
some structure within your XML documents.
 For example −
<xs:element name = "Address">
   <xs:complexType>
      <xs:sequence>
         <xs:element name = "name" type = "xs:string" />
         <xs:element name = "company" type = "xs:string" />
         <xs:element name = "phone" type = "xs:int" />
      </xs:sequence>
   </xs:complexType>
</xs:element>
 In the above example, the Address element consists of child elements.
 It is a container for other <xs:element> definitions, which allows you to build a simple
hierarchy of elements in the XML document.

GLOBAL TYPES

 With a global type, you can define a single type in your document, which can be used
by all other references.
 For example, suppose you want to generalize the person and company for different
addresses of the company. In such a case, you can define a general type as follows −
<xs:complexType name = "AddressType">
   <xs:sequence>
      <xs:element name = "name" type = "xs:string" />
      <xs:element name = "company" type = "xs:string" />
   </xs:sequence>
</xs:complexType>
Because AddressType is declared as a named complex type, other elements can refer to it
through their type attribute. Now let us use this type in our example as follows −
<xs:element name = "Address1">
   <xs:complexType>
      <xs:sequence>
         <xs:element name = "address" type = "AddressType" />
         <xs:element name = "phone1" type = "xs:int" />
      </xs:sequence>
   </xs:complexType>
</xs:element>

<xs:element name = "Address2">
   <xs:complexType>
      <xs:sequence>
         <xs:element name = "address" type = "AddressType" />
         <xs:element name = "phone2" type = "xs:int" />
      </xs:sequence>
   </xs:complexType>
</xs:element>
Instead of having to define the name and the company twice (once for Address1 and once
for Address2), we now have a single definition. This makes maintenance simpler: if you
decide to add "Postcode" elements to the address, you need to add them in just one place.

ATTRIBUTES

 Attributes in XSD provide extra information within an element. Attributes
have name and type properties as shown below −
<xs:attribute name = "x" type = "y"/>

XML DOM AND SAX PARSERS

 The Document Object Model (DOM) is a W3C standard.
 It defines a standard for accessing documents like HTML and XML.
 The DOM is an application programming interface (API) for
HTML and XML documents.
 It defines the logical structure of documents and the way a document is accessed and
manipulated.
DOM defines the objects, properties and methods (interface) to access all XML elements. It is
separated into 3 different parts / levels −
 Core DOM − standard model for any structured document
 XML DOM − standard model for XML documents
 HTML DOM − standard model for HTML documents
 XML DOM is a standard object model for XML. XML documents have a
hierarchy of informational units called nodes.
 DOM is a standard programming interface for describing those nodes and the
relationships between them.
 XML DOM also provides an API that allows a developer to add, edit, move or
remove nodes at any point on the tree in order to create an application.
 Following is the diagram for the DOM structure. The diagram depicts that the parser
evaluates an XML document as a DOM structure by traversing through each node.
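
The add, edit and remove operations described above can be sketched with Python's standard xml.dom.minidom module, which offers a small DOM implementation. The short student document and the variable names are assumptions made for illustration only.

```python
from xml.dom import minidom

# Parse a small document into an in-memory DOM tree.
doc = minidom.parseString(
    "<class><student rollno='393'><firstname>Dinkar</firstname></student></class>"
)
root = doc.documentElement  # the <class> element node

# Traverse: find the student element nodes below the root.
students = root.getElementsByTagName("student")
first = students[0]
rollno = first.getAttribute("rollno")

# Add: create a new element with a text node and append it to the tree.
marks = doc.createElement("marks")
marks.appendChild(doc.createTextNode("85"))
first.appendChild(marks)

# Remove: detach the firstname element from its parent.
firstname = first.getElementsByTagName("firstname")[0]
first.removeChild(firstname)

# Serialize the modified tree back to text.
result = root.toxml()
print(result)
```

The whole tree stays in memory while these calls run, which is what makes random access and modification possible.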

ADVANTAGES OF XML DOM


 XML DOM is language and platform independent.
 XML DOM is traversable - Information in XML DOM is organized in a hierarchy which
allows developer to navigate around the hierarchy looking for specific information.
 XML DOM is modifiable - It is dynamic in nature providing the developer a scope to add,
edit, move or remove nodes at any point on the tree.

Disadvantages of XML DOM

 It consumes more memory (if the XML structure is large), because the whole document
tree is loaded and remains in memory until it is explicitly removed.
 Due to the extensive usage of memory, its operational speed is slower compared to SAX.

SAX (Simple API for XML)

A SAX parser implements the SAX API. This API is event based and less intuitive.

Features of SAX Parser

It does not create any internal structure.

Clients do not call the parsing methods themselves. They override the callback methods of the
API and place their own code inside those methods.

It is an event-based parser and works like an event handler in Java.

Advantages

1) It is simple and memory efficient.

2) It is very fast and works for huge documents.

Disadvantages

1) It is event-based so its API is less intuitive.

2) Clients never know the full information because the data is broken into pieces.
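
As a minimal sketch of the event-based style described above, the following uses Python's standard xml.sax module. The handler class StudentHandler and the two-student input are illustrative assumptions. The client only overrides callbacks, and the parser decides when to call them.

```python
import xml.sax


class StudentHandler(xml.sax.ContentHandler):
    """Collects the text of every <firstname> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.in_firstname = False
        self.buffer = []
        self.firstnames = []

    def startElement(self, name, attrs):
        # Called by the parser when an opening tag is read.
        if name == "firstname":
            self.in_firstname = True
            self.buffer = []

    def characters(self, content):
        # Text may be delivered in several chunks, so accumulate it.
        if self.in_firstname:
            self.buffer.append(content)

    def endElement(self, name):
        # Called by the parser when a closing tag is read.
        if name == "firstname":
            self.firstnames.append("".join(self.buffer))
            self.in_firstname = False


handler = StudentHandler()
xml.sax.parseString(
    b"<class><student><firstname>Dinkar</firstname></student>"
    b"<student><firstname>Vaneet</firstname></student></class>",
    handler,
)
print(handler.firstnames)
```

No tree is built here. The handler keeps only the state it needs, which is why SAX suits very large documents, at the cost of the client having to track context itself.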

XSL
 XSL (Extensible Stylesheet Language), formerly called Extensible
Style Language, is a language for creating a style sheet that
describes how data sent over the Web using the Extensible
Markup Language (XML) is to be presented to the user. ...
 XSL is developed under the auspices of the World Wide Web
Consortium (W3C).
XSLT
 EXtensible Stylesheet Language Transformation commonly known as XSLT is a
way to transform the XML document into other formats such as XHTML.

XSL
Before learning XSLT, we should first understand XSL, which stands for
EXtensible Stylesheet Language. XSL is to XML what CSS is to HTML.

Need for XSL


In an HTML document, tags such as table, div, and span are predefined, and the
browser knows how to add style to them and display them using CSS styles. In
XML documents, tags are not predefined. In order to understand and style an
XML document, the World Wide Web Consortium (W3C) developed XSL, which acts as an
XML-based stylesheet language. An XSL document specifies how a browser should
render an XML document.
Following are the main parts of XSL −
 XSLT − used to transform XML document into various other types of document.
 XPath − used to navigate XML document.
 XSL-FO − used to format XML document.

What is XSLT
XSLT, Extensible Stylesheet Language Transformations, provides the ability to
transform XML data from one format to another automatically.

How XSLT Works


An XSLT stylesheet defines the transformation rules to be applied to the target
XML document. The stylesheet itself is written in XML. The XSLT processor takes the
stylesheet, applies the transformation rules to the target XML document, and
generates a formatted document in XML, HTML, or text format. This
formatted document is then used by the XSLT formatter to generate the actual output
which is displayed to the end user.

Advantages
Here are the advantages of using XSLT −
 Independent of programming. Transformations are written in a separate xsl file,
which is itself an XML document.
 Output can be altered by simply modifying the transformations in the xsl file, with no
need to change any code. Web designers can therefore edit the stylesheet and see the
change in the output quickly.

XSLT SYNTAX
Suppose we have the following sample XML file, students.xml, which needs to be
transformed into a well-formatted HTML document.
students.xml
<?xml version = "1.0"?>
<class>
   <student rollno = "393">
      <firstname>Dinkar</firstname>
      <lastname>Kad</lastname>
      <nickname>Dinkar</nickname>
      <marks>85</marks>
   </student>
   <student rollno = "493">
      <firstname>Vaneet</firstname>
      <lastname>Gupta</lastname>
      <nickname>Vinni</nickname>
      <marks>95</marks>
   </student>
   <student rollno = "593">
      <firstname>Jasvir</firstname>
      <lastname>Singh</lastname>
      <nickname>Jazz</nickname>
      <marks>90</marks>
   </student>
</class>
We need to define an XSLT style sheet document for the above XML document to meet
the following criteria −
 Page should have a title Students.
 Page should have a table of student details.
 Columns should have following headers: Roll No, First Name, Last Name, Nick
Name, Marks
 Table must contain details of the students accordingly.

Step 1: Create XSLT document


Create an XSLT document to meet the above requirements, name it as students.xsl and
save it in the same location where students.xml lies.
students.xsl
<?xml version = "1.0" encoding = "UTF-8"?>
<!-- xsl stylesheet declaration with xsl namespace:
   The namespace tells the XSLT processor which elements are
   instructions to be processed and which are used for output only.
-->
<xsl:stylesheet version = "1.0"
   xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <!-- xsl template declaration:
      A template tells the XSLT processor which section of the XML
      document is to be formatted. It takes an XPath expression.
      In our case, it matches the document root and tells the
      processor to process the entire document with this template.
   -->
   <xsl:template match = "/">
      <!-- HTML tags
         Used for formatting purposes. The processor copies them to
         the output unchanged and the browser simply renders them.
      -->
      <html>
         <body>
            <h2>Students</h2>

            <table border = "1">
               <tr bgcolor = "#9acd32">
                  <th>Roll No</th>
                  <th>First Name</th>
                  <th>Last Name</th>
                  <th>Nick Name</th>
                  <th>Marks</th>
               </tr>

               <!-- for-each instruction
                  Loops over each element matching the XPath expression
               -->
               <xsl:for-each select="class/student">
                  <tr>
                     <td>
                        <!-- value-of instruction
                           Outputs the value of the node matching
                           the XPath expression
                        -->
                        <xsl:value-of select = "@rollno"/>
                     </td>

                     <td><xsl:value-of select = "firstname"/></td>
                     <td><xsl:value-of select = "lastname"/></td>
                     <td><xsl:value-of select = "nickname"/></td>
                     <td><xsl:value-of select = "marks"/></td>
                  </tr>
               </xsl:for-each>
            </table>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>
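
To make the roles of xsl:for-each and xsl:value-of concrete, the sketch below performs the same mapping by hand in Python with the standard xml.etree.ElementTree module. It is not an XSLT processor and is only an illustration of what the stylesheet asks the processor to do. The function name students_to_html is hypothetical, and only two of the three students are included to keep the sample short.

```python
import xml.etree.ElementTree as ET

STUDENTS_XML = """<class>
   <student rollno="393">
      <firstname>Dinkar</firstname><lastname>Kad</lastname>
      <nickname>Dinkar</nickname><marks>85</marks>
   </student>
   <student rollno="493">
      <firstname>Vaneet</firstname><lastname>Gupta</lastname>
      <nickname>Vinni</nickname><marks>95</marks>
   </student>
</class>"""


def students_to_html(xml_text):
    root = ET.fromstring(xml_text)  # root is the <class> element

    # Literal result elements: the fixed HTML skeleton of the page.
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h2").text = "Students"
    table = ET.SubElement(body, "table", border="1")

    header = ET.SubElement(table, "tr", bgcolor="#9acd32")
    for title in ("Roll No", "First Name", "Last Name", "Nick Name", "Marks"):
        ET.SubElement(header, "th").text = title

    # Counterpart of <xsl:for-each select="class/student">
    for student in root.findall("student"):
        row = ET.SubElement(table, "tr")
        # Counterpart of <xsl:value-of select="@rollno"/>
        ET.SubElement(row, "td").text = student.get("rollno")
        # Counterpart of <xsl:value-of select="firstname"/> and so on
        for field in ("firstname", "lastname", "nickname", "marks"):
            ET.SubElement(row, "td").text = student.findtext(field)

    return ET.tostring(html, encoding="unicode")


page = students_to_html(STUDENTS_XML)
print(page)
```

The stylesheet expresses the same rules declaratively, so the output can be changed by editing the xsl file without touching program code.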

Step 2: Link the XSLT Document to the XML Document


Update the students.xml document with the following xml-stylesheet tag. Set the href value to
students.xsl
<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "students.xsl"?>
<class>
...
</class>

Step 3: View the XML Document in Internet Explorer


students.xml
<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "students.xsl"?>
<class>
   <student rollno = "393">
      <firstname>Dinkar</firstname>
      <lastname>Kad</lastname>
      <nickname>Dinkar</nickname>
      <marks>85</marks>
   </student>
   <student rollno = "493">
      <firstname>Vaneet</firstname>
      <lastname>Gupta</lastname>
      <nickname>Vinni</nickname>
      <marks>95</marks>
   </student>
   <student rollno = "593">
      <firstname>Jasvir</firstname>
      <lastname>Singh</lastname>
      <nickname>Jazz</nickname>
      <marks>90</marks>
   </student>
</class>

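Output
The browser renders a page with the heading Students followed by a bordered table with the
columns Roll No, First Name, Last Name, Nick Name and Marks and one row per student.

XPATH AND XQUERY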

XPath (XML Path Language) is a query language for selecting nodes from an XML
document. In addition, XPath may be used to compute values (e.g., strings, numbers, or
Boolean values) from the content of an XML document.

XPath was defined by the World Wide Web Consortium (W3C).

XPath can be used to navigate through elements and attributes in an XML document.
1. XPath stands for XML Path Language.
2. XPath uses "path like" syntax to identify and navigate nodes in an XML document.
3. XPath contains over 200 built-in functions.
4. XPath is a major element in the XSLT standard.
5. XPath is a W3C recommendation.

XPath uses path expressions to select nodes or node-sets in an XML document.

These path expressions look very much like the path expressions you use with
traditional computer file systems.

XPath Standard Functions
XPath includes over 200 built-in functions.

There are functions for string values, numeric values, booleans, date and time
comparison, node manipulation, sequence manipulation, and much more.

Today XPath expressions can also be used in JavaScript, Java, XML Schema, PHP,
Python, C and C++, and lots of other languages.

XPath is Used in XSLT


XPath is a major element in the XSLT standard.

With XPath knowledge you will be able to take great advantage of your XSLT
knowledge.

XPath Terminology
Nodes
In XPath, there are seven kinds of nodes: element, attribute, text, namespace,
processing-instruction, comment, and document nodes.

XML documents are treated as trees of nodes. The topmost element of the tree is
called the root element.

Look at the following XML document:

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>
<book>
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>

Relationship of Nodes
Parent
Each element and attribute has one parent.

In the following example, the book element is the parent of the title, author, year,
and price elements:

<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>

Children
Element nodes may have zero, one or more children.

In the following example, the title, author, year, and price elements are all
children of the book element:

<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>

Siblings
Nodes that have the same parent.

In the following example, the title, author, year, and price elements are all
siblings:

<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>

XPath Syntax
XPath uses path expressions to select nodes or node-sets in an XML document. The
node is selected by following a path or steps.

The XML Example Document


We will use the following XML document in the examples below.

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

<book>
<title lang="en">Harry Potter</title>
<price>29.99</price>
</book>

<book>
<title lang="en">Learning XML</title>
<price>39.95</price>
</book>

</bookstore>

Selecting Nodes
XPath uses path expressions to select nodes in an XML document. The node is
selected by following a path or steps. The most useful path expressions are listed
below:

Expression    Description
nodename      Selects all nodes with the name "nodename"
/             Selects from the root node
//            Selects nodes in the document from the current node that match the selection, no matter where they are
.             Selects the current node
..            Selects the parent of the current node
@             Selects attributes

In the table below we have listed some path expressions and the result of the
expressions:

Path Expression    Result
bookstore          Selects all nodes with the name "bookstore"
/bookstore         Selects the root element bookstore
bookstore/book     Selects all book elements that are children of bookstore
//book             Selects all book elements no matter where they are in the document
bookstore//book    Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
//@lang            Selects all attributes that are named lang
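
Several of the path expressions above can be tried with Python's standard xml.etree.ElementTree module. This is only a sketch: the module implements a limited subset of XPath, paths are evaluated relative to the element they are called on (so the leading "//" becomes ".//"), and attribute nodes cannot be returned directly. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
   <book>
      <title lang="en">Harry Potter</title>
      <price>29.99</price>
   </book>
   <book>
      <title lang="en">Learning XML</title>
      <price>39.95</price>
   </book>
</bookstore>"""

bookstore = ET.fromstring(BOOKSTORE_XML)  # the root element, <bookstore>

# bookstore/book : book elements that are children of bookstore
books = bookstore.findall("book")

# //book : book elements anywhere below the current node
books_anywhere = bookstore.findall(".//book")

# .. : the parent of the selected node (here, the parent of each title)
parents_of_titles = bookstore.findall(".//title/..")

# //@lang : ElementTree cannot return attribute nodes, so read the
# attribute value from each element that carries it.
langs = [e.get("lang") for e in bookstore.findall(".//*[@lang]")]

print(len(books), len(books_anywhere))
print([p.tag for p in parents_of_titles])
print(langs)
```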

Predicates
Predicates are used to find a specific node or a node that contains a specific
value.

Predicates are always embedded in square brackets.

In the table below we have listed some path expressions with predicates and the
result of the expressions:

Path Expression                       Result
/bookstore/book[1]                    Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]               Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]             Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3]         Selects the first two book elements that are children of the bookstore element
//title[@lang]                        Selects all the title elements that have an attribute named lang
//title[@lang='en']                   Selects all the title elements that have a "lang" attribute with a value of "en"
/bookstore/book[price>35.00]          Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title    Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00

Note: In IE 5, 6, 7, 8 and 9 the first node is [0], but according to the W3C it is [1].
To solve this problem in IE, set the SelectionLanguage to XPath. In JavaScript:
xml.setProperty("SelectionLanguage","XPath");
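
A few of these predicates can be checked with Python's standard xml.etree.ElementTree module, as a sketch. The module supports positional and attribute predicates but not numeric comparisons such as price>35.00, so that case is filtered in plain Python here. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
   <book>
      <title lang="en">Harry Potter</title>
      <price>29.99</price>
   </book>
   <book>
      <title lang="en">Learning XML</title>
      <price>39.95</price>
   </book>
</bookstore>"""

bookstore = ET.fromstring(BOOKSTORE_XML)  # the root element, <bookstore>

# /bookstore/book[1] : positions start at 1, as the W3C specifies
first_book = bookstore.find("book[1]")

# /bookstore/book[last()]
last_book = bookstore.find("book[last()]")

# //title[@lang='en']
english_titles = [t.text for t in bookstore.findall(".//title[@lang='en']")]

# /bookstore/book[price>35.00]/title : the comparison is not supported by
# ElementTree, so the price test is applied in Python instead.
expensive_titles = [
    book.findtext("title")
    for book in bookstore.findall("book")
    if float(book.findtext("price")) > 35.00
]

print(first_book.findtext("title"), last_book.findtext("title"))
print(english_titles)
print(expensive_titles)
```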

Selecting Unknown Nodes


XPath wildcards can be used to select unknown XML nodes.

Wildcard    Description
*           Matches any element node
@*          Matches any attribute node
node()      Matches any node of any kind

In the table below we have listed some path expressions and the result of the
expressions:

Path Expression    Result
/bookstore/*       Selects all the child element nodes of the bookstore element
//*                Selects all elements in the document
//title[@*]        Selects all title elements which have at least one attribute of any kind

Selecting Several Paths


By using the | operator in an XPath expression you can select several paths.

In the table below we have listed some path expressions and the result of the
expressions:

Path Expression                    Result
//book/title | //book/price        Selects all the title AND price elements of all book elements
//title | //price                  Selects all the title AND price elements in the document
/bookstore/book/title | //price    Selects all the title elements of the book element of the bookstore element AND all the price elements in the document

What is XQuery
XQuery is a functional language that is used to retrieve information stored in XML
format. XQuery can be used on XML documents, relational databases containing data in
XML formats, or XML Databases. XQuery 3.0 is a W3C recommendation from April 8,
2014.
XQuery is a standardized language for combining documents, databases, Web pages and almost
anything else. It is very widely implemented. It is powerful and easy to learn. XQuery is replacing
proprietary middleware languages and Web Application development languages. XQuery is
replacing complex Java or C++ programs with a few lines of code. XQuery is simpler to work with
and easier to maintain than many other alternatives.

Characteristics
 Functional Language − XQuery is a functional language for retrieving and querying
XML-based data.
 Analogous to SQL − XQuery is to XML what SQL is to databases.
 XPath based − XQuery uses XPath expressions to navigate through XML
documents.
 Widely accepted − XQuery is supported by many major database systems.
 W3C Standard − XQuery is a W3C standard.

Benefits of XQuery
 Using XQuery, both hierarchical and tabular data can be retrieved.
 XQuery can be used to query tree and graph structures.
 XQuery can be directly used to query webpages.
 XQuery can be directly used to build webpages.
 XQuery can be used to transform XML documents.
 XQuery is ideal for XML-based databases and object-based databases. Object
databases are much more flexible and powerful than purely tabular databases.

DATA WAREHOUSE

 A Data Warehouse consists of data from multiple heterogeneous data
sources and is used for analytical reporting and decision making.
 A Data Warehouse is a central place where data from different data
sources and applications is stored.
 The term Data Warehouse was coined by Bill Inmon in 1990.
 A Data Warehouse is always kept separate from an Operational Database.
 The data in a DW system is loaded from operational transaction systems like −

 Sales
 Marketing
 HR
 SCM, etc.
 It may pass through an operational data store or other transformations before it
is loaded into the DW system for information processing.
 A Data Warehouse is used for reporting and analysis of information and
stores both historical and current data. The data in a DW system is used for
analytical reporting, which is later used by Business Analysts, Sales
Managers or Knowledge workers for decision-making.

In the above image, you can see that the data is coming from multiple
heterogeneous data sources to a Data Warehouse. Common data sources for a data
warehouse include −

 Operational databases
 SAP and non-SAP Applications
 Flat Files (xls, csv, txt files)
Data in a data warehouse is accessed by BI (Business Intelligence) users for
Analytical Reporting, Data Mining and Analysis. It is used for decision making
by Business Users, Sales Managers and Analysts to define future strategy.

FEATURES OF A DATA WAREHOUSE


 It is a central data repository where data is stored from one or more
heterogeneous data sources.
 A DW system stores both current and historical data. Normally a DW
system stores 5-10 years of historical data.
 A DW system is always kept separate from an operational transaction
system.
 The data in a DW system is used for different types of analytical reporting, ranging
from quarterly to annual comparisons.

DATA WAREHOUSE VS OPERATIONAL DATABASE

The differences between a Data Warehouse and an Operational Database are as
follows −
 An Operational System is designed for known workloads and transactions
like updating a user record, searching a record, etc. However, Data
Warehouse transactions are more complex and present a general form of
data.
 An Operational System contains the current data of an organization, while a
Data Warehouse normally contains the historical data.
 An Operational Database supports parallel processing of multiple
transactions. Concurrency control and recovery mechanisms are required to
maintain consistency of the database.
 An Operational Database query allows both read and modify operations
(Insert, Delete and Update), while an OLAP query needs only read-only
access to stored data (Select statement).

ARCHITECTURE OF DATA WAREHOUSE

Data Warehousing involves data cleaning, data integration, and data consolidations.
A Data Warehouse has a 3-layer architecture −

Data Source Layer

It defines how the data comes to a Data Warehouse. It involves various data
sources and operational transaction systems, flat files, applications, etc.
Integration Layer

It consists of Operational Data Store and Staging area. Staging area is used to
perform data cleansing, data transformation and loading data from different sources
to a data warehouse. As multiple data sources are available for extraction at
different time zones, staging area is used to store the data and later to apply
transformations on data.

Presentation Layer

This is used to perform BI reporting by end users. The data in a DW system is
accessed by BI users and used for reporting and analysis.
The following illustration shows the common architecture of a Data Warehouse
System.

Characteristics of a Data Warehouse

The following are the key characteristics of a Data Warehouse −

 Subject Oriented − In a DW system, the data is categorized and stored by
business subject, such as equity plans, shares or loans, rather than by
application.
 Integrated − Data from multiple data sources is integrated in a Data
Warehouse.
 Non Volatile − Data in a data warehouse is non-volatile. It means that once data is
loaded into the DW system, it is not altered.
 Time Variant − A DW system contains historical data, as compared to a
Transactional system which contains only current data. In a Data Warehouse
you can see data for 3 months, 6 months, 1 year, 5 years, etc.

OLTP vs OLAP

OLTP stands for Online Transaction Processing, while OLAP stands
for Online Analytical Processing.
An OLTP system handles a large number of short online transactions such as
INSERT, UPDATE, and DELETE. The processing time of each transaction is very short,
and the system maintains data integrity in multi-access environments.
The effectiveness of an OLTP system is measured by the number of transactions per
second. An OLTP system contains current and detailed data
and is maintained in schemas that follow the entity model (3NF).
For Example −
A day-to-day transaction system in a retail store, where the customer records are
inserted, updated and deleted on a daily basis. Such a system provides fast query
processing for these short transactions.
In an OLAP system, there are fewer transactions than in a
transactional system. The queries executed are complex in nature and involve data
aggregations.

What is an Aggregation?

We save tables with aggregated data, such as yearly (1 row), quarterly (4 rows) or
monthly (12 rows) summaries. If someone then has to do a year-to-year comparison, only one
row per year has to be processed, whereas an un-aggregated table requires comparing all the
detail rows. This is called Aggregation.
There are various Aggregation functions that can be used in an OLAP system like
Sum, Avg, Max, Min, etc.
For Example −
SELECT Avg(salary)
FROM employee
WHERE title = 'Programmer';
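
The query above and the idea of a pre-aggregated table can be tried with Python's standard sqlite3 module. This is a small sketch on an in-memory database. The employee rows and the summary table name salary_by_title are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, title TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Asha", "Programmer", 50000),
        ("Ravi", "Programmer", 70000),
        ("Meena", "Analyst", 65000),
    ],
)

# The query from the notes: one aggregated value instead of many rows.
avg_salary = conn.execute(
    "SELECT Avg(salary) FROM employee WHERE title = 'Programmer'"
).fetchone()[0]

# A pre-aggregated summary table: one row per title, computed once.
conn.execute(
    "CREATE TABLE salary_by_title AS "
    "SELECT title, Avg(salary) AS avg_salary, Count(*) AS headcount "
    "FROM employee GROUP BY title"
)
summary = conn.execute(
    "SELECT title, avg_salary, headcount FROM salary_by_title ORDER BY title"
).fetchall()

print(avg_salary)
print(summary)
```

Later reports can read salary_by_title directly instead of scanning every employee row, which is the saving the paragraph above describes.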

Key Differences

These are the major differences between an OLAP and an OLTP system.
 Indexes − An OLTP system has only a few indexes, while an OLAP system
has many indexes for performance optimization.
 Joins − In an OLTP system, the data is normalized and queries involve a large
number of joins. In an OLAP system, the data is de-normalized and there are fewer joins.
 Aggregation − In an OLTP system, data is not aggregated, while an OLAP
database uses more aggregations.
 Normalization − An OLTP system contains normalized data, whereas data is
not normalized in an OLAP system.

Data Mart Vs Data Warehouse

Data mart focuses on a single functional area and represents the simplest form of a
Data Warehouse. Consider a Data Warehouse that contains data for Sales,
Marketing, HR, and Finance. A Data mart focuses on a single functional area like
Sales or Marketing.
In the above image, you can see the difference between a Data Warehouse and a data
mart.

Fact vs Dimension Table

A fact table represents the measures on which analysis is performed. It also
contains foreign keys for the dimension keys.
For example − Every sale is a fact.

Cust Id    Prod Id    Time Id    Qty Sold
1110       25         2          125
1210       28         4          252

The Dimension table represents the characteristics of a dimension. A Customer
dimension can have Customer_Name, Phone_No, Sex, etc.

Cust Id    Cust_Name    Phone         Sex
1110       Sally        1113334444    F
1210       Adam         2225556666    M
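
How the fact table and the dimension table work together can be sketched with Python's standard sqlite3 module, using the sample rows above. The table names fact_sales and dim_customer and the column names are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    cust_id   INTEGER PRIMARY KEY,
    cust_name TEXT,
    phone     TEXT,
    sex       TEXT
);
CREATE TABLE fact_sales (
    cust_id  INTEGER REFERENCES dim_customer(cust_id),  -- foreign key to the dimension
    prod_id  INTEGER,
    time_id  INTEGER,
    qty_sold INTEGER                                     -- the measure
);
INSERT INTO dim_customer VALUES (1110, 'Sally', '1113334444', 'F');
INSERT INTO dim_customer VALUES (1210, 'Adam',  '2225556666', 'M');
INSERT INTO fact_sales VALUES (1110, 25, 2, 125);
INSERT INTO fact_sales VALUES (1210, 28, 4, 252);
""")

# Join the fact table to the dimension to describe each measure by name.
rows = conn.execute("""
    SELECT d.cust_name, SUM(f.qty_sold) AS total_qty
    FROM fact_sales AS f
    JOIN dim_customer AS d ON d.cust_id = f.cust_id
    GROUP BY d.cust_name
    ORDER BY d.cust_name
""").fetchall()

print(rows)
```

The fact table holds only keys and numbers, while the descriptive text used for grouping and filtering comes from the dimension table.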

MULTIDIMENSIONAL DATA MODELING

 The multidimensional data model is a method for ordering
data in the database, with good arrangement and assembly of the
contents of the database. ...
 OLAP (online analytical processing) and data warehousing use
multidimensional databases. They are used to show multiple dimensions of the data to
users.
 Dimensional Modeling (DM) is a data structure technique optimized for data
storage in a Data Warehouse.
 The purpose of dimensional modeling is to optimize the database for faster
retrieval of data.
 The concept of Dimensional Modeling was developed by Ralph Kimball and
consists of "fact" and "dimension" tables.
 A dimensional model in a data warehouse is designed to read, summarize and
analyze numeric information like values, balances, counts, weights, etc. in a
data warehouse. In contrast, relational models are optimized for addition,
updating and deletion of data in a real-time Online Transaction System.
 The dimensional and relational models each have their own way of storing data,
with specific advantages.

Elements of Dimensional Data Model


Fact
Facts are the measurements or metrics from your business process. For
a Sales business process, a measurement would be the quarterly sales number.

Dimension
A dimension provides the context surrounding a business process event. In
simple terms, dimensions give the who, what and where of a fact. In the Sales business
process, for the fact quarterly sales number, the dimensions would be

 Who – Customer Names
 Where – Location
 What – Product Name
In other words, a dimension is a window to view information in the facts.

Attributes
The Attributes are the various characteristics of the dimension in dimensional
data modeling.

In the Location dimension, the attributes can be

 State
 Country
 Zipcode etc.

Attributes are used to search, filter, or classify facts. Dimension Tables contain
Attributes.

Fact Table
A fact table is a primary table in dimension modelling.

A Fact Table contains

1. Measurements/facts
2. Foreign key to dimension table

Dimension Table
 A dimension table contains the dimensions of a fact.
 Dimension tables are joined to the fact table via a foreign key.
 Dimension tables are de-normalized tables.
 The dimension attributes are the various columns in a dimension table.
 Dimensions offer descriptive characteristics of the facts with the help of
their attributes.
 There is no set limit on the number of dimensions.
 A dimension can also contain one or more hierarchical relationships.
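
As a minimal sketch of how a fact table points at its dimension tables, the following Python snippet uses the standard-library sqlite3 module. The table and column names (dim_product, dim_location, fact_sales, units_sold, revenue) and the sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,   -- key of the dimension row
    name TEXT, type TEXT               -- descriptive attributes
);
CREATE TABLE dim_location (
    location_key INTEGER PRIMARY KEY,
    city TEXT, state TEXT, country TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),     -- foreign key
    location_key INTEGER REFERENCES dim_location(location_key),  -- foreign key
    units_sold INTEGER,                                           -- measure
    revenue REAL                                                  -- measure
);
""")
conn.execute("INSERT INTO dim_product VALUES (1, 'Modem', 'Network')")
conn.execute("INSERT INTO dim_location VALUES (1, 'Toronto', 'Ontario', 'Canada')")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 3, 150.0)")

# Join the fact row to its dimensions so the measures gain context
row = conn.execute("""
    SELECT p.name, l.country, f.units_sold, f.revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_location l ON l.location_key = f.location_key
""").fetchone()
print(row)
```

The fact row stores only keys and numbers; the names and places live in the dimension tables and are picked up through the joins.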

Types of Dimensions in Data Warehouse


Following are the Types of Dimensions in Data Warehouse:

 Conformed Dimension
 Outrigger Dimension
 Shrunken Dimension
 Role-playing Dimension
 Dimension to Dimension Table
 Junk Dimension
 Degenerate Dimension
 Swappable Dimension
 Step Dimension

Steps of Dimensional Modelling


The accuracy with which you create your dimensional model determines the success of
your data warehouse implementation. Here are the steps to create a dimensional
model:

1. Identify Business Process
2. Identify Grain (level of detail)
3. Identify Dimensions
4. Identify Facts
5. Build Schema

The model should describe the Why, How much, When/Where/Who and What
of your business process.

Step 1) Identify the Business Process


Identify the actual business process the data warehouse should cover. This could
be Marketing, Sales, HR, etc., as per the data analysis needs of the
organization. The selection of the business process also depends on the
quality of data available for that process. It is the most important step of the
data modelling process, and a failure here would have cascading and
irreparable defects.

To describe the business process, you can use plain text or use basic Business
Process Modelling Notation (BPMN) or Unified Modelling Language (UML).

Step 2) Identify the Grain


The Grain describes the level of detail for the business problem/solution. It is
the process of identifying the lowest level of information for any table in your
data warehouse. If a table contains sales data for every day, then it has
daily granularity. If a table contains total sales data for each month, then it has
monthly granularity.

During this stage, you answer questions like

1. Do we need to store all the available products or just a few types of
products? This decision is based on the business processes selected for
the data warehouse.
2. Do we store the product sale information on a monthly, weekly, daily or
hourly basis? This decision depends on the nature of reports requested
by executives.
3. How do the above two choices affect the database size?

Example of Grain:

The CEO at an MNC wants to find the sales for specific products in different
locations on a daily basis.

So, the grain is “product sale information by location by the day.”
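
The effect of the grain on what can be answered, and on table size, can be sketched in a few lines of Python. The rows below are made-up sample data; the idea is only that rows stored at daily grain can be summarized to monthly grain, while the reverse is not possible.

```python
from collections import defaultdict

# Hypothetical rows at daily grain: (date, product, location, sales)
daily_rows = [
    ("2024-01-01", "Modem", "Toronto", 100),
    ("2024-01-02", "Modem", "Toronto", 150),
    ("2024-02-01", "Modem", "Toronto", 200),
]

# Coarser monthly grain: one row per (month, product, location)
monthly = defaultdict(int)
for date, product, location, sales in daily_rows:
    month = date[:7]                      # "YYYY-MM"
    monthly[(month, product, location)] += sales

monthly_rows = dict(monthly)
print(monthly_rows)
```

The monthly table is smaller, but the split between 1 January and 2 January can no longer be recovered from it.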

Step 3) Identify the Dimensions


Dimensions are nouns like date, store, inventory, etc. These dimensions are
where all the descriptive data should be stored. For example, the date dimension may
contain data like a year, month and weekday.

Example of Dimensions:

The CEO at an MNC wants to find the sales for specific products in different
locations on a daily basis.
Dimensions: Product, Location and Time

Attributes: For Product: Product key (primary key of the dimension, referenced as a foreign key from the fact table), Name, Type, Specifications

Hierarchies: For Location: Country, State, City, Street Address, Name
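
As an illustration of a date dimension holding a year, month and weekday, here is a small Python sketch. The function name build_date_dimension and the date_key format (YYYYMMDD as an integer) are assumptions made for this example, not something prescribed by the text.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def build_date_dimension(start, days):
    """Return one dimension row per calendar day, starting at `start`."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "date_key": d.year * 10000 + d.month * 100 + d.day,
            "year": d.year,
            "month": d.month,
            "weekday": WEEKDAYS[d.weekday()],   # weekday() gives Monday = 0
        })
    return rows

date_dim = build_date_dimension(date(2024, 1, 1), 3)
for r in date_dim:
    print(r)
```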

Step 4) Identify the Fact


This step is closely associated with the business users of the system because this
is where they get access to data stored in the data warehouse. Most of the fact
table columns hold numerical values like price or cost per unit.

Example of Facts:

The CEO at an MNC wants to find the sales for specific products in different
locations on a daily basis.

The fact here is Sum of Sales by product by location by time.
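
The fact "sum of sales by product by location by time" corresponds to a grouped aggregate. A minimal sketch with Python's sqlite3 module follows; the sales table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, location TEXT, day TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Modem", "Toronto", "2024-01-01", 100.0),
    ("Modem", "Toronto", "2024-01-01", 50.0),
    ("Mobile", "Vancouver", "2024-01-01", 300.0),
])

# One output row per (product, location, day) combination
totals = conn.execute("""
    SELECT product, location, day, SUM(amount)
    FROM sales
    GROUP BY product, location, day
    ORDER BY product, location, day
""").fetchall()
print(totals)
```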

Step 5) Build Schema


In this step, you implement the dimensional model. A schema is nothing but the
database structure (arrangement of tables). There are two popular schemas:

1. Star Schema

The star schema architecture is easy to design. It is called a star schema
because its diagram resembles a star, with points radiating from a center. The
center of the star consists of the fact table, and the points of the star are
the dimension tables.

The fact table in a star schema is in third normal form, whereas the
dimension tables are de-normalized.

2. Snowflake Schema

The snowflake schema is an extension of the star schema. In a snowflake
schema, each dimension is normalized and connected to more dimension
tables.

Rules for Dimensional Modelling


Following are the rules and principles of Dimensional Modeling:

 Load atomic data into dimensional structures.
 Build dimensional models around business processes.
 Ensure that every fact table has an associated date dimension
table.
 Ensure that all facts in a single fact table are at the same grain or level of
detail.
 Store report labels and filter domain values in dimension
tables.
 Ensure that dimension tables use a surrogate key.
 Continuously balance requirements and realities to deliver a business
solution that supports decision-making.

Benefits of Dimensional Modeling


 Standardization of dimensions allows easy reporting across areas of the
business.
 Dimension tables store the history of the dimensional information.
 It allows an entirely new dimension to be introduced without major disruptions to
the fact table.
 Dimensional models store data in such a fashion that it is easier to
retrieve the information once the data is stored in the
database.
 Compared to the normalized model, dimensional tables are easier to
understand.
 Information is grouped into clear and simple business categories.
 The dimensional model is very understandable by the business. This
model is based on business terms, so that the business knows what
each fact, dimension, or attribute means.
 Dimensional models are denormalized and optimized for fast data
querying. Many relational database platforms recognize this model and
optimize query execution plans to aid in performance.
 Dimensional modelling in a data warehouse creates a schema which is
optimized for high performance. It means fewer joins, although
denormalization introduces some data redundancy.
 The dimensional model also helps to boost query performance. It is more
denormalized, and therefore it is optimized for querying.
 Dimensional models can comfortably accommodate change. Dimension
tables can have more columns added to them without affecting existing
business intelligence applications using these tables.
STAR AND SNOWFLAKE SCHEMA

What is Multidimensional schema?


Multidimensional Schema is especially designed to model data warehouse
systems. The schemas are designed to address the unique needs of very large
databases designed for the analytical purpose (OLAP).

Types of Data Warehouse Schema:

Following are the 3 chief types of multidimensional schemas, each having its
unique advantages.

 Star Schema
 Snowflake Schema
 Galaxy Schema

What is a Star Schema?


A Star Schema in a data warehouse is a schema in which the center of the star has one
fact table, surrounded by a number of associated dimension tables. It is known as a star
schema because its structure resembles a star. The Star Schema data model is the
simplest type of Data Warehouse schema. It is also known as Star Join
Schema and is optimized for querying large data sets.

In the following Star Schema example, the fact table is at the center and
contains keys to every dimension table, like Dealer_ID, Model ID, Date_ID,
Product_ID and Branch_ID, and other attributes like Units sold and revenue.

Example of Star Schema Diagram

Characteristics of Star Schema:
 Every dimension in a star schema is represented by only one
dimension table.
 The dimension table should contain the set of attributes.
 The dimension table is joined to the fact table using a foreign key.
 The dimension tables are not joined to each other.
 The fact table contains keys and measures.
 The Star schema is easy to understand and provides optimal disk usage.
 The dimension tables are not normalized. For instance, in the above
figure, Country_ID does not have a Country lookup table as an OLTP
design would have.
 The schema is widely supported by BI tools.

What is a Snowflake Schema?


A Snowflake Schema in a data warehouse is a logical arrangement of tables in a
multidimensional database such that the ER diagram resembles a snowflake
shape. A Snowflake Schema is an extension of a Star Schema, and it adds
additional dimensions. The dimension tables are normalized, which splits data
into additional tables.

In the following Snowflake Schema example, Country is further normalized into
an individual table.

Example of Snowflake Schema

Characteristics of Snowflake Schema:
 The main benefit of the snowflake schema is that it uses less disk space.
 It is easier to implement when a dimension is added to the schema.
 Due to multiple tables, query performance is reduced.
 The primary challenge that you will face while using the snowflake
schema is that you need to perform more maintenance because
of the larger number of lookup tables.
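
The difference between keeping Country inside the location dimension (star) and normalizing it into its own table (snowflake) can be sketched with Python's sqlite3 module. All table names here (star_dim_location, snow_dim_location, snow_dim_country, fact_sales) and the sample rows are hypothetical; the point is only that the snowflake query needs one more join to reach the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: country name is repeated inside the de-normalized location dimension
CREATE TABLE star_dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Snowflake: country is split out into its own lookup table
CREATE TABLE snow_dim_country (country_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE snow_dim_location (location_key INTEGER PRIMARY KEY, city TEXT,
                                country_id INTEGER REFERENCES snow_dim_country(country_id));
CREATE TABLE fact_sales (location_key INTEGER, revenue REAL);

INSERT INTO star_dim_location VALUES (1, 'Toronto', 'Canada'), (2, 'Vancouver', 'Canada');
INSERT INTO snow_dim_country VALUES (10, 'Canada');
INSERT INTO snow_dim_location VALUES (1, 'Toronto', 10), (2, 'Vancouver', 10);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 250.0);
""")

# Star schema: a single join from the fact table to the dimension
star_total = conn.execute("""
    SELECT l.country, SUM(f.revenue)
    FROM fact_sales f
    JOIN star_dim_location l ON l.location_key = f.location_key
    GROUP BY l.country
""").fetchall()

# Snowflake schema: a second join is needed to reach the country name
snow_total = conn.execute("""
    SELECT c.country, SUM(f.revenue)
    FROM fact_sales f
    JOIN snow_dim_location l ON l.location_key = f.location_key
    JOIN snow_dim_country c ON c.country_id = l.country_id
    GROUP BY c.country
""").fetchall()
print(star_total, snow_total)
```

In the star version the string 'Canada' is stored once per city; in the snowflake version it is stored once, at the price of the extra join.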

Star Schema Vs Snowflake Schema: Key Differences

Following are the key differences between the Star schema and the Snowflake schema:

Star Schema | Snowflake Schema
Hierarchies for the dimensions are stored in the dimensional table. | Hierarchies are divided into separate tables.
It contains a fact table surrounded by dimension tables. | One fact table surrounded by dimension tables, which are in turn surrounded by dimension tables.
In a star schema, only a single join creates the relationship between the fact table and any dimension table. | A snowflake schema requires many joins to fetch the data.
Simple DB design. | Very complex DB design.
Denormalized data structure; queries also run faster. | Normalized data structure.
High level of data redundancy. | Very low level of data redundancy.
A single dimension table contains aggregated data. | Data is split into different dimension tables.
Cube processing is faster. | Cube processing might be slow because of the complex join.
Offers higher performing queries using Star Join Query Optimization. Tables may be connected with multiple dimensions. | The Snowflake schema is represented by a centralized fact table which is unlikely to be connected with multiple dimensions.

What is a Galaxy Schema?


A Galaxy Schema contains two fact tables that share dimension tables between
them. It is also called Fact Constellation Schema. The schema is viewed as a
collection of stars, hence the name Galaxy Schema.

Example of Galaxy Schema


As you can see in the above example, there are two fact tables:

1. Revenue
2. Product

In a Galaxy schema, shared dimensions are called Conformed Dimensions.
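
A rough sketch of two fact tables sharing one dimension, using Python's sqlite3 module, is given here. The tables fact_revenue and fact_product stand in for the Revenue and Product fact tables named above; the shared dim_date table, the columns and the rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_revenue (date_key INTEGER REFERENCES dim_date(date_key), revenue REAL);
CREATE TABLE fact_product (date_key INTEGER REFERENCES dim_date(date_key), units_sold INTEGER);
INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO fact_revenue VALUES (1, 100.0), (2, 300.0), (2, 50.0);
INSERT INTO fact_product VALUES (1, 4), (2, 7);
""")

# Aggregate each fact table on its own, then line the results up
# through the dimension that both fact tables share
report = conn.execute("""
    SELECT d.year, r.total_revenue, p.total_units
    FROM dim_date d
    JOIN (SELECT date_key, SUM(revenue) AS total_revenue
          FROM fact_revenue GROUP BY date_key) r ON r.date_key = d.date_key
    JOIN (SELECT date_key, SUM(units_sold) AS total_units
          FROM fact_product GROUP BY date_key) p ON p.date_key = d.date_key
    ORDER BY d.year
""").fetchall()
print(report)
```

Each fact table is summed before the join so that rows from one fact table do not multiply rows from the other.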

Characteristics of Galaxy Schema:


 The dimensions in this schema are split into separate dimensions
based on the various levels of hierarchy.
 For example, if geography has four levels of hierarchy like region,
country, state, and city, then the Galaxy schema should have four
dimensions.
 Moreover, it is possible to build this type of schema by splitting one
star schema into more star schemas.
 The dimensions in this schema are large, since they are built based
on the levels of hierarchy.
 This schema is helpful for aggregating fact tables for better
understanding.

What is Star Cluster Schema?


A snowflake schema contains fully expanded hierarchies. However, this can add
complexity to the schema and require extra joins. On the other hand, a star
schema contains fully collapsed hierarchies, which may lead to redundancy.
So, the best solution may be a balance between these two schemas, which is
the Star Cluster Schema design.

Example of Star Cluster Schema


Overlapping dimensions can be found as forks in hierarchies. A fork happens
when an entity acts as a parent in two different dimensional hierarchies. Fork
entities are then identified as classifications with one-to-many relationships.

OLAP OPERATIONS AND QUERIES


An Online Analytical Processing (OLAP) server is based on the multidimensional data model.
It allows managers and analysts to get an insight into the information through fast,
consistent, and interactive access to information. This chapter covers the types of OLAP,
operations on OLAP, and the difference between OLAP, statistical databases and OLTP.

OLAP Operations
Since OLAP servers are based on multidimensional view of data, we will discuss OLAP
operations in multidimensional data.
Here is the list of OLAP operations −

 Roll-up
 Drill-down
 Slice and dice
 Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in any of the following ways −

 By climbing up a concept hierarchy for a dimension
 By dimension reduction

The following diagram illustrates how roll-up works.

 Roll-up is performed by climbing up a concept hierarchy for the dimension
location.
 Initially the concept hierarchy was "street < city < province < country".
 On rolling up, the data is aggregated by ascending the location hierarchy from the
level of city to the level of country.
 The data is grouped into countries rather than cities.
 When roll-up is performed by dimension reduction, one or more dimensions from the
data cube are removed.
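
Roll-up along the location hierarchy can be sketched in plain Python. The cube cells, the unit counts and the city-to-country mapping below are made-up sample data, and roll_up_location is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, quarter, item) -> units sold
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Vancouver", "Q1", "Mobile"): 825,
    ("Chicago", "Q1", "Mobile"): 440,
    ("New York", "Q1", "Mobile"): 1560,
}
# One step of the location concept hierarchy: city < country
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada",
                   "Chicago": "USA", "New York": "USA"}

def roll_up_location(cells, mapping):
    """Aggregate city-level cells up to the country level."""
    rolled = defaultdict(int)
    for (city, quarter, item), units in cells.items():
        rolled[(mapping[city], quarter, item)] += units
    return dict(rolled)

by_country = roll_up_location(cube, city_to_country)
print(by_country)
```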

Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the following
ways −

 By stepping down a concept hierarchy for a dimension
 By introducing a new dimension

The following diagram illustrates how drill-down works −

 Drill-down is performed by stepping down a concept hierarchy for the dimension
time.
 Initially the concept hierarchy was "day < month < quarter < year".
 On drilling down, the time dimension is descended from the level of quarter to the
level of month.
 When drill-down is performed by introducing a new dimension, one or more dimensions
are added to the data cube.
 It navigates the data from less detailed data to highly detailed data.
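
Drilling down from quarter to month can be sketched as follows. The month-level cells and the drill_down helper are hypothetical; note that the detailed month-level data must already be stored for a drill-down to be possible.

```python
# Hypothetical month-level cells: (city, month, item) -> units sold
monthly_cells = {
    ("Toronto", "Jan", "Mobile"): 150,
    ("Toronto", "Feb", "Mobile"): 100,
    ("Toronto", "Mar", "Mobile"): 150,
    ("Toronto", "Apr", "Mobile"): 200,
}
# One step of the time concept hierarchy: month < quarter
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

def drill_down(cells, quarter):
    """Return the month-level cells that make up one quarter."""
    return {key: units for key, units in cells.items()
            if month_to_quarter[key[1]] == quarter}

q1_detail = drill_down(monthly_cells, "Q1")
q1_total = sum(q1_detail.values())   # the figure seen at quarter level
print(q1_detail, q1_total)
```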

Slice
The slice operation performs a selection on one particular dimension of a given cube and
provides a new sub-cube. Consider the following diagram that shows how slice works.
 Here Slice is performed for the dimension "time" using the criterion time = "Q1".
 It forms a new sub-cube over the remaining dimensions.
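
A slice on time = "Q1" can be sketched in Python like this. The cube cells are made-up sample data and slice_time is a hypothetical helper name.

```python
# Hypothetical cube cells: (location, quarter, item) -> units sold
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q2", "Mobile"): 680,
    ("Vancouver", "Q1", "Modem"): 14,
}

def slice_time(cells, quarter):
    """Fix the time dimension to one value; keep (location, item) cells."""
    return {(loc, item): units
            for (loc, q, item), units in cells.items() if q == quarter}

q1_slice = slice_time(cube, "Q1")
print(q1_slice)
```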

Dice
Dice performs a selection on two or more dimensions of a given cube and provides a new sub-cube.
Consider the following diagram that shows the dice operation.
The dice operation on the cube based on the following selection criteria involves three
dimensions.

 (location = "Toronto" or "Vancouver")
 (time = "Q1" or "Q2")
 (item = "Mobile" or "Modem")
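
The three selection criteria above can be applied to a small cube in Python as follows. The cell values are made-up sample data and dice is a hypothetical helper name.

```python
# Hypothetical cube cells: (location, quarter, item) -> units sold
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q3", "Mobile"): 812,
    ("Vancouver", "Q2", "Modem"): 31,
    ("Chicago", "Q1", "Modem"): 38,
    ("Vancouver", "Q1", "Phone"): 14,
}

def dice(cells, locations, quarters, items):
    """Keep only cells that satisfy the criteria on all three dimensions."""
    return {key: units for key, units in cells.items()
            if key[0] in locations and key[1] in quarters and key[2] in items}

sub_cube = dice(cube, {"Toronto", "Vancouver"}, {"Q1", "Q2"}, {"Mobile", "Modem"})
print(sub_cube)
```

Unlike a slice, the result still has all three dimensions; each is simply restricted to fewer values.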

Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to
provide an alternative presentation of data. Consider the following diagram that shows
the pivot operation.
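
For a two-dimensional view, a pivot amounts to swapping rows and columns. A minimal Python sketch with made-up values:

```python
# A 2-D view of the cube: rows are locations, columns are items
view = {
    "Toronto":   {"Mobile": 605, "Modem": 14},
    "Vancouver": {"Mobile": 825, "Modem": 31},
}

def pivot(table):
    """Rotate the view so that items become rows and locations columns."""
    rotated = {}
    for row_label, columns in table.items():
        for col_label, value in columns.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated

pivoted = pivot(view)
print(pivoted)
```

The values are unchanged; only the presentation of the axes differs.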
OLAP QUERIES
Online Analytical Processing (OLAP) databases facilitate business-intelligence queries.
OLAP is a database technology that has been optimized for querying and reporting,
instead of processing transactions. ... OLAP data is also organized hierarchically and
stored in cubes instead of tables.

Suppose you have to run an OLAP query that computes the sum of Sales in a table, with a
where clause on Country = 'US':
SELECT SUM(Sales) FROM FCT_SALES WHERE Country = 'US';
If the storage type is column-based, all the values for Sales are stored together in
adjacent memory cells, so when the 'Sum' aggregation is performed it will be much
faster than the same query on a row-based, OLTP-style table.
If the table uses row-based storage, values of different columns and data types are stored
together, and when a 'Sum' aggregation is performed it is harder to find the values for the 'Sales'
column.
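
The contrast between row-based and column-based storage can be imitated with ordinary Python lists. This is only a toy model of the two layouts with made-up rows, not how any particular database stores data.

```python
# Row-based layout: each record keeps all of its fields together
row_store = [
    {"Country": "US", "Product": "Modem", "Sales": 100},
    {"Country": "CA", "Product": "Modem", "Sales": 50},
    {"Country": "US", "Product": "Mobile", "Sales": 200},
]

# Column-based layout: each column is kept as its own contiguous list
column_store = {
    "Country": ["US", "CA", "US"],
    "Product": ["Modem", "Modem", "Mobile"],
    "Sales": [100, 50, 200],
}

# Row store: every whole record is visited just to pick out Sales
row_total = sum(r["Sales"] for r in row_store if r["Country"] == "US")

# Column store: only the Country and Sales columns are touched
col_total = sum(s for c, s in zip(column_store["Country"], column_store["Sales"])
                if c == "US")
print(row_total, col_total)
```

Both layouts give the same answer; the column layout simply never has to read the Product values for this query.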