A Conceptual Model for Multidimensional Data

Anand S. Kamble
Department of Information Technology
Government of India
New Delhi, India
Email: [email protected]
Abstract
This paper introduces a conceptual data model for data warehouses that includes multidimensional aggregation. It is based on the Entity-Relationship (ER) data model, which it gracefully extends with multidimensional aggregated entities. The model has a clear mathematical (model-theoretic) semantics grounded in standard ER semantics and the GMD logic-based multidimensional data model. The aim of this work is not to propose yet another conceptual data model, but to find the most general and precise formalism covering all the proposals for a conceptual data model in the data warehouse field, thereby making possible a formal comparison of the differences between the models in the literature, and to study the formal properties or extensions of such data models.
1 Introduction
The goal of this work is to extend the standard Entity-Relationship (ER) data model, as defined in the database textbooks, with constructs that allow the modeling of multidimensional aggregated entities together with their interrelationships with the other parts of the conceptual schema. An important aspect is that the conceptual data model is given a formal model-theoretic semantics by combining the well-known first-order semantics of standard ER, as described, for example, in (Borgida et al. 2003, Calvanese et al. 1998), with the model-theoretic semantics of the GMD logical multidimensional data model (Franconi & Kamble 2003, 2004b). This work also builds on similar preliminary work on the use of Description Logics as a means to give precise semantics to a data warehouse conceptual data model and to study its computational properties (Franconi & Sattler 1999). This paper presents the formal aspects, with well-defined syntax and model-theoretic semantics, of the conceptual data model introduced in (Franconi & Kamble 2004a).
The proposed framework is a novel data warehouse conceptual data model, CGMD, generalising the conceptual multidimensional data models in the data warehouse field. The aim of this work is not to propose yet another data model, but to find the most general, elegant and precise formalism encompassing all the proposals for a conceptual data model in the data warehouse field, for example those listed in (Phipps & Davis 2002), thereby making possible a formal comparison of the different expressivities of the models in the literature.

Copyright © 2008, Australian Computer Society, Inc. This paper appeared at the Fifth Asia-Pacific Conference on Conceptual Modelling (APCCM 2008), Wollongong, NSW, Australia, January 2008. Conferences in Research and Practice in Information Technology, Vol. 79. Annika Hinze and Markus Kirchberg, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.
The paper is organised as follows. Section 2 describes the CGMD model along with the required extensions to the standard Entity-Relationship data model. Sections 3 and 4 present, respectively, the syntax and the semantics of the CGMD model, both of which are defined mathematically. Section 5 reviews the related literature on multidimensional conceptual models. Section 6 evaluates the CGMD model against the criteria for a good conceptual multidimensional model. Section 7 compares the CGMD model with other multidimensional models. Finally, Section 8 briefly concludes the paper and outlines future work.
2 The CGMD Data Model
The CGMD model extends the ideas of the data warehouse conceptual data model first proposed in (Franconi & Sattler 1999), where aggregations and dimensions are first-class citizens. It abstracts the principles of a data warehouse and describes the multidimensional structure of the data of a business domain of an enterprise.
A CGMD model is based on an ER model. It captures database schemata expressed in an entity-relationship diagram and describes their multidimensional structure, including dimensions with their hierarchically organised levels and the structure of aggregations. It extends the standard ER schema with constructs for aggregated entities together with their interrelationships with the other parts of the schema. As stated in (Agrawal et al. 1997), a good data warehouse system should support user-definable multiple hierarchies along arbitrary dimensions. A CGMD model supports user-definable multiple hierarchies and can express aggregations along arbitrary dimensions and levels.
2.1 The CGMD: an Extended Entity-Relationship Model
This section describes the CGMD data model in terms of an ER model and presents the ER extensions. It also presents a methodology for designing a data warehouse from the standard (operational) ER schema, together with the structure of aggregations.
We describe the model with an example of telephone calls presented in Figure 1 (taken from (Franconi & Sattler 1999)). The entities Calls, Day and Point, and the relationships Date, Dest and Source, represent the base data. The cardinality constraints (1,1) on the Date relationship between the Calls and Day entities, and on the Source and Dest (destination) relationships between the Calls and Point entities, express that each call is issued on some date from some source point and is received at some destination point. The conceptual multidimensional data model obtained for this base data is exemplified in Figure 2. A basic multidimensional entity such as Calls in Figure 2 uses a standard star schema, i.e., it is represented by means of a weak entity with respect to its dimensions. In this example, this basic multidimensional entity may be useful for analysing the nature of telephone calls by considering, among others, the dimensions related to the origin and the destination of the calls with respect to the type of phone point (associated with consumer or business customers). So, the entity Calls represents a basic cube whose dimensions are Date, Dest and Source (identifying relationships), which are restricted to the basic levels Day, Point, and again Point (associated entities) respectively. This part of the diagram still makes use of standard constructs only.

[Figure 1: The Conceptual Schema for the base data of telephone calls. Calls (attributes: duration, no of calls) is connected with cardinality (1,1) through the relationship Date to Day (day/month/year, holiday?, weekday) and through Source and Dest to Point (address, code, type). Point has the subclasses Consumer and Business, and Cell, LandLine, DirectLine and PABX, under disjoint-total (d) constraints.]

[Figure 2: The Conceptual Schema for the Basic Multidimensional information for the base data considered in Figure 1. Calls (duration, no of calls) is linked with cardinality (1,1) through Date to Day and through Source and Dest to Point.]
Level building: To build the aggregation level hierarchies for each dimension, we consider the following:
- the discriminator of an entity (Elmasri & Navathe 2000);
- a generalisation/specialisation hierarchy: creating a single entity from all the subclasses (possibly disjoint) of a superclass;
- a one-to-many relationship;
- a partial relationship;
- a many-to-many relationship: converted into one-to-many relationships, which are then converted into levels as suggested in (Moody & Kortink 2000).
Taking these constraints into account, Figure 3 presents the multidimensional conceptual schema including the level hierarchies for each dimension. Outer boxes indicate levels, and inner boxes are their elements. The bold arrows (from lower level to higher level) denote the hierarchy. The levels Pointtype and Customertype are created from the partitions of the Point entity. The entity Point is partitioned (according to the attribute type, a discriminator (Elmasri & Navathe 2000)) into four basic point types and two higher-level point types. Pointtype aggregates the four basic point types (partitions), namely Cell, LandLine, DirectLine and PABX, and Customertype aggregates the two higher-level point types (partitions), namely Consumer and Business. Thus, the first three constraints (discriminator, generalisation/specialisation, one-to-many relationship) hold. Similarly, multiple hierarchies are created for the Date dimension (for instance, the hierarchy including the levels DAY, Month, Qtr and Year). The Holiday-NonHoliday level aggregates all holidays and non-holidays. In this case, however, Holiday-NonHoliday is an optional level, since the relationship between Day and Holiday or NonHoliday is partial; this is the partial-relationship case. The running example has no many-to-many relationships; handling them is straightforward, as suggested in (Moody & Kortink 2000).
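As an illustration of the level hierarchies described above, the following is a minimal sketch, not part of the original model definition, that represents the roll-up links of the Date dimension as Python functions. The function names, the encoding of Month and Qtr elements as tuples, and the holiday calendar are all assumptions made for the example; the partial Holiday-NonHoliday level is modelled by returning None for days that are not classified.

```python
from datetime import date
from typing import Callable, Optional

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_to_month(d: date) -> tuple:
    """Roll a DAY element up to its Month element, encoded as (year, month)."""
    return (d.year, d.month)


def month_to_qtr(m: tuple) -> tuple:
    """Roll a Month element up to its Qtr element, encoded as (year, quarter)."""
    year, month = m
    return (year, (month - 1) // 3 + 1)


def qtr_to_year(q: tuple) -> int:
    """Roll a Qtr element up to its Year element."""
    return q[0]


def day_to_weekday(d: date) -> str:
    """Second hierarchy on the Date dimension: DAY rolls up to Weekday."""
    return WEEKDAYS[d.weekday()]


# Hypothetical calendar: only classified days roll up to Holiday-NonHoliday.
HOLIDAY_STATUS = {date(2008, 1, 1): "Holiday", date(2008, 1, 2): "NonHoliday"}


def day_to_holiday_status(d: date) -> Optional[str]:
    """Partial roll-up: None stands for 'undefined' on unclassified days."""
    return HOLIDAY_STATUS.get(d)


def compose(*steps: Callable) -> Callable:
    """Chain roll-up steps from lower to higher level, propagating None."""
    def rolled(x):
        for step in steps:
            if x is None:
                return None
            x = step(x)
        return x
    return rolled


# DAY -> Month -> Qtr -> Year, skipping no intermediate level.
day_to_year = compose(day_to_month, month_to_qtr, qtr_to_year)
```

Composing the steps this way mirrors the bold arrows of Figure 3: each arrow is one function, and a path of arrows is their composition.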
Aggregation:
We now analyse the telephone calls along arbitrary dimensions. For example, the query "Analyse telephone calls by day and point type" is a bi-dimensional cube along the Date and Source dimensions, involving the level Day and the level Pointtype respectively. A conceptual schema for this query includes the definition of the basic cube (Figure 2) and the definition of the aggregation together with the definitions of the associated levels, i.e., a new aggregated entity Calls-by-Day-and-Pointtype denoting aggregations according to the basic level Day and the level Pointtype along the dimensions Date and Source respectively. Figure 4 presents the conceptual schema for this aggregated cube (query) in the variant of an Entity-Relationship model. This particular way of presenting an aggregation (entity) is adapted from UML (Unified Modeling Language) syntax.

[Figure 3: The Multidimensional Conceptual Schema for the data considered in Figure 1. On the Date dimension, the basic level Day carries the levels DAY, Month, Qtr and Year, the level Weekday (Mon, Tue, Wed, Thu, Fri, Sat, Sun) and the level Holiday-NonHoliday (Holiday, NonHoliday). On the Source and Dest dimensions, the basic level Point carries the levels Pointtype (Cell, LandLine, DirectLine, PABX) and Customertype (Consumer, Business).]
Now consider a multidimensional aggregated view, for example the analysis of telephone calls by weekday and customer type, composing telephone calls along the Date and Source dimensions and involving the levels Weekday and Customertype respectively. The conceptual schema for this aggregated view includes the definition of the basic cube and the definition of the aggregation, i.e., a new aggregated entity, say Calls-by-Weekday-and-Customertype (along with the definitions of the levels Weekday and Customertype), denoting aggregations according to the levels Weekday and Customertype along the Date and Source dimensions respectively. Figure 5 presents the conceptual schema in the variant of an ER model. This bi-dimensional aggregated view is actually computed from the aggregated cube of Figure 4, which shows that aggregations can be computed from pre-computed aggregations.
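The claim that an aggregation can be computed from a pre-computed aggregation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the sample cells, the point identifiers, the assignment of point types to customer types, and the choice to carry a summed duration plus a count of base cells (so that an average stays derivable at every level) are all hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical roll-up tables on the Source dimension.
POINTTYPE_OF_POINT = {"p1": "Cell", "p2": "LandLine", "p3": "PABX"}
CUSTOMERTYPE_OF_POINTTYPE = {
    "Cell": "Consumer", "LandLine": "Consumer",
    "DirectLine": "Business", "PABX": "Business",
}

# Basic cube: (day, source point) -> (no_of_calls, summed duration, base cells).
BASIC_CUBE = {
    (date(2008, 1, 7), "p1"): (3, 30.0, 1),
    (date(2008, 1, 7), "p2"): (1, 50.0, 1),
    (date(2008, 1, 14), "p2"): (2, 10.0, 1),
    (date(2008, 1, 8), "p3"): (4, 20.0, 1),
}


def aggregate(cube, roll_date, roll_source):
    """Group cells by their rolled-up coordinates and add up the measures."""
    totals = defaultdict(lambda: [0, 0.0, 0])
    for (d, s), (calls, duration, n_cells) in cube.items():
        acc = totals[(roll_date(d), roll_source(s))]
        acc[0] += calls
        acc[1] += duration
        acc[2] += n_cells
    return {coords: tuple(acc) for coords, acc in totals.items()}


def average_duration(cell):
    """Average duration per base cell, derived from the carried sum and count."""
    _calls, duration, n_cells = cell
    return duration / n_cells


def same(x):
    return x


def weekday(d):
    return WEEKDAYS[d.weekday()]


# Figure 4 style cube: by Day and Pointtype.
by_day_pointtype = aggregate(BASIC_CUBE, same, POINTTYPE_OF_POINT.__getitem__)

# Figure 5 style view, computed from the pre-computed cube above.
by_weekday_customertype = aggregate(
    by_day_pointtype, weekday, CUSTOMERTYPE_OF_POINTTYPE.__getitem__
)

# The same view computed directly from the basic cube, for comparison.
direct = aggregate(
    BASIC_CUBE, weekday,
    lambda p: CUSTOMERTYPE_OF_POINTTYPE[POINTTYPE_OF_POINT[p]],
)
```

In this sketch the two routes agree because sums and counts are distributive; an average stored on its own could not be re-aggregated this way, which is why the sum and count are carried instead.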
2.2 Extensions to the ER Model
As described above, a first extension to the standard ER model is the simple aggregated entity, i.e., a non-dimensional aggregation such as Weekday and Customertype, which represent dimensional levels built from the basic dimensional entities Day and Point respectively. A simple aggregation aggregates the collections of objects that are in the extensions of the aggregated entities. So, in our example, since the entities Mon,...,Sun form a partition of the entity Day, the Weekday entity denotes exactly seven objects, one for all the Mondays, one for all the Tuesdays, etc. Likewise, the aggregated entity Customertype denotes exactly two objects, one aggregating all consumer phone points and the other aggregating all business phone points. In this way, by interleaving partitioning and simple aggregations, we are able to construct level hierarchies starting from some basic dimensional level. Functional dependencies exist among the levels of a hierarchy, as analysed by (Golfarelli et al. 1998).
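The Weekday example can be made concrete with a small sketch. It is only an illustration under assumptions: the helper names and the choice of January 2008 as the set of Day instances are hypothetical. Each block of the partition becomes one object of the aggregated entity.

```python
from datetime import date, timedelta

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def simple_aggregation(instances, discriminator):
    """Partition the instances by a discriminator; each block is one object."""
    blocks = {}
    for x in instances:
        blocks.setdefault(discriminator(x), set()).add(x)
    return {label: frozenset(members) for label, members in blocks.items()}


def is_partition(blocks, instances):
    """Check that the blocks are pairwise disjoint and cover all instances."""
    members = [x for block in blocks.values() for x in block]
    return len(members) == len(set(members)) and set(members) == set(instances)


# Hypothetical extension of the entity Day: the 31 days of January 2008.
JANUARY_2008 = [date(2008, 1, 1) + timedelta(days=i) for i in range(31)]

# The aggregated entity Weekday: one object per weekday.
weekday_level = simple_aggregation(
    JANUARY_2008, lambda d: WEEKDAYS[d.weekday()]
)
```

However many days the extension of Day contains, the aggregated entity has at most seven objects, one per block of the partition.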
A second extension to the standard ER model is the multidimensional aggregated entity, exemplified in Figure 5 by the entity Calls-by-Weekday-and-Customertype and in Figure 4 by the entity Calls-by-Day-and-Pointtype. The entity Calls-by-Weekday-and-Customertype denotes all the cells of a cube whose coordinates are the weekdays of the dates of the calls and the customer types of the originators of the calls. The extension of such an entity satisfies the constraints enforced for a cube by the GMD-based semantics (Franconi & Kamble 2003, 2004b).
A multidimensional aggregated entity is itself an entity in the ER diagram. It can have attributes (for instance, total no of calls and average duration in Figure 5 or Figure 4, computed with the associated aggregation functions sum(no of calls) and average(duration) respectively), and it can take part in further relationships or constraints.
3 Syntax of the CGMD Data Model
The basic constructs of an ER schema are entities, relationships, and attributes. An entity is drawn as a rectangle around the entity symbol (entity name), whereas a relationship between entities is drawn as a diamond around the relationship symbol (relationship name). An attribute is drawn as a circle or oval outside or around the attribute symbol (attribute name). ER-roles are the edges (links) between entities and relationships and are labelled with number restrictions called cardinality constraints. An is-a link constraint is drawn as an arrow from the more specific entity (subclass) to the more general entity, called the superclass (respectively, from the more specific to the more general relationship). The disjoint-total constraint is drawn as a circle with a 'd' inside, connected by edges to the subclasses and by a double-lined arrow from the circle to the superclass. Weak entities are represented as double-lined rectangles, whereas identifying relationships are denoted by double-lined diamonds. An aggregated entity (simple aggregation) is drawn as a rectangle with an attached diamond, whereas a multidimensional aggregated entity is drawn as a shadowed rectangle with an attached diamond.

[Figure 4: A cube composing calls by level Day and level Pointtype along Date and Source dimensions respectively. The aggregated entity Calls-by-Day-and-Pointtype has the attributes total(no of calls) and average(duration).]

[Figure 5: The Conceptual Data Warehouse Schema for the multidimensional aggregated view presenting Calls by Weekday and Customertype, an aggregated cube (view). The aggregated entity Calls-by-Weekday-and-Customertype has the attributes total no of calls and average duration.]
Formally, the syntax of an Extended Entity-Relationship (EER) model is as follows.
Definition 1 (EER schema) An EER schema is constructed over the signature $\mathcal{S} = \langle \mathcal{E}, \mathcal{R}, \mathcal{A}, \mathcal{U}, \mathcal{D}, \preceq, card, |\ | \rangle$ where
- $\mathcal{E}$ is a finite set of entity names,
- $\mathcal{R}$ is a finite set of relationship names, each associated with an arity $k$,
- $\mathcal{A}$ is a finite set of attribute names,
- $\mathcal{U}$ is a finite set of ER-role names,
- $\mathcal{D}$ is a finite set of domain names,
- $\preceq\ \subseteq \mathcal{E} \times \mathcal{E}$ is a binary relation over $\mathcal{E}$, and
- $card$ is a function such that $card(E,R,U) = \langle cmin(E,R,U), cmax(E,R,U) \rangle \in \mathbb{N} \times \mathbb{N}$, where $cmin(E,R,U) \le cmax(E,R,U)$ for each $E \in \mathcal{E}$, $R \in \mathcal{R}$, $U \in \mathcal{U}$,
- $|\ |$ models aggregation over $\mathcal{E}$.
An EER schema $E$ over the signature $\mathcal{S}$ is
- a finite set of entities $E \in \mathcal{E}$,
- a finite set of relationship constraints, one for each relationship $R$ of arity $k$, such that $R \doteq [U_1 : E_1, \dots, U_k : E_k]$ where $R \in \mathcal{R}$, $E_i \in \mathcal{E}$ and $U_i \in \mathcal{U}$ for each $i$, $1 \le i \le k$,
- a finite set of attribute constraints such that $E \doteq A_1 : V_1, \dots, A_n : V_n$ where $A_i \in \mathcal{A}$, $V_i \in \mathcal{D}$, and $E \in \mathcal{E}$ is in the schema, for each $i$, $1 \le i \le n$,
- a finite set of is-a link constraints between two entities $E_1$ and $E_2$ such that $E_1 \preceq E_2$ (respectively between two relationships $R$, $S$ such that $R \preceq S$),
- a finite set of disjoint-total constraints between more specific entities $E_1, \dots, E_n$ and a more general entity $E$ such that $E_1 \preceq E, \dots, E_n \preceq E$, where $E_i \ne E_j$ for $i \ne j$, $i \le n$, $j \le n$, and $E, E_k \in \mathcal{E}$ for each $k = 1, \dots, n$,
- a finite set of aggregation links between entities; each aggregation link in the schema is drawn as an edge with an attached diamond at one end,
- a finite set of simple aggregations $G \in \mathcal{E}$, each involving $n$ entities $F_1, \dots, F_n$ (each connected to $G$ with an aggregation link),
- a finite set of aggregations $G$, each involving (connecting) $n$ relationships $D_1, \dots, D_n$, $n$ entities $L_1, \dots, L_n$ and a weak entity $F$.
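To make the signature of Definition 1 tangible, here is a minimal sketch that encodes it as a Python data structure with two sanity checks: the is-a relation must range over entity names, and cmin must not exceed cmax. The class name, field names, role names and the `problems` helper are hypothetical and chosen only for illustration; aggregation constructs are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Signature:
    """Illustrative encoding of the EER signature (aggregation omitted)."""
    entities: Set[str]
    relationships: Dict[str, int]          # relationship name -> arity k
    attributes: Set[str]
    roles: Set[str]
    domains: Set[str]
    isa: Set[Tuple[str, str]] = field(default_factory=set)
    # (entity, relationship, role) -> (cmin, cmax)
    card: Dict[Tuple[str, str, str], Tuple[int, int]] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """Return human-readable violations; an empty list means none found."""
        found = []
        for sub, sup in sorted(self.isa):
            if sub not in self.entities or sup not in self.entities:
                found.append(f"is-a pair ({sub}, {sup}) is not over entity names")
        for (e, r, u), (cmin, cmax) in sorted(self.card.items()):
            if (e not in self.entities or r not in self.relationships
                    or u not in self.roles):
                found.append(f"card({e}, {r}, {u}) uses an unknown name")
            if cmin > cmax:
                found.append(f"card({e}, {r}, {u}): cmin {cmin} exceeds cmax {cmax}")
        return found


# Part of the telephone example; the role names are assumptions.
TELEPHONE = Signature(
    entities={"Calls", "Day", "Point", "Consumer", "Business"},
    relationships={"Date": 2, "Source": 2, "Dest": 2},
    attributes={"duration", "no_of_calls"},
    roles={"call_on", "on_day"},
    domains={"Integer", "Real"},
    isa={("Consumer", "Point"), ("Business", "Point")},
    card={("Calls", "Date", "call_on"): (1, 1)},
)
```

Calling `TELEPHONE.problems()` returns an empty list for this signature, while a cardinality entry such as (2, 1) would be reported.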
Before giving the formal semantics of the EER model, we describe intuitively the components of an EER schema. An entity denotes a set of objects, called its instances, that have common properties. Elementary properties are modelled with attributes, whose values belong to one of several predefined domains such as Integer, Real, String, or Boolean. Properties that are due to relations to other entities are modelled through the participation of the entities in relationships. A relationship denotes a set of tuples, called its instances, each of which represents an association among a combination of instances of the entities that participate in the relationship. An entity can participate in a relationship more than once. Each such participation is represented by an ER-role, and each ER-role is assigned a unique name. The number of ER-roles associated with a relationship is called the arity of that relationship. Cardinality constraints (number restrictions) are associated with an ER-role in order to restrict the number of times each instance of an entity may participate, via that ER-role, in instances of the relationship. The minimum cardinality is either 0 (zero) or 1 (one), and the maximum cardinality is either 1 or $\infty$. An is-a link is modelled by $\preceq$ and denotes inclusion between two entities (respectively, between two relationships); the more specific entity (respectively, relationship) therefore inherits the properties of the more general entity (respectively, relationship).
A weak entity is a dependent entity that is identified by the primary keys of the other entities participating in the relationships (called weak relationships) to which it is connected via ER-roles, each having minimum and maximum cardinalities equal to 1. Each instance of a weak entity is a composition of instances of the participating entities (one instance per entity).
A fact is represented as a weak entity (an aggregated fact as an aggregated weak entity). Dimensions are represented as relationships (weak relationships), and levels are represented as entities (including aggregated entities). An entity (level) directly connected to a (weak) relationship (dimension) is called a basic level for that dimension. A roll-up link between two levels (entities) is modelled by a roll-up function, which maps lower-level elements (instances) to higher-level elements (instances). A simple aggregation is represented by an aggregated entity, which is a composition of the entities to which it is connected with lines; that is, a simple aggregation involves the finite set of entities on which it is based. An n-dimensional aggregation is represented by an aggregated weak entity connected to n relationships (dimensions) and n entities (levels) via n circles (one per relationship and entity), and connected with a line to the weak entity (fact) on which the aggregation is based. That is, an n-dimensional aggregation involves n dimensions (each a relationship), n levels (one per dimension, each an entity or aggregated entity) and the fact (weak entity) on which it is based. Each instance of an n-dimensional aggregation is called a cell, which is a composition of n elements (instances) of the n levels (one element per level) of the n dimensions involved in the aggregation.
In the rest of the paper, we will often use just 'aggregation' to refer to a multidimensional or n-dimensional aggregation.
4 Semantics of the CGMD Data Model
The semantics of an EER schema is given in terms of legal data warehouse states, i.e., data warehouses which conform to the constraints imposed by the schema. As a starting point we take the ER semantics introduced in (Calvanese et al. 1998), recast to cope with multidimensional information. For this we consider GMD, the logical multidimensional data model introduced in (Franconi & Kamble 2003).
GMD abstracts notions such as levels, multiple level hierarchies, dimensions, facts, cells, aggregation, cubes, coordinates and measures. The central element in GMD is the cube. A cube defined on all the specified dimensions with their basic levels is called a basic cube; otherwise, it is called an aggregated cube. An aggregated cube is computed from the cube on which the aggregation is based. GMD introduces the notion of a data warehouse state: a collection of cells (with their dimensions and measures). A data warehouse state is legal if it satisfies the GMD cube conditions stated in Definition 2.
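Definition 2 (EER Semantics) A data warehouse state $I = \langle \Delta, \Lambda, \cdot^{I} \rangle$ over the signature $\langle \mathcal{E}, \mathcal{R}, \mathcal{A}, \mathcal{U}, \mathcal{D}, \preceq, card, |\ | \rangle$ with respect to the EER schema $E$ is constituted by
- a nonempty finite set $\Delta$, assumed to be disjoint from all domains,
- a finite set $\Lambda$ of (concrete) domains,
- an interpretation function $\cdot^{I}$ such that
  - $V^{I} \subseteq \Lambda$ for each $V \in \mathcal{D}$, where $V^{I}$ is disjoint from any other $W^{I}$ such that $W \in \mathcal{D}$,
  - $E^{I} \subseteq \Delta$ for each $E \in \mathcal{E}$, where $E^{I}$ is disjoint from any other $E^{I}$ such that $E \in \mathcal{E}$,
  - $A^{I} \subseteq \Delta \times V^{I}$ for each $A \in \mathcal{A}$, and for some $V \in \mathcal{D}$,
  - $R^{I} \subseteq \Delta \times \dots \times \Delta = \Delta^{k}$ for each $k$-ary relationship $R \in \mathcal{R}$, such that a tuple $r \in R^{I}$ is of the form $[U_1 : e_1, \dots, U_k : e_k]$, where $e_i \in E_i^{I}$ for each $i \in \{1, \dots, k\}$.
A tuple $r \in R^{I}$ over $\Delta$ can be viewed as a function that maps each ER-role $U_i$ to $e_i \in E_i^{I}$ and is denoted by $[U_1 : e_1, \dots, U_k : e_k]$, i.e., $r[U_i] = e_i \in E_i^{I}$ for each $i$, $i = 1, \dots, k$.
The elements of $E^{I}$, $A^{I}$, and $R^{I}$ are called instances of $E$, $A$, and $R$, respectively.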
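The view of a relationship tuple as a function from ER-roles to instances can be sketched directly with dictionaries. The sketch below is illustrative only; the role names "call" and "day" and the instance identifiers are assumptions. It also counts how often an instance participates through a role, which is the quantity that the cardinality constraints of Section 3 restrict.

```python
import math

# Hypothetical instances of the relationship Date: each tuple maps
# an ER-role name to an instance, so r[U] reads as r["call"] or r["day"].
DATE = [
    {"call": "c1", "day": "d1"},
    {"call": "c2", "day": "d1"},
    {"call": "c3", "day": "d2"},
]


def participation(relationship, role, instance):
    """Number of tuples r in the relationship with r[role] == instance."""
    return sum(1 for r in relationship if r[role] == instance)


def respects_cardinality(relationship, role, entity_instances, cmin, cmax=math.inf):
    """Check cmin <= participation <= cmax for every instance of the entity."""
    return all(
        cmin <= participation(relationship, role, e) <= cmax
        for e in entity_instances
    )
```

With the (1,1) constraint of the running example, `respects_cardinality(DATE, "call", {"c1", "c2", "c3"}, 1, 1)` holds, whereas adding a call that has no date breaks it.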
A data warehouse state $I = \langle \Delta, \Lambda, \cdot^{I} \rangle$ is said to be legal for an EER schema $E$ if it satisfies the following:
- $E_1^{I} \subseteq E_2^{I}$ for each is-a link in the schema between two entities $E_1$, $E_2$ such that $E_1 \preceq E_2$. Similarly, $R_1^{I} \subseteq R_2^{I}$ for each is-a link between relationships $R_1$, $R_2$ in the schema such that $R_1 \preceq R_2$.
- $A^{I}(e) \in V^{I}$ for each $e \in E^{I}$, where $A \in \mathcal{A}$ is an attribute of $E$ with domain $V \in \mathcal{D}$. Similarly, $A^{I}(r) \in V^{I}$ for each $r \in R^{I}$, where $A \in \mathcal{A}$ is an attribute of $R$ with domain $V \in \mathcal{D}$.
- $R^{I} \subseteq E_1^{I} \times \dots \times E_k^{I}$ for each relationship $R$ in the schema connected to entities $E_1, \dots, E_k$ in the schema.
- $cmin(E,R,U) \le \#\{ r \in R^{I} \mid r[U] = e \} \le cmax(E,R,U)$ for each $U \in \mathcal{U}$ associated with $R \in \mathcal{R}$ and $E \in \mathcal{E}$ in the schema, for each $e \in E^{I}$, and for the cardinality constraint $card(U) = (min, max)$ associated with ER-role $U$, where $cmin(E,R,U) = min$ and $cmax(E,R,U) = max$.
- For each disjoint-total construct in the schema, where $E$ is a superclass and $E_1, \dots, E_n$ are subclasses (partitions), the following must hold:
  $E_i^{I} \subseteq E^{I}$ for each $i = 1, \dots, n$, and
  $E_i^{I} \cap E_j^{I} = \emptyset$ for each $i \ne j$, and
  $E^{I} \subseteq E_1^{I} \cup \dots \cup E_n^{I}$.
- For two connected levels $L_i$, $L_j$ (each one being an entity or simple aggregation) in the schema there must be a (possibly partial) roll-up function $\rho_{L_i,L_j}$ such that $\rho_{L_i,L_j}(x) = y$ for each $x \in L_i^{I}$ and $y \in L_j^{I}$, with $L_i, L_j \in \mathcal{E}$.
  We define the reflexive transitive closure $\rho^{*}_{L_i,L_j}$ of the roll-up function (from $L_i$ to any higher level $L_j$, if there is a level $L_k$ along the path between $L_i$ and $L_j$) inductively as follows:
  $\rho^{*}_{L_i,L_i} = id$
  $\rho^{*}_{L_i,L_j} = \bigcup_{k} \rho_{L_i,L_k} \circ \rho^{*}_{L_k,L_j}$ for each $k$ such that $L_i$ is connected to $L_k$,
  where $(\rho_{L_p,L_q} \cup \rho_{L_r,L_s})(x) = y$ iff
  $\rho_{L_p,L_q}(x) = \rho_{L_r,L_s}(x) = y$, or
  $\rho_{L_p,L_q}(x) = y$ and $\rho_{L_r,L_s}(x) = \bot$, or
  $\rho_{L_p,L_q}(x) = \bot$ and $\rho_{L_r,L_s}(x) = y$.
- For each fact $F \in \mathcal{F}$ (being a weak entity) in the schema with $p$ dimensions $D_1, \dots, D_p$ (each one being an identifying relationship) and corresponding $p$ levels $L_1, \dots, L_p$ (each one being an entity or simple aggregation) in the schema for $i = 1, \dots, p$, with $M_j \in \mathcal{A}$, $V_j \in \mathcal{D}$ for $j = 1, \dots, m$, the following holds (GMD cube conditions):
  1. $\forall f.\ F(f) \rightarrow \exists l_1, \dots, l_n.\ D_1(f) = l_1 \land L_1(l_1) \land \dots \land D_n(f) = l_n \land L_n(l_n)$
  2. $\forall f, f', l_1, \dots, l_n.\ F(f) \land F(f') \land D_1(f) = l_1 \land D_1(f') = l_1 \land \dots \land D_n(f) = l_n \land D_n(f') = l_n \rightarrow f = f'$
- For each aggregation $G$ in the schema involving $n$ dimensions $D_1, \dots, D_n$ and $n$ levels $R_1, \dots, R_n$ (one per dimension) and a fact $F$:
  $G \doteq F \langle D_1|_{R_1}, \dots, D_n|_{R_n} \rangle$
  where $F \doteq E \langle D_1|_{L_1}, \dots, D_p|_{L_p} \rangle$ such that $n \le p$,
  the following must hold:
  $\forall g.\ G^{I}(g) \rightarrow g = \big|\, f \;\big|\; F^{I}(f) \land \bigwedge_{h=1,\dots,p} \big( \rho^{*}_{L_h,R_h}(D_h^{I}(f)) = D_h^{I}(g) \big) \,\big|$
  for each $n \le p$, where $|\ |$ denotes aggregation.
Each aggregated cell is an aggregation of the cells whose coordinates roll up to the coordinates of that aggregated cell.
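The inductive definition of the closure of the roll-up function can be read operationally. The following sketch is one possible reading, not the paper's algorithm: direct roll-up functions are stored as dictionaries keyed by a pair of level names, a missing key plays the role of "undefined", and None is returned when no path yields a value. The sample tables and the decision to raise an error when two paths disagree (a case in which the union above is simply not defined) are assumptions, and the level graph is assumed to be acyclic.

```python
# Direct roll-up functions between connected levels (hypothetical data).
# A missing key means the partial function is undefined on that element.
RHO = {
    ("Point", "Pointtype"): {"p1": "Cell", "p2": "LandLine", "p3": "PABX"},
    ("Pointtype", "Customertype"): {
        "Cell": "Consumer", "LandLine": "Consumer",
        "DirectLine": "Business", "PABX": "Business",
    },
}


def closure(rho, src, dst, x):
    """Apply the reflexive transitive closure from level src to level dst.

    Returns None when the closure is undefined on x. Assumes the level
    graph has no cycles, so the recursion terminates.
    """
    if src == dst:
        return x  # base case: the identity on a level
    results = set()
    for (lower, upper), table in rho.items():
        if lower == src and x in table:
            # One direct step src -> upper, then the closure upper -> dst.
            y = closure(rho, upper, dst, table[x])
            if y is not None:
                results.add(y)
    if len(results) > 1:
        # The union is only defined when all defined paths agree.
        raise ValueError(f"paths from {src} to {dst} disagree on {x!r}")
    return results.pop() if results else None
```

For example, `closure(RHO, "Point", "Customertype", "p1")` follows Point to Pointtype to Customertype and yields "Consumer", which is the coordinate an aggregated cell at level Customertype would use for calls originating at p1.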
Thus, a particular EER diagram denotes a set of data warehouse states. According to GMD, the data warehouse states denoted by an EER schema are legal if they satisfy the cube conditions (together with the aggregated cube conditions) imposed by the GMD schema; that is, all the data warehouse states that conform to the constraints imposed by the GMD schema also conform to the diagram itself, and these are the legal data warehouse states. If a diagram is inconsistent, then no data warehouse state can conform to it.
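As a rough illustration of what it means for a state to satisfy the two cube conditions, the sketch below checks them on a small in-memory state. The representation (cells as dictionaries from dimension name to coordinate, levels as sets of elements) and all names are assumptions made for the example; it covers basic cubes only and does not check the aggregated cube condition.

```python
def satisfies_cube_conditions(cells, levels):
    """Check the two cube conditions on a set of cells.

    cells:  cell id -> {dimension name: coordinate}
    levels: dimension name -> set of elements of the level it is restricted to
    """
    dims = sorted(levels)
    seen = set()
    for coords in cells.values():
        # Condition 1: every cell has, for each dimension, a coordinate
        # that is an element of the corresponding level.
        if any(coords.get(d) not in levels[d] for d in dims):
            return False
        # Condition 2: two distinct cells never share all their coordinates.
        key = tuple(coords[d] for d in dims)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical state of a two-dimensional basic cube.
LEVELS = {"Date": {"d1", "d2"}, "Source": {"p1", "p2"}}
CELLS = {
    "f1": {"Date": "d1", "Source": "p1"},
    "f2": {"Date": "d1", "Source": "p2"},
}
```

A state that passes this check is a candidate legal state for the basic cube; a state with a duplicated coordinate tuple or a coordinate outside its level is rejected.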
5 Related Work
Several proposals for a conceptual model exist in the data warehouse field. Only the proposals by (Golfarelli et al. 1998, Sapia et al. 1998, Tryfona et al. 1999, Husemann et al. 2000, Zepeda & Celma 2006) address the conceptual model in a genuine fashion. The proposals by (Perez et al. 2005, Berenguer et al. 2005) are based on UML modeling: (Perez et al. 2005) addresses a UML model based on the star schema, and (Berenguer et al. 2005) proposes quality indicator metrics for the conceptual model.
In (Golfarelli et al. 1998), a Dimensional Fact Model (DFM) is constructed from an operational ER schema based on requirement analysis. The construction methodology is well defined. The model is relational and is based on the star schema. DFM does not support generalisation/specialisation hierarchies or many-to-many relationships. In a similar manner, Zepeda and Celma (Zepeda & Celma 2006) presented a Model Driven Architecture (MDA) for producing candidate multidimensional schemas from an operational ER schema based on requirement analysis. Each candidate schema is based on the star schema. However, their model supports generalisation hierarchies and many-to-many relationships. A mapping is presented for transforming the candidate (multidimensional) ER schema into cubes, dimensions, levels, and measures. In both DFM and MDA, no aggregation is defined at the conceptual schema level.
In (Kimball 1997, 1996), a multidimensional modeling manifesto using a multidimensional view of enterprise data has been proposed; it is a relational implementation in the form of a star schema. This approach is not conceptual, in the sense that it is not independent of the implementation.
A multidimensional conceptual model called the MultiDimER model, based on the ER model, has been proposed in (Malinowski & Zimanyi 2006). The model is based on the star and snowflake schemas. Features such as generalisation/specialisation hierarchies, composite attributes, aggregations, etc. have not been considered in this model. The model is well defined. It is based on the ER model and its logical representations. A conceptual model proposed by (Abello et al. 2006) is based on UML and its extensions, emphasising part-whole relationships for aggregation, but does not support aggregations at the schema level.
None of these proposals addresses the conceptual structure of aggregation. They only derive a basic multidimensional schema from the given ER schema. Moreover, all these models require the design methodology, such as information analysis, requirement analysis and specification, etc. (Golfarelli et al. 1998, Husemann et al. 2000), to be carried out manually. Only the proposal by (Franconi & Sattler 1999) for a data warehouse conceptual model presents the structure of multidimensional aggregation, and it automates the construction of the multidimensional conceptual schema from an ER diagram. The CGMD model is purely conceptual and addresses all the issues from data warehouse construction to aggregations and view management. CGMD takes care of all the constraints of the standard ER model in addition to the multidimensional constraints. This shows that CGMD is syntactically and semantically richer than the other models.
6 Evaluation of the CGMD model
In this section, we evaluate the CGMD data model according to criteria found in the literature for a multidimensional conceptual model. We also compare CGMD with other models. For the evaluation, we consider some criteria listed in (Blaschka et al. 1998), nine requirements introduced in (Pedersen & Jensen 1999), and several requirements found in (Abello et al. 2001) for a data warehouse multidimensional model. We also consider some additional requirements which are important for a data warehouse multidimensional conceptual model. All these requirements are listed below in no particular order.
1. Implementation independent (Blaschka et al.
1998):
2. Explicit Separation of Structure and Con-
tents (Blaschka et al. 1998):
3. Explicit hierarchies (Blaschka et al. 1998, Peder-
sen & Jensen 1999): A model should support the
explicit hierarchy in the dimension.
4. Symmetric treatment for dimensions and mea-
sures (Blaschka et al. 1998, Pedersen & Jensen
1999): A model should allow measures to be
treated as dimensions and vice versa.
5. Multiple hierarchies in dimension (Pedersen &
Jensen 1999):
6. Dimension/level attributes (Abello et al. 2001):
A model should specify the attributes that do
not dene hierarchies.
7. Support for aggregation (Pedersen & Jensen
1999): A model should be able to provide mean-
ingful aggregations.
8. Complex Measures (Blaschka et al. 1998): A
model should support multiple and complex mea-
sures for the same fact (cube).
9. Handling dierent levels of granularity (Pedersen
& Jensen 1999):
10. Support for non-onto hierarchies (Pedersen &
Jensen 1999): A model should support non-onto
(unbalanced) hierarchies, i.e., hierarchies with
paths of dierent lengths.
11. Support for non-strict hierarchies (Pedersen &
Jensen 1999): A model should support non-strict
hierarchies.
12. Support for many-to-many relationships (Peder-
sen & Jensen 1999):
13. Generalisation/specialisation hierarchies (Abello
et al. 2001): A model should support
generalisation/specialisation (is-a) relationships.
14. Handling change over time (Pedersen & Jensen
1999):
15. Handling uncertainty
16. Multi-cube/fact schema (Abello et al. 2001): A
model should support multiple cubes/facts in
schema.
Since in CGMD one cube/fact is based on another,
it (CGMD) allows
17. Summarisability: A model should support
summarisation (Lenz & Shoshani 1997).
18. User defined aggregation functions (Abello et al.
2001): A model should support user defined
aggregation functions.
19. Drill-across (Abello et al. 2001): A model should
allow drill-across (sharing dimensions).
20. Dimensionless aggregation
21. Measureless aggregation
22. Aggregation from aggregation (view over view)
The CGMD model fulfills all of the above requirements
1–22 except requirements 4, 12 and 14. Requirements 4
and 12 are partially supported; however, requirement 4
can be fully supported if the measure is used as a
coordinate. Requirement 14 is not supported, as no
syntactical provision is made for changing
dimensions/levels in CGMD. In addition, the CGMD model
supports aggregation at any higher level (ignoring
intermediate levels). This is one of the important
characteristics of the CGMD model.
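The idea of aggregating at any higher level while ignoring intermediate levels can be illustrated with a minimal Python sketch. It is not the model's own syntax: the Day, Month and Year levels, the sample sales facts, and the helper roll_up are all hypothetical names assumed for illustration only.

```python
from collections import defaultdict

# Hypothetical time dimension: each day maps to a month, each month to a year.
DAY_TO_MONTH = {
    "2024-01-15": "2024-01",
    "2024-02-03": "2024-02",
    "2025-01-09": "2025-01",
}
MONTH_TO_YEAR = {"2024-01": "2024", "2024-02": "2024", "2025-01": "2025"}

# Hypothetical base facts: (day, sales amount).
SALES = [("2024-01-15", 100), ("2024-02-03", 50), ("2025-01-09", 70)]


def roll_up(facts, *mappings):
    """Sum measures after composing the level mappings, lowest level first."""
    totals = defaultdict(int)
    for member, amount in facts:
        for mapping in mappings:
            member = mapping[member]
        totals[member] += amount
    return dict(totals)


# Aggregate straight from Day to Year; no Month-level aggregate is stored.
by_year = roll_up(SALES, DAY_TO_MONTH, MONTH_TO_YEAR)

# The stepwise route (Day to Month, then Month to Year) yields the same totals
# for this strict hierarchy and an additive measure.
by_month = roll_up(SALES, DAY_TO_MONTH)
by_year_stepwise = roll_up(by_month.items(), MONTH_TO_YEAR)
```

In this sketch the direct and the stepwise roll-ups agree because the hierarchy is strict and the measure is additive; the direct route simply avoids materialising the intermediate level.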
7 Comparison of CGMD with other models
In this section, we evaluate other models against the
same requirements listed in Section 6, and compare
them with the CGMD model, which was already evaluated
in Section 6.
We consider the models proposed in (Golfarelli
et al. 1998, Sapia et al. 1998, Tryfona et al. 1999,
Husemann et al. 2000) for comparison, as they are
conceptual models in the true sense. However, the models
proposed in (Tsois et al. 2001, Abello et al. 2001, Pei
2003, Jensen et al. 2004) and (Trujillo et al. 2001) (an
object-oriented extension of the model of (Trujillo
et al. 2000)) are also taken into consideration, because
they are current state-of-the-art models and are also
conceptual in some way or other. The table in Figure 6
presents a summary of the comparison of these models.
As before, if a model fully meets a particular
requirement/feature/functionality, it is denoted by ✓.
If a model supports the requirement partially, it is
denoted by p, and if a model does not support it at
all, it is denoted by x.
Requirement 1 (Implementation independent) is partially
supported by the model of (Abello et al. 2001),
since it is a multi-star schema based on concepts
of the star model. The remaining models (Kimball 1996,
Golfarelli et al. 1998, Jensen et al. 2004, Pei 2003)
cannot be considered implementation independent,
as they are either relational (Kimball 1996), based
on the star schema, itself a relational model
(Golfarelli et al. 1998), or designed for a specific
domain implementation (Jensen et al. 2004, Trujillo
et al. 2001, Pei 2003), for example, the clinical domain
(Jensen et al. 2004), and thus provide no support.
Requirement 2 (Explicit Separation of Structure
and Contents), requirement 3 (Explicit hierarchies),
requirement 5 (Multiple hierarchies) and requirement 8
(Complex measures) are met by all of the models
(except the star schema (Kimball 1996) for
requirement 3), which thus provide full support. The
star schema does not support explicit hierarchies, and
consequently does not meet requirement 3.
Requirement 6 (Dimension/level attributes). Only
one model (Jensen et al. 2004) does not specify the
attributes that do not define hierarchies, and hence
provides no support. The remaining models specify the
non-dimension attributes, and thus provide full support.
Requirement 4 (Symmetric treatment for dimensions
and measures). Only one model (Jensen et al. 2004)
captures this feature, by means of a derivation
mechanism, thus providing full support. Some models
(Abello et al. 2001, Tryfona et al. 1999) do not
treat measures explicitly as dimensions, but do so
conditionally, if the measure is used to identify a
cell, thus providing partial support. The remaining
models (Golfarelli et al. 1998, Sapia et al. 1998,
Husemann et al. 2000, Tsois et al. 2001, Trujillo
et al. 2001, Pei 2003) do not consider this feature
in their frameworks and thus provide no support.
Requirement 7 (Support for correct aggregation) is
met by very few models (Abello et al. 2001, Jensen
et al. 2004), either by derivation through some
operations (Abello et al. 2001) or by restricting
hierarchies to strict, covering and onto ones through
some derivations, so that data will not be double
counted; these models thus provide full support. Only
one model (Tsois et al. 2001) does not support this
requirement, because it includes many-to-many
relationships between facts and dimensions and among
the hierarchies. The remaining models partially support
this feature, either by restricting the
dimensions/hierarchies and aggregation functions
(Golfarelli et al. 1998) or by restricting hierarchies
to strict, onto and covering ones and restricting
aggregation functions (Sapia et al. 1998, Husemann
et al. 2000, Tsois et al. 2001, Trujillo et al. 2001,
Pei 2003).
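The double-counting problem mentioned above can be made concrete with a small Python sketch. It does not reproduce the mechanism of any of the cited models; the product and category names and the helpers sum_by_parent and is_strict are hypothetical, chosen only to show why a non-strict hierarchy breaks a naive sum.

```python
from collections import defaultdict

# Hypothetical facts: sales per product.
SALES = {"laptop": 10, "phone": 5}

# Strict hierarchy: every product has exactly one parent category.
STRICT = {"laptop": ["computing"], "phone": ["telephony"]}

# Non-strict hierarchy: "phone" belongs to two categories at once.
NON_STRICT = {"laptop": ["computing"], "phone": ["telephony", "computing"]}


def sum_by_parent(facts, hierarchy):
    """Naively add each fact to every parent it rolls up to."""
    totals = defaultdict(int)
    for member, amount in facts.items():
        for parent in hierarchy[member]:
            totals[parent] += amount
    return dict(totals)


def is_strict(hierarchy):
    """A hierarchy is strict when each member has at most one parent."""
    return all(len(parents) <= 1 for parents in hierarchy.values())


grand_total = sum(SALES.values())
strict_total = sum(sum_by_parent(SALES, STRICT).values())
non_strict_total = sum(sum_by_parent(SALES, NON_STRICT).values())
```

With the strict hierarchy the category totals add up to the grand total, whereas with the non-strict one the sales of "phone" are counted under both of its parents, so the category totals exceed the grand total.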
Requirement 9 (Different levels of granularity). Very
few models (Tsois et al. 2001, Abello et al. 2001)
capture different levels of granularity (i.e., measures
at different levels of granularity), either by
aggregation (Tsois et al. 2001) or by specialising the
cells depending on whether the measure is derived or
not. Only one model (Jensen et al. 2004) captures this
feature partially, through some derivation. The rest of
the models do not capture this feature, and hence
provide no support.
For Requirement 10 (Support for non-onto hierarchies),
only three models (Trujillo et al. 2001, Tsois
et al. 2001, Jensen et al. 2004) fully support this
feature. The remaining models provide no support.
Requirement 11 (Support for non-strict hierarchies)
is fully supported by only two models (Jensen et al.
2004, Tsois et al. 2001). Remaining models provide
no support.
Requirement 12 (Support for many-to-many relationships).
The model of (Tsois et al. 2001) supports many-to-many
relationships between facts and dimensions
but does not support many-to-many relationships
between hierarchies. Only four of the other models
(Tryfona et al. 1999, Trujillo et al. 2001, Abello
et al. 2001, Jensen et al. 2004) support many-to-many
relationships both between facts and dimensions
and between hierarchies. The rest do not support
many-to-many relationships.
Requirement 13 (Generalisation/specialisation
hierarchies). The support provided by (Sapia et al. 1998,
Tryfona et al. 1999, Abello et al. 2001, Trujillo et al.
2001, Jensen et al. 2004) is considered partial, because
generalisation/specialisation is considered in the
hierarchy but is rather kept to distinguish the contents.
The remaining models do not support this feature.
Requirement 14 (Change over time in data) and
Requirement 15 (Uncertainty in data) are supported by
one model (Jensen et al. 2004), by attaching a time tag
attribute to dimension values for probabilistic
measurements of occurrences of facts and dimension
values. The model of (Abello et al. 2001) supports
requirement 14 only, by changing the schema with
appropriate time tags, but does not support
requirement 15. The remaining models support neither
feature.
Requirement 16 (Multi-cube/fact schema). Only one
model (Abello et al. 2001) allows multiple stars in a
single schema. However, which dimensions/levels belong
to which star schema is not clearly reflected in
this model in any way, so it provides partial support.
The remaining models do not meet this requirement, and
thus provide no support.
Requirement 17 (Summarisability). Only the models
of (Abello et al. 2001) and (Jensen et al. 2004)
provide full support, either by applying some
algebraic operations to make part-whole relationships
between levels and then applying aggregation
functions (Abello et al. 2001), or by computing a
weighting factor between facts and dimensions and
making a full covering relationship between levels in
the aggregation path (Jensen et al. 2004). Some models,
such as (Trujillo & Palomar 1998, Trujillo et al. 2000,
2001), (Golfarelli et al. 1998), and (Tryfona et al.
1999), specify possible functions that can be applied
in order to support summarisation, but do not provide
a specific operation for applying them, and thus
provide only partial support. The remaining models do
not support summarisation.
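The role a weighting factor can play in keeping totals summarisable is sketched below in Python. This is a generic illustration and is not taken from (Jensen et al. 2004): the weights, the product and category names, and the helper weighted_roll_up are assumptions made for the example.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical facts: sales per product.
SALES = {"laptop": 10, "phone": 5}

# Hypothetical weighted, non-strict hierarchy: the weights attached to the
# parents of each member are assumed to sum to 1.
WEIGHTED = {
    "laptop": {"computing": Fraction(1)},
    "phone": {"telephony": Fraction(3, 5), "computing": Fraction(2, 5)},
}


def weighted_roll_up(facts, hierarchy):
    """Split each measure among the member's parents according to the weights."""
    totals = defaultdict(Fraction)
    for member, amount in facts.items():
        weights = hierarchy[member]
        if sum(weights.values()) != 1:
            raise ValueError(f"weights for {member!r} do not sum to 1")
        for parent, weight in weights.items():
            totals[parent] += amount * weight
    return dict(totals)


by_category = weighted_roll_up(SALES, WEIGHTED)
```

Because every member distributes exactly its full measure among its parents, the category totals add up to the grand total even though "phone" has two parents. Exact fractions are used here so that the check on the weights is not affected by floating-point rounding.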
Requirement 18 (User defined aggregation functions).
Only the model of (Abello et al. 2001) supports this
feature, since it is based on UML, which allows user
defined operations. The rest of the models provide no
support.
Requirement 19 (Drill-across). Only the model of
(Abello et al. 2001) allows drill-across, because
multi-star dimensions are shared, thus providing full
support. Some models, such as Star (Kimball 1996)
(constellation) and DFM (Golfarelli et al. 1998), share
dimensions but limit the drilling mechanism, and hence
provide partial support. The remaining models provide
no support.
Requirement 20 (Dimensionless aggregation) and
Requirement 22 (Aggregation from aggregation) are
not met by any of the other models. Requirement 21
(Measureless aggregation) is partially supported
by only two models, star (Kimball 1996) and
DFM (Golfarelli et al. 1998), which explore the
possibility of having a factless (measureless) fact but
do not address or reflect aggregation in any way.
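One possible reading of aggregation over a factless (measureless) fact is counting fact occurrences, as in the minimal Python sketch below. The list of requirements does not define measureless aggregation in detail, so this interpretation, the attendance facts, and all names in the code are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical factless fact table: each row only records that an event
# occurred (student, course, day); there is no numeric measure column.
ATTENDANCE = [
    ("alice", "databases", "2024-03-01"),
    ("bob", "databases", "2024-03-01"),
    ("alice", "logic", "2024-03-02"),
]

# Aggregating without a measure: count the fact occurrences per course.
attendance_per_course = Counter(course for _student, course, _day in ATTENDANCE)
```

Here the aggregated value is derived from the existence of the facts alone, which is what distinguishes this case from summing a stored measure.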
Model Requirements/Criteria/Features/Functionalities
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
star x p x p

p

x x x x x p x x p p x p x p x
DFM

x

p

x p x x x x x p p x p x p x
ME/R x

x

x

x x x x p x x p x x x x x x
starER x

p

p

x x x

p x x x p x x x x x
DWPM x x

x

p

x p x x p x x x p x x x x x
OOCM x

x

p

x

x

p x p p x x x x x x
MAC x

x

x

x x x x x x x x x x
YAM

p

x x

p p x

x x x
GOLAP x

x

p

x p x

x x x x p x x x x x
EMDM x

p

p

x

x p x x x x x
CGMD

p

p

x

Figure 6: Comparison between CGMD and other Multidimensional Models
8 Conclusions and Future work
There are several proposals on multidimensional
modeling and data warehouse design. So far, there is no
consensus on a modeling and design method (Rizzi
& Abello 2006). The CGMD data model gives a uniform
way of modeling multidimensional concepts,
data warehouse design and aggregations, thus
generalising the design of data warehouses and
providing a uniform way of managing views. It is a
framework in which to translate and compare the
conceptual properties and expressive power of different
models and related extensions. The CGMD model is
intended to help with the cost-effective design of
data warehouses, update propagation and view
management of multidimensional data. Our future work is
the translation of CGMD into a Description Logic
language to support a reasoning mechanism.
Acknowledgements
The major portion of this work was supported by
School of Computer Science, University of Manch-
ester, United Kingdom and Free University of
Bolzano, Italy.
References
Abello, A., Samos, J. & Saltor, F. (2001), YAM (yet
another multidimensional): An extension of UML,
in Technical Report LSI-01-43-R, Departament
de Llenguatges i Sistemes Informatics.
Abello, A., Samos, J. & Saltor, F. (2006), YAM2: A
multidimensional conceptual model extending UML,
in Information Systems, 31(6), pp. 541–567.
Agrawal, R., Gupta, A. & Sarawagi, S. (1997), Model-
ing multidimensional databases, in Proc. of ICDE-
97.
Berenguer, G., Romero, R., Trujillo, J., Serrano, M.
& Piattini, M. (2005), A set of quality indicators and
their corresponding metrics for conceptual models
for data warehouses, in Proc. International
Conference on Data Warehousing and Knowledge
Discovery (DaWak05), pp. 95–104.
Blaschka, M., Sapia, C., Hofling, G. & Dinter, B.
(1998), Finding your way through multidimensional
data models, in Proc. 9th International
Workshop on Database and Expert Systems
Applications (DEXA98), pp. 198–203.
Borgida, A., Lenzerini, M. & Rosati, R. (2003),
Description logics for databases, in F. Baader,
D. Calvanese, D. McGuinness, D. Nardi & P. Patel-
Schneider, eds, Description Logic Handbook,
Cambridge University Press, chapter 16, pp. 462–484.
Calvanese, D., Lenzerini, M. & Nardi, D. (1998),
Description logics for conceptual data modeling, in
J. Chomicki & G. Saake, eds,
Logics for Databases and Information Systems,
Kluwer, pp. 229–264.
Elmasri, R. & Navathe, S., eds (2000), Fundamen-
tals of Database Systems, Third Edition, Addison-
Wesley.
Franconi, E. & Kamble, A. (2003), The GMD data
model for multidimensional information: a brief
introduction, in Proc. of 5th International
Conference on Data Warehousing and Knowledge
Discovery (DaWak-03), pp. 55–65.
Franconi, E. & Kamble, A. (2004a), Data warehouse
conceptual data model, in Proc. of 16th
International Conference on Scientific and
Statistical Database Management (SSDBM).
Franconi, E. & Kamble, A. (2004b), The GMD
data model and algebra for multidimensional
information, in Proc. of 16th International
Conference on Advanced Information Systems
Engineering (CAiSE 2004), Riga, Latvia, June 7-11,
pp. 446–462.
Franconi, E. & Sattler, U. (1999), A data warehouse
conceptual data model for multidimensional
aggregation, in Proc. of the Workshop on Design and
Management of Data Warehouses (DMDW-99),
pp. 13-1–13-10.
Golfarelli, M., Maio, D. & Rizzi, S. (1998), The
dimensional fact model: a conceptual model for data
warehouses, IJCIS 7(2-3), 215–247.
Husemann, B., Lechtenborger, J. & Vossen, G.
(2000), Conceptual data warehouse modeling, in
Proc. of the International Workshop on Design and
Management of Data Warehouses (DMDW2000),
Stockholm, Sweden, June 5-6, pp. 6-1–6-11.
Jensen, C., Kligys, A., Pedersen, T. & Timko,
I. (2004), Multidimensional data modeling for
location-based services, in The VLDB Journal
Vol.13, pp. 1–21.
Kimball, R. (1996), The Data Warehouse Toolkit,
John Wiley & Sons, USA.
Kimball, R. (1997), A dimensional modeling
manifesto, DBMS Magazine, August 10(9), 58–70.
Lenz, H. J. & Shoshani, A. (1997), Summarizability in
OLAP and statistical databases, in Proc. 9th
International Conference on Scientific and Statistical
Database Management (SSDBM).
Malinowski, E. & Zimanyi, E. (2006), Hierarchies in
a multidimensional model: From conceptual
modeling to logical representation, in Data and
Knowledge Engineering, In press.
Moody, D. & Kortink, M. (2000), From enterprise
models to dimensional models: A methodology
for data warehouse and data mart design, in
Proc. of the International Workshop on Design and
Management of Data Warehouses (DMDW2000),
Stockholm, Sweden, June 5-6, pp. 5-1–5-12.
Pedersen, T. & Jensen, C. (1999), Multidimensional
data modeling for complex data, in Proc. of 15th
IEEE International Conference on Data Engineering
(ICDE-99), pp. 336–345.
Pei, J. (2003), A general model for online analytical
processing of complex data design and evolution,
in Proc. of 22nd International Conference on Con-
ceptual Modeling (ER2003).
Perez, J., Berlanga, R. & Pedersen, T. (2005), A
relevance-extended multi-dimensional model for a
data warehouse contextualized with documents, in
Proc. 8th ACM International Workshop on Data
Warehousing and OLAP (DOLAP05), pp. 19–28.
Phipps, C. & Davis, K. C. (2002), Automating
data warehouse conceptual schema design and
evolution, in Proc. of the International Workshop
on Design and Management of Data Warehouses
(DMDW2002), pp. 23–32.
Rizzi, S. & Abello, A. (2006), Research in data
warehouse modeling and design: Dead or alive?, in
Proc. 9th ACM International Workshop on Data
Warehousing and OLAP (DOLAP06), pp. 3–10.
Sapia, C., Blaschka, M., Hofling, G. & Dinter, B.
(1998), Extending the ER model for the
multidimensional paradigm, in Proc. of ER Workshop,
pp. 105–116.
Trujillo, J. & Palomar, M. (1998), An object
oriented approach to multidimensional databases
conceptual modelling, in Proc. of First International
ACM Workshop Data Warehousing and OLAP,
pp. 16–21.
Trujillo, J., Palomar, M. & Gomez, J. (2000),
Applying object oriented conceptual modeling
techniques to the design of multidimensional databases
and OLAP applications, in Proc. of the First
International Conference on Web-Age Information
Management (WAIM2000), Shanghai, China, June,
pp. 83–94.
Trujillo, J., Palomar, M., Gomez, J. & Song, I.
(2001), Designing data warehouses with OO
conceptual models, in IEEE Computer Vol.34(12),
pp. 66–75.
Tryfona, N., Busborg, F. & Christiansen, J. (1999),
starER: A conceptual model for data warehouse
design, in Proc. of ACM Second International
Workshop on Data Warehousing and OLAP (DOLAP99),
Kansas City, Missouri, USA, November, pp. 3–8.
Tsois, A., Karayiannidis, N. & Sellis, T. (2001),
MAC: Conceptual data modelling for OLAP, in
Proc. of the International Workshop on Design and
Management of Data Warehouses (DMDW-2001),
pp. 5-1–5-13.
Zepeda, L. & Celma, M. (2006), A model driven
approach for data warehouse conceptual design, in
Proc. 7th IEEE International Baltic Conference
on Databases and Information Systems (DBIS06),
pp. 114–121.