2010ccsbe10 6
net/publication/233813118
All content following this page was uploaded by Dora Maria Simões on 23 May 2014.
Dora Simões
Higher Institute of Accounting and Administration, Unit of Research in Governance,
Competitiveness and Public Policy, University of Aveiro, Campus Universitário de Santiago,
3810-193 Aveiro, Portugal
[email protected]
Author Biography: Dora Simões received her PhD in Informatics Engineering from the Faculty of
Engineering of the University of Porto, Portugal, in 2008. She is an Adjunct Professor at the Higher
Institute of Accounting and Administration – University of Aveiro (ISCA-UA), Portugal. Her
current research interests include data and information management, collaborative networks and
business intelligence.
Abstract (English): Currently, Data Warehouse systems are a powerful technological tool for
managing business information in order to support decision-making processes at the strategic
level more effectively. To better understand the extent to which these systems are or are not
contributing to the success of business initiatives, the author has carried out an extensive study of
the implementation of these systems in wider business organizations. Thus, the first purpose of the
present paper is to report the results of a review of the state of the art on the architectures of the
most used Enterprise Data Warehouse systems, and the level of success achieved against the
objectives established beforehand. This work also addresses the methodologies followed in the
implementation of the various architectures. The different architectures
used in most business environments are also compared along the two dimensions most relevant to
business stakeholders: time and cost. A major objective of the author is to contribute
a conceptual framework aimed at guiding business actors in selecting the
architecture that best fits their organization, considering the hierarchical level of the organization that
the Data Warehouse is meant to support, the domain in which it is applied and some considerations
about the profile of users in each infrastructure. A further objective of this study is to indicate a set
of metrics to evaluate the performance of an Enterprise Data Warehouse and to establish
guidelines on how to carry out that assessment.
Abstract (translated from the French): Currently, Data Warehouse systems are powerful technological
tools for managing business information, intended to support decision-making procedures at the
strategic level more effectively. To better understand how far these systems do or do not contribute
to the success of business initiatives, the author has carried out extensive studies on the application
of these systems at all levels of business organizations. Thus, the first objective of this paper is to
report the results of a review of the state of the art on the most widely used Enterprise Data
Warehouse architectures and on the level of success achieved relative to the objectives defined at the
outset. This work also addresses the methodologies followed when implementing the various
architectures. In addition, an analysis is made of the potential software suppliers for these types of
systems, according to the support they provide: technological, infrastructure or consultancy. The
comparison between the architectures most widely used in enterprises is also made along the two
dimensions most important to business actors: time and cost. The author's main objective is to
contribute, in this paper, a conceptual framework that guides business actors in selecting the
architecture best suited to their organizational case, considering the organizational hierarchical level
that can be supported by the Data Warehouse, the domain in which it will be applied, and some
considerations about the profile of users in each infrastructure. A further objective of this work is not
only to indicate a set of metrics for evaluating the performance of an Enterprise Data Warehouse but
also to provide guidelines for carrying out that evaluation.
Enterprises are a key element of most economic systems. Throughout time business was
often undervalued based on assumptions made, such as the existence of valuable information¹ for
increasingly agreed that the information is the most precious asset of any enterprise. In the current
information has an increasingly important role in the processes of decision making, carried out by
The concept of entrepreneurship is related to various aspects of business
management, particularly the creation of new and innovative businesses. The
assessment of entrepreneurial activity and of its level of success in achieving its goals requires the
consideration of multiple influencing aspects. One of these is undoubtedly the way the entrepreneur
manages (collects, organizes, stores, updates and disseminates) the business information.
Most organizations recognize that a solid Data Warehouse system can be the basis for
sustaining competitive advantage in the long term. In practice, however, achieving the
desired Data Warehouse system is not an easy task. Building a Data Warehouse system
involves considerable human and financial resources, typically committed over a long period of
time.
The author carried out a review of the state of the art to identify both the major problems
and the best practices that have dictated the success or failure of programs to implement
Data Warehouse systems. In this sense, the basic architectures, the most followed methodologies
¹ In Stair and Reynolds (2008), valuable information has the following characteristics: accessible, accurate, complete,
choices for success or failure of the Data Warehouse programs, having in mind especially two
The result of this work culminates in the proposal of a conceptual framework that serves as a
guideline for organizations that are planning to implement a Data Warehouse system. Some
metrics for further evaluation of the performance of the implemented program are also presented.
It is important to clarify that, in the author's view, implementing a Data Warehouse system is a
program and not a project. This view rests on the fact that a program refers to
a set of activities planned for a given period of time but meant to be continuously reviewed, whereas
a project refers to a plan to perform an act with a well-defined
duration. For this reason, this paper refers throughout to the implementation of a Data
This paper is structured as follows: first a review of the literature on the issue in question is
presented. In the following section, the conceptual framework proposed is described. Finally, some
light is shed on this work, especially the advantages brought by the research undertaken.
Literature Review
This literature review focuses on three key points concerning a program to
implement a Data Warehouse system: the known architectures, the methodologies followed
and the factors to consider when choosing the software. Attention is also given to references to
best practices that can ensure greater success, and to situations to avoid in order to minimize the risk of
failure.
Since this study aimed to identify not only the possible architectures but also the methodologies
distinguish these two concepts. In the author’s opinion, the architecture identifies the parts
(components) and their characteristics, and establishes the relationships between those parts. The
methodology is defined as the identification of a set of activities (a process) and their sequence,
leading to the ultimate goal. Thus, this section is divided into five sub-sections covering
the architectures, the methodologies, the software and the performance metrics. In
the final sub-section, some issues regarding the adoption of one or another structure are highlighted.
Architectures
Regarding the most widely used architectures, Ariyachandra and Watson (2005) distinguish
five: (1) Independent Data Marts; (2) Data Mart Bus; (3) Hub and Spoke; (4) Centralized; and (5)
Federated. Other authors use slightly different classifications. For example, Turban, Aronson,
Liang et al. (2007) distinguish only four architectures. They emphasize primarily the architectures
based on Data Marts, distinguishing whether these include Independent Data
Marts or Dependent Data Marts formed from a Data Warehouse; the remaining two are an architecture
centered on a single Enterprise Data Warehouse and a hybrid architecture for information management that is
supported neither by a Data Warehouse nor by Data Marts. Sen and Sinha (2005)
also distinguish five architectures, but with a slightly different philosophy: (1) Enterprise Data
Warehousing, (2) Data Mart, (3) Hub-and-Spoke Data Mart, (4) Enterprise Warehouse with
Based on the classification of Watson and Ariyachandra (2005), which seems the most inclusive
and discriminating, each of the architectures is organized as follows:
• Independent Data Marts – the Data Marts are designed to operate independently from
each other. Each Data Mart has its definition of the data and the dimensions and
measures between multiple Data Marts are not normalized, making it difficult to analyze
The first Data Mart is built for a single business process, using normalized dimensions and
measures, which are then reused by the other Data Marts.
• Hub and Spoke – is based on a requirement analysis for extensible enterprise level, also
developed in an iterative manner, subject by subject. Data Marts created from the data
special purposes (for example, data mining), and can have structures of normalized,
• Centralized – differs from the Hub and Spoke architecture since it does not have Data
Marts. This centralized approach allows the user access to all data in the Data
Warehouse. It also allows reducing the amount of data to transfer or change, thus
based on the business requirements, the data are accessed from these sources. Data are
logically and physically integrated using shared keys, global metadata, distributed
On this issue, we should also mention the choices of the two best-known researchers in this
area. Inmon (2002) advocates the use of the Hub and Spoke architecture. Kimball and Ross (2002)
in turn propose the use of the Data Mart Bus architecture, according to the definitions presented in
the previous paragraph. Note, however, that as regards the strategy to follow in implementing
the architecture, Kimball and Ross (2002) propose a mixed top-down and bottom-up strategy;
that is, creating individual Data Marts in a bottom-up way, but in accordance with the skeleton of
the Data Mart Bus architecture. Hence, the Data Warehouse for the organization will be the union
In addition, as a result of a study on best practices in business initiatives, Lawyer and Chowdhury
(2004) argue that the use of the Independent Data Marts architecture should be avoided, as it can
hardly provide a single source of truth for the organization, and therefore the consequences for the
Whatever the chosen architecture is, components that are always included in the
implementation of a Data Warehouse system are (Monteiro, Pinto and Costa 2003):
• OLTP (Online Transaction Processing) – registry operating systems that capture the
• ETL (Extract, Transform and Load) – process that is the first step of the task of obtaining
• DSA (Data Staging Area) – application that performs the connection between OLTP
Along with the components listed above, applications oriented to decision support are also part
of the implementation of a Data Warehouse system. Since the activities involved depend on data
quality management and on metadata management, the Data Warehouse system should include
tools for this purpose (Sen and Sinha 2005). When the data arrive at the DSA, many changes occur, such as
filtering the data (correction of typographical errors, resolution of domain conflicts,
etc.), the integration of data from multiple sources, the deletion of duplicate data and the key
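
As an illustration only, the staging changes named above (filtering, integration of several sources and deletion of duplicates) can be sketched in a few lines of Python. The record layout, the source names `crm` and `billing`, and the alias table are hypothetical and are not taken from any of the cited works; a real DSA would apply far richer rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRow:
    source: str        # hypothetical name of the OLTP system the row came from
    customer_id: str
    country: str


# Hypothetical domain-conflict table: several source spellings for one value.
COUNTRY_ALIASES = {"pt": "Portugal", "portugal": "Portugal", "prt": "Portugal"}


def clean(row):
    """Filtering step: trim whitespace and resolve domain conflicts in 'country'."""
    country = row.country.strip()
    country = COUNTRY_ALIASES.get(country.lower(), country)
    return CustomerRow(row.source, row.customer_id.strip(), country)


def stage(*sources):
    """Integrate rows from several sources and delete duplicates by customer_id."""
    seen = {}
    for rows in sources:
        for row in rows:
            cleaned = clean(row)
            # Keep the first occurrence of each customer; later copies are duplicates.
            seen.setdefault(cleaned.customer_id, cleaned)
    return list(seen.values())


crm = [CustomerRow("crm", " C1 ", "PT"), CustomerRow("crm", "C2", "Spain")]
billing = [CustomerRow("billing", "C1", "portugal"), CustomerRow("billing", "C3", "prt ")]
staged = stage(crm, billing)
for row in staged:
    print(row)
```

Keeping the first occurrence of a duplicate is a simplifying assumption; which copy should survive is a data quality decision that the organization has to make.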
Overall, the literature identifies the following sequential activities in the process of
implementing a Data Warehouse system (Sen and Sinha 2005; List, Bruckner, Machaczek et al.
2002):
• Design of architecture;
• Implementation;
• Maintenance.
However, in the author’s view and in light of what is stated in the previous sub-section,
the design of the data will follow different paths depending on the chosen architecture.
Later on, this approach is discussed and another is conceptualized, one that meets the reality
Consider first some approaches advocated by renowned researchers in this area (Sen
and Sinha 2005; List, Bruckner, Machaczek et al. 2002). For example, Kimball and Ross (2002)
argue for focusing, in the first instance, on the analytical requirements elicited from
managers in order to design dimensional Data Marts. In their approach, we should begin with the planning
of the program and then proceed to the definition of business requirements, dimensional modeling,
design of architecture, physical design, implementation, and so on. We can say that this approach is
Inmon (2002) argues in turn that, instead of starting with the requirements, we start with the data. These
are first obtained, integrated and tested. Then the programs are written according to the data and the
results are analyzed. Finally, the requirements are formulated. This approach is iterative and
the business strategy, assuming that the purpose of the organization is the same for all users. He
proposes to start by defining a first prototype based on business needs, and then to fit it to the needs
and skills of users. This can be described as a user-oriented approach (List,
Sen and Sinha (2005) present the results of a study that divides the potential suppliers of
software/support for Data Warehouse systems into three categories: (1) Technology,
(2) Infrastructure, and (3) Consultancy (see Table 1). Within these categories, each supplier is characterized
according to: core competence, the way the modeling of requirements is carried out, the models
adopted for the data modeling, whether support for normalization and denormalization is provided,
the philosophy of architecture design, the strategy followed in the implementation, if some tool to
Watson and Ariyachandra (2005) indicate that the most used platforms are those of
ORACLE, MICROSOFT and IBM. Of these suppliers, as outlined in Sen and Sinha (2005),
MICROSOFT and IBM follow the Hub and Spoke architecture, while ORACLE follows the Data
Mart Bus architecture (as defined by Watson and Ariyachandra (2005) and presented in the sub-section
“Architectures”).
Comparing these suppliers on their core competence, according to data presented in Sen and
Sinha (2005), it seems that in the Technology category the core competence lies at the level of
Database Management Systems (DBMS), whereas in the Infrastructure category the
competence focuses on business analysis software (OLAP, Data Mining, Predictive
Analysis, etc.). With regard to suppliers in the Consultancy category, their skills are also
focused on the level of Business Intelligence, and in some cases essentially devoted to Enterprise
Performance Metrics
Once implemented and after some period of use, each system must be assessed.
The assessment can be made from various perspectives, depending on the objectives of the
organization. For example, Ariyachandra and Watson (2006) suggest, for evaluating the
With regard to the cost/time pair, and based on the literature analyzed, the five
architectures defined in Ariyachandra and Watson (2005) seem to be distributed as shown in Figure 1.
Other authors also mention, as possible criteria for evaluating the performance of a Data Warehouse
system: ease of use, speed of access, understanding of the data, the ability and ease of
putting new questions to the system, and improvement in decision-making capacity (Wixom
and Watson 2001; Weir 2002; Brown 2004; Hwang and Xu 2005).
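
To make these criteria concrete, the sketch below averages user survey ratings per criterion. It is a hedged illustration, not a method proposed in the cited works: the 1 to 5 rating scale, the dictionary keys and the unweighted mean are all assumptions, and an organization could equally weight the criteria according to its own objectives.

```python
from statistics import mean

# Criteria named in the text; the identifiers themselves are illustrative.
CRITERIA = (
    "ease_of_use",
    "speed_of_access",
    "understanding_of_data",
    "ease_of_new_questions",
    "decision_making_improvement",
)


def score(responses):
    """Average each criterion over all survey responses (assumed scale: 1 to 5)."""
    result = {c: mean(r[c] for r in responses) for c in CRITERIA}
    # Unweighted overall score; weighting by business priority is left open.
    result["overall"] = mean(result[c] for c in CRITERIA)
    return result


# Two hypothetical survey responses.
responses = [
    {"ease_of_use": 5, "speed_of_access": 3, "understanding_of_data": 4,
     "ease_of_new_questions": 4, "decision_making_improvement": 4},
    {"ease_of_use": 3, "speed_of_access": 3, "understanding_of_data": 2,
     "ease_of_new_questions": 4, "decision_making_improvement": 2},
]
summary = score(responses)
print(summary)
```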
Discussion
From the literature, and as general considerations, we can say that for large firms and those
undergoing successive structural changes such as mergers or alliances, the most appropriate
architecture is the “Federated” one. When the interdependence of information carries more
weight than the other factors, the Data Mart Bus architecture should be adopted. When the
predominant factor is the strategic vision, the choice should usually fall on one of the
following architectures: Centralized or Hub and Spoke. If, in the enterprise, the strategic vision for
implementing a Data Warehouse system is still limited in scope, there are resource constraints and
expertise is low, the option should be the Independent Data Marts architecture (see Figure 2).
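
The general considerations above amount to a small decision table, which can be sketched as a lookup. The `Factor` names and the `recommend` function are hypothetical labels introduced for this illustration; the mapping itself only restates the paragraph and is a starting point for discussion, not an automatic selection procedure.

```python
from enum import Enum


class Factor(Enum):
    """Predominant factor in the organization (labels are illustrative)."""
    STRUCTURAL_CHANGE = "large firm or successive mergers/alliances"
    INFORMATION_INTERDEPENDENCE = "interdependence of information"
    STRATEGIC_VISION = "strategic vision"
    LIMITED_SCOPE = "limited scope, resource constraints, low expertise"


# Mapping restated from the general considerations in the text.
RECOMMENDED = {
    Factor.STRUCTURAL_CHANGE: ("Federated",),
    Factor.INFORMATION_INTERDEPENDENCE: ("Data Mart Bus",),
    Factor.STRATEGIC_VISION: ("Centralized", "Hub and Spoke"),
    Factor.LIMITED_SCOPE: ("Independent Data Marts",),
}


def recommend(factor):
    """Return the candidate architectures for the predominant factor."""
    return RECOMMENDED[factor]


for factor in Factor:
    print(f"{factor.value}: {', '.join(recommend(factor))}")
```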
“Methodologies” and in the vision of this author, the data-oriented approach advocated by Inmon
(2002) does not allow an organizational implementation at the strategic level (long term); and even at
the tactical level, its implementation is very limited. Nevertheless, this approach seems adequate to
manage the process at the operational level. In the context of organizational activities, this author
agrees with List, Bruckner, Machaczek et al. (2002) that a data-oriented methodology for
conducting a program to implement a Data Warehouse system is suitable for workflow activities,
because their high degree of repetition makes it possible to generate high business value with
customer focus.
The goal-oriented approach defended by Kimball and Ross (2002) seems to
support the organization at the strategic level, guiding it toward a broader future. It allows support
Finally, the user-oriented approach of Westerman (2001) seems not to
guarantee decision support, and should therefore only be used in very specific contexts. Furthermore,
because most employees tend to focus their vision on only one angle, the success of
implementing a Data Warehouse system that follows this approach alone can be seriously
compromised. A hybrid approach that combines it with one of the others may enable a
broader view.
The work of revising the state-of-the-art, described briefly in the previous section, allowed the
The development of systems is usually a complex process, mainly because of the difficulty of
aligning the characteristics of the system with the needs of an organization, but also because
controlling both the time and the costs of development is difficult (Stair and Reynolds 2008). First, it is
important that the organization understands and recognizes the activities of systems development, so
that it can embark on a program to implement a Data Warehouse system in a conscious and
considered way.
Thus, systems development, whether it involves creating a new system or
modifying an existing one, must follow five main steps: (1) Investigation – understand the
problem, that is, gain a clear understanding of the problem to be solved or opportunities to be
addressed; (2) Analysis – understand solutions, that is, clearly define the problem and the expected
opportunities; (3) Design – select and plan the best solution, that is, determine how the new system
will work to meet the business needs defined during analysis; (4) Implementation – place solution
into effect, that is, create or acquire the system components defined in the design, assemble them,
and put the new system into operation; and (5) Maintenance and Revision – evaluate results of
solution, that is, monitor and evaluate system performance, and decide on the need for possible
The conceptual framework presented in Table 2 for building a Data Warehouse system is based,
in the first instance, on the identification of the main factors justifying the need for implementing
the system for the organization in question, according to the model presented in Figure 2. This task
later makes it possible to map the choice onto the most appropriate architecture and to reflect
be sought. The aim is to identify one that can guarantee to develop or provide the specified solution
at a lower cost and/or in less time. In this process, it is also necessary to consider what kind
of support is desired, that is, whether support is wanted at the level of technology, infrastructure and/or
consultancy. It is determined which of them offers more favorable conditions under various criteria,
Thus, after choosing the software supplier, the next step is the implementation of the desired
Data Warehouse system. Once the construction of the Data Warehouse system is complete, the
evaluation stage follows. It is important to assess the financial costs and time spent, but also to understand
how well the new system suits the organizational activity. The results of this evaluation may dictate
Discussion
The conceptual framework presented results from aggregating information from the review of
the existing literature on this subject, the author's professional experience and her own in-depth
reflection on the matter. The analysis of best practices, described by some professionals in this area, was also
A good practice, for example, is to have a metadata repository that supports the import and
export, and that also has capabilities of integration with Internet technology (Weir 2002; Lawyer
Brown (2004) highlights as success factors in the implementation of a Data Warehouse system:
the adoption of an approach geared to business objectives in order to maximize the acquisition of
necessary knowledge about the organization's data and about the potential users of the system, and
the promotion of the incremental development of the system. The expected benefits are both of data
effective techniques of cleansing and integration allows to produce higher quality data relatively to
Concluding Remarks
The conceptual framework described in the previous section is founded on
the delineation of general guidelines that serve as a template for the process of building a Data
Warehouse system.
As mentioned earlier in this paper, building a Data Warehouse system is understood as a
program that has a beginning, middle and end, but whose steps are cyclical. This
means that the result of each phase can dictate, over time, changes to various aspects of the
Data Warehouse system implemented. These aspects can be at the level of the data, the architecture,
Future efforts can use the conceptual framework presented here to develop programs to
determinant of success or failure of Customer Relationship Management (CRM) systems. In the era
of business intelligence, the implementation of an effective Data Warehouse system
proves to be an increasingly important concern, especially given the emerging trend among enterprises toward
Relationship Marketing.
References
Ariyachandra, T. and H. Watson (2005). "Key factors in selecting a Data Warehouse architecture."
Business Intelligence Journal 10(2).
Ariyachandra, T. and H. Watson (2006). "Which Data Warehouse Architecture is Most
Successful?" Business Intelligence Journal 11(1).
Brown, M. (2004). "8 Characteristics of a Successful Data Warehouse". Twenty-Ninth Annual SAS
Users Group International Conference (SUGI 29), Montreal.
Hwang, M. and H. Xu (2005). "A Survey of Data Warehousing Success Issues." Business
Intelligence Journal 10(4).
Inmon, W. H. (2002). Building the Data Warehouse. New York, Wiley.
Kimball, R. and M. Ross (2002). The Data Warehouse Toolkit: The Complete Guide to
Dimensional Modeling. New York, Wiley.
Lawyer, J. and S. Chowdhury (2004). "Best Practices in Data Warehousing to Support Business
Initiatives and Needs". 37th Hawaii International Conference on System Science, IEEE.
List, B., R. M. Bruckner, K. Machaczek and J. Schiefer (2002). A Comparison of Data Warehouse
Development Methodologies: Case Study of the Process Warehouse. R. C. e. al., Springer-
Verlag Berlin Heidelberg. LNCS 2453: 203-215.
Monteiro, A., M. Pinto and R. Costa (2003). Uma Aplicação de Data Warehouse para Apoiar
Negócios. Rio de Janeiro, Brasil, Universidade do Estado do Rio de Janeiro - UERJ, IME -
Dept de Informática e Ciência da Computação.
Sen, A. and A. P. Sinha (2005). "A Comparison of Data Warehousing Methodologies."
Communications of the ACM 48(3): 79-84.
Stair, R. and G. Reynolds (2008). Fundamentals of Information Systems. Boston, Thomson Course
Technology.
Turban, E., J. E. Aronson, T.-P. Liang and R. Sharda (2007). Decision Support and Business
Intelligence Systems. New Jersey, Pearson Prentice Hall.
Vervuren, P. and F. Dietvorst (2006). "Contours of a Drug Development Data Warehouse". First
Conference of the Pharmaceutical Users Software Exchange (PhUSE), Dublin, Ireland.
Watson, H. J. and T. Ariyachandra (2005). Data Warehouse Architectures: Factors in the Selection
Decision and the Success of the Architectures, Terry College of Business, University of
Georgia.
Weir, R. (2002). "Best Practices for Implementing a Data Warehouse." Journal of Data
Warehousing 7(1).
Westerman, P. (2001). Data Warehousing using the Wal-Mart Model. Morgan Kaufmann.
Wixom, B. and H. Watson (2001). "An Empirical Investigation of the Factors Affecting Data
Warehousing Success." MIS Quarterly 25(1).
Table 1
Suppliers of Data Warehouse Systems by Category (based on Sen and Sinha (2005))
Category        Denomination
Technology      NCR/Teradata
                Oracle
                IBM
                Sybase
                Microsoft
Infrastructure  SAS
                Informatica’s Velocity
                Computer Associates
                Visible Technologies
                Hyperion’s STAR
Consultancy     SAP
                PeopleSoft
                CGEY
                Corporate Information Designs
                Creative Data
Figure 1
Distribution of Data Warehouse Architectures for the Dimensions of Cost and Time
Figure 2