Graph NoSQL Data Warehouse Creation
ABSTRACT
Over the last few years, NoSQL systems have been gaining strong popularity, and a number of decision makers are using them to implement their warehouses. In recent years, many web applications have been moving towards the use of data in the form of graphs. For example, social media such as Facebook, LinkedIn and Twitter have accelerated the emergence of NoSQL databases. In particular, they have driven the rise of graph-oriented databases, which are the basic format in which data in these media is stored. Based on these findings, and given the absence of a clear approach for creating a data warehouse under a NoSQL model, we propose in this paper an approach to create a graph-oriented data warehouse. We propose the transformation of the Dimensional Fact Model into a Graph Dimensional Model. Then, we implement the Graph Dimensional Model using Java routines in the Talend Data Integration tool (TOS).
CCS CONCEPTS
Information systems → Data management systems → Database design and models → Graph-based database models; Hierarchical data models
KEYWORDS
NoSQL Data Warehouse, Graph-oriented NoSQL model, Graph-oriented Data Warehouse
ACM Reference format:
Amal Sellami, Ahlem Nabli, Faiez Gargouri. 2020. Graph NoSQL Data Warehouse Creation. In 22nd International Conference on Information Integration and Web-based Applications & Services (iiWAS2020), November 30 - December 2nd, Chiang Mai, Thailand. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3428757.3429141
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
iiWAS '20, November 30-December 2, 2020, Chiang Mai, Thailand
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-8922-8/20/11…$15.00
https://doi.org/10.1145/3428757.3429141
1 Introduction
By integrating external data from social networks with a company's data warehouse, decision-makers can better anticipate changes in customer behavior, strengthen supply chains, improve the effectiveness of marketing campaigns, and enhance business continuity. Meanwhile, big volumes of data cannot be processed by traditional warehouses and OLAP servers based on RDBMS solutions.
In recent years, many web applications have been moving towards the use of data in the form of graphs. For example, social media such as Facebook, LinkedIn and Twitter have accelerated the emergence of NoSQL databases, whether key-value, document-oriented, column-oriented or graph-oriented. Many works have therefore proposed NoSQL databases as a new kind of storage for the DW. However, the majority of those works build the data warehouse on column-oriented or document-oriented NoSQL storage. These two models share the major disadvantage of the relational database, namely the join. This disadvantage usually becomes apparent at query time, where joins make querying painful. On the other hand, the major advantage of the graph-oriented model is its ability to support complex queries without using joins. Based on these findings, we leaned towards a graph paradigm-based model for data warehouse implementation. Indeed, graph-oriented databases are among the most powerful NoSQL database models, and they rely on graph theory to manipulate and store data. Designed to explore highly connected data, the graph-oriented database structure enables complex data to be modeled in a simple and intuitive way, with no difference between data and relationships.
Thus, given the advantages of NoSQL graph databases, we propose in this paper a general approach for creating a graph-oriented data warehouse. We propose to transform the conceptual data warehouse model (Dimensional Fact Model) into a logical model based on a graph formalism, named Graph Dimensional Model (GDM). The mapping to NoSQL data storage is therefore viewed as a migration from explicitly defined data structures towards implicit ones. These databases assign the responsibility of maintaining the schema to the developer. Consequently, the schema is created while inserting data, at the Extract, Transform and Load (ETL) level. For the GDM implementation, we propose to use Java routines in the Talend Data Integration tool (TOS).
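In a schema-free store, the schema only comes into existence when rows are inserted at the ETL level. This can be illustrated with a small Java routine of the kind TOS lets developers write. It is a minimal sketch, not the authors' routine. It assumes a Neo4j-style graph store driven by Cypher statements, since the target DBMS is not named in this section. The class `GraphRoutines`, its methods, and the use of `Name` as the identifying key of the Tag dimension are hypothetical. In TOS, user routines are public classes exposing public static methods in the `routines` package. The package declaration is omitted here so the example compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical TOS-style user routine: public class, public static methods.
 * It turns one ETL row into a Cypher MERGE statement, so the node label and
 * its property keys are decided only when the row is written.
 */
public class GraphRoutines {

    /** Renders a value as a Cypher literal; strings are quoted and escaped. */
    public static String literal(Object value) {
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        String s = String.valueOf(value).replace("\\", "\\\\").replace("'", "\\'");
        return "'" + s + "'";
    }

    /**
     * Builds "MERGE (n:Label {key: v}) SET n.p1 = v1, ..." for one row.
     * keyName identifies the node; every other column becomes a property.
     * Column names are assumed to be valid Cypher identifiers.
     */
    public static String mergeNode(String label, String keyName, Map<String, Object> row) {
        String merge = "MERGE (n:" + label + " {" + keyName + ": " + literal(row.get(keyName)) + "})";
        String sets = row.entrySet().stream()
                .filter(e -> !e.getKey().equals(keyName))
                .map(e -> "n." + e.getKey() + " = " + literal(e.getValue()))
                .collect(Collectors.joining(", "));
        return sets.isEmpty() ? merge : merge + " SET " + sets;
    }

    public static void main(String[] args) {
        // One row of the Tag dimension (Name, Type, URL); values are placeholders.
        Map<String, Object> tagRow = new LinkedHashMap<>();
        tagRow.put("Name", "Databases");
        tagRow.put("Type", "Topic");
        tagRow.put("URL", "http://example.org/tag/1");
        System.out.println(mergeNode("Tag", "Name", tagRow));
    }
}
```

Each call fixes the node label and property keys only at write time, and `MERGE` keeps repeated loads of the same Tag idempotent. A production routine would bind values as Cypher parameters rather than concatenating literals. Literals are used here only so the generated statement is visible.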
This paper is organized as follows. Section 2 presents the state of the art. In Section 3, we give an overview of our approach. The dimensional fact model is presented in Section 4. Our proposal for the graph dimensional model is described in Section 5. Section 6 addresses the creation of GDM with TOS. In Section 7, we conclude this paper and give some future research directions.
2 Related Work
In the literature, a number of researchers have recognized the deficiencies of traditional ROLAP data storage. They have proposed approaches for migrating from relational databases to NoSQL ones (indirect approaches). However, few works have focused on transforming the multidimensional conceptual model into a NoSQL logical one (direct approaches). The authors of [1, 2, 3] defined logical models for NoSQL data stores (column-oriented and document-oriented). They proposed a set of rules to map a star schema into two NoSQL models: column-oriented (using HBase) and document-oriented (using MongoDB).
In [4], the authors proposed transformation rules that ensure the successful translation from a conceptual DW schema to two logical NoSQL models (column-oriented and document-oriented). They also proposed two possible transformations, namely simple and hierarchical. The simple transformation stores the fact and the dimensions in one column-family/collection. The hierarchical transformation uses different column-families/collections for storing the fact and the dimensions, while making the hierarchies explicit. In [5], the authors analyze several issues, including modeling, querying, data loading and OLAP cuboids. They compared document-oriented models (with and without normalization) to analogous relational database models. In [6], the authors focused on simplifying the querying of heterogeneous data in graph-oriented NoSQL systems.
Moreover, in [7, 8], the authors proposed three approaches that allow big data warehouses to be implemented under the column-oriented NoSQL model, but without formalizing the modeling process. The approaches differ in their structure and in the attribute types used when the conceptual model is mapped to the logical model.
A recent work [9] proposed data storage models for graph cubes. It introduced a document-oriented model and a column-oriented model for storing graph cube data. It also implemented the roll-up operation over the MongoDB document-oriented database and the Cassandra column-oriented database. In [10], the authors propose a method for transforming object-relational databases into NoSQL databases, more specifically into document-oriented databases.
The majority of works proposed the transformation and creation of data warehouses in the column-oriented or document-oriented NoSQL model. It should be noted that the column- and document-oriented models share the major disadvantage of the relational database, namely the join. This disadvantage becomes apparent at query time, where joins make querying painful. The major advantage of the graph-oriented model is its ability to support complex queries without using joins. Based on these findings, we thought it would be interesting to define a process to implement a data warehouse using a graph NoSQL database.
3 Approach Overview
In the absence of a clear approach that allows the creation of a graph-oriented data warehouse, we propose in this section a new approach for building a data warehouse under a graph-oriented system. This approach is composed of three phases (Figure 1).
The first phase concerns the definition of the transformation rules from the conceptual model of the DW to the graph-oriented model. This transformation can be processed with or without normalization.
The second phase concerns the implementation of the normalized and denormalized transformations. These transformations are implemented under TOS.
The third phase concerns the comparative study, whose aim is to choose the better of the normalized and denormalized transformations. The comparison will be based on loading time, querying time and storage space. In the remaining sections of this paper, we focus on the denormalized transformation and its implementation.
Figure 1: Approach overview
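The first phase above allows the transformation to be processed with or without normalization. As a rough illustration of that distinction, the sketch below builds a tiny in-memory property graph for one instance of the Forum fact used in the paper's GDM example. The fact carries the measures Nb_Post, Nb_Tag, Nb_Message, Nb_Comment, Nb_like_Comment and Nb_like_Post, and the Tag dimension carries Name, Type and URL. This is an assumption-laden sketch, not the GDM rules of Sections 4 and 5. The edge types (`HAS_TAG`, `AT_DATE`, `IN_MONTH`, `IN_YEAR`), the Day/Month/Year hierarchy of the Date dimension, and all property values are hypothetical. The Person and Message dimensions are left out for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory property graph contrasting two encodings of one Forum fact. */
public class GdmSketch {

    /** A labelled node whose properties hold measures or dimension parameters. */
    static final class Node {
        final String label;
        final Map<String, Object> props;
        Node(String label, Map<String, Object> props) {
            this.label = label;
            this.props = new LinkedHashMap<>(props);
        }
    }

    /** A directed, typed edge between two nodes. */
    static final class Edge {
        final Node from;
        final String type;
        final Node to;
        Edge(Node from, String type, Node to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Node node(String label, Map<String, Object> props) {
        Node n = new Node(label, props);
        nodes.add(n);
        return n;
    }

    void edge(Node from, String type, Node to) {
        edges.add(new Edge(from, type, to));
    }

    /** First node reachable from 'from' through an edge of the given type, or null. */
    Node follow(Node from, String type) {
        for (Edge e : edges) {
            if (e.from == from && e.type.equals(type)) {
                return e.to;
            }
        }
        return null;
    }

    /** Fact node (placeholder measure values) and Tag dimension, common to both variants. */
    static Node addForumAndTag(GdmSketch g) {
        Node forum = g.node("Forum", Map.of(
                "Nb_Post", 12, "Nb_Tag", 3, "Nb_Message", 30,
                "Nb_Comment", 18, "Nb_like_Comment", 7, "Nb_like_Post", 25));
        Node tag = g.node("Tag", Map.of(
                "Name", "Databases", "Type", "Topic", "URL", "http://example.org/tag/1"));
        g.edge(forum, "HAS_TAG", tag);
        return forum;
    }

    /** Denormalized: every level of the Date hierarchy is a property of one Date node. */
    static GdmSketch denormalized() {
        GdmSketch g = new GdmSketch();
        Node forum = addForumAndTag(g);
        Node date = g.node("Date", Map.of("Day", "2020-11-30", "Month", "2020-11", "Year", 2020));
        g.edge(forum, "AT_DATE", date);
        return g;
    }

    /** Normalized: each level of the Date hierarchy is its own node. */
    static GdmSketch normalized() {
        GdmSketch g = new GdmSketch();
        Node forum = addForumAndTag(g);
        Node day = g.node("Day", Map.of("Day", "2020-11-30"));
        Node month = g.node("Month", Map.of("Month", "2020-11"));
        Node year = g.node("Year", Map.of("Year", 2020));
        g.edge(forum, "AT_DATE", day);
        g.edge(day, "IN_MONTH", month);
        g.edge(month, "IN_YEAR", year);
        return g;
    }

    public static void main(String[] args) {
        GdmSketch d = denormalized();
        GdmSketch n = normalized();
        // The Forum fact node is the first node created in both graphs.
        Node dForum = d.nodes.get(0);
        Node nForum = n.nodes.get(0);
        // Roll-up to Year: one hop when denormalized, three hops when normalized.
        Object y1 = d.follow(dForum, "AT_DATE").props.get("Year");
        Object y2 = n.follow(n.follow(n.follow(nForum, "AT_DATE"), "IN_MONTH"), "IN_YEAR").props.get("Year");
        System.out.println(d.nodes.size() + " nodes, " + d.edges.size() + " edges, year " + y1);
        System.out.println(n.nodes.size() + " nodes, " + n.edges.size() + " edges, year " + y2);
    }
}
```

Under these assumptions, the denormalized variant stores the Date dimension in one node (3 nodes, 2 edges) and reaches the year in one hop. The normalized variant needs 5 nodes, 4 edges and three hops, but avoids repeating Month and Year values on every Date node. Trade-offs of this kind are what the third phase is meant to measure through loading time, querying time and storage space.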
Figure 5: Example of a Job with TOS
Figure 6 shows an example of instantiation of GDM. Figure 6 (a) presents an instance of the fact Forum with its measures (Nb_Post, Nb_Tag, Nb_Message, Nb_Comment, Nb_like_Comment, Nb_like_Post) for given dimensions (Person, Tags, Message, Date). Figure 6 (b) describes the dimension Tag with its parameters (Name, Type, URL) stored as properties.
Figure 6: Example of instantiation of GDM
7 Conclusion
Facing the wide development of social media, a huge amount of data is now continuously available for decision support. Relational systems lack scalability and handle big data inefficiently. It is therefore vital to extract, transform and load these data into a graph NoSQL data warehouse.
In this paper, we identified two transformations, named normalized and denormalized respectively. We focused on the denormalized transformation and proposed a graph-based model for data warehouse implementation.
As future work, we aim to implement the normalized transformation and carry out a comparative study in order to choose the better transformation.
REFERENCES
[1] M. Chevalier, M. El Malki, A. Kopliku, O. Teste, R. Tournier (2015). Implementing Multidimensional Data Warehouses into NoSQL. International Conference on Enterprise Information Systems.
[2] M. Chevalier, M. El Malki, A. Kopliku, O. Teste, R. Tournier (2015). Implementation of Multidimensional Databases in Column-Oriented NoSQL Systems. In: East European Conference on Advances in Databases and Information Systems.
[3] M. Chevalier, M. El Malki, A., O. Teste, R. Tournier (2015). Implementation of Multidimensional Databases with Document-Oriented NoSQL. International Conference on Big Data Analytics and Knowledge Discovery. Springer, Cham, pp. 379-390.
[4] Rania Yangui, Ahlem Nabli, and Faiez Gargouri (2016). Automatic Transformation of Data Warehouse Schema to NoSQL Data Base: Comparative Study. Procedia Computer Science, vol. 96, pp. 255-264.
[5] M. Chevalier, M. El Malki, A. Kopliku, O. Teste, R. Tournier (2017). Entrepôts de données orientés documents : cuboïdes étendus : Modèles et cuboïdes NoSQL orientés documents [Document-oriented data warehouses: extended cuboids: NoSQL document-oriented models and cuboids]. Document Numérique.
[6] Mohammed El Malki, Hamdi Ben Hamadou, Max Chevalier, André Péninou, and Olivier Teste (2018). Querying Heterogeneous Data in Graph-Oriented NoSQL Systems. Big Data Analytics and Knowledge Discovery: 20th International Conference, DaWaK 2018.
[7] K. Dehdouh, O. Boussaid, and F. Bentayeb (2014). Columnar NoSQL Star Schema Benchmark. In 4th International Conference on Model and Data Engineering (MEDI), LNCS 8748, pp. 281-288. Springer.
[8] K. Dehdouh, O. Boussaid, and F. Bentayeb (2015). Using the Column Oriented NoSQL Model for Implementing Big Data Warehouses. Proceedings of the 21st International Conference on Parallel and Distributed Processing Techniques and Applications, pp. 469-475.
[9] Zakia Challal, Wafaa Bala, Hanifa Mokeddem, Kamel Boukhalfa, Omar Boussaid, and Elhadj Benkhelifa (2019). Document-oriented versus Column-oriented Data Storage for Social Graph Data Warehouse.
[10] Aicha Aggoune and Mohamed Sofiane Namoune (2020). A Method for Transforming Object-relational to Document-oriented Databases. 2020 International Conference on Mathematics and Information Technology, Adrar, Algeria, February 18-19, 2020, p. 154.
[11] A. Prat and A. Averbuch. Benchmark Design for Navigational Pattern Matching Benchmarking. http://ldbcouncil.org/sites/default/files/LDBC_D3.3.34.pdf.
[12] Amal Sellami, Ahlem Nabli, and Faiez Gargouri (2018). Transformation of Data Warehouse Schema to NoSQL Graph Data Base. Intelligent Systems Design and Applications: 18th International Conference on Intelligent Systems Design and Applications (ISDA 2018), Volume 2.