Graph NoSQL Data Warehouse Creation

Amal Sellami
Institute of Computer Science and Multimedia
MIRACL laboratory, University of Sfax, Tunisia
[email protected]

Ahlem Nabli
Al-Baha University, Faculty of Computer Science and Information Technology, KSA
MIRACL laboratory, University of Sfax, Tunisia
[email protected]

Faiez Gargouri
Institute of Computer Science and Multimedia
MIRACL laboratory, University of Sfax, Tunisia
[email protected]

ABSTRACT
Over the last few years, NoSQL systems have gained strong popularity, and a number of decision makers are using them to implement their data warehouses. In recent years, many web applications have moved towards the use of data in the form of graphs. For example, social media such as Facebook, LinkedIn and Twitter have accelerated the emergence of NoSQL databases, and in particular of graph-oriented databases, which represent the basic format in which data in these media is stored. Based on these findings, and given the absence of a clear approach for creating a data warehouse under the NoSQL model, we propose in this paper an approach to create a graph-oriented data warehouse. We propose the transformation of the Dimensional Fact Model into a Graph Dimensional Model. Then, we implement the Graph Dimensional Model using Java routines in the Talend Data Integration tool (TOS).

CCS CONCEPTS
• Information systems → Data management systems → Database design and models → Graph-based database models → Hierarchical data models

KEYWORDS
NoSQL Data Warehouse, Graph-oriented NoSQL model, Graph-oriented Data Warehouse

ACM Reference format:
Amal Sellami, Ahlem Nabli, Faiez Gargouri. 2020. Graph NoSQL Data Warehouse Creation. In 22nd International Conference on Information Integration and Web-based Applications & Services (iiWAS2020), November 30 - December 2nd, Chiang Mai, Thailand. ACM, New York, NY, USA 5 pages. https://doi.org/10.1145/3428757.3429141

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
iiWAS '20, November 30-December 2, 2020, Chiang Mai, Thailand
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-8922-8/20/11…$15.00
https://doi.org/10.1145/3428757.3429141

1 Introduction
By integrating external data from social networks with a company's data warehouse, decision-makers can better anticipate changes in customer behavior, strengthen supply chains, improve the effectiveness of marketing campaigns, and enhance business continuity. Meanwhile, big volumes of data cannot be processed by traditional warehouses and OLAP servers that are based on RDBMS solutions.
In recent years, many web applications have moved towards the use of data in the form of graphs. For example, social media such as Facebook, LinkedIn and Twitter have accelerated the emergence of NoSQL databases of several kinds: key-value, document-oriented, column-oriented and graph-oriented. Many works have therefore proposed the use of NoSQL databases as a new kind of storage for the data warehouse (DW). However, the majority of those works create the data warehouse under column-oriented or document-oriented NoSQL storage. The column-oriented and document-oriented models share the major disadvantage of the relational database, which is the join. Joins typically surface at query time, where they make querying laborious. On the other hand, the main advantage of the graph-oriented model is its ability to support complex queries without using joins. Based on these findings, we leaned towards a graph-based model for data warehouse implementation. In fact, the graph-oriented database is one of the most powerful NoSQL database models; it relies on graph theory to manipulate and store data. Designed to explore highly connected data, the graph-oriented database structure enables the modeling of complex data in a simple and intuitive way, where there is no difference between data and relationships.
Thus, given the advantages of the NoSQL graph database, we propose in this paper a general approach for creating a graph-oriented data warehouse. We propose to transform the conceptual data warehouse model (Dimensional Fact Model) into a logical model based on graph formalism, named the Graph Dimensional Model (GDM). The mapping to NoSQL data storage can be viewed as a migration from explicitly defined data structures towards implicit ones. These databases assign the responsibility for maintaining the schema to the developer. Consequently, the creation of a schema occurs while inserting data at the Extract, Transform and Load (ETL) level. For the GDM implementation, we propose to use Java routines in the Talend Data Integration tool (TOS).

This paper is organized as follows. Section 2 presents the state of the art. In Section 3, we give an overview of our approach. The dimensional fact model is presented in Section 4. Our proposal for the graph dimensional model is described in Section 5. Section 6 addresses the creation of the GDM with TOS. In Section 7, we conclude the paper and give some future research directions.

2 Related Work
In the literature, a number of researchers have recognized the deficiencies of traditional ROLAP data storage and have proposed approaches for the migration from relational databases to NoSQL ones (indirect approaches). However, few works have focused on the transformation of the multidimensional conceptual model into a NoSQL logical one (direct approaches). The authors of [1, 2, 3] have tried to define logical models for NoSQL data stores (column-oriented and document-oriented). They proposed a set of rules to map a star schema into two NoSQL models: column-oriented (using HBase) and document-oriented (using MongoDB).
In [4], the authors proposed transformation rules that ensure the successful translation from a conceptual DW schema to two logical NoSQL models (column-oriented and document-oriented). They also proposed two possible transformations, namely simple and hierarchical. The first one stores the fact and dimensions in one column-family/collection. The second one uses different column-families/collections for storing the fact and dimensions while making hierarchies explicit. In [5], the authors analyze several issues including modeling, querying, loading data and OLAP cuboids. They compared document-oriented models (with and without normalization) to analogous relational database models. In [6], the authors focused on simplifying the querying of heterogeneous data in graph-oriented NoSQL systems.
Moreover, in [7, 8], the authors proposed three approaches that allow big data warehouses to be implemented under the column-oriented NoSQL model, but without formalizing the modeling process. The approaches differ in the structure and the attribute types used when mapping the conceptual model into the logical model.
A recent work [9] proposed data storage models for graph cubes by introducing a document-oriented model and a column-oriented model for storing graph cube data, and by implementing the roll-up operation over the MongoDB document-oriented database and the Cassandra column-oriented database. In [10], the authors propose a method for transforming object-relational databases to NoSQL databases, more specifically to document-oriented databases.
The majority of works proposed the transformation and creation of data warehouses in the column-oriented or document-oriented NoSQL model. It should be noted that the column-oriented and document-oriented models share the major disadvantage of the relational database, which is the join. Joins have to be handled at query time, which makes querying laborious. The main advantage of the graph-oriented model is its ability to support complex queries without using joins. Based on these findings, we thought it would be interesting to define a process to implement a data warehouse using a graph NoSQL database.

3 Approach Overview
In the absence of a clear approach for creating a graph-oriented data warehouse, we propose in this section a new approach for building a data warehouse under a graph-oriented system. This approach is composed of three phases (Figure 1).
The first phase concerns the definition of the transformation rules from the conceptual model of the DW to the graph-oriented model. This transformation can be processed with or without normalization.
The second phase concerns the implementation of the normalized and denormalized transformations. These transformations are implemented under TOS.
The third phase concerns the comparative study, whose goal is to choose the best transformation between the normalized and denormalized ones. The comparison is based on loading time, querying time and storage space. In the remaining sections of this paper, we focus on the denormalized transformation and its implementation.

Figure 1: Approach overview

4 DFM from LDBC’s Social Network

The main aim of our approach is to determine the conceptual model of the data warehouse from a given data source. As a benchmark data source, we propose to use the Linked Data Benchmark Council Social Network Benchmark (LDBC-SNB) [11].

Figure 2: The LDBC-SNB data schema

Figure 2 shows the LDBC data schema in UML. The schema defines the structure of the data used in the benchmark in terms of entities and their relations. LDBC represents a snapshot of the activity of a social network during a period of time. It includes several entities such as Persons, Organizations, and Places. The schema also models the way that persons interact, by means of the friendship relations established with other persons, and the sharing of content such as messages (both textual and images), replies to messages and likes of messages. People form groups to talk about specific topics, which are represented as tags. The conceptual model of the data warehouse schema generated from the LDBC-SNB data source is illustrated in Figure 3.

Figure 3: Dimensional Fact Model (DFM)

5 GDM: Graph Dimensional Model

The Graph Dimensional Model (GDM) is the logical model of a data warehouse for a graph-oriented database. Obtaining the GDM requires applying transformation rules that map the conceptual model (DFM) to the GDM.
Recall that a graph-oriented database is composed of a set of nodes, edges, and properties. Each node has properties and labels. The relations connecting the nodes can also have properties.
In [12], we proposed two types of transformation: normalized and denormalized. In this section, we present the denormalized transformation. This transformation ensures the mapping to the NoSQL model while highlighting the concepts of the Multidimensional Schema (MS), but without detailing the hierarchies. In this transformation, each fact and each dimension (with its parameters) is transformed into a node according to the following rules:

Rule 1: Transformation of a fact and its measures to the graph-oriented model.

Rule.1: Fact/Measures Transformation.
Each fact is transformed into a node. The first label of the node is the type of the multidimensional concept, which is ‘fact’; the name of the fact is then added as a second label on the same node. Each measure is transformed into a property of the fact node.
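As an illustration of Rule 1, the following Java fragment is a minimal sketch under stated assumptions, not the routine used in this work. The classes GraphNode and RuleOneSketch, the method factToNode and the measure values are hypothetical names and data introduced only for this example; the fragment models a node as a list of labels plus a map of properties.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory node of a property graph: labels plus key-value properties. */
final class GraphNode {
    final List<String> labels = new ArrayList<>();
    final Map<String, Object> properties = new LinkedHashMap<>();

    @Override
    public String toString() {
        return "(:" + String.join(":", labels) + " " + properties + ")";
    }
}

/** Hypothetical helper illustrating Rule 1 (fact and measures). */
final class RuleOneSketch {

    /** Builds the node for a fact: two labels, and one property per measure. */
    static GraphNode factToNode(String factName, Map<String, Integer> measures) {
        GraphNode node = new GraphNode();
        node.labels.add("Fact");          // first label: the multidimensional concept
        node.labels.add(factName);        // second label: the name of the fact
        node.properties.putAll(measures); // each measure becomes a property of the node
        return node;
    }

    public static void main(String[] args) {
        Map<String, Integer> measures = new LinkedHashMap<>();
        measures.put("Nb_Post", 12);      // illustrative values, not benchmark results
        measures.put("Nb_Comment", 40);
        System.out.println(factToNode("Forum", measures));
    }
}
```

The same structure would serve for a dimension node, with the label 'Dimension' in place of 'Fact' and the identifier and parameters in place of the measures.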

Rule 2: Transformation of a dimension and its attributes (Strong and Weak) to the graph-oriented model.

Rule.2: Dimension/Parameters Transformation.



Each dimension is transformed into a node. The first label of the node is the type of the multidimensional concept (in this case, ‘dimension’). Then, the name of the dimension is added as a second label on the same node. Next, the identifier is transformed into a property of the node. Finally, any weak attribute associated with the identifier is transformed into a property of the same node.
Each parameter is transformed into a property of the (dimension) node. After that, each weak attribute is represented as a property.

Rule 3: Transformation of the link between the fact and dimension to the graph-oriented model.

Rule.3: Link fact-dimension Transformation.
Each link between a fact and a dimension is represented as a relation whose source node is the node modeling the fact and whose destination node is the node modeling the dimension. The relation is named ‘link fact-dimension’.

Applying the proposed rules to the dimensional fact model of Figure 3 yields the logical model of the data warehouse in graph formalism, illustrated in Figure 4.

Figure 4: Graph Dimensional Model (GDM)

At this level, a correspondence table is generated to keep track of the different transformations. Table 1 presents an excerpt of the generated correspondence table (CT).

Table 1: Correspondence Table excerpt

Object source   Type             Operation          Target data   Type
Forum           Fact             Fact Transf        Forum         Node
Nb_Post         Measures         Measures           Nb_Post       Property
Person          Dimension        Dimension Transf   Person        Node
Id_Pers         Weak attribute   Parameter Transf   Id_Pers       Property

This table is useful for the ETL process. To load data into the NoSQL DW, data must first be identified and extracted from the source. The data must then be transformed and verified before being loaded into Neo4j. Moreover, loading data requires the structure of the target to be known in advance. As NoSQL databases are schema-less, this increases the need to extend the existing ETL tool so that it can create the data warehouse while integrating data. The ETL tool should also adapt to constant changes, so that executable code can be produced and modified quickly.

6 GDM creation with TOS under Neo4j
To implement the graph-oriented DW, we chose Neo4j. Neo4j is a high-performance NoSQL graph database with the features of a mature and robust database, and it is arguably the most popular graph database. It was developed primarily for Java applications, but it also supports Python. The graph model in Neo4j has three characteristics: properties (key-value pairs) can be added to both nodes and edges; only edges can be associated with a type (e.g. “KNOWS”); and edges can be specified as directed or undirected. Neo4j uses the following index mechanism: a super reference node is connected to all the nodes by a special edge type, “REFERENCE”. This makes it possible to create multiple indexes and to distinguish them by different edge types.
Therefore, to create a NoSQL DW, we are required to write a program that relies on some form of implicit schema. In this work, schema migration to the graph-oriented model is done according to the proposed transformation rules. These rules are implemented using Java routines integrated with the data integration tool “Talend for Big Data” (TOS).
A Job is a graphical design of one or more connected components, such as tFileInputDelimited (PersonFile, TagFile, etc.), tMap and tneo4joutput (‘Neo4j’), as depicted in Figure 5. A routine, in contrast, is a piece of more complex Java code, generally used to optimize data processing and extend Job capabilities. The GDM shown in Figure 4 was created and loaded from the LDBC-SNB benchmark data source, modeled in UML in Figure 2.
Table 2 presents the input data files with their numbers of rows and extract times.

Table 2: Obtained results

Input File       Number of rows   Extract Time
PersonFile       1527 rows        1527 rows in 0,3s
TagsFile         16079 rows       16079 rows in 0,9s
CommentsFile     151042 rows      151042 rows in 0,1s
PostFile         135700 rows      135700 rows in 2,21s
Forum_FactFile   13750 rows       13750 rows in 0,6s
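The routines themselves are not listed in this paper. As a hedged sketch of what such a routine could look like, the Java class below builds Cypher statements for the three transformation rules of Section 5. Everything in it is an assumption made for illustration: the class name GdmCypherSketch, its method names, the use of generated Cypher text (rather than the Neo4j output component), and the sample values. It also assumes that fact and dimension names are valid Cypher identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical routine-style class: static helpers that build Cypher text. */
final class GdmCypherSketch {

    /** Renders properties as a Cypher map literal, e.g. {Id_Pers: 7, Name: 'Ana'}. */
    static String props(Map<String, Object> properties) {
        StringJoiner joiner = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            Object v = e.getValue();
            // Numbers are written as-is; other values become escaped string literals.
            String literal = (v instanceof Number)
                    ? v.toString()
                    : "'" + String.valueOf(v).replace("\\", "\\\\").replace("'", "\\'") + "'";
            joiner.add(e.getKey() + ": " + literal);
        }
        return joiner.toString();
    }

    /** Rules 1 and 2: a node with the concept as first label and its name as second label. */
    static String nodeCypher(String variable, String concept, String name,
                             Map<String, Object> properties) {
        return "CREATE (" + variable + ":" + concept + ":" + name + " " + props(properties) + ")";
    }

    /** Rule 3: a relation from the fact node to the dimension node. */
    static String linkCypher(String factVariable, String dimensionVariable) {
        // Backticks are needed because the relation name contains a space and a hyphen.
        return "CREATE (" + factVariable + ")-[:`link fact-dimension`]->(" + dimensionVariable + ")";
    }

    public static void main(String[] args) {
        Map<String, Object> forum = new LinkedHashMap<>();
        forum.put("Nb_Post", 12);        // illustrative values only
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("Id_Pers", 7);
        person.put("Name", "Ana");       // hypothetical parameter, not listed in Table 1
        System.out.println(nodeCypher("f", "Fact", "Forum", forum));
        System.out.println(nodeCypher("d", "Dimension", "Person", person));
        System.out.println(linkCypher("f", "d"));
    }
}
```

Because the three clauses share the variables f and d, they can be sent to the database as a single statement, which creates the fact node, the dimension node and the link between them for one input row.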

Figure 5: Example of a Job with TOS

Figure 6 shows an example of instantiation of the GDM. Figure 6 (a) presents an instance of the fact Forum with its measures (Nb_Post, Nb_Tag, Nb_Message, Nb_Comment, Nb_like_Comment, Nb_like_Post) for given dimensions (Person, Tags, Message, Date). Figure 6 (b) describes the dimension Tag with its parameters saved as properties (Name, Type, URL).

Figure 6: Example of instantiation of GDM

7 Conclusion
With the wide development of social media, a huge amount of data is now continuously available for decision support. Since relational systems scale poorly and handle big data inefficiently, it is vital to extract, transform and load the data into a graph NoSQL data warehouse.
In this paper, we identified two transformations, named normalized and denormalized. We focused on the denormalized transformation and proposed a graph-based model for data warehouse implementation.
As future work, we aim to implement the normalized transformation and carry out a comparative study in order to choose the best transformation.

REFERENCES
[1] M. Chevalier, M. El Malki, A. Kopliku, O. Teste, R. Tournier (2015). Implementing Multidimensional Data Warehouses into NoSQL. International Conference on Enterprise Information Systems.
[2] M. Chevalier, M. El Malki, A. Kopliku, O. Teste, R. Tournier (2015). Implementation of Multidimensional Databases in Column-Oriented NoSQL Systems. In: East European Conference on Advances in Databases and Information Systems.
[3] M. Chevalier, M. El Malki, A. Kopliku, O. Teste, R. Tournier (2015). Implementation of Multidimensional Databases with Document-Oriented NoSQL. International Conference on Big Data Analytics and Knowledge Discovery. Springer, Cham, 2015. p. 379-390.
[4] Rania Yangui, Ahlem Nabli, and Faiez Gargouri (2016). Automatic Transformation of Data Warehouse Schema to NoSQL Data Base: Comparative Study. Procedia Computer Science, 2016, vol. 96, p. 255-264.
[5] M. Chevalier, M. El Malki, A. Kopliku, O. Teste, R. Tournier (2017). Entrepôts de données orientés documents : cuboïdes étendus : Modèles et cuboïdes NoSQL orientés documents. Document Numérique.
[6] Mohammed El Malki, Hamdi Ben Hamadou, Max Chevalier, André Péninou, and Olivier Teste (2018). Querying Heterogeneous Data in Graph Oriented NoSQL Systems. Big Data Analytics and Knowledge Discovery - 20th International Conference, DaWaK 2018.
[7] K. Dehdouh, O. Boussaid, F. Bentayeb (2014). Columnar NoSQL star schema benchmark. In 4th International Conference on Model and Data Engineering (MEDI), LNCS 8748, pp. 281-288. Springer.
[8] K. Dehdouh, O. Boussaid, F. Bentayeb (2015). Using the column oriented NoSQL model for implementing big data warehouses. Proceedings of the 21st International Conference on Parallel and Distributed Processing Techniques and Applications, pp. 469-475.
[9] Zakia Challal, Wafaa Bala, Hanifa Mokeddem, Kamel Boukhalfa, Omar Boussaid and Elhadj Benkhelifa (2019). Document-oriented versus Column-oriented Data Storage for Social Graph Data Warehouse.
[10] Aicha Aggoune, Mohamed Sofiane Namoune (2020). A Method for Transforming Object-relational to Document-oriented Databases. 2020 International Conference on Mathematics and Information Technology, Adrar, Algeria, February 18-19, 2020 154.
[11] A. Prat and A. Averbuch. Benchmark design for navigational pattern matching benchmarking. http://ldbcouncil.org/sites/default/files/LDBC_D3.3.34.pdf.
[12] Amal Sellami, Ahlem Nabli, Faiez Gargouri (2018). Transformation of Data Warehouse Schema To NoSQL Graph Data Base. Intelligent Systems Design and Applications - 18th International Conference on Intelligent Systems Design and Applications (ISDA 2018), Volume 2.
