UMLtoNoSQL Automatic Transformation of Conceptual Schema To NoSQL

Abstract: Volume, Variety and Velocity are the three dimensions that have fundamentally changed the tools needed to store and process Big Data effectively, giving rise to NoSQL systems that offer faster data access, better scalability and higher flexibility. While NoSQL systems have proven efficient at handling Big Data, how to automate the storage of Big Data in NoSQL systems remains an unsolved problem. One solution is to model Big Data and then define mapping rules towards the physical level. This paper proposes an automatic MDA-based approach that translates conceptual models expressed in the Unified Modeling Language (UML) into NoSQL physical models. Our approach relies on an intermediate logical model compatible with column-, document- and graph-oriented systems, which allows the developer to choose the system type that best suits the business rules and technical constraints.

Keywords: UML conceptual model; NoSQL; Big Data storage; MDA; Models Transformation

I. CONTEXT AND RESEARCH PROBLEM

Big Data has received a great deal of attention in recent years. Not only is the amount of data on a completely different level than before, but the data also differ in format, structure and source. In addition, the speed at which these data must be collected and analyzed is increasing. This has fundamentally changed the tools needed to benefit from Big Data, giving rise to new kinds of data management tools. NoSQL systems are widely accepted tools able to support larger volumes of data by providing faster data access, better scalability and higher flexibility [16].

The absence of a predefined model when creating a database is a key feature of NoSQL systems. In a table, attribute names and types are specified as each row is entered [8]. Unlike relational systems, where the model must be defined when the table is created, NoSQL systems are schema-less. This property offers undeniable flexibility that facilitates the evolution of models in NoSQL systems, but it concerns exclusively the physical level (implementation) of a database [2]. A conceptual model is still required to define how data are stored and related in the database [3]. It provides a high level of abstraction and a semantic knowledge element close to human logic, which guarantees efficient data management [1]. The Unified Modeling Language (UML) has gained much attention in this area [1].

While NoSQL systems have proven efficient at handling Big Data, how to automate the storage of Big Data in NoSQL systems remains an unsolved problem. In our view, it is important to have a precise and automatic approach that assists developers in implementing Big Data databases within NoSQL systems. One solution is to model Big Data and then define mapping rules towards the physical level. As discussed in the related work (see Section 5), only a few solutions focus on mapping UML conceptual models into NoSQL physical models.

To overcome this situation, we propose the UMLtoNoSQL approach, which automatically translates conceptual models expressed in the Unified Modeling Language (UML) into several NoSQL physical models. This approach is based on the Model Driven Architecture (MDA), best known as a framework for automatic model transformations, and allows the developer to choose the system type (column, document or graph) that best suits the business rules and technical constraints.

The rest of the paper is structured as follows. Section 2 motivates our work using a case study in the healthcare field. Section 3 introduces our approach; two transformation processes are presented in this section: the first one creates a NoSQL generic model from a UML conceptual model, and the second one generates NoSQL physical models from this generic model. Section 4 details our experiments. Section 5 reviews previous work on model transformation. Section 6 discusses our approach and announces future work. Finally, Section 7 concludes the paper.

II. ILLUSTRATIVE EXAMPLE

To motivate our work and illustrate the different steps of our approach, we introduce in this section an example of a Big Data application in the healthcare field. This application concerns international scientific programs for monitoring patients suffering from serious diseases. The main goals of such a program are (1) to collect data about the development of diseases over time, (2) to study interactions between different diseases and (3) to evaluate the short- and medium-term effects of their treatments. The medical program can last up to 3 years.
DOI 10.1109/AICCSA.2017.76
Figure 1. Overview of UMLtoNoSQL process.
Data collected from the establishments involved in this kind of program have the features of Big Data (the 3 V):
Volume: the amount of data collected from all the establishments in three years can reach several terabytes.
Variety: data created while monitoring patients come in different types; they can be (1) structured, such as the patient's vital signs (respiratory rate, blood pressure, etc.), (2) semi-structured, such as the package leaflets of medicinal products, or (3) unstructured, such as consultation summaries, paper prescriptions and radiology reports.
Velocity: some data are produced continuously by sensors; they need (near) real-time processing because they may feed time-sensitive processes (for example, some measurements, like temperature, require emergency medical treatment if they cross a given threshold).

This is a typical example in which the use of a NoSQL system is suitable. As mentioned above, this kind of system operates on a schema-less data model, enabling users to quickly and easily incorporate new data into their applications without rewriting tables. Nevertheless, there is still a need for a semantic model to know how data are structured and related in the database; this is particularly necessary to write declarative queries in which table and column names are specified.

UML is widely accepted as a standard modelling language for describing complex data. In the medical application briefly presented above, the database contains structured data, data of various types and formats (explanatory texts, medical records, x-rays, etc.), and big tables (records of variables produced by sensors). Therefore, we choose the UML class diagram to describe the medical data.

III. CONTRIBUTION

Our purpose in this paper is to assist developers in storing Big Data in NoSQL systems. To this end, we propose the UMLtoNoSQL approach, which automatically transforms a UML conceptual model describing Big Data into a NoSQL physical model.

In our approach, we introduce a logical level between the conceptual (business description) and physical (technical description) levels, in which a generic logical model is developed. This logical model exhibits a sufficient degree of independence to enable its mapping to one or more NoSQL platforms. Developers benefit from it in two ways: (1) it describes data according to the common features of NoSQL models (column, document and graph), which allows its mapping onto several platforms; (2) it abstracts away the technical details of NoSQL systems, which means that the logical level remains stable even if the NoSQL system evolves over time. In that case, it is enough to evolve the physical model and adapt the transformation rules; this simplifies the transformation process and saves time for developers.

To formalize and automate the UMLtoNoSQL process, we use the Model Driven Architecture (MDA) proposed by the OMG [4]. One of the main aims of MDA is to separate the functional specification of a system from the details of its implementation on a specific platform. This architecture defines a hierarchy of models from three points of view: Computation Independent Model (CIM), Platform Independent Model (PIM), and Platform Specific Model (PSM) [5]. Among these proposed models, we use the PIM and the PSM, linked by two transformations:

UMLtoGenericModel (1) is the first transformation (Section 3.1) in the UMLtoNoSQL process. It converts the input UML class diagram (conceptual PIM) into the generic logical model (2), which conforms to the generic logical metamodel proposed in Section 3.1.2; this metamodel describes a data structure compatible with the three types of NoSQL systems.

GenericModeltoPhysicalModel (3) is the second transformation (Section 3.2) in the UMLtoNoSQL process. It transforms the generic logical model into NoSQL physical models (PSMs) (4).

Note that the UMLtoNoSQL process generates several NoSQL physical models from a UML class diagram. To do this, it is necessary to register, for each physical model, its specific parameters (transformation rules). To illustrate our work, we have taken as examples three physical models that correspond
to the Cassandra, MongoDB and Neo4j systems. If the developer chooses to use another system, the process must be completed by adding new parameters specific to this system.

A. UMLtoGenericModel Transformation

In this section we present the UMLtoGenericModel transformation, which is the initial step of our approach presented in Figure 1. We first define the source (UML Class Diagram) and the target (Generic Logical Model), and then we focus on the transformation itself.

Source: A Class Diagram (CD) is defined as a tuple (N, C, L), where:
N is the class diagram name,
C is a set of classes. Classes are composed of structural and behavioral constituents; in this paper, we consider only the structural part. Since operations are linked to behavior, we do not take them into account. The schema of each class $c \in C$ is a tuple $(N, A, id^{c})$, where:
• c.N is the class name.
• $c.A = \{a^{c}_{1}, \dots, a^{c}_{q}\}$ is a set of q attributes. The schema of each attribute $a^{c} \in A$ is a pair (N, C), where $a^{c}.N$ is the attribute name and $a^{c}.C$ the attribute type; C can be a predefined class, i.e. a standard data type (String, Integer, Date, ...), or a business class (a class defined by the user).
• $c.id^{c}$ is a special attribute of c; it has a name $id^{c}.N$ and a type called "Oid". In this paper, an attribute whose type is "Oid" represents a unique object identifier, i.e. an attribute whose value distinguishes an object from all other objects of the same class.
L is a set of links. Each link l between n classes, with n >= 2, is defined as a tuple $(N, Ty, Pr^{l})$, where:
• l.N is the link name.
• l.Ty is the link type. In this paper, we consider only the three main types of links between classes: Association, Composition and Generalization.
• $l.Pr^{l} = \{pr^{l}_{1}, \dots, pr^{l}_{n}\}$ is a set of n pairs. $\forall i \in \{1, \dots, n\}$, $pr^{l}_{i} = (c, cr^{c})$, where $pr^{l}_{i}.c$ is a linked class and $pr^{l}_{i}.cr^{c}$ is the cardinality placed next to c. Note that $pr^{l}_{i}.cr^{c}$ can contain a null value if no cardinality is indicated next to c (as in a generalization link).

The class diagram metamodel is shown in Figure 2; this metamodel is adapted from the one proposed by the OMG [7].

Figure 2. Source Metamodel.

Target: The target of the UMLtoGenericModel transformation is a generic logical model that describes data according to the common features of the three types of NoSQL systems: column-oriented, document-oriented and graph-oriented. In the generic logical model, a DataBase (DB) is defined as a tuple (N, T, R), where:
N is the database name,
T is a set of tables. The schema of each table $t \in T$ is a tuple $(N, A, id^{t})$, where:
• t.N is the table name.
• t.A is a set of attributes [...] variable number of attributes. The schema of each attribute $a^{t} \in A$ is a pair (N, Ty), where $a^{t}.N$ is the attribute name and $a^{t}.Ty$ the attribute type.
• $t.id^{t}$ is a special attribute of t; it has a name $id^{t}.N$ and a type called "Rid". In this paper, an attribute whose type is "Rid" represents a unique row identifier, i.e. an attribute whose value distinguishes a row from all other rows of the same table.
R is a set of binary relationships. In the generic logical model there are only binary relationships between tables. Each relationship $r \in R$ between $t_{1}$ and $t_{2}$ is defined as a tuple $(N, Pr^{r})$, where:
• r.N is the relationship name.
• $r.Pr^{r} = \{pr^{r}_{1}, pr^{r}_{2}\}$ is a set of two pairs. $\forall i \in \{1, 2\}$, $pr^{r}_{i} = (t, cr^{t})$, where $pr^{r}_{i}.t$ is a related table and $pr^{r}_{i}.cr^{t}$ is the cardinality placed next to t.

The metamodel of the proposed generic logical model is shown in Figure 3. Note that an attribute value may be either atomic or complex (a set of attributes). We represent this by using the XOR constraint (a UML predefined constraint).
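As an illustration only, the source tuple (N, C, L) and the target tuple (N, T, R) defined in this section can be written down as plain data structures. The following Python sketch is not part of the UMLtoNoSQL tooling: every class and field name in it (ClassDiagram, UmlClass, LinkEnd, ident, rid, and so on) is a hypothetical name chosen to mirror the definitions, and types are kept as simple strings. It only encodes the structural constraints stated in the text, namely that a link connects at least two classes and that relationships in the generic logical model are binary.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LinkType(Enum):
    """The three link types considered in the paper."""
    ASSOCIATION = "Association"
    COMPOSITION = "Composition"
    GENERALIZATION = "Generalization"


@dataclass
class Attribute:
    name: str  # N: attribute name
    type: str  # C in the source model, Ty in the target model


# ---- Source: class diagram CD = (N, C, L) ----

@dataclass
class UmlClass:
    name: str                    # c.N
    attributes: List[Attribute]  # c.A
    ident: Attribute             # special attribute whose type is "Oid"


@dataclass
class LinkEnd:
    cls: UmlClass                # linked class
    cardinality: Optional[str]   # None when no cardinality is given


@dataclass
class Link:
    name: str            # l.N
    type: LinkType       # l.Ty
    ends: List[LinkEnd]  # the n (class, cardinality) pairs

    def __post_init__(self) -> None:
        if len(self.ends) < 2:
            raise ValueError("a link connects at least two classes")


@dataclass
class ClassDiagram:
    name: str
    classes: List[UmlClass] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)


# ---- Target: generic logical model DB = (N, T, R) ----

@dataclass
class Table:
    name: str                    # t.N
    attributes: List[Attribute]  # t.A
    rid: Attribute               # special attribute whose type is "Rid"


@dataclass
class RelationshipEnd:
    table: Table      # related table
    cardinality: str  # cardinality placed next to the table


@dataclass
class Relationship:
    name: str                    # r.N
    ends: List[RelationshipEnd]  # exactly two (table, cardinality) pairs

    def __post_init__(self) -> None:
        if len(self.ends) != 2:
            raise ValueError("relationships in the generic model are binary")


@dataclass
class DataBase:
    name: str
    tables: List[Table] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)
```

For instance, a hypothetical class `UmlClass("Patient", [Attribute("name", "String")], Attribute("id", "Oid"))` would sit in a `ClassDiagram`, while its counterpart in the generic logical model would be a `Table` carrying a "Rid" attribute. The mapping between the two is the subject of the transformation rules and is deliberately left out of this sketch.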