NoSQL Paper 2
Article
An Introduction of NoSQL Databases Based on Their
Categories and Application Industries †
Jeang-Kuo Chen * and Wei-Zhe Lee
Department of Information Management, Chaoyang University of Technology, Taichung 41349, Taiwan;
[email protected]
* Correspondence: [email protected]
† This Paper is an Extended Version of the Conference Paper (ID: 1070) in Taichung, Taiwan,
6–8 December 2018, IS3C2018.
Received: 31 January 2019; Accepted: 9 May 2019; Published: 16 May 2019
Abstract: The popularization of big data requires enterprises to store more and more data.
The data in an enterprise’s database must be accessed as fast as possible, but the Relational Database
(RDB) has a speed limitation due to the join operation. Many enterprises have switched to
a NoSQL database, which can meet the requirement of fast data access. However, there are
hundreds of NoSQL databases. It is important to select a suitable NoSQL database for a given
enterprise because this decision will affect the performance of the enterprise’s operations. In this paper,
fifteen categories of NoSQL databases are introduced to identify the characteristics of each
category. Some principles and examples are proposed for choosing an appropriate NoSQL database for
different industries.
1. Introduction
The Relational Database (RDB) has been developed from the 1970s to the present. Through a powerful
Relational Database Management System (RDBMS), the RDB is easy to use and maintain, and has become
a widely used kind of database [1]. Due to the popularization of big data acquisition technologies
and applications, enterprises need to store more data than ever before, and the enterprise’s database
should be accessible as fast as possible. To obtain complex information from multiple relations, an RDB
sometimes needs to perform SQL join operations to merge two or more relations at the same time, which
can lead to performance bottlenecks. Besides the relational data storage format, other data
storage formats have been proposed in many applications, such as key-value pairs, document-oriented,
time series, etc. As a result, more and more enterprises have decided to use NoSQL databases to store
big data [2–4].
However, there are more than 225 NoSQL databases [2]. Choosing an appropriate NoSQL
database for a specific enterprise is very important because a change of database may affect the
performance of the enterprise’s business operations. This paper introduces basic concepts, compares
the data formats and features, and lists some actual products for every category of NoSQL databases.
In addition, this paper proposes principles and key points for different types of enterprises to
choose an appropriate NoSQL database to solve their business problems and challenges.
2. Related Work
are the data structures of RDM. A relation is a two-dimensional, normalized table to organize data.
Each relation consists of two parts, relation schema and relation instances, where relation schema
includes the name of the relation, the names of the attributes in the relation and their domains; relation
instances refer to the data records stored in the relation at a specific time. Table 1 shows an example of
a relation as follows:
As part of an RDB design, integrity constraints are used to check the correctness of the data input
into the database. Integrity constraints not only prevent authorized users from storing illegal data into
the database but also avoid data inconsistency between relations. There are four common kinds of
integrity constraints described as follows:
1. Key constraint: A relation must have a unique and minimal primary key;
2. Domain constraint: An attribute value of a relation must be an atomic one belonging to the
corresponding domain of the attribute;
3. Entity integrity constraint: No attribute of a primary key may be null;
4. Referential integrity constraint: A foreign key value must either be null or match a primary
key value in the referenced relation.
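The four kinds of constraints can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names, sample values, and the 0 to 100 grade range are assumptions chosen only for illustration; they are not taken from Table 1.

```python
import sqlite3

def try_insert(conn, sql, params):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    """CREATE TABLE students (
           SID  TEXT NOT NULL PRIMARY KEY,   -- key and entity integrity
           name TEXT NOT NULL
       )"""
)
conn.execute(
    """CREATE TABLE selections (
           SID    TEXT NOT NULL REFERENCES students(SID),   -- referential integrity
           c_no   TEXT NOT NULL,
           grades INTEGER CHECK (grades BETWEEN 0 AND 100), -- domain constraint
           PRIMARY KEY (SID, c_no)
       )"""
)

# Hypothetical rows: each entry records whether the insert was accepted.
results = {
    "valid student": try_insert(
        conn, "INSERT INTO students VALUES (?, ?)", ("S001", "Amy")),
    "duplicate key": try_insert(
        conn, "INSERT INTO students VALUES (?, ?)", ("S001", "Ruby")),
    "null key": try_insert(
        conn, "INSERT INTO students VALUES (?, ?)", (None, "Cindy")),
    "valid selection": try_insert(
        conn, "INSERT INTO selections VALUES (?, ?, ?)", ("S001", "C001", 85)),
    "out-of-domain grade": try_insert(
        conn, "INSERT INTO selections VALUES (?, ?, ?)", ("S001", "C002", 150)),
    "unknown student": try_insert(
        conn, "INSERT INTO selections VALUES (?, ?, ?)", ("S999", "C001", 70)),
}
```

In this sketch only the two valid rows are accepted; every other insert violates one of the four constraints and is rejected by the database rather than by application code.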
Table 2. The common geometries of the ER-model.
ER-model elements: Entity; Weak Entity; Relationship-Entity (Bridge Entity); Relationship;
Identifying Relationship; Attribute; Key Attribute; Composite Attribute; Multivalued Attribute.
An ERD example is shown in Figure 1. This is an ERD of a simple school database with four
entities: Students, selections, courses, and employees, where the bridge-entity selections is converted
from a many-to-many relationship. The relationships between entities are described as follows:
1. A student can select many courses and vice versa;
2. An employee can teach many courses, but a course can only be taught by one employee.
Figure 1. An entity-relationship diagram (ERD) example.
2.3. Big Data
What is big data? Different ages have different answers. Today, big data refers to materials that
are difficult to store in RDBs and cannot be processed by stand-alone data analysis and statistical
tools.
Algorithms 2019, 12, 106 4 of 17
This data needs to be stored in a large parallel system with tens or hundreds of machines. NoSQL
database systems have exactly these features: they are suitable for storing big data and can quickly
access data for various application processing. Big data applies to all areas of daily life (such as social
networking, e-commerce, etc.) and scientific research (such as astronomical meteorology, clinical
medicine, etc.), and the continued growth of data has forced people to reconsider the storage and
management of data [3,4].
The features of big data (i.e., 4V) are described as follows [3].
1. Non-relational: NoSQL databases do not use the relational database model, nor do they support
SQL join operations. Unlike RDBs, which obtain advanced data through join operations,
NoSQL databases need related data to be stored together to improve the speed of data access.
2. Distributed: Data in NoSQL databases is usually stored on different servers, and the locations of
the stored data are managed by metadata.
3. Open-source: Unlike most RDBs, which require a fee to purchase, most NoSQL databases are open
source and free to download.
4. Horizontally scalable: Ordinary servers can be added or removed to match the data processing
capacity required of a NoSQL database.
5. Schema-free: Unlike RDBs, which need the database schema to be defined before data is inserted,
NoSQL databases do not need this. Therefore, NoSQL databases can flexibly add data.
6. Easy replication support: NoSQL databases mostly support master-slave replication or
peer-to-peer replication, making it easier for NoSQL databases to ensure high availability.
7. Simple API: NoSQL databases provide APIs for network delivery, data collection, etc., so that
programmers do not need to design additional programs, which makes writing programs easier.
8. BASE: BASE is an abbreviation for “basically available, soft-state, and eventual consistency,”
and the meanings are described as follows.
(1) Basically available: The DB system can execute and always provide services. Some parts
of the DB system may fail while the rest of the DB system continues
to operate. Some NoSQL DBs typically keep several copies of specific data on different
servers, which allows the DB system to respond to all queries even if a few of the servers fail.
(2) Soft-state: The DB system does not require a state of strong consistency. Strong consistency
means that no matter which replica of a certain data item is updated, all later read
operations on that data must be able to obtain the latest information.
(3) Eventual consistency: The DB system needs to meet the consistency requirement after
a certain time. Sometimes the DB may be in an inconsistent state. For example, some
NoSQL DBs keep multiple copies of certain data on multiple servers. These
copies may be inconsistent for a short time, which may happen when one copy of the data is
updated while the other copies still hold the old version. Eventually, the
replication mechanism in the NoSQL DB system will update all replicas to be consistent.
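The difference between soft state and eventual consistency can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the class ReplicatedStore, its methods, the key name, and the second price value are hypothetical, and real NoSQL systems use far more elaborate replication protocols.

```python
class ReplicatedStore:
    """Toy key-value store that keeps one copy of the data per replica."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # updates not yet propagated: (replica index, key, value)

    def write(self, key, value, replica=0):
        # The write is acknowledged after updating a single replica.
        self.replicas[replica][key] = value
        for other in range(len(self.replicas)):
            if other != replica:
                self.pending.append((other, key, value))

    def read(self, key, replica):
        # A read may return stale data if this replica has not caught up.
        return self.replicas[replica].get(key)

    def replicate(self):
        # Background replication: apply all pending updates in order.
        for replica, key, value in self.pending:
            self.replicas[replica][key] = value
        self.pending.clear()


store = ReplicatedStore(replica_count=3)
store.write("P001:price", 24999)
store.replicate()

store.write("P001:price", 19999)             # only replica 0 holds the new value
stale = store.read("P001:price", replica=2)  # replica 2 still returns the old version
store.replicate()
fresh = store.read("P001:price", replica=2)  # all replicas now agree
```

Between the second write and the call to replicate(), the replicas disagree (soft state); once replication has run, every replica returns the same value (eventual consistency).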
According to the statistics of the NoSQL database official website [2], there are currently more
than 225 NoSQL databases. Moreover, some NoSQL databases are widely used in many
famous enterprises such as Google, Yahoo, Facebook, Twitter, Taobao, Amazon, and so on [3].
(1) Hecht and Jablonski [6] evaluated the relevant technologies of some databases in the four common
NoSQL database categories (i.e., key value store, document store, wide column store, and graph
databases) to assist users in selecting an appropriate NoSQL database. Related technologies include
data models, queries, concurrency controls, partitions, and replication.
(2) Lourenço et al. [7] compared several quality attributes for several NoSQL databases. The evaluated
NoSQL databases contain Aerospike, Cassandra, Couchbase, CouchDB, HBase, MongoDB,
and Voldemort, while the quality attributes include availability, consistency, durability,
maintainability, read and write performance, recovery time, reliability, robustness, scalability, and
stabilization time.
(3) Corbellini et al. [8] reviewed the basic concepts of four common categories of NoSQL databases and
compared some databases for each category. In addition, the authors discussed how to select
an appropriate NoSQL database from existing databases. The decision-making factors include
data analysis, hardware scalability (horizontally scalable and BASE [3,4]), flexible schema, fast
deployment of servers (replication and sharding configuration), distributed technology, etc.
(4) Khazaei et al. [9] illustrated the basic concepts of four popular NoSQL database models and
evaluated some databases for each model. In this paper, the authors discussed several factors
to be considered in order to select an appropriate NoSQL database, such as data model, access
patterns, queries, non-functional requirements (including data access performance, replication,
partition, horizontally scalable, BASE [3,4], software development and maintenance, etc.).
(5) Gessert et al. [10] linked functional requirements, non-functional requirements in the NoSQL
database to the used technologies, and provided decision trees to assist users in selecting the
appropriate NoSQL database, where:
(d) Evaluated NoSQL databases contain MongoDB, Redis, HBase, Riak, and Cassandra.
(6) Davoudian et al. [11] clarified four factors for deciding on a suitable NoSQL database, namely
data model, consistency model, data partitioning, and CAP theorem, and further explained
the available strategies, features, advantages, and disadvantages of each. This is helpful for
selecting an appropriate NoSQL database.
1. A row key is an identification that has a unique value used to identify a specific record, similar to
the primary key of a relation in RDB.
2. A timestamp (abbreviated as ts) is an integer used to identify a specific version of a data value.
3. At least one column family that has the format of “Family: Qualifier = Value,” where “Family” is
the name of a column family, “Qualifier” is the name of a column qualifier, and “Value” is a real
value of a column qualifier stored in text.
4. The name of a column family needs to be defined when the table is created, but the name of a
column qualifier does not.
5. Users can find the actual data value through the value of a specific row key, the name of a specific
column family, the name of a specific column qualifier, and the value of a specific timestamp.
1. Products_Inventory is the name of the inventory table, which contains two column families,
products and inventory, and has three records with the product codes P001, P002, and P003 as
the values of the three row keys, respectively;
2. An increasing integer ti (i = 1, 2, . . . , 18) is the value of the timestamp for each column qualifier
when a data value of a column qualifier is inserted into the table;
3. Column family products includes four column qualifiers: classes, title, descriptions, and price;
their data values are, for example, “TV”, “SONY 55 inch 4K OLED Smart Networked TV”, “TBD”,
and “24999”, respectively;
4. Column family inventory includes two column qualifiers: quantity and place; their data values
are, for example, “10” and “1A”, respectively.
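The addressing scheme described above (row key, column family, column qualifier, timestamp) can be sketched as a nested mapping. This is a minimal sketch and not the storage layout of any real product: the class WideColumnTable and its methods are hypothetical, and the integer timestamps 1 to 7 stand in for the values ti, whose assignment to individual cells is an assumption.

```python
from collections import defaultdict

class WideColumnTable:
    """Toy table addressed by (row key, column family, column qualifier, timestamp)."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # column families are fixed at table creation
        # row key -> (family, qualifier) -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, ts, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers need no prior definition; values are stored as text.
        self.rows[row_key][(family, qualifier)][ts] = str(value)

    def get(self, row_key, family, qualifier, ts=None):
        versions = self.rows.get(row_key, {}).get((family, qualifier), {})
        if not versions:
            return None
        if ts is None:
            ts = max(versions)  # newest version by default
        return versions.get(ts)


table = WideColumnTable("Products_Inventory", ["products", "inventory"])
cells = [
    ("products", "classes", "TV"),
    ("products", "title", "SONY 55 inch 4K OLED Smart Networked TV"),
    ("products", "descriptions", "TBD"),
    ("products", "price", "24999"),
    ("inventory", "quantity", "10"),
    ("inventory", "place", "1A"),
]
for ts, (family, qualifier, value) in enumerate(cells, start=1):
    table.put("P001", family, qualifier, ts, value)

# A later write creates a new version of the same cell (hypothetical value).
table.put("P001", "inventory", "quantity", 7, "9")
```

Reading a cell without a timestamp returns its newest version, while passing ts selects an older one, which mirrors how a timestamp identifies a specific version of a data value.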
According to the statistics of the DB-Engines Ranking website [15], Apache Cassandra and Apache
HBase are the more widely discussed ones of the wide column store databases.
• A collection is a group of documents. The documents within a collection are usually related to the
same subject, such as employees, products, and so on.
• A document is a set of ordered key-value pairs, where key is a string used to reference a particular
value, and value can be either a string or a document.
• JSON (JavaScript Object Notation), BSON (Binary JSON), and XML (eXtensible Markup Language)
are formats commonly used to define documents.
• Embedded documents are documents within documents. An embedded document enables users
to store related data in a single document to improve database performance.
• Document store databases do not require users to formally specify the structure of documents
prior to adding documents to a collection. Therefore, document databases are called schemaless
ones. Application programs should verify rules about the structure of a document.
[
{
"c_no": "C001",
"title": "Accounting",
"credits": 3,
"instructor": "Zoe"
},
{
"c_no": "C002",
"title": "Economics",
"credits": 3,
"instructor": "Wendy"
},
{
"c_no": "C003",
"title": "Computer Science",
"credits": 3,
"instructor": "Cathy"
}
]
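The properties listed above (collections, embedded documents, and schema checks left to the application) can be sketched in a few lines. The class Collection, its required_keys parameter, and the embedded instructor documents with an office value are hypothetical and serve only as an illustration; they do not reflect the API of any particular document store.

```python
class Collection:
    """Toy document collection: a named group of JSON-like documents."""

    def __init__(self, name, required_keys=()):
        self.name = name
        # Structural rules are enforced by the application, not by the store.
        self.required_keys = set(required_keys)
        self.documents = []

    def insert(self, document):
        missing = self.required_keys - set(document)
        if missing:
            raise ValueError(f"document is missing keys: {sorted(missing)}")
        self.documents.append(document)

    def find(self, **criteria):
        """Return the documents whose top-level values match all criteria."""
        return [d for d in self.documents
                if all(d.get(k) == v for k, v in criteria.items())]


courses = Collection("courses", required_keys=["c_no", "title"])
courses.insert({"c_no": "C001", "title": "Accounting", "credits": 3,
                "instructor": {"name": "Zoe", "office": "M-301"}})  # embedded document
courses.insert({"c_no": "C002", "title": "Economics", "credits": 3,
                "instructor": {"name": "Wendy"}})  # a different structure is accepted

three_credit = [d["c_no"] for d in courses.find(credits=3)]
```

The two documents have different structures for instructor, which the collection accepts without any schema change; only the application-level check on required keys can reject a document.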
According to the statistics of the DB-Engines Ranking website [15], MongoDB and Couchbase
Server are the more widely discussed ones of the document store databases.
Suppose that an online shopping website uses a key value store database to store data as shown in
Figure 3. This database includes several namespaces, such as “Products” and “Customers” [5], where
1. The key in the namespace “Products” is the ID of a product, and the value is the details
about the product;
2. The key in the namespace “Customers” is the ID of a customer, and the value is the details
about the customer.
Products
Key     Value
P001    { "classes": "TV", "title": "LG 55 inch 4K LED TV", "price": 32000 }
P002    { "classes": "Laptop", "title": "ASUS FX503VD i7 gaming laptop", "price": 36000 }

Customers
Key     Value
C001    { "username": "Jack", "telephone": "0939-619997", "rank": "Normal" }
C002    { "username": "Cindy", "telephone": "0939-519973", "rank": "Platinum" }

Figure 3. An example of two namespaces in a key value store database.
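The two namespaces of Figure 3 can be sketched as a dictionary of dictionaries. The class KeyValueStore and its method names are hypothetical; the keys and values are those shown in the figure.

```python
class KeyValueStore:
    """Toy key-value store with namespaces; values are opaque to the store."""

    def __init__(self):
        self.namespaces = {}

    def put(self, namespace, key, value):
        self.namespaces.setdefault(namespace, {})[key] = value

    def get(self, namespace, key, default=None):
        return self.namespaces.get(namespace, {}).get(key, default)

    def delete(self, namespace, key):
        self.namespaces.get(namespace, {}).pop(key, None)


shop = KeyValueStore()
shop.put("Products", "P001",
         {"classes": "TV", "title": "LG 55 inch 4K LED TV", "price": 32000})
shop.put("Products", "P002",
         {"classes": "Laptop", "title": "ASUS FX503VD i7 gaming laptop", "price": 36000})
shop.put("Customers", "C001",
         {"username": "Jack", "telephone": "0939-619997", "rank": "Normal"})
shop.put("Customers", "C002",
         {"username": "Cindy", "telephone": "0939-519973", "rank": "Platinum"})

tv = shop.get("Products", "P001")
```

Every access goes through a namespace and a key; the store itself never inspects the value, so a lookup by anything other than the key (for example, all customers of rank Platinum) would have to scan the namespace in application code.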
According to the statistics of the DB-Engines Ranking website [15], both Redis and DynamoDB
are the more widely discussed ones of the key value store databases.
3.4. Graph Databases
The graph database model (GDM) is composed of vertices and edges [5], where
1. A vertex is an entity instance, which is equivalent to a tuple in RDM;
2. An edge is used to define the relationship between vertices;
3. Each vertex and edge contains any number of attributes that store the actual data value.
An Oceania airline is illustrated as an example. The airline needs to store flight hours among some
cities. The data can be stored in a graph database as shown in Figure 4. In this graph database, each
vertex contains some data such as nation, city, and A2C_time (time from an airport to a city center),
and each edge represents the flight duration between two cities [5].
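The airline example can be sketched as a small property graph. The class Graph, the helper trip_hours, the attribute name flight_hours, the three cities, and all numeric values are hypothetical and are not read from Figure 4; only the vertex attributes nation, city, and A2C_time follow the description above.

```python
class Graph:
    """Toy property graph: vertices and edges both carry attribute dictionaries."""

    def __init__(self):
        self.vertices = {}  # vertex id -> attributes
        self.edges = {}     # vertex id -> {neighbor id: attributes}

    def add_vertex(self, vertex_id, **attributes):
        self.vertices[vertex_id] = attributes
        self.edges.setdefault(vertex_id, {})

    def add_edge(self, a, b, **attributes):
        # Flights are assumed to run in both directions, so the edge is undirected.
        self.edges[a][b] = attributes
        self.edges[b][a] = attributes

    def neighbors(self, vertex_id):
        return sorted(self.edges[vertex_id])


def trip_hours(graph, path):
    """Total flight time along a path of directly connected cities."""
    return sum(graph.edges[a][b]["flight_hours"] for a, b in zip(path, path[1:]))


g = Graph()
g.add_vertex("SYD", nation="Australia", city="Sydney", A2C_time=0.5)
g.add_vertex("MEL", nation="Australia", city="Melbourne", A2C_time=0.5)
g.add_vertex("AKL", nation="New Zealand", city="Auckland", A2C_time=0.75)
g.add_edge("SYD", "MEL", flight_hours=1.5)
g.add_edge("SYD", "AKL", flight_hours=3.0)

total = trip_hours(g, ["MEL", "SYD", "AKL"])
```

Because relationships are stored directly with each vertex, following a route is a sequence of dictionary lookups rather than a join between tables.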
Figure 4. An example of data stored in a graph database.
According to the statistics of the DB-Engines Ranking website [15], Neo4J and FlockDB are the
more widely discussed ones of the graph databases.
3.5. Multimodel Databases
The data format of this category of NoSQL databases contains more than two data formats of
the other categories of NoSQL databases [16]. According to the statistics of the DB-Engines Ranking
website [15], OrientDB and ArangoDB are the more widely discussed ones of multimodel databases.
OrientDB contains the data formats of object database, document store, graph database, and key
value store, while ArangoDB contains the data formats of document store, graph database, and key
value store [2].
Order_details
quantity
readData()
writeData()
deleteData()
Figure 5. An example of a class diagram in an object database.
Figure 6. An example of a data file in an XML database.
3.9. Multidimensional Databases
The data in this category of NoSQL databases is stored in a multidimensional array in order to
analyze the value of each array element. Suppose a printing company stores data in a multidimensional
database as shown in Figure 7 [19]. The printing company needs to analyze the total sales amount of
printed products based on three dimensions: Products, branches, and customer rank. For example, the
company has two branches, Taipei and Tainan, three products, copy paper, photo paper, and poster,
and two customer ranks, platinum member and normal member. The boss of the printing company
wants the total sales amount of each branch, each product, and each customer rank. According to the
statistics of the DB-Engines Ranking website [15], InterSystems Caché and GT.M are the more widely
discussed ones of the multidimensional databases.
Figure 7. An example of a three-dimensional array in a multidimensional database.
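The analysis the boss asks for amounts to summing a three-dimensional array along two of its dimensions. The following is a minimal sketch: the dimension labels come from the text, while the sales amounts and the function total_by are made up for illustration and are not the values of Figure 7.

```python
products = ["copy paper", "photo paper", "poster"]
branches = ["Taipei", "Tainan"]
ranks = ["platinum member", "normal member"]

# sales[p][b][r]: sales amount for product p, branch b, customer rank r
# (hypothetical figures)
sales = [
    [[120, 80], [60, 40]],   # copy paper
    [[200, 150], [90, 70]],  # photo paper
    [[50, 30], [20, 10]],    # poster
]

def total_by(dimension):
    """Sum the array over the two other dimensions.

    dimension: 0 for products, 1 for branches, 2 for customer ranks.
    """
    labels = (products, branches, ranks)[dimension]
    totals = dict.fromkeys(labels, 0)
    for p, by_branch in enumerate(sales):
        for b, by_rank in enumerate(by_branch):
            for r, amount in enumerate(by_rank):
                totals[labels[(p, b, r)[dimension]]] += amount
    return totals

per_product = total_by(0)
per_branch = total_by(1)
per_rank = total_by(2)
```

Each call collapses the array onto one dimension, which corresponds to the total sales amount of each product, each branch, or each customer rank.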
3.10. Multivalue Databases
This category of NoSQL databases is suitable for storing data of multivalued attributes or composite
attributes [20]. An example of student data is illustrated in a table of multivalue databases as shown
in Table 4. The schema of the table is students (SID, name, and society), where name is a composite
attribute composed of the two attributes, First_name and Last_name, and society is a multivalued
attribute. There are six records in this data table; the name of each student is divided into two parts
to save into the attributes, First_name and Last_name, respectively, and the attending societies of each
student can have more than one value. According to the statistics of the DB-Engines Ranking
website [15], jBASE and Model 204 Database are the more widely discussed ones of the multivalue
databases.
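The schema students (SID, name, society) can be sketched with nested Python values, where the composite attribute is a small mapping and the multivalued attribute is a list. The three records below are hypothetical and do not reproduce the six records of Table 4; the helper functions are likewise only illustrative.

```python
# Each record keeps the composite attribute "name" as a nested mapping and the
# multivalued attribute "society" as a list (all values are made up).
students = [
    {"SID": "S001", "name": {"First_name": "Amy", "Last_name": "Lin"},
     "society": ["Chess", "Guitar"]},
    {"SID": "S002", "name": {"First_name": "John", "Last_name": "Wang"},
     "society": ["Chess"]},
    {"SID": "S003", "name": {"First_name": "Zoe", "Last_name": "Chen"},
     "society": []},
]

def members_of(society):
    """SIDs of all students who attend the given society."""
    return [s["SID"] for s in students if society in s["society"]]

def full_name(student):
    """Join the two parts of the composite attribute name."""
    return f'{student["name"]["First_name"]} {student["name"]["Last_name"]}'

chess_members = members_of("Chess")
```

In an RDB the society values would normally be moved into a separate relation to satisfy the domain constraint; here they stay inside the student record.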
Time (dd/mm/yyyy hh:mm)    Person    Current Enrolment Number
22/12/2018 12:30           Amy       1
25/12/2018 10:40           Ruby      2
28/12/2018 13:20           Cindy     3
29/12/2018 14:10           John      4
30/12/2018 15:00           Mary      5
31/12/2018 16:50           Zoe       6
Measurement Time (dd/mm/yyyy hh:mm)    Air Quality Index (AQI)    The Density of PM2.5
01/01/2018 00:00                       156                        45
01/01/2018 01:00                       101                        29
01/01/2018 02:00                       97                         19
...                                    ...                        ...
31/12/2018 21:00                       133                        34
31/12/2018 22:00                       135                        36
31/12/2018 23:00                       141                        43
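Reading the air quality table above as time series data, a typical access pattern is a range query over the measurement time followed by an aggregate. The sketch below uses only the six rows that are shown; the functions parse, query, and mean_aqi are hypothetical helpers, not the interface of any time series database.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

def parse(ts):
    """Parse a timestamp written as dd/mm/yyyy hh:mm."""
    return datetime.strptime(ts, "%d/%m/%Y %H:%M")

# (measurement time, AQI, PM2.5 density), kept sorted by time
series = [
    (parse("01/01/2018 00:00"), 156, 45),
    (parse("01/01/2018 01:00"), 101, 29),
    (parse("01/01/2018 02:00"), 97, 19),
    (parse("31/12/2018 21:00"), 133, 34),
    (parse("31/12/2018 22:00"), 135, 36),
    (parse("31/12/2018 23:00"), 141, 43),
]
times = [t for t, _, _ in series]

def query(start, end):
    """Return all measurements with start <= time <= end."""
    lo = bisect_left(times, parse(start))
    hi = bisect_right(times, parse(end))
    return series[lo:hi]

def mean_aqi(start, end):
    """Average AQI over a time range, or None if the range holds no data."""
    rows = query(start, end)
    return sum(aqi for _, aqi, _ in rows) / len(rows) if rows else None

morning = mean_aqi("01/01/2018 00:00", "01/01/2018 02:00")
```

Because the records are ordered by time, a range query needs only two binary searches, which is the property that storage ordered by timestamp is meant to exploit.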
3.16. Summary
The basic concepts of each category of NoSQL databases have been described. All the
categories were then analyzed, showing that each category of NoSQL database is suitable for
processing data with certain features. The results are summarized in Table 7.
1. Understand the current problems, goals, and challenges of the corporate operational database.
2. The engineers of the IT center or the database administrators (DBAs) must decide whether to keep
using the current RDB or switch to a NoSQL database, based on the needs of the enterprise and
their own expertise.
3. If switching to a NoSQL database, the IT engineers or DBAs first select a suitable category of
NoSQL databases based on the features and formats of the enterprise’s operating data.
4. When deciding which NoSQL database to choose within that category, the IT engineers or DBAs can
base the decision on the needs of the enterprise, the characteristics of each database, and the
reputation and popularity of each database on websites (for example, the DB-Engines Ranking
website [15] and vsChart [25]). The more websites are consulted, the more reliable the picture of
each database’s reputation and popularity, and the more likely it is that the right NoSQL database
will be found.
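Steps 3 and 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature labels, the function name choose_database, and the popularity scores are hypothetical, and the scores in particular are placeholders rather than real ranking figures. The feature-to-category mapping lists only the three pairings used in the case studies of this paper.

```python
from typing import Dict, List

# Hypothetical mapping from a dominant data feature to a NoSQL category.
# Only the three pairings used in this paper's case studies are listed.
CATEGORY_BY_FEATURE: Dict[str, str] = {
    "field_lookup": "wide column store",
    "semi_structured": "document store",
    "recommendation": "graph database",
}


def choose_database(feature: str,
                    candidates: Dict[str, List[str]],
                    popularity: Dict[str, float]) -> str:
    """Steps 3 and 4: pick a category, then the most popular candidate."""
    category = CATEGORY_BY_FEATURE[feature]           # step 3
    options = candidates[category]
    # Step 4 (simplified): rank only by an externally supplied score.
    return max(options, key=lambda db: popularity.get(db, 0.0))


# Placeholder inputs; the scores are made up for illustration and are
# NOT real ranking figures.
candidates = {"document store": ["MongoDB", "Couchbase Server"]}
popularity = {"MongoDB": 2.0, "Couchbase Server": 1.0}
print(choose_database("semi_structured", candidates, popularity))
```

In practice, step 4 also weighs the needs of the enterprise and the characteristics of each database, which a single score cannot capture.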
large amount of data. Therefore, the supervisor recommends using a NoSQL database as a solution,
because a NoSQL database can merge some tables of the RDB in advance so that, when the database is
queried, the desired data can be read quickly without waiting for join operations.
After the business owner agrees, the head of the information department decides which
NoSQL database to use. The decision process is as follows.
1. The most suitable category of NoSQL database is the wide column store because access to the
database often requires searching for data in a specific field.
2. According to the DB-Engines Ranking website [15], the wide column stores most commonly
discussed on the internet are Apache Cassandra and Apache HBase.
3. According to the experimental results of Chen et al. [26], Apache HBase takes less time to read
data than Apache Cassandra. Therefore, Apache HBase is recommended as the NoSQL database for
the enterprise.
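The idea of merging tables in advance, mentioned in this case, can be illustrated with a small Python sketch. It assumes two hypothetical relational tables, customers and orders, and builds one pre-joined row per order at write time so that a read becomes a single key lookup. The table names, column names, and values are invented for illustration; the sketch models the idea with plain dictionaries and is not HBase code.

```python
# Hypothetical relational tables (normalized).
customers = {1: {"name": "Amy"}, 2: {"name": "John"}}
orders = [
    {"order_id": "o1", "customer_id": 1, "item": "phone"},
    {"order_id": "o2", "customer_id": 2, "item": "tablet"},
]


def read_with_join(order_id: str) -> dict:
    """Relational style: scan orders, then look up the customer (a join)."""
    order = next(o for o in orders if o["order_id"] == order_id)
    return {**order, "customer_name": customers[order["customer_id"]]["name"]}


# Pre-merged store: the join is done once, when the data is written.
merged = {
    o["order_id"]: {**o, "customer_name": customers[o["customer_id"]]["name"]}
    for o in orders
}


def read_premerged(order_id: str) -> dict:
    """NoSQL style: a single key lookup, no join at query time."""
    return merged[order_id]


# Both paths return the same data; only the read-time work differs.
print(read_premerged("o1") == read_with_join("o1"))
```

The trade-off is that the pre-merged rows duplicate customer data, so a change to a customer must be propagated to every merged row that contains it.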
1. Since the newspaper needs to collect the files generated by a large number of instant messages
every day, such as tens of thousands of online news items and related readers’ messages, it is
necessary to replace the RDB with a NoSQL database.
2. Of the fifteen categories of NoSQL databases available, the one found to be suitable for storing
news multimedia materials is the document store.
3. According to the DB-Engines Ranking website [15], the document stores most often discussed on
the internet are MongoDB and Couchbase Server. Since the former has a higher market share than
the latter, MongoDB is recommended as the NoSQL database for the company.
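To illustrate why semi-structured news material fits a document store, the sketch below represents two hypothetical news items as JSON-like documents with different fields and serializes them with Python's standard json module. The field names and contents are invented, and no MongoDB API is used; the dictionary named store merely stands in for a document collection.

```python
import json

# Two hypothetical news documents; they need not share the same fields.
articles = [
    {
        "_id": "n1",
        "title": "Sample headline",
        "body": "Text of the article.",
        "comments": [{"reader": "r1", "text": "First comment"}],
    },
    {
        "_id": "n2",
        "title": "Photo story",
        "images": ["photo1.jpg", "photo2.jpg"],  # no body, no comments
    },
]

# Each document is self-describing, so it can be stored and read whole.
store = {doc["_id"]: json.dumps(doc) for doc in articles}

loaded = json.loads(store["n1"])
print(loaded["title"], len(loaded["comments"]))
```

Because readers' messages are nested inside the article document, an article and its comments can be fetched together without joining separate tables.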
For the above reasons, the head of the information department in this corporation decides to replace
the RDB with a NoSQL database. The decision process is described as follows.
1. The most suitable category of NoSQL database for the enterprise is the graph database because,
as described in Table 7, graph databases are the most suitable for recommendation systems.
2. According to the statistics of the DB-Engines Ranking website [15], the most discussed graph
databases are Neo4j and FlockDB.
3. Since Neo4j has the best market share among all graph databases [27], Neo4j is recommended as
the NoSQL database for this enterprise.
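A recommendation query over graph-shaped data can be sketched as follows. The snippet assumes a hypothetical purchase graph stored as adjacency sets and recommends products bought by customers who share at least one purchase with the target customer. The customer and product names are invented, and this plain-Python traversal only illustrates the idea; it is not Neo4j code.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical edges: customer -> set of purchased products.
purchases: Dict[str, Set[str]] = {
    "amy": {"phone", "case"},
    "john": {"phone", "charger"},
    "mary": {"case", "charger", "cable"},
}


def recommend(customer: str) -> List[str]:
    """Walk customer -> product -> other customer -> other product."""
    owned = purchases[customer]
    scores: Counter = Counter()
    for other, items in purchases.items():
        if other == customer or not (owned & items):
            continue  # skip self and customers with no shared purchase
        for item in items - owned:
            scores[item] += 1  # one vote per co-purchasing customer
    # Highest vote count first; ties broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(recommend("amy"))
```

A graph database stores such customer-to-product links as edges, so this kind of multi-hop traversal does not require repeated joins across tables.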
5. Conclusions
The main contents of this paper are as follows. First, we introduce the basic characteristics
of the fifteen categories of NoSQL databases (such as the wide column store, document store, key
value store, and graph databases) listed on the NoSQL database website [2]. Then we analyze
the characteristics of the data that each category is suited to processing. Next, we propose some
principles and key points to help enterprises find an appropriate NoSQL database among more than
225 candidates when they intend to replace an RDB with a NoSQL database. Finally, we illustrate
three cases, a 3C shopping website, a newspaper, and the US retail industry, to demonstrate how a
particular company can choose a suitable NoSQL database to improve its competitiveness and
customer services.
In summary, if a company abandons its RDB and switches to a NoSQL DB, it needs to consider
the characteristics of its data in order to find the right DB. The transaction data of the
e-commerce industry often needs to be related, so the suitable NoSQL DB category is the wide column
store, and Apache HBase is a good choice. The news materials of the news industry are
semi-structured, so the suitable NoSQL DB category is the document store, and MongoDB is the better
choice. Retailer data needs to be used by a recommendation system, so the suitable NoSQL DB category
is the graph database, and Neo4j is the best choice. We hope that these principles and examples will
help decision makers to change databases correctly.
Author Contributions: Conceptualization, J.-K.C.; methodology, J.-K.C. and W.-Z.L.; writing—original draft
preparation, W.-Z.L.; writing—review and editing, J.-K.C.; supervision, J.-K.C.; project administration, J.-K.C.
Funding: This research received no external funding.
Acknowledgments: Thanks to the reviewers for providing a lot of valuable comments to make this paper
more complete.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Chen, H.A. Database System: Concept, Design, and Implementation, 3rd ed.; XBOOK MARKETING Co., Ltd.:
Taipei, Taiwan, 2013. (In Chinese)
2. NoSQL Databases. Available online: http://nosql-database.org/ (accessed on 20 January 2019).
3. Pi, S.J. Establish the Cornerstone of Big Data: NoSQL Database Technique, 2nd ed.; TopTeam Information Co.,
Ltd.: Taipei, Taiwan, 2016. (In Chinese)
4. Lu, J.H. Challenge Big Data, How to Process Big Data in Facebook, Google, Amazon? Use NoSQL to Get 10 Billion
Annual Hard Disk Data, 2nd ed.; TopTeam Information Co., Ltd.: Taipei, Taiwan, 2015. (In Chinese)
5. Sullivan, D. NoSQL for Mere Mortals, 1st ed.; Pearson P T R: London, UK, 2015.
6. Hecht, R.; Jablonski, S. NoSQL Evaluation: A Use Case Oriented Survey. In Proceedings of the 2011
International Conference on Cloud and Service Computing, Hong Kong, China, 12–14 December 2011.
7. Lourenço, J.R.; Cabral, B.; Carreiro, P.; Vieira, M.; Bernardino, J. Choosing the right NoSQL database for the
job: A quality attribute evaluation. J. Big Data 2015, 2, 18:1–18:26. [CrossRef]
8. Corbellini, A.; Mateos, C.; Zunino, A.; Godoy, D.; Schiaffino, S. Persisting big-data: The NoSQL landscape.
Inf. Syst. 2016, 63, 1–23. [CrossRef]
9. Khazaei, H.; Fokaefs, M.; Zareian, S.; Beigi-Mohammadi, N.; Ramprasad, B.; Shtern, M.; Gaikwad, P.;
Litoiu, M. How do I Choose the Right NoSQL Solution? A Comprehensive Theoretical and Experimental
Survey. Big Data Inf. Anal. 2016, 1, 185–216.
10. Gessert, F.; Wingerath, W.; Friedrich, S.; Ritter, N. NoSQL database systems: A survey and decision guidance.
Softw.-Intensiv. Cyber-Phys. Syst. 2017, 32, 353–365. [CrossRef]
11. Davoudian, A.; Chen, L.; Liu, M. A Survey on NoSQL Stores. ACM Comput. Surv. (CSUR) 2018, 51, 40:1–40:43.
[CrossRef]
12. Dimiduk, N.; Khurana, A. HBase in Action, 1st ed.; O’Reilly & Associates Inc.: New York, NY, USA, 2012.
13. Lu, J.H. Hadoop: Practical Technical Handbook, 2nd ed.; TopTeam Information Co., Ltd.: Taipei, Taiwan, 2014.
(In Chinese)
14. George, L. HBase: The Definitive Guide, 1st ed.; O’Reilly & Associates Inc.: New York, NY, USA, 2011.
15. DB-Engines Ranking. Available online: https://db-engines.com/en/ranking (accessed on 4 March 2018).
16. Multi-Model Databases (Wikipedia). Available online: https://en.wikipedia.org/wiki/Multi-model_database
(accessed on 15 June 2018).
17. Wu, R.H. Object-Oriented System Analysis and Design: An MDA Approach with UML, 4th ed.; BestWise Co.,
Ltd.: Taipei, Taiwan, 2013. (In Chinese)
18. Document-Oriented Database (Wikipedia). Available online:
https://en.wikipedia.org/wiki/Document-oriented_database (accessed on 15 June 2018).
19. Multidimensional Databases. Available online:
https://docs.oracle.com/cd/E12478_01/rpas/pdf/150/html/classic_client_user_guide/basic_rpas_concepts/multidimensional_databases.htm
(accessed on 5 May 2018).
20. MultiValue (Wikipedia). Available online: https://en.wikipedia.org/wiki/MultiValue (accessed on
15 June 2018).
21. Introducing to Event Sourcing. Available online: https://msdn.microsoft.com/en-us/library/jj591559.aspx#sec1
(accessed on 16 January 2018).
22. Time Series Database (Wikipedia). Available online: https://en.wikipedia.org/wiki/Time_series_database
(accessed on 16 January 2018).
23. Time Series (Wikipedia). Available online: https://en.wikipedia.org/wiki/Time_series (accessed on
16 January 2018).
24. Central Weather Bureau. Available online: https://www.cwb.gov.tw/eng/index.htm (accessed on 10 July 2018).
25. vsChart.com: The Comparison Wiki: Database List. Available online: http://vschart.com/list/database/
(accessed on 18 February 2019).
26. Chen, C.Y.; Chang, B.R.; Tsai, H.F.; Guo, C.L. Empirical Analysis of High Efficient Remote Cloud Data Center
Backup Using HBase and Cassandra. Sci. Progr. 2014, 2015, 1–10.
27. Neo4j: Walmart Case Study. Available online: https://neo4j.com/case-studies/walmart/ (accessed on
10 December 2018).
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).