An Introduction of NoSQL Databases Based On Their
Abstract: The popularization of big data requires enterprises to store more and more data.
The data in an enterprise's database must be accessed as fast as possible, but the speed of a Relational
Database (RDB) is limited by join operations. Many enterprises have therefore switched to NoSQL
databases, which can meet the requirement of fast data access. However, there are hundreds of NoSQL
databases, and selecting a suitable one is important because the decision affects the performance of
the enterprise's operations. In this paper, fifteen categories of NoSQL databases are introduced to
identify the characteristics of each category. Some principles and examples are proposed for choosing
an appropriate NoSQL database in different industries.
1. Introduction
The Relational Database (RDB) has been developed since the 1970s. Backed by a powerful
Relational Database Management System (RDBMS), an RDB is easy to use and maintain, and it has become
a widely used kind of database [1]. With the popularization of big data acquisition technologies
and applications, enterprises need to store more data than ever before, and they want their databases
to be accessed as fast as possible. To obtain complex information from multiple relations, an RDB
sometimes needs to perform SQL join operations that merge two or more relations, which can lead to
performance bottlenecks. Besides the relational format, many applications have adopted other data
storage formats, such as key-value pairs, documents, and time series. As a result, more and more
enterprises have decided to use NoSQL databases to store big data [2–4].
However, there are more than 225 NoSQL databases [2]. Choosing an appropriate NoSQL database for a
specific enterprise is important because changing the database may affect the performance of its
business operations. This paper introduces the basic concepts, compares the data formats and features,
and lists some actual products for every category of NoSQL databases. In addition, it proposes
principles and key points to help different types of enterprises choose an appropriate NoSQL database
to solve their business problems and challenges.
2. Related Work
Relations are the data structures of RDM. A relation is a two-dimensional, normalized table that organizes
data. Each relation consists of two parts, a relation schema and relation instances: the relation schema
includes the name of the relation and the names and domains of its attributes, while the relation
instances are the data records stored in the relation at a specific time. Table 1 shows an example of a
relation, where:
1. The name of this relation is Students;
2. The names of the attributes in this relation are SID, name, telephone, and birthday, respectively;
3. The domain of each attribute is the set of acceptable data values for that attribute; for
example, the domain of attribute birthday is the set of dates;
4. There are five data records in this relation.
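The schema/instance split can be sketched as a small Python data structure. This is a minimal illustration, not the paper's own code: each domain is approximated by a Python type, and because the actual records of Table 1 are not reproduced here, the two records below (names, telephone numbers, dates) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Attribute:
    name: str
    domain: type  # the set of acceptable values, approximated by a Python type


@dataclass
class Relation:
    name: str                        # relation schema: name of the relation...
    schema: tuple[Attribute, ...]    # ...plus the attributes and their domains
    instance: list[tuple]            # relation instance: records at a given time

    def attribute_names(self) -> list[str]:
        return [a.name for a in self.schema]


students = Relation(
    name="Students",
    schema=(
        Attribute("SID", str),
        Attribute("name", str),
        Attribute("telephone", str),
        Attribute("birthday", date),
    ),
    instance=[
        ("S001", "Alice", "0900-000001", date(2000, 1, 15)),  # hypothetical record
        ("S002", "Bob", "0900-000002", date(2001, 6, 30)),    # hypothetical record
    ],
)

print(students.name, students.attribute_names(), len(students.instance))
```

The schema stays fixed while the instance changes as records are inserted or deleted, which is why the two parts are kept separate.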
As part of an RDB design, integrity constraints are used to check the correctness of data entered
into the database. Integrity constraints not only prevent authorized users from storing invalid data
in the database but also avoid data inconsistency between relations. There are four common kinds
of integrity constraints:
1. Key constraint: A relation must have a unique and minimal primary key;
2. Domain constraint: An attribute value of a relation must be an atomic value belonging to the
corresponding domain of the attribute;
3. Entity integrity constraint: No attribute of a primary key may be null;
4. Referential integrity constraint: A foreign key value must either match a primary key value in the
referenced relation or be null.
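The four constraints can be expressed as simple checks over records. The sketch below is illustrative only: real RDBMSs enforce these rules declaratively (PRIMARY KEY, NOT NULL, FOREIGN KEY, column types), the domain check only tests Python types, and the Students and Selections records, including the attribute c_no in Selections, are hypothetical.

```python
from datetime import date

# Hypothetical Students relation: attribute -> domain (approximated by a type).
STUDENTS_SCHEMA = {"SID": str, "name": str, "birthday": date}
STUDENTS_KEY = ("SID",)


def check_key(records, key):
    """Key constraint: no two records share the same primary-key value."""
    seen = set()
    for r in records:
        k = tuple(r[a] for a in key)
        if k in seen:
            return False
        seen.add(k)
    return True


def check_domain(records, schema):
    """Domain constraint: every non-null value has the type of its domain."""
    return all(
        r[a] is None or isinstance(r[a], dom)
        for r in records
        for a, dom in schema.items()
    )


def check_entity_integrity(records, key):
    """Entity integrity: no primary-key attribute may be null."""
    return all(r[a] is not None for r in records for a in key)


def check_referential(child, fk, parent, pk):
    """Referential integrity: a non-null foreign key must match a parent key."""
    parent_keys = {p[pk] for p in parent}
    return all(c[fk] is None or c[fk] in parent_keys for c in child)


students = [
    {"SID": "S001", "name": "Alice", "birthday": date(2000, 1, 15)},
    {"SID": "S002", "name": "Bob", "birthday": date(2001, 6, 30)},
]
selections = [
    {"SID": "S001", "c_no": "C001"},
    {"SID": "S999", "c_no": "C002"},  # S999 does not exist in Students
]

print(check_key(students, STUDENTS_KEY))                       # True
print(check_domain(students, STUDENTS_SCHEMA))                 # True
print(check_entity_integrity(students, STUDENTS_KEY))          # True
print(check_referential(selections, "SID", students, "SID"))  # False
```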
[ERD notation symbols (figure not recovered): Weak Entity; Relationship-Entity (Bridge Entity); Relationship; Identifying Relationship; Attribute; Key Attribute; Composite Attribute; Multivalued Attribute.]
Algorithms 2019, 12, 106 3 of 16
An ERD example is shown in Figure 1. This is an ERD of a simple school database with four
entities, Students, Selections, Courses, and Employees, where the bridge entity Selections is converted
from a many-to-many relationship. The relationships between entities are described as follows:
[Figure 1 (diagram not recovered): ERD of the school database with the entities Students, Selections, Courses, and Employees.]
4. Velocity: Enterprises need to know not only how to collect data quickly but also how to process
and analyze it and return the results to users fast enough to meet their immediate needs.
According to the statistics of the NoSQL database official website [2], there are currently more than
225 NoSQL databases. Moreover, some NoSQL databases are widely used by well-known enterprises such as
Google, Yahoo, Facebook, Twitter, Taobao, and Amazon [3].
3. At least one column family with the format “Family: Qualifier = Value”, where “Family” is the
name of a column family, “Qualifier” is the name of a column qualifier, and “Value” is the actual
value of the column qualifier, stored as text;
4. The name of a column family must be defined when the table is created, but the name of a column
qualifier need not be;
5. Users can find an actual data value through a specific row key, column family name, column
qualifier name, and timestamp.
An example is illustrated as follows. An inventory table of 3C products in a wide column store
database is shown in Table 3, where:
1. Products_Inventory is the name of the inventory table, which contains two column families,
products and inventory, and has three records whose row keys are the product codes P001, P002,
and P003, respectively;
2. An increasing integer ti (i = 1, 2, …, 18) is the timestamp recorded when a data value of a column
qualifier is inserted into the table;
3. Column family products includes four column qualifiers, classes, title, descriptions, and price,
whose data values are, for example, “TV”, “SONY 55 inch 4K OLED Smart Networked TV”,
“TBD”, and “24999”, respectively;
4. Column family inventory includes two column qualifiers, quantity and place, whose data values
are, for example, “10” and “1A”, respectively.
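The addressing scheme described above (row key, column family, column qualifier, timestamp) can be modeled as nested maps. The following is a toy in-memory sketch, not the API of HBase or Cassandra. The class name `WideColumnTable` and its `put`/`get` methods are hypothetical, and the update of quantity to “8” is invented to show versioning. Column families are fixed when the table is created, while qualifiers are created on first write.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy model: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # column families fixed at creation
        self.rows = defaultdict(lambda: defaultdict(dict))
        self._clock = 0  # increasing integer timestamp, like t_i in Table 3

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._clock += 1
        self.rows[row_key][f"{family}:{qualifier}"][self._clock] = value
        return self._clock

    def get(self, row_key, family, qualifier, timestamp=None):
        # .get() on a defaultdict does not create missing entries.
        versions = self.rows.get(row_key, {}).get(f"{family}:{qualifier}", {})
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]  # latest version
        return versions.get(timestamp)


inventory = WideColumnTable("Products_Inventory", ["products", "inventory"])
for qualifier, value in [("classes", "TV"),
                         ("title", "SONY 55 inch 4K OLED Smart Networked TV"),
                         ("descriptions", "TBD"), ("price", "24999")]:
    inventory.put("P001", "products", qualifier, value)        # t1..t4
for qualifier, value in [("quantity", "10"), ("place", "1A")]:
    inventory.put("P001", "inventory", qualifier, value)       # t5, t6
inventory.put("P001", "inventory", "quantity", "8")            # hypothetical update, t7

print(inventory.get("P001", "products", "price"))          # 24999
print(inventory.get("P001", "inventory", "quantity"))      # 8 (latest version)
print(inventory.get("P001", "inventory", "quantity", 5))   # 10 (version at t5)
```

Keeping old versions under their timestamps, rather than overwriting in place, is what lets a read address a specific timestamp, as item 5 of the data model describes.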
According to the statistics of the DB-Engines Ranking website [15], Apache Cassandra and
Apache HBase are the most widely discussed wide column store databases.
[
  {
    "c_no": "C001",
    "title": "Accounting",
    "credits": 3,
    "instructor": "Zoe"
  },
  {
    "c_no": "C002",
    "title": "Economics",
    "credits": 3,
    "instructor": "Wendy"
  },
  {
    "c_no": "C003",
    "title": "Computer Science",
    "credits": 3,
    "instructor": "Cathy"
  }
]
Figure 2. An example of a collection in a document store database.
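Documents in a collection are queried by field values rather than by joins. The sketch below loads the collection of Figure 2 and filters it. The function `find` only echoes the name of MongoDB's method; it is a plain Python helper written for this illustration, not MongoDB's API. The fourth course, with its extra prerequisite field, is hypothetical. It shows that documents in one collection need not share the same fields.

```python
import json

collection_json = """
[
  {"c_no": "C001", "title": "Accounting", "credits": 3, "instructor": "Zoe"},
  {"c_no": "C002", "title": "Economics", "credits": 3, "instructor": "Wendy"},
  {"c_no": "C003", "title": "Computer Science", "credits": 3, "instructor": "Cathy"}
]
"""


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion (toy filter)."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]


courses = json.loads(collection_json)
print([d["title"] for d in find(courses, instructor="Wendy")])  # ['Economics']

# Hypothetical document with a field the others lack (schema flexibility).
courses.append({"c_no": "C004", "title": "Statistics", "credits": 2,
                "prerequisite": "C001"})
print(len(find(courses, credits=3)))                   # 3
print(find(courses, prerequisite="C001")[0]["c_no"])  # C004
```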
According to the statistics of the DB-Engines Ranking website [15], MongoDB and
Couchbase Server are the most widely discussed document store databases.
3. In key value store databases, operations on values are performed through keys: users can retrieve,
set, and delete a value by its key;
4. A namespace is a logical data structure that can contain any number of key-value pairs.
Suppose that an online shopping website uses a key value store database to store data as shown
in Figure 3. This database includes several namespaces, such as “Products” and “Customers” [5],
where:
1. The key in the namespace “Products” is the product ID, and the value is the details of the
product;
2. The key in the namespace “Customers” is the customer ID, and the value is the details of the
customer.
Products namespace
Key   Value
P001  { "classes": "TV", "title": "LG 55 inch 4K LED TV", "price": 32000 }
P002  { "classes": "Laptop", "title": "ASUS FX503VD i7 gaming laptop", "price": 36000 }

Customers namespace
Key   Value
C001  { "username": "Jack", "telephone": "0939-619997", "rank": "Normal" }
C002  { "username": "Cindy", "telephone": "0939-519973", "rank": "Platinum" }
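The retrieve/set/delete-by-key operations and the role of namespaces can be sketched with nested dictionaries. This is a toy model, not the API of Redis or DynamoDB. The class `KeyValueStore` and its method names are hypothetical, and the values are taken from the example above. The store treats each value as opaque: it is found only through its key and is never searched by content.

```python
class KeyValueStore:
    """Toy key-value store: each namespace is an independent dict of key -> value."""

    def __init__(self):
        self._namespaces = {}

    def set(self, namespace, key, value):
        self._namespaces.setdefault(namespace, {})[key] = value

    def get(self, namespace, key, default=None):
        return self._namespaces.get(namespace, {}).get(key, default)

    def delete(self, namespace, key):
        self._namespaces.get(namespace, {}).pop(key, None)


store = KeyValueStore()
store.set("Products", "P001",
          {"classes": "TV", "title": "LG 55 inch 4K LED TV", "price": 32000})
store.set("Customers", "C002",
          {"username": "Cindy", "telephone": "0939-519973", "rank": "Platinum"})

print(store.get("Customers", "C002")["rank"])  # Platinum
store.delete("Products", "P001")
print(store.get("Products", "P001"))           # None
```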
According to the statistics of the DB-Engines Ranking website [15], Redis and DynamoDB
are the most widely discussed key value store databases.
According to the statistics of the DB-Engines Ranking website [15], Neo4j and FlockDB are the
most widely discussed graph databases.
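[Figure (not recovered): class diagram residue showing the class Order_details with the attribute quantity and the methods readData(), writeData(), and deleteData().]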
3.16. Summary
The basic concepts of each category of NoSQL databases have been described. Analyzing all the
categories shows that each category is suited to processing data with particular features. The
results are summarized in Table 7.
Categories of NoSQL Databases | Suitable Data Features
Wide Column Store | Three-dimensional data; applications that often search for data in specific fields.
Document Store | Semi-structured files, such as XML and JSON files.
Key Value Store | One-dimensional data stored as key-value pairs.
Graph Databases | Data stored in a graph structure; suitable for social network relations, recommendation systems, and so on.
Multimodel Databases | Depends on the data formats supported by the specific database.
Object Databases | Data whose content and relationships are described with object-oriented concepts; suitable for computer-aided design (CAD) and office automation.
Grid and Cloud Database Solutions | Applications that frequently search recently accessed data.
XML Databases | Data stored in XML files.
Multidimensional Databases | Applications that often analyze data in multiple dimensions.
Multivalue Databases | Data with multivalued or composite attributes.
Event Sourcing | Data about past events, used to track the status of something.
Time Series Databases | Time series data.
Other NoSQL Related Databases | Unknown.
Scientific and Specialized DBs | Data for scientific research or computing.
Unresolved and Uncategorized | Depends on the data format of the specific database.
Suppose that a 3C shopping website has used an RDB to store data for a long period of time, and this
RDB generates 300,000 records per day. Users report that the website has become slower and want data
processing to be as fast as possible, so the business owner asks the information department staff to
solve this problem.
Following the owner's instructions, the head of the information department traced the causes and
found that the slower data access results not only from the large amount of data generated every day
but also from many users' queries that must join several large RDB tables. Therefore, the head
recommends a NoSQL database as the solution, because data from several RDB tables can be merged in
advance when stored in a NoSQL database, so that queries can read the desired data quickly without
waiting for join operations.
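The idea of merging tables in advance (denormalization) can be shown by contrasting a join at read time with a single lookup of a pre-merged record. In this minimal sketch, the order table and its fields (order_id, c_id, p_id, quantity) are hypothetical, while the customer and product values come from Figure 3. It models only the data layout. It does not show how any particular NoSQL product stores rows.

```python
# Hypothetical normalized RDB tables for a 3C shopping website.
customers = {"C001": {"username": "Jack", "rank": "Normal"}}
products = {"P001": {"title": "LG 55 inch 4K LED TV", "price": 32000}}
orders = [{"order_id": "O001", "c_id": "C001", "p_id": "P001", "quantity": 2}]


def merge_order(order):
    """Combine one order row with its customer row and product row."""
    return {**order, **customers[order["c_id"]], **products[order["p_id"]]}


def read_with_join(order_id):
    """RDB style: find the order, then join it with the other tables on every read."""
    order = next(o for o in orders if o["order_id"] == order_id)
    return merge_order(order)


# NoSQL style: merge once when the data is written, keyed by order_id.
merged = {o["order_id"]: merge_order(o) for o in orders}


def read_merged(order_id):
    """A single key lookup; no join at read time."""
    return merged[order_id]


print(read_merged("O001")["username"], read_merged("O001")["price"])  # Jack 32000
```

The trade-off is that the join cost moves to write time, and changes to a customer or product must be propagated to every merged record that copies it.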
After the business owner agrees, the head of the information department decides which NoSQL
database to use. The decision process is as follows:
1. The most suitable category of NoSQL database is the wide column store, because access to the
database often requires searching for data in specific fields;
2. According to the DB-Engines Ranking website [15], the wide column store databases most commonly
discussed on the internet are Apache Cassandra and Apache HBase;
3. According to the experimental results of Chen et al. [26], Apache HBase reads data faster than
Apache Cassandra. Therefore, Apache HBase is recommended as the NoSQL database for this enterprise.
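The three-step decision process (match the data features to a category, list the widely discussed candidates, then pick one using external evidence) can be written as a small lookup. The rules below come from Table 7, the DB-Engines candidates named in this paper, and its case studies. The feature labels and the function `choose` are hypothetical names for this sketch. The preference ranking is not computed by the code: it encodes outside evidence, such as the read-time results of Chen et al. [26] or market share.

```python
# Data feature -> NoSQL category (after Table 7); labels are hypothetical.
CATEGORY_BY_FEATURE = {
    "search_specific_fields": "Wide Column Store",
    "semi_structured": "Document Store",
    "key_value_pairs": "Key Value Store",
    "graph_relations": "Graph Databases",
}

# Most widely discussed products per category, as cited from DB-Engines [15].
CANDIDATES = {
    "Wide Column Store": ["Apache Cassandra", "Apache HBase"],
    "Document Store": ["MongoDB", "Couchbase Server"],
    "Key Value Store": ["Redis", "DynamoDB"],
    "Graph Databases": ["Neo4j", "FlockDB"],
}


def choose(feature, ranking):
    """Step 1: category; step 2: candidates; step 3: best-ranked candidate."""
    category = CATEGORY_BY_FEATURE[feature]
    candidates = CANDIDATES[category]
    ranked = [c for c in ranking if c in candidates]
    # Fall back to the first listed candidate when no evidence ranks them.
    return category, ranked[0] if ranked else candidates[0]


# 3C shopping website: HBase ranked first because of its faster reads [26].
print(choose("search_specific_fields", ["Apache HBase", "Apache Cassandra"]))
```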
2. According to the statistics of the DB-Engines Ranking website [15], the most widely discussed graph
databases are Neo4j and FlockDB;
3. Since Neo4j has the largest market share among graph databases [27], Neo4j is recommended as the
NoSQL database for this enterprise.
5. Conclusions
The main contents of this paper are as follows. First, we introduce the basic characteristics of the
fifteen categories of NoSQL databases (such as wide column stores, document stores, key value stores,
and graph databases) listed on the NoSQL database official website [2]. Then, we analyze the
characteristics of the data that each category of NoSQL database is suited to process. Next, we
propose some principles and key points to help enterprises that intend to replace an RDB with a NoSQL
database find an appropriate one among more than 225 NoSQL databases. Finally, we illustrate three
cases, a 3C shopping website, newspapers, and the US retail industry, to demonstrate how a particular
company can choose a suitable NoSQL database to improve its competitiveness and customer services.
In summary, a company that abandons its RDB for a NoSQL database needs to consider the
characteristics of its data in order to find the right database. Transaction data in the e-commerce
industry often needs to be related, so the suitable NoSQL category is the wide column store, and
Apache HBase is a good choice. News materials in the news industry are semi-structured, so the
suitable category is the document store, and MongoDB is the better choice. Retail data is used by
recommendation systems, so the suitable category is graph databases, and Neo4j is the best choice.
We hope that these principles and examples will help decision makers change databases correctly.
Author Contributions: Conceptualization, J.-K.C.; methodology, J.-K.C. and W.-Z.L.; writing—original draft
preparation, W.-Z.L.; writing—review and editing, J.-K.C.; supervision, J.-K.C.; project administration, J.-K.C.
Acknowledgments: Thanks to the reviewers for providing a lot of valuable comments to make this paper more
complete.
References
1. Chen, H.A. Database System: Concept, Design, and Implementation, 3rd ed.; XBOOK MARKETING Co., Ltd.:
Republic of China (ROC), Taipei, 2013. (In Chinese)
2. NoSQL databases. Available online: http://nosql-database.org/ (accessed on 20 January 2019).
3. Pi, S.J. Establish the cornerstone of Big Data: NoSQL Database technique, 2nd ed.; TopTeam Information Co.,
Ltd.: Republic of China (ROC), Taipei, 2016. (In Chinese)
4. Lu, J.H. Challenge big data, how to process Big Data in Facebook, Google, Amazon? Use NoSQL to get 10 billion
annual hard disk data, 2nd ed.; TopTeam Information Co., Ltd.: Republic of China (ROC), Taipei, 2015. (In
Chinese)
5. Sullivan, D. NoSQL for Mere Mortals, 1st ed.; Pearson P T R: London, UK, 2015.
6. Hecht, R.; Jablonski, S. NoSQL Evaluation: A Use Case Oriented Survey. In Proceedings of the 2011
International Conference on Cloud and Service Computing, Hong Kong, China, 12–14 December 2011.
7. Lourenço, J.R.; Cabral, B.; Carreiro, P.; Vieira, M.; Bernardino, J. Choosing the right NoSQL database for the
job: A quality attribute evaluation. J. Big Data 2015, 2, 18:1–18:26.
8. Corbellini, A.; Mateos, C.; Zunino, A.; Godoy, D.; Schiaffino, S. Persisting big-data: The NoSQL landscape.
Inf. Syst. 2016, 63, 1–23.
9. Khazaei, H.; Fokaefs, M.; Zareian, S.; Beigi-Mohammadi, N.; Ramprasad, B.; Shtern, M.; Gaikwad, P.; Litoiu,
M. How do I Choose the Right NoSQL Solution? A Comprehensive Theoretical and Experimental Survey.
Big Data Inf. Anal. 2016, 1, 185–216.
10. Gessert, F.; Wingerath, W.; Friedrich, S.; Ritter, N. NoSQL database systems: a survey and decision guidance.
Softw.-Intensive Cyber-Phys. Syst. 2017, 32, 353–365.
11. Davoudian, A.; Chen, L.; Liu, M. A Survey on NoSQL Stores. ACM Comput. Surv. (CSUR) 2018, 51, 40:1–40:43.
12. Dimiduk, N.; Khurana, A. HBase in Action, 1st ed.; Oreilly & Associates Inc.: New York, NY, USA, 2012.
13. Lu, J.H. Hadoop: Practical Technical Handbook, 2nd ed.; TopTeam Information Co., Ltd.: Republic of China (ROC), Taipei, 2014. (In Chinese)
14. George, L. HBase: The Definitive Guide, 1st ed.; Oreilly & Associates Inc.: New York, NY, USA, 2011.
15. DB-Engines Ranking. Available online: https://db-engines.com/en/ranking (accessed on 4 March 2018).
16. Multi-model databases (Wikipedia). Available online: https://en.wikipedia.org/wiki/Multi-model_database (accessed on 15 June 2018).
17. Wu, R.H. Object-Oriented System Analysis and Design: An MDA Approach with UML, 4th ed.; BestWise Co., Ltd.: Republic of China (ROC), Taipei, 2013. (In Chinese)
18. Document-oriented database (Wikipedia). Available online: https://en.wikipedia.org/wiki/Document-oriented_database (accessed on 15 June 2018).
19. Multidimensional Databases. Available online: https://docs.oracle.com/cd/E12478_01/rpas/pdf/150/html/classic_client_user_guide/basic_rpas_concepts/multidimensional_databases.htm (accessed on 5 May 2018).
20. MultiValue (Wikipedia). Available online: https://en.wikipedia.org/wiki/MultiValue (accessed on 15 June 2018).
21. Introducing to Event Sourcing. Available online: https://msdn.microsoft.com/en-us/library/jj591559.aspx#sec1 (accessed on 16 January 2018).
22. Time series database (Wikipedia). Available online: https://en.wikipedia.org/wiki/Time_series_database (accessed on 16 January 2018).
23. Time series (Wikipedia). Available online: https://en.wikipedia.org/wiki/Time_series (accessed on 16 January 2018).
24. Central Weather Bureau. Available online: https://www.cwb.gov.tw/eng/index.htm (accessed on 10 July 2018).
25. vsChart.com: the comparison wiki: Database list. Available online: http://vschart.com/list/database/ (accessed on 18 February 2019).
26. Chen, C.Y.; Chang, B.R.; Tsai, H.F.; Guo, C.L. Empirical Analysis of High Efficient Remote Cloud Data Center Backup Using HBase and Cassandra. Sci. Progr. 2014, 2015, 1–10.
27. Neo4j: Walmart Case Study. Available online: https://neo4j.com/case-studies/walmart/ (accessed on 10 December 2018).
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article
distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license
(http://creativecommons.org/licenses/by/4.0/).