Chap 2 Emerging Database Landscape
As data volumes grew exponentially and the need to integrate and leverage
a vast array of data sources increased, a new generation of database
products began to emerge. These were labeled as Not Only
SQL (NoSQL) products. These products were
Fig 1. Scale-out Architecture (panels: Master-Slave and Peer-To-Peer)
On a broad level, we can assume that there are two specific kinds of
databases: the relational database and the “non-relational” database. There
are several definitions and interpretations of what the characteristics of
these two types of databases are.
Let’s first define what structured data is and what unstructured data is.
These definitions heavily weigh into the characteristics of RDBMS and non-
RDBMS systems.
Structured Data: Structured data has an explicit structure for its data
elements. In other words, metadata exists for every data element, and how
each element will be stored and accessed, through SQL-based commands or
other programming constructs, is clearly defined.
Unstructured Data: Unstructured data constitutes all other data that fall
outside the definition of structured data. Its structure is not explicitly
declared in a schema. In some cases, as with natural language, the structure
may need to be discovered.
The Relational Database (RDBMS): A relational database stores data in
tables and predominantly uses SQL-based commands to access the data.
In most cases, the data structures and resulting data models are in third
normal form (3NF). In practice, the data model is a set of tables and
relationships between them, which are expressed in terms of keys and
integrity constraints across related tables such as foreign keys. A row of any
table consists of columns of structured data, and the database as a whole
contains only structured data. The logical model of the data held in the
database is based on tables and relationships.
For example, for an Employee table we can define the columns
as Employee_ID, First_Name, Initial, Last_Name, Address_Line_1,
Address_Line_2, City, State, Zip_Code, Home_Tel_No, Cell_Tel_No. In the
database schema, we further define the data types for each one of these
columns: integer, char, varchar, etc. These column names feature in the
SQL queries as data of interest for the user. We call this structured data
because the data held in the database is represented in a tabular fashion
and is known in advance and recorded in a schema.
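The Employee example above can be sketched in code. The following is a minimal illustration, assuming Python's built-in sqlite3 module and an in-memory database; the column sizes, the primary key choice, the sample row, and the helper function names are assumptions made for the illustration and are not taken from any particular product.

```python
import sqlite3


def build_employee_db():
    """Create an in-memory database with the Employee table described above."""
    conn = sqlite3.connect(":memory:")
    # The schema records the structure in advance: column names and data types.
    # Sizes such as VARCHAR(50) are illustrative assumptions.
    conn.execute(
        """
        CREATE TABLE Employee (
            Employee_ID    INTEGER PRIMARY KEY,
            First_Name     VARCHAR(50) NOT NULL,
            Initial        CHAR(1),
            Last_Name      VARCHAR(50) NOT NULL,
            Address_Line_1 VARCHAR(100),
            Address_Line_2 VARCHAR(100),
            City           VARCHAR(50),
            State          CHAR(2),
            Zip_Code       VARCHAR(10),
            Home_Tel_No    VARCHAR(20),
            Cell_Tel_No    VARCHAR(20)
        )
        """
    )
    # A made-up sample row, inserted with bound parameters.
    conn.execute(
        "INSERT INTO Employee "
        "(Employee_ID, First_Name, Initial, Last_Name, City, State) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (1, "Ada", "M", "Lopez", "Austin", "TX"),
    )
    conn.commit()
    return conn


def column_names(conn):
    """Read the column names back from the schema metadata."""
    return [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]


def employees_in_state(conn, state):
    """The column names appear in the SQL query as the data of interest."""
    cur = conn.execute(
        "SELECT First_Name, Last_Name FROM Employee "
        "WHERE State = ? ORDER BY Employee_ID",
        (state,),
    )
    return cur.fetchall()
```

Because the structure is declared in the schema, a program can discover the columns (as column_names does) before it has seen any rows, which is the sense in which the data is known in advance.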
The Non-Relational Database: Since RDBMS is confined to representing
data as related tables made up of rows and columns, it does not easily
accommodate data that have nested or hierarchical structures such as a bill
of materials or a complex document. Non-relational databases cater to a
wider variety of data structures (older mainframe data structures, object and
object-relational data structures, document and XML data structures, graph
data structures, etc.) than just tables. What we have defined here is an
“everything else bucket” that includes all databases that are not purely
relational.
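To make the bill-of-materials point concrete, here is a small sketch of a nested structure held as a single document. It assumes a JSON-like representation built from plain Python dictionaries; the bicycle data, the field names, and the leaf_totals helper are hypothetical and chosen only for illustration.

```python
import json

# A hypothetical bill of materials: each part may contain further parts.
bicycle = {
    "part": "bicycle",
    "quantity": 1,
    "components": [
        {"part": "frame", "quantity": 1, "components": []},
        {
            "part": "wheel",
            "quantity": 2,
            "components": [
                {"part": "rim", "quantity": 1, "components": []},
                {"part": "spoke", "quantity": 32, "components": []},
            ],
        },
    ],
}


def leaf_totals(node, multiplier=1, totals=None):
    """Walk the hierarchy and total the quantity of each lowest-level part."""
    if totals is None:
        totals = {}
    count = multiplier * node["quantity"]
    if not node["components"]:
        # A part with no components is a leaf of the hierarchy.
        totals[node["part"]] = totals.get(node["part"], 0) + count
    for child in node["components"]:
        leaf_totals(child, count, totals)
    return totals


# The whole hierarchy serializes to one document and reads back unchanged.
document = json.dumps(bicycle)
restored = json.loads(document)
```

In a purely relational design the same hierarchy would typically be flattened into rows that refer to their parent rows, and reassembling it would require joins or a recursive query. A document-oriented store can keep the nesting as it is.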
Database workload
The above table provides a summary of the database landscape that has
emerged. The traditional databases, including open source ones like MySQL
and PostgreSQL, are, of course, suited to the traditional OLTP, data mart and
data warehouse workloads. There are also databases like Aerospike and
VoltDB that specialize in extremely high volumes of OLTP transactions. This
category is very close, but not identical, to the in-memory databases like
SAP’s HANA or Kognitio, which simply focus on speed and response time.
For businesses that are selecting database products for a specific type of
application, our advice is to determine which category of database they need
before thinking of which products to investigate. While, as time passes, we
can expect there to be some rationalization among these database
categories, we expect most of them to persist with two or three products
dominating each category. This is because the categories have been derived
based on different types of workload, and we do not expect a database
engine that is excellent in one of these categories to perform particularly
well in other categories.