Chapter 14: Big Data & NoSQL Databases
Databases
CSC3326
Learning Objectives
• Big Data is a movement to find new and better ways to manage large amounts of data and
derive business insight from it, while providing high performance and scalability at a reasonable cost.
• Big Data is commonly characterized by the 3 Vs: volume, velocity, and variety.
• Volume: the quantity of data to be stored. The storage capacities associated with Big Data are
extremely large.
• Velocity: the rate at which new data enters the system as well as the rate at which the data
must be processed.
• Variety: the variations in the structure of the data to be stored. Data can be structured,
unstructured, or semistructured.
Big Data
As the quantity of data to be stored increases, so does the need for
larger storage capacity. There are two ways to provide it:
Scaling up means keeping the same number of systems but migrating each
system to a larger one.
Scaling out means that when the workload exceeds the capacity of a
server, the workload is spread across a number of servers. This is also
referred to as clustering: creating a cluster of low-cost servers to share a
workload.
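As a rough illustration of scaling out, the sketch below spreads record keys across a small cluster by hashing each key to a server. The `hashKey` and `assignServer` helpers and the server names are hypothetical, and real systems typically use more elaborate placement schemes.

```javascript
// Hypothetical sketch: distribute keys across a cluster of low-cost servers.
// Simple string hash (not cryptographic), chosen only for illustration.
function hashKey(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep h an unsigned 32-bit integer
  }
  return h;
}

// Pick the server responsible for a key.
function assignServer(key, servers) {
  return servers[hashKey(key) % servers.length];
}

const cluster = ["server-a", "server-b", "server-c"];
const placement = {};
for (const key of ["user:1", "user:2", "user:3", "user:4"]) {
  placement[key] = assignServer(key, cluster);
}
console.log(placement);
```

The same key always maps to the same server, so reads know where to look. Note that with this naive modulo scheme, adding a server changes the placement of many keys.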
Big Data
• There is a need for databases that can provide:
Scalability
Flexibility
Low cost
Availability
NoSQL
• Although much of the transactional data that organizations use works well in a
structured environment, most of the data in the world is semistructured or
unstructured.
• Relational databases impose a structure on the data when the data is captured
and stored.
• Big Data requires that the data be captured in whatever format it naturally
exists, without any attempt to impose a data model or structure on the data.
NoSQL
• NoSQL represents a broad array of nonrelational database technologies
that have developed to address the challenges posed by Big Data.
• NoSQL DBs are built to be flexible, scalable and capable of rapidly
responding to the data management demands of Big Data applications.
• NoSQL DBs take a nonrelational approach to the storage and
processing of data.
• NoSQL DBs do not force data to fit predefined structures.
• NoSQL DBs provide distributed, fault-tolerant databases for processing
unstructured data.
• NoSQL DBs are not based on the relational model and SQL.
NoSQL
• Most NoSQL databases fall into one of four categories: key-value data stores, document databases,
column-oriented databases, and graph databases.
Data Consistency in Distributed Systems
• Distributed systems offer a range of benefits, including increased scalability, fault tolerance, and performance
• However, managing data consistency in distributed systems is a very complex problem
• Two consistency modes:
Strong consistency: data must be identical across all server nodes at all times. At any given moment, every
node holds the same value for a given entity, so a write made to one node must be propagated to all other
nodes before it is considered complete. During that time, access to the data is locked.
Eventual consistency: temporary inconsistencies between server nodes are allowed. Updates take time to
reach the other nodes, but the data across nodes becomes consistent eventually. Because access to the data is
not locked in the meantime, the data remains highly available.
• Strong consistency provides immediate consistency but can result in higher latency and lower
availability. In contrast, eventual consistency prioritizes availability but can lead to temporary data
inconsistencies.
• When choosing between strong and eventual consistency, it’s important to consider the specific
requirements of your system.
• Many nonrelational DBs adopt eventual consistency by default, while relational DBs typically adopt strong consistency.
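The difference between the two modes can be sketched with a toy two-replica store. The `ReplicatedStore` class is hypothetical. Propagation is triggered by hand through `sync()`, and locking is not modeled, so this only illustrates what a reader can observe under each mode.

```javascript
// Hypothetical two-replica store used to contrast the two consistency modes.
class ReplicatedStore {
  constructor() {
    this.replicas = [new Map(), new Map()];
    this.pending = []; // updates not yet propagated to replica 1
  }

  // Strong consistency: the write reaches every replica before it returns.
  writeStrong(key, value) {
    for (const replica of this.replicas) replica.set(key, value);
  }

  // Eventual consistency: the write lands on replica 0 and is queued for replica 1.
  writeEventual(key, value) {
    this.replicas[0].set(key, value);
    this.pending.push([key, value]);
  }

  // Background propagation, triggered manually here for clarity.
  sync() {
    for (const [key, value] of this.pending) this.replicas[1].set(key, value);
    this.pending = [];
  }

  read(replicaIndex, key) {
    return this.replicas[replicaIndex].get(key);
  }
}

const store = new ReplicatedStore();
store.writeStrong("stock", 10);
const strongRead = store.read(1, "stock"); // same value on every replica

store.writeEventual("stock", 9);
const staleRead = store.read(1, "stock"); // replica 1 still holds the old value
store.sync();
const freshRead = store.read(1, "stock"); // replicas have converged
console.log({ strongRead, staleRead, freshRead });
```

Between `writeEventual` and `sync`, a reader on replica 1 sees stale data but is never blocked, which is the availability trade-off described above.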
Strong Consistency Mode
Performance
• NoSQL DBs provide high scalability and high performance.
• Example: data for a blog website.
• In a document-based NoSQL database, all data related to each post is collected
into a single self-contained document holding the user, the post
details and the comments.
• In a relational DB, this data is split into three tables: user, post and
comment.
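The blog example can be sketched in plain JavaScript. The sample data and the helper names `loadPostRelational` and `loadPostDocument` are hypothetical. The arrays stand in for relational tables and the `Map` stands in for a document collection.

```javascript
// Relational-style layout: three "tables" linked by ids (hypothetical sample data).
const users = [{ id: 1, name: "Amal" }];
const posts = [{ id: 10, userId: 1, title: "Hello NoSQL" }];
const comments = [
  { id: 100, postId: 10, text: "Nice post" },
  { id: 101, postId: 10, text: "Thanks for sharing" },
];

// Reading one post requires combining rows from all three tables (a join).
function loadPostRelational(postId) {
  const post = posts.find((p) => p.id === postId);
  const user = users.find((u) => u.id === post.userId);
  const postComments = comments.filter((c) => c.postId === postId);
  return { title: post.title, user: user.name, comments: postComments.map((c) => c.text) };
}

// Document-style layout: everything about the post lives in one document.
const postDocuments = new Map([
  [10, { title: "Hello NoSQL", user: "Amal", comments: ["Nice post", "Thanks for sharing"] }],
]);

// Reading one post is a single lookup by key.
function loadPostDocument(postId) {
  return postDocuments.get(postId);
}

console.log(loadPostRelational(10));
console.log(loadPostDocument(10));
```

Both functions return the same information, but the document version touches one record instead of three tables.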
What are the benefits of NoSQL databases?
• Flexible data models: NoSQL databases typically have very flexible schemas.
• Horizontal scaling: most NoSQL databases allow you to scale out horizontally, meaning you
can add cheaper commodity servers whenever you need to.
• Fast queries: queries in NoSQL databases can be faster than in SQL databases. Data in SQL
databases is typically normalized, so queries require you to join data from multiple tables. As
your tables grow, the joins can become expensive. Data in NoSQL databases, however, is
typically stored in a way that is optimized for queries. The rule of thumb is that data that is
accessed together should be stored together.
The Disadvantages of NoSQL Databases
• NoSQL databases also have their own limitations and weaknesses.
• The lack of ACID: ACID stands for the four key properties that define a transaction
(Atomicity, Consistency, Isolation, and Durability). Most NoSQL databases do not fully support
these properties across multiple records.
                                 SQL Databases                              NoSQL Databases
Scaling                          Vertical (scale-up with a larger server)   Horizontal (scale-out across commodity servers)
Multi-Record ACID Transactions   Supported                                  Most do not support multi-record ACID transactions. However, some, like MongoDB, do.
Document Databases
• A document database is a NoSQL database that stores data as tagged documents in key-value pairs.
• Unlike a KV database where the value component can contain any type of data, a document database always
stores a document in the value component.
• The document can be in any encoded format, such as XML or JSON (JavaScript Object Notation)
• While KV databases do not attempt to understand the content of the value component, document databases do.
• Despite the use of tags in documents, document databases are considered schema-less, that is, they do not
impose a predefined structure on the data that is stored
• Being schema-less means that although all documents have tags, not all documents are required to have the
same tags, so each document can have its own structure
• Tags inside the document are accessible to the DBMS, which makes sophisticated querying possible.
• In a column-oriented database, each row key in a column family can have different columns.
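To make the schema-less idea concrete, the sketch below keeps documents with different tags in one collection and filters on a tag. The `products` data and the `findWhere` helper are hypothetical and only mimic, in plain JavaScript, the kind of tag-aware querying a document DBMS provides.

```javascript
// Hypothetical collection: documents share a collection but not a structure.
const products = [
  { _id: 1, name: "Laptop", price: 900, ram: "16GB" },
  { _id: 2, name: "T-shirt", price: 15, size: "M", color: "blue" },
  { _id: 3, name: "Gift card" }, // no price tag at all
];

// Because tags are visible to the query layer, we can filter on them.
// Documents that lack the tag simply do not match.
function findWhere(collection, tag, predicate) {
  return collection.filter((doc) => tag in doc && predicate(doc[tag]));
}

const cheap = findWhere(products, "price", (p) => p < 100);
const withSize = findWhere(products, "size", () => true);
console.log(cheap.map((d) => d.name));    // names of products priced under 100
console.log(withSize.map((d) => d.name)); // names of products that have a size tag
```

A key-value store could not run such a filter, because it treats each value as an opaque blob.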
Graph Databases
• A graph database is a NoSQL database based on graph theory to store data about relationship-rich
environments.
• Modeling and storing data about relationships is the focus of graph databases.
• The primary components of graph databases are nodes, edges, and properties.
• Properties are like attributes; they are the data that we need to store about a node.
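A minimal sketch of nodes, edges, and properties in plain JavaScript follows. The people, the `FOLLOWS` relationship type, and the `neighbors` and `twoHop` helpers are all hypothetical. A real graph database would expose its own query language for such traversals.

```javascript
// Hypothetical graph: nodes carry properties, edges carry a relationship type.
const nodes = new Map([
  ["alice", { label: "Person", properties: { name: "Alice", age: 30 } }],
  ["bob", { label: "Person", properties: { name: "Bob", age: 25 } }],
  ["carol", { label: "Person", properties: { name: "Carol", age: 41 } }],
]);
const edges = [
  { from: "alice", to: "bob", type: "FOLLOWS" },
  { from: "bob", to: "carol", type: "FOLLOWS" },
];

// Names of the nodes reachable from `start` over one edge of the given type.
function neighbors(start, type) {
  return edges
    .filter((e) => e.from === start && e.type === type)
    .map((e) => nodes.get(e.to).properties.name);
}

// Two-hop traversal: whom do the people followed by `start` follow?
function twoHop(start, type) {
  const firstHop = edges
    .filter((e) => e.from === start && e.type === type)
    .map((e) => e.to);
  return firstHop.flatMap((id) => neighbors(id, type));
}

console.log(neighbors("alice", "FOLLOWS"));
console.log(twoHop("alice", "FOLLOWS"));
```

Relationship questions such as "friends of friends" become edge traversals rather than repeated joins.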
• The value component of a key-value pair may hold multiple values for a given key.
• When there are multiple values for a single key, an array is used.
• Arrays in JSON are placed inside square brackets []. For example, a book document
with two authors:
{_id: 101, title: "Database Systems", author: ["Coronel", "Morris"]}
Embedded documents
• Objects can also have other objects embedded as a value.
• Consider another simple document with data about a publisher that is
related to the book in the previous example.
Embedded documents
• In a relational environment, we would have used a BOOK table and a
PUBLISHER table with a 1:M relationship.
• In a document database, the publisher data is instead embedded in each book document.
Although embedding increases redundancy, NoSQL databases often accept
redundancy to improve scalability.
• With document databases, we are attempting to avoid the need for joins,
making documents independent of each other so they can be easily scaled out
to many computers in a cluster.
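The trade-off can be sketched as follows. The publisher fields, the second book, and the helper names are hypothetical. Only the first book's `_id`, `title`, and `author` values come from the example earlier in these notes.

```javascript
// Hypothetical publisher data, embedded as an object inside each book document.
const publisher = { name: "Example Press", founded: 1990 };

// Each book carries its own copy of the publisher object.
const books = [
  { _id: 101, title: "Database Systems", author: ["Coronel", "Morris"], publisher: { ...publisher } },
  { _id: 102, title: "Another Title", author: ["Someone"], publisher: { ...publisher } },
];

// No join is needed: one document answers the whole question.
function describe(book) {
  return `${book.title} (${book.author.join(", ")}), published by ${book.publisher.name}`;
}

console.log(describe(books[0]));

// The cost of redundancy: a change to the publisher must be applied to every copy.
function renamePublisher(collection, oldName, newName) {
  let updated = 0;
  for (const book of collection) {
    if (book.publisher.name === oldName) {
      book.publisher.name = newName;
      updated += 1;
    }
  }
  return updated;
}

const touched = renamePublisher(books, "Example Press", "Example Publishing");
console.log(touched); // number of documents that had to be rewritten
```

Reads stay local to one document, which is what makes the documents easy to spread across a cluster, while updates to shared data have to visit every copy.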
Creating Databases and Collections in MongoDB
• MongoDB databases comprise collections of documents.
• Each MongoDB server can host many databases.
• A database object contains collections. Collections are also objects.
Collection objects contain document objects.
• In addition to holding data content, an object can also have methods,
which are programmed functions for manipulating the object.
• MongoDB has two versions of the command-line MongoDB shell (the legacy mongo shell and
the newer mongosh) and a graphical interface called MongoDB Compass.
• A list of the databases available on the server can be retrieved with the
command:
show dbs
• The following command switches to a database named demo. MongoDB creates the database when data is first stored in it:
use demo
• Using the createCollection() method with the db variable creates a
collection with the specified name. The following command creates a
“newproducts” collection inside the previously defined demo database:
db.createCollection("newproducts")
Inserting Documents in MongoDB
• db.<collection name>.insertOne({document})
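As a sketch of how the template above might be filled in, the snippet below inserts one document into the newproducts collection. The `product` fields and the `addProduct` wrapper are hypothetical. The wrapper exists only so the snippet is also valid outside the MongoDB shell, where no `db` variable is defined.

```javascript
// Hypothetical product document; field names are illustrative only.
const product = { name: "Desk lamp", price: 25, tags: ["office", "lighting"] };

// Inserts the document into the "newproducts" collection of the given database.
function addProduct(database, doc) {
  return database.newproducts.insertOne(doc);
}

// In the MongoDB shell, `db` refers to the current database (e.g. after `use demo`).
// Outside the shell `db` is undefined, so the call is skipped.
if (typeof db !== "undefined") {
  addProduct(db, product);
}
```

In the shell itself, the equivalent one-liner is `db.newproducts.insertOne({name: "Desk lamp", price: 25, tags: ["office", "lighting"]})`. If the document has no `_id` field, MongoDB generates one.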