NOSQL
Introduction
• Relational databases have long dominated the software industry,
particularly for enterprise applications. This contrasts with the
recent surge of interest in NoSQL databases.
Relational Databases Dominance:
For decades, relational databases were the default choice
for serious data storage, especially in enterprise contexts.
Most software architects only had to decide which relational
database to use.
Challenges from Other Technologies:
Over the years, other database technologies, such as object
databases in the 1990s, tried to challenge the relational
model but failed to gain significant traction.
NoSQL's Emergence:
The recent rise of NoSQL databases has caught many by
surprise, challenging the previously unshaken dominance of
relational databases.
The Value of Relational Databases
1. Getting at Persistent Data:
• Memory Hierarchy: In computing, there are two primary types of
memory.
Main Memory (RAM): Fast but volatile, meaning data is lost when
power is cut or the system fails.
Backing Store (Persistent Storage): Slower but non-volatile,
traditionally a disk, although modern systems may use persistent memory
(like SSDs or flash storage).
• The Role of Backing Store: Persistent storage ensures that data
remains available even after power loss or system failures.
File System vs. Database
• For some applications (like word processors), data is simply
stored as files in the file system.
• For enterprise applications, databases are preferred because
they offer more flexibility and efficiency in managing large
amounts of data.
• They allow applications to quickly retrieve small pieces of data,
providing more sophisticated ways to organize and query
information compared to file systems.
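As a rough illustration of retrieving a small piece of data through a query, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers table, its columns, the sample rows, and the city_of helper are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table and rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Alice", "New York"), (2, "Bob", "Chicago"), (3, "Carol", "Boston")],
)
conn.commit()


def city_of(customer_id):
    """Return only the city for one customer, or None if the id is unknown."""
    row = conn.execute(
        "SELECT city FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


print(city_of(2))
```

With a plain file, the application would typically have to read and parse the whole file to answer the same question; here the query names exactly the piece of data that is wanted.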
Concurrency
• Key-Value Stores with Metadata: Systems like Riak allow for metadata that
can be used for indexing.
• Structured Elements in Key-Value Stores: Redis can handle lists and sets,
providing more structured access than typical key-value databases.
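The two ideas above can be sketched with a toy in-memory store. This is plain Python, not the Riak or Redis API: the TinyKVStore class, its method names, and the sample keys are assumptions made for illustration. Metadata is kept beside the opaque value and indexed, and some values are lists or sets that can be manipulated element by element.

```python
class TinyKVStore:
    """Toy in-memory key-value store (hypothetical; not Riak or Redis)."""

    def __init__(self):
        self._data = {}    # key -> value (opaque to the store)
        self._meta = {}    # key -> metadata dict
        self._index = {}   # (meta_name, meta_value) -> set of keys

    def put(self, key, value, metadata=None):
        # Drop index entries left over from a previous version of this key.
        for pair in self._meta.get(key, {}).items():
            self._index[pair].discard(key)
        self._data[key] = value
        self._meta[key] = dict(metadata or {})
        # Metadata values must be hashable to serve as index keys.
        for pair in self._meta[key].items():
            self._index.setdefault(pair, set()).add(key)

    def get(self, key):
        return self._data[key]

    def find_by_metadata(self, name, meta_value):
        """Look up keys through the metadata index, without reading values."""
        return sorted(self._index.get((name, meta_value), set()))


store = TinyKVStore()
store.put("user_1", '{"name": "Alice"}', metadata={"city": "Boston"})
store.put("user_2", '{"name": "Bob"}', metadata={"city": "Chicago"})
store.put("user_3", '{"name": "Carol"}', metadata={"city": "Boston"})

# Structured values: a list and a set stored under ordinary keys.
store.put("recent:user_1", ["p9", "p4"])
store.get("recent:user_1").append("p7")
store.put("tags:p9", {"camera", "lens"})

print(store.find_by_metadata("city", "Boston"))
```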
Column-Family Stores
• The emergence of Google's BigTable marked a pivotal point in the
development of NoSQL databases, influencing subsequent systems like
HBase and Cassandra.
• The following is an overview of BigTable's structure, characteristics, and
implications for data storage:
Overview of BigTable
• Data Model: BigTable is often conceptualized as a two-level map rather
than a traditional table. It uses a schema-less design that supports sparse
data, where columns can be added freely without predefined constraints.
• Column Families: The data is organized into column families, which group
related columns together. Each column belongs to a single column family,
and operations typically access data at the column family level.
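The two-level map can be sketched with nested Python dictionaries. This is only a conceptual model, not BigTable's storage format or client API; the put and get_family helpers and the sample rows are hypothetical. Column keys follow the "family:qualifier" naming convention.

```python
# First level: row key. Second level: column key ("family:qualifier") -> value.
table = {}


def put(row_key, family, qualifier, value):
    """Store one cell; columns are created on demand, so rows can be sparse."""
    table.setdefault(row_key, {})[f"{family}:{qualifier}"] = value


def get_family(row_key, family):
    """Return all columns of one column family for a row, keyed by qualifier."""
    prefix = family + ":"
    return {
        column[len(prefix):]: value
        for column, value in table.get(row_key, {}).items()
        if column.startswith(prefix)
    }


put("1234", "profile", "name", "Alice")
put("1234", "profile", "billing_city", "Chicago")
put("1234", "orders", "ODR1001", "2 x widget")
put("5678", "profile", "name", "Bob")  # sparse: this row has no orders columns

print(get_family("1234", "profile"))
```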
Column-Family Structure
• Row-Oriented Perspective:
1. Each row is seen as an aggregate of related data (e.g., customer ID
1234), with column families categorizing useful chunks (e.g.,
profile, order history).
2. This approach allows you to retrieve all data for a specific
aggregate with a single query.
• Column-Oriented Perspective:
1. Each column family defines a record type (e.g., customer profiles),
and each row represents an instance of that type.
2. This allows for thinking of a row as a composite of records across
different column families.
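Both perspectives are views over the same data, which a small sketch can make concrete. The rows dictionary, the row_view and family_view helpers, and the sample customers are hypothetical, and the nested dictionaries stand in for a real column-family store.

```python
# row key -> column family -> {column: value}; made-up sample data.
rows = {
    "1234": {
        "profile": {"name": "Alice", "city": "Chicago"},
        "orders": {"ODR1001": "2 x widget"},
    },
    "5678": {
        "profile": {"name": "Bob", "city": "Boston"},
        "orders": {},
    },
}


def row_view(row_key):
    """Row-oriented: the whole aggregate for one key, fetched in one lookup."""
    return rows[row_key]


def family_view(family):
    """Column-oriented: one record of the given type per row."""
    return {row_key: families.get(family, {}) for row_key, families in rows.items()}


print(row_view("1234"))
print(family_view("profile"))
```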
Storage and Access Characteristics
Dynamic Schema: Unlike traditional relational databases, column-family
databases allow adding new columns to existing rows without altering the
overall schema. This flexibility is beneficial for dealing with unstructured or
evolving data.
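A minimal sketch of this flexibility, using plain Python dictionaries as a stand-in for a column-family store (the row keys, column names, and columns_of helper are hypothetical): a column is added to one row without declaring it anywhere else and without touching other rows.

```python
# row key -> {column: value}; made-up sample data.
rows = {
    "user_123": {"name": "Alice", "age": 30},
    "user_456": {"name": "Bob"},
}

# Add a new column to a single row. No schema is altered, and
# user_456 simply does not have this column.
rows["user_123"]["location"] = "New York"


def columns_of(row_key):
    """List the columns that actually exist for one row."""
    return sorted(rows[row_key])


print(columns_of("user_123"))
print(columns_of("user_456"))
```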
In Cassandra, a row belongs to only one column family, but column
families may contain supercolumns that can hold nested columns. This
concept allows for hierarchical data representation, providing greater
flexibility in modeling complex relationships.
Here’s how we might structure the UserProfile column family:
• Column Family: UserProfile
• Row Key: User ID (e.g., user_123)
• Supercolumns:
  • Profile (supercolumn)
    • name: "Alice"
    • age: 30
    • location: "New York"
  • Interests (supercolumn)
    • hobbies: "Photography"
    • sports: "Tennis"
    • music: "Jazz"
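The same structure can be written as nested Python dictionaries to show the three levels (row key, supercolumn, column). This is only a conceptual model, not Cassandra's API; the get_column helper is hypothetical.

```python
# Column family UserProfile: row key -> supercolumn -> {column: value}
user_profile = {
    "user_123": {
        "Profile": {"name": "Alice", "age": 30, "location": "New York"},
        "Interests": {"hobbies": "Photography", "sports": "Tennis", "music": "Jazz"},
    }
}


def get_column(row_key, supercolumn, column):
    """Walk the hierarchy: row, then supercolumn, then nested column."""
    return user_profile[row_key][supercolumn][column]


print(get_column("user_123", "Profile", "location"))
print(get_column("user_123", "Interests", "music"))
```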
Advantages of Column-Family Databases
• Performance Optimization: The ability to store related columns
together optimizes read performance, especially for use cases
where reading multiple columns across many rows is common.
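A rough sketch of why this helps, assuming a simplified layout in which each column family is stored as its own map (the storage dictionary, the scan_column helper, and the sample data are hypothetical): a scan over one column touches only the family that contains it and never reads the others.

```python
# column family -> row key -> {column: value}; made-up sample data.
storage = {
    "profile": {
        "user_1": {"name": "Alice", "age": 30},
        "user_2": {"name": "Bob", "age": 40},
        "user_3": {"name": "Carol", "age": 50},
    },
    "activity": {
        "user_1": {"last_login": "2024-01-05"},
        "user_2": {"last_login": "2024-02-11"},
        "user_3": {"last_login": "2024-03-20"},
    },
}


def scan_column(family, column):
    """Yield (row_key, value) pairs by reading a single column family."""
    for row_key, columns in storage[family].items():
        if column in columns:
            yield row_key, columns[column]


# Reads only the "profile" family; "activity" is never touched.
ages = [age for _, age in scan_column("profile", "age")]
average_age = sum(ages) / len(ages)
print(average_age)
```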