NO SQL PDF
data entry, and each column represents a different attribute or piece of information about the
data.
2. Schema: Relational databases have a predefined schema that defines the structure of the data,
including the tables, columns, data types, and relationships between tables. This schema enforces
data integrity and consistency.
3. SQL (Structured Query Language): SQL is a standardized language used to query and
manipulate data in relational databases. It provides commands for creating, updating, querying,
and deleting data, as well as defining the structure of the database.
4. ACID Properties: Relational databases are designed to maintain the ACID properties—
Atomicity, Consistency, Isolation, and Durability. These properties ensure that database
transactions are reliable and maintain data integrity.
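To make the schema, SQL, and atomicity points concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is only a convenient stand-in for any relational database, and the `accounts` table and `transfer` function are hypothetical. The CHECK constraint is part of the predefined schema. Because both UPDATEs run in one transaction, a failure rolls back the whole transfer.

```python
import sqlite3

# In-memory database; the schema (table, columns, types, constraints) is declared up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
    [(1, "Alice", 100), (2, "Bob", 50)],
)
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the first UPDATE is undone as well.
        return False

ok = transfer(conn, 1, 2, 30)        # succeeds
failed = transfer(conn, 2, 1, 500)   # would make Bob's balance negative
balances = dict(conn.execute("SELECT owner, balance FROM accounts"))
print(ok, failed, balances)  # True False {'Alice': 70, 'Bob': 80}
```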
What is NoSQL?
NoSQL, which stands for "Not Only SQL," is a term that encompasses a diverse set of database
systems designed to handle various types of data and use cases. Unlike traditional relational
databases, NoSQL databases are not strictly tied to a structured tabular format. They provide
flexibility, scalability, and better performance for certain types of applications.
- Key-Value Stores: These databases store data as key-value pairs, making them efficient for
simple lookups and caching. Examples include Redis and Amazon DynamoDB.
- Column-Family Stores: These databases organize data into column families (groups of related
columns stored together) rather than fixed table rows and are optimized for reading and writing
large volumes of data. Apache Cassandra is a popular example.
- Graph Databases: These databases are designed for storing and querying highly connected
data, such as social networks or recommendation engines. Neo4j is a well-known graph
database.
NoSQL databases are particularly well-suited for scenarios where traditional relational databases
might struggle:
- Big Data Applications: When dealing with massive amounts of data that require horizontal
scaling and flexibility.
- Real-time Analytics: NoSQL databases are often used for quickly processing and analyzing
large volumes of data in real-time.
- Web Applications: NoSQL databases can handle the varying and unpredictable data structures
often encountered in web applications.
- Content Management Systems: NoSQL databases are great for storing and serving content with
diverse attributes.
- IoT (Internet of Things): NoSQL databases can handle the high volume and variety of data
generated by IoT devices.
NoSQL Database
A NoSQL database provides a mechanism for storing and retrieving data that is modeled in ways
other than the tabular relations used in relational databases. NoSQL databases do not store data in
relational tables. They are generally used for big data and real-time web applications.
In the early 1970s, flat file systems were used. Data was stored in flat files, and the biggest
problem with flat files was that each company implemented its own flat files and there were no
standards. Storing data in files and retrieving it was very difficult because there was no
standard way to store data.
Then relational databases, based on E. F. Codd's relational model, answered the problem of
having no standard way to store data. Later, however, relational databases ran into a problem of
their own: they could not handle big data. This created the need for a database that could handle
these kinds of problems, and NoSQL databases were developed.
What is NoSQL?
A NoSQL database is a non-relational data management system that does not require a fixed
schema. It avoids joins and is easy to scale. NoSQL databases are mainly used for distributed data
stores with enormous data storage needs, such as big data and real-time web apps. For example,
companies like Twitter, Facebook and Google collect terabytes of user data every single day.
NoSQL stands for “Not Only SQL” or “Not SQL.” Though a better term would be “NoREL”,
NoSQL caught on. Carlo Strozzi introduced the NoSQL concept in 1998.
Traditional RDBMSs use SQL syntax to store and retrieve data for further insights. A NoSQL
database system, by contrast, encompasses a wide range of database technologies that can store
structured, semi-structured, unstructured and polymorphic data. Let’s understand NoSQL with a
diagram in this NoSQL database tutorial:
Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like Google, Facebook,
Amazon, etc. who deal with huge volumes of data. The system response time becomes slow
when you use RDBMS for massive volumes of data.
To resolve this problem, we could “scale up” our systems by upgrading our existing hardware.
This process is expensive.
The alternative is to distribute the database load across multiple hosts whenever the load
increases. This method is known as “scaling out.”
Because NoSQL databases are non-relational and were designed with web applications in mind,
they generally scale out better than relational databases.
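Scaling out means each key has to be routed to one of several hosts. Below is a minimal sketch of hash-based partitioning ("sharding"). The `ShardedStore` class is hypothetical, and each in-memory dict stands in for a separate host. MD5 is used only because it gives the same hash in every process, unlike Python's randomized built-in `hash()`.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the key (simple modulo partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

class ShardedStore:
    """Toy scale-out store: each dict plays the role of one database host."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

store = ShardedStore(num_shards=3)
for i in range(9):
    store.put(f"user{i}", {"id": i})

print([len(shard) for shard in store.shards])  # how many keys landed on each "host"
print(store.get("user4"))
```

With plain modulo placement, changing the number of hosts moves most keys to a different host. Dynamo-style systems use consistent hashing instead, so that adding a host relocates only a fraction of the keys.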
Brief History of NoSQL Databases
• 1998- Carlo Strozzi used the term NoSQL for his lightweight, open-source relational
database
• 2000- Graph database Neo4j was launched
• 2004- Google BigTable was launched
• 2005- CouchDB was launched
• 2007- The research paper on Amazon Dynamo was released
• 2008- Facebook open-sourced the Cassandra project
• 2009- The term NoSQL was reintroduced
Features of NoSQL
Non-relational
Schema-free
NoSQL is Schema-Free
Simple API
• Offers easy-to-use interfaces for storing and querying data
• APIs allow low-level data manipulation and selection methods
• Mostly text-based protocols, typically HTTP REST with JSON
• Mostly no standards-based NoSQL query language
• Web-enabled databases running as internet-facing services
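Here is a minimal sketch of what "HTTP REST with JSON" can look like, using only Python's standard library. It assumes a CouchDB-style server at `http://localhost:5984` that exposes documents at URLs of the form `/{db}/{doc_id}`. The database name, document id, and `build_put_document` helper are hypothetical. The request is only built, not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Assumption: a CouchDB-style HTTP endpoint; nothing is contacted in this sketch.
BASE_URL = "http://localhost:5984"

def build_put_document(db: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that stores `doc` as JSON under db/doc_id."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_put_document("products", "laptop-1", {"name": "Laptop", "price": 800})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be urllib.request.urlopen(req), which needs a running server.
```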
Distributed
Types of NoSQL Databases
NoSQL databases are mainly categorized into four types: key-value pair, column-oriented,
graph-based and document-oriented. Every category has its unique attributes and limitations.
No single type is best for solving every problem, so users should select the database based on
their product needs.
Key Value Pair Based
Data is stored in key/value pairs. This type of database is designed to handle lots of data and
heavy load.
Key-value pair storage databases store data as a hash table where each key is unique, and the
value can be JSON, a BLOB (Binary Large Object), a string, etc.
For example, a key-value pair may contain a key like “Website” associated with a value like
“Guru99”.
It is one of the most basic types of NoSQL database. This kind of NoSQL database works like a
collection, dictionary, or associative array. Key-value stores help developers store schema-less
data. They work best for use cases such as shopping cart contents.
Redis, Dynamo and Riak are some examples of key-value stores. Dynamo is the system described
in Amazon’s Dynamo paper, and Riak was modeled on that paper; Redis was developed independently.
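The hash-table idea above can be sketched in a few lines of Python, where a `dict` already is a hash table. The `KeyValueStore` class and the `cart:alice` key are hypothetical illustrations, not any product's API. Note that the store never inspects the value; it only stores and returns it by key.

```python
class KeyValueStore:
    """Minimal in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Keys are unique: writing an existing key overwrites its value.
        self._data[key] = value

    def get(self, key, default=None):
        # Average O(1) lookup by key; the value itself is never inspected.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

kv = KeyValueStore()
kv.put("Website", "Guru99")
kv.put("cart:alice", ["book", "pen"])  # e.g. shopping cart contents
print(kv.get("Website"))     # Guru99
kv.delete("cart:alice")
print(kv.get("cart:alice"))  # None
```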
Column-based
Column-oriented databases work on columns and are based on Google’s BigTable paper. Every
column is treated separately, and the values of a single column are stored contiguously.
They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as the
data is readily available in a column.
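The following pure-Python sketch shows why a column layout suits aggregates. The three sales records are hypothetical, and Python lists only model the idea of storing one column's values together. They do not reproduce how any particular engine lays data out on disk.

```python
# Three sales records in row form (one dict per record).
rows = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 80},
    {"id": 3, "region": "EU", "amount": 200},
]

# Column form: all values of one column are kept together in a single list.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row form: an aggregate has to visit every record and pick out one field.
total_from_rows = sum(row["amount"] for row in rows)

# Column form: the aggregate scans only the one list it needs.
amounts = columns["amount"]
total = sum(amounts)      # SUM
count = len(amounts)      # COUNT
average = total / count   # AVG
smallest = min(amounts)   # MIN
print(total_from_rows, total, count, smallest)  # 400 400 3 80
```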
Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM, and library card catalogs.
HBase, Cassandra and Hypertable are examples of column-based databases.
Document-Oriented:
A document-oriented NoSQL database stores and retrieves data as key-value pairs, but the value
part is stored as a document in a format such as JSON or XML. The database understands the value
and can query it.
In this diagram, on the left you can see rows and columns, and on the right a document database
with a structure similar to JSON. For the relational database, you have to know in advance which
columns you have, and so on. A document database, however, stores data as JSON-like objects,
and you are not required to define the structure in advance, which makes it flexible.
The document type is mostly used for content management systems (CMS), blogging platforms,
real-time analytics and e-commerce applications. It should not be used for complex transactions
that require multiple operations, or for queries against varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular document-oriented
DBMS systems.
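The key difference from a plain key-value store is that the database can look inside the value. The minimal sketch below shows this. The product documents are hypothetical, and the dotted-path helpers `get_path` and `find` are illustrative only. They resemble the dot notation some document databases use for nested fields, but they are not any product's API.

```python
# Documents stored under keys; unlike an opaque key-value value, their fields can be queried.
docs = {
    "p1": {"name": "Laptop", "category": "Electronics", "specs": {"storage": "256GB SSD"}},
    "p2": {"name": "T-shirt", "category": "Clothing", "size": "M"},
    "p3": {"name": "Phone", "category": "Electronics"},
}

def get_path(doc, path):
    """Follow a dotted path such as 'specs.storage'; return None if any part is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, path, value):
    """Return the keys of documents whose field at `path` equals `value`."""
    return [key for key, doc in collection.items() if get_path(doc, path) == value]

print(find(docs, "category", "Electronics"))     # ['p1', 'p3']
print(find(docs, "specs.storage", "256GB SSD"))  # ['p1']
```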
Graph-Based
A graph-type database stores entities as well as the relations among those entities. Each entity
is stored as a node and each relationship as an edge. An edge gives a relationship between nodes.
Every node and edge has a unique identifier.
Compared to a relational database, where tables are loosely connected, a graph database is
multi-relational in nature. Traversing relationships is fast because they are already stored in the
database, so there is no need to compute them (for example, with joins) at query time.
Graph-based databases are mostly used for social networks, logistics and spatial data.
Neo4j, InfiniteGraph, OrientDB and FlockDB are some popular graph-based databases.
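Because edges are stored explicitly, a query such as "friends within two hops" becomes a walk along those edges. Below is a minimal breadth-first traversal sketch over a hypothetical adjacency-list graph. It illustrates the idea only and is not how any specific graph database executes queries.

```python
from collections import deque

# Each key is a node id; each edge is stored as (relationship type, target node id).
edges = {
    "alice": [("FRIENDS_WITH", "bob")],
    "bob": [("FRIENDS_WITH", "carol")],
    "carol": [("FRIENDS_WITH", "dave")],
    "dave": [],
}

def within_hops(start, max_hops):
    """Breadth-first traversal: nodes reachable from `start` in at most `max_hops` edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relationship, neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return {n: hops for n, hops in seen.items() if n != start}

print(within_hops("alice", 2))  # {'bob': 1, 'carol': 2}
```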
Advantages of NoSQL
Disadvantages of NoSQL
• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively mature
• Many NoSQL systems do not offer traditional database capabilities, such as consistency when
multiple transactions are performed simultaneously.
• As the volume of data increases, it becomes difficult to maintain unique values as keys
• Doesn’t work as well with relational data
• The learning curve is steep for new developers
• Many options are open source, so they are not as popular with enterprises.
Summary
• NoSQL is a non-relational DMS that does not require a fixed schema, avoids joins, and
is easy to scale
• The concept of NoSQL databases became popular with Internet giants like Google,
Facebook, Amazon, etc. who deal with huge volumes of data
• In 1998, Carlo Strozzi used the term NoSQL for his lightweight, open-source
relational database
• NoSQL databases do not follow the relational model; they are either schema-free or have
relaxed schemas
• The four types of NoSQL database are 1) key-value pair based, 2) column-oriented,
3) graph-based, and 4) document-oriented
• NoSQL can handle structured, semi-structured, and unstructured data with equal effect
• The CAP theorem concerns three properties: Consistency, Availability, and Partition Tolerance
• BASE stands for Basically Available, Soft state, Eventual consistency
• The term “eventual consistency” means that copies of data are kept on multiple machines for
high availability and scalability, and that every update eventually reaches all copies, so the
copies agree once no new writes arrive
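To illustrate eventual consistency, the minimal sketch below uses two replicas that briefly disagree and then converge after a background sync. It uses last-write-wins, one common conflict rule, chosen here for illustration. The `Replica` class, the `sync` function, and the integer timestamps are all hypothetical simplifications. Real systems use clocks or version vectors and may resolve conflicts differently.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One copy of the data; every value is kept with the timestamp of its write."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry is not None else None

def sync(a, b):
    """Background exchange: both replicas keep the newest write per key (last-write-wins)."""
    for key in set(a.data) | set(b.data):
        entries = [e for e in (a.data.get(key), b.data.get(key)) if e is not None]
        newest = max(entries, key=lambda e: e[0])
        a.data[key] = newest
        b.data[key] = newest

r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart:alice", ["book"], ts=1)                   # accepted by r1 only
before = (r1.read("cart:alice"), r2.read("cart:alice"))  # r2 still returns None
r2.write("cart:alice", ["book", "pen"], ts=2)            # a later write lands on r2
sync(r1, r2)
after = (r1.read("cart:alice"), r2.read("cart:alice"))   # both replicas now agree
print(before, after)
```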
In a relational database, each product requires the same set of columns, which might lead to
empty or irrelevant fields for certain products.
However, with a NoSQL document-oriented database, you can use an aggregate data model to
handle products with differing attributes more flexibly.
In a NoSQL document-oriented database, you can use a flexible schema to store products as
documents, each with its own structure based on the specific product attributes. Here's how this
might look:
Product 1 (Laptop):
```json
{
"_id": 1,
"name": "Laptop",
"price": 800,
"manufacturer": "Dell",
"category": "Electronics",
"specifications": {
"screen_size": "15.6 inches",
"processor": "Intel Core i5",
"storage": "256GB SSD"
}
}
```
Product 2 (T-shirt):
```json
{
"_id": 2,
"name": "T-shirt",
"price": 20,
"manufacturer": "Nike",
"category": "Clothing",
"size": "M",
"color": "Black"
}
```
Product 3 (Book):
```json
{
"_id": 3,
"name": "Book",
"price": 15,
"manufacturer": "Penguin",
"category": "Books",
"author": "Jane Doe",
"genre": "Fiction"
}
```
In this approach, each product is stored as a JSON document with attributes that are relevant to
that specific product. Some products have additional attributes not present in others. This flexible
structure allows you to accommodate different types of products without requiring a fixed
schema.
Benefits:
- You can store products with varying attributes in a single collection without empty or irrelevant
fields.
- When querying, you can search for specific attributes without having to navigate through
columns not applicable to a particular product.
- As new types of products are introduced, you can easily adapt the document structure without
changing the database schema.
- The aggregate data model supports efficient horizontal scaling as your product catalog grows.
This example illustrates how aggregate data models in NoSQL document-oriented databases
provide the flexibility to organize and store data based on application requirements, allowing for
efficient handling of unstructured or semi-structured data while supporting horizontal scalability.
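To quantify the "empty or irrelevant fields" problem, this sketch squeezes the three product documents above into one fixed set of columns and counts the empty cells. The single-table layout is a deliberate simplification: a real relational design might split attributes into separate tables. The nested `specifications` object is kept as one cell. It also shows a query on an attribute that only some documents have.

```python
products = [
    {"_id": 1, "name": "Laptop", "price": 800, "manufacturer": "Dell", "category": "Electronics",
     "specifications": {"screen_size": "15.6 inches", "processor": "Intel Core i5",
                        "storage": "256GB SSD"}},
    {"_id": 2, "name": "T-shirt", "price": 20, "manufacturer": "Nike", "category": "Clothing",
     "size": "M", "color": "Black"},
    {"_id": 3, "name": "Book", "price": 15, "manufacturer": "Penguin", "category": "Books",
     "author": "Jane Doe", "genre": "Fiction"},
]

# A single fixed-column table needs the union of every product's top-level attributes.
all_columns = sorted({key for p in products for key in p})

# Flatten each product into that column list; missing attributes become NULL (None).
table = [[p.get(col) for col in all_columns] for p in products]
empty_cells = sum(cell is None for row in table for cell in row)

# With documents, a query can simply target an attribute that only some products have.
with_author = [p["name"] for p in products if "author" in p]

print(len(all_columns), "columns,", empty_cells, "empty cells of", len(all_columns) * len(products))
print(with_author)  # ['Book']
```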
Key-Value Data Model: The key-value data model is one of the simplest and most fundamental
types of NoSQL databases. In this model, data is stored as a collection of key-value pairs, where
each piece of data is associated with a unique key. The value can be of various types, including
strings, numbers, or even complex structures like JSON objects. Key-value databases are highly
scalable and efficient for operations involving simple read and write operations, but they may lack
advanced querying capabilities compared to other NoSQL models.
Imagine you're building a social media platform, and you want to use a key-value store to
efficiently manage user profiles. Each user profile contains basic information such as name, age,
location, and a profile picture. The key-value data model simplifies the storage and retrieval of
user profiles.
In this example, you'll use a fictional key-value database to showcase the concept.
- **User 1:**
- Key: `user123`
- Value: JSON object containing user information
```json
{
"name": "Alice",
"age": 28,
"location": "New York",
"profile_picture": "https://example.com/profiles/user123.jpg"
}
```
- **User 2:**
- Key: `user456`
- Value: JSON object containing user information
```json
{
"name": "Bob",
"age": 35,
"location": "San Francisco",
"profile_picture": "https://example.com/profiles/user456.jpg"
}
```
- **User 3:**
- Key: `user789`
In this scenario, the keys (`user123`, `user456`, `user789`) are unique identifiers for each user
profile, and the corresponding values are JSON objects containing user information.
**Benefits:**
- **Efficient Retrieval:** Retrieving a user's profile is efficient because you can directly access
the data using the unique key. This is especially advantageous when the dataset is large.
- **Simple Structure:** The key-value data model's simplicity is well-suited for use cases with
straightforward data storage requirements.
- **Scalability:** Key-value stores are highly scalable and can handle a large number of simple
read and write operations effectively.
- **Low Latency:** Due to the straightforward nature of key-value lookups, data retrieval is
often low-latency.
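Here is a minimal sketch of the user-profile scenario. A plain dict stands in for the fictional key-value database, and the helper functions are hypothetical. Each profile is serialized to one JSON string under its key. Reads are a single lookup, but changing one field means reading, modifying, and rewriting the whole value. This reflects the limited querying noted above.

```python
import json

# Stand-in for the fictional key-value database: key -> opaque string value.
store = {}

def save_profile(user_id, profile):
    # The whole profile is serialized into one JSON string stored under the user's key.
    store[user_id] = json.dumps(profile)

def load_profile(user_id):
    # Direct lookup by key, then deserialize; None if the key does not exist.
    raw = store.get(user_id)
    return json.loads(raw) if raw is not None else None

def update_field(user_id, name, value):
    # The store cannot change part of a value, so read, modify, and write it back whole.
    profile = load_profile(user_id)
    profile[name] = value
    save_profile(user_id, profile)

save_profile("user123", {"name": "Alice", "age": 28, "location": "New York"})
save_profile("user456", {"name": "Bob", "age": 35, "location": "San Francisco"})
update_field("user123", "age", 29)

print(load_profile("user123"))  # {'name': 'Alice', 'age': 29, 'location': 'New York'}
print(load_profile("user000"))  # None
```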
Document Data Model: The document data model is another popular type of NoSQL database,
designed to store and manage semi-structured or hierarchical data. In this model, data is stored in
documents, which are typically formatted using JSON or BSON (binary JSON) formats. Each
document can have varying structures, allowing different fields and attributes to be added
without altering the entire database schema. Document databases are suitable for use cases where
data is not strictly tabular and may have nested or complex relationships.
Consider a content management system (CMS) that needs to store and manage various types of
content, including articles, images, and user comments. Each piece of content has different
attributes, and the hierarchical structure can vary based on the content type. A document-oriented
database is well-suited for handling this kind of semi-structured data.
- **Article Document:**
- Each article is stored as a document with attributes such as title, author, content, publication
date, and tags.
- **Image Document:**
- Images are stored as documents with attributes like file name, size, format, and URL.
- **Comment Document:**
- User comments are stored as documents with attributes including commenter name, content,
timestamp, and associated article ID.
- Example JSON document:
```json
{
"_id": "comment789",
"commenter": "Bob",
"content": "Great article! Thanks for sharing.",
"timestamp": "2023-07-16T08:30:00Z",
"article_id": "article123"
}
```
In this scenario, each type of content is stored as a JSON document, and the structure of each
document can vary based on the content's attributes.
**Benefits:**
- **Flexible Schema:** The document data model allows for a dynamic and evolving schema,
accommodating changes in data structure without altering the entire database.
- **Hierarchical Data:** Documents can contain nested structures, making it easy to represent
complex and hierarchical data relationships.
- **Semi-Structured Data:** Document databases are well-suited for storing semi-structured
or unstructured data, as well as data with varying attributes.
- **Efficient Retrieval:** Retrieving content and its associated data is efficient, as related
information is often stored together within a single document.
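The "stored together" benefit depends on how the CMS documents are shaped. This sketch contrasts two designs. In the embedded design, comments are nested inside the article, so one read returns everything. In the referenced design, comments sit in their own collection and point back through `article_id`, as in the comment document above. The article's title, author, and tags are hypothetical values.

```python
# Embedded design: the article document carries its comments, so one lookup returns everything.
articles = {
    "article123": {
        "_id": "article123",
        "title": "Intro to NoSQL",  # hypothetical
        "author": "Alice",          # hypothetical
        "tags": ["nosql", "databases"],
        "comments": [
            {"commenter": "Bob", "content": "Great article! Thanks for sharing.",
             "timestamp": "2023-07-16T08:30:00Z"},
        ],
    }
}

# Referenced design: comments live in their own collection and point back via article_id.
comments = {
    "comment789": {"_id": "comment789", "commenter": "Bob",
                   "content": "Great article! Thanks for sharing.",
                   "timestamp": "2023-07-16T08:30:00Z", "article_id": "article123"},
}

def comments_embedded(article_id):
    # A single document read.
    return articles[article_id]["comments"]

def comments_referenced(article_id):
    # Requires a scan here; a real database would typically use an index on article_id.
    return [c for c in comments.values() if c["article_id"] == article_id]

print(len(comments_embedded("article123")), len(comments_referenced("article123")))  # 1 1
```

Embedding favors read-together data that stays small. Referencing avoids very large documents when, for example, an article collects many comments.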
RELATIONSHIPS
**Relationships in Key-Value Data Model:**
The key-value data model is simple and doesn't inherently support complex relationships
between data items. However, relationships can still be established by using various strategies,
often at the application level.
- **User 1:**
- Key: `user123`
- Value: JSON object with user information
- **User 2:**
- Key: `user456`
- Value: JSON object with user information
**Follower Relationship:**
```plaintext
user123-followers: [user456]
user456-followers: [user123]
```
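In a key-value store, the application itself resolves the follower relationship by building the key name and reading it. The minimal sketch below uses the `<user>-followers` keys shown above. A dict stands in for the store, and `follow` / `followers_of` are hypothetical helpers. The read-modify-write in `follow` is not atomic. Some stores provide atomic collection operations for this purpose (Redis, for example, has set commands such as SADD).

```python
import json

kv = {}  # stand-in key-value store: key -> JSON string

def follow(follower: str, followee: str) -> None:
    """Record that `follower` follows `followee` by updating the '<user>-followers' key."""
    key = f"{followee}-followers"
    followers = json.loads(kv.get(key, "[]"))
    if follower not in followers:
        followers.append(follower)
    kv[key] = json.dumps(followers)

def followers_of(user: str) -> list:
    # The relationship lives in the application: build the key, then read it.
    return json.loads(kv.get(f"{user}-followers", "[]"))

follow("user456", "user123")
follow("user123", "user456")
print(followers_of("user123"))  # ['user456']
print(followers_of("user456"))  # ['user123']
```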
**Relationships in Document Data Model:**
- **User 1 Document:**
```json
{
"_id": "user123",
"name": "Alice",
"followers": ["user456"]
}
```
- **User 2 Document:**
```json
{
"_id": "user456",
"name": "Bob",
"followers": ["user123"]
}
```
**Benefits:**
- **Key-Value:** While less suitable for complex relationships, relationships can still be
represented using keys and values.
- **Document:** The document data model naturally supports relationships, allowing you to
embed arrays or references to related data within documents.
**Schemaless Databases:**
A schemaless database is a type of database that doesn't require a predefined schema for data
storage. It's often associated with NoSQL databases, as they allow for flexible and dynamic data
models. In a schemaless database, data can be added with varying structures without needing to
modify the overall database schema. This approach provides agility in rapidly evolving
environments where data structures may change frequently.
- **Dynamic Data Models:** Data can be stored without predefined tables or fixed columns,
allowing for variation in data structure.
- **Flexibility:** Schemaless databases accommodate changing data requirements and evolving
applications.
- **Variety of Data Types:** They can store structured, semi-structured, or unstructured data
in various formats, such as JSON, BSON, XML, or key-value pairs.
- **Agility:** Schemaless databases are suitable for projects with dynamic or experimental data
requirements.
- **Use Cases:** Rapid application development, projects with frequently changing data
structures, scenarios with diverse data formats.