Lec 15 Notes
Lec 15 Notes
Lec 15 Notes
1. NoSQL databases (aka "not only SQL") are non-tabular databases and store data differently than relational tables. NoSQL databases
come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They
provide flexible schemas and scale easily with large amounts of data and high user loads.
1. They are schema free.
2. Data structures used are not tabular, they are more flexible, has the ability to adjust dynamically.
3. Can handle huge amount of data (big data).
4. Most of the NoSQL are open sources and has the capability of horizontal scaling.
5. It just stores data in some format other than relational.
2. History behind NoSQL
1. NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to
create a complex, difficult-to-manage data model in order to avoid data duplication. Developers (rather than storage) were
becoming the primary cost of software development, so NoSQL databases optimised for developer productivity.
2. Data becoming unstructured more, hence structuring (defining schema in advance) them had becoming costly.
3. NoSQL databases allow developers to store huge amounts of unstructured data, giving them a lot of flexibility.
4. Recognising the need to rapidly adapt to changing requirements in a software system. Developers needed the ability to iterate
quickly and make changes throughout their software stack — all the way down to the database. NoSQL databases gave them
p
this flexibility.
5. Cloud computing also rose in popularity, and developers began using public clouds to host their applications and data. They
wanted the ability to distribute data across multiple servers and regions to make their applications resilient, to scale out instead
of scale up, and to intelligently geo-place their data. Some NoSQL databases like MongoDB provide these capabilities.
el
3. NoSQL Databases Advantages
A. Flexible Schema
1. RDBMS has pre-defined schema, which become an issue when we do not have all the data with us or we need to change
the schema. It's a huge task to change schema on the go.
B. Horizontal Scaling
eH
1. Horizontal scaling, also known as scale-out, refers to bringing on additional nodes to share the load. This is difficult with
relational databases due to the difficulty in spreading out related data across nodes. With non-relational databases, this is
made simpler since collections are self-contained and not coupled relationally. This allows them to be distributed across
nodes more simply, as queries do not have to “join” them together across nodes.
2. Scaling horizontally is achieved through Sharding OR Replica-sets.
C. High Availability
1. NoSQL databases are highly available due to its auto replication feature i.e. whenever any kind of failure happens data
replicates itself to the preceding consistent state.
2. If a server fails, we can access that data from another server as well, as in NoSQL database data is stored at multiple
od
servers.
D. Easy insert and read operations.
1. Queries in NoSQL databases can be faster than SQL databases. Why? Data in SQL databases is typically normalised, so
queries for a single object or entity require you to join data from multiple tables. As your tables grow in size, the joins can
become expensive. However, data in NoSQL databases is typically stored in a way that is optimised for queries. The rule of
thumb when you use MongoDB is data that is accessed together should be stored together. Queries typically do not require
joins, so the queries are very fast.
C
p
2. Column-Oriented / Columnar / C-Store / Wide-Column
1. The data is stored such that each row of a column will be next to other rows from that same column.
2. While a relational database stores data in rows and reads data row by row, a column store is organised as a set of columns.
el
This means that when you want to run analytics on a small number of columns, you can read those columns directly
without consuming memory with the unwanted data. Columns are often of the same type and benefit from more efficient
compression, making reads even faster. Columnar databases can quickly aggregate the value of a given column (adding up
the total sales for the year, for example). Use cases include analytics.
3. e.g., Cassandra, RedShift, Snowflake.
eH
3. Document Based Stores
1. This DB store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of
fields and values. The values can typically be a variety of types including things like strings, numbers, booleans, arrays, or
objects.
2. Use cases include e-commerce platforms, trading platforms, and mobile app development across industries.
3. Supports ACID properties hence, suitable for Transactions.
4. e.g., MongoDB, CouchDB.
4. Graph Based Stores
1. A graph database focuses on the relationship between data elements. Each element is stored as a node (such as a person
od
in a social media graph). The connections between elements are called links or relationships. In a graph database,
connections are first-class elements of the database, stored directly. In relational databases, links are implied, using data to
express the relationships.
2. A graph database is optimised to capture and search the connections between data elements, overcoming the overhead
associated with JOINing multiple tables in SQL.
3. Very few real-world business systems can survive solely on graph queries. As a result graph databases are usually run
alongside other more traditional databases.
C
4. Use cases include fraud detection, social networks, and knowledge graphs.
7. NoSQL Databases Dis-advantages
1. Data Redundancy
1. Since data models in NoSQL databases are typically optimised for queries and not for reducing data duplication, NoSQL
databases can be larger than SQL databases. Storage is currently so cheap that most consider this a minor drawback, and
some NoSQL databases also support compression to reduce the storage footprint.
2. Update & Delete operations are costly.
3. All type of NoSQL Data model doesn’t fulfil all of your application needs
1. Depending on the NoSQL database type you select, you may not be able to achieve all of your use cases in a single
database. For example, graph databases are excellent for analysing relationships in your data but may not provide what
you need for everyday retrieval of the data such as range queries. When selecting a NoSQL database, consider what your
use cases will be and if a general purpose database like MongoDB would be a better option.
4. Doesn’t support ACID properties in general.
5. Doesn’t support data entry with consistency constraints.
8. SQL vs NoSQL
Data Storage Model Tables with fixed rows and Document: JSON documents,
columns Key-value: key-value pairs, Wide-
column: tables with rows and
dynamic columns, Graph: nodes
and edges
Development History Developed in the 1970s with a Developed in the late 2000s with
focus on reducing data a focus on scaling and allowing
duplication for rapid application change
driven by agile and DevOps
practices.
p
Cassandra and HBase, Graph:
Neo4j and Amazon Neptune
el
value: large amounts of data with
simple lookup queries, Wide-
column: large amounts of data
with predictable query patterns,
Graph: analyzing and traversing
eH
relationships between connected
data
MongoDB etc.