CH.5 NOSQL Database For Business Applications
NoSQL Databases
MongoDB is a NoSQL database, so it is necessary to know about NoSQL databases in order to
understand MongoDB thoroughly.
NoSQL Database
The term NoSQL database refers to a non-SQL or non-relational database.
It provides a mechanism for storing and retrieving data that is modeled in ways other than the
tabular relations used in relational databases. A NoSQL database does not use tables for storing
data. It is generally used for big data and real-time web applications.
The relational database was created by E. F. Codd, and it answered the problem of having no
standard way to store data. Later, however, relational databases ran into a problem of their own:
they could not handle big data well. This created the need for a database that could handle such
workloads, and NoSQL databases were developed.
Advantages of NoSQL
It supports query language.
It provides fast performance.
It provides horizontal scalability.
Document-Based Database:
The document-based database is a nonrelational database. Instead of storing the data in rows and
columns (tables), it uses the documents to store the data in the database. A document database
stores data in JSON, BSON, or XML documents.
Documents can be stored and retrieved in a form that is much closer to the data objects used in
applications, which means less translation is required to use the data in applications. In a
document database, particular elements can be accessed through indexes that are created on them
for faster querying.
Collections are groups of documents that have similar contents. The documents in a collection are
not required to share the same schema, because document databases have a flexible schema.
Flexible schema: Documents in the database have a flexible schema. It means the documents in the
database need not share the same schema.
Faster creation and maintenance: The creation of documents is easy, and minimal maintenance is
required once we create the document.
No foreign keys: There is no dynamic relationship between two documents so documents can be
independent of one another. So, there is no requirement for a foreign key in a document
database.
Open formats: To build a document we use XML, JSON, and others.
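The flexible-schema idea above can be sketched in a few lines of Python. This is only an in-memory
illustration, not how any particular document database is implemented; the products collection,
its fields, and the find_documents helper are all hypothetical names chosen for the example.

```python
import json

# A hypothetical "collection": simply a list of documents (dicts).
# Documents in the same collection need not share the same fields.
products = [
    {"_id": 1, "name": "Laptop", "price": 900, "specs": {"ram_gb": 16, "cpu": "8-core"}},
    {"_id": 2, "name": "Notebook", "price": 3, "pages": 120},
]

def find_documents(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]

# A document maps directly onto an application object and serializes to JSON as-is.
laptop = find_documents(products, name="Laptop")[0]
print(json.dumps(laptop, indent=2))
print(laptop["specs"]["ram_gb"])   # nested field access, no join required
```

The second document has a pages field that the first lacks, and neither had to be declared
in advance.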
Key-Value Stores:
A key-value store is a nonrelational database. The simplest form of a NoSQL database is a key-
value store. Every data element in the database is stored in key-value pairs. The data can be
retrieved by using a unique key allotted to each element in the database. The values can be simple
data types like strings and numbers or complex objects.
A key-value store is like a relational database with only two columns which is the key and the
value.
Simplicity.
Scalability.
Speed.
Column-Oriented Databases:
A column-oriented database is a non-relational database that stores data in columns instead of
rows. That means when we want to run analytics on a small number of columns, we can read
those columns directly without consuming memory with unwanted data.
Columnar databases are designed to read data more efficiently and retrieve the data with greater
speed. A columnar database is used to store a large amount of data.
Key features of column-oriented databases:
Scalability.
Compression.
Very responsive.
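To make the difference between row and column layouts concrete, here is a minimal Python sketch.
It stores the same hypothetical sales records both ways and sums one field; real columnar engines
add compression and on-disk formats that this toy example leaves out.

```python
# The same three records, stored two ways (hypothetical sales data).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Column-oriented layout: one list per column, values aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def total_row_store(records, field):
    # Visits every whole record even though only one field is needed.
    return sum(record[field] for record in records)

def total_column_store(cols, field):
    # Reads only the single column involved in the aggregate.
    return sum(cols[field])

print(total_row_store(rows, "amount"))
print(total_column_store(columns, "amount"))
```

Both functions return the same total; the column version simply never touches order_id or region.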
Graph-Based databases:
Graph-based databases focus on the relationships between elements. They store data as nodes, and
the connections between nodes are called links or relationships.
In a graph-based database, it is easy to identify the relationships between data items by
following the links.
Query results are returned in real time.
The speed depends on the number of relationships among the database elements.
Updating data is also easy, as adding a new node or edge to a graph database is a straightforward
task that does not require significant schema changes.
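The node-and-link model can be sketched as a small in-memory structure. The Graph class, the node
names, and the FOLLOWS and LIKES labels below are hypothetical and only illustrate the idea; they
do not correspond to the API of any specific graph database.

```python
from collections import defaultdict

class Graph:
    """A tiny in-memory graph: nodes with properties plus labeled, directed edges."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.edges = defaultdict(list)      # node id -> [(label, target id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbors(self, source, label):
        """Follow edges with the given label from one node."""
        return [target for edge_label, target in self.edges[source] if edge_label == label]

g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_edge("alice", "FOLLOWS", "bob")

# Adding a new kind of node or relationship needs no schema change.
g.add_node("mongodb", kind="topic")
g.add_edge("alice", "LIKES", "mongodb")

print(g.neighbors("alice", "FOLLOWS"))
```

Finding related data is a matter of following edges from a node rather than joining tables.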
The row key is the first column in each column family and serves as an identifier of a row. Each
column after that has a column key (name) that identifies columns within rows and enables
column queries. The value and the timestamp come after the column key, leaving a trace of when
the data was entered or modified.
Rows in a column family need not have the same number of columns. Columns in different rows
might share a name, but the database contains each column within one row and does not run it
across all rows.
Standard column family is like a table. It contains a key-value pair where the key is the row key,
and the values use their names as identifiers.
Super column family represents an array of columns. Each super column has a name and a
value mapping the super column out to several different columns. The database joins related
super columns under a single row into super column families.
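The row key, column key, value, and timestamp described above can be pictured as nested maps. The
sketch below is an assumption-laden toy model: the users family, the put_cell and get_cell helpers,
and the sample data are invented for illustration and are not the API of any wide-column store.

```python
import time

# Column family modeled as: row key -> {column key -> (value, timestamp)}
users = {}

def put_cell(family, row_key, column_key, value, timestamp=None):
    """Write one cell; the timestamp records when the value was written."""
    if timestamp is None:
        timestamp = time.time()
    family.setdefault(row_key, {})[column_key] = (value, timestamp)

def get_cell(family, row_key, column_key):
    """Return the stored value for one cell, or None if the cell is absent."""
    cell = family.get(row_key, {}).get(column_key)
    return None if cell is None else cell[0]

put_cell(users, "user:1", "name", "Asha", timestamp=1)
put_cell(users, "user:1", "email", "asha@example.com", timestamp=2)
put_cell(users, "user:2", "name", "Ravi", timestamp=3)   # this row has no "email" column

print(sorted(users["user:1"]))
print(get_cell(users, "user:2", "email"))
```

Each row carries only the columns that were actually written to it.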
Scalability. A column database's primary advantage is the ability to handle big data. Depending
on the scale of the database, it can cover hundreds of different machines. Columnar databases
support massively parallel processing, employing many processors to work on the same set of
computations simultaneously.
Compression. Compressing large amounts of data saves storage space.
Very responsive. The load time is minimal and columnar databases perform queries quickly,
making them practical for big data and analytics.
Aggregation performance. Columnar databases can scan and aggregate large volumes of data in
columns efficiently. Aggregations such as averages and sums are faster because the database
engine reads only the necessary columns, not entire rows. Aggregations that need to read the
entire column before updating a value -- such as a distinct count or sorting -- are generally much
faster than in row-oriented databases.
Flexibility. In general, column-oriented databases are less flexible than row-oriented
architectures for general-purpose database work. However, the columnar architecture is more
adaptable in certain situations. For example, adding or removing columns as schemas evolve is
generally easier than in a row-based system because only the affected column must change. In a
traditional database, every row of data needs updates. Schema flexibility can be valuable when
analytic requirements change frequently or evolve over time.
Online transactional processing. Column databases are not as efficient with online transactional
processing as they are for online analytical processing (OLAP). They can analyze transactions,
but struggle to update them. A common strategy is to have the column database hold the data
required for business analysis and have a relational database store the data in the back end.
Incremental data loading. Column-oriented databases can quickly retrieve data for analysis,
even when processing complex queries. Incremental data loads are not impossible, but
columnar databases do not perform them in the most efficient way. They must scan the columns to
identify the right rows and then conduct another scan to locate the modified data that requires
overwriting.
Row-specific queries. The disadvantages of column databases all boil down to the same issue --
using the right type of database for the right purposes. Row-specific queries introduce an extra
step of scanning the columns to identify the rows and then locating the data to retrieve. It takes
more time to get to individual records scattered in multiple columns, rather than accessing
grouped records in a single column. Frequent row-specific queries might cause performance
issues by slowing down a column-oriented database, which defeats its purpose of providing
information quickly.
Security. Columnar databases are slightly more vulnerable to some security issues than other
types of databases because they rely on data compression -- especially of commonly repeated
values -- to improve performance. Compression can conflict with encryption. Encryption might
be less effective if compression occurs first because patterns in the compressed data might
remain. However, encrypting the data first can limit the performance benefits from
compression. One mitigating factor for potential vulnerabilities is that the most sensitive data,
such as unique identifiers, are less likely to effectively compress.
NoSQL Key/Value databases using MongoDB
Key value databases, also known as key value stores, are database types where data is
stored in a “key-value” format and optimized for reading and writing that data. The data
is fetched by a unique key or a number of unique keys to retrieve the associated value
with each key. The values can be simple data types like strings and numbers or complex
objects.
MongoDB covers a wide range of database examples and use cases, supporting key-value
pair data concepts. With its flexible schema and rich query language with secondary
indexes, MongoDB is a compelling store for “key-value” data.
What is a key-value database?
Over the years, database systems have evolved from legacy relational databases storing
data in rows and columns to NoSQL distributed databases allowing a solution per use
case. Key-value pair stores are not a new concept and have been with us for the last
few decades. One well-known store is the old Windows Registry, which allows the
system and applications to store data in a “key-value” structure, where a key can be
represented as a unique identifier or a unique path to the value.
Data is written (inserted, updated, and deleted) and queried based on the key to
store/retrieve its value.
Key-value databases use compact, efficient index structures to be able to quickly and reliably
locate a value by its key, making them ideal for systems that need to be able to find and retrieve
data in constant time. Redis, for instance, is a key-value database that is optimized for tracking
relatively simple data structures (primitive types, lists, heaps, and maps) in a persistent database.
By only supporting a limited number of value types, Redis is able to expose an extremely simple
interface to querying and manipulating them, and when configured optimally is capable of high
throughput.
At a minimum, a key-value store supports the following operations:
Retrieving a value (if there is one) stored and associated with a given key
Deleting the value (if there is one) stored and associated with a given key
Setting, updating, or replacing the value (if there is one) associated with a given key
Modern applications will probably require more than the above, but this is the bare minimum for
a key-value store.
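The bare-minimum operations listed above can be sketched with a Python dictionary, which gives
average constant-time lookup by key. The KeyValueStore class and the session key used here are
hypothetical; a real key-value database adds persistence, networking, and concurrency control.

```python
class KeyValueStore:
    """Minimal in-memory key-value store backed by a dict (a hash table)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        # Retrieve the value associated with a key, if there is one.
        return self._data.get(key, default)

    def set(self, key, value):
        # Set, update, or replace the value associated with a key.
        self._data[key] = value

    def delete(self, key):
        # Delete the value for a key; return True if something was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False

store = KeyValueStore()
store.set("session:ueyrt", {"create_time": 1122384666000})
print(store.get("session:ueyrt"))
print(store.delete("session:ueyrt"))
print(store.get("session:ueyrt"))
```

Values can be simple types or complex objects; the store itself never inspects them.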
Real time random data access, e.g., user session attributes in an online application such as
gaming or finance.
Caching mechanism for frequently accessed data or c
{
session_id : "ueyrt-jshdt-6783-utyrts",
create_time : 1122384666000
}
Further, MongoDB’s document values allow nested key-value structures, allowing not only for
accessing data by key in a global sense, but accessing and manipulating data associated with
keys within documents, and even creating indexes that allow fast retrieval by these secondary
kinds of keys.
{
name: "John",
age : 35,
dob : ISODate("1990-01-05"),
profile_pic : "https://example.com/john.jpg",
social : {
twitter : "@mongojohn",
linkedin : "https://linkedin.com/abcd_mongojohn"
}
}
MongoDB’s native drivers support many widely used languages such as Python, C#, C++, and
Node.js, allowing you to store key-value data in your language of choice.
Secondary indexes to support key value
Each field can be indexed based on your query patterns. For example, if we look up a specific
session_id as the key and read create_time as the value, we can create an index with
db.sessions.createIndex({session_id : 1}) and then query on that key.
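The effect of an index on session_id can be illustrated with a small in-memory sketch. It is only
a conceptual model: MongoDB indexes are B-tree structures maintained by the server, whereas this
example uses a plain dictionary, and the second session document and both helper functions are
hypothetical.

```python
from collections import defaultdict

sessions = [
    {"session_id": "ueyrt-jshdt-6783-utyrts", "create_time": 1122384666000},
    {"session_id": "abcde-fghij-0001-klmnop", "create_time": 1122384667000},  # made-up entry
]

def build_index(documents, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(documents):
        if field in doc:
            index[doc[field]].append(position)
    return index

def find_by(documents, index, value):
    """Look up matching documents through the index instead of scanning them all."""
    return [documents[position] for position in index.get(value, [])]

session_index = build_index(sessions, "session_id")
match = find_by(sessions, session_index, "ueyrt-jshdt-6783-utyrts")
print(match[0]["create_time"])
```

Without the index, every lookup would have to examine each document in turn.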
Wildcard indexing allows users to index every field or a subset of fields in a MongoDB
collection. Therefore, if we have a set of field-value pairs stored in a single document and
queries could come dynamically for any identifier, we can create a single index for those
field-value sets.
db.profiles.createIndex({"$**" : 1 });
As a result, queries on any field and value are supported by this index. That said, wildcard
indexing should only be used when we cannot predict the field names up front and the variety of
query predicates requires it. See the wildcard index restrictions in the MongoDB documentation
for more information.
Since MongoDB documents can be complex objects, applications can use a schema design to
minimize index footprints and optimize access for a “key-value” approach. This design pattern is
called the Attribute Pattern and it utilizes arrays of documents to store a “key-value” structure.
attributes: [
{
key: "USA",
value: ISODate("1977-05-20T01:00:00+01:00")
},
{
key: "France",
value: ISODate("1977-10-19T01:00:00+01:00")
},
{
key: "Italy",
value: ISODate("1977-10-20T01:00:00+01:00")
},
{
key: "UK",
value: ISODate("1977-12-27T01:00:00+01:00")
},
...
]
Indexing {"attributes.key" : 1, "attributes.value" : 1} will allow us to search on any
key with just one index.
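Why one index is enough for the Attribute Pattern can be seen in a short sketch. The movies list,
the titles, and the dictionary-based index are hypothetical stand-ins, and the dates are shortened
to plain strings; MongoDB itself would maintain a compound multikey index on the server.

```python
from collections import defaultdict

# Hypothetical documents following the attribute pattern.
movies = [
    {"title": "Movie A", "attributes": [
        {"key": "USA", "value": "1977-05-20"},
        {"key": "France", "value": "1977-10-19"},
    ]},
    {"title": "Movie B", "attributes": [
        {"key": "USA", "value": "1980-06-20"},
    ]},
]

# One index covers every attribute: (key, value) -> titles of matching documents.
attribute_index = defaultdict(list)
for movie in movies:
    for attribute in movie["attributes"]:
        attribute_index[(attribute["key"], attribute["value"])].append(movie["title"])

print(attribute_index[("France", "1977-10-19")])
```

Adding a new country to a document adds entries to the same index; no new index is needed.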
MongoDB uses the cache of its WiredTiger engine to optimize data access and read performance
together with strong consistency and high availability across replica sets. This allows for more
resilient and available field-value stores while still using the best performance of cached data.
MongoDB documents can form compact flexible structures to support fast indexing for your
key-value stores. On the other hand, MongoDB documents may consist of rich objects which can
contain entire hierarchies and sub-values, and sophisticated indexing allows documents to be
retrieved by any number of different keys.
Summary
Key-value stores are used for use cases where applications will require values to be retrieved fast
via keys, like maps or dictionaries in programming languages. The compact structure and
straightforward indexing and seeking through those indexes makes this database concept a win
for specific application workloads.
However, modern applications will probably require more than just a key-value retrieval and this
is where MongoDB and MongoDB Atlas offer the optimal solution. MongoDB can support the
field-value store solution while allowing complex objects to be formed and multiple ways to
query the data: Full-Text Search, Aggregation Framework, Atlas Data Tiering, or Scaling it
across multiple shards.
What is MongoDB?
MongoDB, the most popular NoSQL database, is an open-source document-oriented database.
The term ‘NoSQL’ means ‘non-relational’.
It means that MongoDB isn’t based on the table-like relational database structure but provides
an altogether different mechanism for the storage and retrieval of data. This storage format
is called BSON (similar to the JSON format).
{
title: 'Geeksforgeeks',
by: 'Harshit Gupta',
url: 'https://www.geeksforgeeks.org',
type: 'NoSQL'
}
SQL databases store data in tabular format. This data is stored in a predefined data model, which
is not very flexible for today’s fast-growing real-world applications.
Modern applications are more networked, social and interactive than ever. Applications are
storing more and more data and are accessing it at higher rates.
A Relational Database Management System (RDBMS) is often not the best choice for handling big
data, because by design it is hard to scale horizontally. If the database runs on a single
server, it will eventually reach a scaling limit.
NoSQL databases are designed to scale out and can provide better performance for such workloads.
MongoDB is one such NoSQL database: it scales by adding more servers and increases productivity
with its flexible document model.
RDBMS vs MongoDB
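An RDBMS has a typical schema design that shows the number of tables and the relationships
between these tables, whereas MongoDB is document-oriented. MongoDB does not enforce a fixed
schema or relationships between collections.
Join operations are limited in MongoDB compared with SQL (the aggregation pipeline's $lookup
stage performs a left outer join), so related data is usually embedded in a single document.
Multi-document transactions are supported from MongoDB 4.0 onward, but they cost more than
single-document writes.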
MongoDB allows a highly flexible and scalable document structure. For example, one data
document of a collection in MongoDB can have two fields whereas the other document in the
same collection can have four.
MongoDB can be faster than an RDBMS for many workloads because of its indexing and storage
techniques.
There are a few terms that are related in both databases. What’s called Table in RDBMS is called
a Collection in MongoDB. Similarly, a Row is called a Document and a Column is called a Field.
MongoDB provides a default ‘_id’ field (if not provided explicitly). By default it is an
ObjectId, a 12-byte value usually displayed as a 24-character hexadecimal string, that ensures
the uniqueness of every document within a collection. It is similar to the primary key in an
RDBMS.
Indexing: Without indexing, a database would have to scan every document of a collection to
select those that match the query, which would be inefficient. So, for efficient searching,
indexing is a must, and MongoDB uses it to process huge volumes of data in very little time.
Scalability: MongoDB scales horizontally using sharding (partitioning data across various
servers). Data is partitioned into data chunks using the shard key and these data chunks are
evenly distributed across shards that reside across many physical servers. Also, new machines
can be added to a running database.
Replication and High Availability: MongoDB increases the data availability with multiple copies
of data on different servers. By providing redundancy, it protects the database from hardware
failures. If one server goes down, the data can be retrieved easily from other active servers
which also had the data stored on them.
Aggregation: Aggregation operations process data records and return the computed results. This is
similar to the GROUP BY clause in SQL. A few aggregation expressions are sum, avg, min, max,
etc.
Big Data: If we have a huge amount of data to be stored in tables, think of MongoDB before
RDBMS databases. MongoDB has a built-in solution for partitioning and sharding our database.
Unstable Schema: Adding a new column in an RDBMS is hard, whereas MongoDB is schema-less.
Adding a new field does not affect old documents and is very easy.
Distributed data: Since multiple copies of data are stored across different servers, data can be
recovered quickly and safely even if there is a hardware failure.
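The aggregation feature mentioned above (grouping records and computing sum, avg, min, and max)
can be sketched in plain Python. This roughly mirrors what a grouping stage with those
accumulators computes, but it is an in-memory illustration only; the orders data and the
group_and_aggregate function are hypothetical.

```python
from collections import defaultdict

# Hypothetical order documents.
orders = [
    {"customer": "asha", "amount": 40},
    {"customer": "ravi", "amount": 25},
    {"customer": "asha", "amount": 10},
]

def group_and_aggregate(documents, key_field, value_field):
    """Group documents by key_field and compute sum, avg, min, max per group."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[key_field]].append(doc[value_field])
    return {
        key: {"sum": sum(values), "avg": sum(values) / len(values),
              "min": min(values), "max": max(values)}
        for key, values in groups.items()
    }

summary = group_and_aggregate(orders, "customer", "amount")
print(summary["asha"])
```

Each distinct customer value becomes one group, much like one output row of a GROUP BY query.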
Language Support by MongoDB
MongoDB provides official or community-maintained drivers for popular programming languages such
as C, C++, Rust, C#, Java, Node.js, Perl, PHP, Python, Ruby, Scala, Go and Erlang.
Installing MongoDB:
For Windows, a drop-down lists a few options for 64-bit operating systems. If you are running
Windows 7, 8 or a newer version, select Windows 64-bit 2008 R2+. If you are using
Windows XP or Vista, select Windows 64-bit 2008 R2+ legacy.
Conclusion
In conclusion, MongoDB’s document-oriented structure, scalable architecture through sharding,
and robust features like indexing and aggregation make it a preferred choice for modern
applications handling large volumes of data. With its ability to manage distributed data
effectively and support a wide array of programming languages, MongoDB continues to
empower developers to build scalable and efficient applications.
With relational databases, JSON data needs to be parsed or stored using the NVARCHAR
column (LOB storage). However, document databases like MongoDB can store JSON data in its
natural format, which is readable by humans and machines.
There are two ways to store related data in a JSON document database:
1. Store the whole object in a single document (embedding).
book {
title: 'Moby Dick',
author: {
name: 'Herman Melville',
born: 1819
}
}
Here, the author details are inside the book document itself. This technique is also known as
embedding because the author subdocument is embedded in the book document.
2. Store parts of objects separately and link them using the unique identifiers (referencing).
One author may write multiple books. So, to avoid duplicating data inside all the books,
we can create separate author document and refer to it by its _id field:
author {
_id: ObjectId(1),
name: 'Herman Melville',
born: 1819
}
book {
_id: ObjectId(55),
title: 'Moby Dick',
author: ObjectId(1)
}
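The difference between embedding and referencing can be sketched as follows. Plain integers stand
in for ObjectId values, and the authors and books collections and the embed_author helper are
hypothetical; in a real application the lookup would be a second query or a server-side join.

```python
# Referencing: authors and books live in separate collections, linked by _id.
authors = {1: {"_id": 1, "name": "Herman Melville", "born": 1819}}
books = [{"_id": 55, "title": "Moby Dick", "author": 1}]

def embed_author(book, author_collection):
    """Resolve the author reference and return a copy of the book with the author embedded."""
    resolved = dict(book)                    # shallow copy; the stored book is untouched
    resolved["author"] = author_collection[book["author"]]
    return resolved

embedded = embed_author(books[0], authors)
print(embedded["author"]["name"])
```

Referencing avoids duplicating the author in every book, at the cost of an extra lookup when the
full object is needed.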
NoSQL databases, in general, have more storage flexibility and offer better indexing methods. In
a document database, each document is handled as an individual object, and there is no fixed
schema, so you can store each document in the way it can be most easily retrieved and viewed.
Additionally, you can evolve your data model to adapt it to your changing application
requirements. The schema versioning pattern makes use of the flexible document model to allow
just that.
Many developers are familiar with SQL. By storing data in a JSON database, developers can
simply map SQL columns and JSON document key names. For example, the bookName key of a
document can be mapped to the book_name column of the book table. Most JSON databases
automate this mapping, which saves on a developer’s learning curve and reduces the
development time.
Due to the availability of various index types, search queries are quite fast. For example, since
MongoDB has no fixed schema, you can create a wildcard index on a field or set of fields to
support querying that field. There are many other types of indexes, like O2-tree and T-tree, that
make NoSQL databases highly performant.
JSON databases have a flexible schema and scale well vertically and horizontally, making them
suitable to store huge volumes and a variety of big data. Document databases like MongoDB
have a rich query language (MQL) and aggregation pipeline, eliminating the need for ETL
systems for data processing and transformation. Further, these databases can easily pass data to
popular data analysis programming languages like Python and R, without additional coding
steps.
Introduction to FireBase:
History of Firebase
Firebase evolved from Envolve, an earlier startup founded by James Tamplin and Andrew Lee in
2011. Envolve provided developers with an API that allowed them to integrate online chat
functionality into their websites. After releasing the chat service, the founders found that
Envolve was being used to pass application data that were not chat messages. Developers were
using Envolve to sync application data in real time, so the founders decided to separate the
real-time architecture from the chat system it powered. In September 2011, Tamplin and Lee
founded Firebase as a separate company. It was launched to the public in April 2012.
Firebase Realtime Database was the first product of Firebase. It is an API that syncs
application data across Android, iOS, and Web devices and stores it on Firebase's cloud. It
helps developers build real-time, collaborative applications.
In May 2012, after launching the beta, Firebase raised $1.1M in seed funding from venture
capitalists Greylock Partners, Flybridge Capital Partners, New Enterprise Associates, and
Founder Collective.
In June 2013, the company raised a further $5.6M in Series A funding from venture capitalists
Flybridge Capital Partners and Union Square Ventures.
Firebase launched two products in 2014, i.e., Firebase Hosting and Firebase
Authentication. It positioned the company as a mobile backend as a service.
Firebase was acquired by Google in October 2014.
Google acquired Divshot in October 2015 to merge it with the Firebase team.
In May 2016, Firebase expanded its services to become a unified platform for mobile
developers. Now it has integrated with various other Google services, including AdMob,
Google Cloud Platform, and Google Ads, to offer broader products and scale it for
developers.
Google acquired Fabric and Crashlytics from Twitter in January 2017 to add Fabric and
Crashlytics services to Firebase.
Firebase launched Cloud Firestore in October 2017. It is a real-time document database and
the successor product to the original Firebase Realtime Database.
Features of Firebase
Firebase has several features that make this platform essential. These features include unlimited
reporting, cloud messaging, authentication and hosting, etc. Let's take a look at these features to
understand how these features make Firebase essential:
Incredibly Built-In Analytics
The analytics dashboard is one of the best features that Firebase is equipped with. It is free
and can report on up to 500 event types, each with 25 attributes. The dashboard is well suited
to observing user behavior and measuring various user characteristics. Ultimately, it helps us
understand how people use our app so that we can better optimize it in the future.
Key features
Unlimited Reporting
Audience Segmentation
We can identify custom audiences in the Firebase console based on device data, custom events,
or user properties. After that, we can use these audiences that we specified with other Firebase
attributes when targeting new features or notifications.
Cloud Messaging
Firebase allows us to deliver and receive messages in a more reliable way across platforms.
Authentication
Test Lab
Hosting
Remote Configuration
Dynamic Links
Dynamic Links are smart URLs that dynamically change behavior to provide the best
experience across different platforms. These links take app users directly to the content
they are interested in after installing the app, no matter whether they are completely new or
lifetime customers.
Crash Reporting
Real-time Database
It can store and sync app data in real-time.
Storage
Here are some user interaction aspects which make development a piece of cake:
AdWords
Linking AdWords is very easy, and with it, we can segment and define our user base using
Firebase Analytics. It also makes it easier to improve targeting in marketing and advertising
campaigns. Other benefits include conversion tracking, cross-network attribution, and LTV
(customer lifetime value) calculation.
App Indexing
With app indexing, we can work on aspects like re-engaging users with our app, especially by
surfacing in-app content within Google search results. It will also help in ranking our
application in Google search results.