Module 1 Nosql

The document discusses the advantages and limitations of relational databases, emphasizing their role in persistent data storage, concurrency management, and integration. It highlights the impedance mismatch between relational models and in-memory data structures, leading to the rise of NoSQL databases designed for large-scale data handling in distributed environments. The emergence of NoSQL is characterized by its schema-less nature, polyglot persistence, and the ability to efficiently manage complex data structures through aggregates.

Uploaded by

siraj.934124
Copyright
© © All Rights Reserved

NoSQL Database 21CS745

MODULE 1
CHAPTER 1
1.1 The Value of Relational Databases
1.1.1 Persistent Data Storage
Relational databases have become deeply embedded in our computing culture, often taken for
granted. The primary advantage of relational databases is their ability to store large amounts
of persistent data. In computer systems, memory is typically divided into fast but volatile
"main memory" and a slower but larger "backing store" (usually a disk or other persistent
memory). Main memory is limited in space and loses data when the system shuts down.
Relational databases, as part of the backing store, provide a structured way to store and
retrieve data efficiently, allowing applications to access small parts of data quickly and
reliably.
1.1.2 Concurrency
In enterprise applications, multiple users may access the same data simultaneously,
potentially making changes. While users often modify different areas of the data, conflicts
can arise when they try to change the same data. Managing these concurrent interactions is
challenging, and errors like double-booking hotel rooms can occur. Relational databases
handle concurrency by controlling access to data through transactions, which allow safe and
coordinated interactions between users and systems. Though transactions don't eliminate all
errors, they significantly reduce complexity and help manage concurrency effectively.
Transactions also help with error handling: if an error occurs while a change is being
processed, the transaction can be rolled back, undoing the change.
1.1.3 Integration
Enterprise applications live in an ecosystem where multiple applications, developed by
different teams, often need to collaborate and share data. Shared database integration is a
common approach where all applications store and access data from a single database. This
method simplifies data sharing, and the database's built-in concurrency control manages
multiple applications in the same way it handles multiple users.
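The transactional behaviour described above (coordinated changes that can be rolled back when an error occurs) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original notes: the table names, the audit_log table, and the book_room helper are assumptions chosen to mirror the hotel-room example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (room INTEGER PRIMARY KEY, guest TEXT NOT NULL)")
conn.execute("CREATE TABLE audit_log (entry TEXT NOT NULL)")
conn.commit()


def book_room(conn, room, guest):
    """Record a booking and its audit entry as one transaction (hypothetical helper)."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "INSERT INTO audit_log (entry) VALUES (?)", (f"{guest} booked {room}",)
            )
            conn.execute(
                "INSERT INTO bookings (room, guest) VALUES (?, ?)", (room, guest)
            )
        return True
    except sqlite3.IntegrityError:
        # The room is already taken (primary-key conflict). The audit entry
        # written just before it was rolled back together with the failed insert.
        return False


first = book_room(conn, 101, "Anna")
second = book_room(conn, 101, "Barbara")  # same room: rejected and rolled back
bookings = conn.execute("SELECT room, guest FROM bookings").fetchall()
audit_entries = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
print(first, second, bookings, audit_entries)
```

The second call fails on the bookings insert, so its audit entry is undone as well; the database never shows a half-finished booking.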
1.1.4 A (Mostly) Standard Model
One reason relational databases have remained dominant is the standardization of their core
features. Developers and database administrators can learn the basic principles of the
relational model and apply them across different projects. While there are variations between
vendors (e.g., different SQL dialects), the fundamental mechanisms remain consistent. This
standardization makes relational databases widely accessible and usable across different
environments.

Koustav Biswas, Dept Of CSE , DSATM

1.2 Impedance Mismatch


While relational databases offer numerous advantages, they are not without their
shortcomings. One of the most significant issues developers face is the impedance
mismatch—the disconnect between the relational model used by databases and the in-
memory data structures used in application development.
Relational Data Model vs. In-Memory Data Structures
The relational data model structures data into tables (or relations) and rows (or tuples). A
tuple in the relational sense is a set of name-value pairs, while a relation is a collection of
these tuples. This structure lends itself to relational algebra, a mathematically elegant
approach to managing and manipulating data.
However, the simplicity of relational tuples comes with a limitation: the values in a tuple
must be simple and cannot include complex structures, such as nested records or lists. In
contrast, in-memory data structures used in programming languages can be far more
complex, allowing for rich hierarchies and nested collections of data. This means that when
developers want to store complex in-memory data in a relational database, they must first
translate it into the flat, tabular structure of the relational model. This translation process
creates what is referred to as impedance mismatch.

Example of Impedance Mismatch


Consider an example where an application has an order structure in its user interface,
represented as a single aggregate (nested) structure with items, customer information,
shipping details, and more. When storing this order in a relational database, the data needs to
be split across multiple tables and rows, such as separate tables for items, customers, and
shipping. This fragmentation of a single logical entity into many rows across different tables
highlights the complexity of mapping in-memory structures to relational data stores.
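The fragmentation described above can be sketched in Python. This is an illustrative assumption rather than a real mapping layer: the field names, the second order item, and the three target tables (customers, orders, order_items) are hypothetical.

```python
# One nested in-memory order, as an application might hold it.
order = {
    "id": 99,
    "customer": {"id": 1, "name": "Martin"},
    "items": [
        {"productId": 27, "price": 32.45},
        {"productId": 31, "price": 10.00},  # illustrative second item
    ],
    "shippingAddress": {"city": "Chicago"},
}


def to_rows(order):
    """Split one nested order into flat rows, grouped by (hypothetical) table name."""
    customer = order["customer"]
    return {
        "customers": [(customer["id"], customer["name"])],
        "orders": [(order["id"], customer["id"], order["shippingAddress"]["city"])],
        "order_items": [
            (order["id"], item["productId"], item["price"]) for item in order["items"]
        ],
    }


rows = to_rows(order)
total_rows = sum(len(table_rows) for table_rows in rows.values())
print(rows)
print("rows written for one order:", total_rows)
```

A single logical order becomes four rows in three tables here, and reading it back requires the reverse translation with joins.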


The Object-Oriented Challenge


In the 1990s, the rise of object-oriented programming (OOP) languages further magnified
the impedance mismatch. Object-oriented languages allowed developers to model complex,
real-world entities in code using objects that had rich relationships and behaviors. However,
relational databases could not naturally store these objects without significant transformation
into flat, relational tables.
During this time, many believed that relational databases would be replaced by object-
oriented databases that could store and retrieve in-memory objects directly, bypassing the
need for complex translations. Object-oriented databases gained some traction as an
alternative solution that promised to eliminate impedance mismatch by providing a direct
mapping between the object model in the code and the storage model on disk.
Relational Databases Persist
Despite the promise of object-oriented databases, relational databases continued to dominate.
Their success was due to several factors:
1. SQL Standardization: SQL, the query language for relational databases, became
widely accepted and implemented across different database systems. This
standardization provided developers and database professionals with a consistent
language for managing data.
2. Integration Role: Relational databases emphasized their role in integrating multiple
applications by serving as a central data repository, enabling various systems to share
and access the same data.
3. Separation of Concerns: The growing distinction between application development
and database management led to the emergence of specialized roles: application
developers focused on business logic, while database administrators (DBAs)
handled the complexities of data storage and optimization.
1.3 Application and Integration Databases
The reasons relational databases triumphed over object-oriented (OO) databases are still
debated. One key factor was SQL’s role as an integration mechanism. Relational
databases became the integration database for multiple applications, each often developed
by separate teams, but sharing a common database to operate on a consistent set of persistent
data.
However, using a shared integration database comes with its own downsides:
• Increased complexity: Integrating multiple applications leads to a more complex
database structure than what a single application requires.
• Coordination issues: Changing the database structure for one application necessitates
coordination with other applications using the same database.
• Diverging needs: Applications often have different structural or performance
requirements, leading to conflicts over how the data is stored or indexed.


On the other hand, an application database is managed by a single application codebase and
team, allowing for much easier schema evolution and maintenance. This approach shifts
interoperability concerns to the application’s interface, where applications communicate via
web services, especially over HTTP. The shift to web services, commonly using XML or
JSON, allows for richer data structures compared to SQL relations.
Despite this freedom, the move to application databases did not trigger a rush to alternative
data stores. Most teams stuck with relational databases because they were familiar and easy to use.
1.4 Attack of the Clusters
The early 2000s saw the rise of massive web properties handling large-scale data from links,
logs, social networks, and mapping data. As data volumes and traffic increased, companies
faced the challenge of scaling their computing resources.
There are two ways to scale:
1. Scaling up: Adding more power (e.g., processors, memory) to a single machine.
2. Scaling out: Using multiple smaller machines in a cluster. This option is cheaper,
more resilient, and can handle individual machine failures without affecting overall
availability.
However, relational databases struggled with clusters. They were designed for single-server
environments and couldn't handle distributed data management well. Although sharding
(dividing the database across multiple servers) helped distribute the load, it introduced new
problems like losing querying capabilities and referential integrity across shards.
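The trade-off of sharding can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the three dictionaries stand in for servers, and shard_for, put, get, and find_by_city are hypothetical helpers, not the API of any real database.

```python
import hashlib

NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one server


def shard_for(key):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def put(key, record):
    shards[shard_for(key)][key] = record


def get(key):
    # Lookup by shard key touches exactly one shard.
    return shards[shard_for(key)].get(key)


def find_by_city(city):
    # A query on anything other than the shard key must visit every shard,
    # and nothing enforces references between records on different shards.
    return sorted(
        key
        for shard in shards
        for key, record in shard.items()
        if record["city"] == city
    )


put(1, {"name": "Martin", "city": "Chicago"})
put(2, {"name": "Pramod", "city": "Boston"})
put(3, {"name": "Anna", "city": "Chicago"})
print(get(2), find_by_city("Chicago"))
```

Lookups by key stay cheap, but the application now owns cross-shard queries and integrity checks that a single-server database would have handled.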
Large-scale companies like Google and Amazon, which handled vast amounts of data, began
to look for alternatives. Their internal developments—Google's BigTable and Amazon's
Dynamo—provided models for databases that worked efficiently on clusters, setting the
stage for NoSQL databases designed to handle big data in distributed systems.
1.5 The Emergence of NoSQL
The term "NoSQL" emerged in the late 90s as the name of an open-source relational database
developed by Carlo Strozzi. This database was unique as it stored tables in ASCII files and
manipulated them using shell scripts instead of SQL, which gave it the "NoSQL" label.
However, this early iteration of NoSQL had no lasting impact on modern databases.
The NoSQL we recognize today traces its roots to a meetup organized by Johan Oskarsson in
San Francisco in 2009, inspired by projects like BigTable and Dynamo that were
experimenting with alternative data storage solutions. Oskarsson, seeking a memorable name
for the event, chose "NoSQL" from a suggestion by Eric Evans, even though the name was
somewhat misleading as these new databases weren't strictly against SQL but rather offered
different ways to handle data.
NoSQL databases quickly gained popularity, especially in the open-source community,
though they were never bound by a precise definition. These databases are often
characterized by not using SQL, being open-source, and running well on clusters. Many
also relax the ACID transactions on which traditional relational databases rely. The rise of
web-scale applications in the early 21st century fueled the adoption of NoSQL databases,
driven by the need to handle large-scale data and provide flexibility with schema-less
structures, making them suitable for non-uniform data.
Though NoSQL databases are primarily open-source, some closed-source systems are also
labeled as NoSQL. Although they do not use SQL, some NoSQL systems developed query
languages that resemble it for easier adoption, such as Cassandra’s CQL. These databases are
typically built for horizontal scalability, meaning they excel in distributed environments
where relational databases struggle.
The NoSQL movement also brought forth the concept of polyglot persistence—the use of
multiple types of databases within the same application, tailored to different use cases. This
approach allows organizations to choose the most appropriate data storage solution for each
scenario, moving beyond the default choice of relational databases. As the book suggests,
NoSQL databases are seen more as application databases rather than integration
databases, shifting away from using a single relational database for everything.
1.6 Key Points
• Relational databases have been highly successful for over two decades, offering
persistence, concurrency control, and integration mechanisms.
• Impedance mismatch between the relational model and in-memory data structures
has frustrated application developers.
• There is a growing trend to encapsulate databases within applications and integrate
through services, moving away from using databases as integration points.
• The primary driver of change in data storage has been the need to handle large
volumes of data on clusters, a task for which relational databases are not optimized.
• NoSQL is an accidental neologism with no prescriptive definition, only observable
common characteristics.
• The common characteristics of NoSQL databases include:
o Not using the relational model
o Running efficiently on clusters
o Being open-source
o Being built for 21st-century web estates
o Being schema-less
• The most significant outcome of NoSQL's rise is the concept of Polyglot Persistence,
where multiple data storage technologies are used within a single system for different
use cases.


Chapter 2: Aggregate Data Model


2.1. Data Models
• Data model: Refers to how we interact with data in the database, as opposed to the
storage model, which deals with how the database stores and manipulates data
internally.
o Ideally, we should not need to understand the storage model, but some
understanding is often required for performance optimization.
o Commonly, developers refer to data models as entity-relationship diagrams in
an application (e.g., customers, orders), but in this context, the data model
refers to how the database organizes data, also known as a metamodel.
Relational Data Model
• The relational model is based on tables (relations), where each table has rows
(tuples) that represent entities of interest. Columns describe entities, and relationships
between entities can be represented by columns that refer to other rows or tables.
• NoSQL databases move away from this relational model, employing different
approaches to organizing data, which can be categorized into:
1. Key-value
2. Document
3. Column-family
4. Graph
2.1.1. Aggregates
• Relational model limitations: Relational databases divide data into tuples, which do
not support nesting or complex structures like lists of values or nested records.
• Aggregate orientation:
o Recognizes that data is often manipulated in complex structures rather than
simple tuples.
o Aggregates allow nested records and lists, making it easier to treat related data
as a unit for operations.
• Aggregate:
o A collection of related objects treated as a single unit for data manipulation
and consistency management.
o The term is borrowed from Domain-Driven Design.
o Aggregates are useful in key-value, document, and column-family databases
because they enable atomic operations and help with replication and
sharding in a clustered environment.


2.1.2. Example: Relational vs. Aggregate Models


• Consider building an e-commerce website where data needs to be stored about users,
product catalogs, orders, shipping, and payment information.
• A relational database would normalize this data, separating it into various tables
(e.g., users, orders, products, addresses) with relationships maintained through foreign
keys.
• NoSQL approach:
o Instead of normalizing the data into separate tables, aggregates group related
data together (e.g., orders with associated shipping and payment info in one
document).
o Aggregates help optimize performance in clustered environments, facilitating
operations like sharding and replication.
• Which model fits better depends on the complexity of the application and on how it
needs to manipulate its data.


JSON Representation of Aggregates


Here’s the sample data presented in JSON format, a common format used in NoSQL
databases:
Customer Aggregate:
{
  "id": 1,
  "name": "Martin",
  "billingAddress": [{"city": "Chicago"}]
}
Order Aggregate:
{
  "id": 99,
  "customerId": 1,
  "orderItems": [
    {
      "productId": 27,
      "price": 32.45,
      "productName": "NoSQL Distilled"
    }
  ],
  "shippingAddress": [{"city": "Chicago"}],
  "orderPayment": [
    {
      "ccinfo": "1000-1000-1000-1000",
      "txnId": "abelif879rft",
      "billingAddress": {"city": "Chicago"}
    }
  ]
}
Explanation of Aggregates
• Aggregate Boundaries:
o The customer and order data form distinct aggregates. Each aggregate
encapsulates related data such as billing addresses, order items, and payment
information.
o The customer aggregate contains details about the customer, including the
billing address.
o The order aggregate contains data related to the order, including items
ordered, shipping address, and payment details.
• Data Duplication:
o In this model, certain data (e.g., the billing and shipping address) is copied
into different parts of the JSON rather than linked by foreign keys (as in a
relational database).
o This approach allows for the immutability of certain information. For
example, you don’t want the shipping address or payment details to change
after an order is placed. Instead of linking addresses by an ID and updating
them globally, addresses are copied where needed.
Aggregates and Relationships
• The link between aggregates (such as between a customer and their orders) is
maintained through fields like customerId in the order. This shows the relationship but
does not imply aggregation; the customer and order aggregates remain distinct.
• Similarly, within the order, the productId would normally refer to a separate product
aggregate. However, in this case, we've denormalized the data by including the
productName directly in the order item to reduce the need to access multiple
aggregates during data interactions.
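The two points above can be sketched in Python, using plain dictionaries as stand-ins for a store of customer aggregates and a store of order aggregates. The helper names customer_for and item_names are hypothetical.

```python
# Two separate aggregate stores, keyed by aggregate ID.
customers = {
    1: {"id": 1, "name": "Martin", "billingAddress": [{"city": "Chicago"}]},
}
orders = {
    99: {
        "id": 99,
        "customerId": 1,
        "orderItems": [
            {"productId": 27, "price": 32.45, "productName": "NoSQL Distilled"}
        ],
    },
}


def customer_for(order_id):
    """Follow the customerId reference: this needs a second aggregate lookup."""
    order = orders[order_id]
    return customers[order["customerId"]]


def item_names(order_id):
    """productName is denormalized into the order, so no product lookup is needed."""
    return [item["productName"] for item in orders[order_id]["orderItems"]]


print(customer_for(99)["name"], item_names(99))
```

The database sees customerId only as a value; it is the application that knows it points at another aggregate.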
Aggregate Design Considerations
• Trade-offs: The main design decision here is whether to bundle related data (such as
a customer and their orders) into a single aggregate or keep them separate. This choice
depends on how data is typically accessed in the application:
o If a system frequently needs to retrieve all orders for a customer, it may make
sense to include orders within the customer aggregate (see Figure 2.4 in the
reference).
o If individual orders are accessed independently of customers, it's better to keep
orders and customers as separate aggregates.
Example of a Combined Customer and Order Aggregate
In some cases, you might embed all the customer's orders within the customer aggregate.
Here’s how that data might look in JSON:
{
  "customer": {
    "id": 1,
    "name": "Martin",
    "billingAddress": [{"city": "Chicago"}],
    "orders": [
      {
        "id": 99,
        "customerId": 1,
        "orderItems": [
          {
            "productId": 27,
            "price": 32.45,
            "productName": "NoSQL Distilled"
          }
        ],
        "shippingAddress": [{"city": "Chicago"}],
        "orderPayment": [
          {
            "ccinfo": "1000-1000-1000-1000",
            "txnId": "abelif879rft",
            "billingAddress": {"city": "Chicago"}
          }
        ]
      }
    ]
  }
}
Consequences of Aggregate Orientation
Relational databases manage data elements and relationships, but they have no concept of
an aggregate. In a real-world application, a customer order logically groups together its
order items, shipping address, and payment. A relational database represents these
relationships with foreign keys but draws no distinction between relationships that form an
aggregate and those that do not. As a result, it cannot use aggregate structure to decide
how to store or distribute data.
Challenges with Aggregates:
• Relational and Aggregate-Ignorant Models: Relational databases are
"aggregate-ignorant," meaning they don't recognize or optimize for aggregate structures.
NoSQL graph databases share this characteristic. Being aggregate-ignorant is not a
defect: it lets the same data be viewed from many perspectives. A fixed aggregate
structure helps some interactions but hinders others. For example, analyzing product
sales across orders means digging into every order aggregate.
• Cluster Considerations: Aggregate orientation becomes crucial when running
databases on a cluster, as is common with NoSQL systems. Defining aggregates helps
determine which pieces of data should be stored together on the same node,
minimizing the number of nodes that need to be queried and thus improving
efficiency.
• Transactions: Aggregate-oriented databases generally do not support ACID (Atomic,
Consistent, Isolated, Durable) transactions that span multiple aggregates; they support
atomic manipulation of one aggregate at a time. This can be seen as a drawback
compared to relational databases, which allow ACID transactions across multiple rows
and tables, because keeping several aggregates consistent is left to the application
logic. In contrast, aggregate-ignorant databases such as graph databases generally
support ACID transactions, similar to relational databases.
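The product-sales case mentioned above can be sketched as follows: when orders are the aggregates, a per-product report has to open every order. The sample orders (including order 100 and product 31) and the units_sold_by_product helper are illustrative assumptions.

```python
from collections import defaultdict

# Order aggregates; a real store would hold these spread across cluster nodes.
orders = [
    {"id": 99, "customerId": 1, "orderItems": [
        {"productId": 27, "price": 32.45, "productName": "NoSQL Distilled"},
    ]},
    {"id": 100, "customerId": 2, "orderItems": [
        {"productId": 27, "price": 32.45, "productName": "NoSQL Distilled"},
        {"productId": 31, "price": 10.00, "productName": "Example Book"},
    ]},
]


def units_sold_by_product(orders):
    """Count order items per product by scanning into every order aggregate."""
    units = defaultdict(int)
    for order in orders:              # every aggregate must be read...
        for item in order["orderItems"]:  # ...and unpacked to reach its items
            units[item["productId"]] += 1
    return dict(units)


print(units_sold_by_product(orders))
```

Fetching one order is a single lookup, but this report cuts across the aggregate boundary, so its cost grows with the total number of orders.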


2.2. Key-Value and Document Data Models


Key-value and document databases both follow an aggregate-oriented approach. In key-value
databases, the aggregate is opaque—treated as a simple blob—while document databases can
see and understand the structure within the aggregate.
• Key-Value Stores: These databases allow complete freedom over what can be stored,
as the database only recognizes the aggregate as a blob. However, the only way to
access data is through its key. This simplicity provides flexibility in data storage but
limits how data can be queried.
• Document Stores: In contrast, document databases impose some structure on the
aggregate, which enables querying based on fields within the document. This allows
users to retrieve parts of an aggregate and even create indexes for faster access. The
flexibility of document databases in querying makes them more powerful in certain
use cases compared to key-value stores.
While the line between key-value and document databases can be blurred, key-value
databases mainly emphasize lookup by key, while document databases allow more complex
queries based on the internal structure of the document.
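The contrast between the two models can be sketched with two toy in-memory classes. These are illustrative assumptions only: KeyValueStore and DocumentStore and their methods are hypothetical and do not mirror the API of any particular product.

```python
import json


class KeyValueStore:
    """The value is an opaque blob: the store can only put and get by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, blob: bytes):
        self._data[key] = blob

    def get(self, key) -> bytes:
        return self._data[key]


class DocumentStore:
    """The store can see inside each document, so it can query on fields."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc: dict):
        self._docs[key] = doc

    def get_field(self, key, field):
        # Retrieve part of an aggregate instead of the whole thing.
        return self._docs[key].get(field)

    def find(self, field, value):
        # Query by content rather than by key.
        return sorted(k for k, d in self._docs.items() if d.get(field) == value)


customer = {"id": 1, "name": "Martin", "city": "Chicago"}

kv = KeyValueStore()
kv.put("customer:1", json.dumps(customer).encode())
decoded = json.loads(kv.get("customer:1"))  # the application must decode the blob

docs = DocumentStore()
docs.put("customer:1", customer)
docs.put("customer:2", {"id": 2, "name": "Pramod", "city": "Boston"})
print(decoded["name"], docs.get_field("customer:1", "city"), docs.find("city", "Boston"))
```

With the key-value store, any filtering by city would have to happen in application code after fetching and decoding each blob.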
2.3. Column-Family Stores
Column-family databases, such as Google's BigTable and the systems it influenced (e.g., HBase,
Cassandra), organize data as two-level aggregates: a row key identifies the row, and column
keys identify the values within it. These
databases are optimized for storing and retrieving large datasets, particularly in distributed
systems.
• Row-Oriented vs. Column-Oriented: In column-family databases, each row can be
seen as an aggregate (e.g., a customer), and column families represent chunks of
related data within that aggregate (e.g., customer profile or order history). The set of
columns can vary from row to row, making these databases suitable for
handling complex and diverse data structures.
• Wide and Skinny Rows: A distinction is often drawn between skinny rows, which have
few columns with the same columns used across many rows, and wide rows, which have
many columns that may differ from row to row.
Wide rows are often used to model lists, with each column representing an individual
element in the list. Skinny rows function more like traditional relational rows, where
columns represent specific fields.
• Sorting and Ordering: Column-family databases often allow for sorting columns
within a family. This enables efficient range queries, such as retrieving orders by a
combination of date and ID, making these databases ideal for time-series data or
applications requiring ordered access to large datasets.
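The two-level structure and the ordered range query described above can be sketched in Python. This is a toy model under stated assumptions: the Row class, the "profile" and "orders" family names, and the "date:orderId" column-key format are all hypothetical.

```python
import bisect


class Row:
    """One row aggregate: column-family name -> {column key: value}."""

    def __init__(self):
        self.families = {}

    def put(self, family, column, value):
        self.families.setdefault(family, {})[column] = value

    def column_range(self, family, start, end):
        """Return columns with start <= key < end, in key order."""
        columns = self.families.get(family, {})
        # A real store keeps columns sorted on disk; sorting per call is
        # only acceptable in a sketch like this one.
        keys = sorted(columns)
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return [(key, columns[key]) for key in keys[lo:hi]]


row = Row()  # the row for customer 1
# Skinny family: a few fixed fields.
row.put("profile", "name", "Martin")
row.put("profile", "city", "Chicago")
# Wide family: one column per order, keyed by "date:orderId" so keys sort by date.
row.put("orders", "2011-09-30:98", 15.00)
row.put("orders", "2011-10-27:99", 32.45)
row.put("orders", "2011-11-02:101", 10.00)

october = row.column_range("orders", "2011-10", "2011-11")
print(october)
```

Because the column keys sort by date, the October orders form one contiguous slice of the wide family.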
Summarizing Aggregate-Oriented Databases
Aggregate-oriented databases, such as key-value, document, and column-family models,
share the common concept of an aggregate. An aggregate is a collection of data indexed by a
key and treated as a single unit, essential for enabling storage over clusters. Each aggregate is
stored together on one node, providing transactional control through atomic updates.


However, there are differences among the models:


• Key-Value Model: Treats the aggregate as an opaque whole, accessible only through key
lookups, without the ability to query or retrieve parts of it.
• Document Model: Makes the aggregate transparent, allowing queries and partial
retrievals. However, because documents have no schema, the database can do little to
optimize the storage or retrieval of parts of a document.
• Column-Family Model: Breaks the aggregate into column families, enabling
structured storage that improves accessibility by allowing the database to act on that
structure.

Chapter 3: More Details on Data Models

3.1. Relationships

Aggregates are beneficial for grouping data frequently accessed together, but different
applications may require various access patterns. For instance, while some applications might
prefer to combine customer information with their order history into a single aggregate,
others may treat orders as independent entities.

In cases where you want separate customer and order aggregates, it’s essential to establish a
relationship between them. A straightforward method is to embed the customer ID within the
order aggregate. This allows you to fetch the customer data by referencing the ID from the
order. However, this approach does not inform the database of the underlying relationship,
which can limit its ability to optimize queries or manage data effectively.

To address this, many databases, even key-value stores, offer ways to make relationships
visible to the database. Document stores expose aggregate contents to the database for
indexing and querying. Riak, a key-value store, allows link information to be placed in
metadata, supporting partial retrieval and link-walking.

An important consideration with relationships is how updates are managed. Aggregate-oriented
databases treat the aggregate as the unit of data retrieval and update, so
atomicity is only guaranteed within a single aggregate. If you need to update multiple
aggregates, you must handle a failure partway through yourself. In contrast, relational
databases provide ACID transactions, which allow multiple records to be modified as a
single atomic operation.
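What handling failures manually can look like is sketched below. This is a minimal illustration under stated assumptions: the in-memory store, the save and place_order helpers, the simulated failure flag, and the orderCount field are all hypothetical.

```python
# Aggregates keyed by ID; each individual write is assumed to be atomic.
store = {"customer:1": {"id": 1, "name": "Martin", "orderCount": 0}}


def save(key, aggregate, fail=False):
    """Write one aggregate; `fail` simulates a crash or network error."""
    if fail:
        raise OSError(f"simulated failure writing {key}")
    store[key] = aggregate


def place_order(order, fail_on_customer=False):
    """Update two aggregates; undo the first write if the second one fails."""
    order_key = f"order:{order['id']}"
    customer_key = f"customer:{order['customerId']}"
    save(order_key, order)  # first aggregate written
    try:
        customer = dict(store[customer_key])
        customer["orderCount"] += 1
        save(customer_key, customer, fail=fail_on_customer)  # second aggregate
    except OSError:
        # No cross-aggregate transaction exists, so the application
        # compensates by removing the order it has already written.
        del store[order_key]
        return False
    return True


ok = place_order({"id": 99, "customerId": 1})
failed = place_order({"id": 100, "customerId": 1}, fail_on_customer=True)
print(ok, failed, sorted(store))
```

Between the two writes, other readers can briefly see the order without the updated customer, which is exactly the window an ACID transaction would close.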

3.2. Graph Databases in the NoSQL Landscape


Graph databases represent a unique approach within the NoSQL ecosystem, primarily
distinguished by their focus on complex relationships rather than large, aggregate-oriented
data models. While many NoSQL databases are designed to optimize for clustered
environments with large records and simple connections, graph databases emerged from the
need to overcome the limitations of relational databases, emphasizing small records with
intricate interconnections.


Understanding Graph Structures


In the context of databases, a graph consists of nodes connected by edges. For instance, in a
social network, each person can be represented as a node, while the relationships between
them (like friendships) are the edges. This structure enables powerful queries, such as finding
items related to a person based on their connections.
Graph databases excel in domains characterized by complex relationships, such as social
networks, product recommendations, and eligibility rules. The fundamental model of a graph
database is straightforward: it consists of nodes and edges. However, there is significant
variation in how data can be stored within these nodes and edges.
For example:
• FlockDB: Offers basic nodes and edges without additional attributes.
• Neo4j: Allows properties to be attached to nodes and edges in a schemaless manner,
enabling more flexibility.
• Infinite Graph: Stores Java objects as nodes and edges, allowing for more complex
data structures.
Querying Graph Databases
Graph databases enable specialized query operations that leverage their structural
relationships. Unlike relational databases, which often rely on foreign keys and can incur
high costs due to join operations, graph databases facilitate cheap traversals across
relationships. This efficiency is achieved by shifting the burden of relationship navigation
from query time to insert time, making queries much faster in scenarios where querying is
prioritized over inserting data.
Most queries involve navigating through the network of edges. For instance, to determine
what both Anna and Barbara like, a graph database would utilize the relationships connecting
their respective nodes. Typically, a starting point is needed, which can be achieved through
indexing based on attributes like ID. From there, the query can explore the relationships
associated with those nodes.
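The Anna and Barbara query can be sketched with a tiny in-memory graph. This is an illustrative assumption, not how any graph database is implemented: the Graph class, the "likes" and "friend" edge labels, and the sample data are hypothetical.

```python
from collections import defaultdict


class Graph:
    """Labeled edges, indexed at insert time so traversal is a cheap lookup."""

    def __init__(self):
        self._out = defaultdict(set)  # (node, label) -> set of target nodes

    def add_edge(self, source, label, target):
        # The work of organizing relationships is done here, on insert.
        self._out[(source, label)].add(target)

    def neighbours(self, node, label):
        return self._out.get((node, label), set())


g = Graph()
g.add_edge("Anna", "friend", "Barbara")
g.add_edge("Barbara", "friend", "Carol")
g.add_edge("Anna", "likes", "NoSQL Distilled")
g.add_edge("Anna", "likes", "Refactoring")
g.add_edge("Barbara", "likes", "NoSQL Distilled")
g.add_edge("Barbara", "likes", "Databases")

# Start from two known nodes and follow their "likes" edges.
both_like = sorted(g.neighbours("Anna", "likes") & g.neighbours("Barbara", "likes"))

# Two-step traversal: what do Anna's friends like?
friends_like = sorted(
    item
    for friend in g.neighbours("Anna", "friend")
    for item in g.neighbours(friend, "likes")
)
print(both_like, friends_like)
```

Each hop is a dictionary lookup rather than a join, which is the sense in which the cost has moved from query time to insert time.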
Differences from Aggregate-Oriented Databases
The emphasis on relationships in graph databases fundamentally distinguishes them from
aggregate-oriented databases. This distinction affects other aspects as well:
• Architecture: Graph databases are more likely to run on a single server instead of
distributed clusters.
• ACID Transactions: Maintaining consistency in graph databases often requires
transactions that span multiple nodes and edges, unlike aggregate-oriented databases
that focus on individual aggregates.


3.3. Schemaless Databases


A defining characteristic of NoSQL databases is their schemaless nature. Unlike relational
databases, which require a predefined schema—specifying tables, columns, and data types—
NoSQL databases allow for a more flexible and dynamic approach to data storage.
Flexibility and Freedom
In schemaless databases:
• Key-Value Stores: You can store any data under a key without any constraints.
• Document Databases: Documents can have varied structures, and there are no
restrictions on the content.
• Column-Family Databases: You can store any data under any column.
• Graph Databases: You can freely add new nodes and edges, adapting the structure as
needed.
This flexibility is particularly beneficial for evolving projects, enabling developers to modify
data storage as requirements change. If certain fields become unnecessary, they can be
discarded without the concerns associated with deleting columns in a relational schema.
Schemaless databases also handle nonuniform data more gracefully. In relational databases, a
schema forces all rows into a rigid structure, often resulting in sparse tables or irrelevant
columns. In contrast, schemaless databases allow each record to contain only the necessary
fields.
Implicit Schemas
While the absence of a formal schema offers advantages, it also introduces challenges.
Applications often rely on an implicit schema—assumptions about the data's structure coded
into the application. For instance, a program may expect specific field names and data types,
which can lead to confusion if the data doesn't adhere to these assumptions.
This reliance on implicit schemas can create several problems:
• Understanding Data Structure: To comprehend what data is available, one must
delve into the application code, which may vary in clarity.
• Database Ignorance: The database itself lacks awareness of the implicit schema,
preventing it from optimizing data storage and retrieval or applying consistent
validations.
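An implicit schema can be made visible by writing the application's assumptions down as a check. The sketch below is illustrative only: the REQUIRED_FIELDS mapping and the schema_errors helper are hypothetical, and the field names follow the order example used earlier.

```python
# The fields and types this application silently expects in an order document.
REQUIRED_FIELDS = {"id": int, "customerId": int, "orderItems": list}


def schema_errors(doc):
    """List the ways a document breaks the application's implicit schema."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            actual = type(doc[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


good = {"id": 99, "customerId": 1, "orderItems": []}
bad = {"id": "99", "orderItems": []}  # the database would accept this too

print(schema_errors(good))
print(schema_errors(bad))
```

The database stores both documents without complaint; only the application code knows that the second one is wrong.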
Challenges of Multiple Applications
When multiple applications access the same schemaless database, it can lead to
inconsistencies and complications. A few strategies to mitigate these issues include:
1. Encapsulation: Limit database interactions to a single application, using web services
to integrate with others.
2. Delineation: Clearly define different areas of an aggregate for different applications,
such as separate sections in a document or different column families.


Relational Schema vs. Schemaless


Critics of relational databases often point to their rigid schemas as a limitation, but relational
schemas can be modified over time using SQL commands. This flexibility allows for the
addition of new columns for nonuniform data. However, when data lacks uniformity, a
schemaless approach often proves advantageous.
While changing a relational database's schema can be controlled, managing changes in a
schemaless database also requires careful consideration to ensure both old and new data are
accessible. Moreover, while schemalessness provides flexibility within aggregates, changing
aggregate boundaries can present the same complexities as in relational systems.
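One way to keep both old and new data accessible is for the reading code to accept either shape and migrate records as it touches them. The sketch below assumes a hypothetical rename of an item's price field to fullPrice; the field names and helpers are illustrative and do not come from the notes.

```python
def item_price(item):
    """Read the price from either the new or the old field name."""
    if "fullPrice" in item:       # new shape
        return item["fullPrice"]
    return item["price"]          # old shape, written before the rename


def migrate(item):
    """Return a copy of the item in the new shape, leaving the input untouched."""
    migrated = dict(item)
    if "price" in migrated and "fullPrice" not in migrated:
        migrated["fullPrice"] = migrated.pop("price")
    return migrated


old_item = {"productId": 27, "price": 32.45}
new_item = {"productId": 27, "fullPrice": 32.45}

print(item_price(old_item), item_price(new_item), migrate(old_item))
```

Until every stored record has been rewritten, the reader has to keep the fallback branch, which is the careful handling the paragraph above refers to.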

------------------------------------------END OF MODULE 1------------------------------------
