CH 2 BDA

The document discusses the differences between schema-based and schema-less databases, advantages of NoSQL databases over relational databases including scalability, flexibility and high availability, and the CAP theorem which states it is impossible to achieve consistency, availability and partition tolerance simultaneously in a distributed system.

Uploaded by

Binit Karmakar

BIG DATA ANALYTICS

PECAIML601A

MODULE 2:

1. DIFFERENTIATE BETWEEN SCHEMA BASED AND SCHEMA LESS DATABASE

| Feature | Schema-based databases | Schema-less databases |
|---|---|---|
| Definition | Requires a schema to be defined before data can be stored | Does not require a schema to be defined before data can be stored |
| Structure | Follows a strict structure defined by the schema, with tables, columns, and relationships between them | Can be structured or unstructured, with flexible data modeling and no fixed schema |
| Data types | Supports only fixed data types and structures defined by the schema | Supports a wide variety of data types and structures, including structured, semi-structured, and unstructured data |
| Scalability | Can be horizontally scaled by sharding data across multiple servers, but scaling requires careful planning and management | Designed to be horizontally scalable, with built-in features for sharding and distribution |
| Consistency | Enforces strong consistency through the use of transactions, ensuring that data is always in a consistent state | Allows for eventual consistency, where data may not be immediately consistent but will eventually become so |
| Querying | Supports SQL for querying data, with powerful features for filtering, sorting, and joining data | Supports a variety of query languages, including some SQL variants, as well as specialized query languages for specific use cases |
| Use cases | Ideal for use cases that require strong consistency, transactional support, and complex querying | Ideal for use cases that require flexible data modeling, high scalability, and high availability, such as real-time analytics, content management, and IoT applications |
| Data flexibility | The schema determines the structure of the data and any data added must conform to the schema | The data model is flexible and can handle a wide variety of data types and structures |
| ACID compliance | They are typically ACID-compliant | They may or may not be ACID-compliant |
| Examples | Oracle, MySQL, SQL Server | MongoDB, Cassandra, Couchbase |
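
To make the "Definition" and "Data flexibility" rows concrete, here is a minimal Python sketch. It assumes SQLite (through the standard `sqlite3` module) as a stand-in for a schema-based database and a plain list of dictionaries as a stand-in for a schema-less document store. The `users` table, its fields, and the sample values are hypothetical.

```python
import sqlite3

# Schema-based: the table structure is declared before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Asha"))

# A row that refers to a column outside the schema is rejected.
try:
    conn.execute(
        "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
        (2, "Ravi", "ravi@example.com"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Schema-less (simulated with dicts): each document carries its own fields,
# so a new field can appear without changing any declared structure.
documents = []
documents.append({"id": 1, "name": "Asha"})
documents.append({"id": 2, "name": "Ravi", "email": "ravi@example.com"})

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print("insert with unknown column rejected:", rejected)
print("rows in table:", row_count, "| documents stored:", len(documents))
```

The relational side keeps only the row that matches the declared columns, while the document list accepts both records. Real document stores add indexing, validation options, and persistence that this sketch leaves out.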

2. WHAT ARE SOME OF THE ADVANTAGES OF USING NOSQL DATABASES OVER TRADITIONAL RELATIONAL DATABASES?

• NoSQL databases have several advantages over traditional relational databases, including:

SCALABILITY:
• NoSQL databases are designed to be highly scalable, allowing them to handle large volumes of data and high levels
of read and write operations.

FLEXIBILITY:
• NoSQL databases are schema-less, meaning they can handle a variety of data types and structures without the
need for predefined schemas.

HIGH AVAILABILITY:
• NoSQL databases are designed to be highly available, with built-in features for replication and automatic failover.

PERFORMANCE:
• NoSQL databases are optimized for high performance and can handle big data workloads through distributed
architectures, in-memory processing, and other optimization techniques.

COST-EFFECTIVENESS:
• NoSQL databases are often more cost-effective than traditional relational databases, as they can be run on
commodity hardware and do not require expensive licensing fees.

Overall, NoSQL databases are well-suited for modern data-intensive applications that require high scalability, flexibility,
and performance.

3. EXPLAIN THE CAP THEOREM.

• According to the CAP theorem, a distributed system cannot simultaneously provide all three of the following
guarantees:

a. Consistency
b. Availability
c. Partition Tolerance

• Because network partitions cannot be ruled out in a distributed system, the practical trade-off is between
consistency and availability while a partition is in effect.

CONSISTENCY:
• The data should remain consistent even after the execution of an operation. This means that once data is written,
any later read request should return that data. For example, after an order status is updated, all clients should
see the same updated status.

AVAILABILITY:
• The database should always be available and responsive: every request sent to a working node should receive a
response, without downtime.

PARTITION TOLERANCE:
• Partition tolerance means that the system should continue to function even if communication among the servers
is unreliable. For example, the servers can be split into multiple groups that cannot communicate with each
other. In that case, even if part of the database is unreachable, the remaining parts should continue to operate.
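
The trade-off between consistency and availability during a partition can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Replica` and `TwoNodeStore` classes, the `"CP"` and `"AP"` mode names, and the `order` key are hypothetical, and a real database handles partitions with far more machinery.

```python
class Replica:
    """One copy of a key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Two replicas that normally copy every write to each other."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Prefer consistency: refuse the write so replicas cannot diverge.
                raise RuntimeError("write rejected during partition")
            # Prefer availability: accept locally; replicas may now disagree.
            replica.data[key] = value
            return
        replica.data[key] = value
        other.data[key] = value  # replicate while the network is healthy

    def read(self, replica, key):
        return replica.data.get(key)


cp = TwoNodeStore("CP")
cp.write(cp.a, "order", "placed")
cp.partitioned = True
try:
    cp.write(cp.a, "order", "shipped")
    cp_write_ok = True
except RuntimeError:
    cp_write_ok = False

ap = TwoNodeStore("AP")
ap.write(ap.a, "order", "placed")
ap.partitioned = True
ap.write(ap.a, "order", "shipped")

print("CP write accepted:", cp_write_ok, "| replicas agree:",
      cp.read(cp.a, "order") == cp.read(cp.b, "order"))
print("AP write accepted: True | replicas agree:",
      ap.read(ap.a, "order") == ap.read(ap.b, "order"))
```

In the consistency-first mode the store gives up availability for writes but both replicas keep returning the same order status. In the availability-first mode the write succeeds, and the two replicas return different statuses until they can synchronize again.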

• Different types of distributed systems prioritize different aspects of the CAP theorem. For example, traditional
relational databases typically prioritize consistency over availability, while NoSQL databases may prioritize
availability and partition tolerance over consistency. It is up to the designer of the distributed system to
determine which guarantees to prioritize based on the needs of the application.

4. WHAT IS SHARDING IN NOSQL DATABASES, AND HOW DOES IT HELP WITH SCALABILITY?

• Sharding is a technique used in NoSQL databases to horizontally partition data across multiple nodes in a cluster.
Each node in the cluster stores a subset of the data, and data is distributed based on a sharding key, which
determines which node should store each piece of data.
• Sharding helps with scalability by allowing NoSQL databases to handle large amounts of data and a high volume of
read and write operations. By distributing data across multiple nodes, sharding allows for parallel processing of
data and reduces the load on individual nodes. Sharding also allows for easy scaling of NoSQL databases, as new
nodes can be added to the cluster as needed.
• However, sharding also introduces complexity and potential consistency issues: operations that touch several
shards are harder to coordinate, and when shards are replicated across nodes the copies must be kept in sync. It
is important to carefully design and manage sharding in NoSQL databases to ensure good performance and
consistency.
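
The routing role of the sharding key can be sketched in a few lines of Python. This is a minimal illustration that assumes hash-based sharding with a fixed number of shards. The `shard_for` function, the `ShardedStore` class, and the `user:<n>` key format are hypothetical, and each "node" is just an in-memory dictionary.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Key-value store whose data is split across several in-memory shards."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for n in range(1000):
    store.put(f"user:{n}", {"n": n})

sizes = [len(shard) for shard in store.shards]
print("records per shard:", sizes)
print("lookup user:42 ->", store.get("user:42"))
```

Each shard ends up holding only part of the 1000 records, so reads and writes for different keys can be served by different nodes. With this simple modulo scheme, changing the number of shards would move most keys to a different shard, which is one reason real systems often use approaches such as consistent hashing or range-based partitioning instead.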
