BDA CW Chapter 3

Uploaded by

tyagrajssecs121
Copyright
© All Rights Reserved

BDA CW Chapter 3: 20M

1. What is NoSQL? Differentiate it from RDBMS and explain its types: Key-Value Stores, Graph
Stores, Column Family Stores, and Document Stores. [IA1, PYQ]

NoSQL stands for “Not Only SQL” and refers to a family of database technologies that were developed
to address the limitations of traditional relational databases, such as rigid schemas and difficulty scaling
across many servers. NoSQL databases are particularly useful for handling big data and real-time web
applications. The four main types are described below.

1. Document Databases Example: MongoDB

 Structure: Stores data in documents similar to JSON objects.


 Use Case: Ideal for content management systems, real-time analytics, and applications
requiring flexible schemas.
 Example Document:

{
  "_id": "12345",
  "name": "John Doe",
  "hobbies": ["reading", "gaming", "hiking"]
}

2. Key-Value Stores Example: Redis

 Structure: Stores data as key-value pairs.


 Use Case: Suitable for caching, session management, and real-time analytics.
 Example: "user:1000" -> {"name": "John Doe", "email": "[email protected]"}

3. Graph Databases Example: Neo4j

 Structure: Stores data in nodes, edges, and properties.


 Use Case: Perfect for applications involving complex relationships, like social networks,
recommendation engines, and fraud detection.
 Example: (User: John Doe)-[FRIEND]->(User: Jane Smith)

4. Column Stores Example: Cassandra

 Structure: Stores data in tables with rows and dynamic columns.


 Use Case: Excellent for handling large-scale, distributed data across many servers.
 Example:
• Row Key: user123
• Columns:
• name: John Doe
• email: [email protected]
• address: 123 Main St, Anytown, CA, 12345
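
To make the four models concrete, the following is a minimal Python sketch that represents similar user data in each style using only built-in data structures. It is an illustration, not real database code: the variable names (document_store, kv_store, column_family, graph_edges), the helper friends_of, the second row user456, and the email address are all hypothetical.

```python
import json

# Document store: one self-describing, JSON-like document per record.
document_store = {
    "12345": {"_id": "12345", "name": "John Doe",
              "hobbies": ["reading", "gaming", "hiking"]},
}

# Key-value store: an opaque value looked up by a unique key.
# The email is a placeholder; the store does not interpret the value.
kv_store = {
    "user:1000": json.dumps({"name": "John Doe", "email": "john@example.com"}),
}

# Column-family store: row key -> a dynamic set of columns.
column_family = {
    "user123": {"name": "John Doe",
                "address": "123 Main St, Anytown, CA, 12345"},
    "user456": {"name": "Jane Smith"},  # rows may have different columns
}

# Graph store: nodes plus labeled edges (source, relationship, target).
graph_edges = [("John Doe", "FRIEND", "Jane Smith")]

def friends_of(person):
    """Follow FRIEND edges outward from one node."""
    return [dst for src, rel, dst in graph_edges
            if src == person and rel == "FRIEND"]

print(document_store["12345"]["hobbies"][0])      # nested field access
print(json.loads(kv_store["user:1000"])["name"])  # value must be decoded first
print(sorted(column_family["user123"].keys()))    # columns vary per row
print(friends_of("John Doe"))                     # relationship traversal
```

The point of the comparison is the access path: a document is read by field, a key-value entry only by its key, a column-family row by row key and column name, and a graph by following edges.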

Advantages of NoSQL Databases

1. Scalability: Easily scale out by adding more servers.


2. Flexibility: Handle various data types and structures without predefined schemas.
3. Performance: Optimized for specific data models and access patterns, often resulting in faster
performance for certain queries.
4. Availability: Designed to ensure high availability and fault tolerance.

RDBMS Example: MySQL

• Use Case: Banking system where data integrity and complex transactions are crucial.
• Structure: Tables for customers, accounts, transactions, etc., with relationships defined by foreign
keys.

NoSQL Example: MongoDB

 Use Case: Social media platform where data is unstructured and rapidly changing.
 Structure: Collections of documents, each document being a JSON-like object.

2. Explain CAP theorem and how it is different from ACID properties in databases. [IA1, PYQ]

3. Describe BASE properties in NoSQL Database.

 NoSQL systems typically follow the BASE properties, which stand for Basically Available, Soft State, and
Eventual Consistency.
1. Basically Available: The system guarantees that every request receives a response, even if that
response reports a failure or returns stale data. This is achieved through redundancy and
replication, which keep the system available during partial failures.
2. Soft State: The state of the system can change over time, even without input from the user. This is
because background processes, such as replication, keep propagating updates between nodes. NoSQL
systems handle these changes gracefully, so the system remains operational and data is not corrupted.
3. Eventual Consistency: Instead of requiring immediate consistency like traditional ACID-compliant
databases, NoSQL systems ensure that data will eventually become consistent. This means that, if no
new updates are made, all updates will propagate through the system and all nodes will hold the same data.
 These properties allow NoSQL databases to be highly scalable and available, making them suitable
for large-scale applications like social media platforms and online shopping websites.
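
The three BASE properties can be illustrated with a small Python simulation. This is only a sketch under simplifying assumptions: the classes Replica and EventuallyConsistentStore and the explicit propagate() step are hypothetical stand-ins for the background replication that a real NoSQL system runs on its own.

```python
class Replica:
    """One node holding its own copy of the data."""
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # updates not yet applied to every replica

    def write(self, key, value):
        # Basically available: acknowledge after updating a single replica.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, replica_index, key):
        # May return stale data (or None) before propagation finishes.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Soft state: replicas change here without any new user input.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
stale = store.read(2, "cart:42")   # replica 2 has not seen the write yet
store.propagate()
fresh = store.read(2, "cart:42")   # eventual consistency: replicas now agree
print(stale, fresh)
```

Before propagate() runs, a read from replica 2 misses the new value; afterwards every replica returns it, which is the behaviour the term "eventual consistency" describes.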

4. What are the different ways NoSQL systems handle Big Data problems? [PYQ]

1. Horizontal Scaling (Sharding)

 Solution: NoSQL systems support horizontal scaling by distributing data across multiple
machines (sharding). Unlike traditional relational databases, which rely on vertical scaling
(increasing the power of a single machine), NoSQL databases can grow by simply adding more
servers to the cluster.
 How it helps: This enables NoSQL databases to handle massive datasets and traffic spikes
efficiently, offering better performance and flexibility.
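
A minimal sketch of hash-based sharding in Python is shown here to make the idea concrete. The names shard_for and ShardedStore are hypothetical, each dictionary stands in for a separate server, and modulo hashing is only one of several possible placement strategies.

```python
import hashlib

def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

class ShardedStore:
    def __init__(self, shard_count):
        # Each dict stands in for one server in the cluster.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

store = ShardedStore(shard_count=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

print([len(shard) for shard in store.shards])  # keys are spread over the shards
print(store.get("user:7"))
```

With plain modulo hashing, changing the number of shards moves most keys to a different shard, which is why production systems often use consistent hashing or range-based partitioning instead.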

2. Schema-less Data Models

 Solution: NoSQL databases are typically schema-less, meaning they don’t require a fixed
structure for data storage. This allows for flexible storage of diverse data types (e.g., JSON,
XML, key-value pairs, or graphs) without needing to define a rigid schema upfront.
 How it helps: This flexibility is particularly useful for handling unstructured or semi-structured
data, which is common in Big Data scenarios.

3. Distributed and Replicated Data Storage

 Solution: NoSQL systems store copies of data across multiple nodes (replication) and use
distributed architectures for fault tolerance and availability.
 How it helps: By replicating data across multiple locations, NoSQL databases ensure high
availability, durability, and fault tolerance, making them resilient to node failures.
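
The effect of replication on availability can be sketched as follows. This is an illustrative toy, not how any particular database places replicas: the classes Node and ReplicatedStore, the ring-order replica choice, and the replication factor of 2 are all assumptions.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

class ReplicatedStore:
    def __init__(self, node_names, replication_factor=2):
        self.nodes = [Node(n) for n in node_names]
        self.replication_factor = replication_factor

    def _replicas_for(self, key):
        # Pick a starting node from the key, then take the following
        # nodes in ring order as the additional replicas.
        start = sum(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replication_factor)]

    def put(self, key, value):
        for node in self._replicas_for(key):
            if node.alive:
                node.data[key] = value

    def get(self, key):
        # Read from the first live replica that holds the key.
        for node in self._replicas_for(key):
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)

store = ReplicatedStore(["A", "B", "C"], replication_factor=2)
store.put("order:1", "paid")
primary = store._replicas_for("order:1")[0]
primary.alive = False            # simulate a node failure
print(store.get("order:1"))      # still served by the second replica
```

Because every key lives on two nodes, losing one of them does not make the data unreachable.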

4. CAP Theorem (Consistency, Availability, Partition Tolerance)

 Solution: The CAP Theorem states that a distributed system cannot guarantee all three of
Consistency, Availability, and Partition Tolerance at the same time. Because network partitions
cannot be ruled out in practice, NoSQL systems choose whether to favor consistency or
availability when a partition occurs, depending on the use case.
o Consistency: Every read returns the most recent write, so all nodes appear to hold the same data.
o Availability: Every request to a working node receives a response, for both reads and writes.
o Partition Tolerance: Ensures the system remains operational despite network partitions
or communication failures.
 How it helps: This flexibility allows NoSQL systems to be optimized for different applications,
like real-time data processing or large-scale data storage, depending on which properties are
prioritized.
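
The trade-off during a network partition can be shown with a toy two-replica cluster. The class TwoNodeCluster and its "CP" and "AP" modes are hypothetical labels used only for this sketch; real systems expose the choice through settings such as consistency levels rather than a single switch.

```python
class TwoNodeCluster:
    """Two replicas, a and b, that may be cut off from each other."""
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a = {}
        self.b = {}
        self.partitioned = False

    def write_to_a(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse a write that cannot be replicated.
                raise RuntimeError("unavailable during partition")
            # Availability first: accept the write; replica b becomes stale.
            self.a[key] = value
            return
        self.a[key] = value
        self.b[key] = value

    def read_from_b(self, key):
        return self.b.get(key)

results = {}
for mode in ("CP", "AP"):
    cluster = TwoNodeCluster(mode)
    cluster.write_to_a("x", 1)       # replicated to both nodes
    cluster.partitioned = True       # the network link fails
    try:
        cluster.write_to_a("x", 2)
        accepted = True
    except RuntimeError:
        accepted = False
    results[mode] = (accepted, cluster.read_from_b("x"))

print(results)
```

In "CP" mode the second write is rejected, so both replicas still agree; in "AP" mode the write succeeds but replica b keeps returning the old value until the partition heals.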

These approaches enable NoSQL databases to address Big Data challenges like scalability, flexibility,
fault tolerance, and consistency.

5. Demonstrate how LiveJournal's use of Memcache has successfully enhanced scalability,
performance, and cost-effectiveness.

Scalability
 Distributed caching system: Memcache spread the load across multiple servers by caching
frequently accessed query results.
 Horizontal scaling: As traffic increased, new Memcache servers could be added seamlessly to
handle the load without overburdening individual servers.

Performance
 Reduced database queries: Memcache cached frequently accessed data in RAM, minimizing
the need for repetitive database queries.
 Improved response times: A cache hit was served directly from RAM, avoiding a database
round trip, which kept responses fast even during high-traffic periods.

Cost-Effectiveness
 Database offloading: Memcache reduced the frequency of expensive database operations,
easing pressure on database infrastructure.
 Lower infrastructure costs: Avoided the need for constant database scaling, saving on
additional hardware and operational expenses.

Key Business Drivers for Adopting Memcache


 Rising traffic: Addressed the challenges of increasing user demand without degrading
performance.
 Resource optimization: Maximized RAM utilization across servers to improve efficiency.
 Database overload: Solved issues caused by repetitive queries slowing down the system.
 Cost management: Offered a cost-effective alternative to scaling database infrastructure.
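
The caching behaviour described above follows what is commonly called the cache-aside pattern. The sketch below illustrates it with a plain Python dictionary standing in for the RAM cache; it does not use any real Memcache client, and the names CacheAside, slow_query, and db_calls are hypothetical.

```python
class CacheAside:
    def __init__(self, load_from_db):
        self.cache = {}                  # stand-in for an in-memory cache
        self.load_from_db = load_from_db
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1               # cache hit: no database work
            return self.cache[key]
        self.misses += 1                 # cache miss: query once, then store
        value = self.load_from_db(key)
        self.cache[key] = value
        return value

    def invalidate(self, key):
        # Call after the underlying row changes so stale data is not served.
        self.cache.pop(key, None)

db_calls = []

def slow_query(key):
    """Pretend database query; records every call it receives."""
    db_calls.append(key)
    return f"profile for {key}"

cache = CacheAside(slow_query)
for _ in range(100):
    cache.get("user:1")

print(cache.hits, cache.misses, len(db_calls))
```

One hundred requests for the same key reach the database only once, which is the offloading effect that reduces both response time and database hardware cost.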

6. Explain NoSQL Data architecture patterns. [PYQ]

NoSQL databases provide flexibility and scalability, making them well-suited for handling big data.
These databases use four main architecture patterns for data storage and management:

1. Key-Value Store Database

 Description:
Data is stored as key-value pairs, similar to a hash table, where each key is unique, and the
associated value can be any type (e.g., JSON, BLOB).
 Use Case:
Common in e-commerce applications or shopping websites, for example to look up a shopping cart or
user session by its ID.
 Advantages:
o Efficient for large-scale data and high loads.
o Quick retrieval of data using keys.
 Limitations:
o Difficult to handle complex queries involving multiple keys.
o May face challenges with many-to-many relationships.
 Examples:
o DynamoDB
o Berkeley DB

2. Column Store Database

 Description:
Data is stored in columns rather than rows, with each column treated individually and capable of
storing large volumes of data.
 Use Case:
Ideal for analytics and operations requiring computations across columns (e.g., SUM,
AVERAGE).
 Advantages:
o Data is readily available for column-based queries.
o Efficient for aggregate functions.
 Examples:
o HBase
o Cassandra
o Google Bigtable
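
Why a column layout suits aggregate functions can be seen in a small Python sketch. The sample orders and the helper to_columns are hypothetical; the example only contrasts how much data each layout touches when computing SUM and AVERAGE.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "customer": "A", "amount": 120.0},
    {"order_id": 2, "customer": "B", "amount": 80.0},
    {"order_id": 3, "customer": "A", "amount": 50.0},
]

def to_columns(records):
    """Pivot row records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Row layout: every whole record is visited just to read one field.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: only the "amount" column is read.
amounts = columns["amount"]
total = sum(amounts)
average = total / len(amounts)
print(total, average)
```

Both layouts give the same total, but the column version never touches the order_id or customer values, which matters when a table has many columns and billions of rows.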

3. Document Database

 Description:
Stores data as documents, which are complex structures like JSON, XML, or text. These
databases are ideal for semi-structured or unstructured data.
 Use Case:
Widely used for applications generating JSON-based data.
 Advantages:
o Excellent for semi-structured data.
o Easy storage, retrieval, and management of documents.
 Limitations:
o Operations that span multiple documents, such as joins, can be difficult.
o Aggregations may give incomplete results when a field is missing from some documents.
 Examples:
o MongoDB
o CouchDB

4. Graph Database

 Description:
Stores data in graph structures where entities are represented as nodes and relationships between
them as edges. Useful for highly interconnected data.
 Use Case:
Social networks, recommendation systems, and scenarios involving relational connections.
 Advantages:
o Fast traversal due to direct connections.
o Effective handling of spatial and relational data.
 Limitations:
o Cycles in the graph can cause a traversal to loop forever unless visited nodes are tracked.
 Examples:
o Neo4j
o FlockDB (used by Twitter)
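
Graph traversal, including the looping problem listed under limitations, can be sketched with a breadth-first search over an adjacency list. The friends graph, its names, and the function reachable are hypothetical; the visited set is what prevents the John and Jane cycle from being followed endlessly.

```python
from collections import deque

# Adjacency list: each node maps to the nodes it has an edge to.
friends = {
    "John": ["Jane", "Ravi"],
    "Jane": ["John", "Mei"],   # John <-> Jane forms a cycle
    "Ravi": ["Mei"],
    "Mei": [],
}

def reachable(graph, start, max_hops):
    """Return (node, hops) pairs reachable from start within max_hops."""
    visited = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue                      # do not expand beyond the limit
        for neighbor in graph.get(node, []):
            if neighbor not in visited:   # skip nodes already seen
                visited.add(neighbor)
                found.append((neighbor, hops + 1))
                queue.append((neighbor, hops + 1))
    return found

print(reachable(friends, "John", 2))
```

A query such as "friends of friends" is a two-hop traversal here, which a graph database answers by following edges directly instead of joining tables.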

7. Explain Shared Memory, Shared Disk System and Shared Nothing System with the help of diagrams

Shared Memory System

Definition: Multiple processors (or nodes) share a single memory space, allowing direct communication
and data exchange between processors.
Components:
1. Processors (Nodes):
o Multiple processors operate independently but access a shared pool of memory.
2. Shared Memory:
o A single, centralized memory unit that all processors can read from and write to.
Key Characteristics:
 Centralized Memory: A single memory unit is shared across all processors, eliminating the
need for inter-node data transfer.
 Low Communication Latency: Processors can directly access shared memory, making data
exchange faster compared to distributed systems.
 Scalability Challenges: Adding more processors can lead to contention for shared memory
access, creating bottlenecks as the number of nodes increases.
 Synchronization Requirements: Mechanisms like semaphores or locks are needed to
coordinate memory access and prevent conflicts among processors.
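
The synchronization requirement can be illustrated with Python threads, which share one memory space in the same way the processors above share one memory unit. This is a small sketch: the shared dictionary, the worker function, and the thread and increment counts are hypothetical choices.

```python
import threading

shared = {"counter": 0}     # memory visible to every thread
lock = threading.Lock()     # coordinates access to the shared value

def worker(increments):
    for _ in range(increments):
        with lock:          # only one thread may update at a time
            shared["counter"] += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["counter"])
```

Every update passes through the same lock, so the result is correct, but the threads also queue up behind it; that queueing is the contention that limits how far a shared memory system scales.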

Shared Disk System

 Definition: Multiple processors (or nodes) share access to a centralized disk storage while
maintaining their private memory.
 Components:
1. Processors (Nodes): Each processor operates independently but accesses a common disk.
2. Shared Disk: All data is stored on a single shared disk accessed by all nodes, often
coordinated using a distributed lock manager.
 Key Characteristics:
o Centralized Storage: Data resides on a shared disk accessible by all nodes.
o Coordination: Distributed lock management is required to ensure data consistency.
o Scalability Challenges: Bottlenecks may occur at the shared disk, and managing locks can
be complex.

Shared Nothing System

 Definition:
Each node has independent resources (processor, memory, and disk storage) with no shared
components, allowing high scalability and fault tolerance.
 Components:
1. Independent Nodes: Each node has its own processor, memory, and disk storage,
operating independently.
2. Data Partitioning: Data is distributed across nodes, with each node responsible for a
specific subset.
 Key Characteristics:
o Independent Operation: Eliminates bottlenecks as nodes do not share resources.
o Scalability: Horizontal scaling is straightforward by adding more nodes.
o Fault Tolerance: System remains operational even if individual nodes fail.
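
A shared nothing cluster can be sketched as a set of nodes that each own a private partition and compute partial results locally. The class SharedNothingNode and the functions partition and cluster_sum are hypothetical, and the sketch deliberately omits replication so that the effect of a failed node is visible.

```python
class SharedNothingNode:
    def __init__(self, name):
        self.name = name
        self.rows = []       # private storage; no other node reads it
        self.alive = True

    def local_sum(self):
        return sum(self.rows)

def partition(values, nodes):
    """Assign each value to exactly one node."""
    for value in values:
        nodes[value % len(nodes)].rows.append(value)

def cluster_sum(nodes):
    """Combine partial sums from the nodes that are still running."""
    live = [n for n in nodes if n.alive]
    return sum(n.local_sum() for n in live), len(live)

nodes = [SharedNothingNode(f"node{i}") for i in range(3)]
partition(range(1, 101), nodes)

full_total, _ = cluster_sum(nodes)
nodes[0].alive = False                        # one node fails
partial_total, live_count = cluster_sum(nodes)
print(full_total, partial_total, live_count)
```

The cluster keeps answering after a node fails, but the failed node's partition is missing from the result, which is why shared nothing systems are usually combined with replication.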
