Unit-3 BDA

The document provides an overview of NoSQL data management, highlighting its flexibility, scalability, and various data models such as key-value, document, and graph databases. It discusses aggregate data models, distribution models, sharding, versioning, and the significance of MapReduce in processing large datasets. Key takeaways emphasize the importance of NoSQL for modern data management needs and encourage exploration of emerging trends in the field.

Unit - 3

NoSQL Data Management


Introduction to NoSQL Data Management
• NoSQL databases are a category of databases that do not adhere to the traditional relational database management system (RDBMS) model.
• They are designed to handle large volumes of unstructured or semi-structured data, providing flexible data models and horizontal scalability.
• NoSQL databases offer various data models, including key-value, document, column-family, and graph databases, each suited to different types of data and use cases.
• Key benefits include improved scalability, performance, and flexibility for modern data management needs.
Aggregate Data Models: Definition and Examples
Definition: Aggregate data models group related data into a single unit (an aggregate) that is stored and retrieved together.

Examples:
• Arrays and lists
• Nested documents
• Sets

[Figure: a simple diagram showing how data is grouped together in an aggregate model.]
Aggregate Data Models: Use Cases and Benefits
Use Cases: Ideal for scenarios where data naturally groups together, like social media posts with comments or product catalogs with reviews.
Benefits: Simplified data organization and faster queries, because related data is read in a single lookup instead of being assembled from several places.
Example: A social media platform using aggregate data models, where each user profile is an aggregate containing data points such as username, bio, and profile picture.
User Profile:
Username: JohnDoe
Posts: "Coding marathon! 💻" and "Excited for the weekend! 🎉"
Comments and likes are aggregated under each post.
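The user profile above can be sketched as one nested structure. This is only an illustration: the field names (`bio`, `likes`, `comments`) and the sample values are assumptions, not taken from any particular database.

```python
# Hypothetical user-profile aggregate; field names and values are illustrative.
user_profile = {
    "username": "JohnDoe",
    "bio": "Example bio",
    "posts": [
        {
            "text": "Coding marathon!",
            "likes": 2,
            "comments": ["Good luck!", "Which language?"],
        },
        {
            "text": "Excited for the weekend!",
            "likes": 5,
            "comments": ["Same here!"],
        },
    ],
}


def total_likes(profile):
    """Sum the likes of every post nested inside the aggregate."""
    return sum(post["likes"] for post in profile["posts"])


def comments_for(profile, post_index):
    """Return the comments stored under one post."""
    return profile["posts"][post_index]["comments"]


print(total_likes(user_profile))
print(comments_for(user_profile, 0))
```

Because posts, comments, and likes live inside the profile, one read of the aggregate is enough to answer both questions.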
Key-Value Data Model
Explanation:
• Simplest form of NoSQL.
• Each data item is stored as a key-value pair.
• Examples: Redis, Amazon DynamoDB.
Advantages:
• Fast retrieval by key.
• Scalable and flexible.
Disadvantages:
• Limited querying capabilities: data is primarily looked up by key, not by the contents of the value.
• Not suitable for complex data relationships.
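A minimal in-memory sketch of the key-value idea follows. The `KeyValueStore` class and its method names are hypothetical and are not the Redis or DynamoDB API; the point is only to contrast a direct lookup by key with a query on the value.

```python
class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, not a real database API)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # Direct lookup by key: the operation this model is built for.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def find_by_value(self, predicate):
        # Querying by value has no index to use here, so every pair is scanned.
        return [k for k, v in self._data.items() if predicate(v)]


store = KeyValueStore()
store.put("session:42", {"user": "JohnDoe", "cart_items": 3})
store.put("session:43", {"user": "JaneRoe", "cart_items": 0})

print(store.get("session:42"))
print(store.find_by_value(lambda v: v["cart_items"] > 0))
```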
Document Data Model
Explanation:
• Stores data as JSON-like documents.
• Each document can have its own structure.
• Examples: MongoDB, Couchbase.
Advantages:
• Flexible schema.
• Supports complex, nested data structures.
Disadvantages:
• Can lead to data redundancy, since related data is often duplicated across documents.
• Subject to document size limits (for example, MongoDB caps a single document at 16 MB).
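As a rough illustration of documents that each have their own structure, the sketch below keeps three differently shaped documents in one list and filters on a nested field. The `find` helper and the sample products are assumptions made for this example and do not mirror the MongoDB or Couchbase query API.

```python
# Three documents in one "collection", each with a different shape.
products = [
    {"_id": 1, "name": "Laptop", "specs": {"ram_gb": 16, "cpu": "8-core"}},
    {"_id": 2, "name": "T-shirt", "sizes": ["S", "M", "L"]},
    {"_id": 3, "name": "Tablet", "specs": {"ram_gb": 8}},
]


def find(collection, path, predicate):
    """Return documents whose value at a dotted path satisfies the predicate.

    Documents that do not contain the path are skipped.
    """
    results = []
    for doc in collection:
        value = doc
        for part in path.split("."):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                value = None
                break
        if value is not None and predicate(value):
            results.append(doc)
    return results


names = [d["name"] for d in find(products, "specs.ram_gb", lambda v: v >= 8)]
print(names)
```

The T-shirt document has no `specs` field, and the query simply skips it; no schema change was needed to store it alongside the others.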
Introduction to Graph Databases
Graph Databases:
• Store data as nodes, edges, and properties.
• Nodes represent entities, edges represent relationships, and properties hold their attributes.

Examples:
• Neo4j
• Amazon Neptune
Relationships and Use Cases
Relationships: Edges connect nodes and represent how entities are related.
Use Cases:
• Social networks: friend connections.
• Recommendations: product or content recommendations.
• Fraud detection: patterns in financial transactions.
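The node, edge, and property structure can be sketched with a small adjacency-set graph. The `Graph` class, its method names, and the sample people are hypothetical; real graph databases such as Neo4j expose their own query languages, which this does not attempt to reproduce.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: nodes with properties, undirected friendship edges."""

    def __init__(self):
        self.nodes = {}                # node id -> properties
        self.edges = defaultdict(set)  # node id -> ids of connected nodes

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_friendship(self, a, b):
        # A friendship is stored in both directions.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def suggest_friends(self, node_id):
        """Friends of friends who are not already direct friends."""
        direct = self.edges[node_id]
        candidates = set()
        for friend in direct:
            candidates |= self.edges[friend]
        return sorted(candidates - direct - {node_id})


g = Graph()
for name in ["alice", "bob", "carol", "dave"]:
    g.add_node(name, kind="person")
g.add_friendship("alice", "bob")
g.add_friendship("bob", "carol")
g.add_friendship("carol", "dave")

print(g.suggest_friends("alice"))
```

A recommendation here is just a two-hop traversal along edges, which is the kind of query graph databases are designed to make cheap.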
Schema-less Databases
Definition:
Schema-less databases, also known as schema-flexible databases, allow data to be stored without a predefined schema.
Benefits:
• Flexibility: Fields can be added or modified per record without altering a database-wide schema.
• Agility: Rapid development and iteration, especially in evolving or experimental projects.
• Scalability: Easier to scale horizontally, as there are no schema changes to coordinate across nodes.
Materialized Views
Purpose:
Materialized views are precomputed query results that are stored physically and refreshed, either periodically or when the underlying data changes.
Benefits:
– Improved Performance: Materialized views can significantly speed up
query processing by precomputing and storing frequently accessed or
complex query results.
– Reduced Complexity: They simplify query execution by reducing the
need for complex joins or computations, leading to more efficient data
retrieval.
– Enhanced Scalability: Materialized views can help distribute query load
and improve scalability by offloading heavy query processing tasks to
precomputed views.
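A minimal sketch of the idea, assuming a made-up list of orders and a hypothetical `SalesByProductView` class: the total per product is computed once and stored, reads are served from the stored result, and the view stays stale until it is refreshed.

```python
from collections import defaultdict

# Hypothetical base data.
orders = [
    {"product": "laptop", "amount": 1200},
    {"product": "mouse", "amount": 25},
    {"product": "laptop", "amount": 1100},
]


class SalesByProductView:
    """Stores the result of a 'total amount per product' query."""

    def __init__(self, base_rows):
        self._base_rows = base_rows
        self._totals = {}
        self.refresh()

    def refresh(self):
        """Recompute the stored result from the base data."""
        totals = defaultdict(int)
        for row in self._base_rows:
            totals[row["product"]] += row["amount"]
        self._totals = dict(totals)

    def total_for(self, product):
        """Answer from the stored result without rescanning the base rows."""
        return self._totals.get(product, 0)


view = SalesByProductView(orders)
laptop_total = view.total_for("laptop")

orders.append({"product": "mouse", "amount": 30})
stale = view.total_for("mouse")   # the view has not seen the new order yet
view.refresh()
fresh = view.total_for("mouse")   # now includes the new order

print(laptop_total, stale, fresh)
```

The trade-off is visible in `stale` versus `fresh`: reads are cheap, but the stored result is only as current as the last refresh.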
Distribution Models
• Distribution models in NoSQL databases determine how data is spread across multiple nodes or servers in a distributed environment.
• Common distribution models include replication (copying the same data to several nodes), partitioning (splitting the data so each node holds a part), and hybrid approaches that combine both.
• Think of partitioning like sharing a pizza: everyone gets a slice.
Overview of Sharding
• Sharding is a database partitioning technique where large datasets are divided into smaller, more manageable parts called shards.
• Each shard is stored on a separate server or node.
• Sharding helps spread data and query load across multiple servers, improving scalability and performance.
• Imagine it like organizing a library: spreading books across multiple shelves for quicker access.
• Limits the impact of failures: if one server fails, only its shard is affected and the others keep running.
• Speeds up data access by spreading the workload.
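One common way to decide which shard holds a key is to hash the key and take the result modulo the number of shards. The sketch below assumes that scheme; the `ShardedStore` class is hypothetical, and each shard is a plain dictionary standing in for a separate server.

```python
import hashlib


class ShardedStore:
    """Toy hash-sharded store; each dict stands in for one server."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key):
        # Use a stable hash: Python's built-in hash() for strings is
        # randomized per process, so it would not route consistently.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

print([len(shard) for shard in store.shards])  # keys held by each shard
print(store.get("user:123"))
```

With plain modulo hashing, changing the number of shards remaps most keys, which is why production systems often use consistent hashing or range-based schemes instead.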
Versioning
• Versioning tracks changes made to data or documents over time, creating a historical record of edits and updates.
• It enables auditing, compliance, and error recovery.
Methods:
• Timestamps: Assigns a timestamp to each data change, enabling chronological tracking of revisions.
• Incremental Versioning: Assigns a unique, increasing version number to each update, allowing easy comparison and retrieval of specific versions.
• Branch Versioning: Creates separate branches of the data, enabling parallel development or experimentation without affecting the main version.
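The first two methods can be combined in a small sketch: every update receives both an increasing version number and a timestamp. The `VersionedDocument` class is hypothetical, and branch versioning is left out to keep the example short.

```python
import time


class VersionedDocument:
    """Keeps every revision of a document with a version number and timestamp."""

    def __init__(self, initial):
        self._history = []
        self.update(initial)

    def update(self, content):
        """Store a new revision and return its version number."""
        version = len(self._history) + 1
        self._history.append(
            {"version": version, "timestamp": time.time(), "content": dict(content)}
        )
        return version

    def latest(self):
        return self._history[-1]

    def get(self, version):
        """Retrieve a specific revision by its version number (1-based)."""
        return self._history[version - 1]

    def as_of(self, timestamp):
        """Most recent revision written at or before the timestamp, or None."""
        matches = [h for h in self._history if h["timestamp"] <= timestamp]
        return matches[-1] if matches else None


doc = VersionedDocument({"title": "Draft"})
doc.update({"title": "Final"})

print(doc.latest()["version"])
print(doc.get(1)["content"])
```

Keeping old revisions is what makes auditing and error recovery possible: version 1 is still retrievable after the document was overwritten.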
MapReduce: Overview and Significance
Overview:
• MapReduce is a programming model for processing and generating large datasets in parallel.
• It consists of two main phases, Map and Reduce, connected by an intermediate shuffle-and-sort step.

Significance:
• Revolutionized big data processing by enabling distributed computing on large datasets.
• Key component in processing and analyzing massive volumes of data efficiently.
MapReduce in NoSQL Databases
How it Works:
• Some NoSQL databases use MapReduce for distributed data processing.
• MapReduce tasks are executed across multiple nodes in the database cluster.
Benefits:
• Enables parallel processing and distributed computation.
• Enhances scalability and performance of NoSQL databases for handling large datasets.
MapReduce Phases: Map, Shuffle & Sort, Reduce.
Example:
In MongoDB, MapReduce can be used for aggregating, filtering, and analyzing large volumes of data stored in document collections (it is deprecated since MongoDB 5.0 in favor of the aggregation pipeline).
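The three phases can be walked through on a single machine with a word count. This is a sketch of the programming model only, not of MongoDB's or Hadoop's API; the function names and the two sample documents are made up for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group all values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


documents = ["big data needs big tools", "data tools scale"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle_and_sort(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped)

print(counts)
```

In a real cluster each document (or block of input) would be mapped on a different node, and the shuffle step would move all pairs with the same key to the same reducer.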
Utilization of Partitioning and Combining
• NoSQL databases employ partitioning to distribute data across nodes for scalability and fault tolerance.
• Combining is used during query processing to aggregate and summarize data from multiple partitions, reducing the need for data movement and improving query performance.
Example:
• Partitioning: Imagine dividing a large library into smaller sections. In the same way, NoSQL databases split data across servers.
• Combining: Each section is then counted or summarized, and the per-section results are merged into one answer.
Partitioning and Combining
• Partitioning involves dividing a large dataset into smaller, manageable partitions or shards.
• It enables horizontal scalability by distributing data across multiple nodes.
• Combining, also known as aggregation, merges data from multiple partitions to generate a unified result.
• It enhances performance when each partition summarizes its own data first, so that only the partial results are transferred and merged.
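A small sketch of combining, assuming three made-up partitions of (product, quantity) rows: each partition is first summarized locally, and only those partial totals are merged. The function names are hypothetical.

```python
from collections import Counter

# Hypothetical rows held by three partitions: (product, quantity sold).
partitions = [
    [("laptop", 2), ("mouse", 5), ("laptop", 1)],
    [("mouse", 1), ("keyboard", 3)],
    [("laptop", 1), ("laptop", 4)],
]


def local_combine(rows):
    """Runs on one partition: collapse its rows to one total per product."""
    partial = Counter()
    for product, quantity in rows:
        partial[product] += quantity
    return partial


def merge(partials):
    """Merge the per-partition totals into one unified result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


partials = [local_combine(rows) for rows in partitions]
result = merge(partials)

rows_before = sum(len(rows) for rows in partitions)
entries_sent = sum(len(partial) for partial in partials)

print(result)
print(rows_before, entries_sent)
```

Comparing `rows_before` with `entries_sent` shows the saving: fewer entries cross the network than there are raw rows, and the gap grows as keys repeat within a partition.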
Composing MapReduce Calculations
Techniques:
• Map Function: Transforms each input record into key-value pairs for processing.
• Reduce Function: Aggregates the intermediate values that the map tasks emitted for each key.

Examples:
• Word Count: Counts the frequency of words in a document.
• Average Calculation: Computes the average value of a dataset.
• Sorting: Orders data based on specific criteria, like alphabetical order or numerical value.
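The average calculation is a useful case because averages of averages are generally wrong when groups differ in size. A common approach, sketched here with made-up sensor readings and hypothetical function names, is to have the map step emit (sum, count) pairs, reduce by adding them, and divide only at the end.

```python
from functools import reduce

# Hypothetical input records: (sensor, reading).
readings = [
    ("sensor_a", 20.0),
    ("sensor_b", 30.0),
    ("sensor_a", 22.0),
    ("sensor_a", 24.0),
]


def map_reading(record):
    """Map: emit (key, (sum, count)) so partial results can be added safely."""
    sensor, value = record
    return sensor, (value, 1)


def combine(left, right):
    """Reduce step: add two (sum, count) pairs component-wise."""
    return left[0] + right[0], left[1] + right[1]


# Group the mapped pairs by key (the shuffle step).
grouped = {}
for key, pair in map(map_reading, readings):
    grouped.setdefault(key, []).append(pair)

# Reduce each group, then finish by dividing sum by count.
averages = {}
for key, pairs in grouped.items():
    total, count = reduce(combine, pairs)
    averages[key] = total / count

print(averages)
```

Because `combine` is associative, the same function could also run per partition before the final reduce, which is how a calculation like this composes with the combining step.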
Conclusion: Recap of Key Points
NoSQL Data Management:
Explored various NoSQL concepts including aggregate data models, graph databases,
and MapReduce.
Importance:
NoSQL offers flexibility, scalability, and performance for modern data management
needs.
Key Takeaways:
– NoSQL databases provide diverse data models suited for different use cases.
– MapReduce enables distributed processing of large datasets.
– Techniques like partitioning and combining improve scalability and efficiency.
Future Directions:
Explore emerging trends and advancements in NoSQL technology for continued
innovation and growth.
Questions and Discussion
Feel free to ask questions or share insights. We're here to engage
and learn together.
Topics to Explore:
• NoSQL databases
• Aggregate data models
• Graph databases
• MapReduce
• Partitioning and combining
• Any related topics of interest
Thank You
