
Data Engineering Unit 3

HBase Distributed Storage Architecture

 Master-Worker Pattern: HBase follows a master-worker architecture. The Master node coordinates
cluster-wide tasks such as region assignment, while Region Servers each manage a specific subset of the
data (regions).

 Regions and Row Keys: Each region in HBase stores an ordered set of rows, identified by unique
row keys. When the data in a region grows beyond a configured threshold, the region is split into two
daughter regions, each covering a contiguous half of the key range.
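The splitting behavior above can be sketched in a few lines. This is a toy illustration of the idea (sorted row keys, split at the midpoint once a threshold is exceeded), not actual HBase internals; the threshold and function names are invented for the example.

```python
# Toy sketch (not HBase code): a "region" holds a sorted range of row keys
# and is split in two once it exceeds a configured size threshold.
SPLIT_THRESHOLD = 4  # rows per region, for illustration only

def split_if_needed(region):
    """Return one region if under the threshold, else two daughter regions."""
    rows = sorted(region)
    if len(rows) <= SPLIT_THRESHOLD:
        return [rows]
    mid = len(rows) // 2
    # Two daughter regions: keys stay ordered, ranges do not overlap.
    return [rows[:mid], rows[mid:]]

region = ["row1", "row2", "row3", "row4", "row5", "row6"]
daughters = split_if_needed(region)
```

Each daughter region covers a contiguous, non-overlapping slice of the original key range, which is what lets HBase route a row key to exactly one region.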

 Column-Family and Store Mapping: HBase stores data in columns, grouped into column families. Each
region maintains a separate store for each column family, with these stores mapping to physical files in the
underlying distributed file system.

 Write-Ahead Log (WAL): HBase uses a write-ahead log to ensure data durability. Each write is
appended to the WAL before being applied to the in-memory store (MemStore); when the MemStore
fills, its contents are flushed to disk.
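The write path can be sketched as below. This is an illustrative model of the log-first ordering, not the HBase API; the memstore limit and variable names are invented for the example.

```python
# Toy write path (illustrative, not HBase code): every put is appended to a
# write-ahead log first, then applied to the in-memory store; when the
# memstore reaches its limit, it is flushed to an on-disk store file.
MEMSTORE_LIMIT = 3  # entries, for illustration only

wal, memstore, disk_files = [], {}, []

def put(key, value):
    global memstore
    wal.append((key, value))               # 1. durability: log before memory
    memstore[key] = value                  # 2. apply to the in-memory store
    if len(memstore) >= MEMSTORE_LIMIT:
        disk_files.append(dict(memstore))  # 3. flush the memstore to disk
        memstore = {}

for i in range(4):
    put(f"row{i}", i)
```

Because every write reaches the log before the in-memory store, a crash after step 1 can always be recovered by replaying the WAL.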

 Distributed File System: HBase typically uses the Hadoop Distributed File System (HDFS) for storage. The
HDFS follows a master-worker pattern similar to HBase, with a NameNode and DataNodes. HBase interacts
with the file system through a filesystem API, allowing compatibility with other systems like CloudStore
(formerly Kosmos FileSystem).

 ZooKeeper for Configuration and Coordination: HBase relies on ZooKeeper for configuration management
and coordination. ZooKeeper points clients at the catalog information (-ROOT- and .META.) needed for
locating specific rows in HBase tables.

 Data Access Flow: When accessing data, the client first consults the -ROOT- and .META. catalogs via
ZooKeeper to locate the relevant region. This process is cached, so subsequent requests to the same data
can bypass the lookup steps.
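The cached-lookup flow can be modeled as follows. This is a simplified sketch, not the real catalog protocol: the catalog is flattened to one key-range-to-server map, and all names are invented for the example.

```python
# Toy sketch of the lookup flow: the first access consults the catalog to
# find which region server holds a row key; the result is cached, so later
# requests for the same key skip the catalog entirely.
catalog = {("a", "m"): "regionserver-1", ("n", "z"): "regionserver-2"}
location_cache = {}
catalog_lookups = 0

def locate(row_key):
    global catalog_lookups
    if row_key in location_cache:        # cached: bypass the catalog
        return location_cache[row_key]
    catalog_lookups += 1                 # uncached: consult the catalog
    for (lo, hi), server in catalog.items():
        if lo <= row_key[0] <= hi:
            location_cache[row_key] = server
            return server

locate("apple")   # catalog lookup
locate("apple")   # served from cache
locate("pear")    # catalog lookup
```

Only the first access to each key pays the catalog-lookup cost; repeated accesses are answered from the client-side cache.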
Introduction to NoSQL Databases

NoSQL databases are designed to store and manage large volumes of unstructured, semi-structured, or structured
data. Unlike traditional relational databases, NoSQL databases do not require a fixed schema, and they scale
horizontally, making them ideal for handling big data and real-time web applications.

Types of NoSQL Databases with Examples

1. Document-Oriented Databases

o Description: These databases store data in document formats like JSON, BSON, or XML, where each
document can have a different structure.

o Example: MongoDB

 Use Case: Ideal for content management systems, e-commerce platforms, and applications
that require flexible schemas.
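The "flexible schema" point can be made concrete with plain dicts standing in for JSON/BSON documents (with MongoDB itself, these would be inserted through a driver such as pymongo; the collection contents here are invented for the example).

```python
# Two "documents" in the same collection with different structures: the
# second has no "specs" field, and no schema change is needed to allow that.
products = [
    {"_id": 1, "name": "Laptop", "price": 999, "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "Gift Card", "price": 25},   # no "specs" field
]

# Query by a field that only some documents have, tolerating its absence:
with_specs = [d for d in products if "specs" in d]
```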

2. Key-Value Stores

o Description: These databases store data as a collection of key-value pairs. The key is a unique
identifier, and the value can be any type of data.

o Example: Redis

 Use Case: Suitable for caching, session management, and real-time analytics.
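The caching use case can be sketched with a Redis-like get/set interface plus a simple time-to-live, using only a dict (no Redis server involved; the function names are invented for the example).

```python
import time

# A key-value store maps unique keys to opaque values.
store = {}  # key -> (value, expiry_timestamp)

def set_key(key, value, ttl_seconds=60):
    """Store a value under a key with an expiry time (like Redis SET + EX)."""
    store[key] = (value, time.time() + ttl_seconds)

def get_key(key):
    """Return the value for a key, or None if it is missing or expired."""
    entry = store.get(key)
    if entry is None or time.time() > entry[1]:
        return None
    return entry[0]

set_key("session:42", {"user": "alice"}, ttl_seconds=60)
```

The value attached to a key is opaque to the store: here it is a dict, but it could equally be a string, a counter, or serialized binary data.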

3. Column-Family Stores

o Description: These databases store data in columns rather than rows, allowing for efficient querying
and storage of large datasets.

o Example: Apache Cassandra

 Use Case: Best for time-series data, logging, and real-time analytics applications.

4. Graph Databases

o Description: These databases store data in nodes and edges, representing entities and relationships
between them, respectively.

o Example: Neo4j

 Use Case: Ideal for social networks, recommendation engines, and fraud detection systems.
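The social-network use case boils down to traversals over nodes and edges. This sketch uses a plain adjacency list and a two-hop "friend of a friend" query (the kind of traversal graph databases like Neo4j optimize); the data and function name are invented for the example.

```python
# Nodes are people; edges are friendships, stored as an adjacency list.
friends = {
    "alice": ["bob"],
    "bob":   ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave":  ["carol"],
}

def friends_of_friends(person):
    """Two-hop traversal: people reachable via a friend, excluding
    the person themselves and their direct friends."""
    direct = set(friends.get(person, []))
    fof = set()
    for f in direct:
        fof.update(friends.get(f, []))
    return fof - direct - {person}

suggestions = friends_of_friends("alice")
```

In a relational database this query needs a self-join per hop; a graph database walks the edges directly, which is why multi-hop queries are its sweet spot.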

5. Wide-Column Stores

o Description: A hybrid between key-value and column-family stores, wide-column stores allow
storing a large amount of data in a columnar format.

o Example: HBase

 Use Case: Used for handling sparse data, such as in big data applications and Hadoop
ecosystems.

6. Object-Oriented Databases

o Description: These databases store data as objects, similar to how they are handled in object-
oriented programming languages.

o Example: db4o

 Use Case: Suitable for applications where data is naturally represented as objects, such as in
complex simulations.
7. Time-Series Databases

o Description: These databases are optimized for storing and querying time-stamped or time-series
data.

o Example: InfluxDB

 Use Case: Used in IoT, monitoring systems, and real-time analytics.
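The characteristic access pattern (time-stamped points, aggregated over a window) can be sketched as below; the data and function name are invented for the example, and a real time-series database like InfluxDB would answer this with a query language rather than Python.

```python
# Time-series data: (timestamp_seconds, value) points in arrival order.
points = [(0, 10.0), (60, 12.0), (120, 11.0), (180, 15.0)]

def window_mean(points, start, end):
    """Mean of all values whose timestamp falls in [start, end)."""
    vals = [v for t, v in points if start <= t < end]
    return sum(vals) / len(vals) if vals else None

m = window_mean(points, 0, 120)   # mean over the first two minutes
```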

8. Multi-Model Databases

o Description: These databases support multiple data models, such as key-value, document, and graph
within the same database.

o Example: ArangoDB

 Use Case: Useful for applications requiring flexibility in handling different data types and
relationships.

9. Search Engines

o Description: These are specialized databases designed for searching and indexing large volumes of
text data.

o Example: Elasticsearch

 Use Case: Used in full-text search applications, logging, and analytics.
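The core structure behind text search engines is the inverted index: a map from each term to the documents containing it. A minimal sketch (the documents and function name are invented for the example; Elasticsearch adds analysis, scoring, and distribution on top of this idea):

```python
# Build an inverted index: term -> set of document ids containing it.
docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}

index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def search(term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term, set()))

hits = search("quick")
```

Lookup cost depends on the number of matching documents, not the total volume of text, which is what makes full-text search over large corpora feasible.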

10. Geospatial Databases

o Description: These databases are optimized for storing and querying geospatial data, such as
coordinates and polygons.

o Example: PostGIS (an extension of PostgreSQL)

 Use Case: Used in geographic information systems (GIS), location-based services, and
mapping applications
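The simplest geospatial predicate is a point-in-bounding-box test, sketched below; databases like PostGIS evaluate such predicates with spatial indexes rather than a linear check, and the coordinates here are rough illustrative values.

```python
def in_bbox(point, bbox):
    """True if (lon, lat) point lies inside (min_lon, min_lat, max_lon, max_lat)."""
    (x, y), (min_x, min_y, max_x, max_y) = point, bbox
    return min_x <= x <= max_x and min_y <= y <= max_y

# Rough bounding box around London (lon, lat), for illustration only.
city_bbox = (-0.5, 51.3, 0.3, 51.7)

inside = in_bbox((-0.1, 51.5), city_bbox)   # a point in central London
```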

CAP Theorem

Reference: https://www.geeksforgeeks.org/the-cap-theorem-in-dbms/

The CAP theorem is a fundamental principle in distributed systems that helps explain the trade-offs that
must be made when designing databases that are spread across multiple networked nodes. The theorem, originally
proposed by Eric Brewer in 2000, outlines three essential properties that distributed databases aim to achieve:

1. Consistency: Every read operation reflects the most recent write. This means that all clients accessing the
system will have the same view of the data at the same time. For example, if a transaction updates a piece of
data, all subsequent reads should return the updated data.

2. Availability: Every request (whether read or write) receives a response, even if it might not reflect the latest
data. This means that the system remains operational, and clients can always access data, but the data might
be stale or inconsistent during certain conditions.

3. Partition Tolerance: The system continues to function even if there is a network partition that prevents
some parts of the system from communicating with others. This means the system is resilient to network
failures, ensuring that it remains operational even if some nodes are isolated due to network issues.

Key Insight of the CAP Theorem

The CAP theorem states that it is impossible for a distributed system to fully achieve all three of these properties
simultaneously; it can guarantee at most two of the three. Since network partitions cannot be ruled out in
practice, the real choice is usually which of consistency or availability to sacrifice while a partition lasts:
 CA (Consistency and Availability): A system that ensures both consistency and availability will not be able to
handle network partitions effectively. Such a system will function smoothly as long as there are no network
partitions, but if a partition occurs, the system might fail to maintain either consistency or availability.

 CP (Consistency and Partition Tolerance): A system that ensures consistency and can handle network
partitions might have to sacrifice availability during a partition. For example, in the case of a network
partition, some parts of the system might become unavailable to ensure that the data remains consistent
across all nodes.

 AP (Availability and Partition Tolerance): A system that ensures availability and can handle partitions might
sacrifice consistency. This means that during a network partition, the system might continue to operate and
respond to requests, but different parts of the system might return different, potentially inconsistent, data.
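The CP-versus-AP choice can be modeled with a single replica that loses contact with its peers. This is a toy model, not any real database's behavior: the class and mode names are invented, and real systems decide this per operation with quorums and timeouts.

```python
# Toy model: during a partition, a CP system refuses reads it cannot prove
# are current, while an AP system answers anyway, possibly with stale data.
class Replica:
    def __init__(self, value):
        self.value = value
        self.connected = True   # can this replica reach its peers?

    def read(self, mode):
        if self.connected:
            return self.value
        if mode == "CP":
            # Consistency over availability: fail rather than risk staleness.
            raise RuntimeError("unavailable: cannot guarantee consistency")
        # Availability over consistency: serve possibly-stale data.
        return self.value

r = Replica("v1")
r.connected = False     # simulate a network partition
stale = r.read("AP")    # AP: still answers, value may be stale
```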

Eventual Consistency and CAP Theorem

The concept of eventual consistency arises within the context of the CAP theorem, particularly in systems that
prioritize availability and partition tolerance (AP). Eventual consistency is a weaker form of consistency that allows
the system to provide immediate availability and partition tolerance, with the understanding that the data will
eventually become consistent once the system has had enough time to propagate all updates across all nodes.

In other words, under eventual consistency:

 In the absence of updates, the system will eventually reach a state where all nodes have the same data.

 With continuous updates, temporary inconsistencies can occur, but eventually all reachable replicas in the
system converge to a consistent state (a replica that cannot be synchronized may be dropped from the replica set).
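Convergence can be sketched with a last-write-wins anti-entropy pass, where each replica tags its value with a version number. This is one common reconciliation policy, not the only one (vector clocks and CRDTs are alternatives), and all names here are invented for the example.

```python
# Three replicas; only the first has seen the latest write (version 2).
replicas = [
    {"version": 2, "value": "new"},
    {"version": 1, "value": "old"},
    {"version": 1, "value": "old"},
]

def anti_entropy(replicas):
    """One reconciliation pass: copy the highest-versioned value everywhere
    (last-write-wins)."""
    latest = max(replicas, key=lambda r: r["version"])
    for r in replicas:
        r.update(latest)

anti_entropy(replicas)
converged = all(r["value"] == "new" for r in replicas)
```

In the absence of further updates, repeated passes like this are what drives an AP system to the "eventually consistent" state described above.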

BASE vs. ACID

Eventual consistency is part of the BASE (Basically Available, Soft state, Eventual consistency) model, which is often
contrasted with the ACID (Atomicity, Consistency, Isolation, Durability) model used in traditional relational
databases:

 ACID ensures strict consistency and reliability of transactions, making it suitable for systems where data
accuracy and integrity are critical, such as financial applications.

 BASE allows for more flexibility and scalability in distributed systems, where availability and partition
tolerance are prioritized, and temporary inconsistencies are acceptable as long as the system eventually
becomes consistent.
