BD Unit 6
Functions:
• Batch Processing: Spark splits large datasets into partitions and processes them in
parallel across a cluster, keeping intermediate results in memory where possible, so
it can handle very large datasets efficiently.
• Real-Time Processing: Spark Streaming processes live data streams by dividing them
into small micro-batches. This suits applications that need near-real-time results,
such as fraud detection and social media analytics.
• Machine Learning and Graph Analysis: Spark's MLlib library supports machine learning
tasks such as predictive analytics, and GraphX supports graph-based computations.
Both run at cluster scale.
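The batch and streaming bullets can be made concrete with a minimal plain-Python simulation. It does not use Spark itself, and all function names are hypothetical. Records are split into partitions, each partition is counted independently, and the partial counts are then merged. This mirrors the `flatMap`/`map`/`reduceByKey` word-count pattern in PySpark. The streaming function assumes the micro-batch model used by Spark Streaming: it re-runs the same batch job on each small batch and folds the result into a running total.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List

# Plain-Python simulation, NOT the Spark API. All helper names are hypothetical.


def split_into_partitions(records: List[str], num_partitions: int) -> List[List[str]]:
    """Round-robin the records into num_partitions lists."""
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def count_words_in_partition(partition: Iterable[str]) -> Counter:
    """Map side: word counts for one partition, computed independently."""
    counts: Counter = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def batch_word_count(records: List[str], num_partitions: int = 3) -> Counter:
    """Batch mode: process the whole dataset at once, partition by partition."""
    partials = [count_words_in_partition(p)
                for p in split_into_partitions(records, num_partitions)]
    # Reduce side: merge the per-partition counts (like reduceByKey).
    return reduce(lambda a, b: a + b, partials, Counter())


def streaming_word_count(micro_batches: Iterable[List[str]]) -> List[Counter]:
    """Streaming as micro-batches: run the batch job on each small batch as it
    arrives and fold it into a running total, keeping a snapshot per batch."""
    running: Counter = Counter()
    snapshots: List[Counter] = []
    for batch in micro_batches:
        running += batch_word_count(batch)
        snapshots.append(Counter(running))
    return snapshots


if __name__ == "__main__":
    logs = ["spark is fast", "spark scales out", "hbase is a store"]
    print(batch_word_count(logs)["spark"])  # 2
    snaps = streaming_word_count([["spark is fast"], ["spark scales out"]])
    print([s["spark"] for s in snaps])  # [1, 2]
```

In real Spark, the partitions would live on different executors and run in parallel. Here they run one after another in a single process, so the sketch shows the data flow but not the speed-up.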
Q4. Difference between HBase and RDBMS.
Ans:
SR | Aspect | HBase | RDBMS
1 | Data Model | Column-family-oriented storage (wide, sparse tables) | Row-oriented storage in relational tables
2 | Data Access | Row-key based; supports fast random access | SQL access using primary keys, indexes and foreign-key joins
3 | Data Volume | Ideal for very large, sparse datasets | Optimized for structured, smaller datasets
4 | Schema Flexibility | Flexible: column families are fixed at table creation, but columns inside them can vary per row | Fixed schema with predefined tables and columns
5 | Scaling | Horizontally scalable across distributed servers | Mostly vertically scalable
6 | Query Language | NoSQL; no native SQL (data is accessed with get/put/scan operations) | SQL-based query support
7 | ACID Compliance | Not fully ACID compliant (atomicity is guaranteed per row) | Fully ACID compliant
8 | Read/Write Speed | Fast reads/writes on large datasets | Moderate speed, suited to structured data
9 | Transactions | Limited (single-row operations only) | Strong multi-row, multi-table transaction support
10 | Use Cases | Real-time analytics, large datasets | OLTP systems, structured data processing
Acronym: DS QAR TU
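Several rows of the table (data model, row-key access, schema flexibility, query style, transactions) can be illustrated with a toy in-memory model. `ToyHBaseTable` is a hypothetical class and not the HBase client API. It assumes HBase's logical layout: sorted row keys, column families fixed at creation, free-form `family:qualifier` columns, and timestamped versions. Its `put`/`get`/`scan` methods only mimic the HBase shell commands of the same names. The RDBMS side uses Python's built-in `sqlite3` module to show a fixed schema that is queried with SQL inside a transaction.

```python
import bisect
import sqlite3
from typing import Dict, List, Tuple


class ToyHBaseTable:
    """Toy model of HBase's data model (hypothetical; not the HBase API).
    Column families are fixed up front, but each row may hold any set of
    qualifiers inside those families, so rows can be sparse."""

    def __init__(self, column_families: List[str]) -> None:
        self.families = set(column_families)
        self.row_keys: List[str] = []  # kept sorted, like HBase row keys
        # row key -> family -> qualifier -> list of (timestamp, value)
        self.rows: Dict[str, Dict[str, Dict[str, List[Tuple[int, str]]]]] = {}

    def put(self, row_key: str, column: str, value: str, ts: int) -> None:
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self.rows:
            bisect.insort(self.row_keys, row_key)
            self.rows[row_key] = {}
        cells = self.rows[row_key].setdefault(family, {})
        cells.setdefault(qualifier, []).append((ts, value))  # keep every version

    def get(self, row_key: str) -> Dict[str, str]:
        """Random access by row key: latest version of each cell in the row."""
        result: Dict[str, str] = {}
        for family, cols in self.rows.get(row_key, {}).items():
            for qualifier, versions in cols.items():
                result[f"{family}:{qualifier}"] = max(versions)[1]
        return result

    def scan(self, start: str, stop: str) -> List[str]:
        """Row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return self.row_keys[lo:hi]


def rdbms_demo() -> Tuple[List[Tuple[str, str]], bool]:
    """Fixed-schema table queried with SQL; writes run inside a transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO users VALUES (?, ?, ?)", ("u1", "Asha", "Pune"))
        conn.execute("INSERT INTO users VALUES (?, ?, ?)", ("u2", "Ravi", None))
    try:  # a row with more values than the schema's columns is rejected
        conn.execute("INSERT INTO users VALUES (?, ?, ?, ?)", ("u3", "Meera", "Goa", "x"))
        rejected = False
    except sqlite3.Error:
        rejected = True
    rows = conn.execute("SELECT id, name FROM users WHERE city IS NULL").fetchall()
    conn.close()
    return rows, rejected


if __name__ == "__main__":
    table = ToyHBaseTable(["info", "stats"])
    table.put("user#001", "info:name", "Asha", ts=1)
    table.put("user#001", "info:name", "Asha K", ts=2)   # newer version
    table.put("user#002", "stats:clicks", "42", ts=1)    # different columns per row
    table.put("user#010", "info:city", "Pune", ts=1)
    print(table.get("user#001"))                 # {'info:name': 'Asha K'}
    print(table.scan("user#000", "user#005"))    # ['user#001', 'user#002']
    print(rdbms_demo())                          # ([('u2', 'Ravi')], True)
```

The sorted list of row keys is what makes range scans cheap in this sketch. That ordering is the reason HBase row-key design matters so much for access patterns.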