BDA Ass 3

The document discusses various aspects of data processing frameworks, focusing on MapReduce and Apache Spark, highlighting their differences in handling iterative and interactive operations. It covers NoSQL databases, their types, and comparisons with SQL databases, as well as the features and architecture of Apache Hive, Pig, and HBase. Additionally, it addresses the role of Apache ZooKeeper, Big Data streaming, and Apache Kafka in managing and processing large-scale data.

Uploaded by

harshshah72004

Harsh Shah TY-IT-1-A 220410116025

1) Describe iterative and interactive operations on MapReduce and Spark RDD.

MapReduce is suitable for batch processing but struggles with iterative and interactive
computations. Iterative operations require multiple passes over the same data, such as
machine learning algorithms, where intermediate results must be stored and read from disk
repeatedly. This leads to inefficiency in MapReduce. Interactive operations, which allow
users to explore data and receive real-time feedback, are also not well-supported in
MapReduce due to its high latency. Apache Spark, with its in-memory data storage using
RDDs (Resilient Distributed Datasets), supports both iterative and interactive processing
efficiently. RDDs cache data in memory, avoiding redundant disk I/O, and enable low-latency
computations.
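The cost difference can be seen in a small plain-Python sketch. It is an analogy, not Spark or Hadoop code: DiskDataset and the two functions are hypothetical names made up for illustration, and the sketch only counts how often the source data is read when an algorithm makes several passes over it.

```python
class DiskDataset:
    """Hypothetical stand-in for data stored on disk; counts every read."""

    def __init__(self, values):
        self._values = values
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._values)


def iterate_without_cache(dataset, iterations):
    # MapReduce-style: every pass goes back to "disk".
    total = 0
    for _ in range(iterations):
        total += sum(dataset.load())
    return total


def iterate_with_cache(dataset, iterations):
    # RDD-style caching: read once, then reuse the in-memory copy.
    cached = dataset.load()
    total = 0
    for _ in range(iterations):
        total += sum(cached)
    return total


a = DiskDataset([1, 2, 3])
b = DiskDataset([1, 2, 3])
r1 = iterate_without_cache(a, 5)
r2 = iterate_with_cache(b, 5)
```

Both runs produce the same result, but the uncached version reads the data five times and the cached version once.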
2) Describe important features of Apache Spark. Also explain transformations and actions
in Spark.
Apache Spark is an in-memory distributed computing framework known for its speed, ease
of use, and flexibility. It supports batch processing, real-time analytics, machine learning,
and graph processing. Key features include fault-tolerant RDDs, support for multiple
languages (Java, Scala, Python), and integration with Hadoop and various data sources.
Transformations are lazy operations that define a new RDD from an existing one, such as
map(), filter(), and flatMap(). Actions trigger execution and return results, such as collect(),
count(), and reduce(). This separation allows Spark to optimize execution plans before
processing.
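Lazy evaluation can be illustrated with Python's built-in iterators. This is only an analogy under the assumption that map() and filter() stand in for Spark transformations and list() stands in for an action; none of it is Spark API, and the calls list exists only to record when work actually happens.

```python
calls = []  # records each time the mapped function really runs


def double(x):
    calls.append(x)
    return x * 2


data = [1, 2, 3, 4]

# "Transformations": these only describe the pipeline, nothing runs yet.
doubled = map(double, data)
multiples_of_four = filter(lambda v: v % 4 == 0, doubled)

calls_before_action = list(calls)  # snapshot taken before any action

# "Action": forces the whole pipeline to execute.
result = list(multiples_of_four)
```

Before the action, calls is still empty; after it, every element has passed through double().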
3) What is NoSQL? Differentiate NoSQL with SQL.
NoSQL (Not Only SQL) databases are non-relational databases designed for large-scale data
storage and real-time web applications. Unlike SQL (relational) databases that store
structured data in tables with predefined schemas, NoSQL supports flexible schema designs
and stores data in forms like key-value pairs, documents, graphs, or wide-columns. NoSQL
provides high scalability, availability, and performance for unstructured or semi-structured
data. SQL databases are better suited for complex transactions and consistency, whereas
NoSQL is optimized for scalability and handling diverse data types.
Differences between NoSQL and SQL:

Feature | SQL (Relational DB) | NoSQL (Non-relational DB)
Data Model | Tables with rows and columns | Key-Value, Document, Column, or Graph
Schema | Fixed schema | Dynamic schema
Scalability | Vertical scaling | Horizontal scaling
Transactions | Supports ACID | May follow BASE (eventual consistency)
Query Language | SQL | Varies by database (MongoDB query API, CQL, etc.)
Use Case | Structured data with relationships | Large-scale, semi-structured/unstructured data

4) Categorize types of NoSQL databases with examples.
NoSQL databases are categorized into four main types:
1. Key-Value Stores: Store data as key-value pairs (e.g., Redis, Riak).
2. Document Stores: Store semi-structured data in JSON/XML documents (e.g., MongoDB, CouchDB).
3. Column-Family Stores: Store data in columns instead of rows (e.g., Apache Cassandra, HBase).
4. Graph Databases: Store relationships between entities using graph structures (e.g., Neo4j, OrientDB).
Each type is optimized for specific use cases and offers flexible scalability.
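The four data models can be pictured with ordinary Python structures. This is a rough sketch for intuition only: the keys, names, and values are invented sample data, and real systems add persistence, indexing, and distribution on top of these shapes.

```python
# Key-value store: an opaque value looked up by its key.
key_value = {"session:42": "alice"}

# Document store: self-describing records; fields may differ per document.
documents = [
    {"_id": 1, "name": "John", "age": 30},
    {"_id": 2, "name": "Asha", "tags": ["admin"]},
]

# Column-family store: row key -> column family -> column -> value.
column_family = {
    "row1": {"personal": {"name": "John"}, "academic": {"grade": "A"}},
}

# Graph store: entities plus explicit relationships between them.
graph_edges = {"John": ["Asha"], "Asha": []}


def neighbours(person):
    """Return the entities directly connected to the given one."""
    return graph_edges.get(person, [])
```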
5) Explain important components of Spark with necessary diagram.
Apache Spark's architecture includes the following components:
• Driver Program: Runs the application's main function, creates the SparkContext, and schedules tasks.
• Cluster Manager: Allocates resources (e.g., Standalone, YARN, Mesos, Kubernetes).
• Executors: Run computations and store data for applications.
• Tasks: Units of work, one per partition, executed by the executors.
[Diagram: Spark Architecture]
Driver Program --> Cluster Manager --> Executors --> Tasks
This architecture enables distributed data processing with in-memory performance and fault
tolerance.
6) What is MongoDB? Explain important features.
MongoDB is a NoSQL, document-oriented database that stores data in flexible, JSON-like
documents. It supports dynamic schemas, allowing different documents in a collection to
have different structures. Key features include high scalability, indexing, replication, and
sharding for horizontal scaling. MongoDB allows developers to store and retrieve complex
data types efficiently and supports aggregation, ad hoc queries, and rich indexing.


7) Differentiate MongoDB with RDBMS. Compare advantages and drawbacks.

MongoDB differs from RDBMS in data model, schema, and performance. While RDBMS uses
tables with fixed schemas, MongoDB stores data in collections of dynamic JSON-like
documents. RDBMS is ideal for transactions and complex queries such as multi-table joins,
whereas MongoDB is suited for scalability and agile development. MongoDB excels in handling
large volumes of unstructured data; its drawback is weaker transactional guarantees:
operations on a single document are atomic, and multi-document ACID transactions arrived
only in version 4.0. RDBMS offers strong consistency but struggles with horizontal scalability.
8) Mention advantages of using NoSQL databases.
NoSQL databases provide several advantages such as flexible schema design, horizontal
scalability, high performance for large data volumes, and ease of replication. They are ideal
for applications requiring fast access to unstructured or semi-structured data, real-time
analytics, and distributed environments. NoSQL also supports agile development by allowing
quick changes to data structures.
9) MongoDB Terms: Database, Collection, Document, Datatypes
In MongoDB, a Database is a container for collections. A Collection is a group of documents,
similar to a table in RDBMS. A Document is a JSON-like data structure that contains fields
and values. MongoDB supports data types like strings, numbers, arrays, objects, dates, and
binary data, allowing flexibility in data modeling.
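10) MongoDB CRUD operations with syntax
• Create Database: use myDatabase (the database is created when data is first written to it)
• Drop Database: db.dropDatabase()
• Create Collection: db.createCollection("users")
• Insert Document: db.users.insertOne({name: "John", age: 30})
• Find Document: db.users.find({name: "John"})
• Update Document: db.users.updateOne({name: "John"}, {$set: {age: 31}})
• Delete Document: db.users.deleteOne({name: "John"}) (deleteMany() removes every match)
The older insert(), update(), and remove() helpers are deprecated in current MongoDB shells.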
11) What is RDD? Explain RDD operations in detail.
RDD (Resilient Distributed Dataset) is Spark's fundamental data structure, representing an
immutable, distributed collection of objects partitioned across nodes. RDDs are fault-tolerant
and support parallel processing. Operations on RDDs are categorized into
transformations (e.g., map, filter, flatMap) and actions (e.g., collect, count, reduce).
Transformations are lazy: they only record a lineage graph of how each RDD derives from its
parents, and Spark uses that lineage to recompute lost partitions after a failure.
Actions trigger the computation.
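The idea of lineage can be sketched in a few lines of plain Python. MiniRDD is a hypothetical toy class, not part of Spark: each dataset remembers only its parent and the function applied, so any partition can be rebuilt on demand from the original data.

```python
class MiniRDD:
    """Toy lineage model: remembers its parent and the function applied."""

    def __init__(self, partitions=None, parent=None, fn=None):
        self._partitions = partitions  # only the base dataset holds real data
        self.parent = parent
        self.fn = fn

    def map(self, fn):
        # Lazy transformation: records lineage, computes nothing.
        return MiniRDD(parent=self, fn=fn)

    def compute(self, index):
        # Rebuild one partition by replaying the lineage from the base data.
        if self.parent is None:
            return list(self._partitions[index])
        return [self.fn(x) for x in self.parent.compute(index)]

    def lineage_depth(self):
        return 0 if self.parent is None else 1 + self.parent.lineage_depth()


base = MiniRDD(partitions=[[1, 2], [3, 4]])
derived = base.map(lambda x: x + 1).map(lambda x: x * 10)

# Pretend partition 1 was lost on a failed node: it is recomputed from lineage.
rebuilt = derived.compute(1)
```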
12) Why RDD is better than MapReduce data storage?

RDDs outperform MapReduce by enabling in-memory computation, reducing the overhead
of writing intermediate data to disk. RDDs support iterative algorithms efficiently and
provide fault tolerance through lineage, making them ideal for machine learning and
near-real-time data processing. In contrast, MapReduce writes each job's output to HDFS,
so a chain of jobs rereads its input from disk at every step, resulting in higher latency.
13) Justify: “SPARK is faster than MapReduce”
Spark is significantly faster than MapReduce due to its in-memory processing and DAG-based
execution engine. While MapReduce stores intermediate results on disk after each job, Spark
keeps data in memory using RDDs and pipelines consecutive transformations within a stage,
drastically reducing I/O operations. This architectural difference is the basis of the
commonly cited figure of up to 100 times faster for in-memory workloads, particularly
iterative algorithms; the actual speedup depends on the job and on how much data fits in memory.
14) Word Count program in Scala using Spark
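// Assumes a spark-shell session, where sc (the SparkContext) is predefined.
val input = sc.textFile("input.txt")
// Split on any run of whitespace and drop empty strings.
val words = input.flatMap(line => line.split("\\s+")).filter(_.nonEmpty)
val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.collect().foreach(println)
This simple program splits input text into words, maps each word to a count of 1, and
aggregates them using reduceByKey. Nothing runs until collect(), the only action, is called.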
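To see what each stage produces without a cluster, the same three steps can be mirrored in plain Python. This is an illustrative single-machine sketch, not Spark code; word_count and the sample lines are made up for the example.

```python
from collections import defaultdict


def word_count(lines):
    # flatMap: split every line into words and flatten into one list.
    words = [word for line in lines for word in line.split()]
    # map: pair each word with the count 1.
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the counts that share the same word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


sample = ["to be or not", "to be"]
counts = word_count(sample)
```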
15) "Moving Computation is Cheaper than Moving Data" – Justify
In distributed systems, moving large volumes of data across the network is costly in terms of
time and resources. Thus, it is more efficient to move computation to where the data
resides, reducing latency and bandwidth usage. Hadoop and Spark follow this principle by
running tasks on nodes that store the required data blocks. This data locality enhances
system performance and scalability.
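A locality-aware scheduling decision can be sketched as follows. This is a simplified illustration, not the actual Hadoop or Spark scheduler: pick_node, the block IDs, and the node names are hypothetical, and the sketch assumes at least one node is free.

```python
def pick_node(block_id, block_locations, free_nodes):
    """Prefer a free node that already stores the block; else use any free node."""
    local = [n for n in block_locations.get(block_id, []) if n in free_nodes]
    if local:
        return local[0], "local"   # computation moves to the data
    return sorted(free_nodes)[0], "remote"  # data must cross the network


locations = {"b1": ["n1", "n2"], "b2": ["n1"]}
free = {"n2", "n3"}

choice_b1 = pick_node("b1", locations, free)
choice_b2 = pick_node("b2", locations, free)
```

Block b1 is processed where it already lives, while b2 has to be fetched because its only holder, n1, is busy.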

1) Mention usefulness of Pig. What are key features of Pig?

Apache Pig is a high-level platform for processing large data sets. It uses a scripting language
called Pig Latin, which simplifies the development of MapReduce programs. Pig is useful
because it abstracts the complexities of writing low-level MapReduce code and allows for
faster development and prototyping. Key features include ease of use, support for both
structured and semi-structured data, fault tolerance, and extensibility. Pig is ideal for ETL
(Extract, Transform, Load) operations, data preparation, and research analysis.


2) Explain components of Hive architecture. Also describe working of Hive with suitable
diagram.
Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data
summarization, query, and analysis using HiveQL, a SQL-like language. The main components
of Hive architecture include:
• Metastore: Stores metadata about tables and partitions.
• Driver: Manages the lifecycle of HiveQL statements.
• Compiler: Translates HiveQL into execution plans, using metadata from the Metastore.
• Execution Engine: Executes the plans using MapReduce, Tez, or Spark.
• Hive Server: Accepts client connections and handles queries.
[Diagram: Hive Architecture]
Clients --> Hive Server --> Driver --> Compiler --> Execution Engine --> Hadoop/YARN
                                          |
                                          +--> Metastore
Hive processes a user query by compiling it into a DAG of stages (MapReduce jobs on the
classic engine), which is executed in the Hadoop ecosystem.

3) Differentiate:
(i) Pig vs. MapReduce

Feature | Pig | MapReduce
Language | Pig Latin | Java (programming required)
Ease of Use | High-level, user-friendly scripting | Low-level programming complexity
Development Speed | Faster | Slower
Data Types | Supports complex and nested types | Mostly primitive types
Execution Engine | Converts scripts to MapReduce | Native MapReduce

(ii) HDFS vs. HBase

Feature | HDFS | HBase
Data Model | File system for batch processing | NoSQL database (column-oriented)
Access Pattern | Sequential | Random read/write
Latency | High (not ideal for real-time) | Low (supports real-time)
Schema | Schema-less | Flexible schema
Integration | Works with Hive, Pig, MapReduce | Works with Spark, MapReduce

4) What is role of Zookeeper? How it helps in monitoring a cluster?

Apache ZooKeeper is a centralized service for maintaining configuration information,
naming, and providing distributed synchronization. In a Hadoop ecosystem, it helps in
managing and coordinating distributed components. ZooKeeper ensures that various nodes
in a cluster can work together in a synchronized manner. It maintains a hierarchical tree of
nodes (znodes) which store data and provide a way to coordinate distributed processes. For
monitoring, each server typically registers an ephemeral znode that disappears when its
session ends, and watches notify the other processes of the change. This makes ZooKeeper
essential in managing failover for Hadoop components like HBase, HDFS, and YARN.
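How znodes support cluster monitoring can be shown with an in-memory toy. ToyZooKeeper is a hypothetical class written for this note, not the real ZooKeeper client API: a znode created with a session is treated as ephemeral and vanishes when that session closes, and the sequence counter is global here for simplicity.

```python
class ToyZooKeeper:
    """In-memory toy of a znode tree; not the real ZooKeeper API."""

    def __init__(self):
        self.znodes = {}    # path -> (data, owning session or None)
        self._counter = 0   # simplified global counter for sequential znodes

    def create(self, path, data=b"", session=None, sequential=False):
        if sequential:
            path = f"{path}{self._counter:010d}"
            self._counter += 1
        self.znodes[path] = (data, session)  # session set => ephemeral
        return path

    def close_session(self, session):
        # Ephemeral znodes owned by the session are removed automatically.
        for path in [p for p, (_, s) in self.znodes.items() if s == session]:
            del self.znodes[path]

    def children(self, parent):
        return sorted(p for p in self.znodes if p.startswith(parent + "/"))


zk = ToyZooKeeper()
zk.create("/workers")                    # persistent parent znode
zk.create("/workers/w1", session="s1")   # each worker registers itself
zk.create("/workers/w2", session="s2")
alive_before = zk.children("/workers")

zk.close_session("s1")                   # worker 1 crashes or disconnects
alive_after = zk.children("/workers")

task = zk.create("/tasks/task-", sequential=True)
```

A monitor that lists the children of /workers sees the failed worker disappear without any explicit deregistration.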

5) Data model and implementation of HBase

HBase is a distributed, column-oriented database built on top of HDFS. Its data model
resembles Google's BigTable and includes tables, rows, column families, and columns. Each
row has a unique row key, and columns are grouped into families. HBase stores data in
HFiles and uses a Write-Ahead Log (WAL) for durability. RegionServers manage regions
(subsets of tables), and the HBase Master oversees load balancing and failover. HBase is
ideal for sparse datasets and supports real-time read/write access.

6) HiveQL data manipulation queries in detail

HiveQL supports data manipulation operations like INSERT, UPDATE, DELETE, and SELECT. For
example:
• INSERT INTO table_name VALUES (...) adds new records.
• UPDATE table_name SET column=value WHERE condition modifies existing records.
• DELETE FROM table_name WHERE condition removes records.
• SELECT column FROM table WHERE condition queries data.
UPDATE and DELETE work only on transactional (ACID) tables, available since Hive 0.14.
Hive transforms these SQL-like statements into jobs for its execution engine (MapReduce by
default in older versions), enabling large-scale data processing over Hadoop.

7) What is HBase? Write a query to create a table in HBase.

HBase is a distributed NoSQL database that stores structured and semi-structured data in a
fault-tolerant way. It supports random access to large datasets and is suitable for sparse
tables. To create a table in HBase:
create 'students', 'personal', 'academic'


This creates a table named 'students' with two column families: 'personal' and 'academic'.
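The layout of such a table can be modelled in plain Python to make the row key, column family, and column relationship concrete. ToyHBaseTable is a hypothetical sketch, not the HBase client API, and the row keys and values are invented sample data; it ignores versions, timestamps, and storage.

```python
class ToyHBaseTable:
    """Toy model: row key -> column family -> qualifier -> value."""

    def __init__(self, name, *families):
        self.name = name
        self.families = set(families)  # families are fixed at creation time
        self.rows = {}                 # sparse: only written cells exist

    def put(self, row_key, column, value):
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[qualifier] = value

    def get(self, row_key, column):
        family, qualifier = column.split(":", 1)
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)


students = ToyHBaseTable("students", "personal", "academic")
students.put("row1", "personal:name", "John")
students.put("row1", "academic:grade", "A")
students.put("row2", "personal:name", "Asha")  # row2 has no academic data
```

Columns inside a family can be added freely per row, while writing to a family that was not declared at creation fails.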

8) Draw architecture of Apache Pig and explain in short.

Apache Pig architecture includes:
• Pig Latin Scripts: Input by the user.
• Parser: Checks syntax and generates logical plans.
• Optimizer: Optimizes execution plans.
• Compiler: Converts plans into MapReduce jobs.
• Execution Engine: Executes jobs on Hadoop.
[Diagram: Pig Architecture]
Pig Latin Script --> Parser --> Logical Plan --> Optimizer --> Physical Plan --> MapReduce --> HDFS

9) Benefits of Zookeeper, znodes and their types

ZooKeeper benefits include coordination, configuration management, and fault tolerance. A
znode is a data node in ZooKeeper's hierarchy. There are three types:
• Persistent: Remain after the client disconnects, until explicitly deleted.
• Ephemeral: Deleted once the client session ends.
• Sequential: Get a unique, monotonically increasing number appended to their name; this
flag combines with either persistent or ephemeral.

10) RDBMS vs. HBase; HiveQL Data Definition Language (DDL)

Feature | RDBMS | HBase
Data Model | Relational (tables, rows) | Column-oriented NoSQL
Schema | Fixed schema | Schema-less, flexible
Transactions | Full ACID support | Limited ACID (atomic within a single row)
Scalability | Vertical | Horizontal

HiveQL DDL defines the structure of databases and tables. Examples:

• CREATE TABLE students (id INT, name STRING) creates a table.
• ALTER TABLE students ADD COLUMNS (age INT) adds a column.
• DROP TABLE students removes the table structure (and, for managed tables, the data).


11) What is Big Data streaming? Stream data architecture

Big Data Streaming involves processing continuous flows of data in real-time. Examples
include logs, sensor data, or social media feeds. A typical stream data architecture consists
of:
• Data Sources: Devices or applications generating data.
• Stream Processing Engine: e.g., Apache Storm, Spark Streaming.
• Data Storage: HDFS, HBase.
• Data Sink: Dashboards or databases for analysis.

12) Write about Apache Kafka

Apache Kafka is a distributed messaging system used for building real-time data pipelines
and streaming applications. It uses a publish-subscribe model and is designed for high
throughput, fault tolerance, and durability. Kafka stores messages in topics and distributes
them across brokers. Producers send messages to topics, and consumers read them
asynchronously. Kafka is widely used for real-time analytics, log aggregation, and event
sourcing.
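The publish-subscribe behaviour can be sketched with an in-memory toy. ToyTopic is a hypothetical class, not the Kafka client API: it models a single append-only log with one read offset per consumer and leaves out partitions, brokers, and replication. The event and consumer names are sample data.

```python
class ToyTopic:
    """Toy single-partition topic: an append-only log plus per-consumer offsets."""

    def __init__(self):
        self.log = []       # messages are appended and never modified
        self.offsets = {}   # consumer name -> next position to read

    def produce(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset assigned to the message

    def consume(self, consumer, max_messages=10):
        start = self.offsets.get(consumer, 0)
        batch = self.log[start:start + max_messages]
        self.offsets[consumer] = start + len(batch)
        return batch


topic = ToyTopic()
for event in ["login", "click", "logout"]:
    topic.produce(event)

first = topic.consume("dashboard", max_messages=2)
second = topic.consume("dashboard")
audit = topic.consume("audit")  # a new consumer starts from the beginning
```

Because each consumer tracks its own offset, the audit consumer still receives every message after the dashboard consumer has read them.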
