BDA Ass 3
BDA Ass 3
Data Model Tables with rows and columns Key-Value, Document, Column, or Graph
Query
SQL Varies by database (MongoQL, CQL, etc.)
Language
1
Harsh Shah TY-IT-1-A 220410116025
relationships data
NoSQL (Not Only SQL) databases are non-relational databases designed for large-scale data
storage and real-time web applications. Unlike SQL (relational) databases that store
structured data in tables with predefined schemas, NoSQL supports flexible schema designs
and stores data in forms like key-value pairs, documents, graphs, or wide-columns. NoSQL
provides high scalability, availability, and performance for unstructured or semi-structured
data. SQL databases are better suited for complex transactions and consistency, whereas
NoSQL is optimized for scalability and handling diverse data types.
4) Categorize types of NoSQL databases with examples.
NoSQL databases are categorized into four main types:
1. Key-Value Stores: Store data as key-value pairs (e.g., Redis, Riak).
2. Document Stores: Store semi-structured data in JSON/XML documents (e.g.,
MongoDB, CouchDB).
3. Column-Family Stores: Store data in columns instead of rows (e.g., Apache
Cassandra, HBase).
4. Graph Databases: Store relationships between entities using graph structures (e.g.,
Neo4j, OrientDB). Each type is optimized for specific use cases and offers flexible
scalability.
5) Explain important components of Spark with necessary diagram.
Apache Spark's architecture includes the following components:
Driver Program: Coordinates all activities and initiates the SparkContext.
Cluster Manager: Allocates resources (e.g., YARN, Mesos).
Executors: Run computations and store data for applications.
Tasks: Units of work executed on each partition.
[Diagram: Spark Architecture]
Driver Program --> Cluster Manager --> Executors --> Tasks
This architecture enables distributed data processing with in-memory performance and fault
tolerance.
6) What is MongoDB? Explain important features.
MongoDB is a NoSQL, document-oriented database that stores data in flexible, JSON-like
documents. It supports dynamic schemas, allowing different documents in a collection to
have different structures. Key features include high scalability, indexing, replication, and
sharding for horizontal scaling. MongoDB allows developers to store and retrieve complex
data types efficiently and supports aggregation, ad hoc queries, and rich indexing.
2
Harsh Shah TY-IT-1-A 220410116025
3
Harsh Shah TY-IT-1-A 220410116025
4
Harsh Shah TY-IT-1-A 220410116025
2) Explain components of Hive architecture. Also describe working of Hive with suitable
diagram.
Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data
summarization, query, and analysis using HiveQL, a SQL-like language. The main components
of Hive architecture include:
Metastore: Stores metadata about tables and partitions.
Driver: Manages the lifecycle of HiveQL statements.
Compiler: Translates HiveQL into execution plans.
Execution Engine: Executes the plans using Hadoop or Spark.
Hive Server: Accepts client connections and handles queries.
[Diagram: Hive Architecture]
Clients --> Hive Server --> Compiler --> Execution Engine --> Hadoop/YARN
|
--> Metastore
Hive processes user queries by converting them into DAGs of MapReduce jobs, which are
executed in the Hadoop ecosystem.
3) Differentiate:
(i) Pig vs. MapReduce
Data Types Supports complex and nested types Mostly primitive types
Data Model File system for batch processing NoSQL database (column-oriented)
5
Harsh Shah TY-IT-1-A 220410116025
Integration Works with Hive, Pig, MapReduce Works with Spark, MapReduce
6
Harsh Shah TY-IT-1-A 220410116025
This creates a table named 'students' with two column families: 'personal' and 'academic'.
7
Harsh Shah TY-IT-1-A 220410116025