Module 5

Uploaded by sonia

1. Compare relational databases with NoSQL databases.

 Relational databases handle data arriving at low velocity; NoSQL handles data arriving at high velocity.
 Relational databases give only read scalability; NoSQL gives both read and write scalability.
 Relational databases manage structured data; NoSQL manages all types of data.
 In relational databases, data arrives from one or a few locations; in NoSQL, data arrives from many locations.
 Relational databases support complex transactions; NoSQL supports simple transactions.
 Relational databases have a single point of failure; NoSQL has no single point of failure.
 Relational databases handle data in low volume; NoSQL handles data in high volume.
 In relational databases, transactions are written in one location; in NoSQL, transactions are written in many locations.
 Relational databases support ACID properties; NoSQL typically does not guarantee ACID compliance.
 In a relational database it is difficult to make changes once the schema is defined; NoSQL enables easy and frequent changes to the database.
 Relational databases require a schema to store data; NoSQL does not require schema design.
 Relational databases are scaled vertically; NoSQL databases are scaled horizontally.
2. What is the purpose of sharding? What is the difference between replication and
sharding?

Ans. Sharding is a technique for providing horizontal scalability by allowing different sites
to hold different subsets of the data. This scalability helps reduce the workload on each server.
Replication is the process of copying the same data across different sites, while sharding is
the process of distributing different datasets across different sites. In addition, sharding improves
both read and write performance, while replication improves read performance but not write
performance.
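
The contrast above can be sketched in a few lines of plain Python (hypothetical code, not any particular database's API): sharding places each record on exactly one server, while replication copies every record to every server.

```python
# Minimal sketch: routing records to shards by hashing a shard key,
# versus copying every record to all replicas.
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard deterministically by hashing the shard key."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Sharding: each record lands on exactly one of the three shards.
shards = [[] for _ in range(3)]
for user in ["alice", "bob", "carol", "dave"]:
    shards[shard_for(user, 3)].append(user)

# Replication: every record is copied to all three replicas.
replicas = [["alice", "bob", "carol", "dave"] for _ in range(3)]

print(sum(len(s) for s in shards))    # 4 records total, split across shards
print(sum(len(r) for r in replicas))  # 12 copies: 4 records x 3 replicas
```

Because each shard holds only a fraction of the data, both reads and writes spread across servers; the replicas, by contrast, must all apply every write, which is why replication alone does not improve write performance.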

3. Explain the CAP theorem.


Ans. In distributed databases, the three important aspects of the CAP theorem
are Consistency (C), Availability (A), and Partition tolerance (P). Consistency
means that every read receives the most recent write; Availability means that
every request receives a response; and Partition tolerance means that the system
keeps operating even when the network between nodes fails. The CAP theorem
states that when a partition occurs, a distributed system can guarantee only two
of the three properties, and since network partitions cannot be ruled out in
practice, the designer must effectively choose between consistency and availability.
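
The consistency/availability trade-off during a partition can be illustrated with a toy sketch (hypothetical code, not a real database API): a "CP" node refuses possibly stale reads, an "AP" node answers anyway.

```python
# Toy illustration of a node's read behaviour when it is cut off
# from the other replicas by a network partition.
def handle_read(partitioned: bool, mode: str, local_value: str) -> str:
    if not partitioned:
        return local_value        # no partition: consistency and availability both hold
    if mode == "CP":
        # Choose consistency: give up availability rather than serve stale data.
        raise TimeoutError("refusing possibly stale read during partition")
    # Choose availability ("AP"): answer from the local copy, which may be stale.
    return local_value

print(handle_read(False, "CP", "v1"))  # v1
print(handle_read(True, "AP", "v0"))   # v0 (possibly stale)
```

A CP system such as a quorum-based store errors out or blocks during the partition; an AP system keeps answering and reconciles the copies later.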

4. Explain the ways in which data can be distributed.


Ans. Data distribution can be performed in the following two ways:
 Through sharding―Sharding is one of the major techniques of data
distribution. It distributes different subsets of the data across multiple servers,
so each server acts as the single source for its subset of the data.
 Through replication―Replication is one of the major techniques for fault
tolerance. The idea is to copy data across multiple servers so that each piece of data can
be found in multiple places. Replication occurs in two forms:
 Master-slave replication, which makes one node the authoritative copy
that handles writes, while the slaves, which are synchronized with the master, handle
reads.
 Peer-to-peer replication, which allows writes to any node without a
designated master. Here, the nodes coordinate with each other to
synchronize their copies of the data.
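
The master-slave form described above can be sketched as follows (hypothetical classes for illustration, not a real driver API): writes go only to the master, which pushes them to the slaves, and reads are served from any slave.

```python
# Minimal master-slave replication sketch: the master is the
# authoritative copy for writes; synchronized slaves handle reads.
class Slave:
    def __init__(self):
        self.data = {}
    def read(self, key):
        return self.data.get(key)

class Master:
    def __init__(self, slaves):
        self.data, self.slaves = {}, slaves
    def write(self, key, value):
        self.data[key] = value
        for slave in self.slaves:       # synchronize every replica after a write
            slave.data[key] = value

slaves = [Slave(), Slave()]
master = Master(slaves)
master.write("profile:42", "alice")
print(slaves[0].read("profile:42"))  # alice: the write is visible on every copy
```

In peer-to-peer replication there would be no distinguished `Master`; any node would accept the write and then gossip it to the others.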

5. List the three important aspects of CAP theorem


Consistency (C), Availability (A), and Partition tolerance (P)

6. Demonstrate how the concept of materialized views can be effectively utilized
within NoSQL databases to optimize query performance and support real-time
analytics in a specific use case or application scenario.

Materialized views differ slightly from normal views: they are disk based and are
updated periodically according to the requirements of the query. This is an advantage
because querying a materialized view is essentially querying a table, which
can be indexed. Creating materialized views in the form of aggregate tables or copies
of frequently executed queries can speed up response time. A disadvantage of a
materialized view is that the data obtained from it is only as current as its last
refresh. Materialized views are mostly used in BI applications or Big Data,
where query response time is a basic need.
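
The trade-off above can be shown with a plain-Python stand-in for a materialized view (the data and names are illustrative; a real database would persist the aggregate to disk): queries hit a precomputed table, which is fast but only as fresh as its last refresh.

```python
# A materialized view as a precomputed aggregate table over raw orders.
orders = [("alice", 30), ("bob", 20), ("alice", 50)]
view = {}  # customer -> total spend, the "materialized" aggregate

def refresh_view():
    """Recompute the aggregate, like a REFRESH of a materialized view."""
    view.clear()
    for customer, amount in orders:
        view[customer] = view.get(customer, 0) + amount

refresh_view()
print(view["alice"])       # 80, answered straight from the precomputed table

orders.append(("alice", 10))
print(view["alice"])       # still 80: the view is stale until refreshed
refresh_view()
print(view["alice"])       # 90 after the refresh
```

The query never scans the raw `orders` list at read time, which is exactly why aggregate tables speed up BI-style workloads; the cost is the staleness between refreshes.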

7. List the various functions of Sqoop.

a. Data Import
b. Data Export
c. Parallel Data Transfer
d. Incremental Data Transfer
e. Customized Data Mapping
f. Support for Various Data Sources
g. Integration with Hadoop Ecosystem
h. Compression and Serialization
i. Security
j. Extensibility
k. Job Scheduling
8. Describe some applications of the clustering technique in Mahout.

a. Recommendation Systems
b. Document Classification
c. Customer Segmentation
d. Anomaly Detection
e. Image and Video Analysis
f. Network Analysis
g. Natural Language Processing (NLP)
h. Image Compression
i. Spatial Data Analysis
j. Fraud Detection
k. Data Preprocessing
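
Several of these applications (customer segmentation, anomaly detection, image compression) rest on the same k-means idea. Mahout runs such algorithms at scale on Hadoop; the tiny one-dimensional version below is only a pure-Python illustration of the concept, not Mahout's API.

```python
# Toy 1-D k-means: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
def kmeans_1d(points, centers, rounds=10):
    for _ in range(rounds):
        # Assignment step: group each point with its nearest center.
        groups = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        centers = [sum(g) / len(g) if g else c for c, g in groups.items()]
    return sorted(centers)

# Two obvious clusters: low spenders around 2, high spenders around 20.
print(kmeans_1d([1, 2, 3, 19, 20, 21], centers=[0.0, 10.0]))  # [2.0, 20.0]
```

For customer segmentation, each point would be a customer feature (here, spend), and the converged centers describe the segments.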

9. Differentiate Apache Flume and Apache Sqoop with respect to failure handling.
Apache Flume is well-suited for real-time, event-based data streaming with a focus on
robust failure handling and guaranteed event delivery. On the other hand, Apache
Sqoop is designed for batch-oriented data transfer between Hadoop and structured
data stores, with a focus on data integrity but not real-time processing or fine-grained
failure handling. The choice between the two tools depends on the specific
requirements and use cases of the data transfer operation.

10. Name the data model that can be used for social network mining.
The data model commonly used for social network mining is the graph data model,
in which people are represented as nodes and their relationships as edges.
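
A minimal sketch of this graph model (plain Python with made-up data): friendships form an adjacency structure, and typical mining queries such as mutual friends or friend suggestions are walks over the edges.

```python
# A tiny social graph as an adjacency map: node -> set of neighbours.
friends = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "carol", "dave"},
    "carol": {"alice", "bob"},
    "dave":  {"bob"},
}

def mutual_friends(a, b):
    """People connected to both a and b: a basic social-mining query."""
    return friends[a] & friends[b]

def suggest_friends(person):
    """Friends-of-friends who are not already friends: simple link prediction."""
    candidates = set().union(*(friends[f] for f in friends[person]))
    return candidates - friends[person] - {person}

print(sorted(mutual_friends("alice", "bob")))  # ['carol']
print(sorted(suggest_friends("alice")))        # ['dave']
```

Graph databases and graph-processing frameworks provide the same operations as first-class queries (neighbourhoods, traversals, shortest paths) rather than hand-written set algebra.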

11. Correlate reliability and Failure handling in Apache Flume.


In Apache Flume, reliability and failure handling are closely intertwined. Flume's design
and features prioritize data reliability by using mechanisms such as transactional channels
(including durable file channels), retries, backoff strategies, and reliable sinks to manage
and mitigate failures, ensuring that events are transferred to their destination with minimal
data loss. The correlation between these aspects is essential for maintaining data integrity
in data streaming and collection processes.

12. Differentiate replication and sharding.


Sharding is a technique for providing horizontal scalability by allowing different
sites to hold different subsets of the data. This scalability helps reduce the workload
on each server. Replication is the process of copying the same data across different sites,
while sharding is the process of distributing different datasets across different sites. In
addition, sharding improves both read and write performance, while replication
improves read performance but not write performance.

13. Illustrate the different types of NoSQL databases in detail.


Refer PPT

14. Explain the architecture of Flume in detail.

Refer PPT

15. Explain the architecture of Sqoop in detail.


Refer PPT

16. Illustrate the 3Cs of Mahout in the machine learning framework for processing
data. [or] Illustrate collaborative filtering, clustering, and classification in
Mahout.
Refer PPT
