Bda Assignment-1
ASSIGNMENT - 1
Instructions:
● Write your solutions on file pages only (use both sides of each page).
● Write each programming solution (code) with its output (you can use Databricks Community Edition).
Questions:
1. Explain the key characteristics that make Apache Cassandra a NoSQL database
management system. Compare and contrast these characteristics with those of
traditional relational databases. Provide examples to illustrate your points.
2. Imagine you are designing a database system for a social media platform that needs to
handle a massive amount of user data, including profiles, posts, and messages. Why
might you choose Apache Cassandra as the database solution for this project?
Describe how you would model the data in Cassandra to efficiently handle the
requirements of such a system. Highlight the key considerations and advantages of
using Cassandra in this scenario.
3. You are tasked with building a data processing system for a real-time e-commerce
platform that needs to analyze customer behavior and generate personalized
recommendations. Explain the advantages and disadvantages of using both data
streaming and batch processing approaches for this scenario. Additionally, propose a
hybrid solution that combines elements of both streaming and batch processing to
optimize the recommendation engine's performance. Justify your choice of the hybrid
approach and outline the key components and considerations involved in its
implementation.
<product_id>,<review_text>,<rating>
Your goal is to use Apache Spark RDDs in Python to perform the following tasks:
Calculate the average rating for each product. Identify the product with the highest
average rating.
Write a Python script using Apache Spark RDDs to accomplish these tasks. Your
script should read the dataset, perform the calculations, and print the results in the
following format:
Note: To help you get started, you can use the following Spark RDD operations:
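One possible starting point is sketched below. It is not a reference solution, and it rests on several assumptions: the input is a plain text file with one `<product_id>,<review_text>,<rating>` record per line and no header row, the rating is numeric, and the review text may contain commas. The names `parse_line`, `add_pairs`, `average_ratings`, `main`, and the file name `reviews.csv` are hypothetical, and the print layout is only a placeholder. The sketch uses the RDD operations `textFile`, `map`, `reduceByKey`, `mapValues`, `collect`, and `max`.

```python
def parse_line(line):
    """Turn 'product_id,review_text,rating' into (product_id, (rating, 1))."""
    # The review text may contain commas (assumption), so take the product id
    # from the left end of the line and the rating from the right end.
    product_id, rest = line.split(",", 1)
    _review_text, rating = rest.rsplit(",", 1)
    return product_id.strip(), (float(rating), 1)


def add_pairs(a, b):
    """Combine two (rating_sum, count) pairs for the same product."""
    return a[0] + b[0], a[1] + b[1]


def average_ratings(lines_rdd):
    """Return an RDD of (product_id, average_rating) pairs."""
    return (
        lines_rdd.map(parse_line)          # (product_id, (rating, 1))
        .reduceByKey(add_pairs)            # (product_id, (sum, count))
        .mapValues(lambda s: s[0] / s[1])  # (product_id, average)
    )


def main(path):
    # Imported here so the helpers above can be tried without a Spark install.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    averages = average_ratings(sc.textFile(path))

    # Placeholder output layout (assumption).
    for product_id, avg in averages.collect():
        print(product_id, avg)

    # Product with the highest average rating.
    best_id, best_avg = averages.max(key=lambda kv: kv[1])
    print("Highest average rating:", best_id, best_avg)
```

To try it, upload a small sample file and call `main("reviews.csv")`, replacing the hypothetical path with the real one. Summing `(rating, 1)` pairs with `reduceByKey` avoids collecting every rating of a product in one place, as `groupByKey` would.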
9. Explain narrow and wide dependencies in Apache Spark with sample data.