Big Data Project-2 Report

Uploaded by

viplaviwade21

Big Data Analytics Project-2

Name: Viplavi Wade
Student Id: 13922741
Subject: Big Data Analytics
Course: MSc Advanced Computing
Date: 15 July 2024

GitHub Repository Link: https://github.com/ViplaviWade/Big-Data-Project-2

Part-1: Entity Relationship Model

Entities and Attributes:

1. User table
user_id int PRIMARY KEY
username varchar(30)
email varchar(30)
password varchar(30)
city varchar(30)
country varchar(30)
age int
subscription_type int FOREIGN KEY REFERENCES subscription.subscription_id

2. Subscription Table
subscription_id int PRIMARY KEY
subscription_type varchar(30)
price int
duration int

3. Movie
movie_id int PRIMARY KEY
title varchar(30)
genre varchar(30)
release_date date

4. Actor
actor_id int PRIMARY KEY
actor_name varchar(30)
city varchar(30)
dob date

5. Review
review_id int PRIMARY KEY
user_id int FOREIGN KEY REFERENCES user.user_id
movie_id int FOREIGN KEY REFERENCES movie.movie_id
score int
review_comment varchar(50)

6. FavouriteMovie
favourite_id int PRIMARY KEY
user_id int FOREIGN KEY REFERENCES user.user_id
movie_id int FOREIGN KEY REFERENCES movie.movie_id
score int FOREIGN KEY REFERENCES review.score

7. MovieActor
movie_actor_id int PRIMARY KEY
movie_id int FOREIGN KEY REFERENCES movie.movie_id
actor_id int FOREIGN KEY REFERENCES actor.actor_id
role varchar(30)

8. WatchHistory
watch_id int PRIMARY KEY
user_id int FOREIGN KEY REFERENCES user.user_id
movie_id int FOREIGN KEY REFERENCES movie.movie_id
watch_date date

Part-2:

Task-1: Entity Relationship Diagram

Database name: mininetdb


Task-2: Scripts for creating database:

1. user table:

create table user(
    user_id int primary key,
    username varchar(30),
    email varchar(30),
    password varchar(14),
    city varchar(30),
    country varchar(30),
    age int,
    subscription_type int,
    foreign key (subscription_type) references subscription (subscription_id)
);

2. actor table

create table actor(
    actor_id int primary key,
    actor_name varchar(30),
    city varchar(30),
    dob date
);

3. movie table

create table movie(
    movie_id int primary key,
    title varchar(30),
    genre varchar(30),
    release_date date
);

4. subscription table

create table subscription(
    subscription_id int primary key,
    subscription_type varchar(30),
    price int,
    duration int
);

5. review table

create table review(
    review_id int primary key,
    score int,
    review_comment varchar(50),
    user_id int,
    movie_id int,
    foreign key (user_id) references user (user_id),
    foreign key (movie_id) references movie (movie_id)
);

6. favouriteMovie table

create table favouriteMovie(
    favourite_id int primary key,
    user_id int,
    movie_id int,
    score int,
    foreign key (user_id) references user (user_id),
    foreign key (movie_id) references movie (movie_id),
    foreign key (score) references review (review_id)
);

7. movieActor table

create table movieActor(
    movie_actor_id int primary key,
    movie_actor_role varchar(30),
    movie_id int,
    actor_id int,
    foreign key (movie_id) references movie (movie_id),
    foreign key (actor_id) references actor (actor_id)
);

8. watchHistory table

create table watchHistory(
    watch_id int primary key,
    user_id int,
    movie_id int,
    watchDate date,
    foreign key (user_id) references user (user_id),
    foreign key (movie_id) references movie (movie_id)
);
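
The foreign keys declared in these scripts mean that a referenced table has to hold the parent row before a child row can point at it, and in MySQL the referenced table must already exist when the foreign key is declared (so subscription would be created before user). A minimal sketch of that behaviour follows. It assumes an in-memory SQLite database in place of MySQL so that it runs without a server, trims the user table to three columns, and uses hypothetical rows.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption, so no server is needed).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Parent table first, then the child table that references it.
conn.execute("""
    CREATE TABLE subscription (
        subscription_id INTEGER PRIMARY KEY,
        subscription_type VARCHAR(30),
        price INT,
        duration INT
    )
""")
conn.execute("""
    CREATE TABLE user (
        user_id INTEGER PRIMARY KEY,
        username VARCHAR(30),
        subscription_type INT,
        FOREIGN KEY (subscription_type) REFERENCES subscription (subscription_id)
    )
""")

conn.execute("INSERT INTO subscription VALUES (1, 'HD', 10, 30)")  # hypothetical row
conn.execute("INSERT INTO user VALUES (1, 'demo_user', 1)")        # valid reference


def insert_is_rejected(connection):
    """Return True if a user row pointing at a missing subscription is refused."""
    try:
        connection.execute("INSERT INTO user VALUES (2, 'orphan_user', 99)")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = insert_is_rejected(conn)
print("Orphan row rejected:", rejected)
```

The same ordering concern applies to review, favouriteMovie, movieActor and watchHistory, which all reference user, movie or actor.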
Task-3: SQL Scripts for inserting data into database tables

1. User table

2. Subscription table

3. Movie table

4. Actor table
5. Review table

6. FavouriteMovie table

7. MovieActor table
8. WatchHistory table
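
As an illustration of how rows for one of these tables could be loaded from Python, the sketch below inserts two movie rows with a parameterised statement. The rows are hypothetical and are not the report's data, and SQLite is used in place of MySQL so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movie ("
    "movie_id INTEGER PRIMARY KEY, title VARCHAR(30), "
    "genre VARCHAR(30), release_date DATE)"
)

# Hypothetical sample rows; the report's actual data is not reproduced here.
sample_movies = [
    (1, "The Example", "Comedy", "2020-01-15"),
    (2, "Another Title", "Drama", "2019-06-01"),
]

# sqlite3 uses "?" placeholders; mysql.connector uses "%s" instead.
conn.executemany(
    "INSERT INTO movie (movie_id, title, genre, release_date) VALUES (?, ?, ?, ?)",
    sample_movies,
)
conn.commit()

inserted = conn.execute("SELECT COUNT(*) FROM movie").fetchone()[0]
print(f"{inserted} rows inserted")
```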

Task-4: Create queries for extracting data


4.1: Export all data about users in the HD subscriptions

SELECT *
FROM user u
JOIN subscription s ON u.subscription_type = s.subscription_id
WHERE s.subscription_type = 'HD';

4.2: Export all data about actors and their associated movies
SELECT actor.*, movie.*
FROM actor
JOIN movieactor ON actor.actor_id = movieactor.actor_id
JOIN movie ON movie.movie_id = movieactor.movie_id;

4.3: Export all data to group actors from a specific city, showing also the average age (per city)
SELECT
city,
COUNT(actor_id) AS number_of_actors,
AVG(TIMESTAMPDIFF(YEAR, dob, CURDATE())) AS average_age
FROM
actor
GROUP BY
city;
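
An age derived from a date of birth depends on whether the birthday has already passed in the current year. A minimal Python sketch of that calculation, using hypothetical dates and a helper name (age_on) that is not part of the project:

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Completed years between dob and today."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


# Hypothetical dates for illustration.
reference_day = date(2024, 7, 15)
print(age_on(date(1990, 12, 31), reference_day))  # birthday still to come in 2024
print(age_on(date(1990, 1, 1), reference_day))    # birthday already passed in 2024
```

Both people were born in 1990, yet their ages differ by one on the reference day, which is why subtracting the birth year from the current year alone can overstate an age.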

4.4: Export all data to show the favourite comedy movies for a specific user
SELECT favouritemovie.user_id, movie.*
FROM favouritemovie
JOIN movie ON movie.movie_id = favouritemovie.movie_id
WHERE favouritemovie.user_id = 1 and movie.genre = 'Comedy';

4.5: Export all data to count how many subscriptions are in the database per country
SELECT Country, COUNT(subscription_type) AS SubscriptionCount
FROM user
GROUP BY Country;

4.6: Export all data to find the movies that start with the keyword ‘The’
SELECT *
FROM movie
WHERE title LIKE 'The%';

4.7: Export data to find the number of subscriptions per movie category
SELECT
m.genre,
COUNT(DISTINCT u.user_id) AS subscription_count
FROM
movie m
JOIN
watchhistory wh ON m.movie_id = wh.movie_id
JOIN
user u ON wh.user_id = u.user_id
GROUP BY
m.genre;

4.8: Export data to find the username and the city of the youngest customer in the UHD
subscription category.
SELECT
u.username,
u.city,
u.age
FROM
user u
JOIN
subscription s ON u.subscription_type = s.subscription_id
WHERE
s.subscription_type = 'UHD'
ORDER BY
u.age ASC
LIMIT 1;
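
If two UHD customers share the lowest age, LIMIT 1 returns only one of them. The sketch below shows one possible alternative that compares against the minimum age instead. It assumes SQLite in place of MySQL, trimmed-down tables, and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscription (
        subscription_id INTEGER PRIMARY KEY,
        subscription_type VARCHAR(30)
    );
    CREATE TABLE user (
        user_id INTEGER PRIMARY KEY,
        username VARCHAR(30),
        city VARCHAR(30),
        age INT,
        subscription_type INT
    );
    -- Hypothetical rows: two UHD users share the minimum age.
    INSERT INTO subscription VALUES (1, 'HD'), (2, 'UHD');
    INSERT INTO user VALUES
        (1, 'user_a', 'London', 19, 2),
        (2, 'user_b', 'Leeds', 19, 2),
        (3, 'user_c', 'York', 25, 2),
        (4, 'user_d', 'Bath', 18, 1);
""")

# Keep every UHD user whose age equals the minimum age among UHD users.
youngest_uhd = conn.execute("""
    SELECT u.username, u.city, u.age
    FROM user u
    JOIN subscription s ON u.subscription_type = s.subscription_id
    WHERE s.subscription_type = 'UHD'
      AND u.age = (
          SELECT MIN(u2.age)
          FROM user u2
          JOIN subscription s2 ON u2.subscription_type = s2.subscription_id
          WHERE s2.subscription_type = 'UHD'
      )
    ORDER BY u.username
""").fetchall()
print(youngest_uhd)
```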

4.9: Export data to find users between 22 - 30 years old (including 22 and 30 )
SELECT
*
FROM
user
WHERE
age BETWEEN 22 AND 30;

4.10: Export data to find the average age of users with low score reviews (less than 3). Group
your data for users under 20, 21-40, and 41 and over
SELECT
CASE
WHEN u.age <= 20 THEN '20 and under'
WHEN u.age BETWEEN 21 AND 40 THEN '21-40'
ELSE '41 and over'
END AS age_group,
AVG(u.age) AS average_age
FROM
user u
JOIN
review r ON u.user_id = r.user_id
WHERE
r.score < 3
GROUP BY
age_group;
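
The age bands can be checked in isolation with a small helper. In the sketch below, placing age 20 in the youngest band is an assumption, because the task wording ("under 20, 21-40, and 41 and over") leaves that age unassigned. The function name is hypothetical.

```python
def age_group(age: int) -> str:
    """Map an age to one of the three bands used in the report."""
    # Assumption: age 20 goes into the youngest band, since the task
    # wording leaves it between "under 20" and "21-40".
    if age <= 20:
        return "20 and under"
    if age <= 40:
        return "21-40"
    return "41 and over"


# Boundary values are the ones most likely to land in the wrong band.
for sample in (19, 20, 21, 40, 41):
    print(sample, age_group(sample))
```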

Task-4: Queries for extracting data from the database

Query-1: Export all data about users in the HD subscriptions

Query-2: Export all data about the actors and their associated movies

Query-3: Export all data to group actors from a specific city, showing also the average age (per city)

Query-4: Export all data to show the favourite comedy movies for a specific user.

Query-5: Export all data to count how many subscriptions are in the database per country

Query-6: Export all data to find the movies that start with the keyword ‘The’

Query-7: Export data to find the number of subscriptions per movie category

Query-8: Export data to find the username and the city of the youngest customer in the UHD subscription category

Query-9: Export data to find users between 22 - 30 years old (including 22 and 30)

Query-10: Export data to find the average age of users with low score reviews (less than 3). Group your data for users under 20, 21-40, and 41 and over

Part-3:

Python scripts to demonstrate interactions with the database

1. Create database connection

import mysql.connector
from mysql.connector import Error

def create_connection():
    # Returns an open connection, or None if connecting failed
    connection = None
    try:
        connection = mysql.connector.connect(
            host="localhost",
            user="root",
            password="root",
            database="mininetdb"
        )
        if connection.is_connected():
            print("Connected to MySQL database")
    except Error as e:
        print(f"The error '{e}' occurred")
    return connection

2. Run the user menu for user actions

def user_menu():
    connection = create_connection()
    if not connection:
        return
    while True:
        print("\n Menu:")
        print("\n1. Export all the data about users in HD subscriptions")
        print("\n2. Export all data about actors and their associated movies")
        print("\n3. Export all data to group actors from a specific city, showing also the average age (per city)")
        print("\n4. Export all data to show the favourite comedy movies for a specific user")
        print("\n5. Export all data to count how many subscriptions are in the database per country")
        print("\n6. Export all data to find the movies that start with the keyword The")
        print("\n7. Export data to find the number of subscriptions per movie category")
        print("\n8. Export data to find the username and the city of the youngest customer in the UHD subscription category")
        print("\n9. Export data to find users between 22 - 30 years old (including 22 and 30)")
        print("\n10. Export data to find the average age of users with low score reviews (less than 3). Group your data for users under 20, 21-40, and 41 and over")
        print("\n11. Close Database Connection")
        print("\n12. Exit")

        choice = input("Enter the number of query you want to run: ")

        if choice.isdigit():
            choice = int(choice)
            if choice == 1:
                execute_query(connection, query_1)
            elif choice == 2:
                execute_query(connection, query_2)
            elif choice == 3:
                execute_query(connection, query_3)
            elif choice == 4:
                execute_query(connection, query_4)
            elif choice == 5:
                execute_query(connection, query_5)
            elif choice == 6:
                execute_query(connection, query_6)
            elif choice == 7:
                execute_query(connection, query_7)
            elif choice == 8:
                execute_query(connection, query_8)
            elif choice == 9:
                execute_query(connection, query_9)
            elif choice == 10:
                execute_query(connection, query_10)
            elif choice == 11:
                # No further queries can run once the connection is closed
                connection.close()
                print("Database connection closed")
                break
            elif choice == 12:
                break
            else:
                print("Invalid choice. Please select a number from the menu.")
        else:
            print("Invalid input. Please enter a number.")

        continue_choice = input("Do you want to continue? (Y/N): ").strip().upper()
        if continue_choice != 'Y':
            break
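
The choice-to-query mapping in the menu can also be expressed as a dictionary lookup. This is a sketch of an alternative layout rather than the project's code: the QUERIES contents are placeholders standing in for the Task-4 SQL statements, and select_query is a hypothetical helper name.

```python
# Placeholder query strings; in the report these are the Task-4 SQL statements.
QUERIES = {n: f"-- SQL for query {n}" for n in range(1, 11)}


def select_query(choice: str):
    """Return the query for a menu choice, or None if it is not a query number."""
    if not choice.isdigit():
        return None
    return QUERIES.get(int(choice))


print(select_query("3"))    # a known query number
print(select_query("12"))   # a menu option that is not a query
print(select_query("abc"))  # non-numeric input
```

With this layout, adding an eleventh query means adding one dictionary entry instead of another elif branch.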

3. Execute query as per user actions

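def execute_query(connection, query, params=None):
    # params is optional; supply a tuple when the query contains %s placeholders
    cursor = connection.cursor()
    try:
        cursor.execute(query, params or ())
        results = cursor.fetchall()
        for row in results:
            print(row)
    except Error as e:
        print(f"Error: {e}")
    finally:
        cursor.close()

Task-5: Queries with user input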

Query-1: Export all the data of users with specific subscription.

subscription_type = input("Enter subscription type (HD/UHD): ").strip()

# %s is a placeholder; pass params as the second argument to cursor.execute()
query = """SELECT * FROM user u
           JOIN subscription s
           ON u.subscription_type = s.subscription_id
           WHERE s.subscription_type = %s;"""
params = (subscription_type,)

User input: subscription_type = ‘HD’

Query-2: Export all the data about a specific actor and movies associated with that actor

actor_name = input("Enter actor name: ").strip()

# %s is a placeholder; pass params as the second argument to cursor.execute()
query = """SELECT actor.*, movie.*
           FROM actor
           JOIN movieactor
           ON actor.actor_id = movieactor.actor_id
           JOIN movie
           ON movie.movie_id = movieactor.movie_id
           WHERE actor.actor_name = %s;"""
params = (actor_name,)

User input: actor_name = ‘Aaron Paul’


Query-3: Export all data of the actors from a specific city, showing their average age

city_name = input("Enter city name: ").strip()

# %s is a placeholder; pass params as the second argument to cursor.execute()
query = """SELECT city, COUNT(actor_id) AS number_of_actors,
           AVG(TIMESTAMPDIFF(YEAR, dob, CURDATE())) AS average_age
           FROM actor
           WHERE city = %s
           GROUP BY city;"""
params = (city_name,)

User input: city_name = ‘New York’

Query-4: Export all data to show the favourite comedy movies for a particular user

username = input("Enter username: ").strip()

# %s is a placeholder; pass params as the second argument to cursor.execute()
query = """SELECT favouritemovie.user_id, movie.*
           FROM favouritemovie
           JOIN user
           ON favouritemovie.user_id = user.user_id
           JOIN movie
           ON movie.movie_id = favouritemovie.movie_id
           WHERE user.username = %s
           AND movie.genre = 'Comedy';"""
params = (username,)

User input: username = ‘john_doe’

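Query-5: Export the number of subscriptions for a specific country

country_name = input("Enter country name: ").strip()

# %s is a placeholder; pass params as the second argument to cursor.execute()
query = """SELECT country, COUNT(subscription_type) AS subscription_count
           FROM user
           WHERE country = %s
           GROUP BY country;"""
params = (country_name,)

User input: country_name = ‘USA’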
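
When user input is placed into the SQL text itself, a crafted value can change what the query means. When it is passed as a bound parameter, it is only ever compared as data. The sketch below contrasts the two. It assumes SQLite in place of MySQL (so the placeholder is "?" where mysql.connector uses "%s") and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (user_id INTEGER PRIMARY KEY, "
    "username VARCHAR(30), country VARCHAR(30))"
)
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?)",
    [(1, "user_a", "USA"), (2, "user_b", "UK")],  # hypothetical rows
)

malicious = "nowhere' OR '1'='1"  # input crafted to alter the WHERE clause

# String formatting: the input becomes part of the SQL text.
formatted = conn.execute(
    f"SELECT * FROM user WHERE country = '{malicious}'"
).fetchall()

# Placeholder: the input is passed as data and compared literally.
bound = conn.execute(
    "SELECT * FROM user WHERE country = ?", (malicious,)
).fetchall()

print(len(formatted), len(bound))
```

The formatted version returns every row because the injected OR condition is always true, while the bound version returns none because no country has that literal name.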

Task-6: Create users table in Apache Cassandra and generate queries in CQL
Query-1: Provide the CREATE statement in CQL
Query-2: Export all users

Query-3: Export all users from a specific country

Query-4: Export data to find users between 22-30 years old (including 22 and 30)
Query-5: Count how many users exist per specific city
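
For orientation, the sketch below lists CQL statements of the kind these five queries call for, held as Python strings so nothing is sent to a cluster. They are not the report's own statements: the table name, column list and filter values ('UK', 'London') are assumptions. Filtering on columns that are not part of the primary key generally requires ALLOW FILTERING or a secondary index in Cassandra, which is why that clause appears.

```python
# CQL statements held as plain strings; nothing is executed against Cassandra here.
# Table name, columns and filter values are assumptions for illustration.
CQL_STATEMENTS = {
    "create": """
        CREATE TABLE IF NOT EXISTS users (
            user_id int PRIMARY KEY,
            username text,
            email text,
            city text,
            country text,
            age int
        );
    """,
    "all_users": "SELECT * FROM users;",
    "by_country": "SELECT * FROM users WHERE country = 'UK' ALLOW FILTERING;",
    "age_22_to_30": "SELECT * FROM users WHERE age >= 22 AND age <= 30 ALLOW FILTERING;",
    "count_in_city": "SELECT COUNT(*) FROM users WHERE city = 'London' ALLOW FILTERING;",
}

for name, statement in CQL_STATEMENTS.items():
    print(f"-- {name}\n{statement.strip()}\n")
```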
