Big Data Project-2 Report
1. User table
user_id int PRIMARY KEY
username varchar(30)
email varchar(30)
password varchar(30)
city varchar(30)
country varchar(30)
age int
subscription_type int FOREIGN KEY REFERENCES subscription.subscription_id
2. Subscription table
subscription_id int PRIMARY KEY
subscription_type varchar(30)
price int
duration int
3. Movie
movie_id int PRIMARY KEY
title varchar(30)
genre varchar(30)
release_date date
4. Actor
actor_id int PRIMARY KEY
actor_name varchar(30)
city varchar(30)
dob date
5. Review
review_id int PRIMARY KEY
user_id int FOREIGN KEY REFERENCES user.user_id
movie_id int FOREIGN KEY REFERENCES movie.movie_id
score int
review_comment varchar(50)
6. FavouriteMovie
Favourite_id int PRIMARY KEY
user_id int FOREIGN KEY REFERENCES user.user_id
movie_id int FOREIGN KEY REFERENCES movie.movie_id
score int FOREIGN KEY REFERENCES review.score
7. MovieActor
8. WatchHistory
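The foreign key from the user table to the subscription table can be tried out without a MySQL server. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, not the setup used in this report) and made-up sample rows. It shows that a user row pointing at a non-existent subscription is rejected.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption, for portability).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled

conn.executescript("""
CREATE TABLE subscription (
    subscription_id   INTEGER PRIMARY KEY,
    subscription_type VARCHAR(30),
    price             INTEGER,
    duration          INTEGER
);
CREATE TABLE user (
    user_id           INTEGER PRIMARY KEY,
    username          VARCHAR(30),
    email             VARCHAR(30),
    password          VARCHAR(30),
    city              VARCHAR(30),
    country           VARCHAR(30),
    age               INTEGER,
    subscription_type INTEGER REFERENCES subscription(subscription_id)
);
""")

# Hypothetical sample rows, not taken from the report's data.
conn.execute("INSERT INTO subscription VALUES (1, 'HD', 10, 30)")
conn.execute(
    "INSERT INTO user VALUES "
    "(1, 'alice', 'alice@example.com', 'secret', 'London', 'UK', 25, 1)"
)


def insert_fails(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Subscription 99 does not exist, so the foreign key should reject this row.
orphan_rejected = insert_fails(
    "INSERT INTO user VALUES "
    "(2, 'bob', 'bob@example.com', 'secret', 'Paris', 'France', 31, 99)"
)
user_count = conn.execute("SELECT COUNT(*) FROM user").fetchone()[0]
print(orphan_rejected, user_count)
```

In MySQL with InnoDB tables the same constraint is enforced by default, so the PRAGMA line is specific to the SQLite stand-in.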
Part-2:
1. user table:
3. movie table
4. subscription table
5. review table
7. movieActor table
8. watchHistory table
1. User table
2. Subscription table
3. Movie table
4. Actor table
5. Review table
6. FavouriteMovie table
7. MovieActor table
8. WatchHistory table
4.1: Export all the data about users in HD subscriptions
SELECT *
FROM user u
JOIN subscription s ON u.subscription_type = s.subscription_id
WHERE s.subscription_type = 'HD';
4.2: Export all data about actors and their associated movies
SELECT actor.*, movie.*
FROM actor
JOIN movieactor ON actor.actor_id = movieactor.actor_id
JOIN movie ON movie.movie_id = movieactor.movie_id;
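4.3: Export all data to group actors from a specific city, showing also the average age (per city)
SELECT
    city,
    COUNT(actor_id) AS number_of_actors,
    -- TIMESTAMPDIFF counts completed years, so actors whose birthday has not
    -- yet occurred this year are not counted as a year older.
    AVG(TIMESTAMPDIFF(YEAR, dob, CURDATE())) AS average_age
FROM
    actor
GROUP BY
    city;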
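An age can be derived from dob either by subtracting calendar years or by counting completed years. The two results differ for anyone whose birthday has not yet occurred in the current year. The sketch below illustrates the difference in Python; the function names and the sample dates are hypothetical and chosen only for illustration.

```python
from datetime import date


def year_difference(dob: date, today: date) -> int:
    """Plain calendar-year subtraction (can overstate the age by one)."""
    return today.year - dob.year


def completed_years(dob: date, today: date) -> int:
    """Age in completed years: subtract one if the birthday is still ahead."""
    birthday_still_ahead = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - int(birthday_still_ahead)


# Hypothetical example: born on 31 Dec 1990, evaluated on 1 Jan 2024.
dob = date(1990, 12, 31)
today = date(2024, 1, 1)

print(year_difference(dob, today))   # 34
print(completed_years(dob, today))   # 33
```

Averaged over many actors, year subtraction gives an average that is biased upwards by a fraction of a year.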
4.4: Export all data to show the favourite comedy movies for a specific user
SELECT favouritemovie.user_id, movie.*
FROM favouritemovie
JOIN movie ON movie.movie_id = favouritemovie.movie_id
WHERE favouritemovie.user_id = 1 AND movie.genre = 'Comedy';
4.5: Export all data to count how many subscriptions are in the database per country
SELECT country, COUNT(subscription_type) AS SubscriptionCount
FROM user
GROUP BY country;
4.6: Export all data to find the movies that start with the keyword ‘The’
SELECT *
FROM movie
WHERE title LIKE 'The%';
4.7: Export data to find the number of subscriptions per movie category
SELECT
    m.genre,
    COUNT(DISTINCT u.user_id) AS subscription_count
FROM
    movie m
JOIN
    watchhistory wh ON m.movie_id = wh.movie_id
JOIN
    user u ON wh.user_id = u.user_id
GROUP BY
    m.genre;
4.8: Export data to find the username and the city of the youngest customer in the UHD subscription category
SELECT
    u.username,
    u.city,
    u.age
FROM
    user u
JOIN
    subscription s ON u.subscription_type = s.subscription_id
WHERE
    s.subscription_type = 'UHD'
ORDER BY
    u.age ASC
LIMIT 1;
4.9: Export data to find users between 22 - 30 years old (including 22 and 30)
SELECT
    *
FROM
    user
WHERE
    age BETWEEN 22 AND 30;
4.10: Export data to find the average age of users with low score reviews (less than 3). Group your data for users under 20, 21-40, and 41 and over
SELECT
    CASE
        -- Age 20 is placed in the first group; with "< 20" it would fall
        -- through to the ELSE branch and be reported as '41 and over'.
        WHEN u.age <= 20 THEN '20 and under'
        WHEN u.age BETWEEN 21 AND 40 THEN '21-40'
        ELSE '41 and over'
    END AS age_group,
    AVG(u.age) AS average_age
FROM
    user u
JOIN
    review r ON u.user_id = r.user_id
WHERE
    r.score < 3
GROUP BY
    age_group;
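The age bucketing can be checked against a small dataset before running it on the real tables. The sketch below is an illustration only: it assumes Python's built-in sqlite3 module instead of MySQL, uses made-up rows, and writes the first boundary as <= 20 (labelled '20 and under') so that a 20-year-old lands in the youngest group rather than in the ELSE branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE review (review_id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER);
-- Hypothetical sample data: users aged 18, 20, 30, 40 and 55.
INSERT INTO user VALUES (1, 18), (2, 20), (3, 30), (4, 40), (5, 55);
-- Users 1-4 left low scores; user 5 left a high score and must be excluded.
INSERT INTO review VALUES (1, 1, 1), (2, 2, 2), (3, 3, 2), (4, 4, 1), (5, 5, 5);
""")

query = """
SELECT
    CASE
        WHEN u.age <= 20 THEN '20 and under'
        WHEN u.age BETWEEN 21 AND 40 THEN '21-40'
        ELSE '41 and over'
    END AS age_group,
    AVG(u.age) AS average_age
FROM user u
JOIN review r ON u.user_id = r.user_id
WHERE r.score < 3
GROUP BY age_group
"""

# Map each age group to its average age for easy inspection.
result = dict(conn.execute(query).fetchall())
print(result)
```

With this sample data the 55-year-old has no low-score review, so no '41 and over' group appears in the result.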
Query-2: Export all data about the actors and their associated movies
Query-3: Export all data to group actors from a specific city, showing also the average age (per city)
Query-4: Export all data to show the favourite comedy movies for a specific user.
Query-5: Export all data to count how many subscriptions are in the database per country
Query-6: Export all data to find the movies that start with the keyword ‘The’
Query-7: Export data to find the number of subscriptions per movie category
Query-8: Export data to find the username and the city of the youngest customer in the UHD subscription category
Query-9: Export data to find users between 22 - 30 years old (including 22 and 30)
Query-10: Export data to find the average age of users with low score reviews (less than 3). Group your data for users under 20, 21-40, and 41 and over
Part-3:
import mysql.connector
from mysql.connector import Error


def create_connection():
    """Open a connection to the local MySQL database; return None on failure."""
    connection = None
    try:
        connection = mysql.connector.connect(
            host="localhost",
            user="root",
            password="root",
            database="mininetdb"
        )
        if connection.is_connected():
            print("Connected to MySQL database")
    except Error as e:
        print(f"The error '{e}' occurred")
    return connection
def user_menu():
    """Show the query menu in a loop and run the query the user selects."""
    connection = create_connection()
    if not connection:
        return
    while True:
        print("\n Menu:")
        print("\n1. Export all the data about users in HD subscriptions")
        print("\n2. Export all data about actors and their associated movies")
        print("\n3. Export all data to group actors from a specific city, showing also the average age (per city)")
        print("\n4. Export all data to show the favourite comedy movies for a specific user")
        print("\n5. Export all data to count how many subscriptions are in the database per country")
        print("\n6. Export all data to find the movies that start with the keyword The")
        print("\n7. Export data to find the number of subscriptions per movie category")
        print("\n8. Export data to find the username and the city of the youngest customer in the UHD subscription category")
        print("\n9. Export data to find users between 22 - 30 years old (including 22 and 30)")
        print("\n10. Export data to find the average age of users with low score reviews (less than 3). Group your data for users under 20, 21-40, and 41 and over")
        print("\n11. Close Database Connection")
        print("\n12. Exit")
        # Read the selection as text so that non-numeric input can be rejected.
        choice = input("\nEnter your choice: ")
        if choice.isdigit():
            choice = int(choice)
            if choice == 1:
                execute_query(connection, query_1)
            elif choice == 2:
                execute_query(connection, query_2)
            elif choice == 3:
                execute_query(connection, query_3)
            elif choice == 4:
                execute_query(connection, query_4)
            elif choice == 5:
                execute_query(connection, query_5)
            elif choice == 6:
                execute_query(connection, query_6)
            elif choice == 7:
                execute_query(connection, query_7)
            elif choice == 8:
                execute_query(connection, query_8)
            elif choice == 9:
                execute_query(connection, query_9)
            elif choice == 10:
                execute_query(connection, query_10)
            elif choice == 11:
                # Query options will fail after this until the program is restarted.
                if connection.is_connected():
                    connection.close()
                print("Database connection closed")
            elif choice == 12:
                break
            else:
                print("Invalid choice. Please select a number from the menu.")
        else:
            print("Invalid input. Please enter a number.")
    # Release the connection on exit if option 11 was not used.
    if connection.is_connected():
        connection.close()
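The menu calls execute_query(connection, query), but the body of that helper is not shown in this report. The following is a minimal sketch of what such a helper could look like, not the original implementation. It assumes only the generic Python DB-API 2.0 cursor interface (which mysql.connector also follows), and the optional params argument and the returned tuple are hypothetical additions. The demonstration runs against an in-memory sqlite3 database with made-up rows so that it works without a MySQL server.

```python
import sqlite3


def execute_query(connection, query, params=()):
    """Run a SELECT on a DB-API 2.0 connection, print the rows, and return them."""
    cursor = connection.cursor()
    try:
        cursor.execute(query, params)
        # cursor.description holds one 7-item sequence per result column;
        # the first item is the column name.
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    print(" | ".join(columns))
    for row in rows:
        print(" | ".join(str(value) for value in row))
    return columns, rows


# Demonstration with a SQLite stand-in and hypothetical sample movies.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT, genre TEXT);
INSERT INTO movie VALUES
    (1, 'The Matrix', 'Sci-Fi'),
    (2, 'Up', 'Comedy'),
    (3, 'The Mask', 'Comedy');
""")

query_6 = "SELECT * FROM movie WHERE title LIKE 'The%' ORDER BY movie_id"
columns, rows = execute_query(demo, query_6)
```

Passing values through params rather than formatting them into the SQL string keeps user-supplied input, such as a user ID for query 4, from being interpreted as SQL. Note that the placeholder syntax differs between drivers (%s for mysql.connector, ? for sqlite3).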
Query-2: Export all the data about a specific actor and movies associated with that actor
Query-4: Export all data to show the favourite comedy movies for a particular user
Query-5:
Task-6: Create users table in Apache Cassandra and generate queries in CQL
Query-1: Provide the CREATE statement in CQL
Query-2: Export all users
Query-4: Export data to find users between 22-30 years old (including 22 and 30)
Query-5: Count how many users exist per specific city