Assignment 1
Practice-based Assignment
Description
Use the following two data files: title.basics.tsv.gz and title.ratings.tsv.gz.
Tasks
1. Create a database named “imdb” using MongoDB Compass.
2. Create two collections named “titles” (refers to title.basics.tsv.gz) and “ratings” (refers to
title.ratings.tsv.gz).
3. Schema validation: Write JSON Schemas for each of the collections based on the descriptions.
4. Import data to the collections using Compass. Insert the data in titles.csv into “titles” and
ratings.csv into “ratings”.
5. Perform schema analysis and describe the data characteristics. Go to the “Schema” tab
and click the “Analyze Schema” button. Compass will generate an analysis for each
column. Describe the interesting characteristics of the data in each column.
You can learn more about schema analysis at:
https://docs.mongodb.com/compass/master/schema/
6. Perform some advanced analysis of the data using Aggregation. You must extract the
following information and include the output.
a. Find the total number of movies released each year.
b. Find the top five Fantasy-Adventure movie titles (primaryTitle) released in 2021
according to the rating. [Hint: Genre must include both Fantasy and Adventure.]
c. Find the top five Fantasy-Adventure movie titles (primaryTitle) released in 2021
according to the number of votes. [Hint: Genre must include both Fantasy and
Adventure.]
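As a rough starting point for Tasks 3 and 6, the sketch below builds a validator for “ratings” and the three aggregation pipelines as plain Python data, then prints them as JSON that can be pasted into Compass (the Validation tab for the schema, the Aggregations tab of “titles” for the pipelines). It is a sketch under assumptions, not a model answer: the field names tconst, titleType, startYear, genres, averageRating and numVotes come from the public IMDb dataset layout rather than from this handout, the 0 to 10 rating range is assumed, startYear is assumed to be imported as a number, and the helper names are hypothetical. Check all of these against the supplied column descriptions and your imported data.

```python
import json

# Assumed field names (public IMDb dataset layout); adjust to the supplied descriptions.
RATINGS_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["tconst", "averageRating", "numVotes"],
        "properties": {
            "tconst": {"bsonType": "string"},
            # "number" accepts int, long, double and decimal values.
            "averageRating": {"bsonType": "number", "minimum": 0, "maximum": 10},
            "numVotes": {"bsonType": "number", "minimum": 0},
        },
    }
}


def movies_per_year():
    """Task 6a: count documents of type "movie" per release year."""
    return [
        {"$match": {"titleType": "movie"}},
        {"$group": {"_id": "$startYear", "totalMovies": {"$sum": 1}}},
        {"$sort": {"_id": 1}},
    ]


def top_fantasy_adventure(sort_field, year=2021, limit=5):
    """Tasks 6b/6c: top movies whose genres include both Fantasy and Adventure.

    sort_field is "averageRating" (6b) or "numVotes" (6c).
    """
    return [
        {"$match": {
            "titleType": "movie",
            "startYear": year,  # assumes startYear was imported as a number
            # Both conditions must hold; a regex match works whether genres
            # is a comma-separated string or an array of strings.
            "$and": [
                {"genres": {"$regex": "Fantasy"}},
                {"genres": {"$regex": "Adventure"}},
            ],
        }},
        # Join each title with its rating document via the shared tconst key.
        {"$lookup": {
            "from": "ratings",
            "localField": "tconst",
            "foreignField": "tconst",
            "as": "rating",
        }},
        # Drops titles without a rating and flattens the one-element array.
        {"$unwind": "$rating"},
        {"$sort": {"rating." + sort_field: -1}},
        {"$limit": limit},
        {"$project": {"_id": 0, "primaryTitle": 1,
                      sort_field: "$rating." + sort_field}},
    ]


if __name__ == "__main__":
    print(json.dumps(RATINGS_VALIDATOR, indent=2))
    print(json.dumps(movies_per_year(), indent=2))
    print(json.dumps(top_fantasy_adventure("averageRating"), indent=2))
    print(json.dumps(top_fantasy_adventure("numVotes"), indent=2))
```

The $match stage comes first so that $lookup only runs on the small set of 2021 Fantasy-Adventure movies. Creating an index on tconst in “ratings” should also speed up the join. A validator for “titles” can be written in the same way once its column descriptions are confirmed.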
Marking Criteria
Database Creation and Import Data (Tasks 1-4): 7 Marks
a. Database and collection creation: 1.5 Marks
b. Schema validation: 6 Marks (3 Marks per schema)
c. Import data: 3 Marks (1.5 Marks per collection)
Schema Analysis (Task 5): 3 Marks
a. Complete the assigned tasks using Compass: 1.5 Marks
b. Statistical analysis: 3 Marks. Comment on the statistics of the column data, e.g., In which
year can we find the greatest number of released movies? Which genre covers the highest
number of movies?
Good luck :)