AssignmentRequirements Eng
Assignment
COURSE: DATA STRUCTURES AND ALGORITHMS
CODE: 504008
TOPIC: Analyze the movie ratings of viewers
Version: 1.2.2 – Date: 27/11/2023
(Students must read all instructions carefully before starting the assignment)
I. Problem statement
This assignment aims to analyze a dataset of viewer (user) ratings of movies. We use a graph
model to capture the relationships between the movies and the viewers. Then, we execute several
analytical queries to obtain meaningful user insights from the graph.
Required functionalities for the system are listed below:
- Read data from existing files and build a graph based on this data.
- Develop functions that can extract information from graph structures based on the specific
requirements of the problem.
II. Resources
The attached source code includes:
- Data files and expected output files:
o The data folder consists of 03 files, including:
▪ user.csv: information of viewers.
▪ movie.csv: information on movies.
▪ rating.csv: information on viewers’ ratings.
o The expected_output folder consists of 07 outputs from Req1.txt to Req7.txt, which
are expected results corresponding to 07 requirements.
- Source code files:
o Main.java: containing the main method that calls necessary methods to test.
o Movie.java and User.java: Movie and User classes respectively. STUDENTS DO
NOT MODIFY THESE FILES.
o RatingManagement.java: The RatingManagement class has the attributes rating, which
stores the list of viewers’ ratings; movies, which stores the list of movies; and users,
which stores the list of viewers in the system. This class provides the initialization
method and the getter method. Students are required to complete the blank methods in
this file without modifying the existing methods.
Quang D.C – [email protected] | Data Structures and Algorithms (504008) – 2023 1/13
Ton Duc Thang University
Faculty of Information Technology
III. Assignment Procedure
- Students download the source code.
- Students carefully read and understand the descriptions and the provided files, then implement
the Rating class and integrate it with the RatingManagement class, following the class
diagram and the specifications in the subsequent sections.
- After finishing the above classes, students compile and execute the program using the main
method in the Main.java file to test the outputs of requirements.
o To execute each requirement, students must first compile the code and then run the
Main file with the proper parameter for the desired task. For instance, to run
Requirement 1:
java Main 1
o To execute multiple requirements, the student must first compile the code and then run
the Main file with the appropriate parameters for the desired requirements. For
instance, to run Requirement 1, 2, and 3:
java Main 1 2 3
- Students complete the following tasks and compare their work with the results provided in the
expected_output folder.
- For requirements that students are not able to complete, do not delete related methods and
ensure the program can be executed successfully given the main method in Main.java.
- The Google Drive folder of the assignment also contains version.txt, which records the updated
contents and dates whenever the assignment changes. Students should check this file regularly
and, if there are any updates, carefully read the descriptions and download the newest resources.
IV. Descriptions of the problem
The data on the ratings of movies given by viewers will be represented as a bipartite graph, which
has two sets of nodes: one for the users and one for the movies. In this assignment, students will
construct a graph and save it in an Edge List format. The graph has approximately 10,000 nodes and
over 1,000,000 edges (the exact numbers will be determined after completing Requirement 1).
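The Edge List idea described above can be illustrated with a minimal sketch. The names Edge, EdgeListSketch, sample, and countEndpoints are hypothetical and are not part of the provided source code; in the assignment itself, the Rating class plays the role of an edge. Only the first sample edge (viewer 1, movie 1193, 5 stars) comes from the data description; the other values are illustrative. The sketch counts only endpoints that appear in at least one edge, which may differ from how the provided Main.java counts vertices.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one edge: a viewer connected to a movie, weighted by stars.
class Edge {
    final int userId;
    final int movieId;
    final int stars;

    Edge(int userId, int movieId, int stars) {
        this.userId = userId;
        this.movieId = movieId;
        this.stars = stars;
    }
}

class EdgeListSketch {
    // Builds a tiny edge list; the values are illustrative only.
    static List<Edge> sample() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge(1, 1193, 5));
        edges.add(new Edge(1, 661, 3));
        edges.add(new Edge(2, 1193, 5));
        return edges;
    }

    // Counts distinct endpoints. Viewer ids and movie ids are kept in separate
    // sets because the two id spaces of a bipartite graph may overlap.
    static int countEndpoints(List<Edge> edges) {
        Set<Integer> users = new HashSet<>();
        Set<Integer> movies = new HashSet<>();
        for (Edge e : edges) {
            users.add(e.userId);
            movies.add(e.movieId);
        }
        return users.size() + movies.size();
    }

    public static void main(String[] args) {
        List<Edge> edges = sample();
        // prints: 4 vertices, 3 edges
        System.out.println(countEndpoints(edges) + " vertices, " + edges.size() + " edges");
    }
}
```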
V. Descriptions of Classes
- Descriptions of class attributes and methods are below:
o The class attributes are always declared with the private access modifier.
o Rating class – viewer’s rating for the movie (Students implement this class)
▪ Including 04 attributes:
● Attribute 1: int – identity number of viewer
● Attribute 2: int – identity number of movie
● Attribute 3: int – rating star
For example, the first data line describes the movie with the identity number 1, the title
Toy Story (1995), and the genres Animation, Children's, and Comedy.
- The users.csv file stores information about the viewers in the system. Each line in the file
represents a single viewer, with attributes separated by commas (“,”) in the following
format:
identity number of viewer,gender,range of age,occupation,zip code
Notice: The symbols “M” and “F” are used as abbreviations for male and female, respectively.
The order of the attributes follows the header row at the beginning of the file.
For example, the first data line describes the viewer with the identity number 1, female
gender, age group 1, occupation K-12 student, and zip code 48067.
- The ratings.csv file stores information about the viewers’ movie ratings. Each line in the
file represents a single rating, with attributes separated by commas (“,”) in the following
format:
identity number of viewer,identity number of movie,rating star,timestamp
Notice: The ratings are made on a 5-star scale (whole-star ratings only, such as 1 star or 2
stars). The order of the attributes follows the header row at the beginning of the file.
For example, the first data line is the rating given by the viewer with the identity number 1 to
the movie with the identity number 1193, with a score of 5 stars, at the timestamp 978300760
(which converts to 01/01/2001, 5:12:40 AM, GMT+7).
- The expected_output folder consists of 07 result files corresponding to the 07 tasks:
o For Tasks 2 and 3: each line in Req2.txt and Req3.txt corresponds to an object and
is formatted according to the toString() method of the respective class.
o For the remaining tasks: each line in each .txt file is the title of a movie that satisfies
the requirements of the problem.
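As an illustration of the rating line format described above, the following minimal sketch splits one line into its four fields and converts the timestamp to GMT+7. The class and method names (RatingLineSketch, parse, toGmtPlus7) are hypothetical, and the sketch assumes the timestamp is a Unix time in seconds, which is consistent with the conversion quoted in the example. Only java.util classes are used.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class RatingLineSketch {
    // Splits "viewer id,movie id,rating star,timestamp" into four numeric fields.
    static long[] parse(String line) {
        String[] parts = line.split(",");
        long[] fields = new long[4];
        for (int i = 0; i < 4; i++) {
            fields[i] = Long.parseLong(parts[i].trim());
        }
        return fields;
    }

    // Formats Unix seconds as dd/MM/yyyy HH:mm:ss in the GMT+7 time zone.
    static String toGmtPlus7(long epochSeconds) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("GMT+7"));
        c.setTimeInMillis(epochSeconds * 1000L);
        return String.format(Locale.ROOT, "%02d/%02d/%04d %02d:%02d:%02d",
                c.get(Calendar.DAY_OF_MONTH), c.get(Calendar.MONTH) + 1, c.get(Calendar.YEAR),
                c.get(Calendar.HOUR_OF_DAY), c.get(Calendar.MINUTE), c.get(Calendar.SECOND));
    }

    public static void main(String[] args) {
        long[] fields = parse("1,1193,5,978300760");
        // prints: 01/01/2001 05:12:40
        System.out.println(toGmtPlus7(fields[3]));
    }
}
```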
Notice:
- Students should read the main method carefully to work out how to complete the tasks in the
designated order.
- Students may add extra actions to the main method to verify completed methods; however,
they must ensure the program still executes successfully with the original main method.
- Students may add extra methods to support their work; however, new methods must be
declared and implemented in the files for submission, and the submission must execute
successfully with the given Main.java.
- Students should avoid absolute paths in methods that read files. Absolute paths are specific
to a particular system, so they may prevent the marking process from accessing the files,
causing the program to fail when it runs.
- DO NOT MODIFY CLASS AND METHOD NAMES, STRICTLY FOLLOW THE GIVEN
CLASS DIAGRAM.
VII. Requirements
Besides the libraries provided in the received files, students may use the classes in the
java.util package and the classes needed for reading files in the java.io package. Students
are not allowed to import any libraries other than those mentioned above.
Students carry out the assignment in Java 11 or Java 8. Students are not allowed to use var
(local variable type inference). Submissions are judged in Java 11, and students take
responsibility for errors caused by Java version differences.
To facilitate the work, students may define additional methods within the designated files. However,
students must not modify the names and input parameters of the existing methods or of the methods
specified in this document, as this would compromise the evaluation process.
In the Descriptions of Classes section, some class attributes are identified only by a number.
When implementing the class, students should choose an appropriate name and the data type that
matches the description, and ensure that the access modifier of every attribute is private.
The large volume of data poses a challenge for students who need to process requests efficiently.
Students should monitor the program's running time and apply algorithmic optimization techniques
when necessary to reduce it. The marking program has a strict time limit for each student's work, and
any work that exceeds this limit will receive 0.0 points.
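One common way to reduce running time on an edge list of this size is to group the edges by viewer once, so that later queries do not rescan every edge. The sketch below is only an illustration under assumptions: each edge is represented as a hypothetical int array {viewer id, movie id, stars}, and the names RatingIndexSketch and indexByUser are not part of the provided code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RatingIndexSketch {
    // Groups edges {viewer id, movie id, stars} by viewer id in a single pass.
    // After this, all ratings of one viewer are found with one hash lookup.
    static Map<Integer, List<int[]>> indexByUser(List<int[]> edges) {
        Map<Integer, List<int[]>> byUser = new HashMap<>();
        for (int[] e : edges) {
            byUser.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e);
        }
        return byUser;
    }

    public static void main(String[] args) {
        List<int[]> edges = new ArrayList<>();
        edges.add(new int[] {1, 1193, 5});
        edges.add(new int[] {2, 1193, 4});
        edges.add(new int[] {1, 661, 3});
        Map<Integer, List<int[]>> byUser = indexByUser(edges);
        // prints: 2
        System.out.println(byUser.get(1).size());
    }
}
```

Building the index costs one pass over the edges; whether it pays off depends on how many queries reuse it.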
1. REQUIREMENT 1 (1 point)
Students define the methods in the RatingManagement class and use the Main class to evaluate the
results of the work.
Students implement the methods:
private ArrayList<Movie> loadMovies(String moviePath)
private ArrayList<User> loadUsers(String userPath)
public ArrayList<Rating> loadEdgeList(String ratingPath)
These methods read data from files located in a specific directory. The file locations are given by
three parameters, ratingPath, moviePath, and userPath, which specify the paths of the files
containing the rating, movie, and viewer data, respectively. The loaded data is stored in the
existing attributes of the class, and loadEdgeList adds the rating data to an Edge List.
Students implement the proposed method, compile the Main.java file, and execute the command java
Main 1 to write the result to the Req1.txt file in the output directory. The output file contains the
number of vertices and edges in the graph, as computed by the implemented methods. Students should
verify the correctness of their output by comparing it with the sample output in the Req1.txt file in
the expected_output folder.
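The reading step for the rating file might look like the following minimal sketch. It is not the required loadEdgeList implementation: the names LoadSketch and readRatings are hypothetical, each row is kept as a plain long array instead of a Rating object, and the header text used in main is a placeholder. The sketch assumes one header row followed by lines of four comma-separated numbers, and it uses only java.io and java.util.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LoadSketch {
    // Reads rows of "viewer id,movie id,rating star,timestamp",
    // skipping the header row and any blank lines.
    static List<long[]> readRatings(Reader source) {
        List<long[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line = in.readLine(); // header row, discarded
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] p = line.split(",");
                rows.add(new long[] {
                    Long.parseLong(p[0].trim()), Long.parseLong(p[1].trim()),
                    Long.parseLong(p[2].trim()), Long.parseLong(p[3].trim())
                });
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Placeholder header; the real header row is defined by the data file.
        String csv = "header\n1,1193,5,978300760\n";
        // prints: 1
        System.out.println(readRatings(new StringReader(csv)).size());
    }
}
```

With a real file, the same method could be given a java.io.FileReader opened on a relative path such as the ratingPath parameter.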
Notice: Students must complete this task correctly to gain scores from later tasks (from
Requirement 2 to Requirement 7). If the file-reading process does not convert the data into the
edge list of the graph, the requirements below are not scored.
The following tasks apply transformations to the information extracted from the file.
2. REQUIREMENT 2 (2 points)
Students implement the method:
public ArrayList<Movie> findMoviesByNameAndMatchRating(int userId, int rating)
to return a list of the movies that the viewer with the given userId rated with a score greater than
or equal to the rating parameter. The resulting movies must be sorted alphabetically by name.
Students implement the proposed method, compile the Main.java file, and execute the command java
Main 2 to write the result to the Req2.txt file in the output directory. Students should verify the
correctness of their output by comparing it with the sample output in the Req2.txt file in
the expected_output folder.
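The filter-then-sort logic of this requirement can be sketched as follows. This is an illustration under assumptions, not the required method: edges are hypothetical int arrays {viewer id, movie id, stars}, a map from movie id to title stands in for the provided Movie objects, and titles are compared with the natural String ordering, which may or may not match the ordering used in the expected output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Req2Sketch {
    // Collects titles of movies that userId rated with at least `rating` stars,
    // then sorts them with the natural String ordering.
    static List<String> titlesRatedAtLeast(List<int[]> edges, Map<Integer, String> titles,
                                           int userId, int rating) {
        List<String> result = new ArrayList<>();
        for (int[] e : edges) {
            if (e[0] == userId && e[2] >= rating && titles.containsKey(e[1])) {
                result.add(titles.get(e[1]));
            }
        }
        result.sort(Comparator.naturalOrder());
        return result;
    }

    public static void main(String[] args) {
        List<int[]> edges = new ArrayList<>();
        edges.add(new int[] {1, 10, 5});
        edges.add(new int[] {1, 20, 3});
        edges.add(new int[] {1, 30, 4});
        Map<Integer, String> titles = new HashMap<>();
        titles.put(10, "Title C");
        titles.put(20, "Title B");
        titles.put(30, "Title A");
        // prints: [Title A, Title C]
        System.out.println(titlesRatedAtLeast(edges, titles, 1, 4));
    }
}
```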
3. REQUIREMENT 3 (2 points)
Students implement the method:
public ArrayList<User> findUsersHavingSameRatingWithUser(int userId, int movieId)
to return a list of the viewers who rated the movie with the given movieId with the same number of
stars as the viewer with the given userId gave to that movie.
Students implement the proposed method, compile the Main.java file, and execute the command java
Main 3 to write the result to the Req3.txt file in the output directory. Students should verify the
correctness of their output by comparing it with the sample output in the Req3.txt file in
the expected_output folder.
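A two-pass approach is one way to sketch this query: first find the reference viewer's stars for the movie, then collect the other viewers with the same stars. The names below are hypothetical, edges are int arrays {viewer id, movie id, stars}, and viewer ids stand in for User objects. The sketch assumes the reference viewer is excluded from the result, which the expected output in Req3.txt should confirm or contradict.

```java
import java.util.ArrayList;
import java.util.List;

class Req3Sketch {
    // Returns ids of viewers who gave movieId the same stars as userId did.
    // Assumption: the reference viewer is not included in the result.
    static List<Integer> usersWithSameRating(List<int[]> edges, int userId, int movieId) {
        int target = -1; // -1 means the reference viewer has not rated the movie
        for (int[] e : edges) {
            if (e[0] == userId && e[1] == movieId) {
                target = e[2];
                break;
            }
        }
        List<Integer> result = new ArrayList<>();
        if (target < 0) {
            return result;
        }
        for (int[] e : edges) {
            if (e[1] == movieId && e[2] == target && e[0] != userId) {
                result.add(e[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> edges = new ArrayList<>();
        edges.add(new int[] {1, 10, 5});
        edges.add(new int[] {2, 10, 5});
        edges.add(new int[] {3, 10, 4});
        edges.add(new int[] {4, 10, 5});
        // prints: [2, 4]
        System.out.println(usersWithSameRating(edges, 1, 10));
    }
}
```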
4. REQUIREMENT 4 (2 points)
Students implement the method:
public ArrayList<String> findMoviesNameHavingSameReputation()
to return a list of the titles of movies favored by at least two viewers (a viewer favors a movie
when the viewer rates it greater than 3 stars). The resulting titles must be ordered alphabetically.
Students implement the proposed method, compile the Main.java file, and execute the command java
Main 4 to write the result to the Req4.txt file in the output directory. Students should verify the
correctness of their output by comparing it with the sample output in the Req4.txt file in
the expected_output folder.
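Counting favorites per movie in a single pass is one way to approach this query. The sketch below uses hypothetical names, represents edges as int arrays {viewer id, movie id, stars}, and uses a map from movie id to title in place of Movie objects. It assumes each viewer rates a given movie at most once, so counting edges equals counting viewers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Req4Sketch {
    // Returns sorted titles of movies rated above 3 stars by at least two viewers.
    static List<String> favoredByAtLeastTwo(List<int[]> edges, Map<Integer, String> titles) {
        Map<Integer, Integer> favoriteCount = new HashMap<>();
        for (int[] e : edges) {
            if (e[2] > 3) {
                favoriteCount.merge(e[1], 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : favoriteCount.entrySet()) {
            if (entry.getValue() >= 2 && titles.containsKey(entry.getKey())) {
                result.add(titles.get(entry.getKey()));
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<int[]> edges = new ArrayList<>();
        edges.add(new int[] {1, 10, 5});
        edges.add(new int[] {2, 10, 4});
        edges.add(new int[] {1, 20, 5});
        edges.add(new int[] {2, 20, 3});
        Map<Integer, String> titles = new HashMap<>();
        titles.put(10, "Title B");
        titles.put(20, "Title C");
        // prints: [Title B]
        System.out.println(favoredByAtLeastTwo(edges, titles));
    }
}
```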
5. REQUIREMENT 5 (1 point)
Students implement the method:
public ArrayList<String> findMoviesMatchOccupationAndGender(String occupation, String
gender, int k, int rating)
to return a list containing up to k titles of movies that users with the given occupation and the
given gender rated with a score equal to the rating parameter. The results must be arranged
alphabetically.
Students implement the proposed method, compile the Main.java file, and execute the command java
Main 5 to write the result to the Req5.txt file in the output directory. Students should verify the
correctness of their output by comparing it with the sample output in the Req5.txt file in the
expected_output folder.
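The following sketch shows one possible reading of this query. Several points are assumptions rather than facts from the specification: titles are de-duplicated, the list is sorted first and then cut to k entries, and the rating must match exactly. The names are hypothetical, edges are int arrays {viewer id, movie id, stars}, and each viewer is represented by a String array {gender, occupation} instead of a User object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class Req5Sketch {
    // Returns up to k sorted, distinct titles rated exactly `rating` stars
    // by viewers with the given occupation and gender.
    static List<String> pick(List<int[]> edges, Map<Integer, String[]> users,
                             Map<Integer, String> titles,
                             String occupation, String gender, int k, int rating) {
        TreeSet<String> sorted = new TreeSet<>(); // keeps titles distinct and ordered
        for (int[] e : edges) {
            String[] u = users.get(e[0]); // {gender, occupation}
            if (u != null && u[0].equals(gender) && u[1].equals(occupation)
                    && e[2] == rating && titles.containsKey(e[1])) {
                sorted.add(titles.get(e[1]));
            }
        }
        List<String> result = new ArrayList<>();
        for (String title : sorted) {
            if (result.size() >= k) {
                break;
            }
            result.add(title);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String[]> users = new HashMap<>();
        users.put(1, new String[] {"F", "K-12 student"});
        Map<Integer, String> titles = new HashMap<>();
        titles.put(10, "Title A");
        List<int[]> edges = new ArrayList<>();
        edges.add(new int[] {1, 10, 5});
        // prints: [Title A]
        System.out.println(pick(edges, users, titles, "K-12 student", "F", 3, 5));
    }
}
```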
6. REQUIREMENT 6 (1 point)
Students implement the method:
public ArrayList<String> findMoviesByOccupationAndLessThanRating(String occupation,
int k, int rating)
to return a list containing up to k titles of movies rated lower than the rating parameter by
viewers with the given occupation. The results must be arranged alphabetically.
Students implement the proposed method, compile the Main.java file, and execute the command java
Main 6 to write the result to the Req6.txt file in the output directory. Students should verify the
correctness of their output by comparing it with the sample output in the Req6.txt file in the
expected_output folder.
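One way to sketch this query is to collect the ids of matching viewers into a set first, so each edge is then checked with a constant-time lookup. The names below are hypothetical, edges are int arrays {viewer id, movie id, stars}, and plain maps stand in for User and Movie objects. The sketch assumes distinct titles and that sorting happens before the list is cut to k entries; the occupation values in main are placeholders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class Req6Sketch {
    // Returns up to k sorted, distinct titles rated below `rating` stars
    // by viewers whose occupation equals `occupation`.
    static List<String> titlesRatedBelow(List<int[]> edges, Map<Integer, String> occupations,
                                         Map<Integer, String> titles,
                                         String occupation, int k, int rating) {
        Set<Integer> matchingUsers = new HashSet<>();
        for (Map.Entry<Integer, String> u : occupations.entrySet()) {
            if (u.getValue().equals(occupation)) {
                matchingUsers.add(u.getKey());
            }
        }
        TreeSet<String> sorted = new TreeSet<>();
        for (int[] e : edges) {
            if (e[2] < rating && matchingUsers.contains(e[0]) && titles.containsKey(e[1])) {
                sorted.add(titles.get(e[1]));
            }
        }
        List<String> result = new ArrayList<>();
        for (String title : sorted) {
            if (result.size() >= k) {
                break;
            }
            result.add(title);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> occupations = new HashMap<>();
        occupations.put(1, "artist");
        Map<Integer, String> titles = new HashMap<>();
        titles.put(10, "Title A");
        List<int[]> edges = new ArrayList<>();
        edges.add(new int[] {1, 10, 2});
        // prints: [Title A]
        System.out.println(titlesRatedBelow(edges, occupations, titles, "artist", 3, 3));
    }
}
```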
7. REQUIREMENT 7 (1 point)
Students implement the method:
public ArrayList<String> findMoviesMatchLatestMovieOf(int userId, int rating, int k)
to return a list containing up to k titles of movies that are rated greater than or equal to the
rating parameter by viewers of the same gender as the viewer with the given userId, and that share
at least 01 genre with the latest movie reviewed by that viewer (the “latest” time is determined by
the timestamp attribute) whose review score is greater than or equal to the rating parameter.
Students implement the proposed method, compile the Main.java file, and execute the command java
Main 7 to write the result to the Req7.txt file in the output directory. Students should verify the
correctness of their output by comparing it with the sample output in the Req7.txt file in the
expected_output folder.
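Two building blocks of this query can be sketched separately: finding the viewer's latest qualifying rating by timestamp, and checking whether two movies share a genre. The names are hypothetical, rows are long arrays {viewer id, movie id, stars, timestamp}, and genres are plain lists of strings. The sketch assumes "latest" means the largest timestamp among the viewer's ratings with stars greater than or equal to the rating parameter; the remaining steps (gender filter, limit of k, ordering) are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class Req7Sketch {
    // Returns the movie id of the most recent rating by userId with stars >= rating,
    // or -1 when the viewer has no such rating.
    static long latestMovieOf(List<long[]> rows, long userId, long rating) {
        long bestMovie = -1;
        long bestTime = Long.MIN_VALUE;
        for (long[] r : rows) {
            if (r[0] == userId && r[2] >= rating && r[3] > bestTime) {
                bestTime = r[3];
                bestMovie = r[1];
            }
        }
        return bestMovie;
    }

    // True when the two genre lists have at least one genre in common.
    static boolean sharesGenre(List<String> a, List<String> b) {
        return !Collections.disjoint(a, b);
    }

    public static void main(String[] args) {
        List<long[]> rows = new ArrayList<>();
        rows.add(new long[] {1, 10, 5, 100});
        rows.add(new long[] {1, 20, 2, 300});
        rows.add(new long[] {1, 30, 4, 200});
        // prints: 30
        System.out.println(latestMovieOf(rows, 1, 4));
        // prints: true
        System.out.println(sharesGenre(Arrays.asList("Animation", "Comedy"),
                                       Arrays.asList("Comedy", "Drama")));
    }
}
```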