Problem Statement
1 Introduction
Hello, students! Plagiarism is a rampant issue that plagues academia. Plagiarism is the academic malpractice
of submitting non-original work (copied from other sources or from another paper) as one’s own. This also
applies to homework essays, documents, code submissions, and projects, not just research papers.
As part of this project, you are required to implement a plagiarism checker for code files. The project is split
into three phases: the first, where you design a checker that compares two code files at a time (accuracy
matters); the second, where you design a bulk-scale checker (efficiency matters); and the third, which is
somewhat of a hacking phase.
To avoid complicating things for this project, we have already written a parser that generates
a stream of integer tokens from a C++ file; this stream is what your functions receive. Also, we do not expect
you to check for citations; just evaluate the works as they are and flag them whenever necessary.
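To make the token-stream idea concrete, here is a minimal sketch, assuming each parsed file reaches your code
as a std::vector<int> (the real interface is defined by the provided project files, not by this example). The
helper name longest_common_token_run is hypothetical. It finds the longest run of tokens that appears
contiguously in both streams, which is only one possible ingredient of a phase one checker, not a complete one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the provided project code): returns the
// length of the longest run of tokens that appears contiguously in both
// streams, using the classic O(n * m) dynamic programme with two rows.
std::size_t longest_common_token_run(const std::vector<int>& first_tokens,
                                     const std::vector<int>& second_tokens) {
    std::vector<std::size_t> previous_row(second_tokens.size() + 1, 0);
    std::vector<std::size_t> current_row(second_tokens.size() + 1, 0);
    std::size_t best_length = 0;

    for (std::size_t i = 1; i <= first_tokens.size(); ++i) {
        for (std::size_t j = 1; j <= second_tokens.size(); ++j) {
            if (first_tokens[i - 1] == second_tokens[j - 1]) {
                // Extend the common run that ended at the previous token pair.
                current_row[j] = previous_row[j - 1] + 1;
                best_length = std::max(best_length, current_row[j]);
            } else {
                current_row[j] = 0;
            }
        }
        std::swap(previous_row, current_row);
    }
    return best_length;
}
```

Because the comparison works on integer tokens rather than raw text, whatever the parser normalises away
before tokenising does not affect the result; exactly what it normalises depends on the provided parser.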
CS293: Data Structures and Algorithms Lab Project Plagiarism Checker
by the caller) should just execute the add submission method and return immediately to the caller(s); the
evaluation and flagging should be done by the other thread(s), which should be hidden from outside the object.
Of course, you could also use condition variables or semaphores – any synchronization method that works is
perfectly acceptable – just make sure that there are no deadlocks or data races and that it works as expected!
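One way to get this behaviour is a hidden worker thread fed by a queue. The following is a minimal sketch under
stated assumptions, not the required design: the class name background_checker_sketch, the method name
add_submission, the use of an int as a submission identifier, and the evaluate callback are all hypothetical
stand-ins for whatever the provided header actually declares. It uses a mutex and a condition variable from the
C++ standard library.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical sketch: submissions are queued by the caller and evaluated
// on a private worker thread that is invisible from outside the object.
class background_checker_sketch {
public:
    explicit background_checker_sketch(std::function<void(int)> evaluate)
        : evaluate_(std::move(evaluate)),
          worker_(&background_checker_sketch::process_pending, this) {}

    // Lets the worker drain the queue, then joins it.
    ~background_checker_sketch() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_worker_.notify_one();
        worker_.join();
    }

    // Only enqueues and returns; the caller never waits for evaluation.
    void add_submission(int submission_id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(submission_id);
        }
        wake_worker_.notify_one();
    }

private:
    void process_pending() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (true) {
            wake_worker_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) {
                return;  // Woken only because we are stopping; nothing left to do.
            }
            int submission_id = pending_.front();
            pending_.pop();
            lock.unlock();  // Do not hold the lock during the slow evaluation.
            evaluate_(submission_id);
            lock.lock();
        }
    }

    std::function<void(int)> evaluate_;
    std::mutex mutex_;
    std::condition_variable wake_worker_;
    std::queue<int> pending_;
    bool stopping_ = false;
    std::thread worker_;  // Declared last so every other member exists before it starts.
};
```

The predicate passed to wait guards against spurious wake-ups, and releasing the lock around the evaluation
keeps add_submission from blocking while a check is in progress. On some platforms you may need to compile
with -pthread.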
BONUS: Can you implement a working, accurate plagiarism checker that is both efficient (not a naive
one-by-one check against all paragraphs) and able to match patterns with small modifications, as in phase one?
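As a starting point for thinking about the efficiency half of this bonus, here is a minimal sketch of k-gram
fingerprinting with an inverted index. This is a generic technique swapped in for illustration, not the
expected solution. The names kgram_hashes and fingerprint_index_sketch, the hash base, and the use of int
submission identifiers are all assumptions. Each token window is hashed once, so a new submission is compared
against all earlier ones through hash lookups instead of pairwise scans.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hashes every window of `window_length` consecutive tokens with a polynomial
// rolling hash. Arithmetic wraps modulo 2^64; the base is an arbitrary choice.
std::unordered_set<std::uint64_t> kgram_hashes(const std::vector<int>& tokens,
                                               std::size_t window_length) {
    std::unordered_set<std::uint64_t> hashes;
    if (window_length == 0 || tokens.size() < window_length) {
        return hashes;
    }
    const std::uint64_t base = 1000003;
    std::uint64_t highest_power = 1;  // base^(window_length - 1)
    for (std::size_t i = 1; i < window_length; ++i) {
        highest_power *= base;
    }
    std::uint64_t hash = 0;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i >= window_length) {
            // Drop the token that just slid out of the window.
            hash -= static_cast<std::uint64_t>(tokens[i - window_length]) * highest_power;
        }
        hash = hash * base + static_cast<std::uint64_t>(tokens[i]);
        if (i + 1 >= window_length) {
            hashes.insert(hash);
        }
    }
    return hashes;
}

// Hypothetical index from k-gram hash to the submissions that contain it.
class fingerprint_index_sketch {
public:
    explicit fingerprint_index_sketch(std::size_t window_length)
        : window_length_(window_length) {}

    // Returns, for each earlier submission, how many distinct k-gram hashes it
    // shares with `tokens`, then records `tokens` under `submission_id`.
    std::unordered_map<int, std::size_t> add_and_match(int submission_id,
                                                       const std::vector<int>& tokens) {
        std::unordered_map<int, std::size_t> shared_counts;
        for (std::uint64_t hash : kgram_hashes(tokens, window_length_)) {
            std::vector<int>& owners = owners_by_hash_[hash];
            for (int owner : owners) {
                ++shared_counts[owner];
            }
            owners.push_back(submission_id);
        }
        return shared_counts;
    }

private:
    std::size_t window_length_;
    std::unordered_map<std::uint64_t, std::vector<int>> owners_by_hash_;
};
```

A small edit only disturbs the windows that overlap it, so the remaining windows still match; that gives some
tolerance to modifications, though not the same guarantees as an approximate matcher. Hash collisions can
produce false matches, so a real checker would confirm candidate matches against the actual tokens.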
BONUS: Heard of the famous MOSS plagiarism checker? Can you bypass that as well? Submit your hacks
for MOSS in the same format as above with the label "MOSS" if you can get a low similarity score (below 20%)
for plagiarised code files.
5 General Instructions
Below are some things you should keep in mind while working on the first two phases.
5.3 Evaluation
The exact rubrics will not be shared until after the submission deadline. However, you should be aware
that part of the grading will be based on an autograder and test cases; the other part will be manual grading.
Ensure that your code compiles on both g++ and clang++ compilers and on multiple platforms the way it is
given (you should not modify any files other than the three files you are expected to submit). Otherwise, we
cannot guarantee any marks for the testcases component.
Moving on to the manual grading part, we will read your code and evaluate it based on its algorithm and
performance. Note that writing obfuscated code or simply code that is hard to read is not only difficult for us
but also difficult for your partner (heck, even yourself!) to understand or debug; you will be penalized if
your code is not readable. Consider the following:
1. Refactor 100+ line monsters into several small functions; each one should do one simple task or call other
functions. Your functions should average around 30 lines of code each.
2. Do not indent/nest your code more than 4 levels deep. This limits the complexity of each function and
the cognitive load on the person reading it.
3. Each line of code should not be more than 100 characters long (including indents). Scrolling horizontally
to read long lines on a laptop – not reader-friendly!
4. Any class, object, member, namespace, function, or even variable (anything that lives longer than
one single function) should have a name descriptive enough to make its purpose obvious from anywhere in
the code, particularly wherever it is used. Do not abuse abbreviations.
5. Do not add comments for every line. Provided you follow points 1-4, most of your code should be
easy to understand. Comments at the top of the file should provide a 10,000-foot view of the purpose,
and comments near functions or object declarations should explain the need for them in relation to the
big picture, whenever it is not obvious.
Try to follow these guidelines, and you should be good to go!
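To illustrate the guidelines above, here is a small sketch of a decision split into short, descriptively named
functions with shallow nesting. Everything in it is hypothetical and chosen only to show the style: the
match_summary struct, its fields, and the thresholds (a run of 75 tokens, half of the shorter submission) are
not part of the project specification.

```cpp
#include <cstddef>

// Hypothetical summary of how much two submissions have in common.
struct match_summary {
    std::size_t longest_shared_run;
    std::size_t total_shared_tokens;
    std::size_t shorter_submission_length;
};

// Each helper answers exactly one question, so its name documents it.
bool has_long_shared_run(const match_summary& summary) {
    const std::size_t minimum_suspicious_run = 75;  // Illustrative threshold only.
    return summary.longest_shared_run >= minimum_suspicious_run;
}

bool shares_large_fraction(const match_summary& summary) {
    if (summary.shorter_submission_length == 0) {
        return false;  // Early return instead of wrapping the rest in an else branch.
    }
    // Integer form of: total_shared_tokens / shorter_submission_length >= 0.5
    return 2 * summary.total_shared_tokens >= summary.shorter_submission_length;
}

// The top-level decision reads almost like the rule it implements.
bool should_flag(const match_summary& summary) {
    return has_long_shared_run(summary) || shares_large_fraction(summary);
}
```

The same logic written as one function with nested if statements would work identically; splitting it this way
keeps each piece small enough to test and review on its own.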
6 Submission Instructions
6.1 Phase one
You will have to submit a file <rollno1>_<rollno2>_CS293_phase1.tar.gz on Moodle. If your team has three
members, then submit <rollno1>_<rollno2>_<rollno3>_CS293_phase1.tar.gz instead. The deadline is Sunday,
November 3rd, EOD. On executing tar -xvf <submission>.tar.gz, it should create a directory with an
identical name (minus the .tar.gz extension). Inside the directory should be one file: match_submissions.hpp.