Plagiarism Checker

CS293: Data Structures and Algorithms Lab Project

October - November 2024

1 Introduction
Hello, students! Plagiarism is a rampant issue that plagues academia. Plagiarism is the academic malpractice
of submitting non-original work (copied from other sources or from another paper) as one’s own. This also
applies to homework essays, documents, or code submissions, as well as projects, not just research papers.
As part of this project, you are required to implement a plagiarism checker that checks code files. This is split
into three phases, one where you design a checker for two codes at a time (accuracy matters), the second where
a bulk-scale checker is designed (efficiency matters) and the third which is somewhat of a hacking phase.
In order to avoid complicating things for this project, we have already written a parsing code that generates
a stream of integer tokens from a C++ file which is given to your functions. Also, we do not expect you to
check for citations; just evaluate the works as such and flag them whenever necessary.

1.1 Types of Plagiarism

Plagiarism occurs in several different forms. Be aware of them and do not commit any of them!
1. Global plagiarism: copying the entire or most of the work of someone else and claiming it as your own.
This is outright cheating and typically attracts the most grave penalties.
2. Direct plagiarism: also known as verbatim plagiarism, this involves copying paragraphs or portions
of someone else’s work. For code files, this involves copying functions, classes or other parts of the body.
Even though some part of the submission is original, this is still plagiarism unless citations are provided.
3. Paraphrasing plagiarism: this is very similar to the above type, except that some words or phrases are
changed, and no reference or attribution to the original is provided. This is still plagiarism. The coding
analogue would be changing variable names and slight order modifications, but the tokens after parsing
do not get changed significantly.
4. Self-plagiarism: did you know that copying another of one’s own work and submitting it afresh as
though it is a new work, without proper attribution, is also considered plagiarism? Yes, it is equally
offensive as other types, the reason being that the number of papers or submissions made does not reflect
the amount of content originally created by the same person.
5. Patchwork plagiarism: also known as mosaic plagiarism, this is harder to detect, but this is a form
of plagiarism wherein content or code copied from multiple sources are interwoven with original content
and no citations are provided.
In this project, you will be creating robust plagiarism checkers that can detect some (or all) of the above types
of plagiarism. The checkers will be tested against a variety of testcases, including some that are specially
designed to bypass plagiarism checkers.

1.2 Project Structure

This project is split into three phases where you’ll design, scale and improve your plagiarism checker. Make
sure to read this document completely and carefully before starting the project.
The phases are designed to be somewhat independent (especially Phase 3) and the submission deadlines as well
as the evaluation criteria for each phase will be separate.
Helper code is also provided for Phases 1 and 2 and files to be modified are explicitly mentioned in the
instructions of those phases. Some general instructions (including commands to set up the environment) are
also provided at the end of this document.


2 Phase One: Barebones Checking of Two Submissions

You will be provided two submissions, both of which are vectors of integers. Each integer is a token.
Consequently, both submissions are ordered collections of tokens that form the respective code after being parsed.
Do not worry about the demarcations of functions, comments, and code semantics; just look for similar patterns!

2.1 What kind of matching patterns should you detect?

Firstly, a pattern is a series of numbers in a particular order. Of course, the order matters as code without
order (unless you’re talking pure functional programming – even then, the order of tokens matters) is not of
any use. You need to look for matching subsequences between the two sequences of tokens.
Of course, there is no point looking for sequences of length 1 since there are only a few individual tokens, and
there are bound to be a lot of matches. The same holds for matches of very short length. For short patterns
(10-20 tokens each), you need to look for accurate matches – all tokens are to be identical in the same order.
For larger patterns (above 30 tokens each), look for approximate matches: you are to consider two patterns as
matches if they share a common subsequence whose length is at least 80% of the length of the longer of the two.
This automatically implies that the patterns that are matched should roughly be of the same length.
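The 80% rule above can be sketched with a longest-common-subsequence computation. This is only a minimal illustration: the function names are hypothetical, and it compares two already-chosen patterns rather than searching for them inside the full token vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Length of the longest common subsequence of two token patterns,
// computed with the classic O(n*m) dynamic programme using two rows.
std::size_t longest_common_subsequence(const std::vector<int>& first,
                                       const std::vector<int>& second) {
    std::vector<std::size_t> previous(second.size() + 1, 0);
    std::vector<std::size_t> current(second.size() + 1, 0);
    for (std::size_t i = 1; i <= first.size(); ++i) {
        for (std::size_t j = 1; j <= second.size(); ++j) {
            if (first[i - 1] == second[j - 1]) {
                current[j] = previous[j - 1] + 1;
            } else {
                current[j] = std::max(previous[j], current[j - 1]);
            }
        }
        std::swap(previous, current);  // the finished row becomes "previous"
    }
    return previous[second.size()];
}

// Hypothetical helper: true when the common subsequence covers at least
// 80% of the longer of the two patterns.
bool is_approximate_match(const std::vector<int>& first,
                          const std::vector<int>& second) {
    const std::size_t longer = std::max(first.size(), second.size());
    if (longer == 0) {
        return false;
    }
    const std::size_t common = longest_common_subsequence(first, second);
    assert(common <= longer);  // a subsequence cannot outgrow its source
    // Integer form of common / longer >= 0.8, avoiding floating point.
    return 5 * common >= 4 * longer;
}
```

Running this on every candidate pair of windows would be far too slow for submissions of a few thousand tokens, so treat it as the acceptance test for a candidate found by some cheaper means.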

2.2 Alright, what should be reported?

You are to report five values in order, which give the caller an idea of the degree of match between the two
code files and the most significant match.
• The zeroth return value is a flag that is 1 if the two submissions are flagged as plagiarised, and 0 otherwise
• The first value should be the total length (number of tokens) of all pattern matches detected, of lengths
around 10-20. Longer pattern matches will count as multiple pattern matches, but that doesn’t matter
since you report the total length.
• The second value should be the length of the longest approximate pattern match that you were able to
detect. Here, we recommend you look only at long pattern matches (30 or higher tokens); return zero
otherwise.
• The third and fourth values are the start indices of the pattern you found above in the first and second
files, respectively (start index of the pattern in either vector of tokens).
Ensure that in the first value, you do not double-count patterns, i.e., all patterns caught should be present as
accurate matches in both files, and no two of them should overlap in either file. This not only takes care of
direct/global plagiarism but also of patchwork plagiarism – when the offender copies lots of sections from the
other work and intertwines them with original content.
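One possible way to honour the no-double-counting rule is a greedy scan, sketched below. The function name and the greedy strategy are assumptions made for illustration; the handout does not prescribe an algorithm, and this brute-force version is meant as a readable baseline rather than an efficient solution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Total length of exact matches of at least min_length tokens, chosen
// greedily so that no token of either submission is counted twice.
int total_exact_match_length(const std::vector<int>& first,
                             const std::vector<int>& second,
                             std::size_t min_length) {
    assert(min_length > 0);
    std::vector<bool> used_in_second(second.size(), false);
    int total = 0;
    std::size_t i = 0;
    while (i < first.size()) {
        // Find the longest run starting at first[i] that is still free in second.
        std::size_t best_length = 0;
        std::size_t best_start = 0;
        for (std::size_t j = 0; j < second.size(); ++j) {
            std::size_t length = 0;
            while (i + length < first.size() && j + length < second.size() &&
                   !used_in_second[j + length] &&
                   first[i + length] == second[j + length]) {
                ++length;
            }
            if (length > best_length) {
                best_length = length;
                best_start = j;
            }
        }
        if (best_length >= min_length) {
            for (std::size_t k = 0; k < best_length; ++k) {
                used_in_second[best_start + k] = true;  // reserve in second file
            }
            total += static_cast<int>(best_length);
            i += best_length;  // skip the matched run in the first file
        } else {
            ++i;
        }
    }
    return total;
}
```

Because matched runs are skipped in the first file and reserved in the second, interleaved copied sections (the patchwork case) each contribute once to the total.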
Looking for approximate matches as well (for the longest match) ensures that additions/modifications of a few
statements are still caught (variants of paraphrasing plagiarism), and the very fact that you look at tokens and
not code files directly ensures that merely changing of variable names does not evade your detector.
For the flag (zeroth value), try to be as accurate as possible. Ensure that your code correctly identifies pairs of
files that are plagiarized and those that are not. You are free to decide the threshold for the number of short
and approximate pattern matches that should be caught for a pair of files to be flagged as plagiarised, but
ensure that it is reasonable and avoids too many false positives and negatives. This is important for phase 3.

2.3 Your task

You are supposed to modify and submit only the file match_submissions.hpp, implementing the provided
method match_submissions, and return an std::array<int, 5> containing whatever is mentioned
above. Do not write print, read/write file or log statements.
Write code as efficiently as you can since some submissions (a few thousand tokens) will take a long time. It is
okay if you miss some small pattern matches or if values slightly differ from expected. Grading will be relatively
lenient, and you will not be penalized for small differences in the length or start indices of the longest pattern
match detected, provided these are approximately around the expected values.
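As a rough illustration of the five-value layout, the sketch below packs the quantities from Section 2.2 into the returned array. The struct, the function name and both thresholds are placeholders invented for this example (the handout leaves the flagging threshold to you), and the real signature of match_submissions is the one in the provided header, not this one.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical intermediate result gathered by the matching routines.
struct match_summary_t {
    int total_exact_length = 0;          // value 1: total of short exact matches
    int longest_approximate_length = 0;  // value 2: longest approximate match
    int start_in_first = 0;              // value 3: its start in the first file
    int start_in_second = 0;             // value 4: its start in the second file
};

// Packs the summary into {flag, total, longest, start1, start2}.
std::array<int, 5> pack_result(const match_summary_t& summary,
                               int first_size, int second_size) {
    assert(first_size >= 0 && second_size >= 0);
    const int smaller = std::min(first_size, second_size);
    // Approximate matches shorter than 30 tokens are reported as zero.
    const int longest = summary.longest_approximate_length >= 30
                            ? summary.longest_approximate_length
                            : 0;
    // Placeholder thresholds (assumptions to be tuned on testcases): flag when
    // exact matches cover 40% of the smaller file, or when the longest
    // approximate match covers half of it.
    const bool flagged =
        smaller > 0 && (10 * summary.total_exact_length >= 4 * smaller ||
                        2 * longest >= smaller);
    return {flagged ? 1 : 0, summary.total_exact_length, longest,
            summary.start_in_first, summary.start_in_second};
}
```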


3 Phase Two: A Full-Fledged Plagiarism Checker

Alright, now comes the challenging part! The method of pairwise checking of submission files is not scalable to
real-life inputs, particularly when a large number of students submit codes in real-time and expect results for
plagiarism checks as soon as possible. We need something more efficient.

3.1 The Input and Output API

Real-life codebases utilize modularity and object-oriented programming. In other words, rather than simply
passing around pointers, arrays, and strings to loose global functions, there are multiple objects that interact
with each other using special methods.
The methods that an object provides for other objects to use (to write to or overwrite it, to get something done
by it, or simply to read data from it) are known as its APIs. They describe how a given object works (common to all
objects that are instances of a particular class), encapsulate and hide the internal working from other objects
without hampering their functionality. How the object is implemented internally is abstracted out, but other
objects can assume the correctness of all the APIs, from creation to all methods to destruction.
In this case, there are four classes of objects: one for students, one for professors, one for your checker, and
one for submissions (this is technically a plain old datatype, or just an aggregation of multiple objects – the
code file name, a pointer to the student who submitted and a pointer to the professor).
Note: struct submission_t has the code file as a string (representing its file name), and not the stream of
tokens as in phase one. This means you will have to instantiate a tokenizer_t object and call the get_tokens()
method on it to get the tokens.
You need not worry about the working of the classes student_t and professor_t, except for the fact (given)
that they provide a method (no return value) called flag_student or flag_professor. This is to alert them
of a plagiarised submission (pass the submission pointer as an argument).
You also need not worry about the main method, the way professor/student objects are instantiated, the way
they prepare submissions, and what they do if they are alerted for plagiarism. Nor should you worry about
detailed reports. Do not print log statements; just accept inputs and call the flag API when needed.

3.2 Your class plagiarism_checker_t

The APIs provided by your class include its constructor with a set of submissions (again, tokenized; you get
tokens as integers). This emulates the real world, where past submission code files are loaded first so that
the plagiarism checker is ready to check new ones against those as well.
There is then the method add_submission that provides a pointer to the submission. Remember, this is how
other objects (students and professors) interact with your checker, so this should be intact. They will certainly
get irritated if they are to wait for hours just to add a submission and get it checked. So, you must use parallel
processing/multithreading, with all the heavy lifting occurring in the background.
Further, each submission added has a timestamp. That is not provided as an argument explicitly, but you
are expected to use a std::chrono::time_point to store the time at which the submission is provided.
This is essential since a submission that copies from a very old submission should be flagged while the older
one should not. We recommend that you record the timestamp before tokenizing the file, to avoid delays while
parsing the file; this also ensures that the timestamp is as close to the actual submission time as possible.
Speaking of being flagged, you need to use the APIs of student and professor classes. Just like we want this
asynchronous behavior, you can assume that our implementations of the student and professor class will respond
immediately on being flagged and will do whatever they do after that.
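The advice about stamping before tokenizing can be sketched as follows. Everything here is hypothetical scaffolding: stored_submission_t and receive_submission are made-up names, and the tokenize parameter is a stand-in for constructing the provided tokenizer and calling its token getter, whose exact interface is defined in the helper code.

```cpp
#include <chrono>
#include <string>
#include <vector>

// steady_clock never jumps backwards, which keeps timestamp differences sane.
using submission_clock = std::chrono::steady_clock;

// Hypothetical record kept by the checker for each submission.
struct stored_submission_t {
    submission_clock::time_point received_at;
    std::vector<int> tokens;
};

// Records the arrival time first and only then runs the slow tokenizing
// step, so that parsing delays do not distort the timestamp.
template <typename Tokenize>
stored_submission_t receive_submission(const std::string& file_name,
                                       Tokenize tokenize) {
    stored_submission_t stored;
    stored.received_at = submission_clock::now();  // stamp before parsing
    stored.tokens = tokenize(file_name);           // potentially slow
    return stored;
}
```

In the asynchronous design, the stamp would be taken on the caller's thread inside add_submission, while the tokenizing step runs later on a background thread.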

3.3 Tips regarding the asynchronous part

You are expected to do multithreading and asynchronous programming for this phase, something you have not
learnt officially (and will in the Operating Systems course next semester). So, we provide you some tips
to achieve the same. C++ offers std::thread to help you out, and threads can be stored as class members or
variables as well (but cannot be copied).
In order to synchronize the threads, you are encouraged to use a std::mutex, commonly used around a shared
object (shared among the threads of your class, invisible to students and professors). The main thread (executed
by the caller) should just execute the add_submission method and return immediately to the caller(s); the
evaluation and flagging should be done by the other thread(s), which should be hidden from outside the object.
Of course, you could also use condition variables or semaphores – any synchronization method that works is
perfectly acceptable – just make sure that there are no deadlocks or data races and that it works as expected!
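A minimal sketch of the pattern described in these tips is shown below: one background thread, a mutex-protected job queue and a condition variable. The class name background_worker_t is hypothetical, and a real checker would enqueue "evaluate this submission" jobs from add_submission; this is one workable arrangement, not the required design.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Runs queued jobs on a single background thread, in submission order.
class background_worker_t {
public:
    background_worker_t() : worker_([this] { run(); }) {}

    // Finishes every queued job, then stops and joins the thread.
    ~background_worker_t() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_up_.notify_one();
        worker_.join();
    }

    // Called by the main thread; returns immediately after queueing.
    void enqueue(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_up_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_up_.wait(lock,
                              [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) {
                    return;  // asked to stop and nothing left to do
                }
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so enqueue() never waits on a job
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_up_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last: the members above exist before it starts
};
```

The lock is held only while touching the queue, which is what keeps the caller's side fast and avoids holding a mutex during the expensive matching work.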

3.4 How do you check a submission for plagiarism?

Firstly, you need to tokenize the code file (of the submission object) into a stream of tokens. We have the method
implemented for you, but you need to create an instance of class tokenizer_t with that file, call get_tokens()
on it and then analyze the tokens to report plagiarism if either of the following holds:
• There is an exact match of at least one pattern of length around 75 or more.
• The number of pattern matches caught between them is 10 or more.
Even if a submission is caught for plagiarism, it should be stored in the database since other files that copy
from it are still plagiarised submissions (they are definitely not original).
• If the timestamps of insertion of the submissions differ by one second or more, then
only the later submission is to be flagged as plagiarised. Call both of the corresponding flag_student and
flag_professor methods to do this. In case any of the student or professor pointers is null (student
submitted alone, or professor submitted alone), then only the objects pointed to by non-null pointers are
to be flagged.
• If the timestamps differ by less than one second, then both submissions are to be flagged (both students
and both professors, except null pointers) in a similar manner.
• The original submissions passed in the constructor are not to be flagged, even if plagiarism is detected
among those. Timestamps for those are treated as zero (far behind timestamps for real-time submissions).
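The one-second rule in the bullets above can be written as a small decision helper. The names below are hypothetical, the sketch assumes plagiarism between the pair has already been detected, and it leaves out the null-pointer checks and the special case of submissions passed to the constructor.

```cpp
#include <cassert>
#include <chrono>

using time_point_t = std::chrono::steady_clock::time_point;

// Which of the two matching submissions should be flagged.
struct flag_targets_t {
    bool flag_earlier;
    bool flag_later;
};

// The later submission is always flagged; the earlier one is flagged too
// only when the two arrived less than one second apart.
flag_targets_t choose_flag_targets(time_point_t earlier, time_point_t later) {
    assert(earlier <= later);
    const bool within_one_second =
        (later - earlier) < std::chrono::seconds(1);
    return {within_one_second, true};
}
```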
In addition, you are also supposed to check for patchwork plagiarism across multiple sources: if there are
20 or more pattern matches in total between the work under evaluation and all the previous works (by timestamp),
each with a corresponding (distinct) pattern in one of those works, then it is to be flagged as plagiarised in a
similar manner.
None of the older ones are to be flagged, though. Once again, the master set of all sentences is taken across all
submissions that are added up to one second after the submission currently being evaluated.

3.5 Your task

As mentioned above, you are to implement the class plagiarism_checker_t and are supposed to modify exactly
two files: plagiarism_checker.hpp and plagiarism_checker.cpp. Details are clearly mentioned above.
The former is the header file, where you can add your own sub-routines, class member definitions, and helper
objects, provided you do not modify the public interface of your class. The latter is the file where you implement
the methods (public and protected) of your class. Do not add function bodies in header files, since it will cause
linker errors; just write signatures.
For phase two, in order to make detection of matches and storing or finding sentences for matches easier, two
patterns are considered matches if their length is 15 or more and if they are exact matches.
Also, you can ignore approximate matches with a few tokens here and there that are different. As evident, the
check becomes more lenient since you end up ditching checks for a few minor changes like an extra statement or
changed variable names (made by dishonest students who try to evade checkers) for the sake of speed. Accuracy
vs speed is always a trade-off in life, but here, focus on efficiency and ensure that all other checks are accurate.
Efficiency is paramount, even more so than in phase one. It is okay if you miss some matches in corner cases
or look for slightly smaller match lengths or slightly larger lengths as well; testcases will not be very strict
in checking whether your code exactly complies with the above guidelines all the time. Your code should not
crash or result in undefined behavior; testing will be strict in that sense, though.
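One common way to find exact matches of a fixed length quickly is to hash every window of that length and look the hashes up in a set. The sketch below uses a polynomial rolling hash; the function names and the hash base are assumptions, and because different windows can in principle share a hash, a careful implementation would confirm each hit by comparing the tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Polynomial rolling hash of every window of window_length tokens.
// Arithmetic wraps modulo 2^64, which is well defined for unsigned types.
std::vector<std::uint64_t> window_hashes(const std::vector<int>& tokens,
                                         std::size_t window_length) {
    assert(window_length > 0);
    std::vector<std::uint64_t> hashes;
    if (tokens.size() < window_length) {
        return hashes;
    }
    const std::uint64_t base = 1000003ULL;
    std::uint64_t top_power = 1;  // base^(window_length - 1)
    for (std::size_t k = 1; k < window_length; ++k) {
        top_power *= base;
    }
    std::uint64_t hash = 0;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i >= window_length) {
            // Drop the token that just slid out of the window.
            hash -= static_cast<std::uint64_t>(tokens[i - window_length]) *
                    top_power;
        }
        hash = hash * base + static_cast<std::uint64_t>(tokens[i]);
        if (i + 1 >= window_length) {
            hashes.push_back(hash);
        }
    }
    return hashes;
}

// Counts non-overlapping windows of the candidate whose hash is indexed.
int count_matching_windows(const std::unordered_set<std::uint64_t>& indexed,
                           const std::vector<int>& candidate,
                           std::size_t window_length) {
    const std::vector<std::uint64_t> hashes =
        window_hashes(candidate, window_length);
    int matches = 0;
    std::size_t i = 0;
    while (i < hashes.size()) {
        if (indexed.count(hashes[i]) > 0) {
            ++matches;
            i += window_length;  // skip ahead so matches do not overlap
        } else {
            ++i;
        }
    }
    return matches;
}
```

With the hashes of all stored submissions kept in one set (or a map from hash to submission), a new submission is checked in time roughly proportional to its own length rather than to the size of the whole database.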

BONUS: Can you implement a working, accurate plagiarism checker that is both efficient (not a naive one-by-one
check against all paragraphs) and matches patterns with small modifications, like in phase one?


4 Phase Three: Gotta Hack ’em all!

Alright, now that you have implemented a (hopefully working) plagiarism checker, it is time to break some!
In this phase, you will be provided with the source codes of some of the plagiarism checkers implemented by
your peers in the first phase. Your task is to find ways to bypass these checkers, i.e., submit pairs of code files
that are incorrectly flagged by some of the checkers.

4.1 What counts as a hack?

There are two types of incorrect results a plagiarism checker can give:
• False Positives: When the checker flags a pair of code files as plagiarised, but they are not.
• False Negatives: When the checker does not flag a pair of code files as plagiarised, but they are.
For this activity, you are required to find examples of such cases for the given checkers. Since we expect that
your checkers are somewhat robust, simple operations such as just renaming the variables or changing the order
of statements might not bypass most of the checkers. Likewise, two completely different files would probably
not trigger false positives.
This is where your creativity comes into play. You can also use the internet to find ways to bypass plagiarism
checkers and use those as an inspiration to create your own hacks specific to the provided checkers.
For simplicity, you can pick any problem from a common online judge platform (like Codeforces, LeetCode,
etc.) and create a pair of code files for an accepted solution to that problem.

4.2 What do we expect you to submit?

Just for this phase, the source codes of detectors (some phase one submissions) along with a Google Form
will be released at a later date. You will have to submit your hacks one-by-one through the Google Form.
For each hack, you will have to provide the following details:
• List of labels of the detectors you are trying to bypass
• Type of the hack - false positive or false negative
• Link to the problem statement - we will submit the code files on this link to check solutions’ correctness
• A pair of C++ files which trigger an incorrect result for the checker(s). These are the code files that will
be submitted on the platform for correctness checking.
• A brief, yet complete explanation of how you bypassed the checkers.
You can (and are encouraged to) submit multiple hacks for the same checker(s) provided they are unique and
not just slight modifications of the same hack. Hence it is important to provide a brief but clear explanation
of how you bypassed the checker(s).
We would be passing your submissions through more sophisticated plagiarism checkers to verify if your hacks
are actually what they claim. Hence, you cannot submit two completely different files and claim a false negative
(or vice versa). Only incorrect behavior of the checker(s) will be considered as a successful hack.

4.3 Scoring for Phase three

Accepted hacks for checkers that are bypassed by multiple teams will be awarded fewer points. However, unique
and creative hacks might be awarded bonus points. Hence it is advantageous to submit multiple hacks.
This phase is completely independent of the first two phases (apart from the provided checkers). You can work
on this phase even if you have not successfully completed the first two phases. No helper code or testcases will
be provided, since none are needed for this phase.
Note that the hacks you submit should be your own work. Please do not share your ideas with other teams.

BONUS: Heard of the famous MOSS plagiarism checker? Can you bypass that as well? Submit your hacks
for MOSS in the same format as above with the label "MOSS" if you can get a low similarity score (below 20%)
for plagiarised code files.


5 General Instructions
Below are some things you should have in mind while working on the first two phases.

5.1 Getting the libraries to work

We will be relying heavily on the famous clang compiler, as well as on the LLVM toolchain, for the parsing of
code files into streams of tokens. We (and the LLVM devs) have done the tokenizing, but you should be able
to run your code (which relies on that) on your machine.
• For Linux: Run this: sudo apt install clang libc++-dev libclang-dev. We highly recommend
you use the latest software, such as Ubuntu 24.04 or Fedora 40. If you happen to use trashy operating
systems like Windows and do not dual boot, use WSL 2 and do the same.
• For MacOS: Run brew install llvm && brew link --force --overwrite llvm. If you use an Intel
MacBook, change the (Makefile) include/library directories from /opt/homebrew/... to /usr/local/...
• There will be 2 versions of Makefiles provided, one for MacOS and one for Linux. Choose the appropriate
Makefile. There will not be any use of CMake, but you should be able to understand Makefiles. Run
commands make <target> -f macos.mak or make <target> -f linux.mak to compile your code.

5.2 What’s allowed and what’s not?

You are allowed unrestricted access to the C++ STL, no questions asked. However,
• For either phase, do not use C++20 modules or #include-s of non-STL files (even extra files you write).
Nor should you add custom Makefiles; that will break our autograder.
• You should not add the include bits/stdc++.h in ANY of your files. This is a terrible practice frowned
upon in the industry for the simple reason that many compilers, including the famous clang, do not
support that header. Include STL headers one by one; it is also faster in compile time.
• Do not add using namespace std;, especially on header files; the reason why this is considered a bad
practice is that it results in name clashes between STL and custom methods/libraries.
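As a small illustration of the last two points, a file can include only the headers it needs and qualify names with std:: explicitly. The function below is a made-up example, not part of the helper code.

```cpp
#include <algorithm>  // std::sort, std::unique
#include <vector>     // std::vector

// Returns the distinct tokens in ascending order. Every standard name is
// qualified with std::, so no using-directive is needed.
std::vector<int> sorted_unique_tokens(std::vector<int> tokens) {
    std::sort(tokens.begin(), tokens.end());
    tokens.erase(std::unique(tokens.begin(), tokens.end()), tokens.end());
    return tokens;
}
```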

5.3 Evaluation
The exact rubrics are not going to be shared until after the submission deadline. However, you should be aware
that part of the grading will be based on autograder and testcases; the other component being manual grading.
Ensure that your code compiles on both g++ and clang++ compilers and on multiple platforms the way it is
given (you should not modify any files other than the three files you are expected to submit). Otherwise, we
cannot guarantee any marks for the testcases component.
Moving on to the manual grading part, we will read your code and evaluate it based on its algorithm and
performance. Note that writing obfuscated code or simply code that is hard to read is not only difficult for us
but also difficult for your partner (heck, even yourself!) to understand or debug; you will be penalized if
your code is not readable. Consider the following:
1. Refactor 100+ line monsters into several small functions; each one should do one simple task or call other
functions. Your functions should average around 30 lines of code each.
2. Do not indent/nest your code more than 4 levels deep. This limits the complexity of the functions in
terms of the cognitive load on the human reading it, and the difficulty in understanding the same.
3. Each line of code should not be more than 100 characters long (including indents). Scrolling horizontally
to read long lines on a laptop – not reader-friendly!
4. Any class object, member, namespace, function, or even variables – anything that lives for longer than
one single function should have names verbose enough to make their purpose obvious from anywhere in
the code, particularly wherever it is used. Do not abuse abbreviations.
5. Do not add comments for every line. Provided you follow points 1 - 4, most of your code should be
obvious to understand. Comments at the top of the file should provide a 10000-foot view of the purpose,
and comments near functions or object declarations should explain the need for them in relation to the
big picture, whenever not obvious.
Try to follow these guidelines, and you should be good to go!


6 Submission Instructions
6.1 Phase one
You will have to submit a file <rollno1>_<rollno2>_CS293_phase1.tar.gz. If your team has three members,
then submit <rollno1>_<rollno2>_<rollno3>_CS293_phase1.tar.gz, on Moodle. The deadline is November
3rd, Sunday EOD. On executing tar -xvf <submission>.tar.gz, it should create a directory with an
identical name (minus the tar extension). Inside the directory should be one file – match_submissions.hpp.

6.2 Phase two

You will have to submit a file <rollno1>_<rollno2>[_<rollno3>]_CS293_phase2.tar.gz on Moodle, which on
opening creates a directory with an identical name, which should contain two files – plagiarism_checker.hpp
and plagiarism_checker.cpp. The deadline for this phase is November 25th, Monday EOD.

6.3 Phase three

The source codes required for phase three will be released by November 7th, Wednesday EOD. As mentioned
earlier, a Google form will be shared for the submissions. You are to submit exactly one hack per form entry.
There will be an option to submit the form multiple times. Any team member can submit the form. The form
will contain all fields which you are required to submit (including 2 file upload fields). The deadline for this
phase is November 27th, Wednesday EOD.

6.4 Late Submission Policy

Requests for deadline extensions will not be entertained. For each phase, there will be a three (3)-day late
submission window. If you submit up to one day after the deadline for any phase, your score will be 80% of
the score you would otherwise receive. This gets reduced to 70% and 60% for submissions up to two days and
up to three days late, respectively. After those three days, submissions will not be accepted.

6.5 Heads up!

Since the project is on plagiarism checkers, it is reasonable to expect that we run your code through strict
plagiarism checks. Should you choose to submit your work, it should be original. If any piece of code is lifted
from the web, a comment should be added near that block as a citation. Grading will be done appropriately.
Whatever happens, do not copy code from other teams. Remember, you are better off submitting buggy code
(manual grading marks) or even a blank submission (zero marks) than copying from others. This is because in
case you plagiarise while building the checker, you will be sent straight to the D-ADAC.
Just like in any project, remember to keep your code base secret since if another team copies your project, you
too get flagged; you definitely do not want to take risks when it comes to matters of academic integrity. If
you must use a GitHub repository, ensure that it is a private repository.
Any queries or clarifications should be posted on the public Piazza threads created for the project. We will
not be answering any queries via other mediums. If you have some very specific queries, you can post them in
a private Piazza thread visible to all instructors.

All the best!

From Everand
Data Structures and Algorithms with Python
Aadinath Pothuvaal
No ratings yet
Coursework PDF
No ratings yet
Coursework PDF
6 pages
Assignment 02
No ratings yet
Assignment 02
7 pages
Collection of Raspberry Pi Projects
From Everand
Collection of Raspberry Pi Projects
Guillermo Perez Guillen
5/5 (1)
Lab5 - Arduino Board
No ratings yet
Lab5 - Arduino Board
11 pages
Bash Profile
No ratings yet
Bash Profile
4 pages
Questionnaire For Scam Among Student
0% (1)
Questionnaire For Scam Among Student
2 pages
Admitcard PDF
No ratings yet
Admitcard PDF
1 page
Ws2000 Cli Guide
No ratings yet
Ws2000 Cli Guide
462 pages
Modul eXeLearning 2 PDF
No ratings yet
Modul eXeLearning 2 PDF
75 pages
Debugging The Development Process
No ratings yet
Debugging The Development Process
214 pages
B205/B209/D007/D008 Parts Catalog
No ratings yet
B205/B209/D007/D008 Parts Catalog
138 pages
EEE 332 - Feedback in Amplifiers
100% (1)
EEE 332 - Feedback in Amplifiers
20 pages
White Paper To Mok - FINAL-1
67% (3)
White Paper To Mok - FINAL-1
26 pages
SanDisk Product Catalog
No ratings yet
SanDisk Product Catalog
8 pages
ISI+L1 +Most+Important+questions+
No ratings yet
ISI+L1 +Most+Important+questions+
50 pages
Ruggedcom Products at A Glance
100% (1)
Ruggedcom Products at A Glance
11 pages
Resume - Jlk10a
No ratings yet
Resume - Jlk10a
5 pages
User Manual: For Model Rp208Cn
No ratings yet
User Manual: For Model Rp208Cn
20 pages
SAP SD - SAP ERP Central Component 6.0, Enhancement Package 6 Order Fulfillment I - TSCM60 - 10 Days Instructor-Led Classroom Training
No ratings yet
SAP SD - SAP ERP Central Component 6.0, Enhancement Package 6 Order Fulfillment I - TSCM60 - 10 Days Instructor-Led Classroom Training
4 pages
Juniper Commands v4 CLI
No ratings yet
Juniper Commands v4 CLI
2 pages
Input/Output Configuration Program User's Guide For ICP Iocp
No ratings yet
Input/Output Configuration Program User's Guide For ICP Iocp
338 pages
Company Profile Techmind
No ratings yet
Company Profile Techmind
9 pages
Pallet Strategy-Putaway
No ratings yet
Pallet Strategy-Putaway
9 pages
PIC16F627
No ratings yet
PIC16F627
6 pages
Cdi 402 Final
No ratings yet
Cdi 402 Final
59 pages
MAC Address Finding Study Among Resident
No ratings yet
MAC Address Finding Study Among Resident
6 pages
Introduction - The CitectSCADA Environment
No ratings yet
Introduction - The CitectSCADA Environment
10 pages
Controller - FHC Series (PRICE)
No ratings yet
Controller - FHC Series (PRICE)
36 pages
15 ICSSC Paper Concept Andevaluation of 5G Backhaul Via Starlink
No ratings yet
15 ICSSC Paper Concept Andevaluation of 5G Backhaul Via Starlink
8 pages
Python Programming (R19) - UNIT-1
No ratings yet
Python Programming (R19) - UNIT-1
42 pages
06 Lecture Control Flow Statements Conditional Statements (If, If-Else, Elif) - 2
No ratings yet
06 Lecture Control Flow Statements Conditional Statements (If, If-Else, Elif) - 2
12 pages

Problem Statement

Uploaded by

Problem Statement

Uploaded by

Plagiarism Checker

CS293: Data Structures and Algorithms Lab Project

October - November 2024

1.1 Types of Plagiarism

1.2 Project Structure

2 Phase One: Barebones Checking of Two Submissions

2.1 What kind of matching patterns should you detect?

2.2 Alright, what should be reported?

2.3 Your task

3 Phase Two: A Full-Fledged Plagiarism Checker

3.1 The Input and Output API

3.2 Your class plagiarism_checker_t

3.3 Tips regarding the asynchronous part

3.4 How do you check a submission for plagiarism?

3.5 Your task

4 Phase Three: Gotta Hack ’em all!

4.1 What counts as a hack?

4.2 What do we expect you to submit?

4.3 Scoring for Phase three

5.1 Getting the libraries to work

5.2 What’s allowed and what’s not?

6.2 Phase two

6.3 Phase three

6.4 Late Submission Policy

6.5 Heads up!

All the best!
