Plagiarism Check
Report submitted by
ANSHAY GUPTA
Department of Computer Science and Engineering
H M R Institute of Management and Technology
New Delhi-110036
(Roll No.: 35113307220)
SEPTEMBER 2022
Contents
Abstract
Certificate
Acknowledgements
1 Introductions
1.1 Why is it called plagiarism?
1.2 Why Plagiarism Detection is Important?
1.3 Why is it wrong?
1.4 Types of plagiarism
1.5 How one can avoid plagiarism?
1.6 What are the plagiarism detection tools
2 Technique/Method/Tool Developed
2.1 Hardware Requirements
2.2 Software Requirements
2.3 Method for Python Installation
2.4 Project Implementation
Abstract
Plagiarism has become a serious issue because of the vast resources easily
available on the web. This makes developing plagiarism detection tools a
useful task, and a challenging one because of scalability issues.
This project implements a Plagiarism Detection Engine. The main
parts of the project are:
Here we use Python and Flask to implement the project.
Certificate
I declare that the project work reported in this thesis, entitled Plagiarism
Detection, for the partial fulfillment of the degree of Bachelor of Technology,
CSE Branch, has been carried out by me under the supervision of Mr.
Deepak Kumar Verma, Scientist D, Defence Research and Development
Organisation, New Delhi, India.
The internship work embodied in this thesis, except where otherwise
indicated, is my original work. This thesis has not been submitted by me earlier
in part or full to any other University or Institute for the award of any degree
or diploma. This thesis does not contain other persons' data, graphs, or other
information unless specifically acknowledged.
Acknowledgements
Thanks are also due to Mr. Ajay Kumar, Scientist E, DRDO, for his
valuable advice and help throughout the preparation of this project report.
About DRDO and Labs
DRDO is one of the country's prestigious organizations in the field of
science and technology, with the potential to transform the country's defence
forces into one of the most modern and powerful forces in the world. It was
established in 1958 by merging the Scientific and Technical Development
Establishment under the three services headquarters, with the aim of creating
an organization that can take up the challenges of developing and delivering
high technology in the fields of modern warfare, weapon systems, avionics, and
other scientific aspects of the nation's defence. It also has a mandate to
modernize defence technology.
Vision
Make India prosperous by establishing a world-class science and technology
base, and provide our Defence Services a decisive edge by equipping them with
internationally competitive systems and solutions.
Mission
Core Competence
Plagiarism means copying information from another person's writing,
communication, ideas, thoughts, etc. This includes copying information from
websites, books, songs, television shows, email messages, interviews, articles,
artworks, and other media [4]. Whenever you copy information from
another person's work, it should be quoted and cited in the text, stating
where it was taken from. It should also be listed properly in the citations
and references to avoid plagiarism. Ethical problems in academic research were
discussed in [6], and self-plagiarism issues were discussed in [5].
and projects. This is because a lot of resources can be found on the internet.
It is easy for them to use a search engine to look up any topic
and copy from the results without citing the owner of the document. All
academic fields should therefore use plagiarism detection software, which
discourages students from cheating, copying, and modifying documents
because they know that they will be caught.
Plagiarism may result in a failing grade or a zero for the assignment.
Plagiarism could result in a disciplinary referral. Students
caught plagiarizing may be denied admittance to, or removed from, the
National Honor Society.
2. Unicheck: If you are looking for a solid paid option, then Unicheck
could be the right tool for you. The interface is sleek, and it checks pages
quickly. It suits companies and professors who do not
mind paying a little for higher accuracy.
Here we describe the techniques, methods, and tools that we used to
implement this project. We need the following minimum hardware and software
requirements:
1. Flask
Flask has a wide range of code libraries and extensions that trans-
form the web framework from a microframework into a full-featured
web application creation tool.
2. Pandas
Pandas is an open-source library that provides high-performance
data manipulation in Python. The name Pandas is derived from
"panel data", an econometrics term for multidimensional data
sets. It is used for data analysis in Python
and was developed by Wes McKinney in 2008.
3. NumPy
NumPy is a Python library used for working with arrays. It also
has functions for working in the domain of linear algebra, Fourier
transform, and matrices. NumPy was created in 2005 by Travis
Oliphant. It is an open-source project, and you can use it freely.
4. Scikit-learn
Scikit-learn is probably the most helpful library for machine learn-
ing in Python. The sklearn library contains a lot of efficient tools
for machine learning and statistical modeling, including classifica-
tion, regression, clustering, and dimensionality reduction.
5. urllib3
urllib3 is a powerful, user-friendly HTTP client for Python. Much
of the Python ecosystem already uses urllib3. It provides many
critical features that are missing from the Python standard library,
including thread safety and connection pooling.
6. mysql-connector-python
MySQL Connector/Python is the MySQL driver for Python, developed
by MySQL. It lets Python programs connect to a MySQL server through
an interface that follows the Python Database API Specification
v2.0 (PEP 249). MySQL also provides standards-based drivers for
JDBC, ODBC, and .NET, enabling developers to build database
applications in their language of choice.
7. Matplotlib
Matplotlib is a Python library used to create 2D graphs and plots
from Python scripts. Its pyplot module makes plotting easy by
providing control over line styles, font properties, axis
formatting, etc.
8. cosine-similarity
Cosine similarity measures the similarity between two vectors
by calculating the cosine of the angle between them. The cosine
function has the value 1 at 0 degrees and -1 at 180
degrees. This means that for two overlapping vectors, the cosine
similarity is 1.
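The cosine similarity measure described in item 8 can be illustrated with a minimal sketch. It uses only the Python standard library. The function name `cosine_similarity` and the sample vectors are illustrative assumptions, not code taken from the project. Scikit-learn provides a function of the same name in `sklearn.metrics.pairwise` that works on matrices.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors.

    Illustrative helper (not from the project): dot(a, b) / (|a| * |b|).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Sample vectors chosen for illustration only.
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])   # angle of 0 degrees
orthogonal = cosine_similarity([1, 0], [0, 1])             # angle of 90 degrees
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])      # angle of 180 degrees
```

Vectors pointing in the same direction score close to 1, perpendicular vectors score 0, and opposite vectors score close to -1, whatever their lengths.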
This project is designed for similarity checking in settings where results
must be managed across multiple branches and students, and where report,
thesis, and paper results need to be tracked and managed. The main advantage
of this project is that it can run on any operating system. All years'
results can be viewed together in a single sheet, and each individual
candidate's results can be viewed separately. The project reads the result in
the browser itself and generates a report, which gives the percentage of
similarity.
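The report does not state how the percentage of similarity is computed. The sketch below shows one plausible way to obtain such a figure, assuming a simple bag-of-words model with raw term counts compared by cosine similarity. The function names `term_counts` and `similarity_percent` and the tokenisation rule are hypothetical. The project itself may use a different weighting, such as TF-IDF from scikit-learn.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity_percent(text_a, text_b):
    """Return the cosine similarity of two texts as a percentage (0 to 100).

    Hypothetical helper: each text is represented by its raw term counts.
    """
    counts_a = term_counts(text_a)
    counts_b = term_counts(text_b)
    if not counts_a or not counts_b:
        return 0.0  # an empty document shares nothing with any other
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return round(100.0 * dot / (norm_a * norm_b), 2)


# Example documents, invented for illustration.
submitted = "Plagiarism means copying information from another person"
source = "Plagiarism means copying ideas from another author"
score = similarity_percent(submitted, source)
```

Because term counts are never negative, the score always lies between 0 and 100. Identical texts score 100, and texts with no words in common score 0.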
Chapter 4
Conclusion and Future Scope
Bibliography
[1] Asim M. El Tahir Ali, Hussam M. Dahwa Abdulla, and Václav Snásel
(2011). Overview and Comparison of Plagiarism Detection Tools, 161-172.
[6] Swazey, J.P., Anderson, M.S., Lewis, K.S. (1993). Ethical problems in
academic research. American Scientist, 81, 542-553.