
A System for Detection of Plagiarism of Ideas Based on Deep Learning Algorithm
El Mostafa Hambi *, Faouzia Benabbou *, Nadia Bouhriz *
Information Technology and Modeling Laboratory
Science Faculty Ben M’sik, Casablanca, Morocco

[email protected], [email protected],
[email protected]

Abstract. The growing availability of online documents and the ease of retrieving them have led to a serious plagiarism problem. Many plagiarism detection techniques exist, but plagiarism of ideas remains the most difficult to detect. Some methods have been proposed to perform this task, but many challenges remain. In this paper, we propose a system for detecting plagiarism of ideas based on deep learning algorithms. The proposed approach addresses several problems encountered in detecting plagiarism of ideas, such as the loss of meaning and the difficulty of detecting similarity between documents.
Our system consists of two parts: a learning part (deep learning) and a plagiarism detection part.

Keywords: Plagiarism, Deep Learning, Preprocessing, Doc2vec, Neural Network

1 Introduction
The development of information technology (IT), and especially the Internet, has considerably increased the availability of information and has consequently led to a rise in plagiarism. Plagiarism of ideas is more fundamental and usually involves paraphrasing as well as semantic and vocabulary changes, which makes it the most difficult type of plagiarism to detect. The unacknowledged use of original works is considered plagiarism and has been one of the greatest problems of scientific publication throughout history; this is why automatic identification methods for plagiarism detection have been developed as a possible countermeasure [1, 2, 3].
Plagiarism detection is considered a branch of Natural Language Processing (NLP). The aim is to find the most representative words or concepts of a text or document. Traditional NLP approaches often use a list of words to detect similarity. Such methods do not take the similarity between synonymous words into account; and if we instead transform words into concepts to obtain a semantic representation using WordNet, another problem arises: ambiguity, which can cause the meaning of the processed sentences to be lost. Nowadays, deep learning techniques constitute the best available solution to these problems. Deep Learning (DL) is an important component of computational intelligence, with machine learning research at its core. It provides efficient algorithms for dealing with large-scale data in neuroscience, computer vision, speech recognition, language processing, biomedical informatics, recommender systems, learning theory, robotics, games, and so on [6, 7].
The essential goal of deep learning here is to improve the processing and pre-processing methods of NLP in an automatic, efficient, and fast way. In text mining applications, deep learning methods represent words as vectors of numerical values. This representation captures a major part of the syntactic as well as semantic regularities of the text data. In applications such as similarity detection and text classification, larger units such as phrases, sentences, and documents should also be described as vectors. Vectorized representation of text data makes it easy to compare words and sentences and minimizes the need for lexicons [2].
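As a minimal illustration of why vectorized text is easy to compare (the vectors below are toy values, not the output of a real model), cosine similarity reduces the comparison of two documents to simple arithmetic:

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Two toy document vectors; in practice these would come from a model such as doc2vec.
doc_a = [0.2, 0.7, 0.1]
doc_b = [0.25, 0.65, 0.05]
similarity = cosine_similarity(doc_a, doc_b)
```

The same function works unchanged for word, sentence, or document vectors, which is precisely the appeal of the vectorized representation.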
The remainder of this paper is organized as follows. Section 2 presents a definition of plagiarism. Section 3 reviews related work. Section 4 gives an overall illustration of our approach, and Section 5 details it. The conclusion introduces the future work to be carried out.

2 Definition of Plagiarism
Plagiarism is an attempt to use another person's ideas and present them as one's own work, which is considered both illegal and immoral. Plagiarism is carried out in various ways, among which we mention the following types [1, 4]:

 Copy-paste: textual (word-for-word) copying, in which the content of the text is copied from one or more sources; the copied content may be slightly modified.
 Paraphrasing: changing the grammar, using synonyms, reorganizing the sentences of the original work, and deleting some parts of the text.
 Use of false references: adding references that are false or do not even exist.
 Plagiarism with translation: the contents are translated and used without reference to the original work.
 Plagiarism of ideas: the most difficult plagiarism to detect, because it is more evolved than the previous types; it involves not simple manipulations of the text but a more advanced form. This type of plagiarism consists in using the concepts and ideas of others with a reformulation of the sentences.

3 Related Works
Many plagiarism detection methods are available. Some of them incorporate natural language processing techniques, which are applied to process the set of documents and to analyze their structure. The similarity between two documents can be measured by families of methods that address different kinds of plagiarism [5]:

Lexical methods: These methods treat text as a sequence of characters or terms [14]. The processing techniques this approach relies on include tokenization, lowercasing, punctuation removal, and stemming [15]. The underlying assumption is that the more terms two documents have in common, the more similar they are. Methods that use features such as the longest common subsequence, n-grams, and fingerprints belong to this category. The comparison units adopted for detecting plagiarism differ from one technique to another and include words, sentences, passages, human-defined sliding windows, and n-grams. Work using these techniques includes [16], [17], [18], [19], [20], [21], [22].

These methods usually produce good results as long as words have not been replaced by their synonyms.
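A minimal sketch of the lexical idea, using character trigrams as the comparison unit and Jaccard overlap as the similarity score (both are illustrative choices on our part, not the specific configuration of any cited system):

```python
def char_ngrams(text, n=3):
    # Fingerprint a text as its set of character n-grams.
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b):
    # Overlap of two n-gram sets: higher means more lexical similarity.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

src = "the cat sat on the mat"
susp = "the cat sat on a mat"
score = jaccard(char_ngrams(src), char_ngrams(susp))
```

Replacing "mat" with a synonym such as "rug" would sharply lower the score even though the meaning is preserved, which is exactly the weakness noted above.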

Syntactical methods: Some methods use a text's syntactical units to compare the similarity between documents, realizing the intuition that similar documents have similar syntactic structure. These methods use characteristics such as POS tags to compare documents; for example, [25] used low-level syntactic structure to show linguistic similarities, along with a similarity measure.

The disadvantage of this approach is that it cannot give a reliable result on its own: two documents can share the same syntax and still be non-plagiarized.

Semantic methods: These methods use semantic similarity to compare documents. Methods that use synonyms, antonyms, hypernyms, and hyponyms are placed in this category [4].

In this approach, different semantic features (synonyms, hyponyms, hypernyms, semantic dependencies) [5] are extracted from the source documents, and these features are then used to trace plagiarism cases in the corpus and in a fact database built from already existing documents.

The semantic approach aims to achieve high detection performance and should address the issues of polysemy and synonymy (different words referring to the same thing, such as car and automobile) that are not handled by the lexical (straightforward term-matching) approach. Lin et al. [23] explored semantic similarity using lexical databases such as Stanford WordNet to acquire synonyms. Other algorithms that can be used to extract the semantic features of sentences include Latent Dirichlet Allocation [24].
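The synonym-matching idea can be sketched as follows; the tiny hand-made synonym table is a stand-in for a real lexical database such as WordNet, and `semantic_overlap` is a hypothetical helper, not an API from any cited work:

```python
# Toy synonym table standing in for WordNet (a real system would query
# WordNet synsets instead of this hand-made dictionary).
SYNONYMS = {
    "car": {"car", "automobile"},
    "automobile": {"car", "automobile"},
    "buy": {"buy", "purchase"},
    "purchase": {"buy", "purchase"},
}

def expand(word):
    # A word always matches itself plus any listed synonyms.
    return SYNONYMS.get(word, {word})

def semantic_overlap(sent_a, sent_b):
    # Fraction of words in sent_a that match a word of sent_b or one of its synonyms.
    tokens_a, tokens_b = sent_a.split(), sent_b.split()
    expanded_b = set().union(*(expand(w) for w in tokens_b))
    hits = sum(1 for w in tokens_a if w in expanded_b)
    return hits / len(tokens_a)

overlap = semantic_overlap("i buy a car", "i purchase an automobile")
```

Here the two sentences share no content word, yet the overlap is high because "buy"/"purchase" and "car"/"automobile" are linked; the ambiguity problem described below arises when a word belongs to several synonym sets with different senses.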

This approach suffers, however, from the problem of ambiguity: a word with several senses can be matched wrongly, so the meaning of the processed sentences can be lost.

None of the above methods can resolve the problem of plagiarism of ideas when the content of the document has been modified both by the substitution of synonyms and by paraphrasing.

4 Proposed approach
The figure below shows the overall workflow of the proposed system. The approach is implemented in two main phases: a deep learning phase, and a detection phase that builds on this learning to detect similarity.

The first part consists of preparing our corpus of documents, which contains the source documents and the plagiarized documents corresponding to each source document; we call it the Learning Corpus.

During the preparation stage, this corpus is used to build our learning system: each document is transformed into a list of vectors using the doc2vec principle.

The learning system itself is a supervised neural network whose input data correspond to the vectors of the source documents and whose output data correspond to the vectors of the plagiarized documents of our Learning Corpus.

After this learning phase, we move directly to the plagiarism detection phase. First, we need the document to be analyzed and a corpus of documents in which we carry out our search; we call this corpus the corpus of source documents.

The learning system is used at this level to detect whether this document is plagiarized or not.

Finally, if a type of plagiarism is detected, the plagiarized document is added to the corpus of plagiarized documents and the source document to the corpus of source documents.

[Figure 1 sketches the pipeline: Document 1 (the document to be analyzed) and Document 2 (a document from the corpus of source documents) go through pre-processing and vector representation, important sentences are detected, and the deep learning system decides whether the document is plagiarized.]

Figure 1: Global architecture
The figure above gives an overall view of our approach. Our system takes two documents as input: the first is the document to be processed and the second is a document from the corpus of source documents. These two documents go through a preprocessing phase and are then represented as lists of vectors, which subsequently become the input of our deep learning system; the system then detects whether the two documents are similar or not. We detail this step in the next section.

5 Details of our approach


In the following sub-sections, we present some details of the main methods used for detecting
plagiarism.

5-1 Document representation phase

The figure below summarizes the steps of the pre-processing phase, which is applied to each processed document: the document to be analyzed, the documents of the learning corpus, and the documents of the corpus of source documents.

[Figure 2 shows the document representation pipeline applied to the learning corpus, the suspicious document, and the source documents: sentence segmentation and tokenization, lemmatization, sentence-vector construction using doc2vec, then detection of important sentences via Term Frequency-Inverse Sentence Frequency, yielding one list of sentence vectors per document.]

Figure 2: Pre-processing phase

A- Pre-Processing & Vector Representation

The initial module is a pre-treatment of the dataset of source and suspicious documents. It includes sentence segmentation, tokenization, lemmatization, and vector construction.
Sentence segmentation and tokenization: each document is represented as a set of sentences, and each source and suspicious sentence is then tokenized.
Lemmatization: words are converted into their basic dictionary forms to make comparisons easy.
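A minimal sketch of these three steps, with a tiny hand-made lemma table standing in for a real lemmatizer (in practice a dictionary-backed tool would be used):

```python
import re

# Minimal lemma table: a stand-in for a full dictionary-based lemmatizer.
LEMMAS = {"cats": "cat", "sat": "sit", "was": "be", "ideas": "idea"}

def segment_sentences(document):
    # Naive segmentation: split on sentence-final punctuation.
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]

def tokenize(sentence):
    # Lowercase and keep alphabetic tokens.
    return re.findall(r"[a-z']+", sentence.lower())

def lemmatize(tokens):
    # Map each token to its dictionary form when known.
    return [LEMMAS.get(t, t) for t in tokens]

doc = "The cats sat on the mat. New ideas emerge."
sentences = [lemmatize(tokenize(s)) for s in segment_sentences(doc)]
```

Each document thus becomes a list of lemmatized sentences, ready for the vector construction step described next.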
Constructing sentence vectors using doc2vec: the word2vec model proved effective and useful for grouping and finding similar words in a huge corpus, which raised a natural question: is a higher level of representation possible, for sentences, paragraphs, or even documents? To achieve this, we chose to work with the DM (distributed memory) model: the paragraph is treated as an additional word, and its vector is concatenated or averaged with the local context word vectors during prediction.

Figure 3: doc2vec principle

In other words, we treat each document as an additional word: the document ID / paragraph ID is represented as a single vector, and documents are thereby embedded in the same continuous vector space.
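To make the DM idea concrete, here is a minimal toy implementation in NumPy (assuming NumPy is available): one paragraph vector per document is trained by averaging it with the left-context word vectors to predict the next word. A real system would use a library implementation such as gensim's Doc2Vec rather than this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

docs = [["the", "cat", "sat"], ["the", "dog", "ran"]]
vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}

dim, lr = 8, 0.1
doc_vecs = rng.normal(scale=0.1, size=(len(docs), dim))    # one vector per document
word_vecs = rng.normal(scale=0.1, size=(len(vocab), dim))  # input word embeddings
out_w = rng.normal(scale=0.1, size=(len(vocab), dim))      # softmax output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# PV-DM with averaging: the paragraph vector acts as an extra context "word"
# and is averaged with the context word vectors to predict the target word.
for _ in range(200):
    for d, words in enumerate(docs):
        for t in range(1, len(words)):          # predict each word from its left context
            ctx = [w2i[w] for w in words[:t]]
            h = (doc_vecs[d] + word_vecs[ctx].sum(axis=0)) / (1 + len(ctx))
            p = softmax(out_w @ h)
            grad = p.copy()
            grad[w2i[words[t]]] -= 1.0          # d(cross-entropy)/d(logits)
            dh = out_w.T @ grad
            out_w -= lr * np.outer(grad, h)
            doc_vecs[d] -= lr * dh / (1 + len(ctx))
            word_vecs[ctx] -= lr * dh / (1 + len(ctx))

# After training, each row of doc_vecs is that document's embedding.
```

The key property is visible in the update rule: the document vector participates in every prediction made inside that document, so it accumulates a representation of the document as a whole.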

B- Detect important sentences

Term frequency-inverse document frequency (TF-IDF) is a numerical statistic intended to reflect the importance of a sentence for a document in a collection or corpus. The weight is obtained by multiplying two measures:

tfidf_{i,j} = tf_{i,j} · idf_i    (1)

tf_{1,1} = n_{1,1} / Σ_k n_{k,1}    (2)

where n_{1,1} is the number of sentence vectors in the document that are closest to the processed vector, and Σ_k n_{k,1} is the total number of sentences in the document.

idf_1 = |D| / |{d_j : t_1 ∈ d_j}|    (3)

where |D| is the number of documents and |{d_j : t_1 ∈ d_j}| is the number of documents that contain vectors closest to the processed vector.
For each document we compute the weight of each of its sentences and then choose the most important sentences. Once the documents are pre-processed and represented by their most important sentence vectors, we move directly to the construction of our learning system, using the representative vectors of each document of the source corpus and of the corpus containing the corresponding plagiarized documents.
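Equations (1)-(3) can be sketched directly in code. The "closeness" test between sentence vectors is implemented here with a cosine threshold, which is an assumption on our part, since the paper does not fix a particular measure:

```python
import math

def cosine(u, v):
    # Cosine similarity between two sentence vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def sentence_weight(vec, doc, corpus, threshold=0.9):
    # tf, eq. (2): fraction of sentences in this document whose vectors are close to vec.
    tf = sum(1 for s in doc if cosine(vec, s) >= threshold) / len(doc)
    # idf, eq. (3): number of documents over documents containing a close vector.
    containing = sum(1 for d in corpus if any(cosine(vec, s) >= threshold for s in d))
    idf = len(corpus) / max(containing, 1)
    return tf * idf            # eq. (1)

doc1 = [[1.0, 0.0], [0.0, 1.0]]
doc2 = [[1.0, 0.0]]
weight = sentence_weight([1.0, 0.0], doc1, [doc1, doc2])
```

Ranking a document's sentences by this weight and keeping the top ones gives the "most important sentences" used as the document's representation.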

5-2 Deep Learning phase

Deep Learning is a machine learning method that teaches computers to do what comes naturally to humans: a computer model learns to perform classification tasks directly from images, text, or audio. Deep learning models can achieve an exceptional level of accuracy, sometimes exceeding human performance; models are trained on large sets of labeled data using neural network architectures that contain many layers.
We are inspired by the principle used in word2vec, where the vector representation of a word is generated from the word's neighbors.
More precisely, we make our system learn the types of plagiarism present in the training data: we build a supervised neural network whose input contains the source documents and whose output contains the plagiarized documents.
To do this, a weight matrix W is first initialized to small random values; W is then adjusted automatically during the iterations of the neural network's learning phase, and this learned W serves to detect the similarity between documents.
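A minimal sketch of this learning step, assuming NumPy is available; the single linear map below is a deliberately simplified stand-in for the paper's multi-layer network, but it shows the same mechanism: W starts at small random values and is adjusted iteratively to map source-document vectors onto the vectors of their plagiarized versions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training pairs: vector of a source document -> vector of its plagiarized version.
X = rng.normal(size=(20, 4))                        # source-document vectors
true_map = rng.normal(size=(4, 4))                  # hidden ground-truth transformation
Y = X @ true_map + 0.01 * rng.normal(size=(20, 4))  # plagiarized-document vectors

W = rng.normal(scale=0.1, size=(4, 4))  # initialized to small random values
lr = 0.05
for _ in range(500):
    pred = X @ W
    grad = X.T @ (pred - Y) / len(X)    # gradient of mean squared error
    W -= lr * grad                      # iterative adjustment of W

mse = float(((X @ W - Y) ** 2).mean())
```

After training, applying W to a new source vector predicts what its plagiarized counterpart would look like; how far a suspicious document's vector falls from that prediction drives the detection phase.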
Here is a figure that illustrates our learning system:

Figure 4: Our learning system

5-3 Detection of plagiarism

In the plagiarism detection phase, the documents must first go through the pre-processing and sentence-vector construction phases described in the previous sections; we then use the matrix W built by our learning system above. If a similarity is detected, we refresh the learning system by adding the two documents and building a new learning matrix W.
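The detection step can then be sketched as a threshold on the similarity between sentence vectors; the 0.8 threshold and the pairwise comparison strategy below are illustrative assumptions, not values fixed by the paper:

```python
import math

def cosine(u, v):
    # Cosine similarity between two sentence vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def is_plagiarized(suspicious_vecs, source_vecs, threshold=0.8):
    # Flag the pair if any suspicious sentence vector is close to any source one.
    for sv in suspicious_vecs:
        for tv in source_vecs:
            if cosine(sv, tv) >= threshold:
                return True
    return False
```

In the full system the source-side vectors would first be passed through the learned matrix W, so that a paraphrased sentence is compared against the model's prediction of its plagiarized form rather than against the raw source vector.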

6 Conclusion
In this paper, we propose a deep representation of words for the plagiarism detection task. Sentence-by-sentence comparison is used to find text similarities through the construction of a deep learning system, which allows us to record each kind of plagiarism detected. The advantages of this method include its simplicity and its fast sentence comparison. Future work consists of putting this method into practice and comparing it with the other related methods.

References:
1. Tuomo Kakkonen, Maxim Mozgovoy. Hermetic and Web Plagiarism Detection Systems for Student Essays: An Evaluation of the State of the Art. Journal of Educational Computing Research, vol. 42, no. 2, pp. 135-159, 2010. University of Joensuu, Finland; University of Aizu, Japan [online].
2. Bela Gipp. State-of-the-art in detecting academic plagiarism. International Journal for Educational Integrity. University of California, Berkeley and University of Magdeburg, Department of Computer Science.
3. Maurer, H. and Zaka, B., 2007. Plagiarism–a problem and how to fight it. Proceeding of Ed-Media 2007, 4451-
4458.
4. Ahmed Jabr Ahmed Muftah. Document Plagiarism Detection Algorithm Using Semantic Networks. A project
report submitted in partial fulfillment of the requirements for the award of the degree of Master of Science
(Computer Science). Faculty of Computer Science and Information Systems University Technology Malaysia
(2009).
5. Erfaneh Gharavi, Kayvan Bijari, and Kiarash Zahirnia. A Deep Learning Approach to Persian Plagiarism Detection. Journal of Machine Learning Research (2011).
6. Collobert, R. and Weston, J. A unified architecture for natural language processing: Deep neural networks with
multitask learning. In Proceedings of the 25th international conference on Machine learning ACM, 160-167 (2008).
7. Chong, M.Y.M.,. A study on plagiarism detection and plagiarism direction identification using natural language
processing techniques (2013).
8. Matt J. Kusner, Yu Sun, Nicholas I. Kolkin, Kilian Q. Weinberger. From Word Embeddings to Document Distances. Washington University in St. Louis, 1 Brookings Dr., St. Louis, MO 63130, 2016.
9. Aristomenis Thanopoulos, Nikos Fakotakis, and George Kokkinakis. Tokenization for Knowledge-free Automatic Extraction of Lexical Similarities. TALN 2003, Batz-sur-Mer, 11-14 June 2003. Electrical and Computer Engineering Department, University of Patras, 26500 Rion, Greece (2003).
10. Hendrik Heuer. Text comparison using word vector representations and dimensionality reduction. Proceeding of
8th Python in Science Conference - Austin, Texas (August 18 - 23, 2009).
11. Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, Christopher D. Manning. Parsing Natural Scenes and Natural
Language with Recursive Neural Networks. Computer Science Department, Stanford University, Stanford, CA
94305, USA, 2015.
12. Mikolov, T., Chen, K., Corrado, G., and Dean, J., 2013. Efficient estimation of word representations in vector
space. arXiv preprint arXiv:1301.3781.
13. Torres, S. and Gelbukh, A., Comparing similarity measures for original WSD lesk algorithm. Research in
Computing Science 43, 155-166 (2009).
14. S. M. Alzahrani, N. Salim, and A. Abraham, “Understanding plagiarism linguistic patterns, textual features, and
detection methods,” Trans. Sys.Man Cyber Part C, vol. 42, no. 2, pp. 133–149, Mar. 2012. [Online]. Available:
http://dx.doi.org/10.1109/TSMCC.2011.2134847
15. M. Chong and L. Specia, “Lexical generalisation for word-level matching in plagiarism detection,” in RANLP,
2011, pp. 704–709.
16. S. Brin, J. Davis, and H. Garcia-Molina, “Copy detection mechanisms for digital documents,” in SIGMOD
Conference, 1995, pp. 398–409.
17. D. R. White and M. Joy, “Sentence-based natural language plagiarism detection,” ACM Journal of Educational
Resources in Computing, vol. 4, no. 4, pp. 1–20, 2004.
18. S. Niezgoda and T. P. Way, “Snitch: a software tool for detecting cut and paste plagiarism,” in SIGCSE, 2006, pp.
51–55.
19. A. Barr´ on-Cedeno and P. Rosso, “On automatic plagiarism detection based on n-grams comparison,” in ECIR,
2009, pp. 696–700.
20. M. S. Pera and Y.-K. Ng, “A naıve bayes classifier for web document summaries created by using word similarity
and significant factors,” International Journal on Artificial Intelligence Tools, vol. 19, no. 4, pp. 465–486, 2010.
21. E. Stamatatos, “Plagiarism detection using stopword n-grams,” JASIST, vol. 62, no. 12, pp. 2512–2527, 2011.
22. J. Grman and R. Ravas, “Improved implementation for finding text similarities in large sets of data - notebook for
pan at clef 2011,” in CLEF (Notebook Papers/Labs/Workshop), 2011.
23. H.-H. Chen, M.-S. Lin, and Y.-C. Wei, “Novel association measures using web search with double checking,” in ACL, 2006.
24. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
25. Uzuner, O., Katz, B., and Nahnsen, T.: Using Syntactic Information to Identify Plagiarism. In: 2nd Workshop on Building Educational Applications Using NLP (2005).
26. G. Tsatsaronis, I. Varlamis, and M. Vazirgiannis, “Text relatedness based on a word thesaurus,” J. Artif. Intell. Res. (JAIR), vol. 37, pp. 1-39, 2010.
