A Proposed Approach For Plagiarism Detection in Myanmar Unicode Text

This study presents a novel approach for plagiarism detection in Myanmar Unicode text using a deep learning model combined with Rabin-Karp hash code and Word2vec. The proposed system effectively identifies plagiarism in educational content from Myanmar Wikipedia, addressing the lack of existing tools for this language. The research emphasizes the importance of maintaining originality and integrity in Myanmar's academic landscape.


IAES International Journal of Artificial Intelligence (IJ-AI)

Vol. 14, No. 2, April 2025, pp. 1616~1624

ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i2.pp1616-1624

A proposed approach for plagiarism detection in Myanmar Unicode text

Sun Thurain Moe¹, Khin Mar Soe¹, Than Than Nwe²

¹ Faculty of Computer Science, University of Computer Studies, Yangon, Myanmar
² Faculty of Information Science, University of Information Technology, Yangon, Myanmar

Article Info

Article history:
Received May 2, 2024
Revised Oct 27, 2024
Accepted Nov 14, 2024

Keywords:
Deep learning
Myanmar Unicode
Natural language processing
Plagiarism detection
Syllables segmentation

ABSTRACT

Around the world, with technology that improves over time, almost everyone can access the internet
easily and quickly. With the increase in the use of the internet, the plagiarism of information that is
easily available on the internet has also increased. Such plagiarism seriously undermines originality and
ethical principles. To prevent these incidents, plagiarism detection software exists for many countries
and languages, but there is no plagiarism detection software for the Myanmar language yet. In an attempt
to fill that gap, this study proposed a deep learning model with Rabin-Karp hash code and Word2vec model
and built a plagiarism detection system. Our deep learning model was trained on information randomly
obtained from Myanmar Wikipedia. According to the experiments, our proposed model can effectively detect
plagiarism of educational content and information from Myanmar Wikipedia. Moreover, it can identify texts
plagiarized by rearranging words or substituting words with some synonyms. This study contributes to a
broader understanding of the complexities of plagiarism in the Myanmar academic area and highlights the
importance of measures to effectively prevent plagiarism. It maintains the credibility of education and
promotes a culture that values originality and intellectual integrity.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Sun Thurain Moe
Faculty of Computer Science, University of Computer Studies
D2, Room (608), Mindama Pyin Nyar Yeik Thar, Yangon, Myanmar
Email: [email protected]

1. INTRODUCTION
Across literature, journalism, and various academic areas, submitting or copying someone else's
intellectual property without providing a reference or credit to the original owner is gradually
increasing, and it is becoming a challenge for these fields. The rapid and
easy access to vast amounts of information on the internet makes plagiarism attractive, and plagiarism
detection methods struggle to keep up with the growth of technologies such as artificial intelligence used in
plagiarism. The most advanced plagiarism detection systems available today use complex machine learning
and natural language processing techniques to find syntactic and semantic patterns in text. However, there is
a gap that needs to be filled, and that is the lack of proper application of these developments to Myanmar
Unicode text.
Plagiarism detection is an ever-evolving field within natural language processing, driven by the
increasing complexity of text and the sophisticated methods employed by those attempting to plagiarize.
Researchers have continuously explored various algorithms and techniques to improve the accuracy and
effectiveness of plagiarism detection systems. There are many different approaches in this sector, from
rule-based algorithms to advanced deep learning models, each contributing uniquely to improving detection
accuracy and efficiency.

Journal homepage: http://ijai.iaescore.com

Moe and Nwe [1] developed a highly accurate rule-based Myanmar syllable segmentation (MSS)
algorithm that achieves perfect segmentation accuracy on a large dataset of Myanmar Unicode text. This
algorithm's success underscores the potential of rule-based systems for handling specific linguistic
challenges. In the area of deep learning, El Mostafa and Benabbou [2] provided an extensive overview of
various propositions for plagiarism detection, highlighting the limitations of word granularity and Word2vec
methods in capturing the semantic nuances of sentences. Their study suggests the need for more sophisticated
models to accurately detect semantic plagiarism. Ali and Taqa [3] reviewed both traditional and modern
plagiarism detection techniques, concluding that intelligent and deep learning algorithms, which consider
lexical, syntactic, and semantic properties, outperform traditional methods, especially for large corpora. This
insight is pivotal for developing more effective plagiarism detection systems. Xiong et al. [4] introduced a
novel approach that integrates bidirectional encoder representations from transformers (BERT), an enhanced
artificial bee colony (ABC) optimization algorithm, and reinforcement learning (RL). This model addresses
imbalanced classification and has shown superior performance compared to existing models. Vadivu et al. [5]
focused on detecting plagiarism in social media content through a four-phase methodology involving data
preprocessing, n-gram evaluation, similarity analysis, and detection. Their ensemble support vector machine
based African vulture optimization (ESVM-AVO) approach has demonstrated high accuracy and efficiency.
Jambi et al. [6] evaluated academic plagiarism detection methods using fuzzy multi-criteria decision-making
(MCDM), providing valuable recommendations for future systems. Eppa and Murali [7] proposed a multi-source
plagiarism detection method for C programming assignments, utilizing an attention-based model and the
density-based spatial clustering of applications with noise (DBSCAN) algorithm. Saeed and Taqa [8] developed
an application combining term frequency–inverse document frequency (TF-IDF) text encoding, natural
language processing, k-means clustering, and cosine similarity algorithms, while AL-Jibory et al. [9] enhanced
plagiarism detection using natural language processing and machine learning techniques, achieving
impressive results on benchmark datasets.
In cross-language plagiarism detection (CL-PD), Bouaine et al. [10] utilized Doc2vec embedding
techniques and a Siamese long short-term memory model, achieving outstanding accuracy and performance
metrics. Raj and Ramya [11] and Ter-Hovhannisyan and Avetisyan [12] further advanced this field with
transformer models and cross-lingual sentence alignment techniques. AlZahrani and Al-Yahya [13] explored
Arabic pretrained transformer-based models for authorship
attribution in Islamic law, fine-tuning models like ARBERT and AraELECTRA to achieve significant
results. Arabi and Akbari [14] proposed methods for detecting extrinsic plagiarism using pretrained networks
and WordNet ontologies, demonstrating high precision. Zouaoui and Rezeg [15] presented a multi-agent
indexing system for Arabic plagiarism detection, while El-Rashidy et al. [16] developed a system using
hyperplane equations for high accuracy, outperforming previous systems on standard datasets.
Elali and Rachid [17] examined artificial intelligence-based chatbots for detecting fabricated research, and
Anil et al. [18] compared the effectiveness of various plagiarism detection software on artificial intelligence
generated articles. Elkhatat et al. [19] evaluated artificial intelligence content detection tools' ability to
distinguish between human and artificial intelligence authored content, highlighting ongoing challenges in
this area. Foltýnek et al. [20] tested web-based text-matching systems, revealing that some systems can detect
certain plagiarized content but often misidentify non-plagiarized material. Muangprathub et al. [21] proposed
a formal concept analysis-based algorithm for document plagiarism detection, particularly effective with Thai
text collections. Tian et al. [22] introduced FPBirth for multi-threaded program plagiarism detection,
demonstrating significant performance improvements. Tlitova et al. [23] reviewed methods for identifying
cross-language borrowings in scientific articles, focusing on Russian-English pairs and the need for
specialized tools in this area. Ansorge et al. [24] presented a case study highlighting common errors in
paraphrased plagiarized texts, while Pal et al. [25] demonstrated improved accuracy in plagiarism detection
using natural language processing techniques, further advancing the field's capabilities.
These diverse studies collectively enhance our understanding and capability in plagiarism detection,
addressing various languages, contexts, and methodologies to ensure the integrity of academic and creative
works. The primary challenge addressed in this study is the lack of plagiarism detection tools for Myanmar
Unicode text. Without reliable plagiarism detection mechanisms, academic institutions and content creators
in Myanmar face difficulties in maintaining the integrity and originality of their work. Therefore, an
innovative method that makes use of the most recent developments in natural language processing and
machine learning is urgently needed in order to accurately detect plagiarism in Myanmar Unicode text.
Since the Myanmar language has no explicit word delimiter, such as a space character, we
worked step by step through several preprocessing stages: syllable segmentation, word tokenization,
stop word removal, and word embedding. Finally, we built a highly accurate and effective deep
learning model that can automatically identify text plagiarism cases across various topics on Myanmar
Wikipedia. To ensure the reliability and robustness of the model, we use a comprehensive evaluation process
comparing its performance with established plagiarism detection techniques and manually annotated datasets.
In the following sections, we explore the construction and preprocessing of the dataset, the details of
the deep learning model architecture, and the evaluation measures of this approach.

2. METHOD
To detect plagiarism in Myanmar Unicode text, the input text or document is first split into
syllables by the MSS algorithm. The syllables are then combined into words using the pre-collected Myanmar
word list and the longest matching algorithm. After that, stop words are removed from these words, and
sentences are segmented. Plagiarism detection would take a long time if the input sentences were compared with
all the texts on Myanmar Wikipedia. Therefore, keywords are extracted from the input sentences with the yet
another keyword extractor (YAKE) keyword extraction algorithm, and Wikipedia pages related to these
sentences are selected using the Wikipedia search application programming interface (API). Only the text
content is pulled from the selected Wikipedia pages, followed by syllable segmentation and word
tokenization. Then stop words are removed and the text is segmented into sentences. Our proposed system
design is shown in Figure 1.

[Figure 1 flowchart. Box labels: Input Text/Doc; Text Content Extraction; Text Processing (Syllable
Segmentation, Word Tokenization, Stopword Removal, Sentence Segmentation) for both the input text and the
Wikipedia text; Keyword Extraction; Wikipedia Search API; Plagiarism Detection (Deep Learning, Fuzzy
Jaro-Winkler, Fuzzy Levenshtein, Fuzzy Token Sort, Fuzzy Cosine); Paraphrase Counting and Similarity
Calculation; Plagiarism Result.]

Figure 1. Myanmar Unicode plagiarism detection system
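The candidate-retrieval step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it only builds the query URL and sends no request. It assumes the keywords have already been extracted (for example by YAKE) and that Myanmar Wikipedia is queried through the standard MediaWiki Action API at my.wikipedia.org. The function name `build_search_url` and the `limit` default are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Myanmar Wikipedia exposes the standard MediaWiki Action API here.
API_ENDPOINT = "https://my.wikipedia.org/w/api.php"


def build_search_url(keywords, limit=5):
    """Return a full-text search URL for the given keywords (hypothetical helper)."""
    params = {
        "action": "query",      # MediaWiki query module
        "list": "search",       # full-text search submodule
        "srsearch": " ".join(keywords),
        "srlimit": limit,       # number of candidate pages to return
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# One keyword: "Myanmar" written in Myanmar script, given as code points.
url = build_search_url(["\u1019\u103C\u1014\u103A\u1019\u102C"])
print(url)
```

Restricting the comparison to the few pages returned by such a query is what keeps the later sentence-by-sentence comparison tractable.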

2.1. Myanmar syllable segmentation

The MSS algorithm extracts the Myanmar syllables from the input sentence based on the following
four rules [1]: i) the ith syllabic element of the input sentence is not a member of vowel_medial_group; ii)
the (i-1)th or ith or (i+1)th syllabic element of the input sentence is not a consonant pairs symbol ‘္’;
iii) the (i+1)th syllabic element of the input sentence is not a consonant pairs symbol ‘္’; and iv) the
(i+1)th syllabic element of the input sentence is not an ‘Asat’ symbol ‘်’.

vowel_medial_group = [‘ေ္’, ‘္’, ‘္’, ‘္’, ‘္’, ‘္’, ‘္’, ‘္ ’, ‘္ ’, ‘္ ’, ‘္’, ‘္’, ‘္’, ‘္’, ‘ ္’, ‘္ ’]

If all the rules hold, position i marks the start of the next Myanmar syllable. The example shown
below is a Myanmar sentence segmented into Myanmar syllables using the syllable segmentation
algorithm.

Input Myanmar sentence:

ကယမင ကယခ င လမ တငအဆင မငလ ခ င အမင မည။

Output Myanmar syllables:

ကယ_မင _ကယ_ခ င _လမ _တင_အ_ဆင_ မင_လ _ခ င _အမ_င _မည_။
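The boundary test can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the MSS algorithm of [1]: instead of checking non-membership in vowel_medial_group, it checks membership in the consonant range U+1000 to U+1021, and it treats the section signs U+104A and U+104B as separate pieces. Spaces, digits, and independent vowels are not handled.

```python
# Code points from the Myanmar Unicode block.
CONSONANTS = {chr(cp) for cp in range(0x1000, 0x1022)}  # U+1000 .. U+1021
VIRAMA = "\u1039"  # consonant pairs (stacking) symbol
ASAT = "\u103A"    # Asat symbol
PUNCTUATION = {"\u104A", "\u104B"}  # little and big section signs


def segment_syllables(text):
    """Split Myanmar Unicode text into syllable-like chunks (simplified sketch)."""
    syllables = []
    current = ""
    for i, ch in enumerate(text):
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        # A new chunk starts at punctuation, or at a consonant that is neither
        # stacked under the previous one nor closed by a virama or Asat.
        starts_new = ch in PUNCTUATION or (
            ch in CONSONANTS
            and prev_ch != VIRAMA
            and next_ch not in (VIRAMA, ASAT)
        )
        if starts_new and current:
            syllables.append(current)
            current = ""
        current += ch
    if current:
        syllables.append(current)
    return syllables


word = "\u1019\u103C\u1014\u103A\u1019\u102C"  # "Myanmar" written in Myanmar script
print("_".join(segment_syllables(word)))
```

In this sketch a consonant followed by Asat stays attached to the preceding syllable as its final, and a consonant after a virama stays stacked with the previous consonant, which mirrors rules ii) to iv).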

2.2. Word tokenization

We collected 46,837 Myanmar words and used a greedy longest-match-first technique to tokenize
Myanmar words. The algorithm picks the longest n-prefix of the remaining syllables that matches a word in
the pre-collected Myanmar word list. Syllables that are not included in the pre-collected Myanmar word list
are treated as unknown and dropped as stop words. Below is an example of converting Myanmar syllables
into Myanmar words.

Input Myanmar syllables:

ကယ_မင _ကယ_ခ င _လမ _တင_အ_ဆင_ မင_လ _ခ င _အမ_င _မည_။

Output Myanmar words:

ကယမင ကယခ င ၊ လမ ၊ အဆင မင၊ လ ခ င ၊ အမ၊ င မည
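The greedy longest-match-first step described above can be sketched as follows. The tiny lexicon and the Latin placeholder syllables are hypothetical stand-ins for the 46,837-word list, and storing each word as a tuple of syllables is an assumption about the data structure, not something the paper specifies.

```python
def longest_match_tokenize(syllables, lexicon):
    """Greedy longest-match-first tokenization over a list of syllables.

    `lexicon` is a set of words, each stored as a tuple of syllables.
    Syllables that start no known word are dropped, as described in the text.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shrink it one syllable at a time.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            if candidate in lexicon:
                words.append("".join(candidate))
                i += n
                break
        else:
            i += 1  # unknown syllable: skip it
    return words


# Hypothetical toy lexicon with placeholder syllables.
lexicon = {("a", "b"), ("a", "b", "c"), ("d",)}
print(longest_match_tokenize(["a", "b", "c", "x", "d"], lexicon))
```

Here the three-syllable word wins over its two-syllable prefix, and the unknown syllable "x" is dropped.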

2.3. Stop words removal

There are no benchmark stop words that have been agreed upon in academia and specifically
defined in Myanmar natural language processing. In order to create a final stop word list, we combined our
ideas with the stop words previously discovered in Myanmar natural language processing research by other
researchers. Some of the stop words used in the proposed model are ‘၌’, ‘၍’, ‘၎င ’, ‘၏’, ‘က ေ ’,
‘ကျွ ေတ ’, ‘က မ’, ‘ ခင ’, ‘ ြင’, ‘ေ က င’, ‘ကတည က’ and all Myanmar digits [၀-၉].
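A minimal sketch of the filtering step is shown below. Only three of the stop words quoted above are included (‘၌’, ‘၍’, and ‘၏’, written as code points); the full list used by the authors is not reproduced here, and the function name is hypothetical.

```python
import re

# Three of the stop words quoted in the text: U+104C, U+104D, U+104F.
STOP_WORDS = {"\u104C", "\u104D", "\u104F"}

# A token made up only of Myanmar digits (U+1040 .. U+1049).
MYANMAR_DIGITS = re.compile(r"^[\u1040-\u1049]+$")


def remove_stop_words(words):
    """Drop listed stop words and tokens that consist only of Myanmar digits."""
    return [
        word for word in words
        if word not in STOP_WORDS and not MYANMAR_DIGITS.match(word)
    ]


tokens = ["\u1019\u102C", "\u104F", "\u1041\u1042"]
print(remove_stop_words(tokens))
```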

2.4. Sentence segmentation

One of the fundamental processes in natural language processing is sentence segmentation. After
the source and target corpora have been preprocessed with the previously mentioned techniques, this step
separates each corpus into discrete sentences. Sentence boundaries in the Myanmar corpus can be easily
identified since the Myanmar script uses a special character known as the "Pou ma" sign (။), similar to the
full stop (.) in English, to denote the end of a sentence.
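Because the boundary is a single character, the split itself is short. The sketch below works on a plain string and assumes the sign is U+104B; how the authors apply it to the tokenized word lists is not stated, so this is only an illustration.

```python
SENTENCE_END = "\u104B"  # the sentence-final sign (။)


def split_sentences(text):
    """Split text on the sentence-final sign and drop empty pieces."""
    pieces = (part.strip() for part in text.split(SENTENCE_END))
    return [piece for piece in pieces if piece]


print(split_sentences("first\u104B second\u104B"))
```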

2.5. Word2vec model

Word2vec, introduced by Google in 2013, captures semantic and contextual similarities by
representing words as continuous dense vectors. A word embedding model can take massive textual corpora,
create a vocabulary dictionary, and generate dense word embeddings that have a lower dimensionality
than bag-of-words models. There are two model architectures: the continuous bag-of-words
(CBOW) and skip-gram models. In this work, we use the CBOW model and train it with 45,399 sentences of
Myanmar Unicode text from 1,000 random Myanmar Wikipedia pages. The resulting vector is then used to
represent each word. In this phase, the entire text content of the Myanmar Wikipedia page is transformed
into a matrix of vectors, with each row representing a word. Figure 2 shows the visualization of our
Word2vec model in the two-dimensional space of t-distributed stochastic neighbor embedding (t-SNE) with
principal components analysis (PCA), where each point represents a word.

Figure 2. Word2vec model

2.6. Rabin-Karp hash function

The Rabin-Karp algorithm was originally designed to check whether two strings (or
patterns) match. In this research, however, we employed the Rabin-Karp hashing approach to generate
hash codes for Myanmar words and convert them to vectors. Unlike the Rabin-Karp technique, we did not
compare hashes in a linear-time scan; instead, we built a deep learning model to perform the comparison.
In this work, we used the polynomial rolling hash and modular arithmetic methods defined in (1) to produce
hash codes for Myanmar words, ensuring that each Myanmar word received a unique hash code.

$$H = \left(c_1 \cdot b^{m-1} + c_2 \cdot b^{m-2} + \cdots + c_m \cdot b^{0}\right) \bmod Q \tag{1}$$

where H is the hash code, c_i is the integer American standard code for information interchange (ASCII) code
of the ith character in the word, b is the number of all Myanmar characters, m is the number of characters
in the word, and Q is a large prime number.
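Equation (1) can be evaluated with Horner's rule, as in the sketch below. The concrete values are assumptions, since the paper gives none: `base=160` is the number of code points in the Myanmar Unicode block (U+1000 to U+109F), `modulus=2**31 - 1` is a known prime, and Python's `ord()` (the Unicode code point) stands in for the character code c_i. The sketch does not check for collisions between words.

```python
def rolling_hash(word, base=160, modulus=2**31 - 1):
    """Polynomial rolling hash of equation (1), computed with Horner's rule.

    base and modulus are hypothetical values chosen for illustration.
    """
    h = 0
    for ch in word:
        # Multiplying the running value by base shifts earlier characters
        # one power higher, so the first character ends up at base**(m-1).
        h = (h * base + ord(ch)) % modulus
    return h


def direct_hash(word, base=160, modulus=2**31 - 1):
    """Term-by-term evaluation of equation (1), used only as a cross-check."""
    m = len(word)
    return sum(ord(ch) * base ** (m - 1 - i) for i, ch in enumerate(word)) % modulus


word = "\u1019\u103C\u1014\u103A\u1019\u102C"
print(rolling_hash(word), direct_hash(word))
```

Both functions return the same value; the Horner form avoids computing large powers of b explicitly.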

2.7. Deep learning model

After the preprocessing steps, we performed plagiarism detection on the resulting sentences using a deep
learning model. The deep learning model was trained using two different sentence vectorization techniques:
the Rabin-Karp rolling hash function and Word2vec. The training data includes 1,506 sentences randomly
taken from Myanmar Wikipedia pages. The training vectors are obtained after all the words from the training
sentences have been converted into hash code numbers or Word2vec weights. The proposed model is
illustrated in Figure 3.
However, the length of the training vectors varies depending on how long the sentence is. We therefore
searched for the sentence with the most words and found that it contained almost
50 words. Each training vector was given a length of 50, and the blank spaces were filled with 1s.
Each 50-length source vector is then concatenated with a 50-length target vector into a 100-length vector,
and a class label is appended, as shown in Tables 1 and 2. The class label is set to 1 if a vector is paired
with itself and to 0 if it is paired with another randomly chosen vector. In this manner, we obtain a
3,011 × 101-dimensional training dataset. It is important to clarify why each training vector's empty spaces
must be filled with 1s in this context.
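The construction of the labelled rows can be sketched as follows. This is an illustration under assumptions: it emits one matched and one unmatched row per sentence, which may not be exactly how the authors balanced the classes, and the helper names and the fixed random seed are hypothetical.

```python
import random


def pad(vector, length=50, fill=1):
    """Pad (or truncate) a sentence vector to a fixed length, filling with 1s."""
    return (list(vector) + [fill] * length)[:length]


def build_training_rows(vectors, seed=0):
    """Build 101-column rows: 50 source values, 50 target values, class label.

    Label 1: a vector paired with itself. Label 0: a vector paired with a
    different, randomly chosen vector. Requires at least two vectors.
    """
    rng = random.Random(seed)
    rows = []
    for i, vector in enumerate(vectors):
        source = pad(vector)
        rows.append(source + source + [1])
        j = rng.choice([k for k in range(len(vectors)) if k != i])
        rows.append(source + pad(vectors[j]) + [0])
    return rows


# Hypothetical hash-code vectors for three short sentences.
vectors = [[1207, 2223], [830, 1146, 455], [236, 467]]
rows = build_training_rows(vectors)
print(len(rows), len(rows[0]))
```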

[Figure 3 diagram. Training data processing phase: Text Document Dataset; Text Processing (Syllable
Segmentation, Word Tokenization, Stopword Removal); Word Segmentation; Sentence Segmentation; Dataset
Creation (Vectorization with Word2Vec/Rabin-Karp, Create Dataset, Standardization); Training Dataset.
Deep learning training phase: Training Dataset; Deep Learning Model; Deep Learning Trained Model.
Layers of the deep learning model as listed in the figure:
Dense Layer(50, Input Dim=100, Activation=ReLu)
BatchNormalization Layer()
Dropout Layer(0.6)
Dense Layer(20, Activation=ReLu)
BatchNormalization Layer()
Dropout Layer(0.5)
Dense Layer(10, Activation=Softmax)
BatchNormalization Layer()
Dropout Layer(0.6)
Dense Layer(4, Activation=Sigmoid)
Dense Layer(2, Activation=Sigmoid)]

Figure 3. Proposed deep learning model
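As a rough size check on the layer stack in Figure 3, the sketch below tallies trainable parameters. It is plain arithmetic rather than a model definition. It assumes the layers are applied in the order listed, that each dense layer has a bias, and that each batch normalization layer learns one scale and one shift value per feature, as is the default in common frameworks.

```python
# Layer list transcribed from Figure 3: (kind, units or dropout rate).
LAYERS = [
    ("dense", 50), ("batchnorm", None), ("dropout", 0.6),
    ("dense", 20), ("batchnorm", None), ("dropout", 0.5),
    ("dense", 10), ("batchnorm", None), ("dropout", 0.6),
    ("dense", 4),
    ("dense", 2),
]


def count_trainable_parameters(layers, input_dim=100):
    """Sum the trainable parameters of a stack of dense/batchnorm/dropout layers."""
    width = input_dim
    total = 0
    for kind, value in layers:
        if kind == "dense":
            total += width * value + value  # weights plus biases
            width = value
        elif kind == "batchnorm":
            total += 2 * width  # one scale and one shift per feature
        # dropout layers have no parameters
    return total


TOTAL = count_trainable_parameters(LAYERS)
print(TOTAL)
```

Under these assumptions the network has 6,494 trainable parameters, most of them in the first dense layer (100 × 50 + 50 = 5,050).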

Table 1. Training data (Rabin-Karp rolling hash)

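| Src W01 | Src W02 | … | Src W49 | Src W50 | Tgt W01 | Tgt W02 | … | Tgt W49 | Tgt W50 | Class |
|---|---|---|---|---|---|---|---|---|---|---|
| 1207 | 2223 | … | 1 | 1 | 1207 | 2223 | … | 1 | 1 | 1 |
| 830 | 1146 | … | 1 | 1 | 830 | 1146 | … | 1 | 1 | 1 |
| ⁞ | ⁞ | … | ⁞ | ⁞ | ⁞ | ⁞ | … | ⁞ | ⁞ | ⁞ |
| 830 | 1146 | … | 1 | 1 | 236 | 467 | … | 1 | 1 | 0 |
| 455 | 920 | … | 1 | 1 | 515 | 1207 | … | 1 | 1 | 0 |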


Table 2. Training data (Word2vec)

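| Src W01 | Src W02 | … | Src W49 | Src W50 | Tgt W01 | Tgt W02 | … | Tgt W49 | Tgt W50 | Class |
|---|---|---|---|---|---|---|---|---|---|---|
| -0.01706486 | -0.01323607 | … | 1 | 1 | -0.01706486 | -0.01323607 | … | 1 | 1 | 1 |
| -0.01706486 | -0.01334877 | … | 1 | 1 | -0.01706486 | -0.01334877 | … | 1 | 1 | 1 |
| ⁞ | ⁞ | … | ⁞ | ⁞ | ⁞ | ⁞ | … | ⁞ | ⁞ | ⁞ |
| -0.01706486 | -0.01334877 | … | 1 | 1 | -0.03069331 | -0.01706486 | … | 1 | 1 | 0 |
| -0.01620913 | -0.00342125 | … | 1 | 1 | -0.01583702 | -0.01572607 | … | 1 | 1 | 0 |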

Checking whether the source string and the target string are identical is the primary goal of
plagiarism detection. We aim to detect any instances of plagiarism of content from Myanmar Wikipedia
articles. If we used a deep learning model in the conventional way for this kind of requirement, the entire
collection of articles on Myanmar Wikipedia would need to be used as training data. There are many
challenges involved in doing this, including training data extraction, model training time, and hardware
resources. Because of these challenges, our deep learning model departs from conventional models
and was purposefully built as a probabilistic model based on weights, similar to logistic regression.
To maintain the weights of our trained model, we filled all empty spaces in the training vectors with
1s. Our deep learning model does not use convolutional layers because our training dataset has a 1D
structure. It has only 5 dense, fully connected layers, and rectified linear unit (ReLU), softmax, and sigmoid
are used as activation functions. Using the holdout method, 20% of the 3,011 training samples
(602) were set aside as testing data for model evaluation. Our proposed model has a 98% accuracy rate for
detecting plagiarism, as shown in Table 3.

Table 3. Results of proposed model

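| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Unmatched | 1 | 0.96 | 0.98 | 302 |
| Matched | 0.96 | 1 | 0.98 | 300 |
| Accuracy | | | 0.98 | 602 |
| Macro avg | 0.98 | 0.98 | 0.98 | 602 |
| Weighted avg | 0.98 | 0.98 | 0.98 | 602 |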

3. RESULTS AND DISCUSSION

As mentioned in subsection 2.7, we tested our proposed deep learning model using two different
vectorization techniques. The results indicated that the deep learning model trained with sentence
vectorization using the Rabin-Karp rolling hash function provided more accurate plagiarism detection results
than the model trained with Word2vec sentence vectorization. Our experiment is the first in the field of
Myanmar Unicode plagiarism detection, with no existing methods available for comparison. Plagiarism
detection techniques used in other languages, including English, are not applicable to Myanmar Unicode.
To evaluate the performance of our proposed model, we compared the results with those obtained
using well-known fuzzy string matching methods. The experiment involved 500 randomly selected sentences
from Myanmar Wikipedia, containing nearly 3,000 paraphrases. These sentences were tested for direct
copying and for paraphrasing plagiarism, where the word order was reversed. The results of this test are presented
in Table 4.
The experiment revealed that our proposed method, which combines a deep learning model with
Rabin-Karp sentence vectorization, produced the best results. However, our method still has some
weaknesses. As the first research for Myanmar Unicode plagiarism detection, focusing on direct copying and
pasting, the results shown in Table 4 are promising. Nevertheless, the method has limitations in detecting
other types of plagiarism, such as outlining and summarizing, where only the concept or idea is taken.
Further research is needed to develop adaptive methods for detecting these types of plagiarism.

Table 4. Experimental result

| Method | Similarity score (%), complete plagiarism | Similarity score (%), paraphrasing plagiarism |
|---|---|---|
| DL(Rabin-Karp) | 95.6 | 95.6 |
| DL(Word2Vec) | 94.1 | 94.1 |
| Fuzzy Jaro-Winkler | 91.5 | 88.6 |
| Fuzzy Levenshtein | 92.4 | 59.4 |
| Fuzzy Token Sort | 91.9 | 91.5 |
| Fuzzy Cosine | 90.8 | 91.2 |
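The gap between the Levenshtein and token sort baselines on reordered text can be illustrated with a small sketch. It is not the library the authors used: the normalization (one minus distance over the longer length) is an assumption, the function names are hypothetical, and whitespace-separated Latin tokens stand in for tokenized Myanmar words.

```python
def levenshtein_distance(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Similarity in [0, 1]: one minus the distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def token_sort_similarity(a, b):
    """Sort the tokens of each string first, so word order no longer matters."""
    return levenshtein_similarity(" ".join(sorted(a.split())),
                                  " ".join(sorted(b.split())))


original = "alpha beta gamma"
reordered = "gamma beta alpha"
print(levenshtein_similarity(original, reordered))
print(token_sort_similarity(original, reordered))
```

A character-level edit distance penalizes every moved word, while sorting tokens first makes the two strings identical, which is consistent with the drop of the Levenshtein baseline in the paraphrasing column.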


4. CONCLUSION
Our study marks a significant step forward in the field of Myanmar Unicode plagiarism detection.
By testing a deep learning model trained with two different vectorization techniques, we found that sentence
vectorization with the Rabin-Karp rolling hash function provided superior accuracy compared to the
Word2vec-based approach. This experiment, the first of its kind for Myanmar Unicode, highlights the
limitations of applying plagiarism detection methods developed for other languages. While our method
showed promising results in detecting direct copying and paraphrasing, it still faces challenges in identifying
more complex forms of plagiarism, such as outlining and summarizing. Future research should focus on
developing adaptive methods to address these challenges, enhancing the robustness and accuracy of
plagiarism detection in Myanmar Unicode. In the future, such progress can be extended and significantly
contribute to maintaining academic and professional integrity while encouraging originality and creativity in
written works.

ACKNOWLEDGEMENTS
We would like to express our heartfelt gratitude to all those who contributed to the successful
completion of our publication paper. Additionally, we would like to express our gratitude for the support and
resources provided by the University of Computer Studies, Yangon, Myanmar. Their contributions were
essential to the completion of this work.

REFERENCES
[1] S. T. Moe and T. T. Nwe, “An algorithm for Myanmar syllable segmentation based on the official standard Myanmar Unicode
text,” in 2023 IEEE Conference on Computer Applications (ICCA), Feb. 2023, pp. 6–10, doi:
10.1109/ICCA51723.2023.10181391.
[2] H. El Mostafa and F. Benabbou, “A deep learning based technique for plagiarism detection: a comparative study,” IAES
International Journal of Artificial Intelligence (IJ-AI), vol. 9, no. 1, p. 81, Mar. 2020, doi: 10.11591/ijai.v9.i1.pp81-90.
[3] A. Ali and A. Taqa, “Analytical study of traditional and intelligent textual plagiarism detection approaches,” Journal of Education
and Science, vol. 31, no. 1, pp. 8–25, Mar. 2022, doi: 10.33899/edusj.2021.131895.1192.
[4] J. Xiong et al., “Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm
for pretraining weights,” Expert Systems with Applications, vol. 238, p. 122088, Mar. 2024, doi: 10.1016/j.eswa.2023.122088.
[5] S. V. Vadivu, P. Nagaraj, and B. A. S. Murugan, “Ensemble machine learning technique-based plagiarism detection over opinions
in social media,” Automatika, vol. 65, no. 3, pp. 983–991, Jul. 2024, doi: 10.1080/00051144.2024.2326383.
[6] K. M. Jambi, I. H. Khan, and M. A. Siddiqui, “Evaluation of different plagiarism detection methods: A fuzzy MCDM
perspective,” Applied Sciences, vol. 12, no. 9, p. 4580, Apr. 2022, doi: 10.3390/app12094580.
[7] A. Eppa and A. H. Murali, “Machine learning techniques for multisource plagiarism detection,” in 2021 IEEE International
Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), Dec. 2021, pp. 1–5, doi:
10.1109/CSITSS54238.2021.9683752.
[8] A. A. M. Saeed and A. Y. Taqa, “A proposed approach for plagiarism detection in article documents,” SinkrOn, vol. 7, no. 2, pp.
568–578, Apr. 2022, doi: 10.33395/sinkron.v7i2.11381.
[9] F. K. AL-Jibory et al., “Hybrid system for plagiarism detection on a scientific paper,” Turkish Journal of Computer and
Mathematics Education (TURCOMAT), vol. 12, no. 13, pp. 5707–5719, 2021.
[10] C. Bouaine, F. Benabbou, and I. Sadgali, “Word embedding for high performance cross-language plagiarism detection
techniques,” International Journal of Interactive Mobile Technologies (iJIM), vol. 17, no. 10, pp. 69–91, May 2023, doi:
10.3991/ijim.v17i10.38891.
[11] R. S. R. Raj and G. R. Ramya, “Detection of plagiarism in contextual meaning using transformer model and community detection
algorithm,” Smart Trends in Computing and Communications, 2023, pp. 777–795, doi: 10.1007/978-981-99-0838-7_67.
[12] T. Ter-Hovhannisyan and K. Avetisyan, “Transformer-based multilingual language models in cross-lingual plagiarism detection,”
in 2022 Ivannikov Memorial Workshop (IVMEM), Sep. 2022, pp. 72–80, doi: 10.1109/IVMEM57067.2022.9983968.
[13] F. M. AlZahrani and M. Al-Yahya, “A transformer-based approach to authorship attribution in classical Arabic texts,” Applied
Sciences, vol. 13, no. 12, Jun. 2023, doi: 10.3390/app13127255.
[14] H. Arabi and M. Akbari, “Improving plagiarism detection in text document using hybrid weighted similarity,” Expert Systems
with Applications, vol. 207, Nov. 2022, doi: 10.1016/j.eswa.2022.118034.
[15] S. Zouaoui and K. Rezeg, “Multi-agents indexing system (MAIS) for plagiarism detection,” Journal of King Saud University -
Computer and Information Sciences, vol. 34, no. 5, pp. 2131–2140, May 2022, doi: 10.1016/j.jksuci.2020.06.009.
[16] M. A. El-Rashidy, R. G. Mohamed, N. A. El-Fishawy, and M. A. Shouman, “An effective text plagiarism detection system based
on feature selection and SVM techniques,” Multimedia Tools and Applications, vol. 83, no. 1, pp. 2609–2646, Jan. 2024, doi:
10.1007/s11042-023-15703-4.
[17] F. R. Elali and L. N. Rachid, “AI-generated research paper fabrication and plagiarism in the scientific community,” Patterns, vol.
4, no. 3, Mar. 2023, doi: 10.1016/j.patter.2023.100706.
[18] A. Anil et al., “Are paid tools worth the cost? a prospective cross-over study to find the right tool for plagiarism detection,”
Heliyon, vol. 9, no. 9, Sep. 2023, doi: 10.1016/j.heliyon.2023.e19194.
[19] A. M. Elkhatat, K. Elsaid, and S. Almeer, “Evaluating the efficacy of AI content detection tools in differentiating between human
and AI-generated text,” International Journal for Educational Integrity, vol. 19, no. 1, Sep. 2023,
doi: 10.1007/s40979-023-00140-5.
[20] T. Foltýnek et al., “Testing of support tools for plagiarism detection,” International Journal of Educational Technology in Higher
Education, vol. 17, no. 1, Dec. 2020, doi: 10.1186/s41239-020-00192-4.
[21] J. Muangprathub, S. Kajornkasirat, and A. Wanichsombat, “Document plagiarism detection using a new concept similarity in
formal concept analysis,” Journal of Applied Mathematics, vol. 2021, pp. 1–10, Mar. 2021, doi: 10.1155/2021/6662984.


[22] Z. Tian, Q. Wang, C. Gao, L. Chen, and D. Wu, “Plagiarism detection of multi-threaded programs using frequent behavioral
pattern mining,” International Journal of Software Engineering and Knowledge Engineering, vol. 30, no. 11-12, pp. 1667–1688,
Nov. 2020, doi: 10.1142/S0218194020400252.
[23] A. Tlitova, A. Toschev, M. Talanov, and V. Kurnosov, “Meta-analysis of cross-language plagiarism and self-plagiarism detection
methods for Russian-English language pair,” Frontiers in Computer Science, vol. 2, Oct. 2020, doi: 10.3389/fcomp.2020.523053.
[24] L. Ansorge, K. Ansorgeová, and M. Sixsmith, “Plagiarism through paraphrasing tools—the story of one plagiarized text,”
Publications, vol. 9, no. 4, Oct. 2021, doi: 10.3390/publications9040048.
[25] S. K. Pal, O. J. Raffik, R. Roy, V. B. Lalman, S. Srivastava, and B. Sharma, “Automatic plagiarism detection using natural
language processing,” in 2023 10th International Conference on Computing for Sustainable Global Development (INDIACom),
2023, pp. 218–222.

BIOGRAPHIES OF AUTHORS

Sun Thurain Moe is currently pursuing a Ph.D. degree at the University of
Computer Studies, Yangon, Myanmar. He received a Master's degree (M.I.Sc.) in information
science from the University of Computer Studies, Yangon. He has contributed significantly to
the field of Myanmar syllable segmentation, Myanmar plagiarism and credit scoring,
publishing three papers in IEEE. His research interests include Myanmar natural language
processing, machine learning, and action recognition. He can be contacted at email:
[email protected].

Dr. Khin Mar Soe received a Ph.D. (information technology) degree from the
University of Computer Studies, Yangon, Myanmar. She is a professor at the University of
Computer Studies, Yangon. Her research interests are in the areas of natural language
processing, part-of-speech tagging, machine translation, and Myanmar named entity
recognition. She can be contacted at email: [email protected].

Dr. Than Than Nwe received a Ph.D. (information technology) degree from the
University of Computer Studies, Yangon, Myanmar. She is a professor at the University of
Information Technology, Yangon. Her research interests are in the areas of information
retrieval, data mining, big data, and pattern recognition. She can be contacted at email:
[email protected].

Int J Artif Intell, Vol. 14, No. 2, April 2025: 1616-1624

