
Text Paraphrasing with Large Language Models

Abstract

The main focus of this paper is paraphrasing text using a Large Language Model (LLM), Gemini AI, which is very useful in articles for avoiding plagiarism and repetition and for enhancing understanding. A browser extension was also built to improve the User Interface (UI) and User Experience (UX). This paper presents an approach to text paraphrasing that uses an LLM for the paraphrasing itself and keyBERT together with wordNET to provide synonyms for complex words.

Keywords:

Large Language Model (LLM), Gemini AI, keyBERT, wordNET.

I. Introduction

Text paraphrasing plays an important role in avoiding plagiarism, avoiding repetition in research papers, and making complex text easier to understand. For example, an author writing a research paper may want to reuse, with the same meaning, reasoning that has already been written in another paper. Paraphrasing is the best solution for doing so while avoiding plagiarism and repetition, and it also helps readers better understand a complex text or paragraph. Paraphrases resemble monolingual translations from a source sentence into other sentences that must preserve the original meaning [1].

Often, people are unable to understand the meaning of a difficult word used in a text or paragraph, or the word may carry several possible meanings, and they are left confused about what it means in that context. In such situations people usually spend time searching for the meaning, and sometimes it is difficult to identify. The ability to detect similar sentences is essential for several applications like text summarization, question answering, and plagiarism detection [2]. Paraphrasing has previously been done using machine learning models, deep learning models, and neural network models, whose accuracy remained at most about 67%. Recently, Large Language Models have also begun to be used. Large language models (LLMs) have revolutionized natural language processing (NLP) and demonstrated exceptional performance in various NLP tasks for widely spoken languages [3]. WordNet is a lexical database that groups English words into sets of synonyms, and it has been compared against word-embedding approaches [4].

Most software teams build web or mobile applications so that everyone can access their application for a particular purpose, but for paraphrasing a text it is time consuming to copy and paste everything from one tab to another. The main motive of this research is to improve paraphrasing accuracy by using an LLM and to provide a tool for paraphrasing text easily within the browser.
II. Literature Survey

Researchers have recently surveyed how to generate and detect paraphrases and how to summarize text using NLP and neural networks in different fields, such as social-media tweets. One such tool performs a deep analysis of a natural-language sentence and applies sets of paraphrasing techniques that transform structural parts of the sentence's dependency tree into an equivalent form and replace words with their synonyms and antonyms [5], so that the generated text stays faithful to the inner meaning of the sentence and remains syntactically correct. Paraphrase identification can also be done with deep-learning classification models. The intention is to harness the memorizing power of a Long Short-Term Memory (LSTM) neural network and the feature-extracting capability of a Convolutional Neural Network (CNN), in combination with an optimized word-embedding approach that aims to capture wide sentential contexts and word order [6]. This makes data augmentation of text data easier when training models for accurate results. In general, many image-to-text paired datasets need to be prepared for robust image captioning, but such datasets cannot be collected in practical cases [7]. Another approach to paraphrasing is based on n-grams. N-grams are relevant word sequences of a text document that can be applied to a range of Natural Language Processing (NLP) applications [8]; the candidate paraphrases are generated based on a trigram approach [8].

One case study is Automatic Text Summarization (ATS), a rapidly growing field that aims to save readers time and effort by automatically generating summaries of large volumes of text [9]. Researchers have been refining ATS techniques, classifying them into extractive, abstractive, and hybrid methods. The extractive approach extracts key sentences from the source documents to generate the summary, while the abstractive approach employs an intermediary representation of the input documents [9]. Elmo embedding is a contextual embedding that has previously been used by many researchers in abstractive text summarization techniques [10]. Hybrid methods combine both extractive and abstractive features to summarize the text. Some researchers are trying to train a model that generates a paraphrase for a text without a prompt word; diversified paraphrase sentences can be obtained by enumerating different words in the source sentence as the sensitive word [11]. A novel paraphrase generation model combines the Transformer architecture with part-of-speech features and is trained on a Chinese corpus [12]. Examples of GPT-2-generated paraphrased sentences, with the scores of each pair [21]:

Sentence used for paraphrase | ROUGE Score | BLEU Score
A prisoner can asphyxiate himself in 90 seconds and, after eight minutes or so, he will be brain dead. | 0.4706 | 0.4730
The restaurant is a carved-off space up a couple of stairs to one side, dominated by faux bare-brick columns, faux-wood floors and an air of fetid despondency. | 0.5000 | 0.5348
I signed a bill that made the problem worse, and I want to admit it, he said. | 0.4667 | 0.5299
It said the damage to the wing provided a pathway for hot gasses to penetrate the ship's thermal armor during Columbia's ill-fated reentry. | 0.4545 | 0.5445
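The trigram basis of the candidate-generation approach in [8] can be sketched in a few lines of Python. This is only a minimal illustration; the whitespace tokenization and the sample sentence are ours, not from [8]:

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Trigrams over a whitespace-tokenized sentence; a real system
# would use a proper NLP tokenizer.
tokens = "the candidate paraphrases are generated from trigrams".split()
trigrams = ngrams(tokens, 3)
```

Candidate paraphrases in [8] are then assembled from such trigrams; this function only enumerates them.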
The task of generating or identifying semantic equivalence between different elements of language, such as words and sentences, is an essential part of natural language processing [13]. Semantic equivalence is the metric by which a paraphrase generator checks that a sentence and its paraphrase carry the same meaning. Text summarization, which is required for long texts, has also been developed for different languages, like the Ho language, a tribal language of India: the stories were summarized using TF-IDF and the TextRank algorithm, and the generated summaries were manually evaluated on different parameters [14]. Many text summarization surveys have been conducted since 2022 throughout the world, and one survey provides a general overview of the work on automatic text summarization in different languages using several summarization methodologies [15].

To extract the most important details from legal papers, a feature-vector method can incorporate both generic text elements and domain-specific legal features [16]. Due to the rapid growth of internet information, it is now more crucial than ever to offer enhanced methods for accurately and effectively identifying and presenting textual material [17]. There is also research in which a Statistical Machine Translation (SMT) approach is used to generate paraphrases. That system was trained on 89K sentence pairs manually collected from Facebook comments and a daily-conversation corpus, together with 89K Burmese paraphrase words collected from the Burmese Wiktionary [18]. It used three different SMT models: phrase-based, hierarchical phrase-based, and the operation sequence model. While many existing systems simply consist of rule-based models, the recent success of Deep Neural Networks in several NLG tasks naturally suggests the possibility of exploiting such networks for generating paraphrases [19]. Numerous people search and read vast amounts of information to ascertain what is necessary, comprehensive, and pertinent to their goals [20]. Most published documents entice readers to read the rest but frequently omit the most important details [20].

Based on this previous work, we wanted to develop a browser extension that saves time in understanding complex long texts and helps avoid plagiarism and repetition in research papers, all within the browser.

III. Methodology

A. Block Diagram:

The block diagram for the work we have done is described below:

[Figure: block diagram of the proposed paraphrasing system]

Here, the user or client selects the text to paraphrase; after selection, the text pops up in the extension when the user clicks the extension icon in the browser. The user then clicks the rephrase button to paraphrase the selected text. The text is sent to the backend, where the model generates the paraphrased text along with the difficult words and their synonyms. After paraphrase generation, the paraphrased text is shown in the browser extension.
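The flow above can be summarized as a single backend routine. The sketch below is a minimal stand-in: `paraphrase_with_llm`, `find_difficult_words`, and `synonyms` are hypothetical placeholders for the actual Gemini AI, keyBERT, and wordNET calls.

```python
def paraphrase_with_llm(text: str) -> str:
    # Placeholder for the Gemini AI call used in the real system.
    return "[paraphrased] " + text

def find_difficult_words(text: str) -> list:
    # Placeholder for keyBERT keyword extraction: naively flag long words.
    return [w for w in text.split() if len(w) > 9]

def synonyms(word: str) -> list:
    # Placeholder for a wordNET lookup via nltk; toy lexicon only.
    toy_lexicon = {"asphyxiate": ["suffocate", "smother"]}
    return toy_lexicon.get(word, [])

def rephrase(text: str) -> dict:
    """What the backend returns to the extension for a selected text."""
    hard_words = find_difficult_words(text)
    return {
        "paraphrase": paraphrase_with_llm(text),
        "synonyms": {w: synonyms(w) for w in hard_words},
    }

result = rephrase("A prisoner can asphyxiate himself")
```

In the real application this routine would sit behind a Flask endpoint that the extension calls when the rephrase button is clicked.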
B. Technologies:

The technologies used in this application are: Python for model generation; the nltk library for formatting the synonyms generated from wordNET for the difficult words in the provided text; the keyBERT model for finding those difficult words; and the Gemini AI model for text paraphrasing. Flask serves as the backend server that connects the model with the frontend of the extension in the browser. The ROUGE score and BLEU score are used for evaluation of the model.

The ROUGE Score is a set of metrics used for evaluating automatic summarization and machine translation systems. It measures the overlap between the generated summary or translation and one or more reference summaries or translations. There are several variants of the ROUGE Score, denoted ROUGE-N, ROUGE-L, and ROUGE-W, among others; they focus on different aspects of overlap between the generated and reference texts. ROUGE-L and ROUGE-W are similar in content but differ in the units of analysis they consider. The BLEU Score is a metric for evaluating the quality of machine-translated text against one or more reference translations. It calculates the precision of the n-grams (contiguous sequences of n words) generated by the machine translation compared to those in the reference translations.

IV. Implementation

The implementation of the extension was done on a local system:

[Figure: user interface of the browser extension]

The figure above shows the user interface of the extension. In this interface, the selected text automatically pops up so it can be paraphrased. Once the rephrase button is clicked, the rephrased text is shown along with the difficult words and their synonyms, as shown below:

[Figure: rephrased text shown with difficult words and their synonyms]

This is how the browser extension interacts with the user, saving the time spent navigating from one tab to another by providing the paraphrased text, with difficult words and their synonyms, in the same tab.

V. Results and Analysis
The accuracy metrics used to check the performance of the model are the ROUGE Score and the BLEU Score. ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. The ROUGE Score is a set of metrics commonly used for text summarization tasks, where the goal is automatically generating a concise summary of a longer text. It was designed to evaluate the quality of machine-generated summaries by comparing them to reference summaries provided by humans. We used ROUGE-N as an evaluation metric:

ROUGE-N Score = (Number of overlapping n-grams) / (Total number of n-grams in reference summary)

Where:

N = order of the n-grams being considered

"Number of overlapping n-grams" = count of n-grams that appear in both the generated summary and the reference summary

"Total number of n-grams in reference summary" = total count of n-grams in the reference summary
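The ROUGE-N formula above, and the BLEU formula that this paper pairs with it, can be mirrored in a short standard-library sketch. This is a reference illustration of the formulas as written here, not a substitute for standard scoring libraries:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of contiguous n-grams over a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N = overlapping n-grams / total n-grams in the reference."""
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    overlap = sum(min(cand[g], count) for g, count in ref.items())
    return overlap / max(sum(ref.values()), 1)

def bleu(candidate, reference, max_n=4):
    """BLEU = BP * exp(sum over n=1..N of (1/N) * log Pn)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
        total = sum(cand.values())
        overlap = sum(min(cand[g], count) for g, count in ref.items())
        if total == 0 or overlap == 0:
            return 0.0  # any empty modified precision zeroes the score
        precisions.append(overlap / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else \
        math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# A candidate identical to its reference scores 1.0 on both metrics.
tokens = "it said the damage to the wing provided a pathway".split()
scores = (rouge_n(tokens, tokens, n=1), bleu(tokens, tokens))
```

Production systems would normally use an established ROUGE/BLEU package (with smoothing and multiple references); the sketch only follows the single-reference formulas given in this section.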
BLEU (BiLingual Evaluation Understudy) is a metric for automatically evaluating machine-translated text. The BLEU Score is a number between zero and one that measures the similarity of the machine-translated text to a set of high-quality reference translations:

BLEU Score = BP × exp( Σ (from n = 1 to N) (1/N) × log Pn )

Where:

BP = brevity penalty factor to penalize short translations

Pn = precision for n-grams, and N is the maximum order of n-grams considered

The scores of the model are:

Accuracy metric | Score
ROUGE Score | 0.7212
BLEU Score | 0.8806

Compared with the previous model scores mentioned in the literature survey, these results appear remarkable and are far better than those of the GPT-2 model.

VI. Conclusion

This research shows remarkable results for paraphrasing using the Gemini AI model compared with previous models like GPT-2. The browser extension will also improve the user experience of avoiding plagiarism and repetition in research papers, surveys, etc. Along with the paraphrased text, difficult words and their synonyms are also provided to enhance users' understanding. This helps in better understanding complex long texts or paragraphs by providing synonyms for complex words with the help of wordNET and keyBERT.

References

[1] J. Effendi, S. Sakti and S. Nakamura, "Creation of a multi-paraphrase corpus based on various elementary operations," 2017 The Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA)

[2] M. Rohith, M. J. Venkat, P. V. Akhil, M. S. S. Tarun and D. Gupta, "Telugu Paraphrase Detection Using Siamese Network," 2022 Computing Communication and Networking Technologies (ICCCNT)

[3] Z. Bimagambetova, D. Rakhymzhanov, A. Jaxylykova and A. Pak, "Evaluating Large Language Models for Sentence Augmentation in Low-Resource Languages: A Case Study on Kazakh," 2023 Optimization Problems of Complex Systems (OPCS)
[4] Q. Li, S. Shah, M. Ghassemi, R. Fang, A. Nourbakhsh and X. Liu, "Using paraphrases to improve tweet classification: Comparing WordNet and word embedding approaches," 2016 Big Data (Big Data), Washington, DC

[5] I. Perikos and I. Hatzilygeroudis, "A methodology for generating natural language paraphrases," 2016 Information, Intelligence, Systems & Applications (IISA)

[6] D. R. Kubal and A. V. Nimkar, "A Hybrid Deep Learning Architecture for Paraphrase Identification," 2018 Computing, Communication and Networking Technologies (ICCCNT)

[7] R. Masumura, N. Makishima, M. Ihori, A. Takashima, T. Tanaka and S. Orihashi, "Text-to-Text Pre-Training with Paraphrasing for Improving Transformer-Based Image Captioning," 2023, pp. 516-520

[8] A. I. Gadag and B. M. Sagar, "N-gram based paraphrase generator from large text document," 2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), Bengaluru, India, pp. 91-94

[9] B. Khan, Z. A. Shah, M. Usman, I. Khan and B. Niazi, "Exploring the Landscape of Automatic Text Summarization: A Comprehensive Survey," in IEEE Access, vol. 11, pp. 109819-109840, 2023

[10] H. Gupta and M. Patel, "Study of Extractive Text Summarizer Using The Elmo Embedding," 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 2020, pp. 829-834

[11] R. Liu and B. Song, "Freedom Sentence Paraphrase Method," 2022 8th Annual International Conference on Network and Information Systems for Computers (ICNISC), Hangzhou, China, 2022, pp. 522-526

[12] Y.-C. Tsai and F.-C. Lin, "Paraphrase Generation Model Integrating Transformer Architecture, Part-of-Speech Features, and Pointer Generator Network," in IEEE Access, vol. 11, pp. 30109-30117, 2023

[13] A. Gadag and B. M. Sagar, "A review on different methods of paraphrasing," 2016 International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT), Mysuru, India, 2016, pp. 188-191

[14] D. Bankira, S. Panda, S. Ranjan, H. S. Ali, S. Parida and N. Walubita, "Automatic Extractive text Summarization for Ho Language," 2023 OITS International Conference on Information Technology (OCIT), Raipur, India, 2023, pp. 915-919

[15] G. MalarSelvi and A. Pandian, "Analysis of Different Approaches for Automatic Text Summarization," 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 2022, pp. 812-816

[16] A. V. Saravade and P. R. Deshmukh, "Improving Feature Vector for Extractive Text Summarization of Legal Judgement," 2023 International Conference on Electrical, Electronics, Communication and Computers (ELEXCOM), Roorkee, India, 2023, pp. 1-5

[17] R. Ramachandran, S. Jayachandran and V. Das, "A Novel Method for Text Summarization and Clustering of Documents," 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT), Bangalore, India, 2022, pp. 1-6

[18] M. M. Htay, Y. K. Thu, H. A. Thant and T. Supnithi, "Statistical Machine Translation for Myanmar Language Paraphrase Generation," 2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Bangkok, Thailand, 2020, pp. 1-6

[19] A. Globo, A. Trevisi, A. Zugarini, L. Rigutini, M. Maggini and S. Melacci, "Neural Paraphrasing by Automatically Crawled and Aligned Sentence Pairs," 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), Granada, Spain, 2019, pp. 429-434

[20] A. A. Al-Banna and A. K. Al-Mashhadany, "Automatic Text Summarization Based on Pre-trained Models," 2023 Al-Sadiq International Conference on Communication and Information Technology (AICCIT), Al-Muthana, Iraq, 2023, pp. 80-84

[21] S. Witteveen and M. Andrews, "Paraphrasing with Large Language Models," Proceedings of the 3rd Workshop on Neural Generation and Translation, Hong Kong, 2019, pp. 215-220, Association for Computational Linguistics
