
Paraphrasing Hindi Text

This document describes a project to develop a paraphrasing tool for Hindi text. The project is being undertaken by 4 students - Anupam Teli, Shubham Hande, Deepak Manney, Yashodhan Joglekar - under the guidance of Ms. Trupti Patil. The tool will use Flask and OpenAI to identify paraphrases in Hindi text, which is an important task for applications like plagiarism detection that has not yet been addressed for the Hindi language.

Uploaded by

Anupam Teli

BHARATI VIDYAPEETH DEEMED TO BE UNIVERSITY
DEPARTMENT OF ENGINEERING AND TECHNOLOGY, NAVI MUMBAI CAMPUS
DEPARTMENT OF INFORMATION TECHNOLOGY

PROJECT TITLE: PARAPHRASING TOOL FOR HINDI TEXT

PROJECT MEMBERS:
1) ANUPAM TELI
2) SHUBHAM HANDE
3) DEEPAK MANNEY
4) YASHODHAN JOGLEKAR

GUIDE: MS. TRUPTI PATIL

ABSTRACT

Paraphrasing refers to writing that differs from the original in its wording or in the arrangement of its words but conveys the same meaning.

Identifying paraphrases is important in many real-life applications such as Information Retrieval, Plagiarism Detection, Text Summarization and Question Answering.

A large amount of work on Paraphrase Detection has been done for English and many Indian languages. However, there is no existing system to identify paraphrases in Hindi. This is the first such endeavor in the Hindi language.

PRESENTATION OUTLINE

• INTRODUCTION
• LITERATURE REVIEW
• DESIGN METHODOLOGY
• PROBLEM STATEMENT
• PROPOSED METHODS
• WORKING STEPS
• TECHNOLOGIES USED
• WORKFLOW
• PROPOSED SYSTEM
• CONCLUSION AND FUTURE WORK
• REFERENCES

INTRODUCTION

Hindi is one of the most widely spoken languages in India. Paraphrasing is the process of rewording a text so that it retains the original meaning while using different words. It is an essential skill for effective communication and is particularly useful in academic writing, content creation, and language translation. In this project, we aim to develop a Hindi paraphrasing tool using Flask and OpenAI.

Flask is a micro web framework written in Python that is used for building web applications. It is a lightweight framework that provides flexibility and scalability. Flask is particularly suitable for developing small to medium-sized web applications that do not require extensive libraries or tools. Flask provides support for HTTP requests, routing, templates, and more.

OpenAI is an artificial intelligence research organization that aims to create safe and beneficial AI. OpenAI is known for developing state-of-the-art language models such as GPT-3. These language models can generate human-like text and can be used for various natural language processing tasks such as language translation, text summarization, and text generation.

INTRODUCTION

• Helsinki-NLP is the language technology research group at the University of Helsinki. It publishes pretrained OPUS-MT translation models that can be loaded through the Hugging Face Transformers library, a Python library that provides easy-to-use interfaces for various natural language processing tasks such as text classification, text generation, and language translation. Transformers is built on top of popular deep learning frameworks such as PyTorch and TensorFlow.

• The first step in developing our Hindi paraphrasing tool is to gather a dataset of Hindi text. We can use publicly available sources such as the Hindi Wikipedia, news articles, and social media posts. We can also use web scraping techniques to collect Hindi text from various websites.

• Once we have a dataset of Hindi text, we can pre-process it. Pre-processing involves cleaning the text, removing stop words, and tokenizing the text. Tokenization is the process of splitting the text into individual words or tokens.

• After pre-processing the text, we can use OpenAI's GPT-3 language model to generate paraphrases. GPT-3 is a state-of-the-art language model that can generate human-like text. It is trained on a massive amount of text data and can generate text in various styles and tones.
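
The pre-processing step described above (cleaning, tokenization, stop-word removal) can be sketched in plain Python. This is a minimal illustration and not the project's actual code: the function names, the tiny stop-word list, and the choice to keep only Devanagari tokens are all assumptions made for the example.

```python
import re
import unicodedata

# Illustrative stop-word list only (an assumption); a real tool would load a fuller list.
HINDI_STOP_WORDS = {"यह", "है", "का", "की", "के", "और", "में", "से", "को", "पर", "एक"}

# Devanagari block without the danda (U+0964) and double danda (U+0965),
# which are sentence punctuation rather than parts of a word.
_DEVANAGARI_WORD = re.compile(r"[\u0900-\u0963\u0966-\u097F]+")


def clean(text):
    """Normalize Unicode to NFC and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split text into Devanagari word tokens, dropping punctuation and other scripts."""
    return _DEVANAGARI_WORD.findall(text)


def remove_stop_words(tokens, stop_words=HINDI_STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [token for token in tokens if token not in stop_words]


def preprocess(text):
    """Run the three steps in order: clean, tokenize, remove stop words."""
    return remove_stop_words(tokenize(clean(text)))
```

For example, `preprocess("यह एक परीक्षण वाक्य है।")` keeps only the content words `["परीक्षण", "वाक्य"]`. An explicit Devanagari character range is used instead of `\w` because Python's `\w` does not match the combining vowel signs that are part of Hindi words.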
LITERATURE REVIEW

SR. NO. | PAPER TITLE | PUBLICATION YEAR | PAPER OUTCOME | GAP ANALYSIS
--- | --- | --- | --- | ---
1 | Detection of paraphrases for Devanagari languages using Support Vector Machine | 2018 | Paraphrase detection is a challenging task for Devanagari languages like Hindi. | Paraphrases of up to 40 words only.
2 | Paraphrase Detection in Hindi Language using Syntactic Features of Phrase | 2019 | The approach used for international languages checks the semantic similarity and lexical matching of the two sentences with the help of WordNet. | Provides paraphrased sentences in two forms only.
3 | Phrase Composing Tool using Natural Language Processing | 2021 | Grammatical error correction and detection using Transformer models has come into existence. As research is ongoing, the most optimal model still needs to be developed. | Makes grammatical mistakes after paraphrasing words.
4 | A Novel Approach to Paraphrase Hindi Sentences using NLP | 2019 | Creates paraphrases by applying synonym and antonym replacement. | Synonyms are not properly applied for paraphrasing.
5 | Detecting Paraphrases in Indian Languages based on Gradient Tree Boosting | 2018 | Describes an approach to the paraphrase detection problem in Indian languages that makes use of Gradient Tree Boosting. | The system is developed for Malayalam and Hindi.
6 | Language Independent Paraphrases Detection | 2020 | Presents the NLP-NITMZ system used for the DPIL shared task. Overall, the approach looks promising but needs some improvement. | Requires large memory; slow execution.
7 | An Eccentric Approach for Paraphrase Detection using Semantic Matching and Support Vector Machine | 2020 | The paper proposes a method for detecting paraphrases using semantic matching and SVMs, achieving high accuracy on benchmark datasets. | The paper lacks detailed analysis of limitations, comparison with state-of-the-art methods, feature selection criteria, and computational efficiency, which should be addressed in future research.
8 | Paraphrase Detection in Hindi Language using Syntactic Features of Phrase | 2018 | Detects paraphrases in Hindi based on syntactic features of phrases. | Incorporating semantic features to improve paraphrase detection in Hindi.
9 | The Study and Review of Paraphrase Detection Techniques in Machine Learning | 2020 | Paraphrase detection techniques in machine learning use various models and algorithms to identify whether two sentences or phrases have the same meaning or convey the same message. | Accurate and comprehensive.
10 | Paraphrase Identification on the Basis of Supervised Machine Learning Techniques | 2019 | Paraphrase identification using supervised machine learning involves training models with labeled data to classify whether two sentences or phrases are paraphrases or not. |

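
Several of the surveyed approaches compare two sentences by lexical matching. As a rough illustration of that idea only, the sketch below scores two sentences by Jaccard overlap of their whitespace-separated tokens. Jaccard overlap is a stand-in chosen for this example and is not the method of any paper listed above; the function names and the 0.5 threshold are hypothetical.

```python
def jaccard_similarity(sentence_a, sentence_b):
    """Share of distinct tokens that the two sentences have in common (0.0 to 1.0)."""
    tokens_a = set(sentence_a.split())
    tokens_b = set(sentence_b.split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_paraphrase_candidate(sentence_a, sentence_b, threshold=0.5):
    """Flag a pair as a possible paraphrase; the threshold is an illustrative assumption."""
    return jaccard_similarity(sentence_a, sentence_b) >= threshold
```

A pair that differs only in word order, such as "राम ने फल खाया" and "फल राम ने खाया", scores 1.0. A purely lexical measure like this misses paraphrases that use synonyms, which is why the surveyed work brings in semantic resources such as WordNet.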
SCOPE OF PROJECT

Most existing research has been done for English and for Indian regional languages such as Hindi, Gujarati and Kannada.

However, no work on a paraphrasing tool has yet been done for the Hindi language. This is the first step towards detecting paraphrases in Hindi sentences.

DESIGN METHODOLOGY
PROBLEM STATEMENT

• The problem addressed by this application is the need for a tool that can paraphrase Hindi text and convert it into audio.
• This can be useful for individuals who are visually impaired or have difficulty reading Hindi text. Additionally, it can be used to simplify complex Hindi text, making it easier for non-native speakers to understand.

TECHNOLOGIES USED

• The main technology used in this application is Python 3.10.5.
• We have used a pretrained model.
• GPT-3 pretrained model
• Artificial intelligence
• Natural Language Processing (NLP)

PROPOSED METHODS

• The project code implements a web application that can paraphrase Hindi text using OpenAI's GPT-3. The application takes Hindi text as input and paraphrases it into text with the same meaning but different wording.
• The text is first transcribed (if audio is given as input) and then translated to English using the Helsinki-NLP/opus-mt-mr-en pipeline.
• The resulting English text is cleaned and used as input to the GPT-3 model for paraphrasing. The paraphrased text is then translated back from English using the Helsinki-NLP/opus-mt-en-mr pipeline. Note that "mr" in these model names is the language code for Marathi; for Hindi input, the corresponding Helsinki-NLP models carry the "hi" code.
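
The translate, clean, paraphrase, translate-back flow described above can be sketched as follows. This is a minimal outline under stated assumptions, not the project's code: the translation and text-generation steps are passed in as plain callables (hypothetical parameters `to_english`, `generate`, `from_english`), and the prompt wording is invented for illustration. In the real application those callables would wrap the Helsinki-NLP translation pipelines and a GPT-3 request, neither of which is shown here.

```python
def clean_text(text):
    """Collapse runs of whitespace left over from transcription or translation."""
    return " ".join(text.split())


def build_paraphrase_prompt(english_text):
    """Build the instruction sent to the language model (wording is an assumption)."""
    return "Paraphrase the following text, keeping its meaning:\n" + english_text.strip()


def paraphrase_text(text, to_english, generate, from_english):
    """Translate to English, clean, paraphrase, then translate back.

    to_english, generate and from_english are caller-supplied callables
    (hypothetical names) that each take a string and return a string.
    """
    english = clean_text(to_english(text))
    paraphrased = generate(build_paraphrase_prompt(english))
    return from_english(clean_text(paraphrased))
```

Passing the three stages in as functions keeps the flow testable without network access: simple stand-in functions can replace the translation models and the language model while the ordering of the steps is checked.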

Screen shot of project
RESEARCH PAPER STATUS
CONCLUSION

• The project's main objective was to develop a system that could generate paraphrases of Hindi sentences using deep learning and NLP. A Hindi paraphrasing tool built on a pretrained model can greatly help with language-related tasks in Hindi.
• It generates different ways of saying the same thing, which saves time and makes writing easier. It can be a useful tool for content creators, writers who are stuck, and people who want to improve their vocabulary. Though there are challenges, ongoing research can make the tool even better, benefiting productivity, language learning, and the overall improvement of Hindi language processing.

REFERENCES

• Atoum, I., & Otoom, A. (2016). Efficient hybrid semantic text similarity using WordNet and a corpus. Int. J. Adv. Comput. Sci. Appl, 7(9), 124-130.
• Gunasinghe, U. L. D. N., De Silva, W. A. M., de Silva, N. H. N. D., Perera, A. S., Sashika, W. A. D., & Premasiri, W. D. T. P. (2014, December). Sentence similarity measuring by vector space model. In 2014 14th International Conference on Advances in ICT for Emerging Regions (ICTer) (pp. 185-189). IEEE.
• Sarkar, S., Saha, S., Bentham, J., Pakray, P., Das, D., & Gelbukh, A. F. (2016, January). NLP-NITMZ@DPIL-FIRE2016: Language Independent Paraphrases Detection. In FIRE (Working Notes) (pp. 256-259).
• Fernando, S., & Stevenson, M. (2008, March). A semantic similarity approach to paraphrase detection. In Proceedings of the 11th annual research colloquium of the UK special interest group for computational linguistics (pp. 45-52).
• Kozareva, Z., & Montoyo, A. (2006, August). Paraphrase identification on the basis of supervised machine learning techniques. In International conference on natural language processing (in Finland) (pp. 524-533). Springer, Berlin, Heidelberg.
• Lee, J. C., & Cheah, Y. (2015). Paraphrase detection using string similarity with synonyms. In The Fourth Asian Conference on Information Systems, ACIS.
• Nguyen-Son, H. Q., Miyao, Y., & Echizen, I. (2015, October). Paraphrase detection based on identical phrase and similar word matching. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation (pp. 504-512).
