Laboratory Practice VI Natural Language Processing

The document outlines a mini project on developing a sentence autocompletion model using Natural Language Processing (NLP) techniques. The project aims to enhance user experience by providing automatic suggestions for frequently asked questions in customer service interactions. It details the dataset used, the implementation of TF-IDF for text processing, and the results demonstrating the effectiveness of the autocomplete feature in improving typing efficiency.

Uploaded by

Bhushan Mahajan

Laboratory Practice VI

Natural Language Processing

Mini Project
Sentence Autocompletion

Group Members
Omkar Jagtap (19CO033)
Shreya Jagtap (19CO035)
Abhishek Mulik (19CO051)
Tanvi Paigude (19CO066)

Class: BE Computer A

Faculty:
Aim:
To develop a sentence autocompletion model.

Abstract:
Imagine that you are a representative replying to customers online and you are
asking the same questions over and over. Would you like to get automatic
suggestions instead of typing the same thing again and again?

An autocomplete feature can be helpful, faster, and more convenient, and it can
correct grammatical or spelling errors at the same time.

Introduction:
Autocomplete is a user interface function in which an application predicts a
word or phrase that the user intends to type, without the user having to type
it entirely.
In modern applications, word completion (also called autocomplete or
autosuggest) is a popular user interface feature. Its aim is to predict what
the user wants to type and to add sections of the text automatically.
By offering the available options, it aims to speed up typing, assist those
with typing difficulties, correct or prevent spelling errors, and support
information retrieval. Witten and Darragh's work on the Reactive Keyboard from
1983 may be the earliest example of the concept. Several other methods have
been proposed since then, but the basic concept has remained the same.
The feature appears in word processors (MS Word, OpenOffice.org), programming
editors (Emacs, Eclipse), desktop applications (web browsers, e-mail clients),
HTML form elements on websites, web applications (Google Suggest, web-based
e-mail clients), mobile phone interfaces, Unix terminals, and so on.

Whenever you search for something on Google, after you type 2-3 letters it
suggests likely search terms. If you search for something with typos, it
corrects them and still finds relevant results for you. Isn't it amazing?

It is something we use every day, yet we rarely pay much attention to it. It
is an important application of natural language processing and a good
illustration of what NLP means for many people around the world, including you
and me. Search autocomplete and autocorrect both help us find the right
results more efficiently. Other sites, such as Facebook and Quora, have also
begun using this feature.

Dataset
The file contains 22K conversations between a customer and a representative.
For the purpose of this project, we are only interested in completing the
threads of the representative.
Data Selection and Cleaning:
The data is first split to separate the threads of the customer from those of
the representative. The sentences are then separated based on punctuation (we
keep the punctuation), the resulting text is cleaned up with some light regex,
and only sentences longer than 1 word are kept.
Finally, since the representative tends to ask the same questions over and
over again, the autocomplete is extremely useful because it can suggest a
complete sentence. In our case, we count the number of occurrences of each
sentence so that we can use it as a feature later, and then delete the
duplicates.
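
The cleaning steps above can be sketched in Python as follows. This is a
minimal illustration, not the project's actual code: the function names, the
regular expressions and the sample messages are all assumptions made for the
example, since the report does not list the exact rules used.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    # Split on whitespace that directly follows '.', '!' or '?'.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def clean(sentence):
    """Light cleanup: collapse whitespace, drop spaces before punctuation."""
    sentence = re.sub(r"\s+", " ", sentence).strip()
    return re.sub(r"\s+([.!?,])", r"\1", sentence)


def build_sentence_counts(messages):
    """Return {sentence: occurrences} for sentences longer than one word."""
    counts = Counter()
    for message in messages:
        for sentence in split_sentences(message):
            sentence = clean(sentence)
            if len(sentence.split()) > 1:
                counts[sentence] += 1
    return counts


# Hypothetical representative messages, made up for illustration.
rep_messages = [
    "Hello! How can I help you today?",
    "How can I help you today ?  Please share your order number.",
    "Thanks. Please share your order number.",
]
sentence_counts = build_sentence_counts(rep_messages)
for sentence, count in sentence_counts.items():
    print(count, sentence)
```

Storing the sentences as keys of a counter removes the duplicates and keeps
the occurrence count in a single step, so the count stays available as a
feature for ranking.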

Implementation:
Generate TF-IDF vectorizer:
In information retrieval, tf-idf or TFIDF, short for term frequency-inverse
document frequency, is a numerical statistic intended to reflect how important
a word is to a document in a collection or corpus. It is most commonly used as
a weighting factor in information retrieval and text mining. The tf-idf value
increases in proportion to the number of times a term appears in the document
and is offset by the number of documents in the corpus that contain the term,
which helps to adjust for the fact that some words appear more frequently in
general. The TF-IDF weight represents the relative importance of a term within
the document and the whole corpus.

TF (Term Frequency):
It measures how often a term appears in a document. Since document sizes vary,
a term may appear more often in a long document than in a short one. For this
reason, the term frequency is often divided by the length of the document.

IDF (Inverse Document Frequency):
When a word appears in all the documents, it is of little use for telling them
apart. The words "the," "an," "on," "of," and "a" are just a few examples.
They appear often in a text but are of minor significance. IDF reduces the
importance of these terms and increases the importance of uncommon terms. The
higher the IDF value, the more distinctive the term.

Term Frequency-Inverse Document Frequency:
TF-IDF works by penalizing the most commonly occurring words, assigning them
less weight, while giving a high weight to terms that are present in only a
small subset of the corpus and occur frequently in a particular document. It
is the product of Term Frequency and Inverse Document Frequency.
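
As a rough illustration of the definitions above, the sketch below computes
TF-IDF by hand in plain Python. It assumes the textbook form, where TF is the
term count divided by the document length and IDF is log(N / df). Libraries
often use smoothed variants, and the report does not say which one the project
used. The function names and sample documents are made up for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def term_frequency(tokens):
    """Count of each term divided by the document length."""
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}


def inverse_document_frequency(docs_tokens):
    """log(N / df): terms found in fewer documents get a higher value."""
    n_docs = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tf_idf(docs):
    """Return one {term: tf * idf} dictionary per document."""
    docs_tokens = [tokenize(d) for d in docs]
    idf = inverse_document_frequency(docs_tokens)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in docs_tokens
    ]


# Hypothetical documents, made up for illustration.
docs = [
    "what is the order number",
    "what is the issue",
    "the refund is on the way",
]
weights = tf_idf(docs)
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:8s} {weight:.4f}")
```

In this toy corpus "the" and "is" occur in every document, so their IDF and
therefore their weight is zero, while "order" and "number", which occur in
only one document, receive the highest weights in the first document.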
Use of Ranking Function
The autocomplete calculates the similarity between each sentence in the data
and the prefix of the sentence typed by the representative. As a weighting
feature, we chose to reorder similar sentences by how frequently they occur.
Cosine similarity is the simplest way to measure this similarity. In this
technique, for each sentence, we add up the tf-idf values of the query tokens.
For example, if the query is "hello world", we check whether these words exist
in each sentence and, where they do, calculate a similarity score using a
linear kernel. We then sort the scores and take the top 3 sentences.
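
The ranking step can be sketched as below, using only the Python standard
library. Several details are assumptions rather than the project's confirmed
method: the smoothed IDF formula, the punctuation stripping, the helper names,
the sample sentences and counts, and in particular the way frequency is
combined with similarity (here it only breaks ties). With unit-length vectors,
the linear kernel (dot product) gives the cosine similarity.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase each word and strip surrounding punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


def fit_idf(sentences):
    """Smoothed IDF (an assumption) so that no known term gets weight 0."""
    n = len(sentences)
    df = Counter(t for s in sentences for t in set(tokenize(s)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def vectorize(text, idf):
    """Unit-length tf-idf vector as a dict; unknown tokens are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def linear_kernel(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def suggest(prefix, sentence_counts, top_k=3):
    """Return up to top_k sentences most similar to the typed prefix."""
    sentences = list(sentence_counts)
    idf = fit_idf(sentences)
    query = vectorize(prefix, idf)
    scored = []
    for sentence in sentences:
        score = linear_kernel(query, vectorize(sentence, idf))
        if score > 0:
            scored.append((score, sentence_counts[sentence], sentence))
    # Assumption: rank by similarity, then by frequency to break ties.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [sentence for _, _, sentence in scored[:top_k]]


# Hypothetical deduplicated sentences with their occurrence counts.
sentence_counts = {
    "How can I help you today?": 12,
    "How can I help you with your order?": 5,
    "Please share your order number.": 9,
    "Thank you for contacting us.": 20,
}
suggestions = suggest("how can i", sentence_counts)
print(suggestions)
```

Sentences that share no token with the prefix score zero and are dropped, so
a prefix made only of unseen words returns no suggestions. This sketch matches
whole tokens only; handling a partially typed last word would need extra
logic, such as character n-grams.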
Results:
Conclusion:
In our project we have successfully implemented autocomplete using NLP.
As seen, NLP is an important part of our lives, and autocomplete is one of its
applications that is of real help to us and has made typing easier.
This project can also help save a lot of time and effort in the commercial
world.
