
Natural Language Processing

NLP (Natural Language Processing) is dedicated to making it possible for computers to comprehend and process human languages. It is a subfield of linguistics, computer science, information engineering, and artificial intelligence that studies how computers interact with human (natural) languages, and in particular how to train computers to handle and analyse massive volumes of natural language data.

Applications of NLP

Most people use NLP applications every day without realising it:

Automatic Summarization – Automatic summarization is useful for gathering data from social media and other online sources, as well as for summarizing the meaning of documents and other written materials.

Sentiment Analysis – Businesses use natural language processing techniques such as sentiment analysis to understand what internet users are saying about their goods and services, and hence what customers require.

Indicators of reputation – Sentiment analysis goes beyond establishing simple polarity; it analyses sentiment in context to help understand what lies behind an expressed opinion. This is very important for understanding and influencing purchasing decisions.

Text classification – Text classification enables you to assign a document to a category and organise it, making it easier to find the information you need or to carry out certain tasks. Spam screening in email is one example of text classification in use.

Virtual Assistants – These days, digital assistants like Google Assistant, Cortana, Siri, and Alexa play a significant role in our lives. Not only can we communicate with them, but they can also make our lives easier.

Chatbots
A chatbot is one of the most widely used NLP applications. Many chatbots on the market today employ a similar approach. Here are a few you can try:

• Mitsuku Bot
https://www.pandorabots.com/mitsuku/

• CleverBot
https://www.cleverbot.com/

• Jabberwacky
http://www.jabberwacky.com/

• Haptik
https://haptik.ai/contact-us

• Rose
http://ec2-54-215-197-164.us-west-1.compute.amazonaws.com/speech.php

• Ochatbot
https://www.ometrics.com/blog/list-of-fun-chatbots/

There are two types of chatbots: script bots and smart-bots.

Script bot                                                    | Smart-bot
Script bots are easy to make                                  | Smart-bots are flexible and powerful
Script bots work around a script which is programmed in them  | Smart-bots work on bigger databases and other resources directly
Mostly they are free and are easy to integrate into a         | Smart-bots learn with more data
messaging platform                                            |
No or little language processing skills                       | Coding is required to take this up on board
Limited functionality                                         | Wide functionality

Human Language vs Computer Language

Humans need language to communicate, and we process it constantly. Our brain continuously processes the sounds it hears around us and works to make sense of them. Even as the teacher is delivering the lesson in the classroom, our brain continuously processes and stores everything.

Computers, on the other hand, understand computer language. All input must be converted to numbers before being sent to the machine, and if a single error is made while typing, the machine throws an error and does not process that part. Machines communicate only in extremely simple and elementary forms.
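The point that all input must become numbers can be seen directly in Python (used here purely for illustration): every character a computer stores is ultimately a number.

```python
text = "Hi"
# Unicode code points that represent each character
print([ord(ch) for ch in text])       # [72, 105]
# The bytes actually stored when the text is UTF-8 encoded
print(list(text.encode("utf-8")))     # [72, 105]
```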

Data Processing

Data processing is the manipulation of data: the conversion of raw data into meaningful, machine-readable information.

Since human languages are complex, we first need to simplify them so that a machine can understand them. Text normalisation helps clean up textual data and brings it down to a level where its complexity is lower than that of the raw data. Let us go through text normalisation in detail.

Text Normalisation

The process of converting a text into a canonical (standard) form is known as text normalisation. For instance, the words “gooood” and “gud” can both be reduced to their canonical form “good.” Another illustration is the reduction of nearly identical terms such as “stopwords,” “stop-words,” and “stop words” to just “stopwords.”

Sentence Segmentation

Under sentence segmentation, the whole corpus is divided into sentences. Each sentence is then treated as a separate piece of data, so the whole corpus is reduced to a list of sentences.
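As a minimal sketch (Python used for illustration), sentences can be segmented by splitting on sentence-ending punctuation; real segmenters also handle complications such as abbreviations like “Dr.” or “Rs.”.

```python
import re

def segment_sentences(corpus):
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', corpus.strip())
    return [s for s in sentences if s]

corpus = "Our class is learning NLP. It is fun! Do you like it?"
print(segment_sentences(corpus))
# ['Our class is learning NLP.', 'It is fun!', 'Do you like it?']
```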

Tokenisation

After segmentation, each sentence is further divided into tokens. A token is any word, number, or special character that appears in a sentence. Tokenisation treats each word, number, and special character as a separate entity and creates a token for each of them.
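A simple tokeniser in this spirit can be sketched with a regular expression that keeps words, numbers, and single special characters as separate tokens (one illustrative approach among many):

```python
import re

def tokenize(sentence):
    # \w+ matches runs of letters/digits; [^\w\s] matches one special character.
    return re.findall(r"\w+|[^\w\s]", sentence)

print(tokenize("Raj's cycle costs Rs. 500!"))
# ['Raj', "'", 's', 'cycle', 'costs', 'Rs', '.', '500', '!']
```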

Removing Stopwords, Special Characters and Numbers

In this step, tokens that are not necessary are removed from the token list. Which words might we not require?
Stopwords are words that occur very frequently in a corpus but add little meaning. Humans use grammar to make their sentences clear and understandable to the other person, but such grammatical terms fall under the category of stopwords because they do not add significance to the information that the statement conveys. Stopwords include a, an, and, or, for, it, is, etc.
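A minimal sketch of stopword removal, assuming a small hand-made stopword list (real toolkits such as NLTK ship much longer lists):

```python
# Illustrative stopword list; real lists contain well over a hundred words.
STOPWORDS = {"a", "an", "and", "or", "for", "it", "is", "the", "to", "of"}

def remove_stopwords(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(remove_stopwords(["NLP", "is", "a", "field", "of", "AI"]))
# ['NLP', 'field', 'AI']
```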

Converting text to a common case

After eliminating the stopwords, we convert the whole text to a common case, preferably lower case. This ensures that the machine’s case-sensitivity does not treat similar terms as different solely because of varied case usage.
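In code, case folding is a one-line step; without it, a machine would count “NLP”, “Nlp”, and “nlp” as three different tokens:

```python
tokens = ["NLP", "Nlp", "nlp"]
lowered = [t.lower() for t in tokens]
print(lowered)            # ['nlp', 'nlp', 'nlp']
print(len(set(lowered)))  # 1 distinct term instead of 3
```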

Stemming

The remaining words are reduced to their root words in this step. In other words, stemming is the process of stripping words of their affixes; note that the resulting stem is not always a meaningful word.
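A crude illustrative stemmer that strips a fixed list of suffixes (real systems use carefully designed rule sets such as the Porter stemmer; notice that the stems need not be dictionary words):

```python
def stem(word, suffixes=("ing", "ly", "ed", "es", "s")):
    # Strip the first matching suffix, keeping at least a 3-letter stem.
    for suf in suffixes:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

for w in ["caring", "studies", "healed"]:
    print(w, "->", stem(w))
# caring -> car, studies -> studi, healed -> heal
```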

Lemmatization

Stemming and lemmatization are alternative techniques to one another, because both work by removing affixes. However, lemmatization differs in that the word resulting from the removal of the affix (known as the lemma) is always meaningful.
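Lemmatization can be sketched as a dictionary lookup, using a toy lemma table (real lemmatizers, e.g. those based on WordNet, rely on full morphological lexicons):

```python
# Toy lemma dictionary; the entries here are illustrative.
LEMMAS = {"studies": "study", "tried": "try", "caring": "care"}

def lemmatize(word):
    # Unlike a stem, the returned lemma is always a meaningful word.
    return LEMMAS.get(word, word)

print([lemmatize(w) for w in ["studies", "tried", "caring"]])
# ['study', 'try', 'care']
```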

Bag of Words

A bag of words is a representation of text that describes the occurrence of words within a document, disregarding word order. It has two components: a vocabulary of known words, and a measure of the presence of those known words.

Bag of Words is a Natural Language Processing model that helps extract features from text which machine learning techniques can then use. We collect the occurrences of each term from the bag of words and build the corpus’s vocabulary from them.
Bag of Words is used in:

• Document Classification – helps in classifying the type and genre of a document.
• Topic Modelling – helps in predicting the topic for a corpus.
• Information Retrieval System – to extract the important information out of a corpus.
• Stop word filtering – helps in removing the unnecessary words out of a text body.

Here is the step-by-step approach to implement the bag of words algorithm:

1. Text Normalisation: collect the data and pre-process it.
2. Create Dictionary: make a list of all the unique words occurring in the corpus (the vocabulary).
3. Create document vectors: for each document in the corpus, count how many times each word from the unique list occurs in it.
4. Repeat step 3 to create document vectors for all the documents.
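The four steps above can be sketched as follows (Python used for illustration; the example sentences are made up):

```python
def bag_of_words(documents):
    # Step 1: normalise - lowercase each document and split into tokens.
    docs_tokens = [doc.lower().split() for doc in documents]
    # Step 2: dictionary of unique words (the vocabulary).
    vocab = sorted({tok for tokens in docs_tokens for tok in tokens})
    # Steps 3-4: one count vector per document, in vocabulary order.
    vectors = [[tokens.count(word) for word in vocab] for tokens in docs_tokens]
    return vocab, vectors

docs = ["aman and anil are stressed", "aman went to a therapist"]
vocab, vectors = bag_of_words(docs)
print(vocab)
# ['a', 'aman', 'and', 'anil', 'are', 'stressed', 'therapist', 'to', 'went']
print(vectors)
# [[0, 1, 1, 1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 1, 1]]
```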

Term Frequency

The measurement of how frequently a term appears in a document is called term frequency. The simplest calculation is to count the instances of each word. However, there are ways to adjust that value, for example based on the length of the document or on the frequency of the term that appears most often.
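A minimal term-frequency computation, here normalised by document length (one of the adjustments mentioned above):

```python
def term_frequency(tokens):
    # Count each term, then divide by the document length.
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    length = len(tokens)
    return {tok: c / length for tok, c in counts.items()}

print(term_frequency("the cat sat on the mat".split()))
# 'the' appears 2 times out of 6 tokens -> 2/6
```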

Inverse Document Frequency

A term’s inverse document frequency measures how common or rare it is across a corpus of documents. It is calculated by dividing the total number of documents in the corpus by the number of documents that contain the term.
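A sketch of the ratio described above (many toolkits additionally take the logarithm of this ratio, i.e. log(N/df)):

```python
def inverse_document_frequency(term, documents):
    # Ratio of total documents to documents containing the term.
    containing = sum(1 for doc in documents if term in doc.split())
    return len(documents) / containing if containing else 0.0

docs = ["aman and anil are stressed",
        "aman went to a therapist",
        "anil went to download a health chatbot"]
print(inverse_document_frequency("aman", docs))     # 3/2 = 1.5 (common word)
print(inverse_document_frequency("chatbot", docs))  # 3/1 = 3.0 (rare word)
```

Note how a term appearing in fewer documents gets a larger value, marking it as more distinctive.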
