
Applications of NLP

Tushar B. Kute,
http://tusharkute.com
NLP Applications
NLP Applications

• Text Communication and Interaction:


• Machine Translation: Automatically translating text
between languages, breaking down language barriers.
• Chatbots and Virtual Assistants: Developing chatbots that
can understand and respond to user queries in a
conversational way, or virtual assistants that can perform
tasks based on spoken instructions.
• Text Classification and Spam Filtering: Categorizing text
data like emails or social media posts (e.g., spam detection,
sentiment analysis).
• Text Summarization: Automatically generating concise
summaries of lengthy documents or articles.
NLP Applications

• Language Understanding and Analysis:


– Sentiment Analysis: Extracting sentiment or opinion
(positive, negative, neutral) from text data like reviews,
social media posts, or surveys.
– Named Entity Recognition (NER): Identifying and
classifying named entities in text, such as people,
organizations, locations, dates, monetary values, etc.
– Topic Modeling: Discovering underlying thematic
structures in large collections of documents.
– Part-of-Speech (POS) Tagging: Assigning grammatical
labels (e.g., noun, verb, adjective) to each word in a
sentence for deeper syntactic analysis.
NLP Applications

• Content Creation and Text Generation:


– Machine Writing: Generating different creative text
formats like poems, code, scripts, emails, or
marketing copy, based on specific styles or
instructions.
– Text Paraphrasing and Re-writing: Rewriting
sentences or passages while preserving the meaning
but using different wording or sentence structures.
– Automatic Text Summarization: Creating shorter
versions of lengthy documents or articles that
capture the main points.
NLP Applications

• Additional Applications:
• Speech Recognition: Converting spoken language into text
format, enabling voice-enabled applications.
• Text-to-Speech (TTS): Converting written text into spoken
language for applications like audiobooks or assistive
technologies.
• Optical Character Recognition (OCR): Extracting text from
images or scanned documents.
• Author Identification: Identifying the author of a text based
on stylistic patterns.
• Information Retrieval: Finding relevant documents or
information from large collections of text data.
Information retrieval

• Information retrieval (IR) is the field of computer


science concerned with finding and accessing
information from large collections of data, typically
text-based.
• It's the foundation for many technologies you use
every day, like search engines and library catalogs.
– The primary goal of IR is to identify and deliver
information items (documents, web pages, etc.)
that are relevant to a user's information need.
This need is often expressed as a search query.
Information retrieval

• Process:
– User Query: The user submits a query that
specifies their information need.
– This query can be a simple keyword search or a
more complex phrase expressing a specific topic
or question.
Information retrieval

• Retrieval Process:
• The IR system retrieves a set of documents or data items that
are potentially relevant to the query. This might involve
techniques like:
– Indexing: Preprocessing and storing information about
documents in a structured way to facilitate efficient retrieval.
– Matching: Comparing the user's query with the indexed
information to identify documents with a high degree of
relevance. Different matching algorithms can be used based
on keywords, phrases, or semantic similarity.
– Ranking: Ranking the retrieved documents based on their
estimated relevance to the user's query. This ranking helps
users prioritize which documents to examine first.
Information retrieval

• Evaluation:
– The effectiveness of an IR system is often
evaluated by metrics like precision (proportion
of retrieved documents that are relevant) and
recall (proportion of relevant documents that
are retrieved).
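
A minimal sketch of these two metrics in Python; the document IDs are illustrative, not from the slides:

    def precision(retrieved, relevant):
        # Proportion of retrieved documents that are relevant.
        return len(set(retrieved) & set(relevant)) / len(retrieved)

    def recall(retrieved, relevant):
        # Proportion of relevant documents that were retrieved.
        return len(set(retrieved) & set(relevant)) / len(relevant)

    retrieved = ["d1", "d2", "d3", "d4"]   # what the system returned
    relevant = ["d1", "d3", "d7"]          # ground-truth relevant set
    print(precision(retrieved, relevant))  # 2/4 = 0.5
    print(recall(retrieved, relevant))     # 2/3 ≈ 0.67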
Information retrieval

• Applications:
• Web Search Engines: Tools like Google, Bing, and DuckDuckGo
use IR techniques to crawl and index the web, enabling users to
find relevant information through search queries.
• Library Catalogs: Online library catalogs utilize IR to help users
search for books, articles, and other library resources based on
keywords, titles, authors, or other criteria.
• Email Search: Search functionalities within email applications
rely on IR techniques to find specific emails based on keywords
or senders/recipients.
• E-commerce Product Search: Product search on e-commerce
websites uses IR to match user queries with product
descriptions, specifications, and attributes.
Vector Space Model

• The Vector Space Model (VSM) is a fundamental and


widely used technique in information retrieval (IR)
for representing documents and queries as vectors
in a high-dimensional space.
• VSM represents documents and queries as vectors
in a multi-dimensional space where each dimension
corresponds to a unique term in the vocabulary of
all documents in the collection.
• The weight or value associated with each term in a
document's vector reflects the term's importance or
relevance to that document.
Vector Space Model

• Documents and queries are represented as rows


in a term-document matrix.
• Each column represents a unique term.
• The value at a specific row (document) and
column (term) intersection indicates the weight
of that term within that document.
Vector Space Model

• Term Frequency (TF): The raw frequency of a term's


occurrence within a document. Higher frequency suggests
more relevance.
• Inverse Document Frequency (IDF): Considers the term's
overall importance across the entire document collection.
Terms appearing in many documents have lower IDF
weights, while terms specific to a few documents have
higher weights.
• TF-IDF: Combines TF and IDF, giving more weight to terms
that are frequent within a document but rare across the
entire collection. This helps focus on terms that are
distinctive and informative for that specific document.
Bag of Words
Bag of words

• Bag of words is a Natural Language Processing technique for text modelling.
• In technical terms, we can say that it is a method
of feature extraction with text data.
• This approach is a simple and flexible way of
extracting features from documents.
Bag of words

• A bag of words is a representation of text that


describes the occurrence of words within a document.
• We just keep track of word counts and disregard the
grammatical details and the word order.
• It is called a “bag” of words because any information
about the order or structure of words in the
document is discarded.
• The model is only concerned with whether known
words occur in the document, not where in the
document.
Bag of words: Why?

• One of the biggest problems with text is that it is messy and unstructured, while machine learning algorithms prefer structured, well-defined, fixed-length inputs. By using the Bag-of-Words technique we can convert variable-length texts into fixed-length vectors.
• Also, at a more granular level, machine learning models work with numerical rather than textual data.
• So, to be more specific, by using the bag-of-words (BoW) technique, we convert a text into its equivalent vector of numbers.
Bag of words: Example

• Sentences:
– The quick brown fox jumps over the lazy dog.
– The cat chases the mouse and it squeaks
loudly.
Bag of words: Example
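
A minimal sketch reproducing the word-count vectors for the two sentences above, assuming scikit-learn is available:

    from sklearn.feature_extraction.text import CountVectorizer

    sentences = [
        "The quick brown fox jumps over the lazy dog.",
        "The cat chases the mouse and it squeaks loudly.",
    ]
    vectorizer = CountVectorizer()
    bow = vectorizer.fit_transform(sentences)
    print(vectorizer.get_feature_names_out())  # the combined vocabulary
    print(bow.toarray())                       # one fixed-length count vector per sentence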
N-grams

• What are n-grams, and why do we use them? Let us understand this with the example below.

• Sentence 1: “This is a good job. I will not miss it for


anything”

• Sentence 2: “This is not good at all”


N-grams

• For this example, let us take a vocabulary of only five words:
– good
– job
– miss
– not
– all
• So, the respective vectors for these sentences are:
“This is a good job. I will not miss it for anything” = [1,1,1,1,0]
“This is not good at all” = [1,0,0,1,1]
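
These two vectors can be reproduced with a fixed vocabulary and binary (presence/absence) counting; a small sketch with scikit-learn:

    from sklearn.feature_extraction.text import CountVectorizer

    vocabulary = ["good", "job", "miss", "not", "all"]
    vectorizer = CountVectorizer(vocabulary=vocabulary, binary=True)
    vectors = vectorizer.fit_transform([
        "This is a good job. I will not miss it for anything",
        "This is not good at all",
    ])
    print(vectors.toarray())  # [[1 1 1 1 0]
                              #  [1 0 0 1 1]]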
N-grams

• Can you guess what the problem is here? Sentence 2 is a negative sentence and sentence 1 is a positive sentence. Does this reflect in any way in the vectors above? Not at all.
• So how can we solve this problem? Here come the N-
grams to our rescue.
• An N-gram is an N-token sequence of words: a 2-gram
(more commonly called a bigram) is a two-word
sequence of words like “really good”, “not good”, or
“your homework”, and a 3-gram (more commonly called a
trigram) is a three-word sequence of words like “not at
all”, or “turn off light”.
N-grams

• For example, the bigrams in sentence 2 from the previous section, “This is not good at all”, are as follows:
– “This is”
– “is not”
– “not good”
– “good at”
– “at all”
• Now, if instead of using just words in the above example we use bigrams (a bag of bigrams) as shown above, the model can differentiate between sentence 1 and sentence 2.
• So, using bigrams makes tokens more informative (for example, “HSR Layout” in Bengaluru is more informative than “HSR” and “layout” separately).
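
A short sketch producing both the bigram list and bag-of-bigram features, assuming NLTK and scikit-learn are installed:

    from nltk import bigrams, word_tokenize  # requires nltk.download('punkt')

    tokens = word_tokenize("This is not good at all")
    print(list(bigrams(tokens)))
    # [('This', 'is'), ('is', 'not'), ('not', 'good'), ('good', 'at'), ('at', 'all')]

    from sklearn.feature_extraction.text import CountVectorizer

    vec = CountVectorizer(ngram_range=(2, 2))  # a bag of bigrams
    vec.fit(["This is not good at all"])
    print(vec.get_feature_names_out())
    # ['at all' 'good at' 'is not' 'not good' 'this is']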
The TF-IDF Vectorizer

• The TF*IDF algorithm is used to weigh a keyword in


any document and assign the importance to that
keyword based on the number of times it appears
in the document.
• Put simply, the higher the TF*IDF score (weight) of a term in a document, the more distinctive and important that term is for that document, and vice versa.
• Each word or term has its respective TF and IDF
score. The product of the TF and IDF scores of a
term is called the TF*IDF weight of that term.
The TF-IDF Vectorizer

• The TF (term frequency) of a word is the number of times


it appears in a document. When you know it, you’re able
to see if you’re using a term too often or too
infrequently.
– TF(t) = (Number of times term t appears in a
document) / (Total number of terms in the document).
• The IDF (inverse document frequency) of a word is the
measure of how significant that term is in the whole
corpus.
– IDF(t) = log_e(Total number of documents / Number
of documents with term t in it).
The TF-IDF Vectorizer
Example:

• 1. It was a beautiful rainy day that made my whole day awesome.
• 2. We made it awesome by adding more flavors on
that day.
Example:
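
A sketch computing the TF-IDF weights for these two sentences with scikit-learn. Note that scikit-learn uses a smoothed IDF variant (log((1+N)/(1+df)) + 1) rather than the plain log(N/df) given above, so the exact numbers differ, but the ranking intuition is the same:

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "It was a beautiful rainy day that made my whole day awesome.",
        "We made it awesome by adding more flavors on that day.",
    ]
    vectorizer = TfidfVectorizer()
    weights = vectorizer.fit_transform(docs)
    # Weights for document 1: words unique to it (e.g. "beautiful", "rainy")
    # outweigh words shared with document 2 (e.g. "made", "awesome").
    for term, score in zip(vectorizer.get_feature_names_out(), weights.toarray()[0]):
        if score > 0:
            print(f"{term}: {score:.3f}")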
Vector Space Model

• Document Similarity:
– Once documents and queries are represented as
vectors, VSM calculates the similarity between
them.
– Common similarity measures include cosine
similarity, which considers the angle between
the two vectors in the high-dimensional space. A
higher cosine similarity score indicates a closer
semantic relationship between the document
and the query.
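
A minimal sketch of this matching step: vectorize documents and query with TF-IDF, then score the query against each document with cosine similarity (the texts are illustrative):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "Machine translation breaks down language barriers.",
        "Search engines retrieve relevant documents for a query.",
    ]
    query = ["retrieve documents relevant to a search query"]

    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform(query)
    print(cosine_similarity(query_vector, doc_vectors))
    # The second document scores higher: it lies closer to the query in vector space.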
Information Extraction using sequence labelling

• Information extraction (IE) using sequence labeling


is a powerful technique for automatically extracting
specific pieces of information from text data.
• Sequence labeling models process text data one
element (word or character) at a time, predicting a
label for each element that indicates its role in the
information you want to extract.
Information Extraction using sequence labelling

• Process:
– Data Preparation:
• Define the information you want to extract
(e.g., names of people, locations,
organizations, dates).
• Annotate a training dataset where each word
or character in a sentence is labeled with its
corresponding role (e.g., "B-PER" for the
beginning of a person's name, "I-PER" for the
middle of a person's name).
Information Extraction using sequence labelling

• Sequence Labeling Model:


– Popular choices include:
• Bidirectional Long Short-Term Memory (BiLSTM): A
recurrent neural network (RNN) architecture that can
capture contextual information from both directions
of the sentence.
• Conditional Random Fields (CRFs): Probabilistic
graphical models that consider dependencies between
labels for consecutive elements in the sequence.
– The model is trained on the annotated data, learning to
predict the correct label for each element in a new
unseen sentence.
Information Extraction using sequence labelling

• Example:
– Sentence: "Barack Obama, the former president
of the United States, visited Paris."
– Labels: "B-PER Barack I-PER Obama B-TITLE
president I-TITLE of the I-ORG United I-ORG
States I-LOC Paris."
– Extracted Information: Person: Barack Obama,
Title: president of the United States, Location:
Paris
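
In practice, a pretrained model can produce this kind of output without training your own sequence labeller. A sketch using NLTK's built-in chunker; note its label set (PERSON, GPE, ...) differs from the BIO tags above, and it has no TITLE category:

    import nltk
    # Requires: nltk.download() of 'punkt', 'averaged_perceptron_tagger',
    # 'maxent_ne_chunker', and 'words'.

    sentence = "Barack Obama, the former president of the United States, visited Paris."
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)   # POS tags feed the chunker
    tree = nltk.ne_chunk(tagged)    # named-entity chunk tree

    for subtree in tree:
        if hasattr(subtree, "label"):   # entity chunks are subtrees
            entity = " ".join(token for token, pos in subtree.leaves())
            print(subtree.label(), "->", entity)
    # Typical output along the lines of:
    # PERSON -> Barack Obama, GPE -> United States, GPE -> Paris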
Information Extraction using sequence labelling

• Applications:
– Named Entity Recognition (NER): Identifying and
classifying named entities like people,
organizations, locations, dates, etc.
– Relation Extraction: Extracting relationships
between entities (e.g., "works at", "located in").
– Event Extraction: Identifying and classifying events
described in text (e.g., "protest", "financial
transaction").
– Question Answering: Extracting answers to specific
questions from factual text data.
Question answering system

• Question answering (QA) systems are computer


programs designed to automatically answer
questions posed in natural language.
• They aim to bridge the gap between humans
and information retrieval by providing more
direct and user-friendly access to knowledge.
Question answering system

• Users submit questions in natural language


(e.g., "What is the capital of France?").
• The system understands the question's intent
and retrieves relevant information from a
knowledge base or vast collection of text data.
• The system processes the information and
generates an answer that directly addresses the
user's query.
QA System : Approaches

• Retrieval-Based QA: Focuses on finding documents or


passages containing the answer to the question. This
might involve information retrieval techniques and
keyword matching.
• Knowledge-Based QA: Leverages a structured knowledge
base containing information about entities, relationships,
and facts. The system queries the knowledge base to find
answers based on the question's meaning.
• Generative QA: Utilizes natural language generation
techniques to formulate an answer directly, even if the
answer isn't explicitly stated in the knowledge base. This
often involves deep learning models.
QA System : Applications

• Search Engines: Many search engines incorporate QA


functionalities to provide more user-friendly and
informative answers to search queries.
• Virtual Assistants: Chatbots and virtual assistants leverage
QA systems to answer user questions and complete tasks
based on natural language instructions.
• Education and Training: Educational platforms can utilize
QA systems to provide immediate feedback and answer
student questions on various topics.
• Customer Service: QA systems can be integrated into
customer service chatbots to answer frequently asked
questions and provide support.
Text Classification / Categorization

• Text Classification is the process of labeling or organizing text data into groups.
• It forms a fundamental part of Natural Language Processing. In the digital age we live in, we are surrounded by text: on our social media accounts, in commercials, on websites, in e-books, etc.
• The majority of this text data is unstructured, so classifying it can be extremely useful.
Text Classification
Text Classification: Applications

• Spam detection in emails


• Sentiment analysis of online reviews
• Topic labeling documents like research papers
• Language detection like in Google Translate
• Age/gender identification of anonymous users
• Tagging online content
• Speech recognition used in virtual assistants like
Siri and Alexa
Rule Based Approach

• These approaches make use of handcrafted linguistic


rules to classify text.
• One way to group text is to create a list of words related to a certain category and then judge the text based on the occurrences of these words.
• For example, words like “fur”, “feathers”, “claws”, and
“scales” could help a zoologist identify texts talking
about animals online.
• These approaches require a lot of domain knowledge
to be extensive, take a lot of time to compile, and are
difficult to scale.
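
The zoologist example boils down to a keyword-set lookup; a toy sketch (the threshold is a free choice, not from the original):

    # Keyword list for the "animals" category, from the slide above.
    ANIMAL_WORDS = {"fur", "feathers", "claws", "scales"}

    def is_about_animals(text, threshold=1):
        # Judge the text by how many category keywords it contains.
        tokens = set(text.lower().split())
        return len(tokens & ANIMAL_WORDS) >= threshold

    print(is_about_animals("Its claws gripped the bark while its fur bristled"))  # True
    print(is_about_animals("The stock market closed higher today"))               # False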
Machine Learning Approach
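
With the machine learning approach, a classifier learns word weights from labelled examples instead of relying on handcrafted rules. A minimal sketch using a scikit-learn pipeline; the tiny spam/ham dataset is invented for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["win a free prize now", "meeting rescheduled to monday",
             "free cash offer", "project report attached"]
    labels = ["spam", "ham", "spam", "ham"]

    # Bag-of-words features feeding a Naive Bayes classifier.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    print(model.predict(["claim your free prize"]))  # likely ['spam']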
Text Summarization

• Text summarization is the process of generating a short, fluent, and, most importantly, accurate summary of a longer text document.
• The main idea behind automatic text summarization
is to be able to find a short subset of the most
essential information from the entire set and
present it in a human-readable format.
• As online textual data grows, automatic text
summarization methods have the potential to be
very helpful because more useful information can be
read in a short time.
Text Summarization
Why Auto Text Summarization?

• Summaries reduce reading time.


• When researching documents, summaries make the
selection process easier.
• Automatic summarization improves the effectiveness of
indexing.
• Automatic summarization algorithms are less biased than
human summarization.
• Personalized summaries are useful in question-answering
systems as they provide personalized information.
• Using automatic or semi-automatic summarization systems
enables commercial abstract services to increase the number
of text documents they are able to process.
Text Summarization Types
Text Summarization

• Based on input type:


– Single Document, where the input length is
short. Many of the early summarization
systems dealt with single-document
summarization.
– Multi-Document, where the input can be
arbitrarily long.
Text Summarization

• Based on the purpose:


– Generic, where the model makes no assumptions about the
domain or content of the text to be summarized and treats
all inputs as homogeneous. The majority of the work that
has been done revolves around generic summarization.
– Domain-specific, where the model uses domain-specific
knowledge to form a more accurate summary. For example,
summarizing research papers of a specific domain,
biomedical documents, etc.
– Query-based, where the summary only contains information
that answers natural language questions about the input
text.
Text Summarization

• Based on output type:


– Extractive, where important sentences are selected
from the input text to form a summary. Most
summarization approaches today are extractive in
nature.
– Abstractive, where the model forms its own phrases
and sentences to offer a more coherent summary,
like what a human would generate. This approach is
definitely more appealing, but much more difficult
than extractive summarization.
TextRank Algorithm

• TextRank is an extractive summarization technique.
• It is a graph-based ranking algorithm inspired by PageRank: each sentence is a node, and edges between sentences are weighted by how similar the sentences are (for example, by the words they share).
• Based on this graph, the algorithm assigns a score to each sentence in the text. The top-ranked sentences make it into the summary.
TextRank Algorithm
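
A simplified sketch of this idea, assuming scikit-learn, NumPy, and NLTK's punkt tokenizer are installed: build a sentence-similarity graph over TF-IDF vectors and rank sentences with a PageRank-style power iteration.

    import numpy as np
    from nltk.tokenize import sent_tokenize  # requires nltk.download('punkt')
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
        sentences = sent_tokenize(text)
        tfidf = TfidfVectorizer().fit_transform(sentences)
        sim = cosine_similarity(tfidf)   # sentence-similarity graph
        np.fill_diagonal(sim, 0.0)       # drop self-loops
        rows = sim.sum(axis=1, keepdims=True)
        rows[rows == 0] = 1.0            # avoid division by zero
        transition = sim / rows          # row-stochastic transition matrix
        scores = np.ones(len(sentences)) / len(sentences)
        for _ in range(iterations):      # PageRank-style power iteration
            scores = (1 - damping) / len(sentences) + damping * transition.T @ scores
        # Keep the top-ranked sentences, preserving their original order.
        top = sorted(np.argsort(scores)[-n_sentences:])
        return " ".join(sentences[i] for i in top)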
Sentiment Analysis

• Sentiment analysis, also known as opinion mining, is


a technique in natural language processing (NLP)
that aims to understand the emotional tone or
opinion expressed in a piece of text.
• It analyzes text data to classify the sentiment as
positive, negative, or neutral.
Sentiment Analysis
Sentiment Analysis: Approaches

• Lexicon-Based Approach: Relies on sentiment lexicons,


which are large dictionaries containing words with
predefined sentiment scores (positive, negative, or
neutral). The sentiment score of a text is calculated
based on the sentiment scores of the words it contains.
• Machine Learning Approach: Trains machine learning
models on labeled data sets of text with known
sentiment. These models can then be used to classify
the sentiment of new, unseen text data. This approach
can be more nuanced than lexicon-based methods.
Sentiment Analysis: Approaches
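
A small lexicon-based sketch using VADER, a sentiment lexicon bundled with NLTK (requires nltk.download('vader_lexicon')); note how it handles the negation that tripped up the plain bag-of-words example earlier:

    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()
    print(sia.polarity_scores("This is a good job. I will not miss it for anything"))
    print(sia.polarity_scores("This is not good at all"))
    # The 'compound' score is positive for the first sentence
    # and negative for the second.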
Sentiment Analysis: Applications

• Understanding Customer Reviews: Businesses can analyze


customer reviews of products or services to gauge overall
sentiment and identify areas for improvement.
• Social Media Monitoring: Brands can track sentiment on
social media platforms to understand public perception
and respond to negative feedback.
• Market Research: Analyzing online opinions can help
understand customer preferences and inform marketing
strategies.
• News Analysis: Sentiment analysis can be used to
understand the overall tone of news articles or social
media discussions about current events.
Sentiment Analysis: Challenges

• Sarcasm and Irony: Text can be subjective and


contain sarcasm or irony, which can be difficult for
sentiment analysis tools to detect.
• Context and Nuance: Sentiment analysis might not
always capture the full context of a situation or the
subtle nuances of human language.
• Multilingual Sentiment Analysis: Analyzing
sentiment across different languages presents
additional challenges due to cultural and linguistic
variations.
Named Entity Recognition

• Named entity recognition (NER) — sometimes referred


to as entity chunking, extraction, or identification — is
the task of identifying and categorizing key information
(entities) in text.
• An entity can be any word or series of words that
consistently refers to the same thing. Every detected
entity is classified into a predetermined category.
• For example, an NER machine learning (ML) model
might detect the word “MITU Skillologies” in a text and
classify it as a “Company”.
Named Entity Recognition

• NER is a form of natural language processing


(NLP), a subfield of artificial intelligence.
• NLP is concerned with computers processing
and analyzing natural language, i.e., any
language that has developed naturally, rather
than artificially, such as with computer coding
languages.
Named Entity Recognition
Named Entity Recognition

• Person
– E.g., Elvis Presley, Audrey Hepburn, David Beckham
• Organization
– E.g., Google, Mastercard, University of Oxford
• Time
– E.g., 2006, 16:34, 2am
• Location
– E.g., Trafalgar Square, MoMA, Machu Picchu
• Work of art
– E.g., Hamlet, Guernica, Exile on Main St.
How is NER used?

• NER is suited to any situation in which a high-


level overview of a large quantity of text is
helpful.
• With NER, you can, at a glance, understand the
subject or theme of a body of text and quickly
group texts based on their relevancy or
similarity.
How is NER used?

• Human resources
– Speed up the hiring process by summarizing
applicants’ CVs; improve internal workflows by
categorizing employee complaints and questions
• Customer support
– Improve response times by categorizing user
requests, complaints and questions and filtering
by priority keywords
How is NER used?

• Search and recommendation engines


– Improve the speed and relevance of search
results and recommendations by summarizing
descriptive text, reviews, and discussions
– Booking.com is a notable success story here
• Content classification
– Surface content more easily and gain insights
into trends by identifying the subjects and
themes of blog posts and news articles
How is NER used?

• Health care
– Improve patient care standards and reduce workloads by
extracting essential information from lab reports
– Roche is doing this with pathology and radiology reports
• Academia
– Enable students and researchers to find relevant material
faster by summarizing papers and archive material and
highlighting key terms, topics, and themes
– The EU’s digital platform for cultural heritage,
Europeana, is using NER to make historical newspapers
searchable
Pre-processing

• To prepare text data for model building, we perform text preprocessing. It is the very first step of any NLP project. Some of the preprocessing steps are:
– Removing punctuation marks like . , ! $ ( ) * % @
– Removing URLs
– Removing Stop words
– Lower casing
– Tokenization
– Stemming
– Lemmatization
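
A sketch chaining these steps with NLTK; it assumes the punkt, stopwords, and wordnet resources have been downloaded:

    import re
    import string
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    def preprocess(text):
        text = re.sub(r"https?://\S+", "", text)  # remove URLs first
        text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation
        text = text.lower()                       # lower casing
        tokens = nltk.word_tokenize(text)         # tokenization
        stop = set(stopwords.words("english"))
        tokens = [t for t in tokens if t not in stop]  # remove stop words
        stems = [PorterStemmer().stem(t) for t in tokens]            # stemming
        lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]  # lemmatization
        return stems, lemmas

    print(preprocess("The striped bats are hanging on their feet: https://example.com"))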
Why Pre-processing?

• Text preprocessing has a significant effect on model performance.
• Data preprocessing is an essential step in building a machine learning model, and the quality of the results depends on how well the data has been preprocessed.
• In NLP, text preprocessing is the first step in the process of building a model.
NLTK

• The Natural Language Toolkit, or more commonly


NLTK, is a suite of libraries and programs for symbolic
and statistical natural language processing (NLP) for
English written in the Python programming language.
• It was developed by Steven Bird and Edward Loper in
the Department of Computer and Information
Science at the University of Pennsylvania.
• NLTK includes graphical demonstrations and sample
data. It is accompanied by a book that explains the
underlying concepts behind the language processing
tasks supported by the toolkit, plus a cookbook.
NLTK

• NLTK is intended to support research and teaching in


NLP or closely related areas, including empirical
linguistics, cognitive science, artificial intelligence,
information retrieval, and machine learning.
• NLTK has been used successfully as a teaching tool, as
an individual study tool, and as a platform for
prototyping and building research systems.
• NLTK is used in courses at 32 universities in the US and in 25 countries.
• NLTK supports classification, tokenization, stemming,
tagging, parsing, and semantic reasoning functionalities
nltk.org
Install nltk

• !pip install nltk -U

• Installing nltk packages


– import nltk
– nltk.download('package-name')
Using Python Scripts
Using Python Scripts
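
For example, a small stand-alone script that tokenizes, POS-tags, and stems a sentence with NLTK might look like this (illustrative; any sentence works):

    import nltk
    from nltk.stem import PorterStemmer

    text = "NLP bridges the language gap between humans and computers."
    tokens = nltk.word_tokenize(text)   # requires nltk.download('punkt')
    print(tokens)
    print(nltk.pos_tag(tokens))         # requires 'averaged_perceptron_tagger'
    print([PorterStemmer().stem(t) for t in tokens])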
Chatbot

• Chatbots are software applications that use artificial intelligence and natural language processing to understand what a human wants, and guide them to their desired outcome with as little work for the end user as possible.
• Think of them as a virtual assistant for your customer experience touchpoints.
Chatbot

• A well designed & built chatbot will:


– Use existing conversation data (if available)
to understand the type of questions people
ask.
– Analyze correct answers to those questions
through a ‘training’ period.
– Use machine learning & NLP to learn context,
and continually get better at answering those
questions in the future.
Chatbot
Chatbot

• One of the most interesting parts of the chatbot


software space is the variety of ways you can build a
chatbot.
• The underlying technology can vary quite a bit, but it
really all comes down to what your goals are. At the
highest level, there are three types of chatbots most
consumers see today:
– Rules-Based Chatbots – These chatbots follow pre-
designed rules, often built using a graphical user
interface where a bot builder will design paths using
a decision tree.
Chatbot

• Continued...
– AI Chatbots – AI chatbots will automatically learn
after an initial training period by a bot developer.
– Live Chat – These bots are primarily used by Sales & Sales Development teams. They can also be used by Customer Support organizations, as live chat is a simpler chat option for answering questions in real time.
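
A toy sketch of the rules-based variety: keyword patterns mapped to canned responses, roughly the decision-tree logic a GUI bot builder generates (rules and replies are invented for illustration):

    import re

    # Each rule: a keyword set and the canned reply it triggers.
    RULES = [
        ({"price", "cost", "pricing"}, "Our plans start at $10/month."),
        ({"hours", "open", "opening"}, "We are open 9am-5pm, Monday to Friday."),
        ({"hello", "hi", "hey"}, "Hello! How can I help you today?"),
    ]

    def reply(message):
        words = set(re.findall(r"[a-z']+", message.lower()))
        for keywords, response in RULES:
            if words & keywords:   # first matching rule wins
                return response
        return "Sorry, I didn't catch that. Could you rephrase?"

    print(reply("Hi there!"))            # greeting rule
    print(reply("What does it cost?"))   # pricing rule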
Chatbot
Dialogflow Chatbot

• Dialogflow, a Google Cloud Platform service,


allows you to build conversational interfaces
(chatbots) for various applications like websites,
mobile apps, or messaging platforms.
• It utilizes machine learning to understand user
intents and generate appropriate responses.
Dialogflow Chatbot
Summary

• NLP bridges the language gap, allowing computers to


understand and process human language.
• It powers machine translation, transforming text from one
language to another.
• NLP also fuels chatbots and virtual assistants, enabling
them to respond to our questions and requests in a
conversational manner.
• Furthermore, it empowers sentiment analysis, revealing the
emotions and opinions hidden within text.
• By unlocking the secrets of human language, NLP is
revolutionizing the way we interact with machines and
information in the digital world.
Thank you
This presentation was created using LibreOffice Impress 7.4.1.2 and can be used freely under the GNU General Public License.

@mITuSkillologies @mitu_group @mitu-skillologies @MITUSkillologies

Web Resources
https://mitu.co.in
@mituskillologies http://tusharkute.com @mituskillologies

[email protected]
[email protected]
