Applications of NLP
Tushar B. Kute,
http://tusharkute.com
NLP Applications
• Additional Applications:
• Speech Recognition: Converting spoken language into text
format, enabling voice-enabled applications.
• Text-to-Speech (TTS): Converting written text into spoken
language for applications like audiobooks or assistive
technologies.
• Optical Character Recognition (OCR): Extracting text from
images or scanned documents.
• Author Identification: Identifying the author of a text based
on stylistic patterns.
• Information Retrieval: Finding relevant documents or
information from large collections of text data.
Information retrieval
• Process:
– User Query: The user submits a query that
specifies their information need.
– This query can be a simple keyword search or a
more complex phrase expressing a specific topic
or question.
• Retrieval Process:
• The IR system retrieves a set of documents or data items that
are potentially relevant to the query. This might involve
techniques like:
– Indexing: Preprocessing and storing information about
documents in a structured way to facilitate efficient retrieval.
– Matching: Comparing the user's query with the indexed
information to identify documents with a high degree of
relevance. Different matching algorithms can be used based
on keywords, phrases, or semantic similarity.
– Ranking: Ranking the retrieved documents based on their
estimated relevance to the user's query. This ranking helps
users prioritize which documents to examine first.
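The three steps above (indexing, matching, ranking) can be sketched in a few lines of Python. This is only a minimal illustration, not how a production search engine works: the toy collection `DOCS` (whose third document is invented), the whitespace tokenizer, and the "count of matching query terms" score are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy collection; "d3" is invented for illustration.
DOCS = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "the cat chases the mouse and it squeaks loudly",
    "d3": "a lazy cat sleeps next to the dog",
}

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def build_index(docs):
    """Indexing: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(query, index):
    """Matching and ranking: score each document by the number of distinct
    query terms it contains, then sort by score (ties broken by id)."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

index = build_index(DOCS)
results = search("lazy cat", index)
print(results)
```

Here "d3" ranks first because it is the only document containing both query terms. Real systems replace the simple count with weighted scores such as TF-IDF.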
• Evaluation:
– The effectiveness of an IR system is often
evaluated by metrics like precision (proportion
of retrieved documents that are relevant) and
recall (proportion of relevant documents that
are retrieved).
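As a small worked example of these two metrics, the sketch below computes precision and recall for one query. The document ids and the relevance judgments are made up for illustration, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical system output and ground-truth judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were retrieved (recall about 0.67).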
• Applications:
• Web Search Engines: Tools like Google, Bing, and DuckDuckGo
use IR techniques to crawl and index the web, enabling users to
find relevant information through search queries.
• Library Catalogs: Online library catalogs utilize IR to help users
search for books, articles, and other library resources based on
keywords, titles, authors, or other criteria.
• Email Search: Search functionalities within email applications
rely on IR techniques to find specific emails based on keywords
or senders/recipients.
• E-commerce Product Search: Product search on e-commerce
websites uses IR to match user queries with product
descriptions, specifications, and attributes.
Vector Space Model
• Sentences:
– The quick brown fox jumps over the lazy dog.
– The cat chases the mouse and it squeaks
loudly.
Bag of words: Example
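A bag-of-words representation of the two sentences above can be sketched as follows. This is one possible construction, assuming lowercasing, punctuation stripping, and an alphabetically sorted vocabulary; the helper names `tokenize` and `bow_vector` are hypothetical.

```python
import string
from collections import Counter

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "The cat chases the mouse and it squeaks loudly.",
]

def tokenize(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

# The vocabulary is the sorted set of all distinct tokens in the collection.
vocabulary = sorted({tok for s in sentences for tok in tokenize(s)})

def bow_vector(text):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]

vectors = [bow_vector(s) for s in sentences]

print(vocabulary)
for vec in vectors:
    print(vec)
```

Each sentence becomes a vector with one position per vocabulary term. Word order is discarded, which is the limitation that n-grams partly address.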
N-grams
• For example, the bigrams in the first line of text in the previous
section: “This is not good at all” are as follows:
– “This is”
– “is not”
– “not good”
– “good at”
– “at all”
• If, instead of single words, we use bigrams (a bag of bigrams)
as shown above, the model can differentiate between
sentence 1 and sentence 2.
• Bigrams also make tokens more informative (for example,
“HSR Layout”, in Bengaluru, is more informative than the
separate tokens “HSR” and “layout”).
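The bigram list above can be reproduced with a short helper. The function name `ngrams` is hypothetical, and the second part uses an invented sentence pair (not the sentences from the slides) to show how two texts with identical bags of words still differ in their bigrams.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "This is not good at all".split()
bigrams = ngrams(tokens, 2)
print(bigrams)

# Hypothetical pair: same words, different order.
a = "the dog bit the man".split()
b = "the man bit the dog".split()

same_unigrams = Counter(a) == Counter(b)                     # word counts match
same_bigrams = Counter(ngrams(a, 2)) == Counter(ngrams(b, 2))  # bigram counts differ
print(same_unigrams, same_bigrams)
```

The two invented sentences are indistinguishable as bags of words but not as bags of bigrams, because "dog bit" appears only in the first and "man bit" only in the second.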
The TF-IDF Vectorizer
• Document Similarity:
– Once documents and queries are represented as
vectors, VSM calculates the similarity between
them.
– Common similarity measures include cosine
similarity, which considers the angle between
the two vectors in the high-dimensional space. A
higher cosine similarity score indicates that the
document and the query use more similar terms,
so the document is more likely to be relevant.
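The similarity computation can be sketched with plain Python. This is a minimal sketch under stated assumptions: the weighting is raw term count times `log(N / df) + 1` (one common TF-IDF variant among several), query terms that never occur in the collection are ignored, and the function names are hypothetical. Libraries such as scikit-learn use slightly different smoothing and normalization.

```python
import math
from collections import Counter

docs = [
    "the quick brown fox jumps over the lazy dog",
    "the cat chases the mouse and it squeaks loudly",
]
query = "lazy dog"

def tokenize(text):
    return text.lower().split()

def idf_table(tokenized_docs):
    """Inverse document frequency: log(N / df) + 1 (an assumed variant)."""
    n = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the collection are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

tokenized = [tokenize(d) for d in docs]
idf = idf_table(tokenized)
doc_vectors = [tfidf(t, idf) for t in tokenized]
query_vector = tfidf(tokenize(query), idf)

scores = [cosine(query_vector, v) for v in doc_vectors]
print(scores)
```

The first document shares both query terms and gets a positive score; the second shares none, so its vector is orthogonal to the query and its score is zero.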
Information Extraction using sequence labelling
• Process:
– Data Preparation:
• Define the information you want to extract
(e.g., names of people, locations,
organizations, dates).
• Annotate a training dataset where each word
or character in a sentence is labeled with its
corresponding role (e.g., "B-PER" for the
beginning of a person's name, "I-PER" for a
token inside a person's name, "O" for a token
outside any entity).
• Example:
– Sentence: "Barack Obama, the former president
of the United States, visited Paris."
– Labels: "B-PER Barack I-PER Obama B-TITLE
president I-TITLE of I-TITLE the I-TITLE United
I-TITLE States B-LOC Paris"; every other token
is labeled "O" (outside any entity).
– Extracted Information: Person: Barack Obama,
Title: president of the United States, Location:
Paris
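Turning a labeled sequence into the extracted information can be sketched as a small decoder. The tag sequence below is an assumption chosen to be consistent with the extracted information listed above (each entity starts with a "B-" tag, continues with "I-" tags, and all other tokens are "O"); the function name `decode_bio` is hypothetical, and in practice the labels would come from a trained sequence-labeling model rather than being written by hand.

```python
def decode_bio(tokens, labels):
    """Group B-/I- labeled tokens into (entity_type, text) spans."""
    entities = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag always opens a new entity; close any open one first.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity, ends the span.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities

tokens = ["Barack", "Obama", ",", "the", "former", "president", "of", "the",
          "United", "States", ",", "visited", "Paris", "."]
# Assumed gold labels for illustration.
labels = ["B-PER", "I-PER", "O", "O", "O", "B-TITLE", "I-TITLE", "I-TITLE",
          "I-TITLE", "I-TITLE", "O", "O", "B-LOC", "O"]

entities = decode_bio(tokens, labels)
print(entities)
```

The decoder yields one person, one title, and one location span, matching the extracted information in the example.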
• Applications:
– Named Entity Recognition (NER): Identifying and
classifying named entities like people,
organizations, locations, dates, etc.
– Relation Extraction: Extracting relationships
between entities (e.g., "works at", "located in").
– Event Extraction: Identifying and classifying events
described in text (e.g., "protest", "financial
transaction").
– Question Answering: Extracting answers to specific
questions from factual text data.
Question answering system
• Person
– E.g., Elvis Presley, Audrey Hepburn, David Beckham
• Organization
– E.g., Google, Mastercard, University of Oxford
• Time
– E.g., 2006, 16:34, 2am
• Location
– E.g., Trafalgar Square, MoMA, Machu Picchu
• Work of art
– E.g., Hamlet, Guernica, Exile on Main St.
How is NER used?
• Human resources
– Speed up the hiring process by summarizing
applicants’ CVs; improve internal workflows by
categorizing employee complaints and questions
• Customer support
– Improve response times by categorizing user
requests, complaints and questions and filtering
by priority keywords
• Health care
– Improve patient care standards and reduce workloads by
extracting essential information from lab reports
– Roche is doing this with pathology and radiology reports
• Academia
– Enable students and researchers to find relevant material
faster by summarizing papers and archive material and
highlighting key terms, topics, and themes
– The EU’s digital platform for cultural heritage,
Europeana, is using NER to make historical newspapers
searchable
Pre-processing
• Continued...
– AI Chatbots – AI chatbots keep learning
automatically after an initial training period set
up by a bot developer.
– Live Chat – These bots are primarily used by
Sales & Sales Development teams. They can also
be used by Customer Support organizations, as
live chat is a simpler chat option for answering
questions in real time.
Chatbot
Dialogflow Chatbot
Web Resources
https://mitu.co.in
@mituskillologies http://tusharkute.com