
CS255 Lecture 01.2 - NLP Pipeline

NLP Pipeline

Uploaded by Ahmar Rashid
Copyright © All Rights Reserved

NLP Pipeline

Photo by Pexels
NLP Pipeline
1. Data Acquisition
2. Text Preparation
   a) Text Cleanup
   b) Basic Preprocessing
   c) Advanced Preprocessing
3. Feature Engineering
4. Modeling
   a) Model Building
   b) Evaluation
5. Deployment
   a) Model Deployment
   b) Monitoring
   c) Model Update
[Figure: diagram of the five pipeline stages: Data Acquisition, Text Preparation, Feature Engineering, Modeling, Deployment]

Natural Language Processing In 5 Minutes


NLP Pipeline: Data Acquisition
Data Acquisition
• Available
  - In your company, in a database table: use data engineering
  - Less data than required: use data augmentation (synonyms, bigram flip, back translation, additional noise)
• Others Have It
  - Public dataset
  - Web scraping
  - Using an API
  - PDF, image, audio (speech-to-text)
• Not Available
  - Gather your data: survey, forms, etc.
Source: CampusX - Text Preprocessing | NLP Course Lecture 3
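The data augmentation techniques listed under "Less data than required" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The SYNONYMS table is invented for the example. "Additional noise" is modelled here as random word dropout, which is only one of several possible choices. Back translation is left out because it needs a machine translation model.

```python
import random

# Hypothetical toy synonym table; a real system might draw on a thesaurus such as WordNet.
SYNONYMS = {"good": ["great", "fine"], "movie": ["film"], "bad": ["poor"]}

def synonym_replace(tokens, rng):
    """Replace every word that has an entry in SYNONYMS with a randomly chosen synonym."""
    return [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]

def bigram_flip(tokens, rng):
    """Swap one randomly chosen pair of adjacent words."""
    if len(tokens) < 2:
        return list(tokens)
    out = list(tokens)
    i = rng.randrange(len(out) - 1)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out

def add_noise(tokens, rng, p=0.1):
    """Drop each word independently with probability p (word dropout as 'noise')."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept or list(tokens)  # never return an empty sentence

rng = random.Random(0)  # fixed seed so the augmented copies are reproducible
tokens = "a good movie with a bad ending".split()
augmented = [synonym_replace(tokens, rng), bigram_flip(tokens, rng), add_noise(tokens, rng)]
```

Each function returns a new token list and keeps the original label, so one labelled review yields several extra training examples.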
NLP Pipeline: Text Preparation
Text Preparation
• Basic Cleanup
  - HTML tag cleaning
  - Emoji cleaning
  - Spell checking
• Basic Text Preprocessing
  - Basic: tokenization (sentence level, word level)
  - Optional: stop word removal; stemming; lemmatization; removing punctuation, digits, etc.; lower/upper casing; language detection
• Advanced Text Preprocessing
  - PoS tagging
  - Parsing
  - Coreference resolution
Source: CampusX - Text Preprocessing | NLP Course Lecture 3
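Here is a minimal sketch of basic cleanup (HTML tags, emoji) followed by the optional lowercasing and punctuation/digit removal steps, using only the Python standard library. The emoji character ranges are an assumption. They cover the main pictograph blocks but not every emoji. Spell checking and language detection are omitted because they need a dictionary or a trained model.

```python
import html
import re
import string

TAG_RE = re.compile(r"<[^>]+>")
# Assumed emoji ranges: main pictograph blocks plus miscellaneous symbols/dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def basic_cleanup(text):
    text = TAG_RE.sub(" ", text)   # strip HTML tags
    text = html.unescape(text)     # decode entities such as &amp;
    text = EMOJI_RE.sub("", text)  # drop emoji
    return " ".join(text.split())  # collapse runs of whitespace

def optional_preprocess(text):
    text = text.lower()
    # Delete every ASCII punctuation character and digit in one pass.
    text = text.translate(str.maketrans("", "", string.punctuation + string.digits))
    return " ".join(text.split())

cleaned = optional_preprocess(basic_cleanup("<p>Great movie &amp; cast! 10/10</p>"))
```

Whether to apply the optional steps is problem specific. For example, removing digits would destroy a "10/10" rating that a sentiment model could use.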
NLP Pipeline: Feature Engineering
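Feature Engineering is problem specific. A basic technique: count the +ve, -ve and neutral words in each review.

Before feature engineering, shape (5000, 2):
Review Text | Sentiment
Review 1    | 0 (-ve sentiment)
Review 2    | 1 (+ve sentiment)

After feature engineering, shape (5000, 4):
+ve Words | -ve Words | Neutral | Sentiment
8         | 2         | 7       | 0
6         | 7         | 3       | 1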
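The basic technique can be sketched as a function that turns each review into the three count columns and then appends the label. This produces the four-column layout shown. The POSITIVE and NEGATIVE word sets are hypothetical stand-ins for a real sentiment lexicon. "Neutral" is assumed to mean every word that appears in neither set.

```python
# Hypothetical toy lexicons; real systems use curated sentiment lexicons.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "boring", "hate", "poor"}

def lexicon_features(review):
    """Map raw review text to [+ve word count, -ve word count, neutral word count]."""
    tokens = review.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return [pos, neg, len(tokens) - pos - neg]

# (review text, sentiment label) pairs become rows of 3 features + 1 label.
reviews = [("great cast but boring and bad plot", 0), ("i love this great film", 1)]
rows = [lexicon_features(text) + [label] for text, label in reviews]
```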
Advanced techniques: Bag of Words, TF-IDF, One Hot Encoding (OHE), Word2Vec

• ML Based Feature Engineering: manual selection
  - Adv. (interpretable): you know your features
  - DisAdv.: domain knowledge required
• DL Based Feature Engineering: automatic selection
  - Adv.: domain knowledge not required
  - DisAdv. (not interpretable): you don't know your features
Source: CampusX - Text Preprocessing | NLP Course Lecture 3
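Two of the advanced techniques, Bag of Words and TF-IDF, can be sketched from scratch. The sketch assumes whitespace tokenization and uses the textbook unsmoothed TF-IDF: tf = count / document length, idf = log(N / df). Libraries differ. For example, scikit-learn's TfidfVectorizer uses a smoothed idf by default, so its numbers will not match these.

```python
import math
from collections import Counter

def build_vocab(docs):
    """Sorted vocabulary over lowercased, whitespace-tokenized documents."""
    return sorted({w for d in docs for w in d.lower().split()})

def bag_of_words(docs, vocab):
    """Each document becomes a vector of raw term counts over the vocabulary."""
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([counts[w] for w in vocab])
    return rows

def tf_idf(docs, vocab):
    """Weight each count by tf(t, d) * idf(t); words found in every document get weight 0."""
    n = len(docs)
    bow = bag_of_words(docs, vocab)
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    out = []
    for row in bow:
        length = sum(row) or 1
        out.append([(c / length) * math.log(n / df[j]) if df[j] else 0.0
                    for j, c in enumerate(row)])
    return out

docs = ["good movie", "bad movie"]
vocab = build_vocab(docs)
```

"movie" occurs in both documents, so TF-IDF gives it zero weight. The discriminating words "good" and "bad" keep non-zero weights.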
NLP Pipeline: Modeling
Modeling
• Model Buildup
  - Heuristic: almost no data available
  - ML Algo: some data available
  - DL Algo: a lot of data available
  - Cloud API
  - Transfer learning, e.g., BERT
  - Heuristic + ML Algo: heuristic signals used as features for an ML algorithm, e.g.:
      +ve Words | -ve Words | From Spammer?
      4         | 6         | 0
      3         | 5         | 1
• Evaluation
Source: CampusX - Text Preprocessing | NLP Course Lecture 3
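When almost no data is available, a heuristic model can be a single hand-written rule. The sketch below assumes a hypothetical sentiment lexicon and the rule "positive if +ve words outnumber -ve words". The rule needs no training, and its counts can later become features for an ML algorithm (the "Heuristic + ML Algo" option).

```python
# Hypothetical lexicons; a rule-of-thumb model built without any training data.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "boring", "hate", "poor"}

def heuristic_sentiment(review):
    """Predict 1 (+ve) if positive words outnumber negative ones, else 0 (-ve)."""
    tokens = review.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 1 if pos > neg else 0  # ties default to -ve, an arbitrary choice
```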
NLP Pipeline: Modeling
Modeling
• Model Buildup: Heuristic (almost no data available), ML Algo (some data available), DL Algo (a lot of data available; transfer learning), Cloud API
• Evaluation
  - Intrinsic (model-centric): uses metrics such as perplexity
  - Extrinsic (business-centric): how many times does the user actually use the suggestion?
  (Text generation example)
Source: CampusX - Text Preprocessing | NLP Course Lecture 3
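The two kinds of evaluation for the text generation example can be sketched side by side. Perplexity uses the standard definition PP = exp(-(1/N) Σ log p(w_i | w_1, ..., w_{i-1})). The sketch assumes the language model has already supplied the probability of each of the N tokens in a held-out text. `suggestion_acceptance_rate` is a hypothetical name for the business-centric count on the slide, normalised by how many suggestions were shown.

```python
import math

def perplexity(token_probs):
    """Intrinsic metric: exp of the mean negative log-probability per token (lower is better)."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

def suggestion_acceptance_rate(shown, accepted):
    """Extrinsic metric: fraction of shown suggestions that users actually used."""
    return accepted / shown if shown else 0.0
```

A model that assigns every token probability 0.25 has perplexity 4. It is as uncertain as a uniform choice among four words. A model can improve on perplexity and still fail the extrinsic test if users ignore its suggestions.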
NLP Pipeline: Deployment
Deployment
• Deploy: expose the model as an API or microservice deployed on the cloud (e.g., a chatbot)
• Monitoring: a dashboard tracking intrinsic/extrinsic evals
• Update: update the model/product based upon new data/environment
Source: CampusX - Text Preprocessing | NLP Course Lecture 3
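The monitoring-and-update loop can be sketched as a small class. It keeps a rolling window of an evaluation metric, for instance 1 if a suggestion was accepted and 0 otherwise, and flags when the model should be updated. `MetricMonitor`, the window size, and the threshold are hypothetical choices for illustration. In a real deployment the dashboard and retraining pipeline would be separate services.

```python
from collections import deque

class MetricMonitor:
    """Rolling average of a deployed model's metric, with an update trigger."""

    def __init__(self, window=100, threshold=0.2):
        self.values = deque(maxlen=window)  # oldest values fall out automatically
        self.threshold = threshold          # illustrative value, not a recommendation

    def record(self, value):
        self.values.append(value)

    def rolling_mean(self):
        return sum(self.values) / len(self.values) if self.values else None

    def needs_update(self):
        """True once the rolling mean drops below the threshold (e.g. after data drift)."""
        mean = self.rolling_mean()
        return mean is not None and mean < self.threshold
```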


NLP Text Processing Steps
1. Segmentation
2. Tokenization
3. Stop Words
4. Stemming
5. Lemmatization
6. Part-of-Speech Tagging
7. Entity Tagging
[Figure: flow diagram linking the seven steps above]

Source: CampusX - Text Preprocessing | NLP Course Lecture 3


NLP Text Processing Steps
1. Segmentation
• The first step is segmentation, which breaks the entire document down into its constituent sentences.
• This can be done by splitting the text at sentence-ending punctuation, such as full stops.

Natural Language Processing In 5 Minutes
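Here is a minimal, punctuation-based sentence segmenter. It assumes a sentence ends at ".", "!" or "?" followed by whitespace. This naive rule wrongly splits abbreviations such as "Dr. Smith". Trained segmenters such as NLTK's sent_tokenize (Punkt) are designed to handle those cases.

```python
import re

# Split on whitespace that follows ., ! or ? (lookbehind keeps the punctuation).
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")

def segment(document):
    """Return the list of sentences in a document."""
    return [s for s in SENTENCE_END_RE.split(document.strip()) if s]
```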


NLP Text Processing Steps
2. Tokenization
• For the algorithm to understand these sentences, we take the words in each sentence and present them to the algorithm individually.
• So we break each sentence down into its constituent words and store them. This is called tokenization, and each word is called a token.

Natural Language Processing In 5 Minutes
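A simple regex word tokenizer can be sketched as follows. It assumes a token is either a run of word characters or a single punctuation mark. Real tokenizers differ in details such as contractions ("isn't"), hyphens and URLs.

```python
import re

# \w+ matches runs of letters/digits/underscore; [^\w\s] keeps each punctuation mark as its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence)
```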


NLP Text Processing Steps
3. Stop Words
• We can make the learning process faster by getting rid of non-essential words, which add little to the meaning of a statement and mainly make it read more cohesively.
• These words, such as ‘are’, ‘and’ and ‘the’, are called stop words.

Natural Language Processing In 5 Minutes
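Stop word removal is a set-membership filter over tokens. The tiny list below is illustrative only. Libraries such as NLTK ship much longer lists, and those lists often include negations like "not". Removing negations can flip the meaning of a review, so the list should suit the task.

```python
# A tiny illustrative stop-word list (assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}

def remove_stop_words(tokens):
    """Keep only tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]
```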


NLP Text Processing Steps
4. Stemming
• Once we have the basic form of our document, we need to explain it to our machine.
• We start by explaining that words like ‘skipping’, ‘skips’ and ‘skipped’ are the same word with prefixes or suffixes added. Reducing them to a common stem (‘skip’) is called stemming.

Natural Language Processing In 5 Minutes
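Stemming is usually rule-based suffix stripping. The toy stemmer below is for illustration only and is NOT the Porter algorithm; NLTK's PorterStemmer and SnowballStemmer implement real rule sets. It assumes three suffixes, a minimum stem length of three letters, and an "undo doubled consonant" rule so that "skipp" becomes "skip".

```python
# A toy suffix-stripping stemmer (illustrative; not Porter or Snowball).
SUFFIXES = ("ing", "ed", "s")

def toy_stem(word):
    w = word.lower()
    for suffix in SUFFIXES:
        if suffix == "s" and w.endswith("ss"):
            continue  # "class", "pass" are not plurals
        # Only strip if a stem of 3+ letters remains.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undo consonant doubling ("skipp" -> "skip"), but keep "ll"/"ss" ("fall").
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiouls":
                w = w[:-1]
            break
    return w
```

All three forms from the slide map to "skip". The stem does not have to be a dictionary word, and that is the main difference from lemmatization.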


NLP Text Processing Steps
5. Lemmatization
• We also identify the base form of words that change with tense, mood, gender, etc.; for example, ‘went’ and ‘gone’ both map to ‘go’.
• This is called lemmatization, after the lemma: the base (dictionary) form of a word.

Natural Language Processing In 5 Minutes
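Unlike stemming, lemmatization looks words up in a dictionary, so it can handle irregular forms that no suffix rule would catch. The hand-written lemma table below is hypothetical. Real lemmatizers, such as NLTK's WordNetLemmatizer, use a full lexical database and usually also need the word's part of speech.

```python
# Hypothetical lemma table; real lemmatizers consult a full dictionary.
LEMMAS = {"went": "go", "gone": "go", "goes": "go", "was": "be", "were": "be",
          "better": "good", "mice": "mouse"}

def lemmatize(word):
    """Return the dictionary form of a word, or the word itself if unknown."""
    w = word.lower()
    return LEMMAS.get(w, w)
```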


NLP Text Processing Steps
6. Part-of-Speech Tagging
• Then, we explain the concept of nouns, verbs, articles and other parts of speech to the machine by adding these tags to our words.
• This is called part-of-speech (PoS) tagging.

Natural Language Processing In 5 Minutes
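The output format of part-of-speech tagging, a (word, tag) pair per token, can be shown with a toy lookup tagger. The mini-lexicon, the coarse tag names, and the "unknown words are nouns" fallback are all assumptions for illustration. Trained taggers, such as those in NLTK or spaCy, use context to resolve ambiguous words like "book" (noun or verb).

```python
# Hypothetical mini-lexicon with coarse tags.
LEXICON = {"the": "DET", "a": "DET", "robot": "NOUN", "droid": "NOUN",
           "speaks": "VERB", "helps": "VERB", "fluent": "ADJ", "quickly": "ADV"}

def pos_tag(tokens):
    """Tag each token by lookup; unknown words fall back to NOUN."""
    return [(t, LEXICON.get(t.lower(), "NOUN")) for t in tokens]
```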


NLP Text Processing Steps
7. Entity Tagging
• Next, we introduce our machine to specific references and everyday names by flagging the names of movies, important personalities, locations, etc. that may occur in the document. This is called named entity tagging.

Natural Language Processing In 5 Minutes
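A simple way to flag named entities is a gazetteer: a list of known names, matched greedily with the longest match first so that multi-word names like "Star Wars" are caught. The gazetteer entries and labels below are hypothetical. Production NER systems learn entities from annotated text, so they can also recognise names they have never seen.

```python
# Hypothetical gazetteer mapping lowercase token sequences to entity labels.
GAZETTEER = {
    ("star", "wars"): "MOVIE",
    ("c-3po",): "CHARACTER",
    ("new", "york"): "LOCATION",
}
MAX_LEN = max(len(k) for k in GAZETTEER)

def tag_entities(tokens):
    """Greedy longest-match lookup; returns (entity text, label) pairs."""
    found, i = [], 0
    lowered = [t.lower() for t in tokens]
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(lowered[i:i + n]))
            if label:
                found.append((" ".join(tokens[i:i + n]), label))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return found
```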


NLP Model Buildup
• Once we have our base words and tags, we use a machine learning algorithm such as naïve Bayes to teach our model human sentiment, speech, etc.
• Naïve Bayes assumption: Xi and Xj are conditionally independent given the class C.
[Figure: naïve Bayes model, a Class node with arrows to features X1, X2, ..., Xn]

Natural Language Processing In 5 Minutes
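With the conditional-independence assumption, the class posterior factorises as P(C | X1, ..., Xn) ∝ P(C) ∏ P(Xi | C). Below is a minimal from-scratch sketch for text. It is the multinomial variant over word counts with add-one (Laplace) smoothing, working in log space to avoid numerical underflow. The `NaiveBayes` class is hypothetical. Libraries such as scikit-learn provide a ready-made MultinomialNB.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.word_counts = defaultdict(Counter)
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        # log P(C) + sum_i log P(x_i | C): each word contributes independently given C.
        v = len(self.vocab)
        score = math.log(self.prior[c])
        for w in doc.lower().split():
            score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))

docs = ["great fun film", "loved the great cast", "boring bad film", "bad plot and boring"]
labels = [1, 1, 0, 0]  # 1 = +ve sentiment, 0 = -ve sentiment
model = NaiveBayes().fit(docs, labels)
```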


NLP Steps
• At the end of the day, most of the techniques used in NLP are simple grammar techniques that we were taught in school.
• Question: Which of the following NLP techniques is used to obtain words from sentences?
a) Stemming
b) Tokenization
c) Lemmatization
d) Segmentation

Natural Language Processing In 5 Minutes


NLP Market
• There is an increasing demand in the market for automated language solutions.
• Companies are actively looking for NLP experts to join them and are prepared to offer highly lucrative salaries.

Natural Language Processing In 5 Minutes


C-3PO: A Familiar Face
Golden Hospitality Robot
C-3PO is a well-known character from the Star Wars franchise, loved by fans all over the world.
• As a golden robot, he serves as a protocol droid and provides assistance in various situations.
• His distinct personality, fluent speech, and human-like manners make him instantly recognizable.
• C-3PO's presence adds a touch of humor, warmth, and relatability to the Star Wars universe.
Photo by Pexels


The Reality of Human-Like Machines
• Talking Machines - A Reality
• Machines today can talk and respond to us in a human-like manner.
• This technological advancement is no longer just a concept.
• Chatbots, voice assistants, and virtual agents are examples of such machines.
• They mimic human conversation and provide seamless interactions.
Photo by Pexels
Unleash the Power of NLP
• NLP is easy to learn and provides a powerful set of tools for language processing.
• Automated language solutions using NLP can revolutionize areas such as customer service, sentiment analysis, and content generation.
• By becoming an expert in NLP, you can leverage its potential to create intelligent chatbots, automate language translation, and improve text analysis.
Photo by Pexels
