Chapter-1 Introduction To NLP
Text pre-processing:
Text data derived from natural language is unstructured and noisy.
Text pre-processing involves transforming text into a clean and consistent
format that can then be fed into a model for further analysis and learning.
Text pre-processing techniques may be general so that they are applicable to
many types of applications, or they can be specialized for a specific task.
For example, the methods for processing scientific documents with equations
and other mathematical symbols can be quite different from those for dealing
with user comments on social media.
However, some steps, such as sentence segmentation, tokenization, spelling
corrections, and stemming, are common to both.
The following sections cover what you need to know about text pre-processing to
improve your natural language processing (NLP) applications.
Techniques and methods used in NLP:
Syntax and semantic analysis are two main techniques used with natural
language processing.
Syntax is the arrangement of words in a sentence to make grammatical
sense.
NLP uses syntax to assess meaning from a language based on grammatical
rules.
Syntax techniques include:
1. Parsing.
This is the grammatical analysis of a sentence.
2. Word segmentation.
This is the act of taking a string of text and deriving word forms from it.
Example: A person scans a handwritten document into a computer.
The algorithm would be able to analyze the page and recognize that the words are
divided by white spaces.
3. Sentence breaking.
This places sentence boundaries in large texts.
Example: A natural language processing algorithm is fed the text, "The
dog barked. I woke up."
The algorithm can recognize the period that splits up the sentences using
sentence breaking.
4. Morphological segmentation.
This divides words into smaller parts called morphemes.
Example: The word untestably would be broken into [[un[[test]able]]ly],
where the algorithm recognizes "un," "test," "able" and "ly" as
morphemes.
This is especially useful in machine translation and speech recognition.
5. Stemming.
This reduces inflected words to their root forms.
Example: In the sentence, "The dog barked," the algorithm would be able
to recognize the root of the word "barked" is "bark."
This would be useful if a user was analyzing a text for all instances of the
word bark, as well as all of its conjugations.
The algorithm can see that they are essentially the same word even
though the letters are different.
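To make this concrete, here is a minimal stemming sketch using NLTK's PorterStemmer; this assumes nltk is installed, and the word list is illustrative:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["bark", "barked", "barking", "barks"]:
    # All four inflected forms reduce to the same stem, "bark".
    print(word, "->", stemmer.stem(word))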
Semantics concerns the use of words and the meaning behind them.
Natural language processing applies algorithms to understand the
meaning and structure of sentences.
Semantics techniques include:
1. Word sense disambiguation.
This derives the meaning of a word based on context.
Example: Consider the sentence, "The pig is in the pen."
The word pen has different meanings.
An algorithm using this method can understand that the use of the
word pen here refers to a fenced-in area, not a writing implement.
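As an illustration, NLTK ships a simplified Lesk implementation for word sense disambiguation. This is a sketch assuming nltk and its 'wordnet' corpus are installed; note that simplified Lesk is only a baseline and may not always pick the enclosure sense for this sentence:

from nltk.wsd import lesk

# Disambiguate "pen" using the surrounding sentence as context.
context = "The pig is in the pen".split()
sense = lesk(context, "pen")   # returns a WordNet Synset, or None
if sense is not None:
    print(sense.name(), "-", sense.definition())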
2. Named entity recognition.
This determines words that can be categorized into groups.
Example: An algorithm using this method could analyze a news article and
identify all mentions of a certain company or product.
Using the semantics of the text, it would be able to differentiate between
entities that are visually the same.
For instance, in the sentence, "Daniel McDonald's son went to McDonald's
and ordered a Happy Meal," the algorithm could recognize the two instances
of "McDonald's" as two separate entities -- one a restaurant and one a
person.
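A minimal NER sketch using spaCy, assuming spacy and its small English model en_core_web_sm are installed (the exact entity labels depend on the model's predictions):

import spacy

nlp = spacy.load("en_core_web_sm")  # pretrained English pipeline with an NER component
doc = nlp("Daniel McDonald's son went to McDonald's and ordered a Happy Meal.")
for ent in doc.ents:
    # Each entity carries a predicted label such as PERSON or ORG.
    print(ent.text, "->", ent.label_)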
3. Natural language generation.
This uses a database to determine semantics behind words and generate
new text.
Example: An algorithm could automatically write a summary of findings from a
business intelligence platform, mapping certain words and phrases to
features of the data in the BI platform.
Another example would be automatically generating news articles or tweets
based on a certain body of text used for training.
Text pre-processing pipeline (part of text pre-processing):
1. Tokenization:
Tokenization splits the raw text into smaller units called tokens, such as words or sentences.
2. Lower casing:
Lower casing converts all characters in the text to lowercase so that, for example, "Bark" and "bark" are treated as the same token:
sentence = "The Dog Barked"
print(sentence.lower())   # the dog barked
3. Stemming:
Stemming and lemmatization are used to reduce words to their base form,
which can help reduce the vocabulary size and simplify the text.
Stemming involves stripping the suffixes from words to get their stem,
whereas lemmatization involves reducing words to their base form based on
their part of speech.
These steps are commonly used in various NLP tasks such as text
classification, information retrieval, and topic modelling; a short code
sketch of these pipeline steps follows the lemmatization step below.
4. Lemmatization:
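As referenced above, here is a minimal sketch of pipeline steps 1 through 4 using NLTK; this assumes nltk is installed along with its 'punkt' and 'wordnet' resources, and the example sentence is illustrative:

from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

sentence = "The striped Bats are hanging on their Feet"
tokens = word_tokenize(sentence)                             # 1. Tokenization
tokens = [t.lower() for t in tokens]                         # 2. Lower casing
stems = [PorterStemmer().stem(t) for t in tokens]            # 3. Stemming
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]  # 4. Lemmatization
print(stems)   # e.g. ['the', 'stripe', 'bat', 'are', 'hang', 'on', 'their', 'feet']
print(lemmas)  # e.g. ['the', 'striped', 'bat', 'are', 'hanging', 'on', 'their', 'foot']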
5. Regular Expressions:
A regular expression (RE) is a language for specifying text search strings.
RE helps us to match or find other strings or sets of strings, using a
specialized syntax held in a pattern.
Regular expressions are used to search texts in UNIX as well as in MS
WORD in an identical way.
We have various search engines using a number of RE features.
Properties of Regular Expressions
Following are some of the important properties of RE:
American Mathematician Stephen Cole Kleene formalized the Regular Expression
language.
RE is a formula in a special language, which can be used for specifying simple
classes of strings, a sequence of symbols.
In other words, we can say that RE is an algebraic notation for characterizing a set of
strings.
A regular expression requires two things: one is the pattern that we wish to
search for, and the other is a corpus of text in which we need to search.
Mathematically, a Regular Expression can be defined as follows:
- ε is a Regular Expression, which denotes the language containing only the empty string.
- φ is a Regular Expression, which denotes the empty language.
- If X and Y are Regular Expressions, then X.Y (the concatenation of X and Y), X+Y (the union of X and Y), and X*, Y* (the Kleene closure of X and Y) are also Regular Expressions.
- Any expression derived from the above rules is also a Regular Expression.
Some examples of Regular Expressions and the languages they describe:
(0 + 10*): {0, 1, 10, 100, 1000, 10000, …}
(a+b)*: the set of strings of a's and b's of any length, which also includes the null string, i.e. {ε, a, b, aa, ab, bb, ba, aaa, …}
(a+b)*abb: the set of strings of a's and b's ending with the string abb, i.e. {abb, aabb, babb, aaabb, ababb, …}
(11)*: the set consisting of an even number of 1's, which also includes the empty string, i.e. {ε, 11, 1111, 111111, …}
(aa)*(bb)*b: the set of strings consisting of an even number of a's followed by an odd number of b's, i.e. {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, …}
(aa + ab + ba + bb)*: strings of a's and b's of even length that can be obtained by concatenating any combination of the strings aa, ab, ba and bb, including null, i.e. {ε, aa, ab, ba, bb, aaab, aaba, …}
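Such patterns can be checked programmatically with Python's re module. A small sketch; note that Python writes the union operator as '|' rather than '+', so (a+b)*abb becomes (a|b)*abb:

import re

# (a|b)*abb: strings of a's and b's ending with "abb".
pattern = re.compile(r"(a|b)*abb")
for s in ["abb", "aabb", "babb", "ababb", "ab"]:
    # fullmatch requires the entire string to match, not just a prefix.
    print(s, "->", bool(pattern.fullmatch(s)))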
Source: https://www.tutorialspoint.com/natural_language_processing/natural_language_processing_word_level_analysis.htm
Minimum edit distance:
Many NLP tasks are concerned with measuring how similar two strings are.
Spell correction: the user typed "graffe". Which is closest: graf, grail, or giraffe?
The word giraffe, which differs by only one letter from graffe, seems intuitively more
similar than, say, grail or graf.
The minimum edit distance between two strings is defined as the minimum number of editing operations
(insertion, deletion, substitution) needed to transform one string into another.
The minimum edit distance between intention and execution can be visualized using their alignment.
Given two sequences, an alignment is a correspondence between substrings of the two sequences.
Given two strings str1 and str2, and the below operations that can be performed on str1, find the
minimum number of edits (operations) required to convert str1 into str2.
1. Insert
2. Remove
3. Replace
All of the above operations are of equal cost.
Input: str1 = “geek”, str2 = “gesek”
Output: 1
Explanation: We can convert str1 into str2 by inserting an 's'.
Input: str1 = “cat”, str2 = “cut”
Output: 1
Explanation: We can convert str1 into str2 by replacing ‘a’ with ‘u’.
Input: str1 = “sunday”, str2 = “saturday”
Output: 3
Explanation: The first character and the last three characters are the same. We basically need to
convert "un" to "atur". This can be done using the below three operations:
replace 'n' with 'r', insert 't', insert 'a'.
https://youtu.be/We3YDTzNXEk
In this method, we use a bottom-up approach to compute the edit distance between str1 and
str2.
We start by computing the edit distance for smaller subproblems and use the results of these
smaller subproblems to compute results for subsequent larger problems.
The results are stored in a two-dimensional array as shown below.
Each cell (m, n) of this array represents the distance between the first 'm' characters of str1
and the first 'n' characters of str2.
For example, when 'm' is 0, the distance between str1, which is of 0 length, and str2 of
length 'n' is 'n'.
This corresponds to the 0th row of the matrix.
The same holds for the values in the 0th column, where str2 is of 0 length.
Now in this matrix, consider the cell (m, n), which represents the distance between str1 of
length 'm' characters and str2 of length 'n' characters. If the 'm'th character of str1 and the
'n'th character of str2 are the same, then we simply fill cell (m, n) using the value of cell
(m-1, n-1), which represents the edit distance between the first 'm-1' characters of str1 and
the first 'n-1' characters of str2.
If the 'm'th character of str1 is not equal to the 'n'th character of str2, then we choose the
minimum value from the following three cases:
1. Delete the 'm'th character of str1 and compute the edit distance between 'm-1'
characters of str1 and 'n' characters of str2.
For this computation, we simply do (1 + array[m-1][n]),
where 1 is the cost of the delete operation and array[m-1][n] is the edit
distance between 'm-1' characters of str1 and 'n' characters of str2.
2. Similarly, for the second case of inserting the last character of str2 into str1, we
do (1 + array[m][n-1]).
3. And for the third case of substituting the last character of str1 with the last
character of str2, we use (1 + array[m-1][n-1]).
See the function find_distance(str1, str2) in the sketch below for implementation
details.
The time and space complexity of this method is O(mn) where 'm' is the length of str1 and
'n' is the length of str2.
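A minimal bottom-up sketch in Python of the computation described above; find_distance is our own illustrative name, not a library function:

def find_distance(str1, str2):
    m, n = len(str1), len(str2)
    # array[i][j] holds the edit distance between the first i characters of
    # str1 and the first j characters of str2.
    array = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        array[i][0] = i              # delete all i characters of str1
    for j in range(n + 1):
        array[0][j] = j              # insert all j characters of str2
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                array[i][j] = array[i - 1][j - 1]           # characters match
            else:
                array[i][j] = 1 + min(array[i - 1][j],      # delete
                                      array[i][j - 1],      # insert
                                      array[i - 1][j - 1])  # substitute
    return array[m][n]

print(find_distance("sunday", "saturday"))  # 3
print(find_distance("cat", "cut"))          # 1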
Edit distance isn’t sufficient We often need to align each character of the two strings to each
other.
We do this by keeping a “backtrace”
Every time we enter a cell, remember where we came from
When we reach the end, – Trace back the path from the upper right corner to read off the
alignment
Time complexity: O(nm)
Space complexity: O(nm)
Back trace: O(n+m)
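A minimal sketch of reading off an alignment with a backtrace under the same unit-cost assumptions; here the predecessor of each cell is recomputed while walking back rather than stored explicitly:

def align(str1, str2):
    m, n = len(str1), len(str2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if str1[i - 1] == str2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete
                          d[i][j - 1] + 1,          # insert
                          d[i - 1][j - 1] + cost)   # keep or substitute
    # Walk back from the final cell to (0, 0), recording one operation per step.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and str1[i - 1] == str2[j - 1] else 1
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost:
            ops.append("keep " + str1[i - 1] if cost == 0
                       else "substitute " + str1[i - 1] + "->" + str2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append("delete " + str1[i - 1])
            i -= 1
        else:
            ops.append("insert " + str2[j - 1])
            j -= 1
    return list(reversed(ops))

print(align("intention", "execution"))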
Part-of-speech (POS) tagging:
POS tagging is the process of labelling each word in a text with its part of speech.
This can include nouns, verbs, adjectives, and other grammatical categories.
POS tagging is useful for a variety of NLP tasks, such as information extraction,
named entity recognition, and machine translation.
The algorithm learns to predict the correct POS tag for a given word based on
the context in which it appears.
There are various POS tagging schemes that have been developed, each with its
own set of tags and rules.
Some common POS tagging schemes include the Penn Treebank tag set and
the Universal Dependencies tag set.
Example: Consider the sentence "The cat sat on the mat." Its POS tags:
The: determiner
cat: noun
sat: verb
on: preposition
the: determiner
mat: noun
In this example, each word in the sentence has been labeled with its
corresponding part of speech.
The determiner “the” is used to identify specific nouns, while the noun “cat”
refers to a specific animal.
The verb “sat” describes an action, and the preposition “on” describes the
relationship between the cat and the mat.
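A minimal sketch of producing such tags with NLTK's pos_tag; this assumes nltk plus its 'punkt' and 'averaged_perceptron_tagger' resources are installed, and it outputs Penn Treebank tags such as DT, NN, VBD, and IN rather than the plain category names above:

import nltk

tokens = nltk.word_tokenize("The cat sat on the mat")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'),
#       ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]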
Identifying the part of speech of a word is not just a matter of mapping words to their
respective POS tags.
The same word might have a different part-of-speech tag depending on the context.
Thus it is not possible to have one common mapping for part-of-speech tags.
And when you have a huge corpus, manually finding a different part of speech for
each word is not a scalable solution.
But why are we tagging these words with their parts of speech?
There are several reasons why we might tag words with their parts of speech
(POS) in natural language processing (NLP):