ANLP Sem VI Lab Manual
Even Semester
Our Vision
To foster and permeate higher and quality education with value added engineering, technology
programs, providing all facilities in terms of technology and platforms for all round development
with social awareness and nurture the youth with international competencies and exemplary level
of employability even under highly competitive environment so that they are innovative, adaptable
and capable of handling problems faced by our country and world at large.
Our Mission
The Institution is committed to mobilize the resources and equip itself with men and materials of
excellence, thereby ensuring that the Institution becomes a pivotal center of service to Industry,
Academy, and society with the latest technology. RAIT engages different platforms such as
technology enhancing Student Technical Societies, Cultural platforms, Sports excellence centers,
Entrepreneurial Development Centers and a Societal Interaction Cell. To develop the college to
become an autonomous institution & deemed university at the earliest, we provide facilities for
advanced research and development programs on par with international standards. We also seek
to invite international and reputed national Institutions and Universities to collaborate with our
institution on the issues of common interest of teaching and learning sophistication.
Index
Sr. No. Contents
1. List of Experiments
2. Experiment Plan and Course Outcomes
3. Study and Evaluation Scheme
4. Experiment No. 1
5. Experiment No. 2
6. Experiment No. 3
7. Experiment No. 4
8. Experiment No. 5
9. Experiment No. 6
10. Experiment No. 7
11. Experiment No. 8
12. Experiment No. 9
13. Experiment No. 10
List of Experiments
Sr. No. Experiment Name
1 To perform text preprocessing in Python.
2 To perform stemming operations on text.
3 To perform lemmatization operations on text and to understand the morphology of a word by the use of the Add-Delete table.
4 To calculate bigrams from a given corpus and calculate the probability of a sentence.
5 To find POS tags of words in a sentence.
6 To perform text classification using the Naive Bayes classifier.
7 To find POS tags of words in a sentence using Viterbi decoding.
9 To understand the concept of chunking and get familiar with the basic chunk tagset.
10 Mini-Project
Course Outcomes
CO2: Learn about generation of word forms.
CO3: Understand the use of the Add-Delete table for word morphology.
CO4: Apply add-one smoothing on a sparse bigram table.
CO5: Understand POS tagging using the Markov model.
CO6: Understand POS tagging using Viterbi decoding.
Experiment No. 1
Aim: To perform text preprocessing in Python.
1)NLTK LIBRARY:-
NLTK, or Natural Language Toolkit, is a Python package that you can use for NLP. A lot of the data that you may be analyzing is unstructured and contains human-readable text. Before you can analyze that data programmatically, you first need to preprocess it.
2)TEXT LOWERCASE:-
Text is converted to lowercase so that words differing only in case (for example "Text" and "text") are treated as the same token, since string matching in Python is case sensitive.
Function Used:-
text.lower()
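A minimal sketch of this step (the sample sentence is only illustrative):

text = "Machine Learning is FUN and Python is Powerful."
lower_text = text.lower()
print(lower_text)   # machine learning is fun and python is powerful.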
3)REMOVE NUMBERS:-
Python provides a regex module (re) with a built-in function sub() that can be used to remove numbers from a string. This method replaces all occurrences of the given pattern in the string with a replacement string. If the pattern is not found, the original string is returned unchanged.
Function Used:-
re.sub()
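A short illustrative sketch using re.sub() to strip digits (the sample text and pattern are assumptions made for the demo):

import re

text = "There are 3 apples and 12 oranges."
# \d+ matches one or more digits; every match is replaced with the empty string.
no_numbers = re.sub(r"\d+", "", text)
print(no_numbers)   # "There are  apples and  oranges."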
4)REMOVE PUNCTUATION:-
Using Translate():-
A translation table is first built with str.maketrans(): its first two arguments are empty strings, and the third is the string of punctuation characters to delete. Passing this table to text.translate() instructs Python to eliminate those punctuation characters from the string. This is one of the best ways to strip punctuation from a string.
Function Used:-
text.translate()
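A small sketch of this approach; the translation table is built with str.maketrans() and the sample sentence is only illustrative:

import string

text = "Hello, world! Isn't NLP fun?"
# The first two arguments are empty strings; the third lists the characters to delete.
table = str.maketrans("", "", string.punctuation)
no_punct = text.translate(table)
print(no_punct)   # "Hello world Isnt NLP fun"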
Experiment No. 2
Aim: To perform stemming operations on text.
Objective :
To understand how stemming works on text
To learn how to use different algorithms for stemming operations.
Theory:
1)Stemming :
Stemming is the process of obtaining the root form of a word. The root, or stem, is the part to which inflectional affixes (like -ed, -ize, etc.) are added. The stem is created by removing the prefix or suffix of a word, so stemming a word may not result in an actual dictionary word.
going ---> go
If our sentences are not already tokenized, we first need to convert them into tokens. Once the strings of text are converted into word tokens, we can reduce those tokens to their root forms. NLTK provides several stemmers for this: the Porter stemmer, the Snowball stemmer, and the Lancaster stemmer. The Porter stemmer is the one most commonly used.
2) Porter Stemmer:
The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the
commoner morphological and inflexional endings from words in English. Its main use is
as part of a term normalisation process that is usually done when setting up
Information Retrieval systems.
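A brief sketch of stemming with NLTK's PorterStemmer; the sentence is illustrative, and the nltk.download() call fetches the tokenizer models on first use:

import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt")                     # tokenizer models (needed once)

stemmer = PorterStemmer()
sentence = "The children were going and playing happily"
tokens = word_tokenize(sentence)           # convert the string into word tokens
stems = [stemmer.stem(token) for token in tokens]
print(stems)   # e.g. ['the', 'children', 'were', 'go', 'and', 'play', 'happili']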
Experiment No. 3
Aim: To perform lemmatization operations on text and to understand the morphology of a word by the use of the Add-Delete table.
Objective :
To understand how lemmatization works on text
To learn morphology of a word by using Add-Delete table on VLab
Theory:
1)Lemmatization :
Lemmatization does the same as stemming, with the difference that it ensures the root word (the lemma) belongs to the language, so it always produces valid words. In NLTK (Natural Language Toolkit), we use the WordNetLemmatizer to get the lemmas of words. We also need to provide a context for lemmatization, so we pass the pos (part-of-speech) as a parameter.
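A short sketch of lemmatization with NLTK's WordNetLemmatizer; the example words are illustrative, and the nltk.download() call fetches the WordNet data on first use:

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")                   # lexical database used by the lemmatizer

lemmatizer = WordNetLemmatizer()
# Without a pos hint the word is treated as a noun; pos="v" treats it as a verb.
print(lemmatizer.lemmatize("running"))             # running
print(lemmatizer.lemmatize("running", pos="v"))    # run
print(lemmatizer.lemmatize("better", pos="a"))     # good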
2) Morphology:
Morphology is the study of the internal structure of words and of how words are formed from smaller meaning-bearing units (morphemes), for example by adding or removing affixes to a root. The Add-Delete table records, for a given root, what is deleted from and added to it to generate its different word forms.
References:
Questions:
• What is meant by lemmatization?
• Define morphology.
• List different applications of lemmatization.
• Differentiate between stemming and lemmatization.
Experiment No. 4
Aim: To learn to calculate bigrams from a given corpus and calculate probability of a sentence.
Objective :
To understand the concept of n-grams
To understand how to calculate probabilities for bigrams
Theory:
1) Probability of Sentence:
By the chain rule of probability, the probability of a sentence w(1), w(2), ..., w(n) is:
P(w(1), w(2), ..., w(n)) = P(w(1)) * P(w(2) | w(1)) * P(w(3) | w(1) w(2)) * ... * P(w(n) | w(1) w(2) ... w(n-1))
2) Bigrams:
We can avoid this very long calculation by approximating that the probability of a given word depends only on its previous word. This assumption is called the Markov assumption, and such a model is called a Markov model; the bigram model is a first-order Markov model. Bigrams can be generalized to the n-gram, which looks at the (n-1) previous words.
We use the <s> tag to mark the beginning of a sentence and </s> to mark its end.
A bigram table for a given corpus can be generated and used as a lookup table for calculating the probability of sentences, as in the sketch below.
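The sketch below builds such a bigram lookup table from a tiny made-up corpus and uses it to score a sentence; the corpus and sentences are purely illustrative:

from collections import defaultdict

corpus = [
    "<s> i love nlp </s>",
    "<s> i love python </s>",
    "<s> nlp is fun </s>",
]

unigram_counts = defaultdict(int)
bigram_counts = defaultdict(int)
for sentence in corpus:
    tokens = sentence.split()
    for i in range(len(tokens) - 1):
        unigram_counts[tokens[i]] += 1
        bigram_counts[(tokens[i], tokens[i + 1])] += 1

def bigram_prob(prev, word):
    # P(word | prev) = count(prev, word) / count(prev)
    return bigram_counts[(prev, word)] / unigram_counts[prev] if unigram_counts[prev] else 0.0

def sentence_prob(sentence):
    tokens = sentence.split()
    prob = 1.0
    for i in range(len(tokens) - 1):
        prob *= bigram_prob(tokens[i], tokens[i + 1])
    return prob

print(sentence_prob("<s> i love nlp </s>"))   # 2/3 * 1 * 1/2 * 1/2 ≈ 0.167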
References:
Questions:
• What is the role of the n-gram language model in NLP?
• Define chain rule and bigram.
• Explain the term perplexity.
• What is meant by a Markov model?
Experiment No. 5
Aim: To find POS tags of words in a sentence
Objective :
To understand the concept of part-of-speech tagging
To understand how to build a POS tagger
Theory:
POS tagging is often the very first task in text processing, feeding further downstream tasks in NLP such as speech recognition, parsing, machine translation, sentiment analysis, etc. The particular POS tag of a word can be used as a feature by various Machine Learning algorithms used in Natural Language Processing.
For example, in the sentence "Learn NLP from Scaler": Learn -> VERB, NLP -> NOUN, from -> PREPOSITION, Scaler -> NOUN.
There are various techniques that can be used for POS tagging (a short NLTK example follows this list), such as:
• Rule-based POS tagging: The rule-based POS tagging models apply a set of
handwritten rules and use contextual information to assign POS tags to
words. These rules are often known as context frame rules. One such rule
might be: “If an ambiguous/unknown word ends with the suffix ‘ing’ and is
preceded by a Verb, label it as a Verb”.
• Transformation Based Tagging: The transformation-based approaches use
a pre-defined set of handcrafted rules as well as automatically induced
rules that are generated during training.
• Deep learning models: Various Deep learning models have been used for
POS tagging such as Meta-BiLSTM which have shown an impressive
accuracy of around 97 percent.
• Stochastic (Probabilistic) tagging: A stochastic approach includes
frequency, probability or statistics. The simplest stochastic approach finds
out the most frequently used tag for a specific word in the annotated
training data and uses this information to tag that word in the
unannotated text. But sometimes this approach comes up with sequences
of tags for sentences that are not acceptable according to the grammar
rules of a language. One such approach is to calculate the probabilities of
various tag sequences that are possible for a sentence and assign the POS
tags from the sequence with the highest probability. Hidden Markov
Models (HMMs) are probabilistic approaches to assign a POS Tag.
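As mentioned above, a ready-made stochastic tagger can be tried directly in NLTK; the sketch below tags the example sentence from this section (the exact tags returned may vary with the tagger model):

import nltk
from nltk import word_tokenize, pos_tag

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")   # pretrained statistical tagger

tokens = word_tokenize("Learn NLP from Scaler")
print(pos_tag(tokens))
# e.g. [('Learn', 'VB'), ('NLP', 'NNP'), ('from', 'IN'), ('Scaler', 'NNP')]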
Conclusion: We understood the concept of POS tagging and studied how a POS tagger assigns tags.
Experiment No. 6
Objective :
To understand the concept of text classification
To understand how to use the Naive Bayes classifier for text classification
Theory:
Naive Bayes classifiers have been heavily used for text classification and text analysis
machine learning problems.
Text analysis is a major application field for machine learning algorithms. However, the raw data, a sequence of symbols (i.e. strings), cannot be fed directly to the algorithms themselves, as most of them expect numerical feature vectors of fixed size rather than raw text documents of variable length.
In order to address this, scikit-learn provides utilities for the most common ways to
extract numerical features from text content, namely:
• tokenizing strings and giving an integer id for each possible token, for
instance by using white-spaces and punctuation as token separators.
• counting the occurrences of tokens in each document (see the sketch below).
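A minimal sketch of this pipeline in scikit-learn; the training texts and labels are made up for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set.
texts = [
    "I loved this movie, it was great",
    "What a fantastic experience",
    "This was a terrible film",
    "I hated every minute of it",
]
labels = ["pos", "pos", "neg", "neg"]

# CountVectorizer tokenizes the strings and counts token occurrences;
# MultinomialNB is then trained on the resulting count vectors.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["a great fantastic movie"]))   # expected: ['pos']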
Conclusion: We understood the concept of the Naive Bayes classifier and studied text classification using it.
Experiment No. 7
Objective :
To understand the concept of Viterbi decoding
To find POS tags of words in a sentence using Viterbi decoding.
Theory:
In this experiment, Viterbi decoding will be used to find the POS tag sequence for a given sentence. Once we have the emission and transition matrices, various algorithms can be applied to find the POS tags for the words. Some possible algorithms are the backward algorithm, the forward algorithm, and the Viterbi algorithm. Here, in this experiment, you can get familiar with Viterbi decoding.
Viterbi decoding is based on dynamic programming. The algorithm takes the emission and transition matrices as input. The emission matrix gives the probability of observing a word given a POS tag, and the transition matrix gives the probability of transition from one POS tag to another. The algorithm observes the sequence of words and returns the most probable state sequence of POS tags along with its probability.
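The sketch below implements Viterbi decoding for a toy two-tag example; the tag set and the transition and emission probabilities are made-up illustrative values, not taken from a real corpus:

tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}                      # P(tag at sentence start)
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},            # P(next tag | current tag)
           "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {"NOUN": {"dogs": 0.6, "bark": 0.1},             # P(word | tag)
          "VERB": {"dogs": 0.1, "bark": 0.7}}

def viterbi(words):
    # V[i][tag] = (best probability of ending in `tag` at position i, previous tag on that path)
    V = [{t: (start_p[t] * emit_p[t].get(words[0], 1e-6), None) for t in tags}]
    for i in range(1, len(words)):
        V.append({})
        for t in tags:
            prob, prev = max(
                (V[i - 1][pt][0] * trans_p[pt][t] * emit_p[t].get(words[i], 1e-6), pt)
                for pt in tags)
            V[i][t] = (prob, prev)
    # Pick the most probable final tag and backtrack to recover the full sequence.
    best = max(V[-1], key=lambda t: V[-1][t][0])
    path, prob = [best], V[-1][best][0]
    for i in range(len(words) - 1, 0, -1):
        best = V[i][best][1]
        path.insert(0, best)
    return path, prob

print(viterbi(["dogs", "bark"]))   # (['NOUN', 'VERB'], 0.1764)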
Conclusion: We understood how POS tags are assigned with the help of Viterbi decoding.
Experiment No. 8
Experiment No. 9
Aim: To understand the concept of chunking and get familiar with the basic chunk tagset.
Objective :
To understand the concept of chunking
To understand the chunk tagset and how to perform chunking
Theory:
A chunk is a collection of basic familiar units that have been grouped together and
stored in a person’s memory. In natural language, chunks are collective higher order
units that have discrete grammatical meanings (noun groups or phrases, verb groups,
etc.)
Chunking is a process of extracting phrases (chunks) from unstructured text. Instead of using a single word, which may not represent the actual meaning of the text, it is recommended to use a chunk or phrase.
Chunk Types
The chunk types are based on the syntactic category of the chunk head. Besides the head, a chunk also contains modifiers (like determiners, adjectives, and postpositions in NPs).
1. Noun (NP)
2. Verb (VP)
3. Adverb (ADVP)
4. Adjectival (ADJP)
5. Prepositional (PP)
To create NP chunks, we first define a chunk grammar using POS tags, consisting of rules that indicate how sentences should be chunked. In this case we define a simple grammar with a single regular-expression rule. This rule says that an NP chunk should be formed whenever the chunker finds an optional determiner (DT), followed by any number of adjectives (JJ), followed by a noun (NN). A sketch of this grammar in NLTK is shown below.
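A small sketch of this grammar with NLTK's RegexpParser; the sentence is illustrative, and the download calls fetch the tokenizer and tagger models on first use:

import nltk
from nltk import word_tokenize, pos_tag, RegexpParser

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

# NP chunk = optional determiner (DT), any number of adjectives (JJ), then a noun (NN).
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = RegexpParser(grammar)

sentence = "The little dog saw a new cat"
tagged = pos_tag(word_tokenize(sentence))
tree = chunker.parse(tagged)
print(tree)   # NP chunks expected around "The little dog" and "a new cat"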