0% found this document useful (0 votes)

15 views23 pages

NLB Final Lab Manual

The document is a lab manual for the B.E CSE (Artificial Intelligence and Machine Learning) course at Mahendra Institute of Technology for the academic year 2024-2025, focusing on Natural Language Processing. It outlines various experiments including text preprocessing, morphological analysis, N-gram models, and POS tagging, detailing their aims, theories, and code implementations. Each experiment aims to provide hands-on experience with key NLP concepts and techniques.

Uploaded by

w46436224

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views23 pages

NLB Final Lab Manual

Uploaded by

w46436224

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 23

MAHENDRA INSTITUTE OF TECHNOLOGY (AUTONOMOUS)

Mahendhirapuri, Mallasamudram,Namakkal- 637 503

DEPARTMENT OF B.E CSE (ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING)

LAB MANUAL

Academic Year: 2024-2025 (Even Semester)

Year/Sem.: III/VI
Course Code &Title: AI2214601 &Natural Language Processing
Regulation: R2022

1
Page No. Marks Sign
Sr. No. List of Experiments

1 Preprocessing of text (Tokenization, Filtration, Script

Validation, Stop Word Removal, Stemming)

2 Morphological Analysis

3 N-gram model

4 POS tagging

5 Chunking

6 Named Entity Recognition

7 Virtual Lab on Word Generator

8 Mini Project based on NLP Application

2
EX.NO:1
PREPROCESSING OF TEXT

Aim: To study Preprocessing of text (Tokenization, Filtration, Script Validation, Stop Word Removal,
Stemming)

Theory:

To preprocess your text simply means to bring your text into a form that is predictable and analyzable for your
task. A task here is a combination of approach and domain.

Machine Learning needs data in the numeric form. We basically used encoding technique (BagOfWord, Bi-
gram,n-gram, TF-IDF, Word2Vec) to encode text into numeric vector. But before encoding we first need to
clean the text data and this process to prepare (or clean) text databefore encoding is called text
preprocessing, this is the very first step to solve the NLP problems.

Tokenization:
Tokenization is about splitting strings of text into smaller pieces, or “tokens”. Paragraphs can be tokenized
into sentences and sentences can be tokenized into words.

Filtration:
Similarly, if we are doing simple word counts, or trying to visualize our text with a word cloud, stopwords are
some of the most frequently occurring words but don’t really tell us anything. We’re often better off tossing
the stopwords out of the text. By checking the Filter Stopwords option in the Text Pre-processing tool, you
can automatically filter these words out.

Script Validation:
The script must be validated properly.

Stemming
Stemming is the process of reducing inflection in words (e.g. troubled, troubles) to their root form (e.g.
trouble). The “root” in this case may not be a real root word, but just a canonical form of the original word.

Stemming uses a crude heuristic process that chops off the ends of words in the hope of correctly transforming
words into its root form. So, the words “trouble”, “troubled” and “troubles” might actually be converted
to troublinstead of trouble because the ends were just chopped off (ughh, how crude!).

There are different algorithms for stemming. The most common algorithm, which is also known to be
empirically effective for English, is Porters Algorithm. Here is an example of stemming in action with Porter
Stemmer:

3
Stopword Removal
Stop words are a set of commonly used words in a language. Examples of stop words in English are “a”,
“the”, “is”, “are” and etc. The intuition behind using stop words is that, by removing low information words
from text, we can focus on the important words instead.

For example, in the context of a search system, if your search query is “what is text preprocessing?”, you
want the search system to focus on surfacing documents that talk about text preprocessing over documents
that talk about what is. This can be done by preventing all words from your stop word list from being
analyzed. Stop words are commonly applied in search systems, text classification applications, topic
modeling, topic extraction and others.
In my experience, stop word removal, while effective in search and topic extraction systems, showed to be
non-critical in classification systems. However, it does help reduce the number of features in consideration
which helps keep your models decently sized.
Here is an example of stop word removal in action. All stop words are replaced with a dummy character, W:

Code:
String handling:
print(len("what it is what it isnt"))
s=["what","it","is" ,"what","it","isnt"]
print(len(s))
x=sorted(s)
print(s)
print(x)
d=x+s
print(d)

4
print(d)
Output:
23
6
['what', 'it', 'is', 'what', 'it', 'isnt']
['is', 'isnt', 'it', 'it', 'what', 'what']
['is', 'isnt', 'it', 'it', 'what', 'what', 'what', 'it', 'is', 'what', 'it', 'isnt']s

File handling (tokenization and filtering):

for line in open("file.txt"):

for word in line.split():
if word.endswith('ing'):
print(word)
print(len(word))

Output:
eating

dancing

jumping

File.txt
I like eating in restraunt, I like dancing too.
My daughter like bungee jumping.

Result:
In the above experiment we have studied regarding preprocessing of text in detail like filtration, stop word
removal, tokenization, stemming, script validation and have tried to implement the code for it and have
successfully executed it.

5
EX.NO:2
MORPHOLOGICAL ANALYSIS

Aim: To Study Morphological Analysis

Theory:
Morphological Analysis:

While performing the morphological analysis, each particular word is analyzed. Non-word tokens such as
punctuation are removed from the words. Hence the remaining words are assigned categories. For instance,
Ram’s iPhone cannot convert the video from .mkv to .mp4. In Morphological analysis, word by word the
sentence is analyzed.So here, Ram is a proper noun, Ram’s is assigned as possessive suffix and .mkv and
.mp4 is assigned as a file extension.
As shown above, the sentence is analyzed word by word. Each word is assigned a syntactic category. The file
extensions are also identified present in the sentence which is behaving as an adjective in the above example.
In the above example, the possessive suffix is also identified. This is a very important step as the judgment of
prefixes and suffixes will depend on a syntactic category for the word. For example, swims and swims are
different. One makes it plural, while the other makes it a third-person singular verb. If the prefix or suffix is
incorrectly interpreted then the meaning and understanding of the sentence are completely changed. The
interpretation assigns a category to the word. Hence, discard the uncertainty from the word.

Regular Expression:

Regular expressions also called regex. It is a very powerful programming tool that is used for a variety of
purposes such as feature extraction from text, string replacement and other string manipulations. A regular
expression is a set of characters, or a pattern, which is used to find sub strings in a given string. for ex.
extracting all hashtags from a tweet, getting email id or phone numbers etc.,from a large unstructured text
content.

In short, if there’s a pattern in any string, you can easily extract, substitute and do variety of other string
manipulation operations using regular expressions. Regular expressions are a language in itself since they
have their own compilers and almost all popular programming languages support working with regexes.

Stop Word Removal:

The words which are generally filtered out before processing a natural language are called stop words. These
are actually the most common words in any language (like articles, prepositions,

pronouns, conjunctions, etc) and does not add much information to the text. Examples of a few stop words in
English are “the”, “a”, “an”, “so”, “what”.

Stop words are available in abundance in any human language. By removing these words, we remove the low-
level information from our text in order to give more focus to the important information. In order words, we

6
can say that the removal of such words does not show any negative consequences on the model we train for
our task.

Removal of stop words definitely reduces the dataset size and thus reduces the training time due to the fewer
number of tokens involved in the training.

Synonym:

The word synonym defines the relationship between different words that have a similar meaning. A simple
way to decide whether two words are synonymous is to check for substitutability. Two Words are synonyms
in a context if they can be substituted for each for each other without changing the meaning of the sentence.

Stemming:

Stemming is the process of reducing a word to its word stem that affixes to suffixes and prefixes or to the
roots of words known as a lemma. Stemming is important in natural language understanding (NLU) and
natural language processing (NLP).

Code:

Regular Expression:
import re
input="The 5 biggest animals are 1. Elephant,2 Rhino and 3 dinasaur"
input=input.lower()
print(input)
result= re.sub(r'\d+','',input)
print(result)

Output:
the 5 biggest animals are 1. elephant,2 rhino and 3 dinasaur
the biggest animals are . elephant, rhino and dinasaur

7
Stop word removal:
def punctuations(raw_review):
text = raw_review
text = text.replace("n't", ' not')
text = text.replace("'s", ' is')
text = text.replace("'re", ' are')
text = text.replace("'ve", ' have')
text = text.replace("'m", ' am')
text = text.replace("'d", ' would')
text = text.replace("'ll", ' will')
text = text.replace("in", 'ing')
import re
letters_only = re.sub("[^a-zA-Z]"," ",text)
return(''.join(letters_only))
t="Hows's my team doin, you're supposed to be not loosin"
p=punctuations(t)
print(p)

Output
Hows is my team doing you are supposed to be not loosing

Synonym:
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet
synonyms = []
for syn in wordnet.synsets(Machine'):
for lemma in syn.lemmas():
synonyms.append(lemma.name())
print(synonyms)

Output:
['machine', 'machine', 'machine', 'machine', 'simple_machine', 'machine', 'political_machine', 'car', 'auto', 'automobile',
'machine', 'motorcar', 'machine', 'machine']

Stemming:
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
print(stemmer.stem('eating'))
print(stemmer.stem('ate'))

8
Output:
eat
ate

Result:

Thus, in the above experiment we have studied regarding morphological analysis in detail with stemming,
synonym, stop word removal, regular expression and tried to implement the code and got proper output.

9
EX.NO:3
N-GRAM MODEL

Aim: To study N-gram model

Theory:

Given a sequence of N-1 words, an N-gram model predicts the most probable word that might follow this
sequence. It's a probabilistic model that's trained on a corpus of text. Such a model is useful in
many NLP applications including speech recognition, machine translation and predictive text input.

An N-gram model is built by counting how often word sequences occur in corpus text and then estimating the
probabilities. Since a simple N-gram model has limitations, improvements are often made via smoothing,
interpolation and backoff.

An N-gram model is one type of a Language Model (LM), which is about finding the probability distribution
over word sequences.

Consider two sentences: "There was heavy rain" vs. "There was heavy flood". From experience, we know that
the former sentence sounds better. An N-gram model will tell us that "heavy rain" occurs much more often
than "heavy flood" in the training corpus. Thus, the first sentence is more probable and will be selected by the
model.

A model that simply relies on how often a word occurs without looking at previous words is called unigram.
If a model considers only the previous word to predict the current word, then it's called bigram. If two
previous words are considered, then it's a trigram model.

An n-gram model for the above example would calculate the following probability:

P('There was heavy rain') = P('There', 'was', 'heavy', 'rain') = P('There')P('was'|'There')P('heavy'|'There

was')P('rain'|'There was heavy')

Since it's impractical to calculate these conditional probabilities, using Markov assumption, we approximate
this to a bigram model:

P('There was heavy rain') ~ P('There')P('was'|'There')P('heavy'|'was')P('rain'|'heavy')

In speech recognition, input may be noisy and this can lead to wrong speech-to-text conversions. N-gram
models can correct this based on their knowledge of the probabilities. Likewise, N-gram models are used in
machine translation to produce more natural sentences in the target language.

When correcting for spelling errors, sometimes dictionary lookups will not help. For example, in the phrase
"in about fifteen mineuts" the word 'minuets' is a valid dictionary word but it's incorrect in this context. N-
gram models can correct such errors.

10
N-gram models are usually at word level. It's also been used at character level to do stemming, that is,
separate the root word from the suffix. By looking at N-gram statistics, we could also classify languages or
differentiate between US and UK spellings. For example, 'sz' is common in Czech; 'gb' and 'kp' are common in
Igbo.

In general, many NLP applications benefit from N-gram models including part-of-speech tagging, natural
language generation, word similarity, sentiment extraction and predictive text input.

Code:

import re
from nltk.util import ngrams
s = "Machine learning is an important part of AI " "and AI is going to become inmporant for daily functionong "
tokens = [token for token in s.split(" ")]
output = list(ngrams(tokens, 2))
print(output)

Output:

[('Machine', 'learning'), ('learning', 'is'), ('is', 'an'), ('an', 'important'), ('important', 'part'), ('part', 'of'), ('of', 'AI'), ('AI',
'and'), ('and', 'AI'), ('AI', 'is'), ('is', 'going'), ('going', 'to'), ('to', 'become'), ('become', 'inmporant'), ('inmporant', 'for'), ('for',
'daily'), ('daily', 'functionong'), ('functionong', '')]

Result:

Thus, in the above experiment we have studied regarding N-Gram Model in detail with the help of theory and
then tried to implement the code and successfully executed it.

11
EX.NO:4
POS TAGGING

Aim: To study POS tagging

Theory:

It is a process of converting a sentence to forms – list of words, list of tuples (where each tuple is having a
form (word, tag)). The tag in case of is a part-of-speech tag, and signifies whether the word is a noun,
adjective, verb, and so on.

Default tagging is a basic step for the part-of-speech tagging. It is performed using the DefaultTagger class.
The DefaultTagger class takes ‘tag’ as a single argument. NN is the tag for a singular noun. DefaultTagger is
most useful when it gets to work with most common part-of-speech tag. that’s why a noun tag is
recommended.

12
Tagging is a kind of classification that may be defined as the automatic assignment of description to the
tokens. Here the descriptor is called tag, which may represent one of the part-of-speech, semantic information
and so on.

Now, if we talk about Part-of-Speech (PoS) tagging, then it may be defined as the process of assigning one of
the parts of speech to the given word. It is generally called POS tagging. In simple words, we can say that
POS tagging is a task of labelling each word in a sentence with its appropriate part of speech. We already
know that parts of speech include nouns, verb, adverbs, adjectives, pronouns, conjunction and their sub-
categories.

Most of the POS tagging falls under Rule Base POS tagging, Stochastic POS tagging and Transformation
based tagging.

Rule-based POS Tagging

One of the oldest techniques of tagging is rule-based POS tagging. Rule-based taggers use dictionary or
lexicon for getting possible tags for tagging each word. If the word has more than one possible tag, then rule-
based taggers use hand-written rules to identify the correct tag. Disambiguation can also be performed in rule-
based tagging by analyzing the linguistic features of a word along with its preceding as well as following
words. For example, suppose if the preceding word of a word is article, then word must be a noun.

Stochastic POS Tagging

Another technique of tagging is Stochastic POS Tagging. Now, the question that arises here is which model
can be stochastic. The model that includes frequency or probability (statistics) can be called stochastic. Any
number of different approaches to the problem of part-of-speech tagging can be referred to as stochastic
tagger.

The simplest stochastic tagger applies the following approaches for POS tagging –

Word Frequency Approach

In this approach, the stochastic taggers disambiguate the words based on the probability that a word occurs
with a particular tag. We can also say that the tag encountered most frequently with the word in the training
set is the one assigned to an ambiguous instance of that word. The main issue with this approach is that it may
yield inadmissible sequence of tags.

Tag Sequence Probabilities

It is another approach of stochastic tagging, where the tagger calculates the probability of a given sequence of
tags occurring. It is also called n-gram approach. It is called so

13
because the best tag for a given word is determined by the probability at which it occurs with the n previous
tags.

Transformation-based Tagging

Transformation based tagging is also called Brill tagging. It is an instance of the transformation-based
learning (TBL), which is a rule-based algorithm for automatic tagging of POS to the given text. TBL, allows
us to have linguistic knowledge in a readable form, transforms one state to another state by using
transformation rules.

It draws the inspiration from both the previous explained taggers − rule-based and stochastic. If we see
similarity between rule-based and transformation tagger, then like rule-based, it is also based on the rules that
specify what tags need to be assigned to what words. On the other hand, if we see similarity between
stochastic and transformation tagger then like stochastic, it is machine learning technique in which rules are
automatically induced from data.

HMM for POS Tagging

The POS tagging process is the process of finding the sequence of tags which is most likely to have generated
a given word sequence. We can model this POS process by using a Hidden Markov Model (HMM),
where tags are the hidden states that produced the observable output, i.e., the words.

Code:

import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
text = nltk.word_tokenize("And now for Everything completely Same")
nltk.pos_tag(text)

Output:

[('And', 'CC'),
('now', 'RB'),
('for', 'IN'),
('Everything', 'VBG'),
('completely', 'RB'),
('Same', 'JJ')]

Result:
Thus, we have studied POS Tagging in the above experiment also learned regarding different types of POS
Tagging and tried to implement the code for POS Tagging and successfully
executed it.

14
EX.NO:5 CHUNKING

Aim: To study Chunking

Theory:

Chunk extraction or partial parsing is a process of meaningful extracting short phrases from the sentence
(tagged with Part-of-Speech).Chunks are made up of words and the kinds of words are defined using the part-
of-speech tags. One can even define a pattern or words that can’t be a part of chuck and such words are known
as chinks. A Chunk Rule class specifies what words or patterns to include and exclude in a chunk.

Defining Chunk patterns:

Chuck patterns are normal regular expressions which are modified and designed to match the part-of-speech
tag designed to match sequences of part-of-speech tags. Angle brackets are used to specify an indiviual tag for
example – to match a noun tag. One can define multiple tags in the same way.

Chunking is a process of extracting phrases from unstructured text. Instead of just simple tokens which may
not represent the actual meaning of the text, its advisable to use phrases such as “South Africa” as a single
word instead of ‘South’ and ‘Africa’ separate words.

Chunking in NLP is Changing a perception by moving a “chunk”, or a group of bits of information, in the
direction of a Deductive or Inductive conclusion through the use of language.
Chunking up or down allows the speaker to use certain language patterns, to utilize the natural internal
process through language, to reach for higher meanings or search for more specific bits/portions of missing
information.

When we “Chunk Up” the language gets more abstract and there are more chances for agreement, and when
we “Chunk Down” we tend to be looking for the specific details that may have been missing in the chunk up.

As an example, if you ask the question “for what purpose cars?” you may get the answer “transport”, which is
a higher chunk and more toward abstract.

If you asked “what specifically about a car”? you will start to get smaller pieces of information about a car.

Lateral thinking will be the process of chunking up and then looking for other examples: For example, “for
what intentions cars?”, “transportation”, “what are other examples of transportation?” “Buses!”

Code:

Noun Phrase chunking:

Import nltk

15
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"), ("barked", "VBD"), ("at",
"IN"), ("the", "DT"), ("cat", "NN")]
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)
>>> result.draw()

Output:
(S
(NP the/DT little/JJ yellow/JJ dog/NN)
barked/VBD
at/IN
(NP the/DT cat/NN))

Result:

Thus, in the above experiment we have studies regarding chunking and tried to implement the code for same
and successfully executed it.

16
EX.NO:6 NAMED ENTITY RECOGNITION

Aim: To study Named Entity Recognition

Theory:

Named Entity Recognition (NER) is a standard NLP problem which involves spotting named entities (people,
places, organizations etc.) from a chunk of text, and classifying them into a predefined set of categories. Some
of the practical applications of NER include:
 Scanning news articles for the people, organizations and locations reported.
 Providing concise features for search optimization: instead of searching the entire content, one may
simply search for the major entities involved.
 Quickly retrieving geographical locations talked about in Twitter posts.

In any text document, there are particular terms that represent specific entities that are more informative and
have a unique context. These entities are known as named entities, which more specifically refer to terms that
represent real-world objects like people, places, organizations, and so on, which are often denoted by proper
names. A naive approach could be to find these by looking at the noun phrases in text documents. Named
entity recognition (NER), also known as entity chunking/extraction, is a popular technique used in
information extraction to identify and segment the named entities and classify or categorize them under
various predefined classes.

How NER works

At the heart of any NER model is a two step process:
Detect a named entity

17
Categorize the entity
Beneath this lie a couple of things.
Step one involves detecting a word or string of words that form an entity. Each word represents a token: “The
Great Lakes” is a string of three tokens that represents one entity. Inside-outside-beginning tagging is a
common way of indicating where entities begin and end. We’ll explore this further in a future blog post.
The second step requires the creation of entity categories.

How is NER used?

NER is suited to any situation in which a high-level overview of a large quantity of text is helpful. With NER,
you can, at a glance, understand the subject or theme of a body of text and quickly group texts based on their
relevancy or similarity.

Some notable NER use cases include:

Human resources

Speed up the hiring process by summarizing applicants’ CVs; improve internal workflows by categorizing
employee complaints and questions
Customer support

Improve response times by categorizing user requests, complaints and questions and filtering by priority
keywords

Code:

Named Entity Recognition

locs = [('Omnicom', 'IN', 'New York'),
('DDB Needham', 'IN', 'New York'),
('Kaplan Thaler Group', 'IN', 'New York'),
('BBDO South', 'IN', 'Atlanta'),
('Georgia-Pacific', 'IN', 'Atlanta')]
query = [e1 for (e1, rel, e2) in locs if e2=='Atlanta']
print(query)

Output:

['BBDO South', 'Georgia-Pacific']

18
Result:

Thus, in the above experiment we have studied regarding named entity recognition, working of named entity
recognition, how named entity recognition can be used and then implemented the code for the same and
successfully executed it.

19
EX.NO:7
VIRTUAL LAB ON WORD GENERATION

Aim: Virtual Lab on Word Generation

Theory: Given the root and suffix information, a word can be generated. For example,

Language input:analysis

Hindi rt=लड़का(ladakaa), cat=n, gen=m, num=sg, case=obl

Hindi rt=लड़का(ladakaa), cat=n, gen=m, num=pl, case=dir

English rt=boy, cat=n, num=pl

English rt=play, cat=v, num=sg, per=3, tense=pr

Morphological analysis and generation: Inverse processes.

Analysis may involve non-determinism, since more than one analysis is possible.
Generation is a deterministic process. In case a language allows spelling variation, then till that extent, generation would
also involve non-determinism.

Input:

Output:

Result: Thus, in the above experiment we have studied regarding Word Generation .

20
Ex.No:8 MINI PROJECT ON WORD PREDICTION

Aim: Miniproject based on NLP applications

Theory:These tools can be very helpful for kids who struggle with writing.
To use word prediction, your child needs to use a keyboard to write. This can be an onscreen keyboard on a smartphone
or digital tablet. Or it can be a physical keyboard connected to a device or computer.
Those suggestions are shown on the screen, like at the top of an onscreen keyboard. The child clicks or taps on a
suggested word, and it’s inserted into the writing.
There are also advanced word prediction tools available. They include:
Tools that read word choices aloud with text-to-speech. This is important for kids with reading issues who can’t read
what the suggestions are.
Word prediction tools that make suggestions tailored to specific topics. For instance, the words used in a history paper
will differ a lot from those in a science report. To make suggestions more accurate, kids can pick special dictionaries for
what they’re writing about.
Tools that display word suggestions in example sentences. This can help kids decide between words that are confusing,
like to, too and two.

Code:
import bs4 as bs
import urllib.request
import re
import nltk
In [38]:
scrapped_data = urllib.request.urlopen('https://en.wikipedia.org/wiki/Artificial_intelligence’)
article = scrapped_data .read()

parsed_article = bs.BeautifulSoup(article,'lxml')
paragraphs = parsed_article.find_all('p')
article_text = ""
for p in paragraphs:
article_text += p.text
Remove Stop Words
try:
import string
from nltk.corpus import stopwords
import nltk
except Exception as e:

21
print(e)
class PreProcessText(object):
def __init__(self):
pass
def __remove_punctuation(self, text):
"""
Takes a String
return : Return a String
"""
message = []
for x in text:
if x in string.punctuation:
pass
else:
message.append(x)
message = ''.join(message)
return message
def __remove_stopwords(self, text):
"""
Takes a String
return List
"""
words= []
for x in text.split():
if x.lower() in stopwords.words('english'):
pass
else:
words.append(x)
return words
def token_words(self,text=''):
"""
Takes String

22
Return Token also called list of words that is used to
Train the Model
"""
message = self.__remove_punctuation(text)
words = self.__remove_stopwords(message)
return words
import nltk
flag = nltk.download("stopwords")
if (flag == "False" or flag == False):
print("Failed to Download Stop Words")
else:
print("Downloaded Stop words ...... ")
helper = PreProcessText()
#words = helper.token_words(text=txt)
words = helper.token_words(text=article_text)
from gensim.models import Word2Vec
#model = Word2Vec([words], min_count=1)
model = Word2Vec([words], size=100, window=5, min_count=1, workers=4)
vocabulary = model.wv.vocab
sim_words = model.wv.most_similar('machine')
sim_words

Output: [('supervised', 0.28355103731155396),

('favor18', 0.30931341648101807),
('provide', 0.3048184812068939),
('would', 0.296578615903812237),
('species', 0.2742482125759125),
('collectively', 0.27363914251327515),
('transformative', 0.272122971861839294),
('advanced', 0.27198636531829834),
('n', 0.3024228811264038),]

Result: Thus we completed mini-project on similar word prediction successfully.

NLP Lab Manual Lab Work
No ratings yet
NLP Lab Manual Lab Work
24 pages
Web and Social Media Analytics Lab
No ratings yet
Web and Social Media Analytics Lab
34 pages
Basic English Grammar From Espresso English
100% (2)
Basic English Grammar From Espresso English
209 pages
NLP Practical
No ratings yet
NLP Practical
27 pages
NLP Sem Answers (All)
No ratings yet
NLP Sem Answers (All)
124 pages
Natural Language Processing Revision Notes
No ratings yet
Natural Language Processing Revision Notes
4 pages
NLP Manual (1-12) 1
No ratings yet
NLP Manual (1-12) 1
56 pages
List of Verbs, Nouns, Adjectives and Adverbs
100% (1)
List of Verbs, Nouns, Adjectives and Adverbs
5 pages
Contrastive Analysis - Chapter 1 - p3
0% (1)
Contrastive Analysis - Chapter 1 - p3
71 pages
Blueprint - British English Student Book 1 TG (En)
No ratings yet
Blueprint - British English Student Book 1 TG (En)
120 pages
Subject and Predicate
No ratings yet
Subject and Predicate
30 pages
CEFR B1 Learning Outcomes
100% (1)
CEFR B1 Learning Outcomes
13 pages
NLP Lab Manual
No ratings yet
NLP Lab Manual
16 pages
Villupuram: Name: T. Vijayalakshmi Course: B.ED., Batch: 2009 - 2010 Major: English Topic: "Sentence Pattern"
0% (1)
Villupuram: Name: T. Vijayalakshmi Course: B.ED., Batch: 2009 - 2010 Major: English Topic: "Sentence Pattern"
25 pages
NLP-Lab Manual - Ashwini - Kachare
No ratings yet
NLP-Lab Manual - Ashwini - Kachare
41 pages
Ai TXT Unit2
No ratings yet
Ai TXT Unit2
14 pages
Soal Pretest-A1
No ratings yet
Soal Pretest-A1
12 pages
Kontrak Bahasa Inggeris Form 3 2015
No ratings yet
Kontrak Bahasa Inggeris Form 3 2015
6 pages
Compounds
100% (1)
Compounds
11 pages
The Branches of Trees Are Dancing With The Wind
No ratings yet
The Branches of Trees Are Dancing With The Wind
4 pages
NLP Experiment 1
No ratings yet
NLP Experiment 1
13 pages
Writing About A Bar Chart Exercise
No ratings yet
Writing About A Bar Chart Exercise
4 pages
Lab 2
No ratings yet
Lab 2
49 pages
Statistical NLP
No ratings yet
Statistical NLP
45 pages
Parts of Speech and Past and Present
No ratings yet
Parts of Speech and Past and Present
8 pages
An Easy Guide To Writing
No ratings yet
An Easy Guide To Writing
43 pages
NLP Lecture2 Text Pre Processing
No ratings yet
NLP Lecture2 Text Pre Processing
54 pages
PDF NLP
No ratings yet
PDF NLP
7 pages
Connectors PPT by TsafiraS
No ratings yet
Connectors PPT by TsafiraS
10 pages
Wsma Final Manual
No ratings yet
Wsma Final Manual
58 pages
NLP Manual (1-12)
No ratings yet
NLP Manual (1-12)
54 pages
2-Text Operations - New
No ratings yet
2-Text Operations - New
39 pages
Lab - Manual - IR - BE AI&DS CL II
No ratings yet
Lab - Manual - IR - BE AI&DS CL II
38 pages
18 Text Mining - Text Preprocessing
No ratings yet
18 Text Mining - Text Preprocessing
40 pages
Unit 6 - AI (NLP)
No ratings yet
Unit 6 - AI (NLP)
37 pages
Natural Language Processing: Practical 1
No ratings yet
Natural Language Processing: Practical 1
64 pages
Ir Manual
No ratings yet
Ir Manual
53 pages
2 - Text Operation - 1
No ratings yet
2 - Text Operation - 1
28 pages
NLP Lab Manual
No ratings yet
NLP Lab Manual
29 pages
Speaking Test Information: General Description
No ratings yet
Speaking Test Information: General Description
3 pages
Chapter 7.1 - Introducing Natural Language Processing
No ratings yet
Chapter 7.1 - Introducing Natural Language Processing
39 pages
2 Text Operations
No ratings yet
2 Text Operations
32 pages
NLP Manual (1-12)
No ratings yet
NLP Manual (1-12)
55 pages
NLP Lab - Manual
No ratings yet
NLP Lab - Manual
33 pages
Unit 1b
No ratings yet
Unit 1b
24 pages
C10 - Ai - Unit 3 - NLP - Half Yearly
No ratings yet
C10 - Ai - Unit 3 - NLP - Half Yearly
37 pages
NLP Lab Manual
No ratings yet
NLP Lab Manual
17 pages
ANLP semVI Labmanual
No ratings yet
ANLP semVI Labmanual
33 pages
Unit 2
No ratings yet
Unit 2
20 pages
Experiment: 1
No ratings yet
Experiment: 1
28 pages
AP For NLP-Word 2 Vec
No ratings yet
AP For NLP-Word 2 Vec
33 pages
TSP Unit1 Own
No ratings yet
TSP Unit1 Own
20 pages
Alternatives To If
No ratings yet
Alternatives To If
1 page
Text Analytics Basics
No ratings yet
Text Analytics Basics
28 pages
Natural Language Processing
No ratings yet
Natural Language Processing
25 pages
Text Analytics and Natural Language Processing - KAI073
No ratings yet
Text Analytics and Natural Language Processing - KAI073
24 pages
Text Preprocessing For NLP
No ratings yet
Text Preprocessing For NLP
15 pages
NLP Manual
No ratings yet
NLP Manual
15 pages
frequency-adverbs- (для англ.заняття)
No ratings yet
frequency-adverbs- (для англ.заняття)
2 pages
Give Instructions in German
No ratings yet
Give Instructions in German
5 pages
572-Article Text-2084-1-10-20191125
No ratings yet
572-Article Text-2084-1-10-20191125
22 pages
Extracting, Cleaning and Pre-Processing Text
No ratings yet
Extracting, Cleaning and Pre-Processing Text
12 pages
Grapheme:: Morpheme
No ratings yet
Grapheme:: Morpheme
20 pages
Week 6: Introduction To Natural Language Processing
No ratings yet
Week 6: Introduction To Natural Language Processing
18 pages
VO - MCA - SEM 4 - Text Mining - U2
No ratings yet
VO - MCA - SEM 4 - Text Mining - U2
15 pages
NLP Notes
No ratings yet
NLP Notes
12 pages
TSP Unit1 Own
No ratings yet
TSP Unit1 Own
13 pages
Three Things
No ratings yet
Three Things
2 pages
E-Book Grammar by Anas
No ratings yet
E-Book Grammar by Anas
17 pages
DLP Q3W6 Day 1
No ratings yet
DLP Q3W6 Day 1
10 pages
Intro To NLP: Natural Language Toolkit
No ratings yet
Intro To NLP: Natural Language Toolkit
11 pages
Natural Language Processing
No ratings yet
Natural Language Processing
10 pages
AIUnit 6 10
No ratings yet
AIUnit 6 10
8 pages
Ass7 Write Up .Final
No ratings yet
Ass7 Write Up .Final
11 pages
Ai NLP
No ratings yet
Ai NLP
9 pages
NLP CT1
No ratings yet
NLP CT1
6 pages
WORKSHEET 2 IN ENGLISH 5-1st Quarter Clarify Meaning of Words Using Dictionary and Thesaurus
No ratings yet
WORKSHEET 2 IN ENGLISH 5-1st Quarter Clarify Meaning of Words Using Dictionary and Thesaurus
4 pages
Natural Language Processing
No ratings yet
Natural Language Processing
6 pages
Sentiment Analysis Using Supervised Machine Learning Ijariie13051
No ratings yet
Sentiment Analysis Using Supervised Machine Learning Ijariie13051
7 pages
NLP Ai X
No ratings yet
NLP Ai X
6 pages
Understanding Each Pre-Processing Aspect
No ratings yet
Understanding Each Pre-Processing Aspect
5 pages
Vocabulary (Acedemic WDS) - Exercise 1
No ratings yet
Vocabulary (Acedemic WDS) - Exercise 1
4 pages
Noun Clauses
No ratings yet
Noun Clauses
10 pages
NLP - CH-6
No ratings yet
NLP - CH-6
4 pages
Patriotism
No ratings yet
Patriotism
3 pages
Ability Bluff
No ratings yet
Ability Bluff
3 pages
Tobe
No ratings yet
Tobe
2 pages
Ing - 1C
No ratings yet
Ing - 1C
1 page

NLB Final Lab Manual

Uploaded by

NLB Final Lab Manual

Uploaded by

MAHENDRA INSTITUTE OF TECHNOLOGY (AUTONOMOUS)

Mahendhirapuri, Mallasamudram,Namakkal- 637 503

DEPARTMENT OF B.E CSE (ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING)

Academic Year: 2024-2025 (Even Semester)

1 Preprocessing of text (Tokenization, Filtration, Script

6 Named Entity Recognition

7 Virtual Lab on Word Generator

8 Mini Project based on NLP Application

File handling (tokenization and filtering):

for line in open("file.txt"):

Aim: To Study Morphological Analysis

Stop Word Removal:

Aim: To study N-gram model

P('There was heavy rain') = P('There', 'was', 'heavy', 'rain') = P('There')P('was'|'There')P('heavy'|'There

P('There was heavy rain') ~ P('There')P('was'|'There')P('heavy'|'was')P('rain'|'heavy')

Aim: To study POS tagging

Rule-based POS Tagging

Stochastic POS Tagging

Word Frequency Approach

Tag Sequence Probabilities

HMM for POS Tagging

Aim: To study Chunking

Defining Chunk patterns:

Noun Phrase chunking:

Aim: To study Named Entity Recognition

How NER works

How is NER used?

Some notable NER use cases include:

Named Entity Recognition

['BBDO South', 'Georgia-Pacific']

Aim: Virtual Lab on Word Generation

Hindi rt=लड़का(ladakaa), cat=n, gen=m, num=sg, case=obl

Hindi rt=लड़का(ladakaa), cat=n, gen=m, num=pl, case=dir

English rt=boy, cat=n, num=pl

English rt=play, cat=v, num=sg, per=3, tense=pr

Morphological analysis and generation: Inverse processes.

Aim: Miniproject based on NLP applications

Output: [('supervised', 0.28355103731155396),

Result: Thus we completed mini-project on similar word prediction successfully.

You might also like