
1a. Differentiate between NLP and NLU.

NLP, or natural language processing, evolved from computational linguistics, which aims to model
natural human language data. NLP processes large amounts of human language data and focuses on
the use of machine learning and deep learning techniques. It is commonly used in computer science,
information systems, linguistics, communications, and philosophy.

NLP has many subfields, including computational linguistics, syntax analysis, speech recognition,
machine translation, and more.

Natural language processing works by taking unstructured text and converting it into a structured
format. It works by building an algorithm and training a model on large amounts of data so that it can
work out what the user means when they say something.

It works by identifying entities in text (named entity recognition) and by identifying word patterns.
The word patterns are identified using methods such as tokenization, stemming, and lemmatization.

NLP undertakes various tasks such as parsing, speech recognition, part-of-speech tagging, and
information extraction.

In the real world, NLP is used for text summarization, sentiment analysis, topic extraction, named
entity recognition, parts-of-speech tagging, relationship extraction, stemming, text mining, machine
translation, and automated question answering, as well as ontology population, language modelling,
and any other language-related task.

NLU is a subset of natural language processing that uses the semantic analysis of text to
understand the meaning of sentences. It's possible that the same text can have many meanings,
that different words can have the same meaning, or that the meaning can change depending on the
situation.

NLU algorithms process text from different sources using computational methods to reach some
understanding of an input text, which can be as simple as understanding what a single sentence says
or as complex as understanding a dialogue between two people. So, NLU uses computational methods
to understand text and produce a result.

NLU can be used in many different ways, including understanding dialogue between two people,
understanding how someone feels about a particular situation, and other similar scenarios.

There are three linguistic levels at which NLU operates:

 Syntax: This is the process of understanding how sentences are constructed and whether the
grammar is used correctly. For example, to understand whether a sentence makes sense, it must
be considered in context and its syntax analyzed.

 Semantics: This looks at the text for contextual cues to meaning, such as tone of voice or word
choice between two people. An NLU algorithm can use these cues to produce results across all the
possible contexts in which the same piece of spoken or written language might occur.

 Pragmatic analysis: This helps understand the context and what the text is trying to achieve.

A closely related task is word sense disambiguation: the process of determining the meaning of a
word in a sentence by assigning it a sense based on its context, as in the sketch below.
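A minimal illustration of word sense disambiguation using NLTK's built-in Lesk algorithm
(nltk.wsd.lesk); the example sentence is hypothetical:

import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

# nltk.download('wordnet')  # required once for the WordNet senses
# nltk.download('punkt')    # required once for word_tokenize

sentence = word_tokenize("I went to the bank to deposit my money")

# lesk() returns the WordNet synset whose definition best
# overlaps with the words of the surrounding context
print(lesk(sentence, 'bank'))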

The major difference between NLU and NLP is that NLP focuses on building algorithms to recognize
and process natural language, while NLU focuses on understanding the meaning of a sentence.
Another difference is that NLP breaks language down and processes it, while NLU provides language
comprehension.

Both NLU and NLP use supervised learning, which means that they train their models using labelled
data. However, they differ in how this training is done.

Another difference between NLU and NLP is that NLU is focused more on sentiment analysis.
Sentiment analysis involves extracting information from the text in order to determine the emotional
tone of a text.

Natural language processing and natural language understanding are not just about training on a
dataset. The computer uses NLP algorithms to detect patterns in large amounts of unstructured data.

NLU recognizes that understanding language is a complex task made up of many components, such as
emotions and facial expressions. Furthermore, NLU enables computer programmes to deduce
intent from language, even if the written or spoken language is flawed.

1b. Write regular expression for validation of email

A regular expression is a sequence of characters that defines a search pattern. Regular expressions
are used to match character combinations in strings. A very common real-world example is when
websites verify whether the email address you entered is valid or not.

Any email address is a combination of three parts: the username, the domain, and the TLD (top-level domain).

Username

We need to match all the letters, numbers, and dots before the @ sign. The following regexes
will get us what we want:

A single letter or number — [A-Za-z0-9]

Multiple letters and numbers — [A-Za-z0-9]*

Anything before an @ sign — ()@

Combining these 3, we get ([A-Za-z0-9\.]*)@

Domain

We need to match all the letters and numbers after the @ and before the first dot. The following
regexes will get us what we want:

Anything after the @ — @()

Anything before a dot — ()\.

Multiple letters and numbers — [A-Za-z0-9]*

Combining these 3, we get @([A-Za-z0-9]*)\.

TLD

We need to match all the letters after the dot that follows the domain name. The following
regexes will get us what we want:

Anything after the domain name — regex-for-domain-name() = @[A-Za-z0-9]*\.

Multiple letters (and dots, for multi-part TLDs such as co.in) — [A-Za-z\.]*

Combining these 2, we get @[A-Za-z0-9]*\.([A-Za-z\.]*)
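Putting the three parts together, a minimal validation sketch in Python. This is an illustrative
pattern rather than a fully RFC-compliant validator; + is used instead of * so that each part must
be non-empty, and the sample addresses are hypothetical:

import re

# username@domain.tld, anchored so the whole string must match
email_pattern = r'^([A-Za-z0-9\.]+)@([A-Za-z0-9]+)\.([A-Za-z\.]+)$'

for address in ['user.name@example.com', 'not-an-email@@example.com']:
    match = re.match(email_pattern, address)
    print(address, '->', 'valid' if match else 'invalid')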

1c. What is N-gram tagging?

N-grams of texts are extensively used in text mining and natural language processing tasks.

NLTK's NgramTagger has three subclasses:

 UnigramTagger

 BigramTagger

 TrigramTagger

The BigramTagger subclass uses the previous tag as part of its context, while the TrigramTagger
subclass uses the previous two tags as part of its context. An n-gram is a subsequence of n items.

Idea behind the NgramTagger subclasses:

By looking at the previous words and their part-of-speech tags, the part-of-speech tag for the current
word can be guessed. Each tagger maintains a context dictionary (implemented in the ContextTagger
parent class), and this dictionary is used to guess the tag for the current word based on the context.

The context is some number of previous tagged words in the case of NgramTagger subclasses.

Code #1 : Working of Bigram tagger

# Loading Libraries
from nltk.tag import DefaultTagger
from nltk.tag import BigramTagger
from nltk.corpus import treebank

# initializing training and testing sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Tagging: train a bigram tagger on the tagged sentences
tag1 = BigramTagger(train_data)

# Evaluation: accuracy on the held-out test set
tag1.evaluate(test_data)

Output:

0.11318799913662854

Code #2 : Working of Trigram tagger

# Loading Libraries
from nltk.tag import DefaultTagger
from nltk.tag import TrigramTagger
from nltk.corpus import treebank

# initializing training and testing sets
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Tagging: train a trigram tagger on the tagged sentences
tag1 = TrigramTagger(train_data)

# Evaluation: accuracy on the held-out test set
tag1.evaluate(test_data)

Output:

0.06876753723289446
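The accuracies above are low because a bigram or trigram tagger used alone can only tag a word
when it has seen that exact context during training. A common remedy, sketched below under the
assumption of the same treebank split, is to chain the taggers with the backoff parameter so that
unseen contexts fall back to a simpler tagger:

from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger
from nltk.corpus import treebank

train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# Chain of taggers: trigram -> bigram -> unigram -> default 'NN'
backoff = DefaultTagger('NN')
backoff = UnigramTagger(train_data, backoff=backoff)
backoff = BigramTagger(train_data, backoff=backoff)
tagger = TrigramTagger(train_data, backoff=backoff)

# accuracy should now be far higher than either n-gram tagger alone
print(tagger.evaluate(test_data))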

1d. Define precision & recall, and explain with an example.

There is a huge farm filled with apple and orange trees. The owner of the farm wants to build a
classifier that will rightly predict apples and oranges so that he can categorize and sell them. The
owner builds a classifier and sends it a random sample of 13 fruits to classify.

He made a chart as below to check how well the model had performed:

                  Predicted apple   Predicted orange
Actual apple            5                  3
Actual orange           2                  3

True positives: These are the apples that the model rightly predicted as apples.

False positives: These are the oranges that the model wrongly predicted as apples.

False negatives: These are the apples that the model wrongly predicted as oranges.

True negatives: These are the oranges that the model rightly predicted as oranges.

From the chart we can draw the below inferences:

• Model classified 2 oranges as apples

• Model classified 3 apples as oranges

• Model classified 5 apples rightly

• Model classified 3 oranges rightly

The chart above also gives us a different insight into the model’s predictions:

• Out of 8 values that it classified as apples, only 5 are real apples, 3 are oranges.

• Out of 5 values that it classified as oranges, only 2 are real oranges, 3 are apples.

Now, let’s dive into precision and recall, using the chart above.

Precision:

It is the fraction of the model’s positive predictions that are actually right. In simpler words, it is:

Number of apples predicted correctly by the model / Total number of fruits the model predicted as
apples

It does not consider the false negatives, i.e., the apples that the model missed.

The formula for precision:

# of true positives/ (# of true positives + # of false positives)

Precision for apple predictor: 5/(5+2) = 5/7 = 0.714

Recall:

It is the fraction of the actual positive values that the model predicted correctly. In simpler
words, it is:

Number of apples predicted correctly by the model / Total number of apples

The total number of apples is the number of apples sent to the system, i.e., 8.

Unlike precision, it does consider the apples that the model missed (the false negatives). The
formula for recall:

# of true positives/(# of false negatives + # of true positives)

For the above example, it is: 5/(5+3) = 5/8 = 0.625

So, we know that the model created by the owner of the farm has higher precision than recall!
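A minimal sketch that reproduces these numbers in Python, using the counts from the chart above:

# Counts from the apple/orange example
tp = 5  # apples rightly predicted as apples
fp = 2  # oranges wrongly predicted as apples
fn = 3  # apples wrongly predicted as oranges

precision = tp / (tp + fp)  # 5/7
recall = tp / (tp + fn)     # 5/8

print(f"Precision: {precision:.3f}")  # 0.714
print(f"Recall: {recall:.3f}")        # 0.625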

2a. Write a Python program to count vowels and consonants in a string.

# Python Program to Count Vowels and Consonants in a String
str1 = input("Please Enter Your Own String : ")

vowels = 0
consonants = 0

for i in str1:
    if (i == 'a' or i == 'e' or i == 'i' or i == 'o' or i == 'u'
            or i == 'A' or i == 'E' or i == 'I' or i == 'O' or i == 'U'):
        vowels = vowels + 1
    elif i.isalpha():
        # count only letters as consonants, skipping
        # spaces, digits, and punctuation
        consonants = consonants + 1

print("Total Number of Vowels in this String = ", vowels)
print("Total Number of Consonants in this String = ", consonants)

2b. Write in detail about regular expressions for detecting word patterns.

Many linguistic processing tasks involve pattern matching. For example, we can find words ending
with ed using endswith('ed'). Regular expressions give us a more powerful and flexible method for
describing the character patterns we are interested in.

To use regular expressions in Python, we need to import the re library using: import re. Let’s find
words ending with ed using the regular expression «ed$». We will use the re.search(p, s) function
to check whether the pattern p can be found somewhere inside the string s. We need to specify the
characters of interest, and use the dollar sign, which has a special behavior in the context of regular
expressions in that it matches the end of the word:
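These examples assume a wordlist of English words; a minimal setup, as in the NLTK book, uses the
lowercase entries of the NLTK Words Corpus:

>>> import re
>>> import nltk
>>> wordlist = [w for w in nltk.corpus.words.words('en') if w.islower()]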

>>> [w for w in wordlist if re.search('ed$', w)]

['abaissed', 'abandoned', 'abased', 'abashed', 'abatised', 'abed', 'aborted', ...]

The . wildcard symbol matches any single character. Suppose we have room in a crossword puzzle
for an eight-letter word, with j as its third letter and t as its sixth letter. In place of each blank cell we
use a period:

>>> [w for w in wordlist if re.search('^..j..t..$', w)]

['abjectly', 'adjuster', 'dejected', 'dejectly', 'injector', 'majestic', ...

Finally, the ? symbol specifies that the previous character is optional. Thus «^e-?mail$» will match
both email and e-mail. We could count the total number of occurrences of this word (in either spelling)
in a text using sum(1 for w in text if re.search('^e-?mail$', w)).
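A small illustration of the optional ?, assuming a hypothetical token list named text:

>>> text = ['email', 'e-mail', 'mail', 'Email']
>>> sum(1 for w in text if re.search('^e-?mail$', w))
2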

3a. Write the implementation of stemming & lemmatization.

Stemming

Stemming generates the base word from the inflected word by removing the affixes of the word. It has
a set of pre-defined rules that govern the dropping of these affixes. It must be noted that stemmers
might not always result in semantically meaningful base words. Stemmers are faster and
computationally less expensive than lemmatizers.

from nltk.stem import PorterStemmer

# create an object of class PorterStemmer
porter = PorterStemmer()

print(porter.stem("Communication"))

Output:

commun

The stemmer reduces the word ‘communication’ to the base word ‘commun’, which is meaningless in
itself.

Lemmatization

Lemmatization involves grouping together the inflected forms of the same word. This way, we can
arrive at the base form of any word, and that base form will be meaningful in nature. The base form
here is called the lemma.

Lemmatizers are slower and computationally more expensive than stemmers.

9|Page
Example:

'play', 'plays', 'played', and 'playing' have 'play' as the lemma.

In Python, lemmatization can likewise be implemented in NLTK as follows:

from nltk.stem import WordNetLemmatizer

# create an object of class WordNetLemmatizer
# (the WordNet corpus is required: nltk.download('wordnet'))
lemmatizer = WordNetLemmatizer()

# 'v' tells the lemmatizer to treat each word as a verb
print(lemmatizer.lemmatize("plays", 'v'))
print(lemmatizer.lemmatize("played", 'v'))
print(lemmatizer.lemmatize("play", 'v'))
print(lemmatizer.lemmatize("playing", 'v'))

Output:

play
play
play
play
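A minimal side-by-side sketch contrasting the stemmer and the lemmatizer on the same words; the
word list is illustrative:

from nltk.stem import PorterStemmer, WordNetLemmatizer

porter = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# the stemmer chops affixes by rule; the lemmatizer maps each
# word to a dictionary form (here, treating each word as a verb)
for word in ['plays', 'played', 'playing', 'communication']:
    print(word, '->', porter.stem(word), '|', lemmatizer.lemmatize(word, 'v'))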

3b. Give the implementation of POS tagging.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

# the stop-word list requires: nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

# Dummy text
txt = "Sukanya, Rajib and Naba are my good friends. " \
      "Sukanya is getting married next year. " \
      "Marriage is a big step in one’s life. " \
      "It is both exciting and frightening. " \
      "But friendship is a sacred bond between people. " \
      "It is a special kind of love between us. " \
      "Many of you must have tried searching for a friend " \
      "but never found the right one."

# sent_tokenize is one of the instances of
# PunktSentenceTokenizer from the nltk.tokenize.punkt module
tokenized = sent_tokenize(txt)

for i in tokenized:
    # word_tokenize is used to find the words
    # and punctuation in a string
    wordsList = nltk.word_tokenize(i)

    # removing stop words from wordsList
    wordsList = [w for w in wordsList if w not in stop_words]

    # using a tagger, which is a part-of-speech
    # tagger or POS-tagger
    tagged = nltk.pos_tag(wordsList)
    print(tagged)

Output:

[('Sukanya', 'NNP'), ('Rajib', 'NNP'), ('Naba', 'NNP'), ('good', 'JJ'), ('friends', 'NNS')]

[('Sukanya', 'NNP'), ('getting', 'VBG'), ('married', 'VBN'), ('next', 'JJ'), ('year', 'NN')]

[('Marriage', 'NN'), ('big', 'JJ'), ('step', 'NN'), ('one', 'CD'), ('’', 'NN'), ('life', 'NN')]

[('It', 'PRP'), ('exciting', 'VBG'), ('frightening', 'VBG')]

[('But', 'CC'), ('friendship', 'NN'), ('sacred', 'VBD'), ('bond', 'NN'), ('people', 'NNS')]

[('It', 'PRP'), ('special', 'JJ'), ('kind', 'NN'), ('love', 'VB'), ('us', 'PRP')]

[('Many', 'JJ'), ('must', 'MD'), ('tried', 'VB'), ('searching', 'VBG'), ('friend', 'NN'),

('never', 'RB'), ('found', 'VBD'), ('right', 'RB'), ('one', 'CD')]

4a. Give the framework for supervised classification.

Classification is the task of choosing the correct class label for a given input. In basic classification
tasks, each input is considered in isolation from all other inputs, and the set of labels is defined in
advance. Some examples of classification tasks are:

• Deciding whether an email is spam or not.

• Deciding what the topic of a news article is, from a fixed list of topic areas such as “sports,”
“technology,” and “politics.”

• Deciding whether a given occurrence of the word bank is used to refer to a river bank, a financial
institution, the act of tilting to the side, or the act of depositing something in a financial institution.

A classifier is called supervised if it is built based on training corpora containing the correct label for
each input. The framework used by supervised classification has two phases. During training, a
feature extractor converts each input value into a feature set, and these feature sets, paired with their
correct labels, are fed into a machine learning algorithm to generate a model. During prediction, the
same feature extractor converts unseen inputs into feature sets, which the model then uses to
generate predicted labels.
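A minimal sketch of this framework using NLTK's names corpus and a naive Bayes classifier, along
the lines of the NLTK book's gender-classification example; the one-feature extractor is a
deliberately simple assumption:

import random
import nltk
from nltk.corpus import names

# Feature extractor: map each input (a name) to a feature set
def gender_features(word):
    return {'last_letter': word[-1]}

# Build labelled data from the names corpus
labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
                 [(name, 'female') for name in names.words('female.txt')])
random.shuffle(labeled_names)

# Training phase: feature sets + labels -> model
featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]
train_set, test_set = featuresets[500:], featuresets[:500]
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Prediction phase: same feature extractor, then classify
print(classifier.classify(gender_features('Neo')))
print(nltk.classify.accuracy(classifier, test_set))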

4b. Explain the Naive Bayes algorithm with suitable numerical examples in the context of text
analysis.
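A minimal numerical sketch of Naive Bayes for text classification; the toy corpus, word counts, and
vocabulary size below are hypothetical, chosen only to keep the arithmetic simple:

# Hypothetical training corpus: 3 'sports' documents and 1 'politics'
# document, so P(sports) = 3/4 and P(politics) = 1/4.
#
# Suppose the word 'match' appears 2 times among 8 total sports tokens
# and 0 times among 6 total politics tokens. With Laplace (add-one)
# smoothing over a vocabulary of 10 words:
vocab_size = 10

p_sports = 3 / 4
p_politics = 1 / 4

# P(word | class) = (count(word, class) + 1) / (tokens(class) + vocab_size)
p_match_given_sports = (2 + 1) / (8 + vocab_size)    # 3/18 ≈ 0.167
p_match_given_politics = (0 + 1) / (6 + vocab_size)  # 1/16 ≈ 0.063

# Naive Bayes score for a one-word document 'match':
score_sports = p_sports * p_match_given_sports        # ≈ 0.125
score_politics = p_politics * p_match_given_politics  # ≈ 0.016

# The classifier picks the class with the higher score: 'sports'
print('sports' if score_sports > score_politics else 'politics')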
