Experiment 4

Department of Computer Science & Engineering (AI&ML)

BE SEM: VII    AY: 2024-25

Subject: Natural Language Processing Lab

Aim: Implementation of POS tagging using NLTK

Theory: Back in elementary school you learnt the difference between nouns, verbs, adjectives,
and adverbs. These "word classes" are not just the idle invention of grammarians, but are useful
categories for many language processing tasks. They arise from simple analysis of the
distribution of words in text. The goal of this experiment is to answer the following questions:

1. What are lexical categories and how are they used in natural language processing?

2. What is a good Python data structure for storing words and their categories?

3. How can we automatically tag each word of a text with its word class?
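
For the second question, one common choice is to store each tagged token as a (word, tag) tuple and to collect the tags seen for each word in a dictionary. The sketch below is only an illustration: the helper name build_lexicon and the hand-assigned Penn Treebank style tags are assumptions, not part of NLTK.

```python
from collections import defaultdict

# A tagged token is a (word, tag) tuple; a tagged sentence is a list of them.
# The tags below were assigned by hand for illustration.
tagged_sentence = [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")]


def build_lexicon(tagged_words):
    """Map each lowercased word to the set of tags it was seen with."""
    lexicon = defaultdict(set)
    for word, tag in tagged_words:
        lexicon[word.lower()].add(tag)
    return lexicon


sample = tagged_sentence + [("Dogs", "NNS"), ("bark", "VBP"), ("the", "DT"), ("bark", "NN")]
lexicon = build_lexicon(sample)

print(sorted(lexicon["bark"]))  # ['NN', 'VBP'] : an ambiguous word keeps both tags
print(sorted(lexicon["the"]))   # ['DT']
```

A list of tuples preserves word order, which a tagger needs, while the dictionary gives fast lookup of every category a word can take.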

Along the way, we'll cover some fundamental techniques in NLP, including sequence labeling,
n-gram models, backoff, and evaluation. These techniques are useful in many areas, and
tagging gives us a simple context in which to present them. We will also see how tagging is
the second step in the typical NLP pipeline, following tokenization.
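
To make n-gram models, backoff, and evaluation concrete, here is a minimal pure-Python sketch of a unigram (n = 1) tagger that backs off to a default tag for unseen words and is scored by token accuracy. The training sentences are toy data tagged by hand, and the function names are hypothetical. NLTK ships its own UnigramTagger and DefaultTagger classes for real use.

```python
from collections import Counter, defaultdict

# Toy hand-tagged training sentences (illustrative only, not a real corpus).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("run", "NN"), ("ended", "VBD")],
    [("dogs", "NNS"), ("run", "VBP")],
    [("they", "PRP"), ("run", "VBP")],
]


def train_unigram(tagged_sents):
    """Learn the most frequent tag for every word in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(words, model, default="NN"):
    """Unigram lookup, backing off to a default tag for unseen words."""
    return [(w, model.get(w, default)) for w in words]


def accuracy(model, gold_sents, default="NN"):
    """Evaluation: fraction of tokens whose predicted tag matches the gold tag."""
    total = correct = 0
    for sent in gold_sents:
        words = [w for w, _ in sent]
        for (_, predicted), (_, gold) in zip(tag(words, model, default), sent):
            total += 1
            correct += predicted == gold
    return correct / total


model = train_unigram(TRAIN)
print(tag(["the", "cat", "run"], model))
# [('the', 'DT'), ('cat', 'NN'), ('run', 'VBP')]  ("cat" is unseen, so it gets the default)
print(accuracy(model, TRAIN))  # 0.9 (the noun use of "run" is mistagged as VBP)
```

Scoring on the training data, as done here, overstates accuracy. A proper evaluation uses held-out sentences.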

The process of classifying words into their parts of speech and labeling them accordingly is
known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also
known as word classes or lexical categories. The collection of tags used for a particular task is
known as a tagset. Our emphasis in this experiment is on exploiting tags, and tagging text
automatically.

Department of Computer Science & Engineering-(AI&ML) | APSIT

A part-of-speech tagger, or POS-tagger, processes a sequence of words and attaches a
part-of-speech tag to each word (remember to import nltk before using it).

Steps to perform part-of-speech tagging

Step 1: Install NLTK and Download Necessary Data

First, install NLTK and download the tokenizer and tagger data that the later steps rely on.
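
A possible setup script for this step is sketched below. The helper name download_resources is hypothetical. The resource names depend on the installed NLTK release, so listing both the older and the newer names is an assumption made for portability.

```python
# Shell step, run once outside Python:  pip install nltk

# Resource names vary between NLTK releases, so both variants are listed.
RESOURCES = (
    "punkt",                           # tokenizer models (older releases)
    "punkt_tab",                       # tokenizer data (newer releases)
    "averaged_perceptron_tagger",      # pre-trained POS tagger (older releases)
    "averaged_perceptron_tagger_eng",  # pre-trained English POS tagger (newer releases)
)


def download_resources(names=RESOURCES):
    """Try to download each resource; return {name: True if it succeeded}."""
    import nltk  # imported here so this file still loads before NLTK is installed

    return {name: bool(nltk.download(name, quiet=True)) for name in names}
```

Calling download_resources() once per machine should be enough. A name that the installed release does not know is expected to come back as False, which is harmless as long as the other variant succeeded.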

Step 2: Tokenize the Text

Tokenization is the process of splitting text into tokens (typically words and punctuation
marks). NLTK provides various tokenizers. POS tagging operates on word tokens, so the text is
first broken down into sentences and then into individual words.
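
The sketch below shows two ways to tokenize. The regular expression is the pattern used by NLTK's WordPunctTokenizer and needs no downloaded data. The wrapper function tokenize is a hypothetical helper that prefers NLTK's word_tokenize and falls back to the regex when NLTK or its tokenizer data is missing.

```python
import re


def regex_tokenize(text):
    """Split text into runs of word characters and runs of punctuation."""
    return re.findall(r"\w+|[^\w\s]+", text)


def tokenize(text):
    """Use NLTK's word_tokenize when available, otherwise the regex above."""
    try:
        from nltk.tokenize import word_tokenize

        return word_tokenize(text)
    except (ImportError, LookupError):
        # ImportError: NLTK is not installed; LookupError: tokenizer data is missing.
        return regex_tokenize(text)


print(regex_tokenize("Don't forget to import nltk."))
# ['Don', "'", 't', 'forget', 'to', 'import', 'nltk', '.']
print(tokenize("hello world"))  # ['hello', 'world']
```

The two tokenizers differ on contractions: word_tokenize follows Treebank conventions and splits "Don't" into "Do" and "n't", which is the form the pre-trained tagger expects.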

Step 3: Perform POS Tagging

NLTK provides an interface for POS tagging with pre-trained models. After tokenizing the text
into words, pass the list of tokens to the pos_tag function, which returns a (word, tag) pair
for each token.
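
A minimal sketch of this step, assuming the package and data from Step 1 are in place. The wrapper name tag_text is hypothetical; word_tokenize and pos_tag are NLTK functions.

```python
def tag_text(text):
    """Tokenize text with NLTK, then tag every token with the pre-trained tagger."""
    import nltk  # needs the package and data from Step 1

    tokens = nltk.word_tokenize(text)
    return nltk.pos_tag(tokens)  # list of (word, tag) tuples


if __name__ == "__main__":
    try:
        print(tag_text("And now for something completely different"))
    except (ImportError, LookupError) as exc:
        # ImportError: NLTK missing; LookupError: tokenizer or tagger data missing.
        print("Complete Step 1 first:", exc)
```

The tags come from the Penn Treebank tagset by default, so for example CC marks a coordinating conjunction and JJ an adjective.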

Step 4: Display the POS Tags

You can print out the POS tags for better understanding and analysis of the text structure.
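
One way to display the result is an aligned table plus a count of each tag, sketched below. The sample list is the tagged sentence given in the NLTK book for this input, used here so the block runs without NLTK; output from your installed tagger may differ. The names TAG_MEANINGS and format_tags are hypothetical, and the meanings cover only the Penn Treebank tags that appear in the sample.

```python
from collections import Counter

# Sample in the list-of-(word, tag) shape that pos_tag returns.
tagged = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]

# Meanings of the Penn Treebank tags used above (not the full tagset).
TAG_MEANINGS = {
    "CC": "coordinating conjunction",
    "RB": "adverb",
    "IN": "preposition or subordinating conjunction",
    "NN": "noun, singular or mass",
    "JJ": "adjective",
}


def format_tags(tagged_words):
    """Return one aligned 'word tag meaning' line per token."""
    lines = []
    for word, tag in tagged_words:
        meaning = TAG_MEANINGS.get(tag, "see the tagset documentation")
        lines.append(f"{word:<12}{tag:<5}{meaning}")
    return lines


print("\n".join(format_tags(tagged)))

# Frequency of each tag, most common first.
tag_counts = Counter(tag for _, tag in tagged)
print(tag_counts.most_common())
```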

Conclusion:
The goal of a POS tagger is to assign linguistic (mostly grammatical) information to
sub-sentential units. Such units are called tokens and, most of the time, correspond to words
and symbols (e.g. punctuation).
