Natural Language Processing (NLP)
Introduction
• Natural Language Processing (NLP) is a subfield of machine learning
concerned with analyzing, generating, and understanding human
language in order to derive meaningful insights.
• NLP is growing in popularity as Large Language Models (LLMs)
improve and are adopted widely in the market. Foundational
knowledge of NLP concepts and techniques can help you become an
NLP data scientist, NLP engineer, or distinguished ML engineer and
stand out in the job market.
NLP topics to understand
❑Text pre-processing 8 marks (3+2+2)
❑Text feature extraction
❑Text sort
❑Named entity recognition
❑Parts-of-speech tagging
❑Text generation
❑Text-to-speech and speech-to-text techniques
Text pre-processing
Lemmatization
• Lemmatization is the process of reducing a word to its base or root form, which is
known as the lemma.
• It is a more sophisticated version of stemming, as it takes into account the
context and the part of speech of the word.
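• The lemmatize() method of WordNetLemmatizer (from nltk.stem) treats every word as a noun
by default; pass the pos argument (for example pos='v' for verbs) to lemmatize other parts of speech.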
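To illustrate why the part of speech matters, here is a minimal sketch of a dictionary-based lemmatizer next to a crude suffix-stripping stemmer. The tiny LEMMAS table and the toy_lemmatize and toy_stem helpers are hypothetical, written only for this illustration; they are not how NLTK works internally. Like lemmatize(), the sketch assumes a noun when no part of speech is given.

```python
# Toy lexicon: (word, part of speech) -> lemma.
# Hypothetical data for illustration only, not taken from WordNet.
LEMMAS = {
    ("running", "v"): "run",
    ("better", "a"): "good",
    ("feet", "n"): "foot",
    ("meeting", "v"): "meet",
    ("meeting", "n"): "meeting",
}


def toy_lemmatize(word, pos="n"):
    """Look up the lemma for (word, pos); return the word unchanged if unknown."""
    return LEMMAS.get((word.lower(), pos), word)


def toy_stem(word):
    """Crude suffix stripping that ignores context and part of speech."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    print(toy_lemmatize("running"))       # looked up as a noun (the default)
    print(toy_lemmatize("running", "v"))  # looked up as a verb
    print(toy_lemmatize("meeting", "n"))  # the noun keeps its form
    print(toy_stem("meeting"))            # the stemmer strips "ing" regardless
```

With NLTK itself, the corresponding call would be WordNetLemmatizer().lemmatize("running", pos="v"), which needs the WordNet data to be downloaded first.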
Text feature extraction
• https://spacy.io/usage
Part-of-speech tagging
• Part-of-speech tagging is the task of identifying the part of
speech of each word in a sentence, such as noun, verb, or
adjective.
• There are nine main parts of speech: noun, pronoun, verb, adjective,
adverb, conjunction, preposition, interjection, and article.
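Code Answer:
import nltk
# Download the tokenizer and POS tagger models (only needed once).
# Newer NLTK releases may ask for 'punkt_tab' and
# 'averaged_perceptron_tagger_eng' instead.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
text = "The quick brown fox jumps over the lazy dog."
# Tokenize the text into words
tokens = nltk.word_tokenize(text)
# Perform part-of-speech tagging on the tokenized words
tagged_words = nltk.pos_tag(tokens)
print(tagged_words)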
OUTPUT
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
('dog', 'NN'), ('.', '.')]
Parts of Speech and Their Tags
There are nine main parts of speech: noun, pronoun, verb, adjective, adverb, conjunction, preposition,
interjection, and article.
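The tags in the output shown earlier (DT, JJ, NN, VBZ, IN) come from the Penn Treebank tag set. As a rough sketch, the snippet below maps such tags onto the broad parts of speech listed here. The TAG_PREFIXES table and the coarse_pos helper are assumptions made for this illustration and cover only a handful of tags. The tagged_words list is copied from that output, so the snippet runs without NLTK.

```python
# Tag prefix -> broad part of speech.
# Partial, illustrative mapping; the full Penn Treebank set has more tags.
TAG_PREFIXES = [
    ("NN", "noun"),           # NN, NNS, NNP, NNPS
    ("VB", "verb"),           # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "adjective"),      # JJ, JJR, JJS
    ("RB", "adverb"),         # RB, RBR, RBS
    ("PRP", "pronoun"),       # PRP, PRP$
    ("DT", "determiner"),     # includes the articles "a", "an", "the"
    ("IN", "preposition"),    # also used for subordinating conjunctions
    ("CC", "conjunction"),    # coordinating conjunction
    ("UH", "interjection"),
]


def coarse_pos(tag):
    """Return a broad part-of-speech name for a Penn Treebank tag."""
    for prefix, name in TAG_PREFIXES:
        if tag.startswith(prefix):
            return name
    return "other"  # punctuation and tags not covered by this sketch


# Copied from the tagging output shown earlier in this section.
tagged_words = [
    ('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'),
    ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
    ('dog', 'NN'), ('.', '.'),
]

if __name__ == "__main__":
    for word, tag in tagged_words:
        print(f"{word:6} {tag:4} {coarse_pos(tag)}")
```

Matching on the tag prefix keeps the table short, since related tags such as VB, VBD, and VBZ all belong to the same broad category.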