Natural Language Processing (NLP) With Python - Tutorial
Source: Pixabay
https://pub.towardsai.net/natural-language-processing-nlp-with-python-tutorial-for-beginners-1f54e610a1a0 1/72
9/22/21, 1:15 AM Natural Language Processing (NLP) with Python — Tutorial | by Towards AI Team | Towards AI
In this article, we explore the basics of natural language processing (NLP) with
code examples. We dive into the Natural Language Toolkit (NLTK) library to
show how it can be useful for NLP-related tasks. Afterward,
we discuss the basics of other NLP libraries and other
essential methods for NLP, along with sample implementations
in Python.
This tutorial’s code is available on GitHub, and its full implementation is also available on Google
Colab.
Table of Contents:
1. What is Natural Language Processing (NLP)?
2. Applications of NLP
9. Word Cloud
10. Stemming
11. Lemmatization
13. Chunking
14. Chinking
16. WordNet
18. TF-IDF
Applications of NLP:
Machine Translation.
Speech Recognition.
Sentiment Analysis.
Question Answering.
Summarization of Text.
Chatbot.
Intelligent Systems.
Text Classifications.
Character Recognition.
Spell Checking.
Spam Detection.
Autocomplete.
Predictive Typing.
We, as humans, perform natural language processing (NLP) considerably well, but even
then, we are not perfect. We often mistake one thing for another, and we often
interpret the same sentences or words differently.
For instance, consider the following sentences, each of which can be
interpreted in more than one way:
Example 1:
Figure 2: NLP example sentence with the text: “I saw a man on a hill with a telescope.”
Example 2:
Figure 3: NLP example sentence with the text: “Can you help me with the can?”
In the sentence above, the word “can” appears twice with two
different meanings. The first “can” is a modal verb used to form the question. The second
“can,” at the end of the sentence, is a noun that refers to a container that holds food or
liquid.
Hence, from the examples above, we can see that language processing is not
“deterministic” (the same sentence does not always have the same interpretation), and something
suitable to one person might not be suitable to another. Therefore, Natural Language
Processing (NLP) takes a non-deterministic approach. In other words, NLP
can be used to create intelligent systems that model how
humans understand and interpret language in different situations.
Comparison:
a. Lexical Analysis:
With lexical analysis, we divide a whole chunk of text into paragraphs, sentences, and
words. It involves identifying and analyzing words’ structure.
b. Syntactic Analysis:
Syntactic analysis involves the analysis of words in a sentence for grammar and
arranging words in a manner that shows the relationship among the words. For
instance, the sentence “The shop goes to the house” does not pass.
c. Semantic Analysis:
Semantic analysis draws the exact meaning for the words, and it analyzes the text
meaningfulness. Sentences such as “hot ice-cream” do not pass.
d. Discourse Integration:
Discourse integration takes into account the context of the text. It interprets a
sentence in light of the sentences that come before it. For example: “He works at Google.” In this
sentence, “he” must refer to someone introduced in an earlier sentence.
e. Pragmatic Analysis:
Pragmatic analysis deals with overall communication and interpretation of language. It
deals with deriving meaningful use of language in various situations.
9. Ambiguity in speech.
Features:
Tokenization.
Classification.
Sentiment analysis.
Packages of chatbots.
Use-cases:
Recommendation systems.
Sentiment analysis.
Building chatbots.
b. spaCy:
spaCy is an open-source natural language processing Python library designed to be fast
and production-ready. spaCy focuses on providing software for production usage.
Features:
Tokenization.
Classification.
Sentiment analysis.
Dependency parsing.
Word vectors.
Use-cases:
Analyzing reviews.
Summarization.
c. Gensim:
Gensim is an NLP Python framework generally used in topic modeling and similarity
detection. It is not a general-purpose NLP library, but it handles tasks assigned to it very
well.
Features:
TF-IDF.
Use-cases:
Text summarization.
d. Pattern:
Pattern is an NLP Python framework with straightforward syntax. It’s a powerful tool for
scientific and non-scientific tasks. It is highly valuable to students.
Features:
Tokenization.
Parsing.
Sentiment analysis.
Use-cases:
Spelling correction.
Sentiment analysis.
e. TextBlob:
TextBlob is a Python library designed for processing textual data.
Features:
Part-of-Speech tagging.
Sentiment analysis.
Classification.
Language translation.
Parsing.
Wordnet integration.
Use-cases:
Sentiment Analysis.
Spelling Correction.
For this tutorial, we are going to focus more on the NLTK library. Let’s dig deeper into
natural language processing by working through some examples.
Figure 11: Small code snippet to open and read the text file and analyze it.
Next, notice that the data type of the text file read is a String. The number of characters
in our text file is 675.
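The step above can be sketched as follows. This is a minimal sketch: the file name sample_text.txt and its contents are placeholders written to a temporary directory, not the tutorial's actual text file, so the character count will differ from 675.

```python
import tempfile
from pathlib import Path

# Placeholder text; the tutorial's own sample file is not reproduced here.
SAMPLE = "NLP is fun. It is also hard. We keep trying!"

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample_text.txt"
    sample_path.write_text(SAMPLE, encoding="utf-8")

    # Open the file and read its whole content into one string.
    with open(sample_path, encoding="utf-8") as text_file:
        text = text_file.read()

print(type(text))  # data type of the file content
print(len(text))   # number of characters in the file
```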
c. Sentence tokenizing:
By tokenizing the text with sent_tokenize(), we can get the text as sentences.
In the example above, we can see the entire text of our data is represented as sentences
and also notice that the total number of sentences here is 9.
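A minimal sketch of sentence tokenizing follows. NLTK's sent_tokenize() relies on a pre-trained Punkt model that has to be downloaded separately, so this sketch swaps in an untrained PunktSentenceTokenizer from the same module so that it runs without any download. The sample text is a placeholder, and on text with abbreviations the untrained tokenizer may split differently from sent_tokenize().

```python
from nltk.tokenize import PunktSentenceTokenizer

# Placeholder text, not the tutorial's sample file.
text = "NLP is fun. It is also hard. We keep trying!"

# Untrained Punkt tokenizer: a stand-in for sent_tokenize(), which needs
# the pre-trained Punkt model to be downloaded first.
sentence_tokenizer = PunktSentenceTokenizer()
sentences = sentence_tokenizer.tokenize(text)

for sentence in sentences:
    print(sentence)
print(len(sentences))  # total number of sentences
```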
d. Word tokenizing:
By tokenizing the text with word_tokenize(), we can get the text as words.
Next, we can see the entire text of our data is represented as words, and
the total number of words here is 144.
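Word tokenizing can be sketched in the same way. Because word_tokenize() also depends on the downloadable Punkt model, this sketch uses NLTK's regex-based TreebankWordTokenizer as a stand-in. The sentence is one of the article's earlier examples, reused here only for illustration.

```python
from nltk.tokenize import TreebankWordTokenizer

sentence = "Can you help me with the can?"

# Regex-based tokenizer used as a stand-in for word_tokenize(),
# which additionally relies on the downloadable Punkt model.
word_tokenizer = TreebankWordTokenizer()
words = word_tokenizer.tokenize(sentence)

print(words)
print(len(words))  # punctuation marks count as tokens too
```

Note that the question mark becomes its own token, which is why punctuation shows up in the word counts later on.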
Figure 18: Using FreqDist() to find the frequency of words in our sample text.
Figure 19: Printing the ten most common words from the sample text.
Notice that the most used words are punctuation marks and stopwords. We will have to
remove such words to analyze the actual text.
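As a rough illustration of this step, the sketch below counts tokens with FreqDist. The token list is a hand-made placeholder rather than the tutorial's sample text, so the counts are only illustrative.

```python
from nltk.probability import FreqDist

# Placeholder tokens, including punctuation and stopwords.
words = ["the", "dog", "saw", "the", "cat", ".", "the", "cat", "ran", "."]

fdist = FreqDist(words)      # maps each token to how often it occurs
print(fdist.most_common(3))  # most frequent tokens first
```

Even in this tiny list, a stopword (“the”) and a punctuation mark are among the most frequent tokens.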
In the graph above, notice that a period “.” is used nine times in our text. Analytically
speaking, punctuation marks are not that important for natural language processing.
Therefore, in the next step, we will be removing such punctuation marks.
Figure 21: Using the isalpha() method to separate the punctuation marks, along with creating a list under
words_no_punc to separate words with no punctuation marks.
As shown above, all the punctuation marks from our text are excluded. We can
cross-check this against the number of words.
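A minimal sketch of the isalpha() filter is shown below. The token list is a placeholder, and lowercasing each kept word is an assumption of this sketch rather than something the text above states.

```python
# Placeholder tokens, as a word tokenizer might return them.
words = ["The", "dog", "barked", ",", "and", "the", "cat", "ran", "."]

# Keep only purely alphabetic tokens; lowercase them so "The" and "the" match.
words_no_punc = [word.lower() for word in words if word.isalpha()]

print(words_no_punc)
print(len(words), len(words_no_punc))  # cross-check the token counts
```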
Figure 23: Printing the ten most common words from the sample text.
Notice that we still have many words that are not very useful in the analysis of our text
file sample, such as “and,” “but,” “so,” and others. Next, we need to remove these
stopwords.
i. List of stopwords:
j. Removing stopwords:
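Stopword removal can be sketched as a simple filter. The stopword set below is a tiny hand-written placeholder so that the example runs without downloads. NLTK provides a much fuller English list through nltk.corpus.stopwords.words("english") once the stopwords corpus has been downloaded.

```python
# Hypothetical mini stopword list, for illustration only.
stop_words = {"the", "and", "a", "is", "but", "so", "on", "with"}

# Placeholder tokens with punctuation already removed.
words_no_punc = ["the", "dog", "barked", "and", "the", "cat", "ran"]

# Keep every word that is not in the stopword set.
clean_words = [word for word in words_no_punc if word not in stop_words]
print(clean_words)
```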
Figure 29: Displaying the final frequency distribution of the most common words found.
Figure 30: Visualization of the most common words found in the group.
As shown above, the final graph has many useful words that help us understand what
our sample data is about, showing how essential it is to perform data cleaning on NLP.
Word Cloud:
A word cloud is a data visualization technique in which the words from a given text are displayed
on a single chart. More frequent or essential words appear in a larger
and bolder font, while less frequent or essential words appear in smaller or thinner
fonts. It is a useful technique in NLP because it gives us a quick glance at what a text is
about before we analyze it.
Properties:
1. font_path: It specifies the path for the fonts we want to use.
As shown in the graph above, the most frequent words are displayed in larger fonts. The word
cloud can be displayed in any shape or image.
For instance, in this case we are going to use the following circle image, but we could use
any shape or image.
As shown above, the word cloud is in the shape of a circle. As we mentioned before, we
can use any shape or image to form a word cloud.
Word Cloud Advantages:
They are fast.
Stemming:
We use stemming to normalize words. In English and many other languages, a single
word can take multiple forms depending on the context in which it is used. For instance, the verb
“study” can take many forms like “studies,” “studying,” “studied,” and others, depending
on its context. When we tokenize words, an interpreter treats these forms as
different words even though their underlying meaning is the same. Since
NLP is about analyzing the meaning of content, we use stemming to resolve this
problem.
Stemming normalizes a word by truncating it to its stem. For example,
the words “studies,” “studied,” and “studying” are all reduced to “studi,” so that all these
word forms map to a single token. Notice that the stem may not be a valid
dictionary word.
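The “studi” example can be reproduced with NLTK's PorterStemmer, which needs no extra data downloads. This is only a small sketch of the idea using the three word forms mentioned above.

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()
word_forms = ["studies", "studying", "studied"]

# Each form is truncated to the same stem, which is not a dictionary word.
stems = [porter.stem(word) for word in word_forms]
print(stems)
```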
c. SnowballStemmer:
SnowballStemmer generates output similar to the Porter stemmer (its English stemmer
implements the revised “Porter2” algorithm), and it supports many
more languages.
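A short sketch of SnowballStemmer usage follows. The languages attribute lists the supported languages, and the word list is the same illustrative one used for stemming above.

```python
from nltk.stem import SnowballStemmer

# Tuple of language names the Snowball stemmer supports.
print(SnowballStemmer.languages)

# The language is chosen when the stemmer is created.
snowball = SnowballStemmer("english")
snowball_stems = [snowball.stem(word) for word in ["studies", "studying", "studied"]]
print(snowball_stems)
```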
a. Porter’s Stemmer:
b. Lovin’s Stemmer:
c. Dawson’s Stemmer:
d. Krovetz Stemmer:
e. Xerox Stemmer:
f. Snowball Stemmer:
If accuracy is not the project’s final goal, then stemming is an appropriate approach. If
higher accuracy is crucial and the project is not on a tight deadline, then the best option
is lemmatization (lemmatization has a lower processing speed compared to stemming).
Lemmatization takes into account Part Of Speech (POS) values. Also, lemmatization
may generate different outputs for different values of POS. We generally have four
choices for POS:
b. Lemmatizing:
During lemmatization, the word “studies” is reduced to its dictionary form, “study.”
Python Implementation:
a. A basic example demonstrating how a lemmatizer works
In the following example, we set the PoS tag to “verb,” and when we apply the
lemmatization rules, we get dictionary words instead of truncated versions of the original words:
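NLTK's WordNetLemmatizer, called as lemmatize(word, pos="v"), needs the WordNet corpus to be downloaded first. To keep the example self-contained, the sketch below swaps in a tiny hand-written lookup table. The table TOY_LEMMAS and the function toy_lemmatize are hypothetical names, not part of NLTK. They only illustrate how a dictionary lookup keyed by part of speech differs from truncation.

```python
# Toy lemma table keyed by (word, part of speech). It illustrates a
# PoS-aware dictionary lookup; it is not how WordNet stores its data.
TOY_LEMMAS = {
    ("studies", "v"): "study",
    ("studying", "v"): "study",
    ("am", "v"): "be",
    ("are", "v"): "be",
    ("is", "v"): "be",
    ("was", "v"): "be",
    ("were", "v"): "be",
}


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Return the dictionary form if the table knows it, else the word itself."""
    return TOY_LEMMAS.get((word.lower(), pos), word)


print([toy_lemmatize(w, pos="v") for w in ["am", "are", "is", "was", "were"]])
print(toy_lemmatize("studying", pos="n"))  # no noun entry, so it stays unchanged
```

The second call shows why the PoS value matters: the same word can map to different outputs depending on the tag we pass.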
Figure 51: Lemmatization of the words: “am”, “are”, “is”, “was”, “were”
Figure 53: Sentence example, “can you help me with the can?”
Part-of-speech (PoS) tagging is crucial for syntactic and semantic analysis. In
the sentence above, the word “can” has several meanings. The
first “can” is used for question formation. The second “can,” at the end of the sentence,
represents a container. The first “can” is a verb, and the second “can” is a noun.
Tagging each word with its part of speech allows the program to handle it correctly in both
semantic and syntactic analysis.
Below is a list of part-of-speech (PoS) tags with their respective examples:
3. DT: Determiner
7. JJ: Adjective
Figure 62:
Figure 64:
24. TO: To
Python Implementation:
a. A simple example demonstrating PoS tagging.
Chunking:
Chunking means extracting meaningful phrases from unstructured text. When we tokenize a
book into individual words, it is sometimes hard to infer meaningful information. Chunking works on top of
part-of-speech (PoS) tagging: it takes PoS tags as input and provides chunks as
output. A chunk is a group of words, so chunking breaks simple text into
phrases that are more meaningful than individual words.
Before working with an example, we need to know what phrases are. Meaningful
groups of words are called phrases. There are five significant categories of phrases.
VP → V (NP)(PP)(Adverb).
PP → Preposition (NP).
AP → Adjective (PP).
Example:
Python Implementation:
In the following example, we will extract noun phrases from the text. Before extracting
them, we need to define what kind of noun phrase we are looking for; in other words, we
have to set the grammar for a noun phrase. In this case, we define a noun phrase as an
optional determiner followed by adjectives and nouns. We could define other rules in the same way to
extract other kinds of phrases. Next, we use RegexpParser() to parse the
grammar. Notice that we can also visualize the resulting tree with the .draw() function.
Figure 93: Code snippet to extract noun phrases from a text file.
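A minimal sketch of noun-phrase chunking with RegexpParser is shown below. To avoid downloading a tagger model, the tokens are tagged by hand. The sentence and the exact grammar string are assumptions chosen to match the description above (optional determiner, adjectives, then a noun), not the tutorial's own input.

```python
from nltk.chunk import RegexpParser

# Hand-tagged (word, PoS tag) pairs; normally these come from a PoS tagger.
tagged_words = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]

# Noun phrase: optional determiner, any number of adjectives, then a noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunk_parser = RegexpParser(grammar)
tree = chunk_parser.parse(tagged_words)

# Collect the words of every subtree labeled NP.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)
```

Calling tree.draw() on the result opens a window with the parse tree, which requires a graphical environment.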
In this example, we can see that we have successfully extracted the noun phrase from
the text.
Figure 94: Successful extraction of the noun phrase from the input text.
Chinking:
Chinking excludes a part from our chunk. There are certain situations where we need to
exclude a part of the text from the whole text or chunk. In complex extractions,
chunking can output unhelpful data. In such cases, we can use
chinking to exclude some parts from the chunked text.
In the following example, we take the whole string as a chunk and then
exclude adjectives from it by using chinking. We generally use chinking
when we still have a lot of unhelpful data after chunking; with this method,
we can easily set that data apart. To write chinking grammar, we use inverted
curly braces, i.e., } chinking pattern {.
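Chinking can be sketched with the same RegexpParser. The grammar below first chunks every token and then uses inverted curly braces to remove runs of adjectives. The hand-tagged sentence and the chunk label NP are assumptions made for this illustration.

```python
from nltk.chunk import RegexpParser

# Hand-tagged (word, PoS tag) pairs; normally these come from a PoS tagger.
tagged_words = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
    ("dog", "NN"), ("barked", "VBD"),
]

# Rule 1 chunks everything; rule 2 chinks (excludes) sequences of adjectives.
grammar = r"""
NP:
    {<.*>+}
    }<JJ>+{
"""
chink_parser = RegexpParser(grammar)
tree = chink_parser.parse(tagged_words)

# The adjectives split the original chunk into the pieces that remain.
chunks = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(chunks)
```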
From the example above, we can see that the adjectives are separated from the rest of the text.
Named entity recognition can automatically scan entire articles and pull out
fundamental entities such as people, organizations, places, dates, times, money, and geopolitical
entities (GPE) discussed in them.
Use-Cases:
1. Content classification for news channels.
2. Summarizing resumes.
4. Recommendation systems.
5. Customer support.
Figure 97: An example of commonly used types of named entity recognition (NER).
Python Implementation:
There are two options:
1. binary = True
When the binary value is True, the output only shows whether a particular entity is a
named entity or not. It does not show any further details.
Our graph does not show what type of named entity it is. It only shows whether a
particular word is a named entity or not.
2. binary = False
When the binary value equals False, it shows in detail the type of named entities.
Figure 101: Graph showing the type of named entities when a binary value equals false.
WordNet:
WordNet is a lexical database for the English language. WordNet is part of the NLTK
corpus. We can use WordNet to find the meanings of words, synonyms, antonyms, and many
other lexical relations.
Figure 102: Checking word definitions with Wordnet using the NLTK framework.
Figure 103: Gathering the meaning of the different definitions by using Wordnet.
Figure 105: Finding all details for all the meanings of a specific word.
h. Synonyms.
i. Antonyms.
Figure 111: Finding synonyms and antonyms code snippet with Wordnet.
Figure 112: Finding the similarity ratio between words using Wordnet.
Figure 113: Finding the similarity ratio between words using Wordnet.
Bag of Words:
1. Raw Text: This is the original text on which we want to perform analysis.
2. Clean Text: Since our raw text contains some unnecessary data like punctuation
marks and stopwords, we need to clean it up. Clean text is the text after
removing such words.
4. Building Vocab: It contains total words used in the text after removing unnecessary
data.
5. Generate Vocab: It contains the words along with their frequencies in the
sentences.
For instance:
Sentences:
d. Final model:
Python Implementation:
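The bag-of-words steps described above can be sketched in plain Python. The two sentences are hypothetical and are assumed to be already cleaned (no punctuation or stopwords). This is a from-scratch illustration of the idea rather than the tutorial's own implementation.

```python
from collections import Counter

# Hypothetical cleaned sentences (punctuation and stopwords already removed).
sentences = ["cute dog big dog", "cute cat"]

tokenized = [sentence.split() for sentence in sentences]

# Vocabulary: every distinct word, sorted so the columns have a stable order.
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# One count vector per sentence, aligned with the vocabulary.
vectors = []
for tokens in tokenized:
    counts = Counter(tokens)
    vectors.append([counts[word] for word in vocabulary])

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each vector has one entry per vocabulary word, which is why the vector size grows with the number of distinct words in the documents.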
Applications:
1. Natural language processing.
3. Classifications of documents.
Limitations:
1. Semantic meaning: It does not consider the semantic meaning of a word. It ignores
the context in which the word is used.
2. Vector size: For large documents, the vector size increases, which may result in
higher computational time.
TF-IDF
TF-IDF stands for Term Frequency — Inverse Document Frequency, which is a
scoring measure generally used in information retrieval (IR) and summarization. The
TF-IDF score shows how important or relevant a term is in a given document.
2. A cute doggo.
3. A big dog.
Notice that the first description contains 2 out of 3 words from our user query, and the
second description contains 1 word from the query. The third description also contains 1
word, and the fourth description contains no words from the user query. Intuitively,
the closest answer to our query is description number two, as it contains the
essential word “cute” from the user’s query. TF-IDF captures this intuition numerically.
Notice that the term frequency values are the same for all of the sentences since no
word repeats within a sentence. So, in this case, the value of TF
is not informative. Next, we use IDF values to get the closest answer
to the query. Notice that the word “dog” or “doggo” can appear in many documents.
Therefore, its IDF value is going to be very low, and its TF-IDF value will also be
low. However, the word “cute” comes up in
relatively few dog descriptions, which increases its TF-IDF value. So the word “cute” has more
discriminative power than “dog” or “doggo.” Our search engine will then find the
descriptions that contain the word “cute,” which is what the user was
looking for.
Simply put, the higher the TF-IDF score, the rarer and more valuable the term, and
vice versa.
Now we are going to take a straightforward example and understand TF-IDF in more
detail.
Example:
e. Calculating TF-IDF.
In this case, notice that the important words that discriminate between the two sentences are “first”
in sentence-1 and “second” in sentence-2. As we can see, those words have a relatively
higher value than other words.
However, there are many variations for smoothing out the values for large documents.
The most common variation is to use a log-scaled IDF value. Let’s calculate the TF-IDF
value again by using the new IDF value.
Figure 131: Using a log value for TF-IDF by using the new IDF value.
g. Calculating TF-IDF.
As seen above, “first” and “second” are the important words that help us
distinguish between the two sentences.
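The log-based calculation can be sketched in plain Python. This assumes one common textbook variant: TF is the word count divided by the sentence length, and IDF is log(N / number of sentences containing the word). The two sentences are hypothetical stand-ins chosen so that only “first” and “second” differ, and the tutorial's own figures may use a slightly different formula.

```python
import math

# Hypothetical two-sentence corpus, used only for illustration.
documents = [
    "this is the first sentence".split(),
    "this is the second sentence".split(),
]


def term_frequency(term, document):
    """Share of the document's words that are `term`."""
    return document.count(term) / len(document)


def inverse_document_frequency(term, corpus):
    """log(N / df); assumes the term occurs in at least one document."""
    containing = sum(1 for document in corpus if term in document)
    return math.log(len(corpus) / containing)


def tf_idf(term, document, corpus):
    """TF-IDF score of `term` within one document of the corpus."""
    return term_frequency(term, document) * inverse_document_frequency(term, corpus)


for index, document in enumerate(documents, start=1):
    scores = {term: round(tf_idf(term, document, documents), 3) for term in document}
    print(f"sentence-{index}: {scores}")
```

With this variant, words shared by both sentences score zero, so only the distinguishing words keep a positive value.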
Now that we have seen the basics of TF-IDF, we are going to use the sklearn library to
implement it in Python. sklearn uses a slightly different formula, so the actual output from our
program differs from the manual calculation above. First, we will see an overview of the calculations and formulas, and then we
will implement them in Python.
Actual Calculations:
a. Term Frequency (TF):
Python Implementation:
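A minimal sketch with sklearn's TfidfVectorizer follows. The two sentences are hypothetical placeholders. With its default settings, TfidfVectorizer computes idf(t) = ln((1 + n) / (1 + df(t))) + 1 and then scales each row to unit length, so its scores are not expected to match a hand calculation based on a plain ratio or log(N / df).

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical two-sentence corpus, used only for illustration.
documents = ["This is the first sentence.", "This is the second sentence."]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)  # sparse matrix: rows are sentences

# vocabulary_ maps each term to its column index in the matrix.
for term, column in sorted(vectorizer.vocabulary_.items()):
    scores = tfidf_matrix[:, column].toarray().ravel()
    print(f"{term:10s} {scores.round(3)}")
```

Unlike the plain log variant, the shared words keep a non-zero score here, but “first” and “second” still receive the highest weight in their respective sentences.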
Conclusion:
These are some of the basics for the exciting field of natural language processing (NLP).
We hope you enjoyed reading this article and learned something new. Any suggestions
or feedback is crucial to continue to improve. Please let us know in the comments if you
have any.
DISCLAIMER: The views expressed in this article are those of the author(s) and do not
represent the views of Carnegie Mellon University, nor other companies (directly or
indirectly) associated with the author(s). These writings do not intend to be final
products, yet rather a reflection of current thinking, along with being a catalyst for
discussion and improvement.
Citation
For attribution in academic contexts, please cite this work as:
BibTex citation:
@article{pratik_iriondo_2020,
url={https://towardsai.net/nlp-tutorial-with-python},
journal={Towards AI},
publisher={Towards AI Co.},
year={2020},
month={Jul}
}
References:
Resources:
Google Colab Implementation.