Applying Multinomial Naive Bayes to NLP Problems
Last Updated: 11 Jul, 2025
Multinomial Naive Bayes (MNB) is a popular machine learning algorithm for text classification problems in Natural Language Processing (NLP). It is particularly useful for problems that involve text data with discrete features such as word frequency counts. MNB is built on Bayes’ theorem and assumes that the features are conditionally independent given the class variable.
Here are the steps for applying Multinomial Naive Bayes to NLP problems:
Preprocessing the text data: The text data needs to be preprocessed before applying the algorithm. This involves steps such as tokenization, stop-word removal, and stemming or lemmatization.
Feature extraction: The text data needs to be converted into a feature vector format that can be used as input to the MNB algorithm. The most common method of feature extraction is to use a bag-of-words model, where each document is represented by a vector of word frequency counts.
Splitting the data: The data needs to be split into training and testing sets. The training set is used to train the MNB model, while the testing set is used to evaluate its performance.
Training the MNB model: The MNB model is trained on the training set by estimating the probabilities of each feature given each class. This involves calculating the prior probabilities of each class and the likelihood of each feature given each class.
Evaluating the performance of the model: The performance of the model is evaluated using metrics such as accuracy, precision, recall, and F1-score on the testing set.
Using the model to make predictions: Once the model is trained, it can be used to make predictions on new text data. The text data is preprocessed and transformed into the feature vector format, which is then input to the trained model to obtain the predicted class label.
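The steps above can be sketched end to end with scikit-learn. This is a minimal sketch under stated assumptions, not a reference implementation: the toy corpus is made up purely for illustration, `CountVectorizer` handles lowercasing and stop-word removal (stemming is left out), and the split size and the `alpha=1.0` smoothing value are arbitrary choices.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, invented only to make the sketch runnable.
texts = [
    "great movie with a nice story",
    "I liked the acting",
    "good songs and a good ending",
    "nice film, loved it",
    "boring movie and a sad ending",
    "bad acting, bad story",
    "I hated the songs",
    "sad and boring film",
]
labels = ["positive"] * 4 + ["negative"] * 4

# Splitting: hold out 25% of the documents, keeping both classes in each part.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)

# Preprocessing + feature extraction (bag of words) + MNB in one pipeline.
model = make_pipeline(
    CountVectorizer(lowercase=True, stop_words="english"),
    MultinomialNB(alpha=1.0),
)

# Training: estimates class priors and per-class word likelihoods.
model.fit(X_train, y_train)

# Evaluation: precision, recall and F1-score on the held-out documents.
print(classification_report(y_test, model.predict(X_test), zero_division=0))

# Prediction: the pipeline applies the same transformation to new text.
prediction = model.predict(["a nice movie with good songs"])[0]
print(prediction)
```

With only a handful of documents the reported metrics are not meaningful; the point is the order of the steps and the fact that the same fitted vectorizer is reused at prediction time.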
MNB is a simple and efficient algorithm that works well for many NLP problems such as sentiment analysis, spam detection, and topic classification. However, it has some limitations, such as the assumption of independence between features, which may not hold true in some cases. Therefore, it is important to carefully evaluate the performance of the model before using it in a real-world application.
The Naive Bayes classifier is a family of probabilistic algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features, given the class.
Bayes’ theorem gives the probability P(c|x), where c is one of the possible classes and x is the instance to be classified, described by its features.
P(c|x) = P(x|c) * P(c) / P(x)
Naive Bayes classifiers are widely used in natural language processing (NLP) problems, where they predict the tag of a text. They calculate the probability of each tag for a given text and then output the tag with the highest probability.
How the Naive Bayes Algorithm Works
Let's consider an example: classifying whether a review is positive or negative.
Training Dataset:
Text | Reviews |
---|---|
“I liked the movie” | positive |
“It’s a good movie. Nice story” | positive |
“Nice songs. But sadly boring ending.” | negative |
“Hero’s acting is bad but heroine looks good. Overall nice movie” | positive |
“Sad, boring movie” | negative |
We want to classify whether the text “overall liked the movie” is a positive or a negative review. To do so, we have to calculate:
P(positive | overall liked the movie) — the probability that the tag of a sentence is positive given that the sentence is “overall liked the movie”.
P(negative | overall liked the movie) — the probability that the tag of a sentence is negative given that the sentence is “overall liked the movie”.
Before that, we first remove stop words and apply stemming to the text.
Removing stop words: These are common words that don’t really add anything to the classification, such as “an”, “able”, “either”, “else”, “ever”, and so on.
Stemming: Reducing each word to its root form.
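After lowercasing, stripping punctuation, and applying these two techniques (assuming NLTK's English stop-word list and the Porter stemmer), our text becomes:
Text | Reviews |
---|---|
“like movi” | positive |
“good movi nice stori” | positive |
“nice song sadli bore end” | negative |
“hero act bad heroin look good overal nice movi” | positive |
“sad bore movi” | negative |
The hand calculation that follows is written in terms of the original words (“overall”, “liked”, “the”, “movie”) rather than these stems, which keeps it easier to read.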
Feature Engineering:
The key step is to extract features from the data that a machine learning algorithm can work with. Here the data is text, so we need to convert it into numbers that we can do calculations on. We use word frequencies: every document is treated as a bag of the words it contains, ignoring word order, and the features are the counts of each of these words.
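To make the word-count representation concrete, here is a small sketch that builds it for the five training reviews using only the standard library. The `tokenize` helper is a hypothetical, deliberately simple tokenizer (lowercase, letters only) and is not the preprocessing used elsewhere in this article.

```python
import re
from collections import Counter

reviews = [
    "I liked the movie",
    "It’s a good movie. Nice story",
    "Nice songs. But sadly boring ending.",
    "Hero’s acting is bad but heroine looks good. Overall nice movie",
    "Sad, boring movie",
]


def tokenize(text):
    # Hypothetical helper: lowercase the text and keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


# One bag of words (word -> count) per review.
bags = [Counter(tokenize(review)) for review in reviews]

# The vocabulary is every distinct word seen in any review.
vocabulary = sorted(set().union(*bags))

# One row per review, one column per vocabulary word; missing words count as 0.
matrix = [[bag[word] for word in vocabulary] for bag in bags]

print(vocabulary)
print(matrix[0])
```

Each row of `matrix` is the feature vector of one review, which is the same kind of input that a library vectorizer produces.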
Applying Bayes’ theorem to our case gives:
P(positive | overall liked the movie) = P(overall liked the movie | positive) * P(positive) / P(overall liked the movie)
Since our classifier only has to find out which tag has the bigger probability, we can discard the divisor, which is the same for both tags, and just compare
P(overall liked the movie | positive) * P(positive) with P(overall liked the movie | negative) * P(negative)
There’s a problem though: “overall liked the movie” doesn’t appear in our training dataset, so estimating P(overall liked the movie | positive) directly from the data would give zero. This is where the “naive” assumption comes in: we assume that every word in a sentence is independent of the other ones. This means that we now look at individual words instead of the whole sentence.
We can write this as:
P(overall liked the movie) = P(overall) * P(liked) * P(the) * P(movie)
The same assumption applied to the class-conditional likelihood gives:
P(overall liked the movie| positive) = P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive)
And now, these individual words actually show up several times in our training data, and we can calculate them!
Calculating probabilities:
First, we calculate the a priori probability of each tag: for a given sentence in our training data, the probability that it is positive P(positive) is 3/5. Then, P(negative) is 2/5.
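The prior calculation above can be sketched in a few lines. This is only an illustration: the `labels` list simply restates the tags of the five training reviews, and `Fraction` is used so the results print as exact ratios.

```python
from collections import Counter
from fractions import Fraction

# Tags of the five training reviews, in the order of the training table.
labels = ["positive", "positive", "negative", "positive", "negative"]

# Prior of a tag = number of reviews with that tag / total number of reviews.
tag_counts = Counter(labels)
priors = {tag: Fraction(count, len(labels)) for tag, count in tag_counts.items()}

for tag, prior in priors.items():
    print(tag, prior)
```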
Then, calculating P(overall | positive) means counting how many times the word “overall” appears in positive texts (1) and dividing by the total number of words in positive texts (17). Therefore, P(overall | positive) = 1/17, P(liked | positive) = 1/17, P(the | positive) = 2/17, and P(movie | positive) = 3/17.
Some words, such as “overall” and “liked”, never appear in negative texts, so their probability comes out as zero and would wipe out the whole product. To avoid this we use Laplace smoothing: we add 1 to every count so it is never zero. To balance this, we add the number of possible words to the divisor, so the result will never be greater than 1. In our case, the number of possible words is 21.
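A minimal sketch of the smoothing rule just described, using the counts quoted in this article (17 words in positive texts, 7 in negative texts, 21 possible words). The function name `smoothed_likelihood` and its `alpha` parameter are illustrative choices, not part of any library.

```python
from fractions import Fraction


def smoothed_likelihood(word_count, class_total, vocab_size, alpha=1):
    """Add-alpha (Laplace for alpha=1) estimate of P(word | class)."""
    return Fraction(word_count + alpha, class_total + alpha * vocab_size)


# "overall" appears once in positive texts and never in negative texts.
p_overall_positive = smoothed_likelihood(1, 17, 21)  # (1 + 1) / (17 + 21)
p_overall_negative = smoothed_likelihood(0, 7, 21)   # (0 + 1) / (7 + 21)

print(p_overall_positive, p_overall_negative)
```

Without smoothing the second estimate would be 0/7 = 0, and any sentence containing “overall” could never be tagged negative regardless of its other words.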
After applying smoothing, the results are:
Word | P(word \| positive) | P(word \| negative) |
---|---|---|
overall | (1 + 1) / (17 + 21) | (0 + 1) / (7 + 21) |
liked | (1 + 1) / (17 + 21) | (0 + 1) / (7 + 21) |
the | (2 + 1) / (17 + 21) | (0 + 1) / (7 + 21) |
movie | (3 + 1) / (17 + 21) | (1 + 1) / (7 + 21) |
Now we just multiply all the probabilities and see which product is bigger:
P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive) * P(positive) ≈ 1.38 * 10^{-5} = 0.0000138
P(overall | negative) * P(liked | negative) * P(the | negative) * P(movie | negative) * P(negative) ≈ 0.13 * 10^{-5} = 0.0000013
Our classifier gives “overall liked the movie” the positive tag.
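The comparison above can be reproduced with a short sketch that plugs in the counts quoted in this article. Scores are accumulated as sums of logarithms, a common practice because multiplying many small probabilities can underflow on longer texts; the variable and function names are illustrative.

```python
import math

# Word counts per class, class word totals and vocabulary size as quoted above.
word_counts = {
    "positive": {"overall": 1, "liked": 1, "the": 2, "movie": 3},
    "negative": {"overall": 0, "liked": 0, "the": 0, "movie": 1},
}
class_totals = {"positive": 17, "negative": 7}
priors = {"positive": 3 / 5, "negative": 2 / 5}
VOCAB_SIZE = 21


def log_score(words, tag):
    # log P(tag) + sum of log P(word | tag), with add-one smoothing.
    score = math.log(priors[tag])
    for word in words:
        numerator = word_counts[tag][word] + 1
        denominator = class_totals[tag] + VOCAB_SIZE
        score += math.log(numerator / denominator)
    return score


query = "overall liked the movie".split()
log_scores = {tag: log_score(query, tag) for tag in priors}
best_tag = max(log_scores, key=log_scores.get)

# Converting back from logs recovers the products computed by hand.
scores = {tag: math.exp(value) for tag, value in log_scores.items()}
print(best_tag, scores)
```

Because the logarithm is monotonic, the tag with the largest log score is also the tag with the largest product, so the decision is unchanged.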
Below is the implementation:
Python
# cleaning texts
import re

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

dataset = [["I liked the movie", "positive"],
           ["It’s a good movie. Nice story", "positive"],
           ["Nice songs. But sadly boring ending.", "negative"],
           ["Hero’s acting is bad but heroine looks good. "
            "Overall nice movie", "positive"],
           ["Sad, boring movie", "negative"]]
dataset = pd.DataFrame(dataset, columns=["Text", "Reviews"])

nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
ps = PorterStemmer()

corpus = []
for review in dataset['Text']:
    # replace every non-letter with a space so word boundaries survive
    text = re.sub('[^a-zA-Z]', ' ', review)
    text = text.lower().split()
    # drop stop words, then reduce each remaining word to its stem
    text = [ps.stem(word) for word in text if word not in stop_words]
    corpus.append(' '.join(text))

# creating bag of words model
cv = CountVectorizer(max_features=1500)
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, 1].values
Python
# splitting the data set into training set and test set
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
Python
# fitting multinomial naive bayes to the training set
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix

classifier = MultinomialNB()
classifier.fit(X_train, y_train)

# predicting test set results
y_pred = classifier.predict(X_test)

# making the confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
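As a cross-check on the hand calculation, the sketch below trains a scikit-learn model on all five reviews and classifies the example sentence. It is an illustration under stated assumptions: no stop-word removal, stemming, or train/test split is applied, and `CountVectorizer` tokenizes differently from the hand count (for instance, it drops one-letter tokens), so the individual probabilities will not match the table above exactly.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = [
    "I liked the movie",
    "It’s a good movie. Nice story",
    "Nice songs. But sadly boring ending.",
    "Hero’s acting is bad but heroine looks good. Overall nice movie",
    "Sad, boring movie",
]
tags = ["positive", "positive", "negative", "positive", "negative"]

# alpha=1.0 is add-one (Laplace) smoothing, as in the worked example.
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(reviews, tags)

sentence = "overall liked the movie"
predicted_tag = model.predict([sentence])[0]
probabilities = dict(zip(model.classes_, model.predict_proba([sentence])[0]))

print(predicted_tag)
print(probabilities)
```

`predict_proba` returns normalized posteriors, so the two values sum to 1, unlike the unnormalized products compared by hand.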