Fine-tuning BERT model for Sentiment Analysis
Last Updated : 23 Jul, 2025
Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based language model developed by Google for natural language processing pre-training. It has a huge number of parameters, so training it from scratch on a small dataset would lead to overfitting. This is why we start from a pre-trained BERT model that has already been trained on a huge corpus, and "tune" it on the current dataset, i.e. we transfer the learning from that corpus to our data and continue training from that point onwards.
In this article, we fine-tune BERT by freezing its pre-trained layers and adding a few neural network layers of our own on top. The problem statement is to classify sentences as POSITIVE or NEGATIVE using the fine-tuned BERT model.
Preparing the dataset
The sentence column contains the text and the label column contains its sentiment - 0 for negative and 1 for positive. We first load the dataset and then apply some preprocessing before tuning the model.
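For illustration, here is a small made-up sample showing the expected layout of the data (the column names match the code below; the sentences themselves are hypothetical):
Python
# two hypothetical rows illustrating the expected columns: 'sentence' and 'label'
import pandas as pd

example = pd.DataFrame({
    'sentence': ['The movie was absolutely wonderful.',
                 'The service was slow and disappointing.'],
    'label': [1, 0]  # 1 = positive, 0 = negative
})
print(example)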
Loading dataset
Python
!pip install transformers
!pip install torch
import numpy as np
import pandas as pd
import torch
from transformers import BertTokenizerFast, AutoModel
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
from torch import nn
from torch.utils.data import DataLoader
from torch.nn import CrossEntropyLoss
from torch.optim import AdamW
df = pd.read_csv('/content/sentiment_train.csv')
Split dataset:
After loading the data, split it into train, validation and test sets. We use a 70:15:15 ratio for this division. The built-in train_test_split function of sklearn is used below, and the stratify argument ensures that the proportion of the classes remains the same after splitting the data.
Python
from sklearn.model_selection import train_test_split

train_text, temp_text, train_labels, temp_labels = train_test_split(df['sentence'], df['label'],
                                                                    random_state=2021,
                                                                    test_size=0.3,
                                                                    stratify=df['label'])

val_text, test_text, val_labels, test_labels = train_test_split(temp_text, temp_labels,
                                                                random_state=2021,
                                                                test_size=0.5,
                                                                stratify=temp_labels)
Load pre-trained BERT model and tokenizer
Next, we load the pre-trained BERT model and its tokenizer. The tokenizer converts the text into a format (input ids and attention masks) that can be fed to the model.
Python
#load model and tokenizer
bert = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
Deciding the padding length
If we take the padding length as the maximum length found in the training texts, most sequences end up consisting largely of padding; taking the shortest length would instead lose information through truncation. Hence, we plot a histogram of the text lengths and pick a length around the bulk of the distribution as the padding length, as a trade-off between the two extremes.
Python
# plot the distribution of training text lengths (in words)
train_lens = [len(i.split()) for i in train_text]
plt.hist(train_lens)
plt.show()

From the graph above, we take 17 as the padding length.
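The tokenization code that follows assumes this choice is stored in a variable named pad_len:
Python
# padding length chosen from the histogram above
pad_len = 17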
Tokenizing the data
Tokenize the data and encode sequences using the BERT tokenizer.
Python
# tokenize and encode the sequences in the training, validation and test sets
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length=pad_len,
    padding='max_length',
    truncation=True
)
tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length=pad_len,
    padding='max_length',
    truncation=True
)
tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length=pad_len,
    padding='max_length',
    truncation=True
)

# convert the encoded sequences, attention masks and labels to tensors
train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())
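The training and evaluation functions later in the article iterate over train_dataloader and val_dataloader, which the snippets assume already exist. A minimal sketch of how they can be built from the tensors above (the batch size of 32 is an assumption, not specified in the article):
Python
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

batch_size = 32  # assumed batch size

# wrap the tensors into datasets
train_data = TensorDataset(train_seq, train_mask, train_y)
val_data = TensorDataset(val_seq, val_mask, val_y)

# sample training batches randomly, validation batches sequentially
train_dataloader = DataLoader(train_data, sampler=RandomSampler(train_data), batch_size=batch_size)
val_dataloader = DataLoader(val_data, sampler=SequentialSampler(val_data), batch_size=batch_size)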
Defining the model
We first freeze the BERT pre-trained model, and then add layers as shown in the following code snippets:
Python
# freeze the pre-trained BERT layers
for param in bert.parameters():
    param.requires_grad = False

# define the new layers on top of BERT
class BERT_architecture(nn.Module):

    def __init__(self, bert):
        super(BERT_architecture, self).__init__()
        self.bert = bert
        # dropout layer
        self.dropout = nn.Dropout(0.2)
        # relu activation function
        self.relu = nn.ReLU()
        # dense layer 1
        self.fc1 = nn.Linear(768, 512)
        # dense layer 2 (output layer)
        self.fc2 = nn.Linear(512, 2)
        # log-softmax activation function
        self.softmax = nn.LogSoftmax(dim=1)

    # define the forward pass
    def forward(self, sent_id, mask):
        # pass the inputs to BERT; cls_hs is the pooled [CLS] representation
        _, cls_hs = self.bert(sent_id, attention_mask=mask, return_dict=False)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        # output layer
        x = self.fc2(x)
        # apply log-softmax activation
        x = self.softmax(x)
        return x
Then instantiate the model and define an optimizer to update the trainable parameters:
Python
model = BERT_architecture(bert)
optimizer = AdamW(model.parameters(), lr=1e-5)
Then compute class weights and pass them to the loss function, so that any imbalance in the dataset is accounted for when computing the loss.
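The training function below also refers to device and cross_entropy, which are assumed to be defined beforehand. A minimal sketch, assuming scikit-learn's compute_class_weight for the weights and NLLLoss as the loss (which pairs with the LogSoftmax output of the model):
Python
from sklearn.utils.class_weight import compute_class_weight

# use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# compute balanced class weights from the training labels
class_weights = compute_class_weight(class_weight='balanced',
                                     classes=np.unique(train_labels),
                                     y=train_labels)
weights = torch.tensor(class_weights, dtype=torch.float).to(device)

# negative log-likelihood loss, since the model already applies LogSoftmax
cross_entropy = nn.NLLLoss(weight=weights)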
Training the model
After defining the model, define a function to train the model (fine-tune, in this case):
Python
# function to train the model
def train():
    model.train()
    total_loss, total_accuracy = 0, 0
    # empty list to save model predictions
    total_preds = []

    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        # progress update after every 50 batches
        if step % 50 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))
        # push the batch to the device (GPU or CPU)
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        # clear previously calculated gradients
        model.zero_grad()
        # get model predictions for the current batch
        preds = model(sent_id, mask)
        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)
        # add on to the total loss
        total_loss = total_loss + loss.item()
        # backward pass to calculate the gradients
        loss.backward()
        # clip gradients to avoid exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # update parameters
        optimizer.step()
        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()
        # append the model predictions
        total_preds.append(preds)

    # compute the average training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)
    # concatenate the per-batch predictions into (no. of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)

    # return the loss and predictions
    return avg_loss, total_preds
Now, define another function that would evaluate the model on validation data.
Python
# function for evaluating the model
def evaluate():
    print("\nEvaluating...")
    # deactivate dropout layers
    model.eval()
    total_loss, total_accuracy = 0, 0
    # empty list to save the model predictions
    total_preds = []

    # iterate over batches
    for step, batch in enumerate(val_dataloader):
        # progress update after every 50 batches
        if step % 50 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(val_dataloader)))
        # push the batch to the device (GPU or CPU)
        batch = [t.to(device) for t in batch]
        sent_id, mask, labels = batch
        # deactivate autograd
        with torch.no_grad():
            # model predictions
            preds = model(sent_id, mask)
            # compute the validation loss between actual and predicted values
            loss = cross_entropy(preds, labels)
            total_loss = total_loss + loss.item()
            preds = preds.detach().cpu().numpy()
            total_preds.append(preds)

    # compute the average validation loss of the epoch
    avg_loss = total_loss / len(val_dataloader)
    # concatenate the predictions into (no. of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)
    return avg_loss, total_preds
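The article does not show the epoch loop that ties these two functions together. A minimal sketch of one, assuming 10 epochs and saving the weights with the lowest validation loss to an illustrative file named saved_weights.pt:
Python
epochs = 10  # assumed number of epochs
best_valid_loss = float('inf')

for epoch in range(epochs):
    print('\nEpoch {:} / {:}'.format(epoch + 1, epochs))

    # train and evaluate for one epoch
    train_loss, _ = train()
    valid_loss, _ = evaluate()

    # keep a copy of the best-performing weights (file name is illustrative)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    print(f'Training Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')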
Testing the model
After fine-tuning the model, test it on the test dataset. Print a classification report to get a better picture of the model's performance.
Python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# move the model to the selected device (GPU or CPU)
model.to(device)

# get predictions for the test data
with torch.no_grad():
    # move the data to the device and perform prediction
    preds = model(test_seq.to(device), test_mask.to(device))
    # move the predictions to CPU for further processing
    preds = preds.detach().cpu().numpy()

# convert predictions to class labels (0 or 1 for binary classification)
pred = np.argmax(preds, axis=1)

# print the classification report
print(classification_report(test_y, pred))
After testing, we would get the results as follows:
Classification report
Link to the full code is here.