Exp 10 Sentiment Analysis BERT
[]:
pip install transformers
[]:
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: transformers in /usr/local/lib/python3.9/dist-packages (4.28.1)
Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in /usr/local/lib/python3.9/dist-packages (from transformers) (0.13.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/dist-packages (from transformers) (23.1)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.9/dist-packages (from transformers) (0.13.3)
Requirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from transformers) (3.11.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (2022.10.31)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (1.22.4)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.9/dist-packages (from transformers) (6.0)
Requirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from transformers) (2.27.1)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.9/dist-packages (from transformers) (4.65.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.9/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers) (4.5.0)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2022.12.7)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (1.26.15)
We will now load the pre-trained BERT tokenizer and sequence classifier, as well as InputExample and InputFeatures.
Then we will build our model with the sequence classifier and our tokenizer with BERT's pre-trained tokenizer.
[]:
from transformers import BertTokenizer, TFBertForSequenceClassification
from transformers import InputExample, InputFeatures
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
[]:
All model checkpoint layers were used when initializing TFBertForSequenceClassification.
Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[]:
model.summary()
[]:
Model: "tf_bert_for_sequence_classification_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
bert (TFBertMainLayer)       multiple                  109482240

dropout (Dropout)            multiple                  0

classifier (Dense)           multiple                  1538

=================================================================
Total params: 109,483,778
Trainable params: 109,483,778
Non-trainable params: 0
_________________________________________________________________
Here are the results. We have the main BERT model, a dropout layer to prevent overfitting, and finally a dense layer for the classification task:
Now that we have our model, let’s create our input sequences from the IMDB reviews dataset:
IMDB Dataset
The IMDB Reviews dataset is a large movie review dataset collected and prepared by Andrew L. Maas from the popular movie rating service, IMDB.
The IMDB Reviews dataset is used for binary sentiment classification, whether a review is positive or negative.
It contains 25,000 movie reviews for training and 25,000 for testing. All these 50,000 reviews are labeled data that may be used for supervised deep learning.
Initial Imports
We will first have two imports: TensorFlow and Pandas.
[]:
import tensorflow as tf
import pandas as pd
Get the Data from the Stanford Repo
Then, we can download the dataset from Stanford's relevant directory with the tf.keras.utils.get_file function, as shown below:
[]:
URL = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
dataset = tf.keras.utils.get_file(fname="aclImdb_v1.tar.gz",
                                  origin=URL,
                                  untar=True,
                                  cache_dir='.',
                                  cache_subdir='')
Remove Unlabeled Reviews
To remove the unlabeled reviews, we need the following operations.
[]:
# The shutil module offers a number of high-level
# operations on files and collections of files.
import os
import shutil
# Create main directory path ("/aclImdb")
main_dir = os.path.join(os.path.dirname(dataset), 'aclImdb')
# Create sub directory path ("/aclImdb/train")
train_dir = os.path.join(main_dir, 'train')
# Remove unsup folder since this is a supervised learning task
remove_dir = os.path.join(train_dir, 'unsup')
shutil.rmtree(remove_dir)
# View the final train folder
print(os.listdir(train_dir))
[]:
['unsupBow.feat', 'urls_unsup.txt', 'urls_neg.txt', 'neg', 'labeledBow.feat', 'pos', 'urls_pos.txt']
Now that we have our data cleaned and prepared, we can create the datasets with text_dataset_from_directory using the following lines.
I want to process the entire data in a single batch, which is why I selected a very large batch size:
[]:
# We create a training dataset and a validation
# dataset from our "aclImdb/train" directory with an 80/20 split.
train = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train', batch_size=30000, validation_split=0.2,
    subset='training', seed=123)
test = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train', batch_size=30000, validation_split=0.2,
    subset='validation', seed=123)
[]:
Found 25000 files belonging to 2 classes.
Using 20000 files for training.
Found 25000 files belonging to 2 classes.
Using 5000 files for validation.
Now that we have our basic train and test datasets, I want to prepare them for our BERT model.
To make it more comprehensible, I will create a pandas DataFrame from each TensorFlow Dataset object.
The following code converts our train Dataset object into a train pandas DataFrame:
[]:
for i in train.take(1):
    train_feat = i[0].numpy()
    train_lab = i[1].numpy()
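# The Dataset yields raw byte strings and integer labels; build the pandas
# DataFrame that the text above refers to. This construction is not shown in
# the original cell, so treat it as an assumption; the column names match the
# DATA_COLUMN / LABEL_COLUMN constants used further below.
train = pd.DataFrame([train_feat, train_lab]).T
train.columns = ['DATA_COLUMN', 'LABEL_COLUMN']
train['DATA_COLUMN'] = train['DATA_COLUMN'].str.decode('utf-8')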
I will do the same operations for the test dataset with the following lines:
[]:
for j in test.take(1):
    test_feat = j[0].numpy()
    test_lab = j[1].numpy()
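# Same assumed DataFrame construction for the test split.
test = pd.DataFrame([test_feat, test_lab]).T
test.columns = ['DATA_COLUMN', 'LABEL_COLUMN']
test['DATA_COLUMN'] = test['DATA_COLUMN'].str.decode('utf-8')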
Creating Input Sequences
We have two pandas DataFrame objects waiting for us to convert them into objects suitable for the BERT model.
We will take advantage of the InputExample class, which helps us create sequences from our dataset.
[]:
InputExample(guid=None,
             text_a="Hello, world",
             text_b=None,
             label=1)
[]:
InputExample(guid=None, text_a='Hello, world', text_b=None, label=1)
1 — convert_data_to_examples: this accepts our train and test DataFrames and converts each row into an InputExample object.
2 — convert_examples_to_tf_dataset: this function tokenizes the InputExample objects, creates the required input format from the tokenized objects, and finally builds an input dataset that we can feed to the model.
[]:
def convert_data_to_examples(train, test, DATA_COLUMN, LABEL_COLUMN):
    train_InputExamples = train.apply(lambda x: InputExample(guid=None,  # Globally unique ID for bookkeeping, unused in this case
                                                             text_a=x[DATA_COLUMN],
                                                             text_b=None,
                                                             label=x[LABEL_COLUMN]), axis=1)
    validation_InputExamples = test.apply(lambda x: InputExample(guid=None,
                                                                 text_a=x[DATA_COLUMN],
                                                                 text_b=None,
                                                                 label=x[LABEL_COLUMN]), axis=1)
    return train_InputExamples, validation_InputExamples

# Signature reconstructed from the function body; the max_length default of 128
# matches the max_length used in the prediction cell below.
def convert_examples_to_tf_dataset(examples, tokenizer, max_length=128):
    features = []  # holds the InputFeatures that are turned into a tf.data.Dataset below

    for e in examples:
        # Documentation is really strong for this method, so please take a look at it
        input_dict = tokenizer.encode_plus(
            e.text_a,
            add_special_tokens=True,
            max_length=max_length,  # truncates if len(s) > max_length
            return_token_type_ids=True,
            return_attention_mask=True,
            pad_to_max_length=True,  # pads to the right by default (deprecated; see the warning below)
            truncation=True
        )

        input_ids, token_type_ids, attention_mask = (input_dict["input_ids"],
                                                     input_dict["token_type_ids"],
                                                     input_dict["attention_mask"])

        features.append(
            InputFeatures(
                input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, label=e.label
            )
        )

    def gen():
        for f in features:
            yield (
                {
                    "input_ids": f.input_ids,
                    "attention_mask": f.attention_mask,
                    "token_type_ids": f.token_type_ids,
                },
                f.label,
            )

    return tf.data.Dataset.from_generator(
        gen,
        ({"input_ids": tf.int32, "attention_mask": tf.int32, "token_type_ids": tf.int32}, tf.int64),
        (
            {
                "input_ids": tf.TensorShape([None]),
                "attention_mask": tf.TensorShape([None]),
                "token_type_ids": tf.TensorShape([None]),
            },
            tf.TensorShape([]),
        ),
    )

DATA_COLUMN = 'DATA_COLUMN'
LABEL_COLUMN = 'LABEL_COLUMN'
We can call the functions we created above with the following lines:
[]:
train_InputExamples, validation_InputExamples = convert_data_to_examples(train, test, DATA_COLUMN, LABEL_COLUMN)
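# The deprecation warning below comes from encode_plus, so this cell must also
# have converted the InputExamples into tf.data datasets before training. The
# exact call is not shown above, so the shuffle buffer and batch sizes here are
# assumptions based on the usual workflow around convert_examples_to_tf_dataset.
train_data = convert_examples_to_tf_dataset(list(train_InputExamples), tokenizer)
train_data = train_data.shuffle(100).batch(32).repeat(2)
validation_data = convert_examples_to_tf_dataset(list(validation_InputExamples), tokenizer)
validation_data = validation_data.batch(32)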
[]:
/usr/local/lib/python3.9/dist-packages/transformers/tokenization_utils_base.py:2354: FutureWarning: The `pad_to_max_length` argument
is deprecated and will be removed in a future version, use `padding=True` or `padding='longest'` to pad to the longest sequence in
the batch, or use `padding='max_length'` to pad to a max length. In this case, you can give a specific length with `max_length` (e.g.
`max_length=45`) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
warnings.warn(
Our datasets containing the processed input sequences are ready to be fed to the model.
[]:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])
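# The training call is missing from this cell; the "Epoch 1/2" log below implies
# the model was fit for two epochs on the datasets built above. Treat this line
# as a reconstruction rather than the original code.
model.fit(train_data, epochs=2, validation_data=validation_data)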
[]:
Epoch 1/2
611/Unknown - 590s 874ms/step - loss: 0.3459 - accuracy: 0.8454
Making Predictions
I created a list of 10 reviews; a few are clearly positive and a few are clearly negative.
[]:
pred_sentences = [
    'This was an awesome movie. I watch it twice my time watching this beautiful movie if I have known it was this good',
    'One of the worst movies of all time. I cannot believe I wasted two hours of my life for this movie',
    'Avatar The Way Of Water movie review: Avatar 2 is just stunning in the parts it skims along the water, dives deep',
    'After 11 years, the Jackass crew is back for another crusade.',
    'the movie is not so good',
    'i liked the movie but i dont recommend it to anyone',
    'its just one time watch',
    'the movie was very lengthy not recommended',
    'the movie was horrible and disgusting',
    'S. S. Rajamoulis magnum opus epic war saga set new high for Indian Cinema across globe. Action sequences war scenes'
]
len(pred_sentences)
We need to tokenize our reviews with our pre-trained BERT tokenizer. We will then feed these tokenized sequences to our model and run a final softmax layer to get the predictions.
We can then use the argmax function to determine whether the sentiment prediction for each review is positive or negative. Finally, we will print out the results with a simple for loop.
The following lines perform all of these operations:
[]:
tf_batch = tokenizer(pred_sentences, max_length=128, padding=True, truncation=True, return_tensors='tf')
tf_outputs = model(tf_batch)
tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)
labels = ['Negative','Positive']
label = tf.argmax(tf_predictions, axis=1)
label = label.numpy()
for i in range(len(pred_sentences)):
    print(pred_sentences[i], ": \n", labels[label[i]])