A Simple Guide On Using BERT For Binary Text Classification
Update Notice II
Please consider using the Simple Transformers library as it is easy to use, feature-
packed, and regularly updated. The article still stands as a reference to BERT models
and is likely to be helpful with understanding how BERT works. However, Simple
Transformers offers a lot more features, much more straightforward tuning options, all
the while being quick and easy to use! The links below should help you get started
quickly.
1. Binary Classification
2. Multi-Class Classification
3. Multi-Label Classification
5. Question Answering
7. Conversational AI
Update Notice I
In light of the update to the library used in this article (HuggingFace updated the
pytorch-pretrained-bert library to pytorch-transformers ), I have written a new guide as
well as a new repo. If you are starting out with Transformer models, I recommend
using those as the code has been cleaned up both on my end and in the Pytorch-
Transformers library, greatly streamlining the whole process. The new repo also
supports XLNet, XLM, and RoBERTa models out of the box, in addition to BERT, as of
September 2019.
1. Intro
Let’s talk about what we are going to (and not going to) do.
Before we begin, let me point you towards the GitHub repo containing all the code used in this guide. All the code in the repo is included in the guide here, and vice versa. Feel free to refer to it at any time, or clone the repo to follow along with the guide.
If your internet wanderings have led you here, I guess it’s safe to assume that you have
heard of BERT, the powerful new language representation model, open-sourced by
Google towards the end of 2018. If you haven’t, or if you’d like a refresher, I recommend
giving their paper a read as I won’t be going into the technical details of how BERT
works. If you are unfamiliar with the Transformer model (or if words like “attention”,
“embeddings”, and “encoder-decoder” sound scary), check out this brilliant article by
Jay Alammar. You don’t necessarily need to know everything about BERT (or
Transformers) to follow the rest of this guide, but the above links should help if you wish
to learn more about BERT and Transformers.
Now that we’ve gotten what we won’t do out of the way, let’s dig into what we will do,
shall we?
Getting BERT downloaded and set up. We will be using the PyTorch version
provided by the amazing folks at Hugging Face.
Converting a dataset in the .csv format to the .tsv format that BERT knows and loves.
Loading the .tsv files into a notebook and converting the text representations to a
feature representation (think numerical) that the BERT model can work with.
One last thing before we dig in: I'll be using three Jupyter Notebooks, one each for data preparation, training, and evaluation. It's not strictly necessary, but it felt cleaner to separate those three processes.
2. Getting set up
Time to get BERT up and running.
1. Create a virtual environment with the required packages. You can use any
package/environment manager, but I’ll be using Conda.
conda create -n bert python pytorch pandas tqdm
(Note: If you run into any missing-package errors while following the guide, go ahead and install them using your package manager. A Google search should tell you how to install a specific package.)
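The guide also leans on Hugging Face's pytorch-pretrained-bert package (you'll see it in the imports and in the log messages later on) and on scikit-learn for the evaluation metric. A minimal sketch of pulling those in, assuming pip is available inside the activated conda environment:

conda activate bert
pip install pytorch-pretrained-bert scikit-learn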
3. To do text classification, we’ll obviously need a text classification dataset. For this
guide, I’ll be using the Yelp Reviews Polarity dataset which you can find here on
fast.ai. (Direct download link for any lazy asses, I mean busy folks.)
Decompress the downloaded file to get the train.csv and test.csv files. For reference, the path to my train.csv file is <starting_directory>/data/train.csv
3. Preparing data
Before we can cook the meal, we need to prepare the ingredients! (Or
something like that. <Insert proper analogy here>)
Most datasets you find will typically come in the csv format and the Yelp Reviews dataset
is no exception. Let’s load it in with pandas and take a look.
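The loading itself boils down to something like this (a sketch, assuming the csv files sit under data/ as described above; the files have no header row, so header=None matters):

import pandas as pd

# The Yelp Reviews Polarity csv files have no header row:
# column 0 is the label (1 = bad, 2 = good), column 1 is the review text.
train_df = pd.read_csv('data/train.csv', header=None)
test_df = pd.read_csv('data/test.csv', header=None)

train_df.head()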
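For reference, here is the InputExample class that the explanation below refers to. This is the standard definition from the HuggingFace run_classifier example (reproduced here as a sketch; the copy in the repo may differ cosmetically):

class InputExample(object):
    """A single training/test example for simple sequence classification."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid        # unique id for the example
        self.text_a = text_a    # the review text
        self.text_b = text_b    # unused for single-sequence classification
        self.label = label      # the label, as a string ("0" or "1")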
62 """Processor for binary classification dataset."""
63
64 def get_train_examples(self, data_dir):
65 """See base class."""
66 return self._create_examples(
67 self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")
68
69 def get_dev_examples(self, data_dir):
70 """See base class."""
71 return self._create_examples(
72 self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
73
74 def get_labels(self):
75 """See base class."""
76 return ["0", "1"]
77
78 def create examples(self lines set type):
The first class, InputExample, is the format that a single example of our dataset should
be in. We won’t be using the text_b attribute since that is not necessary for our binary
classification task. The other attributes should be fairly self-explanatory.
The BinaryClassificationProcessor class can read in the train.tsv and dev.tsv files
and convert them into lists of InputExample objects.
So far, we have the capability to read in tsv datasets and convert them into
InputExample objects. BERT, being a neural network, cannot directly deal with text as
we have in InputExample objects. The next step is to convert them into InputFeatures.
BERT has a constraint on the maximum length of a sequence after tokenizing. For any
BERT model, the maximum sequence length after tokenization is 512. But we can set
any sequence length equal to or below this value. For faster training, I’ll be using 128 as
the maximum sequence length. A bigger number may give better results if there are
sequences longer than this value.
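To make the 128-token cap concrete, here is roughly what happens to a single review (a sketch; BertTokenizer comes from pytorch_pretrained_bert, and the review text is made up):

from pytorch_pretrained_bert import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)

review = "The food was great but the service was painfully slow."
tokens = tokenizer.tokenize(review)   # WordPiece tokens
tokens = tokens[:128 - 2]             # leave room for the [CLS] and [SEP] tokens
input_ids = tokenizer.convert_tokens_to_ids(["[CLS]"] + tokens + ["[SEP]"])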
As you can see, the data is in the two csv files train.csv and test.csv . They contain no
headers, and two columns for the label and the text. The labels used here feel a little
weird to me, as they have used 1 and 2 instead of the typical 0 and 1. Here, a label of 1
means the review is bad, and a label of 2 means the review is good. I’m going to change
this to the more familiar 0 and 1 labelling, where a label 0 indicates a bad review, and a
label 1 indicates a good review.
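The swap is a one-liner per dataframe (a sketch; the exact line in my notebook may differ slightly):

# Map the original labels: 2 (good) -> 1, 1 (bad) -> 0
train_df[0] = (train_df[0] == 2).astype(int)
test_df[0] = (test_df[0] == 2).astype(int)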
BERT, however, wants data to be in a tsv file with a specific format as given below (four columns, and no header row).
Column 0: An ID for the row.
Column 1: The label for the row (as an int, which is why we switched to 0 and 1).
Column 2: A column of the same letter for all rows. BERT wants this so we'll give it, but we don't have a use for it.
Column 3: The text for the row.
An InputFeature consists of purely numerical data (with the proper sequence lengths) that can be fed directly into the BERT model. It is prepared by tokenizing the text of each example, truncating longer sequences and padding shorter ones to the given maximum sequence length (128). I found the conversion of InputExample objects to InputFeature objects to be quite slow by default, so I modified the conversion code to use Python's multiprocessing library and significantly speed up the process.
class InputFeatures(object):
    """A single set of features of data."""

    def __init__(self, input_ids, input_mask, segment_ids, label_id):
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids
        self.label_id = label_id


def _truncate_seq_pair(tokens_a, tokens_b, max_length):
    """Truncates a sequence pair in place to the maximum length."""

    # This is a simple heuristic which will always truncate the longer sequence
    # one token at a time. This makes more sense than truncating an equal percent
    # of tokens from each, since if one sequence is very short then each token
    # that's truncated likely contains more information than a longer sequence.
    while True:
        total_length = len(tokens_a) + len(tokens_b)
        if total_length <= max_length:
            break
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()


def convert_example_to_feature(example_row):
    # Each example_row bundles the example with everything a worker process needs.
    example, label_map, max_seq_length, tokenizer, output_mode = example_row

    tokens_a = tokenizer.tokenize(example.text_a)

    tokens_b = None
    if example.text_b:
        tokens_b = tokenizer.tokenize(example.text_b)
    # ... (the rest of the function truncates/pads the tokens, maps them to
    # input_ids, input_mask and segment_ids, looks up the label_id, and
    # returns an InputFeatures object; see the repo for the full version)
train_df_bert = pd.DataFrame({
    'id': range(len(train_df)),
    'label': train_df[0],
    'alpha': ['a'] * train_df.shape[0],
    'text': train_df[1].replace(r'\n', ' ', regex=True)
})
# The same construction is applied to test_df to build the dev dataframe.

train_df_bert.head()

Out[17]:   id  label  alpha  text
For convenience, I've named the test data as dev data. The convenience stems from the fact that BERT comes with data loading classes that expect train and dev files in the above format. We can use the train data to train our model, and the dev data to evaluate its performance. BERT's data loading classes can also use a test file, but they expect it to be unlabelled. Therefore, I will be using the train and dev files instead.
Now that we have the data in the correct form, all we need to do is to save the train and
dev data as .tsv files.
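Saving is just pandas' to_csv with a tab separator (a sketch; I'm assuming the dev dataframe was built from test_df the same way as train_df_bert above and named dev_df_bert):

# BERT's loading classes expect tab-separated files with no header and no index column.
train_df_bert.to_csv('data/train.tsv', sep='\t', index=False, header=False)
dev_df_bert.to_csv('data/dev.tsv', sep='\t', index=False, header=False)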
That’s the eggs beaten, the chicken thawed, and the veggies sliced. Let’s get cooking!
4. Data to Features
The final step before fine-tuning is to convert the data into features that BERT
uses. Most of the remaining code was adapted from the HuggingFace example
run_classifier.py, found here.
Now, we will see the reason for us rearranging the data into the .tsv format in the
previous section. It enables us to easily reuse the example classes that come with BERT
for our own binary classification task. Here’s how they look.
First, let’s import all the packages that we’ll need, and then get our paths straightened
out.
(Tip: The model will be downloaded into a temporary folder. Find the folder by following the
path printed on the output once the download completes and copy the downloaded file to the
cache/ directory. The file should be a compressed file in .tar.gz format. Next time, you can
just use this downloaded file without having to download it all over again. All you need to do
is comment out the line that downloaded the model, and uncomment the line below it.)
We just need to do a tiny bit more configuration for the training. Here, I’m just using the
default parameters.
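For reference, the configuration amounts to a handful of constants plus the BertAdam optimizer from pytorch_pretrained_bert. A sketch with typical values (the exact numbers I used are in the training notebook; train_examples is the list returned by the processor):

from pytorch_pretrained_bert.optimization import BertAdam

TRAIN_BATCH_SIZE = 24
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 1
WARMUP_PROPORTION = 0.1

num_train_optimization_steps = int(
    len(train_examples) / TRAIN_BATCH_SIZE) * NUM_TRAIN_EPOCHS

optimizer = BertAdam(model.parameters(),
                     lr=LEARNING_RATE,
                     warmup=WARMUP_PROPORTION,
                     t_total=num_train_optimization_steps)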
Training time!
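The loop itself is plain PyTorch; with pytorch_pretrained_bert, calling the model with the label ids makes it return the loss directly. A minimal sketch, assuming all_input_ids, all_input_mask, all_segment_ids and all_label_ids are tensors built from the list of InputFeatures:

import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler
from tqdm import tqdm

train_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids)
train_dataloader = DataLoader(train_data,
                              sampler=RandomSampler(train_data),
                              batch_size=TRAIN_BATCH_SIZE)

model.train()
for _ in range(NUM_TRAIN_EPOCHS):
    for batch in tqdm(train_dataloader, desc="Iteration"):
        batch = tuple(t.to(device) for t in batch)
        input_ids, input_mask, segment_ids, label_ids = batch

        # With labels passed in, BertForSequenceClassification returns the loss.
        loss = model(input_ids, segment_ids, input_mask, label_ids)
        loss.backward()

        optimizer.step()
        optimizer.zero_grad()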
Now that we've trained the BERT model for one epoch, we can evaluate the results. Of course, more training will likely yield better results, but even one epoch should be sufficient as a proof of concept (hopefully!).
In order to be able to easily load our fine-tuned model, we should save it in a specific
way, i.e. the same way the default BERT models are saved. Here is how you can do that.
Go into the outputs/yelp directory where the fine-tuned models will be saved. There, you should find three files: config.json, pytorch_model.bin, and vocab.txt.
Archive the two files config.json and pytorch_model.bin into a .tar file (I use 7zip for archiving).
Compress the .tar file into gzip format. The file should now be something like yelp.tar.gz. (A scripted version of these two steps follows below.)
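If you'd rather script the archiving than click through 7zip, Python's tarfile module gets you the same result (a sketch; the paths assume the outputs/yelp directory mentioned above):

import tarfile

# Bundle the fine-tuned weights and config the same way the stock BERT archives are packaged.
with tarfile.open('outputs/yelp/yelp.tar.gz', 'w:gz') as archive:
    archive.add('outputs/yelp/config.json', arcname='config.json')
    archive.add('outputs/yelp/pytorch_model.bin', arcname='pytorch_model.bin')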
6. Evaluation
Time to see what our fine-tuned model can do. (We’ve cooked the meal, let’s see
how it tastes.)
(Note: I’m switching to the evaluation notebook)
Most of the code for the evaluation is very similar to the training process, so I won’t go
into too much detail but I’ll list some important points.
The tokenizer should be loaded from the vocabulary file created in the training stage. In my case, that would be outputs/yelp/vocab.txt (or the path can be set as OUTPUT_DIR + 'vocab.txt'; see the sketch just below).
Double check to make sure you are loading the fine-tuned model and not the
original BERT model. 😅
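Concretely, the two loading calls look something like this (a sketch; OUTPUT_DIR would be outputs/yelp/ here, and from_pretrained can read the .tar.gz archive we created above):

from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

# The vocabulary saved during training, not the stock one.
tokenizer = BertTokenizer.from_pretrained(OUTPUT_DIR + 'vocab.txt', do_lower_case=False)

# The fine-tuned weights, not the original pre-trained checkpoint.
model = BertForSequenceClassification.from_pretrained(OUTPUT_DIR + 'yelp.tar.gz',
                                                      cache_dir=CACHE_DIR,
                                                      num_labels=2)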
With just a single epoch of training, our BERT model achieves a 0.914 Matthews correlation coefficient (a good measure for evaluating unbalanced datasets; see the sklearn docs here). With more training, and perhaps some hyperparameter tuning, we can almost certainly improve upon what is already an impressive score.
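For reference, the metric is a single call in scikit-learn once the evaluation loop has collected the predicted and true labels (a sketch; preds and label_ids stand for whatever arrays your loop produces):

from sklearn.metrics import matthews_corrcoef

mcc = matthews_corrcoef(label_ids, preds)
print(f"MCC: {mcc:.3f}")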
7. Conclusion
BERT is an incredibly powerful language representation model that shows great promise
in a wide variety of NLP tasks. Here, I’ve tried to give a basic guide to how you might use
it for binary text classification.
As the results show, BERT is a very effective tool for binary text classification, not to
mention all the other tasks it has already been used for.
Reminder: Github repo with all the code can be found here.
In [2]: # The input data dir. Should contain the .tsv files (or other data)
In the first cell, we are importing the necessary packages. In the next cell, we are setting
some paths for where files should be stored and where certain files can be found. We are
also setting some configuration options for the BERT model. Finally, we will create the
directories if they do not already exist.
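In outline, that cell looks something like the following (a sketch of my setup; the constant names follow the HuggingFace example, and I'm assuming the cased base model, consistent with the cased_base_bert_pytorch archive mentioned later):

import os
import pickle
from multiprocessing import Pool, cpu_count

import pandas as pd
import torch
from tqdm import tqdm
from pytorch_pretrained_bert import BertTokenizer

# The input data dir. Should contain the .tsv files for the task.
DATA_DIR = 'data/'

# The pre-trained BERT model to fine-tune.
BERT_MODEL = 'bert-base-cased'

# The name of the task, used to keep outputs for different tasks separate.
TASK_NAME = 'yelp'

# Where the fine-tuned model and evaluation results will be written.
OUTPUT_DIR = f'outputs/{TASK_NAME}/'

# Where the downloaded pre-trained model will be cached.
CACHE_DIR = 'cache/'

# Maximum sequence length after tokenization (see the earlier discussion).
MAX_SEQ_LENGTH = 128

for path in (OUTPUT_DIR, CACHE_DIR):
    os.makedirs(path, exist_ok=True)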
Next, we will use our BinaryClassificationProcessor to load in the data, and get
everything ready for the tokenization step.
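That step boils down to something like this (a sketch; the tuple layout matches what convert_example_to_feature unpacks in the code shown earlier):

tokenizer = BertTokenizer.from_pretrained(BERT_MODEL, do_lower_case=False)

processor = BinaryClassificationProcessor()
train_examples = processor.get_train_examples(DATA_DIR)
label_list = processor.get_labels()                       # ["0", "1"]
label_map = {label: i for i, label in enumerate(label_list)}

# Bundle each example with everything a worker process will need.
examples_for_processing = [
    (example, label_map, MAX_SEQ_LENGTH, tokenizer, 'classification')
    for example in train_examples
]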
Now, we can use the multi-core goodness of modern CPUs to process the examples (relatively) quickly. My Ryzen 7 2700X took about one and a half hours for this part.
Your notebook should show the progress of the processing rather than the ‘HBox’ thing I have here. It’s an
issue with uploading the notebook to Gist.
(Note: If you have any issues getting the multiprocessing to work, just copy paste all the code
up to, and including, the multiprocessing into a python script and run it from the command
line or an IDE. Jupyter Notebooks can sometimes get a little iffy with multiprocessing. I’ve
included an example script on github named converter.py )
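The multiprocessing part itself is only a few lines with Pool and tqdm (a sketch; process_count simply leaves one core free):

process_count = cpu_count() - 1

with Pool(process_count) as p:
    train_features = list(tqdm(p.imap(convert_example_to_feature, examples_for_processing),
                               total=len(examples_for_processing)))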
Once all the examples are converted into features, we can pickle them to disk for
safekeeping (I, for one, do not want to run the processing for another one and a half
hours). Next time, you can just unpickle the file to get the list of features.
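Pickling and unpickling are each a couple of lines (a sketch; the file name is just illustrative):

# Save the converted features so the 90-minute conversion never has to run again.
with open(DATA_DIR + 'train_features.pkl', 'wb') as f:
    pickle.dump(train_features, f)

# Later (e.g. in the training notebook), just load them back.
with open(DATA_DIR + 'train_features.pkl', 'rb') as f:
    train_features = pickle.load(f)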
Well, that was a lot of data preparation. You deserve a coffee, I’ll see you for the training
part in a bit. (Unless you already had your coffee while the processing was going on. In
which case, kudos to efficiency!)
# model = BertForSequenceClassification.from_pretrained(CACHE_DIR + 'cased_base_bert_pytorch.tar.gz', cache_dir=CACHE_DIR, num_labels=num_labels)
1%|▌          | 3306496/404400730 [00:19<08:08, 820603.15B/s]
In [11]: model.to(device)
Out[11]: BertForSequenceClassification(
(bert): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(28996, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
          ...
Don't panic if you see the following output once the model is downloaded. I know it looks panic-inducing, but this is actually the expected behaviour. The weights reported as not initialized belong to the new classification layer, which is supposed to start from scratch and be learned during fine-tuning.
INFO:pytorch_pretrained_bert.modeling:Weights of
BertForSequenceClassification not initialized from pretrained model:
['classifier.weight', 'classifier.bias']
INFO:pytorch_pretrained_bert.modeling:Weights from pretrained model
not used in BertForSequenceClassification: ['cls.predictions.bias',
'cls.predictions.transform.dense.weight',
'cls.predictions.transform.dense.bias',
'cls.predictions.decoder.weight', 'cls.seq_relationship.weight',
'cls.seq_relationship.bias',
'cls.predictions.transform.LayerNorm.weight',
'cls.predictions.transform.LayerNorm.bias']