
Transformers

Hugging Face’s Transformers Library

Transformers are a type of neural network architecture introduced in the paper “Attention is All You Need” by
Vaswani et al. in 2017. They are highly efficient for NLP tasks because they can process input sequences in
parallel using a mechanism called self-attention. Unlike traditional recurrent neural networks (RNNs), which
process input sequentially, Transformers model dependencies between all words in a sentence at once, allowing
them to capture long-range dependencies and relationships.


Popular models like BERT (Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-
trained Transformer), and DistilBERT are based on the Transformer architecture.
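
As a quick, hedged illustration, the sketch below loads one checkpoint from each of these model families from the Hugging Face Hub using the generic Auto* classes. The checkpoint names (bert-base-uncased, gpt2, distilbert-base-uncased) are common public checkpoints chosen here for illustration; any compatible checkpoint would work.

from transformers import AutoTokenizer, AutoModel

# Download (or load from cache) one pre-trained checkpoint per model family.
for checkpoint in ["bert-base-uncased", "gpt2", "distilbert-base-uncased"]:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    print(checkpoint, "->", model.config.model_type)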


Hugging Face’s Transformers library is a powerful tool for working with Transformer-based models, which are
widely used in natural language processing (NLP) tasks like text classification, sentiment analysis, translation,
and more. Here's an introduction to Transformers and how you can fine-tune pre-trained models such as
DistilBERT for tasks like text classification and sentiment analysis on your custom dataset.
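
Before any fine-tuning, the library’s pipeline API already gives you a working sentiment classifier out of the box. The following is a minimal sketch; calling pipeline("sentiment-analysis") with no model argument downloads a default English sentiment checkpoint.

from transformers import pipeline

# Load a ready-made sentiment-analysis pipeline (uses a default pre-trained checkpoint).
classifier = pipeline("sentiment-analysis")

# Run inference on a couple of example sentences.
print(classifier(["I love this movie!", "The service was terrible."]))
# Each result is a dict with a 'label' (e.g., POSITIVE/NEGATIVE) and a confidence 'score'.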

Dr. Ashaq Hussain Bhat



A Transformer is a deep learning architecture introduced in the seminal 2017 paper “Attention is
All You Need” by Vaswani et al. The core innovation of the Transformer is the self-attention
mechanism, which allows the model to weigh the importance of different words in a sentence
relative to each other, regardless of their distance. Unlike recurrent neural networks (RNNs) or
long short-term memory networks (LSTMs), which process tokens sequentially, Transformers
process tokens in parallel, making them more efficient for training on large datasets.
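
To make self-attention concrete, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of the Transformer. This is a toy illustration of the idea, not the library’s internal implementation.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # how much each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                         # weighted sum of value vectors

# Toy example: 3 tokens, each represented by a 4-dimensional vector.
x = np.random.rand(3, 4)
out = scaled_dot_product_attention(x, x, x)                    # self-attention: Q = K = V = x
print(out.shape)                                               # (3, 4): one updated vector per token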
Key Models built on the Transformer architecture:

• BERT (Bidirectional Encoder Representations from Transformers): Designed to understand the
context of a word in relation to the entire sentence, reading text in both directions (left-to-right and
right-to-left).

• GPT (Generative Pre-trained Transformer): Developed for generating text, it's optimized for tasks
like text completion, summarization, and dialogue generation.

• DistilBERT: A smaller, faster version of BERT, offering comparable performance with fewer
parameters.

Pre-trained Models for Text Classification


Hugging Face provides a variety of pre-trained models that you can directly use or fine-tune for
specific NLP tasks. A common use case is text classification or sentiment analysis, where you want to
assign a label to a given text.


DistilBERT is one of the popular pre-trained models for such tasks. It is a smaller and faster version of
BERT, with 40% fewer parameters while retaining 97% of BERT’s performance. It’s a great choice
when you need efficiency without sacrificing much accuracy.
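
As a rough sanity check of that size difference, the snippet below loads both checkpoints and compares their parameter counts (num_parameters() is available on all Hugging Face models; exact counts can vary slightly between versions).

from transformers import AutoModel

bert = AutoModel.from_pretrained("bert-base-uncased")
distilbert = AutoModel.from_pretrained("distilbert-base-uncased")

print("BERT parameters:      ", bert.num_parameters())
print("DistilBERT parameters:", distilbert.num_parameters())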



Using the Transformers Library for Text Classification (DistilBERT)
a) Install the Transformers library and required dependencies:
pip install transformers datasets
b) Load a pre-trained model and tokenizer:
The tokenizer converts text into a numerical format that the model can process, and the model is the pre-trained
Transformer itself.
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
from transformers import Trainer, TrainingArguments
from datasets import load_dataset
# Load pre-trained tokenizer and model for sequence classification
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)  # 2 labels for binary classification
c) Preprocess the dataset:
You can use the datasets library from Hugging Face to load and preprocess your custom dataset. You need to tokenize your text data using
the tokenizer.


def preprocess_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)
# Example with a custom dataset, you can replace this with your dataset
dataset = load_dataset('csv', data_files={'train': 'train.csv', 'test': 'test.csv'})
# Apply the tokenizer to the dataset
tokenized_datasets = dataset.map(preprocess_function, batched=True)
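
Note that this example assumes each CSV file has a 'text' column (used by preprocess_function above) and a 'label' column with integer class ids (0/1 for binary classification), which is the column name the Trainer expects by default. A quick way to verify the tokenized output:

# Inspect one tokenized training example: it should now contain
# 'input_ids' and 'attention_mask' alongside the original columns.
print(tokenized_datasets['train'][0].keys())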



Working with Custom Datasets
Fine-tuning a pre-trained model requires a labeled dataset specific to your task. This dataset can be in any format (CSV,
JSON, etc.), but it must be tokenized before passing it to the model.
Steps for using a custom dataset:
1. Load the dataset: You can use the Hugging Face datasets library to load your custom dataset or standard datasets (e.g.,
IMDb, SST2 for sentiment analysis).
2. Tokenize the dataset: Use the tokenizer to convert raw text into a numerical format that the model can understand.
3. Split the dataset: Typically, the dataset is split into training, validation, and test sets.
Example of loading and tokenizing a dataset:


from datasets import load_dataset

# Load custom dataset from CSV
dataset = load_dataset('csv', data_files={'train': 'train.csv', 'test': 'test.csv'})

# Tokenize the dataset
def preprocess_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(preprocess_function, batched=True)
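
Step 3 above mentions a validation split, which the CSV example does not create explicitly. One way to carve one out, sketched here with the datasets library's built-in splitter (the 10% size and seed are arbitrary illustrative choices):

# Hold out 10% of the training data as a validation set.
split = tokenized_datasets['train'].train_test_split(test_size=0.1, seed=42)
train_dataset = split['train']
val_dataset = split['test']   # train_test_split names the held-out portion 'test'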



d) Define the training arguments:
You can configure the training process using the TrainingArguments class. This includes settings like the number of
epochs, batch size, learning rate, etc.
training_args = TrainingArguments(
output_dir='./results', # output directory
evaluation_strategy="epoch", # evaluate after each epoch
per_device_train_batch_size=16, # batch size for training
per_device_eval_batch_size=64, # batch size for evaluation
num_train_epochs=3, # number of training epochs
weight_decay=0.01, # strength of weight decay
)
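
The text above mentions the learning rate, which is not set explicitly in this example, so TrainingArguments falls back to its default (5e-5 at the time of writing). To control it yourself, add a learning_rate argument; 2e-5 below is just a commonly used fine-tuning value, not a requirement.

training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",
    learning_rate=2e-5,                  # explicit learning rate (illustrative value)
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    num_train_epochs=3,
    weight_decay=0.01,
)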


e) Initialize the Trainer:
The Trainer class simplifies the training and evaluation loop for most Transformer models.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
)



f) Train the model:
Once everything is set up, you can start the training process.
trainer.train()
g) Evaluate the model:
After training, you can evaluate the performance of your model on the test set.
results = trainer.evaluate()
print(results)
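
By default, trainer.evaluate() reports only the evaluation loss plus runtime statistics. To also get task metrics such as accuracy, you can pass a compute_metrics function to the Trainer; the sketch below uses plain NumPy rather than any particular metrics library.

import numpy as np

def compute_metrics(eval_pred):
    # eval_pred holds the model's raw logits and the true labels.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": (predictions == labels).mean()}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    compute_metrics=compute_metrics,      # metrics reported by trainer.evaluate()
)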


Sentiment Analysis
The above setup can be used for sentiment analysis by assigning labels like 0 for negative and 1 for positive
sentiments in the dataset. After fine-tuning, the model can classify whether a given text has a positive or negative
sentiment.

Fine-Tuning on a Custom Dataset
Fine-tuning a pre-trained model like DistilBERT is effective when you want to adapt the model to a specific task or
domain, such as legal text classification, medical document classification, etc. All you need is a labeled dataset with
the target labels (for example, positive/negative in sentiment analysis).
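
Once fine-tuned, the model can classify new text directly. A minimal inference sketch, reusing the tokenizer and model from above and following the label convention 0 = negative, 1 = positive:

import torch

text = "This product exceeded my expectations!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
inputs = {k: v.to(model.device) for k, v in inputs.items()}   # keep inputs on the same device as the model

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, num_labels)

predicted_class = logits.argmax(dim=-1).item()
print("positive" if predicted_class == 1 else "negative")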
Fine-tuning Pre-trained Models for Specific Tasks

Pre-trained models like DistilBERT are general-purpose, meaning they are trained on vast amounts of diverse data and can be
adapted (fine-tuned) for specific tasks with relatively few task-specific labeled examples.

Fine-tuning Process:

• Load the pre-trained model and tokenizer: The model and tokenizer are loaded from the Hugging Face Model Hub.

• Prepare your dataset: Your custom dataset needs to be tokenized. This means converting your text into numerical input.

• Train the model: The model is trained (fine-tuned) on your dataset using Hugging Face's Trainer class, which abstracts away the complexity of model training.

• Evaluate the model: After training, the model is evaluated on a test set to check how well it generalizes to unseen data.

The process involves setting training arguments such as learning rate, batch size, number of epochs, and more.


Example of fine-tuning DistilBERT for text classification:
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',              # output directory
    evaluation_strategy="epoch",         # evaluate after each epoch
    per_device_train_batch_size=16,      # batch size for training
    per_device_eval_batch_size=64,       # batch size for evaluation
    num_train_epochs=3,                  # number of training epochs
    weight_decay=0.01,                   # strength of weight decay
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
)

trainer.train()  # Fine-tune the model
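
After fine-tuning, you will usually want to persist the model and tokenizer so they can be reloaded later without retraining. A minimal sketch; the directory name './fine-tuned-distilbert' is an arbitrary choice.

# Save the fine-tuned model and tokenizer to a local directory.
trainer.save_model('./fine-tuned-distilbert')
tokenizer.save_pretrained('./fine-tuned-distilbert')

# Later, reload them just like any Hub checkpoint.
model = DistilBertForSequenceClassification.from_pretrained('./fine-tuned-distilbert')
tokenizer = DistilBertTokenizer.from_pretrained('./fine-tuned-distilbert')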

