Text Classification Using Hugging Face

The document describes an assignment to build a text classification model using the Hugging Face library. It involves choosing a dataset with 1000+ samples in each category, preprocessing the text, fine-tuning a pretrained model like BERT or GPT-2 for classification, evaluating the model, making predictions, and writing a report discussing the model and results. The report, code, and dataset must be submitted.


Text pre-processing, tokenization, and stemming/lemmatization
N-grams and bag-of-words models
Part-of-speech tagging and named entity recognition
Sentiment analysis and text classification
Word embeddings (e.g. word2vec, GloVe) and deep learning techniques for NLP such as LSTMs and Transformers
Knowledge of Python and NLP libraries such as NLTK, spaCy, and gensim
Familiarity with machine learning frameworks like TensorFlow and PyTorch
Experience with NLP applications such as language modeling, text generation, summarization, question answering, machine translation, etc.

Assignment: Text Classification using Hugging Face

Objective: The goal of this assignment is to build a text classification model
using the Hugging Face library to classify a dataset of text into one of multiple
categories. The candidate will use a pre-trained model such as BERT or GPT-2 as a
starting point and fine-tune it on the classification task.

Instructions:

Choose a dataset of text that has multiple categories (e.g. news articles labeled
as sports, politics, entertainment, etc.). The dataset should have at least 1000
samples for each category.
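A quick way to confirm the dataset meets the size requirement is to count samples per category before training. The sketch below assumes the data is held as a list of (text, label) pairs; the function and variable names are illustrative, not part of any particular dataset's API.

```python
from collections import Counter

def check_category_sizes(samples, min_per_category=1000):
    """Count samples per category and flag any category below the minimum.

    `samples` is assumed to be a list of (text, label) pairs.
    Returns the per-category counts and a dict of undersized categories.
    """
    counts = Counter(label for _, label in samples)
    too_small = {label: n for label, n in counts.items() if n < min_per_category}
    return counts, too_small

# Toy example with a deliberately undersized category:
toy = [("goal scored", "sports")] * 1200 + [("election held", "politics")] * 800
counts, too_small = check_category_sizes(toy)
```

If `too_small` is non-empty, either gather more data for those categories or drop them before proceeding.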

Preprocess the text data by cleaning it: removing stopwords, punctuation, and other
irrelevant characters.
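The cleaning step above can be sketched in pure Python as follows. The stopword list here is a tiny illustrative set; in practice the full English stopword lists from NLTK or spaCy would typically be used instead.

```python
import re
import string

# A tiny illustrative stopword set; NLTK's or spaCy's full English
# stopword list would normally be used in practice.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in"}

def clean_text(text):
    """Lowercase, strip digits and punctuation, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)  # remove digits
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

cleaned = clean_text("The score of the game, in 2023, is 3-1!")
```

Note that heavy cleaning is less critical for Transformer models than for bag-of-words approaches, since pre-trained tokenizers handle raw text; it is still worth removing markup and obvious noise.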

Use the Hugging Face library to fine-tune a pre-trained model such as BERT or GPT-2
on the classification task. The candidate should use the transformers library in
Python.
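One common approach is the Trainer API from the transformers library. The sketch below assumes bert-base-uncased and in-memory lists of texts and integer labels; the function name, arguments, and hyperparameters are illustrative, and imports are kept inside the function so the outline can be read without transformers installed.

```python
def fine_tune(train_texts, train_labels, eval_texts, eval_labels,
              model_name="bert-base-uncased", num_labels=4, output_dir="./clf"):
    """Fine-tune a pre-trained model for sequence classification.

    All argument names and hyperparameters are illustrative. Imports are
    local so the sketch can be read without transformers installed.
    """
    import torch
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels)

    class TextDataset(torch.utils.data.Dataset):
        """Wraps tokenized texts and labels for the Trainer."""
        def __init__(self, texts, labels):
            self.enc = tokenizer(texts, truncation=True, padding=True,
                                 max_length=128)
            self.labels = labels

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    args = TrainingArguments(output_dir=output_dir,
                             num_train_epochs=3,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args,
                      train_dataset=TextDataset(train_texts, train_labels),
                      eval_dataset=TextDataset(eval_texts, eval_labels))
    trainer.train()
    return trainer, tokenizer
```

For GPT-2, which has no padding token by default, the tokenizer's pad token must be set (e.g. to the EOS token) before tokenizing.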

Train the model on the dataset and evaluate its performance using metrics such as
accuracy, precision, recall, and F1-score.
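For a multi-class task these metrics are computed per class and then averaged; the pure-Python sketch below shows macro averaging. In practice sklearn.metrics (or Hugging Face's evaluate library) would normally be used instead.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    A pure-Python sketch; sklearn.metrics would normally be used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {"accuracy": accuracy,
            "precision": sum(precisions) / n,
            "recall": sum(recalls) / n,
            "f1": sum(f1s) / n}

m = classification_metrics(["sports", "politics", "sports", "sports"],
                           ["sports", "sports", "sports", "politics"])
```

Macro averaging weights every category equally, which is usually the right choice here since the assignment requires roughly balanced classes.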

Use the trained model to predict the categories of a few samples from the test set.
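Prediction on held-out samples can be done with the transformers pipeline API. The sketch below assumes `model_dir` points at a checkpoint directory saved during fine-tuning; the function name and arguments are illustrative, and the import is local so the sketch can be read without transformers installed.

```python
def predict_samples(model_dir, texts):
    """Label a few test-set samples with a fine-tuned checkpoint.

    `model_dir` is an illustrative name for a directory saved by the
    Trainer. Returns (text, predicted_label, confidence) triples.
    """
    from transformers import pipeline

    clf = pipeline("text-classification", model=model_dir)
    return [(text, out["label"], round(out["score"], 3))
            for text, out in zip(texts, clf(texts))]
```

Including a few of these triples in the report, with a sentence on why each prediction is plausible or mistaken, satisfies the "sample predictions and their explanations" item below.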

Write a report that includes the following:

A brief introduction to the task and the dataset used
The preprocessing steps taken
The architecture of the model used, and how it was fine-tuned
The evaluation metrics and the results obtained
A discussion of the model's performance and possible ways to improve it
Sample predictions and their explanations

Submit the report, the code, and the dataset used for the task.

Notes:

Use the latest versions of the transformers library and Python.
Feel free to experiment with different pre-trained models and fine-tuning techniques.
The report should be clear, concise, and well-structured.
The code should be well-commented and easy to understand.

Good luck!

Please let me know if you need any further explanation.
