Text Classification Using Hugging Face
Instructions:
Choose a dataset of text that has multiple categories (e.g. news articles labeled
as sports, politics, entertainment, etc.). The dataset should have at least 1000
samples for each category.
Preprocess the text data by cleaning it: remove stopwords, punctuation, and other
irrelevant characters.
Use the Hugging Face transformers library in Python to fine-tune a pre-trained model
such as BERT or GPT-2 on the classification task.
Train the model on the dataset and evaluate the performance using metrics such as
accuracy, precision, recall and F1-score.
Use the trained model to predict the categories of a few samples from the test set.
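The steps above could be sketched roughly as follows. This is only an illustrative outline, not a reference solution. The stopword list, the function names (clean_text, macro_metrics, fine_tune, predict_labels), the bert-base-uncased checkpoint, and the hyperparameters are all assumptions chosen for the example. Labels are assumed to be integers 0..num_labels-1, and the metrics are macro-averaged over classes, which is one of several reasonable choices. The transformers and datasets packages are imported lazily, so the cleaning and metric helpers run without them.

```python
import re

# Tiny illustrative stopword list (an assumption); a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are", "was", "for"}

def clean_text(text):
    """Lowercase, replace non-alphanumeric characters with spaces, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(w for w in text.split() if w not in STOPWORDS)

def macro_metrics(y_true, y_pred, num_labels):
    """Accuracy plus macro-averaged precision, recall and F1 for integer labels."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in range(num_labels):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / num_labels,
        "recall": sum(recalls) / num_labels,
        "f1": sum(f1s) / num_labels,
    }

def fine_tune(train_texts, train_labels, eval_texts, eval_labels, num_labels,
              model_name="bert-base-uncased", output_dir="clf-out"):
    """Fine-tune a sequence classifier with the Trainer API (downloads the checkpoint)."""
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              DataCollatorWithPadding, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def encode(batch):
        return tokenizer(batch["text"], truncation=True, max_length=256)

    train_ds = Dataset.from_dict({"text": train_texts, "label": train_labels}).map(encode, batched=True)
    eval_ds = Dataset.from_dict({"text": eval_texts, "label": eval_labels}).map(encode, batched=True)

    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds,
                      data_collator=DataCollatorWithPadding(tokenizer))
    trainer.train()
    return trainer, tokenizer

def predict_labels(trainer, tokenizer, texts, label_names):
    """Clean raw texts, run the fine-tuned model, and map argmax ids to label names."""
    from datasets import Dataset

    ds = Dataset.from_dict({"text": [clean_text(t) for t in texts]})
    ds = ds.map(lambda b: tokenizer(b["text"], truncation=True), batched=True)
    logits = trainer.predict(ds).predictions
    return [label_names[i] for i in logits.argmax(axis=-1)]
```

In this sketch, the training and evaluation texts would be passed through clean_text before calling fine_tune, and macro_metrics would be applied to the test labels and the argmax of the logits returned by trainer.predict on the test set.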
Notes:
Good luck!