Text Classification Using Hugging Face

The document describes an assignment to build a text classification model using the Hugging Face library. It involves choosing a dataset with 1000+ samples in each category, preprocessing the text, fine-tuning a pretrained model like BERT or GPT-2 for classification, evaluating the model, making predictions, and writing a report discussing the model and results. The report, code, and dataset must be submitted.


Text pre-processing, tokenization, and stemming/lemmatization
N-grams and bag-of-words models
Part-of-speech tagging and named entity recognition
Sentiment analysis and text classification
Word embeddings (e.g. word2vec, GloVe) and deep learning techniques for NLP such as LSTMs and Transformers
Knowledge of Python and NLP libraries such as NLTK, spaCy, and gensim
Familiarity with machine learning frameworks like TensorFlow and PyTorch
Experience with NLP applications such as language modeling, text generation, summarization, question answering, machine translation, etc.

Assignment: Text Classification using Hugging Face

Objective: The goal of this assignment is to build a text classification model
using the Hugging Face library to classify a dataset of text into one of multiple
categories. The candidate will use a pre-trained model such as BERT or GPT-2 as a
starting point and fine-tune it on the classification task.

Instructions:

Choose a dataset of text that has multiple categories (e.g. news articles labeled
as sports, politics, entertainment, etc.). The dataset should have at least 1000
samples for each category.
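A quick way to confirm the dataset meets the size requirement is to count samples per category before training. The sketch below assumes the data is held as a list of (text, label) pairs; the function and variable names are illustrative, not part of any particular dataset's API.

```python
from collections import Counter

def check_category_sizes(samples, min_per_category=1000):
    """Count samples per category and flag any category below the minimum.

    `samples` is assumed to be a list of (text, label) pairs.
    Returns the per-category counts and a dict of undersized categories.
    """
    counts = Counter(label for _, label in samples)
    too_small = {label: n for label, n in counts.items() if n < min_per_category}
    return counts, too_small

# Toy example with a deliberately undersized category:
toy = [("goal scored", "sports")] * 1200 + [("election held", "politics")] * 800
counts, too_small = check_category_sizes(toy)
```

If `too_small` is non-empty, either gather more data for those categories or drop them before proceeding.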

Preprocess the text data by cleaning it: removing stopwords, punctuation, and other
irrelevant characters.
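The cleaning step above can be sketched in pure Python as follows. The stopword list here is a tiny illustrative set; in practice the full English stopword lists from NLTK or spaCy would typically be used instead.

```python
import re
import string

# A tiny illustrative stopword set; NLTK's or spaCy's full English
# stopword list would normally be used in practice.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in"}

def clean_text(text):
    """Lowercase, strip digits and punctuation, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)  # remove digits
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

cleaned = clean_text("The score of the game, in 2023, is 3-1!")
```

Note that heavy cleaning is less critical for Transformer models than for bag-of-words approaches, since pre-trained tokenizers handle raw text; it is still worth removing markup and obvious noise.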

Use the Hugging Face library to fine-tune a pre-trained model such as BERT or GPT-2
on the classification task. The candidate should use the transformers library in
Python.
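One common approach is the Trainer API from the transformers library. The sketch below assumes bert-base-uncased and in-memory lists of texts and integer labels; the function name, arguments, and hyperparameters are illustrative, and imports are kept inside the function so the outline can be read without transformers installed.

```python
def fine_tune(train_texts, train_labels, eval_texts, eval_labels,
              model_name="bert-base-uncased", num_labels=4, output_dir="./clf"):
    """Fine-tune a pre-trained model for sequence classification.

    All argument names and hyperparameters are illustrative. Imports are
    local so the sketch can be read without transformers installed.
    """
    import torch
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels)

    class TextDataset(torch.utils.data.Dataset):
        """Wraps tokenized texts and labels for the Trainer."""
        def __init__(self, texts, labels):
            self.enc = tokenizer(texts, truncation=True, padding=True,
                                 max_length=128)
            self.labels = labels

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    args = TrainingArguments(output_dir=output_dir,
                             num_train_epochs=3,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args,
                      train_dataset=TextDataset(train_texts, train_labels),
                      eval_dataset=TextDataset(eval_texts, eval_labels))
    trainer.train()
    return trainer, tokenizer
```

For GPT-2, which has no padding token by default, the tokenizer's pad token must be set (e.g. to the EOS token) before tokenizing.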

Train the model on the dataset and evaluate its performance using metrics such as
accuracy, precision, recall, and F1-score.
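For a multi-class task these metrics are computed per class and then averaged; the pure-Python sketch below shows macro averaging. In practice sklearn.metrics (or Hugging Face's evaluate library) would normally be used instead.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    A pure-Python sketch; sklearn.metrics would normally be used.
    """
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {"accuracy": accuracy,
            "precision": sum(precisions) / n,
            "recall": sum(recalls) / n,
            "f1": sum(f1s) / n}

m = classification_metrics(["sports", "politics", "sports", "sports"],
                           ["sports", "sports", "sports", "politics"])
```

Macro averaging weights every category equally, which is usually the right choice here since the assignment requires roughly balanced classes.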

Use the trained model to predict the categories of a few samples from the test set.
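Prediction on held-out samples can be done with the transformers pipeline API. The sketch below assumes `model_dir` points at a checkpoint directory saved during fine-tuning; the function name and arguments are illustrative, and the import is local so the sketch can be read without transformers installed.

```python
def predict_samples(model_dir, texts):
    """Label a few test-set samples with a fine-tuned checkpoint.

    `model_dir` is an illustrative name for a directory saved by the
    Trainer. Returns (text, predicted_label, confidence) triples.
    """
    from transformers import pipeline

    clf = pipeline("text-classification", model=model_dir)
    return [(text, out["label"], round(out["score"], 3))
            for text, out in zip(texts, clf(texts))]
```

Including a few of these triples in the report, with a sentence on why each prediction is plausible or mistaken, satisfies the "sample predictions and their explanations" item below.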

Write a report that includes the following:

A brief introduction to the task and the dataset used
The preprocessing steps taken
The architecture of the model used, and how it was fine-tuned
The evaluation metrics and the results obtained
A discussion of the model's performance and possible ways to improve it
Sample predictions and their explanations

Submit the report, the code, and the dataset used for the task.

Notes:

Use the latest versions of the transformers library and Python.
Feel free to experiment with different pre-trained models and fine-tuning techniques.
The report should be clear, concise, and well-structured.
The code should be well-commented and easy to understand.

Good luck!

Please let me know if you need any further explanation.
