NLP Assignment 2
Regulations:
• Plagiarism: Strictly prohibited. All work must be original. The code and report will
be checked for plagiarism (and run through an AI detector), and appropriate action will
be taken if anyone is found guilty of copying.
Submission Guidelines:
– The Colab notebook should only contain the inference part of the model and load
the pre-trained weights. The training part should be commented out.
– The model should be able to load its weights from your public GitHub repository
(create a repo containing the trained model and download the weights from it).
• Students need to submit only the URL of the Colab notebook (with public access), along
with clear instructions for running the code. The runtime of the code should not exceed
10 minutes.
• Deadline: All assignments must be submitted by the deadline. Late submissions will be
penalized.
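The guidelines above require the notebook to pull pre-trained weights from a public GitHub repository. A minimal sketch of how that might look is below; the username, repository name, branch, and file path are placeholders, not values from the assignment, so substitute your own.

```python
import urllib.request

# Hypothetical helper: build the raw-download URL for a file hosted in a
# public GitHub repository (repo name, branch, and file path are placeholders).
def raw_github_url(user, repo, path, branch="main"):
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"

url = raw_github_url("your-username", "nlp-assignment-2", "model_weights.pt")
# In the Colab notebook, the weights would then be fetched and loaded, e.g.:
#   urllib.request.urlretrieve(url, "model_weights.pt")
#   model.load_state_dict(torch.load("model_weights.pt"))
print(url)
```

Using the raw.githubusercontent.com form of the URL avoids downloading the HTML page that github.com serves for file views.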
Marking:
• Marking will be done based on two criteria: (i) code, and (ii) model performance
(with more weight given to performance).
• The performance of each submission will be evaluated using the macro-averaged F1-score
computed between the predicted labels and the gold labels.
• All submitted code should be reproducible with public access. If the results cannot be
reproduced, the submission will be considered incomplete and will not be marked.
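Since grading uses the macro F1-score, it is worth checking your model against the same metric locally. The grading script itself is not public, so the following from-scratch computation is only a reference sketch of the standard definition (per-class F1, averaged without class weighting); the example labels are invented.

```python
# Reference sketch of the macro F1-score: compute F1 per class, then take
# the unweighted mean over classes (no weighting by class frequency).
def macro_f1(gold, pred):
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

print(round(macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]), 4))  # → 0.8222
```

This matches `sklearn.metrics.f1_score(..., average="macro")`, which you could also use directly in the notebook.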
https://github.com/emotion-analysis-project/SemEval2025-task11
The sample dataset is provided at the above URL. Students can refer to it to understand the
task and the dataset format. The training dataset is available at
https://github.com/debajyotimaz/nlp_assignment. Students are expected to submit the code
of their best-performing algorithm, and that algorithm will be evaluated on our custom test
set (which will be kept private). Use this Colab notebook to get started:
https://colab.research.google.com/drive/13yNxUnB866IqHF8H1mnXIRQYT-t4COBA?usp=sharing.
Task overview:
1. Embeddings: Create a custom Word2Vec model from scratch. Train it on your dataset
with an embedding size of 100 dimensions, and save the trained embeddings in your
GitHub repository.
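A minimal from-scratch sketch of one Word2Vec variant (skip-gram with a full softmax) is shown below. The assignment does not specify skip-gram vs. CBOW, the two-sentence corpus here is a toy stand-in for the provided dataset, and all hyperparameters other than the 100-dimension embedding size are illustrative choices.

```python
import numpy as np

# Toy corpus (placeholder for the assignment's training data).
corpus = ["the movie made me happy", "the sad song made me cry"]
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
w2i = {w: i for i, w in enumerate(vocab)}

EMB_DIM, WINDOW, LR, EPOCHS = 100, 2, 0.05, 50  # EMB_DIM=100 per the spec
V = len(vocab)
rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (V, EMB_DIM))   # input (center-word) embeddings
W_out = rng.normal(0, 0.1, (V, EMB_DIM))  # output (context-word) embeddings

# Collect (center, context) training pairs within a symmetric window.
pairs = []
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - WINDOW), min(len(sent), i + WINDOW + 1)):
            if j != i:
                pairs.append((w2i[w], w2i[sent[j]]))

# SGD on the softmax cross-entropy loss for each pair.
for _ in range(EPOCHS):
    for c, o in pairs:
        scores = W_out @ W_in[c]
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        probs[o] -= 1.0                           # gradient w.r.t. scores
        grad_in = W_out.T @ probs
        W_out -= LR * np.outer(probs, W_in[c])
        W_in[c] -= LR * grad_in

np.save("word2vec_embeddings.npy", W_in)  # commit this file to your GitHub repo
```

A real submission would add negative sampling (or hierarchical softmax) for efficiency on a full vocabulary; the full-softmax update above is only tractable for tiny vocabularies.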
3. Training setup:
4. Note: Comment out the training part (both embedding creation and model training) of
the code (we may uncomment it to check the training part as well). The model should
already be trained, and the deliverable will focus on inference only.
Constraint                          Value
Maximum number of layers            4
Maximum number of units per layer   64
Maximum embedding size              100
Maximum sequence length             128
Optimizer                           Adam, AdamW, or SGD
Learning rate                       0.001
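Submissions violating the table above presumably lose marks, so it may help to sanity-check your configuration programmatically before training. The helper below is a hypothetical convenience, not part of the assignment; the config field names are invented.

```python
# Constraint values taken from the assignment's table.
CONSTRAINTS = {
    "max_layers": 4,
    "max_units_per_layer": 64,
    "max_embedding_size": 100,
    "max_sequence_length": 128,
    "allowed_optimizers": {"Adam", "AdamW", "SGD"},
    "learning_rate": 0.001,
}

def validate(cfg):
    """Return a list of constraint violations (empty list == config is valid)."""
    errs = []
    if cfg["num_layers"] > CONSTRAINTS["max_layers"]:
        errs.append("too many layers")
    if max(cfg["units"]) > CONSTRAINTS["max_units_per_layer"]:
        errs.append("layer too wide")
    if cfg["embedding_size"] > CONSTRAINTS["max_embedding_size"]:
        errs.append("embedding too large")
    if cfg["seq_len"] > CONSTRAINTS["max_sequence_length"]:
        errs.append("sequence too long")
    if cfg["optimizer"] not in CONSTRAINTS["allowed_optimizers"]:
        errs.append("optimizer not allowed")
    if cfg["lr"] != CONSTRAINTS["learning_rate"]:
        errs.append("learning rate must be 0.001")
    return errs

cfg = {"num_layers": 2, "units": [64, 32], "embedding_size": 100,
       "seq_len": 128, "optimizer": "Adam", "lr": 0.001}
print(validate(cfg))  # → [] (no violations)
```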