Lab Assignment Week - 3
Instructions:
1. Please submit the assignment through Moodle in .ipynb format (Python notebook).
2. The submission should consist of a notebook with all the solutions, including the
requested documentation, observations, and findings.
3. You must comment the code adequately and mark your observations to improve
readability.
4. Make sure to put question numbers where your answer starts.
5. Before submitting, rename the notebook file as
<roll number>_Lab Assignment-<week no.>.ipynb
6. This lab is due on the same day at 4:00 pm.
Happy coding!!!
Note:
1. In this lab session, you will explore "Text Classification". You have to use the 20
Newsgroups dataset. Visit http://qwone.com/~jason/20Newsgroups/ for a
description of the dataset. You have to use the bydate version
(https://drive.google.com/file/d/1Sv1-tOMryapEbyRonfjAiLw_sEI1LSdw/view?usp=sharing). The
dataset contains train and test splits.
2. You need to code the naive Bayes classifier from scratch.
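As a starting point, "from scratch" could look something like the sketch below: a multinomial naive Bayes classifier with add-one (Laplace) smoothing, using only the Python standard library. This is one possible design, not a required one. The function names (tokenize, train_nb, predict_nb), the whitespace tokenizer, and the smoothing parameter alpha are assumptions made for illustration; the assignment does not prescribe them.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    # A real solution may want to strip headers, punctuation, and stop words.
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from labelled documents."""
    class_doc_counts = Counter(labels)      # number of documents per class
    word_counts = defaultdict(Counter)      # word frequencies per class
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}, "log_unseen": {}}
    n_docs = len(docs)
    for c, n_c in class_doc_counts.items():
        model["log_prior"][c] = math.log(n_c / n_docs)
        # Denominator of the smoothed estimate: class token count + alpha * |V|
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_lik"][c] = {
            w: math.log((k + alpha) / total) for w, k in word_counts[c].items()
        }
        # Log probability of a vocabulary word never seen in class c
        model["log_unseen"][c] = math.log(alpha / total)
    return model


def predict_nb(model, text):
    """Return the class with the highest posterior log score for one document."""
    tokens = [w for w in tokenize(text) if w in model["vocab"]]  # skip unknown words
    scores = {}
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        for w in tokens:
            score += model["log_lik"][c].get(w, model["log_unseen"][c])
        scores[c] = score
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, which matters for long newsgroup posts.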
Tasks:
Q1. Code and train a naive Bayes text classifier to classify documents into the twenty classes
given in the dataset, and evaluate the accuracy of the classifier on the test documents. Plot
the confusion matrix. (5 marks)
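The evaluation part of Q1 could be sketched roughly as follows. The helper names (accuracy, confusion_matrix, plot_confusion_matrix) are hypothetical, and the use of matplotlib for the plot is an assumption; any plotting library available in the notebook would do. Rows are taken to be true classes and columns predicted classes.

```python
def accuracy(y_true, y_pred):
    # Fraction of test documents whose predicted label matches the true label
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, classes):
    # matrix[i][j] counts documents of true class i predicted as class j
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def plot_confusion_matrix(matrix, classes):
    # Assumes matplotlib is installed; imported here so the helpers above
    # remain usable without it.
    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 8))
    plt.imshow(matrix, cmap="Blues")
    plt.colorbar()
    ticks = range(len(classes))
    plt.xticks(ticks, classes, rotation=90)
    plt.yticks(ticks, classes)
    plt.xlabel("Predicted class")
    plt.ylabel("True class")
    plt.title("Confusion matrix")
    plt.tight_layout()
    plt.show()
```

With twenty newsgroups, the off-diagonal cells of the matrix show which topics the classifier confuses most often, which is a natural place to record observations.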
____________________________________________________________________________