ML Case Study

The document outlines an assignment focused on implementing a document classification algorithm for Chinese language data, utilizing a dataset of 2500 news articles across 10 categories. It emphasizes the importance of data pre-processing, feature engineering, modeling approach, accuracy, and coding standards in the evaluation. Participants are required to submit their work in a Jupyter notebook, along with the trained model, and provide explanations of their methodology.

Uploaded by

ap.synophic


Document classification is a classic problem that many businesses face in their day-to-day
work. The NLP community is actively working on it, and new state-of-the-art models are
published from time to time. Recent breakthroughs such as LSTMs and transformers approach
the problem contextually. These models have proved very effective in document classification
as well as in many other downstream tasks. A lot of work has been done on the English
language, but for other languages the task is still tricky, especially when the text has to
be classified without losing its context.

In this assignment we want to implement a document classification algorithm that can classify
Chinese language data. Given a set of texts and their respective categories, we want to predict
the class of a random text from the test set with the highest possible confidence.
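
One way to read "highest possible confidence" is to report, for each prediction, the probability the model assigns to its top class. The sketch below assumes the classifier produces one log-score per category and converts those scores with a softmax. The function name and the three category labels are hypothetical; the brief does not name the categories.

```python
import math

def predict_with_confidence(log_scores):
    """Return (label, confidence) from a dict of per-class log-scores.

    Confidence is the softmax probability of the top-scoring class.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / total

# Hypothetical log-scores for three illustrative categories.
label, confidence = predict_with_confidence(
    {"sports": -2.0, "finance": -4.0, "technology": -5.0}
)
print(label, round(confidence, 3))
```

A confidence close to 1/number-of-classes means the model is barely preferring its top choice, which is worth flagging when testing on unseen data.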

A small dataset containing Chinese news articles along with their respective categories is
provided. The data contains 2500 news articles spread across 10 categories. You are free to
choose any methodology (one or several) to approach the problem. You will be expected to
share your work in the form of a Jupyter notebook, together with the trained model in
pickle/h5 format, so that we can test it on the unseen data that we are holding back.
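
As a starting point, a minimal baseline might look like the sketch below. It is only an illustration, not the expected solution: it assumes character n-grams as features (Chinese text has no whitespace word boundaries, so this avoids a word segmenter) and a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The toy sentences, labels, and function names are hypothetical. The model is kept as plain data so that it round-trips through pickle.

```python
import math
import pickle
from collections import Counter

def char_ngrams(text, n_max=2):
    """Character 1..n_max-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(chars) - n + 1)]

def train(texts, labels):
    """Count n-grams per class; the returned dict is the whole model."""
    class_counts = Counter(labels)
    feature_counts = {label: Counter() for label in class_counts}
    for text, label in zip(texts, labels):
        feature_counts[label].update(char_ngrams(text))
    vocab = set().union(*feature_counts.values())
    return {"class_counts": class_counts,
            "feature_counts": feature_counts,
            "vocab": vocab}

def predict(model, text):
    """Return the class with the highest smoothed log-probability."""
    total_docs = sum(model["class_counts"].values())
    scores = {}
    for label, n_docs in model["class_counts"].items():
        counts = model["feature_counts"][label]
        denom = sum(counts.values()) + len(model["vocab"])
        score = math.log(n_docs / total_docs)
        for feat in char_ngrams(text):
            if feat in model["vocab"]:  # skip n-grams never seen in training
                score += math.log((counts[feat] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy data standing in for the provided articles.
texts = ["球队赢得了比赛", "球员在比赛中进球", "股票市场今日上涨", "银行下调贷款利率"]
labels = ["sports", "sports", "finance", "finance"]
model = train(texts, labels)

# Round-trip through pickle; pickle.dump(model, file) would write it to disk.
restored = pickle.loads(pickle.dumps(model))
prediction = predict(restored, "比赛中球队进球")
print(prediction)
```

A neural model saved in h5 format would replace the counting step, but the same train, save, reload, predict flow applies.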

The assignment will be evaluated on the basis of:

1. Data Pre-processing
2. Feature Engineering
3. Modeling Approach
4. Accuracy
5. Coding Standard
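
Because accuracy is judged on unseen data, it helps to hold out part of the 2500 articles and measure accuracy there. The sketch below shows a stratified split, which keeps every category represented in both parts, and a plain accuracy function. The 80/20 ratio, the function names, and the balanced 250-per-category labels are assumptions made for illustration; the brief does not state how the articles are distributed.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling the test part per class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Assumed layout: 2500 articles, 10 categories, 250 articles each.
labels = [f"category_{i % 10}" for i in range(2500)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Reporting accuracy on the held-out part, rather than on the training articles, gives a more honest estimate of how the model will behave on the evaluators' data.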

If you are using a large library or pretrained model, you can also share your work as a
Google Colab notebook. You will be expected to provide a brief explanation of the steps and
approach taken and the reasoning behind them, either in the Jupyter notebook itself or in a
separate presentation. You should be able to finish the task in 4-5 days.
