IEEE-paper (1) Original

Uploaded by

mitalimeshram4

Text Processing and Classification using NLP

Given Name Surname
dept. name of organization (of Affiliation)
name of organization (of Affiliation)
Nagpur, India
email address or ORCID

Abstract—Text classification is a critical task in natural language processing (NLP) with extensive applications in areas such as spam detection, sentiment analysis, and content categorization. This paper presents a comparative analysis of traditional machine learning models applied to a curated dataset of BBC news articles. Preprocessing techniques, including tokenization, lemmatization, and TF-IDF transformation, were employed to optimize feature representation. Four classifiers (Logistic Regression, Support Vector Machines (SVM), Multinomial Naïve Bayes, and Random Forest) were trained and evaluated based on accuracy, precision, recall, and F1-score. Among the models tested, SVM achieved the highest accuracy of 96.94%. This paper discusses the implications of preprocessing and model selection on classification performance.

Keywords—Text Classification, Natural Language Processing, Logistic Regression, Support Vector Machines, Multinomial Naïve Bayes, Random Forest, Feature Extraction, TfidfTransformer, WordCloud, Model Comparison.

I. INTRODUCTION

The exponential growth of textual data from sources such as social media, news platforms, and online forums necessitates efficient text classification systems. Text classification involves assigning a predefined label to a given piece of text based on its content. Traditional machine learning methods have proven effective in tackling this problem. These methods often rely on robust preprocessing techniques to transform raw text into feature-rich representations that machine learning models can interpret.

This study uses the BBC text dataset, which contains news articles categorized into five distinct topics: business, entertainment, politics, sports, and technology. By comparing multiple classification models, this paper aims to identify the most suitable approach for accurate text categorization.

II. RELATED WORK

Text classification has been extensively studied in both traditional and deep learning paradigms. Early approaches utilized Naïve Bayes for its computational efficiency and probabilistic foundation. Logistic Regression and SVM have gained popularity due to their ability to handle high-dimensional data effectively. Ensemble methods like Random Forest provide additional robustness by combining multiple decision trees. Recent advancements in deep learning, such as recurrent neural networks (RNNs) and transformers, have revolutionized text classification but require significant computational resources. This paper focuses on traditional machine learning techniques, which remain relevant for resource-constrained environments.

III. METHODOLOGY

A. Dataset
The BBC dataset consists of 2,225 news articles categorized into five classes:

1. Business
2. Entertainment
3. Politics
4. Sports
5. Technology

The dataset is approximately balanced across the five classes, which helps avoid bias toward any single class during model training.

B. Preprocessing
To transform raw text into usable features, the following preprocessing steps were applied:

1. Lowercasing: All text was converted to lowercase to maintain consistency.
2. Tokenization: The text was split into individual words using NLTK's RegexpTokenizer to handle punctuation.
3. Stop Word Removal: Common English stop words were removed, retaining only meaningful terms.
4. Lemmatization: Words were reduced to their base (dictionary) forms using WordNetLemmatizer.
5. TF-IDF Transformation: CountVectorizer was used to extract unigram and bigram features, and TF-IDF scores were computed using TfidfTransformer.

C. Classification Models
The following machine learning models were implemented using the Scikit-learn library:

1. Logistic Regression (LR): A linear model that predicts probabilities for multi-class classification.
2. Support Vector Machines (SVM): Utilizes a linear kernel for separating classes with maximum margin.
3. Multinomial Naïve Bayes (MNB): A probabilistic model that assumes conditional independence of features given the class.

4. Random Forest (RF): An ensemble of decision trees providing robust performance through majority voting.

D. Evaluation Metrics
The models were evaluated using:

1. Accuracy: Proportion of correctly classified instances.
2. Precision: Proportion of true positive predictions among all positive predictions.
3. Recall: Proportion of true positive predictions among all actual positives.
4. F1-Score: Harmonic mean of precision and recall.

IV. EXPERIMENTS AND RESULTS

A. Experiment Setup
The dataset was divided into training (80%) and testing (20%) sets. Hyperparameters were optimized using GridSearchCV.

B. Results
The models' performances are summarized in Table I.

TABLE I. MODEL PERFORMANCE COMPARISON

Model                      Accuracy (%)   Precision   Recall   F1-Score
Logistic Regression        96.58          0.97        0.96     0.97
Support Vector Machines    96.94          0.97        0.97     0.97
Multinomial Naïve Bayes    94.97          0.95        0.95     0.95
Random Forest              94.79          0.95        0.95     0.95

C. Visualizations

1. Word Clouds: Generated for each category to identify frequent terms.
2. Feature Importance: Bar graphs illustrating the significance of features in classification tasks.

V. DISCUSSION

The results highlight the importance of preprocessing in enhancing model performance. SVM outperformed the other models, which is consistent with its robustness in high-dimensional feature spaces. Logistic Regression provided comparable performance (96.58% accuracy versus 96.94% for SVM), while Naïve Bayes was limited by its strong feature-independence assumption. Random Forest exhibited stable results but did not surpass SVM. These findings demonstrate the suitability of traditional methods for small-scale text classification tasks.

VI. CONCLUSION

This paper explored traditional machine learning models for text classification on the BBC dataset. Among the models evaluated, SVM achieved the highest accuracy of 96.94%, showcasing its effectiveness for such tasks. The importance of preprocessing and feature extraction was emphasized, demonstrating their impact on overall performance. Future research will extend this work to larger datasets and explore deep learning approaches for enhanced accuracy.

ACKNOWLEDGMENTS

We would like to express our sincere gratitude to Dr. Deepali Kotambkar, from the Electronics Department at Shri Ramdeobaba College of Engineering and Management, for her invaluable guidance, encouragement, and support throughout this research. Her expertise and insights played a pivotal role in shaping the direction and outcomes of this work.

We also extend our thanks to the Electronics Department of Shri Ramdeobaba College of Engineering and Management for providing access to the necessary resources and tools required for conducting this study. Additionally, we are grateful to the creators and maintainers of the open-source libraries Scikit-learn and NLTK, which were integral to the implementation and experimentation of this research. Finally, we acknowledge the unwavering support of our peers and family, whose motivation and constructive feedback were invaluable during the course of this project.

REFERENCES
[1] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," Proceedings of the 10th European Conference on Machine Learning, 1998.
[2] A. Zhang, Z. C. Lipton, M. Li, and A. Smola, Dive into Deep Learning. Amazon, 2020.
[3] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python. O'Reilly Media, 2009.
[4] Scikit-learn Documentation: https://scikit-learn.org/
[5] NLTK Documentation: https://www.nltk.org/
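
As a supplementary illustration of the four evaluation metrics defined under Evaluation Metrics, the sketch below computes them for a multi-class problem using only the Python standard library. The paper does not state how precision, recall, and F1-score were averaged across the five classes, so macro-averaging (an unweighted mean over classes) is an assumption here. The function name macro_metrics and the six example labels are hypothetical and are not taken from the BBC experiments.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # class `pred` was predicted but is wrong
            fn[truth] += 1   # class `truth` was missed

    def safe_div(num, den):
        # Define the ratio as 0.0 when the denominator is zero.
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    # Per-class F1 is the harmonic mean of that class's precision and recall.
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for illustration only.
y_true = ["business", "business", "sport", "sport", "tech", "tech"]
y_pred = ["business", "sport", "sport", "sport", "tech", "business"]
metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case four of the six predictions are correct, so accuracy is 4/6, while the per-class precision and recall values differ by class before being averaged. Weighted averaging, which scales each class by its number of true instances, is a common alternative and would give different figures.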

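As a further illustration, the feature extraction, linear SVM, 80/20 split, and GridSearchCV steps named in the Methodology and Experiment Setup can be sketched as a single scikit-learn pipeline, assuming scikit-learn is installed. This is a sketch under stated assumptions and not the authors' implementation: the toy two-class corpus, the grid over C, cv=2, and random_state=42 are hypothetical choices. NLTK tokenization and lemmatization are left out because they require downloaded corpora, so CountVectorizer's built-in tokenizer and English stop-word list stand in for them.

```python
from itertools import combinations

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus: 10 short documents per class, one per word pair.
vocab = {
    "business": ["market", "shares", "profit", "bank", "economy"],
    "sport": ["match", "goal", "team", "coach", "league"],
}
texts, labels = [], []
for label, words in vocab.items():
    for first, second in combinations(words, 2):
        texts.append(f"The {first} and the {second} were in the headlines")
        labels.append(label)

# 80/20 train/test split, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

pipeline = Pipeline([
    # Lowercasing, stop-word removal, and unigram + bigram counts.
    ("counts", CountVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2))),
    # Convert raw counts to TF-IDF weights.
    ("tfidf", TfidfTransformer()),
    # SVM with a linear kernel.
    ("clf", SVC(kernel="linear")),
])

# Assumed search space: only the regularization strength C is tuned.
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
search.fit(X_train, y_train)

accuracy = accuracy_score(y_test, search.predict(X_test))
print("best parameters:", search.best_params_)
print(f"test accuracy: {accuracy:.2%}")
```

Wrapping the vectorizer and classifier in one Pipeline means the vocabulary and IDF weights are fitted only on the training folds during the grid search, which keeps test documents from leaking into the features. The other three classifiers could be swapped in at the "clf" step with their own parameter grids.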

XXX-X-XXXX-XXXX-X/XX/$XX.00 ©2025 IEEE
