IEEE-paper On NLP
Abstract—Text classification is a critical task in natural language processing (NLP) with extensive applications in areas such as spam detection, sentiment analysis, and content categorization. This paper presents a comparative analysis of traditional machine learning models applied to a curated dataset of BBC news articles. Preprocessing techniques, including tokenization, lemmatization, and TF-IDF transformation, were employed to optimize feature representation. Four classifiers—Logistic Regression, Support Vector Machines (SVM), Multinomial Naïve Bayes, and Random Forest—were trained and evaluated based on accuracy, precision, recall, and F1-score. Among the models tested, SVM achieved the highest accuracy of 96.94%. This paper discusses the implications of preprocessing and model selection on classification performance.

Keywords—Text Classification, Natural Language Processing, Logistic Regression, Support Vector Machines, Multinomial Naïve Bayes, Random Forest, Feature Extraction, TfidfTransformer, WordCloud, Model Comparison.

… resources. This paper focuses on traditional machine learning techniques, which remain relevant for resource-constrained environments.

METHODOLOGY

A. Dataset
The BBC dataset consists of 2,225 news articles categorized into five classes:
1. Business
2. Entertainment
3. Politics
4. Sports
5. Technology
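A minimal sketch of the loading and preprocessing steps described in the abstract (tokenization, lemmatization, and TF-IDF weighting), using NLTK and Scikit-learn, is given below. The file name bbc_news.csv and the text/category column names are illustrative assumptions, not the exact layout used in this study.

```python
# Preprocessing sketch: tokenization, lemmatization, TF-IDF features.
# Assumes a CSV named "bbc_news.csv" with "text" and "category" columns
# (hypothetical layout, for illustration only).
import pandas as pd
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

nltk.download("punkt")
nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()

def preprocess(text):
    # Tokenize, lowercase, keep alphabetic tokens, and lemmatize.
    tokens = word_tokenize(text.lower())
    return " ".join(lemmatizer.lemmatize(t) for t in tokens if t.isalpha())

df = pd.read_csv("bbc_news.csv")  # hypothetical file name
df["clean"] = df["text"].apply(preprocess)

X_train, X_test, y_train, y_test = train_test_split(
    df["clean"], df["category"],
    test_size=0.2, stratify=df["category"], random_state=42,
)

# TF-IDF weighting (equivalent to CountVectorizer followed by TfidfTransformer).
vectorizer = TfidfVectorizer(stop_words="english")
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
```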
B. Results
The models' performances are summarized in Table I.

TABLE I. MODEL PERFORMANCE COMPARISON

Model                     Accuracy (%)   Precision   Recall   F1-Score
Logistic Regression       96.58          0.97        0.96     0.97
Support Vector Machines   96.94          0.97        0.97     0.97
Multinomial Naïve Bayes   94.97          0.95        0.95     0.95
Random Forest             94.79          0.95        0.95     0.95
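For concreteness, the following sketch shows how the four classifiers in Table I can be trained and scored on the TF-IDF features from the preprocessing sketch above. LinearSVC is assumed as the SVM implementation, and the hyperparameters shown are library defaults rather than the exact settings behind the reported numbers.

```python
# Train the four classifiers and report accuracy, precision, recall, and F1.
# Reuses X_train_tfidf, X_test_tfidf, y_train, y_test from the sketch above.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machines": LinearSVC(),          # assumed SVM variant
    "Multinomial Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train_tfidf, y_train)
    y_pred = model.predict(X_test_tfidf)
    acc = accuracy_score(y_test, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted"
    )
    print(f"{name}: acc={acc:.4f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```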
C. Visualizations
1. Word Clouds: Generated for each category to identify frequent terms.
2. Feature Importance: Bar graphs illustrating the significance of features in classification tasks.
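A sketch of how these two visualizations could be produced with the WordCloud library and matplotlib follows; the lowercase category label "business" and the use of Random Forest importances for the bar graph are illustrative assumptions.

```python
# Sketch of the two visualizations: a per-category word cloud and a bar graph
# of influential TF-IDF features. Reuses df, vectorizer, and models from above.
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Word cloud of frequent terms for one category (label name assumed).
business_text = " ".join(df.loc[df["category"] == "business", "clean"])
cloud = WordCloud(width=800, height=400, background_color="white").generate(business_text)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()

# Bar graph of the 20 most important features according to the Random Forest model.
rf = models["Random Forest"]
feature_names = np.array(vectorizer.get_feature_names_out())
top = np.argsort(rf.feature_importances_)[-20:]
plt.barh(feature_names[top], rf.feature_importances_[top])
plt.xlabel("Feature importance")
plt.show()
```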
V. DISCUSSION

ACKNOWLEDGMENTS

We would like to express our sincere gratitude to Dr. Deepali Kotambkar, from the Electronics Department at Shri Ramdeobaba College of Engineering and Management, for her invaluable guidance, encouragement, and support throughout this research. Her expertise and insights played a pivotal role in shaping the direction and outcomes of this work.

We also extend our thanks to the Electronics Department of Shri Ramdeobaba College of Engineering and Management for providing access to the necessary resources and tools required for conducting this study. Additionally, we are grateful to the creators and maintainers of the open-source libraries Scikit-learn and NLTK, which were integral to the implementation and experimentation of this research. Finally, we acknowledge the unwavering support of our peers and family, whose motivation and constructive feedback were invaluable during the course of this project.

REFERENCES

[1] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," Proceedings of the 10th European Conference on Machine Learning, 1998.
[2] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning. Amazon, 2020.
[3] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python. O'Reilly Media, 2009.
[4] Scikit-learn Documentation: https://scikit-learn.org/
[5] NLTK Documentation: https://www.nltk.org/