Stress Prediction Documentation
Group members:
• Zyam Maqsood
• Mohammad Hamza
• Syed Muneeb Naqvi
Course:
Comp-404 Artificial Intelligence
Dataset:
https://www.kaggle.com/datasets/kreeshrajani/human-stress-prediction/discussion/428141
Models implemented:
• Support Vector Machines (SVM)
• k-Nearest Neighbors (kNN)
Introduction:
Human stress is a critical psychological factor that affects well-being and productivity.
Automated stress prediction can aid in timely interventions, providing insights to healthcare
professionals and individuals alike. This project explores the application of Support Vector Machine
(SVM) and K-Nearest Neighbors (KNN) algorithms for classifying stress levels from textual data.
The dataset used for this project contains stress-related textual information labeled for binary
classification.
Data Description:
The dataset, Human Stress Prediction Dataset, consists of textual data labeled with stress
indicators:
• Columns:
◦ text: Contains text input describing situations, emotions, or expressions.
◦ label: Binary indicator of stress (1 for stressed and 0 for non-stressed).
• Preprocessing Steps:
◦ Converted text to lowercase.
◦ Removed punctuation marks (e.g., periods, commas, exclamation marks).
• Data Distribution:
◦ Total records: 568
◦ Non-stressed (0): 263 records
◦ Stressed (1): 305 records
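The two preprocessing steps listed above (lowercasing and punctuation removal) can be sketched as a small helper function; the function name is illustrative, not taken from the project code:

```python
import string

def preprocess(text: str) -> str:
    """Lowercase the text and strip punctuation marks, as described above."""
    text = text.lower()
    # str.translate with a deletion table removes every punctuation character.
    return text.translate(str.maketrans("", "", string.punctuation))

print(preprocess("I'm SO stressed!!"))  # → im so stressed
```

Note that this removes apostrophes as well, so contractions like "I'm" collapse to "im"; this matches the stated pipeline, which applies no further normalization.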
Methodology:
• Preprocessing:
◦ The text data was cleaned to remove punctuation and converted to lowercase for
uniformity.
◦ No stop-word removal or stemming/lemmatization was applied.
• Feature Extraction:
◦ TF-IDF Vectorizer: Text data was transformed into numerical vectors using a maximum
feature limit of 5000 terms.
• Model Training:
◦ Support Vector Machine (SVM): A linear kernel was used to classify text as stressed or
non-stressed. This kernel is suitable for linearly separable data and efficient for high-
dimensional spaces.
◦ K-Nearest Neighbors (KNN): A neighborhood-based classification method, using 5
neighbors (n_neighbors=5) for prediction.
• Model Evaluation:
◦ Models were evaluated using:
▪ Accuracy Score: Percentage of correct predictions.
▪ Classification Report: Includes precision, recall, F1-score, and support for both
classes.
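The full pipeline above can be sketched with scikit-learn. This is a minimal illustration, not the project's actual script: it substitutes a tiny toy corpus for the Kaggle dataset so it runs standalone, but keeps the stated settings (TF-IDF with `max_features=5000`, a linear-kernel SVM, and KNN with `n_neighbors=5`):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Tiny toy corpus standing in for the Kaggle dataset (1 = stressed, 0 = not).
texts = ["i am so overwhelmed by deadlines",
         "feeling calm and relaxed today",
         "too much pressure i cannot sleep",
         "had a peaceful walk in the park",
         "anxious about the exam tomorrow",
         "enjoying a quiet weekend at home"]
labels = [1, 0, 1, 0, 1, 0]

# TF-IDF vectors with a 5000-term vocabulary cap, per the Feature Extraction step.
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(texts)

# The two models described above, trained on the same features.
svm = SVC(kernel="linear").fit(X, labels)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, labels)

for name, model in [("SVM", svm), ("KNN", knn)]:
    preds = model.predict(X)
    print(f"{name} training accuracy: {accuracy_score(labels, preds):.2f}")
    print(classification_report(labels, preds, zero_division=0))
```

In the actual project the data would be loaded from the dataset's CSV and split into train and test sets before fitting; `classification_report` produces the precision/recall/F1/support tables shown in the Results section.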
Results:
• SVM Classifier
◦ Accuracy: 74.82%
◦ Classification Report:
              precision   recall   f1-score   support
0                  0.73     0.73       0.73       263
1                  0.77     0.76       0.77       305
accuracy                               0.75       568
macro avg          0.75     0.75       0.75       568
weighted avg       0.75     0.75       0.75       568
◦ Insight:
▪ The SVM model demonstrates robust performance, particularly for the stressed class
(1).
▪ A well-balanced precision and recall indicate reliable predictions across both classes.
• KNN Classifier
◦ Accuracy: 65.85%
◦ Classification Report:
              precision   recall   f1-score   support
0                  0.72     0.43       0.54       263
1                  0.63     0.86       0.73       305
accuracy                               0.66       568
macro avg          0.68     0.64       0.63       568
weighted avg       0.68     0.66       0.64       568
◦ Insights:
▪ While KNN demonstrates high recall for the stressed class (1), its performance on
the non-stressed class (0) is limited: recall falls to 0.43 despite a reasonable
precision of 0.72, meaning many non-stressed samples are misclassified as stressed.
▪ Sensitivity to class imbalance and reliance on neighborhood distances likely
impacted its overall accuracy.
• Model Comparison
◦ SVM (74.82% accuracy) outperformed KNN (65.85%) by roughly nine percentage
points, and its precision and recall were balanced across both classes, whereas
KNN's performance was skewed toward the stressed class.
Conclusion:
• The SVM classifier outperformed the KNN classifier in terms of accuracy and classification
metrics, making it a more reliable model for stress prediction.
• KNN struggled with low recall (0.43) for the non-stressed class, highlighting its
sensitivity to class imbalance and to distances in the high-dimensional TF-IDF space.
• These results indicate that TF-IDF combined with SVM is effective for high-dimensional
text classification tasks such as stress detection.