ML Project - Report
4. Supervised Learning
This section covers two tasks: classifying bug severity and classifying issue type.
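Both tasks classify free-text bug reports, and Section 6 lists tokenization and TF-IDF among the preprocessing steps used. The sketch below is a minimal, stdlib-only illustration of TF-IDF weighting, not the team's pipeline. It assumes a simple lowercase/alphanumeric tokenizer and length-normalised term frequency; the function names and sample reports are hypothetical. The smoothed IDF term, log((1 + N) / (1 + df)) + 1, is the formula scikit-learn's TfidfVectorizer applies by default, but the TF part here is a simplification.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, split on runs of non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Assumed weighting: TF = count / document length,
    IDF = log((1 + N) / (1 + df)) + 1 (smoothed IDF).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


# Hypothetical bug-report titles.
reports = [
    "App crashes on login",
    "Login button misaligned",
    "App crashes when saving",
]
vectors = tfidf(reports)
# A term unique to one report ("on") outweighs a term shared by two ("app").
print(vectors[0]["on"] > vectors[0]["app"])  # True
```

Rare terms receive higher IDF, so words specific to a few reports contribute more to the feature vector than words shared across many.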
Table 1. Performance Metrics for Models
Model                    F1 (Macro)   F1 (Weighted)   Accuracy
Logistic Regression      0.44         0.57            0.60
Decision Tree            0.34         0.44            0.44
Random Forest            0.39         0.53            0.57
Multilayer Perceptron    0.48         0.58            0.59
Ensemble Voting          0.44         0.56            0.60
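Table 1 includes an ensemble that combines models by voting. The surviving text does not say whether votes were hard (majority of predicted labels) or soft (averaged class probabilities); the sketch below assumes hard majority voting, which is also the default (voting='hard') of scikit-learn's VotingClassifier. The function name, the earliest-model tie-breaking rule, and the issue-type labels are illustrative assumptions.

```python
from collections import Counter


def hard_vote(predictions: list[list[str]]) -> list[str]:
    """Combine per-model label predictions by majority vote.

    predictions[m][i] is model m's predicted label for sample i.
    Assumed tie-break: the label predicted by the earliest model wins.
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = [model_preds[i] for model_preds in predictions]
        counts = Counter(votes)
        best = max(counts.values())
        # First label, in model order, that reaches the top vote count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Hypothetical issue-type predictions from three models for three reports.
preds = [
    ["bug", "task", "improvement"],   # model A
    ["bug", "improvement", "task"],   # model B
    ["task", "improvement", "bug"],   # model C
]
print(hard_vote(preds))  # ['bug', 'improvement', 'improvement']
```

The third report receives one vote per label, so the assumed tie-break falls back to model A's prediction.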
class but below average for the task class. The improvement class lies in between, with moderate F1-scores, suggesting partial success in distinguishing these instances while leaving room for better class-specific predictions.
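Table 1 reports both macro and weighted F1, and the paragraph above discusses per-class F1. The minimal stdlib sketch below shows how the three relate: per-class F1 = 2TP / (2TP + FP + FN), macro-F1 averages it equally over classes, and weighted-F1 weights each class by its support (number of true instances). The function name, labels, and predictions are made-up toy values, not the project's data.

```python
from collections import Counter


def f1_report(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Per-class F1 plus macro and weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    scores = {}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN), taken as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    # Macro: every class counts equally.
    macro = sum(scores.values()) / len(labels)
    # Weighted: each class counts in proportion to its support.
    weighted = sum(scores[c] * support[c] for c in labels) / len(y_true)
    result = {f"f1[{c}]": s for c, s in scores.items()}
    result.update(macro=macro, weighted=weighted)
    return result


y_true = ["bug", "bug", "bug", "bug", "task", "improvement"]
y_pred = ["bug", "bug", "bug", "bug", "bug", "task"]
report = f1_report(y_true, y_pred)
print(round(report["macro"], 3), round(report["weighted"], 3))  # 0.296 0.593
```

In this toy example one dominant, well-predicted class pulls weighted-F1 well above macro-F1. Every row of Table 1 shows the same direction of gap (e.g. 0.44 macro vs 0.57 weighted for Logistic Regression), which is consistent with the data imbalance noted in Section 6, though the sketch alone does not establish that.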
Figure 4. Per-Class Heatmap for Severity Classification
5. Unsupervised Learning
In this section, we cluster the bug reports into bug domains using unsupervised learning techniques.
point was assigned to the cluster with the highest posterior probability. The clustering results were visualized in both 2D and 3D using Principal Component Analysis (PCA). For computational efficiency, 25% of the dataset was sampled.
while the remaining clusters are more sparse. This indicates that a central pattern defines most of the data, while the other clusters represent less frequent, niche occurrences.
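Assigning each point to the cluster with the highest posterior probability is how a fitted mixture model labels data; the model is not named in the surviving text, so a Gaussian mixture with diagonal covariances is an assumption here. The minimal stdlib sketch below takes already-fitted, hypothetical mixture weights, means, and variances, computes each component's unnormalised log-posterior log(w_k) + log N(x | mean_k, diag(var_k)), and picks the argmax (the same rule scikit-learn's GaussianMixture.predict applies). It also shows uniform 25% subsampling. Fitting the mixture (e.g. by EM) and the PCA projection are not shown; all function names are illustrative.

```python
import math
import random


def log_gaussian_diag(x, mean, var):
    # Log density of x under a Gaussian with diagonal covariance.
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def assign_clusters(points, weights, means, variances):
    """Label each point with the component of highest posterior.

    Posterior_k is proportional to w_k * N(x | mean_k, diag(var_k)); the
    normaliser is shared by all components, so the argmax of the
    unnormalised log-posterior gives the same label.
    """
    labels = []
    for x in points:
        log_post = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        labels.append(max(range(len(log_post)), key=log_post.__getitem__))
    return labels


def subsample(points, fraction=0.25, seed=0):
    # Uniform random subsample without replacement (25% by default).
    rng = random.Random(seed)
    k = max(1, int(len(points) * fraction))
    return rng.sample(points, k)


# Hypothetical fitted parameters for a two-component mixture in 2D.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
points = [[0.1, -0.2], [4.8, 5.3], [2.0, 2.0]]
print(assign_clusters(points, weights, means, variances))  # [0, 1, 0]
print(len(subsample(list(range(100)))))  # 25
```

Working in log space avoids underflow when multiplying many small per-dimension densities, which matters for high-dimensional text features.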
6. Learnings & Contributions
The team learned data preprocessing techniques (e.g., tokenization, TF-IDF), applied supervised algorithms (e.g., Logistic Regression), and evaluated models using metrics such as accuracy and F1-score. Challenges included data imbalance, model generalization, and preprocessing unstructured text data; addressing them required fine-tuning and optimizing the models for bug severity and issue type classification. Contributions of each member: