
Identifying Bug Types & Severity in Open-Source Code

Ishir Bhardwaj (2022223)   Manit Kaushik (2022277)   Pranav Gupta (2022364)   Raghav Wadhwa (2022385)

1. Abstract

This report presents the outcomes of the CSE363 Machine Learning group project, guided by Prof. Jainendra Shukla. The project develops a machine learning system to predict bug severity and classify issue types, automating bug management to address challenges in large projects and to enhance workflow efficiency through supervised and unsupervised learning methods. Github repository for this project: Link

2. Introduction

The problem addressed in this project is the manual and time-consuming process of bug severity prediction and issue classification in software development. Using machine learning, the goal is to predict both the severity and the issue type of a software issue from its bug report, in a supervised learning context. Additionally, the project explores unsupervised learning methods to categorize bug domains such as memory, security, and GUI.

Figure 1. Project Flowchart

3. Literature Review

3.1. Not all bugs are the same: Understanding, characterizing, and classifying bug types

• Goal: Develop a taxonomy of bug types and create an automated model to classify bugs based on this taxonomy.

• Dataset: 1280 manually classified bug reports from 119 software projects belonging to ecosystems such as Mozilla, Apache and Eclipse.

• Features: Textual descriptions extracted from bug reports, including details such as error messages, file names, and system components; TF-IDF was then used to identify relevant terms.

• Method: Logistic Regression classifier with TF-IDF features derived from bug report summaries.

• Result: Identified 9 bug types; the model achieved 64% F-Measure and 74% AUC-ROC.

3.2. Machine Learning Approaches for Predicting the Severity Level of Software Bug Reports in Closed Source Projects

• Goal: Build prediction models to determine the severity class (severe or non-severe) of reported bugs.

• Dataset: Bug reports extracted from the JIRA bug tracking system used by the INTIX company, containing bug IDs and descriptions.

• Features: Each bug report is transformed into a Bag-of-Words feature vector using tokenization, stop-word removal, and stemming.

• Method: Naive Bayes, Multinomial Naive Bayes, Support Vector Machine (SVM), Decision Tree (J48), Random Forest, Logistic Model Trees (LMT), Decision Rules (JRip), and KNN.

• Result: The LMT algorithm reported the best performance, with Accuracy = 86.31%, AUC = 0.90, and F-measure = 0.91.
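Both surveyed approaches reduce to the same recipe: vectorize the report text (TF-IDF or Bag-of-Words) and train a standard classifier on the result. The sketch below illustrates that recipe with scikit-learn on toy data; it is not the papers' actual experimental setup, and the sample summaries and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy bug-report summaries with invented severe(1)/non-severe(0) labels.
summaries = [
    "application crashes with segmentation fault on startup",
    "null pointer exception corrupts saved user data",
    "tooltip text slightly misaligned in dark mode",
    "typo in settings menu label",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()             # summaries -> sparse TF-IDF matrix
X = vectorizer.fit_transform(summaries)
clf = LogisticRegression().fit(X, labels)  # linear classifier over TF-IDF terms

new_report = ["browser crashes when opening a large file"]
print(clf.predict(vectorizer.transform(new_report)))  # expected: [1]; 'crashes' signals severity
```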

4. Supervised Learning

This section covers two tasks: classifying bug severity and classifying issue type.

4.1. Dataset & EDA for Classifying Bug Severity

The bug reports from Eclipse & Mozilla include a severity field with five levels: blocker, critical, major, minor & trivial. Each bug report also has an ID and a description of the issue.

Figure 2. EDA Graphs for Bug Severity Dataset

4.2. Dataset & EDA for Classifying Issue Type

The issue reports in the JIRA bug tracking system carry multiple categories for 'Bug Type'. For our project, we grouped these categories into three broader issue types: Defect, Improvement & Task.

Figure 3. EDA Graphs for Issue Type Dataset

4.3. Data Pre-processing for Supervised Learning

The NLP preprocessing steps applied to both datasets included tokenization to split text into individual words or tokens, lowercasing to ensure uniformity, and the removal of stop words like "the" and "is" to reduce noise. Additionally, non-alphabetic characters were removed and lemmatization was performed to reduce words to their base forms, such as converting "running" to "run".

Example: "Stack overflow with namespace aliases" becomes "stack overflow namespace alias".
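A minimal sketch of this preprocessing chain using NLTK (the report does not name the library used, so this is one plausible implementation; it assumes the standard NLTK resources are available):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads of the required resources ("punkt_tab" on newer NLTK).
for resource in ("punkt", "punkt_tab", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> str:
    tokens = word_tokenize(text.lower())                 # tokenize + lowercase
    tokens = [t for t in tokens if t.isalpha()]          # drop non-alphabetic tokens
    tokens = [t for t in tokens if t not in stop_words]  # remove stop words
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)  # reduce to base forms

print(preprocess("Stack overflow with namespace aliases"))
# -> "stack overflow namespace alias"
```

The final print reproduces the example transformation above: "with" is dropped as a stop word, and "aliases" is lemmatized to "alias".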
4.4. Methodology for Predicting Bug Severity

After preprocessing, features are extracted using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization, which measures a word's importance within a document relative to the corpus. We included both unigrams and bigrams in the TF-IDF vectorization to capture single words as well as meaningful word pairs.

After TF-IDF vectorization, Latent Semantic Analysis (LSA) is performed using Truncated Singular Value Decomposition (SVD) to reduce the feature space to 1000 components, minimizing noise while preserving essential information. The refined features are used as input for several machine learning models: a multinomial Logistic Regression with the lbfgs solver and 1000 maximum iterations; a Decision Tree and a Random Forest, both with a maximum depth of 100; and a Multilayer Perceptron with two hidden layers (100 and 50 units), 1000 maximum iterations, tanh activation, and an sgd solver. Additionally, an Ensemble Voting classifier combines the predictions of all models through majority voting.
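A condensed scikit-learn sketch of this pipeline, using the hyperparameters stated above; the toy corpus and labels are placeholders, and the SVD component count is capped only so the snippet runs on small inputs (the project uses the full 1000 components):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier

# Placeholder corpus; in the project this is the full set of preprocessed
# bug descriptions and their severity labels.
texts = [
    "stack overflow namespace alias",
    "crash startup null pointer dereference",
    "tooltip misaligned dark mode",
    "build failure missing linker symbol",
]
severities = ["critical", "blocker", "trivial", "major"]

# TF-IDF over unigrams and bigrams.
tfidf = TfidfVectorizer(ngram_range=(1, 2))
X_tfidf = tfidf.fit_transform(texts)

# LSA via truncated SVD; the report uses 1000 components (capped here only
# so the toy corpus runs).
n_comp = min(1000, X_tfidf.shape[0] - 1, X_tfidf.shape[1] - 1)
X = TruncatedSVD(n_components=n_comp, random_state=42).fit_transform(X_tfidf)

estimators = [
    ("lr", LogisticRegression(solver="lbfgs", max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=100)),
    ("rf", RandomForestClassifier(max_depth=100)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(100, 50), activation="tanh",
                          solver="sgd", max_iter=1000)),
]
# Majority-vote ensemble over all four models.
ensemble = VotingClassifier(estimators=estimators, voting="hard").fit(X, severities)
```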
4.5. Analysis & Conclusion for Predicting Bug Severity

The results indicate that Logistic Regression and Ensemble Voting both achieved the highest accuracy of 0.60. Decision Tree performed the worst with an accuracy of 0.44, while Random Forest and Multilayer Perceptron (MLP) performed similarly with accuracies of 0.57 and 0.59, respectively. In terms of the weighted F1-score, MLP showed strong performance with a score of 0.58, outperforming the other models. The macro F1-scores were generally lower, with the ensemble model achieving a score of 0.44, reflecting the overall balance in model performance.

Class-level prediction for the major and minor classes is relatively strong, while prediction for the blocker class remains below average across all models. The Multilayer Perceptron (MLP) model achieves a notably higher F1-score across all classes than the other models, demonstrating more consistent performance.

Table 1. Performance Metrics for Bug Severity Models

Model                   Macro F1   Weighted F1   Accuracy
Logistic Regression     0.44       0.57          0.60
Decision Tree           0.34       0.44          0.44
Random Forest           0.39       0.53          0.57
Multilayer Perceptron   0.48       0.58          0.59
Ensemble Voting         0.44       0.56          0.60
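The macro and weighted F1-scores reported here are the standard scikit-learn aggregations: macro averages the per-class F1 equally, while weighted averages by class support. A toy illustration with hypothetical labels:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical true vs. predicted severity labels, for illustration only.
y_true = ["major", "minor", "blocker", "major", "minor", "trivial"]
y_pred = ["major", "minor", "major", "major", "minor", "minor"]

print("accuracy:   ", accuracy_score(y_true, y_pred))
print("macro F1:   ", f1_score(y_true, y_pred, average="macro"))     # classes weighted equally
print("weighted F1:", f1_score(y_true, y_pred, average="weighted"))  # weighted by class support
```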

Figure 4. Per Class Heatmap for Severity Classification

4.6. Analysis & Conclusion for Classifying Issue Type

The overall performance metrics, including accuracy, macro F1-score, and weighted F1-score, highlight the strong performance of Logistic Regression and MLP, both achieving high macro and weighted F1-scores of 0.60 and 0.71 and accuracies of 0.72 and 0.73, respectively. Majority Voting showed moderate performance with a weighted F1-score of 0.68, while Decision Tree and Random Forest performed lower, particularly the Decision Tree, which struggled with class balance.

Table 2. Performance Metrics for Issue Type Models

Model                   Macro F1   Weighted F1   Accuracy
Logistic Regression     0.60       0.72          0.71
Decision Tree           0.48       0.58          0.58
Random Forest           0.50       0.68          0.53
Multilayer Perceptron   0.60       0.73          0.71
Ensemble Voting         0.56       0.71          0.68

Figure 5. Per Class Heatmap for Issue Type Classification

All models perform decently well for the defect class but below average for the task class. The improvement class lies in the middle, with moderate F1-scores, suggesting partial success in distinguishing these instances but leaving room for better class-specific predictions.

5. Unsupervised Learning

In this section, we try to cluster the bug reports into bug domains using unsupervised learning techniques.

5.1. Dataset & Data Pre-processing

For this task, we used the DeepTriage dataset, which contains bug reports from Google Chromium, Mozilla Core, & Mozilla Firefox, each including the bug ID, title, and description. After removing duplicates and null values, the total number of samples came out to be 116,371.

The descriptions of the bug reports are used as the unlabelled dataset. The pre-processing steps were the same as before, i.e. Tokenization, Lowercasing, Stopword Removal, Non-Alphabetic Character Removal & Lemmatization.
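A sketch of this filtering step with pandas; the file and column names are assumptions for illustration, not the dataset's actual schema:

```python
import pandas as pd

# Load the combined DeepTriage reports; "bug_id", "title", and "description"
# are assumed column names, and the file name is illustrative.
df = pd.read_csv("deeptriage_bug_reports.csv")

df = df.drop_duplicates(subset=["title", "description"])  # remove duplicate reports
df = df.dropna(subset=["title", "description"])           # drop null values

print(len(df))  # 116,371 at this point, per the report
descriptions = df["description"].tolist()  # unlabelled corpus for clustering
```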
5.2. Methodology for Clustering

After pre-processing the bug descriptions, 5000 features per sample are created using TF-IDF Vectorization. The resulting feature space is then fed to the following unsupervised paradigms (a combined code sketch follows the list):

1. K-Means Clustering: The optimal number of clusters, k, was determined using the Elbow and Knee methods. A word cloud was then generated for each cluster to visually represent the most frequent terms within it.

2. Gaussian Mixture Model (GMM): Clusters were created using the GMM approach, with the number of clusters (k) obtained from the previous step. Each data point was assigned to the cluster with the highest posterior probability. The clustering results were visualized in both 2D and 3D using Principal Component Analysis (PCA). For computational efficiency, 25% of the dataset was sampled.
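A compact sketch of both paradigms on a placeholder corpus: the elbow scan over k, K-Means fitting, GMM assignment by highest posterior probability, and PCA projections for the 2D/3D plots. Variable names and the toy descriptions are illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA

# Placeholder corpus; the project uses ~116k preprocessed bug descriptions
# (and samples 25% of them for the GMM step, for computational efficiency).
descriptions = [
    "stack overflow namespace alias",
    "crash build test failed valgrind exception",
    "linux useragent intel macintosh gecko window",
    "null missing logging run compilation",
]

# 5000 TF-IDF features per sample, as described above.
X = TfidfVectorizer(max_features=5000).fit_transform(descriptions)

# Elbow method: scan k and record the within-cluster sum of squares (inertia).
wcss = []
for k in range(1, len(descriptions) + 1):
    wcss.append(KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_)
# Plotting wcss vs. k and locating the knee gave k = 7 in the report.

best_k = 2  # stand-in for the toy corpus; the report's elbow analysis found k = 7
kmeans_labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X)

# GMM over a dense view of the features; each point goes to the cluster
# with the highest posterior probability (equivalent to gmm.predict).
X_dense = X.toarray()
gmm = GaussianMixture(n_components=best_k, random_state=42).fit(X_dense)
gmm_labels = gmm.predict_proba(X_dense).argmax(axis=1)

# PCA projections for the 2D and 3D cluster visualizations.
coords_2d = PCA(n_components=2).fit_transform(X_dense)
coords_3d = PCA(n_components=3).fit_transform(X_dense)
```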

5.3. Analysis & Conclusion for K-Means Clustering

In the Within-Cluster Sum of Squares (WCSS) vs. k graph, the 'elbow' point occurs at k=7, which indicates the optimal number of clusters; this is the point where the rate of decrease in WCSS slows significantly.

Labels for the clusters are manually assigned based on insights derived from their respective word clouds. For instance, one cluster had terms like 'linux', 'useragent', 'intel', 'macintosh', 'mac', 'window', and 'gecko', which point to compatibility or configuration issues across various operating systems and hardware environments; it was labeled OS Related. Another cluster included words such as 'crash', 'build', 'run', 'test', 'logging', 'failed', 'valgrind', 'null', 'missing', and 'exception', reflecting errors encountered during the build and compilation processes; it was labeled Compilation Errors. These clusters are visualized below (a word-cloud sketch follows the figures):

Figure 6. Cluster 1: OS Related

Figure 7. Cluster 2: Compilation Errors
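The per-cluster word clouds behind this labeling can be generated with the wordcloud package; a minimal sketch, reusing the (illustrative) descriptions and kmeans_labels from the clustering sketch above:

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Build one word cloud per K-Means cluster from the concatenated
# descriptions assigned to it (descriptions / kmeans_labels as above).
for cluster_id in sorted(set(kmeans_labels)):
    text = " ".join(doc for doc, label in zip(descriptions, kmeans_labels)
                    if label == cluster_id)
    cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
    plt.figure()
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(f"Cluster {cluster_id}")
plt.show()
```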
5.4. Analysis & Conclusion for Gaussian Mixture Model

The Gaussian Mixture Model (GMM) clustering reveals an imbalanced distribution across the 7 clusters, with two dominant groups containing the majority of the data points, while the remaining clusters are more sparse. This indicates that a central pattern defines most of the data, while the other clusters represent less frequent, niche occurrences.

Figure 8. 2D & 3D Visualization

This imbalance in cluster distribution is mirrored in the visualizations, where the dominant groups stand out clearly, but overlapping regions hint at the complexities within the smaller clusters. Both the 2D and 3D visualizations demonstrate successful clustering, with data points generally well-separated. However, some overlap between clusters suggests underlying similarities or noise. This overlap may decrease in higher dimensions, where cluster separation could become more pronounced. As the current dimensionality may not fully capture the data's complexity, further analysis in higher dimensions could enhance the clarity of cluster distinctions.

6. Learnings & Contributions

The team learned data preprocessing techniques (e.g., tokenization, TF-IDF), applied supervised algorithms (e.g., Logistic Regression), and evaluated models using metrics such as accuracy and F1-score. Challenges included data imbalance, model generalization, and preprocessing unstructured text data, which required fine-tuning and optimization for bug severity and issue classification. Contributions of each member:

• Ishir Bhardwaj: Unsupervised Learning & project report
• Manit Kaushik: Supervised Learning & project presentation
• Pranav Gupta: Unsupervised Learning & project report
• Raghav Wadhwa: Supervised Learning & project presentation

7. References

[1] Baarah, A., Al-oqaily, A., Salah, Z., Sallam, M., & Al-qaisy, M. (2019). Machine Learning Approaches for Predicting the Severity Level of Software Bug Reports in Closed Source Projects. IJACSA.

[2] Tan, Y., Xu, S., Wang, Z., Zhang, T., Xu, Z., & Luo, X. (2020). Bug Severity Prediction Using Question-and-Answer Pairs from Stack Overflow. Journal of Systems and Software, 110567.

[3] Catolino, G., Palomba, F., Zaidman, A., & Ferrucci, F. (2019). Not All Bugs Are the Same: Understanding, Characterizing, and Classifying Bug Types. Journal of Systems and Software.

[4] Mani, S., Sankaran, A., & Aralikatte, R. (IBM Research, India). DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging.

[5] Lamkanfi, A., Perez, J., & Demeyer, S. (2013). The Eclipse and Mozilla Defect Tracking Dataset: A Genuine Dataset for Mining Bug Information. Proceedings of the 10th Working Conference on Mining Software Repositories (MSR '13).

[6] Ahmed, H. A., Bawany, N. Z., & Shamsi, J. A. (n.d.). CaPBug: A Framework for Automatic Bug Categorization and Prioritization Using NLP and Machine Learning Algorithms.
