Bug Classification Accuracy Report Updated

The document outlines the implementation of Zero-Shot Learning (ZSL) for bug categorization, utilizing TF-IDF for keyword extraction and the Google T5 model for dynamic categorization without predefined labels. It details the process of mapping bug reports to categories based on extracted keywords and evaluates classification accuracy through various metrics before and after refinements. Additionally, it describes a contrastive learning approach using a Siamese network and MLP classifier for further bug categorization improvements.


Zero-Shot Learning for Bug Categorization: Detailed Explanation

We implemented Zero-Shot Learning (ZSL) to automatically categorize bug reports without predefined labels. Instead of manually specifying categories, we used machine learning techniques to extract relevant keywords, classify them into meaningful groups, and map bug reports accordingly.

Step 1: Extracting Important Keywords using TF-IDF

Bug reports contain unstructured text, making it difficult to categorize them directly. To
extract meaningful information, we applied TF-IDF (Term Frequency-Inverse Document
Frequency) to identify the most important words in the dataset.

Why Use TF-IDF?

TF-IDF helps determine which words are:

• Frequent but meaningful (e.g., "error", "database", "performance").

• Rare yet relevant (e.g., "deadlock", "segmentation fault").

By computing TF-IDF scores, we ranked the top 2000 words that best represent the bug
reports. These words were stored as "extracted keywords", forming the basis for
categorization.

Step 2: Categorizing Keywords using Google T5 Model

Once we had a list of important words, the next challenge was to group them into
meaningful bug categories. Instead of manually defining these categories, we leveraged
Google T5 (Flan-T5-Large) to perform Zero-Shot Learning (ZSL).

Why Use Google T5?

• Zero-shot capability → No labeled data required.

• Understands contextual relationships → Assigns keywords to relevant categories dynamically.

• Text-to-text framework → Accepts keywords as input and outputs a category name.

For each extracted keyword, we prompted Google T5 to classify it into a relevant bug
category. The model dynamically generated categories such as:

▪ Functionality & Behavior Issues
▪ Crash & Failure Issues
▪ Build & Compilation Issues
▪ UI & Display Issues
▪ Performance & Resource Issues

The results were stored in a keyword-to-category mapping, which was later used to
classify bug reports.
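The per-keyword prompting step could look like the sketch below. The prompt wording and example category are hypothetical, since the report does not give the exact prompt used; the actual Flan-T5-Large call (shown in comments) would go through the Hugging Face transformers pipeline.

```python
# Hypothetical prompt template for zero-shot keyword categorization;
# the report does not state the exact wording used with Flan-T5-Large.
def build_prompt(keyword):
    return ("Classify the software bug keyword '" + keyword +
            "' into a short bug category name "
            "(e.g. 'Crash & Failure Issues').")

# The real pipeline would feed this prompt to google/flan-t5-large, e.g.:
#   from transformers import pipeline
#   t5 = pipeline("text2text-generation", model="google/flan-t5-large")
#   category = t5(build_prompt("deadlock"))[0]["generated_text"]
# Results are accumulated into a keyword-to-category mapping:
keyword_to_category = {"deadlock": "Crash & Failure Issues"}  # illustrative
print(build_prompt("deadlock"))
```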

Step 3: Mapping Bug Reports to Categories

With a keyword-to-category mapping in place, we categorized bug reports based on which keywords appeared in their descriptions.

How Were Bug Reports Categorized?

1. Tokenized each bug description → Extracted words from the report.
2. Checked whether any extracted keyword was present → Matched words against the T5-generated categories.
3. Assigned the most frequent category → If multiple keywords mapped to different categories, the most common one was selected.

This approach allowed us to classify bug reports without explicitly training a model,
making it a fully zero-shot learning pipeline.
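The assignment logic reduces to a token lookup plus a majority vote, as in this minimal sketch. The mapping entries are illustrative stand-ins for the real T5 output.

```python
# Minimal sketch of the zero-shot assignment step: tokenize each
# description, look up every token in the keyword-to-category mapping,
# and assign the most frequent category.
import re
from collections import Counter

keyword_to_category = {
    "crash": "Crash & Failure Issues",
    "segmentation": "Crash & Failure Issues",
    "misaligned": "UI & Display Issues",
    "leak": "Performance & Resource Issues",
}

def categorize(description, mapping=keyword_to_category):
    tokens = re.findall(r"[a-z]+", description.lower())
    hits = Counter(mapping[t] for t in tokens if t in mapping)
    # most_common returns [] when no keyword matched
    return hits.most_common(1)[0][0] if hits else None

print(categorize("App crash with segmentation fault"))
```

Reports with no matching keywords fall through to `None`, which in practice would need a fallback bucket.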

1. Bug Classification Accuracy Report

1.1 Keyword Overlap Accuracy

Measures the proportion of words in the bug description that match the assigned category's keywords.

Keyword Overlap Accuracy = (Matched Keywords in Description / Total Keywords in Category) × 100

Before Refinement: 8.08%

After Refinement: 4.02%

1.2 Category Confidence Score

Evaluates how confidently each bug report matches its assigned category based on keyword occurrences.

Category Confidence Score = Number of Matched Keywords in Description

Before Refinement: 0.68

After Refinement: 0.72

1.3 Entropy-Based Consistency

Measures how distinct or ambiguous classifications are. Lower entropy indicates better separation of categories.

Entropy = - Σ P(Category) × log(P(Category))

Before Refinement: 1.27

After Refinement: 0.31
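The three metrics above can be sketched directly from their definitions. The inputs below are illustrative, not values from the report's dataset.

```python
# Sketch of the three report metrics, computed from their definitions
# on illustrative inputs.
import math
from collections import Counter

def keyword_overlap_accuracy(desc_words, category_keywords):
    # (matched keywords in description) / (total keywords in category) * 100
    matched = len(set(desc_words) & set(category_keywords))
    return matched / len(category_keywords) * 100

def category_confidence(desc_words, category_keywords):
    # number of matched keywords in the description
    return len(set(desc_words) & set(category_keywords))

def category_entropy(assigned_categories):
    # Entropy = - sum P(category) * log(P(category)); lower = crisper split
    counts = Counter(assigned_categories)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

desc = ["crash", "segmentation", "fault", "startup"]
cat_kw = ["crash", "segmentation", "deadlock", "abort"]
print(keyword_overlap_accuracy(desc, cat_kw))  # 50.0
print(category_confidence(desc, cat_kw))       # 2
print(category_entropy(["A", "A", "B"]))
```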

2. Refinements Made
To improve classification accuracy, the following refinements were applied:

- Expanded Keyword Sets – Added multi-word phrases (e.g., 'memory leak' instead of
'memory' alone).

- Weighted Phrase Matching – Increased focus on critical phrases instead of single words.
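Both refinements amount to matching on weighted multi-word phrases rather than bare tokens. The sketch below illustrates the idea; the phrases and weights are made up for illustration, not the report's actual values.

```python
# Hedged sketch of the two refinements: multi-word phrases in the keyword
# set, and per-phrase weights so critical phrases outweigh single words.
# Phrases and weights here are illustrative.
weighted_phrases = {
    "memory leak": 3.0,         # multi-word phrase, weighted up
    "segmentation fault": 3.0,
    "memory": 1.0,              # generic single word, weighted down
    "slow": 1.0,
}

def phrase_score(description, phrases=weighted_phrases):
    text = description.lower()
    # substring match is the simplest scheme; critical phrases score higher
    return sum(w for p, w in phrases.items() if p in text)

print(phrase_score("memory leak in background process"))
```

Note that "memory leak" also triggers the "memory" entry under plain substring matching; a production version would match longest phrases first.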

Metric                          Ideal Value
Keyword Overlap Accuracy (%)    5% - 20%
Category Confidence Score       > 3
Entropy Score                   < 1

Category Occurrences:

Functionality & Behavior Issues    46252
Crash & Failure Issues             15173
Build & Compilation Issues         14652
UI & Display Issues                11180
Performance & Resource Issues       1425
Tools used by companies for bug triaging

Company: Pegasystems
Tool/Model Name: Deep Learning-Based Bug Report Classifier
Categorization Methods: Assigns bug reports to 47 engineering teams
Tools/Technologies Used: Deep Learning
Source: GitHub.com

Company: Mozilla & Eclipse
Tool/Model Name: CaPBug Framework
Categorization Methods: Program Anomaly, GUI, Network, Security, Performance, Usability
Tools/Technologies Used: NLP, Supervised Machine Learning
Source: IEEE Xplore

Company: GitHub
Tool/Model Name: Automatic Issue Classifier (AIC)
Categorization Methods: Bug, Enhancement, Question
Tools/Technologies Used: RoBERTa-based Neural Network
Source: arXiv.org

Company: Semgrep
Tool/Model Name: Code Analysis Tool
Categorization Methods: Identifies security flaws and classifies bugs related to vulnerabilities
Tools/Technologies Used: Code Parsing, Pattern Matching
Source: Company Website

Company: Turna62
Tool/Model Name: Bug Severity Prediction Model
Categorization Methods: normal, enhancement, major, minor, critical, blocker, trivial
Tools/Technologies Used: LSTM (Long Short-Term Memory) Neural Networks
Source: GitHub.com

Company: Kualitee
Tool/Model Name: Software Testing & Quality Management
Categorization Methods: Uses decision trees, SVM, and neural networks to classify software bugs
Tools/Technologies Used: Machine Learning (Decision Trees, SVM, Neural Networks)
Source: Kualitee.com
Workflow: Contrastive Learning using Siamese Network + MLP Classifier for Bug
Categorization

1. Data Preparation

• Load the bug report dataset (CSV file).

• Extract bug descriptions and category labels.

• Convert text into numerical embeddings using BERT/SBERT to represent each bug
report as a vector.

2. Generate Training Pairs for Contrastive Learning

Contrastive learning requires paired samples to help the model learn relationships between
different bug reports:

• Positive Pairs: Two bug reports from the same category.

• Negative Pairs: Two bug reports from different categories.

Example:

This helps the model learn similarities within a category and differences across categories.

Bug ID  Description                                             Category
1       Application crashes on startup with segmentation fault  Crash & Failure Issues
2       Login page buttons are misaligned on mobile view        UI & Display Issues
3       Memory leak detected in background process              Performance & Resource Issues
4       Segmentation fault occurs when clicking 'Save'          Crash & Failure Issues
5       Slow response time when fetching API data               Performance & Resource Issues

Anchor Bug  Positive Bug (Same Category)  Negative Bug (Different Category)
1           4                             2
3           5                             1
2           None                          4
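Pair generation reduces to grouping reports by category, as in this sketch over toy data mirroring the table above:

```python
# Sketch of positive/negative pair generation for contrastive training:
# positives share a category, negatives do not. Toy data mirrors the
# example table above.
from itertools import combinations

reports = {
    1: "Crash & Failure Issues",
    2: "UI & Display Issues",
    3: "Performance & Resource Issues",
    4: "Crash & Failure Issues",
    5: "Performance & Resource Issues",
}

positive_pairs = [(a, b) for a, b in combinations(reports, 2)
                  if reports[a] == reports[b]]
negative_pairs = [(a, b) for a, b in combinations(reports, 2)
                  if reports[a] != reports[b]]

print(positive_pairs)       # [(1, 4), (3, 5)]
print(len(negative_pairs))  # the remaining 8 of the 10 possible pairs
```

In practice negative pairs are usually subsampled so the two classes stay balanced.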
3. Train a Siamese Network for Embedding Learning

• Define a Siamese Network that processes pairs of bug reports and generates their
embeddings.

• Train the network using Contrastive Loss (Triplet Loss), which ensures that
embeddings from the same category are closer together while embeddings from
different categories are farther apart.
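The triplet objective described above can be written down in a few lines. This is a minimal numpy sketch of the loss itself, not the full training loop; the 2-dimensional embeddings are illustrative.

```python
# Minimal numpy sketch of the triplet loss used to train the Siamese
# network: the anchor-positive distance should be smaller than the
# anchor-negative distance by at least `margin`.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.linalg.norm(anchor - positive)  # same-category distance
    d_neg = np.linalg.norm(anchor - negative)  # cross-category distance
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])   # close: same category
n = np.array([-1.0, 0.0])  # far: different category
print(triplet_loss(a, p, n))  # 0.0: this triplet already satisfies the margin
```

During training the loss is averaged over mined triplets and backpropagated through the shared encoder.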

4. Extract Embeddings for Classification

• Once the Siamese Network is trained, use it to generate vector embeddings for all
bug reports.

• These embeddings are used as input features for classification instead of raw text.

5. Train an MLP Classifier

• Use the embeddings from the Siamese Network to train a Multi-Layer Perceptron
(MLP) classifier.

• The MLP consists of:

o An input layer that accepts embeddings.

o One or more hidden layers with nonlinear activation functions.

o An output layer that predicts the bug category using Cross-Entropy Loss.

• Train the classifier using Adam optimizer and evaluate performance on a test set.
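As a stand-in for the hand-built network described above, the same setup can be sketched with scikit-learn's MLPClassifier, which uses the Adam solver and a cross-entropy (log-loss) objective. The random embeddings below are placeholders for the Siamese outputs.

```python
# Sketch of the downstream MLP classifier over Siamese embeddings, using
# scikit-learn's MLPClassifier (Adam solver, log-loss objective) in place
# of a hand-rolled network. Embeddings here are random stand-ins.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# 100 fake 32-dim embeddings from two well-separated categories
X = np.vstack([rng.normal(0, 1, (50, 32)), rng.normal(5, 1, (50, 32))])
y = np.array([0] * 50 + [1] * 50)

# One hidden layer with ReLU; sizes/hyperparameters are illustrative
clf = MLPClassifier(hidden_layer_sizes=(64,), solver="adam",
                    max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # separable toy clusters, so training accuracy is high
```

A real evaluation would hold out a test split rather than scoring on the training data.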

6. Evaluate Model Performance

The performance of the classification model is measured using the following metrics:

• Accuracy: Measures the percentage of correctly classified bug reports.

• F1-score: Evaluates precision and recall for each category.

• Entropy-Based Consistency: Ensures that different categories have well-separated embeddings.

• Hyperparameter tuning: Optimizes learning rate, batch size, and network depth for
improved performance.
