Flakify Journal Paper
• Ideally, tests should either always pass (the code works) or always fail (there is a problem).
• Flaky tests might pass one time but fail another time even though the code has not changed.
• A developer working on a project with many tests may find that some of them are flaky.
• This becomes challenging because such tests cannot be relied on to tell whether the code is good or bad;
they give false signals.
OBJECTIVE
• The paper presents Flakify, a tool designed to address the problem of flaky tests in software
development.
• Flaky tests are tests that sometimes pass and sometimes fail even though the code being tested has not changed.
• These tests cause confusion and waste time for developers.
• Flakify predicts flaky test cases without executing them, relying exclusively on the test code.
• It uses machine learning and language models to understand the test code and make predictions about its
behavior.
• It does not need to run the tests multiple times or look at the code being tested (the production code).
PROPOSED WORK
• Flakify is a black-box predictor designed to identify flaky test cases without needing access to the
production code.
• This distinguishes Flakify from existing solutions, which often rely on rerunning test cases
multiple times or on analyzing the production code.
• The proposed approach applies a language model, specifically CodeBERT, to analyze the source code of test
cases.
• CodeBERT is fine-tuned to classify test cases as flaky or non-flaky based only on the test code
(a minimal fine-tuning sketch is shown after this list).
• This eliminates the requirement to access the production code of the system under test.
• By being independent of the production code, Flakify can be deployed more easily, used in
different setups, and scaled across a wide range of software projects without facing the limitations
that come with relying on production code.
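
The following is a minimal sketch of how such fine-tuning could be set up, assuming the Hugging Face transformers and datasets libraries; the file names, column names, and hyperparameters are illustrative placeholders, not the authors' exact configuration.

# Minimal sketch: fine-tuning CodeBERT as a binary flaky/non-flaky classifier.
# Assumes the Hugging Face "transformers" and "datasets" libraries; file names,
# column names, and hyperparameters are illustrative, not the paper's exact setup.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import load_dataset

MODEL_NAME = "microsoft/codebert-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical CSV files with two columns: "test_code" (source of the test case)
# and "label" (1 = flaky, 0 = non-flaky).
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # CodeBERT accepts at most 512 tokens, so longer test methods are truncated.
    return tokenizer(batch["test_code"], truncation=True,
                     padding="max_length", max_length=512)

data = data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="flakify-codebert", num_train_epochs=3,
                         per_device_train_batch_size=8, learning_rate=2e-5)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"], eval_dataset=data["test"])
trainer.train()
trainer.evaluate()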
CodeBERT
• CodeBERT is a pre-trained language model that has been trained on a large dataset containing source
code from various programming languages paired with natural language text.
• This pre-training enables CodeBERT to understand the syntax and semantics of both code and natural language,
supporting tasks such as:
Code search
Code documentation
Bug prediction
• CodeBERT provides a way to automatically identify patterns in the syntax and semantics of code
without the need for manually defined features (see the encoding sketch below).
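
As a rough illustration of how a pre-trained CodeBERT can turn raw test code into a feature vector without hand-crafted features, the sketch below encodes a snippet with the published microsoft/codebert-base checkpoint; the Java snippet and variable names are invented for the example.

# Minimal sketch: encoding a test-code snippet with pre-trained CodeBERT.
# The snippet is an invented example; only "microsoft/codebert-base" is the
# published checkpoint name.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

test_code = """
@Test
public void testDownload() throws Exception {
    Thread.sleep(5000);                 // waits for an external server
    assertTrue(client.isFinished());
}
"""

inputs = tokenizer(test_code, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# The [CLS] vector (first token) is a single embedding summarizing the snippet;
# a classification head on top of it can be trained to predict flaky vs. non-flaky.
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 768])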
DATASET
Flakify was evaluated using two publicly available datasets (a small inspection sketch follows this list).
• FlakeFlagger dataset
22,236 Java test cases from GitHub projects, labeled as flaky or non-flaky based on 10,000 executions.
After excluding irrelevant test cases, 21,661 were used for evaluation.
• IDoFT dataset
1,263 flaky test cases from 179 projects; their fixed versions were used to obtain non-flaky test cases.
The updated dataset consists of 3,862 test cases after adding the 594 non-flaky (fixed) tests.
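
A small, hypothetical sketch of inspecting such a labeled dataset is shown below; the file name and column names are placeholders and do not reflect the actual FlakeFlagger or IDoFT schemas.

# Hypothetical inspection of a labeled flaky-test dataset; "flaky_tests.csv" and
# the columns "project" and "label" (1 = flaky, 0 = non-flaky) are placeholders.
import pandas as pd

df = pd.read_csv("flaky_tests.csv")

# Class balance matters here: flaky tests are a small minority of all test cases,
# which is why precision, recall, and F1-score are reported instead of accuracy.
print(df["label"].value_counts())
print(df.groupby("project")["label"].mean().sort_values(ascending=False).head())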
TEST SMELLS
• These are indicators found within the test code that suggest potential problems.
• Test smells hint at areas in the test suite that might need attention.
Detecting them involves examining the test code without executing it (a hypothetical example is shown below).
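
The hypothetical test below illustrates one such smell, a fixed sleep that makes the outcome timing-dependent; the endpoint and values are invented for the example.

# Hypothetical example of a test smell that static inspection can flag without
# running the test: the fixed sleep makes the outcome depend on timing, so the
# test may pass or fail for the same code.
import time
import requests  # the URL below is a placeholder, not a real service

def test_report_is_generated():
    requests.post("http://localhost:8080/reports", json={"id": 42})
    time.sleep(2)                      # smell: waits a fixed time instead of polling
    resp = requests.get("http://localhost:8080/reports/42")
    assert resp.status_code == 200     # flaky if the server needs more than 2 seconds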
METHODOLOGY INVOLVED
• Data collection
• Data preprocessing
Test cases are preprocessed to extract the code statements related to test smells.
• Precision
It measures the ability of the classification model to precisely predict flaky test cases, i.e., how many of the tests predicted as flaky are actually flaky.
• Recall
It measures the ability of the model to identify all of the actually flaky test cases.
• F1-Score
It is the harmonic mean of precision and recall, providing a balance between the two metrics (a short computation sketch follows this list).
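
The sketch below shows how these three metrics could be computed with scikit-learn; the label vectors are made-up examples, not results from the paper.

# Minimal sketch of the evaluation metrics using scikit-learn; the label vectors
# are invented examples (1 = flaky, 0 = non-flaky), not data from the paper.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # classifier predictions

precision = precision_score(y_true, y_pred)   # of tests predicted flaky, how many are flaky
recall    = recall_score(y_true, y_pred)      # of truly flaky tests, how many are found
f1        = f1_score(y_true, y_pred)          # harmonic mean of precision and recall

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")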
SUMMARY
• Flakify is a black-box predictor that relies solely on the source code of test cases.
• This contrasts with white-box predictors, which require access to the production code.
• Flakify does not require a pre-defined set of features for predicting flaky test cases.
• Flakify incorporates test smell detection techniques to enhance its prediction capability.
• Flakify demonstrates a reduction in the cost of debugging test cases and production code compared to
existing predictors.
• The reduction in wasted resources highlights the practical benefit of using Flakify for detecting flaky tests.
LITERATURE REVIEW
1. Title: FlakeFlagger: Predicting flakiness without rerunning tests
Authors: A. Alshammari, C. Morris, M. Hilton, and J. Bell
Year: 2021
Description: The paper describes FlakeFlagger, a novel approach for predicting flakiness in tests without rerunning them. Inspired by previous studies on test flakiness, FlakeFlagger uses a hybrid static/dynamic framework to collect behavioral features of tests, such as API usage and test smells. By analyzing these features, FlakeFlagger constructs a classifier to predict which tests are likely to be flaky. This approach aims to improve the detection of flaky tests and enhance the reliability of the testing process.

2. Title: A replication study on the usability of code vocabulary in predicting flaky tests
Authors: G. Haben, S. Habchi, M. Papadakis, M. Cordy, and Y. L. Traon
Year: 2021
Description: The paper describes a replication study on the usability of code vocabulary in predicting flaky tests. Researchers from the University of Luxembourg evaluate a method for predicting flaky tests based on their vocabulary, aiming to improve industrial adoption and practice. Key findings include the impact of time-sensitive validation, generalizability to different programming languages, and the influence of features on prediction performance.

3. Title: Surveying the developer experience of flaky tests
Authors: O. Parry, G. M. Kapfhammer, M. Hilton, and P. McMinn
Year: 2021
Description: The paper explores the experiences of software developers with flaky tests, which are tests that pass and fail without changes to the code under test. Flaky tests are a significant challenge in software development as they disrupt continuous integration, reduce productivity, and erode confidence in testing. The study investigates the impacts and causes of flaky tests through a survey of 170 developers, highlighting factors beyond code that contribute to test flakiness. The research emphasizes the need for a broader definition of flaky tests to include environmental factors and recommends considering test behavior in different execution environments.

4. Title: An empirical study on software defect prediction using CodeBERT model
Authors: C. Pan, M. Lu, and B. Xu
Year: 2021
Description: The paper presents an empirical study on software defect prediction using the CodeBERT model. It introduces various CodeBERT variants, such as CodeBERT-NT, CodeBERT-PS, CodeBERT-PK, and CodeBERT-PT, for software defect prediction. The study investigates the impact of using a pre-trained CodeBERT model on prediction performance and explores different prediction patterns. Results show that utilizing CodeBERT improves prediction performance and can save time. Additionally, the study discusses the relationship between prediction patterns and buggy rates in software defect prediction.
REFERENCES
1. C. Pan, M. Lu, and B. Xu, "An Empirical Study on Software Defect Prediction Using CodeBERT Model."
2. "A Survey of Flaky Tests" (acm.org).
3. A. Alshammari, C. Morris, M. Hilton, and J. Bell, "FlakeFlagger: Predicting Flakiness Without Rerunning Tests," doi: 10.1109/icse-companion52605.2021.00081.
4. G. Haben, S. Habchi, M. Papadakis, M. Cordy, and Y. L. Traon, "A Replication Study on the Usability of Code Vocabulary in Predicting Flaky Tests" (MSR21_FlakyReplication.pdf, uni.lu).