Flakify Journal Paper
• Ideally, tests should either always pass (the code works) or always fail (there is a problem).
• Flaky tests might pass one time but fail another time even though the code has not changed.
• A developer working on a project with many tests may find that some of them are flaky.
• This becomes challenging because such tests cannot be relied on to tell whether the code is good or bad;
they give false signals.
OBJECTIVE
• The paper presents Flakify, a tool designed to address the problem of flaky tests in software
development.
• Flaky tests are tests that sometimes pass and sometimes fail even though the code being tested has not changed.
• These tests cause confusion and waste time for developers.
• Flakify predicts flaky test cases without executing them, relying exclusively on the test code.
• It uses machine learning and language models to understand the test code and make predictions about its
behavior.
• It does not need to run the tests multiple times or look at the code being tested (the production code).
PROPOSED WORK
• Flakify is a black-box predictor designed to identify flaky test cases without needing access to the
production code.
• This distinguishes Flakify from existing solutions, which often rely on rerunning test cases
multiple times or on analyzing the production code.
• The proposed approach applies a language model, specifically CodeBERT, to analyze the source code of test
cases.
• CodeBERT is fine-tuned to classify test cases as flaky or non-flaky based only on the test code
(a minimal fine-tuning sketch is shown after this list).
• This eliminates the requirement to access the production code of the system under test.
• By being independent of the production code, Flakify can be deployed more easily, used in
different setups, and scaled across a wide range of software projects without facing the limitations
that come with relying on production code.
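
The following is a minimal sketch of how such fine-tuning could be set up, assuming the Hugging Face transformers and datasets libraries; the file names, column names, and hyperparameters are illustrative placeholders, not the authors' exact configuration.

# Minimal sketch: fine-tuning CodeBERT as a binary flaky/non-flaky classifier.
# Assumes the Hugging Face "transformers" and "datasets" libraries; file names,
# column names, and hyperparameters are illustrative, not the paper's exact setup.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import load_dataset

MODEL_NAME = "microsoft/codebert-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical CSV files with two columns: "test_code" (source of the test case)
# and "label" (1 = flaky, 0 = non-flaky).
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # CodeBERT accepts at most 512 tokens, so longer test methods are truncated.
    return tokenizer(batch["test_code"], truncation=True,
                     padding="max_length", max_length=512)

data = data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="flakify-codebert", num_train_epochs=3,
                         per_device_train_batch_size=8, learning_rate=2e-5)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"], eval_dataset=data["test"])
trainer.train()
trainer.evaluate()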
CodeBERT
• CodeBERT is a pre-trained language model that has been trained on a large dataset containing source
code from various programming languages paired with natural language text.
• This pre-training enables CodeBERT to understand the syntax and semantics of both code and natural language,
supporting tasks such as:
Code search
Code documentation
Bug prediction
• CodeBERT provides a way to automatically identify patterns in the syntax and semantics of code
without the need for manually defined features (see the encoding sketch below).
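
As a rough illustration of how a pre-trained CodeBERT can turn raw test code into a feature vector without hand-crafted features, the sketch below encodes a snippet with the published microsoft/codebert-base checkpoint; the Java snippet and variable names are invented for the example.

# Minimal sketch: encoding a test-code snippet with pre-trained CodeBERT.
# The snippet is an invented example; only "microsoft/codebert-base" is the
# published checkpoint name.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

test_code = """
@Test
public void testDownload() throws Exception {
    Thread.sleep(5000);                 // waits for an external server
    assertTrue(client.isFinished());
}
"""

inputs = tokenizer(test_code, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# The [CLS] vector (first token) is a single embedding summarizing the snippet;
# a classification head on top of it can be trained to predict flaky vs. non-flaky.
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 768])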
DATASET
Flakify was evaluated using two publicly available datasets (a small inspection sketch follows this list).
• FlakeFlagger dataset
22,236 Java test cases from GitHub projects, labeled as flaky or non-flaky based on 10,000 executions.
After excluding irrelevant test cases, 21,661 were used for evaluation.
• IDoFT dataset
1,263 flaky test cases from 179 projects; their fixed versions were used to obtain non-flaky test cases.
The updated dataset consists of 3,862 test cases after adding the 594 non-flaky (fixed) tests.
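
A small, hypothetical sketch of inspecting such a labeled dataset is shown below; the file name and column names are placeholders and do not reflect the actual FlakeFlagger or IDoFT schemas.

# Hypothetical inspection of a labeled flaky-test dataset; "flaky_tests.csv" and
# the columns "project" and "label" (1 = flaky, 0 = non-flaky) are placeholders.
import pandas as pd

df = pd.read_csv("flaky_tests.csv")

# Class balance matters here: flaky tests are a small minority of all test cases,
# which is why precision, recall, and F1-score are reported instead of accuracy.
print(df["label"].value_counts())
print(df.groupby("project")["label"].mean().sort_values(ascending=False).head())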
TEST SMELLS
• These are indicators found within the test code that suggest potential problems.
• Test smells hint at areas in the test suite that might need attention.
Detecting them involves examining the test code without executing it (a hypothetical example is shown below).
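
The hypothetical test below illustrates one such smell, a fixed sleep that makes the outcome timing-dependent; the endpoint and values are invented for the example.

# Hypothetical example of a test smell that static inspection can flag without
# running the test: the fixed sleep makes the outcome depend on timing, so the
# test may pass or fail for the same code.
import time
import requests  # the URL below is a placeholder, not a real service

def test_report_is_generated():
    requests.post("http://localhost:8080/reports", json={"id": 42})
    time.sleep(2)                      # smell: waits a fixed time instead of polling
    resp = requests.get("http://localhost:8080/reports/42")
    assert resp.status_code == 200     # flaky if the server needs more than 2 seconds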
METHODOLOGY INVOLVED
• Data collection
• Data preprocessing
Test cases are preprocessed to extract the code statements related to test smells.
• Precision
It measures the ability of the classification model to precisely predict flaky test cases, i.e., how many of the tests predicted as flaky are actually flaky.
• Recall
It measures the ability of the model to identify all of the actually flaky test cases.
• F1-Score
It is the harmonic mean of precision and recall, providing a balance between the two metrics (a short computation sketch follows this list).
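
The sketch below shows how these three metrics could be computed with scikit-learn; the label vectors are made-up examples, not results from the paper.

# Minimal sketch of the evaluation metrics using scikit-learn; the label vectors
# are invented examples (1 = flaky, 0 = non-flaky), not data from the paper.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # classifier predictions

precision = precision_score(y_true, y_pred)   # of tests predicted flaky, how many are flaky
recall    = recall_score(y_true, y_pred)      # of truly flaky tests, how many are found
f1        = f1_score(y_true, y_pred)          # harmonic mean of precision and recall

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")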
SUMMARY
• Flakify is a black-box predictor that relies solely on the source code of test cases.
• This contrasts with white-box predictors, which require access to the production code.
• Flakify does not require a pre-defined set of features for predicting flaky test cases.
• Flakify incorporates test smell detection techniques to enhance its prediction capability.
• Flakify demonstrates a reduction in the cost of debugging test cases and production code compared to
existing predictors.
• The reduction in wasted resources highlights the practical benefit of using Flakify for detecting flaky tests.
LITERATURE REVIEW
1. Title: FlakeFlagger: Predicting flakiness without rerunning tests
Authors: A. Alshammari, C. Morris, M. Hilton, and J. Bell
Year: 2021
Description: The paper describes FlakeFlagger, a novel approach for predicting flakiness in tests without rerunning them. Inspired by previous studies on test flakiness, FlakeFlagger uses a hybrid static/dynamic framework to collect behavioral features of tests, such as API usage and test smells. By analyzing these features, FlakeFlagger constructs a classifier to predict which tests are likely to be flaky. This approach aims to improve the detection of flaky tests and enhance the reliability of the testing process.

2. Title: A replication study on the usability of code vocabulary in predicting flaky tests
Authors: G. Haben, S. Habchi, M. Papadakis, M. Cordy, and Y. L. Traon
Year: 2021
Description: The paper describes a replication study on the usability of code vocabulary in predicting flaky tests. Researchers from the University of Luxembourg evaluate a method for predicting flaky tests based on their vocabulary, aiming to improve industrial adoption and practice. Key findings include the impact of time-sensitive validation, generalizability to different programming languages, and the influence of features on prediction performance.

3. Title: Surveying the developer experience of flaky tests
Authors: O. Parry, G. M. Kapfhammer, M. Hilton, and P. McMinn
Year: 2021
Description: The paper explores the experiences of software developers with flaky tests, which are tests that pass and fail without changes to the code under test. Flaky tests are a significant challenge in software development as they disrupt continuous integration, reduce productivity, and erode confidence in testing. The study investigates the impacts and causes of flaky tests through a survey of 170 developers, highlighting factors beyond code that contribute to test flakiness. The research emphasizes the need for a broader definition of flaky tests to include environmental factors and recommends considering test behavior in different execution environments.

4. Title: An empirical study on software defect prediction using CodeBERT model
Authors: C. Pan, M. Lu, and B. Xu
Year: 2021
Description: The paper presents an empirical study on software defect prediction using the CodeBERT model. It introduces various CodeBERT variants, such as CodeBERT-NT, CodeBERT-PS, CodeBERT-PK, and CodeBERT-PT, for software defect prediction. The study investigates the impact of using a pre-trained CodeBERT model on prediction performance and explores different prediction patterns. Results show that utilizing CodeBERT improves prediction performance and can save time. Additionally, the study discusses the relationship between prediction patterns and buggy rates in software defect prediction.
REFERENCES
1. C. Pan, M. Lu, and B. Xu, "An Empirical Study on Software Defect Prediction Using CodeBERT Model."
2. "A Survey of Flaky Tests" (acm.org).
3. A. Alshammari, C. Morris, M. Hilton, and J. Bell, "FlakeFlagger: Predicting Flakiness Without Rerunning Tests," doi: 10.1109/icse-companion52605.2021.00081.
4. G. Haben, S. Habchi, M. Papadakis, M. Cordy, and Y. L. Traon, "A Replication Study on the Usability of Code Vocabulary in Predicting Flaky Tests" (MSR21_FlakyReplication.pdf, uni.lu).