Jurnal 6
Abstract— Agile software development deviates from the traditional, plan-based approaches to software engineering as its iterative cycles embrace changes in software requirements. Test-Driven Development (TDD) and Behavior-Driven Development (BDD) are techniques that have recently been adopted by the software industry and have evolved from agile practices. These techniques aim at improving both the quality of the delivered software and the developers’ productivity. Several studies have been conducted on TDD in both academic and industry settings, but only a few on BDD. While TDD and BDD have both become popular, many organizations and developers still do not understand the differences between these concepts or where they overlap. The aim of this paper is to assess the effectiveness of these two approaches in terms of external code quality, internal code quality and developers’ productivity. To understand the weaknesses and strengths of each of these methods, a literature review was first performed. An experiment was then carried out in an industry setting to observe the effectiveness of TDD and BDD on a number of subjects. Our results showed that both techniques indeed increased the external quality of the delivered product. However, a decrease in productivity and internal quality was noted when BDD was used compared to TDD, which might be due to the additional steps involved in BDD.

Keywords—Test-Driven Development, Behavior-Driven Development, TDD, BDD, productivity, quality, code coverage

I. INTRODUCTION

A large number of software development companies are shifting from the traditional waterfall model to the agile methodology. Test-Driven Development (TDD) and Behavior-Driven Development (BDD) are techniques frequently used in agile software development. TDD is a development practice that proceeds in short, repetitive cycles; in each cycle, the tests are written before the actual code [1]. BDD was subsequently introduced by Dan North to make TDD a more effective process [2]. The primary objective of BDD is to obtain executable and well-defined software specifications [2]. Dan North defined BDD as: “A second-generation, outside-in, pull-based, multiple-stakeholder, multiple-scale, high-automation, agile methodology” [3]. BDD therefore integrates TDD methods and principles with domain-driven design concepts and object-oriented analysis and design. This gives management and software development teams shared tools and processes with which to collaborate on the development of the software.

TDD is an adaptive approach based on very brief cycles of development and the agile principles of writing tests prior to coding, refactoring and ongoing integration [4]. BDD is usually considered to be an evolution of TDD. It focuses on identifying fine-grained requirements of the target system's behavior in such a way that they can be automated. BDD's primary objective is to obtain executable system requirements [2]. BDD depends on TDD, but in BDD the tests are clearly written and readily understandable, as it offers a specific ubiquitous language that allows stakeholders to better define their tests.

The primary aim of this paper is to analyze the impacts of TDD and BDD techniques on software development in an industry setting. The two techniques will be applied in different scenarios and the results will be analyzed in terms of external code quality, internal code quality and productivity. The rest of the paper is structured as follows. Section II presents a background study on TDD and BDD. Details about the experiment carried out are presented in Section III, with Section IV highlighting the results. Finally, Section V concludes the work and outlines future work.

II. BACKGROUND STUDY

Testing is one of the most important phases of software development since it ensures that high quality products are delivered [5]. In traditional software development methodologies, the tests are carried out after the code has been written, an approach known as Test-Last Development (TLD). Since the tests are written in the last phase, the quality of the software is determined only at that moment, and making changes in the final stage becomes difficult. On the other hand, agile software development allows tests to be written before starting the coding process. This allows for easy integration of any changes in the coding phase and eases defect correction.

A. Test-Driven Development

Test-Driven Development (TDD), commonly known as test-first coding or Test-Driven Design, is a development technique where a programmer first writes a failing test case and then proceeds to write the necessary code [6]. It is hence considered an evolutionary approach which relies on the agile practices of creating the tests prior to writing the functional code, refactoring and continuous integration [7]. This software development methodology differs completely from the conventional test-last approach normally used in software development, where the software is developed based on the functional specifications and the test cases are written after the whole program has been developed. TDD is a way of programming in which three activities, namely coding, testing and design, are closely linked. This implies that the unit tests are first written for the feature that should be implemented, and then this functionality is coded. The TDD process is illustrated in Fig. 1 and comprises the following steps:

1. Selection of a user story,
2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE)
December 11–12, 2019, Amity University Dubai, UAE
2. Writing a unit test that accomplishes a subtask of the user story, thereby producing a failed test,
3. Writing the code needed to implement the functionality,
4. Executing the unit tests again; if any test fails, the code is modified and the tests are executed again,
5. Refactoring the code and the tests.

Fig. 1. Test-Driven Development [9]

B. Behaviour-Driven Development

Recently, BDD has become an increasingly common agile development technique and has attracted the attention of both practitioners and researchers. It was created by Dan North as a new method to mitigate issues in TDD [2].

Fig. 2. Behaviour-Driven Development [10]

BDD is usually regarded as an evolution of TDD and Acceptance TDD (ATDD). It focuses on identifying the system's behavior so that it can be easily automated. BDD's primary objective is to obtain executable system requirements. BDD depends on ATDD; however, since BDD uses a specific ubiquitous language that helps the stakeholders to describe their tests, the tests are well written and easily understood. Several toolkits supporting BDD, such as JBehave, Cucumber and RSpec, are also available.

The BDD process, shown in Fig. 2, follows these steps:
1. Write a scenario
2. Run the scenario, which fails
3. Write the test that corresponds to the specifications of the scenario
4. Write the simplest code to pass the test and the scenario
5. Refactor to eliminate duplication

C. Related Works

Erdogmus et al. [8] carried out an experiment to measure programmers’ tests per unit effort, productivity and quality. The subjects were third-year students. The experiment consisted of implementing the Bowling Score Keeper problem using Java and JUnit. The problem consisted of producing the total score for a given valid sequence of rolls for one line of American Ten-Pin Bowling. The metrics used were:
1. Productivity: measured by the number of user stories implemented, normalized by the total programming effort,
2. Programmer tests per unit effort: measured by the number of tests, normalized by the total effort expended, and,
3. Quality: measured by defects per story.

Their primary outcome was that TDD programmers wrote more tests per unit of programming effort. In turn, a higher number of programmer tests was associated with proportionally higher productivity. Thus, TDD seemed to enhance efficiency through a chain effect. They also suggested that proceeding one test at a time and writing tests before the code promoted better task decomposition, a better understanding of the underlying requirements, and a reduced scope for the tasks to be performed. On average, TDD programmers did not attain better quality, although their results were more consistent. The latter observation was attributed to TDD tending to diminish the impact of skill on performance. Having more tests raised the minimum quality achieved and reduced the variation. However, this effect did not seem to be specific to TDD. In short, the effectiveness of the TDD technique may ultimately depend on its capacity to encourage developers to back up their code with test assets [8].

Cisneros et al. [10] carried out an academic experiment to compare Incremental Test-Last, Test-Driven Development and Behavior-Driven Development. The experiment focused on external software quality, internal software quality and developers’ productivity. During the experimental phase, two code katas were developed: String Calculator and FizzBuzz. The String Calculator consisted of implementing a simple ADD method that received a string with some numbers separated by one or multiple delimiters and returned the sum of all the numbers [11]. FizzBuzz is a counting and number replacement game, where: any number that is divisible by 3 is substituted by the word 'fizz', any number divisible by 5 is substituted by the word
'buzz', any prime number is substituted by the word 'whiz', any number simultaneously divisible by 3 and 5 is substituted by 'fizz buzz', any prime number divisible by 3 is substituted by 'fizz whiz', and any prime number divisible by 5 is substituted by 'buzz whiz' [12]. The metrics that the authors planned to use were:
1. The external quality metric, which represents the fulfilment of stakeholders’ requirements, and,
2. The productivity metric, represented by the work done by the subjects with the desired quality and within the specified timeframe.

The authors expected that the exercises developed with TDD and BDD would present an improvement in internal and external quality. They also expected a slight decrease in productivity given that both TDD and BDD consisted of additional steps [10]. To our knowledge, no practical experiments have examined the benefits claimed for BDD. However, BDD is expected to improve, or at least maintain, the benefits offered by TDD, as BDD is based on a set of practices derived from TDD.

Lopes [13] designed a case study to evaluate BDD. The aim was to apply the BDD process to a project in the domain of maritime safety and security for a fictitious coast guard center. The experiment was implemented mainly in Ruby with Cucumber [14], and RSpec was used to write BDD-style unit tests for the Ruby code. BDD was used in each phase of the experiment (requirements gathering, design and implementation). The author stated that the BDD process was incomplete in terms of requirements, bug handling, acceptance testing and documentation of the architecture and deployment procedure. He also highlighted a number of instances where BDD could be enhanced. The author then evaluated the modified BDD process and concluded that it would result in a more comprehensive software engineering process for BDD. The suggestion for improving the documentation of the architecture did not work out as expected, but the author proposed some avenues for future investigation.

III. EXPERIMENT DESIGN

The aim of the experiment in this paper is to determine the impact of applying TDD and BDD techniques on external code quality, internal code quality and developers’ productivity. The experiment was designed to reduce threats to validity as far as possible. A brief presentation on TDD and BDD, which also covered techniques of test writing and design, was therefore prepared. This helped to establish a common baseline of knowledge for the experiment.

For this experiment, two code katas had to be developed by the subjects using TDD and BDD: the String Calculator, described in [11], and the Bowling Score Keeper, described in [8]. Two questionnaires were prepared and filled in by the participants, one before starting the experiment and the other at the end of the experiment.

A. Research Questions

The experiment focused on three research questions relating to three outcomes: external software quality, internal software quality and developers’ productivity. The research questions are the following:

RQ1: Does BDD achieve better external code quality compared to TDD?

RQ2: Does BDD achieve better internal code quality compared to TDD?

RQ3: Does BDD achieve better productivity compared to TDD?

B. Hypothesis Formulation

Hypothesis testing is the basis for the statistical analysis of an experiment. The hypotheses are formally stated, and relevant data is collected during the experiment, which eventually helps to either accept or reject the hypotheses [15]. The following hypotheses were formulated:

External Code Quality (RQ1) - The null hypothesis ($H_{10}$) is that TDD and BDD give the same external code quality, and the alternative hypotheses are that BDD gives better ($H_{11}$) or lower ($H_{12}$) external code quality than TDD.

$H_{10}$: $\text{External code quality}_{BDD} = \text{External code quality}_{TDD}$
$H_{11}$: $\text{External code quality}_{BDD} > \text{External code quality}_{TDD}$
$H_{12}$: $\text{External code quality}_{BDD} < \text{External code quality}_{TDD}$

Internal Code Quality (RQ2) - The null hypothesis ($H_{20}$) is that TDD and BDD give the same internal code quality, and the alternative hypotheses are that BDD gives better ($H_{21}$) or lower ($H_{22}$) internal code quality than TDD.

$H_{20}$: $\text{Internal code quality}_{BDD} = \text{Internal code quality}_{TDD}$
$H_{21}$: $\text{Internal code quality}_{BDD} > \text{Internal code quality}_{TDD}$
$H_{22}$: $\text{Internal code quality}_{BDD} < \text{Internal code quality}_{TDD}$

Productivity (RQ3) - The null hypothesis ($H_{30}$) is that TDD and BDD give the same productivity, and the alternative hypotheses are that BDD is more productive ($H_{31}$) or less productive ($H_{32}$) than TDD.

$H_{30}$: $\text{Productivity}_{BDD} = \text{Productivity}_{TDD}$
$H_{31}$: $\text{Productivity}_{BDD} > \text{Productivity}_{TDD}$
$H_{32}$: $\text{Productivity}_{BDD} < \text{Productivity}_{TDD}$

C. Subjects

The experiment was conducted in an industry setting with volunteers consisting of both senior and junior developers in a private IT company in Mauritius. It lasted approximately 8 hours over 2 days and consisted of two phases: knowledge and experimentation. The initial knowledge phase consisted of a brief presentation on Test-Driven Development and Behavior-Driven Development. The developers were also given examples of correctly written test cases. A total of 10 subjects volunteered to participate in the experiment: 8 developers and 2 functional analysts. Each subject was instructed to develop the two code katas chosen, namely the String Calculator and the Bowling Score Keeper.

D. Metrics

Table I shows the metrics that were monitored for this experiment.
TABLE I. MEASUREMENT CRITERIA

Variable                 Metrics
External Quality         No. of accepted test cases passed
Internal Code Coverage   McCabe’s Cyclomatic complexity and Branch Coverage
Productivity             No. of user stories implemented in a given time
Developer Choice         Survey questionnaires

External quality (QLTY) depends on the subtasks that have been tackled (#tst) for a given task. If at least one of the assert statements in the acceptance test related to a subtask passes, the subtask is considered as completed. The formula for #tst is given in (1), where n represents the number of user stories that make up the task.

$$\#tst = \sum_{i=0}^{n} \begin{cases} 1 & \text{if } \#ASSERT_i(PASS) > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

#tst is used to calculate QLTY in (2):

$$QLTY = \frac{\sum_{i=1}^{\#tst} QLTY_i}{\#tst} \times 100 \qquad (2)$$

where $QLTY_i$ is the quality of the i-th tackled subtask and is defined as:

$$QLTY_i = \frac{\#Assert_i(Pass)}{\#Assert_i(All)} \qquad (3)$$

For example, if a subject tackles two user stories (#tst = 2), this implies that there are two user stories for which at least one assert statement passes. Assuming that the acceptance tests of the first user story have 10 assertions, out of which 5 pass, then the quality value (QLTY1) is 0.50. Furthermore, if 3 out of 9 assertions pass in the second user story, then it has a quality value (QLTY2) of 0.33. The external quality is therefore QLTY = (0.50 + 0.33) / 2 * 100, i.e. 41.5% (41.7% if QLTY2 is not rounded).

The productivity (PROD) metric is measured as the amount of work successfully carried out by the subject and is calculated as follows:

$$PROD = \frac{OUTPUT}{TIME} \qquad (4)$$

where OUTPUT is the number of passing assert statements as a percentage of the total set of assert statements for a task:

$$OUTPUT = \frac{\#Assert(Pass)}{\#Assert(All)} \times 100 \qquad (5)$$

TIME, measured in minutes, is an estimate of the time taken to complete a task and is obtained from the difference in the time records noted in the questionnaires:

$$TIME = t_{end} - t_{start} \qquad (6)$$

Consider a subject completing a task containing 10 assert statements within a test suite. If, after executing the acceptance tests, 8 assert statements have passed, then OUTPUT = (8 / 10) x 100 = 80%. Furthermore, if the functionality was delivered in two hours, i.e. TIME = 120 minutes, then the subject’s PROD is calculated as 80/120, resulting in an assertion passing rate of 0.67% per unit time, i.e. per minute.

McCabe’s Cyclomatic complexity, which indicates how difficult a program is to test and maintain, is used to calculate the internal quality [16]. It also helps to recognize complex program components that are more likely to be susceptible to defects. The complexity of a program can be calculated using Eclipse plugins. Branch coverage, on the other hand, is a metric used to check that every branch of the code is executed by the tests, so that no untested branch contributes to an application failure. Branch coverage can also be measured using Eclipse plugins. McCabe’s levels of complexity are summarized in Table II [18].

TABLE II. MCCABE'S COMPLEXITY LEVEL

Complexity Number   Meaning
1-10                Code is well-written and well-structured; highly testable; low maintenance cost and effort
11-20               Complex source code; medium testability; medium maintenance cost and effort
21-50               Highly complex code; low testability; high maintenance cost and effort
>50                 Not testable; very high maintenance cost and effort

E. Instrumentation and Measurement

The materials required for this experiment were arranged beforehand and consisted of pre-experiment questionnaires, post-experiment questionnaires, instructions on TDD and BDD, and user stories. The user stories had to be implemented in Java using the Eclipse IDE. The JUnit plugin was used for TDD and the Cucumber plugin was used for BDD. Code complexity was obtained using the Eclipse plugin Metrics [17] and branch coverage was obtained using the Eclipse plugin EclEmma [19]. The time taken was recorded manually by each subject by writing down the start time and end time of each user story.

F. Experiment Execution and Data collection

The experiment consisted of two phases: a brief introduction to the two development techniques followed by the coding. Most of the subjects were already aware of TDD as they were already using it in the workplace. However, the subjects were not well versed in BDD. The experience and expertise of the programmers were based on the number of years in their professional career. The experiment was scheduled as shown in Table III.

TABLE III. SCHEDULE OF THE EXPERIMENT

Program   Introduction   Coding Phase
Day 1     TDD            TDD
Day 2     BDD            BDD

During the introduction phase, the subjects were introduced to the two development techniques. Separate guidelines were prepared for TDD and BDD in order to avoid any threats to validity. Experiment guidelines and pre- and post-experiment questionnaires were given to the subjects. During the coding phase, the subjects were instructed to develop the two code katas using TDD and BDD techniques. They were required to create their own test cases (assert statements) to test their code. The two metrics productivity (PROD) and external quality (QLTY) are based on the number of tests passed during the experiment, and this information was collected from the post-experiment questionnaire. The participants were clearly advised to record the time taken to complete the user stories on the post-experiment questionnaire. The complexity of the software and code coverage were recorded using the Metrics plugin in
the EclEmma plugin. The result was given as the number of instructions covered by the tests divided by the total number of instructions, expressed as a percentage.

IV. DATA ANALYSIS AND RESULTS

This section presents the mean values of the data collected for the two code katas: the String Calculator and the Bowling Score Keeper.

A. Data Analysis

Table IV shows the mean results obtained for the two code katas.

TABLE IV. MEAN VALUES COLLECTED FROM EXPERIMENT

Factors            TDD     BDD     % Change (BDD relative to TDD)
External Quality   91.35   92.4    1.15
Productivity       1.78    1.29    -28
Code Complexity    11.90   12.10   1.68
Code Coverage      91.65   92.05   0.44

The chart below shows the mean values of the external quality for each development method. A slight increase of 1.15% in the external quality was observed when using BDD compared to TDD.

[Bar chart: External Quality by Development Technique; TDD 91.35, BDD 92.4]

Fig. 5 shows the mean value of the code complexity over both code katas. A slight increase of 1.68% in the complexity was noted with BDD. Since internal code quality depends mainly on the code complexity, a slight increase in the code complexity implies a small decrease in the internal code quality.

[Bar chart: Code Complexity by Development Technique; TDD 11.90, BDD 12.10]
Fig. 5. Code Complexity

Fig. 6 shows the mean values for the code coverage achieved. A slight increase of 0.44% was noted for this metric when BDD was used.

[Bar chart: Code Coverage by Development Technique; TDD 91.65, BDD 92.05]
Fig. 6. Code Coverage

According to the survey findings, developers were reluctant to adopt BDD as most of them (70%) found it difficult to use. All the subjects claimed that BDD required more effort than TDD. Most of the developers (80%) stated that TDD would be their first choice of development method.

B. Comparison with existing studies

Cisneros Gómez [20] conducted a similar experiment for his Master's thesis. The author compared three different techniques, namely TDD, BDD and ATDD, using computer science students as subjects. The results showed that BDD improved the external quality. However, TDD offered better productivity and internal quality than BDD.
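For readers who wish to re-derive the figures being compared here, the following is a minimal Python sketch of the metric definitions in equations (1) to (6) and of the relative changes reported in Table IV. It is not part of the experiment's tooling (the study itself used Java with Eclipse plugins); the function names `qlty`, `prod` and `percent_change` are illustrative assumptions, as is the choice to return 0 when no subtask was tackled, a case the paper does not define.

```python
from typing import Dict, List, Tuple


def qlty(subtasks: List[Tuple[int, int]]) -> float:
    """External quality per eqs. (1)-(3), in percent.

    Each entry is (passing_asserts, total_asserts) for one user story.
    A story counts as tackled when at least one assert passes (eq. 1).
    """
    tackled = [(passed, total) for passed, total in subtasks if passed > 0]
    if not tackled:
        return 0.0  # assumption: QLTY is not defined in the paper when #tst = 0
    ratios = [passed / total for passed, total in tackled]  # eq. (3)
    return sum(ratios) / len(tackled) * 100.0  # eq. (2)


def prod(passed: int, total: int, minutes: float) -> float:
    """Productivity per eqs. (4)-(6): OUTPUT (percent) divided by TIME (minutes)."""
    output = passed / total * 100.0  # eq. (5)
    return output / minutes  # eq. (4)


def percent_change(tdd: float, bdd: float) -> float:
    """Change of the BDD mean relative to the TDD mean, in percent."""
    return (bdd - tdd) / tdd * 100.0


# Mean values copied from Table IV: factor -> (TDD, BDD).
TABLE_IV: Dict[str, Tuple[float, float]] = {
    "External Quality": (91.35, 92.4),
    "Productivity": (1.78, 1.29),
    "Code Complexity": (11.90, 12.10),
    "Code Coverage": (91.65, 92.05),
}

if __name__ == "__main__":
    print(f"QLTY example: {qlty([(5, 10), (3, 9)]):.1f}%")
    print(f"PROD example: {prod(8, 10, 120):.2f}% per minute")
    for factor, (tdd, bdd) in TABLE_IV.items():
        print(f"{factor}: {percent_change(tdd, bdd):+.2f}%")
```

Run on the worked examples of Section III.D, the sketch gives a QLTY of about 41.7% (the text reports 41.5% because it rounds 3/9 to 0.33 first) and a PROD of about 0.67% per minute. The recomputed productivity change is about -27.5%, which Table IV reports rounded as -28.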
Comparing the experiment of Cisneros Gómez, which was carried out in an academic setting, with our experiment, carried out in an industry setting, both experiments found a decrease in productivity when using BDD compared to TDD. Both studies also found a small increase in external quality together with a small decrease in internal quality. The results of our study are therefore consistent with the experiment conducted by Cisneros Gómez, with both studies reaching the same conclusions irrespective of the academic or industry setting. Table V summarizes the two experiments.

V. CONCLUSION AND FUTURE WORK

The aim of this study was to identify and validate the effects of Test-Driven Development (TDD) and Behavior-Driven Development (BDD) in the industry setting of a private IT company. In order to accomplish this, the current state of TDD and BDD was initially surveyed. The literature review aimed to identify the relevant studies and to find the relevant impacts, metrics and challenges. However, we noted a significant lack of BDD studies performed in industrial environments. A controlled experiment was then conducted using the metrics identified from the literature, which consisted of external quality, internal quality and productivity. The experiment was conducted in two phases: a knowledge phase, where the subjects were introduced to the two development techniques, and a coding phase, where the subjects implemented the two code katas, namely the Bowling Score Keeper and the String Calculator.

The results of the experiment showed that BDD yielded a slight improvement in external quality compared to TDD. The use of the Gherkin language might have contributed to this improvement as it allows a better understanding of the requirements to be implemented. However, a decrease in productivity and internal quality was noted when applying BDD compared to TDD, mainly due to the programmers having little or no experience with this technique. This could also mean that more extensive training might have been required for this technique. Another factor that could explain the drop in productivity when using BDD is that it involves more steps than TDD. Table VI shows the summary of the hypotheses that have been accepted. According to the results, all the null hypotheses were rejected in favour of the alternative hypotheses listed in Table VI.

TABLE VI. SUMMARY OF HYPOTHESIS ACCEPTED

Factor             Hypothesis accepted
External Quality   $H_{11}$: $\text{External code quality}_{BDD} > \text{External code quality}_{TDD}$
Internal Quality   $H_{22}$: $\text{Internal code quality}_{BDD} < \text{Internal code quality}_{TDD}$
Productivity       $H_{32}$: $\text{Productivity}_{BDD} < \text{Productivity}_{TDD}$

Within the context of this experiment, a small-scale study was conducted with only 10 professional participants. In order to obtain more accurate results, a larger number of subjects should be considered in the future. This will improve the significance of the results and help researchers to draw conclusions that could be generalized to all aspects of TDD and BDD development. Furthermore, the difficulty of the exercises can also be increased and the experiment applied in similar industrial environments in order to verify whether similar results are obtained across IT industries.

REFERENCES

[1] E. C. Silva Santos, D. M. Beder and R. A. Dellosso Penteado, “A Study of Test Techniques for Integration with Domain Driven Design,” in 12th International Conference on Information Technology - New Generations, 2015.
[2] D. North, “Introducing BDD,” 2006. [Online]. Available: http://dannorth.net/introducing-bdd. [Accessed August 2019].
[3] D. North, “How to sell BDD to the business,” Skills Matter, 2009. [Online]. Available: https://skillsmatter.com/skillscasts/923-how-to-sell-bdd-to-the-business#showModal?modal-signup-complete. [Accessed August 2019].
[4] D. Janzen and H. Saiedian, “Does Test-Driven Development Really Improve Software Design Quality?,” IEEE Software, vol. 25, no. 2, 2008, pp. 77-84.
[5] T. D. Hellmann, A. Sharma, J. Ferreira and F. Maurer, “Agile testing: Past, present, and future - Charting a systematic map of testing in agile software development,” in Agile Conference, 2012, pp. 55-63.
[6] K. Beck, Test Driven Development: By Example, Boston, MA, USA: Addison-Wesley Longman Publishing Co., 2002.
[7] D. Janzen and H. Saiedian, “Test-Driven Development: Concepts, Taxonomy, and Future Direction,” Computer, vol. 38, no. 9, 2005, pp. 43-50.
[8] H. Erdogmus, M. Morisio and M. Torchiano, “On the effectiveness of the test-first approach to programming,” IEEE Transactions on Software Engineering, vol. 31, no. 3, 2005, pp. 226-237.
[9] H. Munir, K. Wnuk, K. Petersen and M. Moayyed, “An experimental evaluation of test driven development vs. test-last development with industry professionals,” in Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, ACM, 2014.
[10] L. A. Cisneros, M. Maximiano, C. I. Reis and J. A. Quiña Mera, “An Experimental Evaluation of ITL, TDD and BDD,” in ICSEA 2018: The Thirteenth International Conference on Software Engineering Advances, 2018.
[11] R. Osherove, “TDD Kata 1 - String Calculator,” 2018. [Online]. Available: http://osherove.com/tdd-kata-1/. [Accessed August 2019].
[12] M. Whelan, “FizzBuzzWhiz Kata,” [Online]. Available: https://github.com/mwhelan/Katas/tree/master/Katas.FizzBuzzWhiz. [Accessed 9 June 2019].
[13] J. H. Lopes, “Evaluation of Behavior-Driven Development,” Delft, Netherlands, 2012.
[14] M. Wynne and A. Hellesøy, The Cucumber Book: Behaviour-Driven Development for Testers and Developers, The Pragmatic Bookshelf, 2012.
[15] C. Wohlin, P. Runeson, A. Wesslén, B. Regnell, M. Ohlsson and M. Höst, Experimentation in Software Engineering: An Introduction, 1st ed., Springer, 1999.
[16] T. J. McCabe, “A Complexity Measure,” IEEE Transactions on Software Engineering, vol. SE-2, no. 4, 1976, pp. 308-320.
[17] SourceForge, “Metrics 1.3.6,” [Online]. Available: http://metrics.sourceforge.net/. [Accessed 7 July 2019].
[18] R. M. Patelia and S. Vyas, “A Review and Analysis on Cyclomatic Complexity,” Oriental Journal of Computer Science & Technology, vol. 7, no. 3, 2014, pp. 382-384.
[19] Mountainminds GmbH & Co. KG and Contributors, “EclEmma 3.1.2 Java Code Coverage for Eclipse,” 2017. [Online]. Available: https://www.eclemma.org/. [Accessed July 2019].
[20] L. A. Cisneros Gómez, “Analysis of the impact of Test Based Development Techniques (TDD, BDD and ATDD) to the software life cycle,” 2018.