

Accelerating Software Development Using Generative AI: ChatGPT Case Study

Asha Rajbhoj, Akanksha Somase, Piyush Kulkarni, and Vinay Kulkarni
TCS Research, Tata Consultancy Services, Pune, India

ABSTRACT
The Software Development Life Cycle (SDLC) comprises multiple phases, each requiring Subject Matter Experts (SMEs) with phase-specific skills. The efficacy and quality of deliverables of each phase are skill dependent. In recent times, Generative AI techniques, including Large-scale Language Models (LLMs) like GPT, have become significant players in software engineering. These models, trained on extensive text data, can offer valuable contributions to software development. Interacting with LLMs involves feeding prompts with the context information and guiding the generation of textual responses. The quality of the response is dependent on the quality of the prompt given. This paper proposes a systematic prompting approach based on meta-model concepts for SDLC phases. The approach is validated using ChatGPT for small but complex business application development. We share the approach and our experience, learnings, benefits obtained, and the challenges encountered while applying the approach using ChatGPT. Our experience indicates that Generative AI techniques, such as ChatGPT, have the potential to reduce the skills barrier and accelerate software development substantially.

CCS CONCEPTS
• Software and its engineering → Software creation and management; Software development process management; Software development methods.

KEYWORDS
AI in SDLC, Large Language Models, Generative AI, ChatGPT, SDLC automation, Automated Software Development

ACM Reference Format:
Asha Rajbhoj, Akanksha Somase, Piyush Kulkarni, and Vinay Kulkarni. 2024. Accelerating Software Development Using Generative AI: ChatGPT Case Study. In 17th Innovations in Software Engineering Conference (ISEC 2024), February 22–24, 2024, Bangalore, India. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3641399.3641403

1 INTRODUCTION
The Software Development Life Cycle (SDLC) consists of multiple phases. Each phase of the SDLC produces distinct engineering artifacts and requires Subject Matter Experts (SMEs) with skills relevant to each phase. The quality, as well as the efficiency, of deliverables is skill dependent. The recent progress in Generative AI techniques has significantly influenced software engineering, and we believe it can lower this skills barrier by enabling domain SMEs to operate at the natural language level. Large-scale language models (LLMs), like OpenAI's Codex [1] and the Generative Pre-trained Transformer (GPT) [2, 3], are increasingly adopted in AI-driven software engineering. These are trained on a large corpus of text data and have capabilities that make them valuable tools for software development, enhancing the efficiency and quality of the development process. This can save time and effort for skilled development teams, allowing them to focus on higher-level tasks.

Interacting with LLMs in general involves feeding suitable prompts (natural language instructions) to provide context and guide the generation of textual responses [4]. Many researchers have discussed the future of ChatGPT and other large language models having a significant effect on how we interact with technology [5, 6]. One may guide LLMs to generate desired responses in multiple ways. For instance, one may directly ask the LLM to provide details. Another way is to preempt how the LLM should shape its responses; for instance, the technology stack, design pattern, architecture, and so on are fixed up front to set the context for subsequent code generation interactions. We propose a systematic prompting approach to leverage LLMs for application development. The approach defines prompt templates for SDLC phases based on meta-model concepts. The prompting approach is validated for small yet complex business application development using ChatGPT. We share the approach and our experience, learnings, benefits obtained, and challenges encountered while applying the approach using ChatGPT. In summary, this paper makes the following contributions:

• An approach for accelerating software development by leveraging Generative AI.
• A generic prompt template for SDLC phases based on high-level meta-model concepts.
• Evaluation of the approach using ChatGPT.
• Validation of the approach on a small yet complex enough business application.

The organization of the paper is as follows: Section 2 provides a brief overview of related work. Section 3 describes the meta-model used for defining prompts. Section 4 presents the prompting approach. Section 5 presents an evaluation of the approach using the case study. Section 6 presents threats to validity. Section 7 discusses overall learning and future work.
2 RELATED WORK
There has been growing interest in using Generative AI techniques for software engineering tasks, including requirements engineering, design, and testing [1, 3, 19-22]. Several researchers have explored the application of Generative AI. This section reviews related work across the SDLC phases. To the best of our knowledge, the utilization of Generative AI techniques across various stages of the Software Development Life Cycle (SDLC) is a relatively unexplored area.

Jianzhang et al. conducted an empirical evaluation of ChatGPT on retrieving requirements information, specifically NFRs, features, and domain terms [7]. Their qualitative and quantitative results indicated impressive performance. White et al. proposed prompt design techniques for software engineering in the form of patterns to enhance the use of LLMs, such as ChatGPT, for improving requirements elicitation, rapid prototyping, code quality, refactoring, and system design [8, 9]. Ruan et al. presented an automated framework for generating requirements models from requirements written in natural language [24]. They used ChatGPT to extract requirement description elements from the requirements text and to present them in a structured format.

On the architecture and design front, Galanos et al. presented Architext, a semantic generation tool that generates architectural designs using natural language prompts as input to LLMs [10]. Ahmad et al. conducted a case analysis of a services-driven software application, demonstrating ChatGPT's potential to support it [11].

Several researchers have explored the use of LLMs for a variety of code generation tasks. Researchers investigated the Text-to-SQL capabilities of the GPT-3 model [12, 13]. H. Tian et al. conducted an empirical analysis of ChatGPT performance on code generation, program repair, and code summarization and compared it with state-of-the-art approaches [14]. They concluded that ChatGPT can handle typical programming challenges and discussed its limited attention span. The paper highlights the significance of prompt engineering for practical applications in software engineering. Many researchers have successfully used Codex to generate accurate and efficient code snippets [15, 16]. Talasbek discussed how advanced AI technologies like ChatGPT can improve software testing process efficiency and productivity [17]. They explored the possibilities of automation through the automated generation of test plans and test scripts using Python and Selenium WebDriver. Ameya et al. studied the impact of Generative AI on software development [23]. They interviewed 30 professionals from diverse groups of software engineers, UX designers, and project managers. This research found Generative AI is effective in the SDLC, irrespective of the size, scale, and nature of the enterprises adopting it.

3 META-MODEL

Figure 1: Meta Model

Fig. 1 depicts the high-level meta-model used for defining the prompting approach for the SDLC process. The meta-model has three parts:

Requirement specification meta-model: This covers various requirements specification concepts – Context, Process, Activity, Parameter, Rule, and RuleType – and various types of associations among these concepts. Functionality can be decomposed into multiple Processes. A Process may have subprocesses. A Process can be described in terms of one or many Activities. An Activity can be further decomposed into sub-activities. A Process may have multiple Rules. Each rule is categorized into one of the predefined RuleTypes. An Activity may have multiple input-output (IO) Parameters. Applications typically need to take care of geography-specific regulations, varying currencies, address-specific details, and so on, depending on the operating market and geography. The Context concept specifies these details.

Design specification meta-model: This covers various design specification concepts related to the presentation, service, and database layers. The presentation layer concepts include Screen, user interface class (UIClass), attributes to display data (UIAttribute), and Buttons. The service layer concepts include Service and Operation. The data layer concepts include Entity and Relationship. A presentation layer Button invokes an Operation of a Service.

Code generation meta-model: This includes all the concepts of the design specification meta-model and the additional concepts Class, Attribute, and Operation.
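To make the three-part meta-model concrete, the requirement specification concepts above can be sketched as data structures. This rendering is ours (a minimal Python sketch), not an artifact from the paper:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class RuleType(Enum):
    # The nine predefined rule types used by the prompting approach [27].
    ACCESS_CONTROL = "Access Control"
    BUSINESS_LOGIC = "Business Logic"
    CALCULATION = "Calculation"
    COMPLIANCE = "Compliance"
    DATA_VALIDATION = "Data Validation"
    DISPLAY = "Display"
    NOTIFICATION = "Notification"
    STATE_CHANGE = "State Change (Transition)"
    WORKFLOW = "Workflow"

@dataclass
class Rule:
    description: str
    rule_type: RuleType

@dataclass
class Parameter:
    name: str
    direction: str  # "input" or "output"

@dataclass
class Activity:
    name: str
    parameters: List[Parameter] = field(default_factory=list)
    sub_activities: List["Activity"] = field(default_factory=list)

@dataclass
class Process:
    name: str
    activities: List[Activity] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)
    subprocesses: List["Process"] = field(default_factory=list)

@dataclass
class Context:
    # Geography/market-specific details, e.g. "INDIA Geography".
    description: str
```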

Figure 2: SDLC Process using Generative AI

4 SDLC PROCESS USING GENERATIVE AI
Fig. 2 depicts the step-by-step methodological approach for the development of an application. The overall approach for application creation using Generative AI contains the following steps:

• Requirements specification generation: This step generates various requirements specification details such as processes, activities, parameters, and rules for the input requirements.
• Design specifications generation: This step generates various design specification details such as entities, entity relationships, services, screens, screen fields for input/output display, and buttons based on the input requirements specification.
• Application code generation: This step generates code based on input design specifications. Code comprises user interface code, service layer code, business logic, database scripts, and so on, based on the services and screen details generated in design specification generation.
• Test case generation: This step involves generating functional, nonfunctional, unit, and system test cases based on the input services and screen details generated in design specification generation.

The artifacts for the above steps are generated using prompts in prompt templates. These prompts are defined based on the high-level meta-model shown in Fig. 1. The prompts are domain and LLM agnostic. Templates take input parameters; these parameter values are replaced in the template's prompts manually prior to executing the prompts on LLMs. As LLMs are probabilistic models, they commonly experience context forgetting. To minimize the number of conversation iterations, prompts are meticulously designed to provide the necessary context at the start of each phase. Responses are reviewed by a Subject Matter Expert (SME) to ensure correctness and completeness. To ease the review process, prompts indicate various classes into which generated artifacts are to be classified. If the response contains undesired content that falls outside the application scope, the LLM is subsequently prompted through edit prompts for the updates. The following subsections outline the prompt templates for all four SDLC process steps.

4.1 Requirements Specification Generation
The requirements specification generation prompt template takes the business domain, business requirements, and context as input. Here, the business domain relates to business needs such as employee pension, banking, insurance, etc. Business requirements define the scope of the application; typically, this is a subset of requirements in a specific domain. Context may refer to any additional details related to the operating market/geography. The requirements specification generation prompt template covers prompts to generate the processes and rules for a given business domain, business requirements, and context. For process generation, the prompt covers activities and parameters. For rule generation, the prompt covers nine rule types: Access Control Rules, Business Logic Rules, Calculation Rules, Compliance Rules, Data Validation Rules, Display Rules, Notification Rules, State Change (Transition) Rules, and Workflow Rules [27].

Prompt Template:
Input: <Business Domain>, <Context>, <Business requirements>
Prompts:
1. Generate requirement specifications. Consider the application in <Business Domain>. The application context is <Context>. Outlined below are the application business requirements: <Business requirements>.
2. Generate processes for the above requirements. Generate activities for the processes along with input and output parameters.
3. Generate rules for the above requirements covering these rule types - Access Control Rules, Business Logic Rules, Calculation Rules, Compliance Rules, Data Validation Rules, Display Rules, Notification Rules, State Change (Transition) Rules, and Workflow Rules.
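To illustrate how a template's parameters are substituted before execution (the paper does this manually), here is a minimal sketch; the fill_template helper and placeholder-replacement logic are our illustration, filled with the case study's actual input values from Section 5.1:

```python
# Minimal sketch of filling a phase-specific prompt template.
# The prompt texts follow the paper; the substitution helper is illustrative.
REQ_SPEC_TEMPLATE = [
    "Generate requirement specifications. Consider the application in "
    "<Business Domain>. The application context is <Context>. Outlined below "
    "are the application business requirements: <Business requirements>.",
    "Generate processes for the above requirements. Generate activities for "
    "the processes along with input and output parameters.",
]

def fill_template(prompts, params):
    """Replace each <Parameter> placeholder with its input value."""
    filled = []
    for prompt in prompts:
        for name, value in params.items():
            prompt = prompt.replace(f"<{name}>", value)
        filled.append(prompt)
    return filled

prompts = fill_template(REQ_SPEC_TEMPLATE, {
    "Business Domain": "Employee Pension Domain",
    "Context": "INDIA Geography",
    "Business requirements": "i) On-board new employees to the pension plan ...",
})
print(prompts[0])  # ready to paste into the LLM session
```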
4.2 Design Specification Generation
The design specification generation prompt template takes as input the business domain, processes, and rules details. The input related to processes and rules is taken from the generated requirements specifications. The 1st prompt sets the initial context using the inputs. The 2nd prompt generates entities and relationships. Subsequent prompts are structured to first generate high-level descriptions, followed by detailed specifications. For instance, the 3rd prompt of this template generates high-level descriptions of the services. After reviewing these, service details such as the service signature with input and output data types are generated in the 4th prompt; this is executed for each service generated from the 3rd prompt. Similarly, the 5th prompt generates the high-level descriptions of the screens. After reviewing these, screen details such as buttons, input/output data fields, and screen flow details are generated in the 6th prompt; this prompt is executed for each screen generated by the 5th prompt.

Prompt Template:
Input: <Business Domain>, <Process Spec: Requirement Spec Prompt 2 Response>, <Rules Spec: Requirement Spec Prompt 3 Response>
Prompts:
1. My application domain is <Business Domain>. Consider the following requirement specification: Processes: <Processes Spec> Rules: <Rules Spec>
2. Generate Entities and Relationships descriptions for the above requirements specification.
3. Generate Services description for the above requirements specification.
4. For each <service> IN <Prompt 2 Response> - Generate further details for <service> such as input and output parameters.
5. Generate Screens for the above requirements specification.
6. For each <screen> IN <Prompt 4 Response> - Generate further details for <screen> such as UI Classes, Attributes, Buttons, and Screen flows.

4.3 Code Generation
The code generation prompt template takes as input the business domain, technology stack, entities, services, and screen details. The technology stack specifies the technologies of interest to generate the application. The input related to entities, services, and screen control details is taken from the generated design specifications. The prompt template covers instructions to generate code for each service class, followed by its data access object class and input-output class. After generating the service layer code, code for screens is generated. The screen layout descriptions from the design specification are provided within the prompt, and code is generated for screen validation and event handling. Similarly, database schemas for tables are generated based on the entity-relationship specification.

After reviewing LLM prompt responses, generated code can be copied into the integrated development environment (IDE). The prompts for the code generation are designed such that the manual effort required to copy and paste the generated code is minimal. Code corresponding to a single file is generated in the same response to the extent possible. For instance, code for each service, DAO, and entity-specific class is generated in a separate response. Code is to be checked for compilation errors, if any, and further augmented manually. For instance, business logic for the services, modification of screen style, etc., can be added/modified if not as desired.

Prompt Template:
Input: <Business Domain>, <Technology Stack>, <ER Spec: Design Spec Prompt 1 Response>, <Services Spec: Design Spec Prompt 2 Response>, <Screens Spec: Design Spec Prompt 5 Response>
Prompts:
1. My application domain is <Business Domain>. Application Architecture specification is as follows: <Technology Stack>. Consider the following design specification: Entities: <ER Spec>. Services Specification: <Services Spec>
2. For each <service> IN Services Spec
   i. Generate service class code for <service>.
   ii. Generate data access object class code for <service>.
   iii. Generate input output class code for <service>.
3. For each <screen layout> IN Screens Spec
   i. Consider following screen layout and generate code: <screen layout>
   ii. Generate screen validation and form submission code.
4. For each <entity> IN ER Spec - Generate database schema for <entity> table.

4.4 Test Cases Generation
The test case generation prompt template takes as input the business domain, services, and screen details generated in the design phase. The prompt template covers prompts to generate various types of test cases along with test data. The 1st prompt sets the initial context using the inputs. In the 2nd prompt, test cases are generated for each service, covering various test case types such as functional test cases, data validation test cases, and exception handling. Similarly, in the 3rd prompt, test cases are generated for each screen, covering test case types such as screen field interaction, screen flow navigation, and data validation. After the test case generation for all services and screens, in the 4th prompt, system test cases are generated.

Prompt Template:
Input: <Business Domain>, <Services Spec: Design Spec Prompt 2 Response>, <Screens Spec: Design Spec Prompt 4 Response>
Prompts:
1. My application domain is <Business Domain>. Consider the following design specification: Services Specification: <Services Spec> Screens Specification: <Screens Spec>
2. For each <service> IN Services Spec - Generate test cases for <service> covering these test case types – Functional test case, Data validation test case, and Exception handling. Also, generate the test data for each test case.
3. For each <screen> IN Screens Spec - Generate test cases for <screen> covering these test case types – Screen field interaction, Screen flow navigation, Data validation. Also, generate the test data for each test case.
4. Generate system test cases for the application. Also, generate the test data for each test case.
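The "For each <service> IN Services Spec" steps above are iteration instructions carried out by the person driving the session. A small driver could mechanize them, as sketched below; the chat callable is an assumed stand-in for an LLM session API, not something the paper defines, and the file names are illustrative:

```python
# Illustrative driver mechanizing the per-service / per-screen loops of the
# code generation template. `chat` is any callable that sends one prompt to
# an ongoing LLM session and returns the response text.
def generate_code(chat, business_domain, technology_stack, er_spec, services, screens):
    files = {}
    # Prompt 1: set the context once for the session.
    chat(f"My application domain is {business_domain}. "
         f"Application Architecture specification is as follows: {technology_stack}. "
         f"Consider the following design specification: Entities: {er_spec}. "
         f"Services Specification: {services}")
    # Prompt 2: one file per response keeps copy-paste effort minimal.
    for service in services:
        files[f"{service}Service.php"] = chat(f"Generate service class code for {service}.")
        files[f"{service}DAO.php"] = chat(f"Generate data access object class code for {service}.")
        files[f"{service}.php"] = chat(f"Generate input output class code for {service}.")
    # Prompt 3: screen code from the reviewed screen layout descriptions.
    for screen, layout in screens.items():
        files[f"{screen}.php"] = chat(f"Consider following screen layout and generate code: {layout}")
        files[f"{screen}_validation.php"] = chat("Generate screen validation and form submission code.")
    # Prompt 4: database schema per entity.
    for entity in er_spec:
        files[f"{entity}.sql"] = chat(f"Generate database schema for {entity} table.")
    return files
```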
5 CASE STUDY
We used ChatGPT, a Generative AI technique, and an Employee Pension Plan System case study to validate the prompting approach. Requirements specification, design specification, and test case generation prompts are executed by an experienced person in one ChatGPT session. Code generation prompts are executed by a novice developer.

In this section, we share our experiences in applying the approach using ChatGPT for application development. Due to the large size of ChatGPT response text, in this paper we have not shown prompt responses. Instead, this section presents our positive and negative observations on the prompt responses. All responses are evaluated to check whether they address the prompt context, include all the necessary details, are consistent, and do not conflict with other responses.

5.1 Requirements Specification Generation
The requirements specification generation prompt template is used for the requirements specification generation of the case study. The scope of the application is limited by specifying business requirements. The input parameters used for the template are:
<Business Domain>: Employee Pension Domain
<Context>: INDIA Geography
<Business requirements>: i) On-board new employees to the pension plan and capture employee information. ii) Enable employees to allocate their funds to available asset types. iii) Allow administrator and employee to view their current and past fund balances, and their contribution allocation on a monthly basis. iv) Record month-end returns for the various asset types and incorporate these returns into the employee balances. v) Provide employee-specific and overall dashboards of fund-wise balances.

The representative prompts execution and observations for requirements specification generation are shown in Table 1. Some review edit prompts were required for missed information. ChatGPT generated requirement specifications corresponding to Onboarding of New Employees, Employee Fund Allocation, Balance Tracking and Reporting, Recording of Month-end Returns, and Dashboards and Reports. The generated high-level data elements required a review edit prompt: the 1st Edit (1E) prompt shown in Table 1 was required for the addition of missing information such as grade, department, employee contact information, employment start date, and end date. ChatGPT generated processes corresponding to the five requirements: i) Onboarding of New Employees, ii) Employee Fund Allocation, iii) Balance Tracking and Reporting, iv) Recording of Month-end Returns, v) Dashboards and Reports. It provided a description of the processes. Each requirements specification covered the necessary activities. The generated processes were accurate. ChatGPT generated multiple rules corresponding to each of the processes along with rules classification. A few examples of classification output are as follows: i) Validation Rule: Ensure that all mandatory employee information fields are completed before enrolling an employee in the pension plan. ii) Display Rule: Display comprehensive information to employees upon their enrollment in the pension plan, including the selected contribution rate, available asset types, and investment strategy.

Table 1: Requirements Specification Generation Prompts and Observations

Prompt 1: Generate requirement specifications. Consider the application in Employee Pension Domain. The application context is INDIA Geography. Outlined below are the application business requirements. i) On-board new employees to the pension plan and capture employee information. ii) Enable employees to allocate their funds to available asset types. iii) Allow administrator and employee to view their current and past fund balances, and their contribution allocation on a monthly basis. iv) Record month-end returns for the various asset types and incorporate these returns into the employee balances. v) Provide employee-specific and overall dashboards of fund-wise balances.
SME Observations: (+) High-level requirement specifications are generated using the given requirements. They covered high-level data elements. (+) It generated Non-Functional Requirements as well. (-) SME review identified that modifications were required in data elements of User Management.

Prompt 1E: On-boarding new employees needs an update: Add grade, department, phone number, email address, salary, start date, and end date.
SME Observations: (+) It generated an updated requirement specification for on-boarding employees.

Prompt 2: Generate processes for the above requirements. Generate activities for the processes along with input and output parameters.
SME Observations: (+) It generated descriptions of high-level processes: i) Onboarding of New Employees, ii) Employee Fund Allocation, iii) Balance Tracking and Reporting, iv) Recording of Month-end Returns, v) Dashboards and Reports.

Prompt 3: Generate rules for the above requirements covering these rule types - Access Control Rules, Business Logic Rules, Calculation Rules, Compliance Rules, Data Validation Rules, Display Rules, Notification Rules, State Change (Transition) Rules, and Workflow Rules.
SME Observations: (+) It generated rules corresponding to each process. (+) It generated multiple rules of each type for processes.

In total, ChatGPT generated 5 processes, 12 activities, 28 parameters, and 11 rules. All activities, rules, and parameters of the processes were verified for correctness. The generated requirements specifications were satisfactory and met the input business requirements.

5.2 Design Specification Generation
On finalizing the requirements specification, design specifications are generated using ChatGPT. Design specifications covered the creation of detailed entities, entity relationships, services, and screen specifications. The representative prompts execution and observations for design specification generation are shown in Table 2. ChatGPT could generate entities and relationships. The response covered five entities: Employee, Fund Allocation, Asset Type, Balance, and Returns; and four relationships: i) Employee to Fund Allocation, ii) Employee to Balance, iii) Balance to Employee and Asset, iv) Returns to Asset Type. Each entity was described with its attributes along with their data types. The primary key and foreign key attributes were also marked in the generated text. The generated entity attributes were consistent with the requirements specification parameters. We observed that all the generated specification details were in sync. ChatGPT generated textual descriptions of five services: i) EmployeeService, ii) ContributionService, iii) DashBoardService, iv) AuthService, v) PensionPlanService. The prompt response was satisfactory. Each service covered the functionality details in terms of operations. For instance, the generated EmployeeService description covered functionality related to managing employee information and onboarding new employees to the pension plan. ChatGPT could generate the service details such as names of operations along with input and output data type details. A few operation specifications generated for EmployeeService are as follows: addEmployee(employee: Employee): Employee; updateEmployee(employeeId: string, updatedEmployeeInfo: Partial<Employee>): Employee; getEmployeeById(employeeId: string): Employee; validateEmployeeInfo(employee: Employee): Boolean; deleteEmployee(employeeId: string): boolean.

ChatGPT generated seven screen descriptions. Here, context forgetting was observed: the screens were not as desired. So, we provided the screen names through a subsequent edit prompt (Table 2, 5E). The screen names given are: i) Onboarding Screen, ii) Contribution Allocation Screen, iii) Balance and Contribution Summary Screen, iv) Fund Performance Record Screen, v) Dashboard Screen. With this edit prompt, ChatGPT generated accurate screen specification details. Further, for each screen, ChatGPT generated detailed screen layout information.

Table 2: Design Specification Generation Prompts and Observations

Prompt 1: My application domain is Employee Pension Domain. Consider the following requirement specification: Processes: <Table 1 2nd Prompt Response>, Rules: <Table 1 3rd Prompt Response>
SME Observations: (+) It summarized the context it understood from the entire input text. The summary was accurate.

Prompt 2: Generate Entities and Relationships descriptions for the above requirements specification.
SME Observations: (+) It generated textual descriptions for all Entities and Relationships. The Entities are as follows: Employee (employee_id, first_name, last_name, grade, department, phone, email, salary, start_date, end_date), Fund Allocation (allocation_id, employee_id, asset_type, investment_strategy), Balance (balance_id, employee_id, balance_date, asset_type, balance_amount), Returns (returns_id, returns_date, asset_type, returns_amount).

Prompt 3: Generate Services description for the above requirements specification.
SME Observations: (+) It generated textual descriptions of Services – i) EmployeeService, ii) ContributionService, iii) DashBoardService, iv) AuthService, v) PensionPlanService.

Prompt 4: Generate further details for EmployeeService such as input and output parameters.
SME Observations: (+) It generated service details including service description and method signatures with input and output parameters.

Prompt 5: Generate Screens for the above requirements specification.
SME Observations: (-) It generated high-level descriptions of screens – i) Login screen, ii) Employee dashboard, iii) Admin dashboard, iv) Update employee screen, v) View employee screen. (-) These generated screens do not cover all the application's requirements.

Prompt 5E: Consider following screens: i) Onboarding Screen, ii) Contribution Allocation Screen, iii) Balance and Contribution Summary Screen, iv) Fund Performance Record Screen, v) Dashboard Screen. Generate specification details.
SME Observations: (+) It generated high-level functional descriptions for all the screens.

Prompt 6: Generate further details for Onboarding Screen such as UI Classes, Attributes, Buttons, and Screen flows.
SME Observations: (+) It generated textual information for this screen that contains i) Form Fields (input/output controls), ii) Buttons, iii) validations.

This included screen structure such as screen title, screen fields, and screen behavior. For instance, for the Onboarding Screen, it generated a screen structure description that covered Name (text input), Phone Number (text input), Department (dropdown list), Salary (number input), and Start Date (date picker), along with buttons for data submission. The generated text also covered screen behavior related to data validation, mandatory/optional data fields, data format, minimum/maximum field values, and so on.

Overall, ChatGPT generated 4 entities, 4 entity relationships, 5 services, 13 operations, and 5 screens. The generated specifications were verified with respect to the input requirement specifications. Additionally, we checked whether the generated services included appropriate methods and input/output details, and whether the screen specifications included screen attributes, buttons, and screen flows. The generated design specifications were satisfactory and met the input requirements specifications.

5.3 Code Generation
Code generation prompts are executed by a novice developer in a separate ChatGPT session. The generated entity specifications and high-level services are used as input to the prompt template. The technology stack is provided as follows: PHP, HTML, CSS, and MySQL. The representative prompts execution and observations for code generation are shown in Table 3. As shown in the 1st prompt of Table 3, context is set through the design specification and technology stack. ChatGPT summarized the input given, which was as per the design specification. The subsequent prompts generated service class code, data access object (DAO) class code, and entity-specific class code for each service. Considering entity relationships, ChatGPT automatically imported dependent services' PHP files in service class code. The DAO class code covered CRUD (Create, Read, Update, Delete) operations. However, for other functionalities, stubbed methods were generated, and business logic was written manually.
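For readers unfamiliar with the three-class layering these prompts target (service class, DAO class, input-output class), the following compact sketch shows its shape. The case study generated PHP; we render the structure in Python purely for brevity, and all names and bodies are illustrative:

```python
# Illustrative shape of the generated three-class layering:
# input-output (entity) class, DAO with CRUD, service delegating to the DAO.
class Employee:
    def __init__(self, employee_id, first_name, last_name, email):
        self.employee_id = employee_id
        self.first_name = first_name
        self.last_name = last_name
        self.email = email

class EmployeeDAO:
    """Data access object: CRUD operations against the employee table."""
    def __init__(self, connection):
        self.connection = connection
    def create(self, employee: Employee) -> None: ...
    def read(self, employee_id: str) -> Employee: ...
    def update(self, employee: Employee) -> None: ...
    def delete(self, employee_id: str) -> None: ...

class EmployeeService:
    """Service layer. The generated EmployeeService took the DAO as a
    constructor parameter; ContributionService, inconsistently, created
    its DAO inside the constructor (see the observations below)."""
    def __init__(self, dao: EmployeeDAO):
        self.dao = dao
    def add_employee(self, employee: Employee) -> Employee:
        # Business logic (validation etc.) was largely stubbed by ChatGPT
        # and had to be filled in manually.
        self.dao.create(employee)
        return employee
```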

Table 3: Code Generation Prompts and Observations

Prompt 1: My application domain is Employee Pension Domain. Application Architecture specification is as follows: PHP, HTML, CSS and MySQL. Consider the following design specification: Entities: <Table 2 1st Prompt Response> Service Specification: <Table 2 2nd Prompt Response>
Developer Observations: (+) It summarized the context it understood from the entire input text. The summary was accurate.

Prompt 2 i): Generate service class code for EmployeeService.
Developer Observations: (+) It generated PHP code for EmployeeService. It assumed an architecture in which EmployeeService methods take EmployeeDAO as a parameter. The generated EmployeeService code contained all method implementation details that were part of the EmployeeService description response.

Prompt 2 ii): Generate data access object class code for EmployeeDAO.
Developer Observations: (+) It generated PHP code for EmployeeDAO. The methods and data referred to in the code were correct. EmployeeDAO methods take Employee as a parameter.

Prompt 2 iii): Generate input output class code for Employee.
Developer Observations: (+) It generated PHP code for the Employee class considering the entity attributes. The generated code contained getter and setter methods. There was consistency in the generated code for all three classes: EmployeeService, EmployeeDAO, and Employee.

Prompt 3 i): Consider following screen layout and generate code: <Table 2 4th Prompt Response>
Developer Observations: (+) It generated PHP code based on HTML for the Onboarding Screen. (+) The generated PHP code also referred to the EmployeeService code. (+) It also confirmed that the code assumes there are two separate PHP scripts to handle the form submission and to return to the previous screen, with file names submit_employee.php and previous_screen.php. (-) For subsequent generation of the two files, the next prompt was given.

Prompt 3 ii): Generate screen validation and form submission code.
Developer Observations: (+) It generated code to retrieve the form data, validate it, and either display errors or instantiate a new employee and redirect to a success page.

Prompt 4: Generate database schema for <entity> table.
Developer Observations: (+) The generated schema code was accurate.

The generated code for services had some inconsistencies. Multiple instances of compilation errors occurred due to mismatches in parameters, classes, and operation naming, as well as constructors. Code corrections related to missing imports were manually resolved. The code for handling messages and user alerts was implemented manually. For instance, the ContributionService constructor did not take the DAO class as a parameter but created it inside the constructor, whereas the EmployeeService constructor took it as a parameter. Business logic was plugged in wherever required. Manual code was written to establish database connectivity.

For screen generation, the generated screen layout description of the design specification generated from the 5th prompt is given as input. Generated code is compiled and reviewed for syntax and semantic errors. Errors were manually resolved. The generated HTML code was largely as per the screen specifications. There was a lack of consistency in the CSS; manual integration of CSS into HTML was required, and manual resizing and positioning of UI components was needed. The code generated by ChatGPT lacked validation checks. For instance, no code was generated to prevent the insertion of duplicate records into the database, such as checking for existing entries with the same email address or other unique identifiers.
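The missing duplicate-record guard called out above is the kind of check the developer had to add by hand. A minimal sketch, using Python and SQLite for self-containment (the case study used PHP and MySQL; the table and function names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE  -- uniqueness constraint backing the check
    )
""")

def add_employee(conn, email):
    """Reject duplicates instead of silently inserting a second record."""
    exists = conn.execute(
        "SELECT 1 FROM employee WHERE email = ?", (email,)
    ).fetchone()
    if exists:
        raise ValueError(f"employee with email {email!r} already exists")
    conn.execute("INSERT INTO employee (email) VALUES (?)", (email,))

add_employee(conn, "a@example.com")
try:
    add_employee(conn, "a@example.com")
except ValueError as err:
    print(err)  # employee with email 'a@example.com' already exists
```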
Table 4: Code generation summary

Code                        Count
Number of Files Generated   42
LoC Generated               4914
LoC Written                 310
LoC Relevant                4604
LoC Not Relevant            237

Table 4 shows the summary of generated and manually written code along with relevant and non-relevant code. There were a total of 42 code files generated with approximately ∼5K Lines of Code (LoC). Approximately 310 LoC had to be manually written, and ∼237 LoC were inaccurate. Inaccurate code gave either compilation errors or runtime errors; these were manually corrected. The execution, debugging, and troubleshooting of the code were performed manually by the developer.

5.4 Test Cases Generation
Test cases are generated after finalizing the design specifications. We asked ChatGPT to generate test cases for each screen and service, followed by system test cases, as per the prompt template. The generated test cases are primarily focused on the system functionality considering the input and output scenarios (black box testing). For each screen, it generated positive and negative scenarios, verification of mandatory and optional fields, screen compatibility checks covering accessibility and display, field type verifications such as a date being less than / greater than the current date, and so on. A few screen test cases covered integration scenarios that involved end-to-end functionality testing. For services, it generated functional test cases for each service method along with other test cases related to concurrency, data integrity, error handling, and exceptions. For instance, for AuthService it had test cases related to the encryption of sensitive information, and for DashboardService test cases related to filtering and sorting were generated.

Overall, each test case was well-defined by ChatGPT and included different scenarios such as invalid inputs, data accuracy, screen navigation, and so on. For instance: 1) for the onboarding employee operation of EmployeeService, ChatGPT generated a test case for the addition of a new employee with valid data, a test case for the addition of an employee with invalid data, etc.; 2) for the Onboarding Screen, ChatGPT generated a test case to validate that the Start Date is not later than the End Date, a test case to verify that the phone number follows a valid format, and a test case to validate that the email address is in a valid format; 3) it generated test cases for the transition from the Onboarding Screen to the Contribution Allocation Screen after successfully adding an employee; 4) for system test cases, ChatGPT covered an employee's ability to complete the onboarding process, navigate to the Contribution Allocation Screen, adjust allocation, and save changes.

It generated 35 test cases for screens, 39 test cases for services, and 11 test cases for system testing. Five system test cases covered non-functional testing related to performance, security, usability, accessibility, and browser compatibility. There were no review edits for the test cases. The generated test cases were used for testing. Test cases related to services were tested through driver stub code. Test cases related to screens were executed from a web browser by deploying the code using the XAMPP platform. Fig. 3 shows sample representative screens of the application developed.

Figure 3: Sample Application Screens
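To make the Onboarding Screen test cases above concrete, here is one way they could be rendered as executable checks. This is our illustration in Python (the case study executed screen tests from a browser via XAMPP); validate_onboarding_form is a hypothetical helper, not code from the paper:

```python
# Illustrative, pytest-style rendering of the generated screen test cases.
import re
from datetime import date

def validate_onboarding_form(form: dict) -> list:
    """Hypothetical screen validation mirroring the generated test cases."""
    errors = []
    if form["start_date"] > form["end_date"]:
        errors.append("Start Date must not be later than End Date")
    if not re.fullmatch(r"\+?\d{10,12}", form["phone"]):
        errors.append("Phone number format is invalid")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", form["email"]):
        errors.append("Email address format is invalid")
    return errors

VALID_FORM = {
    "start_date": date(2023, 4, 1), "end_date": date(2033, 3, 31),
    "phone": "9876543210", "email": "new.employee@example.com",
}

def test_valid_form_passes():
    assert validate_onboarding_form(VALID_FORM) == []

def test_start_date_not_later_than_end_date():
    form = dict(VALID_FORM, start_date=date(2034, 1, 1))
    assert "Start Date must not be later than End Date" in validate_onboarding_form(form)

def test_phone_number_format():
    form = dict(VALID_FORM, phone="12ab")
    assert "Phone number format is invalid" in validate_onboarding_form(form)

def test_email_format():
    form = dict(VALID_FORM, email="not-an-email")
    assert "Email address format is invalid" in validate_onboarding_form(form)
```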

5.5 SDLC Productivity
Generated requirements, design, and test specifications are manually copied into rich text documents for easy future reference. As the generated specifications were in textual form, manual formatting related to headings and style was done for better readability; half a day's effort was spent on this activity. The effort spent on software development is compared against an estimate computed using the estimation guideline suggested by D. J. Reifer [25]. The author proposed the Web Development Model (WebMo) approach to estimate the effort and duration needed for web-based application development. The estimation approach uses a size metric involving the count of web objects as defined by the Halstead equation [26]. The Halstead metrics are comprehensive and are based on the number of distinct operators and operands. Table 5 shows the size in web objects computed for the Employee Pension Plan System. There were 25 distinct web objects spread across the five screens. Using this data, the effort is computed by mapping the architectural requirements of the Employee Pension Plan System. The calculation resulted in an estimated effort of ∼75 person days.
Table 5: Size Metric as per Halstead Equation

Screens                                    Number of unique web objects
Onboard New Employee Screen                3
Contribution Allocation Screen             4
Balance and Contribution Summary Screen    5
Fund Performance Record Screen             6
Dashboard Screen                           7
Total                                      25

Table 6 shows the prompts and effort breakup for the Employee Pension Plan application development using ChatGPT. The count of template prompts (T) is calculated considering the number of services and screens. The edit prompts (E) are used for correcting the inaccuracies in the generated artifacts. The overall SDLC process using ChatGPT took us ∼22 person days. Productivity gain is computed as follows: Productivity Gain = (Estimated Effort − Actual Effort) / Estimated Effort; with the ∼75 person days estimated and ∼22 person days actually spent, this gives (75 − 22) / 75 ≈ 70.66%.
Table 6: Prompts and Effort Summary

SDLC Artifacts Generation    Prompts Count (T – Template prompt, E – Edit prompt)    Effort (Person Days, PD)
Requirement Specifications   3T + 1E       ∼1 PD
Design Specifications        14T + 1E      ∼4 PD
Source Code                  31T + 14E     ∼15 PD
Test Cases                   12T           ∼2 PD
Total                        76 prompts    ∼22 PD

A productivity gain of ∼70.66% is observed. Leveraging ChatGPT for software development improved productivity to a large extent. ChatGPT generated the structural code of the application, including service layer code and user interfaces, which gave a head start for the application development. The developer could then suitably add the application business functionality. The details of the effort for application creation are shown in Table 6. The effort includes ChatGPT response review time as well as correction time.

6 THREATS TO VALIDITY
The SDLC phase-specific prompt templates are domain and LLM agnostic. The templates are designed to take input parameters; the parameters in prompts are replaced with input values prior to executing the prompts on LLMs. The Employee Pension Plan case study covered functional complexity across all architectural layers, including multiple screens, screen flows, multiple services, multiple entities, and relationships. The choice of the domain was random. ChatGPT proved to be effective in developing the application. The template prompt count considered for SDLC productivity depends on the size of the application in terms of services and screens. With an increase in size, the number of edit prompts may also increase. The observed productivity gain may vary with the size of the application being developed as well as with the specific Generative AI used.

Validation is carried out using ChatGPT; the results may vary across different Generative AI techniques. The replication of results on large-scale application development is yet to be explored. Large-scale business applications are typically complex and have intricate functionality and architecture. Multi-site, multi-team organizations usually develop and maintain large applications by following an integrated process. The approach may need further enhancements to support team collaboration, especially for design and code generation.

7 DISCUSSION AND FUTURE WORK
The SDLC process demands Subject Matter Experts (SMEs) with phase-specific skills. The efficacy and quality of the software is skill dependent. To reduce the skills barrier and to accelerate software development, in this paper we propose the use of Generative AI techniques. The use of Generative AI demands prompt engineering, as the quality of the response is dependent on the quality of the prompt. To address this, we proposed a systematic prompting approach. The prompting strategy encompasses several key aspects: i) Prompts are designed using meta-model concepts. ii) Prompt templates are designed specifically for each SDLC phase and can be executed in separate LLM sessions. The first prompt of each phase-specific template sets the appropriate context for subsequent generation. iii) SDLC phase-specific prompts also cover the specific needs of a phase. For instance, for rules generation in the requirements specification phase, predefined rule types are given in the prompt. In the design phase, the strategy was to first generate high-level descriptions for services and screens, followed by detailed generation. For code generation, code corresponding to a single file is generated to reduce copy-paste efforts.

The prompting approach is evaluated on a small, yet complex business application using ChatGPT. There were many lessons

learned from this evaluation. The prompting approach of breaking down the application functionality into smaller parts worked to a large extent. Using ChatGPT, we could generate multiple artifacts of the SDLC phases. However, it was essential to validate the generated artifacts at each step to prevent the propagation of errors into subsequent phases. The involvement of a subject-matter expert was crucial for effective validation. Requirements specification generation and test case generation gave satisfactory outputs. However, multiple challenges were observed in design specification and code generation. While the response text from ChatGPT was semantically correct, there were a few instances where the labels used in the response text did not match those specified in the prompts. For example, in one iteration, the recommended entities were "Employee Pension Contribution", "Contribution Percentage", "Asset Returns", "Fund Balance", "Users", "Roles", and "Role Privileges". In other iterations, the recommended entities were "Employee", "Contribution", "Pension Plan", "Fund", "User", and "Performance Record". Similarly, the generated code, content, and output forms varied, with differences in the number of services generated, the names of functions and parameters, and the format of the output. This non-determinism resulted in increased effort required for review and modification.
[6] Aljanabi Mohammad, Ghazi Mohanad, Ahmed H. Ali, and Saad A. Abed, 2023.
non-determinism resulted in increased effort required for review ChatGPT: Open Possibilities. Iraqi Journal for Computer Science and Mathemat-
and modification. ics(Jan,2023),62-64. DOI: https://doi.org/10.52866/20ijcsm.2023.01.01.0018
Occasional difficulty in recalling past conversations was noted [7] Jianzhang Zhang, Yiyang Chen, Nan Niu, Yinglin Wang, Chuang Liu, 2023. A
Preliminary Evaluation of ChatGPT in Requirements Information Retrieval. DOI:
when referencing previously generated output. To overcome this https://doi.org/10.48550/arXiv.2304.12562.
issue, the output had to be used as input multiple times, and sub- [8] Jules White, Sam Hays, Quchen Fu, Jesse Spencer-Smith, Douglas C. Schmidt,
sequent improvements were to be requested through prompts. At 2023. ChatGPT Prompt Patterns for Improving Code Quality, Refactoring, Re-
quirements Elicitation, and Software Design. DOI: https://doi.org/10.48550/arXiv.
times past conversation references were given with annotations 2303.07839.
such as short names, keywords, etc. For inconsistency in code gen- [9] Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry
Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C. Schmidt, 2023. A
eration, previous output code was given as input, and subsequent Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. DOI:
improvements were asked through prompts. Multiple prompts https://doi.org/10.48550/arXiv.2302.11382
were required to direct corrections. The generated code was func- [10] Theodoros Galanos, Antonios Liapis, Georgios N. Yannakakis, 2023. Architext:
Language-Driven Generative Architecture Design. DOI: https://doi.org/10.48550/
tionally correct. However, it was not the best. We observed code arXiv.2303.07519
improvements were possible for error handling and performance. [11] Aakash Ahmad, Muhammad Waseem, Peng Liang, Mahdi Fahmideh, Mst
Generative AI for software engineering, particularly for large Shamima Aktar, and Tommi Mikkonen, 2023. Towards Human-Bot Collaborative
Software Architecting with ChatGPT. In Proceedings of the 27th International
and complex applications, is a promising area of research, and fur- Conference on Evaluation and Assessment in Software Engineering (EASE ’23),(
ther investigation is required to fully evaluate its potential and June, 2023), 279-285. DOI: https://doi.org/10.1145/3593434.3593468
[12] Rajkumar Nitarshan, Raymond Li, and Dzmitry Bahdanau, 2022. Evaluating
limitations. Generative AI-based code generation may not be as the Text-to-SQL Capabilities of Large Language Models. DOI: https://doi.org/10.
effective as the Model Driven Engineering (MDE) based approach, 48550/arXiv.2204.00498.
especially for large application development. Model Driven Engi- [13] Aiwei Liu, Xuming Hu, Lijie Wen, Philip S. Yu, 2023. A Comprehensive Evaluation
of ChatGPT’s Zero-Shot Text-to-SQL Capability. DOI: https://doi.org/10.48550/
neering (MDE) presents a solution that shifts the focus to creating arXiv.2303.13547.
problem-specific models and using them for automated validation, [14] Haoye Tian, Weiqi Lu, Tsz On Li, Xunzhu Tang, Shing-Chi Cheung, Jacques
analysis, and code generation. The benefits of enhanced developer Klein, Tegawendé F. Bissyandé, 2023. Is ChatGPT the Ultimate Programming
Assistant—How Far Is It?. DOI: https://doi.org/10.48550/arXiv.2304.11938.
productivity through automated code generation have been proven [15] Burak Yetistiren, Isik Ozsoy, and Eray Tuzun, 2022. Assessing the Quality of
with MDE. However, MDE poses a significant entry barrier for GitHub Copilot’s Code Generation. In Proceedings of the 18th International
Conference on Predictive Models and Data Analytics in Software Engineering
Subject Matter Experts (SMEs) who are typically not well-versed (PROMISE ’22), (Nov, 2022),62-71. DOI: https://doi.org/10.1145/3558489.3559072.
with the technology [18]. In the future, we plan to combine MDE [16] Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. 2022. Expectation
with Generative AI, to leverage the benefits of both. SMEs can inter- vs. Experience: Evaluating the Usability of Code Generation Tools Powered by
Large Language Models. In CHI Conference on Human Factors in Computing
act with Generative AI tools like ChatGPT using purpose-specific Systems Extended Abstracts (CHI EA ’22), (April, 2022), 1-7. DOI: https://doi.org/
contextual prompts, and the design model can be automatically 10.1145/3491101.3519665.
populated. We are currently exploring the use of a meta-model to [17] Arailym L. Talasbek. 2023. The Automation Capabilities in the Field of Software
Testing. Suleyman Demirel University Bulletin: Natural and Technical Sciences
guide the automatic model population using LLMs. 62, (Mar. 2023), 5-14.
The maintenance of large applications is a crucial area that re- [18] Jon Whittle, John Hutchinson, and Mark Rouncefield. 2014. The State of Practice
in Model-Driven Engineering. IEEE Software 31, 3 (May 2014), 79-85. DOI: https:
quires attention. The correlation between the contextual informa- //doi.org/10.1109/ms.2013.65
tion generated by Generative AI and the custom application-specific [19] Nat Friedman. 2021. Introducing GitHub Copilot: Your AI Pair Pro-
contextual information needs to be explored to facilitate evolution- grammer. URL:https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-
programmer
ary maintenance. Non-determinism in the generation process is [20] Rohith Pudari and Neil A. Ernst, 2023. From Copilot to Pilot: Towards AI Sup-
ported Software Development. DOI: https://doi.org/10.48550/arXiv.2303.04142.

[21] Arghavan M. Dakhel, Vahid Majdinasab, Amin Nikanjam, Foutse Khomh, Michel C. Desmarais, and Zhen Ming Jiang. 2022. GitHub Copilot AI Pair Programmer: Asset or Liability? DOI: https://doi.org/10.48550/arXiv.2206.15331
[22] Yihong Dong, Xue Jiang, Zhi Jin, and Ge Li. 2023. Self-collaboration Code Generation via ChatGPT. DOI: https://doi.org/10.48550/arXiv.2304.07590
[23] Ameya S. Pothukuchi, Lakshmi V. Kota, and Vinay Mallikarjunaradhya. 2023. Impact of Generative AI on the Software Development Lifecycle (SDLC). International Journal of Creative Research Thoughts 11 (Aug 2023).
[24] Kun Ruan, Xiaohong Chen, and Zhi Jin. 2023. Requirements Modeling Aided by ChatGPT: An Experience in Embedded Systems. In Proceedings of the 31st IEEE International Requirements Engineering Conference Workshops (REW '23) (Sep 2023), 170-177. DOI: https://doi.org/10.1109/REW57809.2023.00035
[25] Donald J. Reifer. 2000. Web Development: Estimating Quick-to-Market Software. IEEE Software 17 (June 2000), 57-64. DOI: https://doi.org/10.1109/52.895169
[26] Maurice H. Halstead. 1977. Elements of Software Science (Operating and Programming Systems Series). Elsevier Science Inc.
[27] Asha Rajbhoj, Padmalata Nistala, Ajim Pathan, Piyush Kulkarni, and Vinay Kulkarni. 2023. RClassify: Combining NLP and ML to Classify Rules from Requirements Specifications Documents. In Proceedings of the 31st IEEE International Requirements Engineering Conference (RE '23), 180-189. DOI: https://doi.org/10.1109/RE57278.2023.00026
