(2023) Smart Knowledge Transfer Using Google-Like Search (ResearchGate)
Srijoni Majumdar
Advanced Technology Development Centre
Indian Institute of Technology Kharagpur-721302
[email protected]

Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur-721302
[email protected]
arXiv:2308.06653v1 [cs.SE] 12 Aug 2023
Abstract—To address the issue of rising software maintenance cost due to program comprehension challenges, we propose SMARTKT (Smart Knowledge Transfer), a search framework that extracts and integrates knowledge related to various aspects of an application in the form of a semantic graph. This graph supports syntax and semantic queries and converts the process of program comprehension into a Google-like search problem.

Index Terms—Program Comprehension, Knowledge Transfer, Machine Learning, Natural Language Processing, Semantic Graph

In the last three decades, software maintenance cost has risen to 90% of the total Software Development Life Cycle (SDLC) cost [9], [10], [15]. Surveys conducted in [14], [30] conclude that, since 80% of maintenance tasks are adaptive and perfective [27], the absence of an integrated framework or assistance for knowledge transfer (KT) to lessen program comprehension challenges is the primary contributor to this rising cost. To execute a maintenance task, developers spend the majority of their time manually searching and mining source files and other knowledge sources such as design documents, defect and version trackers, and emails, taking mental notes or scribbling down mappings in an attempt to build an overall understanding of the design, behaviour and evolution of the application [30], [25], so that they can subsequently locate the relevant code sections and their dependencies. However, in most cases, documents are outdated and incomplete, tracker systems are not updated properly, and help from earlier developers is scarce or unavailable. Due to these factors, coupled with frequent interruptions [14] for calls or meetings, developers get caught in a tedious and inefficient process of building, revalidating and rebuilding their understanding of the application, and resort to quick fixes that introduce hidden errors which re-running the golden test cases cannot catch [30].

To address program comprehension challenges, developers extract software-development-related knowledge by detecting low-level algorithm details using static instrumentation [3] or by extracting the control flow between run-time events using static analysis and dynamic profiling [33]. Code search tools based on the abstract syntax tree have been proposed in [31], [5], [23], [21], [22]. To extract application-specific entities, concepts have been located in code comments based on enumerated domain concepts [12] or ontologies [2], [34], [24], [20]. To extract project management details, comments are mined to track code changes in [1] and [11], and software repositories are analysed to extract information on bug history, version changes, developer and tester details, and their interrelationships in [7], [29].

Program comprehension can be aided by extracting relevant knowledge from various sources related to a working software application (for a representative set, refer to Table I). However, we observe that the available assistance tools consider only a limited set of these sources, and that there is no easy-to-use integrated framework built on them.

To analyse the comprehension challenges more specifically and understand the requirements for an effective design of an assistance framework, we conducted surveys and personal interviews with a group of developers in a software company. We present a representative scenario here (names have been changed for confidentiality): Neha, a developer working with C++ and the traditional Vi editor [16], is assigned to fix bug#67 in ClearQuest [36], with the error message “processing error : unsigned 162 S1”. As she is new, she consults her senior, Sandra, at every step. Sandra uses the Cscope [35] tool in Vi to search the code base for the error string, locates function VHDLPosedge#S2 in file VHDLPosedge.cc, and points Neha to it. Neha asks Sandra about any similar defect. Sandra searches ClearQuest, discovers bug#22, and searches the Concurrent Versions System (CVS) [13] and emails to extract the commit summary of the code-level changes for that bug. The summary states that the data type of variable var1 was changed from unsigned int to unsigned long int, but in the present code var1 has type long int, causing bug#67. Sandra then recalls that this change was made a month ago as part of change request CR123 for optimising file VHDLPosedge.cc. She tells Neha to revert the data type to unsigned long int, as this would not affect the behaviour of the code. As part of CR123, VHDLPosedge#S2 is now called by a thread start function, so Sandra also tells Neha to add mutex locks around reads and writes in the function to prevent data races.

Sandra answers Neha's queries by drawing on multiple relevant sources, and also provides additional important information (Smart help). However, due to evolving teams and fragmented task distribution, resources like Sandra, who are aware of various aspects of the application, are rarely available.
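The abstract describes SMARTKT as integrating knowledge into a semantic graph that answers queries, but this section does not specify the graph's schema or query interface. As a minimal illustration only, the Python sketch below stores the facts from the Neha/Sandra scenario as (subject, relation, object) triples and answers her questions by following chains of relations. The class SemanticGraph, its methods, the relation names (located_in, similar_to, modified, and so on) and the placeholder node <thread start function> are all hypothetical; they are not SMARTKT's actual design.

```python
from collections import defaultdict


class SemanticGraph:
    """Toy in-memory graph of (subject, relation, object) triples.

    Hypothetical illustration; not SMARTKT's actual data model.
    """

    def __init__(self):
        self._out = defaultdict(list)  # subject -> [(relation, object), ...]
        self._in = defaultdict(list)   # object  -> [(relation, subject), ...]

    def add(self, subject, relation, obj):
        """Record one fact and index it in both directions."""
        self._out[subject].append((relation, obj))
        self._in[obj].append((relation, subject))

    def objects(self, subject, relation):
        """Nodes reached from `subject` over one `relation` edge."""
        return [o for r, o in self._out.get(subject, []) if r == relation]

    def subjects(self, relation, obj):
        """Nodes with a `relation` edge pointing to `obj` (reverse lookup)."""
        return [s for r, s in self._in.get(obj, []) if r == relation]

    def follow(self, start, relations):
        """Follow a chain of relations hop by hop, e.g. bug -> function -> file."""
        frontier = [start]
        for rel in relations:
            frontier = [o for node in frontier for o in self.objects(node, rel)]
        return frontier


# Facts taken from the scenario; the relation names are hypothetical.
g = SemanticGraph()
g.add("bug#67", "reports_error", "processing error : unsigned 162 S1")
g.add("bug#67", "located_in", "VHDLPosedge#S2")
g.add("bug#67", "similar_to", "bug#22")
g.add("VHDLPosedge#S2", "defined_in", "VHDLPosedge.cc")
g.add("VHDLPosedge#S2", "called_by", "<thread start function>")
g.add("bug#22", "commit_summary", "var1: unsigned int -> unsigned long int")
g.add("CR123", "modified", "VHDLPosedge.cc")
g.add("CR123", "changed_type_of", "var1")

# Which file must Neha open for bug#67?
print(g.follow("bug#67", ["located_in", "defined_in"]))      # ['VHDLPosedge.cc']
# What did the fix for the similar defect change?
print(g.follow("bug#67", ["similar_to", "commit_summary"]))  # ['var1: unsigned int -> unsigned long int']
# Which change requests touched that file?
print(g.subjects("modified", "VHDLPosedge.cc"))              # ['CR123']
# Is the function reached from a thread (a possible data race)?
print(g.objects("VHDLPosedge#S2", "called_by"))              # ['<thread start function>']
```

Each query stands in for one manual step from the scenario: the Cscope search, the ClearQuest and CVS/email lookups, and Sandra's recollection of CR123. A framework of this kind would presumably populate such triples automatically from the knowledge sources rather than by hand; a plain adjacency list is used here only to keep the sketch self-contained.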
TABLE I
KNOWLEDGE TYPE – KNOWLEDGE SOURCES