(2023) PARALLEL C-ASSIST - Productivity Accelerator Suite Based on Dynamic Instrumentation


This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3293525

Nachiketa Chatterjee et al.: PARALLEL C-ASSIST: Productivity Accelerator Suite based on Dynamic Instrumentation

VOLUME 4, 2016 1

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI

PARALLEL C-ASSIST: Productivity Accelerator Suite based on Dynamic Instrumentation
NACHIKETA CHATTERJEE¹, SRIJONI MAJUMDAR² (Student Member, IEEE), PARTHA PRATIM DAS³ (Member, IEEE), AND AMLAN CHAKRABARTI⁴ (Senior Member, IEEE)
¹A.K.Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India (e-mail: [email protected])
²School of Computing, University of Leeds, United Kingdom, India (e-mail: [email protected])
³Department of Computer Science, Ashoka University and Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India (e-mail: [email protected], [email protected])
⁴A.K.Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India (e-mail: [email protected])
Corresponding author: Nachiketa Chatterjee (e-mail: [email protected]).

ABSTRACT Software developers often face challenges in terms of quality and productivity to
match competitive costs. The software industry seeks options to minimize this cost during different
phases of software development and maintenance with improved productivity. Software developers adopt
different tools for different purposes, such as understanding program behavior, debugging memory issues,
debugging concurrency issues, and testing. In this article, we study different debugging tools mostly used
for program design analysis, thread debugging, and resource management. Stand-alone tools track
static or dynamic control flow, thread activities, etc., but they do not specifically identify the thread
work-breakdown-structure, global memory location management, thread-data interaction, etc. to allow
good comprehension of the concurrency model of the program. Similarly, for resource management, we
observe that Valgrind addresses a few required features but does not offer automatic garbage
collection. Moreover, to obtain the outcomes of different tools, developers must compile and configure
the application in different environments. This is very time-consuming, requires skills in different software
paradigms, and is sometimes not supported by the tool itself. As a result, the tools cannot be used in an
inter-operable manner to analyze a program by relating their outcomes. In this study, we conduct a detailed
survey of the available tools and techniques and their limitations to identify gaps. We address these gaps
by implementing the tools for different phases of software development and maintenance. For example,
a concurrency model detector based on thread behavior, resource debugger with features of automatic
garbage collection, etc. can collectively inter-operate within our designed open-source tool framework
PARALLEL C-ASSIST to address the common requests of developers in one toolset. The tool is built
upon the open-source dynamic instrumentation tool PIN and supports a wide variety of IDEs and OSs. It detects
various multi-threaded memory issues and provides additional features to inject concerns dynamically at
run-time, so that it can be extended further according to the user’s needs. We verify our tool with a wide variety of
industry-standard benchmarks and compare its features with other similar tools.

INDEX TERMS Multi-threaded issues, Memory issues, Dynamic instrumentation

I. INTRODUCTION
While the costs for the Software Development Life Cycle (SDLC) have reduced considerably over the past three decades, the maintenance cost has gone up significantly and now amounts to more than 90% of the total SDLC cost [1]–[3]. The National Institute of Standards and Technology, USA, estimates that 54.33% and 21.42% of the SDLC costs are related to the effort spent by developers to fix bugs and enhance code, respectively, in the maintenance phase [4]. The costs are incurred owing to challenges in Program Comprehension (PC) of existing code bases, lack of adequate documentation, improper knowledge transfer from core teams, and incomplete and inadequate test strategies [1]. Naturally, this lowers the quality of software and the productivity of developers.

The manifestation, nature, and complexity of the challenges of comprehension vary widely with the type of programming language. For example, the dangling pointer issue in C is automatically managed by the Java run-time. Hence, the nature of the support required by developers also varies accordingly. C provides low-level access to memory and hardware, has cross-platform features, and is primarily used to build firmware, operating systems, and so on [5]. For effectiveness, code written in C must certainly be multi-threaded in nature. A significant amount of multi-threaded C code has been developed in the last decade owing to the rise of cost-effective and energy-efficient multi-core technologies [6]. Due to this proliferation, many developers have had to deal with new correctness issues (such as non-determinism and concurrency bugs) and performance improvement techniques without effective and simple multi-core programming tools, which in turn has significantly added to the maintenance woes and overhead.

The major support areas for multi-threaded programming include analysis of concurrency-induced bugs, concurrency-related design aspects, and performance improvement, in addition to the support required for single-threaded applications such as debuggers and resource managers [7]. A survey conducted by Microsoft concluded that 66% of the developers find it difficult to deal with concurrency-induced bugs and issues and often need help to comprehend the concurrency models of an application to debug concurrency bugs. For example, to find the root cause of a deadlock and to fix it, concurrency models related to the thread-data (or resource) interaction, lock hierarchy, thread work-breakdown structure, and starving threads need to be known.

Research has been conducted to help developers deal with the complexities of multi-threaded applications. Debuggers like Intel Debugger (IDB) [8] and Intel Inspector [9] provide advanced debugging features for threads (mostly data races) and memory errors. They provide APIs for integration with popular IDEs such as Eclipse, but mostly through commercial product suites such as Intel Composer XE or Intel System Studio [10]. A number of approaches have been proposed for deadlock and data-race detection through the analysis of run-time events in [11]–[13]. Apart from debugging, the analysis of thread activities and synchronized executions has been attempted in [14] and [15]. Researchers have employed static and dynamic weaving of code using aspects to understand the behaviour of the code for designing relevant test cases [16] or for extracting design elements [17]. Intel provides development suites like Intel Parallel Studio [18] and Intel System Studio [19] for debugging, testing, tracing, and monitoring applications. The design principles of existing tools and strategies used for multi-threaded debugging, resource management, design analysis, and dynamic aspect weaving are detailed in Section II, including the gaps identified in each area.

We observe that most existing approaches are standalone tools and address specific issues related to multi-threading. There is no integrated, inter-operable framework that helps analyze a multi-threaded application in its totality. Owing to the non-deterministic nature of multi-threaded applications, it is important to understand the design of the application in order to fix or enhance it. In addition, the framework needs to be integrated with IDEs to reduce the learning overhead of developers and facilitate easy diffusion [20]. An integrated framework would reduce the cross-transportation of data, overheads in learning new development environments, multiple installations, and the like. Intel and Microsoft have tried to address this to a certain extent and have created development suites for bug-fixing, quality assurance, and testing strategies of multi-threaded applications with support for integration into standard IDEs and debuggers [21]. However, most of these frameworks do not focus on deducing the concurrency-related design aspects, are commercial, and cater to the needs of a certain language (mostly OpenMP), development environment, or compiler. They additionally lack support for customized tools and features, which might be required apart from the features supported in the suite.

In this article, we propose the PARALLEL C-ASSIST tool set to analyze concurrency-related aspects of design based on thread-resource interaction [22]. We aim to detect deadlocks, data races, and possible livelocks using the GNU Debugger (GDB) augmented with new commands [23], to support an interface for dynamic weaving to inject thread functions at run-time [24], and to automate garbage collection for C applications [25].

The tools related to concurrency models and dynamic weaving are one of a kind, and there are not many equivalent tools that analyze these aspects of multi-threaded applications. Hence, integrating these tool sets with the concurrent bug detection and resource management tools makes the framework more effective. We have tested PARALLEL C-ASSIST using the pthread CDAC [26] benchmark. We ran all the tools individually, repeated the process through the integrated architecture, and obtained correct results for all the programs in the test suite. PARALLEL C-ASSIST can be extended to other OSs, debuggers, or compilers based on the availability of suitable interconnection APIs. Hence, the major contributions of this study are as follows.

• Study of the various open-source instrumentation tools, IDEs, and debuggers and their possible interactions
• An inter-operable framework of tools, integrated with common IDEs, to assist in developing and maintaining multi-threaded applications
• Unique combination of tools that deduce concurrency-related design aspects along with concurrency bugs

The remainder of this paper is organized as follows. Section II presents a literature survey. We discuss the architecture of PARALLEL C-ASSIST in Section III, and individual
tools and some case studies are discussed in Section IV. Finally, we conclude with directions for future work in Section V.

II. RELATED WORK
The tools in our PARALLEL C-ASSIST framework are designed for single- as well as multi-threaded native C applications, with a focus on four major functionalities: debugging concurrency bugs, discovering design models, automating resource management, and providing a handle to weave code using aspects. Hence, we review the integrated product suites, IDE-supported features, and standalone tools and utilities that aim to provide similar functional support.

A. DEBUGGING
Standard debuggers provide support for breakpoints in memory-related errors, such as overflow and uninitialized access, in addition to data control and monitoring of related breakpoints. Further, some of the debuggers support breakpoints to trace thread-data interactions and concurrency bugs (deadlock and data race). We enumerate the standard debuggers provided as part of IDEs or as product suites in Table 1.

TABLE 1. Thread and Resource Debugging Support in various tools in different IDEs or Product Suites shown in comparison with the features of Parallel C-Assist. Further comparative information is given in Table 5.

Thread Debuggers
| Thread Debugger | GNU (gdb) [28] | Intel (IDB) [8] | Microsoft Visual Studio [29] | Helgrind [27] | Intel Inspector [9] | Parallel C-Assist |
| Memory Related Breakpoints | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Thread specific breakpoints | ✓ | ✓ | ✓ | × | ✓ | ✓ |
| Thread synch breakpoints | ✓ | ✓ | ✓ | × | ✓ | ✓ |
| Thread data sharing events | × | ✓ | ✓ | ✓ | ✓ | ✓ |
| Data race detection | × | × | × | ✓ | ✓ | ✓ |
| Deadlock detection | × | × | × | ✓ | ✓ | ✓ |
| Livelock detection | × | × | × | × | × | ✓ |

Resource Debuggers
| Resource Debugger | Intel Inspector [9] | C++ Validator [30] | Visual Studio Profiler [31] | Parasoft Insure++ [32] | Valgrind Memcheck [33] | Parallel C-Assist |
| Uninitialized memory | ✓ | × | ✓ | ✓ | ✓ | ✓ |
| Lost pointers | | | × | | | |
| Leaked Global Memory | | | | | | |
| Unreachable allocations | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Automatic Garbage Collection | × | × | × | × | × | ✓ |

Helgrind and Intel Inspector are not open source and come as a part of product suites.
Open-source versions have limited features and are not available for all compilers.
PARALLEL C-ASSIST can be easily integrated with standard IDEs.
✓ - Feature Present, × - Feature Absent

Stand-alone tools have been designed to specifically support advanced debugging features, such as concurrency bugs. The authors of [13], [34], and [35] suggested approaches to detect data races by constructing happens-before graphs on runtime event traces. To detect data races, Christiaens et al. [36] employed different logical clocks over the collected run-time traces of send-receive events. In [37], Moiseev et al. detected data races in SystemC designs by static analysis of every program construct and event notifications.

Gaps: None of the stand-alone tools leverage and integrate the support from standard debuggers. We address this issue by extending the open-source gdb [28] debugger in [23] to detect data races and deadlocks for multi-threaded C applications using PIN [38]. Hence, we re-use the standard debugging features of gdb [28] and add support for concurrency-related debugging features.

B. RESOURCE MANAGEMENT
Detecting errors such as uninitialized memory, dangling pointers, unreachable locations, and leaks in the stack and heap memory is a common form of support required in all phases of the SDLC. The resource management tools available as part of the IDE or as product suites are listed in Table 1 (Resource Debuggers).

Most of the tools discussed in Table 1 are commercial and require recompilation with specific libraries. Valgrind [39], however, is free and has multiple features to detect several memory management and threading bugs and is also used for program profiling.

Among stand-alone tools, Windbg [40] provides complete memory statistics (address, length, and freed size) for the heap-allocated locations. ccmalloc [41] is a memory profiler that detects memory leaks and repeated deallocation of the same memory location. LeakTracer [42] extended gdb to print the allocated memory locations that have not been freed. Memdebug [43] tracks and logs (if desired) memory allocations and deallocations to infer memory leaks.

Gaps: Among the open-source and commercial resource management tools, Valgrind [39] provides most of the required features, including detection of several memory management and threading bugs. It is also used for program profiling. However, it does not provide a comprehensive interface consisting of functionalities such as automatic garbage collection. We address the same in [25] based on PIN [38], wherein we extend the features provided by Valgrind.

C. DESIGN
Comprehending the design is essential for any code fixing / enhancement task. For example, while fixing a performance issue, developers must understand the control flow on the relevant code lines along with the design of the code.

Research has mostly focused on standalone tools, where approaches have been suggested for constructing control-flow graphs and detecting design patterns. The authors in [44]–[46] constructed a set of cogent relationships between the components of a program and the elements extracted from the application domain ontology based on a static analysis of the source code. The design patterns (Gang of Four) were extracted from the source code for object-oriented languages, using source code parsing in [47], [48].

In the case of a multi-threaded program, the design is defined additionally in terms of the aspects of concurrency ingrained in the application. These aspects of design are difficult to infer because of their non-deterministic nature and cannot be directly understood from the extracted control flow. In [15], [49]–[51], the sequence of the program execution is transformed, and the relevant sequences are extracted, such as event control flows, thread, routine, and class mapping using static analysis and dynamic profiling. In [52], the authors estimated inactive threads to comprehend the effectiveness of parallelism in programs using dynamic profiling. The runtime patterns for thread behavior in the case of shared data locations were deduced by inspecting synchronized executions in [14] and [15].

Gaps: The tools proposed so far track static and dynamic control flow graphs, thread activities, and execution sequences but do not help comprehend the concurrency-related design issues in totality. For example, understanding thread work-breakdown-structure, global memory location management, thread-data interaction, and thread scheduling is as important as understanding the design of a multi-threaded application. We address the same in [22], where we build a concurrency model detector based on thread behavior.

D. CODE WEAVING AND INSPECTION
Approaches have been explored for building an Aspect-Oriented Programming (AOP) framework using Java to help weave code, enhance or observe program behavior, and write relevant test cases [16]. AspectC++, an extension of AOP for C++, was created [53] based on AspectJ [54] of Java, to enable the static weaving of code. AspectC++ has been used in multiple scenarios; however, static weaving requires recompilation after every code change. These methods do not implement weaving-on-the-fly (without recompilation), that is, aspect weaving at runtime. We propose dynamic aspect weaving for C programs in [24], wherein we use a dynamic instrumentation framework to attach, detach, and modify concerns during the execution of the program without modifying the program. Using the observations from code weaving, we can design effective test cases. We also observed that research related to code weaving has mostly focused on standalone tools.

E. INTEGRATED DEVELOPMENT ENVIRONMENT, PRODUCT SUITES
The Eclipse CDT [55] is a commonly used IDE that supports multiple features, such as call graphs, code highlighter, code generation, and debugging facilities, to help developers write correct and efficient code. IBM extended Eclipse to Hyades [56] with an integrated test and verification module. Microsoft has developed PREfix [21], an integrated framework that helps to detect logic and coding errors based on pattern matching with a pool of common errors. PREfast [21], another integrated tool from Microsoft, detects discrepancies in the coding conventions used. In [57], Microsoft presented SLAM for model checking, along with detecting common errors such as buffer overruns. Sun extended NetBeans to develop Jackpot source code metrics to examine code and detect structural issues [21]. Intel provides development suites like Intel Parallel Studio XE [58] and Intel System Studio [59], with support for tracing programs, analyzing execution sequences, detecting memory and thread errors, etc. The Intel frameworks also provide extension APIs for integration into standard IDEs, such as Eclipse CDT and Visual Studio, and debuggers, such as gdb.

The integrated frameworks, however, are mostly commercial and targeted to managed languages, such as Java and Python. Additionally, they focus on bug checking and program tracing and not on other crucial support, such as extraction of the design. Similarly, standalone tools focus only on specific aspects, incur a huge installation and learning overhead, and do not allow the facilities from other tools to be used in an inter-operable manner. For example, while debugging an application for a memory error, the developer may need to know the thread work-breakdown-structure or the likely threads that start in execution for a particular input. In another scenario, the developer may want to debug the program after dynamically weaving some code at runtime.

We address these challenges in PARALLEL C-ASSIST, where we integrate the support for debugging, design extraction, and code inspection into a singular framework and extend the same with a standard IDE such as Eclipse CDT. In the next section, we discuss the various open-source frameworks explored to conclude on a set for developing PARALLEL C-ASSIST.

F. SURVEY OF OPEN SOURCE FRAMEWORKS
As several design aspects of multi-threaded applications manifest only at runtime, we focus on open-source dynamic instrumentation tools to trace and extract runtime events. Further, there should be support for plugins for the integration of the instrumentation framework with a common development environment and debuggers. These plugins / interconnections must be re-targeted in order to develop an integrated framework. We review the tools and their support for extensibility into other frameworks in Table 2.

We see that PIN [38], [60] and Valgrind [39], [63] are both well-accepted and widely used dynamic instrumentation frameworks. However, PIN is lightweight, is 3.3x faster [60] than Valgrind, has support for integration with multiple debuggers and IDEs, and is available for several operating systems. As our focus is on creating a framework that can benefit parallel developers working with multiple OSs, compilers, IDEs, and debuggers, PIN is well-suited for our requirements. We use PIN to build our analysis tools and re-target its remote extensions to Eclipse CDT, gdb, and LLDB debuggers to develop an integrated architecture. We explain the process of re-targeting and analysis tools in Sections III and IV, respectively.
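The premise above, that concurrency-related behaviour becomes visible only in run-time events, can be illustrated with a small, self-contained C sketch. It is not the PARALLEL C-ASSIST implementation and does not use the PIN API: the `event` record, the fixed-size bounds, and all function names are hypothetical stand-ins for whatever an instrumentation tool would log. Under those assumptions, the sketch replays a trace of lock acquire/release events, builds a lock-order graph, and reports a potential deadlock when the graph contains a cycle, even if the recorded run itself completed normally.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCKS   8   /* illustrative upper bounds, not from the paper */
#define MAX_THREADS 4

/* Hypothetical trace record: one lock event observed at run time. */
typedef enum { EV_ACQUIRE, EV_RELEASE } ev_kind;
typedef struct { int thread; ev_kind kind; int lock; } event;

/* edge[a][b] != 0: some thread acquired lock b while already holding lock a. */
static int edge[MAX_LOCKS][MAX_LOCKS];

/* Replay the trace, tracking which locks each thread currently holds. */
static void build_lock_order_graph(const event *trace, int n)
{
    int held[MAX_THREADS][MAX_LOCKS];
    memset(held, 0, sizeof held);
    memset(edge, 0, sizeof edge);
    for (int i = 0; i < n; i++) {
        const event *e = &trace[i];
        assert(e->thread >= 0 && e->thread < MAX_THREADS);
        assert(e->lock >= 0 && e->lock < MAX_LOCKS);
        if (e->kind == EV_ACQUIRE) {
            for (int l = 0; l < MAX_LOCKS; l++)
                if (held[e->thread][l])
                    edge[l][e->lock] = 1;   /* order l -> e->lock was observed */
            held[e->thread][e->lock] = 1;
        } else {
            held[e->thread][e->lock] = 0;
        }
    }
}

/* Depth-first search; colour 0 = unvisited, 1 = on current path, 2 = done. */
static int dfs_finds_cycle(int node, int *colour)
{
    colour[node] = 1;
    for (int next = 0; next < MAX_LOCKS; next++) {
        if (!edge[node][next]) continue;
        if (colour[next] == 1) return 1;    /* back edge closes a cycle */
        if (colour[next] == 0 && dfs_finds_cycle(next, colour)) return 1;
    }
    colour[node] = 2;
    return 0;
}

/* Returns 1 if the observed lock orders conflict, 0 otherwise. */
static int has_potential_deadlock(const event *trace, int n)
{
    int colour[MAX_LOCKS] = {0};
    build_lock_order_graph(trace, n);
    for (int l = 0; l < MAX_LOCKS; l++)
        if (colour[l] == 0 && dfs_finds_cycle(l, colour))
            return 1;
    return 0;
}

/* Thread 0 takes lock 0 then 1; thread 1 takes lock 1 then 0. The recorded
   run finishes normally, yet the two acquisition orders conflict. */
static int demo_conflicting_order(void)
{
    const event trace[] = {
        {0, EV_ACQUIRE, 0}, {0, EV_ACQUIRE, 1}, {0, EV_RELEASE, 1}, {0, EV_RELEASE, 0},
        {1, EV_ACQUIRE, 1}, {1, EV_ACQUIRE, 0}, {1, EV_RELEASE, 0}, {1, EV_RELEASE, 1},
    };
    return has_potential_deadlock(trace, (int)(sizeof trace / sizeof trace[0]));
}
```

In `demo_conflicting_order`, the graph contains the edges 0 → 1 and 1 → 0, so the function returns 1 although neither thread ever blocked in the trace. A fixed-size adjacency matrix keeps the sketch short; a real tool would size these structures from the thread and lock counts it observes.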

TABLE 2. Availability of plugins/interconnections for dynamic instrumentation tools for C, C++. We use PIN as shown in Figure 1.

| Instrumentation Frameworks | OS | IDE | Debugger | Compiler | Thread Libraries |
| Pintool (Intel) [38], [60] | Windows, Linux, Android, MacOS | Eclipse CDT, Code Blocks, Visual Studio | gdb, LLDB | icc, gcc | pthread, boost, intel tbb |
| QBDI [61] | Windows, Linux, Android, MacOS | Ninja, Visual Studio | LLDB | gcc | pthread |
| DynamoRio [62] | Windows, Linux | Eclipse CDT | gdb | gcc | pthread |
| Valgrind [39], [63] | Linux | Eclipse CDT | gdb | gcc | pthread, boost, intel tbb |

FIGURE 1. Integrated Architecture of PARALLEL C-ASSIST framework showing interactions between the components of IDE [Eclipse], Plug-ins, PIN Dynamic Instrumentation tools, and its run-time instance. Component-wise details are given in Figures 2, 3, and 4.

III. ARCHITECTURE
Our aim is to develop an integrated architecture to support all features from a single screen without the need for cross-transportation of information and manual linking. We first discuss a common scenario in the workplace as learnt from multiple developers of companies working in Electronic Design Automation extensively using C.

Case Study of industry practices: We enumerate and analyze the problems faced by developer Sandra (name changed) while fixing a deadlock bug or analyzing function exit and entry points in a multi-threaded application, as shown in Table 3. Sandra starts the analysis to detect potential deadlocks in the application and tries to simulate various other run-time facts during program execution. She struggles to find a suitable toolset for the OS or IDE she is working with. The case studies help in designing our architecture according to the developers’ needs.

Architecture: We design an architecture to fit all the individual tools together and integrate them to extend the assistance from a single window so that developers may benefit from using the tools simultaneously, such as detecting memory issues in addition to parallel debugging.

We present an overview of the architecture of the plugin toolkit in Figure 1. Our toolkit is based on concept extractors that are built using a PIN framework coupled with an inference engine. We integrate the toolkit with the Eclipse IDE using its plugin interface. Figure 2 shows the architecture of the Eclipse IDE [65]. The IDE has a workbench containing editors and consoles and sits on a Java run-time for its utilities. Project management is also conducted using WorkSpace [65]. The plugin interface is available to add separate menus to the editors to support the additional features we are integrating.

To model the components of the architecture and develop an analytical framework, we define the following domains:
• IG: Domain of APIs for Image Instrumentation
• RT: Domain of APIs for run-time Instrumentation
• IN: Domain of APIs for Instruction Instrumentation
• D: Domain of various data structures in C Programs
• OP: Domain of various data structures related to the operations in C
• AS: Domain of analyzed aspects

An instrumentation process registers callbacks to either the image, routine, or instruction APIs and passes the pointer to the current object (image, routine, etc.). It is defined as $I = (AP, O)$, where $AP$ is the API for image, routine, and instruction and $O$ is the pointer to the current object. The domain for $AP$ is $D_{ap} = IG \cup RT \cup IN$.

The inference process can thus be modeled as $An : D_{ap} \times OP \times D \times PIN \rightarrow AS$, where $PIN$ symbolizes the just-in-time compiler.

The equation focuses on the domain dependencies of the analysis process and generalizes the domains that must be considered for any inferences using the proposed framework.

In the next few sections, we explain the general working of the following sub-components:

A. THE PLUGIN INTERFACE
Our tool uses the plugin interface (Figure 3) to act upon the executable from the editor of Java Workbench [65] and obtain the results. Hence, we partition the architecture into the following components to handle the various sub-tasks.
1) Creation of graphical menu: This is created as a new extension in the Plugin.xml file of the JDK package. Each new menu is an action set for extension. The menu is characterized using the functions of the label and icon class of the Java plugin.
2) Link of menu to Pintools: Each menu has a script at its back-end, which fires a Pintool that works on the current executable from the editor. Each menu first calls the required function from the action set class, which then fires the script.
3) The plugin of Pintools with executable: The Pintool Action class [65] is called when the script is fired for each menu. This class takes the source code as input and the required arguments from the editors, and compiles the program into an executable. It then traverses the pin executable engine through a script

TABLE 3. Case Studies for Detecting Potential Deadlocks and Analysing Function Entry and Exit Points. These studies were conducted with C developers from
companies working in Electronic Design Automation.

| Task at Hand | Possible Solutions Explored | Decision |
|---|---|---|
| Case Study 1: Developer Sandra wants to check for potential deadlocks | | |
| Initial Exploration | | |
| Sandra tries this in a multi-threaded (pthread) C application and works with the gcc compiler [64], Eclipse CDT 8.4 [65], and Ubuntu 16.04 OS | Looks for a standard toolset provided by Intel or Eclipse | Finds a 30-day free trial pack of Intel Inspector [9], with extensions to Eclipse CDT |
| Installing and Using a new tool | | |
| Eclipse CDT 8.4 is compatible with the automatic integration of Inspector XE into the Eclipse IDE [65] during installation. The downloaded version of Intel Inspector does not support the gcc compiler | Checks the supported compilers; icc [64] is found suitable | Sandra compiles the application with icc [64] and repeats the process |
| Intel Inspector finds two deadlocks but cannot find potential deadlocks that might not have occurred in that run. Sandra wants to first fix the detected deadlocks | | |
| Sandra looks for tracer tools to dump the run-time events in every run, wants to understand lock sequences, and wants to know how long a thread is waiting on a synchronization resource | Checks if Intel Inspector XE supports tracing, and also checks other tools | Intel Inspector XE does not support tracing but is a component tool of the Intel Parallel Studio, which has a tracer tool, the Vtune profiler [66] |
| Sandra explores the Vtune profiler | Vtune works only for OpenMP-MPI and Intel-TBB frameworks | Discards the Vtune profiler [66] |
| Sandra continues to look for tracers | Looks for other concurrency visualizers like the Microsoft concurrency visualiser [14] for thread data executions. The visualizer mostly addresses thread contention, cross-core thread migration, and synchronization delays and comes integrated with Visual Studio | Shifts to a new IDE |
| Shifting to a new IDE | | |
| Sandra uses the visualiser tool and finds some features useful. However, she now needs to find deadlock detection tools compatible with Visual Studio [31] | Explores various open source tools and finds a relevant tool (one that also finds potential deadlocks), software verify, which can be integrated with Visual Studio | However, software verify [67] is priced at $499. Another trial version is available, with lots of forms to fill up and a detailed reason required for its use. Discards software verify [67] |
| Sandra tries to look for the free version of Intel Inspector XE, which can be integrated with Visual Studio [31] | Finds a patch version after a lot of manual searches | Installs and integrates it with difficulties |
| Even with the new set-up, a lot of the information required to analyse and fix deadlocks is missing: graphical traces, concurrency models, thread work breakdown structure, and thread starvation. The features in Concurrency Visualiser [31] are mostly related to optimising code by checking the thread-to-core mapping and related statistics. This will not completely help to fix the deadlocks in a short span of time | | |
| Sandra tries to look for more standalone tools | She checks research papers and the links to source code, but in most cases, the links are broken, or the repository is not up to date | Manually analyses the application along with the output from Concurrency Visualiser and Intel Inspector XE, and tries to fix the deadlocks |
| Case Study 2: Sandra wants to analyse the function entry and exit points | | |
| Sandra tries to look for more tools that may work collectively on the same execution | No automated support | Manually analyses the output traces of the application as extracted from Helgrind [27] and Intel Inspector XE, to mark and log the entry/exit points of the functions |
| During execution, Sandra realized that she needs to simulate an exception that was intermittently happening in a function, leading to a critical issue | | |
| Sandra searches for tools that may inject exceptions at run time along with the existing execution | Could not find a run-time injection tool with trace support | Manually added additional code to throw an exception in the source, then recompiled and re-executed |

and plugs the relevant Pintool (analysis tool) with the executable.

B. PIN FRAMEWORK
We describe the PIN framework as re-targeted and customized for PARALLELC-ASSIST in Figure 4. We use PIN to extract the primitives of the program and further analyze the same for higher-level features. Using the APIs of this framework, we created Pintools to extract and analyze the run-time traces of an application. The PIN [38] APIs sit on a PIN run-time with the support of a just-in-time compiler (JIT), an emulator, and a dispatcher.

1) Pintools
Every Pintool has two major components:
1) Instrumentation routines: These specify the granularity at which run-time traces are collected. The available granularities include images, traces, routines, and instructions.
2) Analysis routines: For every instrumentation granularity, there can be multiple analysis tools that store the collected traces in a data structure and analyze them based on algorithms designed by us.
The PIN engine, along with the Pintools, interacts with the plugin interface and serves as input to the inference engine. The high-level features analyzed from the primitives in the analysis routines serve as the input to the inference engine.

C. INFERENCE ENGINE
The inference engine (Figure 1) contains a suite of machine learning classifiers that work on the features generated from Pintools and learn and predict a model. The engine also contains other algorithms based on a set of rules to detect execution behavior, which works directly on the primitives

FIGURE 2. Architecture and interface diagram of the Eclipse IDE platform, including the Workbench and UI toolkits (editors and consoles), which sits on a Java run-time for its utilities. The CDE plugin interface is available to add separate menus to the editors; these support the additional features by interacting with the PIN run-time interface and visualize acquired knowledge about a program from GDB callbacks through the CDI Debugger. This is a component from Figure 1.

FIGURE 3. Flow diagram of the Plugin Interface to inject tools on an executable from the Java Workbench editor and UI toolkit of the Eclipse IDE. Each menu in the UI toolkit has an associated script to fire a tool in PIN that works on the current executable from the editor. This is a component from Figure 1.
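
A minimal sketch of the menu-to-script flow that Figure 3 describes, assuming each selected menu entry maps to one Pintool shared object and that a tool is launched with Pin's usual `pin -t <tool> -- <application>` command form. The tool file names, the `TOOL_LIBRARIES` table, and the choice of one command per selected tool are hypothetical and serve only to illustrate the flow; they are not the plug-in's actual scripts.

```python
import shlex

# Hypothetical mapping from menu entries to Pintool shared objects.
TOOL_LIBRARIES = {
    "PGDB": "pgdb_tool.so",
    "GC": "gc_tool.so",
    "D-CUBE": "dcube_tool.so",
    "AOP": "aop_tool.so",
}


def build_pin_commands(selected_tools, executable, app_args=(), pin_binary="pin"):
    """Return one Pin command line (as an argument list) per selected tool."""
    commands = []
    for name in selected_tools:
        if name not in TOOL_LIBRARIES:
            raise ValueError(f"unknown tool: {name}")
        # Everything after "--" is the application under analysis and its arguments.
        commands.append(
            [pin_binary, "-t", TOOL_LIBRARIES[name], "--", executable, *app_args]
        )
    return commands


def render(commands):
    """Render the commands as shell-safe strings, e.g. for display in a console."""
    return [shlex.join(cmd) for cmd in commands]
```

In this sketch the plug-in would call `build_pin_commands` with the tools ticked in the menu and the executable currently open in the editor, then hand each argument list to a process launcher.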

extracted from Pintools.

D. USER INTERFACE
Keeping tool usability in consideration, we leverage the Eclipse IDE and develop plug-ins for our tool assembly. In the plug-ins, the individual tools are designed as user menus, as shown in Figure 5. Users can select any combination of tools according to their needs by clicking on either icon or menu options. If the user clicks on GC, PGDB, and then on the tool assembly, the GC and PGDB execute in parallel to detect the data race or deadlock along with the detection of memory issues with optional garbage collection. Based on the selection of plug-ins, the code is selectively equipped with the required parts of unit tools. The instrumentation gathers run-time information of the program execution. The run-time console logs generated

FIGURE 4. PIN infrastructure diagram of instrumentation components with the instrumentation API, features, and run-time interfaces. PIN instruments the target application based on the instrumentation policy to extract the run-time features, which are then analyzed using the analysis routine. At the highest level, Pin consists of a virtual machine (VM), a code cache, and an instrumentation API invoked by Pintools. The VM consists of a just-in-time compiler (JIT), an emulator, and a dispatcher. After Pin gains control of the application, the VM coordinates its components to execute the application [38]. This is a component from Figure 1.
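
To make the split between instrumentation and analysis concrete, the following is a language-neutral simulation in Python; it is not Pin API code. An instrumentation routine visits each static instruction once and decides which analysis callback to attach, and the analysis routines then run every time an instrumented instruction executes and record the trace. The toy instruction format and all names (`Instr`, `Tracer`, `instrument`, `run`) are hypothetical.

```python
from collections import namedtuple

# A toy static instruction: opcode is "load", "store", or anything else.
Instr = namedtuple("Instr", ["pc", "opcode", "addr"])


class Tracer:
    """Analysis routines: called at run time, they append to the collected trace."""

    def __init__(self):
        self.trace = []

    def record_mem_read(self, thread_id, instr):
        self.trace.append((thread_id, "READ", instr.addr))

    def record_mem_write(self, thread_id, instr):
        self.trace.append((thread_id, "WRITE", instr.addr))


def instrument(program, tracer):
    """Instrumentation routine: visit each instruction once and attach callbacks."""
    callbacks = {}
    for instr in program:
        if instr.opcode == "load":
            callbacks[instr.pc] = tracer.record_mem_read
        elif instr.opcode == "store":
            callbacks[instr.pc] = tracer.record_mem_write
    return callbacks


def run(program, callbacks, schedule):
    """Replay a (thread_id, pc) schedule, firing the analysis call before each step."""
    by_pc = {instr.pc: instr for instr in program}
    for thread_id, pc in schedule:
        analysis = callbacks.get(pc)
        if analysis is not None:
            analysis(thread_id, by_pc[pc])
```

Only loads and stores are instrumented here, which mirrors the idea of an instrumentation policy: instructions that are not selected incur no analysis work at run time.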

from the code and tool are combined into an Eclipse console. An inference drawn by the tool assembly is also prioritized, and suitable assistance, either printed on the console or raised as a breakpoint, may be invoked.

An integrated view of the architecture is shown in Figure 1. The PIN engine contains various callbacks to functionalities, such as extracting image, routine, thread, and instruction level aspects, which are then deployed to the plugin architecture and work on the current executable of the IDE. The output from the Pintool is then integrated with a visualization software to display the run-time event traces. Furthermore, an enhanced debugger with support for additional concurrency bugs is integrated with the IDE to provide end-to-end support.

FIGURE 5. SDLC Suite menu and icons injected in the Eclipse IDE as a User Interface of the Tool Assembly to dynamically add or remove different debug tools or aspects as and when required.

IV. THE TOOL SET
We discuss the various tool supports as provided in PARALLELC-ASSIST: Debug assistance tool (Section IV-A), Design tool (Section IV-B), Memory tool (Section IV-C), and Aspect injection strategy (Section IV-D).

A. DEBUG ASSIST [12]
In the PGDB [12] tool, we designed features to detect and solve issues such as deadlock and data race with breakpoints to the source. We augment PGDB [12] with LLDB [68] and the GNU debugger so that developers can leverage the facilities of PGDB [12] within their existing debuggers.

1) Implementation
Our approach is to identify whether there are memory references shared among multiple threads. We instrument RecordLockBefore() to monitor the locality of accesses with or without locks by a thread and maintain a hash table MemTracker, where the key is a memory reference and the value contains the identification of the executing thread and the type of access (READ/WRITE). We have designed the instrumentation routines RecordMemRead() and RecordMemWrite() before load and store instructions, respectively, including the thread id to trace memory accesses from concurrent execution. First-time READ accesses to any memory reference are captured in MemTracker by the analysis routine RecordMemRead(). For subsequent accesses to a captured memory reference that already exists in MemTracker, the following situations may occur [12].
• Existing READ access: This is a safe access. Thus, there is no data race in the case of READ-after-READ.
• Existing WRITE access: For the same thread ID, it is a safe access case for READ-after-WRITE. For different thread IDs, the memory reference is marked as a shared-exclusive memory.
Similarly, write accesses to memory are analyzed by RecordMemWrite(), which executes before the store instruction. First-time memory WRITE access is also captured by the MemTracker, including the thread ID. Again, for subsequent accesses to a captured memory reference that already exists in MemTracker, the possibilities are as follows:
• Existing READ or WRITE access: Unsafe access in both cases, WRITE-after-WRITE or WRITE-after-READ. If the threads involved are different, the memory reference is marked as shared-exclusive memory.
A Boolean variable is introduced here for each thread to detect the data race. When a thread, say T1, enters the critical section, RecordLockAfter() is called and sets a flag for thread T1. While leaving the critical section for thread T1, we reset the flag using RecordUnlockAfter(). We also instrument the barrier along with thread ids. The inference block is used to analyze the memory read-write sequences of each thread. Therefore, access to a shared-exclusive memory reference is identified as unsafe where the flag is set to false, and safe otherwise. Once a memory reference is identified as shared-exclusive and an unsafe access exists, there is a potential data race, and the data race breakpoint is invoked.
Similarly, an algorithm continues to construct a RAG (Resource Allocation Graph) by identifying the waiting and acquired edges as follows:
• RecordLockBefore() adds an edge to the RAG denoting that thread T is waiting for resource (mutex) R, when another thread already holds the lock on mutex R.
• RecordLockAfter() adds an acquired edge to the RAG when T acquires a lock on the mutex R. The waiting edge is removed when the acquired edge is added.
• RecordUnLockAfter() removes the acquired edge from the RAG when thread T releases mutex R.
The inference block here continues to analyze the RAG, and once it detects a cycle, it announces a deadlock, and the breakpoint is invoked.

2) Verification
We prepared a test suite to determine the correctness of our tool for the detection of data races and deadlocks. The tool successfully detects all potential data races and deadlock conditions. Detailed test evidence is provided in the PGDB [12] and extended debugger [69] to verify the PGDB on the collected set of benchmarks and prove its behavior and efficiency.

B. DESIGN TOOL
Here, we outline the design to capture the execution sequences from various PIN events.

1) Implementation
Here the run-time information is stored and grouped logically into maps, and then into profilers that produce output for a code related to a specific classified set of problems.
• We capture every important routine executed as part of a code as an event s with various parameters.
• We need to provide a logical ordering of events that might be useful for debugging.
• Only those events that are relevant to debugging should be tracked, so we instrument only important routines from the total routine trace.
• To decide on the routines, we consider those related to thread creation, communication, data variables used in message passing, thread exit sequence, synchronization functions, and signaling functions.
• For every routine, we capture all the relevant information in the form of parameters such as the thread id and the data variable involved before and after the execution of the event, along with the logical order.
• We store the information extracted from routines related to various run-time occurrences, such as global variable access sequence, creation sequence of threads, communication statistics among threads, wait time of a thread to acquire a mutex, etc., by logically grouping them into distinct maps (data structures). The first level is a detailed map.
• We logically convert the detailed maps to summarised maps to extract distinct run-time information.
• We provide the output to the user in textual and graphical forms.

2) Verification
For normal (error-free) execution, models detected from features extracted by our analyzer for a classified set of problems from the test suites are validated by models detected by manual analysis of the test suites. We also compare the results with existing analyzers such as Valgrind, Purify, and other profilers, and achieve 91% accuracy [22].

C. MEMORY TOOL [70]
We designed a resource management tool for native languages that may be invoked as and when developers want to debug memory issues or manage resources. We call it GC Pintool, as it identifies the memory issues during the

execution. Moreover, it offers optional garbage collection features in the native application.

1) Implementation
The GC tool is one of our unit tools used for tool assembly. The strategy of this tool was designed using two different components: instrumentation strategy and inference. First, we construct the instrumentation algorithm as follows:
• main() and user functions are instrumented to track the entry-exit points.
• On a memory allocation call assigned to a local pointer, the reference is logged in the data structure with local scope. For a global pointer, the reference scope is changed to global. For another assignment, the scope is modified accordingly.
• On deallocation of memory, the entry of the reference is removed from the data structure. If no such entry is found in the data structure, then a double-free error is declared.
• The exit of each reference invokes an inference routine.
In the inference routine, the data structure is analyzed to identify the remaining entries associated with the recently exited scope. Those entries are marked as memory leaks. This memory may optionally be freed.

2) Verification
The efficacy of the GC Pintool was verified using different benchmark C programs, and the approach was proven to be correct and precise. From the literature survey, we found that Valgrind is a winner in comparison to other memory tools; therefore, using this tool, we offer memory error detection features, such as memory leak, memory corruption, double frees, and uninitialized pointers, along with breakpoints, similar to Valgrind. Valgrind experiences a 10–50 times slowdown (see "2.1. What Valgrind does with your program" in http://valgrind.org/docs/manual/manual-core.html), whereas GC Pintool always performed correctly, and we find that it runs about 35% faster compared to Valgrind. Our tool also has optional automatic GC features. In this tool, we use a map as a core data structure to hold the scope and memory addresses belonging to a particular scope. We then devised an algorithm to instrument the memory allocator and deallocator functions to capture the allocation / deallocation events. Memory read and write events are captured by the incoming memory read and write instructions. We also investigated the function call and return events of the program scope. Upon entering a new scope, the new key will be defined in the map as the current scope, making the earlier scope a parent, and for each successful allocator execution, the allocated memory address will belong to that current scope in the map. For each deallocation, the memory reference is removed from the current scope. The inference block analyzes the allocated memory of a particular scope at the end of each scope. If any allocated memory is found to be unreferenced due to the end of scope, the memory leak is reported, and optionally, a breakpoint may be invoked or the detected leak may be garbage collected, that is, freed up by this tool. Similarly, the inference block also analyzes the reference of memory read or write; if the reference is outside our recorded memory references allocated during the execution, it declares this as memory corruption. In this case, the user could optionally invoke a breakpoint.

D. AOP - ASPECT-ORIENTED DEBUGGING [71]
In this suite, we added the framework of dynamic aspect weaving to extend the tool according to user needs. Here, the framework is flexible for the vanilla deployment of dynamic aspects in a just-in-time manner.

1) Implementation
This framework works with the components below:
• The configuration XML is available for the user to define the function / event of the executable that is to be observed, the advice names (what to observe), and the location, after or before (where to observe).
• The analysis code library is compiled and maintained in the vanilla scope and is loaded on the fly at execution time at the placeholder into the desired function/event.
• Our PIN tool reads the configuration from XML and injects advice at desired locations in the code under execution by using the dynamic instrumentation technique.

2) Verification
We used a testbed to verify the injection of advice at various desired points. We successfully verified the location-before and location-after instrumentation for global functions, static functions, function pointers, and references in the C program. However, we identified the limitations of the tool in the case of macros. As we are performing run-time binary instrumentation, we do not have any control over the macro during the run-time. On the other hand, in C++, we successfully injected advice in Constructor, Overloaded Constructor, Copy Constructor, Destructor, Overloaded Operator, etc. We also validated the injection of aspects in the Friend Function, Member Function, Overloaded Member Function, Virtual Member Function, and Overridden Member Functions and identified some limitations for the inline function. We used our tool for system and user-defined functions in exception scenarios for different hierarchies. The same tool has been used to inject bits of advice for different thread events, such as create, join, lock, and unlock. In all of the above scenarios, we successfully injected the aspect before or after the events occurred.

E. TESTING THE INTEGRATED ARCHITECTURE
We tested the integrated architecture of PARALLELC-ASSIST on the pthread CDAC [26] benchmark for correctness and robustness. The benchmark set of CDAC

contains a varied set of programs that replicates various concurrency-related issues and bugs, numerical computations using threads in parallel, input / output, and other resource management using threads and the like, and helps us to test PARALLELC-ASSIST for utility. The results are presented in Table 4. The results were validated against the benchmark documentation available with the dataset. We achieved an overall precision and recall score of 98.13% and 98.56%, respectively (false positives and false negatives are marked as FP and FN in Table 4). As PARALLELC-ASSIST also detects a potential deadlock, it might happen that the deadlocks do not occur in a particular run, and hence the false positives. We get a false negative in the leak detector as our garbage collector failed to capture a free function which was called through a wrapper in certain scenarios. However, all of these cases are related to individual tools and are not caused by the integration implemented in PARALLELC-ASSIST.

TABLE 4. Concurrency characteristics of the test suite of PARALLELC-ASSIST from CDAC Pthread Benchmarks [26]. We achieved an overall precision and recall score of 98.13% and 98.56%, respectively, on these.
Notes: Eventual starvation and livelock are considered for a certain run and input. Data races and deadlocks are also input and run specific.

| File Name | Data Race (PGDB [23]) | Deadlock (PGDB [23]) | Thread Concurrency (D-CUBE [22]) | Global Variable (D-CUBE [22]) | Starvation | Leak (GC [25]) |
|---|---|---|---|---|---|---|
| Basic Pthread API calls | | | | | | |
| Pthread Join | No | No | Boss Worker | Only Locals | No | Thread local |
| Stack Management | addr 0x78C741; func dowork | No | Peer to Peer | Unsync, Low Locality | No | No |
| Mutex Operation | No | No | Boss Worker | Sync, High Locality | No | No |
| Condition Waiting | No | No | Pipeline | Sync, High Locality | No | No |
| Disjoint Array Access | No | No | Boss Worker | Sync, High Locality | 1 thread | Leak |
| Numerical Computations (Dense Matrix Computation) | | | | | | |
| Numerical Integration | No | Yes | Boss Worker | Sync, High Locality | No | No [FN] |
| Vector Multiplication block striped partitioning | No | No | Boss Worker | Sync, High Locality | No | Function & Thread Local |
| Infinity norm - Row-wise Partitioning | No | No | Peer to Peer | Sync, High Locality | Yes | Function & Thread Local |
| Infinity norm - Column-wise Partitioning | No | No | Peer to Peer | Sync, High Locality | Yes | Function & Thread Local |
| linear equations - Parallel Jacobi Method | addr 0x12A89C; 0x12A90B, func Jacobi | No | Boss Worker | Unsync, High Locality | No | Function & Thread Local |
| Non-Numerical Computations & I/O: Sorting, Searching, Producer-Consumer, using thread APIs | | | | | | |
| Minimum in unsorted array | No | No | Peer to Peer | Partial Sync, Low Locality | No | No |
| Producer-Consumer work queues | No | No | Boss Worker | Sync, Low Locality | No | Thread Local |
| k matches in the list | No | No | Peer to Peer | Partial Sync, Low Locality | No | No |
| data race condition | addr 6234231; func: thread_mutex | No | Peer to Peer | Partial Sync, High Locality | No | No |
| Loop-carried dependence to Loop-independent dependence | No | No | Boss Worker | Unsync, Low Locality | No | Function Local |
| Read-Write API library calls | | | | | | |
| Read-Write locks | No | No | Boss Worker | Sync, High Locality | No | No |
| array minimum - Read-Write locks | No | Yes [FP] | Peer to Peer | Sync, High Locality | No | No |
| array minimum - Read-Write & mutex locks | No | No | Boss Worker | Sync, High Locality | 1 thread | No |
| Illustration of Producer/Consumer problems using pthreads for large queues | | | | | | |
| producer/consumer; Indexed-access | No | No | Boss Worker | Sync, High Locality | 3 threads | Thread Local |
| producer/consumer; Condition waiting | No | Yes [FP] | Peer to Peer | Sync, Low Locality | No | Thread Local |
| producer/consumer; Mutex objects | No | Yes | Boss Worker | Sync, High Locality | No | Thread Local |
| producer/consumer; Condition-variable | No | No | Boss Worker | Sync, High Locality | No | Thread Local |

V. CONCLUSIONS
We build an integrated architecture, PARALLELC-ASSIST, to support developers in maintaining multi-threaded applications in C through the detection of concurrency-related bugs, analysis of concurrency-related design aspects, memory management, and logging facilities. The architecture was built using the dynamic instrumentation framework of PIN [38] and re-targets its interconnection APIs for integration with various IDEs and debuggers. Thus, PARALLELC-ASSIST provides easy-to-use interfaces, and we demonstrate a prototype integration with Eclipse CDT. Further, PARALLELC-ASSIST provides a framework to write various other analysis tools according to the developer's requirement to comprehend any aspect of a C code. We set up the architecture with an initial tool set for debugging (deadlock, data race, and livelock), extracting concurrency-related design elements based on thread-resource interaction, automated garbage collection, and dynamic code weaving. We tested the integrated architecture over the pthread CDAC [26] benchmark and achieved overall precision and recall scores of 98.13% and 98.56%, respectively. We studied readily available tools integrated with the popular plug-ins and compared them in terms of the offered features listed in Table 5. Visual Studio Profiler provides many of the features supported by PARALLELC-ASSIST; however, it is commercial, does not support code injection, and does not provide a framework to write and customize new tools.
We intend to extend our tool in the future as follows:
• Extend PARALLELC-ASSIST with other IDEs such as Visual Studio, Code Blocks, and other debuggers such as Microsoft Visual Studio debugger, etc.
• Analyse the utility of a compatible PARALLELC-ASSIST over languages like C++ or Rust.

REFERENCES
[1] S. M. H. Dehaghani and N. Hajrahimi, “Which factors affect software projects maintenance cost more?” Acta Informatica Medica, The Academy of Medical Sciences of Bosnia and Herzegovina, vol. 21, no. 1, pp. 63–72, 2013.
[2] L. Erlikh, “Leveraging legacy system dollars for e-business,” IT professional, IEEE, vol. 2, no. 3, pp. 17–23, 2000.
[3] J. Koskinen, “Software maintenance costs,” Information Technology Research Institute, University of Jyvaskyla, Tech. Rep., 2015.


TABLE 5. Comparison with existing frameworks as outlined in Table 1
Column headings: Code::Blocks [74], Intel Debugger [75], Visual Studio [72], Eclipse CDT [65], NetBeans [73], GDB [28]
Tools
Rational Rose [76] ✓ × × × ✓ ×
MS Concurrency Visualizer [14] ✓ × × × ✓ ×
Bottle Graph [77] × × × × × ×
Visual Leak Detector [78] ✓ × × × × ×
GNU Checker [79] × ✓ × × × ✓
AspectC++ [53] × × × × × ✓
ThreadSanitizer (TSAN) [80] × ✓ × × ✓ ✓
Helgrind [81] × × ✓ ✓ × ×
Valgrind [39] × × ✓ ✓ × ×
Open Source? N Y Y Y Y Y
Determine Thread Model × × × × × ×
Detect Deadlock × ✓ ✓ ✓ ✓ ×
Detect Datarace × × P P P ×
Detect Potential Livelock × × × × × ×
Memory Leak Detection ✓ ✓ ✓ ✓ ✓ ✓
Memory Overflow Detection P × P P P ×
Optional GC × × × × × ×
Dynamic Aspect Injection × × × × × P
✓ - Feature Present, × - Feature Absent, P - Partial

[4] H. Krasner, “The cost of poor quality software in the us: A 2018 report,” Consortium for IT Software Quality (CISQ), USA, Tech. Rep., 2018.
[5] L. Prechelt, “An empirical comparison of c, c++, java, perl, python, rexx and tcl,” IEEE Computer, IEEE, vol. 33, no. 10, pp. 23–29, 2000.
[6] D. Geer, “Chip makers turn to multicore processors,” Computer, vol. 38, no. 5, pp. 11–13, 2005.
[7] S.-E. Choi and E. C. Lewis, “A study of common pitfalls in simple multi-threaded programs,” in SIGCSE technical symposium on Computer science education. ACM, 2000, pp. 325–329.
[8] C.-P. Chen, “The parallel debugging architecture in the intel® debugger,” in International Conference on Parallel Computing Technologies. Springer, 2003, pp. 444–451.
[9] software.intel.com, “Intel inspector user guide for linux* os,” last Accessed: May 3, 2020. [Online]. Available: https://software.intel.com/en-us/inspector-user-guide-linux-data-race
[10] G. Zitzlsberger, “Using intel® c++ compiler with the eclipse* ide on linux,” last Accessed: May 3, 2020. [Online]. Available: https://software.intel.com/
[11] W. Zhang, J. Lim, R. Olichandran, J. Scherpelz, G. Jin, S. Lu, and T. Reps, “Conseq: detecting concurrency bugs through sequential errors,” in Computer Architecture News (SIGARCH), vol. 39. ACM, 2011, pp. 251–264.
[12] N. Chatterjee, S. Majumdar, S. Sahoo, and P. P. Das, “Debugging multi-threaded applications using pin-augmented gdb(pgdb),” International Conf. Software Eng. Research and Practice, SERP’15, p. 109, 2015.
[13] Y. Cai and L. Cao, “Effective and precise dynamic detection of hidden races for java programs,” in International Symposium on Foundations of
[19] ——. Intel system studio. Last Accessed: May 3, 2020. [Online]. Available: https://software.intel.com/en-us/system-studio
[20] W. Keller, “International technology diffusion,” Journal of economic literature, vol. 42, no. 3, pp. 752–782, 2004.
[21] S. J. Vaughan-Nichols, “Building better software with better tools,” IEEE Computer Society, IEEE, vol. 36, no. 9, pp. 12–14, 2003.
[22] S. Majumdar, N. Chatterjee, S. R. Sahoo, and P. P. Das, “D-cube: Tool for dynamic design discovery from multi-threaded applications using pin,” in International Conference on Software Quality, Reliability and Security (QRS). IEEE, 2016, pp. 25–32.
[23] N. Chatterjee, S. Majumdar, S. R. Sahoo, and P. P. Das, “Debugging multi-threaded applications using pin-augmented gdb (pgdb),” in International Conference on Software Engineering Research and Practice (SERP). WorldComp-USA, 2015, pp. 109–115.
[24] N. Chatterjee, S. Bose, and P. P. Das, “Dynamic weaving of aspects in c/c++ using pin,” in International Conference on High Performance Compilation, Computing and Communications. ACM, 2017, pp. 55–59.
[25] N. Chatterjee, S. S. Thakur, and P. P. Das, “Resource management in native languages using dynamic binary instrumentation (pin),” in Advanced Computing and Systems for Security. Springer, 2016, pp. 107–119.
[26] CDAC. In house pthreads benchmarks. Last Accessed: February 1, 2019. [Online]. Available: https://www.cdac.in/index.aspx?id=ev_hpc_hypack_pthreads_overview
[27] A. Jannesari, K. Bao, V. Pankratius, and W. F. Tichy, “Helgrind+: An efficient dynamic race detector,” in 2009 IEEE International Symposium on Parallel & Distributed Processing. IEEE, 2009, pp. 1–13.
[28] G. FSF, “Gdb: The gnu project debugger,” Free Software Foundation, Inc., 2017. [Online]. Available: http://www.gnu.org/software/gdb
[29] R. DeLine, A. Bragdon, K. Rowan, J. Jacobsen, and S. P. Reiss, “Debugger canvas: industrial experience with the code bubbles paradigm,” in 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012, pp. 1064–1073.
[30] S. Chakraborty and V. Vafeiadis, “Validating optimizations of concurrent c/c++ programs,” in International Symposium on Code Generation and Optimization, 2016, pp. 216–226.
[31] Microsoft, “Visual studio profiler,” last Accessed: May 3, 2020. [Online]. Available: http://msdn.microsoft.com/en-us/magazine/cc337887.aspx
[32] A. K. Kolawa and C. E. Byers, “Modularizing a computer program for testing and debugging,” May 17 2005, US Patent 6,895,578.
[33] V. Developers, “Memcheck: a memory error detector,” valgrind.org, 2021. [Online]. Available: https://valgrind.org/docs/manual/mc-manual.html
[34] Y. Chen, Y.-H. Lee, W. E. Wong, and D. Guo, “A race condition graph for concurrent program behavior,” in International Conference on Intelligent System and Knowledge Engineering (ISKE). IEEE, 2008, pp. 662–667.
[35] Y. W. Song and Y.-H. Lee, “Efficient data race detection for c/c++ programs using dynamic granularity,” in International Conference on Parallel and Distributed Processing Symposium (ICPDPS). IEEE, 2014, pp. 679–688.
[36] M. Christiaens and K. De Bosschere, “Accordion clocks: Logical clocks for data race detection,” in European Conference on Parallel Processing. Springer, 2001, pp. 494–503.
[37] M. Moiseev, M. Glukhikh, A. Zakharov, and H. Richter, “A static analysis approach to data race detection in systemc designs,” in International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS). IEEE, 2013, pp. 54–59.
[38] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood, “Pin: building customized program analysis tools with dynamic instrumentation,” in Special Interest Group on
Software Engineering (FSE). ACM, 2015, pp. 450–461. programming languages notices (SIGPLAN). ACM, 2005, pp. 190–200.
[14] Microsoft. Microsoft concurrency visualizer. Last Accessed: [39] V. Developers, “Valgrind, http://valgrind.org/,” valgrind.org, 2017.
February 1, 2019. [Online]. Available: https://msdn.microsoft.com/en- [Online]. Available: http://valgrind.org/
us/library/dd537632.aspx [40] M. Rai, “Memory leak detection using windbg,
[15] J. Trumper, J. Bohnet, and J. Dollner, “Understanding complex multi- https://www.codeproject.com,” Code Project, 2008. [Online]. Available:
threaded software systems by using trace visualization,” in International https://www.codeproject.com
Symposium on Software Visualization (ISSV). ACM, 2010, pp. 133–142. [41] M. Pool, “Ccmalloc, http://cs.ecs.baylor.edu/ donahoo/tools/ccmalloc/,”
[16] M. Jain and D. Gopalani, “Use of aspects for testing software applica- cs.ecs.baylor.edu, 2022. [Online]. Available: http://cs.ecs.baylor.edu/ don-
tions,” in International Advance Computing Conference (IACC). IEEE, ahoo/tools/ccmalloc/
2015, pp. 282–285. [42] F. Germain, “Leaktracer - trace and analyze memory leaks in c++
[17] S. Iqbal and G. Allen, “Representing aspects in design,” in International programs, http://www.andreasen.org/leaktracer/,” andreasen.org, 2011.
Symposium on Theoretical Aspects of Software Engineering. IEEE, 2009, [Online]. Available: http://www.andreasen.org/LeakTracer/
pp. 313–314. [43] yurikovitch, “Memdebug, https://sourceforge.net/projects/memdebug/,”
[18] software.intel.com. Intel parallel stdio xe. Last Accessed: May 3, 2020. sourceforge.net, 2013. [Online]. Available:
[Online]. Available: https://software.intel.com/en-us/parallel-studio-xe https://sourceforge.net/projects/memdebug/


[44] J. Belmonte, P. Dugerdil, and A. Agrawal, “A three-layer model of source code comprehension,” in Indian Software Engineering Conference (ISEC). ACM, 2014, pp. 10–14.
[45] M. Mirakhorli and J. Cleland-Huang, “Detecting, tracing, and monitoring architectural tactics in code,” IEEE Transactions on Software Engineering, IEEE, vol. 42, no. 3, pp. 205–220, 2016.
[46] D. Djuric and V. Devedzic, “Incorporating the ontology paradigm into software engineering: Enhancing domain-driven programming in clojure/java,” IEEE Transactions on Systems, Man, and Cybernetics, IEEE, vol. 42, no. 1, pp. 3–14, 2012.
[47] K. Brown, “Design reverse-engineering and automated design-pattern detection in smalltalk,” North Carolina State University, Tech. Rep., 1996.
[48] H. Lee, H. Youn, and E. Lee, “Automatic detection of design pattern for reverse engineering,” in International Conference on Software Engineering Research, Management & Applications (SERA). IEEE, 2007, pp. 577–583.
[49] G. Antoniol and Y.-G. Gueheneuc, “Feature identification: An epidemiological metaphor,” IEEE Transactions on Software Engineering, IEEE, vol. 32, no. 9, pp. 627–641, 2006.
[50] S. P. Reiss, “Visualizing program execution using user abstractions,” in Symposium on Software Visualization (SOFTVIS). ACM, 2006, pp. 125–134.
[51] J. Quante and R. Koschke, “Dynamic protocol recovery,” in Working Conference on Reverse Engineering (WCRE). IEEE, 2007, pp. 219–228.
[52] N. R. Tallent and J. M. Mellor-Crummey, “Effective performance measurement and analysis of multithreaded applications,” in Special Interest Group on programming languages notices (SIGPLAN). ACM, 2009, pp. 229–240.
[53] O. Spinczyk, A. Gal, and W. Schröder-Preikschat, “Aspectc++: an aspect-oriented extension to the c++ programming language,” in International Conference on Tools Pacific: Objects for internet, mobile and embedded applications. Australian Computer Society, Inc., 2002, pp. 53–60.
[54] G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten, J. Palm, and W. G. Griswold, “An overview of aspectj,” in European Conference on Object-Oriented Programming. Springer, 2001, pp. 327–354.
[55] D. Geer, “Eclipse becomes the dominant java ide,” IEEE Computer, IEEE, vol. 38, no. 7, pp. 16–18, 2005.
[56] C. Griffin, “Introduction to the eclipse modeling framework,” in de OMG MDA Implementer’s Workshop, 2003.
[57] T. Ball and S. K. Rajamani, “The slam toolkit,” in International Conference on Computer Aided Verification. Springer, 2001, pp. 260–264.
[58] S. Blair-Chappell and A. Stokes, Parallel programming with intel parallel studio XE. John Wiley & Sons, 2012.
[59] A. Kleen and B. Strong, “Intel processor trace on linux,” Tracing Summit, vol. 1, 2015.
[60] Intel. Pin 3.2 user guide. Https://software.intel.com/sites/landingpage/pintool/ Last Accessed: April 25, 2020.
[61] Quarkslab. A dynamic binary instrumentation framework based on llvm. Https://github.com/QBDI/QBDI Last Accessed: April 25, 2020.
[62] Q. Wang, H. Shu, Y. Li, and H.-J. Huang, “Malicious code behavior analysis based on dynamorio,” Computer Engineering, vol. 37, p. 18, 2011.
[63] N. Nethercote and J. Seward, “Valgrind: a framework for heavyweight dynamic binary instrumentation,” ACM Sigplan notices, ACM, vol. 42, no. 6, pp. 89–100, 2007.
[64] A. Almomany, A. Alquraan, and L. Balachandran, “Gcc vs. icc comparison using parsec benchmarks,” IJITEE, vol. 4, no. 7, 2014.
[65] IBM, “Eclipse cdt (c/c++ development tooling),” Eclipse Foundation, 2020. [Online]. Available: https://www.eclipse.org/cdt/
[66] J. Reinders, “Vtune performance analyzer essentials,” Intel Press, 2005.
[67] Software verify. Last Accessed: February 1, 2021. [Online]. Available: https://www.softwareverify.com/contact-software-verification.php
[68] T. L. Team, “The lldb debugger,” The LLDB Team, 2020. [Online]. Available: https://lldb.llvm.org/
[69] A. K. Ghoshal, N. Chatterjee, A. Chakrabarti, and P. Das, “Design of pin-augmented debugger for multi-threaded applications,” Innovations in Computer Science and Engineering 2019, pp. 153–159, May 2018.
[70] N. Chatterjee, S. Thakur, and P. P. Das, “Resource management in native languages using dynamic binary instrumentation (pin),” 2nd International Doctoral Symposium on Applied Computation and Security Systems (ACSS), 2015, p. 107, 2015.
[71] N. Chatterjee, S. Bose, and P. P. Das, “Dynamic weaving of aspects in c/c++ using pin,” HP3C ’17, pp. 55–59, 2017.
[72] Microsoft, “Debugging in visual studio,” Microsoft Developer Network, 2017. [Online]. Available: http://msdn.microsoft.com/en-us/library/vstudio/sc65sadd.aspx
[73] R. Staněk, “Apache netbeans,” Apache, 2020. [Online]. Available: https://netbeans.org/
[74] MortenMacFly, “Code::blocks - the open source, cross platform, free c, c++ and fortran ide,” codeblocks.org, 2020. [Online]. Available: http://www.codeblocks.org/
[75] robert-mueller albrecht, “Idb: Intel debugger,” Intel Software Developer Zone, August 2012. [Online]. Available: http://software.intel.com/en-us/articles/idb-linux
[76] IBM, “Ibm rational rose enterprise 7.0.0.4 ifix001,” IBM Rational Rose XDE, 2019. [Online]. Available: https://www.ibm.com/support/pages/ibm-rational-rose-enterprise-7004-ifix001
[77] K. Du Bois, J. B. Sartor, S. Eyerman, and L. Eeckhout, “Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications,” in Special Interest Group on programming languages notices (SIGPLAN). ACM, 2013, pp. 355–372.
[78] CodePlex, “Visual leak detector for visual c++ 2008-2015, https://vld.codeplex.com/,” vld.codeplex.com, 2017. [Online]. Available: https://vld.codeplex.com/
[79] F. S. Foundation, “Gnu checker, https://www.gnu.org/software/checker/checker.html,” gnu.org, 2014. [Online]. Available: https://www.gnu.org/software/checker/checker.html
[80] T. C. Team, “Threadsanitizer,” https://clang.llvm.org/docs/ThreadSanitizer.html, 2020. [Online]. Available: https://clang.llvm.org/docs/ThreadSanitizer.html
[81] V. Developers, “Helgrind: a thread error detector,” valgrind.org, 2019. [Online]. Available: https://valgrind.org/docs/manual/hg-manual.html

NACHIKETA CHATTERJEE has been a Consultant at Tata Consultancy Services Ltd, Kolkata, India, since 2006 and is a doctoral student in the A.K.Choudhury School of Information Technology, University of Calcutta, West Bengal, India. He received his B.Tech. in Information Technology from the University of Calcutta in 2004. He also worked in the area of application development for the Airlines & Retail domain at Skytech Solutions Pvt. Ltd., India, for 2.5 years. Nachiketa Chatterjee received the Best Performance Improvement and Innovation Pride awards in 2020 and 2021 from TCS for accelerating the process with an improved tool strategy for faster time to market for the world’s second DIY retailer. His main research interest is to improve the software development process with efficient productivity tools, focusing on profiling and analytics.

SRIJONI MAJUMDAR is currently a postdoctoral researcher in the School of Computing, University of Leeds, and works in the area of computational social sciences. She received her doctorate degree in the area of program analysis and knowledge mining using machine learning frameworks from the Advanced Technology Development Centre, Indian Institute of Technology, Kharagpur, India. She has worked in the area of performance engineering of software systems and data analytics at Tata Consultancy Services Ltd, Mumbai, India, for 2.5 years. Her main research interest is software maintenance, focusing on building knowledge mining systems from source code and related metadata (Big Code). She is actively involved with several developers from the software industry for her research on Software Maintenance. Srijoni is a student member of the IEEE and an executive member of IEEE Women in Engineering, Asia Pacific Kharagpur Branch. More information is available at https://sites.google.com/site/srijonicse/home.


DR. PARTHA PRATIM DAS is a professor at the Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India. He is currently on long leave from IIT Kharagpur and working as a visiting professor at Ashoka University, India. He was the Joint Principal Investigator of the National Digital Library of India project of the Ministry of Education, Government of India from 2015 to 2022, and led the initiative to integrate the digital repositories of various institutions and publishers across India.
Dr. Das received his BTech, MTech, and PhD degrees in 1984, 1985, and 1988, respectively, from IIT Kharagpur. He served as a faculty member in the Department of Computer Science and Engineering, IIT Kharagpur from 1988 to 1998. In 1998, he moved to the industry and served in director positions until 2011. His current interests include software productivity and quality, human-computer interaction, computer analysis of Indian classical dance, and technology-enhanced learning. He has published more than 100 papers in national and international journals and conferences.
Dr. Das has received several recognitions, including the UNESCO/ROSTSCA Young Scientist (1989), INSA Young Scientist Award (1990), Young Associateship of Indian Academy of Sciences (1992), UGC Young Teachers’ Career Award (1993), INAE Young Engineer Award (1996), Interra Special (Process) Recognition (2009), and Interra 10 Years’ Tenure Plaque (2011). He is also a co-recipient of mBillionth Awards by the Digital Empowerment Foundation (2017), Gems of Digital India Award (2019), and Open Education Award for Excellence in Open Resilience Category (2020) for the National Digital Library of India. Dr. Das is currently the editor-in-chief of the Journal of the Institution of Engineers: Series B.

DR. AMLAN CHAKRABARTI is a Professor in the A.K.Choudhury School of Information Technology at the University of Calcutta. He received his M.Tech. from the University of Calcutta and did his doctoral research at the Indian Statistical Institute, Kolkata. He was a Post-Doctoral fellow at the School of Engineering, Princeton University, USA during 2011-2012. He is the recipient of the DST BOYSCAST fellowship award in Engineering Science (2011), Indian National Science Academy (INSA) Visiting Faculty Fellowship (2014), JSPS Invitation Research Award (2016), Erasmus Mundus Leaders Award from the European Union (2017), and Hamied Visiting Professorship from the University of Cambridge, UK (2018). He is a Sr. Member of IEEE and ACM, IEEE Computer Society Distinguished Visitor (2020-2022), Distinguished Speaker of ACM, Secretary of IEEE CEDA India Chapter, Vice President of Society for Data Science, and Life Member of CSI India. He is the Series Editor of the Springer Transactions on Computer Systems and Networking, Associate Editor of Elsevier Journal of Computers and Electrical Engineering, and Guest Editor of Springer Journal of Applied Sciences. His areas of interest are Machine Learning, Computer Vision, Cyber-physical Systems, Reconfigurable Computing, Quantum Computing, and VLSI CAD.
