Dynamic Filtering and Prioritization of Static Code Analysis Alerts
Abstract—We propose an approach for filtering and prioritizing static code analysis alerts while these alerts are being reviewed by the developer. We construct a Prolog knowledge base that captures the data flow information in the source code as well as the reported alerts, their properties and associations with the data flow. The knowledge base is updated as the developer reviews the listed alerts and decides whether they point at an actual fault or not. These updates provide useful information since some of the alerts of the same type can be related in terms of their root cause. Hence, the dynamically updated knowledge base can be queried to eliminate or prioritize the remaining alerts in the review list. We present a motivating example to illustrate the approach and its automation by integrating a set of tools.
Index Terms—program analysis, static code analysis, processing alarms/warnings/alerts, Prolog, code reviews
I. INTRODUCTION
Static code analysis tools [1] analyze source code without executing it. They pinpoint potential software faults that might lead to failures at runtime. Their output constitutes a list of alerts [2] (also called alarms [3] and warnings [4]), each of which describes a potential fault together with a number of features such as the corresponding line of code and the type and severity of the fault. As strong points, the analysis is fully automated and scalable. As a drawback, developers are usually exposed to a large number of alerts, some of which are false positives [2], [5], [6], although others can be associated with critical faults [5]. Empirical studies report false positive rates that range between 30% and 100% [7], and the density of alerts is typically 2 alerts per KLOC (thousand lines of code) on average [6]. As a result, around 3,000 alerts are generated for a system with 1,500 KLOC. Each of these alerts should be manually inspected by developers so that they can focus on those that are true positives, i.e., alerts that are actionable [2]. This inspection process is costly in both time and effort: 250 man-hours might be needed to inspect 3,000 alerts, assuming the inspection time per alert is 5 minutes on average [2], [8].
To address this problem, numerous approaches have been proposed for providing developers with a reduced and/or prioritized list of alerts [2], [3] by employing a variety of techniques including testing [9], runtime verification [10] and machine learning [11]. We argue that the list of alerts can still be reduced or better (re)prioritized while the alerts are being reviewed by the developer. Some alerts are related to each other in terms of their root causes [12]–[14]. Hence, the outcome of the review process regarding an alert constitutes useful information for the validity and priority of other alerts. Our goal is to record and exploit this information. We propose a novel and complementary approach for filtering and prioritizing static code analysis alerts while these alerts are being reviewed by the developer.
Fig. 1. The overall approach and the toolset.
The overall approach, depicted in Figure 1, relies on a knowledge base managed by a Prolog engine. We construct a Prolog knowledge base that captures the data flow information in the source code as well as the listed alerts, their properties and root causes. The knowledge base is updated as the developer reviews the source code according to the listed alerts one by one and decides whether they point at an actual fault or not. This decision is added to the knowledge base, which is then queried after each such update to re-prioritize the remaining alerts. In the next section, we present a motivating example and a set of related tools for automating the approach.
The most recent and most closely related work to ours is that of Zhang et al. [13], where an interactive approach is proposed to learn from the developer feedback for prioritizing alerts. The developer answers a set of questions and the corresponding answers are recorded in the form of Datalog