2021 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)

Dynamic Filtering and Prioritization of Static Code Analysis Alerts

978-1-6654-2603-9/21/$31.00 ©2021 IEEE | DOI: 10.1109/ISSREW53611.2021.00086

Ulaş Yüksel
Vestel Electronics
Manisa, Turkey

Hasan Sözer
Ozyegin University
Istanbul, Turkey

Abstract—We propose an approach for filtering and prioritizing static code analysis alerts while these alerts are being reviewed by the developer. We construct a Prolog knowledge base that captures the data flow information in the source code as well as the reported alerts, their properties and associations with the data flow. The knowledge base is updated as the developer reviews the listed alerts and decides whether they point at an actual fault or not. These updates provide useful information since some of the alerts of the same type can be related in terms of their root cause. Hence, the dynamically updated knowledge base can be queried to eliminate or prioritize the remaining alerts in the review list. We present a motivating example to illustrate the approach and its automation by integrating a set of tools.

Index Terms—program analysis, static code analysis, processing alarms/warnings/alerts, Prolog, code reviews

I. INTRODUCTION

Static code analysis tools [1] analyze source code without executing it. They pinpoint potential software faults that might lead to failures at runtime. Their output is a list of alerts [2] (also called alarms [3] and warnings [4]), each of which describes a potential fault together with a number of features such as the corresponding line of code and the type and severity of the fault. On the positive side, the analysis is fully automated and scalable. As a drawback, developers are usually exposed to a large number of alerts, some of which are false positives [2], [5], [6], while others can be associated with critical faults [5]. Empirical studies report false positive rates ranging between 30% and 100% [7], and the alert density is typically 2 alerts per KLOC (thousand lines of code) on average [6]. As a result, around 3,000 alerts are generated for a system with 1,500 KLOC. Each of these alerts has to be inspected manually so that developers can focus on the true positives, i.e., alerts that are actionable [2]. This inspection process is time- and effort-consuming: assuming an average inspection time of 5 minutes per alert, 250 person-hours might be needed to inspect 3,000 alerts [2], [8].

To address this problem, numerous approaches have been proposed for providing developers with a reduced and/or prioritized list of alerts [2], [3], employing a variety of techniques including testing [9], runtime verification [10] and machine learning [11]. We argue that the list of alerts can be reduced further or better (re)prioritized while the alerts are being reviewed by the developer. Some alerts are related to each other in terms of their root causes [12]–[14]. Hence, the outcome of the review process regarding an alert constitutes useful information for the validity and priority of other alerts. Our goal is to record and exploit this information. We propose a novel and complementary approach for filtering and prioritizing static code analysis alerts while these alerts are being reviewed by the developer.

Fig. 1. The overall approach and the toolset.

Figure 1 depicts the overall approach, which relies on a knowledge base managed by a Prolog engine. We construct a Prolog knowledge base that captures the data flow information in the source code as well as the listed alerts, their properties and root causes. The knowledge base is updated as the developer reviews the source code according to the listed alerts, one by one, and decides whether each alert points at an actual fault or not. Each decision is added to the knowledge base, which is then queried after the update to re-prioritize the remaining alerts. In the next section, we present a motivating example and a set of related tools for automating the approach.

The most recent and most closely related work is that of Zhang et al. [13], who propose an interactive approach that learns from developer feedback to prioritize alerts. The developer answers a set of questions, and the corresponding answers are recorded in the form of Datalog
(a subset of Prolog) facts together with the rules regarding alerts. Our goal is to provide a less intrusive approach in which the alert review process remains the same for the developer. The only additional effort is labeling the alerts (as actionable or not [2]), which are already being reviewed anyway. Developers are not exposed to any kind of formalism.

II. MOTIVATING EXAMPLE

A sample code snippet in Python is listed below. Here, the values of both NUM2 and NUM3 are obtained as the return value of the function get_number in lines 2 and 3, respectively.

    1  NUM1 = 5
    2  NUM2 = get_number()
    3  NUM3 = get_number()
    4  result = NUM1 / NUM2
    5  result = NUM1 / NUM3
    6  print(result)

The static code analysis tool Semgrep¹ allows the specification of patterns for reporting alerts. For instance, the pattern shown below leads to two division-by-zero alerts, for lines 4 and 5 of the code snippet above.

    $ZERO = $FUNC(...)
    ...
    $X / $ZERO

An alert reported based on this pattern is a true positive (TP), i.e., it points at an actual fault, if the function get_number returns 0. This rule can be expressed in Pytholog² only once per alert type, in a generic way, as shown in line 4 of the code snippet below. The rule specifies that an alert A is TP if it is of type division by zero (divby0), its data source is V, and V evaluates to 0. Two other rules specify when an alert is actionable. First, the alert can be directly labelled as TP by the developer (line 5 below). Second, the alert can have the same type and source as another alert that is labelled by the developer as TP (lines 6-8 below). These rules can be added to a knowledge base together with the information regarding the listed alerts, as shown below.

    1   import pytholog as pl
    2   scatkb = pl.KnowledgeBase("scat")
    3   scatkb([
    4       "tp(A):-type(A,divby0),src(A,V),zero(V)",
    5       "actionable(A):-tp(A)",
    6       "actionable(Z):-tp(A),"
    7       "type(A,divby0),src(A,V),"
    8       "type(Z,divby0),src(Z,V)",
    9       "type(a1,divby0)", "src(a1,lib1)",
    10      "type(a2,divby0)", "src(a2,lib1)"])

Note that the rules are agnostic to the analyzed system, and some of the facts regarding alerts, e.g., type(a1,divby0), are provided in the alert description. Data sources can also be obtained automatically with a data flow analysis. For instance, there exists a TypeScript library³ for Python source code analysis, which can detect the source of both NUM2 and NUM3 as the function get_number in the first code snippet. The knowledge base can be saved for later use, and it can be extended with new facts dynamically. For instance, the following fact can be added later on as a developer decision.

    tp(a1)

Then, the knowledge base can be queried for filtering alerts. For instance, the following query in Pytholog retrieves the alerts that are actionable.

    scatkb.query(pl.Expr("actionable(X)"))

Given this query and the current state of the knowledge base, the following list of alerts is displayed.

    [{'X': 'a1'}, {'X': 'a2'}]

The developer does not have to add facts to the knowledge base manually. The review process is not disrupted, as the developer makes a decision for each alert in a given order. These decisions can be gradually added to the knowledge base as facts. The status of the remaining alerts can then be derived automatically from the logical links among them, which are established based on shared data sources.

¹ https://semgrep.dev/
² https://pypi.org/project/pytholog/
³ https://github.com/microsoft/python-program-analysis

REFERENCES

[1] A. Gosain and G. Sharma, Static Analysis: A Survey of Techniques and Tools. New Delhi: Springer India, 2015, pp. 581–591.
[2] S. Heckman and L. Williams, “A systematic literature review of actionable alert identification techniques for automated static code analysis,” Information and Software Technology, vol. 53, no. 4, pp. 363–387, 2011.
[3] T. Muske and A. Serebrenik, “Survey of approaches for handling static analysis alarms,” in Proceedings of the 16th IEEE International Working Conference on Source Code Analysis and Manipulation, Raleigh, NC, USA, 2016, pp. 157–166.
[4] M. Li, Y. Chen, L. Wang, and G. Xu, “Dynamically validating static memory leak warnings,” in Proceedings of the 2013 International Symposium on Software Testing and Analysis, 2013, pp. 112–122.
[5] P. Anderson, “Measuring the value of static-analysis tool deployments,” IEEE Security and Privacy, vol. 10, no. 3, pp. 40–47, 2012.
[6] U. Yuksel and H. Sozer, “Automated classification of static code analysis alerts: A case study,” in Proceedings of the 29th IEEE Conference on Software Maintenance, Eindhoven, Netherlands, 2013, pp. 532–535.
[7] T. Kremenek and D. Engler, “Z-ranking: using statistical analysis to counter the impact of static analysis approximations,” in Proceedings of the 10th International Conference on Static Analysis, San Diego, CA, USA, 2003, pp. 295–315.
[8] “Effective management of static analysis vulnerabilities and defects,” White Paper, Coverity Inc., 2009.
[9] A. K. Joshy, X. Chen, B. Steenhoek, and W. Le, “Validating static warnings via testing code fragments,” in Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021, pp. 540–552.
[10] H. Sozer, “Integrated static code analysis and runtime verification,” Software: Practice and Experience, vol. 45, no. 10, pp. 1359–1373, 2015.
[11] U. Yuksel, H. Sozer, and M. Sensoy, “Trust-based fusion of classifiers for static code analysis,” in Proceedings of the 17th International Conference on Information Fusion (FUSION), Salamanca, Spain, 2014, pp. 1–6.
[12] T. Muske and U. Khedker, “Cause points analysis for effective handling of alarms,” in Proceedings of the 27th IEEE International Symposium on Software Reliability Engineering, 2016, pp. 173–184.
[13] X. Zhang, R. Grigore, X. Si, and M. Naik, “Effective interactive resolution of static analysis alarms,” Proceedings of the ACM on Programming Languages, vol. 1, no. 57, pp. 1–30, 2017.
[14] T. Muske, R. Talluri, and A. Serebrenik, “Repositioning of static analysis alarms,” in Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2018, pp. 187–197.
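The three rules of the motivating example (Section II) can also be read as ordinary functions over alert facts. The following is a minimal plain-Python sketch, not Pytholog and not the authors' toolset, assuming that each alert is described only by its type and data source and that developer decisions arrive as a set of alert names labelled TP. The names Alert, is_tp and actionable are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """An alert reduced to its type(A, T) and src(A, V) facts."""
    name: str    # e.g. "a1"
    kind: str    # alert type, e.g. "divby0"
    source: str  # data source from a data flow analysis, e.g. "lib1"


def is_tp(alert, labelled_tp, zero_sources):
    """tp(A): labelled TP by the developer, or derived by the generic rule."""
    if alert.name in labelled_tp:  # fact tp(A) added as a developer decision
        return True
    # tp(A) :- type(A,divby0), src(A,V), zero(V).
    return alert.kind == "divby0" and alert.source in zero_sources


def actionable(alerts, labelled_tp, zero_sources=frozenset()):
    """Names of actionable alerts, kept in review-list order."""
    tps = [a for a in alerts if is_tp(a, labelled_tp, zero_sources)]
    result = []
    for z in alerts:
        # actionable(A) :- tp(A).
        direct = z in tps
        # actionable(Z) :- tp(A), type(A,divby0), src(A,V),
        #                  type(Z,divby0), src(Z,V).
        shared = z.kind == "divby0" and any(
            a.kind == "divby0" and a.source == z.source for a in tps
        )
        if direct or shared:
            result.append(z.name)
    return result
```

With a1 and a2 both of type divby0 and source lib1, actionable(alerts, set()) returns an empty list, while actionable(alerts, {"a1"}) returns ["a1", "a2"], which corresponds to the Pytholog query result of Section II. Unlike a Prolog engine, this sketch hard-codes the rules; a knowledge base would likely scale better to many alert types, since new rules can be added as data rather than code.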

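The review loop described at the end of Section II, in which each developer decision becomes a fact and the remaining alerts are then re-evaluated, can be sketched as below. The paper does not fix a prioritization policy, so this sketch assumes a simple one: after each decision, remaining alerts that share the type and data source of an alert labelled TP move to the front of the list, keeping their relative order. The function review and its decide callback, which stands in for the developer's verdict, are hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Each alert is a (name, type, source) triple: its type/2 and src/2 facts.
Alert = Tuple[str, str, str]


def review(alerts: List[Alert],
           decide: Callable[[str], bool]) -> List[Tuple[str, bool]]:
    """Review alerts one at a time, re-prioritizing after each decision.

    `decide(name)` stands in for the developer's verdict (True means TP).
    Returns (name, verdict) pairs in the order the alerts were reviewed.
    """
    remaining = list(alerts)  # do not mutate the caller's list
    tp_keys: Set[Tuple[str, str]] = set()  # (type, source) of labelled TPs
    log: List[Tuple[str, bool]] = []
    while remaining:
        name, kind, source = remaining.pop(0)
        verdict = decide(name)
        log.append((name, verdict))
        if verdict:
            tp_keys.add((kind, source))  # corresponds to adding tp(name)
        # Stable sort: alerts sharing type and source with a TP come first
        # (key False sorts before True); others keep their relative order.
        remaining.sort(key=lambda a: (a[1], a[2]) not in tp_keys)
    return log
```

For a review list a1, a3, a2 in which a1 and a2 share the source lib1, labelling a1 as TP moves a2 ahead of a3, so it is reviewed next. Relying on Python's stable sort keeps the tool's original ordering among equally ranked alerts.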

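In the motivating example, the facts type(a1,divby0) and src(a1,lib1) are written by hand, and the data sources are attributed to a TypeScript data flow library. As an illustration only, the sketch below swaps in Python's standard ast module to derive comparable facts from the first code snippet. It handles just the straight-line, module-level case matched by the Semgrep pattern (a name assigned from a call and later used as a divisor) and uses the called function's name as the data source instead of the placeholder lib1. The function name div_alert_facts is hypothetical.

```python
import ast
from typing import Dict, List


def div_alert_facts(source: str) -> List[str]:
    """Emit Prolog-style type/src facts for divisions by a call result.

    A simplified, straight-line counterpart of the Semgrep pattern
    `$ZERO = $FUNC(...) ... $X / $ZERO`: only module-level statements are
    scanned, and a name's data source is the function whose call result
    was most recently assigned to it.
    """
    tree = ast.parse(source)
    origin: Dict[str, str] = {}  # variable name -> function it came from
    facts: List[str] = []
    count = 0
    for stmt in tree.body:  # module-level statements, in source order
        # 1) Report every `X / NAME` whose NAME currently has a known origin.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div)
                    and isinstance(node.right, ast.Name)
                    and node.right.id in origin):
                count += 1
                alert = f"a{count}"
                facts.append(f"type({alert},divby0)")
                facts.append(f"src({alert},{origin[node.right.id]})")
        # 2) Track `NAME = FUNC(...)`; any other assignment to NAME clears it.
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            target = stmt.targets[0].id
            value = stmt.value
            if isinstance(value, ast.Call) and isinstance(value.func, ast.Name):
                origin[target] = value.func.id
            else:
                origin.pop(target, None)
    return facts
```

For the six-line snippet of Section II this yields type(a1,divby0), src(a1,get_number), type(a2,divby0) and src(a2,get_number), so a1 and a2 share a data source exactly as the hand-written lib1 facts assume. A real data flow analysis would also have to follow branches, loops and function boundaries.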
