Exploring The Relationship Between Web Application Development Tools and Security
did not study this in detail.

Vulnerabilities introduced later in the product cycle. Our study considers only those vulnerabilities introduced during initial product development. Continued development brings new challenges for developers that simply were not present in this experiment. Our results do not answer any questions about vulnerabilities introduced during code maintenance or when features are added after initial product development.

[Figure: per-implementation bar charts (Java 3, Java 4, Java 9, PHP 6, PHP 7, PHP 8, Perl 1, Perl 2, Perl 5) of vulnerabilities found, with legend Manual / Black-box / Both; caption not recovered.]
SQL injection. Very few SQL injection vulnerabilities were found. Only two implementations had any such vulnerabilities, and only 4 were found in total. The difference between languages is not statistically significant (F = 0.70, p = 0.5330).

Authentication and authorization bypass. No such vulnerabilities were found in 5 of the 9 implementations. Each of the other 4 had only 1 or 2 such vulnerabilities. The difference between languages is not statistically significant (F = 0.17, p = 0.8503).

CSRF. As seen in Table 5, all of the PHP and Perl implementations, and 1 of 3 Java implementations, were vulnerable to CSRF attacks. Fisher's exact test reveals that the difference between languages is not statistically significant (p = 0.25).

Session management. All implementations other than 2 of the 3 Perl implementations were found to implement secure session management. That is, the Perl implementations were the only ones with vulnerable session management. Fisher's exact test reveals that the difference is not statistically significant (p = 0.25).
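These per-class comparisons rest on Fisher's exact test over small contingency tables. As a hedged illustration only (the paper's language comparisons involve 3x2 tables, so this does not reproduce the reported p-values), the two-sided 2x2 version can be computed directly from the hypergeometric distribution; the sketch below is in Python for brevity, not code from the study:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Enumerates every table with the observed margins and sums the
    probabilities of those no more likely than the observed table.
    """
    row1 = a + b                 # first row total
    col1 = a + c                 # first column total
    n = a + b + c + d            # grand total

    def prob(k):
        # Hypergeometric probability of a table whose top-left cell is k.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # The small tolerance keeps floating-point ties in the tail sum.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))

# Classic "lady tasting tea" table [[3, 1], [1, 3]]: p = 34/70, about 0.486.
print(fisher_exact_2x2(3, 1, 1, 3))
```

A full reproduction of the three-language tests would need the Freeman-Halton extension of the test to r x c tables, which is why the reported p = 0.25 values are not asserted here.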
Insecure password storage. Most of the implementations used some form of insecure password storage, ranging from storing passwords in plaintext to not using a salt before hashing the passwords. One Perl and one Java implementation did not violate current best practices for password storage. There does not, however, appear to be any association between programming language and insecure password storage. Fisher's exact test does not find a statistically significant difference (p = 0.999).

                              Vulnerabilities found by
Team Number   Language   Manual only   Black-box only   Both   Total
1             Perl       4             1                0      5
2             Perl       3             1                0      4
5             Perl       12            3                18     33
3             Java       1             7                0      8
4             Java       2             2                0      4
9             Java       5             0                0      5
6             PHP        7             3                0      10
7             PHP        7             3                0      10
8             PHP        11            0                1      12

Table 6: Number of vulnerabilities found in the implementations of People by Temperament. The "Vulnerabilities found by" columns display the number of vulnerabilities found only by manual analysis, only by black-box testing, and by both techniques, respectively. The final column displays the total number of vulnerabilities found in each implementation.

4.3 Manual review vs. black-box testing

Table 6 and Figure 1 list how many vulnerabilities were found only by manual analysis, only by black-box testing, and by both techniques. All vulnerabilities in the binary vulnerability classes were found by manual review, and none were found by black-box testing.

We observe that manual analysis fared better overall, finding 71 vulnerabilities (including the binary vulnerability classes), while black-box testing found only 39. We also observe that there is very little overlap between the two techniques; the two techniques find different vulnerabilities. Out of a total of 91 vulnerabilities found by either technique, only 19 were found by both techniques (see Figure 3). This suggests that they are complementary, and that it may make sense for organizations to use both.

Organizations commonly use only black-box testing. These results suggest that on a smaller budget, this practice makes sense because either technique will find some vulnerabilities that the other will miss. If, however, an organization can afford the cost of manual review, it should supplement this with black-box testing. The cost is small relative to that of review, and our results suggest that black-box testing will find additional vulnerabilities.

Figure 2 reveals that the effectiveness of the two techniques differs depending upon the vulnerability class. Manual review is the clear winner for authentication and authorization bypass and stored XSS vulnerabilities, while black-box testing finds more reflected XSS and SQL injection vulnerabilities. This motivates the need for further research and development of better black-box penetration testing techniques for stored XSS and authentication and authorization bypass vulnerabilities. We note that recent research has made progress toward finding authentication and authorization bypass vulnerabilities [9, 13], but these are not black-box techniques.

[Figure: Venn diagram of the 91 vulnerabilities: 52 found by manual analysis only, 19 by both techniques, 20 by black-box testing only.]
Figure 3: Vulnerabilities found by manual analysis and black-box penetration testing.

Reviewer ability. We now discuss the 20 vulnerabilities that were not found manually. Our analysis of these vulnerabilities further supports our conclusion that black-box testing complements manual review.

For 40% (8) of these, the reviewer found at least one similar vulnerability in the same implementation. That is, there is evidence that the reviewer had the skills and knowledge required to identify these vulnerabilities, but overlooked them. This suggests that we cannot expect a reviewer to have the consistency of an automated tool.

For another 40%, the vulnerability detected by the tool was in framework code, which was not analyzed by the reviewer. An automated tool may find vulnerabilities that reviewers are not even looking for.

The remaining 20% (4) represent vulnerabilities for which no similar vulnerabilities were found by the reviewer in the same implementation. It is possible that the reviewer lacked the necessary skills or knowledge to find these vulnerabilities.

4.4 Framework support

We examine whether stronger framework support is associated with fewer vulnerabilities. Figure 4 displays the relationship for each integer-valued vulnerability class between the level of framework support for that class and the number of vulnerabilities in that class. If for some vulnerability class there were an association between the level of framework support and the number of vulnerabilities, we would expect most of the points to be clustered around (or below) a line with a negative slope.

[Figure: scatter plot "Number of Vulnerabilities vs. Framework Support"; x-axis: framework support (no support, manual, opt in, opt out); y-axis: number of vulnerabilities; series: XSS, SQL Injection, Auth. Bypass.]
Figure 4: Level of framework support vs. number of vulnerabilities for integer-valued vulnerability classes. The area of a mark scales with the number of observations at its center.

For each of the three⁴ classes, we performed a one-way ANOVA test between framework support for the vulnerability class and number of vulnerabilities in the class. None of these results are statistically significant.

⁴ The level of framework support for stored XSS and reflected XSS is identical in each implementation, so we combined these two classes.

Our data set allows us to compare only frameworks with no support to frameworks with manual support because the implementations in our data set do not use frameworks with stronger support (with one exception). We found no significant difference between these levels of support. However, this data set does not allow us to examine the effect of opt-in, opt-out, or always-on support on vulnerability rates. In future work, we would like to analyze implementations that use frameworks with stronger support for these vulnerability classes. Example frameworks include CodeIgniter's xss_clean [1], Google Ctemplate [3], and Django's autoescape [2], all of which provide opt-out support for preventing XSS vulnerabilities. A more diverse data set might reveal relationships that cannot be gleaned from our current data.

Table 5 displays the relationship between framework support and vulnerability status for each of the binary vulnerability classes.

There does not appear to be any relationship for password storage. Many of the implementations use frameworks that provide opt-in support for secure password storage, but they do not use this support and are therefore vulnerable anyway. This highlights the fact that manual framework support is only as good as developers' awareness of its existence.

Session management and CSRF do, on the other hand, appear to be in such a relationship. Only the two implementations that lack framework support for session management have vulnerable session management. Similarly, only the two implementations that have framework support for CSRF were not found to be vulnerable to CSRF attacks. Both results were found to be statistically significant.

During our manual source code review, we frequently observed that developers were able to correctly use manual support mechanisms in some places, but they forgot or neglected to do so in other places.

Figure 5 presents the results from our identification of the lowest level at which framework support exists that could have prevented each individual vulnerability (as described in Section 3.4).

It is rare for developers not to use available automatic support (the darkest bars in Figure 5b show only 2 such vulnerabilities), but they commonly fail to use existing manual support (the darkest bars in Figure 5a, 37 vulnerabilities). In many cases (30 of the 91 vulnerabilities found), the existing manual support was correctly used elsewhere. This suggests that no matter how good manual defenses are, they will never be good enough; developers can forget to use even the best manual framework support, even when it is evident that they are aware of it and know how to use it correctly.

For both manual and automatic support, the majority of vulnerabilities could have been prevented by support from another framework for the same language that the implementation used. That is, it appears that strong framework support exists for most vulnerability classes for each language in this study.

The annotations in Figure 5 point out particular shortcomings of frameworks for different vulnerability classes. We did not find any framework that provides any level of support for sanitizing untrusted output in a JavaScript context, which Team 3 failed to do repeatedly, leading to 3 reflected XSS vulnerabilities. We were also unable to find a PHP framework that offers automatic support for secure password storage, though we were able to find many tutorials on how to correctly (but manually) salt and hash passwords in PHP. Finally, we are not aware of any automatic framework support for preventing authorization bypass vulnerabilities. Unlike the other vulnerability classes we consider, these require correct policies; in this sense, this vulnerability class is fundamentally different, and harder to tackle, as acknowledged by recent work [9, 13].
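The password-storage findings above reduce to two practices the vulnerable implementations skipped: a per-user random salt and a deliberately slow hash. As a hedged sketch of that best practice (in Python for brevity; none of the studied implementations are quoted here), a standard approach is PBKDF2 from the standard library:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a per-user salt."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, digest):
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(hash_password(password, salt)[1], digest)

salt, digest = hash_password("correct horse")
assert verify_password("correct horse", salt, digest)
assert not verify_password("wrong guess", salt, digest)
```

Storing the salt next to the digest is expected; what made the studied implementations vulnerable was omitting the salt, or any hashing at all, which makes offline dictionary attacks cheap.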
[Figure: two bar charts, (a) "Where manual support exists to prevent vulnerabilities" and (b) "Where automatic support exists to prevent vulnerabilities", one bar per implementation (Java 3, Java 4, Java 9, PHP 6, PHP 7, PHP 8, Perl 1, Perl 2, Perl 5); bar segments: Framework used, Newer version of fwk. used, Diff. fwk. for language used, Some fwk. for some language, No known framework; annotation in (b): authorization bypass.]
(a) Manual framework support (b) Automatic framework support
Figure 5: For each vulnerability found, how far developers would have to stray from the technologies they used in order to find framework support that could have prevented each vulnerability, either manually (left) or automatically (right).
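The gap Figure 5 shows between manual and automatic support can be reproduced in miniature. The sketch below is hypothetical Python, not code from any studied implementation: with manual support, every output site must remember to call the escaping helper, and a single forgotten call site is an XSS vulnerability, which is exactly the failure mode observed in the review.

```python
import html

def render_comment_escaped(comment):
    # Manual support used correctly: the developer escapes at this sink.
    return "<p>" + html.escape(comment) + "</p>"

def render_comment_forgotten(comment):
    # The same developer, one template later: the escape call is missing.
    return "<p>" + comment + "</p>"

payload = "<script>steal()</script>"
assert "<script>" not in render_comment_escaped(payload)
assert "<script>" in render_comment_forgotten(payload)  # a reflected XSS bug
```

Opt-out systems such as Django's autoescape [2] invert the default: output is escaped unless explicitly marked safe, so the common failure (forgetting) becomes the safe case.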
4.5 Limitations of statistical analysis

We caution the reader against drawing strong, generalizable conclusions from our statistical analysis, and we view even our strongest results as merely suggestive but not conclusive. Although we entered this study with specific goals and hypotheses (as described in Section 2), results that appear statistically significant may not in fact be valid; they could be due to random chance.

When testing 20 hypotheses at a 0.05 significance level, we expect one of them to appear significant purely by chance. We tested 19 hypotheses in this study, and 3 of them appeared to be significant. Therefore, we should not be surprised if one or two of these seemingly significant associations are in fact spurious and due solely to chance. We believe more powerful studies with larger data sets are needed to convincingly confirm the apparent associations we have found.

5 Related work

In this section, we survey related work, which falls into 3 categories: (1) studies of the relationship between programming languages and application security, (2) comparisons of the effectiveness of different automated black-box web application penetration testing tools, and (3) comparisons of different bug- and vulnerability-finding techniques.

Programming languages and security. The 9th edition of the WhiteHat Website Security Statistic Report [26] offers what we believe is the best insight to date regarding the relationship between programming language and application security. Their data set, which includes over 1,500 web applications and over 20,000 vulnerabilities, was gathered from the penetration-testing service WhiteHat performs for its clients.

Their report found differences between languages in the prevalence of different vulnerability classes as well as the average number of serious vulnerabilities over the lifetime of the applications. For example, in their sample of applications, 57% of the vulnerabilities in JSP applications were XSS vulnerabilities, while only 52% of the vulnerabilities in Perl applications were XSS vulnerabilities. Another finding was that PHP applications were found to have an average of 26.6 vulnerabilities over their lifetime, while Perl applications had 44.8 and JSP applications had 25.8. The report makes no mention of statistical significance, but given the size of their data set, one can expect all of their findings to be statistically significant (though not necessarily practically significant).

Walden et al. [25] measured the vulnerability density of the source code of 14 PHP and 11 Java applications, using different static analysis tools for each set. They found that the Java applications had lower vulnerability density than the PHP applications, but the result was not statistically significant.

While these analyses sample across distinct applications, ours samples across implementations of the same application. Our data set is smaller, but its collection was more controlled. The first study focused on fixed combinations of programming language and framework (e.g., Java/JSP), and the second did not include a framework comparison. Our study focuses separately on language and framework.

Dwarampudi et al. [12] compiled a fairly comprehensive list of pros and cons of the offerings of several different programming languages with respect to many language features, including security. No experiment or data analysis were performed as a part of this effort.

Finally, the Plat Forms [19] study (from which the present study acquired its data) performed a shallow security analysis of the data set. They ran simple black-box tests against the implementations in order to find indications of errors or vulnerabilities, and they found minor differences. We greatly extended their study using both white- and black-box techniques to find vulnerabilities.
Automated black-box penetration testing. We are aware of three separate efforts to compare the effectiveness of different automated black-box web application security scanners. Suto [22] tested each scanner against the demonstration site of each other scanner and found differences in the effectiveness of the different tools. His report lists detailed pros and cons of using each tool based on his experience. Bau et al. [5] tested 8 different scanners in an effort to identify ways in which the state of the art of black-box scanning could be improved. They found that the scanners tended to perform well on reflected XSS and (first-order) SQL injection vulnerabilities, but poorly on second-order vulnerabilities (e.g., stored XSS). We augment this finding with the result that manual analysis performs better for stored XSS, authentication and authorization bypass, CSRF, insecure session management, and insecure password storage, and black-box testing performs better for reflected XSS and SQL injection.

Doupé et al. [11] evaluated 11 scanners against a web application custom-designed to have many different crawling challenges and types of vulnerabilities. They found that the scanners were generally poor at crawling the site, that they performed poorly against logic vulnerabilities (e.g., application-specific vulnerabilities, which often include authorization bypass vulnerabilities), and that they required their operators to have a lot of knowledge and training to be able to use them effectively.

While these studies compare several black-box tools to one another, we compare the effectiveness of a single black-box tool to that of manual source code analysis. Our choice regarding which black-box scanner to use was based in part on these studies.

Bug- and vulnerability-finding techniques. Wagner et al. [24] performed a case study against 5 applications in which they analyzed the true- and false-positive rates of three static bug-finding tools and compared manual source code review to static analysis for one of the 5 applications. This study focused on defects of any type, making no specific mention of security vulnerabilities. They found that all defects the static analysis tools discovered were also found by the manual review. Our study focuses specifically on security vulnerabilities in web applications, and we use a different type of tool in our study than they use in theirs.

Two short articles [8, 15] discuss differences between various tools one might consider using to find vulnerabilities in an application. The first lists constraints, pros, and cons of several tools, including source code analysis, dynamic analysis, and black-box scanners. The second article discusses differences between white- and black-box approaches to finding vulnerabilities.

6 Conclusion and future work

We have analyzed a data set of 9 implementations of the same web application to look for security differences associated with programming language, framework, and method of finding vulnerabilities. Each implementation had at least one vulnerability, which indicates that it is difficult to build a secure web application, even a small, well-defined one.

Our results provide little evidence that programming language plays a role in application security, but they do suggest that the level of framework support for security may influence application security, at least for some classes of vulnerabilities. Even the best manual support is likely not good enough; frameworks should provide automatic defenses if possible.

In future work, we would like to evaluate more modern frameworks that offer stronger support for preventing vulnerabilities. We are aware of several frameworks that provide automatic support for avoiding many types of XSS vulnerabilities.

We have found evidence that manual code review is more effective than black-box testing, but combining the two techniques is more effective than using either one by itself. We found that the two techniques fared differently for different classes of vulnerabilities. Black-box testing performed better for reflected XSS and SQL injection, while manual review performed better for stored XSS, authentication and authorization bypass, session management, CSRF, and insecure password storage. We believe these findings warrant future research with a larger data set, more reviewers, and more black-box tools.

We believe it will be valuable for future research to test the following hypotheses, which were generated from this exploratory study.

H1: The practical significance of the difference in security between applications that use different programming languages is negligible. If true, programmers need not concern themselves with security when choosing which language to use (subject to the support offered by frameworks available for that language).

H2: Stronger, more automatic, framework support for vulnerabilities is associated with fewer vulnerabilities. If true, recent advances in framework support for security have been beneficial, and research into more framework-provided protections should be pursued.

H3: Black-box penetration testing tools and manual source code review tend to find different sets of vulnerabilities. If true, organizations can make more informed decisions regarding their strategy for vulnerability remediation.
We see no reason to limit ourselves to exploring these hypotheses in the context of web applications; they are equally interesting in the context of mobile applications, desktop applications, and network services.

Finally, we note that future work in this area may benefit from additional data sources, such as source code repositories. These rich data sets may help us answer questions about (e.g.) developers' intentions or misunderstandings when introducing vulnerabilities and how vulnerabilities are introduced into applications over time. A deeper understanding of such issues will aid us in designing new tools and processes that will help developers write more secure software.

Acknowledgments

We thank Adrienne Felt, Erika Chin, and the anonymous reviewers for their thoughtful comments on earlier drafts of this paper. We also thank the Plat Forms team for their hard work in putting together the Plat Forms contest. This research was partially supported by National Science Foundation grants CNS-1018924 and CCF-0424422. Matthew Finifter was also supported by a National Science Graduate Research Fellowship. Any opinions, findings, conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

[1] CodeIgniter User Guide Version 1.7.3: Input Class. http://codeigniter.com/user_guide/libraries/input.html.

[2] django: Built-in template tags and filters. http://docs.djangoproject.com/en/dev/ref/templates/builtins.

[3] google-ctemplate. http://code.google.com/p/google-ctemplate/.

[4] perl.org glossary. http://faq.perl.org/glossary.html#TMTOWTDI.

[5] Bau, J., Bursztein, E., Gupta, D., and Mitchell, J. State of the art: Automated black-box web application vulnerability testing. In 2010 IEEE Symposium on Security and Privacy (2010), IEEE, pp. 332-345.

[6] Bishop, M. Computer Security: Art and Science. Addison-Wesley Professional, Boston, MA, 2003.

[7] Cohen, J. Best Kept Secrets of Peer Code Review. Smart Bear, Inc., Austin, TX, 2006, p. 117.

[8] Curphey, M., and Araujo, R. Web application security assessment tools. IEEE Security and Privacy 4 (2006), 32-41.

[9] Dalton, M., Kozyrakis, C., and Zeldovich, N. Nemesis: Preventing authentication & access control vulnerabilities in web applications. In USENIX Security Symposium (2009), USENIX Association, pp. 267-282.

[10] Donald, K., Vervaet, E., and Stoyanchev, R. Spring Web Flow: Reference Documentation, October 2007. http://static.springsource.org/spring-webflow/docs/1.0.x/reference/index.html.

[11] Doupé, A., Cova, M., and Vigna, G. Why Johnny Can't Pentest: An Analysis of Black-box Web Vulnerability Scanners. In Proceedings of the Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA) (Bonn, Germany, July 2010).

[12] Dwarampudi, V., Dhillon, S. S., Shah, J., Sebastian, N. J., and Kanigicharla, N. S. Comparative study of the Pros and Cons of Programming languages: Java, Scala, C++, Haskell, VB.NET, AspectJ, Perl, Ruby, PHP & Scheme. http://arxiv.org/pdf/1008.3431.

[13] Felmetsger, V., Cavedon, L., Kruegel, C., and Vigna, G. Toward Automated Detection of Logic Vulnerabilities in Web Applications. In Proceedings of the USENIX Security Symposium (Washington, DC, August 2010).

[14] Jaspal. The best web development frameworks, June 2010. http://www.webdesignish.com/the-best-web-development-frameworks.html.

[15] McGraw, G., and Potter, B. Software security testing. IEEE Security and Privacy 2 (2004), 81-85.

[16] Peters, T. PEP 20: The Zen of Python. http://www.python.org/dev/peps/pep-0020/.

[17] PortSwigger Ltd. Burp Suite Professional. http://www.portswigger.net/burp/editions.html.

[18] Prechelt, L. Plat Forms 2007 task: PbT. Tech. Rep. TR-B-07-10, Freie Universität Berlin, Institut für Informatik, Germany, January 2007.

[19] Prechelt, L. Plat Forms: A Web Development Platform Comparison by an Exploratory Experiment Searching for Emergent Platform Properties. IEEE Transactions on Software Engineering 99 (2010).

[20] Robertson, W., and Vigna, G. Static Enforcement of Web Application Integrity Through Strong Typing. In Proceedings of the USENIX Security Symposium (Montreal, Canada, August 2009).

[21] Shankar, U., Talwar, K., Foster, J. S., and Wagner, D. Detecting Format String Vulnerabilities with Type Qualifiers. In Proceedings of the 10th USENIX Security Symposium (2001), pp. 201-220.

[22] Suto, L. Analyzing the Accuracy and Time Costs of Web Application Security Scanners, February 2010. http://www.ntobjectives.com/files/Accuracy_and_Time_Costs_of_Web_App_Scanners.pdf.

[23] Thiel, F. Personal Communication, November 2009.

[24] Wagner, S., Jürjens, J., Koller, C., and Trischberger, P. Comparing bug finding tools with reviews and tests. In Proc. 17th International Conference on Testing of Communicating Systems (TestCom '05), volume 3502 of LNCS (2005), Springer, pp. 40-55.

[25] Walden, J., Doyle, M., Lenhof, R., and Murray, J. Java vs. PHP: Security Implications of Language Choice for Web Applications. In International Symposium on Engineering Secure Software and Systems (ESSoS) (February 2010).

[26] WhiteHat Security. WhiteHat Website Security Statistic Report: 9th Edition, May 2010. http://www.whitehatsec.com/home/resource/stats.html.