Penetration Testing for Web Services

Nuno Antunes and Marco Vieira, University of Coimbra, Portugal
Web services are often deployed with critical software security faults that open them to malicious attack. Penetration testing using commercially available automated tools can help avoid such faults, but new analysis of several popular testing tools reveals significant failings in their performance.

Web services commonly provide the strategic vehicle for content distribution, data exchange, and other critical processes within widely adopted service-oriented architectures (SOAs).1 As with other Web applications, Web services' exposure, in conjunction with improper software coding, can make it relatively easy for hackers to uncover and exploit security vulnerabilities, often by entering specially tampered values into input fields to search for weaknesses, a particularly vicious hacking technique known as command injection (which includes SQL injection). Such attacks introduce and execute commands that allow hackers to read, modify, and even destroy important information and resources, sometimes corrupting entire databases, thus opening application providers to considerable danger.2,3

Application vulnerability and security issues must be kept in mind over the entire software development life cycle: for example, by applying best practices during the design and implementation stages to avoid potential vulnerabilities, and by including runtime mechanisms in the environment to detect and remove potential vulnerabilities and counter possible attacks during deployment and testing.4 In response, various methods for identifying security vulnerabilities have evolved,5 including static and dynamic analysis, runtime anomaly detection, and penetration testing. The latter, which simulates multiple attempted incursions of malicious values from an attacker's perspective using a black-box approach to reveal specific vulnerabilities, is particularly useful for third-party testing in a Web services environment because it doesn't require that clients and providers have access to source code, as static analysis and runtime anomaly detection do.

Penetration testing may be undertaken manually; however, using automated tools for this process can save considerable time and money, and a number of such tools are now commercially available. The question, though, is the extent to which these tools, many regarded as state of the art, are truly effective, both in identifying a full range of potential vulnerabilities and in avoiding false-positive alarms. Our analysis of several widely used automated penetration testing tools, which we describe here, suggests that their performance is far from impressive for Web services security testing. Researchers and practitioners need to be aware of these limitations in order to lead the way in devising new tools and techniques that improve the effectiveness of vulnerability detection methodologies and so ensure better Web services security in the future.
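To make the command injection risk concrete, the following sketch (ours; the operation and table are hypothetical, not taken from the services discussed later, and resource handling is omitted) shows a Java data-access method that builds an SQL command by concatenating an input value, and the parameterized alternative that keeps a tampered value from changing the command's structure.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative sketch only: a hypothetical lookup operation showing how
// unsanitized input enables SQL injection and how a parameterized query avoids it.
public class CustomerLookup {

    // VULNERABLE: the input is concatenated into the command, so a value such as
    //   ' OR '1'='1
    // turns the query into ... WHERE name = '' OR '1'='1', returning every row.
    public ResultSet findByNameUnsafe(Connection con, String name) throws SQLException {
        Statement st = con.createStatement();
        return st.executeQuery("SELECT * FROM customer WHERE name = '" + name + "'");
    }

    // SAFER: the value is bound as a parameter and is never interpreted as SQL.
    public ResultSet findByName(Connection con, String name) throws SQLException {
        PreparedStatement ps = con.prepareStatement("SELECT * FROM customer WHERE name = ?");
        ps.setString(1, name);
        return ps.executeQuery();
    }
}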
Table 1. Overall number of vulnerabilities detected by an expert security team and four representative scanning tools.
the few that are free, such as WSFuzzer and WSDigger, simply automate the attacking process and log pertinent responses, requiring security expertise on the user's part as well as a great deal of time to examine all the results and manually identify vulnerabilities in the Web service being tested; we have not considered these for the purposes of this study. The three leading commercial Web security scanners that support Web services testing are HP WebInspect, IBM Rational AppScan, and Acunetix Web Vulnerability Scanner, all of which claim to scan for, identify, and assess Web application hacking vulnerabilities.

The problem is that, in practice, for purposes of vulnerability identification, penetration testing tools must rely on analysis of the Web application output; lack of access to the application's internal behavior limits their effectiveness, as previous work evaluating penetration testing tools has confirmed. In one study, Web security scanners performed poorly when they analyzed complex websites that had large amounts of client-side navigation code and form-based wizards.8 A later study evaluated 11 security scanners and reached similar conclusions, finding that none of the scanners was able to detect application-specific vulnerabilities.9 It is important to point out, however, that these studies were performed in the context of Web applications, which, as noted earlier, have characteristics different from those of Web services. This highlights the need for studies aimed at understanding the effectiveness of these tools in Web service-based environments.

EXPERIMENTAL STUDY: EFFECTIVENESS OF PENETRATION TESTING

To better understand the effectiveness of penetration testing in a Web services environment, we conducted an experimental study using four Web security scanners to detect vulnerabilities in a defined set of services. Three of the scanners we used (HP WebInspect, IBM Rational AppScan, and Acunetix Web Vulnerability Scanner) are widely available and considered to provide state-of-the-art support for Web services. The fourth scanning tool we used for our study is an academic prototype that implements an approach we have proposed elsewhere.10 For purposes of this discussion, the specific brands have been masked to assure neutrality and conform with licensing constraints, so we refer to the four tools as VS1, VS2, VS3, and VS4, with no particular assigned order. After some initial configuration, each of these automated tools is able to read the description file of a Web service, test it for vulnerabilities, and, at the end of the testing process, generate a file reporting the vulnerabilities found (if any).

Services used

For this study, we tested a set of 25 Web services with a total of 101 operations. The greater part of these services (20) was adapted from the implementations of three standard benchmarks developed by the Transaction Processing Performance Council (www.tpc.org), namely TPC-App, TPC-C, and TPC-W. These performance benchmarks cover Web services infrastructures, transactional systems, and e-commerce. (Although TPC-C and TPC-W do not define transactions in the form of Web services, they can easily be implemented and deployed as such.) The other five services were adapted from code publicly available on the Internet (www.planet-source-code.com). All of the services are implemented in Java and use a relational database to store data and SQL commands for data management, except one, which uses an XML data source. These services comprise more than 15 KLOC (thousands of lines of code), with an average cyclomatic complexity of 9, and were developed by independent programmers with no knowledge of our particular security study.

Performing a complete evaluation such as we proposed required knowing in advance the existing vulnerabilities in each of the services tested. For this purpose, we enlisted a security team of four people with different backgrounds and experience in secure development practices. This team reviewed the services' source code looking for vulnerabilities, cross-checking their findings to eliminate false positives. Their analysis determined that the 24 services (98 operations) using a relational database contain 201 SQL injection vulnerabilities, while the other service (three operations) has four XPath injection vulnerabilities. (These results suggest that the service developers did not pay much attention to security.)

This information provided the basis for a deep analysis of the effectiveness of the scanners. First, we performed a false-positives analysis to determine which of the reported vulnerabilities do not in fact exist.
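As a rough illustration of what such scanners automate (a sketch of ours, not the internals of any tool tested; the endpoint, operation, namespace, payloads, and error signatures are hypothetical placeholders), a black-box test ultimately amounts to submitting requests whose parameter values carry attack payloads and inspecting the responses for symptoms of injection.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Minimal black-box probe: send SOAP requests whose parameter value carries an
// attack payload and flag responses that look like database errors.
public class InjectionProbe {

    private static final List<String> PAYLOADS = List.of(
            "' OR '1'='1",
            "'; DROP TABLE customer--",
            "1 AND 1=2 UNION SELECT null--");

    private static final List<String> ERROR_SIGNATURES = List.of(
            "SQLException", "syntax error", "ORA-", "unclosed quotation");

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (String payload : PAYLOADS) {
            String envelope =
                "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:svc=\"http://example.org/customer\"><soapenv:Body>"
                + "<svc:findByName><svc:name>" + payload + "</svc:name></svc:findByName>"
                + "</soapenv:Body></soapenv:Envelope>";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8080/services/Customer"))
                    .header("Content-Type", "text/xml; charset=utf-8")
                    .POST(HttpRequest.BodyPublishers.ofString(envelope))
                    .build();

            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            // Heuristic: a database error leaking into the response suggests the payload
            // reached the SQL command. As discussed later, guesses of this kind are what
            // produce both missed vulnerabilities and false positives.
            boolean suspicious = ERROR_SIGNATURES.stream()
                    .anyMatch(sig -> response.body().contains(sig));
            System.out.printf("payload=%-35s suspicious=%b%n", payload, suspicious);
        }
    }
}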
Overall results

Table 1 presents the overall results of our study, showing the total number of vulnerabilities reported by each tool. As can be observed, each of the penetration testing tools reported a different number of vulnerabilities. This is a first indicator that tools implement different forms of penetration testing and that the outputs from different tools may be difficult to compare. Another observation worth noting is that all four tools detected SQL injection vulnerabilities, but only two (VS1 and VS4) reported XPath injection issues. Although the actual number of XPath-related vulnerabilities is quite small in comparison to SQL injection, this imbalance also holds in real-world scenarios, because Web services (as is the case in other Web applications) more often use a traditional database than XML solutions for storing information. This fact may also explain why some scanning tools apparently do not include features to detect XPath vulnerabilities.

A key aspect of our analysis not reported in the table, but still worth mentioning: different tools reported different vulnerabilities even in the same Web services. For example, although the number of SQL injection vulnerabilities reported by VS1 is much higher than the number of vulnerabilities detected by the other scanners, the other three actually detected some vulnerabilities not reported by VS1. We will return to this point later.

Figure 1. Number of false-positive vulnerabilities reported by Web security scanning tools compared with vulnerabilities reported by a team of security experts.

The percentage of reported false positives is very high for scanners VS1 and VS2 and certainly in the high range for VS4: in the case of VS1 and VS2, more than half of the reported vulnerabilities do not exist, and in the case of VS4, false positives account for more than one-third of the vulnerabilities reported. This level of false positives suggests that in many instances, software developers may waste considerable effort "fixing" nonexistent vulnerabilities, reducing overall confidence in the tools. The number of false positives is due to the use of heuristics that detect vulnerabilities by evaluating Web service responses; these heuristics, while facilitating detection of vulnerabilities that would otherwise be undetectable, also cause the scanners to report a large number of nonexistent vulnerabilities. It is interesting to note, by the way, that the three XPath injection vulnerabilities reported by VS1 (two) and VS4 (one) all correspond to true positives.

VS3, on the other hand, reported no false-positive alarms. Obviously, one factor contributing to this low number of false positives is the relatively small number of vulnerabilities reported by VS3 overall. This scanner would seem to employ a very conservative detection approach that, although avoiding reports of false positives, also leaves many vulnerabilities undetected.
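This trade-off can be illustrated with a small sketch (ours, using hypothetical signature lists rather than the logic of any scanner tested): a heuristic that matches broad error patterns catches more real injection points but also flags services that mention errors for perfectly legitimate reasons, while a heuristic that accepts only unmistakable database error messages avoids false alarms at the cost of missing vulnerabilities whose symptoms are less explicit.

import java.util.List;
import java.util.regex.Pattern;

// Illustration of why response-based heuristics trade false positives for coverage.
// Both signature lists are hypothetical, not taken from any of the scanners tested.
public class ResponseHeuristic {

    // Aggressive rules: flag anything that vaguely resembles a database problem.
    private static final List<Pattern> AGGRESSIVE = List.of(
            Pattern.compile("(?i)error"),
            Pattern.compile("(?i)exception"),
            Pattern.compile("(?i)sql"));

    // Conservative rules: flag only unmistakable database error messages.
    private static final List<Pattern> CONSERVATIVE = List.of(
            Pattern.compile("(?i)you have an error in your sql syntax"),
            Pattern.compile("(?i)unclosed quotation mark after the character string"));

    static boolean flagged(List<Pattern> rules, String responseBody) {
        return rules.stream().anyMatch(p -> p.matcher(responseBody).find());
    }

    public static void main(String[] args) {
        // A legitimate response from a non-vulnerable operation that merely mentions "error".
        String benign = "<result>No customer found; input error: name must not be empty</result>";
        // A response that leaks a raw database error after an injected quote.
        String leaky = "<fault>You have an error in your SQL syntax near ''' at line 1</fault>";

        System.out.println(flagged(AGGRESSIVE, benign));    // true  -> a false positive
        System.out.println(flagged(CONSERVATIVE, benign));  // false
        System.out.println(flagged(AGGRESSIVE, leaky));     // true
        System.out.println(flagged(CONSERVATIVE, leaky));   // true
    }
}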
Table 2. Detection coverage rates for penetration testing by four representative security scanners.

Scanner    Coverage rate (%)
VS1        31.22
VS2        20.49
VS3         2.93
VS4        23.41

No scanner detected more than about one-third of the total number of vulnerabilities. VS1 clearly provided the best coverage, but even so detected only about 31 percent of the known vulnerabilities. VS2 and VS4 each detected fewer than a quarter of the vulnerabilities, with VS4 offering a slight edge, a difference that is actually significant considering that VS4 also reported many fewer false positives. VS3 provided an extremely low coverage rate; its conservative detection approach (which, as we have noted, avoids false positives) left 97 percent of the known vulnerabilities undetected. It is worth pointing out that although VS1 had the highest detection coverage, it is also the scanner that reported the most false positives.

Understanding the relationship among the actual vulnerabilities reported by each scanner requires a different type of analysis. Figure 2 shows the overlap among the sets of vulnerabilities reported by each scanning tool (after removing false positives), with the area of each circle roughly proportional to the number of vulnerabilities detected by the corresponding tool.

Clearly, there were differences among the four tools in terms of the actual vulnerabilities reported, which highlights the difficulty involved in selecting and using a single tool for penetration testing. As Figure 2 shows, for example, 17 of the vulnerabilities were detected only by VS1, and only five of the vulnerabilities were detected by all four scanners, although this low total in terms of overall detection coverage is obviously affected by the low coverage of VS3. VS4 was able to detect all the vulnerabilities detected by VS2 and VS3 plus six more. And VS1, while detecting the most vulnerabilities overall, failed to detect one vulnerability that all the other tools detected, which is particularly interesting given the very low number of vulnerabilities detected by VS3. (As far as we were able to determine, this specific vulnerability is located in a difficult-to-reach code path that the requests performed by VS1 were unable to exercise.)

Equally interesting, 140 vulnerabilities went undetected by all four tools, emphasizing even more strongly the difficulty of selecting one tool to use. Manual analysis of the services reveals that many of those undetected vulnerabilities are located in parts of the code that are hard to reach via black-box testing, and the workloads generated by the tools are not yet complete enough to execute those code paths. There are also situations where an undetected vulnerability is preceded by another very similar one, such that the second can only be detected after the first has been fixed (that is, because the first is exploited, execution never reaches the second).

LESSONS LEARNED

Penetration testing is fundamental for deploying secure code, particularly for consumers testing in Web services environments where internals are inaccessible. But, as previous research and our own study have shown, currently available commercial Web security scanning tools are not entirely satisfactory for purposes of penetration testing of Web applications and Web services.

Essentially, users of automated penetration testing face two problems. First, the very high number of false positives reported by available penetration testing tools reduces developer confidence in their precision and also lowers the productivity of development teams, who must analyze vulnerabilities that in fact do not exist. Second, the relatively limited vulnerability detection coverage provided by available tools inevitably means that significant numbers of vulnerabilities remain undetected, a major concern for applications with the level of exposure to external environmental factors that Web services have.

These problems result from the intrinsic limitations of penetration testing as a black-box technique: it has no access to the internal behavior of the tested services and can observe an application only from the point of view of an external user. These limitations have two primary ramifications:

• The fact that vulnerability detection is based solely on analysis of a Web service's output leads to a lack of information for decision making. Tool capability is restricted because the amount of information released to the client is insufficient to detect vulnerabilities effectively. Moreover, the fact that application output is often processed to limit (or prohibit) leakage of any system information can make detecting vulnerabilities impossible (even though vulnerabilities exist and testing tools may effectively exploit them).

• The fact that the tool cannot identify all the appropriate inputs necessary to maximize the number of Web service code paths that are tested results in inadequate code coverage. Obviously, if some paths of code are not executed during the testing process, any vulnerabilities that occur in those pieces of code will not be detected (see the sketch after this list).
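The sketch below (ours, with a hypothetical operation rather than code from the services studied) illustrates the second ramification: the injectable query sits behind a precondition, so unless the generated workload supplies an order ID that actually exists, the vulnerable statement never executes and no amount of response analysis can reveal the flaw.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical operation: the injectable query sits on a code path that is only
// executed when the supplied order ID exists. A scanner probing with random or
// default values rarely satisfies that precondition, so the vulnerable statement
// is never built and no response can betray it.
public class OrderService {

    public String getOrderStatus(Connection con, String orderId, String customerName)
            throws SQLException {
        // Precondition checked with a safe, parameterized query.
        PreparedStatement exists = con.prepareStatement(
                "SELECT COUNT(*) FROM orders WHERE o_id = ?");
        exists.setString(1, orderId);
        ResultSet rs = exists.executeQuery();
        rs.next();
        if (rs.getInt(1) == 0) {
            return "UNKNOWN_ORDER";   // black-box probes almost always end up here
        }

        // Hard-to-reach path: reached only for a valid order ID, and vulnerable
        // because the values are concatenated into the SQL command.
        Statement st = con.createStatement();
        ResultSet status = st.executeQuery(
                "SELECT status FROM orders WHERE o_id = '" + orderId
                + "' AND customer = '" + customerName + "'");
        return status.next() ? status.getString(1) : "NOT_FOUND";
    }
}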
Clearly, innovative approaches are necessary to overcome the present limitations of penetration testing. These must be designed both to improve the quality of the tests and to relax black-box constraints to increase internal visibility within the tested Web service, but without requiring access to the source code. Such visibility can be achieved via two different techniques, one having a greater degree of intrusiveness than the other. The more intrusive technique would involve injecting the tested Web service with specially localized probes that provide sensitive information about its internal behavior during the penetration testing process; using the less intrusive method, developers could monitor all interfaces between a Web service and appropriate external resources, looking for information that might help to unveil vulnerabilities. The assumption underlying this second method is that the most crucial vulnerabilities manifest in the interfaces between the Web service and those external resources.
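To make the less intrusive option concrete, here is a minimal sketch (our own illustration, not the prototype mentioned above or any tool from the study) of how the SQL interface between a service and its database could be observed during a penetration test: a dynamic proxy around the JDBC connection records every command the service actually issues, so a tester can check whether an injected payload changed the structure of the executed SQL instead of guessing from the service's response.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.Statement;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal interface monitor (illustrative): wraps a JDBC Connection so that every
// SQL string handed to the driver is recorded. During a penetration test, the
// recorded commands show whether an injected payload actually changed the
// structure of the SQL sent to the database.
public final class SqlInterfaceMonitor {

    // Commands observed while the penetration test runs.
    public static final List<String> OBSERVED_SQL = new CopyOnWriteArrayList<>();

    private SqlInterfaceMonitor() { }

    public static Connection monitor(Connection real) {
        return (Connection) wrap(real, Connection.class);
    }

    private static Object wrap(Object target, Class<?> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Record any SQL string passed to execute*/prepare* methods.
            if ((method.getName().startsWith("execute") || method.getName().startsWith("prepare"))
                    && args != null && args.length > 0 && args[0] instanceof String) {
                OBSERVED_SQL.add((String) args[0]);
            }
            Object result;
            try {
                result = method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();   // rethrow the driver's original exception (e.g., SQLException)
            }
            // Wrap returned Statement objects so their execute* calls are observed too.
            if (result instanceof Statement && method.getReturnType().isInterface()) {
                return wrap(result, method.getReturnType());
            }
            return result;
        };
        return Proxy.newProxyInstance(SqlInterfaceMonitor.class.getClassLoader(),
                new Class<?>[] { iface }, handler);
    }
}

In a test deployment, connections would be obtained through SqlInterfaceMonitor.monitor(...); after each attack request, the tester inspects OBSERVED_SQL for commands whose structure was altered by the payload (an extra OR clause, for example), which is far stronger evidence of a vulnerability than anything visible in the SOAP response.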
Despite current limitations, penetration testing will continue to play an important role in evaluating Web services security. It is, therefore, important to keep improving these tools, using better workload and attackload generation techniques, and devising new mechanisms to detect vulnerabilities. Equally important is the ability to correlate information provided by different types of requests (regular requests, robustness-testing requests, malicious requests, and others). Furthermore, penetration testing tools should be based on standardized and consistent procedures, implementing a well-defined set of testing components to provide integrated support for detecting a maximum number of vulnerabilities and a minimal number of false positives. A generic penetration testing tool for Web services that combines all these attributes is a goal for the future.

Finally, security concerns should be a paramount consideration throughout the entire software development process, not just during the testing phase. Applying multiple security best practices at every step to reduce potential security problems will require that developers correctly use approaches and tools already at their disposal, as well as improvements in current techniques and innovative new methods for security assessment.

References
1. D.A. Chappell and T. Jewell, Java Web Services, O'Reilly Media, 2002.
2. M. Vieira, N. Antunes, and H. Madeira, "Using Web Security Scanners to Detect Vulnerabilities in Web Services," Proc. 2009 IEEE/IFIP Int'l Conf. Dependable Systems & Networks (DSN 09), IEEE, 2009, pp. 566–571.
3. Open Web Application Security Project, OWASP Top 10 - 2013, OWASP Foundation, 2013.
4. M. Howard and D.E. LeBlanc, Writing Secure Code, 2nd ed., Microsoft Press, 2004.
5. D. Stuttard and M. Pinto, The Web Application Hacker's Handbook: Finding and Exploiting Security Flaws, 2nd ed., Wiley, 2011.
6. L. Richardson and S. Ruby, RESTful Web Services: Web Services for the Real World, O'Reilly Media, 2007.
7. D.P. Freedman and G.M. Weinberg, Handbook of Walkthroughs, Inspections, and Technical Reviews: Evaluating Programs, Projects, and Products, 3rd ed., Dorset House, 2000.
8. M. Curphey and R. Araujo, "Web Application Security Assessment Tools," IEEE Security & Privacy, vol. 4, no. 4, 2006, pp. 32–41.
9. A. Doupé, M. Cova, and G. Vigna, "Why Johnny Can't Pentest: An Analysis of Black-Box Web Vulnerability Scanners," Proc. 7th Int'l Conf. Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 10), Springer, 2010, pp. 111–131.
10. N. Antunes and M. Vieira, "Detecting SQL Injection Vulnerabilities in Web Services," Proc. 4th Latin-American Symp. Dependable Computing (LADC 09), Springer, 2009, pp. 17–24.
11. N. Laranjeiro, M. Vieira, and H. Madeira, "Improving Web Services Robustness," Proc. 2009 IEEE Int'l Conf. Web Services (ICWS 09), IEEE, 2009, pp. 397–404.
Nuno Antunes is a PhD student in the Department of Informatics Engineering at the University of Coimbra, Portugal, where he received an MSc in informatics engineering. His research interests include methodologies and tools for developing secure Web applications and services. Antunes is a member of the IEEE Computer Society. Contact him at [email protected].

Marco Vieira is an assistant professor in the Department of Informatics Engineering at the University of Coimbra, Portugal. His research interests include dependability and security benchmarking, experimental dependability evaluation, fault injection, software development processes, and software quality assurance. Vieira received a PhD in computer engineering from the University of Coimbra. He is a member of the IEEE Computer Society. Contact him at [email protected].