Chamilton
Colin B Hamilton
December 15, 2015
The benefit of a language-based approach is that it is built in to the developer’s environment, instead of being a separate piece that can be ignored. It gives assurances (to the programmer, team, and organization) that some vulnerabilities will not exist, and it does this automatically, by default. A secure programming language could be an invaluable tool in improving the state of security.

3 Preventing Injection

SQL injection is an incredibly common vulnerability, and it has been this way essentially since it was first discovered. A recent Vice article gives an overview of the history and basics of SQL injection, and remarks on the fact that it has been a common attack for over fifteen years.[2] The article credits Jeff Forristal with the first documentation of this vulnerability in 1998, when he explained that users could “possibly piggyback SQL commands.” In their input they can include an unexpected value that “forces the database to do something it’s not meant to do.”[2]

General injection attacks are not limited to SQL injection, though that is a very common form. Injection can take place any time user input is used to create a command meant to be run by an external application. If input is not properly handled (which is frequently the case), users can insert special control characters. These can then change the meaning of the query from what was intended.

As an example, suppose a webpage validates users with the following code:

    password = request.get('pass')
    login = request.get('login')
    query = ("SELECT id FROM users WHERE password = SHA1('" + password +
             "') AND login = '" + login + "'")

When input is submitted normally, e.g. login=mchow01&pass=baseball01234, the query becomes

    "SELECT id FROM users WHERE password = SHA1('baseball01234') AND login = 'mchow01'"

which works as expected. But if instead a user submits a login of the form pass=blah&login=none' OR '1' = '1, the query becomes

    "SELECT id FROM users WHERE password = SHA1('blah') AND login = 'none' OR '1' = '1'"

which, because '1' = '1' is always true (and AND has higher precedence than OR), will select all users in the database.

This example shows the key behind injection attacks: the attacker is trying to get their own input executed as part of a command. Techniques exist to prevent such attacks, including stripping control characters (like the single quote in the example) from user input strings, or replacing them with escape sequences so they are not interpreted as control characters. But what if a developer forgets to do such a thing? Ideally, the mistake could be caught before the query is executed.

3.1 Tainted Variables

The idea of tainted variables has existed since early versions of the Perl programming language, and at least as early as 2001 in Ruby.[3, 4] The idea behind tainting comes from the knowledge that user input can never be trusted. Therefore, any values that have come from a user should be considered dangerous. They carry a taint that is passed on to any values derived from them. Specifically, if a tainted string is concatenated with another string, the resulting string will also be tainted.[3]

This results in taint propagating throughout the program. Operations that must be secure can inspect their parameters to see if they are tainted, and could, for example, report an error if an attempt was made to run them on tainted data.

The approach of tainted variables can help developers analyze a system that is already in place. Because the path of tainted data is clear, they can find the operations that might be fed dangerous data, and work to secure them.

Using taint for anything more than analysis, however, is not likely to help. The reason is that, for innumerable types of queries made by web applications, the query requires user data. For example, the application may need to verify a login or search for a particular term. In these cases, use of the user’s input in the query would mark the entire query as tainted, though it may be perfectly safe. There must, then, be some way of removing taint. It would make sense to remove the taint from a variable if it passes an analysis.

This brings us back to validation of input strings. Ultimately, injection attacks work because they can masquerade as a normal command, so validation of the query string cannot happen after it has been constructed. Hence, the developer would have to check tainted variables before constructing their query string. But if we still require developers to do this manually, what help was the language in keeping track of taint?

This, in addition to other, more specific concerns (such as user inputs being used in multiple types of commands), suggests that taint checking, while a good tool for analysis, will not work as a general solution to the problem of injection.

3.2 Embedded Languages

Many efforts, especially recently, have focused on embedding the language of concern into the host language. Commands, instead of being constructed with strings, would be special literal values that the language environment can interpret. That way, the language is responsible for building and executing the query, so it can properly deal with pieces that come from the user.

One approach to this issue is that taken by the Wyvern programming language, currently in development.[5] Instead of strings, Wyvern introduces what its authors call type-specific languages: literals that allow one to easily construct an object.[6] For example, a library could define a SQLQuery type that allows the following syntax to construct a query:

    let query : SQLQuery = SELECT id FROM users
        WHERE password = SHA1({password}) and login = {login}

This literal would be constructed into a SQLQuery object, and the inputs would be properly and safely integrated into the query. Note that this is not a string; instead, everything after the assignment operator is parsed according to the SQLQuery type’s specification. The input strings are placed into the resulting data structure, which unambiguously marks them as literals, so that injection is impossible.[5]

The extensible nature of this system also means that it does not rely on the language designers’ foresight. This system can be applied to any kind of structured data, including HTML, XML, JSON, and command line scripts. When future languages are created, this system allows for them to be integrated too.[6]

Though research and development on Wyvern is still underway, mainstream languages have attempted to adopt similar features. The most common approach is through templating, which allows programmers to place placeholders within their query string that are safely filled in by the API. This is the approach taken by Python’s MySQLdb module and Java’s java.sql package.

And yet, injection vulnerabilities are still common. Template strings are still strings, and can still be constructed through concatenation. Templating and embedded languages could eventually help, but it seems for the time being that programmers are more comfortable sticking to what they know, namely string concatenation. As long as these concatenated strings can still be executed as queries, with no oversight, these vulnerabilities will likely remain.

3.3 Record-Keeping Strings

The earlier description of injection provides intuition, but for a truly foolproof prevention technique, we need a more precise definition of such an attack. In their paper “The Essence of Command Injection Attacks in Web Applications,” Zhendong Su and Gary Wassermann provide what they present as the first formal definition of an injection attack. They formulate the definition based on how user input strings map to the abstract syntax tree of the resulting query.[7]

To facilitate their definition, they consider the query string not as a single, unbroken string, but
rather as a concatenation of substrings. This is, after all, how the query is built. Some of these substrings came from the user, while others did not. Parts that came from a user are enclosed in delimiters, so the specific portions of a query that might be dangerous can be identified.

To execute a query, it must be parsed and turned into an abstract syntax tree (AST), representing the meaning of the query. The AST has branches based on how the query is structured. Figure 1 shows an example of the AST of a normal SQL query.

Figure 1: A subtree of the AST for a valid SQL query. Note that the user input strings correspond to specific subtrees (str_lit nodes, in this case).

An injection attack can result in either a valid or an invalid AST. The latter case is of little interest, as a malformed query cannot be executed anyway. Serious potential attacks are those that do map to a valid AST, and so could be executed. Figure 2 gives an example of an AST for a successful injection attack.

Figure 2: A subtree of the AST in an SQL injection attack. Note that with the addition of the disjunction node and its right subtree, the user input string crosses subtrees. No single subtree encompasses the user’s input.

The difference between the two ASTs, Su and Wassermann argue, is that in the first, the substrings from the user correspond precisely with subtrees in the AST. In the injection attack, however, they cross subtree boundaries, essentially escaping the part of the AST they were intended to be confined to. The authors use this observation to formulate a definition of a SQL command injection attack. Essentially, their definition states that a query is an injection attack if and only if there exists a substring s that came from a user, for which there is no node in the AST whose descendant leaves comprise s.[7]

Discovering such an attack requires checking the substrings that came from the user against all nodes of the AST. If there is a substring with no matches, then this is an injection attack. Importantly, this approach requires tracking user input throughout the program. The paper’s solution is more specific, however, than taint checking. The latter can only identify a binary state: a given string value is either tainted, or is not. Their solution is instead to have each string keep track of any substrings that originally came from users by enclosing them in delimiters.

This kind of behavior is difficult to integrate with existing languages, because it interferes with existing algorithms that operate on strings. Inserting delimiters around inputs only truly works if the delimiters 1) cannot be removed or modified by programmers, 2) cannot be part of the original input string, but 3) can be observed by code checking for injection.

Su and Wassermann’s solution is to choose delimiters as a random sequence of alphabetic characters that are not English words. This satisfies the third
requirement, and the authors reason that, for the second requirement, collision is unlikely. For the first requirement, they say that they suspect that programmers will probably not filter out alphabetic characters from a user input string.[7]

Their solution is based on heuristics, which is perhaps the best that can be done if the only hope is to retrofit less extensible programming languages with this functionality. But it is possible to do better in general, by building a string class that recalls the pieces that were used to build it, while supporting normal string operations, and allowing inspection of the string to determine the pieces that came from users.

A proof of this concept is presented in Python, which was chosen because of its extensibility.

4 Application

As a supplement to this paper, I provide a link to a set of three Python modules that can be used to write a web server, available at https://github.com/cbh66/safeserver. Alongside it is an example server application that is immune to SQL injection despite using unchecked string concatenation to create its queries.

The modules rely on a common definition for a SafeString class. This class supports all of the same operations as a normal Python string. In addition, SafeString objects keep a record of the components used to build the string, specifically those considered unsafe. But importantly, developers can interact with such objects without knowing this; they can use and build them like a normal string.

The other two modules lie between the developer’s code, the server code, and the database code. The first module, safeserver, is a “go-between” for the developer and the server framework. It simply modifies the built-in methods for retrieving user input for get and post parameters, returning a SafeString instead of a normal string. The second module, safesql, is for interacting with SQL databases. It handles SafeString parameters specially, which it can do because SafeStrings have a record of which pieces came from users.

Importantly, use of these modules guarantees that if all user input is obtained through the server API methods, and if all queries are made through its SQL methods, then injection is impossible. Notably, clients of the code can treat inputs as if they were strings. Unfortunately, because the implementation is an addition to an existing language, instead of a built-in feature, it is possible to get around it, specifically by modifying the private variables of the object or by casting it to a raw string. Barring these kinds of abnormal interactions, security from SQL injection is guaranteed.

This is primarily a proof of concept, and has not been optimized for time or space efficiency. The safeserver module works only with the webapp2 server configuration, although the “go-between” code is so short that it can easily be adapted to other frameworks as well. Further, processing of SafeStrings has not yet been applied to other queries or code fragments that could also be subject to injection attacks (cross-site scripting, directory traversal, etc.). Again, however, the same techniques used for writing the SQL module could be applied to these other areas as well. All that is needed is a parser for the format of interest.

Python was chosen for this implementation because it allows for operator overloading with dynamic dispatch, and does not treat built-in strings any differently from user-defined classes. The same cannot be said for many other languages, where this solution would not work. However, this proof of concept is not about modifying existing languages. It is meant to demonstrate that such strings could exist as a built-in feature of a language.

5 Other Vulnerabilities

Although this paper has primarily focused on preventing injection, there are many other vulnerabilities that are common in software. Some could be helped by language design, while others may require other approaches.
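Before turning to those other vulnerabilities, the record-keeping string described in Section 4 can be made concrete with a minimal sketch. It is not the SafeString or safesql code from the supplement: the names TrackedStr, to_parameterized, run_query, and login_query are hypothetical, only concatenation is supported, and instead of checking user substrings against an AST it takes the simpler route of turning each user piece into a query placeholder. It assumes Python's standard sqlite3 module, and it omits the SHA1 call from the earlier example because SQLite does not provide that function by default.

```python
import sqlite3


class TrackedStr:
    """A string that remembers which of its pieces came from a user."""

    def __init__(self, text="", from_user=False):
        # Each piece is a (text, from_user) pair; empty text adds no piece.
        self.pieces = [(text, from_user)] if text else []

    @staticmethod
    def _wrap(value):
        # Plain strings written by the programmer count as trusted.
        return value if isinstance(value, TrackedStr) else TrackedStr(str(value))

    def __add__(self, other):
        result = TrackedStr()
        result.pieces = self.pieces + TrackedStr._wrap(other).pieces
        return result

    def __radd__(self, other):
        # Called for "literal" + tracked, where the left operand is a str.
        return TrackedStr._wrap(other) + self

    def __str__(self):
        return "".join(text for text, _ in self.pieces)


def to_parameterized(query):
    """Return (sql, params) with every user piece replaced by a placeholder.

    Simplifying assumption: each user piece sits directly between two
    single quotes supplied by trusted text, as in the login example.
    """
    pieces = list(query.pieces)  # copy, so the query itself is not modified
    sql, params = "", []
    for i in range(len(pieces)):
        text, from_user = pieces[i]
        if not from_user:
            sql += text
            continue
        has_trusted_next = i + 1 < len(pieces) and not pieces[i + 1][1]
        following = pieces[i + 1][0] if has_trusted_next else ""
        if not (sql.endswith("'") and following.startswith("'")):
            raise ValueError("user input used outside a quoted literal")
        sql = sql[:-1] + "?"                    # opening quote -> placeholder
        pieces[i + 1] = (following[1:], False)  # drop the closing quote
        params.append(text)
    return sql, tuple(params)


def run_query(conn, query):
    sql, params = to_parameterized(query)
    return conn.execute(sql, params).fetchall()


def login_query(login, password):
    # Stand-in for a framework hook that wraps request parameters.
    login, password = TrackedStr(login, True), TrackedStr(password, True)
    return ("SELECT id FROM users WHERE password = '" + password +
            "' AND login = '" + login + "'")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, login TEXT, password TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "mchow01", "baseball01234"), (2, "alice", "hunter2")])

normal = login_query("mchow01", "baseball01234")
attack = login_query("none' OR '1' = '1", "blah")
print(run_query(conn, normal))  # the one matching user
print(run_query(conn, attack))  # no rows: the input stayed a literal
```

The developer's code in login_query is ordinary string concatenation, yet the malicious login never reaches the SQL parser as query text. Calling str(attack) and executing the result directly would reproduce the vulnerable query, which mirrors the caveat above about casting to a raw string.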
5.1 Secure Data Flow

While it may be less of a direct threat than injection attacks, information leakage can be a risk for a system that is meant to restrict access to certain categories of data. Manual checking of credentials is easy for a developer to forget, and verifying that an application properly restricts information flow is very difficult: any tool that would analyze it would need to know how the data is meant to be restricted.

The easiest way to restrict access to data is by giving labels to data that describe who can read and write to it. From there, it is easy to automate checking of labels when access to labeled data is attempted.

One implementation of data labeling is suggested in the design of a language called Laminar. The authors place security of resources into the language’s type system. They make the distinction between secrecy labels (which restrict who can read), and integrity labels (which restrict who can write).[8]

Actors (i.e. functions, threads, and processes) are restricted to reading only data for which they have the secrecy label, and writing to resources for which they have the integrity label. Some actors may be given the ability to classify or declassify information by copying it with a higher or lower set of secrecy labels.

The important part is that these labels must be given explicitly by the programmer. It is fairly easy to do so: programmers enclose secure blocks with a secure keyword, which specifies the secrecy and integrity levels they are requesting. It is assumed that this will happen only infrequently, and so will not be a large burden on the programmer. Moreover, it requires the author to make explicit which sections of the code are critical.[8]

A downside of the approach presented is that, to be feasible in a system where processes interact with system resources, the operating system itself must take charge of enforcing labels. This requires the introduction of system calls to establish labels of processes and shared resources (like files and pipes). The paper was able to accomplish this with about 1500 lines of code added to the Linux kernel.[8] While this is not much, it is perhaps too much to expect organizations to change their operating system.

5.2 Problems That Can’t Be Solved

A developer’s programming language can be a powerful tool to prevent her from making mistakes. It is worth noting, however, that there are security flaws that are unlikely to be solved by programming language design.

5.2.1 Insecure Authorization and Weak Passwords

The second vulnerability in OWASP’s top ten ranking is “Broken Authentication and Session Management.”[1] This class of flaw does not originate in a specific program, and so does not fall under the jurisdiction of a programming language. Instead, it is based on the design of the system. Examples of these flaws are exposing authentication tokens, improperly hashing passwords in a database, and sending authorization information over insecure channels.

Related to this flaw is the use of weak passwords by users. Users with weak passwords are far more likely to have their accounts compromised, so it is in their interest that they not be allowed to make passwords that are easily guessed or found with a password cracker. Building a verification system into the language environment is impractical, in part because of the performance penalty, and in part because there is not likely to be a one-size-fits-all solution.

5.2.2 Security Misconfiguration

The fifth vulnerability in OWASP’s top ten ranking is “Security Misconfiguration.”[1] These weaknesses can come from forgetfulness (leaving a setting enabled) or miscommunication between developers in different areas. A language would be unlikely to catch many of these issues (e.g. open ports or web pages, extra privileges) because it could not know what was intended.

5.2.3 Social Engineering

The nature of social engineering is manipulation and deceit: using human nature to get access to a system or its information. This can take many forms, including email phishing, dropping a dangerous USB
outside an office, impersonation, etc. There is not likely to be a good technological defence against these attacks until our machines are intelligent enough to recognize them (and that might take some time). As it is, the best defence is the establishment of procedures for these types of situations, along with proper training of employees.

6 Conclusion

Programming language design has introduced powerful concepts that help users design, build, and debug systems. Integrating security into the language is an extension of that premise. For those languages that lend themselves to extensibility, adding modules could achieve this goal, as shown in the proof of concept supplement. For the more general problem, however, such issues should be taken into account when new languages are designed, as a core part of the language. Security concerns too often come as an afterthought in building systems, leading to the state of computer security today. If security is ever going to be improved, this must change, and what better place to start than in improving the tools we use?

References

[1] OWASP. “OWASP Top Ten Project”, (Columbia, MD: Open Web Application Security Project). https://www.owasp.org/index.php/Top_10_2013-Top_10

Inc, 2001. http://ruby-doc.com/docs/ProgrammingRuby/html/taint.html

[5] Darya Kurilova, Alex Potanin, and Jonathan Aldrich. “Wyvern: Impacting Software Security via Programming Language Design”, October 2014. http://www.cs.cmu.edu/~aldrich/papers/plateau14-wyvern.pdf

[6] Cyrus Omar et al. “Safely Composable Type-Specific Languages”, http://www.cs.cmu.edu/~aldrich/papers/ecoop14-tsls.pdf

[7] Zhendong Su and Gary Wassermann. “The Essence of Command Injection Attacks in Web Applications”, January 2006. http://web.cs.ucdavis.edu/~su/publications/popl06.pdf

[8] Indrajit Roy et al. “Laminar: Practical Fine-Grained Decentralized Information Flow Control”, June 2009. http://www.cs.utexas.edu/users/witchel/pubs/roy09pldi.pdf