Implementation of Automata Theory in Database Engine
Implementation of Automata Theory in Database Engine
By
MujtabaRafiq (Reg# 16620)
Asad Ali (Reg# 16621)
CSC-340
Automata Theory
BS (CS) 4-A
Fall 2010
Submitted To: Asif Ali
Table of Contents
Automata 1
Simple Automaton 1
Grammars 2
Database Systems 3
Database 3
SQL 4
Relational Database 4
Select Statement........................................................................................................... 5
Aggregation ................................................................................................................. 7
Contribution…………………………………………………………………………..………………………………... 19
SQL-graph Representation……………………………………………………..………………………………... 19
References……..………………….………..………………………………….………………………….... 22
i
TABLE OF FIGURES
6. SQL-graph Representation……............................................................................................ 21
ii
Automata
Automata Theory is the study of the abstract mathematical models of machines and the problems
they are capable of solving. Automata are used to study and test the limitations of computation
and focus on the decidability and the intractability of computers. Decidability tests whether or
not a problem can be solved while intractability determines the efficiency of the computer with
respect to the increase in size or complexity of the problem. An automaton is a self operating
electronic or mechanical device. An automaton consists of a finite set of internal states which
accept and respond to an external stimulus to produce an output based on the transition rules
applied to the internal states.
Simple Automaton
Figure 1 is an example of a simple automaton. It accepts an input at s0 or the start state and
switches to a new state within the automaton based on the evaluation of the stimulus. The output
of each state determines the next state of the automaton. Figure 1, can be classified as a finite
automaton as there are a finite number of possible states (4) of the automaton. Finite automata
define a class of automata for which there exists a finite number of states, transitions and actions.
1|Page
Figure 1
The main concepts of the automata theory are alphabet, strings and languages. The alphabet may
be defined as some finite set of symbols; strings as a finite sequence of symbols derived from
this alphabet and the language as a set of strings such that all the symbols of the strings are
derived from this alphabet. In Figure 1, the alphabet is {A, B, C, D} while the strings, as shown
could be {AB}, {C D} or any combination of members of the alphabet. Alan Turing attempted to
define the boundaries of computing machines and created Turing machines which exist as the
most general form of automata. These machines could compute the same types of algorithms as
modern computers and as such, his tests and theories still apply to modern computation. Turing
machines can simulate the functionality of modern computers which implement a category of
automata called finite automata. These machines implement algorithms defined in regular
languages and grammars and are employed in the design and implementation of modern
computing languages and software solutions.
Grammars
Precisely defining natural languages has long been regarded as the primary work of linguist
Noam Chomsky. Chomsky sought to use mathematical representation to create a structured,
flexible and defined model for natural languages. One of his approaches, context-free grammars,
was later adopted for modeling computer languages where a grammar may be defined as: … A
quadruple (Σ; V; S; P), where: 1. Σ is a finite nonempty set called the terminal alphabet. The
elements of Σ are called the terminals. 2. V is a finite nonempty set disjoint from Σ. The elements
of V are called the non terminals or variables.
3. S ∈ V is a distinguished non terminal called the start symbol. 4. P is a finite set of productions
(or rules) of the form 𝛼 𝛽
where α ∈∪ V *V ∪ V * is a string of terminals and non terminals at least one non terminal and
β is a string of terminals and non terminals. [1]
There are four distinct types of grammars: type 3 or regular grammars, type 2 or context free
grammars, type 1 or context sensitive grammars and type 0 or unrestricted . A Context-free
grammar (CFG), more generally called a grammar is such that every production is represented
by the Backus-Naur form: 𝑉 𝑡 Where V is a non terminal and t is a terminal symbol from the
2|Page
language’s alphabet. Context free grammars allow for flexible syntax structures and the
definition strings in natural language. This is then converted into the strict mathematical
structures envisioned by Backus Productions in CFG are generally in one of the two approaches;
recursive inference or derivation. In recursive inference, strings of the language body variables
are concatenated with any terminals in the body. From this it is inferred that the resulting string
is in the language of the variable in the head. Derivation involves expanding the start symbol and
all subsequent strings by substituting variables with a production until the string is comprised
entirely of terminals. Derivation grammars exist as either Leftmost (LL) grammars or rightmost
LR grammars. In LL grammars, the production rules are applied on the left most variable in the
body of a production. The leftmost variable must always be expressed as terminals before
proceeding since left recursion is not permitted. The production is resolved using the lookahead
values LL(k) specified in the grammar. Most LL grammars use a value of LL(1), with localised
values to resolve ambiguities. In LR grammars, the converse takes place. The grammar is read
from left to right; however the production rules are applied from right to left such that the
rightmost variable must always be firstly resolved to a terminal. Like LL grammars, the default
lookahead value is LR(k), however, localised values can be used to resolve ambiguities. LR
grammars, are relatively easy to implement and can detect syntactic errors more readily than LL
grammars, however LL grammars are more easily written.
Database Systems
Database
Data persistence at some level has always been a requirement in programming. Early primitive
systems stored data in ad-hoc flat files. These databases were designed for purpose, and with
minimal interoperability, required a recompilation of client programs subsequent to any semantic
or syntactic changes in the database. The lack of optimization, which was inherent in these
databases, resulted from the challenges associated with their fragmentation and redundancy. A
database is a formally defined, organized collection of related data about the real world which
can be accessed, shared and manipulated by users. In order to achieve this, modern databases are
required to store data in a defined format, as prescribed by its related metadata and schema. The
data must be stored independent of programs and users while remaining fully secured and
3|Page
accessible to both. The Database Management System (DBMS), the component responsible for
this, further manages disk storage and optimization techniques for fast and efficient data
management.
Relational Database
The Relational Database (RDBMS) exists as the most common database, with its basic object
being the table or relation. Each table is structured with a number of rows (tuples) along with
columns (attributes) with values describing each row. The relational model enforces constraints
such as unique identifiers, required attributes and table referencing where a tuple 𝑁 is related to
some other tuple 𝑀 in another table. Through relational algebra, it supports the five basic
operations of union, selection, difference, projection and product. However, most RDBMSs
implement a higher level language such as SQL to allow users to express queries in a more
meaningful manner. Common RDBMSs include Oracle, Microsoft SQL Server and Postgre SQL
SQL
Structured Query Language (SQL) was developed by IBM in the mid 1970’s as means of
querying relational databases. SQL, over a short period of time, had become the de facto query
language for relational database management systems. It subsequently gained acceptance by the
American National Standards Institute and the International Organization for Standardization in
1986 and 1987 respectively. Further revisions were adopted in 1989, 1992, 1999, 2003 and 2006.
This scope of this paper is restricted to ANSI compliant SQL 1992. As a declarative language,
SQL remains confined to “what” instead of “how” when querying databases. The advantage
being simplicity as the user defines the conditions for the data set to be returned, but almost
always never provides any instructions on how the data is to be retrieved. This task is undertaken
by the database engine and varies with the Database Management Software. SQL can be
subdivided into two main parts; the Data Definition Language (DDL), and the Data Manipulation
Language (DML) for database management.
4|Page
The DDL is used to define the database and its constituent objects. It is not involved in data
manipulation and includes the commands: 𝐶𝑅𝐸𝐴𝑇𝐸,,,𝑎𝑛𝑑𝐷𝑅𝑂𝑃. DDL is restricted to database
administrators is used to manipulate the database objects and schemas and to set constraints,
relationships, indices and namespaces.
The DML is the more commonly used sub-language and is routinely used by all database users
for inserting, retrieving, manipulating data. The DML includes the commands: 𝐼𝑁𝑆𝐸𝑅𝑇,,𝐿𝐸𝑇𝐸
and the most frequently used 𝑆𝐸𝐿𝐸𝐶𝑇. This project will be focused on the translation of the
𝑆𝐸𝐿𝐸𝐶𝑇 statement subset of the DML sub-language.
Select Statement
The select statement comprises of three distinct parts: select, from and where. These parts can be
directly linked to the structure of relational algebra constructs, where the select clause is the
implementation of the projection operation. The from clause is the implementation of the
Cartesian product operation. The where clause corresponds with the selection predicate of the
relational algebra involving the attributes of the relations expressed in the where clause (13).
For two collections M and N, both having attributes {𝑎,,} a typical SQL query, selecting {𝑎,𝑏,𝑐}
from M and N joining on attribute {𝑐} would be expressed as:
The from clause is firstly evaluated, followed by the where clause and finally the select clause.
5|Page
Select Clause
The select clause contains the list of attributes or columns that will be returned from the
underlying query. The standard select statement implies a 𝑠𝑒𝑙𝑒𝑐𝑡𝑎𝑙𝑙 operation aimed at reducing
the cost overheads of eliminating duplicates from result set. The 𝑑𝑖𝑠𝑡𝑖𝑛𝑐𝑡 function can be
optionally applied to remove duplicate tuples.
It may also contain aggregation and grouping functions that are applied to the attributes before
projection. These include numeric operators +, -, * and / as well as aggregators: sum, avg, min,
max and count. SQL allows for aggregation to be applied between attributes or a combination of
attributes and real numbers (e.g. 1.2, 2.5) or integers (e.g. 1, 2, and 3). A query, returning the
average of the sum of the attributes {a, b} of all tuples in M may be expressed as:
𝑆𝐸𝐿𝐸𝐶𝑇𝑎𝑣𝑔(𝑎+𝑏)
𝐹𝑅𝑂𝑀𝑀;
In order to return all the attributes, the asterisk symbol could be substituted for attributes names
indicating that all the attributes resulting from the from clause be returned by the query. This is
expressed as:
𝑆𝐸𝐿𝐸𝐶𝑇𝑀.∗
𝐹𝑅𝑂𝑀𝑀;
From Clause
The from clause of the select statement contains the relations to be scanned while evaluating the
underlying query expression. An example is as follows:
𝐹𝑅𝑂𝑀𝑀,𝑁
When expressed as a statement with only select and from clauses, the from clause is evaluated as
the cartesian product of all the relations. Further filtering can be achieved by explicitly defining
joins or filtering in the where clause. Nested queries are fully supported in the from clause and
like the relations, can be manipulated within the query.
6|Page
Where Clause
The where clause of the select statement is the filtering clause allowing a query to return those
tuples which are of interest to the user. It is expressed as follows:
𝑊𝐻𝐸𝑅𝐸𝑀.𝑎> 5 𝑎𝑛𝑑𝑁.𝑏<10
This clause follows the from clause and consists of filtering conditions using the logical
connectives (and, not and or) as well as string and numeric comparison operators
(<,>,=,>=,<=,<>). Both string and numeric expressions can be evaluated as well as special types
such as dates (13). Nested queries are fully supported in the where clause and like the relations,
can be manipulated within the query.
Aggregation
SQL provides for five built in aggregation functions: avg, min, max, sum and count. Aggregation
is supported across different data types including strings and numeric types. However, the
functions avg and sum can only operate on numeric data types. A query, returning the average of
the sum of the attributes {a, b} of all tuples in M may be expressed as:
𝑆𝐸𝐿𝐸𝐶𝑇𝑎𝑣𝑔(𝑀.𝑎+𝑀.𝑏)
𝐹𝑅𝑂𝑀𝑀;
This query produces a result set containing a single value for the average value. The aggregation
function is more commonly used with group by functions to produce averages based on a
common grouping of tuples instead of the entire relation. The retention of duplicate tuples will
affect the accuracy of any aggregation function executed. In relations, this maybe done on a
commonly repeated attribute, such as a sales item attribute in a sales table, the removal of any
sales item tuple based on this attribute could result in an inaccurate aggregate returned.
Duplicates however can be eliminated using the distinct function when needed. The distinct
function, while legal by the SQL BNF, effects no changes to the result of max and min and is not
permitted with the count (*) function.
7|Page
Where grouping is used along with an aggregation, further filtering can be accomplished by
means of the 𝑎𝑣𝑖𝑛𝑔 clause. This clause allows a higher level of filtering based on the values
returned and unlike the where clause, filters on groups of tuples rather than individually.
Nested Queries
SQL provides full support for the arbitrary nesting of sub-queries. A sub-query is a select-from-
where expression that is nested within another query (13). The nested query may be used in the
from clause in instead of a schema relation. The result set produced from the execution of the
nested query provides the tuples for the outer query. A query with nesting in the from clause
may be expressed as:
𝑆𝐸𝐿𝐸𝐶𝑇𝑇.𝑎,𝑇.𝑏
𝐹𝑅𝑂𝑀
(
𝑆𝐸𝐿𝐸𝐶𝑇𝑀.𝑎,𝑀.𝑏
𝐹𝑅𝑂𝑀𝑀
𝑊𝐻𝐸𝑅𝐸𝑀.𝑎>10
) 𝑎𝑠𝑇;
When nested in the 𝑓𝑟𝑜𝑚 clause as a derived relation, the nested query must be renamed using
the 𝑎𝑠 clause and the attributes may also be renamed to avoid ambiguities in the result set. This
renaming can occur within the derived query or in the select clause of the outer query.
SQL further allows for testing set membership using the 𝑖𝑛 or 𝑛𝑜𝑡𝑖𝑛 functions. This is typically
employed in the 𝑤h𝑒𝑟𝑒 clause for filtering. An example of testing for test membership is:
𝑆𝐸𝐿𝐸𝐶𝑇𝑎,𝑏
𝐹𝑅𝑂𝑀𝑁
𝑊𝐻𝐸𝑅𝐸𝑎𝑁𝑂𝑇𝐼𝑁
(
𝑆𝐸𝐿𝐸𝐶𝑇𝑎
𝐹𝑅𝑂𝑀𝑀
8|Page
𝑊𝐻𝐸𝑅𝐸𝑎>10
);
Where, for each tuple in M, its attribute is evaluated for membership in the set returned from the
sub-query. The sub-query could return a single aggregate value and as such the where clause
could evaluate this using the standard comparison operators. Such an example would be:
𝑆𝐸𝐿𝐸𝐶𝑇𝑎,𝑏
𝐹𝑅𝑂𝑀𝑁
𝑊𝐻𝐸𝑅𝐸𝑎>
(
𝑆𝐸𝐿𝐸𝐶𝑇𝑀𝐴𝑋(𝑎)
𝐹𝑅𝑂𝑀𝑀
);
Set Operators
SQL supports full set operations including union, intersection, union all and minus. SQL requires
that the relations participating in the operation must be compatible. They must have the same
number and types of attributes. The union and union all operators append both the results of both
queries to produce a result set. The union query by default eliminates duplicate tuples, however
union all may be used when needed to retain all tuples. A simple union query on all the attributes
of M and N maybe expressed as:
(𝑆𝐸𝐿𝐸𝐶𝑇∗𝐹𝑅𝑂𝑀𝑁)
𝑈𝑁𝐼𝑂𝑁
(𝑆𝐸𝐿𝐸𝐶𝑇∗𝐹𝑅𝑂𝑀𝑀);
All the set operators can be similarly used. They can both be nested or contain full SQL nested
queries.
These represent the basic structure of SQL statements.
9|Page
Data Types
IQL like SQL supports integers (e.g. 0, 1, and 2) and Boolean types (e.g. True | False) as well as
floats (e.g. 1.5, 0.012); implemented as Java primitives. It also supports strings enclosed in single
quotes (‘e.g. Jamie’) and date time objects (e.g. dt ‘2007-08-14’), both of which are represented
as Java String objects. Other data types supported include lists (e.g. 1, A, C), bags and sets
The complete list of comparison operators as translated in the where clause is listed below
>
<
<>
10 | P a g e
=
in
not in
like
not
The group by clause is moved to the head of the comprehension and the grouping co column
separated using parenthesis and commas. This is straightforward and requires simple rules, while
making no changes to the body of the comprehension. Having identified the possible areas for
attention in translating from IQL to SQL, the translator can be implemented based in these
defined rules. The approach provides a suitable development environment and language to create
an accurate and useable solution.
Supported Query Types
The translator was created to support as a large a subset as possibly from the SQL ANSI 1992
standard. The development time and effort needed, meant it was kept confined to a more focused
subset. The subset of queries supported includes: The SELECT FROM WHERE GROUP BY
structure. The select clause requires fully qualified column names of the form table.name. It
allows for aggregation over the columns returned as well as arithmetic operations involving one
or more columns. Aggregation and arithmetic functions supported are:
Sum
Count
Avg
Max
Min
+
-
*
/
The FROM clause supports the use of both tables and nested queries. Queries can be arbitrarily
nested; however nested queries with aggregate columns would return longer columns names
since they cannot be aliased. Therefore a nested query which performs some aggregation on two
11 | P a g e
columns {a, b} would have the resulting column referenced in the outer query as {a_b}. This
holds true for nested queries in the WHERE clause. Correlated queries are not supported by the
translator in both the FROM and WHERE clauses. The translator supports the standard WHERE
clause, including set membership and comparisons. However it does not support the SQL
functions “EXISTS” or “BETWEEN” as well as their negated forms. The GROUP BY clause is
supported by the translator, however aggregation is only allowed over a single column and the
HAVING clause is not supported. The translator supports the distinct function across the entire
result set, however does not allow a count (*) as is used in SQL without specifying a column.
The list below shows the general types of queries supported:
12 | P a g e
Figure 2
13 | P a g e
The database server as a finite-state automaton
Figure 3
We can rewrite the control flow as a diagram as depicted in Figure 3. Each circle is a state that
the client-server communication can be in, and the arcs between the circles depict actions that the
server performs to transform the communication from one state to the other. Note that at the end
14 | P a g e
of each arc label there is an I/O call—either a read from or a write to the client, and furthermore,
each I/O call is immediately followed by a state transition. Thus, with the information about the
state the communication is in, the server knows exactly what to do with the I/O that just
completed. Rewriting this diagram into code is easy.
Figure 4
Figure 4. State diagram of the SmallLibrary Web application, including notification services
actions that have been added to the existing system.
15 | P a g e
Various corporate bodieswhose business model completely focuses on the Weblike Google,
Yahoo, Amazon etc. have taken web interactions
to newer heights. As many enterprise applicationsdealing with sensitive financial and medical
data turn online,the security of such web applications has come underclose scrutiny.
Compromise of these applications representsa serious threat to organizations that have
deployedthem, and also to users that trust these systems to store confidential data. The potential
downtime and damages thatcould easily amount to millions of dollars have also prohibitedmany
mission critical applications, which could greatly benefit users, from going online. Hence, it is
crucial to protect these applications from targeted attacks.
However, the current state of application security leaves much to be desired. The 2002 CSI and
FBI revealed that, on a yearly basis, over half of all databases experience at least one security
breach and an average episode results in close to $4 million in losses. A recent penetration
testing study of more than 250 Web applications concluded that at least 92% of Web applications
are vulnerable to some form
of malicious intrusions [2]. Recent U.S. industry regulations such as Sarbanes-Oxley Act, try to
enforce strict security compliance by application vendors [3] and there is an urgent need to find
means of satisfying these requirements. SQL Injection Attacks (SQLIAs) constitute an important
class of attacks against web applications. SQLIAs can give attackers direct access to the database
underlying an application and allow them to leak/alter con_dential information [4] or to even
execute any malicious code [5]. There are many examples of SQLIAs with serious
consequences, and the list of victims includes high-pro_le organizations, such as Travelocity,
Tower Records, RIAA etc. [6]. The increasing number of web applications falling prey to these
attacks is alarmingly high [7] [8] [9]. In fact, SQLIAs have been included in list of top 10 threats
to web applications [10].
In this section, we present a web application that is vulnerable to a SQLIA and explain how an
attacker could exploit this vulnerability. We also discuss various other techniques that can be
employed to gain illegitimate access to systems. Consider a typical web application in which an
16 | P a g e
user on a client machine can access services provided by a web server, having a database
backend, like an online email account.
When the user enters a login and a password in the web form and presses the Submit button, an
URL is generated (http://foo.com/home.jsp?login=guest&pass=test) and sent to the web server.
The user input is interpreted by the servlet home.jsp, which then in turn builds a dynamic SQL
query, submits the query to the database and uses the response from the database to generate
HTML-pages that are sent
back to the user. Suppose query in servlet page is of form;
If the login and password as provided by the user are used, the query to be submitted to the database takes the form;
A web site that uses this servlet would be vulnerable to SQLIAs. If the user were to enter [' OR 1=1 ] and [ ] instead
of [guest] and [test], the query would take the form;
SELECT * FROM user WHERE login=' ' OR 1=1 ' AND pass=' '
The characters “- -“mark the beginning of a SQL comment, and anything beyond is ignored. The query as
interpreted by the database now has a tautology and is always satis_ed; hence returning information about all users.
Thus an attacker can bypass all authentication modules gaining unrestricted access to critical information on
the server. An SQL Injection Attack (SQLIA) is a subset of the unverified/unsanitized input
vulnerability and occurs when an attacker attempts to change the logic, semantics or syntax of a
legitimate SQL statement by inserting new SQL keywords or operators into the statement. This
definition includes, but is not limited, to attacks based on tautologies, injected additional
statements, exploiting untyped parameters, stored procedures, overly descriptive error messages,
alternate encodings, length limits, second-order injections and injection of .UNION
SELECT., .ORDER BY. And .HAVING.clauses. A detailed explanation of different forms of
SQLIAs and ways in which they can be exploited are available in the public domain [11]
[12].The widely deployed defense today is to train the programmersand web-developers about
the security implicationsof their code and to teach them corrective measures andgood
17 | P a g e
programming practices, as outlined in [13]. However,rewriting or revising the entire lot of
existing legacy code isnot an easy process and is not anancially viable option formany
organizations. Even this does not guarantee any foolproofdefense and hence we need automated
processes todetect the vulnerability and eliminate them. Various othertechniques like use of
stored procedures [14], prohibitingdisplay of database server error messages and use of escape
sequences (available in PHP as Magic Quotes) for sanitizinguser inputs are employed as a quick
signature detectionhave been proposed by IDS/IPS vendors [16], but their successis still limited
attack mechanisms [17]. It is of even greater concern thatproducts like Microsoft SQL Server etc.
provide attackersdirect access to command line shell, registry using methodslike xpcmdshell,
xpregreadetc.
Related Work
Various SQLIA detection techniques have been proposed in literature but many of them suffer in
terms of immediate usability and deployability. Many existing techniques, such as filtering,
information flow analysis, penetration testing, and defensive coding, can detect and prevent a
subset of the vulnerabilities that lead to SQLIAs. Techniques that employ input validation are
prone to a large number of false positives and yet there is no guarantee that there are no false
negatives. Safe Query Objects [18] and SQLDOM [19] use encapsulation of database queries to
provide a safe and reliable way to access databases but they require developers to learn and use a
18 | P a g e
[20] provided a radical shift in the way this problem can be approached using query
randomization [21]. However, it could be circumvented if the key used for randomization were
to be exposed.
Another popular mechanism has been static analysis of the code for vulnerabilities [22]. The
Java String Analysis library [23] provides us with a mechanism for generating models for Java
strings and can be extended to generate fairly accurate SQL-query models. JDBCChecker [24]
[25] statically checks for the type correctness of dynamically generated SQL queries. Although
these techniques are effective, they cannot capture more general forms of SQLIAs that generate
syntactically and type correct queries. The authors use automated reasoning in [26] to detect
queries, but other forms of SQLIAs still go undetected. Recently, researchers have been
exploring the use of static analysis in conjunction with runtime validation [27] to detect instances
of SQLIAs. In [28], the authors have proposed the use of parse trees to detect malicious user
input. In [29] [30], the authors have used an automaton construction technique to defend against
SQLIAs. However, these techniques still require modification of the application source code
which may not be preferable to most developers
and their organizations in general. Also there is an additional runtime analysis overhead in terms
of execution time which cannot be avoided due to the sequential nature of the analysis
techniques. Also the element of access control is not captured in such models which can, in the
theoretically worst case, still allow SQLIAs to occur for certain poor implementations of the
application.
Contribution
SQLIA detection combining static & runtime analysis with following features:
19 | P a g e
1. No code modi_cation required, simple web server patch required (SQLIAs captured by altered
data flow path).
2. Optimized runtime analysis using SQL-graphs, and SQL query validation in parallel, for
faster webpage accesses.
3. SQLIAs using access control violations in the script and different character encodings also
captured.
SQL-graph Representation
We can thus construct a SQL-FSM for each of the hotspots in the program. These data structures
now capture the semantics of the different SQL queries that are to be sent to the database at
runtime. Any user input would be compared against this template and any change in the SQL-
FSM structure would indicate a possible SQLIA.We note that running each and every query
under the scanner at runtime could be an expensive process. Given that the user input would
realistically consist of a few strings only but the number of SQL queries that get executed in a
program could be very large, we now try to optimize number of queries that need to be put under
the scanner during runtime to ensure the validity of dynamically generated queries, using a SQL-
graph.
The SQL-graph in Fig. 6 represents 4 different SQL queries in the program as nodes within a
logical boundary, and 3 different user inputs as being outside the logical boundary. If a particular
user input (I) is used in a SQL query (Q), the relationship (R) between the two nodes is indicated
20 | P a g e
by an undirected link between the 2 nodes. We now define dependencies (D) in the SQL-graph
as links that point from one
SQL query to another SQL query such that the user inputs used by the former is a proper
superset of the user inputs used by the latter. For SQL queries that use the same set of user
inputs, one of them is chosen as a representative query and is made to point to the others. We see
the dependencies
represented as directed arrows in the SQL-graph. Drawing equivalence to Code 1, Q1, Q2 and
Q3 represent the 3 different SQL queries (also the 3 different hotspots in this case), while I1 and
I2 represent the user inputs login and pass. Q4 and I3 could possibly correspond to some other
hotspot in the program not represented in the code snippet.
The concept of SQL-graph is used to reduce runtime scanning overhead by restricting the
number of queries that need to be scanned along any execution path that is taken in the program.
SQL queries that do not use user inputs are not included in the SQL-graph. Only the SQL queries
that are exposed to the user inputs in some form or the other (string manipulations included) are
included in the SQL-graph epresentation. The choice of such a representation and the resulting
bene_ts in terms of runtime verhead would be explained as part of Runtime Validation. 4.2.
Runtime Validation During runtime, the SQL queries (with the user inputs embedded) are
compared against the corresponding SQLFSMs
to check for their validity. If the user inputs cause the dynamically generated SQL queries to not
conform to the semantics of the intended SQL queries as in the SQLFSMs, then they are _agged
21 | P a g e
as SQLIAs, else they are passed through. Fig. 4 shows the case where an SQLIA is not caused
and the query is passed through. Also, it shows the second example where an SQLIA has been
caused and
hence gets rejected as a potentially malicious query. The literals along both the static SQL-FSM
and the runtime SQL-FSM, as one traverses from the Start node to the End node, should be
identical. The other check that can be enforced is that the length of the SQL-FSM chain for a
particular instance is exactly the same for the static and runtime SQL-FSMs. Thus SQLIAs
employing tautologies and injecting additional statements can be captured by this technique. The
case where alternate encodings like URL Encoding,
UTF-8 etc. are used by the attackers can also be addressed by requiring the runtime validation to
occur only after all the user input has been converted to a single encoding format as interpreted
by SQL Engine in database server.
Figure 7. Program Execution Time w/ and w/o SQLIA Protection (Sequential/Parallel Architecture)
References
[1] . Jiang, Tao; Li, Ming; Regan, Kenneth W; Bala, Ravikumar Formal Grammars and Languages.
pp. 6 - 10.
22 | P a g e
[2] WebCohort Inc., Only 10% Web applicationssecured against common hacking techniques,
http://www.imperva.com/company/news/2004-feb-02.html,2004
[3] K. Beaver, Achieving Sarbanes-Oxley compliance for Web
applications,http://www.spidynamics.com/support/whitepapers/, 2003
[4] Cesar Cerrudo, Manipulating Microsoft SQL Server UsingSQL Injection,
http://www.appsecinc.com/presentations/ Manipulating
SQL Server Using SQL Injection.pdf
[5] Steve Friedl, SQL Injection Attacks by Example,http://www.unixwiz.net/techtips/sql-
injection.html
[6] Tower Records site exposes Data, http://www.netsecurity.org/news.php?id=1593, 2002
[7] SANS ISC, @RISK: The Consensus Security Vulnerability Report,
http://www.sans.org/newsletters/risk/display.php?v=5&i=22
[8] CERT Note, http://www.kb.cert.org/vuls/id/982109, 2005
[9] Bugtraq Archive, http://seclists.org/lists/bugtraq/2005/, 2005
[10] Open Web Application Security Project (OWASPD), TopTen most critical web application
vulnerabilities, http://www.owasp.org/documentation/ topten.html, 2005
[11] C. Anley, Advanced SQL Injection in SQL Server Applications,
http://www.nextgenss.com/papers/advanced sql injection.pdf
[12] C. Anley,(more) Advanced SQL Injection, http://www.nextgenss.com/papers/more advanced
sql injection.pdf
[13] M. Howard, D. LeBlanc, Writing Secure Code, Microsoft Press,Redmond, Washington, 2nd
Edition, 2003
[14] Statements and Prepared Statements, http://g.bookpool.com/gp/0507apress/159059407X-
2367.pdf, White Paper
[15] Kevin Spett, Blind SQL Injection, http://www.spidynamics.com/whitepapers/Blind
SQLInjection.pdf, White Paper
[16] K. Mookhey, N. Burghate, Detection of SQL Injection & Cross-siteScripting Attacks,
http://www.securityfocus.com/infocus/1768
[17] O. Maor, A. Shulman, SQL Injection Signature
Evasion,http://www.imperva.com/application defense center/white papers/
sql injection signatures evasion.html, White Paper
23 | P a g e
[18] W. R. Cook, S. Rai, Safe Query Objects: Statically Typed Objects
as Remotely Executable Queries, ICSE 2005
[19] R. McClure, I. Kruger, SQL DOM: Compile Time Checking of DynamicSQL Statements,
ICSE 2005
[20] S. W. Boyd, A. D. Keromytis, SQLrand: Preventing SQL InjectionAttacks, pp. 292-304,
ACNS 2004
[21] G. S. Kc, A. D. Keromytis, V. Prevelakis, Countering Code-Injection Attacks with
Instruction-Set Randomization, CCS 2003
[22] V. B. Livshits, M. S. Lam, Finding Security Vulnerabilities in JavaApplications with Static
Analysis, USENIX Security Symp., 2005[23] A. S. Christensen, A. Moller, M. I. Schwartzbach,
Precise Analysisof String Expressions, pp. 1-18, SAS 2003
[24] C. Gould, Z. Su, P. Devanbu, JDBC Checker: A Static Analysis Toolfor SQL/JDBC
Applications, ICSE 2004
[25] C. Gould, Z. Su, P. Devanbu, Static Checking of Dynamically GeneratedQueries in
Database Applications, ICSE 2004
[26] G. Wassermann, Z. Su, An Analysis Framework for Security in WebApplications, pp. 70-78,
SAVCBS 2004
[27] Y. Huang et. al., Securing Web Application Code by Static Analysis& Runtime Protection,
WWW2004
[28] G. T. Buehrer, B. W. Weide, P. A. G. Sivilotti, Using Parse TreeValidation to Prevent SQL
Injection Attacks, SEM 2005
[29] W. G. Halfond, A. Orso, Combining Static Analysis & RuntimeMonitoring to Counter SQL-
Injection Attacks, WODA 2005
[30] W. G. Halfond, A. Orso, AMNESIA: Analysis and Monitoring forNEutralizing SQL
Injection Attacks, ASE 2005
[31] R. Vallee-Rai, E. Gagnon, L. Hendren, P. Lam, P. Pominville, V.Sundaresan, Optimizing
Java Bytecode Using the Soot Framework:Is It Feasible?, LNCS, Volume 1781, pp. 18, Jan
2000
[32] A. R. Group, Java Architecture for Bytecode Analysis
(JABA),http://www.cc.gatech.edu/aristotle/Tools/jaba.html
24 | P a g e
[33] Ke Wei, M. Muthuprasanna, S. Kothari, Eliminating SQL InjectionAttacks in Stored
Procedures, pp. 191-198, IEEE ASWEC, 2006
25 | P a g e