Lecture 2 - Web Application Mapping

The document discusses web application mapping techniques, focusing on enumeration methods to identify resources and functionality within web applications. It covers spidering, both automated and manual, as well as brute-force enumeration to discover hidden content and server misconfigurations. Additionally, it highlights the importance of analyzing user input and server-side information to enhance security assessments during penetration testing.

Web Application Mapping

INFR 4662U – Winter 2020


Garrett Hayes

Excerpts and concepts taken from the Web Application Hacker’s Handbook, 2nd Edition
Stuttard & Pinto, Wiley Press
License: Creative Commons
2

Enumerating Content
3

Enumeration Basics

§ Enumeration refers to identifying the set of resources and functionality that make up a web application

§ This includes pages, JS files, application logs, external resources, etc.

§ Basic enumeration can be done by simply visiting the web application and exploring how it works

§ Other automated and systematic approaches exist to suss out functionality occurring behind the scenes
4

What is Spidering?

§ Spidering refers to the use of automated tools that identify and recursively follow links in a web application to collect information about its structure

§ Content without direct links can be found using brute-force techniques that look for common/predictable content and page names

§ Effective spidering utilities will also parse JS and forms to identify backend functionality like APIs, WebSockets, etc.
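
A minimal sketch of the per-page step a spider repeats: pull every href, src, and action value out of a page and resolve it against the page URL. The names LinkCollector and extract_links and the sample page are illustrative assumptions; a real spider would also queue the results, track visited URLs, and parse JS.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects URLs from href/src/action attributes of a single page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                # Resolve relative references against the page URL.
                self.links.add(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the sorted links in html that stay on the same host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return sorted(u for u in collector.links if urlparse(u).netloc == host)


# Hypothetical page content used only for illustration.
page = (
    '<a href="/login">Login</a>'
    '<script src="js/app.js"></script>'
    '<form action="search.php"></form>'
    '<a href="https://other.example/x">Elsewhere</a>'
)
print(extract_links(page, "https://example.com/index.php"))
```

The off-site link is dropped so that a crawl built on this stays within scope.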
5

Automated Spidering

§ Automated spidering tools can miss whole areas of an application due to:

    § JS being used to render links and drop-down menus not visible to the utility

    § Form submission endpoints not being seen due to failed automatic form filling

    § AJAX-rendered pages not showing until an action is completed by a user (e.g. logging in)
6

Automated Spidering

§ Automated spidering tools can miss whole areas of an application due to:

    § Random values in the URL (e.g. expiry times) that may cause the tool to spider forever

    § Some content only being accessible to authenticated users

    § Embedded objects like Java applets, which are difficult to spider and may contain links or consume other backend assets
7

Enumeration: robots.txt

§ In some cases, a webmaster may not want automated spidering tools (like Googlebot) to cache or crawl specific pages

§ To avoid this, administrators create a robots.txt file in the web root that lists the paths crawlers are asked not to visit

§ This file often contains sensitive endpoints and directories not intended to show up on Google, which are very interesting to an attacker
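
As a rough illustration, the Disallow entries can be pulled out of a downloaded robots.txt and fed into later enumeration. The function name disallowed_paths and the sample file contents are assumptions made for this sketch.

```python
def disallowed_paths(robots_txt):
    """Return the unique Disallow paths from robots.txt content, in order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        if field.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths


# Hypothetical robots.txt content used only for illustration.
sample = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/   # nightly archives
Disallow:
Allow: /public/
"""
print(disallowed_paths(sample))  # ['/admin/', '/backup/']
```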
8

Manual/Directed Spidering

§ Since a variety of situations cause automated spidering tools to fail, some pentesters will manually explore a web application while using an intercepting proxy to automatically build a map of the site

§ For example, one might use BurpSuite to automatically index all pages and resources found while browsing

§ Two common intercepting proxies are BurpSuite and WebScarab
9

Manual/Directed Spidering

§ Manual spidering is often superior to automated spidering for many reasons, including:

    § More effective identification and following of navigation controls

    § Avoiding application actions that can break a site (for example, calling a backup script or backend functionality)

    § Identifying pages & resources only available to logged-in users
10

Spidering Tool: BurpSuite


11

Enumerating Hidden Content

§ Some web application functionality is not made visible to users through links or buttons

§ Examples:

    § A form submission triggers a backend call to another PHP file

    § A script called backup.php zips up the contents of a web application

    § An automation script called test.php adds a demo user to a web app

§ Some web app functionality may not be visible to all users
12

Enumerating Hidden Content

§ Common hidden content I’ve seen in many pentests includes:

    § Backup files or code files with extensions like index.php.bak

    § Old versions of files/code that can still be called (e.g. home2.php may imply home1.php exists)

    § Exposed configuration files

    § Hidden directories used for testing/backups that have directory indexing enabled

    § Exposed log files


13

Brute-Force Enumeration
§ To identify backend content not directly visible to users, automated brute-forcing utilities are essential

§ I recommend gobuster, but there is also a GUI alternative called DirBuster that ships with Kali

§ Brute-force utilities require three inputs:

    1. A good wordlist containing common directory and file names
    2. One or more known file extensions likely to be used by the web app (e.g. .php)
    3. A starting point (/, for example)
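
To show how the three inputs combine, here is a small sketch that only builds the list of URLs a brute-forcer would request; it sends nothing. The function name candidate_urls and the host target.example are placeholders, not part of any real tool.

```python
def candidate_urls(base_url, words, extensions):
    """Yield directory and file URLs to request for each wordlist entry."""
    base = base_url.rstrip("/") + "/"
    for word in words:
        word = word.strip().strip("/")
        if not word:
            continue
        yield base + word + "/"        # directory guess, with trailing slash
        for ext in extensions:
            yield base + word + ext    # file guess, e.g. backup.php


# Inputs: starting point, wordlist entries, known extensions.
urls = list(candidate_urls("http://target.example/", ["backup", "test"], [".php"]))
for url in urls:
    print(url)
```

The request count is roughly the wordlist size times one plus the number of extensions, which is why a short, well-chosen extension list matters.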
14

GoBuster Brute-Force Attempt


ubuntu@security:~$ ./go/bin/gobuster dir -w ~/Wordlists/common.txt -s 200 -u http://xxxxxxxx.com
===============================================================
Gobuster v3.0.1
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@_FireFart_)
===============================================================
[+] Url: http://xxxxxxxx.com
[+] Threads: 10
[+] Wordlist: /home/ubuntu/Wordlists/common.txt
[+] Status codes: 200
[+] User Agent: gobuster/3.0.1
[+] Add Slash: true
[+] Timeout: 10s
===============================================================
2020/01/02 19:44:08 Starting gobuster
===============================================================
/backup/ (Status: 200)
/css/ (Status: 200)
/fonts/ (Status: 200)
/highslide/ (Status: 200)
/icons/ (Status: 200)
/images/ (Status: 200)
/js/ (Status: 200)
15

Brute-Force Results
16

Brute-Force Enumeration
§ When brute-forcing an application, each request will return a status code

§ Some common “gotchas” for status codes include:

    § 302 often means a resource exists but you must be logged in to access it

    § 401 & 403 usually mean the resource exists but access is denied (401 asks for credentials; 403 refuses the request outright)

    § A 200 code for a page that would never exist (e.g. /dassdsdads.php) indicates the server answers every path with the same page, such as a custom error page or an internal redirect, so a 200 alone proves nothing

    § A 400 code may indicate you’re using an incorrect extension or incorrectly formatted RESTful URL
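
The gotchas above can be captured as a small triage helper. This is a sketch under assumptions: the function name, the labels it returns, and the idea of passing in a baseline code (the status returned for a random path that cannot exist) are illustrative choices, not the behaviour of any particular tool.

```python
def interpret_status(code, baseline_code=404):
    """Map a response code to a rough finding for brute-force results.

    baseline_code is the status the server returned for a path that
    cannot exist (e.g. a random string). If that is 200, a plain 200
    on a guessed path is not evidence that the path exists.
    """
    if code == 200:
        if baseline_code == 200:
            return "inconclusive: server answers 200 for nonexistent paths"
        return "exists"
    if code == 302:
        return "likely exists: redirect, possibly to a login page"
    if code in (401, 403):
        return "likely exists: access denied"
    if code == 400:
        return "bad request: check extension or URL format"
    if code == 404:
        return "not found"
    return "unknown: review manually"


for status in (200, 302, 403, 400, 404):
    print(status, "->", interpret_status(status))
```

When the baseline is 200, comparing response lengths or bodies against the baseline page is the usual fallback.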
17

Brute-Force Wordlists

§ Most web applications use common page names and endpoint URLs, allowing us to generate effective wordlists by crawling the web

    § SecLists on GitHub has a lot of great wordlists, including RobotsDisallowed-Top1000.txt and common.txt

§ Don’t forget that a lot can vary in a web app. You may need to:

    § Use a trailing slash when brute-forcing directories
    § Add a specific file extension to requests
    § Filter out non-2xx/3xx status codes
18

Inferring Web Content


§ Considering the structured nature of web apps, it’s common to see predictable page names or RESTful resource URLs when exploring or spidering

§ For example:

https://example.com/users/user/1

May imply that the following pages exist:

https://example.com/users/user/2
https://example.com/users/
https://example.com/admins/
https://example.com/admins/user/1
https://example.com/admins/admin/1
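
Part of this inference is mechanical and can be sketched in code: try the adjacent numeric IDs and walk up to each parent path. The function name infer_neighbors is an assumption; guessing sibling names such as /admins/ still needs a wordlist or human judgment and is not covered here.

```python
from urllib.parse import urlsplit, urlunsplit


def infer_neighbors(url):
    """Guess related URLs: adjacent numeric IDs, then each parent path."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    guesses = []

    def build(segs, trailing_slash):
        path = "/" + "/".join(segs) + ("/" if trailing_slash and segs else "")
        return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

    # Adjacent IDs: /users/user/1 -> /users/user/0 and /users/user/2
    if segments and segments[-1].isdigit():
        current = int(segments[-1])
        for candidate in (current - 1, current + 1):
            if candidate >= 0:
                guesses.append(build(segments[:-1] + [str(candidate)], False))

    # Parent collections: /users/user/ and /users/
    for depth in range(len(segments) - 1, 0, -1):
        guesses.append(build(segments[:depth], True))
    return guesses


for guess in infer_neighbors("https://example.com/users/user/1"):
    print(guess)
```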
19

Inferring File Extensions


§ Although a web app may consistently use a single file extension, like .php for example, it’s possible that other file extensions exist and are used for backups, alternative versions of files, or older versions of files

§ It makes sense to use a good wordlist and append the following extensions when brute-forcing files:

    § .old, .bak, .backup, .sql, .txt
    § .tar, .tar.gz, .zip, .src, .php5
    § ~1, .tmp, .temp
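
One way to apply these extensions to a file that is already known to exist is sketched below. The function name backup_variants and the shortened default suffix list are assumptions; both the appended form (index.php.bak) and the replaced form (index.bak) are generated because either may be left behind by an editor or admin.

```python
BACKUP_SUFFIXES = [".old", ".bak", ".backup", ".tmp", ".temp", ".zip", ".tar.gz"]


def backup_variants(filename, suffixes=BACKUP_SUFFIXES):
    """Return names an old or backup copy of filename might have."""
    stem, dot, _ext = filename.rpartition(".")
    variants = []
    for suffix in suffixes:
        variants.append(filename + suffix)    # index.php.bak
        if dot:
            variants.append(stem + suffix)    # index.bak
    return variants


print(backup_variants("index.php", [".bak", ".old"]))
# ['index.php.bak', 'index.bak', 'index.php.old', 'index.old']
```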
20

Server Misconfigurations

§ Even if a web application is built securely, it is possible that the underlying webserver is misconfigured and leaking sensitive information

§ Webservers can leak resources like:

    § Whole directory contents if directory indexing is enabled

    § Users on a system, especially if user directories are enabled
21

Directory Indexing Misconfiguration


22

User Directories Misconfiguration


Google Dork: inurl:"/~john" intext:"index of"

Note: when user directories are enabled in Apache, users on the system that have a public_html directory in their home path will automatically have that directory made public at the location /~username

What might our next steps be to identify additional users on the system?
23

Hidden Parameters
§ Webmasters may use custom or hidden parameters in GET or POST requests to toggle the visibility or functionality of a web app

§ For example, the following URLs may result in responses with different content and lengths:

https://example.com/index.php
https://example.com/index.php?debug=1

§ A brute-force tool can be used to find hidden parameters using:

    § Common parameter names like test, debug, bypass, source, etc.
    § Common parameter values like 0, 1, true, false, null
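
A hedged sketch of that brute-force loop: build one probe URL per name/value pair, then flag responses whose length differs from the unparameterised baseline. The function names and the length-only comparison are simplifying assumptions; nothing is sent over the network here.

```python
from itertools import product
from urllib.parse import urlencode

PARAM_NAMES = ["test", "debug", "bypass", "source"]
PARAM_VALUES = ["0", "1", "true", "false", "null"]


def param_probes(url, names=PARAM_NAMES, values=PARAM_VALUES):
    """Return one probe URL per (name, value) pair."""
    separator = "&" if "?" in url else "?"
    return [url + separator + urlencode({n: v}) for n, v in product(names, values)]


def differs_from_baseline(baseline_body, probe_body, tolerance=0):
    """Flag a probe whose response length differs from the baseline."""
    return abs(len(probe_body) - len(baseline_body)) > tolerance


probes = param_probes("https://example.com/index.php")
print(len(probes))   # 4 names x 5 values = 20
print(probes[0])     # https://example.com/index.php?test=0
```

A small tolerance helps when pages embed timestamps or tokens that change the length slightly on every request.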
24

Discovering User Input


25

Analyzing User Input


§ In preparation for future exploitation attempts, it’s crucial to identify all user input fields and actions that can be submitted to the web application

§ User input may be present in:

    § URLs using standard GET request parameters

    § RESTful URLs between slashes

    § Cookies

    § HTTP headers

    § Out-of-band channels
26

User Input: URLs

§ Standard URLs that include GET parameters take user input or input that directs the functionality of the web application

§ Typical URL parameters look like:

/search.php?searchTerm=data&results=10

§ Some abnormal URL parameter styles do exist, such as:

/process/search;searchTerm=data
/process/search?searchTerm=data$results=10
/process/searchTerm=data/search
/process/search?searchTerm=data:data2
27

User Input: RESTful URLs

§ RESTful URLs do not use standard GET parameters; rather, data is provided inline in the URL between slashes

§ Typical RESTful URL parameters look like:

/search/data

§ Other alternative forms exist, such as:

/search/searchTerm/data
/search/searchTerm/data/
/search/data/10
/search/data/data2/10.json

§ In the last case, output data is requested in JSON format – it may also be possible to ask for .txt or .xml
28

User Input: Cookies

§ Cookies set by the web application may be used to identify a user or store data temporarily for a session

§ Cookie values may be looked up in a database or may be used to load specific resources

§ For example, a cookie can be used to rebuild a shopping cart:

Cookie: cart=item676&cart=item888&discount=10

§ Or can be used to identify a user:

Cookie: username=joe.blow&authenticated=1
29

User Input: HTTP Headers

§ HTTP headers are automatically generated by client browsers, but may be used by a web application when directing functionality or enforcing access control mechanisms

§ The Host header, for example, indicates to the webserver which site the request is destined for

§ The User-Agent header indicates the kind of client accessing the site (e.g. Chrome vs. Googlebot)

§ Access control headers may provide session strings or other client-identifying data that is passed to a backend database or system

§ The X-Forwarded-For header used by load balancers is client-controllable and can be manipulated to make requests look like they’re coming from a trusted address, such as the webserver itself
30

User Input: OOB

§ Out-of-band (OOB) functionality refers to any code, scripts, automation tools, or external services used to facilitate the operations of a web application

§ These include external resources such as web forms (Google Forms), SMTP services like Mailgun, fileservers, etc.

§ OOB resources can potentially be manipulated to modify input to a web application – especially if it’s an API

§ For example, web services may use a provider like Mailgun to automatically receive password reset requests via email
31

Server-Side Analysis
32

Technique: Banner Grabbing

§ Used to glean information about computer systems on a network and the services running on their open ports

§ Banner grabbing helps identify the version of software running on a remote host

§ Usually performed against HTTP, FTP, and SMTP services

§ Tools commonly used:

    § curl, telnet, Nmap, and Netcat


33

Banner Grabbing Example


Request:
curl -I https://ontariotechu.ca

Result:
HTTP/1.1 200 OK
Date: Mon, 13 Jan 2020 20:18:25 GMT
Server: Apache/2.4.18 (Ubuntu)
Strict-Transport-Security: max-age=2600000;
Vary: Host
Content-Type: text/html; charset=UTF-8
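
Once the headers are captured, extracting the banner is simple text processing. The sketch below works on saved response text, so it makes no network request; the function names server_banner and split_banner are illustrative assumptions.

```python
def server_banner(raw_headers):
    """Return the Server header value from raw response headers, or None."""
    for line in raw_headers.splitlines()[1:]:    # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "server":
            return value.strip()
    return None


def split_banner(banner):
    """Split 'Apache/2.4.18 (Ubuntu)' into (product, version, comment)."""
    token, _, rest = banner.partition(" ")
    product, _, version = token.partition("/")
    return product, version or None, rest.strip("() ") or None


# Headers copied from the curl result above.
response = """\
HTTP/1.1 200 OK
Date: Mon, 13 Jan 2020 20:18:25 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Type: text/html; charset=UTF-8
"""
print(split_banner(server_banner(response)))  # ('Apache', '2.4.18', 'Ubuntu')
```

Banners can be edited or suppressed by administrators, so the result is a lead to verify rather than proof.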
34

Analyzing File Extensions

§ File extensions are the simplest way to identify the underlying technology being used to render pages

    § Keep in mind that file extensions are arbitrary and may be modified or removed to evade dissection

§ Common extensions include:

    § .php & .php5 for PHP applications
    § .jsp for Java server pages
    § .pl for Perl CGIs or pages
    § .py for Python CGIs or pages
    § .dll for compiled CGIs or pages (C, C++, etc.)
    § .d2w for WebSphere
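
The list above maps directly onto a lookup table. In this sketch the table contents come from the slide, while the function name guess_technology and the example URLs are assumptions for illustration.

```python
import posixpath
from urllib.parse import urlsplit

EXTENSION_HINTS = {
    ".php": "PHP",
    ".php5": "PHP",
    ".jsp": "Java server pages",
    ".pl": "Perl",
    ".py": "Python",
    ".dll": "compiled CGI (C, C++, etc.)",
    ".d2w": "WebSphere",
}


def guess_technology(url):
    """Return a technology hint from the URL's file extension, or None."""
    path = urlsplit(url).path                    # ignores the query string
    extension = posixpath.splitext(path)[1].lower()
    return EXTENSION_HINTS.get(extension)


print(guess_technology("https://example.com/search.php?searchTerm=data"))  # PHP
print(guess_technology("https://example.com/search/data"))                 # None
```

A None result is common for RESTful URLs, which is where the other fingerprinting techniques come in.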
35

Analyzing Error Messages


§ The simplest way to determine the underlying framework or webserver being used is to trigger a fault in the system that causes an error page to show

    § For example, browsing to /sadklhadlkas will likely cause a 404, which may show the webserver version

§ Manipulating GET parameters may cause SQL or other application errors, ultimately leaking additional information

§ Examples:
https://example.com/?search='
https://example.com/users?id=-1000
36

Analyzing Directory Names

§ Predictable and standard directory naming conventions may indicate specific technologies are being used

    § For example, Java servlets are often served at web paths like /servlet/name

§ A few other modern and common cases include:

    § /rails/ for Ruby on Rails applications

    § /pls/ for Oracle applications and SQL gateways
37

Analyzing Session Tokens


§ Certain session token names (present in cookies) may indicate specific web technologies are being used by the application:

    § Java uses JSESSIONID

    § PHP uses PHPSESSID

    § The IIS webserver uses ASPSESSIONID

    § Whereas ASP.NET uses ASP.NET_SessionId

    § Django uses the more generic sessionid
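
A small sketch of how these names could be checked against the cookies a site sets. Assumptions are labeled in the code: matching is by case-insensitive prefix because ASPSESSIONID cookies typically carry a random suffix, and sessionid is Django's default cookie name, which is generic enough to be weak evidence on its own.

```python
# (cookie name prefix, technology hint). The names follow the slide, plus
# Django's default "sessionid". Prefix matching is an assumption made here
# so that suffixed names like ASPSESSIONIDQACBTQBR are still recognised.
SESSION_COOKIE_HINTS = [
    ("JSESSIONID", "Java"),
    ("PHPSESSID", "PHP"),
    ("ASP.NET_SessionId", "ASP.NET"),
    ("ASPSESSIONID", "IIS"),
    ("sessionid", "Django (generic name, weak evidence)"),
]


def fingerprint_cookies(cookie_names):
    """Return {cookie name: technology hint} for recognised session cookies."""
    findings = {}
    for cookie in cookie_names:
        for prefix, technology in SESSION_COOKIE_HINTS:
            if cookie.lower().startswith(prefix.lower()):
                findings[cookie] = technology
                break
    return findings


print(fingerprint_cookies(["PHPSESSID", "theme", "ASPSESSIONIDQACBTQBR"]))
```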


38

Analysis Example #1

https://wahh-app.com/calendar.jsp?name=new%20applicants
&isExpired=0&startDate=22%2F09%2F2010
&endDate=22%2F03%2F2011&OrderBy=name

Example taken from the Web Application Hacker’s Handbook 2nd Edition
Stuttard & Pinto, Wiley Press
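
A reasonable first step with a URL like this is to decode it and list each parameter, since every one is a separate piece of user input to reason about. The sketch below uses only the standard library; the variable names are arbitrary.

```python
from urllib.parse import parse_qsl, urlsplit

url = (
    "https://wahh-app.com/calendar.jsp?name=new%20applicants"
    "&isExpired=0&startDate=22%2F09%2F2010"
    "&endDate=22%2F03%2F2011&OrderBy=name"
)

parts = urlsplit(url)
params = dict(parse_qsl(parts.query))   # percent-decoding is applied

print(parts.path)                        # /calendar.jsp
for name, value in params.items():
    print(f"{name} = {value}")
```

Decoded, startDate and endDate read as 22/09/2010 and 22/03/2011, and name contains a space, which is easy to miss in the encoded form.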
39

Analysis Example #2

https://wahh-app.com/workbench.aspx?template=NewBranch.tpl
&loc=/default&ver=2.31&edit=false

Example taken from the Web Application Hacker’s Handbook 2nd Edition
Stuttard & Pinto, Wiley Press
40

Analysis Example #3

POST /feedback.php HTTP/1.1
Host: wahh-app.com
Content-Length: 389

[email protected]&[email protected]&subject=
Problem+logging+in&message=Please+help...

Example taken from the Web Application Hacker’s Handbook 2nd Edition
Stuttard & Pinto, Wiley Press
41

Let’s break!
See You Next Time
