Google Dorks: Analysis, Creation, and new Defenses
Flavio Toffalini, University of Verona, IT
Maurizio Abbà, LastLine, UK
Damiano Carra, University of Verona, IT
Davide Balzarotti, Eurecom, FR
GOOGLE DORKS
MOTIVATION
Attackers use dorks to quickly locate targets
After a new vulnerability is disclosed, one Google query is
sufficient to identify a large number of vulnerable installations
No time for sysadmins to apply patches!
If we could prevent dorks, attackers would need to
resort to Internet scanning, which is several orders
of magnitude slower
GOALS
Current practices
Understand which information is used by existing dorks
Design simple solutions to defeat those dorks
Future threats
Test if attackers could move towards new styles of dorks
Design simple solutions to prevent it
GOOGLE DORKS
TAXONOMY
The Exploit-DB database contains over 5143 dorks
Automated/manual analysis
URL Patterns (44%)
File Extensions (6%)
Content-Based (74%)
  Banners (54%)
  Misconfigurations (8%)
  Error messages (1%)
  Common words (11%)
DORKS EVOLUTION BY CATEGORY
[Figure: dork evolution by category; series: URL Patterns, Misconfiguration, Banner, Common words]
KNOWN DEFENSES
URL Patterns
File Extensions
Content-Based
  Banners: remove banners
  Misconfigurations: improve system configuration
  Error messages: proper error handling
  Common words
CONTRIBUTION
URL Patterns: ??
File Extensions
Content-Based
  Banners: remove banners
  Misconfigurations: improve system configuration
  Error messages: proper error handling
  Common words: ??
URL-DORKS
Force search engines to index randomized URLs
Let the users navigate and share using cleartext URLs
http://www.web-site.com/wp-content/dimva.html
http://www.web-site.com/HD12DAF35TR/dimva.html
URL-DORKS
XOR (part of) the URL with a random seed kept on the server
a = resource a
O(a) = obfuscated resource a
301 redirect to inform the search engine that the page has moved
Canonical URL tag to remove plain URLs from the results
Intercept and replace the sitemap
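The XOR step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the seed value, the SHA-256 key stretching, the hex encoding, and the choice to obfuscate only the first path segment are all assumptions made for the example.

```python
import hashlib

SECRET_SEED = b"server-side-secret"  # hypothetical seed; it never leaves the server


def _keystream(seed: bytes, length: int) -> bytes:
    """Stretch the seed into `length` key bytes (assumed construction)."""
    key = hashlib.sha256(seed).digest()
    return (key * (length // len(key) + 1))[:length]


def obfuscate(segment: str, seed: bytes = SECRET_SEED) -> str:
    """XOR a URL segment with the key and hex-encode it so it stays URL-safe."""
    data = segment.encode("utf-8")
    xored = bytes(d ^ k for d, k in zip(data, _keystream(seed, len(data))))
    return xored.hex().upper()


def deobfuscate(token: str, seed: bytes = SECRET_SEED) -> str:
    """Inverse of obfuscate(): hex-decode, then XOR with the same key."""
    data = bytes.fromhex(token)
    plain = bytes(d ^ k for d, k in zip(data, _keystream(seed, len(data))))
    return plain.decode("utf-8")


def obfuscate_url_path(path: str, seed: bytes = SECRET_SEED) -> str:
    """Obfuscate only the first directory of an absolute path (part of the URL)."""
    parts = path.split("/")  # "/wp-content/dimva.html" -> ["", "wp-content", "dimva.html"]
    if len(parts) > 2 and parts[1]:
        parts[1] = obfuscate(parts[1], seed)
    return "/".join(parts)
```

Because XOR is its own inverse, the server can recover the cleartext path from an obfuscated request using only the seed, without storing a mapping table.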
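OBFUSCATION PROTOCOL - CRAWLERS
Participants: Crawler, URL Obfuscator, Web Site
1. Crawler -> URL Obfuscator: a
2. URL Obfuscator -> Crawler: Redir. 301 to O(a)
3. Crawler -> URL Obfuscator: O(a)
4. URL Obfuscator -> Web Site: a
5. Web Site -> URL Obfuscator: resp. of a
6. URL Obfuscator -> Crawler: resp. of a + canonical tag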
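OBFUSCATION PROTOCOL - BROWSER
Participants: Browser, URL Obfuscator, Web Site
1. Browser -> URL Obfuscator: O(a)
2. URL Obfuscator -> Web Site: a
3. Web Site -> URL Obfuscator: resp. of a
4. URL Obfuscator -> Browser: resp. of a
5. A later request for a cleartext URL b reaches the Web Site as b
6. Web Site -> URL Obfuscator: resp. of b
7. URL Obfuscator -> Browser: resp. of b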
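The crawler and browser exchanges can be sketched as a single request handler. This is a toy model under stated assumptions: crawlers are recognized by a User-Agent substring, obfuscated paths carry a hypothetical "/~" prefix, the whole path is XORed rather than a part of it, and `SITE` is a stand-in dictionary for the real web site. None of these details come from the slides.

```python
SEED = b"server-side-secret"                # hypothetical secret kept on the server
CRAWLER_MARKERS = ("Googlebot", "bingbot")  # assumed crawler detection by User-Agent
PREFIX = "/~"                               # hypothetical marker for obfuscated paths
SITE = {  # stand-in for the web site: cleartext path -> HTML
    "/wp-content/dimva.html": "<html><head></head><body>DIMVA</body></html>",
}


def _xor(data: bytes) -> bytes:
    return bytes(b ^ SEED[i % len(SEED)] for i, b in enumerate(data))


def O(path: str) -> str:
    """Obfuscated form O(a) of a cleartext path a (toy encoding: prefix + hex)."""
    return PREFIX + _xor(path.encode("utf-8")).hex()


def O_inverse(path: str) -> str:
    """Recover the cleartext path a from O(a)."""
    return _xor(bytes.fromhex(path[len(PREFIX):])).decode("utf-8")


def handle_request(path: str, user_agent: str):
    """Return (status, headers, body) as the URL obfuscator would answer."""
    is_crawler = any(marker in user_agent for marker in CRAWLER_MARKERS)
    obfuscated = path.startswith(PREFIX)
    if is_crawler and not obfuscated:
        # Crawler asked for cleartext a: tell it the page moved to O(a).
        return 301, {"Location": O(path)}, ""
    try:
        clear = O_inverse(path) if obfuscated else path
    except ValueError:  # malformed token (bad hex or bad UTF-8)
        return 404, {}, ""
    body = SITE.get(clear)  # forward the cleartext request to the web site
    if body is None:
        return 404, {}, ""
    if is_crawler:
        # Crawlers also get a canonical tag pointing at the obfuscated URL.
        tag = f'<link rel="canonical" href="{path}">'
        body = body.replace("<head>", "<head>" + tag, 1)
    return 200, {}, body
```

In this sketch a browser is served both cleartext and obfuscated URLs without redirects, so links shared between users keep working, while a crawler only ever ends up indexing the obfuscated form.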
URL Patterns
File Extensions
Content-Based
  Banners: remove banners
  Misconfigurations: improve system configuration
  Error messages: proper error handling
  Common words: ??
WORD-BASED DORKS
Goal
Use words left by CMSs to create a Google dork
Greedy search algorithm that maximizes:
Hit rank: percentage of returned websites built with the target technology
Coverage: number of results returned by the dork
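The two metrics can be computed along these lines. The sketch assumes that each inspected result has already been labeled with the technology it runs (how that fingerprinting is done is out of scope here) and that coverage is the total result count reported by the search engine; the `Result` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Result:
    url: str
    technology: str  # label assigned by some fingerprinting step (assumed)


def hit_rank(sample: Sequence[Result], target: str) -> float:
    """Fraction of the inspected results that run the target technology."""
    if not sample:
        return 0.0
    return sum(r.technology == target for r in sample) / len(sample)


def score_dork(sample: Sequence[Result], reported_total: int,
               target: str) -> Tuple[float, int]:
    """Return (hit rank, coverage) for one dork."""
    return hit_rank(sample, target), reported_total
```

For example, with made-up data, a sample in which 3 of 4 inspected results run WordPress yields a hit rank of 0.75, regardless of how many results the engine reports in total.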
WORD-BASED DORKS: CREATION
[Figure: example for Joomla!. Vanilla installation -> word list (Categories, Buy, Recent, Register, Submit, Users, Contact, Registration) -> compute hit rank & coverage -> Category + Submit + ....]
WORD-BASED DORKS: CREATION
Gradient Ascent algorithm
How to add a new word?
At each step, we add the word that provides the highest hit
rank among those whose coverage is above the
median of all candidate words
(more details in the paper)
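The selection rule can be sketched as a greedy loop. This is an illustration under assumptions, not the algorithm from the paper: `evaluate` is a hypothetical callback returning (hit rank, coverage) for a list of words, the median test uses "at or above" so that at least one candidate always qualifies, and the loop stops when the hit rank no longer improves or `max_words` is reached. The toy corpus in `PAGES` is invented for the example.

```python
from statistics import median
from typing import Callable, List, Sequence, Tuple

Score = Tuple[float, int]  # (hit rank, coverage)


def greedy_dork(candidates: Sequence[str],
                evaluate: Callable[[List[str]], Score],
                max_words: int = 5) -> Tuple[List[str], float]:
    dork: List[str] = []
    best_rank = 0.0
    remaining = list(candidates)
    while remaining and len(dork) < max_words:
        # Score every possible one-word extension of the current dork.
        scored = [(w, *evaluate(dork + [w])) for w in remaining]
        cutoff = median(cov for _, _, cov in scored)
        eligible = [s for s in scored if s[2] >= cutoff]  # ">=": assumed tie handling
        word, rank, _ = max(eligible, key=lambda s: s[1])
        if rank <= best_rank:  # assumed stopping rule: no improvement
            break
        dork.append(word)
        remaining.remove(word)
        best_rank = rank
    return dork, best_rank


# Toy corpus (invented): the words on each page and the technology behind it.
PAGES = [
    ({"categories", "register", "submit"}, "joomla"),
    ({"categories", "register", "contact"}, "joomla"),
    ({"categories", "submit", "buy"}, "joomla"),
    ({"categories", "buy", "contact"}, "other"),
    ({"register", "contact"}, "other"),
    ({"buy", "submit"}, "other"),
]


def toy_evaluate(words: List[str]) -> Score:
    """A page matches when it contains all the words of the dork."""
    matches = [tech for page, tech in PAGES if set(words) <= page]
    if not matches:
        return 0.0, 0
    return matches.count("joomla") / len(matches), len(matches)


dork, rank = greedy_dork(
    ["categories", "register", "submit", "buy", "contact"], toy_evaluate)
```

In a real setting `evaluate` would issue the query to a search engine and fingerprint a sample of the results, which makes each step expensive; the median filter keeps the search away from words that raise the hit rank only by shrinking the result set.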
WORD-BASED DORKS:
            Common Words              Ground Truth
            Hit rank    Coverage      Hit rank    Coverage
WordPress   938/1000    47.1 M        967/1000    83.6 M
Joomla!     878/1000    7.24 M        887/1000    3.73 M
Drupal      827/1000    7.87 M        997/1000    3.27 M
Magento     871/1000    0.39 M        852/1000    0.68 M
OpenCart    891/1000    0.59 M        998/1000    1.42 M
WORD-BASED DORKS: DEFENSES
Idea: add invisible characters to break words and
prevent them from being indexed.
Powered by WordPress
Power⁣ed b⁣y Wor⁣dPress
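The word-breaking idea can be sketched as below. The choice of U+2063 (INVISIBLE SEPARATOR) and the rule of splitting each word in the middle are assumptions made for illustration; the slides do not say which character is used or where it is placed, and how a given search engine tokenizes such text is not something this sketch can guarantee.

```python
INVISIBLE_SEPARATOR = "\u2063"  # assumed character; renders as nothing in browsers


def break_word(word: str) -> str:
    """Insert one invisible character in the middle of a word."""
    if len(word) < 2:
        return word
    mid = len(word) // 2
    return word[:mid] + INVISIBLE_SEPARATOR + word[mid:]


def break_words(text: str) -> str:
    """Break every space-separated word of a string, e.g. a CMS banner."""
    return " ".join(break_word(w) for w in text.split(" "))


banner = break_words("Powered by WordPress")
```

The page looks unchanged to a visitor, but the literal strings "Powered" and "WordPress" no longer appear in the served HTML, which is what a word-based dork relies on.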
DORKS DEFENSES
URL Patterns
File Extensions
Content-Based
  Banners: remove banners
  Misconfigurations: improve system configuration
  Error messages: proper error handling
  Common words
CONCLUSION
1) Dork classification
2) URL Pattern Dork Defense
3) New type of Dork using common words
4) Defense against common word dorks