DEFCON XVII July 31-Aug 2, 2009
Screen Scraper Tricks: Difficult cases
Las Vegas, Nevada
[email protected]
TODAY'S AGENDA
Review basic screen scraper theory
Define what constitutes a difficult case
Demo some screen scraper tricks
Look at ideas for large-scale deployment
Share a heartwarming moment, featuring CAPTCHAs!
AGENDA
DEFCON XVII July 31-Aug 2, 2009
Screen Scraper Tricks: Difficult cases
Las Vegas, Nevada
Goals of this Talk
Gain an understanding of some unusual (but useful) web scraping techniques
You're not going to walk away from here with ready-made solutions
The goal is to expose you to some new ideas that you can apply to your specific situation
Technologies & Tools Discussed
For the purposes of this discussion,
the solutions have to meet three criteria:
#1. Completely customizable (hackable)
#2. Free (or Open Source)
#3. Platform independent
Michael Schrenk
BIO:
Minneapolis-based bot writer, consultant & author
(Soon to be) Las Vegas-based
Work for clients in North America, Asia & Europe
Active in my local DEFCON group DC612
My DEFCON History
Talks:
Introduction to Writing Spiders & Agents
Online Corporate Intelligence
The Fabulous Executable Image Exploit
Today's talk: Screen Scraper Tricks: Difficult Cases
My book: Webbots, Spiders, and Screen Scrapers
No Starch Press, San Francisco, 2007
Traditional strategies are not obsolete
Downloading, parsing, form submission
Authentication, stealth, fault tolerance, etc.
I won't spend a lot of time discussing these things
Supplement traditional approaches with what you learn today
Why are Screen Scrapers Important?
Browsers (alone) are deficient
Browsers are manual, error-prone & time-consuming tools
Browsers do not make decisions for you
Browsers are not proactive
You won't excel by just doing what everyone else does
Webbots & screen scrapers offer competitive advantages
Review of traditional screen scraping
Download a web page
Manage cookies
Facilitate (SSL) encryption
Handle server redirection
Hide your identity with proxies & random timing
Emulate form submission
Parse information from web pages & take action
FREE DOWNLOAD: These tasks (except proxy functions) can be coded with the free PHP code libraries from my book
http://www.schrenk.com/nostarch/webbots/DSP_download.php
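The traditional tasks above map fairly directly onto PHP's cURL extension. The sketch below assumes the curl extension is installed; fetch_page() and random_delay() are hypothetical helper names (not the functions from the book's libraries), and the proxy address in the comment is only an example.

```php
<?php
// Minimal cURL fetcher sketch: cookies, SSL, redirects, optional proxy, form emulation.
function fetch_page(string $url, string $cookie_file, ?array $post = null, ?string $proxy = null): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,          // return the page instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // handle server redirection
        CURLOPT_MAXREDIRS      => 5,             // but don't follow redirect loops forever
        CURLOPT_COOKIEJAR      => $cookie_file,  // cookies are written here when the handle is released...
        CURLOPT_COOKIEFILE     => $cookie_file,  // ...and sent back on later requests
        CURLOPT_SSL_VERIFYPEER => true,          // SSL: verify the server's certificate
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_TIMEOUT        => 60,
    ]);
    if ($post !== null) {
        // Emulate a form submission with a URL-encoded POST body
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
    }
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy); // e.g. "127.0.0.1:8118" (assumed local proxy)
    }
    $body = curl_exec($ch);
    return [
        'body'      => ($body === false) ? '' : $body,
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), // where redirects ended up
        'error'     => curl_error($ch),
    ];
}

// Random timing: choose how many seconds to pause before the next request
function random_delay(int $min_seconds, int $max_seconds): int
{
    return random_int($min_seconds, $max_seconds);
}

// Usage (needs network access):
// sleep(random_delay(2, 8));
// $page = fetch_page('https://example.com/login', '/tmp/cookies.txt', ['user' => 'me', 'pass' => 'secret']);
```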
What constitutes a difficult case?
Either by design or by accident, web pages have become harder for webbots and screen scrapers to use.
Interstitial web pages
Commonly used by travel sites when there is
a long delay between a database query and a
result set.
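One common way an interstitial page hands off to the real result is a meta refresh tag. This sketch assumes that mechanism (sites that poll with JavaScript need a different approach, such as the browser-based one discussed later); parse_meta_refresh() is a hypothetical helper that returns the delay and the next URL so a scraper can wait and then request the result page.

```php
<?php
// Hypothetical helper: find a <meta http-equiv="refresh" content="N; url=..."> tag
// and return the delay (seconds) and next URL, or null if the page has none.
function parse_meta_refresh(string $html): ?array
{
    if (trim($html) === '') {
        return null;                              // DOMDocument rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);             // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strcasecmp($meta->getAttribute('http-equiv'), 'refresh') !== 0) {
            continue;
        }
        // content looks like "5; url=/results?id=42" (the url part is optional)
        if (preg_match('/^\s*(\d+)\s*(?:;\s*url\s*=\s*["\']?([^"\']+))?/i', $meta->getAttribute('content'), $m)) {
            return ['delay' => (int)$m[1], 'url' => isset($m[2]) ? trim($m[2]) : null];
        }
    }
    return null;
}

// Usage (network access required): wait as long as a browser would, then follow the URL
// $next = parse_meta_refresh($interstitial_html);
// if ($next !== null) { sleep($next['delay']); /* request $next['url'] */ }
```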
JavaScript
When used to dynamically modify forms
before submission
Usually solved with my book's online form
analyzer.
www.schrenk.com/nostarch/webbots/form_analyzer.php
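If you want to see what a JavaScript-modified form actually sends, one approach is to point the form's action at a local script (for example, by saving the page and editing its action attribute), submit it in a real browser, and have the script report what arrived. The script below is a hypothetical stand-in for that idea, not the book's online analyzer. Note that PHP converts dots and spaces in incoming field names to underscores, so compare names carefully.

```php
<?php
// Hypothetical form-analyzer sketch: format the method and fields a browser submitted
function describe_submission(string $method, array $fields): string
{
    $lines = ["METHOD: " . strtoupper($method)];
    foreach ($fields as $name => $value) {
        // var_export makes empty values and stray whitespace visible
        $lines[] = $name . " = " . var_export($value, true);
    }
    return implode("\n", $lines) . "\n";
}

if (PHP_SAPI !== 'cli') {
    // Running under a web server: report exactly what the form sent, as plain text
    header('Content-Type: text/plain');
    $fields = ($_SERVER['REQUEST_METHOD'] === 'POST') ? $_POST : $_GET;
    echo describe_submission($_SERVER['REQUEST_METHOD'], $fields);
}
```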
JavaScript
AJAX used to populate pages
Example: after the first page of search results, you can't use View Source to see the results
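When results are loaded by AJAX, the data usually comes from a separate request that you can often call directly once you've found it (for instance, by watching the browser's traffic through a proxy). In this sketch, the endpoint path, the q and page parameters, the JSON shape, and both function names are assumptions for illustration.

```php
<?php
// Build the URL for one page of (assumed) AJAX search results
function ajax_page_url(string $endpoint, string $query, int $page): string
{
    return $endpoint . '?' . http_build_query(['q' => $query, 'page' => $page]);
}

// Decode an assumed response shape: {"results":[{"title":..,"url":..}], "has_more": true}
function parse_ajax_results(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['results']) || !is_array($data['results'])) {
        return ['items' => [], 'has_more' => false];   // unexpected response shape
    }
    $items = [];
    foreach ($data['results'] as $row) {
        $items[] = ['title' => $row['title'] ?? '', 'url' => $row['url'] ?? ''];
    }
    return ['items' => $items, 'has_more' => !empty($data['has_more'])];
}
```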
Flash
When used as a navigation technique
DHTML
When used as a navigation technique
Elaborate cookie behavior
Sequence-dependent cookies
Strange JavaScript
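Sequence-dependent cookies break scrapers that request pages out of order or in parallel: page B expects the cookie page A set, and page C may overwrite it. This minimal, hypothetical cookie jar shows the idea of applying Set-Cookie headers strictly in the order responses arrive; it ignores Domain, Path, and Expires, which is only reasonable for a single site.

```php
<?php
// Minimal single-site cookie jar sketch (ignores Domain, Path and Expires)
final class SimpleCookieJar
{
    private array $cookies = [];

    // Record cookies from raw response headers, in the order they arrive
    public function absorb(array $headers): void
    {
        foreach ($headers as $header) {
            if (stripos($header, 'Set-Cookie:') !== 0) {
                continue;
            }
            $pair = trim(explode(';', substr($header, 11), 2)[0]);  // "name=value"
            [$name, $value] = array_pad(explode('=', $pair, 2), 2, '');
            $this->cookies[trim($name)] = $value;                    // later values overwrite earlier ones
        }
    }

    // Build the Cookie header for the next request in the sequence
    public function header(): string
    {
        $parts = [];
        foreach ($this->cookies as $name => $value) {
            $parts[] = $name . '=' . $value;
        }
        return 'Cookie: ' . implode('; ', $parts);
    }
}
```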
Randomly generated form element names
<input type="submit" name="9S8DUF9S8DUFS98DFUS9D8FUS9D8FHNSIDJFSIDFJNW983FHSJEFNSKUJFNWO83FJWOSEJKFNSKU3FHS9A38FHIWwe832">
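Because randomly generated names change on every page load, the scraper has to read them from the freshly downloaded form instead of hard-coding them. This sketch (current_field_names() is a hypothetical name) uses PHP's DOM extension and assumes you can tell the fields apart by input type and position.

```php
<?php
// Return the current names of all <input> elements of a given type, in page order
function current_field_names(string $html, string $type): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);            // tolerate sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $names = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//input[@type="' . $type . '"]') as $input) {
        $names[] = $input->getAttribute('name');
    }
    return $names;
}

// Usage: build the POST data from whatever names this page load generated
// $submit = current_field_names($form_html, 'submit');
// $post   = [$submit[0] => 'Go'];
```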
FACT: We're still tied to the browser
Sometimes you can fool a server into delivering simpler data formats by pretending to be a mobile device.
Often, though, you need to find a way to emulate browser capabilities while maintaining full control.
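Pretending to be a mobile device usually means sending a phone-like User-Agent header. The UA string below is only an example, and whether a particular server returns a simpler page for it is an assumption you'll need to test; mobile_context() is a hypothetical helper built on PHP's standard stream contexts.

```php
<?php
// Build a stream context that identifies the request as coming from a phone browser
function mobile_context(string $user_agent = 'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1')
{
    $options = [
        'http' => [
            'method'          => 'GET',
            'header'          => "User-Agent: $user_agent\r\nAccept: text/html\r\n",
            'follow_location' => 1,              // follow redirects to the mobile site
            'timeout'         => 30,
        ],
    ];
    return stream_context_create($options);
}

// Usage (network access required):
// $html = file_get_contents('http://example.com/', false, mobile_context());
```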
Browser Macros
Browser plug-in
Readily available
Solves all the difficult cases mentioned
Easily extended (hacked) beyond its intended use
iMacros solves all of the difficult cases because an actual browser is used. A few additional hacks make it a serious screen scraper tool.
INSTALL iMacros
Search for the iMacros add-on at addons.mozilla.org
RECORDING A MACRO
Once iMacros is installed, start the add-on and press Record
Enter the URL
Fill in the form and press Save
Press Stop
PLAYING A MACRO
Find the #Current.iim macro and press Play
Your macro will replay!
Switch to demo
This is a REALLY SIMPLE demo!
You'll need to trust me that it also works in a much more complex environment (i.e., a difficult case)!
The Macro File (file_name.iim)
VERSION BUILD=6230608 RECORDER=FX
TAB T=1
URL GOTO=http://www.google.com/
URL GOTO=http://localhost/defcon17/simple_form.php
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:name CONTENT=Michael<SP>Schrenk
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:address CONTENT=1725<SP>West<SP>Lilac<SP>Drive
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:city CONTENT=Minneapolis
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:state CONTENT=MN
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:zip CONTENT=55423
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:simple_form ATTR=NAME:save&&VALUE:Save
Where tags can't be identified (e.g., in Flash), X/Y coordinates can be used instead.
Dynamic Macro Creation
1. Create a macro template (text file)
2. Run a PHP program to convert the template into a macro
3. Run the macro
Creating the Template File
VERSION BUILD=6230608 RECORDER=FX
TAB T=1
URL GOTO=http://www.google.com/
URL GOTO=http://localhost/defcon17/simple_form.php
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:name CONTENT=#_NAME_#
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:address CONTENT=#_ADDRESS_#
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:city CONTENT=#_CITY_#
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:state CONTENT=#_STATE_#
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:zip CONTENT=#_ZIP_#
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:simple_form ATTR=NAME:save&&VALUE:Save
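Substituting Variables
<?php
// Get variables (from somewhere, more on this later);
// the example values below are the ones from the recorded macro
$name    = "Michael Schrenk";
$address = "1725 West Lilac Drive";
$city    = "Minneapolis";
$state   = "MN";
$zip     = "55423";

// iMacros CONTENT values can't contain raw spaces, so encode them as <SP>
function imacros_encode($value)
    {
    return str_replace(" ", "<SP>", $value);
    }

$macro = file_get_contents("macro.proto");                       // read the template
$macro = str_replace("#_NAME_#",    imacros_encode($name),    $macro);
$macro = str_replace("#_ADDRESS_#", imacros_encode($address), $macro);
$macro = str_replace("#_CITY_#",    imacros_encode($city),    $macro);
$macro = str_replace("#_STATE_#",   imacros_encode($state),   $macro);
$macro = str_replace("#_ZIP_#",     imacros_encode($zip),     $macro);
file_put_contents("macro.iim", $macro);                          // write the finished macro
?>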
Write the Dynamic Macro file
Use this substitution technique to dynamically:
1. Program form field values
2. Change the website URL
3. Change delay times
4. Change destination files
5. Change status message values
6. Etc., etc., etc.
Use the programmability to:
1. Create loops
2. Change data sources
3. Send status messages to a central server
4. Etc., etc.
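As a sketch of the "create loops" and "change data sources" ideas, the following fills the template once per CSV record and writes one macro file per record. The CSV layout (a header row whose column names match the placeholders, e.g. name,address,city,state,zip), the macro_N.iim file naming, and the function names are all assumptions.

```php
<?php
// iMacros CONTENT values use <SP> in place of spaces
function imacros_value(string $value): string
{
    return str_replace(' ', '<SP>', $value);
}

// Replace every #_COLUMN_# placeholder with the record's (encoded) value
function build_macro(string $template, array $record): string
{
    $map = [];
    foreach ($record as $column => $value) {
        $map['#_' . strtoupper($column) . '_#'] = imacros_value((string)$value);
    }
    return strtr($template, $map);
}

// Loop over a CSV file (first row = column names) and write one .iim file per record
function write_macros(string $template, string $csv_file, string $out_dir): array
{
    $written = [];
    $fh = fopen($csv_file, 'r');
    if ($fh === false) {
        return $written;
    }
    $header = fgetcsv($fh, 0, ',', '"', '\\');
    $n = 0;
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if ($header === false || count($row) !== count($header)) {
            continue;                              // skip blank or malformed lines
        }
        $path = $out_dir . '/macro_' . (++$n) . '.iim';
        file_put_contents($path, build_macro($template, array_combine($header, $row)));
        $written[] = $path;
    }
    fclose($fh);
    return $written;
}
```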
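Launching iMacros (macro) from PHP
<?php
// Work out which platform the harvester is running on (PHP 7.2+)
$os = strtolower(PHP_OS_FAMILY);
$macro_url = "http://run.imacros.net/?m=macro_name.iim";

if ($os == "linux")
    {
    // Start Firefox in the background and give it time to load
    system("firefox http://www.google.com > /dev/null 2>&1 &");
    sleep(5);
    // Hand the iMacros run URL to the running browser
    system("firefox " . escapeshellarg($macro_url) . " > /dev/null 2>&1 &");
    }
else
    {
    // Windows: start /B launches Firefox without creating a new console window
    system("start /B firefox " . $macro_url);
    }
?>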
Launching iMacros (macro) from cron
I've had better luck launching iMacros (as a scheduled task) from a batch file (Windows) or a Bash script (Linux).
If scheduled on a Linux system, remember to specify a display for the browser:
DISPLAY=:0 php /pathname/php_program.php
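Setting the display can also be done inside the PHP program itself, so every process it starts with system() inherits it. This sketch assumes the harvester's X server is display :0; ensure_display() is a hypothetical helper name.

```php
<?php
// Make sure DISPLAY is set before launching Firefox from a cron-started PHP program
function ensure_display(string $default = ':0'): string
{
    $display = getenv('DISPLAY');
    if ($display === false || $display === '') {
        putenv('DISPLAY=' . $default);   // child processes started later inherit this
        $display = $default;
    }
    return $display;
}

// Usage: call once at the top of php_program.php, before any system("firefox ...") call
// ensure_display();
```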
iMacros Hints
Always dedicate a browser to iMacros use.
If you don't use the commercial version of iMacros, use Firefox.
Make sure that iMacros is activated in the browser before launching a macro.
Preferred iMacros Header commands
'##################################################
' Set maximum web page time out (seconds)
SET !TIMEOUT 240
' Tell iMacros to ignore errors and keep running
SET !ERRORIGNORE YES
' Clear cookies (and the browser cache)
CLEAR
' Initialize browser tab 1, close all other tabs
TAB T=1
TAB CLOSEALLOTHERS
' Tell iMacros to ignore images (nice if using Tor)
FILTER TYPE=IMAGES STATUS=ON
' Turn off the popup that displays extracted data
SET !EXTRACT_TEST_POPUP NO
'##################################################
A complete iMacros command reference is available at:
wiki.imacros.net/Command_Reference
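If your PHP program generates macros, one way to keep the preferred header consistent is to prepend it automatically. with_standard_header() is a hypothetical helper, and the commands are the ones listed above with the comments omitted.

```php
<?php
// The preferred header commands, kept in one place
const IMACROS_HEADER = <<<'IIM'
SET !TIMEOUT 240
SET !ERRORIGNORE YES
CLEAR
TAB T=1
TAB CLOSEALLOTHERS
FILTER TYPE=IMAGES STATUS=ON
SET !EXTRACT_TEST_POPUP NO
IIM;

// Prepend the header to a generated macro, unless it already has one
function with_standard_header(string $macro_body): string
{
    if (strpos($macro_body, 'SET !ERRORIGNORE') !== false) {
        return $macro_body;                     // header already present
    }
    return IMACROS_HEADER . "\n" . rtrim($macro_body, "\r\n") . "\n";
}
```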
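Let's look at where the data can come from
A Firefox/iMacros-equipped harvester (XP, Ubuntu) works between a central server and the target website(s):
1. The harvester periodically asks the central server for instructions.
2. The central server tells the harvester what to do.
3. The harvester's iMacros macro requests data from the target website(s), saves the screens, and parses the results.
4. The harvester updates the central server.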
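One harvester cycle from the flow above might look like the following sketch. The JSON fields (action, macro, job_id), the harvester_cycle() name, and the three callables are hypothetical; in practice the callables would wrap HTTP requests to your central server and the iMacros launcher shown earlier.

```php
<?php
// Ask for instructions, run the job, report back; returns the outcome of this cycle
function harvester_cycle(callable $get_instructions, callable $run_macro, callable $report): string
{
    $instructions = json_decode($get_instructions(), true);
    if (!is_array($instructions) || ($instructions['action'] ?? 'idle') === 'idle') {
        return 'idle';                           // nothing to do; poll again later
    }
    if ($instructions['action'] === 'run') {
        $ok = (bool)$run_macro($instructions['macro'] ?? '');
        $report(['job_id' => $instructions['job_id'] ?? null, 'status' => $ok ? 'done' : 'failed']);
        return $ok ? 'done' : 'failed';
    }
    return 'unknown';                            // an action this harvester doesn't understand
}

// Usage (hypothetical): poll forever with a randomized pause between cycles
// while (true) { harvester_cycle($get, $run, $report); sleep(random_int(30, 90)); }
```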
Large scale deployment
(challenges traditional thoughts regarding hosting)
Many harvesters send website requests to the target website(s) and receive raw websites back.
The central server sends the harvesters instructions or software updates.
The harvesters send data and/or scraping status back to the central server.
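On the central server's side, work has to be split across many harvesters. This sketch uses simple round-robin assignment as an illustrative choice (a real deployment might weight by harvester speed or by target site); assign_jobs(), instructions_for(), the harvester IDs, and the job_N.iim macro naming are hypothetical.

```php
<?php
// Spread pending jobs across harvesters round-robin: harvester ID => list of job IDs
function assign_jobs(array $job_ids, array $harvester_ids): array
{
    $harvester_ids = array_values($harvester_ids);
    $plan = array_fill_keys($harvester_ids, []);
    $count = count($harvester_ids);
    if ($count === 0) {
        return $plan;                            // no harvesters have checked in yet
    }
    foreach (array_values($job_ids) as $i => $job_id) {
        $plan[$harvester_ids[$i % $count]][] = $job_id;
    }
    return $plan;
}

// Build the JSON a harvester receives when it asks for instructions
function instructions_for(string $harvester_id, array $plan): string
{
    $jobs = $plan[$harvester_id] ?? [];
    if ($jobs === []) {
        return json_encode(['action' => 'idle']);
    }
    return json_encode(['action' => 'run', 'job_id' => $jobs[0], 'macro' => 'job_' . $jobs[0] . '.iim']);
}
```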
Advanced iMacros Hacks
The first example was a very straightforward iMacros example.
iMacros also has some JavaScript-like scripting capability (in the paid version).
iMacros has limited parsing and data extraction capability.
While it solves many problems, without further hacking iMacros leaves you with many (or most) browser limitations.
Suppose you could execute an iMacros macro in one browser tab...
And then open another browser tab to act on the data iMacros downloaded and:
Parse data
Read/Write to a database
Pass data back to the iMacros macro
Or, anything else
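This is a Python sketch of the database half of that idea. It assumes the second-tab program stores what it parses in SQLite and passes values back to the macro as a one-column CSV. The table name `scraped`, the helper names, and the sample values are hypothetical; the talk's own helper was a PHP program.

```python
import csv
import io
import sqlite3


def store_values(conn, values):
    """Write parsed values to a hypothetical 'scraped' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS scraped (value TEXT NOT NULL)")
    conn.executemany("INSERT INTO scraped (value) VALUES (?)", [(v,) for v in values])
    conn.commit()


def export_for_macro(conn, out):
    """Write one value per CSV row so a macro can read each back as column 1."""
    writer = csv.writer(out)
    for (value,) in conn.execute("SELECT value FROM scraped ORDER BY rowid"):
        writer.writerow([value])


conn = sqlite3.connect(":memory:")
store_values(conn, ["A1B2C3", "D4E5F6"])  # sample values for illustration
buffer = io.StringIO()
export_for_macro(conn, buffer)
print(buffer.getvalue())
```

In practice, `export_for_macro` would write to a real file opened with `open(path, "w", newline="")`, placed in the folder the macro reads its datasource from.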
Let's finish our first example. When we get to this point:
Create a 2nd tab
Launch a local PHP program in Apache
Parse the web page
Return the access code
Complete the form submission in the original tab
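The talk's parsing step is a PHP program, simple_parse.php, whose source is not shown. A rough Python analogue using only the standard library's `html.parser` might read the saved PARSE_FILE.html, pull out the access code, and write it to data.csv. The element id `access_code_display` is a hypothetical placeholder: the real page's markup is unknown, so adapt the match to fit it.

```python
import csv
from html.parser import HTMLParser


class AccessCodeParser(HTMLParser):
    """Captures the first text chunk inside the element whose id is TARGET_ID."""

    TARGET_ID = "access_code_display"  # hypothetical: adjust to the real page's markup

    def __init__(self):
        super().__init__()
        self._inside = False
        self.code = None

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.TARGET_ID:
            self._inside = True

    def handle_endtag(self, tag):
        self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip() and self.code is None:
            self.code = data.strip()


def extract_access_code(html_text):
    """Return the access code found in html_text, or None if it is missing."""
    parser = AccessCodeParser()
    parser.feed(html_text)
    parser.close()
    return parser.code


def parse_saved_page(html_path, csv_path):
    """Read the page saved by the macro and write the code as a one-column CSV."""
    with open(html_path, encoding="utf-8", errors="replace") as f:
        code = extract_access_code(f.read())
    if code is None:
        raise ValueError("access code not found in saved page")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([code])
    return code
```

For example, `parse_saved_page("PARSE_FILE.html", "data.csv")` would leave a single-row CSV that the macro can read back through its datasource.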
Switch to demo #2
You need to trust me that it will also work in a more complex environment (i.e., a difficult case)!
This code was added to the original iMacros macro

'# SAVE A COPY OF THE WEBPAGE TO FILE SYSTEM
SAVEAS TYPE=HTM FOLDER=* FILE=PARSE_FILE.html
'# OPEN A NEW TAB FOR THE PARSING SOFTWARE
TAB OPEN
TAB T=2
URL GOTO=http://localhost/defcon17/simple_parse.php
'
'# READ THE PARSED RESULTS
TAB T=1
CMDLINE !DATASOURCE data.csv
SET !DATASOURCE_COLUMNS 1
SET !DATASOURCE_LINE {{!LOOP}}
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:simple_form ATTR=NAME:access_code CONTENT={{!COL1}}
WAIT SECONDS=5
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:simple_form ATTR=NAME:save&&VALUE:Save

SAVEAS: Saves a copy of the screen data to a file in the /iMacros/Downloads directory.
TAB OPEN, TAB T=2, URL GOTO: Opens the second tab, then loads and runs the file simple_parse.php on a local installation of Apache. This program:
Reads the previously stored file
Parses the access code
Stores it in an iMacros (CSV) data file
TAB T=1: Return to first tab
CMDLINE !DATASOURCE and SET !DATASOURCE_...: Read (CSV) data file
TAG ... CONTENT={{!COL1}}: Insert data into form
This is a simplified example; it can also employ loops (CSV rows) and many more data fields (CSV columns).
TAG POS=1 TYPE=INPUT:SUBMIT ...: Submit form
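A few lines of Python can model the datasource lines to show which value ends up in the form. `SET !DATASOURCE_LINE {{!LOOP}}` picks a CSV row by the loop counter, and `{{!COL1}}` is that row's first column; both are counted from 1. This is an illustrative model of that behavior, not iMacros code. `datasource_value` and the sample data are hypothetical.

```python
import csv
import io


def datasource_value(csv_text, line, column):
    """Return the value a macro would see as {{!COL<column>}} on datasource line `line`.

    Both `line` and `column` are 1-based, like !LOOP and !COL1.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not 1 <= line <= len(rows):
        raise IndexError(f"datasource has no line {line}")
    row = rows[line - 1]
    if not 1 <= column <= len(row):
        raise IndexError(f"line {line} has no column {column}")
    return row[column - 1]


data_csv = "XK42\nQZ17\n"  # sample one-column data file
for loop in (1, 2):  # each pass of a looped macro reads the next row
    print(loop, datasource_value(data_csv, loop, 1))  # prints "1 XK42", then "2 QZ17"
```

With more columns, `SET !DATASOURCE_COLUMNS` would grow and the form could use `{{!COL2}}`, `{{!COL3}}`, and so on.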
Using additional tabs to run local programs enables advanced features not possible in traditional iMacros configurations:
Interrupted macros
Parse data from pages and act on results
Interface with local peripherals
Change proxy settings
Aggregate data from multiple websites
Aggregate services from multiple websites
Upload data in mid-macro
Etc., etc., etc.
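The talk runs its helper as a PHP page under a local Apache. A roughly equivalent Python sketch uses the standard library's `http.server`, so that loading a URL in a second tab triggers a local program. The paths, the port, and the `run_action` dispatch table are hypothetical placeholders for whichever local task (parsing, proxy changes, uploads) you need.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlparse


def run_action(path):
    """Hypothetical dispatch table mapping a URL path to a local action's result."""
    actions = {
        "/parse": lambda: "parsed",  # placeholder: parse a saved page, write data.csv
        "/status": lambda: "ready",  # placeholder: report that the helper is up
    }
    action = actions.get(path)
    return None if action is None else action()


class LocalHelper(BaseHTTPRequestHandler):
    """Answers GET requests from a browser tab with the action's plain-text result."""

    def do_GET(self):
        result = run_action(urlparse(self.path).path)
        body = ("unknown action" if result is None else result).encode("utf-8")
        self.send_response(404 if result is None else 200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def make_helper_server(port=8000):
    """Bind to localhost only so the helper is not reachable from the network."""
    return ThreadingHTTPServer(("127.0.0.1", port), LocalHelper)
```

Start it with `make_helper_server().serve_forever()`, then point the macro's second tab at a URL such as `URL GOTO=http://127.0.0.1:8000/parse`.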
Heartwarming moment
ReCAPTCHA
250 million CAPTCHAs executed daily
Free CAPTCHA service
30 million of these CAPTCHAs are solved daily
CAPTCHA words are scanned from old manuscripts
Solved CAPTCHAs actually digitize manuscripts
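reCAPTCHA's creators have publicly described how each challenge pairs a control word, whose answer is known, with an unknown word from a scan. Only answers from users who get the control word right count as votes for the unknown word. A simplified Python sketch of that voting idea follows. The agreement threshold, the normalization, and the class name are assumptions for illustration, not reCAPTCHA's actual parameters or code.

```python
from collections import Counter, defaultdict

AGREEMENT_THRESHOLD = 3  # assumption: matching votes needed before a reading is accepted


class WordDigitizer:
    """Simplified control-word/unknown-word voting (illustrative, not reCAPTCHA's code)."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        self.votes = defaultdict(Counter)  # unknown word id -> Counter of readings
        self.accepted = {}                 # unknown word id -> accepted reading

    def submit(self, control_answer, control_typed, unknown_id, unknown_typed):
        """Record one solved CAPTCHA; return True if the user passed the control word."""
        if control_typed.strip().lower() != control_answer.lower():
            return False  # failed control word: the unknown-word answer is discarded
        reading = unknown_typed.strip().lower()
        if unknown_id not in self.accepted:
            self.votes[unknown_id][reading] += 1
            if self.votes[unknown_id][reading] >= self.threshold:
                self.accepted[unknown_id] = reading
        return True


d = WordDigitizer()
for typed in ("morrow", "morrow", "marrow", "morrow"):  # sample user answers
    d.submit("upon", "upon", "scan-17", typed)
print(d.accepted)  # prints {'scan-17': 'morrow'}
```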
ReCAPTCHA Digitizing Success
CAPTCHA Solving Services (APIs)
There are services (APIs) that solve CAPTCHAs
Unlike OCR, these are solved by REAL people
Do a quick Google search for details
Heartwarming moment
A FEEL-GOOD WIN-WIN SITUATION!
There are CAPTCHA solving services:
CAPTCHA displayed on web page
CAPTCHA image sent to service
CAPTCHA solved by human
Embedded text sent back to requestor
Text is entered in CAPTCHA textbox
CAPTCHA solved! (Unintentional consequences)
Spammers pay to digitize old documents
People in developing nations have jobs
In conclusion
Reviewed traditional scraper theory
Described web design technologies and techniques that create difficult cases for webbot/screen scraper developers
Saw that iMacros can solve most (if not all) difficult cases through:
Absolute browser emulation
Complete control (through hacks)
Looked at managing large-scale deployments
Thank you!
Questions?
www.schrenk.com
[email protected]
twitter.com/mgschrenk