Creating A Web Crawler in 3 Steps: Issac Goldstand Mirimar Networks
Issac Goldstand
[email protected]
Mirimar Networks
http://www.mirimar.net/
The 3 steps
• Creating the User Agent
• Creating the content parser
• Tying it together
Step 1 – Creating the User Agent
• libwww-perl (LWP)
• OO interface for creating user agents that interact with remote websites and web applications
• We will look at LWP::RobotUA
Creating the LWP Object
• User agent
• Cookie jar
• Timeout
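As a minimal sketch of how these three settings might be passed to a plain LWP::UserAgent constructor (the bot name and the 30-second timeout are illustrative assumptions, not values from the talk):

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical values: the bot name and timeout are for illustration only.
my $ua = LWP::UserAgent->new(
    agent      => 'MyBot/1.0',  # sent as the User-Agent request header
    cookie_jar => {},           # empty hash ref: an in-memory HTTP::Cookies jar
    timeout    => 30,           # seconds to wait before a request gives up
);

# Each setting can be read back through the accessor of the same name.
printf "agent=%s timeout=%d\n", $ua->agent, $ua->timeout;
```

Passing a hash ref with a file key to cookie_jar should make the jar persistent, but an in-memory jar is usually enough for a single crawl.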
Robot UA extras
• Robot rules
• Delay
• use_sleep
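The three extras above can be sketched as follows. The contact address, the example.com URLs, and the sample robots.txt text are assumptions made for illustration. In normal use LWP::RobotUA fetches robots.txt itself, so the rules object is fed by hand here only to keep the check offline.

```perl
use strict;
use warnings;
use LWP::RobotUA;

# Hypothetical identity: use your own bot name and a real contact address.
my $ua = LWP::RobotUA->new('MyBot/1.0', 'crawler-admin@example.com');

$ua->delay(10/60);  # minimum gap between requests to one host, in MINUTES (10 s here)
$ua->use_sleep(1);  # true: sleep until the delay has passed; false: get a 503 response back

# Robot rules: parse a sample robots.txt by hand, then ask what is allowed.
my $rules = $ua->rules;
$rules->parse('http://www.example.com/robots.txt',
              "User-agent: *\nDisallow: /private/\n");

for my $url ('http://www.example.com/index.html',
             'http://www.example.com/private/data.html') {
    printf "%s: %s\n", $url, $rules->allowed($url) ? 'allowed' : 'blocked';
}
```

Note that delay is measured in minutes and defaults to one minute, which is easy to mistake for seconds.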
Implementation of Step 1
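use strict;
use warnings;
use LWP::RobotUA;
# Arguments: the bot's agent name, then the operator's contact e-mail address
my $ua = LWP::RobotUA->new('MyBot/1.0', '[email protected]');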