
Ruby Mechanize Cheat Sheet

The document summarizes how to use the Mechanize gem to programmatically control a web browser and interact with web pages and forms. It describes how to configure the agent, access and submit forms, navigate pages and select elements, and provides examples of common tasks like form filling, link clicking, and extracting page content.

Uploaded by

Attila Gáspár
Copyright
© Attribution Non-Commercial (BY-NC)

The Agent

require 'rubygems'
require 'mechanize'

# Current releases use Mechanize.new; the 2010 original
# of this sheet used WWW::Mechanize.new
agent = Mechanize.new

# Disable keep_alive when running into problems with
# session timeouts or getting an EOFError
agent.keep_alive = false

# Setting the user agent:
agent.user_agent = 'Friendly Mechanize Script'

# Using one of the predefined user agents:
# 'Mechanize', 'Mac Mozilla', 'Linux Mozilla',
# 'Windows IE 6', 'iPhone', 'Linux Konqueror',
# 'Windows IE 7', 'Mac FireFox', 'Mac Safari',
# 'Windows Mozilla'
agent.user_agent_alias = 'Mac Safari'

# To verify server certificates:
# (A collection of certificates is available
# here: http://curl.haxx.se/ca/ )
agent.ca_file = 'cacert.pem'

# Don't follow HTTP redirects
agent.redirect_ok = false

# Follow refresh in meta tags
agent.follow_meta_refresh = true

# Enable logging
require 'logger'
agent.log = Logger.new('mechanize.log')

# The current page
agent.page

Accessing page elements

# The HTML page content
page.body

# Forms, links, frames
page.forms, page.links, page.frames

# Selecting by criteria follows the pattern:
# page.element(s)_with(:criteria => value)
# The plural form (.elements) returns an
# array, the singular form (.element) the
# first matching element or nil. Criteria
# is an attribute symbol and value may be
# a string or a regular expression. If no
# criteria attribute is given, :name will
# be used. E.g.:
page.form_with(:name => 'formName')
page.form_with('formName')
page.links_with(:text => /[0-9]+/)

Forms

# Submitting a form without a button:
form.submit

# Submitting a form with the default button
form.click_button()

# Submitting a form with a specific button
form.click_button(form.button_with(:name => 'OK'))

# Form elements
form.fields, form.buttons, form.file_uploads,
form.radio_buttons, form.checkboxes

# Form elements can be selected just like page elements
# form.element(s)_with(:criteria => value)
# e.g.:
form.field_with(:name => 'password')
form.field_with('password')
form.checkboxes_with(:value => /view_.*/)

# Field values can also be selected directly by their name
form.password = 'secret'

# Setting field values
# field       : .value = 'something'
# checkbox    : .(un)check / .checked = true|false
# radio_button: .(un)check / .checked = true|false
# file_upload : .file_name = '/tmp/upload.dat'
# e.g.:
form.field_with('foo').value = 'something'
form.checkbox_with(:value => 'blue').uncheck
form.radio_buttons[3].check

# Select lists / drop down fields:
form.field_with('color').options[2].select
form.field_with('color').options.find{|o| o.value == 'red'}.select
form.field_with('color').select_none
form.field_with('color').select_all

Ruby / Mechanize
http://mechanize.rubyforge.org/mechanize/

Nokogiri
http://nokogiri.org/

Navigation/History

# Load a page
agent.get('http://the.internet.net')

# Go back to the last page:
agent.back

# Follow a link by its text
agent.link_with(:text => 'click me').click

# Backup history, execute block and
# restore history
agent.transact do ... end

Hello Mechanize!

require 'rubygems'
require 'mechanize'

# Current releases use Mechanize.new; the 2010 original
# of this sheet used WWW::Mechanize.new
agent = Mechanize.new
agent.get('http://rubyforge.org/')
agent.page.forms.first.words = 'mechanize'
agent.page.forms.first.click_button
agent.page.link_with(:text => /WWW::Mechanize/).click
agent.page.link_with(:text => 'Files').click
links = agent.page / 'strong/a'
version = links.find do |link|
  link['href'] =~ /shownotes.*release_id/
end.text
puts "Hello Mechanize #{version}!"

Parsing the page content

# Selecting elements from the document's DOM
nodes = agent.page.search('expression')
nodes = agent.page / 'expression'

# Selecting the first matching element or nil
node = agent.page.at('expression')

# 'expression' might be an XPath or CSS selector
nodes = agent.page.search('//h2/a[@class="title"]')
nodes = agent.page.search('h2 a.title')

# Navigating the document tree:
node.parent
node.children

# Node content and attributes
node.text
node.inner_html
node.attributes['width']

# Found nodes can be searched the same way
rows = agent.page / 'table/tr'
value = rows[0].at('td[@class="value"]').text

Version 2010-01-30 (c) 2010 Tobias Grimm

Creative Commons License http://creativecommons.org/licenses/by/3.0
