Web Scraping Cheat Sheet
Frank Andrade

XPath
We need to learn XPath to scrape with Selenium or Scrapy.
“Siblings” are nodes with the same parent.
It’s recommended for beginners to use IDs to find elements and, if there isn't one, to build an XPath.
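As a rough illustration of the XPath ideas above, here is a minimal sketch that uses Python's standard-library ElementTree as a stand-in. ElementTree understands only a subset of XPath, whereas Selenium and Scrapy accept full XPath expressions. The sample markup, class names, and variable names are hypothetical and are not taken from the cheat sheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not the page from the cheat sheet.
SAMPLE = """
<html>
  <body>
    <article class="main-article">
      <h1>Sample title</h1>
      <p class="plot">Sample plot summary.</p>
      <div class="full-script">Sample transcript text.</div>
    </article>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE.strip())

# ElementTree paths are relative, so "//tag" is written ".//tag" here.
article = root.find(".//article[@class='main-article']")
title = root.find(".//article/h1")
transcript = root.find(".//div[@class='full-script']")

# The children of <article> share one parent, so they are siblings.
sibling_tags = [child.tag for child in article]

print(title.text)
print(transcript.text)
print(sibling_tags)
```

In Selenium or Scrapy the same lookups would be written with a leading `//`, for example `//article[@class="main-article"]`.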
Let's take a look at the HTML element syntax.
Element parts: tag name, attribute name, attribute value, end tag.
Let’s check some examples to locate the article, title, and transcript elements of the HTML code we used before.

Fetch the pages
import requests
result = requests.get("https://www.google.com")  # the URL must include the scheme
result.status_code  # get status code
result.headers  # get the headers
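Fetching a page with requests can be wrapped in a small helper. This is a sketch only: the function name `fetch` and the 10-second timeout are assumptions, not part of the cheat sheet. It also shows what happens when the URL has no scheme, which requests rejects before any network traffic is sent.

```python
import requests


def fetch(url, timeout=10):
    """Return (status_code, headers) for url, or None if the request fails."""
    try:
        result = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        # Covers a missing scheme (no "https://"), connection errors, and timeouts.
        print(f"request failed: {exc!r}")
        return None
    return result.status_code, dict(result.headers)


# No scheme: requests raises MissingSchema, which is caught above.
no_scheme = fetch("www.google.com")
print(no_scheme)
```

With network access, `fetch("https://www.google.com")` would return the status code and a dictionary of response headers.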
Find an element
driver.find_element(by="id", value="...")  # selenium 4
Find elements
driver.find_elements(by="xpath", value="...")  # selenium 4
Getting the text
data = element.text
Implicit Waits
import time
time.sleep(2)  # a fixed 2-second pause; Selenium's built-in implicit wait is driver.implicitly_wait(2)
Options: Headless mode, change window size

import scrapy
class ExampleSpider(scrapy.Spider):
    name = 'example'  # assumed value; Scrapy needs a name to run the spider
    allowed_domains = ['example.com']
    start_urls = ['http://example.com/']
The class is built with the data we introduced in the previous command, but the parse method needs to be built by us. To build it, use the functions below.
Finding elements
To find elements in Scrapy, use the response argument from the parse method.
response.xpath('//tag[@AttributeName="Value"]')
title = response.xpath('//h1/text()').get()