Web Scraping - YL
Agenda
● What is HTML
● URL and Page Structure: indeed.com as an example
● Hands-On
What is HTTP?
● HTTP: HyperText Transfer Protocol
○ client/server model
○ client (browser, program, curl…) opens a connection and sends a message to a server (Nginx, Apache, …)
○ server answers with a response and closes the connection
● Example HTTP Request Header
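As a rough illustration of what a request header looks like on the wire, the sketch below builds a minimal HTTP/1.1 GET request as plain text. The helper name `build_get_request`, the path `/jobs`, and the User-Agent value are assumptions made for this example; nothing is actually sent over the network.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Return the text of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",            # request line: method, path, protocol version
        f"Host: {host}",                   # the Host header is required in HTTP/1.1
        "User-Agent: example-client/0.1",  # illustrative value, not a real client
        "Accept: text/html",
        "Connection: close",               # ask the server to close after responding
    ]
    # Header lines are separated by CRLF; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_get_request("www.indeed.com", "/jobs")
print(request_text)
```

A real browser sends many more headers (cookies, accepted languages, and so on); this keeps only the essentials.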
● Example HTTP Response Header
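To make the shape of a response header concrete, here is a minimal sketch that splits one into its status line and header fields. `SAMPLE_RESPONSE` is an invented, illustrative response (not captured from a real server), and `parse_response_header` is a hypothetical helper that assumes a well-formed status line with a reason phrase.

```python
# Illustrative response header; the values are made up for this example.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Length: 1234\r\n"
    "Connection: close\r\n"
    "\r\n"
)


def parse_response_header(raw: str):
    """Split a raw response header into (version, code, reason, headers)."""
    # Everything before the first empty line is the header section.
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    # Status line: protocol version, numeric status code, reason phrase.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), reason, headers


version, code, reason, headers = parse_response_header(SAMPLE_RESPONSE)
print(code, reason)
print(headers["content-type"])
```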
HTTP codes:
● 2XX for successful requests
● 3XX for redirects
● 4XX for client errors (the most famous being 404 Not Found)
● 5XX for server errors
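The code classes above can be sketched in a few lines: the first digit of the status code decides the class. `describe_status` is a hypothetical helper written for this example; `HTTPStatus` comes from Python's standard library and supplies the reason phrases.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Map a status code to its class, based on its first digit."""
    classes = {
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 keeps only the leading digit (404 // 100 == 4).
    return classes.get(code // 100, "other")


for code in (200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", describe_status(code))
```

A scraper would typically check this class before trying to parse the body, since an error page is not the content you asked for.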
If you send this HTTP request with your web browser, the
browser parses the HTML, fetches any assets the page
references (JavaScript files, CSS files, images…), and renders
the result in the main window.
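The asset-fetching step can be approximated as follows: scan the HTML for tags that point to external files. This is only a sketch using the standard-library `html.parser`; `SAMPLE_PAGE` and its file names are invented for the example, and a real browser handles far more cases than the three tags covered here.

```python
from html.parser import HTMLParser

# Invented page used only for this illustration.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Business Analytics</title>
    <link rel="stylesheet" href="style.css">
    <script src="app.js"></script>
  </head>
  <body><img src="logo.png" alt="logo"></body>
</html>
"""


class AssetCollector(HTMLParser):
    """Collect the URLs of scripts, stylesheets, and images referenced by a page."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.assets.append(attrs["href"])


collector = AssetCollector()
collector.feed(SAMPLE_PAGE)
print(collector.assets)  # ['style.css', 'app.js', 'logo.png']
```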
What is HTML?
<title>Business Analytics</title>
Toy Example
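A toy example in the same spirit, as a minimal sketch rather than the original slide's code: pull the text out of the `<title>` element shown above using the standard-library `html.parser`. The class name `TitleParser` is an assumption made for this illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Record the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside the title element.
        if self.in_title:
            self.title += data


parser = TitleParser()
parser.feed("<title>Business Analytics</title>")
print(parser.title)  # Business Analytics
```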