URL handling Python modules (urllib)

Python is used extensively for web programming. When we browse a website, we use its web address, also known as a URL (Uniform Resource Locator). Python has built-in modules that can handle calls to a URL as well as parse the result that comes back from visiting it. In this article we will look at the urllib module and the functions it provides for fetching and processing the result from a URL.

Installing urllib

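urllib is part of the Python 3 standard library, so it is available as soon as Python is installed and does not need to be installed with pip. The package is made up of the submodules urllib.request, urllib.parse, urllib.error and urllib.robotparser, each of which is imported by name, for example with import urllib.request.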

Opening a URL

The urllib.request.urlopen function is used to visit a URL and fetch its content into the Python environment. The content is returned as bytes.

Example

import urllib.request
address = urllib.request.urlopen('https://www.tutorialspoint.com/')
print(address.read())

Output

Running the above code gives us the following result −

b'<!DOCTYPE html>\r\n<!--[if IE 8]><html class="ie ie8"> <![endif]-->\r\n<!--[if IE 9]><html class……..
……………
……………….
new Date());\r\ngtag(\'config\', \'UA-232293-6\');\r\n</script>\r\n</body>\r\n</html>\r\n'
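
In practice the response is usually closed explicitly and its bytes are decoded to text. The following is a minimal sketch of one way to do that, not the only one. The helper name fetch_text is hypothetical, UTF-8 is assumed as the fallback when the server declares no charset, and the data: URL in the last line is used only so that the snippet runs without network access.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Fetch a URL and return its body as a string (hypothetical helper)."""
    # The with-statement closes the response even if reading fails.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header; assume UTF-8 if absent.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# urlopen also understands data: URLs, which allows an offline demonstration.
print(fetch_text("data:text/plain;charset=utf-8,Hello%20urllib"))
```

Calling fetch_text('https://www.tutorialspoint.com/') instead should return the same page as the example above, as a str rather than bytes.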

urllib.parse

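The urllib.parse module helps us build and inspect URLs. We can use urllib.parse.urlencode to turn a dictionary into a query string and pass it along with the request, for example to a site's search option. When the encoded data is supplied to urllib.request.Request, the request is sent as a POST. We can then read the response and print it in full.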
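
As a rough sketch of the parsing side on its own, the snippet below checks whether a string looks like an HTTP(S) URL and attaches an encoded query string for a GET request. The helper names looks_like_http_url and with_query are hypothetical, and the check is purely syntactic: it does not contact the server.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def looks_like_http_url(url):
    """Return True if the URL has an http(s) scheme and a host part."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def with_query(url, params):
    """Return url with its query string replaced by the encoded params."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


print(looks_like_http_url("https://tutorialspoint.com"))  # True
print(looks_like_http_url("tutorialspoint.com"))          # False: no scheme
print(with_query("https://tutorialspoint.com/", {"q": "python urllib"}))
```

Note that with_query discards any query string the URL already had; merging the two would need urllib.parse.parse_qsl as well.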

Example

import urllib.request
import urllib.parse
url = 'https://tutorialspoint.com'
values = {'q': 'python'}
# Encode the dictionary as a query string, then convert it to bytes
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')  # data should be bytes
print(data)
# Supplying data makes this a POST request
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
print(resp)
respData = resp.read()
print(respData)

Output

Running the above code gives us the following result −

b'q=python'
<http.client.HTTPResponse object at 0x00000195BF706850>
b'<!DOCTYPE html>\r\n<!--[if IE 8]><html class="ie ie8"> <![endif]…………
…………………
\r\n</script>\r\n</body>\r\n</html>\r\n'
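
A request does not always succeed, so it can help to catch the exceptions defined in urllib.error before using the response. The sketch below is one possible pattern under stated assumptions: try_fetch is a hypothetical helper, and the deliberately bogus scheme in the last lines is there only to trigger the error path without any network access.

```python
import urllib.error
import urllib.request


def try_fetch(url, timeout=10):
    """Return (ok, detail): body bytes on success, an error message otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return False, f"HTTP error {err.code}: {err.reason}"
    except urllib.error.URLError as err:
        # No usable answer: DNS failure, refused connection, unknown scheme.
        return False, f"URL error: {err.reason}"


ok, detail = try_fetch("nosuchscheme://example.invalid/")
print(ok, detail)
```

HTTPError is a subclass of URLError, so it has to be caught first; otherwise the more general handler would swallow it.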

urllib.parse.urlsplit

urlsplit takes a URL and splits it into parts that can be used for further data manipulation. For example, to check programmatically whether a URL uses HTTPS, we can apply urlsplit and inspect the scheme value. In the example below we look at the different parts of the supplied URL.

Example

import urllib.parse
url = 'https://tutorialspoint.com/python'
value = urllib.parse.urlsplit(url)
print(value)

Output

Running the above code gives us the following result −

SplitResult(scheme='https', netloc='tutorialspoint.com', path='/python', query='', fragment='')
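
The individual parts can also be read as attributes of the result. The snippet below is a small sketch of that; the URL is a hypothetical one chosen so that every component is present. Keep in mind that the scheme only tells us which protocol the URL asks for, not whether the site's certificate is actually valid.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL chosen to exercise every component.
url = "https://tutorialspoint.com:8443/python/index.htm?q=urllib&page=2#intro"
parts = urlsplit(url)

print(parts.scheme == "https")  # True when the URL uses HTTPS
print(parts.hostname)           # host name without the port
print(parts.port)               # port as an int, or None if absent
print(parts.path)
print(parse_qs(parts.query))    # query string as a dict of lists
print(parts.fragment)
```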