How can BeautifulSoup be used to extract ‘href’ links from a website?

BeautifulSoup is a third-party Python library for parsing HTML and XML documents. It is commonly used in web scraping, which is the process of extracting data from web pages so that it can be processed further.

Web scraping is used to collect data for research, to track and compare market trends, to monitor SEO, and so on.

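The following command installs BeautifulSoup, together with the requests library that the example uses to download the page. The same command works on Windows, Linux, and macOS −

pip install beautifulsoup4 requests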

Following is an example −

Example

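from bs4 import BeautifulSoup
import requests

# Page whose links will be extracted
url = "https://en.wikipedia.org/wiki/Algorithm"
# Download the page's HTML
req = requests.get(url)
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(req.text, "html.parser")
print("The href links are :")
# Visit every <a> tag and print its href attribute (None if the tag has none)
for link in soup.find_all('a'):
    print(link.get('href'))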

Output

The href links are :
…
https://stats.wikimedia.org/#/en.wikipedia.org
https://foundation.wikimedia.org/wiki/Cookie_statement
https://wikimediafoundation.org/
https://www.mediawiki.org/
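
The printed values are exactly what the page's markup contains, so they can include relative paths and fragment links as well as full URLs. The following is a minimal sketch of one way to turn them into absolute URLs with the standard library's urljoin. The sample markup, the BASE_URL constant, and the absolute_links helper are hypothetical names introduced only for illustration, and inline HTML is used so the sketch runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed base address: the URL of the page the links were taken from.
BASE_URL = "https://en.wikipedia.org/wiki/Algorithm"

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<a href="/wiki/Data_structure">relative path</a>
<a href="#History">fragment link</a>
<a href="https://www.mediawiki.org/">absolute URL</a>
<a name="top">anchor without an href</a>
"""


def absolute_links(html, base_url):
    """Return every href in html, resolved against base_url."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for link in soup.find_all("a"):
        href = link.get("href")
        if href is None:
            # The tag has no href attribute, so there is nothing to resolve.
            continue
        links.append(urljoin(base_url, href))
    return links


resolved = absolute_links(SAMPLE_HTML, BASE_URL)
for url in resolved:
    print(url)
```

Links that are already absolute pass through urljoin unchanged, so the same helper can be applied to every value without checking its form first.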

Explanation

The required packages are imported.
The URL of the web page is defined.
The page is requested with ‘requests.get’, and its HTML is read from the response.
The ‘BeautifulSoup’ constructor parses the HTML using the ‘html.parser’ parser.
The ‘find_all’ method finds every ‘a’ (anchor) tag in the parsed page.
The ‘href’ value of each tag is read with ‘get’ and printed on the console.
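
An anchor tag does not always carry an ‘href’ attribute, and for such tags ‘get’ returns None. As a minimal sketch, passing href=True to ‘find_all’ restricts the search to tags that have the attribute. The sample markup and the extract_hrefs helper below are hypothetical and exist only to keep the example self-contained.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <a href="https://www.mediawiki.org/">MediaWiki</a>
  <a name="top">Anchor without href</a>
  <a href="/wiki/Algorithm">Algorithm</a>
</body></html>
"""


def extract_hrefs(html):
    """Return the href of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True matches only tags on which the attribute is present.
    return [link["href"] for link in soup.find_all("a", href=True)]


# For comparison: without the filter, the tag with no href yields None.
unfiltered = [
    link.get("href")
    for link in BeautifulSoup(SAMPLE_HTML, "html.parser").find_all("a")
]

hrefs = extract_hrefs(SAMPLE_HTML)
print(hrefs)
print(unfiltered)
```

Because every matched tag is known to have the attribute, the helper can index with link["href"] instead of calling ‘get’.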