
How to Scrape Text from Tag in Python

Last Updated : 03 Jul, 2024

In this article, we scrape text data from <strong> tags: we collect all the text that appears inside the <strong> tags of a web page. We cover the basics with clear and concise examples.

Scraping Text from Tag

Scraping text from HTML tags can be done in two ways: with Selenium, which drives a real browser, or with the requests library combined with BeautifulSoup.

With Selenium, we use the By class to locate all <strong> elements on the page and then loop over them to print their text. With the second method, we first download the page's HTML using the requests library, then parse it with Python's BeautifulSoup library, and finally print the text of each <strong> tag.
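To make the "parse the HTML, then print the text of each <strong> tag" step concrete without installing anything, here is a minimal sketch that uses Python's built-in html.parser module instead of BeautifulSoup. This is a swapped-in, dependency-free technique for illustration only, not one of the two methods covered in this article; the class name StrongTextParser and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class StrongTextParser(HTMLParser):
    """Hypothetical helper: collect the text inside every <strong> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <strong> tags we are currently inside
        self.current = []   # text fragments of the <strong> being read
        self.results = []   # finished <strong> texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "strong" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the fragments and collapse runs of whitespace.
                self.results.append(" ".join("".join(self.current).split()))
                self.current = []

    def handle_data(self, data):
        # Only keep text that appears inside a <strong> element.
        if self.depth > 0:
            self.current.append(data)


parser = StrongTextParser()
parser.feed("<p>ML has <strong>three</strong> types: "
            "<strong>Supervised <em>learning</em></strong>.</p>")
parser.close()
print(parser.results)  # ['three', 'Supervised learning']
```

BeautifulSoup does this bookkeeping for you and copes far better with messy real-world HTML, which is why the article relies on it.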

Scraping Text from Tag using Selenium

In this method, we use Selenium to fetch the text from the <strong> tags. Let's look at the code.

Example

  1. The By class provides the locator strategies (such as By.TAG_NAME, By.ID, and By.CSS_SELECTOR) that Selenium uses to find web elements.
  2. Chrome configuration options:
    • --headless runs Chrome without a visible GUI.
    • --no-sandbox disables Chrome's sandbox, which is usually required when Chrome runs as root, for example inside a Docker container.
    • --disable-dev-shm-usage makes Chrome write its shared-memory files to /tmp instead of /dev/shm, which is often too small in containers.
  3. We then pass the target URL to the WebDriver's get() method, collect all <strong> elements with find_elements(), and print the text of each one in a for loop.
  4. Finally, we close the browser with quit().
Python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Configure Chrome to run without a GUI (useful on servers and in containers)
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

dr = webdriver.Chrome(options=options)
try:
    # Load the page we want to scrape
    dr.get("https://www.geeksforgeeks.org/machine-learning-types-of-artificial-intelligence/")

    # Locate every <strong> element and print its visible text
    st = dr.find_elements(By.TAG_NAME, 'strong')
    for i in st:
        print(i.text)
finally:
    # Always close the browser, even if an error occurs
    dr.quit()

Output

[Screenshot: Selenium web scrape text]

Scraping Text from Tag using BeautifulSoup

In this method, we scrape the text from the <strong> tags using Python's requests library and BeautifulSoup.

Example

First, we import the installed libraries. Then we fetch the HTML content of the webpage using Python's requests module, parse it with BeautifulSoup, and use the find_all() method to collect every <strong> tag. Finally, we print the text of each tag.
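The find_all() call and the .text attribute can be tried without any network access. The sketch below parses a small hard-coded HTML string (made up for illustration, standing in for a downloaded page) and shows one detail worth knowing: .text keeps a tag's surrounding whitespace, while get_text(strip=True) on its own glues the pieces of a tag with nested children together, so passing a separator such as " " gives cleaner output.

```python
from bs4 import BeautifulSoup

# Hard-coded HTML standing in for a downloaded page (illustrative only).
html = """
<p>There are <strong>three</strong> main types:</p>
<ul>
  <li><strong>  Supervised learning  </strong></li>
  <li><strong>Unsupervised <em>learning</em></strong></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
strong_tags = soup.find_all("strong")

# .text returns the raw text, including surrounding whitespace.
raw = [tag.text for tag in strong_tags]

# strip=True trims each text fragment; with no separator, the fragments
# of a tag with nested children are joined with nothing between them.
glued = [tag.get_text(strip=True) for tag in strong_tags]

# Passing " " as the separator keeps nested fragments apart.
clean = [tag.get_text(" ", strip=True) for tag in strong_tags]

print(raw)    # ['three', '  Supervised learning  ', 'Unsupervised learning']
print(glued)  # ['three', 'Supervised learning', 'Unsupervisedlearning']
print(clean)  # ['three', 'Supervised learning', 'Unsupervised learning']
```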

Python
# Import the required libraries
import requests
from bs4 import BeautifulSoup

# Fetch the HTML content of the given URL, parse it,
# and print the text of every <strong> tag on the page
def strongText(url):
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # raise an error on 4xx/5xx responses such as 404

    soup = BeautifulSoup(r.content, 'html.parser')

    # find_all() returns a list of all <strong> elements
    st = soup.find_all('strong')

    # Display the text of each <strong> tag
    for data in st:
        print(data.text)


if __name__ == "__main__":
    # URL of the page to scrape
    url = 'https://www.geeksforgeeks.org/machine-learning-types-of-artificial-intelligence/'

    # Call the function
    strongText(url)

Output

[Screenshot: BeautifulSoup web scrape text]
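Calling find_all('strong') on the whole document also returns <strong> tags from navigation bars, sidebars, or footers if a page uses them there. One way to narrow the search is BeautifulSoup's CSS-selector method select(). The sketch below assumes a hypothetical layout in which the main content sits inside an <article> element; inspect the real page to find the right container before relying on this.

```python
from bs4 import BeautifulSoup

# Hypothetical page layout: the main content is wrapped in <article>,
# while the sidebar also uses <strong> for its own labels.
html = """
<aside><strong>Trending</strong></aside>
<article>
  <h1>Types of Artificial Intelligence</h1>
  <p><strong>Narrow AI</strong> is designed for specific tasks.</p>
  <p><strong>General AI</strong> is a broader concept.</p>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() searches the entire document, sidebar included.
everywhere = [tag.get_text(" ", strip=True) for tag in soup.find_all("strong")]

# select() takes a CSS selector; "article strong" matches only <strong>
# elements that are descendants of an <article> element.
in_article = [tag.get_text(" ", strip=True) for tag in soup.select("article strong")]

print(everywhere)  # ['Trending', 'Narrow AI', 'General AI']
print(in_article)  # ['Narrow AI', 'General AI']
```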

Best Practices for Web Scraping

  • Follow ethical rules when scraping. Do not scrape sensitive or private information from any website.
  • Store the scraped data efficiently, for example in a database or in structured files such as CSV or JSON.
  • Avoid getting blocked by the website, for example by spacing out requests at randomized intervals or by using proxies.
  • Write your code so that it handles errors such as HTTP 404 (Page Not Found).
  • Do not overload the server by sending too many requests in a short period; add delays between requests.
  • Follow each website's scraping guidelines, such as its robots.txt file and terms of service, before scraping the data.
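One concrete way to follow a site's guidelines and pace your requests is to check its robots.txt rules before fetching pages. Python's standard library provides urllib.robotparser for this. The sketch below parses a hypothetical robots.txt supplied as a list of lines so it runs offline; against a real site you would call set_url() with the site's robots.txt URL and then read(). The user agent "my-scraper", the example.com URLs, and the helper jittered_delay() are placeholders for illustration.

```python
import random
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (placeholder, not from any real site).
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
]

rp = RobotFileParser()
rp.parse(robots_lines)

agent = "my-scraper"  # placeholder user-agent string
urls = [
    "https://example.com/articles/types-of-ai/",
    "https://example.com/private/account",
]

# Keep only the URLs that the rules allow this agent to fetch.
allowed = [u for u in urls if rp.can_fetch(agent, u)]

# Respect Crawl-delay if the file sets one; otherwise fall back to 1 second.
base_delay = rp.crawl_delay(agent) or 1


def jittered_delay(base, jitter=1.0):
    """Hypothetical helper: base seconds plus up to `jitter` random extra seconds."""
    return base + random.uniform(0, jitter)


# In a real scraping loop you would pause before each request, e.g.:
# for url in allowed:
#     time.sleep(jittered_delay(base_delay))
#     ...fetch and parse url...

print(allowed)     # ['https://example.com/articles/types-of-ai/']
print(base_delay)  # 2
```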

Conclusion

Web scraping is an efficient way to extract the data we need from a website, including text, files, links, and more. However, we need to follow some ethical rules before scraping. In this article, we covered how to scrape the text of <strong> tags using two methods: in the first, we used Selenium, and in the second, we used BeautifulSoup together with Python's requests library. Both methods offer a clear and concise way to scrape the data.

