How to Scrape Text from <strong> Tag in Python
Last Updated: 03 Jul, 2024
In this article, we will scrape text from the <strong> tags of a web page, collecting all the text that appears inside them. We will cover the basics with clear and concise examples.
Scraping Text from Tag
Scraping text from HTML tags can be easily done by using:
- Selenium
- BeautifulSoup
With Selenium, we use the By class to locate all the <strong> elements and then a for loop to display their text. In the second method, we request the HTML content of the web page with the requests library, parse it with Python's BeautifulSoup library, and finally display the text.
Scraping Text from Tag using Selenium
In this method, we will use Selenium to fetch the text from the <strong> tags. Let's see the code implementation.
Example
- The By class provides the set of locator strategies that help us find web elements.
- Chrome configuration options:
  - --headless runs Chrome without a GUI.
  - --no-sandbox disables Chrome's sandbox, which avoids startup problems in some restricted environments such as containers.
  - --disable-dev-shm-usage makes Chrome write its shared memory files to a temporary directory instead of /dev/shm, which helps when /dev/shm is too small.
- Then, we pass our desired URL to the web driver object and display the text under the strong tags using a for loop.
- At the end, we close the browser.
Python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Configure Chrome to run without a GUI
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

dr = webdriver.Chrome(options=options)
try:
    dr.get("https://www.geeksforgeeks.org/machine-learning-types-of-artificial-intelligence/")
    # Collect every <strong> element on the page and print its text
    st = dr.find_elements(By.TAG_NAME, 'strong')
    for i in st:
        print(i.text)
finally:
    # Close the browser even if an error occurs
    dr.quit()
Output
(Screenshot: Selenium web scrape text)
Scraping Text from Tag using BeautifulSoup
In this method, we will scrape the text from the <strong> tags using Python's requests library and Beautiful Soup.
Example
We first import the installed libraries. Then we fetch the HTML content of the web page using Python's requests module and parse it with the BeautifulSoup library. We use the find_all() function to find every <strong> tag, and finally we display the text of each tag and return from the function.
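Before fetching a live page, the parsing step can be tried on its own. The following is a minimal sketch, assuming a small inline HTML string (SAMPLE_HTML) and a helper named strong_texts(), both of which are hypothetical and introduced here only for illustration. It uses get_text(strip=True) so that surrounding whitespace is removed from each result.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration
SAMPLE_HTML = """
<html><body>
  <p>Machine learning has <strong>three main types</strong>.</p>
  <p><strong> Supervised learning </strong> uses labeled data.</p>
  <p>This paragraph has no bold text.</p>
</body></html>
"""


def strong_texts(html):
    """Return the stripped text of every <strong> tag in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all("strong")]


print(strong_texts(SAMPLE_HTML))
# ['three main types', 'Supervised learning']
```

Returning a list instead of printing inside the function makes the result easy to reuse, for example when saving it to a file.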
Python
# Import the necessary libraries
import requests
from bs4 import BeautifulSoup


def strongText(url):
    # Fetch the HTML content of the page
    r = requests.get(url)
    # Parse the HTML with Python's built-in parser
    TextData = BeautifulSoup(r.content, 'html.parser')
    # Find every <strong> tag and display its text
    st = TextData.find_all('strong')
    for data in st:
        print(data.text)


if __name__ == "__main__":
    # Input URL
    url = 'https://www.geeksforgeeks.org/machine-learning-types-of-artificial-intelligence/'
    # Call the function
    strongText(url)
Output
(Screenshot: BeautifulSoup web scrape text)
Best Practices of Web Scraping
- Follow ethical rules while scraping the data. Do not scrape any sensitive or private information from any website.
- To store the scraped data, use efficient data storage techniques such as databases or structured files such as CSV or JSON.
- Make sure that websites do not block you. You can use randomized intervals or proxies to avoid getting blocked.
- Write your code so that it handles errors such as 404 (page not found) instead of crashing.
- Do not overload the server by making too many requests in a short period. Keep some delays between your requests.
- Make sure you follow the guidelines suggested by the websites before scraping the data.
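Several of these practices can be combined in a small loop. The sketch below is one possible arrangement, not a definitive implementation: scrape_all(), write_csv(), write_json(), and the fetch parameter are hypothetical names introduced here. It assumes fetch(url) returns a list of strings and raises an exception when a page cannot be retrieved. The loop pauses for a random interval between requests, skips pages that fail, and stores the results as CSV or JSON using only the standard library.

```python
import csv
import io
import json
import random
import time


def scrape_all(urls, fetch, sleep=time.sleep, min_s=1.0, max_s=3.0):
    """Call fetch(url) for each URL, pausing between requests and skipping failures."""
    rows = []
    for index, url in enumerate(urls):
        if index > 0:
            # Randomized delay so the server is not hit in rapid succession
            sleep(random.uniform(min_s, max_s))
        try:
            texts = fetch(url)
        except Exception as exc:
            # fetch is a placeholder, so any exception is caught here; with the
            # requests library one would catch requests.RequestException instead
            print(f"Skipping {url}: {exc}")
            continue
        rows.extend({"url": url, "text": text} for text in texts)
    return rows


def write_csv(rows, fileobj):
    """Write the scraped rows to an open text file as CSV with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=["url", "text"])
    writer.writeheader()
    writer.writerows(rows)


def write_json(rows, fileobj):
    """Write the scraped rows to an open text file as a JSON array."""
    json.dump(rows, fileobj, indent=2)


if __name__ == "__main__":
    def fake_fetch(url):
        # Stand-in for a real scraper so the demo runs without network access
        if url.endswith("/missing"):
            raise RuntimeError("404 page not found")
        return ["first bold text", "second bold text"]

    demo_urls = ["https://example.com/a", "https://example.com/missing"]
    demo_rows = scrape_all(demo_urls, fake_fetch, min_s=0.01, max_s=0.02)
    buffer = io.StringIO()
    write_csv(demo_rows, buffer)
    print(buffer.getvalue())
```

Passing the fetch function and the sleep function as parameters keeps the loop independent of any particular scraping library and makes it easy to try out without sending real requests.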
Conclusion
Web scraping is an efficient way to collect the desired data from a website. We can scrape text, files, links, and much more, although we need to consider some ethical rules before scraping the data. In this article, we covered how to scrape the text from <strong> tags using two methods. In the first method, we used Selenium, whereas in the second method, we used BeautifulSoup along with Python's requests library.