Extract JSON from HTML using BeautifulSoup in Python

Last Updated : 23 Jul, 2025

In this article, we are going to extract data from an HTML page using BeautifulSoup in Python and save it as JSON.

Modules needed

bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the command below in the terminal.

    pip install bs4

requests: Requests lets you send HTTP/1.1 requests with very little code. It is also not part of the Python standard library. To install it, run the command below in the terminal.

    pip install requests

Approach:

1. Import all the required modules.
2. Pass the URL to requests.get(), which sends a GET request to that URL and returns a response.
   Syntax: requests.get(url, args)
3. Parse the HTML content using bs4.
   Syntax: BeautifulSoup(page.text, 'html.parser')
   Parameters:
   page.text : the raw HTML content.
   html.parser : the HTML parser we want to use.
4. Get the required data with the find() and find_all() functions. Locate the list of books through its li, a, and p tags, using a class or id that is unique to them. You can open the webpage in the browser, right-click the relevant element, and inspect it to see which tags and classes it uses.
5. Create a JSON file and use the json.dump() method to convert the Python objects into the corresponding JSON objects.

Below is the full implementation:

Python3

    # Import the required modules
    import json
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup


    # Function will return a list of dictionaries,
    # each containing the information of one book.
    def json_from_html_using_bs4(base_url):

        # requests.get(url) returns a response that is saved
        # in a response object called page.
        page = requests.get(base_url)

        # page.text gives us access to the web data in text
        # format. We pass it as an argument to BeautifulSoup
        # along with the html.parser, which will create a
        # parsed tree in soup.
        soup = BeautifulSoup(page.text, "html.parser")

        # soup.find_all finds the <li> elements that all have
        # the class "col-xs-6 col-sm-4 col-md-3 col-lg-3";
        # the matches are stored in books.
        books = soup.find_all(
            'li', attrs={'class': 'col-xs-6 col-sm-4 col-md-3 col-lg-3'})

        # Initialise the required variables
        star = ['One', 'Two', 'Three', 'Four', 'Five']
        res, book_no = [], 1

        # Iterate over books and check the given tags
        # to get the information of each book.
        for book in books:

            # Title of book in <img> tag with "alt" key.
            title = book.find('img')['alt']

            # Link of book in <a> tag with "href" key. The href is
            # relative, so it is resolved against the page URL.
            link = urljoin(base_url, book.find('a')['href'])

            # Rating of book from <p> tag. Start from None so a
            # book without a rating does not reuse the previous value.
            stars = None
            for index in range(5):
                find_stars = book.find(
                    'p', attrs={'class': 'star-rating ' + star[index]})

                # Check which star-rating class is not
                # returning None and then break the loop
                if find_stars is not None:
                    stars = star[index] + " out of 5"
                    break

            # Price of book from <p> tag in price_color class
            price = book.find('p', attrs={'class': 'price_color'}).text

            # Stock status of book from <p> tag in
            # instock availability class.
            instock = book.find(
                'p', attrs={'class': 'instock availability'}).text.strip()

            # Create a dictionary with the above book information
            data = {'book no': str(book_no), 'title': title,
                    'rating': stars, 'price': price,
                    'link': link, 'stock': instock}

            # Append the dictionary to the list
            res.append(data)
            book_no += 1

        return res


    # Main function
    if __name__ == "__main__":

        # Enter the url of website
        base_url = "https://books.toscrape.com/catalogue/page-1.html"

        # Function will return a list of dictionaries
        res = json_from_html_using_bs4(base_url)

        # Convert the Python objects into JSON and export
        # them to the books.json file.
        with open('books.json', 'w', encoding='latin-1') as f:
            json.dump(res, f, indent=8, ensure_ascii=False)
        print("Created Json File")

Output:

    Created Json File

Our JSON file output: books.json contains a list of objects, one per book on the page, each with the keys "book no", "title", "rating", "price", "link", and "stock".

Author: anilabhadatta

Article Tags : Python, Python BeautifulSoup, Python bs4-Exercises
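The tag lookups described in the approach can be tried offline before pointing the script at a live page. The sketch below is a minimal illustration, not the article's implementation: SAMPLE_HTML and parse_books are hypothetical names, and the markup is a made-up fragment shaped like one book entry. It also shows an alternative way to read the rating, by taking it from the tag's class list instead of testing five class names in turn.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, shaped like a single book entry on the scraped page.
SAMPLE_HTML = """
<ol>
  <li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">
    <a href="sample-book_1/index.html"><img src="cover.jpg" alt="Sample Book"></a>
    <p class="star-rating Three"></p>
    <p class="price_color">10.00</p>
    <p class="instock availability">
        In stock
    </p>
  </li>
</ol>
"""


def parse_books(html):
    """Return one dictionary per <li> book entry found in html."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # class_ matches an element if any one of its CSS classes equals the value.
    for book in soup.find_all("li", class_="col-xs-6"):
        rating_tag = book.find("p", class_="star-rating")
        # "class" is a multi-valued attribute, so it comes back as a list,
        # here ['star-rating', 'Three'].
        rating_words = [c for c in rating_tag["class"] if c != "star-rating"]
        records.append({
            "title": book.find("img")["alt"],
            "rating": rating_words[0] + " out of 5" if rating_words else None,
            "price": book.find("p", class_="price_color").text,
            "stock": book.find("p", class_="instock").text.strip(),
            "link": book.find("a")["href"],
        })
    return records


records = parse_books(SAMPLE_HTML)
print(records)
```

Working on an inline string like this makes it easy to adjust the selectors until each field comes out as expected, and only then swap in the text returned by requests.get().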
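The last step, writing a list of dictionaries with json.dump(), can be checked by loading the file back and comparing it with the original data. The sketch below uses only the standard library. The record is hypothetical sample data with the same keys as the scraper's output, the file goes to a temporary directory, and UTF-8 is used as the file encoding, which is an assumption of this sketch and not the encoding used in the article's script.

```python
import json
import os
import tempfile

# Hypothetical record with the same keys the scraper produces.
records = [
    {"book no": "1", "title": "Sample Book", "rating": "Three out of 5",
     "price": "£10.00", "link": "sample-book_1/index.html",
     "stock": "In stock"},
]

# Write to a temporary directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "books_sample.json")

# ensure_ascii=False keeps characters such as "£" readable in the file.
with open(path, "w", encoding="utf-8") as f:
    json.dump(records, f, indent=4, ensure_ascii=False)

# Read the file back: once as parsed JSON, once as raw text.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
with open(path, encoding="utf-8") as f:
    raw = f.read()

# With the default ensure_ascii=True, the same character is escaped instead.
escaped = json.dumps(records[0]["price"])

print(loaded == records)
print(escaped)
```

If the round trip returns the same list, the file is valid JSON and nothing was lost on the way to disk. The escaped form is equally valid JSON; ensure_ascii=False only changes how non-ASCII characters look in the file.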