
Report

On

PYTHON PROJECT

VI Semester
Academic Year: 2018-2019

Title: WEB SCRAPING USING BEAUTIFUL SOUP


USN Name Signature
1GA14CS010 Akash Kumar S
1GA15CS053 G Janany
1GA16CS191 Vishal Kumar

Guide
[Mr. Shyam Sundar]



Objective of the Project
To build a system capable of extracting large amounts of data from websites, where the
extracted data is saved to a local file or displayed. Such a scraper may be custom built
for a specific website or configured to work with any website. With the click of a button,
the data available on a website can easily be saved to a file on our computer.
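
As a minimal sketch of this idea (assuming the requests library listed in the requirements
below; the output filename scraped_page.html is only illustrative), the HTML of a page can
be fetched and saved to a local file:

import requests                      # HTTP library used to download the page

url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
res = requests.get(url)              # fetch the page over HTTP
res.raise_for_status()               # stop if the download failed

with open('scraped_page.html', 'w', encoding='utf-8') as f:
    f.write(res.text)                # save the raw HTML to a local file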



System Requirements Specification

Software Requirements Specification

➢ Language used : Python Programming Language
➢ IDE/Compiler used : PyCharm
➢ OS used : Windows 10

Hardware Requirements Specification

o Processor : i7 8th generation
o Hard Disk : 1 TB
o Monitor : HD LED Antiglare
o Keyboard : Island Style
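
As a quick sanity check of the software requirements above (a sketch, not part of the
project code), the Python version and the required libraries can be verified before running
the scraper:

import sys

assert sys.version_info >= (3, 5), "Python 3.5 or higher is required"

import requests                      # HTTP library
import bs4                           # Beautiful Soup 4
from lxml import etree               # lxml parser used by Beautiful Soup

print("requests      ", requests.__version__)
print("beautifulsoup4", bs4.__version__)
print("lxml          ", etree.__version__)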



Source Code

# make sure to have Python version 3.5 or higher

# 1> install requests using      - pip install requests
# 2> install beautifulsoup using - pip install beautifulsoup4
# 3> install lxml using          - pip install lxml
# (enter the commands in the command prompt, not in the Python shell)

import requests                          # imports the requests module
import bs4                               # imports the Beautiful Soup module

res = requests.get('https://en.wikipedia.org/wiki/Python_(programming_language)')
res.text                                 # the entire HTML source code of the page

soup = bs4.BeautifulSoup(res.text, 'lxml')     # lxml is the parser used to build the parse tree

result = soup.select('.mw-body-content h2')    # any CSS selector for the HTML tags to scrape can be given here

for i in result:                         # iterate over every matched heading
    print(i.text)                        # display the heading text

result                                   # the matched elements as HTML (shown in an interactive shell)
result[0]                                # the first element of the result list
result[0].getText()                      # the first element's text content as a string
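
To match the objective of saving the scraped data to a local file (a possible extension,
not shown in the snapshots; the filename headings.txt is only illustrative), the result
list built above can be written out line by line:

with open('headings.txt', 'w', encoding='utf-8') as f:    # create/overwrite a local text file
    for heading in result:                                # result is the list of h2 tags selected above
        f.write(heading.getText().strip() + '\n')         # write each heading as plain text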



Snapshots

1. Snapshot of Source Code



2. Snapshot of Result



3. Snapshot of Webpage

