Savitendra Miniproject

WEB SCRAPING OF IMDB TOP 250 MOVIES
SYNOPSIS OF MINI PROJECT
BACHELOR OF TECHNOLOGY
COMPUTER SCIENCE AND TECHNOLOGY
SUBMITTED BY: SAVITENDRA MANI PANDEY (ROLL NO. 228205)
SUBMITTED TO: PROF. SHOBHIT SHUKLA, PROF. ASHA SINGH
KAMLA NEHRU INSTITUTE OF TECHNOLOGY,
SULTANPUR (U.P)
(An Autonomous Govt. Engineering Institute under 2f and 12B of UGC Act)
Affiliated to
DR. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY,
LUCKNOW (U.P), INDIA
CERTIFICATE
This is to certify that the original and genuine investigation work on the subject matter,
including the related data collection and investigation, has been completed solely, sincerely
and satisfactorily by Savitendra Mani Pandey, a student of COMPUTER SCIENCE AND
ENGINEERING for the academic session 2023-2024 at Kamla Nehru Institute of Technology,
Sultanpur, regarding the investigatory project entitled
"WEB SCRAPING OF IMDB TOP 250 MOVIES".
The work was carried out for the Project Department under the direct supervision of the
undersigned, as per the requirement for the Board Examination. The project report embodies
the results of original work and studies carried out by the student, and its contents do not
form the basis for the award of any other certification.
PROF. SHOBHIT SHUKLA

PROF. ASHA SINGH
ACKNOWLEDGEMENT
I have put considerable effort into this project. However, it would not have been possible
without the kind support and help of many individuals, and I would like to extend my sincere
thanks to all of them. It has been a great honour and privilege to complete my project work at
Kamla Nehru Institute of Technology, Sultanpur. I am highly indebted to Prof. Shobhit Shukla
and Prof. Asha Singh for their guidance and constant supervision, for providing the necessary
information regarding the project, and for their support in completing it. Their constant
guidance and willingness to share their vast knowledge helped me understand this project and
its manifestations in great depth and complete the assigned tasks on time. I would like to
express my gratitude to my parents and to the members of Kamla Nehru Institute of
Technology, Sultanpur, for their kind cooperation, which helped me complete this project. My
thanks and appreciation also go to my colleagues who helped develop the project and to
everyone who willingly helped me with their abilities.
TABLE OF CONTENTS
Abstract
Introduction
Proposed Work
Prerequisite
Application of Architecture
Conclusion and Future Scope
References
ABSTRACT:
Web scraping, or web harvesting, is a software technique aimed at extracting information
from websites. Web scraping software typically accesses the World Wide Web either directly
through the Hypertext Transfer Protocol (HTTP) or through a suitable web browser. It is
closely related to web indexing, an information extraction technique used by search engines
to index data on the web with the help of programmed bots.
In comparison, web scraping focuses on transforming unstructured information on the web
(usually in HTML format) into structured information that can be saved and processed in a
centralised database.
Web scraping is mostly used for online price comparison, web page change detection,
weather forecast information, web information integration, web mashups, and web surveys.
Several software tools are currently available that apply these scraping techniques.
INTRODUCTION
Web scraping is a technique by which web pages are fetched from the internet and parsed in
order to extract specific information for human use. Web scraping consists of two parts:
1. Web Crawling – Accessing the web pages over the internet and pulling data from them.
2. HTML Parsing – Parsing the HTML content of the web pages obtained through web
crawling and then extracting specific information from it.
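The two parts listed above can be sketched in a few lines of Python. This is only a minimal illustration and not the project's actual code: it uses the standard-library modules urllib.request and html.parser in place of Requests and BeautifulSoup so that it runs without extra installs, and the names fetch_html, LinkCollector, extract_links and SAMPLE_HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Part 1 (web crawling): download the raw HTML of a single page."""
    # Assumption: the site expects a browser-like User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (mini-project sketch)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Part 2 (HTML parsing): collect (href, text) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the link currently being read, if any
        self._text = []      # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Run the parser over an HTML string and return the collected links."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical page fragment used instead of a live download.
SAMPLE_HTML = (
    '<ul><li><a href="/title/1/">First Movie</a></li>'
    '<li><a href="/title/2/">Second Movie</a></li></ul>'
)

print(extract_links(SAMPLE_HTML))
```

In a real run, the string returned by fetch_html would be passed to extract_links in place of SAMPLE_HTML.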
Hence, web scrapers are applications (bots) that automatically send requests to websites and
then extract the desired information from the responses. Let's take an example: how do we
buy a phone online?
1. We first look for a phone with good reviews.
2. We see on which website it is available at the lowest price.
3. We check whether it can be delivered in our area.
4. If everything looks good, we buy the phone.
What if there were a computer program that could do all of this for us? That is essentially
what web scrapers do. They try to understand the web page content as a human being would.
Other examples of the application of web scraping are:
➢ Competitive pricing.
➢ Manufacturers monitoring the market to check whether retailers are maintaining a
minimum price.
➢ Sentiment analysis of consumers, to see whether they are happy with the services and
products.
➢ Aggregating marketing data.
➢ Gaining financial insights from the market.
➢ Gathering data for research.
➢ Generating marketing leads.
➢ Collecting trending topics for media houses.
And the list goes on.
PROPOSED WORK
In this document, we take the example of searching for movie information online and scrape
the details of the movie we are looking for from the website. For example, if we open
imdb.com and search for the top 250 movies, the search result is as follows:
[Screenshot: IMDb Top 250 movies search result]
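As a rough sketch of how such a listing could be turned into structured records, the example below parses a small hand-written HTML fragment. The markup, the class names (movie, title, year, rating) and the helper names are assumptions made for illustration; the real IMDb page is structured differently and changes over time. The ratings in the sample are illustrative values, and the standard-library html.parser module is used here in place of BeautifulSoup.

```python
from html.parser import HTMLParser

# Assumed class names that mark the fields of one movie entry.
FIELDS = {"title", "year", "rating"}


class ListingParser(HTMLParser):
    """Collect one raw record per <li class="movie"> element."""

    def __init__(self):
        super().__init__()
        self.movies = []      # finished raw records (all values are strings)
        self._record = None   # record currently being filled, if any
        self._field = None    # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        css_class = attributes.get("class") or ""
        if tag == "li" and css_class == "movie":
            self._record = {"link": ""}
        elif self._record is not None and css_class in FIELDS:
            self._field = css_class
            self._record[css_class] = ""
            if tag == "a":
                self._record["link"] = attributes.get("href") or ""

    def handle_data(self, data):
        if self._record is not None and self._field is not None:
            self._record[self._field] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field.
        if tag == "li" and self._record is not None:
            self.movies.append(self._record)
            self._record = None
        self._field = None


def parse_listing(html):
    """Return a list of dicts with rank, title, year, rating and link."""
    parser = ListingParser()
    parser.feed(html)
    return [
        {
            "rank": rank,  # position in the listing, starting at 1
            "title": raw["title"].strip(),
            "year": int(raw["year"]),
            "rating": float(raw["rating"]),
            "link": raw["link"],
        }
        for rank, raw in enumerate(parser.movies, start=1)
    ]


# Hypothetical markup; not copied from the real site.
SAMPLE_LISTING = """
<ul>
  <li class="movie"><a class="title" href="/title/tt0111161/">The Shawshank Redemption</a>
      <span class="year">1994</span> <span class="rating">9.3</span></li>
  <li class="movie"><a class="title" href="/title/tt0068646/">The Godfather</a>
      <span class="year">1972</span> <span class="rating">9.2</span></li>
</ul>
"""

for movie in parse_listing(SAMPLE_LISTING):
    print(movie)
```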
If we then click on a movie link, it takes us to the following page:
[Screenshot: movie page]
This page shows information about the movie such as its running time, release year and
IMDb rating, and if we scroll down, we get the director's name:
[Screenshot: director's name on the movie page]
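Once these values have been scraped as text, they still need to be converted into typed fields before they can be stored or compared. The sketch below shows one possible way to do this. The MovieInfo record, the helper names, and the assumption that the running time appears in a form such as "2h 22m" are all illustrative and are not taken from the project's code.

```python
import re
from dataclasses import dataclass


@dataclass
class MovieInfo:
    """Structured record for the details shown on a movie page."""
    title: str
    runtime_minutes: int
    release_year: int
    rating: float
    director: str


def runtime_to_minutes(text):
    """Convert a running time such as '2h 22m', '2h' or '45m' into minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised running time: {text!r}")
    hours, minutes = (int(part) if part else 0 for part in match.groups())
    return hours * 60 + minutes


def build_movie_info(raw):
    """Turn a dict of scraped strings into a typed MovieInfo record."""
    return MovieInfo(
        title=raw["title"].strip(),
        runtime_minutes=runtime_to_minutes(raw["runtime"]),
        release_year=int(raw["year"]),
        rating=float(raw["rating"]),
        director=raw["director"].strip(),
    )


# Hypothetical scraped values; the rating is an illustrative figure.
SAMPLE_RAW = {
    "title": " The Shawshank Redemption ",
    "runtime": "2h 22m",
    "year": "1994",
    "rating": "9.3",
    "director": "Frank Darabont",
}

print(build_movie_info(SAMPLE_RAW))
```

Keeping the conversion in one place means that a change in the way the site displays a value only has to be handled in a single helper.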
If we then click on the director link, it takes us to the following page:
[Screenshot: director page]
If we scroll down, we get to see the top four movies directed by that director:
[Screenshot: director's top four movies]
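Following the director link and keeping only four titles can be sketched as below. Links found on a page are usually relative, so they are joined with the site address before the next request is made. The director id in the example is a placeholder, the list of titles is illustrative sample data, and the helper names absolute_url and top_films are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.imdb.com"


def absolute_url(href):
    """Turn a relative link found on a page into a full URL."""
    return urljoin(BASE_URL, href)


def top_films(films, limit=4):
    """Return the first `limit` titles, dropping repeats but keeping order."""
    seen = set()
    unique = []
    for title in films:
        if title not in seen:
            seen.add(title)
            unique.append(title)
    return unique[:limit]


# Placeholder director link (the id is not a real one).
DIRECTOR_HREF = "/name/nm0000000/"

# Illustrative titles as they might be scraped from a director page.
SAMPLE_FILMS = [
    "The Shawshank Redemption",
    "The Green Mile",
    "The Green Mile",
    "The Mist",
    "The Majestic",
]

print(absolute_url(DIRECTOR_HREF))
print(top_films(SAMPLE_FILMS))
```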
PREREQUISITE
The things needed before we start building a Python-based web scraper are:
• Python installed.
• A Python IDE (Integrated Development Environment) such as PyCharm, Spyder, or any
other IDE of choice.
• Basic understanding of Python and HTML.
• Basic understanding of the Requests and BeautifulSoup modules.
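A quick way to confirm that the environment is ready is to check whether the two modules can be imported. The sketch below is one possible check that uses only the standard library. The helper name missing_packages is hypothetical, and it assumes the usual names: the module requests comes from the package requests, and the module bs4 comes from the package beautifulsoup4.

```python
import sys
from importlib.util import find_spec

# Import name -> name of the package to install with pip.
REQUIRED_MODULES = {"requests": "requests", "bs4": "beautifulsoup4"}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [package for module, package in modules.items() if find_spec(module) is None]


print("Python version:", sys.version.split()[0])
missing = missing_packages()
if missing:
    print("Install with: python -m pip install " + " ".join(missing))
else:
    print("All required modules are available.")
```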
APPLICATION OF ARCHITECTURE
The architecture of the application is:
START -> USER ENTERS A MOVIE NAME -> MOVIE FOUND -> SHOW MOVIE INFO TO USER -> STOP
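The flow above can be sketched as a small lookup function. This is an illustration under stated assumptions rather than the project's implementation: the catalogue is a hypothetical in-memory dictionary standing in for scraped data, the names find_movie and run_lookup are invented for the example, and the "not found" message is an assumed branch that the flow chart does not show.

```python
def find_movie(name, catalogue):
    """Look up a movie by name, ignoring case and surrounding spaces."""
    return catalogue.get(name.strip().lower())


def run_lookup(name, catalogue):
    """Follow the flow: take a name, check if the movie is found, return its info."""
    movie = find_movie(name, catalogue)
    if movie is None:
        # Assumed branch: the original flow chart shows only the "found" path.
        return f"No movie named {name.strip()!r} was found."
    return f"{movie['title']} ({movie['year']}), directed by {movie['director']}"


# Hypothetical catalogue keyed by lower-case title.
SAMPLE_CATALOGUE = {
    "the godfather": {
        "title": "The Godfather",
        "year": 1972,
        "director": "Francis Ford Coppola",
    },
}

print(run_lookup("The Godfather", SAMPLE_CATALOGUE))
```

In an interactive version, the name would come from the user, for example by passing the result of input() to run_lookup.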
CONCLUSION AND FUTURE SCOPE
In this project, we built a web scraper from scratch that collects the information about a
movie from the internet and also collects the information about its director.
Web scrapers are extensively used in the industry today for competitive pricing, market
studies, customer sentiment analysis, etc.
In the future, web scraping will be one of the important tools in the lead generation process.
A web scraping tool can carry out market research on a particular product or service and
offers enormous benefits in the marketing field.
REFERENCES
➢ W3Schools:
Website-https://www.w3schools.com/
➢ Udemy:
Website-https://www.udemy.com/
➢ DataHen: "3 Advantage of web scraping"
➢ GeeksforGeeks:
Website-https://www.geeksforgeeks.org
➢ IMDb:
Website-https://www.imdb.com
