Savitendra Miniproject


WEB SCRAPING OF IMDB TOP 250 MOVIES

SYNOPSIS
OF MINI PROJECT

BACHELOR OF TECHNOLOGY
COMPUTER SCIENCE AND TECHNOLOGY

SUBMITTED BY: SAVITENDRA MANI PANDEY (ROLL NO. 228205)

SUBMITTED TO: PROF. SHOBHIT SHUKLA, PROF. ASHA SINGH

KAMLA NEHRU INSTITUTE OF TECHNOLOGY, SULTANPUR (U.P)
(An Autonomous Govt. Engineering Institute under 2f and 12B of UGC Act)
Affiliated to
DR. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY,
LUCKNOW (U.P), INDIA

CERTIFICATE

This is to certify that the original and genuine investigation work on the subject matter,
including the related data collection, has been carried out solely, sincerely and
satisfactorily by

Savitendra Mani Pandey, a student of COMPUTER SCIENCE AND ENGINEERING for the
academic session 2023-2024 at Kamla Nehru Institute of Technology, Sultanpur,

for the investigatory project entitled

"WEB SCRAPING OF IMDB TOP 250 MOVIES",

submitted to the Project Department under the direct supervision of the undersigned, as per
the requirement for the Board Examination. The project report embodies the results of original
work and studies carried out by the student, and its contents do not form the basis for the
award of any other certification.

PROF. SHOBHIT SHUKLA


PROF. ASHA SINGH

ACKNOWLEDGEMENT

I have put considerable effort into this project. However, it would not have been possible
without the kind support and help of many individuals, and I would like to extend my sincere
thanks to all of them. It has been a great honour and privilege to complete my project work at
Kamla Nehru Institute of Technology, Sultanpur. I am highly indebted to Prof. SHOBHIT SHUKLA
and Prof. ASHA SINGH for their guidance and constant supervision, for providing the necessary
information regarding the project, and for their support in completing it. Their constant
guidance and willingness to share their vast knowledge helped me understand this project and
its manifestations in great depth and complete the assigned tasks on time. I would like to
express my gratitude to my parents and to the members of Kamla Nehru Institute of Technology,
Sultanpur, for their kind cooperation, which helped me complete this project. My thanks and
appreciation also go to my colleagues who helped in developing the project and to the people
who willingly helped me with their abilities.

TABLE OF CONTENTS

Abstract

Introduction

Proposed Work

Prerequisite

Architecture of the Application

Conclusion and Future Scope

References

ABSTRACT
Web scraping, or web harvesting, is a software technique for extracting information from
websites. Web scraping software typically accesses the World Wide Web either directly over
the Hypertext Transfer Protocol (HTTP) or through a web browser. It is closely related to web
indexing, an information-extraction technique in which search engines use bots programmed
by humans to index data on the Web.
In comparison, web scraping focuses on transforming unstructured information on the Web
(usually in HTML format) into structured information that can be saved and processed in a
centralised database.
Web scraping is mostly used for online price comparison, webpage change detection,
weather forecast information, web information integration, web mashups, and web surveys.
Several software tools are currently available that apply scraping techniques and can be
tailored to a given website.

INTRODUCTION
Web scraping is a technique by which webpages are fetched from the internet and parsed to
extract specific information in a form that a human being can use. Web scraping consists
of two parts:
1. Web Crawling – Accessing the webpages over the internet and pulling data from them.
2. HTML Parsing – Parsing the HTML content of the webpages obtained through web
crawling and then extracting specific information from it.
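
The two parts listed above can be sketched in a few lines of Python. This is only a minimal illustration, not the project's actual code: it uses the standard library (urllib.request and html.parser) in place of the Requests and BeautifulSoup modules so that it runs without extra installs, and the names fetch_html, TitleParser and extract_title, as well as the User-Agent string, are made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Part 1 (web crawling): download a page and return its HTML as text."""
    # Some sites reject requests without a User-Agent; this value is illustrative.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (mini-project sketch)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TitleParser(HTMLParser):
    """Part 2 (HTML parsing): collect the text inside the <title> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Run the parser over an HTML string and return the cleaned page title."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Parsing works on any HTML string, so it can be tried without network access.
sample_html = "<html><head><title> Example Page </title></head><body></body></html>"
page_title = extract_title(sample_html)
print(page_title)
```

With network access, the two parts would be chained as extract_title(fetch_html(url)) for a URL of your choice.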
Hence, web scrapers are applications/bots that automatically send requests to websites and
then extract the desired information from the responses. Let's take an example: how do
we buy a phone online?
1. We first look for a phone with good reviews.
2. We see on which website it is available at the lowest price.
3. We check whether it can be delivered in our area or not.
4. If everything looks good, then we buy the phone.
What if there were a computer program that could do all of this for us? That is essentially
what web scrapers do. They try to understand the webpage content as a human being would.
Other examples of the application of web scraping are:
➢ Competitive pricing.
➢ Manufacturers monitoring the market to check whether retailers are maintaining a
minimum price.
➢ Sentiment analysis of consumers, to see whether they are happy with the services and
products.
➢ To aggregate marketing data.
➢ To gain financial insights from the market.
➢ To gather data for research.
➢ To generate marketing leads.
➢ To collect trending topics by media houses.
And the list goes on.

PROPOSED WORK
In this document, we take the example of searching for movie information online and try to
scrape information about the movie we are looking for from the website. For example, if we
open imdb.com and search for the top 250 movies, the search result will be as follows:

[Figure: IMDb Top 250 movies search result]
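
As a rough sketch of this step, the snippet below pulls movie titles and links out of a list page. The HTML is a hypothetical stand-in written for this example (the real page markup is not reproduced in this synopsis and may differ), the titles and ids are placeholders, and MovieLinkParser and parse_movie_list are illustrative names. The only assumption about the site is that movie links start with "/title/". The standard library parser is used here instead of BeautifulSoup to keep the example self-contained.

```python
from html.parser import HTMLParser


class MovieLinkParser(HTMLParser):
    """Collect (title, link) pairs from anchors whose href starts with a prefix."""

    def __init__(self, href_prefix: str = "/title/") -> None:
        super().__init__()
        self.href_prefix = href_prefix
        self.movies = []  # list of (title, href) tuples in page order
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(self.href_prefix):
                self._current_href = href
                self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching anchor.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            title = "".join(self._text_parts).strip()
            if title:
                self.movies.append((title, self._current_href))
            self._current_href = None


def parse_movie_list(html: str):
    """Return the (title, link) pairs found in an HTML string."""
    parser = MovieLinkParser()
    parser.feed(html)
    return parser.movies


# Hypothetical stand-in for the list page; titles and ids are placeholders.
SAMPLE_LIST_HTML = """
<ul>
  <li><a href="/title/tt0000001/">Sample Movie One</a></li>
  <li><a href="/title/tt0000002/">Sample Movie Two</a></li>
  <li><a href="/about/">About this site</a></li>
</ul>
"""

movies = parse_movie_list(SAMPLE_LIST_HTML)
for rank, (title, link) in enumerate(movies, start=1):
    print(rank, title, link)
```

The "About this site" link is skipped because its href does not start with the assumed prefix, which is how navigation links would be filtered out of the results.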

Then, if we click on a movie link, it takes us to the following page:

[Figure: IMDb page of the selected movie]

There we can see information about the movie such as its running time, release year and
IMDb rating, and if we scroll down, we find the director's name:

[Figure: director name on the movie page]
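
The fields mentioned above (running time, release year, rating and director) could be collected into one record along the following lines. This is a sketch under stated assumptions: the class names title, year, runtime, rating and director in the sample HTML are invented for the example and are not the site's real markup, the sample values are placeholders, and MovieInfo, MovieInfoParser and parse_movie_page are hypothetical names.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional

# Hypothetical class names; a real page would need its own selectors.
FIELD_CLASSES = {"title", "year", "runtime", "rating", "director"}


@dataclass
class MovieInfo:
    title: str = ""
    year: Optional[int] = None
    runtime: str = ""
    rating: Optional[float] = None
    director: str = ""


class MovieInfoParser(HTMLParser):
    """Store the text of the first element found for each class in FIELD_CLASSES."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {}
        self._field = None  # field currently being read, if any
        self._tag = None    # tag name that opened the current field

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            return  # already inside a field element; ignore nested tags
        for name in (dict(attrs).get("class") or "").split():
            if name in FIELD_CLASSES and name not in self.fields:
                self._field, self._tag = name, tag
                self.fields[name] = ""
                break

    def handle_data(self, data):
        if self._field is not None:
            self.fields[self._field] += data

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            self.fields[self._field] = self.fields[self._field].strip()
            self._field = None


def _to_float(text: str) -> Optional[float]:
    try:
        return float(text)
    except ValueError:
        return None


def parse_movie_page(html: str) -> MovieInfo:
    """Convert the raw field text into a typed MovieInfo record."""
    parser = MovieInfoParser()
    parser.feed(html)
    f = parser.fields
    year_text = f.get("year", "")
    return MovieInfo(
        title=f.get("title", ""),
        year=int(year_text) if year_text.isdigit() else None,
        runtime=f.get("runtime", ""),
        rating=_to_float(f.get("rating", "")),
        director=f.get("director", ""),
    )


# Hypothetical stand-in for a movie page; all values are placeholders.
SAMPLE_MOVIE_HTML = """
<h1 class="title">Sample Movie One</h1>
<span class="year">1999</span>
<span class="runtime">2h 10m</span>
<span class="rating">8.5</span>
<a class="director" href="/name/nm0000001/">Sample Director</a>
"""

info = parse_movie_page(SAMPLE_MOVIE_HTML)
print(info)
```

Missing or malformed fields fall back to an empty string or None rather than raising an error, since scraped pages do not always contain every field.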

Then, if we click on the director link, it takes us to the following page:

[Figure: IMDb page of the director]

If we scroll down, we see the top four movies directed by that director:

[Figure: top four movies of the director]

PREREQUISITE
The things needed before we start building a Python-based web scraper are:
• Python installed.
• A Python IDE (Integrated Development Environment) such as PyCharm, Spyder, or any
other IDE of choice.
• Basic understanding of Python and HTML.
• Basic understanding of the Requests and BeautifulSoup modules.
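
One quick way to confirm that the two modules are available is sketched below. It only checks whether the modules can be found and does not import or use them; check_prerequisites and REQUIRED_MODULES are names chosen for this example. BeautifulSoup is distributed as the package beautifulsoup4 and imported as bs4.

```python
import importlib.util
import sys

# Import name -> package name to install for the two modules in the list above.
REQUIRED_MODULES = {"requests": "requests", "bs4": "beautifulsoup4"}


def check_prerequisites(modules=REQUIRED_MODULES):
    """Return {import_name: True/False} depending on whether each module is installed."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_prerequisites()
print("Python version:", sys.version.split()[0])
for import_name, installed in status.items():
    package = REQUIRED_MODULES[import_name]
    hint = "installed" if installed else f"missing (try: pip install {package})"
    print(f"{import_name}: {hint}")
```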

ARCHITECTURE OF THE APPLICATION
The architecture of the application is:

START
↓
USER ENTERS A MOVIE NAME
↓
MOVIE FOUND
↓
SHOW MOVIE INFO TO USER
↓
STOP
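
The flow above can be expressed as a small control function. The sketch below is an assumption-laden illustration: a tiny in-memory dictionary with placeholder values stands in for the scraped data, the names find_movie, format_movie_info and run are hypothetical, and the "movie not found" branch is not shown in the flowchart, so stopping with a message is a guess at the intended behaviour.

```python
from typing import Dict, Optional

# Hypothetical stand-in for data the scraper would collect; values are placeholders.
SAMPLE_CATALOGUE: Dict[str, Dict[str, str]] = {
    "sample movie one": {
        "title": "Sample Movie One",
        "year": "1999",
        "director": "Sample Director",
    },
}


def find_movie(name: str, catalogue: Dict[str, Dict[str, str]]) -> Optional[Dict[str, str]]:
    """MOVIE FOUND step: look the name up, ignoring case and surrounding spaces."""
    return catalogue.get(name.strip().lower())


def format_movie_info(info: Dict[str, str]) -> str:
    """SHOW MOVIE INFO TO USER step: turn the record into a printable line."""
    return f"{info['title']} ({info['year']}), directed by {info['director']}"


def run(name: str, catalogue: Dict[str, Dict[str, str]] = SAMPLE_CATALOGUE) -> str:
    """START -> movie name -> MOVIE FOUND -> show info -> STOP."""
    info = find_movie(name, catalogue)
    if info is None:
        # Not shown in the flowchart; stopping with a message is an assumption.
        return f"No movie found for '{name}'."
    return format_movie_info(info)


print(run("  Sample Movie One "))
print(run("Unknown Title"))
```

In an interactive program the movie name would come from input() instead of being passed in directly, and the lookup would trigger the scraper rather than read a dictionary.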

CONCLUSION AND FUTURE SCOPE
In this project, we built a web scraper from scratch that, given a movie name, collects
information about the movie from the internet and also collects information about its director.

Web scrapers are extensively used in the industry today for competitive pricing, market
studies, customer sentiment analysis, etc.
In the future, web scraping will be one of the important tools in the lead generation process.
A web scraping tool can carry out market research on a particular product or service and
offers enormous benefits in the marketing field.

REFERENCES
➢ W3Schools: https://www.w3schools.com/
➢ Udemy: https://www.udemy.com/
➢ DataHen: "3 Advantages of Web Scraping"
➢ GeeksforGeeks: https://www.geeksforgeeks.org
➢ IMDb: https://imdb.com
