Synopsis WS

This document provides a synopsis report on a project to develop a web scraper using Python that can extract both text and images from websites. It was submitted by three students to fulfill requirements for their B.Tech degree in Information Technology. The report introduces web scraping and outlines the proposed work to build scrapers to collect product reviews from e-commerce sites and images for specified keywords from the internet. It describes the necessary prerequisites, application architecture, and provides screenshots of the final results of scraping reviews and images. The conclusion discusses potential future applications and references several sources on web scraping and related topics.

Uploaded by

Nishit Chaudhary

A

Synopsis Report
On

Web Scraping (Text + Image)
For
partial fulfillment of the award of the
B. Tech Degree in Information Technology

Under the Supervision of


Dr. Arun Kumar Singh

Submitted by:

NISHIT CHAUDHARY (1901920130119)


PANKAJ SHARMA (1901920130120)
PIYUSH SHARMA (1901920130121)

Session:

G. L. Bajaj Institute of Technology and Management,


Greater Noida
TABLE OF CONTENTS

1. Introduction
2. Relevant Work
3. Prerequisites
4. Application Architecture
5. Conclusion and Future Scope
6. Final Result
7. References
INTRODUCTION

Web scraping is a technique in which webpages are fetched from the internet and parsed to
extract specific information, much as a human reader would. Web scraping consists of two
parts:

• Web Crawling → Accessing the webpages over the internet and pulling data from them.

• HTML Parsing → Parsing the HTML content of the webpages obtained through web crawling
and then extracting specific information from it.
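
The two parts above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not necessarily the approach used in the project; the function names `fetch_html` and `extract_title` and the User-Agent string are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Web crawling step: request a page and return its HTML as text."""
    # Some sites reject requests without a User-Agent header (value is illustrative).
    request = Request(url, headers={"User-Agent": "synopsis-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TitleParser(HTMLParser):
    """HTML parsing step: collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside <title>.
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the page title found in an HTML string, stripped of whitespace."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()
```

A call such as `extract_title(fetch_html(url))` combines both steps for a URL of your choice; the parsing step can also be tried on any HTML string without network access.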

Hence, web scrapers are applications or bots that automatically send requests to websites and
then extract the desired information from the responses. Let’s take an example: how do we
buy a phone online?

1. We first look for a phone with good reviews.
2. We see on which website it is available at the lowest price.
3. We check whether it can be delivered in our area.
4. If everything looks good, we buy the phone.

What if a computer program could do all of this for us? That is essentially what web scrapers
do. They try to understand the webpage content as a human would. Other applications of web
scraping are:

• Competitive pricing.

• Monitoring by manufacturers of whether retailers are maintaining a minimum price.

• Sentiment analysis of consumers, to see whether they are happy with the services and
products.

• To aggregate news articles.

• To aggregate marketing data.

• To gain financial insights from the market.

• To gather data for research.

• To generate marketing leads.

• To collect trending topics for media houses.

And the list goes on.
Figure 1: Web scraping process

PROPOSED WORK

a) For text :
In this document, we take the example of buying a phone online further and try to scrape the
reviews of the phone that we are planning to buy from a website. For example, if we open
flipkart.com and search for ‘iPhone’, the search result will be as follows:

Then, if we click on a product link, it will take us to the following page:

Now, if we scroll down, we will see the following comments posted by customers:

Our end goal is to build a web scraper that collects the reviews of a product from
the internet.
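
As a rough illustration of this goal, the sketch below pulls review text out of an HTML string using only the Python standard library. The markup is hypothetical: it assumes each review sits in a `<div class="review">` element, which is not taken from any real site, and a real product page would need its own selectors. The names `ReviewParser` and `extract_reviews` are likewise made up for this example.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect the text of every <div class="review"> block (hypothetical markup)."""

    def __init__(self, review_class="review"):
        super().__init__()
        self.review_class = review_class
        self.reviews = []
        self._depth = 0    # nesting depth of <div> tags inside the current review
        self._chunks = []  # text fragments seen inside the current review

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth > 0:
            # A nested <div> inside a review: track it so we find the right close.
            self._depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.review_class in classes:
                self._depth = 1
                self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join(" ".join(self._chunks).split())
                if text:
                    self.reviews.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def extract_reviews(html):
    """Return a list of review texts found in the given HTML string."""
    parser = ReviewParser()
    parser.feed(html)
    return parser.reviews
```

The depth counter is there so that nested `<div>` elements inside a review (for example a rating block) do not end the review early.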

b) For image :
Our end goal is to build a web scraper that collects the images for a keyword from
the internet.
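
A minimal sketch of the image side, again with the standard library only: it gathers the absolute URLs of the `<img>` elements on a page, which a scraper could then download. The names `ImageParser` and `extract_image_urls` are assumptions for this example, and how the project maps a keyword to a results page is not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageParser(HTMLParser):
    """Collect absolute URLs from the src attribute of <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Skip images without a source and inline data: URIs.
        if src and not src.startswith("data:"):
            url = urljoin(self.base_url, src)  # resolve relative paths
            if url not in self.image_urls:     # keep order, drop duplicates
                self.image_urls.append(url)


def extract_image_urls(html, base_url):
    """Return the de-duplicated image URLs found in an HTML string."""
    parser = ImageParser(base_url)
    parser.feed(html)
    return parser.image_urls
```

Resolving each `src` against the page URL matters because many pages use relative image paths.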

PREREQUISITES

The things needed before we start building a Python-based web scraper are:
• Python installed.
• A Python IDE (Integrated Development Environment) such as PyCharm, Spyder, or any other
IDE of choice.
• Flask installed (a simple command: pip install flask).
• MongoDB installed (explained later).
• Basic understanding of Python and HTML.
• Basic understanding of Git (download the Git CLI from https://gitforwindows.org/).
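
The installable items in this list can be checked with a short script. This is a sketch under assumptions: it presumes Python 3, looks for the `flask` package, and looks for `git` and `mongod` (the MongoDB server program) on the system PATH. A MongoDB instance hosted elsewhere would not be detected this way, and the function name `check_prerequisites` is made up for this example.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (found) or False (missing)."""
    return {
        # Assumes the project targets Python 3.
        "python3": sys.version_info >= (3,),
        # find_spec returns None when a top-level package is not installed.
        "flask": importlib.util.find_spec("flask") is not None,
        # shutil.which returns None when the executable is not on PATH.
        "git": shutil.which("git") is not None,
        "mongod": shutil.which("mongod") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```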

APPLICATION ARCHITECTURE

The architecture of the application is:


a) For text :
b) For image :
CONCLUSION AND FUTURE SCOPE

In this project, we built a web scraper from scratch that collects product reviews from the
internet and also collects images for a keyword from the internet, and then deployed it to the
Heroku cloud platform. The work serves as a step-by-step guide for creating a web scraper, in
this case a review scraper, right from scratch and then deploying it.

Text scrapers are used extensively in the industry today for competitive pricing, market
studies, customer sentiment analysis, etc.

Image scrapers are used extensively in the industry today for collecting large numbers of
images that serve as inputs for training object detection, classification and identification
models.

In the near future, web scraping is likely to be one of the important tools in the lead
generation process. A web scraping tool can carry out market research on a particular product
or service and has enormous benefits to offer in the marketing field.

FINAL RESULT

a) For text :
b) For image :