Scrapy
Initial release: 26 June 2008
Stable release: 2.9.0[1] / 8 May 2023
Repository: github.com/scrapy/scrapy (https://github.com/scrapy/scrapy)
Written in: Python
Operating system: Windows, macOS, Linux
Type: Web crawler
License: BSD License
Website: scrapy.org (https://scrapy.org)

Scrapy's project architecture is built around "spiders", which are self-contained crawlers that are given a set of instructions. Following the spirit of other don't-repeat-yourself frameworks, such as Django,[4] it makes it easier to build and scale large crawling projects by allowing developers to reuse their code.

The Scrapy framework provides features such as auto-throttle and rotating proxies and user agents, which are intended to let crawlers scrape sites with a low risk of detection. Scrapy also provides a web-crawling shell, which developers can use to test their assumptions about a site's behavior.[5]

Some well-known companies and products using Scrapy are Lyst,[6][7] Parse.ly,[8] Sayone Technologies,[9] Sciences Po Medialab,[10] and Data.gov.uk's World Government Data site.[11]
History
Scrapy originated at Mydeco, a London-based web-aggregation and e-commerce company, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, followed by the milestone 1.0 release in June 2015.[12] In 2011, Zyte (formerly Scrapinghub) became the new official maintainer.[13][14]
References
1. "Release 2.9.0" (https://github.com/scrapy/scrapy/releases/tag/2.9.0). 8 May 2023. Retrieved 31 May 2023.
2. Commit 975f150 (https://github.com/scrapy/scrapy/commit/975f15003efc911809983150852e04433d9811dd)
3. Scrapy at a glance (http://doc.scrapy.org/en/latest/intro/overview.html).
4. "Frequently Asked Questions" (http://doc.scrapy.org/en/latest/faq.html#did-scrapy-steal-x-from-django). Frequently Asked Questions, Scrapy 2.8.0 documentation. Retrieved 28 July 2015.
5. "Scrapy shell" (http://doc.scrapy.org/en/latest/topics/shell.html). Retrieved 28 July 2015.
6. Bell, Eddie; Heusser, Jonathan. "Scalable Scraping Using Machine Learning" (https://web.archive.org/web/20160604082034/http://talks.lystit.com/dsl-scraping-presentation/#/4). Archived from the original (http://talks.lystit.com/dsl-scraping-presentation/#/4) on 4 June 2016. Retrieved 28 July 2015.
7. Scrapy | Companies using Scrapy (http://scrapy.org/companies/)
8. Montalenti, Andrew (October 27, 2012). "Web Crawling & Metadata Extraction in Python" (https://speakerdeck.com/amontalenti/web-crawling-and-metadata-extraction-in-python). Web Crawling & Metadata Extraction in Python - Speaker Deck. Retrieved May 11, 2015.
9. "Scrapy Companies" (https://scrapy.org/companies/). Scrapy | Companies using Scrapy.
10. Hyphe v0.0.0: the first release of our new webcrawler is out! (http://www.medialab.sciences-po.fr/blog/hyphe-v0-0-0-the-first-release-of-our-new-webcrawler-is-out/)
11. Ben Firshman [@bfirsh] (21 January 2010). "World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords bit.ly/5jU3La #opendata #datastore" (https://twitter.com/bfirsh/status/8025368963) (Tweet) – via Twitter.
12. Medina, Julia (19 June 2015). "Scrapy 1.0 official release out!" (https://groups.google.com/forum/#!topic/scrapy-users/sMbBVIq0sko). scrapy-users (Mailing list).
13. Hoffman, Pablo (2013). List of the primary authors & contributors (https://github.com/scrapy/scrapy/blob/master/AUTHORS). Retrieved 18 November 2013.
14. Interview Scraping Hub (http://decisionstats.com/2015/12/12/interview-scrapinghub-python-webcrawling/).
External links
Official website (https://scrapy.org)
Scrapy Tutorial Series (https://coderslegacy.com/python/scrapy-tutorial/)