"Scrapy": meaning and origin - Open Encyclopedia

Scrapy (/ˈskreɪpi/ SKRAY-pee)^[2] is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.^[3] It is currently maintained by Scrapinghub Ltd., a web-scraping development and services company.

Scrapy's project architecture is built around "spiders", self-contained crawlers that are given a set of instructions. Following the spirit of other "don't repeat yourself" frameworks, such as Django,^[4] it makes it easier to build and scale large crawling projects by allowing developers to reuse their code. Scrapy also provides a web-crawling shell, which developers can use to test their assumptions about a site’s behavior.^[5]

Well-known companies and products using Scrapy include Lyst,^[6]^[7] Parse.ly,^[8] Sayone Technologies,^[9] Sciences Po Medialab,^[10] and Data.gov.uk’s World Government Data site.^[11]

History

Scrapy originated at Mydeco, a London-based web-aggregation and e-commerce company, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license. In 2011, Scrapinghub became the new official maintainer.^[13]^[14] The milestone 1.0 release followed in June 2015.^[12]

References

1. "Release notes". doc.scrapy.org. https://doc.scrapy.org/en/latest/news.html. Retrieved 15 February 2019.
2. "How do you pronounce "Scrapy"?". https://groups.google.com/forum/#!topic/scrapy-users/tA_1T8du_WU.
3. "Scrapy at a glance".
4. "Frequently Asked Questions". http://doc.scrapy.org/en/latest/faq.html#did-scrapy-steal-x-from-django. Retrieved 28 July 2015.
5. "Scrapy shell". http://doc.scrapy.org/en/latest/topics/shell.html. Retrieved 28 July 2015.
6. Bell, Eddie; Heusser, Jonathan. "Scalable Scraping Using Machine Learning". http://talks.lystit.com/dsl-scraping-presentation/#/4. Retrieved 28 July 2015.
7. "Scrapy | Companies using Scrapy".
8. Montalenti, Andrew. "Web Crawling & Metadata Extraction in Python". https://speakerdeck.com/amontalenti/web-crawling-and-metadata-extraction-in-python.
9. "Scrapy Companies". Scrapy website. https://scrapy.org/companies/.
10. "Hyphe v0.0.0: the first release of our new webcrawler is out!".
11. Ben Firshman (@bfirsh) (21 January 2010). "World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords http://bit.ly/5jU3La #opendata #datastore". Tweet no. 8025368963.
12. Medina, Julia (19 June 2015). "Scrapy 1.0 official release out!". scrapy-users mailing list. https://groups.google.com/forum/#!topic/scrapy-users/sMbBVIq0sko.
13. Pablo Hoffman (2013). "List of the primary authors & contributors". https://github.com/scrapy/scrapy/blob/master/AUTHORS. Retrieved 18 November 2013.
14. "Interview Scraping Hub".
