Verify SPIDER_MANAGER_CLASS interface while loading it in CrawlerRunner by curita · Pull Request #1148 · scrapy/scrapy · GitHub


Merged · 1 commit into scrapy:master · Apr 13, 2015

Conversation

@curita (Member) commented Apr 10, 2015

Discussed in #873.

pablohoffman added a commit that referenced this pull request Apr 13, 2015
Verify SPIDER_MANAGER_CLASS interface while loading it in CrawlerRunner
@pablohoffman pablohoffman merged commit 71c0afa into scrapy:master Apr 13, 2015
@curita curita added this to the Scrapy 1.0 milestone Apr 14, 2015
@curita curita deleted the verify-spidermanager-interface branch April 14, 2015 14:11
@kmike (Member) commented Apr 16, 2015

This change broke some SpiderManager implementations that don't implement the find_by_request method. find_by_request is only useful for the scrapy fetch, scrapy parse and scrapy shell commands; crawling should work fine without it.
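To illustrate kmike's point that crawling never touches find_by_request, here is a minimal sketch (not Scrapy's actual code; all class names are illustrative): a manager that only resolves spiders by name is enough for `scrapy crawl`, which goes through load().

```python
# Sketch (not Scrapy's real SpiderManager): crawling only exercises the
# name-based lookup path, so find_by_request is never called.

class MinimalSpiderManager:
    """Implements just enough of the spider-manager interface to crawl."""

    def __init__(self, spider_classes):
        self._spiders = {cls.name: cls for cls in spider_classes}

    @classmethod
    def from_settings(cls, settings):
        # Real implementations discover spiders from SPIDER_MODULES;
        # we start empty here for illustration.
        return cls([])

    def load(self, spider_name):
        # `scrapy crawl <name>` resolves the spider class via load().
        try:
            return self._spiders[spider_name]
        except KeyError:
            raise KeyError(f"Spider not found: {spider_name}")

    def list(self):
        # Needed by `scrapy list`.
        return list(self._spiders)

    # No find_by_request(): only `scrapy fetch`, `scrapy parse` and
    # `scrapy shell` need to map a Request back to a spider class.


class DemoSpider:
    name = "demo"


manager = MinimalSpiderManager([DemoSpider])
assert manager.load("demo") is DemoSpider
assert manager.list() == ["demo"]
```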

```diff
@@ -42,3 +43,19 @@ class CustomSettingsSpider(DefaultSpider):

         self.assertFalse(settings.frozen)
         self.assertTrue(crawler.settings.frozen)
+
+
+def SpiderManagerWithWrongInterface(object):
```
A Member commented on this line:
why def?

@curita (Member Author) replied:

Oh god, why did we merge this? (It should be 'class'; I got a little too functional there.) I'm not even sure how the test is passing.

@curita (Member Author) commented Apr 16, 2015

About find_by_request: I kept it in the interface on purpose (it was there before my changes in the Crawler API). I'd like every SpiderManager to implement all the functions needed to run all commands. We check the interface up front instead of letting an exception be raised later when some method turns out not to be implemented, which will happen eventually if we remove find_by_request from the interface. (Though I grant there's another, probably more important use case for checking the interface: verifying that the SpiderManager isn't using the old interface. That could be done without find_by_request, since it's present in both the old and the new interface.)

I still prefer this check being done over the complete interface, but we could change it.
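The up-front check curita describes can be sketched in plain Python (Scrapy's actual implementation verifies a zope.interface declaration; the function and class names below are illustrative): a misconfigured SPIDER_MANAGER_CLASS fails at startup instead of surfacing as an AttributeError mid-crawl.

```python
# Hedged sketch of verifying the complete spider-manager interface at
# load time. Not Scrapy's code; names are illustrative.

REQUIRED_METHODS = ("from_settings", "load", "list", "find_by_request")


def verify_spider_manager_interface(manager_cls):
    """Raise TypeError if manager_cls is missing any interface method."""
    missing = [name for name in REQUIRED_METHODS
               if not callable(getattr(manager_cls, name, None))]
    if missing:
        raise TypeError(
            f"{manager_cls.__name__} does not fully implement the spider "
            f"manager interface; missing: {', '.join(missing)}")


class OldStyleManager:
    """Implements everything except find_by_request."""

    @classmethod
    def from_settings(cls, settings):
        return cls()

    def load(self, spider_name):
        raise KeyError(spider_name)

    def list(self):
        return []


try:
    verify_spider_manager_interface(OldStyleManager)
except TypeError as exc:
    print(exc)  # reports that find_by_request is missing
```

Checking eagerly like this is what turns a latent "attribute was not provided" failure into an immediate, explainable configuration error.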

@curita (Member Author) commented Apr 16, 2015

Now that I think of it again, maybe we could issue a warning instead of raising an exception when the SpiderManager (soon to be SpiderLoader) doesn't comply with the interface; that seems like the more tolerant alternative.

Another option would be to raise an exception if the absolutely required methods are not implemented (which I think are from_settings and load; list seems to be needed only for scrapy list) and then issue a warning if the other two methods are missing.

I like both options (but the first one slightly more).
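The second option above can be sketched as a split check: raise for the methods a crawl cannot run without, warn for the rest. This is a hypothetical illustration (function and class names are made up), not what Scrapy ended up shipping.

```python
# Sketch of the "raise for essential, warn for optional" option.
# Names are illustrative, not Scrapy's.
import warnings

ESSENTIAL = ("from_settings", "load")          # crawling needs these
OPTIONAL = ("list", "find_by_request")         # only some commands do


def check_spider_loader(loader_cls):
    for name in ESSENTIAL:
        if not callable(getattr(loader_cls, name, None)):
            raise TypeError(f"{loader_cls.__name__} must implement {name}()")
    for name in OPTIONAL:
        if not callable(getattr(loader_cls, name, None)):
            warnings.warn(
                f"{loader_cls.__name__} does not implement {name}(); "
                f"some scrapy commands will not work", UserWarning)


class PartialLoader:
    """Implements only the essentials: crawling works, with warnings."""

    @classmethod
    def from_settings(cls, settings):
        return cls()

    def load(self, spider_name):
        raise KeyError(spider_name)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_spider_loader(PartialLoader)  # warns twice, does not raise

print([str(w.message) for w in caught])
```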

@MojoJolo commented
@kmike pointed me to this issue. Looks like my problem is related.

I'm getting the error message "The find_by_request attribute was not provided." when executing scrapy crawl or even scrapy version.

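One possible workaround for errors like MojoJolo's, while the project decides between warning and raising: a thin subclass of the custom manager can satisfy the interface check by supplying the missing method. Everything here is hypothetical; `LegacySpiderManager` stands in for a project's own pre-interface class and the find_by_request heuristic is purely illustrative.

```python
# Hypothetical workaround: if a custom SPIDER_MANAGER_CLASS predates
# find_by_request, a small subclass can add it to pass the interface
# check. LegacySpiderManager is a stand-in, not a real Scrapy name.

class LegacySpiderManager:
    @classmethod
    def from_settings(cls, settings):
        return cls()

    def __init__(self):
        self._spiders = {}

    def load(self, spider_name):
        return self._spiders[spider_name]

    def list(self):
        return list(self._spiders)


class PatchedSpiderManager(LegacySpiderManager):
    def find_by_request(self, request):
        # Illustrative only: report spiders whose name appears in the
        # request URL. fetch/parse/shell just need *a* candidate list.
        return [name for name in self.list() if name in request.url]
```

Pointing SPIDER_MANAGER_CLASS at the subclass would then avoid the "attribute was not provided" failure without touching the legacy class itself.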