8000 Aiohttp downloadhandler by novykovd · Pull Request #6798 · scrapy/scrapy · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Aiohttp downloadhandler #6798

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

novykovd
Copy link
@novykovd novykovd commented May 14, 2025

Fixes #6707
this aiohttp dh passes all tox envs and the http test and should be ready to use, but admittedly this is my first contribution so if this is missing some functionality or something needs to be changed or tested im happy to adjust always

novykovd and others added 10 commits May 4, 2025 17:50
@wRAR
Copy link
Member
wRAR commented May 14, 2025

This is interesting! I have several comments and I think we have more tests in tests/test_downloader_handlers.py that can be attempted to run with this (the class structure there is very weird but at least you can skip the 1.0 parts).

@@ -8,6 +8,7 @@ dynamic = ["version"]
description = "A high-level Web Crawling and Web Scraping framework"
dependencies = [
"Twisted>=21.7.0",
"aiohttp>=3.11.18",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this for now but I think this will be an optional dependency for some time.

@wRAR wRAR mentioned this pull request May 15, 2025
Copy link
codecov bot commented May 17, 2025

Codecov Report

Attention: Patch coverage is 94.59459% with 2 lines in your changes missing coverage. Please review.

Project coverage is 90.77%. Comparing base (b41aea4) to head (d0c1cf9).

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
scrapy/core/downloader/handlers/aiohttp.py 94.59% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6798      +/-   ##
==========================================
- Coverage   90.78%   90.77%   -0.02%     
==========================================
  Files         164      165       +1     
  Lines       12381    12418      +37     
  Branches     1609     1610       +1     
==========================================
+ Hits        11240    11272      +32     
- Misses        834      836       +2     
- Partials      307      310       +3     
Files with missing lines Coverage Δ
scrapy/core/downloader/handlers/aiohttp.py 94.59% <94.59%> (ø)

... and 2 files with indirect coverage changes

@novykovd
Copy link
Author

additionally some of the tests for py 3.11-3.13 failed on the first push for unix systems because there was a test that is skipped for windows. i added code that hopefully fixes it but im not running ubuntu or macos at this time so im not sure whether its fully fixed

@wRAR
Copy link
Member
wRAR commented May 17, 2025

I think for the test_timeout_download* tests you just need to add asyncio.TimeoutError to the list of expected exceptions.

tested this on linux with with dockers slim images so hopefully the timeout is passed
@novykovd
Copy link
Author

im not sure how my code has an effect on the engine tests in the case that is failing atm but i will look into it

@wRAR
Copy link
Member
wRAR commented May 19, 2025

That's just #6783

@novykovd
Copy link
Author
novykovd commented May 20, 2025

i thought maybe its some versioning thing because it fails on a lower python version, but ig the code is fine right?

@wRAR
Copy link
Member
wRAR commented May 20, 2025

Yes, the problem is unrelated to the changes in this PR.

@novykovd
Copy link
Author

this issue looks similar to that previous one i think? in both cases its something with task being pending it seems

@novykovd
Copy link
Author

do you think its ready or should i change something else?

@wRAR
Copy link
Member
wRAR commented May 24, 2025

As for the tests, I would wait until #6821 is merged, after which it should be easier to subclass test base classes. I don't expect this handler to pass all of them as e.g. dataloss and maxsize handling is not implemented, but it would be better to subclass all test classes and mark the unimplemented ones with xfail.

@wRAR
Copy link
Member
wRAR commented May 27, 2025

I've enabled all existing tests for this handler and skipped ones that don't pass.
I'm fine with fixing all those in a separate PR. I'm more worried about stuff like ignoring cookies, actually returning correct responses etc., which we don't test for non-default handlers, but that may be needed to be tested manually.

@Gallaecio
Copy link
Member

Another thing to consider, although maybe also OK to address on a separate PR, is to both extend the default list of RETRY_EXCEPTIONS to include aiohttp ones, and to modify the code that loads them so that it silently ignores those that cannot be imported. Or maybe we could make the default value depend on installed dependencies, so that we do not need to change the load handling and silence anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an aiohttp-based download handler
3 participants
0