RSS scrape image from article if otherwise none is found #630
+118
−0
Some RSS feeds don't provide a media tag, and the existing fallbacks for finding an image to display sometimes fail as well. One example is the Bleeping Computer RSS feed:
https://www.bleepingcomputer.com/feed/
This PR adds another fallback, which uses the worker pool to scrape an image from the article itself.
It uses the CSS selectors article img, main img, and .post-content img to look for images typically found in article content. The first element matched by these selectors that has a valid src attribute is used as the preview image.
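For readers who want to see the selection rule in isolation, here is a minimal sketch of it. It is not the code from this PR: it uses only Python's standard library, approximates the three selectors by tracking whether an img sits inside an article element, a main element, or an element with the class post-content, and assumes that "valid src" means a non-empty value that resolves to an http(s) URL. The names FirstArticleImage and find_preview_image are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FirstArticleImage(HTMLParser):
    """Finds the first <img> with a usable src inside article content."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.stack = []        # (tag, is_container) for each open element
        self.image_url = None  # set once, on the first match

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            inside = any(is_container for _, is_container in self.stack)
            if self.image_url is None and inside:
                src = (attrs.get("src") or "").strip()
                url = urljoin(self.base_url, src)
                # Assumption: "valid" = non-empty and resolves to http(s).
                if src and urlparse(url).scheme in ("http", "https"):
                    self.image_url = url
            return
        if tag in VOID_TAGS:
            return
        classes = (attrs.get("class") or "").split()
        is_container = tag in ("article", "main") or "post-content" in classes
        self.stack.append((tag, is_container))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def find_preview_image(html, base_url):
    """Return the absolute URL of the first matching image, or None."""
    parser = FirstArticleImage(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_url
```

Images outside the article content (a site logo in the header, for example) are skipped, and relative src values are resolved against the article URL.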
Tested using Docker against the Bleeping Computer feed, where no pictures are displayed otherwise. Loading times don't seem to be affected, thanks to the worker pool.
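The reason a pool keeps loading times flat can be illustrated with a small sketch. This is an assumption-laden stand-in, not the project's worker pool: it uses Python's ThreadPoolExecutor, represents feed items as plain dicts with hypothetical "link" and "image" keys, and takes the scraping function as a parameter so that only items without an image trigger a request.

```python
from concurrent.futures import ThreadPoolExecutor


def fill_missing_images(items, scrape_image, max_workers=4):
    """Scrape a preview image only for items that have none, concurrently."""
    missing = [item for item in items if not item.get("image")]

    def safe_scrape(link):
        # A failed scrape must not break the feed; treat it as "no image".
        try:
            return scrape_image(link)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        links = [item["link"] for item in missing]
        # pool.map preserves input order, so results line up with `missing`.
        for item, url in zip(missing, pool.map(safe_scrape, links)):
            if url:
                item["image"] = url
    return items
```

Because the article requests run side by side, the total wait is closer to the slowest single request than to the sum of all of them.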
Keep in mind to change the User Agent if you try to replicate this with Bleeping Computer; otherwise you will be blocked by Cloudflare.
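When replicating the test outside the application, the User Agent can be set on the request directly. The sketch below uses Python's standard urllib only to show the idea; the helper name build_article_request is hypothetical, the User Agent value is whatever you choose to pass in, and how the application itself configures the header is not shown here.

```python
from urllib.request import Request


def build_article_request(url, user_agent):
    """Build a GET request for an article page with an explicit User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})
```

The returned object can be passed to urllib.request.urlopen to fetch the page.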
Before:

After:
