Refactor CLI scraping: fix trending words, handle dynamic classes, improve outputs, and add request headers #1

spithash · 2025-06-18T09:18:22Z

Addresses multiple issues in the CLI scraper for the Online Etymology Dictionary:

Trending Words:
The previous method relied on CSS classes that are dynamically generated and change frequently, causing scraping failures. To fix this, the trending words are now extracted from the /word/test page where the sidebar is consistently present, using stable selectors instead of brittle class names.
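As a rough illustration of what selecting by a stable attribute rather than a generated class name can look like, here is a minimal standard-library sketch. It assumes that trending words appear as links whose `href` starts with `/word/`; that markup, and the names `WordLinkParser` and `extract_word_links`, are hypothetical and not taken from this PR.

```python
from html.parser import HTMLParser


class WordLinkParser(HTMLParser):
    """Collect the text of <a> tags whose href starts with /word/.

    Assumption: word links can be recognised by their URL prefix,
    so no CSS class names are inspected at all.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word_link = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("/word/"):
                self._in_word_link = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_word_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_word_link:
            # Collapse internal whitespace and skip duplicates.
            text = " ".join("".join(self._buffer).split())
            if text and text not in self.words:
                self.words.append(text)
            self._in_word_link = False


def extract_word_links(html_text):
    """Return the visible text of every /word/ link, in document order."""
    parser = WordLinkParser()
    parser.feed(html_text)
    return parser.words
```

Because the match is on the link target, a change to the generated class names would not affect the result, although a change to the URL scheme would.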

Request Headers:
Added a User-Agent header to all HTTP requests to mimic a browser and reduce the risk of requests being blocked.
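For reference, attaching a User-Agent header to a request might look like the sketch below. It uses only `urllib.request` from the standard library; the header value and the helper names `build_request` and `fetch` are illustrative assumptions, not the values or functions used in this PR.

```python
from urllib.request import Request, urlopen

# Hypothetical browser-like User-Agent string, for illustration only.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) etymology-cli-example",
}


def build_request(url, headers=None):
    """Return a Request carrying the default headers plus any overrides."""
    merged = dict(DEFAULT_HEADERS)
    if headers:
        merged.update(headers)
    return Request(url, headers=merged)


def fetch(url, timeout=10):
    """Fetch a page as text, sending the default headers."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Keeping the headers in one place means every request path sends the same values.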

Output Formatting:
Improved both plain text and rich output functions for better title extraction, whitespace trimming, and readability.
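The whitespace trimming described here could be sketched as follows. This assumes that scraped text is cleaned by decoding HTML entities and collapsing runs of whitespace; the helper name `clean_text` is hypothetical, and the PR's actual output functions may work differently.

```python
from html import unescape


def clean_text(raw):
    """Decode HTML entities and collapse all whitespace runs to single spaces."""
    return " ".join(unescape(raw).split())
```

For example, `clean_text("  salt &amp;\n  pepper ")` yields `"salt & pepper"`.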

Fuzzy Search:
Added the same request headers to fuzzy search requests, along with error handling for robustness.
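One possible shape for this kind of error handling is shown below: network failures are caught and reported instead of crashing the CLI. The function name `fetch_or_none`, the injectable `opener` parameter, and the User-Agent value are assumptions made for this sketch, not details from the PR.

```python
import sys
from urllib.error import URLError
from urllib.request import Request, urlopen


def fetch_or_none(url, opener=urlopen, timeout=10):
    """Return the response body as text, or None if the request fails.

    `opener` defaults to urllib's urlopen and can be swapped out so the
    failure path can be exercised without network access.
    """
    # Hypothetical User-Agent value, for illustration only.
    request = Request(url, headers={"User-Agent": "etymology-cli-example"})
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (URLError, TimeoutError) as exc:
        # HTTPError is a subclass of URLError, so HTTP status errors land here too.
        print(f"Request failed for {url}: {exc}", file=sys.stderr)
        return None
```

A caller can then treat `None` as "no results" and print a short message rather than a traceback.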

General:
Updated selectors to be more resilient against frontend changes and dynamic classes.


'from html import unescape'

spithash added 2 commits June 18, 2025 09:14

redundant library removal

ab051a4

'from html import unescape'
