-
Notifications
You must be signed in to change notification settings - Fork 8
Scraper parsing tests #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's Guide by SourceryThis pull request refactors the scraper module to improve modularity, readability, and type safety. It introduces new helper functions for extracting zone URLs and WHOIS servers, adds type annotations to existing functions, and implements comprehensive tests for the new functionality. The changes focus on enhancing the code structure and testability of the scraper module. File-Level Changes
Tips
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kgaughan - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟡 Testing: 3 issues found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kgaughan - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟡 Testing: 4 issues found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kgaughan - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟡 Testing: 4 issues found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
tests/test_scraper.py
Outdated
from os import path | ||
|
||
import bs4 | ||
|
||
from uwhoisd import scraper | ||
|
||
HERE = path.dirname(__file__) | ||
|
||
|
||
def test_extract_zone_urls(): | ||
with open(path.join(path.dirname(__file__), "iana-root-zone.html"), encoding="utf-8") as fh: | ||
body = bs4.BeautifulSoup(fh, "html.parser") | ||
result = list(scraper.extract_zone_urls("http://example.com", body)) | ||
# The test zone should not appear | ||
assert result == [ | ||
("aaa", "http://example.com/domains/root/db/aaa.html"), | ||
("bt", "http://example.com/domains/root/db/bt.html"), | ||
("xxx", "http://example.com/domains/root/db/xxx.html"), | ||
] | ||
|
||
|
||
def test_extract_zone_urls_edge_cases(): | ||
empty_body = bs4.BeautifulSoup("", "html.parser") | ||
assert list(scraper.extract_zone_urls("http://example.com", empty_body)) == [] | ||
|
||
|
||
def test_extract_whois_server(): | ||
with open(path.join(path.dirname(__file__), "zone-info-fragment.html"), encoding="utf-8") as fh: | ||
body = bs4.BeautifulSoup(fh, "html.parser") | ||
result = scraper.extract_whois_server(body) | ||
assert result == "whois.nic.abc" | ||
|
||
|
||
def test_extract_whois_server_not_found(): | ||
body = bs4.BeautifulSoup("<html><body></body></html>", "html.parser") | ||
result = scraper.extract_whois_server(body) | ||
assert result is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Add tests for scrape_whois_from_iana function
The tests cover the new helper functions well, but there are no tests for the main scrape_whois_from_iana function. Consider adding tests that mock the network requests and verify the function's behavior with different inputs and scenarios.
from unittest.mock import patch
import pytest
from uwhoisd import scraper
def test_scrape_whois_from_iana():
with patch('uwhoisd.scraper.requests.get') as mock_get:
mock_get.return_value.text = '<html></html>'
result = scraper.scrape_whois_from_iana()
assert isinstance(result, dict)
assert len(result) > 0
assert all(isinstance(k, str) and isinstance(v, str) for k, v in result.items())
Summary by Sourcery
Refactor the scraper to improve code modularity by introducing helper functions for extracting zone URLs and WHOIS servers. Add comprehensive tests to validate the new functions and ensure robust handling of edge cases.
Enhancements:
scrape_whois_from_iana
function to use helper functionsextract_zone_urls
andextract_whois_server
for improved readability and modularity.Tests:
extract_zone_urls
andextract_whois_server
functions to ensure correct functionality and edge case handling.iana-root-zone.html
andzone-info-fragment.html
to support the new test cases.