Scraper parsing tests #41

kgaughan · 2024-08-22T01:42:39Z

Summary by Sourcery

Refactor the scraper to improve code modularity by introducing helper functions for extracting zone URLs and WHOIS servers. Add comprehensive tests to validate the new functions and ensure robust handling of edge cases.

Enhancements:

Refactor the scrape_whois_from_iana function to use helper functions extract_zone_urls and extract_whois_server for improved readability and modularity.
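To make the idea of a separately testable extraction helper concrete, here is a minimal, stdlib-only sketch of a zone-URL extractor. It is not uwhoisd's code. The PR's `extract_zone_urls` takes a BeautifulSoup document, whereas this sketch uses `html.parser` so that it runs without third-party packages. The link shape it matches (`/domains/root/db/<zone>.html`) is an assumption based on the URLs the PR's test expects, and the names `zone_urls` and `_LinkCollector` are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Iterator, List, Optional, Tuple
from urllib.parse import urljoin

# Assumed link shape on the root zone page: /domains/root/db/<zone>.html
ZONE_HREF = re.compile(r"/domains/root/db/([^/]+)\.html$")


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def zone_urls(base_url: str, html: str) -> Iterator[Tuple[str, str]]:
    """Yield (zone, absolute_url) pairs for links that look like zone pages."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    for href in collector.hrefs:
        match = ZONE_HREF.search(href)
        if match:
            # Resolve the relative href against the page it came from.
            yield match.group(1), urljoin(base_url, href)


if __name__ == "__main__":
    page = (
        "<table>"
        '<tr><td><a href="/domains/root/db/aaa.html">.aaa</a></td></tr>'
        '<tr><td><a href="/about/">About</a></td></tr>'
        "</table>"
    )
    print(list(zone_urls("http://example.com", page)))
    # [('aaa', 'http://example.com/domains/root/db/aaa.html')]
```

Because the helper works on a string (or, in the PR, a parsed document) rather than on a URL, it can be exercised directly against a saved fixture page. Note that this sketch does not reproduce the exclusion of IANA's test zone that the PR's test checks for.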

Tests:

Add new test cases for the extract_zone_urls and extract_whois_server functions to ensure correct functionality and edge case handling.
Include HTML test fixtures iana-root-zone.html and zone-info-fragment.html to support the new test cases.

sourcery-ai · 2024-08-22T01:42:46Z

Reviewer's Guide by Sourcery

This pull request refactors the scraper module to improve modularity, readability, and type safety. It introduces new helper functions for extracting zone URLs and WHOIS servers, adds type annotations to existing functions, and implements comprehensive tests for the new functionality. The changes focus on enhancing the code structure and testability of the scraper module.

File-Level Changes

| Files | Changes |
| --- | --- |
| `src/uwhoisd/scraper.py` | Introduced new helper functions `extract_zone_urls` and `extract_whois_server` to modularize the scraping logic |
| `src/uwhoisd/scraper.py` | Added type annotations to `fetch` and `scrape_whois_from_iana` functions |
| `src/uwhoisd/scraper.py` | Refactored `scrape_whois_from_iana` to use the new helper functions |
| `tests/test_scraper.py` | Implemented new test cases for `extract_zone_urls` and `extract_whois_server` functions |
| `tests/iana-root-zone.html`, `tests/zone-info-fragment.html` | Added HTML fixtures to support new test cases |

Tips

Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
Continue your discussion with Sourcery by replying directly to review comments.
You can change your review settings at any time by accessing your dashboard:
- Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
- Change the review language;
You can always contact us if you have any questions or feedback.

sourcery-ai

Hey @kgaughan - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟡 Testing: 3 issues found
🟢 Complexity: all looks good
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

tests/test_scraper.py

kgaughan · 2024-08-22T01:48:03Z

@sourcery-ai review

sourcery-ai

Hey @kgaughan - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟡 General issues: 1 issue found
🟢 Security: all looks good
🟡 Testing: 4 issues found
🟢 Complexity: all looks good
🟢 Documentation: all looks good

src/uwhoisd/scraper.py

tests/test_scraper.py

kgaughan · 2024-08-22T01:54:51Z

@sourcery-ai review

sourcery-ai

Hey @kgaughan - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟡 Testing: 4 issues found
🟢 Complexity: all looks good
🟢 Documentation: all looks good

tests/test_scraper.py

sourcery-ai · 2024-08-22T01:56:13Z

tests/test_scraper.py

+from os import path
+
+import bs4
+
+from uwhoisd import scraper
+
+HERE = path.dirname(__file__)
+
+
+def test_extract_zone_urls():
+    with open(path.join(path.dirname(__file__), "iana-root-zone.html"), encoding="utf-8") as fh:
+        body = bs4.BeautifulSoup(fh, "html.parser")
+    result = list(scraper.extract_zone_urls("http://example.com", body))
+    # The test zone should not appear
+    assert result == [
+        ("aaa", "http://example.com/domains/root/db/aaa.html"),
+        ("bt", "http://example.com/domains/root/db/bt.html"),
+        ("xxx", "http://example.com/domains/root/db/xxx.html"),
+    ]
+
+
+def test_extract_zone_urls_edge_cases():
+    empty_body = bs4.BeautifulSoup("", "html.parser")
+    assert list(scraper.extract_zone_urls("http://example.com", empty_body)) == []
+
+
+def test_extract_whois_server():
+    with open(path.join(path.dirname(__file__), "zone-info-fragment.html"), encoding="utf-8") as fh:
+        body = bs4.BeautifulSoup(fh, "html.parser")
+    result = scraper.extract_whois_server(body)
+    assert result == "whois.nic.abc"
+
+
+def test_extract_whois_server_not_found():
+    body = bs4.BeautifulSoup("<html><body></body></html>", "html.parser")
+    result = scraper.extract_whois_server(body)
+    assert result is None


suggestion (testing): Add tests for scrape_whois_from_iana function

The tests cover the new helper functions well, but there are no tests for the main scrape_whois_from_iana function. Consider adding tests that mock the network requests and verify the function's behavior with different inputs and scenarios.

from unittest.mock import patch

import pytest

from uwhoisd import scraper


def test_scrape_whois_from_iana():
    with patch('uwhoisd.scraper.requests.get') as mock_get:
        mock_get.return_value.text = '<html></html>'
        result = scraper.scrape_whois_from_iana()
        assert isinstance(result, dict)
        assert len(result) > 0
        assert all(isinstance(k, str) and isinstance(v, str) for k, v in result.items())
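An alternative to patching the HTTP client is to pass the page fetcher in as a parameter, so a test can substitute a dictionary of canned pages. The sketch below assumes a hypothetical `scrape(zones, fetch)` that takes already-extracted `(zone, url)` pairs and a `fetch(url) -> str` callable; this is not the signature of uwhoisd's `scrape_whois_from_iana`. It also uses a regular expression in place of BeautifulSoup and assumes that a zone page labels its server as `<b>WHOIS Server:</b> hostname`.

```python
import re
from typing import Callable, Dict, Iterable, Optional, Tuple

# Assumed markup on a zone page: <b>WHOIS Server:</b> whois.nic.example
WHOIS_LABEL = re.compile(r"<b>\s*WHOIS Server:\s*</b>\s*([^\s<]+)", re.IGNORECASE)


def whois_server(html: str) -> Optional[str]:
    """Return the WHOIS server named on a zone page, or None if absent."""
    match = WHOIS_LABEL.search(html)
    return match.group(1) if match else None


def scrape(
    zones: Iterable[Tuple[str, str]],
    fetch: Callable[[str], str],
) -> Dict[str, str]:
    """Map each zone to its WHOIS server, skipping zones that list none."""
    servers: Dict[str, str] = {}
    for zone, url in zones:
        server = whois_server(fetch(url))
        if server is not None:
            servers[zone] = server
    return servers


def test_scrape_without_network() -> None:
    # A dict lookup stands in for the network; no HTTP library is patched.
    pages = {
        "http://example.com/abc.html": "<p><b>WHOIS Server:</b> whois.nic.abc</p>",
        "http://example.com/xyz.html": "<p>No server listed.</p>",
    }
    zones = [
        ("abc", "http://example.com/abc.html"),
        ("xyz", "http://example.com/xyz.html"),
    ]
    assert scrape(zones, pages.__getitem__) == {"abc": "whois.nic.abc"}
```

Because the fake fetcher returns realistic fragments, asserting a non-empty result is meaningful. A page as bare as `<html></html>` gives any extractor nothing to find.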

tests/test_scraper.py

kgaughan added 3 commits August 22, 2024 02:15

Extract extraction of zone URLs

ee1bd36

Extract extraction of WHOIS server address

fbf9b9e

A bit more test coverage

8fd9259

sourcery-ai bot reviewed Aug 22, 2024

tests/test_scraper.py (resolved)

tests/test_scraper.py (resolved)

tests/test_scraper.py (outdated, resolved)

Test empty documents

bca6202

sourcery-ai bot reviewed Aug 22, 2024

More reasonable tests

b923218

sourcery-ai bot reviewed Aug 22, 2024

Parametrise the no-match tests

090a7a7
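The no-match cases in the diff (an empty body and a body without a WHOIS label) were presumably what this commit folded into a single parametrised test. With pytest, that is usually written with `pytest.mark.parametrize`. The sketch below shows the shape using a hypothetical regex-based `whois_server` stand-in rather than uwhoisd's BeautifulSoup-based `extract_whois_server`. The inputs are chosen for illustration and are not taken from the PR.

```python
import re
from typing import Optional

import pytest

# Hypothetical stand-in for the function under test; assumes the zone page
# labels the server as "<b>WHOIS Server:</b> hostname".
WHOIS_LABEL = re.compile(r"<b>\s*WHOIS Server:\s*</b>\s*([^\s<]+)", re.IGNORECASE)


def whois_server(html: str) -> Optional[str]:
    match = WHOIS_LABEL.search(html)
    return match.group(1) if match else None


@pytest.mark.parametrize(
    "html",
    [
        "",                                # empty document
        "<html><body></body></html>",      # no label at all
        "<p><b>Registry:</b> Example</p>",  # a different label
        "<p><b>WHOIS Server:</b></p>",     # label with no value after it
    ],
)
def test_whois_server_not_found(html: str) -> None:
    # One test function, run once per input; each case must yield None.
    assert whois_server(html) is None
```

pytest reports each input as a separate test case. A new no-match scenario therefore needs only one extra list entry, not another near-identical function.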

kgaughan merged commit bb50859 into master Aug 22, 2024

kgaughan deleted the tests branch August 22, 2024 02:01
