8000 Scraper parsing tests by kgaughan · Pull Request #41 · kgaughan/uwhoisd · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Scraper parsing tests #41

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 22, 2024
Merged

Scraper parsing tests #41

merged 6 commits into from
Aug 22, 2024

Conversation

kgaughan
Copy link
Owner
@kgaughan kgaughan commented Aug 22, 2024

Summary by Sourcery

Refactor the scraper to improve code modularity by introducing helper functions for extracting zone URLs and WHOIS servers. Add comprehensive tests to validate the new functions and ensure robust handling of edge cases.

Enhancements:

  • Refactor the scrape_whois_from_iana function to use helper functions extract_zone_urls and extract_whois_server for improved readability and modularity.

Tests:

  • Add new test cases for the extract_zone_urls and extract_whois_server functions to ensure correct functionality and edge case handling.
  • Include HTML test fixtures iana-root-zone.html and zone-info-fragment.html to support the new test cases.

Copy link
Contributor
sourcery-ai bot commented Aug 22, 2024

Reviewer's Guide by Sourcery

This pull request refactors the scraper module to improve modularity, readability, and type safety. It introduces new helper functions for extracting zone URLs and WHOIS servers, adds type annotations to existing functions, and implements comprehensive tests for the new functionality. The changes focus on enhancing the code structure and testability of the scraper module.

File-Level Changes

Files Changes
src/uwhoisd/scraper.py Introduced new helper functions extract_zone_urls and extract_whois_server to modularize the scraping logic
src/uwhoisd/scraper.py Added type annotations to fetch and scrape_whois_from_iana functions
src/uwhoisd/scraper.py Refactored scrape_whois_from_iana to use the new helper functions
tests/test_scraper.py Implemented new test cases for extract_zone_urls and extract_whois_server functions
tests/iana-root-zone.html
tests/zone-info-fragment.html
Added HTML fixtures to support new test cases

Tips
  • Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
  • Continue your discussion with Sourcery by replying directly to review comments.
  • You can change your review settings at any time by accessing your dashboard:
    • Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
    • Change the review language;
  • You can always contact us if you have any questions or feedback.

Copy link
Contributor
@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kgaughan - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟡 Testing: 3 issues found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

@kgaughan
Copy link
Owner Author

@sourcery-ai review

Copy link
Contributor
@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kgaughan - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟡 Testing: 4 issues found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

@kgaughan
Copy link
Owner Author

@sourcery-ai review

Copy link
Contributor
@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kgaughan - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟡 Testing: 4 issues found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

Comment on lines 1 to 37
from os import path

import bs4

from uwhoisd import scraper

HERE = path.dirname(__file__)


def test_extract_zone_urls():
with open(path.join(path.dirname(__file__), "iana-root-zone.html"), encoding="utf-8") as fh:
body = bs4.BeautifulSoup(fh, "html.parser")
result = list(scraper.extract_zone_urls("http://example.com", body))
# The test zone should not appear
assert result == [
("aaa", "http://example.com/domains/root/db/aaa.html"),
("bt", "http://example.com/domains/root/db/bt.html"),
("xxx", "http://example.com/domains/root/db/xxx.html"),
]


def test_extract_zone_urls_edge_cases():
empty_body = bs4.BeautifulSoup("", "html.parser")
assert list(scraper.extract_zone_urls("http://example.com", empty_body)) == []


def test_extract_whois_server():
with open(path.join(path.dirname(__file__), "zone-info-fragment.html"), encoding="utf-8") as fh:
body = bs4.BeautifulSoup(fh, "html.parser")
result = scraper.extract_whois_server(body)
assert result == "whois.nic.abc"


def test_extract_whois_server_not_found():
body = bs4.BeautifulSoup("<html><body></body></html>", "html.parser")
result = scraper.extract_whois_server(body)
assert result is None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add tests for scrape_whois_from_iana function

The tests cover the new helper functions well, but there are no tests for the main scrape_whois_from_iana function. Consider adding tests that mock the network requests and verify the function's behavior with different inputs and scenarios.

from unittest.mock import patch
import pytest
from uwhoisd import scraper

def test_scrape_whois_from_iana():
    with patch('uwhoisd.scraper.requests.get') as mock_get:
        mock_get.return_value.text = '<html></html>'
        result = scraper.scrape_whois_from_iana()
        assert isinstance(result, dict)
        assert len(result) > 0
        assert all(isinstance(k, str) and isinstance(v, str) for k, v in result.items())

@kgaughan kgaughan merged commit bb50859 into master Aug 22, 2024
@kgaughan kgaughan deleted the tests branch August 22, 2024 02:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant
0