-
-
Notifications
You must be signed in to change notification settings - Fork 7
fix: allow accented letters in author name #221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: allow accented letters in author name #221
Conversation
Allow accented letters (e.g. áéó, etc.) in an author's name.
WalkthroughThe changes update the commit check to support an extended range of Unicode characters in author names by refining its regular expression. Additionally, the test suite now includes a new test that verifies the behavior of the author name validation when accented characters are present. No public API declarations were modified. Changes
Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
commit_check/__init__.py (1)
35-35
: LGTM! Comprehensive Unicode support added for author names.The updated regex pattern effectively supports accented letters while maintaining backward compatibility with ASCII characters and the [bot] suffix.
Consider simplifying the Unicode ranges.
The pattern includes some overlapping ranges that could be simplified:
Ā-ž
overlaps with\u0100-\u017F
- Consider using only the explicit Unicode ranges for better readability
- 'regex': r'^[A-Za-zÀ-ÖØ-öø-ÿĀ-ž\u0100-\u017F\u0180-\u024F ,.\'-]+$|.*(\[bot])', + 'regex': r'^[A-Za-zÀ-ÖØ-öø-ÿ\u0100-\u017F\u0180-\u024F ,.\'-]+$|.*(\[bot])',tests/author_test.py (1)
33-51
: Add negative test cases for accented characters.While the positive test case is good, consider adding tests for invalid scenarios:
- Mixed valid/invalid Unicode characters
- Characters outside the supported ranges
def test_check_author_with_invalid_unicode(self, mocker): # Test with characters outside the supported ranges checks = [{ "check": "author_name", "regex": r'^[A-Za-zÀ-ÖØ-öø-ÿ\u0100-\u017F\u0180-\u024F ,.\'-]+$|.*(\[bot])' }] m_get_commit_info = mocker.patch( f"{LOCATION}.get_commit_info", return_value="invalid_name_🚀" # Emoji outside supported range ) m_re_match = mocker.patch( "re.match", side_effect=lambda *args: None # Simulate no match ) retval = check_author(checks, "author_name") assert retval == FAIL
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
commit_check/__init__.py
(1 hunks)tests/author_test.py
(2 hunks)
🔇 Additional comments (1)
tests/author_test.py (1)
12-12
: LGTM! Good test data with accented characters.The test variable includes a good mix of accented characters and follows the existing naming convention.
The check failed due to my name not passing the check ... :)
|
Hi @kohanyirobert, Thanks again for your pull request! the default regex only covers the general scenario. But you can add your own Please let me know if you have any questions or suggestions. |
Agree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
commit_check/__init__.py (1)
35-35
: Simplify the regex pattern by removing overlapping ranges.The pattern includes overlapping ranges that could be simplified:
Ā-ž
overlaps with\u0100-\u017F
Apply this diff to simplify the pattern while maintaining the same functionality:
- 'regex': r'^[A-Za-zÀ-ÖØ-öø-ÿĀ-ž\u0100-\u017F\u0180-\u024F ,.\'-]+$|.*(\[bot])', + 'regex': r'^[A-Za-zÀ-ÖØ-öø-ÿ\u0100-\u017F\u0180-\u024F ,.\'-]+$|.*(\[bot])',.commit-check.yml (1)
17-17
: Simplify the regex pattern to match commit_check/init.py.The pattern correctly allows accented characters in author names, which meets the PR objective. However, for consistency with the suggested changes in
commit_check/__init__.py
, consider simplifying the overlapping ranges.Apply this diff to simplify the pattern:
- regex: ^[A-Za-zÀ-ÖØ-öø-ÿĀ-ž\u0100-\u017F\u0180-\u024F ,.\'-]+$|.*(\[bot]) + regex: ^[A-Za-zÀ-ÖØ-öø-ÿ\u0100-\u017F\u0180-\u024F ,.\'-]+$|.*(\[bot])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.commit-check.yml
(1 hunks)commit_check/__init__.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
commit_check/__init__.py
36-36: Dictionary key literal 'regex'
repeated
(F601)
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #221 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 7 7
Lines 290 290
=========================================
Hits 290 290 ☔ View full report in Codecov by Sentry. |
Allow accented letters (e.g. áéó, etc.) in an author's name. I think this is quite commonplace (e.g. my name contains accented letters).
Since Python doesn't allow Unicode character class matching this needs to be added to the existing regex:
[À-ÖØ-öø-ÿĀ-ž\u0100-\u017F\u0180-\u024F]
This matches the following ranges:
À-Ö → Matches uppercase Latin letters with diacritics from À (U+00C0) to Ö (U+00D6).
Ø-ö → Matches uppercase Ø (U+00D8) to lowercase ö (U+00F6).
ø-ÿ → Matches lowercase ø (U+00F8) to ÿ (U+00FF).
Ā-ž → Matches Latin Extended-A characters from Ā (U+0100) to ž (U+017F).
\u0100-\u017F → Explicitly specifies the same range as Ā-ž in Unicode notation.
\u0180-\u024F → Matches Latin Extended-B characters, covering additional accented and special Latin characters.
This covers quite a lot I think. Also, I don't there is an argument over not having something like the default, hence the PR, or is there?
Summary by CodeRabbit