-
-
Notifications
You must be signed in to change notification settings - Fork 220
fix(config): remove trailing newlines in regexes #373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Closes #372 Fun YAML fact of the day: What is the difference between how these two expressions are parsed? ```yaml foo: > bar ``` ```yaml foo: >- bar ``` They are invisible in yaml, but when you evaluate them to JSON the difference is obvious: ```json { "foo": "bar\n" } ``` ```json { "foo": "bar" } ``` User-Agent strings, URL path values, and HTTP headers _do_ end in newlines in HTTP/1.1 wire form, but that newline is usually stripped before the server actually handles it. Also HTTP/2 is a thing and does not terminate header values with newlines. This change makes Anubis more aggressively detect mistaken uses of the yaml `>` operator and nudges the user into using the yaml `>-` operator which does not append the trailing newline. I had honestly forgotten about this YAML behavior because it wasn't relevant for so long. Oops! Glad I released a beta. Whenever you get into this state, Anubis will throw a config parsing error and then give you a message hinting at the folly of your ways. ``` config.Bot: regular expression ends with newline (try >- instead of > in yaml) ``` Big thanks to https://yaml-multiline.info, this helped me realize my folly instantly. @aiverson, this is official permission to say "told you so". Signed-off-by: Xe Iaso <me@xeiaso.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR prevents configuration issues caused by trailing newlines in regex definitions by detecting and rejecting regexes that end with newline characters, nudging users to use the ">-" YAML operator.
- Added explicit checks for trailing newline characters in regex values (user agent, path, and headers) in config.go.
- Updated YAML test data and production configurations to use the ">-" YAML operator.
- Adjusted regex compilation in checker.go to trim whitespace.
Reviewed Changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
lib/policy/config/testdata/bad/regex_ends_newline.yaml | Test data demonstrating regexes that end with a newline |
lib/policy/config/config.go | Added validations to detect regexes ending with newline |
lib/policy/checker.go | Updated regex compilation to trim whitespace before compiling |
docs/docs/CHANGELOG.md | Documented the change regarding trailing newlines in regexes |
data/bots/ai-robots-txt.yaml | Updated YAML regex scalar to use ">-" instead of ">" |
data/botPolicies.yaml | Updated YAML regex scalar to use ">-" instead of ">" |
Files not reviewed (2)
- data/botPolicies.json: Language not supported
- lib/policy/config/testdata/bad/regex_ends_newline.json: Language not supported
Comments suppressed due to low confidence (2)
lib/policy/checker.go:113
- Using strings.TrimSpace removes all leading and trailing whitespace, which may inadvertently modify regexes that are intended to have specific leading or trailing spaces. Consider trimming only the trailing newline (or explicitly checking for '\n') instead of all whitespace.
rex, err := regexp.Compile(strings.TrimSpace(rexStr))
lib/policy/checker.go:138
- Trimming all whitespace from the regex string might remove intentional formatting. It may be preferable to strip only trailing newline characters to align with the PR's intent.
rex, err := regexp.Compile(strings.TrimSpace(rexStr))
Can't update from beta1 to beta2 on Ubuntu amd64. Got this error:
|
https://github.com/TecharoHQ/anubis/releases/tag/v1.17.0-beta3 <- fixed packages will show up here TODO: make sure that packages are installable in a CI test |
or not, ugh, did i somehow break git tag resolution logic by having two tags point to the same commit? |
Yep, I just noticed it. |
https://github.com/TecharoHQ/anubis/releases/tag/v1.17.0-beta4 this is actually fixed including a summary of the problem. I will develop automation for this in the near future. |
Before this assumed that all tags started with `v` and cut off the v: ```go // current version: v1.2.3 fmt.Println(internal.GitVersion()) // 1.2.3 ``` This fails horribly when versions do not start with `v`: ```go // current version: 1.2.3 fmt.Println(internal.GitVersion()) // .2.3 ``` This breaks Debian's package manager[0] horribly because `.2.3` is not a valid version number in Debian land. There needs to be additional validation for version numbers pre-build time, but for now it's easy enough to make sure that this exact failure case doesn't happen again. [0] TecharoHQ/anubis#373 (comment) Signed-off-by: Xe Iaso <me@xeiaso.net>
Before this assumed that all tags started with `v` and cut off the v: ```go // current version: v1.2.3 fmt.Println(internal.GitVersion()) // 1.2.3 ``` This fails horribly when versions do not start with `v`: ```go // current version: 1.2.3 fmt.Println(internal.GitVersion()) // .2.3 ``` This breaks Debian's package manager[0] horribly because `.2.3` is not a valid version number in Debian land. There needs to be additional validation for version numbers pre-build time, but for now it's easy enough to make sure that this exact failure case doesn't happen again. [0] TecharoHQ/anubis#373 (comment) Signed-off-by: Xe Iaso <me@xeiaso.net>
Closes TecharoHQ#372 Fun YAML fact of the day: What is the difference between how these two expressions are parsed? ```yaml foo: > bar ``` ```yaml foo: >- bar ``` They are invisible in yaml, but when you evaluate them to JSON the difference is obvious: ```json { "foo": "bar\n" } ``` ```json { "foo": "bar" } ``` User-Agent strings, URL path values, and HTTP headers _do_ end in newlines in HTTP/1.1 wire form, but that newline is usually stripped before the server actually handles it. Also HTTP/2 is a thing and does not terminate header values with newlines. This change makes Anubis more aggressively detect mistaken uses of the yaml `>` operator and nudges the user into using the yaml `>-` operator which does not append the trailing newline. I had honestly forgotten about this YAML behavior because it wasn't relevant for so long. Oops! Glad I released a beta. Whenever you get into this state, Anubis will throw a config parsing error and then give you a message hinting at the folly of your ways. ``` config.Bot: regular expression ends with newline (try >- instead of > in yaml) ``` Big thanks to https://yaml-multiline.info, this helped me realize my folly instantly. @aiverson, this is official permission to say "told you so". Signed-off-by: Xe Iaso <me@xeiaso.net>
Closes #372
Fun YAML fact of the day:
What is the difference between how these two expressions are parsed?
They are invisible in yaml, but when you evaluate them to JSON the difference is obvious:
User-Agent strings, URL path values, and HTTP headers do end in newlines in HTTP/1.1 wire form, but that newline is usually stripped before the server actually handles it. Also HTTP/2 is a thing and does not terminate header values with newlines.
This change makes Anubis more aggressively detect mistaken uses of the yaml
>
operator and nudges the user into using the yaml>-
operator which does not append the trailing newline.I had honestly forgotten about this YAML behavior because it wasn't relevant for so long. Oops! Glad I released a beta.
Whenever you get into this state, Anubis will throw a config parsing error and then give you a message hinting at the folly of your ways.
Big thanks to https://yaml-multiline.info, this helped me realize my folly instantly.
@aiverson, this is official permission to say "told you so".
Checklist:
[Unreleased]
section of docs/docs/CHANGELOG.mdnpm run test:integration
(unsupported on Windows, please use WSL)