8000 fix(config): remove trailing newlines in regexes by Xe · Pull Request #373 · TecharoHQ/anubis · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix(config): remove trailing newlines in regexes #373

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 26, 2025
Merged

fix(config): remove trailing newlines in regexes #373

merged 1 commit into from
Apr 26, 2025

Conversation

Xe
Copy link
Contributor
@Xe Xe commented Apr 26, 2025

Closes #372

Fun YAML fact of the day:

What is the difference between how these two expressions are parsed?

foo: >
  bar
foo: >-
  bar

They are invisible in yaml, but when you evaluate them to JSON the difference is obvious:

{
  "foo": "bar\n"
}
{
  "foo": "bar"
}

User-Agent strings, URL path values, and HTTP headers do end in newlines in HTTP/1.1 wire form, but that newline is usually stripped before the server actually handles it. Also HTTP/2 is a thing and does not terminate header values with newlines.

This change makes Anubis more aggressively detect mistaken uses of the yaml > operator and nudges the user into using the yaml >- operator which does not append the trailing newline.

I had honestly forgotten about this YAML behavior because it wasn't relevant for so long. Oops! Glad I released a beta.

Whenever you get into this state, Anubis will throw a config parsing error and then give you a message hinting at the folly of your ways.

config.Bot: regular expression ends with newline (try >- instead of > in yaml)

Big thanks to https://yaml-multiline.info, this helped me realize my folly instantly.

@aiverson, this is official permission to say "told you so".

Checklist:

  • Added a description of the changes to the [Unreleased] section of docs/docs/CHANGELOG.md
  • Added test cases to the relevant parts of the codebase
  • Ran integration tests npm run test:integration (unsupported on Windows, please use WSL)

< 8000 div data-view-component="true" class="comment-reactions js-reactions-container js-reaction-buttons-container social-reactions reactions-container d-flex">
Closes #372

Fun YAML fact of the day:

What is the difference between how these two expressions are parsed?

```yaml
foo: >
  bar
```

```yaml
foo: >-
  bar
```

They are invisible in yaml, but when you evaluate them to JSON the
difference is obvious:

```json
{
  "foo": "bar\n"
}
```

```json
{
  "foo": "bar"
}
```

User-Agent strings, URL path values, and HTTP headers _do_ end in
newlines in HTTP/1.1 wire form, but that newline is usually stripped
before the server actually handles it. Also HTTP/2 is a thing and does
not terminate header values with newlines.

This change makes Anubis more aggressively detect mistaken uses of the
yaml `>` operator and nudges the user into using the yaml `>-` operator
which does not append the trailing newline.

I had honestly forgotten about this YAML behavior because it wasn't
relevant for so long. Oops! Glad I released a beta.

Whenever you get into this state, Anubis will throw a config parsing
error and then give you a message hinting at the folly of your ways.

```
config.Bot: regular expression ends with newline (try >- instead of > in yaml)
```

Big thanks to https://yaml-multiline.info, this helped me realize my
folly instantly.

@aiverson, this is official permission to say "told you so".

Signed-off-by: Xe Iaso <me@xeiaso.net>
@Xe Xe requested a review from Copilot April 26, 2025 13:56
@Xe Xe self-assigned this Apr 26, 2025
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR prevents configuration issues caused by trailing newlines in regex definitions by detecting and rejecting regexes that end with newline characters, nudging users to use the ">-" YAML operator.

  • Added explicit checks for trailing newline characters in regex values (user agent, path, and headers) in config.go.
  • Updated YAML test data and production configurations to use the ">-" YAML operator.
  • Adjusted regex compilation in checker.go to trim whitespace.

Reviewed Changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
File Description
lib/policy/config/testdata/bad/regex_ends_newline.yaml Test data demonstrating regexes that end with a newline
lib/policy/config/config.go Added validations to detect regexes ending with newline
lib/policy/checker.go Updated regex compilation to trim whitespace before compiling
docs/docs/CHANGELOG.md Documented the change regarding trailing newlines in regexes
data/bots/ai-robots-txt.yaml Updated YAML regex scalar to use ">-" instead of ">"
data/botPolicies.yaml Updated YAML regex scalar to use ">-" instead of ">"
Files not reviewed (2)
  • data/botPolicies.json: Language not supported
  • lib/policy/config/testdata/bad/regex_ends_newline.json: Language not supported
Comments suppressed due to low confidence (2)

lib/policy/checker.go:113

  • Using strings.TrimSpace removes all leading and trailing whitespace, which may inadvertently modify regexes that are intended to have specific leading or trailing spaces. Consider trimming only the trailing newline (or explicitly checking for '\n') instead of all whitespace.
rex, err := regexp.Compile(strings.TrimSpace(rexStr))

lib/policy/checker.go:138

  • Trimming all whitespace from the regex string might remove intentional formatting. It may be preferable to strip only trailing newline characters to align with the PR's intent.
rex, err := regexp.Compile(strings.TrimSpace(rexStr))

@Xe Xe enabled auto-merge (squash) April 26, 2025 13:58
@Xe Xe merged commit ef52550 into main Apr 26, 2025
4 checks passed
@boydaihungst
Copy link
boydaihungst commented Apr 26, 2025

Can't update from beta1 to beta2 on Ubuntu amd64. Got this error:

Get:1 /home/huyhoang/anubis_1.17.0-beta2_amd64.deb anubis amd64 .17.0-beta2 [4,642 kB]
dpkg: error processing archive /home/huyhoang/anubis_1.17.0-beta2_amd64.deb (--unpack):
 parsing file '/var/lib/dpkg/tmp.ci/control' near line 2 package 'anubis':
 'Version' field value '.17.0-beta2': version number does not start with digit
Errors were encountered while processing:
 /home/huyhoang/anubis_1.17.0-beta2_amd64.deb
N: Download is performed unsandboxed as root as file '/home/huyhoang/anubis_1.17.0-beta2_amd64.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)
E: Sub-process /usr/bin/dpkg returned an error code (1)

@Xe
Copy link
Contributor Author
Xe commented Apr 26, 2025

https://github.com/TecharoHQ/anubis/releases/tag/v1.17.0-beta3 <- fixed packages will show up here

TODO: make sure that packages are installable in a CI test

@Xe
Copy link
Contributor Author
Xe commented Apr 26, 2025

or not, ugh, did i somehow break git tag resolution logic by having two tags point to the same commit?

@boydaihungst
Copy link

Yep, I just noticed it.

@Xe
Copy link
Contributor Author
Xe commented Apr 26, 2025

https://github.com/TecharoHQ/anubis/releases/tag/v1.17.0-beta4 this is actually fixed including a summary of the problem. I will develop automation for this in the near future.

Xe added a commit to TecharoHQ/yeet that referenced this pull request Apr 26, 2025
Before this assumed that all tags started with `v` and cut off the v:

```go
// current version: v1.2.3
fmt.Println(internal.GitVersion()) // 1.2.3
```

This fails horribly when versions do not start with `v`:

```go
// current version: 1.2.3
fmt.Println(internal.GitVersion()) // .2.3
```

This breaks Debian's package manager[0] horribly because `.2.3` is not a
valid version number in Debian land.

There needs to be additional validation for version numbers pre-build
time, but for now it's easy enough to make sure that this exact failure
case doesn't happen again.

[0] TecharoHQ/anubis#373 (comment)

Signed-off-by: Xe Iaso <me@xeiaso.net>
Xe added a commit to TecharoHQ/yeet that referenced this pull request Apr 26, 2025
Before this assumed that all tags started with `v` and cut off the v:

```go
// current version: v1.2.3
fmt.Println(internal.GitVersion()) // 1.2.3
```

This fails horribly when versions do not start with `v`:

```go
// current version: 1.2.3
fmt.Println(internal.GitVersion()) // .2.3
```

This breaks Debian's package manager[0] horribly because `.2.3` is not a
valid version number in Debian land.

There needs to be additional validation for version numbers pre-build
time, but for now it's easy enough to make sure that this exact failure
case doesn't happen again.

[0] TecharoHQ/anubis#373 (comment)

Signed-off-by: Xe Iaso <me@xeiaso.net>
JasonLovesDoggo pushed a commit to JasonLovesDoggo/anubis that referenced this pull request Jun 17, 2025
Closes TecharoHQ#372

Fun YAML fact of the day:

What is the difference between how these two expressions are parsed?

```yaml
foo: >
  bar
```

```yaml
foo: >-
  bar
```

They are invisible in yaml, but when you evaluate them to JSON the
difference is obvious:

```json
{
  "foo": "bar\n"
}
```

```json
{
  "foo": "bar"
}
```

User-Agent strings, URL path values, and HTTP headers _do_ end in
newlines in HTTP/1.1 wire form, but that newline is usually stripped
before the server actually handles it. Also HTTP/2 is a thing and does
not terminate header values with newlines.

This change makes Anubis more aggressively detect mistaken uses of the
yaml `>` operator and nudges the user into using the yaml `>-` operator
which does not append the trailing newline.

I had honestly forgotten about this YAML behavior because it wasn't
relevant for so long. Oops! Glad I released a beta.

Whenever you get into this state, Anubis will throw a config parsing
error and then give you a message hinting at the folly of your ways.

```
config.Bot: regular expression ends with newline (try >- instead of > in yaml)
```

Big thanks to https://yaml-multiline.info, this helped me realize my
folly instantly.

@aiverson, this is official permission to say "told you so".

Signed-off-by: Xe Iaso <me@xeiaso.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default botPolicy rule for Opera|Mozilla doesn't detect Opera
2 participants
0