Fix regex patterns that look for multi-byte characters #3325
Open
@RedXanadu

Description


Describe the bug

We need to agree on a consistent way to reference multi-byte characters in regular expression patterns (and we can enforce this via rules-check.py).
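One way rules-check.py could enforce such a convention is to reject raw non-ASCII characters inside @rx arguments, because any such character is stored in the rule file as a multi-byte UTF-8 sequence. The following is a minimal sketch. It assumes the quoted operator argument sits on a single line, and the function name, regex, and rule IDs are hypothetical, not taken from the real rules-check.py:

```python
import re

# Hypothetical lint for rules-check.py: flag @rx arguments that contain raw
# non-ASCII characters. Any such character is written to the rule file as a
# multi-byte UTF-8 sequence, which the regex engine may read byte by byte.
# Assumes the quoted operator argument fits on one line.
RX_OPERATOR = re.compile(r'"@rx\s+(?P<pattern>(?:[^"\\]|\\.)*)"')


def find_raw_multibyte_in_rx(rule_text: str) -> list[tuple[int, str]]:
    """Return (line number, character) for each non-ASCII character found
    inside an @rx operator argument."""
    findings = []
    for lineno, line in enumerate(rule_text.splitlines(), start=1):
        for match in RX_OPERATOR.finditer(line):
            for ch in match.group("pattern"):
                if ord(ch) > 0x7F:  # needs two or more bytes in UTF-8
                    findings.append((lineno, ch))
    return findings


# Illustrative rules; the IDs are made up for this example.
sample_rules = (
    'SecRule ARGS "@rx [abc\u2019]" "id:1000001,phase:2,deny"\n'
    'SecRule ARGS "@rx [abc]|%u2019" '
    '"id:1000002,phase:2,deny,t:none,t:utf8toUnicode"\n'
)

for lineno, ch in find_raw_multibyte_in_rx(sample_rules):
    print(f"line {lineno}: raw U+{ord(ch):04X} in @rx pattern")
```

Only the first rule is reported. The sketch also flags code points U+0080 to U+00FF, because they are stored as two bytes too. Whether the convention should cover that range is a separate decision.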

In the few rules where we try to detect multi-byte characters (i.e. Unicode code points above \xFF), our current approach is not working as intended. Simply putting the multi-byte UTF-8 character into a regular expression pattern in a rule file causes false positives with non-Latin scripts (e.g. rule 942430; see #3284 for a real user's false positive example).

Example: Let's say that we want to match using the pattern [abc’] (that last character is Unicode character U+2019, "RIGHT SINGLE QUOTATION MARK"):

@rx [abc’]

As stored in the rule file, that pattern is (byte for byte):

[abc\xE2\x80\x99]

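and is seemingly interpreted one byte at a time, so the multi-byte character loses its meaning. Suddenly, any content containing, for example, the byte \xE2 will match, and \xE2 is the lead byte of many, many UTF-8 encoded Unicode characters. The same applies to \x80 and \x99, which occur as continuation bytes in many other characters.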
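The byte-level behaviour can be reproduced outside ModSecurity. The following is a minimal sketch that uses Python's `re` module on `bytes` as a stand-in for PCRE without UTF mode. This is an assumption: the two engines are not identical, but both treat each byte of this pattern as a separate character.

```python
import re

# What the rule author meant: a class containing a, b, c and U+2019.
# What the rule file holds: the three UTF-8 bytes of U+2019.
quote_bytes = "\u2019".encode("utf-8")
print(quote_bytes)  # b'\xe2\x80\x99'

# A bytes pattern is matched one byte at a time, so this class really means
# "a, b, c, \xE2, \x80 or \x99".
byte_class = re.compile(b"[abc" + quote_bytes + b"]")

samples = {
    "Cyrillic": "привет",  # "р" (U+0440) encodes as \xD1\x80
    "Euro sign": "\u20ac",  # encodes as \xE2\x82\xAC
    "ASCII": "xyz",
}
for name, text in samples.items():
    hit = byte_class.search(text.encode("utf-8")) is not None
    print(f"{name}: {'match' if hit else 'no match'}")
```

Neither the Cyrillic word nor the euro sign contains a, b, c or U+2019, yet both match. The plain ASCII sample does not.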

  • Possible approach 1: SecRule ARGS "@rx (*UTF8)[abc\x{2019}]"
    • Probably PCRE-specific? Coraza and other engines may not support it. Without the UTF8 verb, \x{2019} can't be used (escapes are limited to a maximum of \x{ff}).
  • Possible approach 2: SecRule ARGS "@rx [abc]|%u2019" ... t:none,t:utf8toUnicode...
    • The portable option. We could agree to always use t:utf8toUnicode for any rules that need to match Unicode characters above \xFF.
      • Team Coraza confirms that t:utf8toUnicode has been implemented, so this approach is Coraza-friendly too.
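The two approaches can be compared side by side in a rough sketch. Python's `str` patterns stand in for approach 1, since they match code points much as PCRE does under (*UTF8). For approach 2, `utf8_to_unicode` is a hypothetical re-implementation of t:utf8toUnicode. It assumes the transformation leaves ASCII untouched and writes every other code point as `%u` plus four lowercase hex digits, which should be checked against the real ModSecurity and Coraza output.

```python
import re


# Hypothetical stand-in for t:utf8toUnicode, assuming it leaves ASCII as-is
# and rewrites every other code point as %u plus four lowercase hex digits.
# Check the real ModSecurity/Coraza output format before relying on this.
def utf8_to_unicode(data: bytes) -> str:
    text = data.decode("utf-8", errors="replace")
    return "".join(ch if ord(ch) < 0x80 else f"%u{ord(ch):04x}" for ch in text)


# Approach 1 analogue: match code points on decoded text. In PCRE this is
# what the (*UTF8) verb enables; Python's str patterns are Unicode-aware.
approach_1 = re.compile(r"[abc\u2019]")

# Approach 2: run the transformation first, then match the ASCII escape.
approach_2 = re.compile(r"[abc]|%u2019")

for raw in ("привет".encode("utf-8"), "it\u2019s".encode("utf-8")):
    transformed = utf8_to_unicode(raw)
    hit_1 = approach_1.search(raw.decode("utf-8")) is not None
    hit_2 = approach_2.search(transformed) is not None
    print(f"{transformed}: approach 1 -> {hit_1}, approach 2 -> {hit_2}")
```

Both approaches reject the Cyrillic word and accept "it’s". One caveat applies under this sketch's lowercase output format: a bare class like [abc] can also match hex digits inside an escape. For example, U+044C becomes %u044c, so patterns written against the transformed form may need to account for that.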

Further discussion took place on Slack: https://owasp.slack.com/archives/CBKGH8A5P/p1694012219559839

Metadata

Labels

🐛 bug (Something isn't working)
