Description
Issue for tracking original pull request created by user allanrbo on date 2020-03-03 01:55:23.
Link to original PR: SpiderLabs/owasp-modsecurity-crs#1708.
HEAD is: 1c1ca6aaa2077d78fd3c6f0a03cef29fd14e5b85
BASE is: 8fe1a39
On the Slack channel 2020-03-02, Airween pointed out some PCRE performance issues with requests containing long consecutive strings of digits. He suggested a fix using the (*SKIP) flag. Since this flag is PCRE specific, I'd like to avoid it, in order to pave the way for modsec-alternatives that use regex libraries with guarenteed linear run time (RE2, Hyperscan, and Go's Regexp, for example).
This PR relates is a suggestion to modify 941120, 942210, and 942260.
Here's my understanding of the issue: The regexes in these rules all have the option to begin matching with a repeated number of digits, such as [0-9]+
. For example the regex [0-9]+a
would match 11111a
, 111111111a
, 11111111111111a
, etc. Now the problem arises with for example a string like 11111
(without an a
). In this example, PCRE has consumed 11111
and then looks for a
, doesn't find it, and therefore backtracks to try instead with just 1111
, then 111
, then 11
, and then finally try with 1
. To avoid this backtracking, we can instead anchor the regex before the [0-9]+
, to either the beginning of the string or some character that is not a digit. The fixed example then becomes (?:^|[^0-9])[0-9]+a
.
Mirko suggested below a much cleaner approach than my first approach: simply don't bother with the quantifier. Updated the PR with this cleaner approach.
Tested manually with a large request as suggested by Airween, and verified that now perform many orders of magnitude faster.
There is also a perf issue with 942130, but I will send a PR for a suggestion to fix this separately, as I have (by chance) been working on some changes to this rule for some weeks.