8000 Perf issue with regexes that start with repeating digits · Issue #1708 · coreruleset/coreruleset · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Perf issue with regexes that start with repeating digits #1708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
CRS-migration-bot opened this issue May 13, 2020 · 3 comments
Closed

Perf issue with regexes that start with repeating digits #1708

CRS-migration-bot opened this issue May 13, 2020 · 3 comments
Labels
⌛ Stale issue This issue has been open 120 days with no activity.

Comments

@CRS-migration-bot
Copy link

Issue for tracking original pull request created by user allanrbo on date 2020-03-03 01:55:23.
Link to original PR: SpiderLabs/owasp-modsecurity-crs#1708.

HEAD is: 1c1ca6aaa2077d78fd3c6f0a03cef29fd14e5b85
BASE is: 8fe1a39
On the Slack channel 2020-03-02, Airween pointed out some PCRE performance issues with requests containing long consecutive strings of digits. He suggested a fix using the (*SKIP) flag. Since this flag is PCRE specific, I'd like to avoid it, in order to pave the way for modsec-alternatives that use regex libraries with guarenteed linear run time (RE2, Hyperscan, and Go's Regexp, for example).

This PR relates is a suggestion to modify 941120, 942210, and 942260.

Here's my understanding of the issue: The regexes in these rules all have the option to begin matching with a repeated number of digits, such as [0-9]+. For example the regex [0-9]+a would match 11111a, 111111111a, 11111111111111a, etc. Now the problem arises with for example a string like 11111 (without an a). In this example, PCRE has consumed 11111 and then looks for a, doesn't find it, and therefore backtracks to try instead with just 1111, then 111, then 11, and then finally try with 1. To avoid this backtracking, we can instead anchor the regex before the [0-9]+, to either the beginning of the string or some character that is not a digit. The fixed example then becomes (?:^|[^0-9])[0-9]+a.

Mirko suggested below a much cleaner approach than my first approach: simply don't bother with the quantifier. Updated the PR with this cleaner approach.

Tested manually with a large request as suggested by Airween, and verified that now perform many orders of magnitude faster.

There is also a perf issue with 942130, but I will send a PR for a suggestion to fix this separately, as I have (by chance) been working on some changes to this rule for some weeks.

@fzipi
Copy link
Member
fzipi commented May 19, 2020

Hey @allanrbo! As you can see, we migrated our repo to a new location.

Could you please create your PR using this repo now?

Thanks!

lifeforms added a commit that referenced this issue Jun 10, 2020
Decrease processing time of rules, fixes #1665 #1708
@github-actions
Copy link
Contributor

This issue has been open 120 days with no activity. Remove the stale label or comment, or this will be closed in 14 days

@github-actions github-actions bot added the ⌛ Stale issue This issue has been open 120 days with no activity. label Sep 17, 2020
@fzipi
Copy link
Member
fzipi commented Sep 17, 2020

Fixed in #1791

@fzipi fzipi closed this as completed Sep 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⌛ Stale issue This issue has been open 120 days with no activity.
Projects
None yet
Development

No branches or pull requests

3 participants
0