8000 fix: prevent invalid commands matches on 5 characters or less (932220 PL-2, 932230 PL-1, 932232 PL-3, 932235 PL-1, 932236 PL-2, 932237 PL-3, 932238 PL-3, 932239 PL-2, 932250 PL-1, 932260 PL-1) by EsadCetiner · Pull Request #3735 · coreruleset/coreruleset · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix: prevent invalid commands matches on 5 characters or less (932220 PL-2, 932230 PL-1, 932232 PL-3, 932235 PL-1, 932236 PL-2, 932237 PL-3, 932238 PL-3, 932239 PL-2, 932250 PL-1, 932260 PL-1) #3735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 85 commits into from
Feb 10, 2025

Conversation

EsadCetiner
Copy link
Member
@EsadCetiner EsadCetiner commented Jun 18, 2024

There have been many reports of false positives with the 932260 family of rules, most of them stem from invalid commands being matched (such as identity matching id). This pr solves most of these issues by preventing matches of invalid commands on 5 characters or less, I didn't prevent invalid matches for all commands to avoid the regex size exploding in size. I've modified the definition of @ from [\s<>&|)] to (?:[\s<>&|)]|$) and adding @ to all commands with 5 characters or less. This results in either matching a command without arguments, or an command with an argument without matching permutations of a commands (For example, sudo matching sudoaaaa).

Since this is a big change, I likely may have missed some commands that were intended to be matched (For example, ip matching iptables and ip6tables) but were shortened for performance or other reasons.

Some new attacks are detected with this PR, but some false negatives are introduced. The unix rules need a complete overhaul however this PR should be fine as an short to medium term fix.

Fixes false positives with english words:
identity
unique
time express

This should fix false positives with random data such as UUID, session cookies, tokens, and base64 data.

This might cause some new false positives because some commands are being matched with no arguments when they previously weren't, but this shouldn't be as bad as the false positives it fixes. I've excluded some commands to try and prevent the most common instances of these false positives (including some old ones, for example matching id=1).

closes: #3711
closes: #3631
closes: #3725
closes: #3812
closes: #3985

EsadCetiner requested a review from theseion June 18, 2024 09:38
@theseion
Copy link
Contributor

I'll need a couple of days before I can look at this PR in detail.

@fzipi
Copy link
Member
fzipi commented Jun 20, 2024

Since this is a big change, I likely may have missed some commands that were intended to be matched (For example, ip matching iptables and ip6tables) but were shortened for better performance.

Just as a note: ip is a command, not a shortened version.

@fzipi
Copy link
Member
fzipi commented Jun 20, 2024

(?:[\s<>&|)]|$)

Maybe using a word boundary instead? Like (?:[\s<>&|)])?\b.

@EsadCetiner
Copy link
Member Author
EsadCetiner commented Jun 22, 2024

@fzipi

Maybe using a word boundary instead? Like (?:[\s<>&|)])?\b

I've further simplified this by just using a word boundry and getting rid of the optional non-capturing group. The effect is pretty much the same, detection might be slightly better, but the generated regexes have reduced in size dramatically with this change (especially 932236).

For some reason only using a word boundry for rule 932237 crashes Apache, I couldn't figure out the cause so I just added a workaround and comment.

@theseion
Copy link
Contributor
theseion commented Jun 29, 2024

Thank you very much @EsadCetiner for this PR. I think the idea of looking for word boundaries is great. What I don't like is that we now need to add @ everywhere. It was bad enough already.

I have an idea for a different approach that I think makes more sense in the long run. I think we should simplify the command line processor in crs-toolchain and remove the logic for @ and ~. Instead, a block processed with the command line processor should end with a \b. Consider the following block:

##!> cmdline unix
  gcc
  sudo
##!<

The result would look as follows (simplified):

(?:gcc|sudo)\b

This still leaves ~. I think we can take care of that by using \w+ instead. Consider the following:

##!> cmdline unix
  python\w+
##!<

This would result in the following (simplified):

(?:python(?:\w+))\b

This would match python2 and python3, for example, but not python . I would make these replacements in the list of commands directly, as there are only a couple of them anyway.

In conclusion:

  • we would get rid of @ and ~
  • we could remove the replacements for @ and ~ in the assembly files for rules that don't use the command line processor
  • we would fix most of the false positives that have been reported over recent weeks where a command is matched as a prefix
  • we would reduce the sizes of many regular expressions significantly

@EsadCetiner Please open a separate PR with the additions (I saw a couple svn related commands).

@theseion
Copy link
Contributor

I realise your PR solves an issue we have now and will prevent more issues from being filed. I propose to go ahead with your PR and implement my proposal later on.

To complete your PR, @EsadCetiner, you'll have to also adapt toolchain.yaml.

@EsadCetiner
Copy link
Member Author

@theseion

I have an idea for a different approach that I think makes more sense in the long run. I think we should simplify the command line processor in crs-toolchain and remove the logic for @ and ~. Instead, a block processed with the command line processor should end with a \b.

Agreed, I had a general idea of something like this but I wasn't sure on how to approach this.

I realise your PR solves an issue we have now and will prevent more issues from being filed. I propose to go ahead with your PR and implement my proposal later on.

4.5.0 is coming out in 3 weeks or so, I'll try and address all of your feedback before then. I'd rather get it out of the way now so it's not forgotten.

Please open a separate PR with the additions (I saw a couple svn related commands).

These commands are already detected in the current release, but they stopped being detected once I prevented the invalid match. Shouldn't those additions be part of this PR so those commands are still covered, therefore avoiding some regressions?

To complete your PR, @EsadCetiner, you'll have to also adapt toolchain.yaml.

I'm not entirely clear on what this does, I see what looks to be@ here, but I don't see ~.

  anti_evasion_suffix:
    # - <>: redirection, e.g., `cat<foo`
    # - ,: brace expansion, e.g., `""{nc,-p,777}`
    ## - &|: logical operators in headers, e.g., `a=nc&&$a -nlvp 555`
    unix: |
      [\s<>,&|)].*

I assume you want me to modify the unix evasion suffix to this, then remove all references to @.

    unix: |
-      [\s<>,&|)].*
+      \b

I think your saying here:

Instead, a block processed with the command line processor should end with a \b

This still leaves ~. I think we can take care of that by using \w+ instead.

To modify the affected regex assembly files to something like this, while keeping ~:

##!> assemble
  ##!> cmdline unix
    ##!> include-except unix-shell-upto3 unix-shell-fps-pl1 -- ~ \w+
##!<

I hope this wasn't too convoluted, is this the general idea of what you want me to do?

@theseion
Copy link
Contributor
theseion commented Jul 2, 2024

These commands are already detected in the current release, but they stopped being detected once I prevented the invalid match. Shouldn't those additions be part of this PR so those commands are still covered, therefore avoiding some regressions?

Yes.

I'm not entirely clear on what this does, I see what looks to be@ here, but I don't see ~.

~ is anti_evasion_no_space_suffix. Don't replace the wildcard match .*.

I assume you want me to modify the unix evasion suffix to this, then remove all references to @.

Modify, yes. We still need @, @ will tell the toolchain to add that suffix.

I think your saying here:

Instead, a block processed with the command line processor should end with a \b

This still leaves ~. I think we can take care of that by using \w+ instead.

To modify the affected regex assembly files to something like this, while keeping ~:

##!> assemble
##!> cmdline unix
##!> include-except unix-shell-upto3 unix-shell-fps-pl1 -- ~ \w+
##!<

You can ignore that part. That's my proposal for the change of the processor. It has nothing to do with your PR.

@EsadCetiner
Copy link
Member Author

@theseion done, ready for review

@EsadCetiner
Copy link
Member Author

I've gone back to my original solution of resolving this false positive without a word boundry (?:[\s<>&|)]|$), the word boundry was causing a fair few false positives. Added tests to make sure those false positives don't come up again.

@theseion
8000 Copy link
Contributor

Do you think the \b makes sense for ~? Could you give me an example where \b makes a difference?

@EsadCetiner
Copy link
Member Author

@theseion

\b is needed to prevent matching permutations of commands beyond 5 characters when ~ is used, it acts similar to adding @ to a command. The quantifier is primarily there since I'm not sure how long the permutations of some commands can be, I'm just being conservative with causing too many false negatives. I can remove it if you think it's redundant.

I've checked over my work again this morning and I think this PR is ready to be merged if nobody else has objections.

@theseion
Copy link
Contributor

Ok. I think it makes sense but I think 5 might be too conservative. For example, docker-compose would require 7. How about we increase it to 10? That should suffice for almost anything.

Please also update the anti_evasion_no_space_suffix definitions for both windows and unix to use the same restriction.

@EsadCetiner
Copy link
Member Author
EsadCetiner commented Jan 14, 2025

@theseion these look machine generated, is there another file I'm supposed to edit then compile the regex or should I just replace them completely? It's hard to read the regex and understand what it's trying to match.

@theseion
Copy link
Contributor

No, I created them manually. But you're right. I'll handle it.

Match at most 10 consecutive characters
@theseion theseion force-pushed the fix-invalid-command-matches branch from 256d481 to 20ecdd8 Compare January 15, 2025 19:43
@theseion
Copy link
Contributor

I've updated the expressions @EsadCetiner, please take a look. The expressions mainly consist of the anti_evasion and anti_evation_suffix patterns, with a few additions. Once you see that, they're not hard to understand.

@EsadCetiner
Copy link
Member Author

@theseion I think I have a better understanding of the regex now, the extra comments helped clear up some confusion.

I've taken a look at your changes and found a regression with the anti_evasion_no_space_suffix for UNIX, it's not doing what it says in the tin (It's now matching python space when it shouldn't be).

I've pushed a fix, but the fix felt a little too easy, can you double check it?

@theseion
Copy link
Contributor

Duh. You're right. \s mustn't be part of that group. That's exactly why we have the ~ suffix :)

@theseion
Copy link
Contributor

LGTM

@theseion
Copy link
Contributor
theseion commented Feb 6, 2025

@EsadCetiner are you waiting for me to perform the merge?

@EsadCetiner
Copy link
Member Author

@theseion no, I was waiting for somebody else to review it since it's a big change. I'll merge in a few days if nobody has any objections.

@EsadCetiner EsadCetiner added this pull request to the merge queue Feb 10, 2025
Merged via the queue into coreruleset:main with commit 26bec41 Feb 10, 2025
6 checks passed
@EsadCetiner EsadCetiner deleted the fix-invalid-command-matches branch February 10, 2025 05:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
0