8000 The Big Backslash Hunt · Issue #2332 · coreruleset/coreruleset · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

The Big Backslash Hunt #2332

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
22 tasks done
RedXanadu opened this issue Dec 10, 2021 · 17 comments
Closed
22 tasks done

The Big Backslash Hunt #2332

RedXanadu opened this issue Dec 10, 2021 · 17 comments

Comments

@RedXanadu
Copy link
Member
RedXanadu commented Dec 10, 2021

Motivation

After discussion at the December monthly meeting (#2291), it was decided to standardise on using \x5c to represent the backslash \ character in regular expressions. Some of the reasons we chose this method are:

  • It is portable across engines: it works with Apache, Nginx, and Coraza.
  • It works with the new regexp-assemble.py script.

Alternatives

We (I) had previously started using the pattern [\\\\] in CRS rules. This is an alternative portable backslash representation. It was decided not to adopt this method. For future reference, some of the problems with this method are:

  • It can be confusing and difficult to understand how it works.
  • It doesn't work with the new regexp-assemble.py script.
  • It doesn't work with Coraza.
  • It isn't obvious how to use it in a bracket expression, e.g. [a-zA-Z<portable-backslash>]

Plan

The plan is to tackle each rule individually with a separate PR, to make it simple to track the work and review each rule change.

I'll start by undoing the previous changes I made to move to [\\\\] and will modify those rules to use \x5c instead.

The goal is to use \x5c in all rules and rule building files. There is also a note on the documentation planning wiki page to document how and why we represent backslashes in this way for the benefit of future CRS developers.

Progress

@airween
Copy link
Contributor
airween commented Dec 11, 2021

There is only one occurrence of [\\\\]:

$ git diff
diff --git a/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf b/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf
index 5d0a5e3..e89a66c 100644
--- a/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf
+++ b/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf
@@ -178,7 +178,7 @@ SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|REQUEST_H
 #
 # [NoScript InjectionChecker] Attributes injection
 #
-SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|REQUEST_HEADERS:User-Agent|REQUEST_HEADERS:Referer|ARGS_NAMES|ARGS|XML:/* "@rx (?i)(?:\W|^)(?:javascript:(?:[\s\S]+[=\\\(\[\.<]|[\s\S]*?(?:\bname\b|[\\\\][ux]\d))|data:(?:(?:[a-z]\w+/\w[\w+-]+\w)?[;,]|[\s\S]*?;[\s\S]*?\b(?:base64|charset=)|[\s\S]*?,[\s\S]*?<[\s\S]*?\w[\s\S]*?>))|@\W*?i\W*?m\W*?p\W*?o\W*?r\W*?t\W*?(?:/\*[\s\S]*?)?(?:[\"']|\W*?u\W*?r\W*?l[\s\S]*?\()|[^-]*?-\W*?m\W*?o\W*?z\W*?-\W*?b\W*?i\W*?n\W*?d\W*?i\W*?n\W*?g[^:]*?:\W*?u\W*?r\W*?l[\s\S]*?\(" \
+SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|REQUEST_HEADERS:User-Agent|REQUEST_HEADERS:Referer|ARGS_NAMES|ARGS|XML:/* "@rx (?i)(?:\W|^)(?:javascript:(?:[\s\S]+[=\\\(\[\.<]|[\s\S]*?(?:\bname\b|\xc5[ux]\d))|data:(?:(?:[a-z]\w+/\w[\w+-]+\w)?[;,]|[\s\S]*?;[\s\S]*?\b(?:base64|charset=)|[\s\S]*?,[\s\S]*?<[\s\S]*?\w[\s\S]*?>))|@\W*?i\W*?m\W*?p\W*?o\W*?r\W*?t\W*?(?:/\*[\s\S]*?)?(?:[\"']|\W*?u\W*?r\W*?l[\s\S]*?\()|[^-]*?-\W*?m\W*?o\W*?z\W*?-\W*?b\W*?i\W*?n\W*?d\W*?i\W*?n\W*?g[^:]*?:\W*?u\W*?r\W*?l[\s\S]*?\(" \
     "id:941170,\
     phase:2,\
     block,\

which is not built but created manually. Perhaps you mean all \\\\ occurrence, not just the bounded form?

How do we want to change the other forms, eg. in case of 941330?

SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|!REQUEST_COOKIES:/_pk_ref/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@rx (?i:[\"'][ ]*(?:[^a-z0-9~_:' ]|in).*?(?:(?:l|\\\\u006C)(?:o|\\\\u006F)(?:c|\\\\u0063)(?:a|\\\\u0061)(?:t|\\\\u0074)(?:i|\\\\u0069)(?:o|\\\\u006F)(?:n|\\\\u006E)|(?:n|\\\\u006E)(?:a|\\\\u0061)(?:m|\\\\u006D)(?:e|\\\\u0065)|(?:o|\\\\u006F)(?:n|\\\\u006E)(?:e|\\\\u0065)(?:r|\\\\u0072)(?:r|\\\\u0072)(?:o|\\\\u006F)(?:r|\\\\u0072)|(?:v|\\\\u0076)(?:a|\\\\u0061)(?:l|\\\\u006C)(?:u|\\\\u0075)(?:e|\\\\u0065)(?:O|\\\\u004F)(?:f|\\\\u0066)).*?=)" \

What's the expected new form?

Note, I've made a small script with msc_pyparser which replaces the occurrence of \\\\ if the regular expression isn't built by regex-assemble (the script checks the data file exists or not), so you only need to care with the files under util/regexp-assemble/data/. I can fix the rest.

@RedXanadu
Copy link
Member Author

@airween Good questions.

I think both: first, undoing the changes I instigated in #2183 and then also fixing any rules that currently work on Apache but don't work on Nginx (e.g. the \\\\xyz rules you mentioned). That's been on my "todo" list for a while :)

@airween
Copy link
Contributor
airween commented Dec 12, 2021

I've made some researches, looks like replacing the \\ by the \x5c solves our problem. I do not want to share any regex101.com link (as I know those will expired after few days), but here is my test case:

(?i:[\"'][ ]*(?:[^a-z0-9~_:' ]|in).*?(?:(?:l|\x5cu006C)(?:o|\x5cu006F)(?:c|\x5cu0063)(?:a|\x5cu0061)(?:t|\x5cu0074)(?:i|\x5cu0069)(?:o|\x5cu006F)(?:n|\x5cu006E)|(?:n|\x5cu006E)(?:a|\x5cu0061)(?:m|\x5cu006D)(?:e|\x5cu0065)|(?:o|\x5cu006F)(?:n|\x5cu006E)(?:e|\x5cu0065)(?:r|\x5cu0072)(?:r|\x5cu0072)(?:o|\x5cu006F)(?:r|\x5cu0072)|(?:v|\x5cu0076)(?:a|\x5cu0061)(?:l|\x5cu006C)(?:u|\x5cu0075)(?:e|\x5cu0065)(?:O|\x5cu004F)(?:f|\x5cu0066)).*?=)

and the subject is from our test case:

"in \u0076\u0061l\u0075e\u004F\u0066=

It's same as in case of test. This subject matches with the modified pattern.

@fzipi
Copy link
Member
fzipi commented Dec 25, 2021

🤔 We should also create a wiki page with the motivation and alternatives, so it has more visibility.

@RedXanadu
Copy link
Member Author
RedXanadu commented Dec 26, 2021

@fzipi Maybe a wiki page for now, and then move it into the "Information for Developers and Contributors" section of the documentation when that is written. Do you think that's a good plan?

I wanted to put it straight into the documentation, but I think that section is a way off from being ready to be written, at the moment.

@fzipi
Copy link
Member
fzipi commented Jan 2, 2022

Sure, makes sense. I was thinking on a place were we store a list of "design decisions", so we don't forget why we did this 10 years from now 😄

@fzipi
Copy link
Member
fzipi commented Feb 5, 2022

@RedXanadu From what I can see, now what remains here is to walk the rest of the ruleset and find new occurrences, right? Do you need help for that?

@theseion
Copy link
Contributor
theseion commented Feb 5, 2022

Sure, makes sense. I was thinking on a place were we store a list of "design decisions", so we don't forget why we did this 10 years from now 😄

Possible staring point: https://github.com/coreruleset/coreruleset/wiki/Technical-Decisions-and-Best-Practices

@RedXanadu
Copy link
Member Author

@fzipi Correct. I just need to find the time to do it.

@theseion I think what you've written is great. Do you have any objections to it being put onto the documentation site, rather than the wiki? It can go on the new 'Developers' section.

@theseion
Copy link
Contributor
theseion commented Feb 6, 2022

Go ahead. I wasn't sure where to put it anyway :). Just let me know where it lives because I need to reference it from 920120.

@theseion
Copy link
Contributor

@RedXanadu we should add our names to the todo items we're working on so we don't overlap.

@RedXanadu
Copy link
Member Author

@theseion Sure.

I got up to 932100 and then got stuck. I think all of the 932xxx rules require a change to regexp-assemble in order to fix them. I think regexp-assemble puts [\\\\... into the 932xxx regular expressions because of the use of the cmdline unix option. If that can be modified in the regex assembler script then all of the 932xxx rules can be fixed, I think?

In the mean time, I'll start working from the bottom of the list up :)

@theseion
Copy link
Contributor

Ok. I'll look into that.

This was referenced Feb 21, 2022
@theseion
Copy link
Contributor
theseion commented Mar 9, 2022

@RedXanadu let me know if I an give you a hand with the remaining two PR's.

@fzipi
Copy link
Member
fzipi commented Mar 24, 2022

One task remaining only! 😱

1 similar comment
@fzipi
Copy link
Member
fzipi commented Mar 24, 2022

One task remaining only! 😱

@RedXanadu
Copy link
Member Author

All remaining use of \\\\ is either in tests or submodules, which appear to be legitimate.

As such, this task is complete! Thank you to everyone who helped submit and review PRs along the way 🎉

Closing issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants
0