Rule 941310: false positive for Russian letters "м" and "о" · Issue #1942 · coreruleset/coreruleset · GitHub

Rule 941310: false positive for Russian letters "м" and "о" #1942

Closed
ghost opened this issue Dec 4, 2020 · 14 comments · Fixed by #2107
Labels: ➕ False Positive, PR available

Comments

ghost commented Dec 4, 2020

Description

When I enter a word made up of all the letters of the Russian alphabet, I get a false positive.

  • абвгдеёжзийклмнопрстуфхцчшщъыэюя – false positive
  • абвгдеёжзийклнопрстуфхцчшщъыэюя (without "м") – true negative
  • абвгдеёжзийклмнпрстуфхцчшщъыэюя (without "о") – true negative

I decided to check the whole alphabet after I noticed problems with some Russian sentences. As it turned out, the problem lies in two specific letters, whose UTF-8 encodings end in the same bytes as the Latin-1 characters ¼ and ¾:

¼ = bc
¾ = be

м = d0 bc
о = d0 be
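A minimal Python sketch (standard library only) that checks this correspondence: the second UTF-8 byte of м and о is the same byte as the single Latin-1 byte of ¼ and ¾.

```python
# Compare the UTF-8 bytes of Cyrillic "м"/"о" with the Latin-1 bytes of "¼"/"¾".
pairs = [("м", "¼"), ("о", "¾")]

for cyrillic, fraction in pairs:
    utf8 = cyrillic.encode("utf-8")      # two bytes: lead byte 0xD0 + continuation byte
    latin1 = fraction.encode("latin-1")  # a single byte
    print(f"{cyrillic} = {utf8.hex(' ')}    {fraction} = {latin1.hex(' ')}")
    # The continuation byte of the Cyrillic letter equals the Latin-1 byte.
    assert utf8[1:] == latin1
```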

Audit Logs / Triggered Rule Numbers

"response":{"body":"<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body>\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n","http_code":403,"headers":{"Server":"","Server":"","Date":"Fri, 04 Dec 2020 08:50:05 GMT","Content-Length":"146","Content-Type":"text/html","Access-Control-Allow-Origin":"*","Connection":"close","Access-Control-Allow-Credentials":"true","Strict-Transport-Security":"max-age=15724800; includeSubDomains"}},"producer":{"modsecurity":"ModSecurity v3.0.4 (Linux)","connector":"ModSecurity-nginx v1.0.1","secrules_engine":"Enabled","components":["OWASP_CRS/3.3.0\""]},"messages":[{"message":"US-ASCII Malformed Encoding XSS Filter - Attack Detected","details":{"match":"Matched \"Operator `Rx' with parameter `\\xbc[^\\xbe>]*[\\xbe>]|<[^\\xbe]*\\xbe' against variable `ARGS:json.description' (Value: `\\xd0\\xb0\\xd0\\xb1\\xd0\\xb2\\xd0\\xb3\\xd0\\xb4\\xd0\\xb5\\xd1\\x91\\xd0\\xb6\\xd0\\xb7\\xd0\\xb8\\xd0\\xb9\\xd0\\xba\\xd0 (156 characters omitted)' )","reference":"o27,5v17,64t:urlDecodeUni,t:lowercase,t:urlDecode,t:htmlEntityDecode,t:jsDecode","ruleId":"941310","file":"/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf","lineNumber":"527","data":"Matched Data: но found within ARGS:json.description: абвгдеёжзийклмнопрстуфхцчшщъыэюя","severity":"2","ver":"OWASP_CRS/3.3.0","rev":"","tags":["application-multi","language-multi","platform-tomcat","attack-xss","paranoia-level/1","OWASP_CRS","capec/1000/152/242"],"maturity":"0","accuracy":"0"}},{"message":"Inbound Anomaly Score Exceeded (Total Score: 5)","details":{"match":"Matched \"Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `5' 
)","reference":"","ruleId":"949110","file":"/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf","lineNumber":"80","data":"","severity":"2","ver":"OWASP_CRS/3.3.0","rev":"","tags":["application-multi","language-multi","platform-multi","attack-generic"],"maturity":"0","accuracy":"0"}}]}
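The match can be reproduced outside ModSecurity with a short Python sketch. It assumes that the transformation chain (t:urlDecodeUni, t:lowercase, ...) leaves this input unchanged, and it uses Python's re on raw bytes as a stand-in for the PCRE engine ModSecurity actually uses. For the full alphabet, the match should start at byte offset 27 with a length of 5 bytes, which lines up with o27,5 in the reference field; the two shortened strings should not match.

```python
import re

# Pattern of rule 941310 as shown in the audit log, applied to raw bytes.
RULE_941310 = re.compile(rb"\xbc[^\xbe>]*[\xbe>]|<[^\xbe]*\xbe")

samples = {
    "full alphabet": "абвгдеёжзийклмнопрстуфхцчшщъыэюя",
    "without м": "абвгдеёжзийклнопрстуфхцчшщъыэюя",
    "without о": "абвгдеёжзийклмнпрстуфхцчшщъыэюя",
}

results = {}
for label, text in samples.items():
    m = RULE_941310.search(text.encode("utf-8"))
    results[label] = m
    if m:
        # Byte offset and length, comparable to "o27,5" in the log.
        print(f"{label}: match at o{m.start()},{m.end() - m.start()} -> {m.group()!r}")
    else:
        print(f"{label}: no match")
```

The matched bytes bc d0 bd d0 be are the tail of м, all of н, and all of о, which is why the log reports "Matched Data: но".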

Your Environment

  • CRS version (e.g., v3.2.0): v3.3.0
  • Paranoia level setting: 1
  • ModSecurity version (e.g., 2.9.3): v3.0.4
  • Image: k8s.gcr.io/ingress-nginx/controller:v0.41.2

Confirmation

[X] I have removed any personal data (email addresses, IP addresses, passwords, domain names) from any logs posted.

ghost added the ➕ False Positive label Dec 4, 2020
ghost (Author) commented Dec 4, 2020

See also #1645.

ghost changed the title from Rule 941310: false positive for a Russian letter "м" to Rule 941310: false positive for a Russian letters "м" and "о" on Dec 4, 2020
ghost changed the title from Rule 941310: false positive for a Russian letters "м" and "о" to Rule 941310: false positive for Russian letters "м" and "о" on Dec 4, 2020
franbuehler self-assigned this Dec 21, 2020
franbuehler (Contributor) commented Feb 17, 2021

As already stated in the linked #1645, the same problem exists for German umlauts,
for example Ü and ä: Über den Wolken gehört...

[2021-02-16 15:33:52.762256] [-:error] 1.2.3.4:36123 XXXXXXXXXXXXXXXXXX [client 2.2.3.4] ModSecurity: Warning. Pattern match "\\\\xbc[^\\\\xbe>]*[\\\\xbe>]\|<[^\\\\xbe]*\\\\xbe" at ARGS:myarg. [file "/xyz/httpd/modsecurity/crs/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf"] [line "546"] [id "941310"] [msg "US-ASCII Malformed Encoding XSS Filter - Attack Detected"] [data "Matched Data: \\xbcber den Wolken geh\\xc3\\xb6rt ...

I do not know an easy way to prevent these false positives, but I think this rule is too strict and definitely does not belong in PL 1.
So I see the following possibilities:

  • Move the rule to a higher PL
  • Eliminate the rule entirely
  • Extend the rule so that the keyword script must follow, as @theseion proposed in the linked issue above.

My suggestion would be to move the rule to PL 2 or even 3.

What do you think?

franbuehler linked a pull request Feb 18, 2021 that will close this issue
franbuehler added the PR available label Feb 18, 2021
franbuehler (Contributor) commented:

It's annoying, but I cannot reproduce this behavior!
All I can say is that sometimes something (what?) encodes the German umlaut ü as \xc3\xbc instead of \xfc, and this triggers the rule.
When I try to reproduce the request with exactly the same string, it's correctly encoded as \xfc. I can see the string that triggered the rule both in the logs and in the application's UI (where the resulting, saved string is displayed).

I found this explanation:
https://www.python-forum.de/viewtopic.php?t=18464

Translated from German:

... is the UTF-8 encoding of the umlaut "ü", which according to the Unicode table has the code point (something like a sequential number) 252. This number "coincidentally" corresponds to the ISO-8859-1 and ISO-8859-15 or CP1252 encoding of the umlaut. To be even more precise, \xC3\xBC is the string-escape representation of the UTF-8 encoding of the umlaut.
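The point about code point 252 can be checked with Python built-ins (a sketch, not taken from the linked forum thread): the code point of ü equals its single ISO-8859-1 byte 0xFC, while UTF-8 uses two bytes whose second byte is 0xBC, the byte rule 941310 looks for.

```python
# Code point vs. encodings of the precomposed German umlaut "ü" (U+00FC).
u_umlaut = "\u00fc"

code_point = ord(u_umlaut)              # 252 == 0xFC
latin1 = u_umlaut.encode("iso-8859-1")  # one byte, equal to the code point
utf8 = u_umlaut.encode("utf-8")         # two bytes: 0xC3 0xBC

print(code_point, hex(code_point))  # 252 0xfc
print(latin1)                       # b'\xfc'
print(utf8)                         # b'\xc3\xbc'

# Only the UTF-8 form contains 0xBC, the byte that rule 941310 matches on.
print(b"\xbc" in latin1, b"\xbc" in utf8)  # False True
```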

It seems that this can happen sometimes; I don't know when or why.

I have now removed this rule from my setup entirely, as I don't have the time to look into it further. Either we live with it and accept that the false positive rarely occurs, or we move the rule to PL 2, or someone else takes over this issue.

theseion (Contributor) commented:

The differences in encoding are due to the different ways the glyph ü can be written in Unicode, and therefore in UTF-8. There is a single precomposed character, LATIN SMALL LETTER U WITH DIAERESIS (code point U+00FC, UTF-8 bytes C3 BC), and it is also possible to write it as the letter u followed by the combining diaeresis (code point U+0308, UTF-8 bytes CC 88, also known as "trema"). This often trips up fonts that don't support all combining diacritical marks. When you write a test, be sure to generate the exact byte sequence in some way (e.g. using hexadecimal encoding), so that your editor doesn't replace the glyph with one supported by the font you are using.
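A small Python sketch of the two spellings, using the standard unicodedata module to produce the precomposed (NFC) and decomposed (NFD) forms and print their UTF-8 bytes. The last lines show the hex-based way of pinning exact bytes in a test.

```python
import unicodedata

# Two spellings of the same glyph "ü".
nfc = unicodedata.normalize("NFC", "u\u0308")  # composes to U+00FC
nfd = unicodedata.normalize("NFD", "\u00fc")   # decomposes to U+0075 U+0308

print(nfc.encode("utf-8").hex(" "))  # c3 bc
print(nfd.encode("utf-8").hex(" "))  # 75 cc 88
print(nfc == nfd)                    # False: different code point sequences

# Building test input from hex pins the exact bytes, whatever the editor does.
exact = bytes.fromhex("75cc88").decode("utf-8")
print(exact == nfd)                  # True
```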

franbuehler (Contributor) commented:

Oh, that's smart. Thanks a lot for your input!!
Yes, it seems that my editor is changing the u and the trema (¨) into a ü.
Is it possible to find a curl example that reproduces this problem?

I proposed a PR that moves the rule to PL2. You proposed extending the rule so that it also requires script. I thought that this might be too limited. But if we extended the rule to cover script, img and maybe other valid tags, that would be the better solution.
What do you think?

dune73 (Member) commented Apr 2, 2021

Thank you @theseion for explaining this once more. I did not get it when @franbuehler mentioned it above.

Here is how I get the "incorrect" (decomposed) UTF-8 representation. Maybe this helps with reproducing the problem.

$ echo "666f6f3d75cc880a" |  xxd -plain -revert > /tmp/tmp1
$ hexdump -C /tmp/tmp1
00000000  66 6f 6f 3d 75 cc 88 0a                           |foo=u...|
00000008
$ curl http://localhost -d "@/tmp/tmp1" --trace-ascii -
...
=> Send data, 7 bytes (0x7)
0000: foo=u..
...
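As a cross-check of the trace above, this Python sketch decodes the same hex string and mimics what curl -d @file sends. It relies on curl's documented behavior of stripping carriage returns and newlines from data read with -d @file, which is why the trace shows 7 bytes rather than 8 (--data-binary would keep them). Nothing is sent over the network.

```python
import unicodedata

# Hex string from the xxd command above: "foo=" + "u" + U+0308 + newline.
raw = bytes.fromhex("666f6f3d75cc880a")

# Mimic curl -d @file, which strips CR and LF from the file contents.
sent = raw.replace(b"\r", b"").replace(b"\n", b"")
text = sent.decode("utf-8")

print(len(raw), len(sent))  # 8 7
print(ascii(text))          # 'foo=u\u0308'

# The value is the decomposed (NFD) spelling of "ü", not U+00FC.
print(text == "foo=" + unicodedata.normalize("NFD", "\u00fc"))  # True
```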

theseion (Contributor) commented Apr 3, 2021

That looks good @dune73.

@franbuehler As detailed in the description of the evasion (https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet), this particular evasion only works with US-ASCII, and the only web server known to use US-ASCII is Tomcat. So we could say that PL2 is enough, since it is not a common configuration (and it would take a misconfigured Tomcat anyway).
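To illustrate the evasion, the Python sketch below simulates a decoder that reads bytes as 7-bit US-ASCII by masking off the high bit, so 0xBC becomes 0x3C (<) and 0xBE becomes 0x3E (>). The helper strip_high_bit is hypothetical and only models the idea; it is not Tomcat's decoding code. The payload is modeled on the cheat sheet's ¼script¾ example.

```python
def strip_high_bit(data: bytes) -> str:
    """Hypothetical helper: read bytes as 7-bit US-ASCII by dropping bit 7."""
    return bytes(b & 0x7F for b in data).decode("ascii")

# Payload in the style of the cheat sheet: ¼ and ¾ written as single bytes.
payload = b"\xbcscript\xbealert(1)\xbc/script\xbe"
print(strip_high_bit(payload))  # <script>alert(1)</script>

# The Cyrillic substring "мно" turns into tag-like characters the same way.
print(strip_high_bit("мно".encode("utf-8")))  # P<P=P>
```

This is why the rule fires on ordinary Cyrillic text: under a 7-bit reading, the UTF-8 bytes of м and о look like < and >.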

Additionally, checking for script would block the most powerful attacks that use this evasion. However, as you pointed out, other tags could also be used, e.g. an img with a maliciously encoded image (assume, for example, that there's a zero-day exploit in a decompression or image processing library). Therefore, we would probably need to check for any tag that can be used to load external resources:

  • script
  • img
  • link
  • a
  • embed
  • audio
  • video
  • source
  • iframe
  • map
  • object
  • picture
  • portal
  • ...

This clearly isn't a maintainable option. Then again, most of these would today be covered by CORS policies.

I went back and looked at #1645. My original proposal \xbc[^\xbe>]*[\xbe>]|<[^\xbe]*\xbe doesn't work because only the last byte of a two-byte UTF-8 sequence is considered. That is actually correct, because the first byte could be anything; only the last byte is relevant to the evasion. But it also means that the regular expression isn't specific enough and will produce many false positives.
There is one thing we could do with this expression: extend it to look for a complete tag and an optional end tag. That should mitigate the false positives that were reported. There are generally two cases here:

  1. the tag does not have an end tag and ends with />
  2. the tag has an end tag </xxx>

Let's do the first case: (?:\xbc[^\xbe>]*/\s*[\xbe>])|(?:<[^\xbe]*/\s*\xbe).
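A quick sanity check of the first-case expression, sketched in Python with re on raw bytes as a stand-in for PCRE (the payload is a made-up example, not from the thread): it should ignore the reported Russian text and still catch a self-closing tag written with ¼ and ¾.

```python
import re

# Proposed expression for self-closing tags (case 1), applied to raw bytes.
SELF_CLOSING = re.compile(rb"(?:\xbc[^\xbe>]*/\s*[\xbe>])|(?:<[^\xbe]*/\s*\xbe)")

russian = "абвгдеёжзийклмнопрстуфхцчшщъыэюя".encode("utf-8")
evasion = b"\xbcimg src=x onerror=alert(1) /\xbe"  # illustrative payload

print(SELF_CLOSING.search(russian))              # None: no "/" before a closing byte
print(SELF_CLOSING.search(evasion) is not None)  # True
```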
For the second case it's probably better to combine two rules to minimize the cost of running the regular expression. The first rule can check for a start tag and the second for an end tag if the first rule matches:
start tag: \xbc[^\xbe>]*[\xbe>]|<[^\xbe]*\xbe
end tag: (?:\xbc\s*/\s*[^\xbe>]*[\xbe>])|(?:<\s*/\s*[^\xbe]*\xbe)
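The chained approach can be sketched the same way. chained_match is a hypothetical helper that only treats input as suspicious when both the start-tag and the end-tag expressions match, roughly what a ModSecurity chain of two rules would do (it does not check that the end tag comes after the start tag). The payload is again illustrative.

```python
import re

# Start-tag and end-tag expressions from the proposal, applied to raw bytes.
START_TAG = re.compile(rb"\xbc[^\xbe>]*[\xbe>]|<[^\xbe]*\xbe")
END_TAG = re.compile(rb"(?:\xbc\s*/\s*[^\xbe>]*[\xbe>])|(?:<\s*/\s*[^\xbe]*\xbe)")

def chained_match(data: bytes) -> bool:
    """Hypothetical helper: True only if the start-tag AND end-tag patterns match."""
    return START_TAG.search(data) is not None and END_TAG.search(data) is not None

russian = "абвгдеёжзийклмнопрстуфхцчшщъыэюя".encode("utf-8")
evasion = b"\xbcscript\xbealert(1)\xbc/script\xbe"  # illustrative payload

print(START_TAG.search(russian) is not None)  # True: the current false positive
print(chained_match(russian))                 # False: no end tag follows
print(chained_match(evasion))                 # True
```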

It would be interesting to test this against content from @Ais8Ooz8, since I don't have a good understanding of the possible byte combinations in the Russian alphabet.

franbuehler (Contributor) commented:

Thank you for your detailed explanation and your very appreciated help, @theseion!

To summarize, there are three options:

  • Move the rule to PL2: that would be an option for you too, since, as you say, "PL2 is enough since it is not a common configuration (and it would take a misconfigured Tomcat anyway)."

  • Add all possible additional keywords to the script keyword: this is not maintainable.
  • Extend the expression to look for a complete tag and an optional end tag, using a chained rule:
    start tag: \xbc[^\xbe>]*[\xbe>]|<[^\xbe]*\xbe
    end tag: (?:\xbc\s*/\s*[^\xbe>]*[\xbe>])|(?:<\s*/\s*[^\xbe]*\xbe).
    We would like to have @Ais8Ooz8 test this.

Is @Ais8Ooz8 still available so we could test this?
Then I would adjust my PR accordingly.

Would that be a good way to go?

franbuehler (Contributor) commented:

Meeting decision May (#2053 (comment)):
We move the rule to PL2, and @franbuehler or @dune73 will ask @theseion to implement his idea; once we have checked it, we move the rule back to PL1.

theseion (Contributor) commented May 4, 2021

I'll take a look in the next few days.

franbuehler (Contributor) commented:

That would be awesome, thank you!!

theseion (Contributor) commented:

Just to let you know: I haven't forgotten. I'm finishing up some other stuff and will start working on this ASAP.

theseion (Contributor) commented:

@franbuehler I've opened a PR. Could you take a look?

franbuehler (Contributor) commented:

Thank you @theseion!! Yes, I'll take a look.
