Rule 941310: false positive for Russian letters "м" and "о" #1942

ghost · 2020-12-04T09:12:07Z

Description

I entered a word made up of all the letters of the Russian alphabet and got a false positive.

абвгдеёжзийклмнопрстуфхцчшщъыэюя – false positive
абвгдеёжзийклнопрстуфхцчшщъыэюя – true negative
абвгдеёжзийклмнпрстуфхцчшщъыэюя – true negative

I decided to check the whole alphabet after I started noticing problems with some Russian sentences. As it turned out, the problem lies in two specific letters, whose UTF-8 encodings end in the same bytes as ¼ and ¾:

¼ = bc
¾ = be

м = d0 bc
о = d0 be
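The byte-level overlap above can be checked with a short Python sketch. It applies only the regular expression shown in the audit log below to the UTF-8 bytes of the three test words; it does not reproduce ModSecurity's transformations (urlDecodeUni, lowercase and so on), and the names `RULE_941310`, `FULL` and `matched_bytes` are made up for this illustration.

```python
import re
from typing import Optional

# Pattern copied from the audit log entry for rule 941310 (CRS 3.3.0).
RULE_941310 = re.compile(rb"\xbc[^\xbe>]*[\xbe>]|<[^\xbe]*\xbe")

# The test word from the report.
FULL = "абвгдеёжзийклмнопрстуфхцчшщъыэюя"
SAMPLES = {
    "full alphabet": FULL,
    "without U+043C": FULL.replace("\u043c", ""),  # drops Cyrillic "м"
    "without U+043E": FULL.replace("\u043e", ""),  # drops Cyrillic "о"
}


def matched_bytes(text: str) -> Optional[bytes]:
    """Return the raw bytes the pattern matches in the UTF-8 form of text, if any."""
    m = RULE_941310.search(text.encode("utf-8"))
    return m.group(0) if m else None


for label, text in SAMPLES.items():
    hit = matched_bytes(text)
    print(label, "->", hit.hex(" ") if hit else "no match")
```

Only the full alphabet matches: the match starts at the second byte of "м" (bc) and ends at the second byte of "о" (be), which is why removing either letter makes the false positive disappear.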

Audit Logs / Triggered Rule Numbers

"response":{"body":"<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body>\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n","http_code":403,"headers":{"Server":"","Server":"","Date":"Fri, 04 Dec 2020 08:50:05 GMT","Content-Length":"146","Content-Type":"text/html","Access-Control-Allow-Origin":"*","Connection":"close","Access-Control-Allow-Credentials":"true","Strict-Transport-Security":"max-age=15724800; includeSubDomains"}},"producer":{"modsecurity":"ModSecurity v3.0.4 (Linux)","connector":"ModSecurity-nginx v1.0.1","secrules_engine":"Enabled","components":["OWASP_CRS/3.3.0\""]},"messages":[{"message":"US-ASCII Malformed Encoding XSS Filter - Attack Detected","details":{"match":"Matched \"Operator `Rx' with parameter `\\xbc[^\\xbe>]*[\\xbe>]|<[^\\xbe]*\\xbe' against variable `ARGS:json.description' (Value: `\\xd0\\xb0\\xd0\\xb1\\xd0\\xb2\\xd0\\xb3\\xd0\\xb4\\xd0\\xb5\\xd1\\x91\\xd0\\xb6\\xd0\\xb7\\xd0\\xb8\\xd0\\xb9\\xd0\\xba\\xd0 (156 characters omitted)' )","reference":"o27,5v17,64t:urlDecodeUni,t:lowercase,t:urlDecode,t:htmlEntityDecode,t:jsDecode","ruleId":"941310","file":"/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf","lineNumber":"527","data":"Matched Data: но found within ARGS:json.description: абвгдеёжзийклмнопрстуфхцчшщъыэюя","severity":"2","ver":"OWASP_CRS/3.3.0","rev":"","tags":["application-multi","language-multi","platform-tomcat","attack-xss","paranoia-level/1","OWASP_CRS","capec/1000/152/242"],"maturity":"0","accuracy":"0"}},{"message":"Inbound Anomaly Score Exceeded (Total Score: 5)","details":{"match":"Matched \"Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `5' 
)","reference":"","ruleId":"949110","file":"/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf","lineNumber":"80","data":"","severity":"2","ver":"OWASP_CRS/3.3.0","rev":"","tags":["application-multi","language-multi","platform-multi","attack-generic"],"maturity":"0","accuracy":"0"}}]}

Your Environment

CRS version (e.g., v3.2.0): v3.3.0
Paranoia level setting: 1
ModSecurity version (e.g., 2.9.3): v3.0.4
Image: k8s.gcr.io/ingress-nginx/controller:v0.41.2

Confirmation

[X] I have removed any personal data (email addresses, IP addresses,
passwords, domain names) from any logs posted.

ghost · 2020-12-04T11:20:24Z

Also #1645

franbuehler · 2021-02-17T09:49:53Z

As already stated in the linked #1645, the same problem exists for German umlauts:
Ü and ä for example: Über den Wolken gehört...

[2021-02-16 15:33:52.762256] [-:error] 1.2.3.4:36123 XXXXXXXXXXXXXXXXXX [client 2.2.3.4] ModSecurity: Warning. Pattern match "\\\\xbc[^\\\\xbe>]*[\\\\xbe>]\|<[^\\\\xbe]*\\\\xbe" at ARGS:myarg. [file "/xyz/httpd/modsecurity/crs/rules/REQUEST-941-APPLICATION-ATTACK-XSS.conf"] [line "546"] [id "941310"] [msg "US-ASCII Malformed Encoding XSS Filter - Attack Detected"] [data "Matched Data: \\xbcber den Wolken geh\\xc3\\xb6rt ...

I do not know of an easy way to prevent these false positives. But I think this rule is too strict and definitely does not belong in PL 1.
So I see the following possibilities:

Move the rule to a higher PL
Eliminate the rule entirely
Extend the rule so that a script must follow like @theseion proposed in the linked issue above.

My suggestion would be to move the rule to PL 2 or even 3.

What do you think?

franbuehler · 2021-03-30T13:54:30Z

It's annoying, but I cannot reproduce this behavior!
All I can say is that sometimes something (what?) encodes the German umlaut ü as \\xc3\\xbc instead of \\xfc, and this triggers the rule.
When I want to reproduce the call with exactly the same string, it's correctly encoded as \\xfc. I see the string that triggered the rule in the logs and in the UI of the application (I see the resulting and saved string there).

What I found is this explanation here:
https://www.python-forum.de/viewtopic.php?t=18464

Translated from German:

is the UTF-8 encoding of the umlaut "ü", which has code point 252 in the Unicode table (something like a sequential number). This number "happens" to match the ISO-8859-1, ISO-8859-15, and CP1252 encodings of the umlaut. More precisely, \xC3\xBC is the string-escape encoding of the UTF-8 encoding of the umlaut.

It seems that sometimes this can happen. I don't know when and why.

I have now completely removed this rule in my setup. I don't have the time to look into it further. Either we live with it and accept that the false positive occurs rarely, or we move the rule to PL 2, or someone else takes over this issue.

theseion · 2021-03-30T18:22:51Z

The differences in encoding are due to different ways that the glyph ü can be written in UTF-8. There is a single glyph called U with two dots (code point U+00FC) and it is also possible to write it as a combination of the glyph u and the combining diaeresis glyph (code point U+0308, also known as "trema"). This will often trip up fonts that don't support all combining diacritical marks. When you write a test, be sure to use some way to generate the exact byte sequence (e.g. using hexadecimal encoding), so that your editor doesn't change the glyph to one supported by the font you are using.
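For reference, the byte sequences mentioned in this thread can be printed with a small Python sketch. It uses only the standard `unicodedata` module; the variable names are chosen for this illustration and are not taken from CRS.

```python
import unicodedata

precomposed = "\u00fc"  # single code point U+00FC
decomposed = unicodedata.normalize("NFD", precomposed)  # "u" followed by U+0308

encodings = {
    "precomposed, UTF-8": precomposed.encode("utf-8"),
    "decomposed, UTF-8": decomposed.encode("utf-8"),
    "precomposed, ISO-8859-1": precomposed.encode("latin-1"),
}

for label, raw in encodings.items():
    # 0xbc is the byte that the pattern of rule 941310 starts on.
    print(f"{label}: {raw.hex(' ')}  contains 0xbc: {0xbc in raw}")
```

Of these three forms, only the UTF-8 encoding of the precomposed character (c3 bc) contains the byte 0xbc; the decomposed form is 75 cc 88 and the ISO-8859-1 form is fc.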

franbuehler · 2021-03-31T05:14:33Z

Oh, that's smart. Thanks a lot for your input!!
Yes, it seems that my editor is changing the u and the trema (¨) to a ü.
Is it possible to find a curl example that reproduces this problem?

I proposed a PR that moves the rule to PL2. You proposed extending the rule so that it requires a script tag. I thought that might be too limited, but if we extended the rule to cover script, img, and maybe other valid tags, this would be the better solution.
What do you think?

dune73 · 2021-04-02T17:01:44Z

Thank you @theseion for explaining this once more. I did not get it when @franbuehler mentioned it above.

Here is how I get the "incorrect" UTF-8 representation. Maybe this helps with reproducing.

$ echo "666f6f3d75cc880a" |  xxd -plain -revert > /tmp/tmp1
$ hexdump -C /tmp/tmp1
00000000  66 6f 6f 3d 75 cc 88 0a                           |foo=u...|
00000008
$ curl http://localhost -d "@/tmp/tmp1" --trace-ascii -
...
=> Send data, 7 bytes (0x7)
0000: foo=u..
...
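A Python equivalent of the commands above might look like the sketch below. It builds the same request body from the hex string so that no editor can alter the bytes, and only prepares the request; `build_request` is a hypothetical helper name, and `http://localhost` is the same placeholder target as in the curl call.

```python
import urllib.request

# Same bytes as the hex string passed to xxd, minus the trailing newline (0a),
# which curl's -d option strips when it reads the file.
BODY = bytes.fromhex("666f6f3d75cc88")


def build_request(url: str, body: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the exact bytes."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # curl -d uses this content type by default.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_request("http://localhost", BODY)
print(len(req.data), req.data)
# To send it to a test instance, call urllib.request.urlopen(req).
```

The body is 7 bytes long, which matches the "Send data, 7 bytes" line in the curl trace.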

theseion · 2021-04-03T09:09:33Z

That looks good @dune73.

@franbuehler As detailed in the description of the evasion (https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet), this particular evasion only works with US-ASCII, and the only known web server to use US-ASCII is Tomcat. So we could say that PL2 is enough, since it is not a common configuration (and it would take a misconfigured Tomcat anyway).

Additionally, checking for script would disable the most powerful attacks that use this evasion. However, as you pointed out, other tags could possibly also be used, e.g. an img with a maliciously encoded image (assume, for example, that there's a zero-day exploit in a decompression or image processing library). Therefore, we would probably need to check for any tag that can be used to load external resources:

script
img
link
a
embed
audio
video
source
iframe
map
object
picture
portal
...

This clearly isn't a maintainable option. Then again, most of these would today be covered by CORS policies.

I went back and looked at #1645. My original proposal \xbc[^\xbe>]*[\xbe>]|<[^\xbe]*\xbe doesn't work because only the last byte of the two-byte UTF-8 sequence is being considered. This is actually correct: the first byte could be anything, and only the last byte is relevant to the evasion. But that also means that the regular expression isn't specific enough and will match many false positives.
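To give a feel for how many characters are affected, here is a small Python sketch that lists every code point with a two-byte UTF-8 encoding whose last byte is 0xbc or 0xbe. The function name is invented for this illustration, and three- and four-byte sequences, which can also end in these bytes, are not covered.

```python
def two_byte_chars_ending_in(last_byte: int) -> list:
    """All code points with a two-byte UTF-8 encoding whose final byte is last_byte."""
    return [
        chr(cp)
        for cp in range(0x80, 0x800)  # U+0080..U+07FF encode to exactly two bytes
        if chr(cp).encode("utf-8")[-1] == last_byte
    ]


ends_in_bc = two_byte_chars_ending_in(0xBC)
ends_in_be = two_byte_chars_ending_in(0xBE)

for name, chars in (("0xbc", ends_in_bc), ("0xbe", ends_in_be)):
    print(name, len(chars), " ".join(f"U+{ord(c):04X}" for c in chars))
```

Each list has 30 entries, one per possible lead byte. The 0xbc list includes ü (U+00FC), ż (U+017C) and м (U+043C); the 0xbe list includes þ (U+00FE) and о (U+043E).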
There is one thing we could do with this expression, which is to extend it to look for a complete tag and optional end tag. That should mitigate the false positives that were reported. There are generally two cases here:

the tag does not have an end tag and ends with />
the tag has an end tag </xxx>

Let's do the first case: (?:\xbc[^\xbe>]*/\s*[\xbe>])|(?:<[^\xbe]*/\s*\xbe).
For the second case it's probably better to combine two rules to minimize the cost of running the regular expression. The first rule can check for a start tag and the second for an end tag if the first rule matches:
start tag: \xbc[^\xbe>]*[\xbe>]|<[^\xbe]*\xbe
end tag: (?:\xbc\s*/\s*[^\xbe>]*[\xbe>])|(?:<\s*/\s*[^\xbe]*\xbe)

It would be interesting to test this against content from @Ais8Ooz8 since I don't have a good understanding of the possible byte combinations in the Russian alphabet.
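As a first check, the three expressions above can be tried out in Python against the alphabet string from the original report. This is only a sketch of the idea: it uses Python's `re` on raw bytes rather than ModSecurity, skips all transformations, and models the "second rule only if the first matches" chain with a plain `and`. The function name and the two attack samples are invented for this illustration.

```python
import re

# Expressions copied from the proposal above.
SELF_CLOSING = re.compile(rb"(?:\xbc[^\xbe>]*/\s*[\xbe>])|(?:<[^\xbe]*/\s*\xbe)")
START_TAG = re.compile(rb"\xbc[^\xbe>]*[\xbe>]|<[^\xbe]*\xbe")
END_TAG = re.compile(rb"(?:\xbc\s*/\s*[^\xbe>]*[\xbe>])|(?:<\s*/\s*[^\xbe]*\xbe)")


def looks_like_evasion(data: bytes) -> bool:
    """Case 1: a self-closing tag. Case 2: a start tag and an end tag."""
    if SELF_CLOSING.search(data):
        return True
    return bool(START_TAG.search(data)) and bool(END_TAG.search(data))


samples = {
    "Russian alphabet (UTF-8)": "абвгдеёжзийклмнопрстуфхцчшщъыэюя".encode("utf-8"),
    "script pair (US-ASCII evasion)": b"\xbcscript\xbealert(1)\xbc/script\xbe",
    "self-closing img (US-ASCII evasion)": b"\xbcimg src=x /\xbe",
}

for label, data in samples.items():
    print(label, "->", looks_like_evasion(data))
```

With these samples the alphabet string is no longer flagged, because the bytes following 0xbc never form a closing "/" sequence, while both evasion samples still are. Other inputs may of course still produce false positives.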

franbuehler · 2021-04-30T12:52:09Z

Thank you for your detailed explanation and your very appreciated help, @theseion!

I summarize the three options:

Move the rule to PL2: that would be an option for you too, since, as you say,

PL2 is enough since it is not a common configuration (and it would take a misconfigured Tomcat anyway).

Add all possible additional keywords to the script keyword: this is not maintainable.
Extend the expression to look for a complete tag and optional end tag with a chained rule:
start tag: \xbc[^\xbe>]*[\xbe>]|<[^\xbe]*\xbe
end tag: (?:\xbc\s*/\s*[^\xbe>]*[\xbe>])|(?:<\s*/\s*[^\xbe]*\xbe).
We would like to have @Ais8Ooz8 test this.

Is @Ais8Ooz8 still available so we could test this?
Then I would adjust my PR accordingly.

Would that be a good way to go?

franbuehler · 2021-05-03T19:45:20Z

Decision from the May meeting (#2053 (comment)):
We move the rule to PL2, and @franbuehler or @dune73 will ask @theseion to implement his idea. Once we have checked it, we move the rule back to PL1.

theseion · 2021-05-04T19:27:54Z

I'll take a look in the next few days.

franbuehler · 2021-05-04T19:28:40Z

That would be awesome, thank you!!

theseion · 2021-05-18T11:42:07Z

Just to let you know: I haven't forgotten. I'm finishing up some other stuff and will start working on this ASAP.

theseion · 2021-05-28T18:58:50Z

@franbuehler I've opened a PR. Could you take a look?

franbuehler · 2021-06-01T06:24:48Z

Thank you @theseion!! Yes, I'll take a look.

ghost added the ➕ False Positive label Dec 4, 2020

ghost changed the title ~~Rule 941310: false positive for a Russian letter "м"~~ Rule 941310: false positive for a Russian letters "м" and "о" Dec 4, 2020

ghost changed the title ~~Rule 941310: false positive for a Russian letters "м" and "о"~~ Rule 941310: false positive for Russian letters "м" and "о" Dec 4, 2020

dune73 added the 🔖 Meeting Agenda label Dec 17, 2020

dune73 mentioned this issue Dec 21, 2020

Monthly Chat Agendas December (2020-12-07 and 2020-12-21) #1944

Closed

franbuehler self-assigned this Dec 21, 2020

dune73 removed the 🔖 Meeting Agenda label Jan 18, 2021

dune73 mentioned this issue Jan 18, 2021

Rule 941120: false positive for the Russian language #1943

Closed

franbuehler mentioned this issue Feb 18, 2021

Move 941310 from PL1 to PL2 #2014

Closed

franbuehler linked a pull request Feb 18, 2021 that will close this issue

Move 941310 from PL1 to PL2 #2014

Closed

franbuehler added the PR available label (this issue is referenced by an active pull request) Feb 18, 2021

theseion mentioned this issue May 28, 2021

Use chained end tag detection for rule 941310 #2107

Merged

dune73 closed this as completed in #2107 Aug 2, 2021

RedXanadu mentioned this issue Oct 17, 2022

\xc5\xbc --> Polish 'ż' character blocked #2852

Closed
