8000 Fixed assembly of 920120-no-backtracking.data by theseion · Pull Request #2333 · coreruleset/coreruleset · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fixed assembly of 920120-no-backtracking.data #2333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 22, 2022

Conversation

theseion
Copy link
Contributor

No description provided.

@theseion theseion requested a review from fzipi December 11, 2021 07:30
@fzipi
Copy link
Member
fzipi commented Dec 11, 2021

Just to have it in one place, the target of this rule is:

# 920120: PL1 : FILES_NAMES, FILES
#                ['\";=] but allowed:
#                &[aAoOuUyY]uml); &[aAeEiIoOuU]circ; &[eEiIoOuUyY]acute;
#                &[aAeEiIoOuU]grave; &[cC]cedil; &[aAnNoO]tilde; & '

Copy link
Member
@fzipi fzipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the intention on this rule to get a collective view into this one.

First, and not related, the 920120.data looks like the regexp is not "disassembled". So it is a bit difficult to compare.

Regarding the new one, looks like aacute was left out in line 37. It should be &[aeEiIoOuUyY]acute. I'm going to review all these html metas so see if we are missing others, just in case. BTW, it is out also in the original no backtracking data file....

Other than that, my test for just file works as it is not matching now. Will create a new issue to track changes in 920120 in general.

@theseion
Copy link
Contributor Author

Not sure it helps, but I've properly disassembled 920120.

@theseion
Copy link
Contributor Author

Just learned something new: lookbehind assertions must be of fixed length. That even means that you can't use certain grouping patterns, so we can't use regexp-assemble.py to assemble the individual branches inside of the lookbehind assertion.

@theseion
Copy link
Contributor Author

You're right about aacute. If we're messing around with the role, I feel we could also just make it case insensitive instead of listing lower and uppercase variants in the character classes.

@fzipi
Copy link
Member
fzipi commented Dec 12, 2021

Also, take a look at the linked issue.

@fzipi
Copy link
Member
fzipi commented Dec 29, 2021

@theseion With the extended html entities, and using case ignore this is what I get:

##!+ i

##!> neglook
&[aeiouy]uml;
&[aeioucghjswy]circ;
&[aeiouclnrszg]acute;
&[aeiou]grave;
&[cgklnrst]cedil;
&[anoi]tilde;
&[cdelnrstz]caron;
ø
&
'
&[au]ring;
##!<

##!$ ;|['\"=]

What do you think?

@franbuehler
Copy link
Contributor
franbuehler commented Jan 3, 2022

Meeting decision January: @franbuehler volunteered to review this P,R but just saw that @fzipi is already reviewing it...

@theseion
Copy link
Contributor Author
theseion commented Jan 4, 2022

Responding to the question in chat: the line ##!$ ;|['\"=] is a suffix comment. This will append ;|['\"=] to the end of the regex. If we didn't mark this as a suffix, ;|['\"=] would be added as a branch instead (...;|['\"=] vs. ...|;|['\"=]).

@theseion
Copy link
Contributor Author
theseion commented Jan 5, 2022

@fzipi I've updated the data files and the rule comment with your suggestion. I noticed that you've added a couple more entities and I was wondering about the rational. The rule comment says that "common" entities are allowed but I'm not sure what would be considered common. I'd also include &nbsp; in the list of common entities. Do you have a better idea for which entities to includes than just "common"?

@theseion
Copy link
Contributor Author

@fzipi pint

@fzipi
Copy link
Member
fzipi commented Jan 17, 2022

@theseion I just added all the entities that could be used in a file name.... probably adding &nbsp; might help.

Improved comments in data files for 920120
@theseion
Copy link
Contributor Author

@fzipi Can you run your test once more on the latest commit?

@fzipi
Copy link
Member
fzipi commented Jan 22, 2022

Yes, all tests look good using the regex generated with no-backtracking data:

(?i)(?:(?:^|[^enlcgpsh])|(?:^|[^tvd])e|(?:^|[^o])n|(?:^|[^im])l|(?:^|[^r])c|(?:^|[^n])g|(?:^|[^ms])p|(?:^|[^o])s|(?:^|[^s])h|(?:^|[^u])te|(?:^|[^r])on|(?:^|[^d])il|(?:^|[^i])rc|(?:^|[^a])ve|(?:^|[^i])ng|(?:^|[^l])de|(?:^|[^u])ml|(?:^|[^a])mp|(?:^|[^p])os|(?:^|[^b])sp|(?:^|[^a])sh|(?:^|[^c])ute|(?:^|[^a])ron|(?:^|[^e])dil|(?:^|[^c])irc|(?:^|[^r])ave|(?:^|[^r])ing|(?:^|[^i])lde|(?:^|[^aeiouy])uml|(?:^|[^&])amp|(?:^|[^a])pos|(?:^|[^n])bsp|(?:^|[^l])ash|(?:^|[^a])cute|(?:^|[^c])aron|(?:^|[^c])edil|(?:^|[^aeioucghjswy])circ|(?:^|[^g])rave|(?:^|[^au])ring|(?:^|[^t])ilde|(?:^|[^&])[aeiouy]uml|(?:^|[^&])apos|(?:^|[^&])nbsp|(?:^|[^s])lash|(?:^|[^aeiouclnrszg])acute|(?:^|[^cdelnrstz])caron|(?:^|[^cgklnrst])cedil|(?:^|[^&])[aeioucghjswy]circ|(?:^|[^aeiou])grave|(?:^|[^&])[au]ring|(?:^|[^anoi])tilde|(?:^|[^o])slash|(?:^|[^&])[aeiouclnrszg]acute|(?:^|[^&])[cdelnrstz]caron|(?:^|[^&])[cgklnrst]cedil|(?:^|[^&])[aeiou]grave|(?:^|[^&])[anoi]tilde|(?:^|[^&])oslash);|['\"=]

Tests:

❯ ./ftw run -d tests -i ^920120
1:21PM INF ftw/config: no duration found
1:21PM INF 🛠️  Starting tests!

🚀 Running go-ftw!
👉 executing tests in file 920120.yaml
	running 920120-1: ✔ passed in 109.261559ms
	running 920120-2: ✔ passed in 1.051451432s
	running 920120-3: ✔ passed in 123.82658ms
	running 920120-4: ✔ passed in 81.976114ms
	running 920120-5: ✔ passed in 93.73251ms
	running 920120-6: ✔ passed in 93.490047ms
	running 920120-7: ✔ passed in 103.964909ms
	running 920120-8: ✔ passed in 96.080811ms
	running 920120-9: ✔ passed in 93.157169ms
	running 920120-10: ✔ passed in 100.846706ms
	running 920120-11: ✔ passed in 110.68612ms
	r
8000
unning 920120-12: ✔ passed in 102.233191ms
	running 920120-13: ✔ passed in 94.317103ms
	running 920120-14: ✔ passed in 92.950227ms
	running 920120-15: ✔ passed in 90.528696ms
	running 920120-16: ✔ passed in 92.780996ms
	running 920120-17: ✔ passed in 95.518904ms
	running 920120-18: ✔ passed in 93.303631ms
	running 920120-19: ✔ passed in 98.386815ms
	running 920120-20: ✔ passed in 89.70153ms
	running 920120-21: ✔ passed in 91.733384ms
➕ run 21 total tests in 2.999928434s
⏭  skept 2586 tests
🎉 All tests successful!

@fzipi fzipi linked an issue Jan 22, 2022 that may be closed by this pull request
@theseion
Copy link
Contributor Author

I've added tests for all of the entities (including variants). Tests are green. From my point of view this is ready merge. @fzipi?

@fzipi
Copy link
Member
fzipi commented Jan 22, 2022

Agreed, merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dissassemble regexp for 920120 and extend matching
3 participants
0