-
-
Notifications
You must be signed in to change notification settings - Fork 402
feat(config): Adds enable_default_collections flag to do not initialize collections by default #3141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks solid to me, but I'm a bit fuzzy on the details of setup.
I've asked @dune73 to review as well. |
Thank you for picking this up, @M4tteoP. It's good somebody is taking care of it. The way you do this, probably works, but I would like to see this optimized.
Well, maybe we need to think about the use case and keeping the ua-hash variable around, but I am confident the way ModSec works with collections, you do not need the key anymore once it's initialized. Sorry this makes your PR much more complicated, but now that we touch it, we should also make it simpler and more performant. |
Thanks for the guidance! PTAL 🙏 |
That works nicely. Thank you. Would you mind shifting the rule to 901320? I think the slot is free. For the record the key, including the UA-Hash, is accessible via |
Thank you. Ready to fly from my side. |
Looks good to me too. One thing I noticed: why do we need two chained rules? We're checking |
Good thinking. Yes, this looks redundant. @M4tteoP could you try it out without that chain? |
Done, I did some little tests, and it seems that it is working as expected, but the lint is complaining about relying on a unset variable 🤔:
|
And it didn't complain before? It should have complained before, I think. Well, you just have to set the variable explicitly. We introduced this linter step for a reason :) |
How do you mean before? It's a new variable and the initial PR did set the variable explicitly. I think we've been in this discussion before. Adding a SecRule just to please the linter seems wrong to me. This is a case where the variable is optional and the rule treats that just fine. @airween: What do you think? |
No. Unfortunately now I don't find the discussion (Slack or GH issue/PR?), but we discussed this and the decision was that there wouldn't be any uninitialized variable. All of them must be checked and set a default value. I asked that before I started to implement that feature, so I clearly remember :). It's not too hard work to remove this check - so, what to do with it? |
Thanks for chiming in @airween. I remember the discussion lively and also the decision to enforce variable initialization always. However, I am no longer familiar with the use case and my gut feeling tells me the use case was less clear than the one here, so maybe we can return on the question. Here, we have a situation where initialization adds an overhead of an additional SecRule that is not needed otherwise. Given most people are likely to run without collections, the initialization of this optional variable seems entirely futile to me. But this is a special use case. Generally, we should enforce variable initialization and we have discovered several bugs where the vars have not been initialized. So the check makes a lot of sense in 90% of the cases. For me the question is what to do with the remaining cases. And the one here falls into this category:
I think in such a case, we should be able to make an exception to the general rule. I see two ways:
If I can convince you that the overhead of the unnecessary initialization should be avoided. |
May be I misunderstand something, but I think the enforce_bodyproc_urlencoded and crs_validate_utf8_encoding are the same as this new one. We can dispense with the initialization of variables (even just we use it only one place), but then the rule set won't consistent.
How can we decide which cases makes sense and which don't? (My algorithm asks :))
Same question as above - what makes a variable optional?
The
Is there any way to check that the written rule works as we expected in each cases? Anyway,
I should avoid that.
I thought the special comment too, or we can use some name convention for these optional variables. Eg. the If I modify the linter, should we remove the mentioned two rules? I would insist the consistent usage of variables. |
The discussion happened on Slack: https://github.com/coreruleset/project-chat-archive/blob/master/chat-archive-2022-11-21.md. The discussion then was pretty much identical :). And I think we've discussed this before too: why don't we just initialize the variable to 0? We have the rule in the setup file, it's just commented out. Or we do what @airween ded for I'm with @airween on this one, if we allow variables to be uninitialized, however we do it, at some point we will be in a similar situation, where we have a bunch of |
Thanks for the link. My reasoning is purely around performance: Why execute an unnecessary SecRule to initialize something that 90% of users do not need. A SecRule directive has an overhead of 0.5 microseconds. All CRS systems combined serve many, many billions of pages a day. This adds up. And what we get in return is convenience and a cleaner code base. I think this is not worth it. I know I am probably in a minority position with this discussion again. I just do not understand why my performance argument is so irrelevant to you. |
I do get the argument for performance. Do the 0.5 microseconds also apply to chained rules? Because we do check the variable either way. Another idea: @airween, would it be possible to read |
It's definitely not irrelevant. But then we have to remove the other two mentioned rules too (for the consistency). And then we will again when we were before. But let's say we want to allow for the special cases where we don't need to initialize the variable. Who will decide whether the claim is legitimate or not? The rule author can enter the special comment (no one will notice...) or use the prefix (which is what I suggested). What is the real reason to use an uninitialized variable? So I think we can choose now performance vs. consistency. The CRS currently contains approximately 580 rules. If we count 0.5 usec for each rule (at least), we get 290 usec (now we are not counting the real evaluation of the rules). With the new rule (@M4tteoP) this increases to 290.5 usecs. This is a decrease of 0.1 percent of performance. Again: I'm not against the performance, but then we have to create rule-making rules, and we can't break them. I accept our decision and I am very happy to make the change of |
When I started to work on the I you guys agree that this would be a useful way, I can take a look it how could I do that. |
I think @theseion's proposal could be a useful way of dealing with the problem. Defining where such a commented-out initalization is permitted and where it's not would be part of the review process I guess. Also possible to add guidance to the contributing.md doc. And I agree that an individual rule adds but little. But this is how cruft piles up. |
db73172
to
13bc7fe
Compare
Small indentations and actions ordering fixes, but after #3161, the linter is happy! |
Why was this PR reopened @airween? |
because we haven't finished yet. I'm afraid I accidentally closed it. |
Let me know when you want a review. |
This PR was on hold because it is requiring the commented-out initialization, I think now it is ready to have a second check. Thanks! |
81984a8
to
f9cfa62
Compare
LGTM. Thanks. |
Let's merge this. Thank you very much for the contribution @M4tteoP. |
yay, thanks to you for the whole reasoning about it! |
Hello,
I tried to address #3068, following the decision taken during Jan 16 (issue chat meeting) about relying on a flag to initialize collections for the plugins, keeping it off by default.
Not sure the implementation is as you would expect, happy to have any guidance about it.
Thanks!
Tentatively closes #3068