8000 feat(config): Adds enable_default_collections flag to do not initialize collections by default by M4tteoP · Pull Request #3141 · coreruleset/coreruleset · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat(config): Adds enable_default_collections flag to do not initialize collections by default #3141

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 17, 2023

Conversation

M4tteoP
Copy link
Member
@M4tteoP M4tteoP commented Feb 24, 2023

Hello,
I tried to address #3068, following the decision taken during Jan 16 (issue chat meeting) about relying on a flag to initialize collections for the plugins, keeping it off by default.
Not sure the implementation is as you would expect, happy to have any guidance about it.
Thanks!

Tentatively closes #3068

@M4tteoP M4tteoP changed the title Adds enable_default_collections flag to do not initialize collections by default feat(config): Adds enable_default_collections flag to do not initialize collections by default Feb 24, 2023
Copy link
Contributor
@theseion theseion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks solid to me, but I'm a bit fuzzy on the details of setup.

@theseion theseion requested a review from dune73 February 26, 2023 16:21
theseion
theseion previously approved these changes Feb 26, 2023
@theseion
Copy link
Contributor

I've asked @dune73 to review as well.

@dune73
Copy link
Member
dune73 commented Feb 26, 2023

Thank you for picking this up, @M4tteoP. It's good somebody is taking care of it.

The way you do this, probably works, but I would like to see this optimized.

  • We do not need rule 901315. Honestly, if the count of this variable is 0, then it's not set. So we can use the var counter as the first rule in a chain.
  • 901318 is nice, but no need to make this into a separate rule. Integrate into the single rule that sets everything.
  • 901321, here you check for the existence of the variable, then you chain a rule that checks for the value of the variable, then you chain a rule that creates the UA_Hash and initiates the collections without creating a ua-hash variable that we do not need anyways.

Well, maybe we need to think about the use case and keeping the ua-hash variable around, but I am confident the way ModSec works with collections, you do not need the key anymore once it's initialized.

Sorry this makes your PR much more complicated, but now that we touch it, we should also make it simpler and more performant.

@M4tteoP
Copy link
Member Author
M4tteoP commented Feb 27, 2023

Thanks for the guidance! PTAL 🙏

@dune73
Copy link
Member
dune73 commented Feb 27, 2023

That works nicely. Thank you.

Would you mind shifting the rule to 901320? I think the slot is free.

For the record the key, including the UA-Hash, is accessible via IP:KEY.

dune73
dune73 previously approved these changes Feb 27, 2023
@dune73
Copy link
Member
dune73 commented Feb 27, 2023

Thank you. Ready to fly from my side.

@theseion
Copy link
Contributor

Looks good to me too. One thing I noticed: why do we need two chained rules? We're checking ... @eq 1 anyway, so do we really need to check that the variable is set first?

@dune73
Copy link
Member
dune73 commented Feb 27, 2023

Good thinking. Yes, this looks redundant. @M4tteoP could you try it out without that chain?

@M4tteoP
Copy link
Member Author
M4tteoP commented Feb 27, 2023

Done, I did some little tests, and it seems that it is working as expected, but the lint is complaining about relying on a unset variable 🤔:

::error  There are one or more unset TX variables.
::error  file=rules/REQUEST-901-INITIALIZATION.conf,line=289,endLine=289,title=unset TX variable: TX variable 'ENABLE_DEFAULT_COLLECTIONS' not set / later set (VAR)

@theseion
Copy link
Contributor

And it didn't complain before? It should have complained before, I think. Well, you just have to set the variable explicitly. We introduced this linter step for a reason :)

@dune73
Copy link
Member
dune73 commented Feb 28, 2023

How do you mean before? It's a new variable and the initial PR did set the variable explicitly.

I think we've been in this discussion before. Adding a SecRule just to please the linter seems wrong to me. This is a case where the variable is optional and the rule treats that just fine.

@airween: What do you think?

@airween
Copy link
Contributor
airween commented Mar 6, 2023

I think we've been in this discussion before. Adding a SecRule just to please the linter seems wrong to me. This is a case where the variable is optional and the rule treats that just fine.

No. Unfortunately now I don't find the discussion (Slack or GH issue/PR?), but we discussed this and the decision was that there wouldn't be any uninitialized variable. All of them must be checked and set a default value. I asked that before I started to implement that feature, so I clearly remember :).

It's not too hard work to remove this check - so, what to do with it?

@dune73
Copy link
Member
dune73 commented Mar 7, 2023

Thanks for chiming in @airween. I remember the discussion lively and also the decision to enforce variable initialization always. However, I am no longer familiar with the use case and my gut feeling tells me the use case was less clear than the one here, so maybe we can return on the question.

Here, we have a situation where initialization adds an overhead of an additional SecRule that is not needed otherwise. Given most people are likely to run without collections, the initialization of this optional variable seems entirely futile to me.

But this is a special use case. Generally, we should enforce variable initialization and we have discovered several bugs where the vars have not been initialized. So the check makes a lot of sense in 90% of the cases.

For me the question is what to do with the remaining cases. And the one here falls into this category:

  • The variable is optional
  • Most people will leave it off (or unitialized if they can)
  • The rules using the variables are diligently created and reviewed. It works just fine when the variable is not initialized.

I think in such a case, we should be able to make an exception to the general rule.

I see two ways:

  • Hard-Code the optional variable names into the linter
  • Add a special format comment read by the linter that allows to pass it the name of a variable that does not have to be initialized.

If I can convince you that the overhead of the unnecessary initialization should be avoided.

@airween
Copy link
Contributor
airween commented Mar 7, 2023

Here, we have a situation where initialization adds an overhead of an additional SecRule that is not needed otherwise.

May be I misunderstand something, but I think the enforce_bodyproc_urlencoded and crs_validate_utf8_encoding are the same as this new one. We can dispense with the initialization of variables (even just we use it only one place), but then the rule set won't consistent.

So the check makes a lot of sense in 90% of the cases.

How can we decide which cases makes sense and which don't? (My algorithm asks :))

  • The variable is optional

Same question as above - what makes a variable optional?

  • Most people will leave it off (or unitialized if they can)

The rules-check script prevents it :)

  • The rules using the variables are diligently created and reviewed. It works just fine when the variable is not initialized.

Is there any way to check that the written rule works as we expected in each cases?

Anyway,

I see two ways:

  • Hard-Code the optional variable names into the linter

I should avoid that.

  • Add a special format comment read by the linter that allows to pass it the name of a variable that does not have to be initialized.

I thought the special comment too, or we can use some name convention for these optional variables. Eg. the OPT_ prefix, like OPT_ENABLE_DEFAULT_COLLECTIONS. Linter can recognize this prefix, and can it can turn a bli 8000 nd eye.

If I modify the linter, should we remove the mentioned two rules? I would insist the consistent usage of variables.

@theseion
Copy link
Contributor
theseion commented Mar 7, 2023

The discussion happened on Slack: https://github.com/coreruleset/project-chat-archive/blob/master/chat-archive-2022-11-21.md.
There's a related GH issue too: #2990.

The discussion then was pretty much identical :). And I think we've discussed this before too: why don't we just initialize the variable to 0? We have the rule in the setup file, it's just commented out. Or we do what @airween ded for crs_validate_utf8_encoding and set the variable if it is unset (https://github.com/coreruleset/coreruleset/pull/2802/files), which is, I think, what @M4tteoP had written in the first place :).

I'm with @airween on this one, if we allow variables to be uninitialized, however we do it, at some point we will be in a similar situation, where we have a bunch of OPT_* variables that are no longer in use and we'll spend unnecessary time figuring that out.

@dune73
Copy link
Member
dune73 commented Mar 7, 2023

Thanks for the link.

My reasoning is purely around performance: Why execute an unnecessary SecRule to initialize something that 90% of users do not need.

A SecRule directive has an overhead of 0.5 microseconds. All CRS systems combined serve many, many billions of pages a day. This adds up. And what we get in return is convenience and a cleaner code base. I think this is not worth it.

I know I am probably in a minority position with this discussion again. I just do not understand why my performance argument is so irrelevant to you.

@theseion
Copy link
Contributor
theseion commented Mar 7, 2023

I do get the argument for performance. Do the 0.5 microseconds also apply to chained rules? Because we do check the variable either way.

Another idea: @airween, would it be possible to read crs-setup.conf.example, uncomment the commented rules and run the check then? Because then the optional variables would have been set.

@airween
Copy link
Contributor
airween commented Mar 7, 2023

I just do not understand why my performance argument is so irrelevant to you.

It's definitely not irrelevant. But then we have to remove the other two mentioned rules too (for the consistency). And then we will again when we were before.

But let's say we want to allow for the special cases where we don't need to initialize the variable. Who will decide whether the claim is legitimate or not? The rule author can enter the special comment (no one will notice...) or use the prefix (which is what I suggested). What is the real reason to use an uninitialized variable?

So I think we can choose now performance vs. consistency.

The CRS currently contains approximately 580 rules. If we count 0.5 usec for each rule (at least), we get 290 usec (now we are not counting the real evaluation of the rules). With the new rule (@M4tteoP) this increases to 290.5 usecs. This is a decrease of 0.1 percent of performance.

Again: I'm not against the performance, but then we have to create rule-making rules, and we can't break them. I accept our decision and I am very happy to make the change of rules-check, just don't break our own rules.

@airween
Copy link
Contributor
airween commented Mar 7, 2023

Another idea: @airween, would it be possible to read crs-setup.conf.example, uncomment the commented rules and run the check then? Because then the optional variables would have been set.

When I started to work on the rules-check, I had a similar idea - but I rejected it, because I thought that leads to misunderstandings.

I you guys agree that this would be a useful way, I can take a look it how could I do that.

@dune73
Copy link
Member
dune73 commented Mar 8, 2023

I think @theseion's proposal could be a useful way of dealing with the problem.

Defining where such a commented-out initalization is permitted and where it's not would be part of the review process I guess. Also possible to add guidance to the contributing.md doc.

And I agree that an individual rule adds but little. But this is how cruft piles up.

@M4tteoP M4tteoP force-pushed the optional_init_coll branch 2 times, most recently from db73172 to 13bc7fe Compare March 25, 2023 13:22
@M4tteoP
Copy link
Member Author
M4tteoP commented Mar 25, 2023

Small indentations and actions ordering fixes, but after #3161, the linter is happy!

@theseion
Copy link
Contributor

Why was this PR reopened @airween?

@airween
Copy link
Contributor
airween commented Mar 25, 2023

Why was this PR reopened @airween?

because we haven't finished yet. I'm afraid I accidentally closed it.

@theseion
Copy link
Contributor

Let me know when you want a review.

@M4tteoP
Copy link
Member Author
M4tteoP commented Mar 27, 2023

This PR was on hold because it is requiring the commented-out initialization, I think now it is ready to have a second check. Thanks!

@theseion theseion force-pushed the optional_init_coll branch from 81984a8 to f9cfa62 Compare March 27, 2023 19:45
@dune73
Copy link
Member
dune73 commented Mar 28, 2023

LGTM. Thanks.

@dune73
Copy link
Member
dune73 commented Apr 17, 2023

Let's merge this. Thank you very much for the contribution @M4tteoP.

@dune73 dune73 merged commit f56a4c6 into coreruleset:v4.0/dev Apr 17, 2023
@M4tteoP
Copy link
Member Author
M4tteoP commented Apr 17, 2023

yay, thanks to you for the whole reasoning about it!

@M4tteoP M4tteoP deleted the optional_init_coll branch April 17, 2023 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Variable tx.ua_hash and IP collection set but not used in CRS core
4 participants
0