Proposal for a new plugin: Machine Learning on ModSecurity #2067
Conversation
My very first, maybe dumb, question: What is ML?
Same question from me :)
ML = machine learning, oops :)
Thank you for your submission @fggillie. That was fast. For the record: I asked @fggillie to submit her work (-> EPFL Master Thesis) here so it could be polished into an official plugin. We do not yet have a proper process to discuss / refine new plugins, so I thought it fitting to draft the PR here, and when we deem it done, we create a separate repo for the new plugin.

This is a good base to start from, and it has been a long-standing desire to take the pain out of Machine Learning and CRS. The problem is that people need to learn ModSec/CRS first before they can concentrate on ML, despite their interest being with ML.

As listed above, there are several open questions around this functionality. Here are a few thoughts:

The extension to 949110 (that we need to pluginize somehow) sets the terminal action. This is an interesting concept, yet it is a new concept and we have to think it through. As Floriane mentioned, it will lead to situations where the anomaly threshold is reached, yet the request is not blocked. In fact, the way it is done in the PR now, there won't be a trace in the log in such a case.

An alternative way to call the external ML would be to integrate a scoring rule that does the ML call. That would then not be used to fight false positives, but would simply be an additional detection rule that scores. I think the plugin should allow this as well, or even primarily this option. That would result in an OR connection between CRS and ML: CRS scores, or the new plugin scores, and if one or the combination of the two results in a high anomaly score, we have a hit. Performance-wise, this is a big difference, though, and it is possible that executing Lua for every request is too heavy. (Thought: only execute ML for certain requests, like POST requests, or those requests where the anomaly threshold is within reach. There is no need to execute ML when the anomaly score is 0, this is the last rule to execute, and it can only score 5 with 10 being the limit. In that situation you should only execute the ML rule if the score is already greater than or equal to 5, since only then is there the potential to actually hit or exceed the anomaly threshold; see the sketch below.)

A big potential of ML is to take the context / the session into consideration when examining a request. It is possible to do this with ModSecurity, but it's really tedious. With ML, however, you can follow the flow of the application really easily. Piece of cake. The way the Lua script presents itself, though, it does not forward the client's cookies, nor the client's IP address. That would be a useful addition.

It might be interesting from a performance perspective to parallelize the execution. Right now, we branch into the Lua script after the threshold is reached. Maybe we could branch very early in the request, let execution continue, and then poll for the result of the ML call in 949110.

We ought to look into the scanning of responses too, since there is substantial potential for ML to detect data leakages: these cases will differ significantly from the standard RESPONSE_BODY.
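To make the early-exit and context-forwarding ideas concrete, here is a minimal Lua sketch. It assumes CRS 3.x variable names (tx.anomaly_score, tx.inbound_anomaly_score_threshold) and reuses the 5-point maximum from the example above; it is an illustration of the idea, not part of the PR:

```lua
-- Sketch of the early-exit idea: skip the ML call unless the current
-- anomaly score can still reach the blocking threshold.
-- Assumption: MAX_RULE_SCORE mirrors the "can only score 5" example
-- from the comment above.
local MAX_RULE_SCORE = 5

function main()
    local score     = tonumber(m.getvar("tx.anomaly_score") or "0")
    local threshold = tonumber(m.getvar("tx.inbound_anomaly_score_threshold") or "10")

    -- Even a maximal ML score could not reach the threshold: skip the call.
    if score + MAX_RULE_SCORE < threshold then
        return nil
    end

    -- Forward the session context so the model can follow the flow of the
    -- application (the forwarding format itself is left open here).
    local cookies   = m.getvar("REQUEST_HEADERS.Cookie") or ""
    local client_ip = m.getvar("REMOTE_ADDR") or ""

    -- ... build the HTTP request to the ML server, including cookies and
    -- client_ip, along the lines of the script attached to this PR ...
    return nil
end
```

Whether this gate belongs in the script or in a chained @ge check on TX:ANOMALY_SCORE in the rule itself is one of the open design questions.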
I think most of the problems you mentioned can be easily resolved inside a Lua script.
Are we talking about any concrete ML solution or is it supposed to be some kind of generic ML integration?
Ideally generic. What we see so far is generic. One could then accompany it with a reference use case in the form of a blog post / tutorial.
JFYI: we have a similar project that would produce a generic plugin by extending ModSecurity with a new operator. Nothing production-ready yet, but I'm expecting to have something in the next quarter.
My 2 cents on this PR, which I saw way too late.
Thank you for chiming in. Are you working on ML as well?
Thanks for the info, @vloup. Care to contact me via DM? folini@netnea.com
@deepshikha-s created the Machine Learning Integration Plugin as a CRS plugin during a Google Summer of Code 2022 project. This PR can be closed; all work has been integrated into the referenced CRS Machine Learning Integration Plugin.
This PR is a proposal for a plugin to ease the integration of Machine Learning (ML) in ModSecurity.
How it works
The core idea is to use ML in combination with the CRS by double-checking suspicious requests. For this reason, ML is triggered only for requests whose anomaly score exceeds the threshold. This is performed by chaining rule 949110 with an "ML rule".
The "ML rule" is a Lua script that calls the ML model, which runs on a server in an external container/pod. This obviously adds latency due to the communication overhead, but it has the advantage that the ML model is loaded and instantiated only once, at server start, rather than for each incoming request.
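For illustration, here is a minimal sketch of what such a Lua script could look like, using LuaSocket for the HTTP call. The endpoint name, port, and the "1"/"0" response format are assumptions made for the sketch, not taken from this PR:

```lua
-- ml_check.lua -- minimal sketch of the "ML rule" script.
-- Assumption: the ML server exposes a POST /predict endpoint that
-- answers "1" (attack) or "0" (benign); names and port are made up.
local http  = require("socket.http")
local ltn12 = require("ltn12")

function main()
    -- Data to score; here just the request line, as a stand-in.
    local payload = m.getvar("REQUEST_URI") or ""

    local chunks = {}
    local ok, code = http.request{
        url     = "http://ml-server:5000/predict",
        method  = "POST",
        headers = {
            ["Content-Type"]   = "application/x-www-form-urlencoded",
            ["Content-Length"] = tostring(#payload),
        },
        source  = ltn12.source.string(payload),
        sink    = ltn12.sink.table(chunks),
    }

    -- Fail open: if the ML server is unreachable, do not match.
    if not ok or code ~= 200 then
        m.log(4, "ML plugin: ML server not reachable: " .. tostring(code))
        return nil
    end

    -- A non-nil return value makes the chained rule match.
    if table.concat(chunks) == "1" then
        return "ML model flagged the request"
    end
    return nil
end
```

Failing open when the server is unreachable is a deliberate choice in this sketch: a broken ML backend then degrades to plain CRS behavior instead of affecting traffic.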
Attached is an example of a server running the ML model that can be reached by the Lua script. It uses the Flask library, which is definitely not the fastest option.
dummy_app.txt
What needs to be discussed
I'd be glad to discuss the following points (and any other suggestions you may have) with you: