-
Notifications
You must be signed in to change notification settings - Fork 449
Prometheus support #1404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus support #1404
Conversation
cf4066a
to
1d5e941
Compare
dd5dbe2
to
3389bcc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is great, thanks! I only took a quick look but perhaps counter doesn't need to be a package and can be merged with stats?
out of scope for this PR but in a later PR we should add labels to make this more usable |
I see the point of this, but the stats package has been split in the end between rules and ebpf into separate |
tracee-rules/metrics/metrics.go
Outdated
) | ||
|
||
func RegisterRulesEngineMetrics(engine *engine.Engine) error { | ||
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{ | |
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{...}; err != nil {.....} |
tracee-ebpf/pkg/metrics/metrics.go
Outdated
) | ||
|
||
func RegisterEbpfEngineMetrics(t *tracee.Tracee) error { | ||
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{ | |
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{...}; err != nil {...} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more note:
with this commit, if the user passes the --metrics
flag, it will, by default, open a Prometheus endpoint by calling the RegisterEbpfEngineMetrics
(for tracee-ebpf at least).
Maybe in future we want to give the ability to open endpoint to other services, such as Datadog, Elastic and such. If so, then the current strategy is less effective. As a suggestion, the CLI could specify to which metrics services we could register for example
--metrics=prometheus
or --metrics=prometheus,datadog
If so, then maybe package metrics (for both tracee-rules and tracee-ebpf), could have inside several files, one for each metrics service. For example: tracee-rules/metrics/prometheus.go
.
It is all a matter to how much "flexible" we want to be, which is open to discussion.
What are your thoughts on it @itaysk , @danielpacak ?
9f3f061
to
ce28c5d
Compare
I think that what we could do is that once we add another data source that we can proceed with the above suggestion. @itaysk @danielpacak |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM! These are great insights exposed as metrics. I left few comments regarding structs and dependencies that we introduce between Go packages in this PR
tracee-rules/engine/engine.go
Outdated
statStore statStore | ||
} | ||
|
||
type statStore struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need both statStore and stats.RulesStats? Can we have one struct that represents Tracee-rules metrics and pass it to NewEngine constructor and to the method that actually registers metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting proposal, i'll PoC it myself and if it works i'll probably introduce it.
The reason it was done this way is because that's how it's done in tracee-ebpf but if I change it here i'll change it there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually looking at the way we use Prometheus Go SDK I'm wondering why we use so many custom structs. Is it absolutely necessary? I think the simplest approach would be:
// Define a single struct that groups tracee-rules metrics
type Metrics struct {
EventsCount prometheus.Counter
DetectionsCount prometheus.Counter
SignaturesCount prometheus.Counter
}
func main() {
// use promauto package to register counters
metrics := Metrics{
EventsCount: promauto.NewCounter(prometheus.CounterOpts{
Namespace: "tracee_rules",
Name: "events_total",
Help: "events ingested by tracee-rules",
}),
DetectionsCount: promauto.NewCounter(prometheus.CounterOpts{
Namespace: "tracee_rules",
Name: "detections_total",
Help: "detections made by tracee-rules",
}),
SignaturesCount: prometheus.NewCounter(prometheus.CounterOpts{
Namespace: "tracee_rules",
Name: "signatures_total",
Help: "signatures loaded",
}),
}
// Pass metrics to the engine; see my comment in another PR about functional options where metrics can be an optinal arg passed to engine constructor.
NewEngine(metrics)
}
// matchHandler is a function that runs when a signature is matched
func (engine *Engine) matchHandler(res types.Finding) {
// Increment counters whenever you like
metrics.DetectionsCount.Inc()
engine.output <- res
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing that came to my mind just now is that with dynamically loaded signatures the SignaturesCount may decrease so the counter is probably not appropriate in our case. Instead of Counter we may use Gauge which has inc, dec, and set methods on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR is now structured using you suggestion by registering the stat struct itself. I've also redefined the signatures count as a "gauge" instead of a counter in prometheus
7ee1e72
to
1c4eb1c
Compare
@danielpacak @mtcherni95 I've restructured the solution to decouple the metrics packages from Tracee and Engine, the register functions now take pointers to the Stats struct which is exported from Tracee/Engine. I've also taken into account some other changes you're requested so a re-review is probably required since the PR looks somewhat different now. |
@NDStrahilevitz @mtcherni95 some DoD items: tests, documentation, add to entrypoint |
7df26e0
to
c8f48df
Compare
What specifically should be done with regards to the entrypoint? I've added tests for the counter and documented how the metrics registering works near the relevant function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@itaysk I believe we can merge if you don't have anything else to add.
add it to the entrypoint.. I mean if you want someone to actually use this feature it should be enabled |
Sorry for the confusion but do you mean enabling the |
yes that's what I meant. you should assume that users wouldn't manually configure Tracee, so if you want a feature to be available, you should enable it for them. |
Stats are tracked through register functions which take a pointer to Stats. Stats can be exported from engine/tracee through Stats()
document prometheus in integrations.md
c8f48df
to
381ccb2
Compare
Done, also added relevant docs to this in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me stats and metrics are two different things, but except that the PR LGTM! Names are hard :)
:3366/metrics
):4466/metrics
)As part of #887