8000 Prometheus support by NDStrahilevitz · Pull Request #1404 · aquasecurity/tracee · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Prometheus support #1404

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 14, 2022
Merged

Conversation

NDStrahilevitz
Copy link
Collaborator
@NDStrahilevitz NDStrahilevitz commented Jan 26, 2022
  • Adds prometheus metrics endpoint to tracee-ebpf cli enabled by flag (defaults to :3366/metrics)
  • Adds prometheus metrics endpoint to tracee-rules cli enabled by flag (defaults to :4466/metrics)
  • Adds engine stats
  • Moves tracee-ebpf counter to it's own package (reused in engine stats)
  • Adds metrics package which registers tracee-ebpf and tracee-rules stats to prometheus
  • Move stats definition to metrics package (as part of stabilization extract stats package from package tracee #1250)

As part of #887

@NDStrahilevitz
Copy link
Collaborator Author

I've seen an opportunity to solve #1250 here, but since both tracee and engine now expose stats, i'm not sure about a shared stats pkg, I think it could be done better @itaysk

@NDStrahilevitz NDStrahilevitz force-pushed the prometheus-support branch 2 times, most recently from dd5dbe2 to 3389bcc Compare January 27, 2022 14:41
@NDStrahilevitz NDStrahilevitz marked this pull request as ready for review January 27, 2022 16:26
Copy link
Collaborator
@itaysk itaysk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great, thanks! I only took a quick look but perhaps counter doesn't need to be a package and can be merged with stats?

@itaysk
Copy link
Collaborator
itaysk commented Jan 29, 2022

out of scope for this PR but in a later PR we should add labels to make this more usable

@NDStrahilevitz
Copy link
Collaborator Author

this is great, thanks! I only took a quick look but perhaps counter doesn't need to be a package and can be merged with stats?

I see the point of this, but the stats package has been split in the end between rules and ebpf into separate stats packages so it can't be merged with counter currently.

)

func RegisterRulesEngineMetrics(engine *engine.Engine) error {
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{...}; err != nil {.....}

)

func RegisterEbpfEngineMetrics(t *tracee.Tracee) error {
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{
err := prometheus.Register(prometheus.NewCounterFunc(prometheus.CounterOpts{...}; err != nil {...}

@mtcherni95 mtcherni95 added this to the v0.7.0 milestone Jan 31, 2022
Copy link
Contributor
@mtcherni95 mtcherni95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more note:
with this commit, if the user passes the --metrics flag, it will, by default, open a Prometheus endpoint by calling the RegisterEbpfEngineMetrics (for tracee-ebpf at least).
Maybe in future we want to give the ability to open endpoint to other services, such as Datadog, Elastic and such. If so, then the current strategy is less effective. As a suggestion, the CLI could specify to which metrics services we could register for example
--metrics=prometheus or --metrics=prometheus,datadog
If so, then maybe package metrics (for both tracee-rules and tracee-ebpf), could have inside several files, one for each metrics service. For example: tracee-rules/metrics/prometheus.go.

It is all a matter to how much "flexible" we want to be, which is open to discussion.
What are your thoughts on it @itaysk , @danielpacak ?

@mtcherni95
Copy link
Contributor

One more note: with this commit, if the user passes the --metrics flag, it will, by default, open a Prometheus endpoint by calling the RegisterEbpfEngineMetrics (for tracee-ebpf at least). Maybe in future we want to give the ability to open endpoint to other services, such as Datadog, Elastic and such. If so, then the current strategy is less effective. As a suggestion, the CLI could specify to which metrics services we could register for example --metrics=prometheus or --metrics=prometheus,datadog If so, then maybe package metrics (for both tracee-rules and tracee-ebpf), could have inside several files, one for each metrics service. For example: tracee-rules/metrics/prometheus.go.

It is all a matter to how much "flexible" we want to be, which is open to discussion. What are your thoughts on it @itaysk , @danielpacak ?

I think that what we could do is that once we add another data source that we can proceed with the above suggestion. @itaysk @danielpacak

Copy link
Contributor
@danielpacak danielpacak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM! These are great insights exposed as metrics. I left few comments regarding structs and dependencies that we introduce between Go packages in this PR

statStore statStore
}

type statStore struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need both statStore and stats.RulesStats? Can we have one struct that represents Tracee-rules metrics and pass it to NewEngine constructor and to the method that actually registers metrics?

Copy link
Collaborator Author
@NDStrahilevitz NDStrahilevitz Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting proposal, i'll PoC it myself and if it works i'll probably introduce it.

The reason it was done this way is because that's how it's done in tracee-ebpf but if I change it here i'll change it there as well.

Copy link
Contributor
@danielpacak danielpacak Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually looking at the way we use Prometheus Go SDK I'm wondering why we use so many custom structs. Is it absolutely necessary? I think the simplest approach would be:

// Define a single struct that groups tracee-rules metrics
type Metrics struct {
	EventsCount     prometheus.Counter
	DetectionsCount prometheus.Counter
	SignaturesCount prometheus.Counter
}

func main() {
	// use promauto package to register counters
	metrics := Metrics{
		EventsCount: promauto.NewCounter(prometheus.CounterOpts{
			Namespace: "tracee_rules",
			Name:      "events_total",
			Help:      "events ingested by tracee-rules",
		}),
		DetectionsCount: promauto.NewCounter(prometheus.CounterOpts{
			Namespace: "tracee_rules",
			Name:      "detections_total",
			Help:      "detections made by tracee-rules",
		}),
		SignaturesCount: prometheus.NewCounter(prometheus.CounterOpts{
			Namespace: "tracee_rules",
			Name:      "signatures_total",
			Help:      "signatures loaded",
		}),
	}

	// Pass metrics to the engine; see my comment in another PR about functional options where metrics can be an optinal arg passed to engine constructor.
	NewEngine(metrics)
}

// matchHandler is a function that runs when a signature is matched
func (engine *Engine) matchHandler(res types.Finding) {
	// Increment counters whenever you like
	metrics.DetectionsCount.Inc()
	engine.output <- res
}

Copy link
Contributor
@danielpacak danielpacak Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that came to my mind just now is that with dynamically loaded signatures the SignaturesCount may decrease so the counter is probably not appropriate in our case. Instead of Counter we may use Gauge which has inc, dec, and set methods on it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR is now structured using you suggestion by registering the stat struct itself. I've also redefined the signatures count as a "gauge" instead of a counter in prometheus

@NDStrahilevitz NDStrahilevitz force-pushed the prometheus-support branch 2 times, most recently from 7ee1e72 to 1c4eb1c Compare February 3, 2022 14:38
@NDStrahilevitz
Copy link
Collaborator Author

@danielpacak @mtcherni95 I've restructured the solution to decouple the metrics packages from Tracee and Engine, the register functions now take pointers to the Stats struct which is exported from Tracee/Engine. I've also taken into account some other changes you're requested so a re-review is probably required since the PR looks somewhat different now.

@itaysk
Copy link
Collaborator
itaysk commented Feb 4, 2022

@NDStrahilevitz @mtcherni95 some DoD items: tests, documentation, add to entrypoint

@NDStrahilevitz NDStrahilevitz force-pushed the prometheus-support branch 5 times, most recently from 7df26e0 to c8f48df Compare February 9, 2022 10:06
@NDStrahilevitz
Copy link
Collaborator Author

@NDStrahilevitz @mtcherni95 some DoD items: tests, documentation, add to entrypoint

What specifically should be done with regards to the entrypoint? I've added tests for the counter and documented how the metrics registering works near the relevant function.

Copy link
Contributor
@mtcherni95 mtcherni95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
@itaysk I believe we can merge if you don't have anything else to add.

@itaysk
Copy link
Collaborator
itaysk commented Feb 10, 2022

What specifically should be done with regards to the entrypoint?

add it to the entrypoint.. I mean if you want someone to actually use this feature it should be enabled

@NDStrahilevitz
Copy link
Collaborator Author

What specifically should be done with regards to the entrypoint?

add it to the entrypoint.. I mean if you want someone to actually use this feature it should be enabled

Sorry for the confusion but do you mean enabling the --metrics flag by default through hard coding in the entrypoint? Because I assumed that people would enable by passing the flag if they want (as is done currently)

@itaysk
Copy link
Collaborator
itaysk commented Feb 14, 2022

yes that's what I meant. you should assume that users wouldn't manually configure Tracee, so if you want a feature to be available, you should enable it for them.

Stats are tracked through register functions which take a pointer to Stats.
Stats can be exported from engine/tracee through Stats()
document prometheus in integrations.md
@NDStrahilevitz
Copy link
Collaborator Author
NDStrahilevitz commented Feb 14, 2022

yes that's what I meant. you should assume that users wouldn't manually configure Tracee, so if you want a feature to be available, you should enable it for them.

Done, also added relevant docs to this in docs/integrations.md.

Copy link
Contributor
@danielpacak danielpacak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me stats and metrics are two different things, but except that the PR LGTM! Names are hard :)

@mtcherni95 mtcherni95 merged commit 47e6638 into aquasecurity:main Feb 14, 2022
@NDStrahilevitz NDStrahilevitz deleted the prometheus-support branch February 20, 2022 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0