-
-
Notifications
You must be signed in to change notification settings - Fork 53
Prometheus logger further performance improvements #330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus logger further performance improvements #330
Conversation
type Prometheus now implements prometheus.Collector interface. This allows to remove most of prometheus.XXXVec fields, that were keeping duplicates of internal counters duplicates (the ones, updated in Record() function) After this change, we do not need updating prometheus.XXXVec values every second. Instead, we generate Values for metrics on scraping (which runs Collect() function). function ComputeMetrics renamed to ComputeEventsPerSecond, to beter reflect it's work. added a few more test cases to cover more metrics and EPS updates. That also called to some additions to dnsutils.GetFakeDnsMessage() function
Thank for this great PR, |
All values, needed to test Prometheus module are moved inside tests themselves.
Updates to fix failing tests |
Looks like tests look fine now. |
I made some tests under high load (~50 000 events/ops) to check the improvement regarding CPU usage but I have some race conditions issues with this PR. Can you fix-them ?
|
Just to give you more details. I like your code because it's more cleaner, but in my test environment,the CPU usage is more important with your code then my last PR #325 Perhaps make your code safe (by adding lock) consume more CPU under high load ? |
My setup is basic: ns1labs/flame ----> dnsdist ------> dnscollector My dnsdist config (just a extract) -- copy the traffic 4 times to make stress test more high
rl1 = newRemoteLogger("<your_dnscollector_ip>:6000")
rl2 = newRemoteLogger("<your_dnscollector_ip>:6000")
rl3 = newRemoteLogger("<your_dnscollector_ip>:6001")
rl4 = newRemoteLogger("<your_dnscollector_ip>:6001")
addAction(AllRule(),RemoteLogAction(rl1, nil, {serverID="dnsdist1"}))
addResponseAction(AllRule(),RemoteLogResponseAction(rl1, nil, true, {serverID="dnsdist1"}))
addCacheHitResponseAction(AllRule(), RemoteLogResponseAction(rl1, nil, true, {serverID="dnsdist1"}))
addAction(AllRule(),RemoteLogAction(rl12, nil, {serverID="dnsdist2"}))
addResponseAction(AllRule(),RemoteLogResponseAction(rl2, nil, true, {serverID="dnsdist2"}))
addCacheHitResponseAction(AllRule(), RemoteLogResponseAction(rl2, nil, true, {serverID="dnsdist2"}))
addAction(AllRule(),RemoteLogAction(rl3, nil, {serverID="dnsdist3"}))
addResponseAction(AllRule(),RemoteLogResponseAction(rl3, nil, true, {serverID="dnsdist3"}))
addCacheHitResponseAction(AllRule(), RemoteLogResponseAction(rl3, nil, true, {serverID="dnsdist3"}))
addAction(AllRule(),RemoteLogAction(rl4, nil, {serverID="dnsdist4"}))
addResponseAction(AllRule(),RemoteLogResponseAction(rl4, nil, true, {serverID="dnsdist4"}))
addCacheHitResponseAction(AllRule(), RemoteLogResponseAction(rl4, nil, true, {serverID="dnsdist4"}))
-- spoof test.com to keep traffic local
addAction(AndRule({RegexRule(".*\\.test\\.com"), OrRule({QTypeRule(DNSQType.A),QTypeRule(DNSQType.AAAA) }) }), SpoofAction({"192.0.2.2", "2001:db8::3:1"}))
addAction(AndRule({RegexRule(".*\\.test\\.com"), QTypeRule(DNSQType.CNAME) }), SpoofCNAMEAction("aaa.com", {}))
addAction(AndRule({RegexRule(".*\\.test\\.com"), QTypeRule(DNSQType.TXT) }), SpoofRawAction({"\003aaa\004bbbb", "\003ccc"}) )
addAction(AndRule({RegexRule(".*\\.test\\.com"), QTypeRule(DNSQType.MX) }), SpoofRawAction({"\000\010\004smtp\006google\003com\000"}) )
addAction(AndRule({RegexRule(".*\\.test\\.com"), QTypeRule(DNSQType.NS) }), SpoofRawAction({"\003ns\049\006google\003com\000"}) )
addAction(AndRule({RegexRule(".*\\.test\\.com"), QTypeRule(DNSQType.SOA) }), SpoofRawAction({"\003ns\049\006google\003com\000\009dns\045admin\006google\003com\000\031Eb\020\000\000\003\132\000\000\003\132\000\000\007\008\000\000\000\060"}) ) dnscollector config:
dns traffic generator
|
Hi @dmachard , thanks a lot for sharing your load testing setup! |
I make some new tests and the following race conditions occurred
|
Updated from upstream
….com/kabenin/go-dnscollector into improve_prometheus_custom_collectors
It does not seem to be needed, but just to be extra cautious.
Hi @dmachard! |
Thanks for the fix, |
Hi @dmachard , no rush at all! It is quite a change, and I do appreciate your thoughtful and cautious approach! I'm currently working on another change, on top of this one. Currently, the 'anchor' label for all metrics is the stream_id. However, in our setup it is very important to have NetworkInfo.QueryIp as a label. I'm trying to create a configurable labeling, that, I hope, others may benefit from too. And I keep the performance in mind too. I hope you will consider that PR too. Is it something you want to see in this PR, or as a separate one later on? Thanks! |
As a separate PR, the release notes will be more clear. |
type Prometheus now implements prometheus.Collector interface. This allows to remove most of prometheus.XXXVec fields, that were keeping duplicates of internal counters duplicates (the ones, updated in Record() function)
After this change, we do not need updating prometheus.XXXVec values every second. Instead, we generate Values for metrics on scraping (which runs Collect() function).
function ComputeMetrics renamed to ComputeEventsPerSecond, to beter reflect it's work.
added a few more test cases to cover more metrics and EPS updates.