Prometheus logger further performance improvements by kabenin · Pull Request #330 · dmachard/DNS-collector · GitHub

Prometheus logger further performance improvements #330


Merged: 8 commits merged into dmachard:main on Jul 21, 2023

Conversation

kabenin
Contributor
kabenin commented Jun 6, 2023

type Prometheus now implements the prometheus.Collector interface. This allows removing most of the prometheus.XXXVec fields, which were keeping duplicates of the internal counters (the ones updated in the Record() function).

After this change, we no longer need to update the prometheus.XXXVec values every second. Instead, we generate the metric values on scraping (which runs the Collect() function).

The function ComputeMetrics is renamed to ComputeEventsPerSecond, to better reflect its work.

Added a few more test cases to cover more metrics and EPS updates. That also required some additions to the dnsutils.GetFakeDnsMessage() function.
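
For readers unfamiliar with the custom-collector pattern, here is a minimal, hypothetical sketch of the approach described above (type, field, and metric names are illustrative, not the actual DNS-collector code): internal counters are updated in the Record() hot path, and Prometheus metric values are only materialized from them when a scrape triggers Collect().

package loggers

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// collectorSketch is an illustration only; the real logger tracks many more
// counters and labels. It keeps plain Go counters that Record() updates and
// turns them into Prometheus metrics on demand in Collect().
type collectorSketch struct {
	sync.Mutex
	requestsDesc *prometheus.Desc
	requests     map[string]float64 // per-stream counters updated by Record()
}

func newCollectorSketch() *collectorSketch {
	return &collectorSketch{
		requestsDesc: prometheus.NewDesc(
			"dnscollector_requests_total", // illustrative metric name
			"Number of DNS messages processed, per stream.",
			[]string{"stream_id"}, nil,
		),
		requests: make(map[string]float64),
	}
}

// Record is the hot path: it only touches the internal map.
func (c *collectorSketch) Record(streamID string) {
	c.Lock()
	c.requests[streamID]++
	c.Unlock()
}

// Describe sends the metric descriptors to the registry.
func (c *collectorSketch) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.requestsDesc
}

// Collect is called on every scrape; it builds constant metrics from the
// internal counters, so no prometheus.CounterVec has to be refreshed every
// second in the background.
func (c *collectorSketch) Collect(ch chan<- prometheus.Metric) {
	c.Lock()
	defer c.Unlock()
	for stream, value := range c.requests {
		ch <- prometheus.MustNewConstMetric(c.requestsDesc, prometheus.CounterValue, value, stream)
	}
}

Registering such a value with prometheus.MustRegister(newCollectorSketch()) is enough for the registry to call Collect() on every scrape.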
@dmachard
Owner
dmachard commented Jun 6, 2023

Thanks for this great PR.
Some tests are failing because of the update to dnsutils.GetFakeDnsMessage(); can you take a look?

All values needed to test the Prometheus module are moved inside the tests
themselves.
@kabenin
Contributor Author
kabenin commented Jun 6, 2023

Updates to fix failing tests

@kabenin
Contributor Author
kabenin commented Jun 6, 2023

Looks like the tests are fine now.

@dmachard
Owner
dmachard commented Jun 7, 2023

I made some tests under high load (~50,000 events/s) to check the improvement regarding CPU usage, but I have some race condition issues with this PR. Can you fix them?

==================
WARNING: DATA RACE
Read at 0x00c0001cd530 by goroutine 68:
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Collect()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:789 +0x4a8
  github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
      /home/denis/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/registry.go:455 +0x145

Previous write at 0x00c0001cd530 by goroutine 39:
  runtime.mapassign_faststr()
      /home/denis/sdk/go1.20.3/src/runtime/map_faststr.go:203 +0x0
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Record()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:708 +0x37b8
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Process()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:1075 +0x2c4
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Run.func2()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:1026 +0x39

Goroutine 68 (running) created at:
  github.com/prometheus/client_golang/prometheus.(*Registry).Gather()
      /home/denis/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/registry.go:547 +0x1279
  github.com/prometheus/client_golang/prometheus.(*noTransactionGatherer).Gather()
      /home/denis/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/registry.go:1073 +0x44
  github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1()
      /home/denis/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/promhttp/http.go:135 +0x107
  net/http.HandlerFunc.ServeHTTP()
      /home/denis/sdk/go1.20.3/src/net/http/server.go:2122 +0x4d
  net/http.(*ServeMux).ServeHTTP()
      /home/denis/sdk/go1.20.3/src/net/http/server.go:2500 +0xc5
  net/http.serverHandler.ServeHTTP()
      /home/denis/sdk/go1.20.3/src/net/http/server.go:2936 +0x682
  net/http.(*conn).serve()
      /home/denis/sdk/go1.20.3/src/net/http/server.go:1995 +0xbd4
  net/http.(*Server).Serve.func3()
      /home/denis/sdk/go1.20.3/src/net/http/server.go:3089 +0x58

Goroutine 39 (running) created at:
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Run()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:1026 +0x399
  main.main.func2()
      /home/denis/Lab/github.com/go-dns-collector/dnscollector.go:307 +0x48

@dmachard
Owner
dmachard commented Jun 7, 2023

Just to give you more details.

I like your code because it's cleaner, but in my test environment the CPU usage is higher with your code than with my last PR #325.

Perhaps making your code safe (by adding locks) consumes more CPU under high load?
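
As an aside for readers, one common way to keep the Record() hot path cheap while staying race-free is to give each stream its own atomic counter and only write-lock the map when a new stream first appears. A minimal sketch follows (hypothetical names, assumes Go 1.19+ for atomic.Uint64; this is not the PR's actual code):

package loggers

import (
	"sync"
	"sync/atomic"
)

// streamCounters is an illustration only: the common Record() case costs one
// read-lock plus one atomic add, and the scrape path copies values under the
// read lock instead of contending on a full mutex with the hot path.
type streamCounters struct {
	mu       sync.RWMutex
	counters map[string]*atomic.Uint64
}

func newStreamCounters() *streamCounters {
	return &streamCounters{counters: make(map[string]*atomic.Uint64)}
}

// Record increments the counter for a stream.
func (s *streamCounters) Record(streamID string) {
	s.mu.RLock()
	c, ok := s.counters[streamID]
	s.mu.RUnlock()
	if !ok {
		s.mu.Lock()
		// Re-check under the write lock: another goroutine may have added it.
		if c, ok = s.counters[streamID]; !ok {
			c = new(atomic.Uint64)
			s.counters[streamID] = c
		}
		s.mu.Unlock()
	}
	c.Add(1)
}

// snapshot returns a consistent copy for Collect() or for the
// events-per-second computation to work from.
func (s *streamCounters) snapshot() map[string]uint64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]uint64, len(s.counters))
	for stream, c := range s.counters {
		out[stream] = c.Load()
	}
	return out
}

Whether this actually beats a plain mutex at ~50,000 events/s would need measuring; the point is only that the scrape path and the hot path do not have to serialize on the same full lock.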

@kabenin
Contributor Author
kabenin commented Jun 7, 2023

Hi @dmachard! Thanks for sharing your load testing data. I'll definitely work on the race conditions. Hopefully, that will not make it slower compared to PR #325.
If you could share your load testing setup, I would be happy to test it on my side before submitting results.

Thanks!

@dmachard
Owner
dmachard commented Jun 7, 2023

My setup is basic: ns1labs/flame ----> dnsdist ------> dnscollector

My dnsdist config (just an extract):

-- copy the traffic 4 times to increase the load of the stress test
rl1 = newRemoteLogger("<your_dnscollector_ip>:6000")
rl2 = newRemoteLogger("<your_dnscollector_ip>:6000")
rl3 = newRemoteLogger("<your_dnscollector_ip>:6001")
rl4 = newRemoteLogger("<your_dnscollector_ip>:6001")

addAction(AllRule(),RemoteLogAction(rl1, nil, {serverID="dnsdist1"}))
addResponseAction(AllRule(),RemoteLogResponseAction(rl1, nil, true, {serverID="dnsdist1"}))
addCacheHitResponseAction(AllRule(), RemoteLogResponseAction(rl1, nil, true, {serverID="dnsdist1"}))

addAction(AllRule(),RemoteLogAction(rl2, nil, {serverID="dnsdist2"}))
addResponseAction(AllRule(),RemoteLogResponseAction(rl2, nil, true, {serverID="dnsdist2"}))
addCacheHitResponseAction(AllRule(), RemoteLogResponseAction(rl2, nil, true, {serverID="dnsdist2"}))

addAction(AllRule(),RemoteLogAction(rl3, nil, {serverID="dnsdist3"}))
addResponseAction(AllRule(),RemoteLogResponseAction(rl3, nil, true, {serverID="dnsdist3"}))
addCacheHitResponseAction(AllRule(), RemoteLogResponseAction(rl3, nil, true, {serverID="dnsdist3"}))

addAction(AllRule(),RemoteLogAction(rl4, nil, {serverID="dnsdist4"}))
addResponseAction(AllRule(),RemoteLogResponseAction(rl4, nil, true, {serverID="dnsdist4"}))
addCacheHitResponseAction(AllRule(), RemoteLogResponseAction(rl4, nil, true, {serverID="dnsdist4"}))


-- spoof test.com to keep traffic local
addAction(AndRule({RegexRule(".*\\.test\\.com"), OrRule({QTypeRule(DNSQType.A),QTypeRule(DNSQType.AAAA) })  }), SpoofAction({"192.0.2.2", "2001:db8::3:1"}))
addAction(AndRule({RegexRule(".*\\.test\\.com"), QTypeRule(DNSQType.CNAME)  }), SpoofCNAMEAction("aaa.com", {}))
addAction(AndRule({RegexRule(".*\\.test\\.com"), QTypeRule(DNSQType.TXT)  }), SpoofRawAction({"\003aaa\004bbbb", "\003ccc"}) )
addAction(AndRule({RegexRule(".*\\.test\\.com"), QTypeRule(DNSQType.MX)  }), SpoofRawAction({"\000\010\004smtp\006google\003com\000"}) )
addAction(AndRule({RegexRule(".*\\.test\\.com"), QTypeRule(DNSQType.NS)  }), SpoofRawAction({"\003ns\049\006google\003com\000"}) )
addAction(AndRule({RegexRule(".*\\.test\\.com"), QTypeRule(DNSQType.SOA)  }), SpoofRawAction({"\003ns\049\006google\003com\000\009dns\045admin\006google\003com\000\031Eb\020\000\000\003\132\000\000\003\132\000\000\007\008\000\000\000\060"}) )

dnscollector config:

multiplexer:
  collectors:
    - name: tap
      powerdns:
        listen-ip: 0.0.0.0
        listen-port: 6000
    - name: tap2
      powerdns:
        listen-ip: 0.0.0.0
        listen-port: 6001
  loggers:
    - name: prom
      prometheus:
        listen-ip: 0.0.0.0
        listen-port: 8081
        basic-auth-enable: false

  routes:
    - from: [ tap, tap2 ]
      to: [ prom ]

DNS traffic generator:

docker run ns1labs/flame -p 53 -Q 15000 <your_dnsdist_ip> -g randomlabel lblsize=20 lblcount=10 count=100000

@kabenin
Contributor Author
kabenin commented Jun 9, 2023

Hi @dmachard , thanks a lot for sharing your load testing setup!
With the latest updates it runs OK under load, along with 100 qps of 'scrapes'.

@dmachard
Owner

I made some new tests and the following race conditions occurred:

WARNING: DATA RACE
Write at 0x00c0004226e0 by goroutine 37:
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).ComputeEventsPerSecond()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:950 +0x7da
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Process()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:1086 +0x2ca
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Run.func2()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:1033 +0x39

Previous read at 0x00c0004226e0 by goroutine 222:
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Collect()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:788 +0x22e
  github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
      /home/denis/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/registry.go:455 +0x145

WARNING: DATA RACE
Read at 0x00c000002580 by goroutine 494:
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Collect()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:788 +0x22e
  github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
      /home/denis/go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/registry.go:455 +0x145

Previous write at 0x00c000002580 by goroutine 37:
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).ComputeEventsPerSecond()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:950 +0x7da
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Process()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:1086 +0x2ca
  github.com/dmachard/go-dnscollector/loggers.(*Prometheus).Run.func2()
      /home/denis/Lab/github.com/go-dns-collector/loggers/prometheus.go:1033 +0x39

@kabenin
Contributor Author
kabenin commented Jun 20, 2023

Hi @dmachard!
Thank you for finding yet another race condition! I reproduced it; it is fixed and no longer shows up with -race.
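
For reference, the race detector can be run over just the logger tests with something like the following (the ./loggers/ package path is an assumption based on the stack traces above):

go test -race -count=1 ./loggers/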

@dmachard
Owner
dmachard commented Jul 7, 2023

> Hi @dmachard! Thank you for finding yet another race condition! I reproduced it; it is fixed and no longer shows up with -race.

Thanks for the fix.
I will take a look in the next few weeks; I am short on time right now.

@kabenin
Contributor Author
kabenin commented Jul 7, 2023

Hi @dmachard, no rush at all! It is quite a change, and I do appreciate your thoughtful and cautious approach!

I'm currently working on another change, on top of this one. Currently, the 'anchor' label for all metrics is the stream_id. However, in our setup it is very important to have NetworkInfo.QueryIp as a label. I'm trying to create configurable labeling that, I hope, others may benefit from too, while keeping performance in mind. I hope you will consider that PR as well.

Is this something you want to see in this PR, or as a separate one later on?

Thanks!

@dmachard
Owner
dmachard commented Jul 8, 2023

> I'm currently working on another change, on top of this one. Currently, the 'anchor' label for all metrics is the stream_id. However, in our setup it is very important to have NetworkInfo.QueryIp as a label. I'm trying to create configurable labeling that, I hope, others may benefit from too, while keeping performance in mind. I hope you will consider that PR as well.
>
> Is this something you want to see in this PR, or as a separate one later on?

As a separate PR; the release notes will be clearer.
Thanks a lot for your contributions!

dmachard merged commit a223852 into dmachard:main on Jul 21, 2023
kabenin deleted the improve_prometheus_custom_collectors branch on August 21, 2023 at 22:03