-
Notifications
You must be signed in to change notification settings - Fork 1.2k
concurrency issue with apache + network plugin when gcrypt is used #513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The line reads: if ((st->curl = curl_easy_init ()) == NULL) Since no argument is passed to How many hosts / Apache instances do you have configured? Do you use other plugins that use cURL? —octo |
there is only single host (localhost) is set up. And I've just noticed that it's a |
Maybe completely unrelated, but this sounds similar to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=735173 @zerkms, out of curiosity, does the problem still occur if you set the network plugin's |
problem disappears for |
So this could probably be the same issue. @tokkee suggested it could be a thread-safety issue with libgcrypt. If it's the case, #156 and #88 might not be foreign to this. According to http://www.gnupg.org/documentation/manuals/gcrypt/Multi_002dThreading.html "If your application is multi-threaded, you must set the thread support callbacks with the GCRYCTL_SET_THREAD_CBS command before any other function in the library.". This was respected before 80d8e67 but no longer since, it seems. @ChrisLundquist, can you comment on this ? Also, @pieterlexis you might want to follow this conversation. |
It has been a while since I looked at this, but I think I was hitting a similar issue when I was using the network plugin and the curl plugin. Looking over my write up as you pointed to in #88 It seems like we're initializing the library in multiple places, or perhaps curl was.
The mailing list reply linked in the comment says
So we should be safe having that if condition wrapping it. I missed the exception in my implementation as the mailing list doesn't mention the threading flag specifically. Sounds like swapping the threading flag line and the version check line will fix this. |
Nope, I don't have an environment (and experience) to compile collectd :-( |
@zerkms you can use my project to build collect https://github.com/obazoud/collectd-build |
Thanks for pinging me in this convo. If I have time I'll try to build a package today (but the package buildsystem is kind of fucky) |
@obazoud I would be better if someone confident confirmed the issue went away. I'm not even sure how exactly to reproduce the issue :-S |
Cool, ChrisLundquist@d219776 should apply cleanly to most any tag, or be trivial to reproduce. I'd feel best if someone else was able to confirm it fixes this issue. If it doesn't I'm happy to continue working through it. Thank you all for finding this bug and helping me fix it. |
@ChrisLundquist In the debian sources for wheezy (collectd 5.1.0) it doesnt apply. The only gcry_control block is located in
Looking at your patch (calling |
I based that patch from master, which looks pretty close to collectd-5.3.1. In 4.10.1 (the reported version of this issue) it doesn't look like we call It looks like my changes in #156 first appear in collectd-4.10.8 Maybe I didn't introduce this bug? Either way, from what I've read, it sounds like #515 is more correct and would love to have it tested and merged. I wonder if we could get the ubuntu collectd package updated to at least 4.10.lastest or maybe even go crazy with 5.3.latest? |
Oh right, 12.04 LTS. Makes getting a package update there difficult since 14.04 LTS is just around the corner. FWIW (shoutout to @tokkee 💖 )
But that doesn't really help 12.04 LTS users. |
I've also tried switching these 2 statements like in d219776, but the problem I faced is that I was unable to build a collectd which reproduces the bug without the patch to start with :( Collectd 5.4.0-3 from debian jessie reliably crashes with the config snippet from http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=735173 but doesn't anymore as soon as I build my own 5.4.0 on this platform. @zerkms, can you share your configuration that makes collectd crash on 4.10.1-2.1ubuntu7 ? @pieterlexis out of curiosity, when did the problem start for you ? After a config change or suddenly with no reason like it did for @zerkms ? @ChrisLundquist I guess your patch doesn't apply on 5.1.0 because of 0ec776a. This patch only appeared in the tree in 5.1.3. It's trivial to backport it if needed. |
@mfournier I will, tomorrow |
@mfournier It started when I added the apache plugin to an existing installation (where encryption was enabled). |
@mfournier @ChrisLundquist I applied the patch in 0ec776a and d219776 and built a new package from 5.1.0-3. The problem indeed doesn't occur 👍 I'll publish the dpatch/source package asap |
Patched packages for Debian Wheezy can be found here |
👍 |
I confirm @pieterlexis's dpatch solves the problem, both for debian wheezy's 5.1.0-3 and ubuntu precise's 4.10.1-2.1ubuntu7. I've been running Therefore +1 for merging @ChrisLundquist's patch. I'm going to adapt it for collectd-4.10 branch as requested by @octo, and I also want to add a comment so that we don't shoot ourselves in the foot in the future. |
Thank you @mfournier . I wasn't quite sure what @octo was asking. I didn't see a 4.10 branch I could PR against. If you would /cc me on the PR / commit, I'd like to see and learn. The only question I'm not entirely clear on is how we were seeing this in collectd-4.10.1 Maybe because we weren't calling |
@zerkms, oh and here's what I did to build a patched package for ubuntu 12.04:
Alternatively, this workaround in collectd.conf seems to work too, but your collectd will probably significantly underperform:
The same procedure works for 5.1.0 BTW. |
@ChrisLundquist, the same is true for 5.1.0, as 80d8e67 was only introduced in 5.1.1/4.10.8 and 0ec776a in 5.1.3/4.10.9. But frankly, I don't know. An explanation would be that this bug was already there before 80d8e67, and we just got mislead in assuming it was the cause by @pieterlexis's comment in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=735173 I just double checked by building a stock 5.1.0 without all the packaging overhead/dpatches/etc, and I confirm the bug is there already. I didn't manage to build 4.10.1: too outdated and broken, but the code in network.c is basically the same, regarding gcrypt's use. So I'm pretty confortable with your patch. More out of thorough testing than perfect understanding though :-/ |
This is a shot in the dark in trying to address #513. By calling this from an init() callback, I hope to be initializing the curl and gcrypt libraries before collectd becomes multi-threaded, avoiding the problems described in the issue.
@ChrisLundquist, your patch just got included in the collectd-4.10 branch as ddffda7. It will soon be merged in collectd-5.3 and collectd-5.4, and will be part of the next bugfix releases. Many thanks to all of you on this issue ! According to @octo, part of the problem was that the apache plugin initializes curl after collectd gets multithreaded, and as curl uses libgcrypt, it could get initialized too late (ie: from inside a thread). An attempt to correct this is 5f2f969. And if these 2 patches don't solve the issue, the next step would be to initalize libgcrypt outside of the plugins, to ensure it's done before any thread gets created. |
As promised here is the config (joined from multiple files and with comments and sensitive data removed) that leads to crashes on my server:
Moving out apache's section solves the issue |
This is a shot in the dark in trying to address collectd#513. By calling this from an init() callback, I hope to be initializing the curl and gcrypt libraries before collectd becomes multi-threaded, avoiding the problems described in the issue.
This is a shot in the dark in trying to address collectd#513. By calling this from an init() callback, I hope to be initializing the curl and gcrypt libraries before collectd becomes multi-threaded, avoiding the problems described in the issue.
The latest ubuntu 12.04 LTS collectd version (
4.10.1-2.1ubuntu7
) crashes on curl initialization in apache plugin. Here is the stacktrace:Some weirdness: it worked until friday, then suddenly started crashing, then worked for one day and at this moment it is still crashing on start. Nothing was installed/reconfigured on the server
The text was updated successfully, but these errors were encountered: