ipmi in combination with java plugins crashes #114
Comments
Maybe something to do with the signal handling of IPMI? |
Hi, do you by any chance have a stack trace from such a crash? Best regards, |
There is no stack trace, just a JVM crash (see above). |
As a workaround I run two instances of collectd: one with the IPMI plugin, and one with the Java plugin. |
is this any use?
|
still present in debian wheezy package collectd_5.1.0-3 |
Built the collectd_5.4.0-3ubuntu1_amd64.deb package from Trusty Tahr for our current server build (Precise) and the crash is still present. |
Now that we have a backtrace for this, I wonder whether it is another of these thread concurrency issues. @tokkee, @ChrisLundquist, @katzj are you maybe able to comment on this? Thanks :-) |
Just to add a little more detail, I have ipmi configured thus...
and java like this...
Removing either java or ipmi configuration prevents the core dump. I've tried adding "ReadThreads 1" and "WriteThreads 1" to the main section of collectd.conf, but the crash still occurs. |
@robmbrooks if you use matching ``` it might help clear up the formatting.
More info here |
@mfournier if I recall correctly, the previous threading issue I looked at had to do with multiple libraries initializing libgcrypt (and libgcrypt not getting the threadsafe flag). I'm not seeing libgcrypt in the backtrace. Though, I could see java trying to use it in some native extension. |
Not sure if this is related, but 513a5ca, which was committed a few hours ago, seems to do a better job of cleaning up threads. @robmbrooks if you still have access to the environment where the problem occurs, could you maybe try cherry-picking this patch and let us know if it helps? Thanks! |
I am also hit by this problem and 513a5ca doesn't solve it. It still happens with the master branch. I'll try to debug that. |
My current understanding is that OpenIPMI uses a signal to interrupt ("wake") its event loop. It does that when timers are changed, so that it can recompute the timeout correctly. This is a rather odd design. Moreover, it sends the signal to the very thread it is currently running on (which is not blocked in select() at that point). I have the current bt:
No other thread is currently running in The problem is that the java plugin registered a handler for the very same signal for suspending/resuming its own thread. However, it is possible to change the signal to something else using OpenIPMI shouldn't use signals. A workaround is to export |
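The clash described in this comment can be reproduced in miniature. The sketch below is not collectd, OpenIPMI or JVM code; it is a Python stand-in (Unix only, Python 3.8+) showing that a signal handler is a process-wide setting, so the second component to register for SIGUSR2 silently replaces the first. The names `ipmi_wakeup` and `jvm_suspend` are hypothetical placeholders for the two components.

```python
import signal

calls = []

def ipmi_wakeup(signum, frame):
    # Hypothetical stand-in for an event loop's "wake up" handler.
    calls.append("ipmi")

def jvm_suspend(signum, frame):
    # Hypothetical stand-in for a runtime's suspend/resume handler.
    calls.append("jvm")

# Both components claim SIGUSR2. Handlers are per process, not per
# library, so the second registration replaces the first one.
signal.signal(signal.SIGUSR2, ipmi_wakeup)
previous = signal.signal(signal.SIGUSR2, jvm_suspend)

# The "event loop" signals itself to wake up, but the handler that
# actually runs is the one installed last.
signal.raise_signal(signal.SIGUSR2)
```

After this runs, `calls` contains only `"jvm"`: the wake-up meant for the event loop was delivered to the other component's handler. `signal.signal` returns the handler it displaced, which is the only hint the second caller gets that someone else was already using the signal.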
Java uses SIGUSR2 to suspend/resume threads. The OpenIPMI plugins also need a signal to resume its event loop when setting a timer. They can't both use the same signal. We ask OpenIPMI to use SIGIO instead. This should fix collectd#114.
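The idea behind this fix can be sketched the same way. Again this is only a Python illustration under the assumption of a Unix system, not the actual patch: once the two components use different signal numbers (SIGUSR2 for the hypothetical `jvm_suspend`, SIGIO for the hypothetical `ipmi_wakeup`), each signal reaches the handler it was meant for.

```python
import signal

received = []

def jvm_suspend(signum, frame):
    # Hypothetical stand-in for the runtime's suspend/resume handler.
    received.append(("jvm", signum))

def ipmi_wakeup(signum, frame):
    # Hypothetical stand-in for the event loop's "wake up" handler.
    received.append(("ipmi", signum))

# Each component now owns a distinct signal, so neither registration
# overwrites the other.
signal.signal(signal.SIGUSR2, jvm_suspend)
signal.signal(signal.SIGIO, ipmi_wakeup)

signal.raise_signal(signal.SIGIO)    # wakes the "event loop"
signal.raise_signal(signal.SIGUSR2)  # reaches the "runtime"
```

The choice of SIGIO here simply mirrors the commit message; any signal that no other component in the process relies on would serve the same purpose.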
When using the ipmi plugin in combination with the java plugin, collectd crashes.