Description
- Version of collectd: 5.4.0-3ubuntu2.2
- Operating system / distribution: Ubuntu, Trusty (14.04)
Expected behavior
At startup, all exec plugins begin executing.
Actual behavior
Intermittently, some will not start up. In addition to the main process, there will be one or more "collectd -C /etc/collectd/collectd.conf -f" child processes, still running as root and stalled. Attaching gdb to one of these stuck children (the ones that should have gone on to exec the plugin's program) and getting a backtrace gives:
#0 0x00007f1611faff1c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f1611fab649 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f1611fab470 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f160e6b0e23 in ?? () from /usr/lib/x86_64-linux-gnu/libp11-kit.so.0
#4 0x00007f1611a92fff in fork () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f160b81b7a9 in ?? () from /usr/lib/collectd/exec.so
#6 0x00007f160b81c048 in ?? () from /usr/lib/collectd/exec.so
#7 0x00007f1611fa9184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#8 0x00007f1611acc37d in clone () from /lib/x86_64-linux-gnu/libc.so.6
This indicates a deadlock inside fork(): frame #3 shows code in libp11-kit being run from within fork() and blocking on a mutex. Killing the stuck child process (kill -9) has so far always resulted in the exec plugin promptly starting correctly.
Steps to reproduce
- Configure at least two Exec plugin entries (I have not seen this on any host with only one, but that may be irrelevant)
- Restart collectd (a few times)
- Observe stuck processes.
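For the first step, a configuration of roughly this shape is what I mean. The user name and script paths below are placeholders for illustration, not my actual setup:

```
LoadPlugin exec
<Plugin exec>
  # Placeholder user and paths; any two long-running exec scripts should do
  Exec "nobody" "/usr/local/bin/probe-a.sh"
  Exec "nobody" "/usr/local/bin/probe-b.sh"
</Plugin>
```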
I'm working through the code, but nothing is jumping out at me yet. I'm also at a loss as to why libp11-kit is involved at all, except as an implicit dependency of libgnutls. I suspect that if I could keep libp11-kit from being involved, the problem would go away.