collectd crashes when used nested includes #587
Comments
Note:
compiled with these options:
|
After removing one Include level (I kept only an Include /opt/collectd/etc/metrics/*.conf), I no longer see segfaults, but I still get signal 6 ("Aborted") at random. Once collectd has started, everything seems OK.
The cores give us the following information.
|
As the configuration is parsed just once, at startup, it seems unlikely to me that an issue with the config file parser would cause collectd to crash later on. Can you please check that by copy-pasting the files you include into your main configuration file in place of these Include statements? As you mention, a thread concurrency issue could be the cause (according to the symptoms and the backtraces). Can you try disabling the loaded plugins one by one? This might give a hint as to which one is causing trouble. #114 and #526 ring a bell as possibly related. |
Hi Marc. I will do what you ask as soon as I have a bit of time (it requires a lot of changes), in a few days I hope. I will report back soon. |
Maybe starting collectd under valgrind (memcheck or maybe helgrind) can give more detailed information on this? |
This looks very much like this issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750440 |
Yes! But I'm having this issue on both Debian and Red Hat systems. As a workaround, I've written an init.d script that starts collectd inside a loop, waits a few seconds, and then checks whether it is still running. It tries to start the daemon for up to 10 iterations, and it usually takes 2 or 3 restarts before the daemon starts OK. As a side effect, all forked processes started by the exec plugin are left hanging after a crash and must be removed by hand. On the other hand, once the daemon has started OK, collectd works fine with all my plugins ("jvm", "oracle", "exec", system plugins, etc.). |
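For readers who want to reproduce the workaround described above, here is a minimal sketch of the start, wait, check, and retry loop. It is not the actual init.d script (which was not posted); the function name `start_with_retries` and its parameters are hypothetical, and it assumes the daemon is run in the foreground (collectd's `-f` option) so that an early crash is visible as the child process exiting.

```python
import subprocess
import sys
import time


def start_with_retries(cmd, max_attempts=10, settle_seconds=5.0):
    """Start cmd and retry if it exits within settle_seconds.

    Returns the running Popen object, or None if every attempt died
    inside the settle window. (Hypothetical helper, for illustration.)
    """
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.Popen(cmd)
        try:
            # An exit inside the window counts as a failed start.
            proc.wait(timeout=settle_seconds)
        except subprocess.TimeoutExpired:
            return proc  # still alive after the settle window
        # A negative return code means the process was killed by a signal
        # (for example -6 for SIGABRT on POSIX systems).
        print(f"attempt {attempt}: exited with code {proc.returncode}",
              file=sys.stderr)
        time.sleep(1.0)  # brief pause before the next attempt
    return None


# Hypothetical usage; the binary path is an assumption, the config path
# is the one mentioned in this thread:
# start_with_retries(["/opt/collectd/sbin/collectd", "-f",
#                     "-C", "/opt/collectd/etc/collectd.conf"])
```

This sketch does not clean up child processes left behind by the exec plugin after a crash; those would still need to be handled separately, as noted above.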
I've run collectd under valgrind with the following plugins (on Debian): /opt/collectd/etc/metrics/plugin_apache.conf Here are the results from memcheck and helgrind, for anyone who would like to check: https://gist.github.com/toni-moreno/a2f80021535f87202de7 Summary: *memcheck: some leak issues in cpu.c and plugin.c to review |
@toni-moreno, could you please give dothebart@e09d935 a try and report back whether this solves the problem for you? Thanks :) |
@mfournier and @dothebart, I've tested your patch and it doesn't solve my crashes. Reviewing the crashes, I've noticed they are not really the same one: the signal is different (always signal 6, Aborted, in my case, versus a segfault in yours, as you can see in the title of the bug). |
@toni-moreno, can you give the current master branch a try and let us know if you're still experiencing this problem? @dothebart's patch got merged earlier today, but I'm not sure if it's the same one we talked about back in June. Thanks for your perseverance tracking this down :-) |
Hi @mfournier. Sorry for the late response (I was away enjoying some vacation days). I am now running a customized version of collectd on around 50 production servers, with no more crashes detected. My collectd was built from commit d76d251, my three pull requests (#585, #577, #576), and @dothebart's patch (dothebart@911b17c), and everything has been OK for 2 months. I will be pleased to test master when I can replace my compiled version with a more up-to-date one, but I also need my 3 still-open pull requests in the master branch. If you can merge my 3 PRs, I will test it (on test environments first and production afterwards). Thank you very much |
Hi @mfournier. After 4 months deployed on production systems with commit d76d251, my three pull requests (#585, #577, #576), and @dothebart's patch, I think we can happily close this issue. Many thanks to both of you. |
I've configured in /opt/collectd/etc/collectd.conf
and inside the metrics configuration
a /opt/collectd/etc/metrics/apache_generic.conf with
a /opt/collectd/etc/metrics/jmx_generic.conf with
a /opt/collectd/etc/metrics/jmx_generic.conf with
a /opt/collectd/etc/metrics/oracle_generic.conf with
and so on:
While starting, the daemon crashes at random, and I'm not able to reproduce the problem reliably.
I've enabled core dumps and collected a lot of them (system and Java dumps).
But they don't seem to make sense, since the backtrace is different on each crash. It looks like a concurrency problem. Any idea how to fix or work around this problem?