Random Segfaults #3453
I've been hitting a random segfault as well starting yesterday on dev. Some tips I've seen for debugging:
And also using gdb:
|
Initial output with gdb running: http://hastebin.com/uvegobuweg.pl |
Recent crash from tonight with faulthandler: |
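For readers who have not used it, `faulthandler` is a standard-library module that prints the Python traceback of every thread when the interpreter receives a fatal signal such as SIGSEGV. The sketch below shows one way to switch it on from code. The helper name `enable_crash_tracebacks` and the log path are illustrative assumptions, not something taken from Home Assistant.

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Turn on faulthandler for all threads.

    Hypothetical helper for illustration. If log_path is given, tracebacks
    are written to that file; otherwise they go to stderr. The returned file
    object (or None) must stay open for as long as faulthandler is enabled.
    """
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

The same effect is available without code changes by starting the interpreter with `python3 -X faulthandler` or by setting the `PYTHONFAULTHANDLER` environment variable.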
There are 47 threads running. I highlighted some things from the log that could be the cause. If we could get a log from another user, we could compare.
16 callbacks are waiting for the event loop (searched for ...).
After the segfault, the output from ping is printed to the command line, which means that a command line sensor was running.
The recorder is writing a query:
Verisure was updating your lock (search for ...).
RFXtrx is reading from the serial connection (search for ...).
If possible, could you turn the following off one by one to see if it stops the segfaulting: command line switch, rfxtrx, recorder, verisure |
Here's my config, but it doesn't look like we have much in the way of component overlap:
homeassistant: !include homeassistant.yaml
zone: !include zone.yaml
group: !include groups.yaml
scene: !include scenes.yaml
logbook:
frontend:
history:
discovery:
zeroconf:
sun:
http: !include http.yaml
mqtt: !include mqtt.yaml
device_tracker: !include device_trackers.yaml
sensor onlinedness: !include sensor-onlinedness.yaml
switch harmony hub: !include harmony-switches.yaml
switch template: !include template_switches.yaml
sensor forecast: !include forecast.yaml
sensor speedtest: !include speedtest.yaml
notify slack: !include notify-slack.yaml
light hue: !include hue.yaml
ecobee: !include ecobee.yaml
sleepiq: !include sleepiq.yaml
#zwave: !include zwave.yaml
input_boolean: !include input_booleans.yaml
emulated_hue: !include emulated_hue.yaml
vera: !include vera.yaml
script: !include_dir_named scripts
automation: !include_dir_merge_list automations
Home Assistant release (hass --version):
Python release (python3 --version):
Running on a Raspberry Pi 2 with Raspbian Jessie. I didn't see any segfaults yesterday, but I have hass running under gdb now. |
Crash from this night. Took longer this time. |
It's now been running without problems for a day and a half. |
If commandline stuff is to blame, we should port it over to use async stuff https://docs.python.org/3/library/asyncio-subprocess.html |
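As a rough illustration of what the linked asyncio subprocess API looks like, here is a minimal sketch that runs an external command without blocking the event loop. The function name `run_command` and the timeout default are assumptions made for this example; it is not the code Home Assistant adopted. It uses `async`/`await` and `asyncio.run`, which need a newer Python than the 3.4 releases discussed in this thread.

```python
import asyncio
import sys


async def run_command(*args, timeout=10.0):
    """Run a command asynchronously and return (exit code, stdout text).

    Hypothetical helper for illustration only.
    """
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # communicate() reads both pipes and waits for the process to exit.
        stdout, _stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # Do not leave a stuck child process behind.
        proc.kill()
        await proc.wait()
        raise
    return proc.returncode, stdout.decode().strip()


if __name__ == "__main__":
    code, out = asyncio.run(run_command(sys.executable, "-c", "print('pong')"))
    print(code, out)
```

Because the child process is awaited on the event loop, no extra worker thread is needed per command, which is the motivation given in the comment above.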
Two things were disabled. First Without command line stuff == crash |
Looks like I'm in the same boat, dmesg is showing me;
Closing #3484 as it's a dupe. |
@lwis : Which components are you using? |
mqtt Think I got all the components + platforms. |
@lwis : Could you try to disable the command_line component? |
@Danielhiversen sure, what's the thought behind why that would cause a segfault? |
No need to disable command line. It seems to be traced to any template stuff. |
Running |
Got a new somewhat different traceback: http://hastebin.com/ilepedefad.vbs |
Finally caught one in the act: https://gist.github.com/technicalpickles/23e097e213fcd4beb2c83c0e8cf7e06b It's at a gdb prompt. Anything I should grab while I have it? I tried |
If anyone with a segfault using the latest dev could |
@bbangert it looks like uvloop requires Python 3.5. As far as I can tell, there isn't a Raspbian Jessie package for it. I can build from source to test, but wouldn't requiring it be a pretty significant change? |
@technicalpickles awwww, bummer. Yes, it's not feasible to require it since it'd up the required Python version too far. |
@technicalpickles we wouldn't make it mandatory but it will help to be able to narrow down the seg faults to the default event loop implementation. |
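The "optional, not mandatory" idea can be sketched as follows: use uvloop's event loop when the package is installed and fall back to the default asyncio loop otherwise. The function name `new_diagnostic_loop` is an assumption for this example, and this is not the code that was used in Home Assistant.

```python
import asyncio


def new_diagnostic_loop():
    """Return (loop, implementation name).

    Hypothetical helper: prefers uvloop when it is importable, so a crash
    can be compared across the two event loop implementations.
    """
    try:
        import uvloop  # optional third-party dependency
    except ImportError:
        return asyncio.new_event_loop(), "asyncio"
    return uvloop.new_event_loop(), "uvloop"


if __name__ == "__main__":
    loop, name = new_diagnostic_loop()
    try:
        loop.run_until_complete(asyncio.sleep(0))
        print("event loop implementation:", name)
    finally:
        loop.close()
```

If segfaults disappear under one implementation and persist under the other, that points at the event loop rather than at a component.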
Time to get a better overview of the segfault data thus far:
|
I can uninstall uvloop if required. I'm running an Alpine Docker image on Ubuntu 16.04, also using an amd64 machine. |
Everyone with a segfault is using a component with the template platform? And disabling the template platform config bits makes the segfaults go away? |
I've not tried disabling my template configuration, but I'm happy to branch
|
I updated to 0.29.3, and left it running overnight with python 3.5.2 and no evloop. I disabled the nmap device tracker, and template sensors. I still had discovery and a template switch enabled. I had another segfault in the morning 😓 I'm trying to disable discovery and template switch next. |
Updated to latest today, and I still have random segfaults. I have disabled several things to try, but it makes no difference. Logs do not really indicate anything useful before a crash either. |
I don't believe anything has been done to improve the situation yet, there It's a shame that the occasional brief period of stability is frequently
|
I had a segfault overnight on 0.29.4. I've enabled saving core files here; let's see what GDB will tell us. |
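The comment does not say how core files were enabled. One possible way on a Unix system, sketched here as an assumption, is to raise the core-file size limit from inside the Python process with the standard `resource` module. The helper name `enable_core_dumps` is hypothetical.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Hypothetical helper, Unix only. The limit applies to this process and
    to children started afterwards. Returns the resulting (soft, hard) pair.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Whether a core file is actually written still depends on the hard limit and on system settings such as the kernel's core pattern.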
I have compiled Python with debug flags and executed it through gdb. I managed to get the following information, which may be useful for someone with a better knowledge of the Python garbage collector.
python3: Modules/gcmodule.c:364: update_refs: Assertion `_PyGCHead_REFS(gc) != 0' failed.
Line 364 of gcmodule.c is commented with:
EDIT: |
I haven't seen any more segfaults since disabling discovery and template switch. Is there a point at which it makes sense to revert the changes to the core? |
I've pushed a branch that removes a possible issue with Python GC of the Task objects. If someone that has a segfault happen could give it a try and let me know if the segfaults persist that'd be great. https://github.com/home-assistant/home-assistant/tree/fix/monkey-patch-asyncio |
I have run every 0.29 release and had not had a segfault until today, when I upgraded to 0.29.5; since then I have had 2. I am running 30.dev now. |
@mcradit if that still has segfaults, try my branch, which is based on the latest dev with one tweak to remove a possible GC issue. |
Removed most of my template sensors and moved to the core version of wunderground.py (was still using the original in custom components) but still getting segfaults. Running 0.29.5 on Python 3.4.2 Debian Jessie
|
@rpitera How long does it take before a seg-fault? Can you try my branch? |
I haven't had any yet. I didn't change anything in my config. I have a few
|
@bbangert It was up for at least 6-8 hours on 29.4, but only about 2-3 under 29.5. I've never used anything besides releases so you'd have to nursemaid me through testing your branch. |
@bbangert I just installed your branch and will let you know what I see. Here's something I've noticed though... I have an alias 'start' that I use all the time. It executes: "systemctl start home-assistant; journalctl -f -u home-assistant". As long as I keep my session open I do not see any issues and my home-assistant instance maintains. A couple minutes after my session ends or I CTRL-C out of the journalctl I lose HA... |
@rpitera - I used: pip3 install git+git://github.com/home-assistant/home-assistant.git@fix/monkey-patch-asyncio |
For the people experiencing segfaults, are you using the discovery component? |
Yes I was, will remove it and start again. |
@persandstrom please also use the patch by @bbangert |
@bbangert I'm testing your branch monkey-patch-asyncio now. I'll post the results tomorrow. |
@balloob Only disabling discovery did not help. Trying to apply patch now. |
I'm using discovery. @bbangert's branch has been going solid for me for 9 hours. |
Pretty serious issue.. there should be no code changes until this is fixed. |
@bbangert @balloob Good news!! No segfaults on my environment after running https://github.com/home-assistant/home-assistant/tree/fix/monkey-patch-asyncio Awesome!! 👍 |
I updated to use https://github.com/home-assistant/home-assistant/tree/fix/monkey-patch-asyncio yesterday afternoon, left it running over night, and no segfaults 🎉 |
Home Assistant release (hass --version): 0.29.0dev
Python release (python3 --version): 3.4.4
Component/platform:
asyncio
Description of problem:
Random segfaults. No pattern.
Expected:
Normal operation.
Problem-relevant configuration.yaml entries and steps to reproduce:
Traceback (if applicable):
http://hastebin.com/donuvejugu.pas
http://hastebin.com/ulicipebes.pas
http://hastebin.com/salopofaya.pas
http://hastebin.com/idereremil.pas
http://hastebin.com/jivipugibe.pas
http://hastebin.com/ixadobikul.pas
http://hastebin.com/qalozikawe.pas
http://hastebin.com/reduwuvewi.pas
Additional info:
Components and platforms:
device_tracker: ddwrt
notify: pushbullet
zwave: climate, switch, sensor, binary_sensor
rfxtrx: switch, light, sensor
verisure: alarm, sensor, switch, lock
media_player: cast
camera: ffmpeg
thermostat: heat_control
climate: generic_thermostat
switch: rfxtrx, zwave, command_line, template
light: rfxtrx
sensor: rfxtrx, template, command_line, systemmonitor, yr
Automations and scripts as well.
OS: ubuntu 16.04 64bit