Major Z-Wave bug since 0.33 · Issue #4867 · home-assistant/core · GitHub

Major Z-Wave bug since 0.33 #4867


Closed
jsg4 opened this issue Dec 12, 2016 · 88 comments · Fixed by #5957 or #5961

Comments

@jsg4
jsg4 commented Dec 12, 2016

I am seeing Z-Wave states reported incorrectly for all types of devices since 0.33. Additionally, when HA turns on a light switch, for instance as part of a scene, the new state is sometimes not even reflected in the front end. And if I manually turn on a light switch or operate a lock, the state sometimes does not change in HA.

As soon as I went back to 0.32.3, this issue was 100% fixed.

Home Assistant release (hass --version):

0.33

Component/platform:

Z-wave

@arsaboo
Contributor
arsaboo commented Dec 12, 2016

I think your issue was resolved in 0.33.1. Try updating to the most recent release and see if the issue persists.

@jsg4
Author
jsg4 commented Dec 12, 2016

Thanks, but it was not fixed. I have tried the latest release and had to downgrade. Here are other reports of this same Z-Wave problem.

https://community.home-assistant.io/t/z-wave-devices-stop-reporting-to-hass/6636/4

@jsg4
Author
jsg4 commented Dec 13, 2016

Not fixed in 0.34.5 either.

@Cinntax
Contributor
Cinntax commented Dec 14, 2016

I'm also seeing this. It looks like we see an event in the log for the turn_on service call, but no subsequent state change event. So in the UI the switch appears to turn on momentarily (the light correctly turns on too), and then the switch reverts. I'll try to get some logs later tonight. I'm running the latest version, 0.34.5.

@mikenolet
mikenolet commented Dec 14, 2016

I am having the same issue. It seems to affect only light entities, not switches. Everything works in 0.32.4, but any later release is a no-go for me. It also happens with switches from various manufacturers. It seems I have to turn a light on, then off, then back on for its state to be reflected properly.

@Cinntax
Contributor
Cinntax commented Dec 14, 2016

Oh, good observation. I have both switches and lights, so I'll try both.

@Cinntax
Contributor
Cinntax commented Dec 14, 2016

Yeah, I just tried this too, and I tend to agree that the light controls definitely seem to be part of the problem. If I just touch the Z-Wave switches, they seem solid. For lights, however, we get into this "limbo" state where the light control is not updating. While in this state, it does seem to affect the switches as well. It's almost as if the Z-Wave updates get queued up, and the switches don't update until the light gets synced up again.

@jsg4
Author
jsg4 commented Dec 14, 2016

One thing I see, though, is that states don't update even outside of the UI. If I turn on a switch or lock a lock, HA sometimes still does not reflect that state after the device has been polled.

@mikenolet

Is everybody using the Aeotec controller? I'm wondering why this isn't more widespread. Maybe it's isolated to that controller.

@jsg4
Author
jsg4 commented Dec 14, 2016

Yes, Aeotec Z-Stick for me.

@happyleavesaoc
Contributor

I'm also experiencing this, but I need to quantify further which actions result in this behavior. I also have an Aeotec Z-Stick Gen 5.

@Cinntax
Contributor
Cinntax commented Dec 14, 2016

I also have an Aeotec Z-Stick Gen 5.

@robbiet480
Member

I'm also having this problem now. Haven't tried a restart yet. @turbokongen can you take a look when you have time?

@keatontaylor
Contributor

This is almost certainly related to async changes and the z-wave polling interval.

@Cinntax
Contributor
Cinntax commented Dec 15, 2016

OK, so taking a closer look at how the Z-Wave library and HASS work together, it's very similar to what I did with the envisalink component. In that case, I had a polling mechanism built into the pyenvisalink library, and used pydispatch to update HASS on a poll timer plus any live events.

When the async switch happened, I had to update component states using the following:
self.hass.async_add_job(self.async_update_ha_state()), instead of just self.update_ha_state (which zwave is still using). This ensured the state updates were properly scheduled on the event loop.

In the zwave case, it still appears to use sub-threads, so states are oddly being updated both through the event loop AND from a sub-thread.

It looks like there's a new helper method called schedule_update_ha_state, which is very similar to the call I referenced above. Perhaps that is what the zwave component needs to use.
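A minimal asyncio sketch of the pattern described above: a thread that is not the event loop (standing in for the Z-Wave library's callback thread) hands each state update to the loop instead of writing state itself. HypotheticalEntity is an illustrative toy that only borrows HA's method names; it is not Home Assistant's Entity class, and the hand-off uses the standard-library asyncio.run_coroutine_threadsafe rather than HA's own helpers.

```python
import asyncio
import threading


class HypotheticalEntity:
    """Toy stand-in for an entity whose state must only change on the event loop."""

    def __init__(self, loop):
        self.loop = loop
        self.state = "off"
        self.history = []

    async def async_update_ha_state(self, new_state):
        # Runs on the event loop thread, so state writes never race each other.
        self.state = new_state
        self.history.append(new_state)

    def schedule_update_ha_state(self, new_state):
        # Safe to call from any thread: queues the coroutine on the loop and
        # returns at once without waiting for it (fire-and-forget).
        asyncio.run_coroutine_threadsafe(
            self.async_update_ha_state(new_state), self.loop
        )


async def main():
    loop = asyncio.get_running_loop()
    entity = HypotheticalEntity(loop)

    def zwave_callback_thread():
        # Simulates the library's own thread reporting value changes in order.
        for value in ("on", "off", "on"):
            entity.schedule_update_ha_state(value)

    worker = threading.Thread(target=zwave_callback_thread)
    worker.start()
    # Wait for the worker without blocking the loop, then let queued updates run.
    await loop.run_in_executor(None, worker.join)
    for _ in range(5):
        await asyncio.sleep(0)
    return entity


entity = asyncio.run(main())
print(entity.state, entity.history)  # on ['on', 'off', 'on']
```

Because every write happens on the loop, updates are applied in the order the thread reported them, which is the property the mixed thread/event-loop updating lacks.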

@kirichkov
Contributor
kirichkov commented Dec 19, 2016

I think I have the same issue. I set up a generic_thermostat that controls a Z-Wave switch, and the UI doesn't change when the thermostat turns the switch on.
I'm also using the Aeotec Z-Stick Gen 5.

@kirichkov
Contributor

I also noticed that after some time, one of the HASS threads starts using 100% of one of my RPi2's cores, and it looks as if it's hung. I attached gdb and saw that the executing code differed between breaks, so it's probably not stuck, but the usage never drops from 100% unless I restart. At one point the thread was calling into the OpenZWave library.

@happyleavesaoc
Contributor
happyleavesaoc commented Dec 24, 2016

I'm on the latest dev version. I think the z-wave issues are getting worse. Sometimes I'll flip switches and nothing will happen. Then, when I turn off HASS, suddenly all those actions will happen at once and I get a bunch of async+zwave stacktraces. Unfortunately I don't know enough about async, or zwave, to properly fix this.

@justinglow

Also having this problem. I'm running 0.35.2 on a Raspberry Pi 3 (AIO install) with an Aeotec Z-Stick 5. HASS does not show state changes for GE Z-Wave dimmer switches: if an automation or some external controller (like Alexa) changes the state, the HASS dashboard does not reflect the change. If I change the state manually from the dashboard, it reflects the proper state as expected.

@ghost
ghost commented Jan 2, 2017

Just some feedback from me....

I'm not seeing any of the issues reported here, and I have the Fibaro Dimmer 2 and the Everspring AN157/AN158, which I use with the Aeotec Z-Stick Gen 5.

I guess this issue is related to the type of devices being used.

@turbokongen
Contributor

Someone should test with the latest dev code to see if this is resolved.
If not, follow the instructions in #5143,
check whether the value did change, and compare that with OZW_log.txt.

@happyleavesaoc
Contributor
happyleavesaoc commented Jan 4, 2017

I tested it. I have a HomeSeer WD100 switch.
If I toggle via UI: perfect and instant toggle of physical switch
If I turn off via switch: UI usually updates
If I turn on via switch: UI never updates

The more I toggle a given physical switch, the less likely the UI actually updates.

@Cinntax
Contributor
Cinntax commented Jan 4, 2017

I'll give it a try later tonight. As for the UI not updating when the switch is changed physically: on my dimmers, I had to set a special configuration parameter so that they advertise all changes to the controller. By default, they only synced up with the switches in their immediate group (the accessory dimmer).

@Cinntax
Contributor
Cinntax commented Jan 4, 2017

Okay, I just re-pulled dev and tried again. It didn't seem to solve the issue. HOWEVER, in my case, the more I look at it, the more I think it's just my Z-Wave controller falling behind:
1. Any physical change is represented just fine in the UI (including on/off/dim).
2. If I make rapid changes in HASS, my Z-Wave network seems to "lag" behind (with the UI also lagging).

I've attached snippets from both the HASS log and the OpenZWave log.
In this scenario, my lights were already ON, and I turned them OFF using the HASS UI. The lights went off immediately, but the UI was temporarily incorrect.
I think I'm just seeing a delay in reports back from the network. Both logs show a good 10-second delay between the service call and the confirmation.

Is it at all possible that older versions "assumed" the state of the switch until told otherwise?

ozwlog.txt
hasslog.txt

@partofthething
Contributor

My report: on the latest dev branch, Z-Wave devices are more responsive than ever. The GUI updates when I toggle my GE switches physically, and the lights switch immediately when I toggle them in the GUI. Happy so far. I have an Aeotec Z-Stick on a Raspberry Pi 2.

@turbokongen
Contributor

@Cinntax Timeouts are problems on the Z-Wave network, not in HA.
The following can help:
Delete the zwcfg_[home_id].xml file and let all devices be rediscovered by the network.
This helps if the configuration or other settings of a device have changed.
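A small Python sketch of that step that renames the file to a .bak copy instead of deleting it outright, so it can be restored if needed. It assumes the file sits directly in the HA config directory (the .homeassistant folder mentioned later in this thread) and that Home Assistant is stopped first; backup_zwcfg is a hypothetical helper, and the demo runs in a throwaway directory with a made-up home id.

```python
import tempfile
from pathlib import Path


def backup_zwcfg(config_dir):
    """Rename each zwcfg_*.xml in config_dir to *.xml.bak; return the new paths.

    Assumes Home Assistant is stopped, so nothing rewrites the file meanwhile.
    On the next start the devices have to be rediscovered by the network.
    """
    moved = []
    for cfg in sorted(Path(config_dir).glob("zwcfg_*.xml")):
        target = cfg.with_name(cfg.name + ".bak")
        cfg.rename(target)
        moved.append(target)
    return moved


# Demo in a throwaway directory; the home id in the file name is made up.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "zwcfg_0xdeadbeef.xml").write_text("<Driver/>")
    names = [p.name for p in backup_zwcfg(tmp)]
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(names, remaining)
```

On a real install the call would be something like backup_zwcfg(Path.home() / ".homeassistant"), adjusted to wherever your configuration actually lives.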

You can tweak the OpenZWave settings by adjusting timings in the options.xml file in the OpenZWave config dir.
Also, have a look here for how to troubleshoot the Z-Wave network:
http://www.openzwave.com/knowledge-base/

Also, once everything is rediscovered, run a network heal and look for timeouts or errors during the heal. That will give you a hint about which nodes are problematic.

Previously the UI waited 0.5 s before assuming a command had failed; now it waits 2 s, so that should only make things better. A 10-second delay is a lot on a Z-Wave network. A healthy network should give a callback within 1-4 s, depending on its size.

If anybody needs help decoding the OZW log, PM me and I will have a look at it, give some hints, and try to explain.

@turbokongen
Contributor
turbokongen commented Jan 5, 2017

Some more tips:
If it is only light.xxxxx devices you have problems with, add the following to the config:

zwave:
  usb_path: /dev/ttyACM0
  config_path: /srv/hass/src/python-openzwave/openzwave/config
  polling_interval: 30000
  customize:
    light.dragon_tech_in_wall_dimmer_level_31_0:
      refresh_value: true
    light.dragon_tech_in_wall_dimmer_level_32_0:
      refresh_value: true
.....

If that does not do the trick, or only nearly does, also add a delay:

zwave:
  usb_path: /dev/ttyACM0
  config_path: /srv/hass/src/python-openzwave/openzwave/config
  polling_interval: 30000
  customize:
    light.dragon_tech_in_wall_dimmer_level_31_0:
      refresh_value: true
      delay: 5
    light.dragon_tech_in_wall_dimmer_level_32_0:
      refresh_value: true
      delay: 5
....

It is important that these are set under customize in the zwave section, NOT in the homeassistant section.
If these tips fix your problems, please report back. :)

@Cinntax
Contributor
Cinntax commented Jan 6, 2017

Thank you very much! I'll give it a shot and let you know! I definitely understand that the real problem, at least in my case, resides outside of hass, but it looks like these may help at least make it less noticeable.

@Cinntax
Contributor
Cinntax commented Jan 17, 2017

So I went ahead and deleted my zwcfg_*.xml file in the .homeassistant folder, and yes, that did seem to make everything FAR more robust. I know that over the last few months I've added a few switches, changed OZW versions, etc., so I suppose in my case the network just wasn't operating as efficiently as it should have been.

I can see how your other fixes would also help in the event that we're not getting timely callbacks, but I'll hold off on those changes unless I feel they're needed. Thank you!

@lessthanjoey
lessthanjoey commented May 4, 2017 via email

@dmourati

Same issue for me on 0.44.1. Controlling Aeotec ZW096 smart switches from Home Assistant on my Raspberry Pi 3 with a Z-Stick seems laggy/unresponsive. I notice that after a few iterations, the switch connectivity "wakes up" and becomes more responsive.

@turbokongen
Contributor

So this issue seems very random. Let's try to find out where the problem is,
starting with the network itself.
First: when HA starts, the Z-Wave network starts too, and it will barely respond to any state changes from HA.
Only after the zwave.network_ready event has been fired will Z-Wave respond properly to commands. That is just how OZW starts up.
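One way to picture that startup constraint is a gate that holds commands until a ready signal arrives. This is a plain asyncio sketch, not how HA or OZW actually implement it: HypotheticalZWaveGate, on_network_ready, and the command string are invented for illustration, and in a real setup the trigger would be the zwave.network_ready event.

```python
import asyncio


class HypotheticalZWaveGate:
    """Toy gate that makes callers wait until the network reports ready."""

    def __init__(self):
        self._ready = asyncio.Event()
        self.sent = []

    def on_network_ready(self):
        # Stand-in for reacting to the zwave.network_ready event.
        self._ready.set()

    async def send(self, command):
        # Commands issued before the network is ready simply wait here.
        await self._ready.wait()
        self.sent.append(command)


async def main():
    gate = HypotheticalZWaveGate()
    pending = asyncio.create_task(gate.send("turn_on node 3"))
    await asyncio.sleep(0)    # let the task start and block on the event
    before = list(gate.sent)  # nothing has gone out yet
    gate.on_network_ready()
    await pending
    return before, gate.sent


before, after = asyncio.run(main())
print(before, after)  # [] ['turn_on node 3']
```

The point of the sketch is only the ordering: anything sent before the ready signal is held rather than dropped.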

Check the node entity states for the problematic devices.
Look at the RTT times:
[Two screenshots of RTT times]

My Aeotec switch node has response times (round-trip time, RTT) of 46 ms and 29 ms.
The Qubino, which is not in direct contact with the controller and is the node furthest away in my network (14 m outdoors), still has 158 ms and 78 ms. That is a healthy network.
A note on RTT times: devices can be slow internally and still behave correctly.
Higher RTT means problems communicating with the node; that may be RF interference, device errors, or even encryption errors. A guide to finding errors in the OZW log can be found here: http://www.openzwave.com/knowledge-base/
It lists the message IDs used in the log and explains what they mean.
That is a start.

@jsg4
Author
jsg4 commented May 19, 2017

The issue I see applies to every single Z-Wave light, switch, and lock, so I am not sure what looking at specific device information is going to solve. I have also tried refresh_value: true with every delay setting from 1 to 5, and it does not fix the issue. In the front end, the switches (not just lights) keep flipping back after I change the state, and only afterwards update to the proper state. Anything else to try?

@turbokongen
Contributor

Have a look at my post again. This is not specific to any type of device. The RTT is how long it takes the node to respond back to the controller after a command. So if it takes longer than 2 s (2000 ms), the state will flip back. Check the RTT times.

@jsg4
Author
jsg4 commented May 19, 2017

Why did this start happening with 0.33?

@turbokongen
Contributor

No idea. Reviewing the changes made to zwave from 0.32.4 to 0.33.4, there are only minor ones: we added the option to configure the value-refresh delay for lights, some refactoring of the climate component, and a bugfix to the cover component.
So if we are going to figure out why this only happens to some and not others, we need to look at the basics first.

@CrossfireCurt
CrossfireCurt commented May 20, 2017 via email

@turbokongen
Contributor

Prior to 0.33, the refresh defaulted to 2 s for ALL devices except the workaround devices, which had a 5 s refresh; none of this was configurable. It caused problems with devices that support instant notifications, flooding the logs with refreshed values. So we made the refresh opt-in, meaning that NO device refreshes values by default. If you opt in with refresh_value: true in the config, the delay defaults to 2 s. If you also specify the delay option, it will wait that many seconds instead.
This option is only available for lights.
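The opt-in rule just described, written out as a tiny Python function. It sketches the described behaviour and is not the zwave light platform's code: refresh_delay and DEFAULT_REFRESH_DELAY are hypothetical names, the 2-second default comes from the comment above, and the two dimmer entity ids are taken from the earlier config examples (light.some_other_light is invented to show the no-options case).

```python
from typing import Optional

# Hypothetical constant mirroring the "default 2 secs" described above.
DEFAULT_REFRESH_DELAY = 2


def refresh_delay(node_options: dict) -> Optional[float]:
    """Seconds to wait before re-reading a light's value, or None for no refresh.

    No refresh unless refresh_value is true (the post-0.33 default);
    once opted in, 'delay' overrides the default wait.
    """
    if not node_options.get("refresh_value", False):
        return None
    return node_options.get("delay", DEFAULT_REFRESH_DELAY)


customize = {
    "light.dragon_tech_in_wall_dimmer_level_31_0": {"refresh_value": True},
    "light.dragon_tech_in_wall_dimmer_level_32_0": {"refresh_value": True, "delay": 5},
    "light.some_other_light": {},  # hypothetical entity with no options
}
for entity_id, opts in customize.items():
    print(entity_id, refresh_delay(opts))
```

Note that setting delay without refresh_value: true has no effect under this rule, which matches the advice elsewhere in the thread to set refresh_value first.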

@jsg4
Author
jsg4 commented May 20, 2017

Here are examples of two lights that are problematic in the front end, plus the lock. Please let me know what to try so these don't flip back and forth. Thanks, turbo.

{ "is_info_received": true, "sentFailed": 0, "retries": 0, "capabilities": [ "beaming", "routing", "listening" ], "receivedUnsolicited": 0, "manufacturer_name": "Leviton", "max_baud_rate": 40000, "averageRequestRTT": 32, "sentTS": "2017-05-20 09:33:15:803 ", "lastRequestRTT": 24, "lastResponseRTT": 34, "receivedCnt": 13, "is_zwave_plus": false, "friendly_name": "Foyer Lights", "node_id": 3, "receivedTS": "2017-05-20 09:33:15:838 ", "is_awake": true, "neighbors": [ 1, 2, 129, 4, 5, 6, 7, 130, 9, 10, 131, 133, 134, 139, 140, 16, 17, 141, 19, 20, 21, 23, 24, 25, 26, 27, 29, 30, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 56, 59, 60, 62, 64, 65, 69, 70, 74, 75, 76, 77, 78, 79, 81, 82, 83, 91, 94, 96, 99, 112, 113, 114, 116 ], "sentCnt": 23, "is_failed": false, "receivedDups": 0, "is_ready": true, "product_name": "Unknown: type=0401, id=0334", "averageResponseRTT": 34, "query_stage": "Complete" }

And

{ "is_info_received": true, "sentFailed": 0, "retries": 0, "capabilities": [ "beaming", "routing", "listening" ], "receivedUnsolicited": 0, "manufacturer_name": "Leviton", "max_baud_rate": 40000, "averageRequestRTT": 67, "sentTS": "2017-05-20 05:57:18:645 ", "lastRequestRTT": 27, "lastResponseRTT": 36, "receivedCnt": 16, "is_zwave_plus": false, "friendly_name": "Gate Lights", "node_id": 20, "receivedTS": "2017-05-20 05:57:18:682 ", "is_awake": true, "neighbors": [ 128, 1, 2, 3, 4, 5, 6, 7, 129, 9, 10, 130, 131, 133, 134, 139, 140, 141, 21, 23, 24, 25, 26, 27, 29, 30, 32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 56, 59, 60, 65, 69, 74, 75, 76, 77, 78, 79, 81, 83, 91, 96, 99, 100, 112, 113, 114 ], "sentCnt": 25, "is_failed": false, "receivedDups": 3, "is_ready": true, "product_name": "VRMX1-1LZ Multilevel Scene Switch", "averageResponseRTT": 75, "query_stage": "Complete" }

{ "battery_level": 100, "is_info_received": true, "sentFailed": 13, "retries": 3, "capabilities": [ "beaming", "routing", "frequent" ], "receivedUnsolicited": 6, "manufacturer_name": "Assa Abloy", "max_baud_rate": 40000, "averageRequestRTT": 1887, "sentTS": "2017-05-20 09:35:10:847 ", "lastRequestRTT": 2141, "lastResponseRTT": 2233, "receivedCnt": 335, "is_zwave_plus": true, "friendly_name": "Mudroom Lock", "node_id": 119, "receivedTS": "2017-05-20 09:35:13:079 ", "is_awake": true, "neighbors": [ 128, 129, 2, 4, 5, 132, 133, 9, 10, 141, 16, 17, 19, 23, 24, 25, 27, 32, 37, 38, 39, 40, 41, 42, 59, 64, 75, 81, 82, 99 ], "sentCnt": 348, "is_failed": false, "receivedDups": 6, "is_ready": true, "product_name": "Unknown: type=8002, id=1600", "averageResponseRTT": 2252, "query_stage": "Complete" }
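As a cross-check on these dumps, the gap between each node's sentTS and receivedTS can be computed directly. The timestamps put a colon before the milliseconds, so a small custom parser is needed; parse_ozw_ts and last_exchange_ms are hypothetical helpers. For these three nodes the result lands within about a millisecond of the reported lastResponseRTT (34, 36 and 2233).

```python
from datetime import datetime, timedelta


def parse_ozw_ts(ts: str) -> datetime:
    """Parse stamps like '2017-05-20 09:33:15:803 ' (milliseconds after a colon)."""
    base, millis = ts.strip().rsplit(":", 1)
    parsed = datetime.strptime(base, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(microsecond=int(millis) * 1000)


def last_exchange_ms(node: dict) -> int:
    """Whole milliseconds from sentTS to receivedTS for one node's attributes."""
    delta = parse_ozw_ts(node["receivedTS"]) - parse_ozw_ts(node["sentTS"])
    return delta // timedelta(milliseconds=1)


# sentTS/receivedTS copied from the attribute dumps above.
nodes = {
    "Foyer Lights": {"sentTS": "2017-05-20 09:33:15:803 ",
                     "receivedTS": "2017-05-20 09:33:15:838 "},
    "Gate Lights": {"sentTS": "2017-05-20 05:57:18:645 ",
                    "receivedTS": "2017-05-20 05:57:18:682 "},
    "Mudroom Lock": {"sentTS": "2017-05-20 09:35:10:847 ",
                     "receivedTS": "2017-05-20 09:35:13:079 "},
}
for name, node in nodes.items():
    print(name, last_exchange_ms(node))  # 35, 37 and 2232 ms respectively
```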

@turbokongen
Contributor

@jsg4 In your examples:
Foyer Lights and Gate Lights should be fixed by setting delay: 1 in the config (remember to set refresh_value: true as well). They have low RTT times.
There is no cure for the Assa Abloy one: its average RTT is more than 2 s.
2 s is the time HA waits for a confirmed state change before it resets to the old state, so this lock will always flip back and forth.
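The rule of thumb above reduces to comparing a node's average response RTT with the roughly 2-second confirmation window. A hedged Python sketch using the averageResponseRTT values posted earlier; will_flip_back and CONFIRM_WINDOW_MS are illustrative names, and the 2000 ms window is taken from this thread rather than read from HA's source.

```python
# Assumed from this thread: HA waits about 2 s for a confirmed state change.
CONFIRM_WINDOW_MS = 2000


def will_flip_back(node: dict, window_ms: int = CONFIRM_WINDOW_MS) -> bool:
    """True if the node typically answers more slowly than HA's wait window."""
    return node["averageResponseRTT"] > window_ms


# averageResponseRTT values copied from the attribute dumps above.
nodes = [
    {"friendly_name": "Foyer Lights", "averageResponseRTT": 34},
    {"friendly_name": "Gate Lights", "averageResponseRTT": 75},
    {"friendly_name": "Mudroom Lock", "averageResponseRTT": 2252},
]
slow = [n["friendly_name"] for n in nodes if will_flip_back(n)]
print(slow)  # ['Mudroom Lock']
```

An average is only a rough guide: a node averaging just under the window can still flip back on individual slow exchanges.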

@jsg4
Author
jsg4 commented May 20, 2017

Trying that now. My point, though, is that before 0.33 even the lock did not flip back and forth in the front end; this issue never happened, for however long Z-Wave has been in HA, until 0.33.

@turbokongen
Contributor
turbokongen commented May 20, 2017

I'm not sure when, but before the 2sec wait change we only had 1sec, so that would be even worse.

@jsg4
Author
jsg4 commented May 20, 2017

Didn't work. Here's a screenshot and a video.

[Screenshot]

Light.zip

@turbokongen
Contributor

But the problem is only the back and forth flipping of switches? It will display correctly after a short period of time?

@jsg4
Author
jsg4 commented May 20, 2017

Correct, it always shows the correct state after that, so it's just a visual front-end issue. It is annoying when using something like Homebridge, though, because you receive alerts like "door is unlocked" and then, 5 seconds later, "door is locked".

@turbokongen
Contributor

cc @pvizeli @balloob @robbiet480 Any of you with suggestions?

@parneli
parneli commented May 21, 2017

My situation does not fix itself until I restart the Z-Wave services. The state doesn't update in HASS until you send another command to the device; until then, HASS still shows the last commanded state. Power cycling the switch hasn't helped.

I'm trying to add refresh_value with a delay of 1, but it says the config is invalid:


zwave:
  usb_path: /dev/ttyACM0
  polling_interval: 30000  
  config_path: /srv/homeassistant/src/python-openzwave/openzwave/config
  customize:
    switch.aeotec_zw096_smart_switch_6_3:
      refresh_value: true
      delay: 5  

Happy to try any suggestions. As stated above, you can see the Z-Wave log updating, so it's not the Z-Wave network; my RTT is about 36 ms and the network looks fine.

@parneli
parneli commented May 28, 2017

I seem to have resolved my issues with a combination of upgrading to the latest HASS (0.45.1) and OpenZWave version 1.4.2508, and also changing the above config from customize to device_config:
device_config:
  switch.coffee_switch_4_0:
    refresh_value: true
    polling_intensity: 1
  switch.aeotec_zw130_wallmote_quad_switch_2_0:
    refresh_value: true
    polling_intensity: 2

@emlove
Contributor
emlove commented Jul 14, 2017

It looks like the big change from 0.32 to 0.33 was that we switched to asynchronously firing off the state change instead of blocking. 0.32.3...0.33#diff-758847d0dd4b50b523c035461d973f60R99

update_ha_state is no longer supported, but if someone wants to try replacing schedule_update_ha_state with the lines here https://github.com/home-assistant/home-assistant/blob/0.32.3/homeassistant/helpers/entity.py#L186-L188 we could see if that is what actually caused the problem. force_refresh isn't necessary. Note that this isn't a correct fix, since we don't want to block the zwave thread while waiting for hass core to update the state.
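To make that difference concrete, here is a small asyncio sketch (not HA's code) contrasting two ways a worker thread can hand a state update to the event loop: waiting on the Future returned by asyncio.run_coroutine_threadsafe, which blocks the thread until the loop has applied the update (roughly the blocking behaviour described for 0.32), versus returning immediately (fire-and-forget). The function names are hypothetical, and as noted above, blocking is not the desired fix, because it stalls the calling thread.

```python
import asyncio
import threading


async def async_update_state(store, value):
    # Stand-in for the state-machine write; the short sleep mimics loop work.
    await asyncio.sleep(0.01)
    store.append(value)


def report_from_thread(loop, store, seen, blocking):
    future = asyncio.run_coroutine_threadsafe(async_update_state(store, "on"), loop)
    if blocking:
        future.result()  # the thread waits until the loop has applied the update
    # Record what the reporting thread can observe when it carries on.
    seen.append(list(store))


async def main(blocking):
    loop = asyncio.get_running_loop()
    store, seen = [], []
    worker = threading.Thread(
        target=report_from_thread, args=(loop, store, seen, blocking)
    )
    worker.start()
    await loop.run_in_executor(None, worker.join)
    await asyncio.sleep(0.05)  # let a fire-and-forget update finish too
    return seen[0], store


print(asyncio.run(main(blocking=True)))  # (['on'], ['on'])
print(asyncio.run(main(blocking=False)))
```

In the blocking case the thread always sees the update already applied. In the fire-and-forget case the thread usually moves on before the update lands, although the state still ends up correct once the loop runs it.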

@nriley
nriley commented Jul 14, 2017

@armills Thanks for investigating — your description does seem to track pretty well with my observed behavior. Unfortunately, the Z-Wave set up I have access to is nowhere near me for months at a time, so I have limited ability to test this myself. I hope that some other folks who are experiencing this problem might be able to step up and help.

@balloobbot

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.

Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍

@nriley
nriley commented Nov 19, 2017

For anyone else who stumbles across this issue, it should be fixed by #9430 — which is currently blocked on OpenZWave/open-zwave#1352. If it's still not updated by the end of the year I should at least be able to do some testing when I'm physically present at the house where I set up HA.

@lifeisafractal
Contributor

@nriley, I have tested #9430 with the OpenZWave/open-zwave#1352 fix in place as well, and it does fix the issues described above (with a HomeSeer HS-WD100+). Unfortunately, there hasn't been any feedback on the open-zwave PR, positive or negative, so it's hard to say when it will get merged. I'll try bumping the open-zwave PR to see if there is anyone in particular to talk to about it.

@home-assistant home-assistant locked and limited conversation to collaborators Mar 3, 2018