Large zwave network settings

rafale77 · July 26, 2019, 5:34pm

I never stop learning with this thing… sigh

Just wanted to share some settings I uncovered over the past couple of days which may help someone looking for this. The zwave network can be a source of instability to the entire vera engine and can cause luup reloads when you expect it least. I have been going through my logs to try to understand some of the sources and found a few pointers.

Polling: On a large network, you should try to disable polling by default both on the zwave menu as a general polling setting but also on each of the devices by setting the PollSettings variable to 0.

a. I recommend doing it via luup code and not the vera UI because… for some odd reason the luup engine wants to reconfigure the device when you change this variable from the Device Settings menu. It is purely a controller setting and does not need any communication to the device to change. You just have to do a luup reload after running the code.

The example below disables polling on the non battery operated devices (those with no BatteryLevel variable). You can target the battery operated devices instead by changing the condition to bat ~= nil.

  -- Disable polling on Z-Wave devices that do not report a battery level
  for k, v in pairs(luup.devices) do
    local var = luup.variable_get("urn:micasaverde-com:serviceId:ZWaveDevice1", "PollSettings", k)
    local bat = luup.variable_get("urn:micasaverde-com:serviceId:HaDevice1", "BatteryLevel", k)
    -- device_num_parent == 1 selects the children of device 1
    if var ~= nil and v.device_num_parent == 1 and bat == nil then
      -- variable_get returns a string, so compare against "0" and not 0
      if var ~= "0" then
        luup.variable_set("urn:micasaverde-com:serviceId:ZWaveDevice1", "PollSettings", "0", k)
        luup.variable_set("urn:micasaverde-com:serviceId:ZWaveDevice1", "PollNoReply", "0", k)
      end
    end
  end

b. Oddly some devices did not even have this variable and are polling in the background anyway. It was the case for me for all my EverSpring sensors. I kept on getting polling list full messages in my logs. I ran that code above to target these devices and create the variable, setting it to 0.

c. Note that in practice, the vera is not polling the battery operated devices which are asleep. However, if you do not set this variable to 0, the vera will keep a polling list record in its memory, which really is unnecessary. These devices get polled in any case whenever they wake up, as it is part of the wakeup function.

I have very few devices benefiting from enabling polling by the vera: devices which don’t wake up (FLiRS and AC powered) and which have something to update (FLiRS battery levels, actuations from something other than the vera).
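
A minimal sketch of a variation on the snippet above, assuming you also want to cover the battery operated devices (item c) and the devices that lack the PollSettings variable altogether (item b). The function name `disable_polling` and its `opts` table are hypothetical and only here for illustration; the only Luup calls used are `luup.devices`, `luup.variable_get` and `luup.variable_set`, as in the original snippet. The `luup` table is passed in as an argument so the logic can be tried against a stub before touching the controller.

```lua
local ZW_SID = "urn:micasaverde-com:serviceId:ZWaveDevice1"
local HA_SID = "urn:micasaverde-com:serviceId:HaDevice1"

-- Hypothetical helper, not part of Luup.
-- opts.battery: true = battery devices only, false = non battery only, nil = both
-- opts.create_missing: also create PollSettings where it does not exist yet
-- Returns the sorted list of device numbers that were changed.
local function disable_polling(luup, opts)
  opts = opts or {}
  local changed = {}
  for k, v in pairs(luup.devices) do
    -- same filter as the original snippet: children of device 1
    if v.device_num_parent == 1 then
      local var = luup.variable_get(ZW_SID, "PollSettings", k)
      local has_battery = luup.variable_get(HA_SID, "BatteryLevel", k) ~= nil
      local wanted = opts.battery == nil or opts.battery == has_battery
      local missing = (var == nil)
      -- variable values are strings, hence the comparison with "0"
      if wanted and ((missing and opts.create_missing) or (not missing and var ~= "0")) then
        luup.variable_set(ZW_SID, "PollSettings", "0", k)
        luup.variable_set(ZW_SID, "PollNoReply", "0", k)
        changed[#changed + 1] = k
      end
    end
  end
  table.sort(changed)
  return changed
end
```

On the controller this would be called as `disable_polling(luup, { battery = true, create_missing = true })`, followed by a luup reload. Creating the variable on every child of device 1 is a blunt filter, so it is probably worth looking at the returned list before relying on it.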

Lifeline Associations: Most devices support associations, and the associations are organized into groups. You can do many things with them, including making one device actuate another without a controller in between, but the problem I encountered was with the Lifeline. It is generally the 1st association group and is used by the device to report its status updates to the controller. The vera does not display or control this very well, and at some point I added a secondary controller, the zway, as I was considering migrating to that platform. Well, as it turns out, some devices can have multiple controllers in their lifeline association group, and for me it caused a mess on my network.

a. I excluded the zway device from my network sometime last week thinking I did not need it anymore. I suspect that the devices which still had it in their lifeline association group kept sending messages to it, causing errant frames on my network and causing the vera to reload luup frequently with callback processes hanging.

b. Even after I removed these rogue associations (I re-included zway as a secondary controller) I still had some “tardy” messages, but with very low delays. I suspect this was because some of the devices sent their updates to only one of the controllers on their lifeline list and called it good, leaving the vera still waiting for a response to its comm requests… I ended up deleting all the zway associations. No more tardiness in my network.

Got CAN: Credit to @sm2117 on this one, who noticed the frequent appearance of this message. It is another factor affecting speed on a large network and is mitigated by disabling instant status. I have not disabled mine and am still getting these zwave dongle freezes of 0.1s (although the log says 1s) and am living ok with it.
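
To get a rough idea of how often this message shows up, a small sketch along these lines could count the matching log lines. The helper name `count_phrase` is hypothetical, the match is a plain case-insensitive substring search because the exact wording in the log is taken from this post and not verified, and the log path in the usage note is an assumption about where your unit keeps LuaUPnP.log.

```lua
-- Hypothetical helper: count lines containing a literal phrase, ignoring case.
-- `lines` is any iterator of strings, for example the one returned by io.lines(path).
local function count_phrase(lines, phrase)
  local needle = phrase:lower()
  local n = 0
  for line in lines do
    -- plain find (4th argument true) so the phrase is not treated as a Lua pattern
    if line:lower():find(needle, 1, true) then
      n = n + 1
    end
  end
  return n
end
```

Usage, assuming the log lives at /var/log/cmh/LuaUPnP.log: `count_phrase(io.lines("/var/log/cmh/LuaUPnP.log"), "got CAN")`. Comparing the count before and after a settings change is more informative than the absolute number.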

Feel free to share your tips on this thread!

therealdb · July 27, 2019, 8:33am

I disabled polling with a script as you did, on all non battery operated devices. Everything is good, except the fibaro universal sensors, which need polling in order to always stay in sync. It’s a known bug, as support told me. Network stability has improved a lot.

slelieveld · August 29, 2019, 5:06pm

For what it’s worth:
… the information we have so far is not the best since the “Wake-up fail” count does not reset and does not restart in your case, the device itself fails to report once or twice and the counter does not start from 1 as it should, but goes on from the previous 50 when the device failed to respond.

This is something that should not happen and it’s happening for your Battery Operated devices that you mentioned, the sensors. We’ve been talking with our integration team just to make sure that there is no Device Integration fault so we could check with the other teams as well.

If this is an intended behavior it should be patched so the counter would increase, but the limit would also be raised with +50, so it goes to 100 as the message should only show when the fail counter reaches 50.

I did mention this in my previous email, but it might have seemed related to the one UV sensor alone.

I will be getting back to you with more information related to this issue once we get word from our other teams.

Fanan · August 31, 2019, 3:26pm

Thank you @rafale77! Could you help me to adjust/change the code, so it includes non battery operated devices? I’m not a hacker…
Thanks!
/Fanan

rafale77 · August 31, 2019, 6:58pm

The first post only disables polling for the non battery operated devices… Do you mean the battery operated ones?

Fanan · September 1, 2019, 9:06am

Yes! That’s right.

rigpapa · September 1, 2019, 3:43pm

@rafale77, I’m curious about something. My network is already pretty stable, but I went ahead and disabled polling on all devices in my system anyway, just to see how much better I could get things. I can’t speak to that just yet, it’s too soon, but it did have an immediate side-effect: a lot of pre-Plus switches aren’t instant status (enough that I won’t be investing in replacements any time soon), so polling has ensured I get updated state at some point. While I don’t need precise timing (e.g. I just need to supervise a bathroom fan, and I don’t care if it runs 10 minutes longer than expected/programmed, just that I know it’s not left on forever), I do still need the state eventually. So I created an explicit association group 1 to device 1 on a few of those switches, and now those switches are effectively instant status. When operated, the status on the Vera changes pretty much immediately. Vera seems to move the association to device 4, the global scene controller, but it seems to work nonetheless. Any thoughts on that?

Other readers: this is an experiment, and I haven’t been doing it long enough to judge side-effects. Don’t try this at home. At least, not yet.

sebby · September 1, 2019, 4:16pm

Is there a programmatic way to determine if a device has instant status or not? I have a mix of some older devices and am trying to see if I need to replace some.

rafale77 · September 1, 2019, 10:01pm

@rigpapa this is an indication that these devices were included a long time ago and likely were not properly configured on a very old firmware. In zwave terms the association group 1 is called the lifeline association and is so critical that some of my older devices will only accept being associated to node id 1. All devices on the zwave network should have their lifeline associated to the master controller. I have a z-way controller set up as a secondary on my network and have verified that this was true.
The Instant status feature, back when there was a Lutron patent preventing every switch from supporting it due to the licensing cost, forced some device manufacturers to use this association as a workaround, and this is likely what you are observing. I am just a bit surprised that you had devices which did not have this lifeline association set. Also, device id 4 (the scene controller) on the vera is technically a virtual device, as it is not a physical zwave node. (The altid should be 0)

Most devices do not require any polling. Sensors, for example, update their tripped and battery status even when they are asleep. The vera still polls them when they wake up, which is a waste of their battery and of zwave bandwidth, but there could be devices which require it. So far I have seen no AC powered devices which require polling, and I have a lot of various ones from all brands.
Note however that I still have a handful of devices, my sirens and door locks, which are FLiRS and do require polling to update their battery status.

rigpapa · September 1, 2019, 10:43pm

I suppose that’s possible, but it would go back only as far as October 2017, and whatever firmware was new then, as I had just gotten this VeraPlus and wiped (factory reset) every device in the house, no restore from backup, everything included fresh. Do you consider 10/17 “very old firmware”?

The rest I know.

I guess what I’m saying is, Vera does not appear to check or repair the lifeline association. If it is supposed to be on every device, it’s not.

rafale77 · September 1, 2019, 10:55pm

The fact that vera does not repair the associations would not be surprising… Z-way is much better at this. The vera actually does not display all the associations for the device. It is not updated in the user-data.json unless there was a special setting set through the vera UI or through some plugin. The data is normally in a device variable “associations”. It requires sending a “get association” frame to the device to get this information and I am not sure the vera does this during its configuration.

reneboer · September 2, 2019, 11:02am

Hi,

How can you see if the lifeline association is set or missing? I never see it under the Device Options where I normally set associations.

Cheers Rene

rafale77 · September 2, 2019, 12:34pm

As I mentioned, associations are not obviously shown on the vera unless you have set them with the vera and I am not even sure that the vera queries the device for them. I use z-way (zwave.me) as a secondary controller to be able to set these all up.

rigpapa · September 2, 2019, 2:02pm

So far, I haven’t seen any ill effects from just creating the association myself. Still studying, though…

rafale77 · September 2, 2019, 4:37pm

I would not expect anything bad from forcing the creation of these associations. I have done it using zway for the devices missing it but most had it by default as the association is usually created during the configuration part of the inclusion by the vera even though it may not always display it. It is the “setting user association” step.

rafale77 · September 12, 2019, 5:06pm

Something else I meant to report, not just about large zwave networks but more about polling. I have some fairly old leviton scene and zone controllers with an embedded relay (essentially two zwave nodes in the same box), models MRCZ4 and MRCS4, which used to hang themselves on a regular basis, once every few weeks. They would lose their associations (the zone controller actually needs to set itself to associate with the relay within the same device), or either the relay or the controller would stop responding. The recovery involved going to the breaker and power cycling the entire circuit powering these devices, since they do not have an air gap switch.
Disabling polling has nearly eliminated the need to do this (I still see it maybe once or twice a year), and I am certain that disabling the nightly heal will improve things further. This brings me to question why the devs set the default polling to where it is and why they continue to recommend doing it. I have only very rarely seen devices which require polling. For the large majority of devices and networks, it actually renders the device or network unstable and unreliable.