Vera Z-Wave Command Queue Limit


So I am down to, I believe, my last grievance with my Vera, and I have the firm intention to address it one way or another: the command queue.

I have now eliminated nearly every source of Luup reloads and reboots and am down to two:

  1. The monthly reload on the 1st of the month at midnight, which I guess I can live with.
  2. The swamping of the command queue.

Given the number of devices I have, some of my scenes actuate a very large number of devices. They never seem to cause a reload by themselves, but if a scene runs and lags or struggles to complete, and I add a few manual commands from my phone or the UI, I am guaranteed a loss of responsiveness and a Luup reload. Now my question is this (Sorin, if you could please find out): what is the command queue limit of the Vera?
I know it may not be a straightforward answer, but I want to know how many commands I can send it at once (within 100 ms) and then within 1 s. I intend to use openLuup to stagger the actions in my scenes to prevent a crash. I don't think this is documented anywhere.
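To illustrate the staggering idea, here is a minimal Luup sketch that fires scene commands in small batches with a delay between bursts. The device IDs, batch size, and delay are hypothetical placeholders; `luup.call_action` and `luup.call_delay` are the standard Luup calls (note that `call_delay` takes a global function *name* and a string argument).

```lua
-- Hypothetical device list and timing; adjust for your own setup.
STAGGER = {
  sid     = "urn:upnp-org:serviceId:SwitchPower1",
  devices = { 12, 13, 14, 21, 22, 23, 31, 32 },  -- hypothetical device IDs
  batch   = 3,   -- commands per burst
  delay   = 2,   -- seconds between bursts
}

-- Send one batch starting at index `data` (passed as a string, because
-- luup.call_delay only forwards string arguments to the named function).
function stagger_send(data)
  local i    = tonumber(data) or 1
  local last = math.min(i + STAGGER.batch - 1, #STAGGER.devices)
  for n = i, last do
    luup.call_action(STAGGER.sid, "SetTarget",
                     { newTargetValue = "1" }, STAGGER.devices[n])
  end
  if last < #STAGGER.devices then
    luup.call_delay("stagger_send", STAGGER.delay, tostring(last + 1))
  end
end

-- In scene Lua, kick off the first burst with: stagger_send("1")
```

With a batch of 3 and a 2-second gap, eight devices are actuated over roughly 4 seconds instead of all at once, which is the sort of spreading the post is proposing.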



Hi Rafale,

Had a quick discussion with the guys in the office. So the thing is, if you actuate any number of devices through a scene, the Z-Wave chip sends singlecast commands (one per device), which, depending on the number of devices, might turn out to be slow if you have mesh bottlenecks or secure Z-Wave devices. On the other hand, the Turn All On/Off command from your dashboard sends a multicast command to nearby nodes (nodes directly linked to the Vera, not routed ones), which should be much faster and not cause a queue backlog. So if you can somehow leverage this function through some kind of scene, you might partially resolve your issues. Otherwise, try to figure out if you have any bottlenecks in your location (search for the "hourglass effect" on Z-Wave), or try to trigger these scenes gradually with delays between them. Or offload some of the devices onto a secondary controller.

The Z-Wave job queue is handled by the Z-Wave chip itself, not the Vera unit, so I can't tell you about any "limits". But if there are limits, I think they are measured against the bandwidth in kbps and the particularities of each Z-Wave device (i.e., 9.6 to 100 kbps for Z-Wave Plus). You would need to know how many kbps a command takes, multiplied by the number of devices; if that exceeds the 100 kbps bandwidth of a Z-Wave Plus chip, you can work out your headroom from there.
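As a back-of-envelope version of that bandwidth argument, the raw airtime per frame is tiny; the frame size and data rate below are illustrative assumptions (a short ~12-byte frame at the common 40 kbps rate), not figures from the post. The real per-command cost (ACKs, routing hops, security nonces) is easily 10x to 100x the raw airtime.

```lua
-- Illustrative assumptions: ~12-byte Z-Wave frame at 40 kbps.
local frame_bytes = 12
local rate_bps    = 40000

-- Airtime for one frame, in milliseconds.
local airtime_ms = (frame_bytes * 8) / rate_bps * 1000
print(string.format("airtime per frame: %.1f ms", airtime_ms))   -- 2.4 ms

-- Raw airtime for a 30-device scene: well under a second, which shows
-- the bottleneck is protocol overhead and routing, not raw bandwidth.
local devices  = 30
local total_ms = devices * airtime_ms
print(string.format("raw airtime for %d singlecasts: %.0f ms",
                    devices, total_ms))                          -- 72 ms
```

So on pure bandwidth a scene of 30 switches costs under 100 ms of airtime; the multi-second stalls described in this thread have to come from per-command waits, not the radio's kbps limit.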



Hi Sorin. Is there a way to send a multicast "switch off" command to only a group of switches? I can't send it to all of them, but maybe it's possible to include only selected device IDs. Thanks.



Sorin, thank you. I had indeed forgotten about that multicast call in the Z-Wave SDK for All On and All Off, which may be useful for a couple of my scenes.
As for therealdb: for most of my problematic scenes, I actually need a partial set of devices, not all on or all off.
The multicast is actually a single call transmitted throughout the network, to which each Z-Wave device is set to respond or not. You would have to manually program the device (if it supports it), so the decision to obey that command lies within the device, not in the Vera.

I beg to differ on the handling of the command queue. I read the Z-Wave SDK documentation, and it is clearly handled by the host controller; the Z-Wave chip should not handle the queue. As proof, I have controllers like OpenZWave, the Silicon Labs PC Controller, and Z-Way, all of which have a UI page showing the queue of commands, with the wait-for-reply status and the time each command has waited before being sent. I believe there is some very short queuing in the network, as I have observed from delayed sensor reporting, but that is not how it should be designed.
See this document attached:

From the Silicon Labs Host Application Programming Guide:

"The host SHOULD queue up requests for processing."

If the Vera is not managing the job queue and is only a direct passthrough, this might be the problem! The Vera may be completely swamping the network. I did not see this problem when I tested Z-Way, where I had over 1,000 commands in the queue and it was processing, I believe, 6-10 at a time to the Z-Wave chip, waiting for each ACK before sending the next one. I understand also that my question is not easy to answer, as you have device polling enabled by default, which is not recommended in large networks. I have mine disabled.
You are correct about bottlenecks: for example, when a device is slow to respond, I have observed the Z-Way queue hold that wait-for-ACK entry, which then occupies a spot in the queue and prevents another command from being sent. If the Vera does not queue the commands, then I may have actually found my answer, and as a rule of thumb I may need to limit the number of Z-Wave commands per scene to ~10 and go from there.
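The host-side behavior described for Z-Way can be sketched as a queue with a bounded in-flight window: commands queue freely, but only a fixed number may be awaiting an ACK at once, and a slow device holds its slot until it ACKs or times out. This is a toy model with made-up names, not Z-Way's actual implementation.

```lua
-- Toy model of a host-side command queue with a bounded in-flight window.
HostQueue = {}
HostQueue.__index = HostQueue

function HostQueue.new(window, send_fn)
  return setmetatable({ window = window, send_fn = send_fn,
                        waiting = {}, in_flight = 0 }, HostQueue)
end

-- Queue a command; the host can accept these without limit.
function HostQueue:push(cmd)
  table.insert(self.waiting, cmd)
  self:pump()
end

-- Hand commands to the radio while the in-flight window has headroom.
function HostQueue:pump()
  while self.in_flight < self.window and #self.waiting > 0 do
    local cmd = table.remove(self.waiting, 1)
    self.in_flight = self.in_flight + 1
    self.send_fn(cmd)
  end
end

-- Called when the radio ACKs (or finally times out on) a command,
-- freeing a slot for the next waiting command.
function HostQueue:ack()
  self.in_flight = self.in_flight - 1
  self:pump()
end
```

With `window = 1` this fully serializes the radio; with 6-10, one unresponsive device stalls only a single slot rather than the whole network, which matches the behavior described above.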

Note that even if the Luup engine does not crash, the Vera completely misses sensor signals during that queue-exceeded period, so my sensor states are not updated.



Another alternative explanation …

A single device with poor connectivity can block the entire queue until the connectivity issue is resolved. This might mean waiting tens of seconds for a timeout, or a few seconds for a retry on that device to succeed.

This is easily reproducible as I detail here.,62343.msg354955.html

The blocking depends on which command is attempted. In that thread I can show setTarget blocking for a binary device, but setTarget does not block for a dimmable device. I’m speculating that setLoadLevel is correctly implemented with retry and error handling while setTarget has a bug that causes it to block and that on dimmable devices setTarget uses the setLoadLevel logic instead.

Anyway, the gist is that not every command implementation has robust handling of connectivity failures and that these failures unnecessarily block the command queue, potentially for a considerable amount of time.



I’m not 100% sure that it is even related to connectivity issues. I have a couple of scenes that actuate large numbers of devices, and they would frequently cause a luup reload even though all devices were functioning well. After breaking up these scenes into parts with short time delays between them, the reloads have completely stopped.



So is this best practice? To disable polling? I have about 60 z-wave devices (plus another 12 z-wave child devices), plus another 20 EnOcean devices, plus 20 DSC alarm devices, PLEGs, etc.



[quote]So is this best practice? To disable polling? I have about 60 z-wave devices (plus another 12 z-wave child devices), plus another 20 EnOcean devices, plus 20 DSC alarm devices, PLEGs, etc.[/quote]

This is the way I think about this: What is the purpose of polling? To check if the device is still in the network.
There are three types of Z-Wave devices:

  1. AC powered. (light switches, bulbs)
  2. FLiRS (Frequently Listening Routing Slaves): battery powered but frequently listening, because they need to be ready to receive a command at any time (examples: locks, vents).
  3. Battery Powered and normally sleeping (usually battery powered sensors)

The Vera by default wants to poll all three. Polling type 3 is moot, since those devices can't respond until they wake up; I think the Vera just ignores these polls.
Polling type 2 is useful only to check battery condition, though there could be automation based on the presence of such a device if it is mobile (I don't know of many). Essentially, polling these devices just consumes their battery for little added value. Locks usually get polled right after a command is sent anyway.
Type 1: I am not sure why you would want to poll these, since they are fixed in place and constantly powered. Maybe to detect their failure, but the Vera is more likely to fail than they are, so why bother?

My conclusion has been that I want to disable automatic polling on all types of devices, since polling consumes battery on FLiRS devices, has no value for AC-powered devices, and does nothing for sleeping battery-powered devices. There are rare exceptions: FLiRS and AC-powered devices which do not report status on their own and require the Vera to poll them (example: light switches without instant status), but they are rare and usually older devices. I believe Vera implemented auto-polling for these specific devices but generalized the behavior to all devices, which is one of the many mistakes plaguing UI7.
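For anyone wanting to apply that conclusion in bulk, a Startup Lua fragment along these lines can zero out the per-device polling interval. This is a sketch under two assumptions worth verifying on your own unit: that `PollSettings` under the `urn:micasaverde-com:serviceId:ZWaveDevice1` service holds the per-device polling interval in seconds (with "0" disabling it), and that the Z-Wave network is device #1 on a stock Vera.

```lua
-- Startup Lua sketch: disable automatic polling on every Z-Wave device.
-- Assumption: PollSettings (seconds between polls) lives under the
-- ZWaveDevice1 service, and "0" disables polling for that device.
local ZW_SID = "urn:micasaverde-com:serviceId:ZWaveDevice1"

for id, dev in pairs(luup.devices) do
  -- Assumption: device_num_parent == 1 identifies children of the
  -- Z-Wave network device on a stock Vera; adjust if yours differs.
  if dev.device_num_parent == 1 then
    luup.variable_set(ZW_SID, "PollSettings", "0", id)
  end
end
```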



I still have some of the early Jasco/GE light switches that were non-instant status, although I am slowly replacing them with the latest Jasco/GE light switches. So I don’t want to eliminate polling, but you got me thinking about dramatically scaling back the polling frequency.

Back to your original question, and I am going out on a limb here, but I have some anecdotal evidence that the swamping of the Z-Wave command queue is partly a factor of which devices are sending the commands. Meaning: if a single device is sending the requests, you seem to have a hard limit of three. One can be queued, then a second, but if a third is sent while the other two are still in the queue: reload. I tested this somewhat unscientifically with a single giant PLEG, and it seemed to hold true.

Then I split the single PLEG into multiple PLEGs and sent the same commands at the same frequency, and had a fraction of the reloads. THEN I structured each of the small PLEGs to be what I call 'confirmative', that is, they don't send the next Z-Wave command until they essentially get confirmation that the previous one completed (by adding to each condition that it cannot be true while another condition is currently true), and the reloads all but stopped. I am taking something RTS said out of context and adding my own experiences, so YMMV.

,31345.0.html,33445.0.html,36757.msg273838.html#msg273838
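The 'confirmative' pattern described above can be sketched outside PLEG as a simple sequencer: send one command, then dispatch the next only when the previous one is confirmed complete. The names and structure here are illustrative, not PLEG internals; on a real Vera the `confirm` step would typically be driven by a `luup.variable_watch` callback on the target device's status variable.

```lua
-- Sketch of a "confirmative" sequencer: one command at a time,
-- advancing only on explicit confirmation of the previous command.
Sequencer = {}
Sequencer.__index = Sequencer

function Sequencer.new(commands, send_fn)
  return setmetatable({ commands = commands, send_fn = send_fn, i = 0 },
                      Sequencer)
end

function Sequencer:start()
  self:confirm()  -- confirming "nothing" dispatches command 1
end

-- Call when the previous command is confirmed complete (on Vera,
-- typically from a luup.variable_watch callback on a Status variable).
function Sequencer:confirm()
  self.i = self.i + 1
  local cmd = self.commands[self.i]
  if cmd then self.send_fn(cmd) end
end
```

Because at most one command is ever outstanding, this can never hit the three-deep-per-sender limit speculated about above, at the cost of total scene runtime.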



Thanks for this.

This somewhat confirms my reading of the Z-Wave documentation. The controller host is supposed to be the one queuing the commands, not the Z-Wave radio; the chip itself has only a very short queue. And polling is a command stream which the Vera inevitably abuses if you let it configure devices with the defaults.

For your devices, you can set specific devices to be polled and others not. The setting on the Z-Wave menu screen actually does nothing for devices which have already been configured. The funky thing is, the Vera wants you to reconfigure a device when changing its polling frequency, which is another bizarre requirement.