What mystifies me about this is that I am having a very different experience. Yes, I have occasional reloads, but nothing like before (pre-7.29). Currently, I’m not seeing a lot of slowdowns in my network. Everything works well, even though December started with me adding a bunch of new devices to the network and not having a smooth start with all of them. Many are documented in these forums, but they include two Schlage Z-Wave locks (one Plus, one ancient/not), a Dome water leak sensor, a Zooz siren ZSE19, a Zooz switch ZSE15, another Zooz switch ZSE25, an ancient outdoor GE/Jasco appliance switch, and two Ecolink garage door tilt sensors. For the 1-2 days when I was mucking about with these devices, things got wild. The siren in particular, a secure device, was trouble: while trying to get it added, I ended up losing half my network and had to restore from backup (I had made a backup before doing anything, which was life-saving, but you learn the hard way). Things generally stabilized, but I could get the Vera to reload by manually operating one of the Schlage locks. That ended up being resolved by doing a heal on the lock and a cluster of three switches closest to it. Not much trouble since (although I’ll likely still replace it at some point, as the new ZP models are much faster).
I have one community plugin that I run on my house Plus, which I’ve had disabled since I did the device mucking before Christmas. I know that plugin has problems (some quite serious), but I’m too busy to write my own, so I’ve fixed many of them in my own version of it. I know it still has issues, and it uses facilities within Luup that I believe contribute to instability (Luup’s `<incoming>` read and built-in TCP socket handling, specifically). Other than that, I use (since they’re all mine, of course) Reactor, Switchboard, Rachio, SiteSensor, DelayLight, Emby, and Virtual Sensor. I use two unpublished plugins that I call LockValet (manages lock codes) and SceneSlayer (makes scene controllers work the way I want them to). Oh, and I recently started using Battery Monitor, which is a community plugin, but again, I use a version I’ve modified. My system has been quite stable. Without that one suspect plugin running over the holidays, I ended up with (a record for me) an unbroken 18 days of runtime on 7.30 (4833) just recently, before a reload occurred that I didn’t even notice.
I have expressed here many times my opinions about Vera’s pre-acquisition handling of the App Marketplace and the large catalog of booby traps and time bombs that it allowed to grow, particularly in the transition between UI5 and UI7, when they had ample opportunity to clean house. Unfortunately, with its eyes on its future plans, eZLO has largely ignored the App Marketplace, so the rot that festers there continues to be a minefield for new users of the platform, and is no less a hazard to even seasoned users here.
Being able to write plugins is a gift, but with great power comes great responsibility, as they say. In the Vera/Luup world, the plugin environment is not tightly sandboxed, at least not as much as, for example, a script in a browser tab is. Unfortunately, it’s very easy for plugins to create deadlocks and cause reloads that bring the entire system down. It’s very easy for plugins to delay system operations (e.g. `luup.sleep()` should never have been implemented). It’s not hard to imagine the operation of a plugin (or even any scene Lua fragment) creating a significant enough delay in system execution that a Z-Wave message is missed altogether (or perhaps more correctly, not handled within time constraints/requirements).
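To illustrate the kind of delay I mean, here is a minimal sketch of deferring work with `luup.call_delay()` rather than blocking with `luup.sleep()`. The function names `delayedOff`, `requestDelayedOff` and `runPending`, and the little stand-in `luup` table, are hypothetical and exist only so the fragment runs in a plain Lua interpreter; on a real Vera the stand-in is skipped and Luup does the scheduling.

```lua
-- Stand-in for the Luup runtime (assumption, for running outside a Vera only).
-- On a real controller `luup` already exists and this block is skipped.
if not luup then
    local pending = {}
    luup = {
        -- Mimics luup.call_delay(function_name, seconds, data): just queue the job.
        call_delay = function(fname, seconds, data)
            pending[#pending + 1] = { fname = fname, data = data }
            return 0
        end,
        log = function(msg) print(msg) end,
    }
    -- Hypothetical helper for the stand-in: run whatever has been queued.
    function runPending()
        local jobs = pending
        pending = {}
        for _, job in ipairs(jobs) do _G[job.fname](job.data) end
    end
end

-- Blocking style, shown only as a comment:
--   luup.sleep(5000)   -- nothing else in this context proceeds for 5 seconds
--   ...do the delayed work...

-- Deferred style: the callback must be a global function, referenced by name.
function delayedOff(data)
    offCount = (offCount or 0) + 1
    luup.log("deferred action ran for device " .. tostring(data))
end

function requestDelayedOff(deviceNum)
    -- Returns immediately; the callback is invoked later by the scheduler.
    luup.call_delay("delayedOff", 5, tostring(deviceNum))
end
```

Calling `requestDelayedOff(42)` returns right away, and the work happens later in `delayedOff`. Whether a given plugin can be restructured this way depends on the plugin; this is only meant to show the shape of the non-blocking approach.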
There’s been a lot of attention to the specific actions of the Z-Wave stack here, and I agree completely that there are some things going on that don’t look like good choices, but what if those exceptions were being caused by irritants elsewhere in the system?
In 2017, I was at my wits’ end with the instability of my system and on my way to bringing up HomeAssistant. But like @rafale77 and @therealdb, the investment and sunk costs in Vera kept me there. I then began to exhaust every option I could think of to get my system working and healthy. Lo and behold, I stopped using a couple of very common plugins, and things got much better. Like @anon53786315, I replaced all plugins with my own Lua, and things stayed good. Some time after, I decided that this community needed better tools, and so a lot of the work I had done just for myself, I ended up publishing and now support.
Anyway, the point of this rather long missive is this: what I don’t yet see happening in trying to chase down these recurring errors is removing all of the other variables from the system, as many as possible, no matter how far-fetched their relationship might seem. Back everything up, and then, as painful as it might be, I suggest disabling plugins systematically. That doesn’t mean uninstalling them and killing all the devices, losing configs, scenes, etc.; just un-LZO the implementation file and put `if luup then return false end` at the start of the startup function, which will keep the plugin from starting up. Then observe.
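As a rough sketch of what that guard looks like in place (the function name `examplePluginStartup`, the `pluginInitialized` flag and the return values of the original body are made up for illustration; the real startup function is whatever the plugin’s implementation file names):

```lua
-- Stand-in so the sketch also runs in a plain Lua interpreter (assumption).
-- On a real Vera, `luup` is already defined and this line changes nothing.
luup = luup or {}

-- Hypothetical startup function of the plugin being disabled.
function examplePluginStartup(lul_device)
    if luup then return false end  -- guard: bail out before any initialization

    -- Original startup body, unreachable while the guard is in place.
    pluginInitialized = true
    return true, "OK", "ExamplePlugin"
end
```

The always-true `if luup then ... end` wrapper is there because Lua only allows `return` as the last statement of a block, so a bare `return false` ahead of the existing code would be a syntax error. Removing the one guard line puts the plugin back the way it was.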
I think this is a possibility worth exploring. I would not be surprised if we find there’s an irritant that’s exacerbating the other problems. Yes, we want the Z-Wave problems fixed, but consider: if we’re looking at a bug in Luup’s Z-Wave error handling, how much better can it get if the error handling improves but the stimulant is still causing an error that needs to be handled?