Vera issues and alternatives testing for zwave

I decided to start a new thread because, like many others, I have practically decided to migrate from the vera to a different solution. But which one?
Most have gone to Homeseer, which is an expensive but reliable solution with very few other minuses.

Vera Symptoms:

  1. Occasional lag or drop of zwave commands and sensor status updates
  2. Random Luup reloads and sometimes full reboot of the vera.

I have shared some of my findings from digging into the code inside the vera on my other thread.
Bottom line is that on my network, the Vera Plus I/O interface is overwhelmed and the CPU cannot keep up with the combination of: traffic from the zwave and zigbee UARTs; processing lua code and plugin code; sending the outcome of the code logic back to the zwave/zigbee UARTs; listening to the Luup API; and updating the MIOS servers (event servers, remote access relays, device servers, storage servers, logging servers, account servers and occasionally firmware servers, support servers and energy servers for those who use them).

I attempted to improve stability by moving all of the luup logic and plugins from the vera to openLuup and cutting off the mios server traffic by firewalling it:
-The remote access relay is a constant SSH tunnel which I successfully killed. It is the largest source of I/O access and of potential crashes when the responses are corrupted or access is compromised.
-The logging server access can be disabled by stopping the logs from being archived on the mios server (option in the UI)
-Not sure what the storage server does and have not dug into it. I suspect it is the server which syncs files whenever we have deleted files showing up on the unit.
-The account server is disabled by removing the vera from your mios account
-The other 3 are on demand and are therefore not a concern.
The more problematic ones are the event and device servers, which I have so far not been able to disable after working with CC for over a week:

  • The device server calls still happen but at least do not give the vera heartburn when it can’t reach the server.
  • The event server, which normally reports alerts and notifications, is on the other hand designed to write files, keep all the alerts and continuously retry reaching the mios event server, and eventually, as I found out, it crashes the vera. It is the single design flaw preventing the vera from working without mios server access: once the storage is full and it spends all its CPU time trying to send these accumulated events, it stops executing lua code and goes into infinite luup reload loops.

With the relay tunnel removed and the vera kept online so as to avoid the crashes triggered by the event server, my vera has become a lot more responsive, but I am still occasionally seeing userdata corruptions and sensor status misses, as well as still inexplicable luup reloads.

Given how much I have invested in lua and the entire vera setup, rather than starting from scratch, I have decided instead to try to work with the vera and improve its reliability. Though it has improved significantly I have come to the point of testing 3 other platforms on a test network:

  1. Home Assistant with the HUSBZB, on which I now have both Zigbee (through bellows) and Zwave (through openZwave 1.4 at the time of writing, though I have also installed openZwave 1.5 with OZWCP for more advanced tasks). I got this stick with a hubitat. SDK v4.5
  2. Zway with its own UZB and a license. SDK v6.51
  3. Vera Edge running with a Homeseer USB stick (also recognized as a UZB). The thought was to run a test to see how far I could go and then move the stick to a different platform. SDK v4.5

Something quirky: I have not been able to run a controller shift from one controller to another since they all somehow joined the network as primary.
All three controllers are able to control the one switch in the network and all report power consumption correctly.
Pros and Cons:
The vera is the buggiest one: it sees only one of the other two controllers and created 16 dimmer child devices for the zway controller. Its handling of secure classes is very questionable, as it often appears to fail when exchanging keys.
Zway, to which I am biased due to the existence of a plugin for openLuup, has its own bugs, such as the need to complete the device interview process for devices to show up on the API. It offers the best control of the network, with the ability to assign SUC/SIS to other controllers. Oddly, Zway shows primary role as “yes” and primary capability as “no”.
OpenZwave also offers a lot of options, though not to the level of Zway, and works very well with Home Assistant. It already supports a lot of devices, though it may have some support gaps compared to the vera (I heard the garage door openers for example).

My big problem right now is to be able to run a controller shift with the secure class, given that all my controllers were included as primary. It may be that I will have to exclude and reinclude all of my secure class devices on my production node.

Edit:

I am having some interesting times here with the 3 controllers. I somehow managed to make the zway drop its primary role and have been playing dominoes with primary/secondary since. It appears so far that zway and vera are able to shift their primary role out to another controller, and maybe openZwave too, but only openZwave is capable of receiving the primary controller shift.

Results of my testing so far…

[table]
[tr]
[td]Controller[/td]
[td]Accepts to be secondary[/td]
[td]Can shift primary away[/td]
[td]Can receive primary[/td]
[/tr]
[tr]
[td]Vera with z-stick[/td]
[td]yes[/td]
[td]yes[/td]
[td]no[/td]
[/tr]
[tr]
[td]OpenZwave[/td]
[td]no[/td]
[td]no[/td]
[td]yes[/td]
[/tr]
[tr]
[td]zway[/td]
[td]yes[/td]
[td]yes[/td]
[td]yes[/td]
[/tr]
[/table]

So I bit the bullet and included the Zway in my “production” network with the Vera Plus. I also successfully shifted the primary role and the SUC/SIS node to Zway.
I initially also added the HUSBZB with OZWCP, but the fact that I could not make it a secondary scared me, so I took it back out for now.

Of note: while Zway only supports their UZB stick and Razberry, openZwave supported every stick I threw at it (Homeseer, UZB, HUSBZB). The vera supported both the UZB and the Homeseer stick but not the HUSBZB, in spite of the device showing up in the /dev/ folder.

I have been learning a lot about zwave networking in the process.

A couple of issues I have seen:

All the switches/relays and motor controllers work when controlled by either controller. However, status does not get updated on the Zway if the command came from Zway. I now know why. The original controller which included the device created a lifeline association with the device. In order for the device status to be sent to both controllers, both controllers need to be associated. Now the problem is that so far on Zway, I am not able to modify the association settings on devices which it did not include. It reports that the device did not report “groups” to it. I have verified that adding an association works by adding a flood sensor through the Zway and then associating the vera to the sensor. The flood sensor now reports status to both of them. That being said, it also means that unless I reset and reinclude my network, I will need the vera around and a network configuration tool for the older devices.

The other problem which remains is the secure class devices. As I reported in my previous post, the vera is not giving its secure class key even though I made zway ask for it. I suspect that the vera key is not very standard and maybe encoded in ways that zway cannot understand. This may mean that I will have to exclude and reinclude all my secure class devices. Not fun.

After a few months of tinkering I am back to trying to make the vera work, but not without the assistance of the other 2 hubs.
I have now very significantly stabilized the vera and found workarounds for practically every problem I have encountered.
I have tested Home Assistant/openZwave, Hubitat and Zway.

The reasons why I am back on the openLuup/Vera setup are:
-Too invested in plugins and lua coding of my automation.
-I believe I have nearly eliminated the spontaneous Luup reload issues
-Device support is still wider/easier than on other platforms, in spite of all the workarounds
-Laziness to move my 130+ node system and write a plugin to port all these devices to openLuup
-Uncertainty about the device support of other platforms.
-One key criterion for me is not to have to exclude and reinclude any device (including secure class) from my zwave network, and I found only home-assistant/openZwave to give me this capability.

Reasons why I am still exploring alternatives:
-Device support is rapidly degrading with new firmware versions: many devices which used to include fine now need more refined workarounds. For example, secure class devices no longer include correctly, often failing at the secure class key exchange step. The Zigbee generic device inclusion has become catastrophic, with self creation of invalid devices. The Aeon Smart Dimmer 6, deemed supported, also has an inclusion process which now creates invalid child devices. I can go on and on with examples. One needs to become quite an expert to include a device these days.
-Precarious stability of the luup engine
-Lack of offline mode on the vera making it unusable without internet access.
-I have been observing increasing lag in the zwave command queue, causing occasional misses of device state change reports and frequent lag in device commands or state changes. One example is the Aeon HEM, for which the MIOS recommended configuration no longer works: MIOS recommends updating all 3 sensors at the same time for wattage and kWh every 4 minutes. Well, the wattage no longer gets updated because it is too much at the same time. I now have it staggered into 2 messages, separating the main HEM from the other 2 phases, but at a higher frequency, and my wattage is back. The command queue handling is getting worse.
-Inability to disable the nightly heal which significantly increases the lag issues due to the increased command queue and luup engine instability.

What I did to stabilize the vera plus:

  1. Removed it from my vera account
  2. Moved all my logic and plugins to openLuup
  3. Manually edited the cmh-ra tunnel start files to kill the SSH tunnel to the vera server, which still gets linked even if you have no account
  4. Extrooted the vera, basically making it run on an external SSD through USB, giving it a lot more storage space. It is the same external SSD it is writing its logs on.
  5. Replaced a few failing or soon to fail devices on my network, though I am not sure this should impact the luup engine. The vera is just much too luup reload happy.
  6. Added Home Assistant/openZwave as now the primary zwave controller and the SUC/SIS master.

Pros and cons of other platforms:

Hubitat:
Pros: Very good device support for both Zwave and Zigbee. Rapidly developing with a lot of knowledge from ST. Can run completely without internet. Uses an external stick. Logic and plugins are also very extensive.
Cons: Very limited zwave network control. Cannot join another network as a secondary or set your own secure class key, for example. Lack of visibility into what is happening underneath in the stack. Lack of an open API that would let me control it from another controller (openLuup). The current API requires OAuth2, which I don’t want to implement. Can be added to another network only if the stick is moved to another computer and openZwave is used to include it.

Home-Assistant/openZwave
Pros: Incredible and unmatched external API support and device support. I got my garage door opener, shades, door locks and security sensors to all work. Rapidly evolving and developing platform. Ability to set up your secure class key. I actually extracted the key from the vera and ported it to home-assistant which can’t be done on either of the other 2 controllers. Strong community input with a lot of users.
Cons: Had to learn some python. Relies heavily on openZwave, with which the integration is precarious. The interface limitations are the same as hubitat if one only uses the integrated one. I had to modify and rebuild openZwave to the 1.5 version in order to get my door locks to work. The default integrated version is 1.4. Have not been able to make my HUSBZB a secondary controller, so it is now the primary. There is a learning curve, as the configuration, though not very difficult, is very manual. No plugin for openLuup.

zwave.me Zway:
Pros: Has the most control over the zwave network, with the ability to assign SUC/SIS, and provides the most information on network devices and activities, with very direct access to the zwave stack. Uses the most advanced zwave SDK. Accepts being a secondary controller with no issues. Very user friendly GUI with device support. Has plugins for openLuup.
Cons: No Zigbee. Has buggy firmware upgrade process. Not able to input your secure class key. Limited secure class support with also inability to grab it from the primary controller during inclusion. Good but not extensive external API support for other devices, I could not use it as a standalone whole house controller.

One of my workarounds these days for secure class device inclusion is to include the device with home assistant, get the information to the vera and then configure the device manually on the vera.

I have also been looking at the SDK/ZSK versions of the different controllers:
Aeon Z-stick Gen 5 is on 3.95
Vera is on 6.1. (Zway says the vera is actually 6.83)
zwave.me UZB is on 6.71 but there is a 6.83 released which failed to upgrade for me.
HUSBZB is on 4.5 (like the vera prior to 7.0.26)

I am on my 5th day of uptime (without any spontaneous Luup reload) and counting, and I am starting to observe a very slow memory leak on the system. Initially the cache and buffer memory on the system were creeping up pretty fast. I ran a cache clearing command to bring it back down after about 48hrs but it is still creeping up slowly. I will report back if it ever settles, but it does not appear that way. Both the Luup engine (LuaUPnP) and the cache/buffered memory size seem to be going up at a steady rate. I may need to set up a cron job to run a script and clear the cache on a regular basis, but I am not sure what to do with the LuaUPnP memory usage besides reloading the luup engine.
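
To put numbers on a leak like this, one option is to log the relevant figures at a fixed interval. The sketch below is only a rough illustration and is not taken from the vera firmware: the function names, the script path and the log file are made up, and it assumes the usual Linux `/proc/meminfo` and `/proc/<pid>/status` layouts and that `pidof` is available in the vera's shell.

```shell
#!/bin/sh
# Hypothetical helpers for tracking memory over time; all names are illustrative.

# Print the value (in kB) of one field from a meminfo/status-style file.
# $1: field name without the colon, $2: file to read
proc_field_kb() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "$2"
}

# Available memory taken as MemFree + Cached, in kB.
# $1: meminfo-style file (defaults to /proc/meminfo)
available_kb() {
    f="${1:-/proc/meminfo}"
    free=$(proc_field_kb MemFree "$f")
    cached=$(proc_field_kb Cached "$f")
    echo $(( ${free:-0} + ${cached:-0} ))
}

# Build one log line: epoch seconds, free kB, available kB, LuaUPnP RSS kB.
# $1: meminfo-style file, $2: status-style file of the LuaUPnP process
log_line() {
    echo "$(date +%s) $(proc_field_kb MemFree "$1") $(available_kb "$1") $(proc_field_kb VmRSS "$2")"
}

# Only read the live system when explicitly asked to, e.g. from cron:
#   */30 * * * * /root/memlog.sh --log >> /tmp/memlog.txt   (paths are assumptions)
if [ "${1:-}" = "--log" ]; then
    pid=$(pidof LuaUPnP | awk '{ print $1 }')
    if [ -n "$pid" ]; then
        log_line /proc/meminfo "/proc/$pid/status"
    fi
fi
```

Plotting the last two columns over a few days should show whether it is the cache or the LuaUPnP process itself that keeps growing.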

Re. memory leak, I have this on one of my Veras.

Available memory ( = Free + Cached) remains constant, but free memory goes down and around 5Mb there’s a reload. Note the timescale, this is 60 days worth of data. Two of the reloads are at the start of each month (7/1 and 8/1.) If it wasn’t for this fall off in Free memory, I should invariably be able to manage a whole month rather than just 15 or so days.

However, I concede that this is a lot more under control than some others!

This confirms my suspicion, thank you.
The command to clear the cache seems to also clear the buffer memory and therefore recover the available memory. I went from 200MB avail to 130MB in a little over 2 days and was able to clear it back up to ~200MB (Vera Plus). I am now back down to about 155MB avail. I noticed that a housemode change would recover a small amount of avail memory (buffer), but the leak continues.
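
Taking these figures at face value gives a rough idea of how often the cache would need clearing. This is only back-of-the-envelope arithmetic written as shell; the function names are made up, and the assumption that the leak is linear is mine rather than something measured.

```shell
#!/bin/sh
# Rough leak-rate arithmetic in whole MB; a linear leak is assumed.

# $1: starting available MB, $2: later available MB, $3: elapsed days
leak_rate_mb_per_day() {
    echo $(( ($1 - $2) / $3 ))
}

# Whole days before the given amount is used up at the given rate.
# $1: available MB right after clearing, $2: leak rate in MB/day
days_until_exhausted() {
    echo $(( $1 / $2 ))
}

rate=$(leak_rate_mb_per_day 200 130 2)
echo "leak rate: ${rate} MB/day"
echo "whole days until 200 MB is used up: $(days_until_exhausted 200 "$rate")"
```

With 200MB dropping to 130MB over 2 days, the rate works out to about 35MB per day, so 200MB would last less than 6 days if the trend stayed linear.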

For the machine I showed, the drop_cache command is run every 30 minutes, and, indeed, the available memory is pretty much constant.

Wow, that’s pretty frequent, I was thinking of running it once a week…

@akbooer,

You seem to have some which have this problem and some that don’t. Did you find out what potentially could be causing it?

Not yet. Working on it! Ideas welcome.
This is actually on one of my UI5 machines, so I think that the problem is pretty fundamental and embedded in MiOS.

Assuming they are all connected the same way ethernet wise, I would rule out the internet. I would suspect the number of zwave devices, or rather how chatty they are. Does one have a chattier zwave network than another, or are they all on the same zwave network? For example, I have an HEM and a number of devices reporting power usage on a regular basis, which I know the vera is struggling with. The command queuing of the OS seems quite limited and could potentially be storing errors or unprocessed messages in the cache and never processing them. Just an idea.

I decided to insert a cron job every 12 hours to run:

sync; echo 3 > /proc/sys/vm/drop_caches

and it seems to fully recover the free memory back to 190MB. I will see if it creeps back down over time. It may also be more frequent than needed.
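
If a fixed schedule turns out to be more frequent than needed, one alternative is to run a check often but only drop the caches when free memory is actually low. A minimal sketch follows; the script path, the 20000 kB threshold and the function names are assumptions chosen for illustration, and the write to `drop_caches` needs root.

```shell
#!/bin/sh
# Hypothetical wrapper around the drop_caches command above.

# Print MemFree in kB from a meminfo-style file.
# $1: file to read (defaults to /proc/meminfo)
mem_free_kb() {
    awk '$1 == "MemFree:" { print $2; exit }' "${1:-/proc/meminfo}"
}

# Succeed (exit status 0) when free memory is below the threshold.
# $1: threshold in kB, $2: meminfo-style file (optional)
needs_drop() {
    free=$(mem_free_kb "$2")
    [ -n "$free" ] && [ "$free" -lt "$1" ]
}

# Only write to /proc when explicitly asked to, e.g. from a crontab entry like
#   */30 * * * * /root/dropcache.sh --apply 20000   (path and threshold are assumptions)
if [ "${1:-}" = "--apply" ]; then
    if needs_drop "${2:-20000}"; then
        sync
        echo 3 > /proc/sys/vm/drop_caches
    fi
fi
```

Run without `--apply`, the script only defines the helper functions and touches nothing, which makes it easy to try the threshold logic against a copy of `/proc/meminfo` first.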

Got a Luup reload after 7 days of uptime… sigh… :o >:( >:(

Just start saving your coins for every Luup reload and you’ll have enough saved up to buy HomeSeer by the time the next sale happens in November. If you can keep your sanity that long.

Ha! Yeah I wish but my biggest energy barrier is migration and I have gotten the vera to now be up for days at a time. I think I know the source of that reload and remedied it. Let’s see where this goes.

Besides this cron to clear memory, is there any other cron jobs you recommend in order to stabilize my VeraSecure, which has a large number of z-wave devices?

Not really. Since the last FW is a disaster, I am also running a job to eliminate the forced firmware update page. My last experiment is to eliminate polling since I learned that hubitat does not poll its network and works just fine.

FWIW I have not had an unscheduled luup reload in weeks, and all my automation is rock solid. The solution for me was to turn my Vera into basically a bridge for my Zwave devices ie remove almost all scenes, apps and non-zwave devices. All my home automation stuff is now managed by a raspberry pi running Home Assistant.

I still have a scene that once a day does a

os.execute("echo 3 > /proc/sys/vm/drop_caches")

and another that once a week does a

os.execute( 'reboot' )

but I suspect they are no longer required.

I will say that migrating to HS is surprisingly fast. The device enrollment process takes much less time and their dense gui uses cascading menus to build rules for events. It goes quite fast.