Securing and stabilizing the Vera by taking it off the grid

I think I am pretty close to where I can be on this topic. It seems I am now able to run almost indefinitely without Luup reloads caused by internet connection drops, network errors or memory issues. The one remaining source of Luup reloads might be the zwave network, which I hope the Luup engine code does not allow.

Disabling “networkmonitor” (aka “NM”) by adding an exit 0 to its start script has made the Vera much snappier, and the modification to my LogRotate.sh script has eliminated the Vera reloads. I have fixed the absurd MiOS system-time-resetting scripts and now rely entirely on the native OpenWRT scripts for time. The logging mechanism is also the main source of available-memory drops… so reliable log rotation is key.
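A minimal sketch of the “exit 0” trick, written for the sh/ash shell found on the Vera: the helper keeps the script's first line (assumed to be a "#!/bin/sh" shebang), inserts "exit 0" right after it so nothing else runs, and saves the untouched original as "<file>.orig". The function name and the ".orig" backup convention are hypothetical, not taken from the Vera firmware or from the author's scripts.

```shell
#!/bin/sh
# Sketch: neutralize a start script by inserting "exit 0" right after its
# first line (assumed to be the "#!/bin/sh" shebang).
# Hypothetical convention: the original is kept as <file>.orig for undo.

disable_script() {
    target="$1"
    [ -f "$target" ] || { echo "not found: $target" >&2; return 1; }
    # Already disabled? Leave it alone so the function is idempotent.
    if sed -n '2p' "$target" | grep -qx 'exit 0'; then
        return 0
    fi
    cp "$target" "$target.orig" || return 1
    # Rewrite in place: shebang, then "exit 0", then the original body.
    # Redirecting into the existing file keeps its permissions.
    {
        head -n 1 "$target.orig"
        echo 'exit 0'
        tail -n +2 "$target.orig"
    } > "$target" || return 1
}

# Example, with the script name given in the post:
# disable_script /usr/bin/Start_networkmonitor.sh
```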

I also disabled MiOSRestApi.sh, which contains functions that make calls to the MiOS API server. This now prevents the Vera from checking for new firmware updates, among other things, and could possibly be preventing the automated nightly zwave heals. It also makes me wonder about the monthly Luup reload, and the end of the month is coming…

Other mods: I updated “serialapi_controller_static_ZM5304_US.hex” to the latest version (6.81), which I obtained from Silabs. This does not seem to matter much, but the file now matches the firmware SDK; Vera apparently did not update it as part of the firmware upgrade, since the original file looks older and smaller. I have also rewritten some of the scripts so that the internet LED depends on successful NTP server checks rather than on the networkmonitor (which I disabled), and so that the service LED is linked to the Luup engine status.
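The LED change can be sketched as “run a check, light the LED only if it succeeds.” This assumes the LED is exposed through sysfs as /sys/class/leds/<name>/brightness (the name below is a placeholder) and that BusyBox ntpd is available with -q (quit once the clock is set), -n (stay in the foreground) and -p (server to query). Whether ntpd exits promptly with a non-zero status when no server answers depends on the BusyBox build, so that should be checked on the unit. This is not the author's rewritten script.

```shell
#!/bin/sh
# Sketch: drive an LED from the result of a check command (for example a
# one-shot NTP query) instead of from networkmonitor.
# Assumption: LED_FILE is a sysfs "brightness" file (placeholder name below).

# set_led_from_check LED_FILE COMMAND [ARGS...]
# Writes 1 to LED_FILE if COMMAND succeeds, 0 otherwise.
set_led_from_check() {
    led_file="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo 1 > "$led_file"
    else
        echo 0 > "$led_file"
    fi
}

# Example (hypothetical LED name; list /sys/class/leds to find yours):
# set_led_from_check /sys/class/leds/internet/brightness \
#     ntpd -q -n -p pool.ntp.org
```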

[quote=“rafale77, post:111, topic:199140”]I think I am pretty close to where I can be on this topic. It seems like I am now able to almost run forever without Luup reloads due to internet connection drops, network errors or any memory issues. […][/quote]

I hope you are in the process of creating some sort of tutorial for the rest of us to follow… Keep up the good work! watching this thread closely.

Lol, I am too. This is probably my favorite thread on this forum!

[quote=“tomtcom, post:113, topic:199140”]Lol, I am too. This is probably my favorite thread on this forum![/quote]

I definitely will. I am even considering writing a script you can execute to do all of this in one shot. So far it's been holding up. I have not had any spontaneous reload since, though for the sake of testing I did a few reloads manually. I will now let it be for a month and see if it ever reloads.
Available memory has been oscillating between 180MB and 203MB.
On a different note, I just wish mcv would provide a LuaUPnP test program for Windows like they used to on UI5, so that I could run some tests on another platform.
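To track the available-memory range mentioned above over time, a small helper can read /proc/meminfo. A sketch, assuming that on an older kernel without a "MemAvailable" line, available memory is approximated as MemFree + Buffers + Cached; this may not match the number the Vera UI or ALTUI displays.

```shell
#!/bin/sh
# Sketch: print "available" memory in MB from a meminfo file.
# Uses MemAvailable when present; otherwise approximates it as
# MemFree + Buffers + Cached (assumption for older kernels).

# mem_available_mb [MEMINFO_FILE]
mem_available_mb() {
    awk '
        /^MemAvailable:/ { avail = $2 }
        /^MemFree:/      { free = $2 }
        /^Buffers:/      { buffers = $2 }
        /^Cached:/       { cached = $2 }
        END {
            if (avail == "") avail = free + buffers + cached
            printf "%d\n", avail / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# Example: sample every 5 minutes (log path is a placeholder; keep it off
# the RAM-backed /tmp so the logging itself does not eat memory):
# while true; do
#     echo "$(date '+%F %T') $(mem_available_mb) MB" >> /path/to/memlog
#     sleep 300
# done
```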

Got the Luup reload at the first second of the 1st day of the month, which I haven't found a way to disable. The interesting thing I learned is the total time each reboot and reload takes on my system. Given how many devices I have, and since my startup Lua runs a number of functions to establish variable watches, my full reboot takes ~1m40s and the Luup reload takes 33s. My openLuup, which runs a lot more devices, plugins and all my automation, reloads in less than 3s on a single-threaded VM…

It also appears that I have somehow managed to disable the automated nightly heal, as ALTUI reports that my latest heal is 4 days old. This is interesting, as I have not yet figured out where or how this happened. I am wondering whether the heal is really not happening or whether it is only the logging of that event that no longer occurs. In any case, I have started thinking about writing a script to hack the time on the Vera, which would obviously mess up the logs too, to prevent all these unneeded time-dependent events.

I am sharing the summary of all the mods I did to the Vera Plus to stabilize it and essentially disconnect it (almost completely) from the mios server.
This was done on firmware 7.0.26.01 (1.7.3831), as the latest release had catastrophic issues with secure-class devices on my system. In spite of the help I got from hours on the phone with the mcv support team, I had not been able to make my Vera a reliable, set-it-and-forget-it system until now. It took a lot of testing and peeling back the layers of the system to conclude that the Vera is trying to do too much with too little. openLuup was one major leap toward stability, but the occasional Vera crashes were still annoying and required intervention.

With all these mods, my vera has not had any luup reload or vera reboots except for the 1st of the month at midnight.
I practically eliminated all the weird “device not detected” and data corruption issues.
I may somehow have also disabled the nightly heal, though I am still trying to confirm this: I used to see very random heal events reported by ALTUI, and none has recurred in a week.
All plugins that do not rely on the MiOS servers work. All scenes, Lua code and startup Lua work. I am only disabling the internet connectivity checks and the underlying services connecting to the MiOS servers. The App Store, which uses regular HTTP calls, still works.

What no longer works:

-Firmware automated check with the mios server
-vera notifications (I am using pushover, if you use vera alert, you likely don’t care)
-vera mobile apps as they rely on a remote access relay (I am running everything off of Homewave and openLuup and have stopped using the slow buggy iOS app years ago)
-logging to mios server. All the logs will remain local

The one service I have not been able to disable, and which still prevents the Vera from running completely without the internet, is the event server. The Vera sends alerts to it, and they accumulate unless one either dumps the files and does a Luup reload or allows the Vera to connect to its MiOS event server. This is in spite of great help and many attempts from the CC agent. The call is unfortunately made within the LuaUPnP program, which is the Vera engine itself and a compiled C++ binary.

So here are the mods:

  1. Unrelated to disconnection but purely for stability/reliability, I extrooted the Vera
    see here: http://forum.micasaverde.com/index.php/topic,103140.0.html

The main reasons for doing it are:
a. A high number of people, including myself, have reported NAND flash failures due to excessive write cycles on a very limited amount of storage.
http://forum.micasaverde.com/index.php/topic,109476.0.html
b. Limited storage has recently caused havoc on firmware upgrades, to the point of bricking Vera units.

Unfortunately I have only been able to make this work on the Vera Plus. The Vera Edge and older Veras do not seem to mount the external drive on boot.
This is low risk for anyone to try, since a failure would just make the Vera boot from the original NAND flash.

  2. Modified 6 files in the /usr/bin folder
    a. mios-services.sh: killed the service functions.
    b. MIOSRestApi.sh: disabled all the MiOS server calls.
    c. Start_networkmonitor.sh: disabled the network monitor, which is a source of Luup reloads, uses resources and is practically useless. It checks for network and internet connectivity and reboots/reloads the Vera when a check fails, which is the last thing I want it to do. It also seems to slow down the Vera.
    d. sync.time.sh: disabled. One of many time sync scripts; this one is redundant and obsolete.
    e. Start_LuaUPnP.sh: very slight mods to control the service LED on the unit. Since the network monitor is disabled, this LED would be off even when the Vera is up and running, so I made it depend on the Vera engine only.
    f. Rotate_Logs.sh: removed a nonsensical reload and reboot routine which would reboot the Vera upon a server connection failure, even if log uploading is disabled. I also brute-force disabled the log uploading.

  3. Modified files in /etc/init.d

check_internet, tunnels_manager.sh, wan_failover, cmh-ra, wol: all disabled for obvious reasons.
mios_fix_time.sh: modified to eliminate a strange time reset to Jan 1st 2000, though I could probably disable this script altogether given the number of redundant time-resetting scripts.

  4. Updated the zwave SDK API file to 6.81 (in /etc/zwave) to match the zwave chip firmware. I guess it is either a firmware packaging mistake or mcv purposefully kept a 2-year-old version of the file to save 20KB of space.
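The /etc/init.d changes can be made reversible by saving each script before touching it. A minimal sketch, assuming the listed scripts are standard OpenWRT rc.common scripts, which accept "stop" and "disable" ("disable" removes the /etc/rc.d start link). The backup directory is hypothetical, and this is not necessarily how modvera.sh does it.

```shell
#!/bin/sh
# Sketch: back up, stop and disable a list of init scripts.
# Assumption: the scripts follow OpenWRT rc.common ("stop"/"disable").
# BACKUP_DIR is a hypothetical location; override it if needed.

INIT_DIR="${INIT_DIR:-/etc/init.d}"
BACKUP_DIR="${BACKUP_DIR:-/root/vera-mod-backup}"

disable_services() {
    mkdir -p "$BACKUP_DIR" || return 1
    for svc in "$@"; do
        script="$INIT_DIR/$svc"
        if [ ! -f "$script" ]; then
            echo "skip: $svc not found" >&2
            continue
        fi
        # Keep a copy so the change can be reverted later.
        cp -p "$script" "$BACKUP_DIR/$svc" || return 1
        "$script" stop >/dev/null 2>&1
        "$script" disable >/dev/null 2>&1
        echo "disabled: $svc"
    done
}

# Example with the services named in the post:
# disable_services check_internet tunnels_manager.sh wan_failover cmh-ra wol
```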

I have attached a zip file to decompress and upload on the vera and run modvera.sh which will update all the files and reboot the vera in one shot.

Usage: decompress the zip and SCP the entire folder into the Vera. Then SSH into the Vera, go into the folder and type the following:

chmod +x modvera.sh
./modvera.sh
At the end of the script you will get an ash error, which is due to the script deleting itself, so you can ignore it.

In combination with openLuup running all my automation and plugins, I now finally have a stable system!

Rafale, wow! Good description. Though for me this is a bridge too far…

I want to suggest that Melih read your post very carefully, because I have the same reasons you had for doing all this.

Especially the buggy gen5 secure class zwave support (nonsupport) is breaking up all I have had running fine until this release. It is the culprit of all issues I have on my system.

Keep up the good work!

Great write up and I really appreciate all the investigative work you have done. I have no idea what most of what you have written means as I’m not technical but I believe it keeps pressure on Vera to get things sorted…or I’ll be joining the crowd moving elsewhere unless we see something soon. Not the six months that keeps getting mentioned.

Ran the script on my test Vera Plus with no issues. I’ll keep an eye on things and report back.

Thanks for all of the good work!

Thank you for testing. I just installed this on a test Vera running the latest 7.0.27 (4001) firmware as well, with no problems. There is actually no difference in OS build between these two versions, so the only difference between them is in the Vera programs themselves. I will also write a script to revert the changes in case it is needed.

Attached is the undo script.

To use it, upload to vera and do

chmod +x UnModVera.sh
./UnModVera.sh
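An undo step along these lines can be sketched as moving saved originals back into place. This assumes a hypothetical convention where each original was saved as "<file>.orig" next to the modified file; UnModVera.sh itself may work differently.

```shell
#!/bin/sh
# Sketch: restore originals saved as "<file>.orig" (hypothetical convention).
# Files without a backup are reported and left untouched.

restore_originals() {
    for f in "$@"; do
        if [ -f "$f.orig" ]; then
            mv "$f.orig" "$f" && echo "restored: $f"
        else
            echo "no backup for: $f" >&2
        fi
    done
}

# Example, then reboot so the original services start again:
# restore_originals /usr/bin/Start_networkmonitor.sh /usr/bin/Rotate_Logs.sh
```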

Thanks for the undo script!

Things are running fine, but this test unit is lightly loaded, so I’ll add some devices over the next few days to make the test more useful.

FYI, ALTUI shows a Z-Wave heal occurred at 1:00AM today. I’ll keep an eye on this.

Interesting that you still have a nightly heal. I had a luup reload at 1am this morning due to the daylight saving time change and I was expecting it. The luup engine does not like time changes. I have gotten to the point of wondering if it actually also reloads when there is an ntp time adjustment…
This is such a poor design. Imagine a plane rebooting its engine and instruments mid air because its clock time drifted or because it lost communication with its black box… This is exactly what the vera does. Why would one allow a peripheral function in your software to trigger a full reboot of the system?
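One way to test whether NTP adjustments also trigger reloads is to log clock steps and compare their timestamps with the reload times in the Luup log. A minimal sketch, assuming BusyBox "date" with "+%s" and "+%z": epoch seconds are not affected by daylight saving time, so DST changes are caught separately by watching the UTC offset. The 60 s interval and 5 s tolerance are arbitrary illustration values.

```shell
#!/bin/sh
# Sketch: log clock steps (e.g. NTP corrections) and UTC-offset changes
# (e.g. DST) so they can be matched against Luup reload times.

# clock_jumped PREV_EPOCH NOW_EPOCH EXPECTED_SECONDS TOLERANCE_SECONDS
# Returns 0 if the observed gap differs from the expected one by more
# than the tolerance.
clock_jumped() {
    delta=$(( $2 - $1 - $3 ))
    [ "$delta" -lt 0 ] && delta=$(( -delta ))
    [ "$delta" -gt "$4" ]
}

watch_clock() {
    interval=60
    prev=$(date +%s)
    prev_tz=$(date +%z)
    while sleep "$interval"; do
        now=$(date +%s)
        tz=$(date +%z)
        if clock_jumped "$prev" "$now" "$interval" 5; then
            echo "$(date '+%F %T') clock stepped by $(( now - prev - interval ))s"
        fi
        if [ "$tz" != "$prev_tz" ]; then
            echo "$(date '+%F %T') UTC offset changed $prev_tz -> $tz"
        fi
        prev=$now
        prev_tz=$tz
    done
}
```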

[quote=“HSD99, post:122, topic:199140”]Thanks for the undo script!

Things are running fine, but this test unit is lightly loaded, so I’ll add some devices over the next few days to make the test more useful.

FYI, ALTUI shows a Z-Wave heal occured at 1:00AM today. I’ll keep an eye on this.[/quote]

Let me know how things are going. I am still not seeing any nightly heal on my unit, and I don't know how it got disabled. It might be in the zwave stick, due to something I did with another controller, but I thought the heal was triggered by the controller software and not by the zwave chip. After the reboot at the 1st second of the month, I have not had any reload. I have since tweaked the service LED behavior some more so it better follows the Luup engine status. I also upgraded all the non-kernel OpenWRT packages on my unit, which include lighttpd (the web server), OpenSSL, busybox and the Lua libraries. It took some tweaking (symbolic links) of the library files. The Vera Plus runs a version of OpenWRT which does not officially support its CPU (MT7621ST). It is supposed to support the older CPU of the Vera Edge (MT7620), but I guess the two are similar enough to share the same packages. The kernel is very old…

It still shows a nightly heal at 1:00AM. Does it log the start or the end of the heal? I’m heading out of the country on business and won’t be able to play around with this for a couple of weeks.

It looks like it marks the start… I can't imagine it finishing at the same time every night.
I am on business travel overseas myself.
I played around with a second test unit and found that log rotation is a very large contributor to the reloads, either by running out of memory while rotating logs or through the script itself calling for a reload when the rotation fails or when it loses the network monitor.
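The two failure modes described here (running out of memory mid-rotation, and the script escalating a failure into a reload) suggest a rotation step that checks free memory first and simply reports failures. A minimal sketch, assuming a single log file, a "MemFree" threshold in kB and BusyBox gzip; it copies then truncates so the process writing the log keeps its open file handle. It is not the Vera's Rotate_Logs.sh.

```shell
#!/bin/sh
# Sketch: memory-guarded log rotation that never reloads or reboots;
# on failure it only reports and returns non-zero.

# rotate_log LOG_FILE KEEP MIN_FREE_KB [MEMINFO_FILE]
rotate_log() {
    log="$1"; keep="$2"; min_free="$3"; meminfo="${4:-/proc/meminfo}"
    [ -s "$log" ] || return 0   # nothing to rotate

    free_kb=$(awk '/^MemFree:/ { print $2 }' "$meminfo")
    if [ "${free_kb:-0}" -lt "$min_free" ]; then
        echo "skip rotation: only ${free_kb:-0} kB free" >&2
        return 1
    fi

    # Shift archives up by one; the oldest (.KEEP.gz) gets overwritten.
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$(( i - 1 ))
        [ -f "$log.$prev.gz" ] && mv "$log.$prev.gz" "$log.$i.gz"
        i=$prev
    done

    # Copy, then truncate in place so the writer keeps its file handle.
    gzip -c "$log" > "$log.1.gz" || { echo "compress failed" >&2; return 1; }
    : > "$log"
}
```

Copy-then-truncate can drop the few lines written between the gzip and the truncation; that is usually an acceptable trade for not having to restart the process that owns the log.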

Hi rafale,
I notice that you do all your external integration (Alexa, Sonos, etc.) through either HA or openLuup. For those of us who may not want to roll out multiple systems and only want something like the native Alexa integration, do you know where in the scripts you modified MIOS hid those settings?

Actually, the Alexa integration may still work after my mod. The Sonos one definitely does. I am isolating my Vera from the internet only through my firewall, and only one way: nothing on the internet can initiate a connection to the Vera, but the Vera can still reach the internet because of the event server problem I described previously.

All the local integrations like Sonos definitely work. Alexa goes through the habridge for me at this point, and I don't know whether it relies on the MiOS servers (cloud to cloud: Alexa device -> Amazon server -> MiOS API server or RA server -> Vera) or whether the unit connects directly to the Amazon server. If it does call the MiOS servers, there could be several files involved:
The MIOS_Rest_api may be one; cmh_ra + tunnels_manager (the remote access tunnel) would be the other. You could then delete these two files before running the script.

I’ll check those files. I ran the script and the native Alexa stopped working. I ran the undo script and everything was back to normal, so it must rely on the MiOS servers for some of the integration. Thanks for providing all this!

After a week, I just got a Luup reload that coincided with a daily zwave data backup triggered by my remote server. I am guessing that it either locked up LuaUPnP for too long or that the system ran out of memory while doing it. I am going to try increasing the memory cache dump frequency.