Vera Edge 7.31: sudden luup reloads

Three months ago I started my production unit from scratch: a Vera Edge with the most recent firmware update, 7.31. I excluded and re-included all my Z-Wave nodes and built up all my automation with Reactor, so I don’t use a single Vera scene. I have 10 Reactor Sensors. Besides that I only use the Switchboard, ALTHue and HouseModes plugins. I have 51 Z-Wave nodes connected.
Everything ran smoothly… until yesterday. Yesterday I suddenly had 10 luup reloads, with intervals of 40 minutes to two hours between them. During the night there were no luup reloads, and today (only) 2 so far (I live in Holland; it is now 9 pm).

I looked at the logs yesterday after the 10th luup reload. Here are some snippets from the logs:

So at 22:42 there was a luup reload.
What does “dirty data” mean?
What does “mongoose” mean?

After the restart I see a “got CAN” message. What does that mean?

I also SSH’ed into my Vera and checked the memory:

So in rootfs there is still 38% = 3.6M available.
But /dev/root is 100% full??

Who can help me:

What is the cause of these luup reloads, and how can I solve it?

Don’t worry about /dev/root—it’s a read-only filesystem, hence it will always show as 100% full. I’d restore a backup from a couple of days before the trouble and see if that helps. There may be a corrupted user.json file.

1 Like

You may also be able (if desired, as I did) to completely replicate all of the HouseModes plugin functions using Reactor. Just a thought to simplify things and clean house even more.
Replacing HouseModes Plug-In with Reactor

1 Like

+1 to this. In my experience, the more you use Reactor to replace other logic in your system, the happier you will be.

1 Like

Have you had any internet connectivity problems? I’ve noticed that instability of the Internet connection causes increased reloads.

2 Likes

I had internet problems a couple of weeks ago and my Vera entered a reboot cycle, with a full reboot every 20 minutes! Three days after that mess I wiped the unit and restored a backup, and I got 8 days before the next luup reload, with no reboots.

This firmware is so fragile that sometimes it goes against all the natural rules you’d expect from a home automation system.

1 Like

Thanks for your replies! I’m not aware of any internet connection problems 2 days ago.
I have restored a backup from 3 days ago. We will see how it goes.

I have a strong suspicion that Vera’s “clock discipline” is in large part responsible, and I’ve been experimenting with that path for a few weeks. The latest Reactor even got some additional clock sanity checks because of other discoveries I’ve made in the process.

At the moment, I am testing using a local (LAN) time server rather than Vera’s chosen pool of remote servers exclusively. This seems to be bearing some fruit, but I’ll say it’s not entirely conclusive as yet because I’ve been focused on other things and haven’t done specific, dedicated testing (injecting faults, etc.); I’ve just let it run to see how it goes. But my systems are stable, and my house system in particular seems to have little inclination to reload unless I cause it.

In the specific case that Internet access is down when the system boots up, if the clock cannot be synced to remote servers at that instant, it will be set to a default (fixed, incorrect) time and run until it can. This causes all kinds of problems with any automations that rely on time. But more importantly, when Internet is restored, the unit will sync time and reload Luup. Consider that the Vera can often boot faster than your “customer premises equipment” can reboot and reconnect to your ISP, so the Vera often comes up before Internet access is ready. I have observed that the Edge, especially, has some related issues… the sequence of boot events and the time those events take often seems to put it in the position where the start of LuaUPnP precedes the completion of the first setting of the clock.
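
If you want to watch for this effect yourself, one crude way (from any machine; nothing Vera-specific about it) is to compare elapsed monotonic time against elapsed wall-clock time; when NTP finally syncs after a boot without Internet, the wall clock jumps and the two disagree. A minimal sketch in Python, purely illustrative (the threshold and interval values below are arbitrary choices of mine, not anything taken from Vera’s firmware):

```python
# Illustrative only: detect a large wall-clock jump (e.g. a late NTP sync
# after boot) by comparing elapsed monotonic time with elapsed wall-clock time.
import time

JUMP_THRESHOLD = 30   # seconds of disagreement before we call it a jump (arbitrary)
CHECK_INTERVAL = 10   # how often to compare the two clocks (arbitrary)

mono_ref = time.monotonic()
wall_ref = time.time()

while True:
    time.sleep(CHECK_INTERVAL)
    drift = (time.time() - wall_ref) - (time.monotonic() - mono_ref)
    if abs(drift) > JUMP_THRESHOLD:
        print("wall clock jumped by about %.0f seconds" % drift)
        # re-baseline so each jump is reported only once
        mono_ref = time.monotonic()
        wall_ref = time.time()
```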

Additionally, in my prior testing on 7.30 with the Vera team that led to the discovery of the “no-wifi? no-internet? no-LuaUPnP!” bug (also called “Xmas Lights” because of its visible symptoms on the unit LEDs), I discovered that if an Internet outage lasts more than about an hour, Vera reloads unconditionally when Internet access is restored. This appears to be the behavior of a monitoring subsystem (the infamous NetworkMonitor) separate from the Xmas Lights problem, so I suspect that although Xmas Lights was fixed, the long-outage reload behavior still persists in 7.31.

It’s also the case that Vera uses several “well known” targets to determine if Internet is up or down, and at least on my system, among these are sites that are blocked in some geopolitical regions. If some of those targets are blocked, the pool of available targets shrinks, making the test potentially more sensitive than it should be. I’m not sure how they adjust the targets based on region, but they should (must). It would be educational to have someone in the EU or Asia look at the NetworkMonitor’s log file and report what servers it is attempting to ping. I note in my log that one of the servers it attempts to hit at startup is “test.mios.com”, and this reports as unresolvable, so right away we’re off to a potentially bad start.

I think these are among the things, likely just a few of many more yet to discover, that lead to the fragility @therealdb asserts.

And to think that much of this could be avoided with a few cents’ worth of parts (a battery-backed hardware real-time clock). IMO, no system used for home automation should be without one.

3 Likes

Hi rigpapa, thanks for your research and observations. I’m living in Europe (Holland), so how can I check that NetworkMonitor log file?

By the way, yesterday evening I restored a backup, and this morning I again got a luup reload (only 1 today, so much better than 2 days ago). It is now 4 pm in Holland.

I have disabled NetworkMonitor and I sync my time to a local time server. So it’s very strange, and probably buried in some hidden mechanism.

We’ll see… There can be various reasons for luup reloads, as @rigpapa and @therealdb explain, and as rafale77 did in the past. I have worked with a lot of Fibaro devices for more than 7 years on Vera controllers and have never had problems with ghost devices. I did have it once with an Aeotec MultiSensor 6 (I also have 3 of these), but that was with an older firmware version and in combination with other problems at the time, which I hope I have tackled now (started clean, no old plugins anymore, no scenes, and using the advised settings of rafale77).

1 Like

My Vera restarts randomly. Sometimes once a day; sometimes once a fortnight. Always different times of day!

So I chase down nearly all of my Vera’s unplanned restarts, but this “luvd_restart_mongoose” has me baffled. I don’t see a clear indication of what is causing it. I think this is new in 7.31; this is the first time I have seen it…

I also have no clue… who can help?

Yes, it’s new in 7.31 and was added to catch a mongoose restart and sync it with a luup reload. Previously mongoose just stopped accepting connections and required a manual restart.

Okay, and what does a mongoose restart mean? And what could be the cause?

Mongoose is a web server used by the luup engine to serve data, and when it crashes, the luup engine is restarted.

Okay, and what could be the reason that it crashes? I have enough memory left on my Vera.
I would like to prevent the luup reloads. For the past two weeks I have had 1 to sometimes 7-8 luup reloads per day. My Vera still functions as it should, but before this I only saw a reload once in a while.

I seem to recall it’s linked to too many HTTP requests. Back in the alpha stage I had some code that was making a lot of HTTP requests to my Vera, and I have since removed it, but stability is always up and down on a Vera.

I have to say, I’ve always been a little dubious about this, at least as a generality. I beat the heck out of all my Veras here with a home-spun dashboard, and when I do regression tests on Reactor, I do it with a Python script that hammers the system with action and variable requests as fast as it will take them, and I have never seen crashes that I can conclusively relate to all of this traffic. The regression tests can run 30-45 minutes per cycle, flat out and hammering away the whole time. No issues.
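
For anyone who wants to try something similar against their own unit, here is a rough sketch of the idea (this is not my actual regression script; the IP address, device number, service ID and variable name are placeholders you would replace with values from your own system), using the standard Luup HTTP API on port 3480:

```python
# Rough sketch: hammer the Luup HTTP API with cheap read requests for a minute.
# All identifiers below are placeholders; substitute values from your own unit.
import time
import urllib.parse
import urllib.request

VERA_IP = "192.168.1.100"                            # placeholder
DEVICE_NUM = 123                                     # placeholder
SERVICE_ID = "urn:upnp-org:serviceId:SwitchPower1"   # placeholder
VARIABLE = "Status"                                  # placeholder

BASE = "http://%s:3480/data_request" % VERA_IP

def data_request(params):
    """Issue one request against the Luup HTTP API and return the raw body."""
    url = BASE + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", "replace")

start = time.time()
count = 0
while time.time() - start < 60:
    # cheap read of a single state variable
    data_request({"id": "variableget", "DeviceNum": DEVICE_NUM,
                  "serviceId": SERVICE_ID, "Variable": VARIABLE})
    # an action request works the same way, for example:
    # data_request({"id": "action", "DeviceNum": DEVICE_NUM, "serviceId": SERVICE_ID,
    #               "action": "SetTarget", "newTargetValue": 1})
    count += 1

print("issued %d requests in 60 seconds" % count)
```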

I suspect there may be an issue with some particular request, or the way the request is structured/used (i.e. if you’re asking for the entirety of user_data every 30 seconds, you’re gonna have a bad time). If it wasn’t for the fact that I believe at this point we’re unlikely to see a 7.32, I’d say we should dig in more and see if we can isolate it, but at this stage, anything we find has a slim chance of being fixed, I think. At least we’d figure out what to avoid doing, though.
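
As an example of the lighter-weight alternative: instead of pulling the whole user_data over and over, a dashboard can use the incremental status request and only be told what changed since its last poll. A rough sketch (parameter and field names as I recall them from the public Luup HTTP API documentation; verify them against your own firmware):

```python
# Rough sketch of incremental status polling against the Luup HTTP API,
# rather than repeatedly fetching the full user_data. Parameter/field names
# (DataVersion, LoadTime, Timeout, MinimumDelay) are from the public docs;
# verify against your firmware before relying on them.
import json
import urllib.parse
import urllib.request

VERA_IP = "192.168.1.100"   # placeholder
BASE = "http://%s:3480/data_request" % VERA_IP

def get_json(params):
    url = BASE + "?" + urllib.parse.urlencode(params)
    # generous timeout to allow for the 60-second long poll below
    with urllib.request.urlopen(url, timeout=70) as resp:
        return json.loads(resp.read().decode("utf-8", "replace"))

# first call: full status snapshot, which provides the current markers
status = get_json({"id": "status", "output_format": "json"})
dataversion = status.get("DataVersion")
loadtime = status.get("LoadTime")

while True:
    # long poll: returns when something changes, or after Timeout seconds
    status = get_json({"id": "status", "output_format": "json",
                       "DataVersion": dataversion, "LoadTime": loadtime,
                       "Timeout": 60, "MinimumDelay": 1500})
    dataversion = status.get("DataVersion", dataversion)
    loadtime = status.get("LoadTime", loadtime)
    for dev in status.get("devices", []):
        print("device %s changed" % dev.get("id"))
```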

2 Likes