Luup Exit Code 245

This is appearing more frequently in my log, several times a day, for the last few weeks:

Sun, 12 Sep 2010 16:48:23 +1000 - Exit code: 245

Along with a restart of the Luup engine.

It occurs randomly without seeming to be related to any of the logs that precede it. Following it, of course, is the normal Luup startup log output.
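For anyone wanting to track how often this happens, here is a minimal Python sketch that tallies the restarts per day. It assumes the log lines look exactly like the one quoted above (an RFC 2822 style date, then ` - Exit code: N`). The function names are made up for illustration, and the script is meant to run on a PC against a copied log, not on the Vera itself.

```python
import re
from collections import Counter
from email.utils import parsedate_to_datetime

# Assumed line shape, taken from the single example quoted in this thread:
#   Sun, 12 Sep 2010 16:48:23 +1000 - Exit code: 245
EXIT_LINE = re.compile(r"^(?P<when>.+?) - Exit code: (?P<code>\d+)\s*$")


def parse_exit_line(line):
    """Return (datetime, exit_code) for a matching line, else None."""
    match = EXIT_LINE.match(line.strip())
    if match is None:
        return None
    try:
        when = parsedate_to_datetime(match.group("when"))
    except (TypeError, ValueError):
        return None  # the text before " - Exit code" was not a parsable date
    return when, int(match.group("code"))


def tally_exit_codes(lines):
    """Count occurrences per (ISO date, exit code) pair."""
    per_day = Counter()
    for line in lines:
        parsed = parse_exit_line(line)
        if parsed is not None:
            when, code = parsed
            per_day[(when.date().isoformat(), code)] += 1
    return per_day


sample = ["Sun, 12 Sep 2010 16:48:23 +1000 - Exit code: 245"]
counts = tally_exit_codes(sample)
```

Feeding it the lines of a saved log file (for example `tally_exit_codes(open(path))`, with `path` being wherever the log was copied to) gives a per-day count that makes it easier to see whether the restarts are getting more frequent.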

What’s causing it, and how do I make it stop?

UI2, 1.0.996, Vera1.

(Edit: I don’t see this happening in UI4 1.1.1047, so with any luck it’s been fixed.)

It will happen from time to time, depending upon the release you’re using. If MiOS “detects” that something is amiss, it will “kill” off everything and cause it to restart.

Technically, it should never need to do this, but it has the above mechanisms to prevent problems resulting from memory leaks, bugs, etc. If they’re occurring very frequently, you can often work out what’s going amiss by periodically looking at “top” and working out which process is leaking memory (the most common cause from what I’ve seen).
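If watching “top” by hand gets tedious, the same comparison can be roughly automated. The sketch below is an assumption-laden illustration, not something that ships with the Vera: it assumes a Linux system with Python available and reads the `Name` and `VmRSS` fields from `/proc/<pid>/status`. The function names are invented for the example. It compares two snapshots and lists the processes whose resident memory grew.

```python
import os
import re

_NAME = re.compile(r"^Name:\s+(.+)$", re.MULTILINE)
_RSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_status(status_text):
    """Return (process_name, rss_kb) from /proc/<pid>/status text, else None."""
    name = _NAME.search(status_text)
    rss = _RSS.search(status_text)
    if not name or not rss:
        return None  # kernel threads have no VmRSS line
    return name.group(1).strip(), int(rss.group(1))


def snapshot(proc_root="/proc"):
    """Map (pid, name) -> resident memory in kB for every visible process."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                parsed = parse_status(handle.read())
        except OSError:
            continue  # the process exited while we were looking
        if parsed:
            result[(int(entry), parsed[0])] = parsed[1]
    return result


def growth(before, after):
    """List (delta_kb, (pid, name)) for processes that grew, largest first."""
    deltas = [(after[key] - before[key], key) for key in after if key in before]
    return sorted((d for d in deltas if d[0] > 0), reverse=True)


# Typical use: take a snapshot, wait an hour or so, take another, then
# call growth(first, second). A process that keeps topping the list across
# many intervals is the likely leak.
```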

Over 8 years later and… this still seems to be my main source of Luup reloads. This morning it coincided with me upgrading the firmware of a network switch, but often I can’t correlate it with anything happening in the house. It is not very frequent: once every few days or weeks. It is not a memory leak; I am far from being out of memory or storage. I have no extensive plugins on the Vera either.

Having dug further into what could cause this, I now strongly suspect a command queueing error on the serial Z-Wave interface.
I know the eZLO devs are working on a new platform, but I have not seen any sign of it yet.
@melih, @Sorin:
Just in case this could be useful for your new platform, I think the main source of Luup reloads is a lack of error handling in the Z-Wave command queue. I am seeing in my logs a number of unexpected responses from some devices, which I think make the Luup engine think that these devices did not respond. Just before I get a Luup reload, I also seem to get very long lags in responses to Luup commands. That indicates that either the Luup engine or, more likely, the Z-Wave dongle/serial interface is busy waiting for or processing something and ends up hanging until a timeout, which then leads to the Vera doing a reload, after which it miraculously works again. The fact that the reload recovers it hints, though, that something between the Luup engine and the serial interface is out of sync. It is pretty rare, occurring randomly at intervals of a few days to a few weeks, but it is definitely a problem.

I hate to toot the horn of another system, but while the eZLO team is busy rebuilding this firmware: many competing systems handle Z-Wave in such a way that the Z-Wave dongle can be restarted or reset, and Z-Wave processing resumed, without restarting the entire system. It is an isolated subsystem restart. Interface hardware will always go wrong at some point. Just as network clients recover by re-opening sockets and re-authorizing when an existing connection times out or is closed suddenly by the far end, so too should the Z-Wave radio be handled. My 2p, and a hope that eZLO engineering has considered this approach for the future.

So that it’s said, this is no substitute for good error recovery and doing everything possible to avoid even the subsystem restart, which is a last-resort response. But it shouldn’t be necessary to nuke and rebuild the entire world in any case.
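To make the idea concrete, here is a rough Python sketch of an isolated subsystem restart. Everything in it is hypothetical (the `Link` interface, `SubsystemSupervisor`, and the `FlakyLink` test double are invented for illustration and say nothing about how Vera or eZLO firmware is actually built). On a timeout, only the radio link is closed and re-opened, and the error is escalated to the caller only after a bounded number of restarts.

```python
import time
from typing import Callable, Protocol


class Link(Protocol):
    """Hypothetical interface for a radio/serial link."""
    def open(self) -> None: ...
    def send(self, command: bytes) -> bytes: ...
    def close(self) -> None: ...


class SubsystemSupervisor:
    """Restarts only the link on timeout, leaving the rest of the system alone."""

    def __init__(self, make_link: Callable[[], Link],
                 max_restarts: int = 3, backoff_s: float = 0.0):
        self._make_link = make_link
        self._max_restarts = max_restarts
        self._backoff_s = backoff_s
        self.restarts = 0  # total link restarts performed so far
        self._link = self._start()

    def _start(self) -> Link:
        link = self._make_link()
        link.open()
        return link

    def send(self, command: bytes) -> bytes:
        attempts = 0
        while True:
            try:
                return self._link.send(command)
            except TimeoutError:
                if attempts >= self._max_restarts:
                    raise  # last resort: let the caller decide what to do
                attempts += 1
                self.restarts += 1
                self._link.close()
                time.sleep(self._backoff_s * attempts)  # linear backoff
                self._link = self._start()


class FlakyLink:
    """Test double: the first two sends (across instances) time out."""
    remaining_failures = 2

    def open(self) -> None:
        pass

    def close(self) -> None:
        pass

    def send(self, command: bytes) -> bytes:
        if FlakyLink.remaining_failures > 0:
            FlakyLink.remaining_failures -= 1
            raise TimeoutError("no response from radio")
        return b"ACK:" + command


supervisor = SubsystemSupervisor(FlakyLink)
reply = supervisor.send(b"dim 50")
```

The point of the design is that the caller of `send` never sees the two failed attempts. Only when the restart budget is exhausted does the failure propagate, and even then it is up to the caller whether a wider restart is warranted.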

Sigh, yeah, I see this a lot too.

I’ve reported this to the old engine team. In fact, this was a known issue at a certain level, but very hard to reproduce and fix. We’re just hoping we’ll have enough time to catch this issue as part of the next firmware.

Thank you for helping and pushing it to the devs. As you can see from the age of this thread, many of us have been longing for a fix.

If you need a backup file to restore to a Vera Plus to see the problems, I’ll be happy to provide one.

Trying to help with this a bit further.

I believe I was going down the wrong track in thinking that the OS timekeeping was causing the Luup reloads. I am noticing in the logs that the Z-Wave commands show a parameter called “tardy”, and that this parameter goes up and down (according to how busy the Z-Wave dongle is). It appears that I get a Luup reload whenever the “tardy” value exceeds something like 320 or 360, which is when I observe the Z-Wave commands lagging. It appears to be the time between when the Vera expects a response and when it gets it. I am also getting a lot of “got CAN” errors for specific dimmer commands, and I don’t know what their impact is.

This is the error I see when I get a reload. In this particular case I had a code 137, which is a self-triggered reload by the Luup engine.

03 07/16/19 10:58:41.831 JobHandler_LuaUPnP::Reload: TimeJump Critical 1 m_bCriticalOnly 0 dirty data 1 running 1 <0x7ef80520>
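In case it helps anyone collect these from their own logs, here is a small Python sketch that pulls the fields out of a reload line shaped like the one above. The regular expression is built from this single example only. Treating the first word after `Reload:` as the “reason” and reading the date as month/day/year are my assumptions, not documented behaviour.

```python
import re
from datetime import datetime

# Pattern derived from the one example line in this post; other firmware
# versions may format the line differently.
RELOAD = re.compile(
    r"^(?P<level>\d+)\s+"
    r"(?P<stamp>\d{2}/\d{2}/\d{2} \d{1,2}:\d{2}:\d{2}\.\d{3})\s+"
    r"JobHandler_LuaUPnP::Reload:\s+(?P<reason>\S+)\s+"
    r"(?P<detail>.*?)\s*<(?P<thread>0x[0-9a-fA-F]+)>\s*$"
)


def parse_reload(line):
    """Return a dict of fields for a Reload log line, or None if no match."""
    match = RELOAD.match(line.strip())
    if match is None:
        return None
    return {
        "level": match.group("level"),
        # Assumption: the date is month/day/two-digit year.
        "when": datetime.strptime(match.group("stamp"), "%m/%d/%y %H:%M:%S.%f"),
        "reason": match.group("reason"),
        "detail": match.group("detail"),
        "thread": match.group("thread"),
    }


line = ("03 07/16/19 10:58:41.831 JobHandler_LuaUPnP::Reload: TimeJump "
        "Critical 1 m_bCriticalOnly 0 dirty data 1 running 1 <0x7ef80520>")
info = parse_reload(line)
```

Grouping the parsed lines by `reason` over a few weeks of logs would show whether the same trigger is behind most of the reloads.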

Thanks, richie, please shoot me a backup file privately.

Thank you, adding this to the bug explanation.