Sensing error in lua for scenes and events

Is there a way to trigger a ReactorSensor if Lua reports an error after reloading?
I see that I can make Reactor tick at the Lua reload, but my intention is to report only when the reload ends in an error, which happens from time to time.

Nothing jumps out at me. I’m not aware of any signal for this from Luup, or any attribute set in user_data or anywhere we could look at.

There are also many failure modes that lead to this. A runtime error in startup Lua is one. A syntax error that makes startup or scene Lua a non-starter is another. That’s the more serious, because then nothing works at all. So getting consistency, no false negatives and no false positives, would not be easy, maybe not doable at all.

The only thing I can imagine as an experiment would be to set a value in an RS expression/variable, say to the current time, in startup Lua as the last thing it does, probably delayed by 60 seconds or so. Then in the RS, craft a way to test that value (at the right time) and determine if it’s reasonably proximate to a reload condition tripping. The idea is that if startup Lua can be loaded and runs to completion, the value is set; otherwise, it goes stale and you can detect that. I leave that as an exercise for the reader.

I am not sure this helps, and kwieto is a long time Vera user who probably already knows this, but VeraAlerts sends me notifications of reboots and restarts. If you are not expecting one of those, the notification could cause you to look at the controller to see if there is a problem. Sorry if that is not helpful.

Don’t know if this will help you … but I use a “deadman” approach to detect that. I’ve got two Veras running, and I have them send a message to each other (via HTTP requests) from a repeating scene once the startup Lua has completed. If one doesn’t hear from the other for two cycles, it sends a message to me.

So if the startup Lua doesn’t finish or scenes are not running, or if the system is hung for some other reason, I’ll get notified.

The 2nd device wouldn’t have to be a Vera – it could be anything where you could run some listening code.
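For anyone who wants to try the deadman idea, here is a minimal sketch of the listening side in plain Lua. It is not the poster's actual code: the function names, the 300-second cycle, and the "alert once" behaviour are all assumptions made for illustration. The listener records when it last heard from its peer and reports once when two full cycles have passed in silence.

```lua
-- Sketch only: all names and the cycle length are hypothetical.
local CYCLE = 300        -- assumed seconds between heartbeat messages
local MISSED_LIMIT = 2   -- alert after this many silent cycles

-- Create the watchdog state, treating "now" as the last time we heard from the peer.
function deadmanNew(now)
    return { lastHeard = now, alerted = false }
end

-- Call this whenever a heartbeat message arrives from the peer.
function deadmanHeartbeat(dm, now)
    dm.lastHeard = now
    dm.alerted = false
end

-- Call this periodically. Returns true once per outage, when the peer has
-- been silent for MISSED_LIMIT cycles; returns false otherwise.
function deadmanCheck(dm, now)
    local silent = now - dm.lastHeard
    if silent >= MISSED_LIMIT * CYCLE and not dm.alerted then
        dm.alerted = true
        return true
    end
    return false
end
```

On the listening device you would call deadmanHeartbeat() from whatever handles the incoming request, and deadmanCheck(dm, os.time()) from a repeating timer, sending your notification when it returns true.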

I had some time this morning over my first sip (of coffee) to play with this. Here’s what my proposed solution looks like in Reactor. This uses Reactor 3.5 (release candidate), which has a new “Expression Variable” condition type to make testing the “canary in the mine” variable easier, but it can still be done with 3.4 (Device State condition looking at the ReactorSensor’s variable, which must be exported).

First, here’s what goes in startup Lua. This should be the last thing your startup Lua does:

-- Place this at the END of your startup Lua -- it should be the last thing done
function tickleReactor()
    -- Replace 426 with the device number of your ReactorSensor. Make sure that RS
    -- has a variable "laststart" defined with no expression (blank).
    luup.call_action("urn:toggledbits-com:serviceId:ReactorSensor", "SetVariable",
        { VariableName="laststart", NewValue=os.time() }, 426)
end
luup.call_delay("tickleReactor", 30)

And here’s the ReactorSensor:

The group “Startup Check” will go true if the startup Lua fails.

A few details to note:

  • The “Luup Reloaded” condition has a “delay reset” of 60 seconds;
  • The group containing the “Luup Reloaded” condition is configured NOT AND
  • A timesince variable has been added to compute the difference between current time and laststart

This works pretty much as expected/described. On startup, the startup Lua code delays its update to laststart by 30 seconds; meanwhile, the RS delays its timestamp check by 60 seconds. When it finally checks, it tests whether the last successful update to laststart occurred less than 2 minutes ago (seemed like a safe range to pick). If so, startup Lua ran fine and the “Startup Check” group stays false. If startup Lua can’t run, however, laststart is not updated, so the delta-time gets larger at every subsequent reload (and is very likely larger than 120 seconds on the first failed reload), and that sets the “Startup Check” group true.

Note that the 120 second “allowance” in the time difference is meant to account for (a) the difference in time between when startup Lua runs and Reactor is started, which is not deterministic; and (b) the time required for your ZWave devices to settle, because ReactorSensors are not re-evaluated at startup until your ZWave network indicates that it’s up and running.
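To make the check concrete, here is the same logic written as a small plain-Lua function. This is only an illustration of what the ReactorSensor conditions compute, not Reactor's expression syntax and not code you need to install; the function name and the handling of a never-set laststart are my assumptions.

```lua
-- Sketch only: mirrors the "timesince > 120" test described above.
local ALLOWANCE = 120  -- seconds of slack between reload and the laststart update

-- laststart: timestamp last written by startup Lua (nil if never written)
-- now:       time at which the check runs, after the reload condition's delay
-- Returns true when startup Lua appears not to have run for this reload.
function startupLooksFailed(laststart, now, allowance)
    allowance = allowance or ALLOWANCE
    if laststart == nil then
        return true  -- assumption: a value that was never set counts as a failure
    end
    local timesince = now - laststart
    return timesince > allowance
end
```

For example, with a reload at t=1000, a laststart of 1030 checked at 1060 gives a timesince of 30, so the group stays false. If laststart is still from a reload an hour earlier, timesince is far above 120 and the group goes true.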

I’ll test it in practice.
What are the risks of installing the Reactor 3.5 release candidate, other than that Reactor itself may have a bug?

As for the error, I think it is related to the Netatmo plugin, but I’m not skilled enough to track it down. Lua reloads happen from time to time, and sometimes they result in this error; then the Netatmo plugin stops updating its data.
Currently I use the device state timestamp updates for the Netatmo plugin, which works pretty well (I get an alert if the plugin hasn’t updated for long enough), but I would like a more sophisticated solution that would let me force a Lua reload if it hangs, or alert me if the lack of updates has another cause.

The problem is that information about a reboot is not enough. Luup tends to reload from time to time. Most of the time it goes without issues (you may just experience some delays in running scenes), but sometimes it hangs (the “error in lua for scenes and events” message). The system then generally works (i.e. you can control your switches), but it doesn’t run scenes, for example, and some plugins stop working.
One could check the system every time Luup reloads, but that would be overkill in my opinion.

Good idea, but I don’t have any secondary device to pair with the Vera. The whole system is at a remote location, which doesn’t help either (it has to be 100% operable remotely).
I’ll try @rigpapa’s solution and post the results.

Reactor 3.5 is a release candidate. I haven’t released it because it’s my practice not to release software right before major holidays. While I am not aware of any issues that may interrupt Christmas, it’s a very complex environment we all work in.

If you want to do it in 3.4 and save yourself any angst and the trouble of the install, the rest of the example works, and the only changes you need to make are:

  1. Make the “timesince” variable exported (click the up-down arrow so it highlights green);
  2. Where I’ve shown a 3.5 “Expression Variable” condition, you will use a “Device State” condition, choosing “(this ReactorSensor)” as the device, and the “timesince” variable, and then “>” operator and 120 for the operand. The 3.5 “Expression Variable” condition is just a short-cut.

You’re in change freeze :slight_smile:

@rigpapa, one question about Reactor logic: why do you use the “NUL” group operator here?
If I want to run some actions based on the state, shouldn’t it be “AND”?

You can run actions on any group. My preference now is to make the root group NUL, and having no actions on it (because they would never run with NUL operator), and put my active groups within the root group. More modular–encourages more compartmentalized, reusable thinking, I find. So my examples going forward will be increasingly structured this way.