Prevent frequent Luup reloads when dealing with frequently updating virtual sensors

I recently added a lot (10?) of virtual sensors, and my system became less stable than before. Some of you may remember that I had to completely stop my virtual RGB lights from being sync’d, because sending the color every 15 minutes caused Luup reloads.

Anyway, the same is true of the generic variableset request as well. So I wrote this simple handler:

-- Request handler: writes the variable only when the value actually changes
function lr_updateVar(lul_request, lul_parameters, lul_outputformat)
	local serviceId = lul_parameters["serviceId"]
	local variable = lul_parameters["Variable"]
	local value = lul_parameters["Value"] or ''
	local devNum = tonumber(lul_parameters["DeviceNum"], 10)

	local updated, oldValue = setVar(serviceId, variable, value, devNum)

	luup.log(string.format('[lr_updateVar] %s - %s - %s - %s - %s - %s', serviceId, variable, tostring(devNum), value, oldValue or '', tostring(updated)))

	return tostring(updated), "application/json"
end

Here setVar prevents writing the variable again when the value is unchanged, and returns whether it wrote plus the previous value:

-- Write the variable only if the new value differs; returns (updated, previous value)
function setVar(sid, name, val, devNum)
    val = (val == nil) and "" or tostring(val)
    local s = (luup.variable_get(sid, name, devNum)) or ""
    if s ~= val then
        luup.variable_set(sid, name, val, devNum)
        return true, s
    end
    return false, s
end
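
For example, from startup or scene Lua you could also call setVar directly (device #6 and the temperature service below are just illustrative):

-- example: update a virtual temperature sensor only if the reading changed
local changed = setVar("urn:upnp-org:serviceId:TemperatureSensor1",
    "CurrentTemperature", "21.5", 6)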

The handler is then registered in your startup code:

luup.register_handler("lr_updateVar", "updateVar")

It can be called just like variableset (same parameters):

http://yourvera/port_3480/data_request?id=lr_updateVar&DeviceNum=6&serviceId=urn:micasaverde-com:serviceId:DoorLock1&Variable=Status&Value=1

After 2 days, I have a stable system again (12+ hours of Luup engine uptime, vs 2-3 hours before). There’s a memory leak in this area for sure with frequent writes. I hope this helps someone else.


A lot of that is very familiar-looking code. 🙂

Working on Reactor and my other plugins has made me keenly aware of where and how race conditions can lead to deadlocks in Luup. Changing approaches like this can have benefits, but sometimes you just end up shifting the problem away temporarily, and something else you do later brings it back, often worse. The situation is, IMO, the result of many years of Vera showing its disdain for plugin developers: plugins get blamed for problems that are actually rooted in the core Luup code, and so the underlying causes get no attention from the firmware developers. It takes an inordinate amount of screaming and wailing to get developer attention to any internal defect, and then months or years pass before fixes are forthcoming.

IMO, in a well-architected system, plugins would run, like JavaScript in a browser, in a walled garden sufficient to defend the system from crashes caused by plugin errors: crash the plugin, but not everything else. Any system crash that can be forced by walled code is viewed as a defect of the system protections, not just a defect of the plugin. That’s not Vera. Their system is a minefield of exceptions and bugs, so we end up contorting ourselves with things like this trying to get some kind of stability. In that sense, the end of this firmware is a welcome change. But given the culture, it remains to be seen how eZLO performs in this respect.


I 100% agree, but we have to deal with what we have now.

I’m shifting more and more to openluup, but I’m not ready yet, and I thought this could help someone in the future, so I posted it.


Indeed, and to that end, let me continue on that… most of the time I’ve run into problems when setting variables at high rates, the cause has not been setting the variables itself; it has actually been a timing/concurrency problem between watches triggered by the changing variables and other code running in the plugin. A common scenario is that a timed event runs (e.g. call_delay() executing its callback) at the same time a watch callback is invoked. In this situation it appears they are not mutex-locked by Luup (bug, or omission?), so they both run concurrently. If they both happen to manipulate the same table or other similar structure in Lua, this causes a memory leak, corruption, or a deadlock, and I’ve found it to be quite reliably reproducible.

So I’ve adapted my style of writing watch callbacks (and really all plugin functions) to do the absolute minimum and merely signal to another task that there is work to be done. You have probably noticed in looking at my code that much of the work centers around a task scheduler in which everything is handled by a single system timer, and the timer callback is actually a scheduler that determines which tasks need service and when (and thus reschedules itself). In Luup, if multiple timers expire, only one timer task per plugin runs at a time, so by forcing all work through this scheduler I eliminate concurrency problems with the other callbacks (watch, request, job). This is not unlike writing low-level interrupt handlers: you do as little as possible, as quickly as possible, and get the heck out; some other running process then notices that the interrupt handler has signaled it’s time to do the real work.
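
In a stripped-down form (just an illustration of the pattern, not the actual plugin code; all names here are made up), it looks something like this:

-- Sketch of the "do nothing in the callback, just queue work" pattern.
local taskQueue = {}        -- pending work items
local timerArmed = false    -- true while a call_delay() is outstanding

-- Watch callback: record what changed and arm the scheduler; no real work here.
function myWatchCallback(dev, sid, var, oldVal, newVal)
    table.insert(taskQueue, { dev = dev, sid = sid, var = var, value = newVal })
    if not timerArmed then
        timerArmed = true
        luup.call_delay("runScheduler", 1)   -- a single timer drives all work
    end
end

-- Scheduler: runs as the (one) timer callback, drains the queue, and reschedules
-- itself only if more work arrived while it was running.
function runScheduler()
    timerArmed = false
    local work = taskQueue
    taskQueue = {}
    for _, item in ipairs(work) do
        -- ... do the real processing for this change here ...
        luup.log(string.format("processing %s/%s on #%d = %s",
            item.sid, item.var, item.dev, tostring(item.value)))
    end
    if #taskQueue > 0 then
        timerArmed = true
        luup.call_delay("runScheduler", 1)
    end
end

-- Registered at startup (device number and service are illustrative):
-- luup.variable_watch("myWatchCallback",
--     "urn:upnp-org:serviceId:TemperatureSensor1", "CurrentTemperature", 6)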

If you’re doing any watches in the code that gave you trouble, I’m thinking that is probably the real problem you are tangling with. By using a request handler you may only have changed the timing enough to mask the problem for now, and it may come back. To be clear, I don’t think what you did is incorrect; I’m just saying you may ultimately find it isn’t enough (or someone else will, when they use your plugin).

Maybe they should have called it “Huup”, since we have to jump through it so much. 😉


No watches at the moment, but I plan to add a couple. Right now it’s just some code updating virtual sensors (light, temp, humidity, distance), and other routines read their current values “statically” when deciding what to execute.

I also added some code to my other logic outside of Vera (the sensors are populated via MQTT) to skip an update when the change is within a given tolerance (say, 1 cm for distance, 0.5 for temperature, 1% for humidity, etc.).
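
Roughly like this (a Lua-style sketch just to show the idea; the real code runs outside Vera, and the thresholds are only the examples above):

-- Deadband filter: only send when the value moved enough since the last send.
local thresholds = { distance = 1, temperature = 0.5, humidity = 1 }
local lastSent = {}

function shouldSend(sensor, value)
    local last = lastSent[sensor]
    if last == nil or math.abs(value - last) >= (thresholds[sensor] or 0) then
        lastSent[sensor] = value
        return true
    end
    return false
end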

Your findings make sense. I’ll try to add some task scheduling logic, if possible. It’s not a plug-in, but plain old startup code, at the moment. Thanks for the suggestions.
