lu_block_jobs

Hey AK,

Did you ever come across the titled command when you wrote openLuup? http://wiki.micasaverde.com/index.php/Luup_Plugins_ByHand#.3Crun.2Fjob.2Fincoming.2Ftimeout.3E

I’m asking because I’ve been trying to solve a dropped connection problem that I keep having with a plugin I have installed, and it occurred to me that it may be the result of a race condition of some sort. Basically, anytime the connection (TCP socket) drops, the plugin tries to re-open the connection with an io.open, to no avail. Here’s the code:

[code]luup.io.open(lul_device, PLUGIN_SETTINGS.PANEL_IP_ADDRESS, PLUGIN_SETTINGS.PANEL_IP_PORT)
luup.io.intercept(lul_device)
local wStatus = luup.io.write("^01,00$\r\n", lul_device)
local pData = luup.io.read(2, lul_device)[/code]

A luup reload will often, but not always, clear the problem. The drop happens most often after long periods of connectivity, or if another plugin is behaving badly (per error messages in the log). I can also clear the problem by logging into the exterior device via putty. Once cleared, the device will once again connect with openLuup on reload.

Anyway, just looking for advice and not any sort of action or investigation as part of what’s going on here is probably related to the way the exterior device handles repeated connections…

That link you gave was the most read Vera Wiki page ever when I was developing openLuup in the first place!

This is, undeniably, the worst implemented, worst documented, least understood, and most error-prone part of the Vera/MiOS Luup architecture.

Countless experiments by myself and others have failed to converge on the ‘correct’ implementation for openLuup. Here are a few:

[ul][li]Jan 2016[/li]
[list]
[li]http://forum.micasaverde.com/index.php/topic,35972.msg266532.html#msg266532[/li]
[li]http://forum.micasaverde.com/index.php/topic,35983.msg266695.html#msg266695[/li]
[/list]
[li]Apr 2017[/li]
[list]
[li]http://forum.micasaverde.com/index.php/topic,48814.msg320662.html#msg320662[/li]
[li]http://forum.micasaverde.com/index.php/topic,37268.msg319978.html#msg319978[/li]
[/list][/ul]

It’s worth, perhaps, reading at least a bit of those threads to understand some of the issues.

One basic problem is the lack of a [tt]luup.io.close()[/tt] function; another is the detailed understanding of what [tt]luup.io.intercept()[/tt] actually does; yet another is the differing requirements of devices once the connection is lost and needs to be re-established.

The solution generally adopted by plugin developers has been to abandon any attempt to use the [tt]luup.io[/tt] module in favour of returning to the LuaSocket library and the functions available there. You still have to overcome the semantic issue of what the particular device protocol requires.
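For illustration, the bare bones of that LuaSocket approach might look something like this (just a sketch; the IP, port, and timeout are placeholders, and every device will need its own login/handshake on top):

[code]local socket = require "socket"

local sock        -- the plugin’s own connection, replacing the luup.io-managed one

local function connect (ip, port)
    if sock then sock:close () end    -- the explicit close that luup.io lacks
    sock = socket.tcp ()
    sock:settimeout (5)               -- five-second connect/read timeout
    local ok, err = sock:connect (ip, port)
    if not ok then
        sock:close ()
        sock = nil
        return nil, err
    end
    return sock                       -- ready for whatever handshake the device requires
end[/code]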

Have you actually tried this on Vera and compared to openLuup? I’m really interested in any observed discrepancies. My expectation is that repeated calls to [tt]luup.io.open()[/tt] will invariably fail. The fact that a reload does not always clear the problem, but external intervention with the device does, indicates that this is perhaps not entirely an openLuup issue?

Not an easy one.


Edit: I should also note that I abandoned (early on) trying to emulate Vera exactly in the I/O implementation, witness the User Guide caveat that’s been there from day one:

Doesn't support the <incoming> or <timeout> action tags in service files, but does support the device-level <incoming> tag.

OK, read through those very informative posts, and will probably read through again, as there were some insights in the comms protocol that might be at work here. In the meantime, what do you make of the following? These are sequential log entries that show the socket not being released (I’m assuming this because the same socket id is used). openLuup does have an explicit close within the open function (I bracketed the close with log calls to get a sense of what’s happening, as well as capturing the return value of the close.)

_log("Now closing the socket for ".. tostring(device).. " ".. tostring(sock)) local tempClose = sock: close () -- close our end _log("should now be closed for ".. tostring(device).. " ".. tostring(ip) .." ".. tostring(port) .." " .. tostring(tempClose)) scheduler.socket_unwatch (sock) _log ("socket connection closed " .. tostring (sock))

Here are the relevant log entries:

[code]2019-01-22 20:18:53.373 luup.io.open:: connecting to 10.17.2.30:4025, using crlf protocol tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:18:53.373 openLuup.io:: connect OK

2019-01-22 20:19:03.143 luup.io.incoming:: bytes received: 6, status: OK tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:19:03.143 luup_log:7: (EVL3VistaAlarmPanel::processLogin) EVL3 Interface requesting Login.
2019-01-22 20:19:03.143 luup.variable_set:: 7.urn:micasaverde-com:serviceId:EVL3VistaAlarmPanel1.Plugin_Status was: Startup completed. now: Start Login #hooks:0
2019-01-22 20:19:03.143 luup.io.write:: message length: 6, bytes sent: 6, status: OK tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:19:03.144 luup.io.incoming:: bytes received: 10, status: OK tcp{client}: 0xaaab0c36f2a8

2019-01-22 20:19:03.151 openLuup.io:: Now closing the socket for 7 tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:19:03.151 openLuup.io:: should now be closed for 7 10.17.2.30 4025 1
2019-01-22 20:19:03.152 openLuup.io:: socket connection closed tcp{client}: 0xaaab0c36f2a8

[/code]

and then minutes later, after the explicit close, there is an attempt to open the non-existent socket:

[code]

2019-01-22 20:22:04.556 luup.variable_set:: 7.urn:micasaverde-com:serviceId:EVL3VistaAlarmPanel1.ConnectionDown was: 0 now: 1 #hooks:1
2019-01-22 20:22:09.562 luup.io.write:: error: timeout tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:22:11.565 luup.io.read:: error: timeout tcp{client}: 0xaaab0c36f2a8
2019-01-22 20:22:11.566 luup_log:7: EVL Data for io read while down nil iostatus false lul_device 7 lug_device 7
[/code]

The attempt to open the non-existent socket is part of the code I posted in my opening question.

Is the release and re-acquisition of a socket supposed to be handled in the device code? If so, how does one do that using luup.io? I’m asking in the sense that I cannot envision doing anything different if I were to create my own socket connection: I would not proceed differently, as far as flags, timeouts, etc., from what is now being done internally by openLuup.

As always, your insights greatly appreciated.

And here’s an example where the socket close occurs directly after a luup reload and what seems to be a successful data read:

[code]2019-01-22 20:57:32.122 luup.io.incoming:: bytes received: 6, status: OK tcp{client}: 0xaaab0ce3aa18
2019-01-22 20:57:32.122 luup_log:7: (EVL3VistaAlarmPanel::processLogin) EVL3 Interface requesting Login.
2019-01-22 20:57:32.122 luup.variable_set:: 7.urn:micasaverde-com:serviceId:EVL3VistaAlarmPanel1.Plugin_Status was: Startup completed. now: Start Login #hooks:0
2019-01-22 20:57:32.123 luup.io.write:: message length: 6, bytes sent: 6, status: OK tcp{client}: 0xaaab0ce3aa18
2019-01-22 20:57:32.124 luup.io.incoming:: bytes received: 10, status: OK tcp{client}: 0xaaab0ce3aa18
2019-01-22 20:57:32.124 openLuup.http:: GET /data_request?id=lu_status2&output_format=json&DataVersion=219291509&Timeout=60&MinimumDelay=1500&_=1548216570158 HTTP/1.1 tcp{client}: 0xaaab0dad8d28
2019-01-22 20:57:32.125 openLuup.http:: GET /console?page=log HTTP/1.1 tcp{client}: 0xaaab0ddb9858
2019-01-22 20:57:32.125 openLuup.wsapi:: using openLuup/console.lua for console
2019-01-22 20:57:32.125 openLuup.wsapi:: using REQUIRE to load .lua CGI
2019-01-22 20:57:32.143 openLuup.http:: request completed (57475 bytes, 0 chunks, 18 ms) tcp{client}: 0xaaab0ddb9858
2019-01-22 20:57:32.144 openLuup.io:: Now closing the socket for 7 tcp{client}: 0xaaab0ce3aa18
2019-01-22 20:57:32.144 openLuup.io:: should now be closed for 7 10.17.2.30 4025 1
2019-01-22 20:57:32.144 openLuup.io:: socket connection closed tcp{client}: 0xaaab0ce3aa18[/code]

Yes. As I said…

Regarding this…

[quote]And here’s an example where the socket close occurs directly after a luup reload and what seems to be a successful data read[/quote]

…it would be the expected behaviour if the remote device closed the socket at its end after sending. Given that this whole transaction happens in milliseconds, then I’m pretty sure that’s what it’s doing.

Let me ask again…

[quote]Have you actually tried this on Vera and compared to openLuup?[/quote]

…because I’m imagining that you will find the same behaviour there.

Perhaps what you are asking for is a DIFFERENT behaviour from Vera, wherein you can re-open a socket?

Yes, sorry I didn’t answer about a Vera install. Ironically, this very program that is giving me grief is the plugin that made me buy a Veralite in the first place, as I wanted integration with my alarm system, and at the time the EVL plugin was the only thing available. With further irony, it is this plugin that led me to openLuup, as when the plugin was loaded on my Vera, the machine would do a luup restart several times a day. Not to mention some oddities with Vera itself about reporting sensor states. So it is impossible to tell what kind of behavior was occurring, because the entire machine was unstable. Needless to say, it worked out for the best, as openLuup has been a lot of fun and has led me to all kinds of interesting integrations! That being said, I believe there are others using this plugin on Vera (it may work better on the faster machines with more memory.)

To the problem at hand, one thing I’m not clear on is whether the same socket ID (in this case 0xaaab0ddb9858) is assigned to a socket within a given instance of openLuup regardless of whether it has been previously closed. I’m guessing that the ID is a handle to the socket and not a hard identifier of an IP address + port combination, as the ID changes with a luup reload. So part of the problem may be that a new socket handle needs to be created in the event of a socket close (closed internally by openLuup) that is then followed by an open attempt.

As for the millisecond timeout, this may also be the result of not supplying a valid login on a retry of the open. That is certainly a code bug that I can fix with a hack, so I need to look closer at the retries. The EVL will immediately kick the connection if you try to write to the device while not logged in.

One thing that I did find with testing is that if I increase the value of the IO timeout constants from 5 to 15, the plugin loads more reliably on a luup reload. So maybe a partial fix would be to either allow a timeout value to be passed to io.open as an extra parameter, or to expose the timeouts so that they can be set dynamically in the luup variables. The first approach would allow more customization per plugin…
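To make the first suggestion concrete, a hypothetical call might look like this (purely illustrative; no current Vera or openLuup API accepts the extra argument):

[code]-- HYPOTHETICAL extra parameter; not part of any current API
luup.io.open (lul_device, PLUGIN_SETTINGS.PANEL_IP_ADDRESS,
                          PLUGIN_SETTINGS.PANEL_IP_PORT, 15)   -- 15-second timeout[/code]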

Ah.

[quote]To the problem at hand, one thing I’m not clear on is whether the same socket ID (in this case 0xaaab0ddb9858) is assigned to a socket within a given instance of openLuup regardless of whether it has been previously closed. I’m guessing that the ID is a handle to the socket and not a hard identifier of an IP address + port combination, as the ID changes with a luup reload. So part of the problem may be that a new socket handle needs to be created in the event of a socket close (closed internally by openLuup) that is then followed by an open attempt.[/quote]

Yes, this is exactly right. The whole implementation was based on my understanding of Vera’s behaviour: that a single socket is assigned to the device on startup (if an IP is given) or on io.open() and is not subsequently changed. This is what I meant when I said that there is no luup.io.close(). Having come to a point where the socket has timed out or been closed from the remote end, there IS no routine in the defined API to close the socket and allow another call to luup.io.open(); without doing that explicitly, there’s no way to handle any initial login/handshake that a new TCP connection may require.
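Just to spell out the gap, if such a routine did exist, recovery might look like this (hypothetical, of course; no such call exists in either Vera or openLuup):

[code]-- HYPOTHETICAL: luup.io.close() exists in neither the Vera API nor openLuup
luup.io.close (lul_device)                 -- release the dead socket
luup.io.open (lul_device, ip, port)        -- acquire a fresh one
-- ...followed by whatever login/handshake the device requires[/code]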

Having said that, since the Vera implementation is so poor, and therefore the openLuup copy equally so, I’m not beyond changing the way it works for the better. I just never got my head around HOW that would exactly work. This may indeed be the very test case that I was looking for.

[quote]So maybe a partial fix would be to either allow a timeout value to be passed to io.open as an extra parameter, or to expose the timeouts so that they can be set dynamically in the luup variables. The first approach would allow more customization per plugin…[/quote]

Yes, possibly. I’m mostly against adding extra parameters to the Luup API (in case they add ones in the future) but, on the other hand, I have done that in extremis a couple of times.

I’m willing to take a look at this if you’re able to test out some of my attempts.

Yes, I have a secondary linux machine that I use for testing so there’s no problem with running untested openLuup versions.

Hi,

I am currently thinking of testing the waters with luup.io and the incoming routine for a websocket connection, rather than how I do it now with a socket and a one-second call_delay checking for waiting data using socket.select. The remote side will close the connection after one minute of idle time, so I can test what works. Last time I tried, it seemed that the raw protocol is not 100% raw, causing some problems on Vera at least. Also, using incoming for all incoming data proved to be sooo slow it was useless. So I want to try to use luup.io.read after receiving the first byte (or two). Whether a new luup.io.open is needed, I would make dependent on the result of luup.io.is_connected.
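For context, the polling approach I use now looks roughly like this (names are illustrative, not my actual plugin code):

[code]local socket = require "socket"

local sock      -- the open TCP connection (set up elsewhere by the plugin)

local function handleData (data)      -- stand-in for the real handler
    if data and #data > 0 then luup.log ("received " .. #data .. " bytes") end
end

function pollSocket ()
    if sock then
        local ready = socket.select ({sock}, nil, 0)   -- zero timeout: never blocks
        if ready[1] then
            sock:settimeout (0)                        -- non-blocking read
            local data, err, partial = sock:receive (1024)
            handleData (data or partial)
            if err == "closed" then sock = nil end     -- peer hung up; reconnect elsewhere
        end
    end
    luup.call_delay ("pollSocket", 1, "")              -- check again in one second
end[/code]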

Once I have results for both Vera and openLuup I’ll let you know.

Cheers Rene

Excellent! Many thanks for the offer.

I’m following this conversation as well, and working in the background on making WebSocket work with the Emby plugin, and a non-WebSocket continuous feed for Yeelight. I had both working with direct socket calls, polling the socket, same as reneboer. Now I’m converting Emby to <incoming> and luup.io, to see if I can get something more like async behavior.

One thing I’ve noticed right away that does not work: if you have a child device, and handleChildren=1, and the children have their own sockets for talking to different endpoints (so you call luup.io.open with the child device number), you can luup.io.write() and luup.io.read() no problem, but I have not been successful getting incoming data routed through the parent implementation’s <incoming> block to the child. I can see the data coming in (log level 52), but the <incoming> block is never called. If I make all the luup.io calls using the root device number, the <incoming> block starts working, but that’s not a solution, because it seems only one socket can be bound to a device in the luup.io model, so the parent can’t maintain sockets for more than one child. I would call this a (Vera) bug (perhaps an incomplete implementation), because everything else on children seems to work, including the use of the protocol and port settings in the device files for the child types.

I’ve got to run off to do some errands this afternoon, but when I return I’m going to see if <incoming> blocks on jobs for child devices work. That could be a workaround.

Not surprised it doesn’t work for the child devices… no one ever told me that it should! Are you sure that’s the way it works on Vera? Where might that be documented?

I lost the will to live trying to reverse engineer the Vera I/O, so I really stopped at the basics, but even then got frustrated by the lack of accurate documentation. I ended up doing a fair amount of experimentation.

With your help, though, I’m happy to try and improve!

No, that’s not openLuup. Sorry if I wasn’t clear about that. It’s Vera that’s broken. Vera can’t route data for child devices through the top-level <incoming>. That’s a real pain in the backside. Everything else for the child works: they can have different protocols and ports, etc. All fine. It’s just the <incoming> routing that fails. So, for a solo device like an alarm panel, where there’s only one endpoint to talk to, that’s no problem; the parent can do all of the talking and everything works. But if you need a dozen sockets because you have a dozen children connected to a dozen endpoints… out of luck. The only thing that might save it is if the device reply includes some identifying information that would let you figure out which endpoint the message came from and dispatch it to the correct device, but for Yeelights, for example, no such luck. If Vera had even included the IP address of the peer in the incoming callback… hmmm… I wonder if that’s hiding in there somewhere… time to go dump _G in the callback and see what it’s got…

[quote]I lost the will to live trying to reverse engineer the Vera I/O, so I really stopped at the basics, but even then got frustrated by the lack of accurate documentation. I ended up doing a fair amount of experimentation.[/quote]

;D ;D ;D ;D ;D

It’s not good. Especially with the above.

My bigger-scope prediction is that the bulk of IoT devices are going to be WiFi, and Z-Wave and its ilk will slowly be relegated to a very specific subset of devices. Time to market and cost will be drivers away from protocols like Z-Wave, not to mention the ubiquity of WiFi. So, for that to be at all sane, Vera needs to get its act together on sockets. LuaSocket under Luup doesn’t cut it, and luup.io isn’t showing well either. This all needs fresh eyes on it.

So, good news for me, but probably not for you, @akbooer

If, instead of trying to use the “global” <incoming>, I use <incoming> in a job action that runs on the child, the received data is dispatched correctly (on Vera) to the job (and therefore the child).

[code]<action>
  <serviceId>urn:toggledbits-com:serviceId:EmbyServer1</serviceId>
  <name>Update</name>
  <job>
    return 5,30 -- waiting for callback, 30-second timeout
  </job>
  <incoming>
    EmbyPlugin.handleIncoming( lul_data, lul_device )
    return 5,30 -- waiting for callback, 30-second timeout
  </incoming>
</action>[/code]

This is for a luup.io.open() issued from the parent with the child device number?

Yes, and same for IP assigned as attribute to the child device.

Hi,

Got some testing done, and basically I get the following on the intercept function. It seems to work as documented on Vera. That is:

  1. call intercept
  2. call write
  3. call read

What makes no difference is calling intercept when just receiving. In my example I read the WebSocket protocol: a few header bytes, including the payload length, and then that payload. So I tried to have the incoming handler process the header and then use luup.io.read to read the payload. That basically does not work, and adding intercept makes no difference. I have to use the incoming handler to process the payload bytes.
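To make that concrete, the byte-at-a-time parsing in the incoming handler looks roughly like this (illustrative only; it assumes unmasked server-to-client frames with the basic 7-bit payload length, and onIncoming stands in for the code in the <incoming> block):

[code]-- minimal state machine for small, unmasked WebSocket frames
local state, buf, need = "header", {}, 2

local function onIncoming (byte)                  -- one raw byte per call
    buf[#buf + 1] = byte
    if #buf < need then return end                -- keep accumulating
    if state == "header" then
        local len = string.byte (buf[2]) % 128    -- low 7 bits = payload length (< 126 assumed)
        if len == 0 then
            buf, need = {}, 2                     -- empty frame: stay in header state
        else
            state, buf, need = "payload", {}, len
        end
    else
        local payload = table.concat (buf)        -- the complete payload
        luup.log ("payload: " .. payload)         -- stand-in for real processing
        state, buf, need = "header", {}, 2        -- ready for the next frame
    end
end[/code]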

luup.io.is_connected is not a good indication of an open connection. After the peer closes the socket at its end, is_connected still returns true. In my case the peer then starts replying with an HTTP 500 message, so I could use that to restore the connection.

Next test is openLuup.

Cheers Rene

[quote=“reneboer, post:17, topic:200471”]Hi,

Got some testing done, and basically I get the following on the intercept function. It seems to work as documented on Vera. That is:

  1. call intercept
  2. call write
  3. call read

What makes no difference is calling intercept when just receiving. In my example I read the WebSocket protocol: a few header bytes, including the payload length, and then that payload. So I tried to have the incoming handler process the header and then use luup.io.read to read the payload. That basically does not work, and adding intercept makes no difference. I have to use the incoming handler to process the payload bytes.

luup.io.is_connected is not a good indication of an open connection. After the peer closes the socket at its end, is_connected still returns true. In my case the peer then starts replying with an HTTP 500 message, so I could use that to restore the connection.

Next test is openLuup.

Cheers Rene[/quote]

Rene, my testing (also doing WebSocket) pretty much concurs. Trying the identical thing (using io.read() to get the payload data once I have the length), I came to similar conclusions: switching to luup.io.read() for the payload yielded no benefit performance-wise. I was able to get it to work by using intercept() first, and noted that if I didn’t call intercept() first, a deadlock often resulted. IMPORTANT: this is specifically from within the <incoming> handler. In other code, skipping the intercept() call does not result in an io.read() hang; you just run the risk of missing data.

That said, I have successfully used luup.io.read(), intercept() and <incoming> together, like this (in “raw” mode), to handle getting the connection upgraded to WebSocket (a condensed sketch follows the steps below):

  1. Call luup.io.intercept();
  2. Use luup.io.write() to send the HTTP request with headers to upgrade to Websocket;
  3. Use luup.io.read() to gather response data, scanning for end of headers (consecutive cr/lf/crlf);
  4. Celebrate successful conversion to WebSocket;
  5. Send a WebSocket request (no-op is fine); this is necessary to reset the effect of the previous intercept();
  6. Use from that point forward.

I imagine this is similar to what you are doing.
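For reference, a condensed sketch of those steps (the path, host, and key are placeholders, and luup.io.read behaviour in raw mode is exactly the murky part, so treat this as illustrative):

[code]luup.io.intercept (lul_device)                        -- step 1: claim the incoming data
local request = table.concat ({
    "GET /socket HTTP/1.1",                           -- placeholder path
    "Host: 192.168.0.10",                             -- placeholder host
    "Upgrade: websocket",
    "Connection: Upgrade",
    "Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==",    -- placeholder key
    "Sec-WebSocket-Version: 13",
    "", ""}, "\r\n")
luup.io.write (request, lul_device)                   -- step 2: send the upgrade request
local buf = {}
repeat                                                -- step 3: gather until end of headers
    local chunk = luup.io.read (5, lul_device)        -- 5-second timeout
    buf[#buf + 1] = chunk
    -- headers are small, so the repeated concat below is harmless
until not chunk or table.concat(buf):find("\r\n\r\n", 1, true)
-- step 4: check the response for "101 Switching Protocols"
-- step 5: send a no-op WebSocket frame to reset the previous intercept()
-- step 6: the <incoming> block handles everything from here on[/code]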

But have you noticed the performance? Really, really poor. Appalling, actually. I removed all processing and just counted bytes and returned immediately from my incoming handler, trying to see how fast it would pass a 32KB JSON from the WebSocket endpoint, and the best I could do (on my Vera3) was just shy of 1800 bytes per second (18 seconds for the 32KB response: awful, just terrible). Receiving one byte at a time is just way too much overhead. I’ve already sent a note to Sorin requesting a “rawblock” protocol to return the entirety of a larger socket.receive() (e.g. 1024 bytes, connection MTU, or something even larger). But of course, that will be far off, if they agree to do it at all. For my application, as it is today it’s a non-starter: it takes 15-20 seconds to receive the average payload, but updates come every 5 seconds or less on an active endpoint, so it can’t keep up, not even close.

@akbooer, it would be great for openLuup if we could set a switch somewhere (openluup.longrawread = true?) to have the raw protocol return sock:receive(1024) or larger as described. At least then, performance on openLuup would shine over Vera (again).

Note: I also can’t stress enough that, as a best practice, string concatenation should not be used to reassemble the packet in an incoming handler. String concats in Lua are slowwww, and get worse as the string grows. It’s much faster to insert the data into a table, and then concat the table to a string when you have the entire payload.
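For example, the pattern in miniature (lul_data being the chunk delivered to the handler):

[code]local chunks = {}                     -- accumulates the incoming pieces

-- for each chunk, inside the <incoming> handler:
chunks[#chunks + 1] = lul_data        -- table insert: cheap, no string copying

-- once the whole payload has arrived:
local payload = table.concat (chunks) -- single join at the end
chunks = {}                           -- reset for the next message[/code]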

This, of course, is why the crlf protocol is good, receiving a whole line at a time. Unfortunately, you don’t have that option for a binary stream.

[quote]@akbooer, it would be great for openLuup if we could set a switch somewhere (openluup.longrawread = true?) to have the raw protocol return sock:receive(1024) or larger as described. At least then, performance on openLuup would shine over Vera (again).[/quote]

So what would terminate the read in the event of a shorter stream?

[quote]Note: I also can't stress enough that, as a best practice, string concatenation should not be used to reassemble the packet in an incoming handler. String concats in Lua are slowwww, and get worse as the string grows. It's [b]much[/b] faster to insert the data into a table, and then concat the table to a string when you have the entire payload.[/quote]

Amen to that!

My preference is to repackage some of the asynchronous I/O which is implemented in openLuup for servers, which quite happily handle incoming callbacks to jobs. At the moment, this only works for TCP connections initiated by an incoming server socket, but there’s no reason it couldn’t apply to a specifically opened socket. Currently, also, the callbacks are to a regular function, rather than code in an <incoming> block, since this is much easier to set up. The called routine receives a proxy socket as an argument, to do with what it will. Special handling ensures that the socket is correctly closed in the event of an error or a timeout, and removed from the scheduler’s socket watch list.

I think this would be far less frustrating than trying to invent a way that Vera/MiOS I/O should or might work in the future. But you would, of course, be stuck with something that is openLuup-specific.

[quote]So what would terminate the read in the event of a shorter stream?[/quote]

A non-blocking, no-timeout socket:receive( 1024 ) will return up to 1024 bytes, shorter if less data is pending on the socket:

[code]sock:settimeout(0)
local data, err, partial = sock:receive( 1024 )
if data == nil then
    -- if partial is not empty (""), dispatch it to the handler
    -- if err ~= "timeout" then handle the error; for a timeout, go do something else and come back later
else
    -- dispatch data to the incoming handler
end[/code]

EDIT: Fixed code. Sorry, rushing around today… third return parameter to receive is partial data if the timeout occurs, which it will almost immediately. Dispatch that to the incoming handler.