Unraveling the mystery of TimeJump reloads

TL;DR: Vera (Plus, Edge, and likely Secure) has a broken init script that sets a worse default time at system boot than the system itself chooses, and this causes TimeJump reloads later. If your system does not have /dev/rtc0, run the commands below to let OpenWrt’s default clock setting stand until ntpclient can contact time servers.

rm -f /etc/rc.d/S115-mios_fix_time*
chmod 000 /etc/init.d/mios_fix_time*

Detail:

As part of my work on decoupling Vera from its cloud services, I discovered an interesting bug that I believe is the cause of many TimeJump reloads.

When Vera boots (either a cold boot from power off or a warm boot from /sbin/reboot), the sysvinit script S115-mios_fix_time checks whether the system has an RTC (Plus systems do not; see the notes below). If not, it looks for the file /etc/cmh/datetime. If that file exists, it sets the system clock to the file’s last-modified time plus a small offset. If it doesn’t exist, it sets the system clock to the default of midnight Jan 1, 2000.
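For reference, here is a minimal sketch of that boot-time decision as a shell function. It is a hypothetical paraphrase of the behavior described above, not the actual mios_fix_time script: the name fix_time_choice, the optional ROOT test prefix, and the 60-second offset are my own assumptions (the real offset is only known to be "small"), and midnight Jan 1, 2000 is assumed to be UTC. It prints the epoch time it would choose instead of setting the clock, so it is safe to run.

```shell
#!/bin/sh
# Hypothetical paraphrase of the mios_fix_time boot logic described above.
# It only PRINTS the epoch it would set; it never calls "date -s".
# $1 = optional ROOT prefix for testing (leave empty on a real Vera)
# $2 = offset in seconds; 60 is a guess standing in for the "small offset"
fix_time_choice() {
  root="${1:-}"
  offset="${2:-60}"
  if [ -e "$root/dev/rtc0" ]; then
    # Hardware RTC present: the clock comes from the RTC, nothing to fix.
    echo "rtc"
  elif [ -e "$root/etc/cmh/datetime" ]; then
    # Last-modified time of the saved marker file, plus the offset.
    mtime=$(date -r "$root/etc/cmh/datetime" +%s)
    echo $((mtime + offset))
  else
    # Fallback default: 2000-01-01 00:00:00 UTC.
    echo 946684800
  fi
}
```

On the Vera itself, call fix_time_choice with no arguments; a result of 946684800 means the next boot would start with the year-2000 default.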

Often, LuaUPnP starts up before ntpclient can sync the system clock to the Internet time server pool. This is particularly true after a power failure, when Vera often boots faster than your Internet connection comes back. As a result, some time elapses before ntpclient can sync time. When it finally does, the clock jumps dramatically, and this causes LuaUPnP to reload.

The problem on at least Plus and Edge systems (I don’t have a Secure to test with, but Plus and Secure builds are generally almost identical) is that earlier firmware revisions apparently shipped a specially hacked build of ntpclient that touched a file /tmp/.time_synced. That file is what mios_fix_time looks for to know it needs to create /etc/cmh/datetime. Recent firmware builds appear to ship the stock ntpclient without the hack, so /tmp/.time_synced is never created, /etc/cmh/datetime is therefore never created, and your system always gets the garbage Jan 1, 2000 clock at boot.
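To see whether your own unit is in this state, a quick check for the two files involved might look like this. check_fix_time_files is a hypothetical helper, and its optional ROOT argument exists only so it can be tried against a test directory; on the Vera, run it with no argument.

```shell
#!/bin/sh
# Report whether the marker files used by mios_fix_time exist.
# $1 = optional ROOT prefix for testing (leave empty on a real Vera)
check_fix_time_files() {
  root="${1:-}"
  for f in /tmp/.time_synced /etc/cmh/datetime; do
    if [ -e "$root$f" ]; then
      echo "present: $f"
    else
      echo "missing: $f"
    fi
  done
}
```

If /tmp/.time_synced is missing on a system that has been online and synced for a while, it is probably running the stock ntpclient described above.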

As it turns out, OpenWrt has its own way of setting a reasonable default clock when there’s no RTC on the system. Its init script scans the entire /etc directory for the most recently modified file and sets the system clock to that file’s time as a default during boot. Because Vera writes user data into /etc/cmh/user_data.json.lzo every six minutes (i.e. that file is modified frequently), OpenWrt’s default would be far more accurate than Vera’s (off by a few minutes vs. >20 years). It would also provide a reasonable default when your Vera crashes or suddenly loses power, which Vera’s attempted solution does not (Vera’s solution requires an orderly shutdown to work, and how often does that really happen?). The problem is that Vera’s init script runs afterward and undoes this better estimate.
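The idea behind OpenWrt’s approach can be sketched as follows. This is not OpenWrt’s actual script, just an illustration that assumes find and a date supporting -r (both GNU coreutils and BusyBox do). newest_mtime is a hypothetical name, and it only prints the candidate epoch rather than setting the clock.

```shell
#!/bin/sh
# Print the newest modification time (epoch seconds) of any regular file
# under a directory, defaulting to /etc. Does not change the clock.
newest_mtime() {
  dir="${1:-/etc}"
  # One epoch per file, sorted numerically; keep the largest.
  find "$dir" -type f -exec date -r {} +%s \; | sort -n | tail -n 1
}
```

On a running Vera, newest_mtime with no argument should print a time within minutes of now, thanks to the frequent user_data.json.lzo writes.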

The workaround for this is dead easy: SSH into your Vera and run the commands shown at the top of this post in the “TL;DR” section. They keep Vera’s poor default from being set at boot, leaving OpenWrt’s more reasonable default in place, and that should eliminate at least those TimeJumps.
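If you want the workaround to honor the /dev/rtc0 condition from the TL;DR automatically, a guarded version might look like this. disable_mios_fix_time and its ROOT argument are hypothetical; on a real Vera, call it with no argument. It runs the same rm and chmod commands shown at the top of this post, but only when no RTC device is present.

```shell
#!/bin/sh
# Apply the mios_fix_time workaround only when there is no hardware RTC.
# $1 = optional ROOT prefix for testing (leave empty on a real Vera)
disable_mios_fix_time() {
  root="${1:-}"
  if [ -e "$root/dev/rtc0" ]; then
    echo "RTC found; leaving mios_fix_time alone"
    return 0
  fi
  # Remove the boot symlink (rm -f is quiet if the glob matches nothing).
  rm -f "$root"/etc/rc.d/S115-mios_fix_time*
  # Strip all permissions from the init script so it cannot run,
  # even if the symlink is recreated later.
  for f in "$root"/etc/init.d/mios_fix_time*; do
    [ -e "$f" ] && chmod 000 "$f"
  done
  echo "mios_fix_time disabled"
}
```

Running it more than once is harmless, which is handy since the symlink may come back.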

The other possible cause of TimeJumps is a pool time server reporting a bogus time while ntpclient shifts its queries to or away from it (since it’s a pool, its membership is dynamic). It’s harder to get around this; occasionally these well-known services just go wrong. The default servers are well-known Internet defaults, not Vera-specific cloud servers. The best way to eliminate that possibility, if you feel you must, is to set up your own Stratum 1 time server (easier than it sounds: a Raspberry Pi with a GPS HAT can do it) and reconfigure your Vera to use it as its only time server.

Notes:

  • To see if your system has an RTC, SSH in and see if the file /dev/rtc0 exists; it does not exist on systems without a hardware real time clock.
  • It should be clear that this is a lesson in why you don’t hack system packages in unobvious ways to do weird stuff: someone building the firmware later may not know about the hack, and it gets lost in the shuffle.
  • Because Veras are more likely to reboot during Internet outages as they try to contact the mother ship, the chance of one coming up with the bad default clock is quite high. Even if power has not failed and you haven’t rebooted the Vera, it has been shown to reboot itself during Internet outages, which leaves you with a broken clock until Internet service is restored (and that causes a TimeJump reload on unmodified systems). These random reboots during Internet outages are pretty much eliminated by decoupling your system from the cloud services.
  • If your system boots with a bad clock, your time/date-based automations will go nuts.

Would you expect a firmware update to restore the /etc/rc.d/S115-mios_fix_time* file?

I have a Secure and can confirm it behaves the same as a Plus.

Thank you for the info and fix!

@rigpapa: thanks again for this fix! I will for sure try it myself. My Vera Plus Edge and Plus production units show more frequent Luup reloads lately.
Question: how can I tell from the logs whether TimeJump reloads are the cause in my case?

They have TimeJumps in the text.
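To check a log for these reloads, something like this should work. The default path /var/log/cmh/LuaUPnP.log is an assumption about where your LuaUPnP log lives (adjust it if yours differs), and count_timejumps is a hypothetical helper that counts lines containing “TimeJump”, ignoring case. It only sees whatever is still in that file, so older rotated logs are not covered.

```shell
#!/bin/sh
# Count log lines mentioning TimeJump (case-insensitive).
# $1 = log file; the default path is an assumption, adjust as needed.
count_timejumps() {
  log="${1:-/var/log/cmh/LuaUPnP.log}"
  grep -ci 'timejump' "$log"
}
```

Replace grep -ci with grep -i to see the matching lines themselves instead of a count.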

I’ve had my Edge decoupled for 30+ days and it’s going strong.

I assume you mean /dev/rtc0 and not /etc/rtc0 (at least that is what is in my /etc/rc.d/S115-mios_fix_time).

Oops! Yes, thanks, I’ll fix that above, too.

Yes, it will. I keep a list of things that I change both on the Vera and on my NAS with the backups of my systems. Fortunately this is one of the easier changes to make.

I can’t say when it happens, but my /etc/rc.d/S115-mios_fix_time* file recreates itself.
No firmware update or restore from backup was done. I’ve deleted it 3 times in 5 days now.

What firmware version are you running?

Latest public 1.7.5187 (7.31)

OK. I happened to test that on a decoupled system; it looks like something that runs when the system isn’t decoupled restores it. Easy enough. We’ve got lots of arrows in this quiver. Let’s try this:

rm -f /etc/rc.d/S115-mios_fix_time*
chmod 000 /etc/init.d/mios_fix_time*

So far so good on my “production” Plus (not decoupled). Thanks for bringing this to my attention.

Edit: for clarity, the symlink (in /etc/rc.d) may/will come back, but the actual script (in /etc/init.d) has now lost its execute permissions and will not run. I have verified this by modifying the script to expose when it runs, and it doesn’t run in this state. It remains to be seen whether something else comes along and resets the permissions. I’ll be watching.
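To keep an eye on whether something has reset the permissions, a small status check can report both pieces at once. check_fix_time_status and its ROOT argument are hypothetical; run it with no argument on the Vera. A returning symlink is harmless as long as the init script stays non-executable.

```shell
#!/bin/sh
# Report the state of the mios_fix_time workaround.
# $1 = optional ROOT prefix for testing (leave empty on a real Vera)
check_fix_time_status() {
  root="${1:-}"
  # The boot symlink may be recreated by the firmware; report it if present.
  for l in "$root"/etc/rc.d/S115-mios_fix_time*; do
    if [ -L "$l" ] || [ -e "$l" ]; then
      rel=${l#"$root"}
      echo "symlink present: $rel"
    fi
  done
  # The workaround holds only while the init script has no execute bits.
  for f in "$root"/etc/init.d/mios_fix_time*; do
    [ -e "$f" ] || continue
    rel=${f#"$root"}
    if [ -x "$f" ]; then
      echo "EXECUTABLE (workaround undone): $rel"
    else
      echo "not executable: $rel"
    fi
  done
}
```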

Ok, I’ll try that. Thank you!

Thanks Rig, I have never noticed a TimeJump restart in the logs, but I have applied this fix just in case.
I used the ALTUI OS Command page to submit the commands, which saved opening an SSH session.

Cheers
Octo

Just as a note, I found the absolute source of this reincarnation. The chmod addition will work fine to bypass the effect of recreating the symlink, no problem, so I think nothing more needs to be done right now. Just confirming that I’ve identified the source.

Great! The symlink is back, but as you say, it can’t run the script so we reached our goal anyway.

© 2020 Vera Control Ltd., All Rights Reserved.