Continual Luup restarts

#1

Hi folks
Suddenly Vera has taken it upon herself to restart every 15 minutes to an hour. Not the end of the world, as things are mostly still working, but slightly worrying, especially as I’m about to go on a business trip.

Possibly related is that I re-cabled some of my LAN yesterday. Since then the Internet LED remains stubbornly unlit despite the very clear fact that it’s online and fine:
64 bytes from bbc-vip115.telhc.bbc.co.uk (212.58.244.70): icmp_req=2 ttl=54 time=19.9 ms

Any ideas where I can look to find the cause of the instability?

Cheers

C

#2

So interestingly I restarted the Network Monitor and all is well again…

C

#3

Hmm, it is likely because you have a network monitor check script running which is causing these reboots. Disabling that script is probably also needed. I thought I had included all of this in my veramods script, but it has been so long that my memory might be failing. Can you check whether it is in your crontab?

#4

root@MiOS_50103066:~# crontab -l
*/1 * * * * /usr/bin/Rotate_Logs.sh #Rotate_Logs
#0 0 * * * /mios/usr/bin/CheckForMissingFiles.sh #CheckForMissingFiles
13 04 * * * /usr/bin/mios-service-sync_ergy.sh #Sync_Ergy

Unless it’s under a different user.
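To rule out an entry under a different user, one option is to dump every crontab file at once. This is only a sketch: `list_crontabs` is a hypothetical helper, and the default directory `/etc/crontabs` is taken from the `crond -c /etc/crontabs` command line visible in the `ps aux` output later in this thread.

```shell
#!/bin/sh
# Print the active (non-comment, non-blank) entries of every crontab file
# in a directory. Assumption: each file is named after the user it belongs to.
list_crontabs() {
    dir="${1:-/etc/crontabs}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        echo "== $(basename "$f") =="
        # Drop commented-out and empty lines; grep exits non-zero when
        # nothing is left, which is not an error here.
        grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$f" || true
    done
}

# Usage on the Vera (as root):
#   list_crontabs /etc/crontabs
```

If only a `root` section appears, there is no crontab for any other user.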

I edited the Network monitor script per your suggestion earlier:

#!/bin/sh
# SVN : $Id: Start_NetworkMonitor.sh 11683 2014-06-20 09:19:33Z florin $
#Copyright © 2008 Mi Casa Verde, Inc., a Nevada Corporation
# www.micasaverde.com
# 1 - 702 - 4879770 / 866 - 966 - casa
#This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License.
#This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
#without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
exit 0
log "Terminating Start_NetworkMonitor.sh"
log "===END==="

So that was definitely not running, but check_internet is in /etc/init.d, so it gets called at init time.

I guess I could edit or remove check_internet.

C

#5

Hmm, so it was running fine until a Luup reload triggered by editing a scene.

That obviously initialised check_internet and restarted the Network Monitor.
I issued /etc/init.d/check_internet stop
That killed the Network Monitor but didn’t stop it from reloading, which restarted it.
There appears to be an error in the check_internet script:

Here:
# Temporary code: restart dropbear to force it to listen on all interfaces.
log2file "Delete the interface from the dropbear config"
uci del dropbear.@dropbear[-1].Interface 2>>$log_file
uci commit
sync

This gives the following in the check_internet log:
2019-05-06_15:49:13 -[12042]- Delete the interface from the dropbear config

uci: Entry not found
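The "Entry not found" message most likely just means the `Interface` option is already absent from the dropbear config, so there is nothing left to delete. A minimal sketch of a guarded delete follows. `uci_del_if_present` is a hypothetical helper and is not part of the stock check_internet script. It relies only on `uci -q get`, `uci delete` and `uci commit`.

```shell
#!/bin/sh
# Delete a uci option only if it currently exists, so that no
# "Entry not found" error ends up in the log.
uci_del_if_present() {
    key="$1"
    if uci -q get "$key" >/dev/null 2>&1; then
        # Commit only the config file the key belongs to (text before the first dot).
        uci delete "$key" && uci commit "${key%%.*}"
    else
        echo "skip: $key not set"
    fi
}

# Usage on the Vera:
#   uci_del_if_present 'dropbear.@dropbear[-1].Interface'
```

With this guard the script logs a harmless "skip" line instead of an error when the option has already been removed.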

Dropbear restarts and Network_Monitor logs this:

05/06/19 16:06:36.075 GetIpAddress: LAN NIC NOT FOUND in script response: NIC_NOT_FOUND from:
br-wan Link encap:Ethernet HWaddr B4:A5:EF:E7:46:61
inet addr:192.168.70.6 Bcast:192.168.70.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:17863 errors:0 dropped:715 overruns:0 frame:0
TX packets:12661 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3479099 (3.3 MiB) TX bytes:25025412 (23.8 MiB)

Which is dumb. It also writes a line to /etc/hosts, giving:
NIC_NOT_FOUND local.mios.com local.my.mios.com

Which is also dumb
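A quick way to spot this kind of bogus entry is to flag any hosts line whose first field does not look like an address. This is a rough sketch: `find_bad_hosts_entries` is a hypothetical helper, and the address check is deliberately loose (it only tests for hex digits, dots and colons).

```shell
#!/bin/sh
# Print "line number: line" for every hosts entry whose first field is
# not made up solely of hex digits, dots and colons. Comments and blank
# lines are ignored.
find_bad_hosts_entries() {
    hosts="${1:-/etc/hosts}"
    awk '!/^[[:space:]]*(#|$)/ && $1 !~ /^[0-9a-fA-F:.]+$/ { print NR ": " $0 }' "$hosts"
}

# Usage on the Vera:
#   find_bad_hosts_entries /etc/hosts
```

Removing the flagged line by hand may not stick if NetworkMonitor writes it again on its next run.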

Any thoughts?

C

#6

Let me look into this. My veramods script is designed to take the Vera offline as much as possible, but since you are using geofence, I don’t think that’s what you want to do. I may need to create a script that stops the network-related Luup reloads without disconnecting from the MiOS servers. As you noted, those reloads are overkill and not the smartest thing to do, since a Luup reload or a Vera reboot is unlikely to change the network conditions. @Sorin told me this is getting fixed in the new firmware, which is being redone from scratch.

#7

Cheers!

No idea what it thinks it’s doing :frowning:
C

#8

Could you tell me the output of

ps aux

I have looked through what I did in my veramods, and I actually disabled both the check_internet script in the /etc/init.d folder and Start_NetworkMonitor.sh in /usr/bin by adding an “exit 0” line at the beginning of these scripts. I also did the same to CheckForMissingFiles.sh.
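The "exit 0" trick can be sketched as a small helper. This is not the actual veramods code. `disable_script` is a hypothetical function, and keeping a `.orig` copy is an assumption made here so the change is easy to undo.

```shell
#!/bin/sh
# Neutralise a shell script by inserting "exit 0" right after its first
# line (the shebang). A pristine copy is kept as <script>.orig.
disable_script() {
    script="$1"
    [ -f "$script" ] || { echo "not found: $script" >&2; return 1; }
    # Keep the original once; never overwrite an existing backup.
    [ -f "$script.orig" ] || cp -p "$script" "$script.orig"
    # Nothing to do if line 2 is already "exit 0".
    [ "$(sed -n '2p' "$script")" = "exit 0" ] && return 0
    tmp="${TMPDIR:-/tmp}/disable_script.$$"
    { sed -n '1p' "$script"; echo "exit 0"; sed '1d' "$script"; } > "$tmp" &&
        cat "$tmp" > "$script" &&   # write in place so permissions are kept
        rm -f "$tmp"
}

# Usage on the Vera (as root), for the scripts named above:
#   disable_script /etc/init.d/check_internet
#   disable_script /usr/bin/Start_NetworkMonitor.sh
```

To revert, copy the `.orig` file back over the script. A firmware update may well restore the stock scripts, in which case the helper would need to be run again.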

#9

Yes. Note that I started
/bin/sh /usr/bin/Start_NetworkMonitor.sh.orig
manually, as it seemed to resolve the issue last time. I see there are two dropbears running…

C

root@MiOS_50103066:~# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.2 1424 664 ? S 16:39 0:03 /sbin/procd
root 2 0.0 0.0 0 0 ? S 16:39 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 16:39 0:00 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< 16:39 0:00 [kworker/0:0H]
root 7 0.0 0.0 0 0 ? S 16:39 0:00 [migration/0]
root 8 0.0 0.0 0 0 ? S 16:39 0:00 [rcu_bh]
root 9 0.0 0.0 0 0 ? S 16:39 0:00 [rcu_sched]
root 10 0.0 0.0 0 0 ? S 16:39 0:00 [migration/1]
root 11 0.0 0.0 0 0 ? S 16:39 0:00 [ksoftirqd/1]
root 13 0.0 0.0 0 0 ? S< 16:39 0:00 [kworker/1:0H]
root 14 0.0 0.0 0 0 ? S< 16:39 0:00 [khelper]
root 89 0.0 0.0 0 0 ? S< 16:39 0:00 [writeback]
root 91 0.0 0.0 0 0 ? S< 16:39 0:00 [bioset]
root 93 0.0 0.0 0 0 ? S< 16:39 0:00 [kblockd]
root 101 0.0 0.0 0 0 ? S 16:39 0:00 [khubd]
root 115 0.0 0.0 0 0 ? S 16:39 0:01 [kworker/1:1]
root 137 0.0 0.0 0 0 ? S 16:39 0:00 [kswapd0]
root 138 0.0 0.0 0 0 ? S 16:39 0:00 [fsnotify_mark]
root 139 0.0 0.0 0 0 ? S< 16:39 0:00 [crypto]
root 214 0.0 0.0 0 0 ? S 16:39 0:01 [kworker/0:1]
root 227 0.0 0.0 0 0 ? S< 16:39 0:00 [krfcommd]
root 231 0.0 0.0 0 0 ? S< 16:39 0:00 [deferwq]
root 245 0.0 0.0 0 0 ? S 16:39 0:00 [jfsIO]
root 246 0.0 0.0 0 0 ? S 16:39 0:00 [jfsCommit]
root 247 0.0 0.0 0 0 ? S 16:39 0:00 [jfsCommit]
root 248 0.0 0.0 0 0 ? S 16:39 0:00 [jfsSync]
root 260 0.0 0.0 0 0 ? S 16:39 0:00 [scsi_eh_0]
root 261 0.0 0.0 0 0 ? S 16:39 0:00 [usb-storage]
root 296 0.0 0.0 0 0 ? S 16:39 0:00 [kworker/u4:2]
root 298 0.0 0.0 0 0 ? S< 16:39 0:00 [kworker/1:1H]
root 320 0.0 0.0 0 0 ? S 16:39 0:00 [kworker/1:2]
root 321 0.0 0.0 0 0 ? S 16:39 0:00 [kworker/0:2]
root 322 0.0 0.0 0 0 ? S 16:39 0:00 [jbd2/sda2-8]
root 323 0.0 0.0 0 0 ? S< 16:39 0:00 [ext4-dio-unwrit]
root 483 0.0 0.0 888 88 ? S 16:39 0:00 /sbin/ubusd
root 485 0.0 0.0 768 76 ttyS1 Ss+ 16:39 0:00 /sbin/askfirst ttyS1 /bin/ash --login
root 543 0.0 0.0 0 0 ? S 16:39 0:00 [jbd2/sda1-8]
root 544 0.0 0.0 0 0 ? S< 16:39 0:00 [ext4-dio-unwrit]
root 1878 0.4 0.0 0 0 ? SN 16:39 0:13 [jffs2_gcd_mtd10]
root 2182 0.0 0.1 1048 356 ? S 16:40 0:00 /sbin/logd -S 16
root 2190 0.0 0.0 3560 188 ? S 16:40 0:00 /usr/bin/btn_g450 -c /etc/config/button_g450.ini -d
root 2191 0.0 0.1 3560 416 ? S 16:40 0:00 /usr/bin/btn_g450 -c /etc/config/button_g450.ini -d
root 2192 0.0 0.0 3560 188 ? S 16:40 0:00 /usr/bin/btn_g450 -c /etc/config/button_g450.ini -d
root 2248 0.0 0.2 1516 652 ? S 16:40 0:00 /sbin/netifd
root 2697 0.0 0.1 1800 372 ? S 16:40 0:00 /usr/sbin/crond -f -c /etc/crontabs -l 5
root 2739 0.2 0.7 5096 1932 ? S 16:40 0:07 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
root 2745 0.0 0.1 1740 356 ? Ss 16:40 0:00 /usr/sbin/dbus-daemon --system
root 2816 0.0 0.0 0 0 ? S< 16:40 0:00 [kworker/u5:0]
root 2817 0.0 0.0 0 0 ? S< 16:40 0:00 [hci0]
root 2818 0.0 0.0 0 0 ? S< 16:40 0:00 [hci0]
root 2820 0.0 0.0 0 0 ? S< 16:40 0:00 [kworker/u5:1]
root 2870 0.0 0.1 1792 364 ? S 16:40 0:00 /usr/sbin/ntpd -n -p 0.openwrt.pool.ntp.org 1.openwrt.pool.ntp.org 2.openwrt.pool.ntp.org 3.openwrt.pool.ntp.org
root 2876 0.0 0.1 1804 508 ? S 16:40 0:00 /bin/sh /usr/bin/StreamingTunnelsManager.sh
root 2909 0.0 0.2 1804 524 ? S 16:40 0:00 /bin/sh /usr/bin/Start_LuaUPnP.sh
root 3100 0.0 0.1 1792 424 ? S 16:41 0:00 /bin/sh /usr/bin/Start_serproxy.sh
root 3107 0.0 0.1 1104 296 ? Ss 16:41 0:00 /usr/sbin/ntpclient -c 6 -i 600 -s -l -D -p 123 -h 0.openwrt.pool.ntp.org
nobody 3178 0.0 0.1 960 332 ? S 16:41 0:00 /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf -k
root 3529 0.0 0.2 1972 600 ? S 16:41 0:00 /bin/sh /usr/bin/cmh-ra-daemon.sh 127.0.0.1 80 vera-eu-oem-relay31.mios.com 24017 00000000AC30005CD1A70D715BE19C452F15EF
root 3559 0.2 0.2 1280 616 ? S 16:41 0:07 ssh -p 232 -T -y -i /etc/cmh-ra/keys/cmh-ra-key.priv -R 24017:127.0.0.1:80 cmh-ra@vera-eu-oem-relay31.mios.com
root 4625 3.8 8.7 116148 22416 ? Sl 16:41 2:07 /usr/bin/LuaUPnP
root 4829 0.0 0.1 1024 344 ? S 16:41 0:00 /usr/bin/serproxy 127.0.0.1 127.0.0.1 50103066
root 5060 0.0 0.1 1796 456 ? S 16:44 0:00 /bin/sh /usr/bin/Start_NetworkMonitor.sh.orig
root 5105 0.0 1.4 8276 3732 ? Sl 16:44 0:02 /usr/bin/NetworkMonitor
root 5180 0.0 0.0 0 0 ? S< 16:44 0:00 [kworker/0:1H]
root 5587 0.0 0.0 0 0 ? S 16:44 0:00 [kworker/u4:1]
root 5615 0.0 0.1 1148 352 ? S 16:44 0:00 /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
root 18601 0.0 0.0 1784 116 ? S 17:36 0:00 sleep 60
root 18739 0.0 0.0 1784 116 ? S 17:36 0:00 sleep 58
root 18740 15.0 0.2 1216 540 ? Rs 17:36 0:01 /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22 -K 300
root 18741 0.0 0.1 1796 364 pts/0 Ss 17:36 0:00 -ash
root 18746 0.0 0.1 1296 332 pts/0 R+ 17:36 0:00 ps aux
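On the two dropbears: the second one (PID 18740) started at 17:36, the same minute as the `-ash` login shell on pts/0, so it is most likely just the child process dropbear forks for the active SSH session rather than a duplicate daemon. A small sketch for pulling the relevant columns out of a listing like the one above follows. `list_dropbear` is a hypothetical helper and assumes the 11-column `ps aux` layout shown here.

```shell
#!/bin/sh
# Read `ps aux`-style output on stdin and print the PID and start time
# of every process whose command (column 11) is dropbear.
list_dropbear() {
    awk '$11 ~ /(^|\/)dropbear$/ { print $2, $9 }'
}

# Usage on the Vera:
#   ps aux | list_dropbear
```

A listener whose start time matches boot or the last network restart, plus one entry per open SSH session, would be the expected picture.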

#10

Your NetworkMonitor is definitely the problem. Look at my previous post, which I have since edited.

Attached is a mod just for you :slight_smile:

CatManMods.zip (449.9 KB)

#11

I really don’t know what to say! I used to think our car forum was good, but this is another level.

PM en route!
C

#12

Hi Catman,

Have you upgraded your Vera in the past? Are you now running the latest firmware that was released recently?

Can you tell us what you see in the following path:

settings > z-wave settings > HouseID/Node: House

Thanks

#13

As far as I am aware I’m on the latest

HouseID/Node: House: ff19ad5e Node 1 Suc 1

Cheers

C

#14

@tony-park the House ID is created by the Z-Wave radio. It is the identifier for your network. It is random and unique to your network.

#15

@rafale77, thanks, I’m aware of that. However, if the Node or Suc values are wrong, this can cause issues, like the one I experienced recently after upgrading to the latest firmware. I had upgraded from a VeraLite about 3 years ago, and the legacy setup still existed.

When I check my Vera now, following a factory reset, I see HouseID/Node: House: f22bedc1 Node 1 Suc 0

However, I think my Suc was 1 before and that, if I remember correctly, meant that my VeraPlus wasn’t acting as the primary controller.

#16

This is the best guide I found explaining the three zwave roles: primary, SUC and SIS:

Indeed, your Vera not being the SUC could be a problem if it is your only controller. I am fairly certain that the firmware upgrade cannot change this, because this role is assigned within the Z-Wave network and cannot be changed by the Vera firmware (I wish it could) unless you upgraded the Z-Wave firmware in the process (i.e. you upgraded from a very old firmware which had an older Z-Wave firmware). On my network I use a secondary controller as a tool to assign and control these roles when needed.