Page 1 of 1

Sudden disconnect problem

Posted: Tue Jul 26, 2016 10:42 am
by Mastiff
I have had ONH running without a hitch for a long, long time. But on one of my two systems (the one at home, a smarthome that I'm going to deliver to the new owner next Monday, of course!) I have started getting problems. I use Krambriw's EG plug-in, and suddenly it has started loosing the connection. I have some stuff from the log of ONH here. First I have a lot of these:

Code: Select all

2016.07.26 11:25:44.855 INFO:Connection from 127.0.0.1 (TCPCommandPort 6377, TID22)
2016.07.26 11:25:44.859 INFO:Disconnected from 127.0.0.1 (TCPCommandPort$Session 6378, TID419)
2016.07.26 11:25:44.876 INFO:Connection from 127.0.0.1 (TCPCommandPort 6379, TID22)
2016.07.26 11:25:44.878 INFO:Disconnected from 127.0.0.1 (TCPCommandPort$Session 6380, TID420) 
And sometimes I get this:

Code: Select all

2016.07.26 11:25:20.569 INFO:Connection from 127.0.0.1 (TCPCommandPort 6325, TID22)
2016.07.26 11:25:20.577 WARNING:Failed reading from socket in TCPCommandPort java.net.SocketException: Software caused connection abort: recv failed (TCPCommandPort$Session 6326, TID415)
2016.07.26 11:25:20.580 INFO:Disconnected from 127.0.0.1 (TCPCommandPort$Session 6327, TID415)
2016.07.26 11:25:20.605 FINE:Received from Tellstick: +Wclass:sensor;protocol:fineoffset;data:4AB0D9FF58; (Tellstick 6328, TID24)
2016.07.26 11:25:21.6 FINE:Received from Tellstick: +T (Tellstick 6329, TID24) 
Krambriw thought it looked like a problem he had seen before. Do you have any help to give me? I now have the latest nightly build, but it didn't help. but that doesn't explain that it suddenly started acting up, when I hadn't done anything to the software for months. I have tried with two different network cards for the VM, and that didn't change anything. Also I have rebooted my WiFi router in case it's sending garbage out on the LAN. Wifi isn't used to connect ONH to EG, they are both running on the same machine, so I use 127.0.0.1 (192.168.0.100 doesn't change anything). So far I haven't had another loss of contact but it happens maybe once or twice an hour, so it doesn't have to mean that it's fixed.

Re: Sudden disconnect problem

Posted: Tue Jul 26, 2016 12:03 pm
by Mastiff
Around two hours since the last disconnect, so I'm crossing my fingers. But it would be nice to know if this is a squashable bug in something. :mrgreen:

Edit: Nope, back again. :oops: I should have held my big mouth shut.

Re: Sudden disconnect problem

Posted: Wed Jul 27, 2016 8:52 am
by stefangsbb
From the logs it looks like someone (I assume the EG-Plugin) is connecting to the TCPCommandPort and then for some reason immediately disconnects again.

This does not really look like the kind of problem Walter has had before, that had to do with connections breaking after long time.

I wonder if there could be any reason for the EG-Plugin to spontaneously break the connection? Are there any logs from that side? One theory could be that the EG-Plugin has some mechanism that tries to determine if the connection is ok, and that that mechanism for some reason thinks that the connection is not ok? It could for example be some kind of timing issue where the EG-Plugin expects a fast answer, but the TCPCommandPort fails to respond in time due to slow hardware or other tasks on the machine.

All these logs say is that someone connects and then disconnects again, there is no indication of any error in NetHome. You can try to connect manually to the same CommandPort as the EG-Plugin with telnet and see if it seems to work. Try the "dir"-command and the "subscribe"-command.

Re: Sudden disconnect problem

Posted: Wed Jul 27, 2016 9:02 am
by Mastiff
Hi, Stefan! Oddly enough Walter just came in with an attempted solution, which I will try if I haven't fixed the error (I hav removed a few devices from the network that might have been faulty, just on instinct). And I think that solution is pretty close to what you say here. :mrgreen: I'll let him know about this answer.

Re: Sudden disconnect problem

Posted: Fri Jul 29, 2016 6:06 pm
by krambriw
Hello Both,

I notice that the ONH can enter a state where no more events are sent to subscribers via the TCP port. In ONH log you can see a line similar to below for each event ONH is trying to send.
2015.10.28 05:17:35.685 WARNING:TCPCommandPort could not send event, TCP stream blocked (TCPCommandPort 12889282, TID16)
While in this state, it is still possible to make a new connection to the port and subscribe but events will not appear until a restart of ONH has been performed

- what is causing this state?
- how could ONH release itself and make ports and event subscription working again

Re: Sudden disconnect problem

Posted: Sat Jul 30, 2016 8:20 am
by krambriw
I think this problem/state can be caused by some kind of network disturbance. I have somehow managed to cause this by disconnecting/connecting the network cable to ONH while a client has a subscription active. It is not easy and not 100% reproducible every time. Many times the communication are restored and events are coming in again when the network cable is plugged in. But suddenly it happens

Re: Sudden disconnect problem

Posted: Sun Jul 31, 2016 12:10 pm
by Mastiff
I'm now pretty sure that it's the signal that's sendt to the heaters (so one the each zone every five minutes, in other words around 15 every five minutes) that chokes up ONH in combination with network problems (no idea what kind of problems). I commented out those sends mid day yesterday and haven't had one loss of contact since that. But it would be nice to have the sends of the heaters working as they should before the heater season comes around in a few months. :mrgreen: