TRH hiccups

Since the check_trh process was started on flux on April 9, its logs show the following power-cycling entries. For some reason the higher TRHs (200m, 250m and 300m) had problems yesterday.

TRH problems

Times in MDT:
 
fgrep cycling /var/log/messages*
Apr 18 18:49:09 flux check_trh.sh: 300m temperature is 137.88 . Power cycling port 5
Apr 23 13:03:58 flux check_trh.sh: 300m temperature is 174.1 . Power cycling port 5
Apr 23 13:06:58 flux check_trh.sh: 300m temperature is 174.28 . Power cycling port 5
Apr 23 13:08:18 flux check_trh.sh: 200m temperature is 181.61 . Power cycling port 5
Apr 23 13:08:38 flux check_trh.sh: 300m temperature is 174.28 . Power cycling port 5
Apr 23 13:08:58 flux check_trh.sh: 200m temperature is 181.53 . Power cycling port 5
Apr 23 13:09:48 flux check_trh.sh: 300m temperature is 174.06 . Power cycling port 5
Apr 23 13:16:48 flux check_trh.sh: 200m temperature is 179.15 . Power cycling port 5
Apr 23 13:19:18 flux check_trh.sh: 250m temperature is 173.33 . Power cycling port 5
Apr 23 13:30:38 flux check_trh.sh: 200m temperature is 177.04 . Power cycling port 5
Apr 23 13:48:38 flux check_trh.sh: 250m temperature is 171.63 . Power cycling port 5
Apr 23 13:50:48 flux check_trh.sh: 250m temperature is 173.08 . Power cycling port 5
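To get a per-height count from entries like these, here is a minimal Python sketch that tallies power-cycle events by day and height. The regular expression is a hypothetical pattern inferred only from the sample lines above, and `tally_cycles` is an illustrative name that is not part of check_trh.

```python
import re
import sys
from collections import Counter

# Hypothetical pattern, inferred only from the sample check_trh.sh lines above, e.g.
# "Apr 23 13:08:18 flux check_trh.sh: 200m temperature is 181.61 . Power cycling port 5"
CYCLE_RE = re.compile(
    r"(?P<mon>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) (?P<time>\d\d:\d\d:\d\d) \S+ "
    r"check_trh\.sh: (?P<height>\d+m) temperature is (?P<temp>-?[\d.]+) \. "
    r"Power cycling port (?P<port>\d+)"
)


def tally_cycles(lines):
    """Count power-cycle events per (month, day, height) in syslog lines."""
    counts = Counter()
    for line in lines:
        # search() rather than match(), so a "filename:" prefix added by grep is tolerated
        m = CYCLE_RE.search(line)
        if m:
            counts[(m["mon"], int(m["day"]), m["height"])] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 tally_cycles.py /var/log/messages*
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            totals.update(tally_cycles(fh))
    for (mon, day, height), n in sorted(totals.items()):
        print(f"{mon} {day:2d} {height}: {n}")
```

Applied to the lines above, this should report 1 event at 300m on April 18, and 4, 3 and 4 events at 200m, 250m and 300m on April 23. It does not match the dsm.log format on the DSMs, where the height is the host name rather than part of the message.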

Yesterday (April 23) I reworked things so that the check script runs on each DSM, including the bao station. Since then, the only entries have come from 300m. Subtracting 6 hours from these UTC times puts them at 13:27-13:29 MDT.
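The UTC-to-MDT conversion can be done in Python with a fixed UTC-6 offset, sketched below. It assumes the timestamps fall within daylight saving time, as all of these April entries do. syslog omits the year, so 2015 is supplied from the data file names. `utc_to_mdt` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# MDT is UTC-6. A fixed offset is only valid while daylight saving time is in
# effect, which covers all the April entries in this post.
MDT = timezone(timedelta(hours=-6), "MDT")


def utc_to_mdt(stamp: str, year: int = 2015) -> datetime:
    """Convert a syslog-style 'Mon DD HH:MM:SS' stamp in UTC to MDT.

    syslog lines carry no year, so it is passed in (2015, from the data file names).
    """
    t = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return t.replace(tzinfo=timezone.utc).astimezone(MDT)


if __name__ == "__main__":
    # The three power-cycle times from the 300m dsm.log entries.
    for stamp in ("Apr 23 19:27:33", "Apr 23 19:28:25", "Apr 23 19:29:41"):
        print(stamp, "UTC ->", utc_to_mdt(stamp).strftime("%b %d %H:%M:%S %Z"))
```

For example, 19:27:33 UTC comes out as 13:27:33 MDT. For timestamps outside daylight saving time, a zone-aware conversion such as `zoneinfo.ZoneInfo("America/Denver")` would be needed instead of the fixed offset.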

Times in UTC
 
ssh 300m fgrep cycling /var/log/isfs/dsm.log

Apr 23 19:27:33 300m root: temperature is -62.52 . Power cycling port 5
Apr 23 19:28:25 300m root: temperature is -62.52 . Power cycling port 5
Apr 23 19:29:41 300m root: temperature is -62.52 . Power cycling port 5

For example, here is the hiccup from 200m at 19:30 UTC:

200m

data_dump -i 4,20 -A 200m_20150423_160000.dat | more
...
2015 04 23 19:30:17.3598   1.001      37 TRH30 15.13 27.28 34 0 1377 56 107\r\n
2015 04 23 19:30:18.3691   1.009      37 TRH30 15.09 27.28 33 0 1376 56 105\r\n
2015 04 23 19:30:19.3691       1      37 TRH30 15.13 27.28 34 0 1377 56 108\r\n
2015 04 23 19:30:20.3692       1      37 TRH30 15.09 27.28 33 0 1376 56 103\r\n
2015 04 23 19:30:21.3790    1.01      37 TRH30 15.09 27.28 34 0 1376 56 108\r\n
2015 04 23 19:30:22.6191    1.24      40 TRH30 177.00 260.02 36 0 5510 886 112\r\n
2015 04 23 19:30:23.6290    1.01      40 TRH30 177.00 260.18 35 0 5510 885 109\r\n
2015 04 23 19:30:24.6290       1      40 TRH30 177.04 260.21 34 0 5511 885 106\r\n
2015 04 23 19:30:25.6398   1.011      40 TRH30 177.08 260.40 33 0 5512 884 105\r\n

Notice the delta-T column after the timestamp. In the few of these hiccups I've looked at, the first bad record always seems to have a larger delta-T (here 1.24 s instead of the usual 1.0 s), in case that helps with debugging.
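To check the delta-T pattern across more hiccups, here is a minimal Python sketch that parses `data_dump -A` lines laid out like the 200m example above. It reports each transition from plausible to implausible temperature and says whether that first bad record arrived late. The column positions are assumed from that example. The thresholds (`dt_limit` of 1.1 s and a temperature range of -50 to 60 deg C) are hypothetical, not values used by check_trh. The names `parse` and `onsets` are illustrative.

```python
import sys
from typing import Iterable, Iterator, NamedTuple, Tuple


class TrhRecord(NamedTuple):
    stamp: str      # "YYYY MM DD HH:MM:SS.ssss"
    delta_t: float  # seconds since the previous record, as printed by data_dump
    temp: float     # first value after the TRH id (assumed: temperature, deg C)
    rh: float       # second value after the TRH id (assumed: relative humidity, %)


def parse(lines: Iterable[str]) -> Iterator[TrhRecord]:
    """Parse 'data_dump -A' lines laid out like the 200m example above."""
    for line in lines:
        f = line.split()
        # Skip "...", headers and anything that is not a TRH message.
        if len(f) < 9 or not f[6].startswith("TRH"):
            continue
        try:
            yield TrhRecord(" ".join(f[:4]), float(f[4]), float(f[7]), float(f[8]))
        except ValueError:
            continue


def onsets(records: Iterable[TrhRecord], dt_limit: float = 1.1,
           t_range: Tuple[float, float] = (-50.0, 60.0)) -> Iterator[Tuple[TrhRecord, bool]]:
    """Yield (record, late) at each transition from plausible to implausible T.

    `late` is True when that first bad record's delta-T exceeds dt_limit.
    dt_limit and t_range are hypothetical thresholds, not values from check_trh.
    """
    was_bad = False
    for rec in records:
        bad = not t_range[0] <= rec.temp <= t_range[1]
        if bad and not was_bad:
            yield rec, rec.delta_t > dt_limit
        was_bad = bad


if __name__ == "__main__":
    # Usage: data_dump -i 4,20 -A 200m_20150423_160000.dat > trh.txt
    #        python3 trh_onsets.py trh.txt
    for path in sys.argv[1:]:
        with open(path) as fh:
            for rec, late in onsets(parse(fh)):
                print(f"{rec.stamp}  dT={rec.delta_t:.3f}s  T={rec.temp}  RH={rec.rh}  "
                      f"{'late' if late else 'on time'}")
```

On the 200m excerpt above, this reports a single onset at 19:30:22.6191 with dT=1.240s, flagged late. Running it over more files would show whether every hiccup starts with a late record.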