The check_trh process on flux has logged the entries below since it was started on April 9. For some reason the higher TRHs (200m, 250m and 300m) had problems yesterday.
Times in MDT:
fgrep cycling /var/log/messages*
Apr 18 18:49:09 flux check_trh.sh: 300m temperature is 137.88 . Power cycling port 5
Apr 23 13:03:58 flux check_trh.sh: 300m temperature is 174.1 . Power cycling port 5
Apr 23 13:06:58 flux check_trh.sh: 300m temperature is 174.28 . Power cycling port 5
Apr 23 13:08:18 flux check_trh.sh: 200m temperature is 181.61 . Power cycling port 5
Apr 23 13:08:38 flux check_trh.sh: 300m temperature is 174.28 . Power cycling port 5
Apr 23 13:08:58 flux check_trh.sh: 200m temperature is 181.53 . Power cycling port 5
Apr 23 13:09:48 flux check_trh.sh: 300m temperature is 174.06 . Power cycling port 5
Apr 23 13:16:48 flux check_trh.sh: 200m temperature is 179.15 . Power cycling port 5
Apr 23 13:19:18 flux check_trh.sh: 250m temperature is 173.33 . Power cycling port 5
Apr 23 13:30:38 flux check_trh.sh: 200m temperature is 177.04 . Power cycling port 5
Apr 23 13:48:38 flux check_trh.sh: 250m temperature is 171.63 . Power cycling port 5
Apr 23 13:50:48 flux check_trh.sh: 250m temperature is 173.08 . Power cycling port 5
Yesterday (April 23) I reworked things so that the check script runs on each DSM, including the bao station. The only entries since then are from 300m. The DSM log times are in UTC; subtracting 6 hours puts these at 13:27-13:29 MDT.
Times in UTC:
ssh 300m fgrep cycling /var/log/isfs/dsm.log
Apr 23 19:27:33 300m root: temperature is -62.52 . Power cycling port 5
Apr 23 19:28:25 300m root: temperature is -62.52 . Power cycling port 5
Apr 23 19:29:41 300m root: temperature is -62.52 . Power cycling port 5
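The UTC-to-MDT arithmetic used above can be checked with a few lines of Python. This is only a sketch: the function name utc_to_mdt is made up for illustration, the year 2015 is taken from the data file name (the syslog stamps carry no year), and MDT is treated as a fixed UTC-6 offset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: MDT modeled as a fixed UTC-6 offset (daylight time in effect in April).
MDT = timezone(timedelta(hours=-6), "MDT")


def utc_to_mdt(stamp, year=2015):
    """Convert a syslog-style stamp such as 'Apr 23 19:27:33' (UTC) to MDT.

    The year is not in the log line; 2015 is assumed from the data file name.
    """
    utc = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return utc.replace(tzinfo=timezone.utc).astimezone(MDT)


# The three 300m entries from dsm.log
for stamp in ("Apr 23 19:27:33", "Apr 23 19:28:25", "Apr 23 19:29:41"):
    print(stamp, "UTC ->", utc_to_mdt(stamp).strftime("%H:%M:%S %Z"))
```

This puts the three 300m power cycles in the middle of the window covered by the flux entries from the same afternoon.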
For example, here is the hiccup from 200m at 19:30 UTC:
data_dump -i 4,20 -A 200m_20150423_160000.dat | more
...
2015 04 23 19:30:17.3598 1.001 37 TRH30 15.13 27.28 34 0 1377 56 107\r\n
2015 04 23 19:30:18.3691 1.009 37 TRH30 15.09 27.28 33 0 1376 56 105\r\n
2015 04 23 19:30:19.3691 1 37 TRH30 15.13 27.28 34 0 1377 56 108\r\n
2015 04 23 19:30:20.3692 1 37 TRH30 15.09 27.28 33 0 1376 56 103\r\n
2015 04 23 19:30:21.3790 1.01 37 TRH30 15.09 27.28 34 0 1376 56 108\r\n
2015 04 23 19:30:22.6191 1.24 40 TRH30 177.00 260.02 36 0 5510 886 112\r\n
2015 04 23 19:30:23.6290 1.01 40 TRH30 177.00 260.18 35 0 5510 885 109\r\n
2015 04 23 19:30:24.6290 1 40 TRH30 177.04 260.21 34 0 5511 885 106\r\n
2015 04 23 19:30:25.6398 1.011 40 TRH30 177.08 260.40 33 0 5512 884 105\r\n
Notice the delta-T column right after the date and time. I've looked at a few of these, and I think there is always a larger delta-T (in this case 1.24 sec instead of 1.0) at the time of the initial bad data, in case that helps in debugging.
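To check more of these events without paging through the output by hand, something like the following Python sketch could flag samples whose delta-T departs from the nominal 1 second. The function name find_gaps and the 0.1 sec tolerance are assumptions for illustration, as is the column layout: delta-T is taken to be the fifth whitespace-separated field and temperature the first number after the TRH30 tag, which matches the 200m dump above.

```python
# Sample lines copied from the 200m dump above (the \r\n is literal text from -A).
SAMPLE = r"""
2015 04 23 19:30:20.3692 1 37 TRH30 15.09 27.28 33 0 1376 56 103\r\n
2015 04 23 19:30:21.3790 1.01 37 TRH30 15.09 27.28 34 0 1376 56 108\r\n
2015 04 23 19:30:22.6191 1.24 40 TRH30 177.00 260.02 36 0 5510 886 112\r\n
2015 04 23 19:30:23.6290 1.01 40 TRH30 177.00 260.18 35 0 5510 885 109\r\n
"""


def find_gaps(lines, nominal=1.0, tolerance=0.1):
    """Return (time, delta_t, temperature) for samples with an unusual delta-T.

    Assumed layout: field 4 is the time of day, field 5 the delta-T in seconds,
    field 8 the temperature. The tolerance is an arbitrary choice.
    """
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # blank lines, "..." markers, truncated records
        try:
            delta_t = float(fields[4])
            temperature = float(fields[7])
        except ValueError:
            continue  # not a data record
        if abs(delta_t - nominal) > tolerance:
            hits.append((fields[3], delta_t, temperature))
    return hits


for when, delta_t, temperature in find_gaps(SAMPLE.splitlines()):
    print(f"{when}  delta-T {delta_t} sec  temperature {temperature}")
```

To run it against real data, the output of the data_dump command could be piped in and read from sys.stdin in place of SAMPLE.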