It has been pretty obvious from the plots that the sonic diagnostic flag is worse during the periods when the data are being rsync'd back to Boulder. This is only for the CSAT3, not the CSAT3A/EC150. When I plot the high-rate data, I even find data gaps (12 s in one period I looked at). Even outside the rsync periods, there are persistent non-zero ldiag values. These should be pretty close to zero all the time.
I suspect that the CSAT3s are picking up RF interference from the cell phones. If it hasn't been done already, we should play with grounding the heads and boxes (after checking to see if the mechanical connections already do this). Andy knows the drill all too well.
If this persists, it isn't a huge problem, since bad samples should be rejected by software, but it would be nice to fix if we can.
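Here is a minimal sketch of that software rejection in Python. It assumes a hypothetical sample layout of (time, u, v, w, tc, diag) tuples. It also treats ldiag as the fraction of samples in an averaging period that have a non-zero diag. Neither the layout nor the function names come from the ISFS/NIDAS code.

```python
import math
from typing import List, Tuple

# Hypothetical layout: (time_s, u, v, w, tc, diag). Not the NIDAS record format.
Sample = Tuple[float, float, float, float, float, int]


def reject_flagged(samples: List[Sample]) -> List[Sample]:
    """Set u, v, w and tc to NaN for every sample whose diag is non-zero."""
    cleaned = []
    for t, u, v, w, tc, diag in samples:
        if diag != 0:
            u = v = w = tc = math.nan
        cleaned.append((t, u, v, w, tc, diag))
    return cleaned


def ldiag(samples: List[Sample]) -> float:
    """Fraction of samples with a non-zero diag (assumed meaning of ldiag)."""
    if not samples:
        return math.nan
    return sum(1 for s in samples if s[5] != 0) / len(samples)


if __name__ == "__main__":
    period = [
        (0.00, 1.0, 0.5, 0.1, 20.0, 0),
        (0.05, 1.1, 0.4, 0.0, 20.1, 16),  # a flagged sample
        (0.10, 1.2, 0.6, 0.2, 20.0, 0),
        (0.15, 1.0, 0.5, 0.1, 20.2, 0),
    ]
    print(ldiag(period))  # 0.25
    print(reject_flagged(period)[1])
```

The sketch masks flagged samples instead of deleting them, so the sample times stay intact. A rejected sample then remains distinguishable from one that never arrived.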
4 Comments
Gordon Maclean
Since CSAT3_CKCNTR is true (set in isfs_env.sh and datasets.xml on barolo, where statsproc is running), missing UDP samples will cause diag to be non-zero. So the problem could be network congestion rather than RF interference. We can check this by running statsproc on the rsync'd data files and seeing whether ldiag has the same signature (or by setting CSAT3_CKCNTR to false).
I've wondered whether the dropouts are worse on Vipers than on Titans (I forget which stations are which).
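This sketch shows roughly what a sequence-counter check like CSAT3_CKCNTR does. It assumes the sonic reports a record counter that wraps modulo 64 (the CSAT3's 6-bit counter). It also assumes a break in the count is flagged by OR-ing 16 into diag, the value reported later in this thread. The function names and the flag value are assumptions for illustration, not taken from the NIDAS source.

```python
from typing import List, Optional

COUNTER_MODULUS = 64     # assumed: 6-bit counter, wraps 63 -> 0
COUNTER_ERROR_FLAG = 16  # assumed flag value, from the diag=16 seen in this thread


def check_counter(counters: List[int], diags: List[int]) -> List[int]:
    """Return a copy of diags with COUNTER_ERROR_FLAG set wherever the counter
    did not advance by exactly one (mod COUNTER_MODULUS) since the previous sample."""
    out = []
    prev: Optional[int] = None
    for cnt, diag in zip(counters, diags):
        if prev is not None and cnt != (prev + 1) % COUNTER_MODULUS:
            diag |= COUNTER_ERROR_FLAG
        out.append(diag)
        prev = cnt
    return out


def samples_dropped(prev: int, cnt: int) -> int:
    """Number of samples missing between two received counter values.
    Ambiguous for drops of COUNTER_MODULUS or more: 64 missing looks like 0."""
    return (cnt - prev - 1) % COUNTER_MODULUS


if __name__ == "__main__":
    counters = [61, 62, 63, 0, 3, 4]  # 1 and 2 were lost in transit
    print(check_counter(counters, [0] * len(counters)))  # [0, 0, 0, 0, 16, 0]
    print(samples_dropped(0, 3))  # 2
```

The check only sees what arrived. Samples lost on the UDP path therefore set the flag even when the sonic itself is fine. That is why rerunning statsproc on the rsync'd files separates the two causes.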
Steve Oncley AUTHOR
Yeah, I wondered about that explanation as well. In particular, I ran diff(tspar(<rawdata>)) and saw dropouts: s14 had one of 12 seconds, while s12 and s13 had dropouts of only up to 1 s. s15 doesn't have this issue. I didn't have time to check the sequence counter; "diagbits" seems to strip it.
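Here is a rough Python analogue of the diff(tspar(<rawdata>)) check. It takes successive differences of the sample times and reports any interval much longer than the nominal sample spacing. The 20 Hz nominal rate and the 1.5x tolerance are assumptions. find_gaps is a hypothetical helper, not part of any ISFS package.

```python
from typing import List, Tuple


def find_gaps(times: List[float], nominal_dt: float,
              tol: float = 1.5) -> List[Tuple[float, float]]:
    """Return (time_before_gap, interval_s) for every pair of consecutive
    timestamps more than tol * nominal_dt apart. Times are in seconds, sorted."""
    gaps = []
    for t0, t1 in zip(times, times[1:]):
        interval = t1 - t0
        if interval > tol * nominal_dt:
            gaps.append((t0, interval))
    return gaps


if __name__ == "__main__":
    dt = 1.0 / 20.0  # assumed 20 Hz sonic sampling
    # Synthetic times: 0.00 .. 0.10 s, then nothing until 12.10 s.
    times = [0.00, 0.05, 0.10, 12.10, 12.15, 12.20]
    for start, length in find_gaps(times, dt):
        print(f"gap after t={start:.2f} s lasting {length:.2f} s")
```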
Steve Oncley AUTHOR
OK. I've confirmed that this is the old UDP dropped-messages issue, not RF, so forget about grounding. For example, on s14 at 20160915 12:00:28.51, several consecutive sonic samples are missing from the UDP file but present in the rsync'd file, and diag is set to 16 because of this. However, while looking at this, I found errors in the data files, even in those on the station itself. This happened more than once during this period, and only on s2 and s14 (both Vipers). I thought our raw_data files were usually rock solid?
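The UDP-versus-rsync comparison can be sketched as a set difference on sample times. Any timestamp in the rsync'd file with no match in the UDP file is a sample lost in transit, and grouping those times shows how many were dropped in a row. This assumes both files have already been decoded into sorted lists of times in seconds. The 1 ms matching resolution and the helper names are hypothetical.

```python
from typing import List


def missing_from(reference: List[float], candidate: List[float],
                 resolution: float = 0.001) -> List[float]:
    """Times in reference with no counterpart in candidate, after both are
    rounded to multiples of resolution (seconds)."""
    have = {round(t / resolution) for t in candidate}
    return [t for t in reference if round(t / resolution) not in have]


def runs(times: List[float], nominal_dt: float, tol: float = 1.5) -> List[List[float]]:
    """Group sorted times into runs whose consecutive spacing is <= tol * nominal_dt."""
    groups: List[List[float]] = []
    for t in times:
        if groups and t - groups[-1][-1] <= tol * nominal_dt:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


if __name__ == "__main__":
    rsync_times = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25]
    udp_times = [0.00, 0.05, 0.25]
    lost = missing_from(rsync_times, udp_times)
    print(lost)              # [0.1, 0.15, 0.2]
    print(runs(lost, 0.05))  # [[0.1, 0.15, 0.2]]
```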
Gordon Maclean
I usually like to blame these corruptions on sudden power-downs, but I don't see any indication of one at s14 on Sep 15th.
My guess is that it is due to some sort of USB glitch, since the USB bus is shared by the cell modem and the flash drive. Perhaps in some instances the DSM can't keep up with the USB interrupt load. Perhaps this happens more on Vipers?
The system logs (/var/log/kern.log, messages) should have some info. It would be nice to grab those.
It looks like about 30 seconds of data were lost. Hopefully the UDP dataset will fill that in, though if the USB was having issues, one might expect that the cell connection was problematic too.
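If the UDP dataset does cover the missing ~30 s, merging it in could look like the sketch below. It keeps every sample from the station's raw_data file and adds only those UDP samples whose timestamps the raw file lacks. Samples are assumed to be (time_s, value) pairs. fill_from_backup is a hypothetical helper, not an existing ISFS tool.

```python
from typing import Any, List, Tuple

Sample = Tuple[float, Any]  # assumed (time in seconds, decoded record)


def fill_from_backup(primary: List[Sample], backup: List[Sample],
                     resolution: float = 0.001) -> List[Sample]:
    """Return primary plus any backup samples whose time (rounded to
    resolution) is absent from primary, sorted by time. Primary wins ties."""
    have = {round(t / resolution) for t, _ in primary}
    extra = [s for s in backup if round(s[0] / resolution) not in have]
    return sorted(primary + extra, key=lambda s: s[0])


if __name__ == "__main__":
    raw = [(0.00, "a"), (0.05, "b"), (0.20, "e")]  # gap at 0.10-0.15
    udp = [(0.10, "c"), (0.20, "E")]               # 0.15 lost here too
    for t, rec in fill_from_backup(raw, udp):
        print(f"{t:.2f} {rec}")
```

Samples missing from both sources remain a gap, as 0.15 s does in the toy example. That is the case to expect if the cell connection suffered during the same USB trouble.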