...
- The ehs USB stick died sometime between the 24 Mar and 25 Mar 00Z rsyncs; fdisk reports no partitions on it. I replaced it with a new USB stick about an hour ago and it is now working.
- Santiago rebuilt/started eol-rt-data yesterday. Apparently, in the process the ssh configuration was modified to disallow (some) connections to porter2.
- Yesterday, in an attempt to solve the eol-rt-data issue by myself, I tried restart_process of ssh_tunnel on flux. This killed ssh to flux from the outside. SRS rebooted flux last night at about 1700 and I rebooted it again at about 1200 today (2 separate trips to the tower). This restored the connection from flux to eol-rt-data, but traffic still failed to get all the way through to the eol machines because of the eol-rt-data configuration issues. I don't know whether the flux problem was a red herring that would have fixed itself once eol-rt-data was fixed.
- Today, we also tried 2 porter2 reboots. At least the second one was justified, since sstat reported most services not running and restart_service didn't bring them back. All was well after the reboot.
- Even once eol-rt-data was fixed (by Ted restoring an old image of the virtual machine!), which restored the ssh tunnel to flux, connections to bao and ehs were still broken. We found that Gordon's check_udp_... only reported errors and didn't restart the udp_ tunnel process. Running that process by hand finally got data flowing. We had also run it earlier in the day, so a couple of hours of udp data did make it to porter2.
- Even with everything fixed, rsync_flab_loop.sh failed with a PATH issue (couldn't find rsync_flab.sh). This is really strange, since this script has always worked. I added an explicit PATH setting to the script by hand. In the meantime, I also ran the nightly rsync manually. It was hideously slow (about 3 hours to bring 2 days of data back), so I cheated on some of it by removing the bandwidth limit.
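
On the PATH failure in rsync_flab_loop.sh: the cause was not determined, but scripts started from a stripped-down environment (cron, or a fresh boot) often see a shorter PATH than an interactive login does. A minimal sketch of setting PATH defensively at the top of a script is below. The helper name prepend_path and the directory /opt/hypothetical/bin are made up for illustration; this is not the actual change made to the script.

```shell
# prepend_path DIR
# Put DIR at the front of PATH unless it is already listed, then export PATH.
# (Hypothetical helper; not part of the real rsync_flab_loop.sh.)
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;        # otherwise prepend
    esac
    export PATH
}

# Example call; /opt/hypothetical/bin stands in for wherever rsync_flab.sh lives.
# prepend_path /opt/hypothetical/bin
```

Checking for an existing entry first keeps PATH from growing every time a looping script re-sources its setup.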
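
On the udp tunnel check that only reported errors: a check script can be made to attempt a restart when the check fails, and to report only if the restart also fails. The sketch below shows that pattern under stated assumptions. ensure_running and both command arguments are hypothetical placeholders; it does not reproduce Gordon's script or the real tunnel commands.

```shell
# ensure_running CHECK_CMD RESTART_CMD
# Runs CHECK_CMD; if it fails, logs the failure and runs RESTART_CMD.
# Returns 0 if the check passed or the restart command succeeded,
# 1 if the restart command failed.
# (Both commands are placeholders supplied by the caller.)
ensure_running() {
    check_cmd=$1
    restart_cmd=$2
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        return 0                     # process looks healthy, nothing to do
    fi
    echo "check failed: $check_cmd; attempting restart" >&2
    if sh -c "$restart_cmd"; then
        echo "restarted via: $restart_cmd" >&2
        return 0
    fi
    echo "restart failed: $restart_cmd" >&2
    return 1
}
```

Run from cron every few minutes, something like this would have brought the tunnel back without anyone running it by hand, while still leaving a message in the log each time a restart was needed.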
...