Data Flow from the Data Systems
Local archive to USB
Each DSM has a USB stick which mounts to /media/usbdisk, and all the local samples are archived to files in /media/usbdisk/projects/Perdigao/raw_data. As long as the USB stick does not fill up and does not fail, it will always be the most complete dataset.
A couple of DSMs have had a problem where the USB bus occasionally resets, and that interrupts the USB archive for a few minutes. However, that also interrupts the network stream since the network interface device is also on the USB bus on the Raspberry Pi. (This issue is tracked in EOL JIRA issue ISFS-145.)
Suffice it to say, it is important to keep the USB disks working. The nagios server running on ustar uses a check_mk agent on every DSM to monitor the USB mounts. The nagios status can be browsed from the Perdigao local network or from the NCAR network, but you need the username and password.
- From EOL: http://ops-perdigao.dydns.org/nagios
- From Perdigao: http://192.168.2.2/nagios
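In addition to the nagios checks, the USB archive can be verified by hand on a DSM. This is a minimal sketch, assuming an ordinary shell login on the DSM:

    # Confirm the USB stick is mounted and has free space
    df -h /media/usbdisk

    # Confirm the archive files are still being written
    ls -lt /media/usbdisk/projects/Perdigao/raw_data | head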
Real-time TCP
The DSMs are all configured with dgrequest outputs to host address flux, aka ustar. This means a DSM sends a UDP datagram to flux:30000 requesting a TCP connection, and dsm_server on flux responds with a port to connect to. The raw network samples received by dsm_server are written to /data/isfs/projects/Perdigao/raw_data with filenames isfs_YYYYmmdd_HHMMSS.dat.bz2. Here is the relevant excerpt from perdigao.xml:
    <output class="RawSampleOutputStream">
        <fileset dir="$DATAMNT/projects/${PROJECT}/$DATADIR"
                 file="isfs_%Y%m%d_%H%M%S.dat.bz2"
                 length="14400">
        </fileset>
    </output>
An important part of the server configuration is the size of the sorter time windows. If samples stop appearing on the server from a DSM, check the log file /var/log/isfs/isfs.log for messages about samples being dropped. Sometimes there are more samples than will fit in the memory allocated for the sorters, even though the samples all fall within the sorter's time range, and those samples will be dropped. It may be necessary to increase the memory allocated to the sorters in the configuration file.
Also, the sorters naturally filter samples with times that are out of range. If the log file has messages about dropping samples outside the time window, check that the DSM time is correct and synchronized. The DSM time only has to be off by a few seconds for its samples to be dropped by the sorter.
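A quick way to check both conditions from the command line follows; the exact wording of the log messages and the time daemon in use are assumptions, so adjust the grep pattern and the time-sync command to match the installation:

    # On ustar: look for sorter messages about dropped or discarded samples
    grep -iE 'discard|drop' /var/log/isfs/isfs.log | tail

    # On the DSM: check that the system clock is synchronized
    ntpq -p        # or: chronyc tracking, depending on the time daemon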
Near real-time rsync
The rsync_dsms.py script uses rsync to synchronize the data files from the USB archive of each DSM into the same directory as the real-time network stream, /data/isfs/projects/Perdigao/raw_data, except that the data file names begin with the DSM name and are not compressed. The script removes all the data files which were synchronized successfully, except for the most recent. The script runs continuously, keeping at most 4 rsyncs running in parallel, pulling data from each DSM every hour, and retrying failed rsyncs every 30 minutes. The status of each DSM rsync is written to a json file in Perdigao/ISFS/logs. A second script, rsync_status.py, checks whether an rsync has succeeded within the last 2 hours and, if not, reports a critical check failure to nagios.
A crontab entry restarts the rsync process if it ever stops running.
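A minimal sketch of such an entry, assuming a ten-minute check interval; the script path and log file location are assumptions, not the actual installed paths:

    # Every 10 minutes, restart rsync_dsms.py if it is no longer running
    */10 * * * * pgrep -f rsync_dsms.py > /dev/null || /home/daq/isfs/scripts/rsync_dsms.py >> /var/log/isfs/rsync_dsms.log 2>&1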
Raw Data Transfer to EOL
Ted and Santiago installed a BitTorrent P2P sync node on ustar to sync raw data files from ustar:/data/isfs/projects/Perdigao/raw_data to field-sync.eol.ucar.edu:/scr/tmp/isfs/perdigao/raw_data. This provides off-site backups of the raw data, and it allows duplicate processing on ustar and at EOL to generate all the data products.
The P2P software is resilio-sync-2.4.4-1.x86_64 from https://www.resilio.com/. The configuration is available through a web connection to port 8888, but only accessible from localhost.
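Since the web interface only listens on localhost, one way to reach it from another machine is an SSH tunnel to ustar. This is a minimal sketch, assuming ordinary SSH access:

    # Forward local port 8888 to the Resilio Sync web UI on ustar,
    # then browse to http://localhost:8888
    ssh -L 8888:localhost:8888 ustar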
The barolo:/scr/tmp filesystem was selected because it had 9.6T available at the time, and because these raw data do not need to be backed up. Santiago also turned off the scrubber for /scr/tmp/isfs/perdigao. The /scr/tmp location will serve as a backup for the on-site data, and as a staging location for moving the raw data to HPSS during the project.
The BitTorrent mirror was replaced with a 15-minute rsync loop script, running in the isfs account on barolo, so that the real-time network files could be updated continuously. BitTorrent only updates files after they are quiescent for 15 minutes, which means the real-time network stream could be as much as 4 hours behind at EOL.
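The loop script amounts to an rsync inside a sleep loop. This is a minimal sketch of the idea, assuming key-based ssh access from the isfs account on barolo to ustar; the real script may differ in its details:

    #!/bin/sh
    # Pull the Perdigao raw_data directory from ustar every 15 minutes.
    while true; do
        rsync -av ustar:/data/isfs/projects/Perdigao/raw_data/ /scr/tmp/isfs/perdigao/raw_data/
        sleep 900
    done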
Data backups to USB on ustar
Two multi-terabyte USB disk drives were attached to ustar. A crontab entry mirrors the entire /data/isfs/ filesystem onto each USB drive every 6 hours. The laptop /data partition is not large enough to contain the entire project's data, so older raw and merged data were aged off that disk manually as it filled up. The iss_catalog.py script was used to keep a database of all the data files and their time coverage, and then the delete operation was used to age off the oldest data.
    /opt/local/iss-system/iss/src/iss_catalog/iss_catalog.py scan --context=perdigao/ustar --scan /data/isfs/projects/Perdigao --db ~/ustar.db --verbose

    /opt/local/iss-system/iss/src/iss_catalog/iss_catalog.py delete --time 20000101, --size 100000 --context=perdigao/ustar --db $HOME/ustar.db --categories raw/nidas --categories merge
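For the 6-hour mirror onto the USB drives, a crontab entry along these lines would do the job. The USB mount points are assumptions, and --delete is deliberately omitted in this sketch so that files aged off /data/isfs remain on the backups:

    # Copy /data/isfs onto both USB drives every 6 hours (mount points are assumed)
    0 */6 * * * rsync -a /data/isfs/ /media/usb0/isfs/
    30 */6 * * * rsync -a /data/isfs/ /media/usb1/isfs/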
Data processing
Once a complete data file has been synchronized from a DSM (i.e., it is complete because it is not the file currently being written), all the samples acquired by that DSM for that time period reside on ustar in either the USB archive stream or the network archive stream. These two streams are merged to generate the most complete sample archive.
The merge process runs from cron every 8 hours, within a few hours after the rsyncs finish. The data files are written to the merge directory next to the raw_data directory.
Finally, the stats_proc program runs on the merged data file stream. This process passes all the raw samples through the NIDAS processing and computes the 5-minute averages and derived quantities. There are a few kinds of datasets produced this way, depending on what QC and corrections are applied. Each kind of dataset being generated is written to a subdirectory of the netcdf directory, next to raw_data and merge.
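The resulting layout on ustar then looks roughly like this; the subdirectory names under netcdf are illustrative only, since they depend on which QC and correction variants are being generated:

    /data/isfs/projects/Perdigao/
        raw_data/            # USB-archive and network-stream raw files
        merge/               # merged sample archive
        netcdf/
            noqc_instrument/ # example: no QC, instrument coordinates
            qc_geo/          # example: QC applied, geographic coordinates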
At EOL, the processed netcdf output goes to /scr/tmp/isfs, the same as for the raw data. Usually ISFS netcdf data are stored in /scr/isfs. Even though that filesystem has 15T total, right now APG is using 2.8T for DEEPWAVE and 5.8T for PECAN under /scr/isfs/wbrown/temp, and ISFS has 1.5T for CHATS, so there is only 3.7T of space left.
Transfer of Sample-Rate NetCDF to University of Porto
There was an earlier plan to offer a server from which U Porto could pull the sample-rate data, preferably with rsync. However, there is not enough space on the EOL ftp filesystem for the data (2.3T total size), and EOL does not have an exposed rsync server. We could have set up an rsync server on ustar, but that could have been an unnecessary drain on what seemed to be unreliable Internet bandwidth.
Instead, the plan is to rsync sample-rate netcdf data files to the University of Porto from barolo.eol.ucar.edu through access they have provided.
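A sketch of what that push might look like from the isfs account on barolo; the local directory name and the remote host, account, and destination path are all placeholders rather than the actual values:

    # Push sample-rate netcdf files to the U Porto server (remote details are placeholders)
    rsync -av /scr/tmp/isfs/perdigao/netcdf_hr/ uporto_user@data.example.up.pt:/data/perdigao/netcdf_hr/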
Web Pages on ustar
Setup:
    sudo mkdir /var/www/html/isf
    sudo chown daq.daq /var/www/html/isf
    rsync -av isfs@barolo:/net/www/docs/isf/include /var/www/html/isf
    rsync -av isfs@barolo:/net/www/docs/isf/graphics /var/www/html/isf
    rsync -av isfs@barolo:/net/www/docs/gif /var/www/html
I think the only gif referenced may be ncar_ban_sml.gif, so perhaps it should just be moved into isf/include and the headers fixed.
The href in the header needs to be fixed to point directly to the ISF facilities page on the EOL web site.
We should commit the include and graphics directories and conf.d/isf.conf somewhere, and deploy them with an RPM or a script. The isf/include and isf/graphics directories under /net/www/docs are not in revision control yet either, unless they are installed from somewhere.
The webplots_table script generates /var/www/isf/projects/VERTEX/isfs/qcdata/index.shtml.