We can now make some statements about the performance of our new Tirga measurement. I'll call this "fan" vs. the original "shield".
Shield was deployed at ehs. First, I have to remove a bias of +1C from tc.5m to make T.2m and tc.5m agree with the heat flux. I then find that Tirga is generally within 1C of both T.2m and tc.5m. On some days and nights (presumably clear skies), Tirga.5m agrees more closely with T.2m than with tc.5m. This makes sense, because the radiation error would act to raise daytime temperatures and lower nighttime temperatures, which is the same effect as measuring closer to the surface. Generally, the magnitude of this radiation error was about 0.5C.
Fan was deployed at bao. No tc adjustment was needed. At this site, large differences between 2m and 5m are seen, typically 5C at night. When the fan was running, differences from tc are typically within 1C. When the fan wasn't running (May 19 16:30 – Apr 29 17:30), daytime Tirga was typically 4C higher than tc. Presumably, this is the internal EC100 box temperature heating up.
Considering all of the above and using data only with the fan working, nighttime Tirga.5m-tc.5m differences are about the same between fan and shield, typically within 0.5C. Daytime Tirga.5m-tc.5m differences for shield are on the order of 70% of those for fan, say 0.9 vs 1.3C. Thus, after all this work, fan is still worse than shield. Perhaps we need a double shield inside the EC100?
Rudy had noted that rad data died Monday afternoon. Efforts to reset it remotely using mote commands have failed, so there must be a hardware issue. We'll try to get out there today to replace this with a spare.
We'll replace the EC150 Tirga fan at the same time.
The GPS at 200m has quit reporting. It died around 01:00 UTC, April 27.
I noticed 200m was an outlier in the "chronyc sourcestats" output on flux. This listing shows an offset of 1051 microseconds for 200m, versus within about ±4 microseconds for the others:
chronyc sourcestats
210 Number of sources = 6
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
50m                        39  21   10h     -0.000      0.001  -2198ns    13us
100m                        5   4   68m     +0.002      0.124  +4085ns    27us
150m                       32  15  534m     +0.000      0.001  +2403ns    11us
200m                        7   4  103m     +0.058      0.024  +1051us    16us
250m                       43  21   12h     +0.000      0.001  +1879ns    12us
300m                        5   3   68m     -0.001      0.145  -1918ns    31us
The serial port doesn't show large values of fe (framing errors) or breaks:
root@200m root# cktty 3
3: uart:XR16850 mmio:0x10000000 irq:122 tx:1936 rx:808452280 fe:24 RTS|DTR
I don't think power to serial port 3 can be controlled with "tio 3 1/0". When I tried to power off the GPS on 150m, the output of "rs G" did not stop.
As a workaround, I edited /etc/ntp.conf on 200m and added 150m as a server. So there is no urgency to replace this GPS.
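For reference, the workaround amounts to adding a server line for 150m to the NTP configuration on 200m; the added line would be of the form below (the exact options used aren't recorded here):

server 150m    # added to /etc/ntp.conf on 200m: use 150m as a fallback time source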
Dan noticed that 300m TRH data were NA last night. This morning, a power cycle showed that the fan isn't turning on; it acts as if the fan is stuck.
We plan to drop off a new TRH this afternoon for Dan/Bruce to install as a replacement (probably on Monday).
Sensor ID3 I2C ADD: 12 data rate: 1 (secs) fan(0) max current: 80 (ma)\n
resolution: 12 bits 1 sec MOTE: off\r\n
calibration coefficients:\r\n
Ta0 = -4.112729E+1\r\n
Ta1 = 4.153065E-2\r\n
Ta2 = -5.198994E-7\r\n
Ha0 = -7.871138E+0\r\n
Ha1 = 6.237115E-1\r\n
Ha2 = -5.446227E-4\r\n
Ha3 = 8.683383E-2\r\n
Ha4 = 7.886339E-4\r\n
Fa0 = 3.222650E-1\r\n
TRH3 11.82 35.91 329 0 1296 72 1023\r\n
TRH3 11.82 35.91 221 0 1296 72 687\r\n
TRH3 11.82 35.91 118 0 1296 72 367\r\n
TRH3 11.78 35.90 69 0 1295 72 216\r\n
TRH3 11.82 35.91 39 0 1296 72 123\r\n
TRH3 11.78 35.90 18 0 1295 72 57\r\n
TRH3 11.78 35.90 6 0 1295 72 19\r\n
TRH3 11.78 35.90 0 0 1295 72 0\r\n
TRH3 11.78 35.90 0 0 1295 72 0\r\n
TRH3 11.78 36.45 0 0 1295 73 0\r\n
TRH3 11.74 36.45 0 0 1294 73 0\r\n
From the logs of the check_trh process on flux I see these entries since it was started on April 9. For some reason the higher TRHs had some issues yesterday.
Times in MDT:
fgrep cycling /var/log/messages*
Apr 18 18:49:09 flux check_trh.sh: 300m temperature is 137.88 . Power cycling port 5
Apr 23 13:03:58 flux check_trh.sh: 300m temperature is 174.1 . Power cycling port 5
Apr 23 13:06:58 flux check_trh.sh: 300m temperature is 174.28 . Power cycling port 5
Apr 23 13:08:18 flux check_trh.sh: 200m temperature is 181.61 . Power cycling port 5
Apr 23 13:08:38 flux check_trh.sh: 300m temperature is 174.28 . Power cycling port 5
Apr 23 13:08:58 flux check_trh.sh: 200m temperature is 181.53 . Power cycling port 5
Apr 23 13:09:48 flux check_trh.sh: 300m temperature is 174.06 . Power cycling port 5
Apr 23 13:16:48 flux check_trh.sh: 200m temperature is 179.15 . Power cycling port 5
Apr 23 13:19:18 flux check_trh.sh: 250m temperature is 173.33 . Power cycling port 5
Apr 23 13:30:38 flux check_trh.sh: 200m temperature is 177.04 . Power cycling port 5
Apr 23 13:48:38 flux check_trh.sh: 250m temperature is 171.63 . Power cycling port 5
Apr 23 13:50:48 flux check_trh.sh: 250m temperature is 173.08 . Power cycling port 5
Yesterday (April 23) I reworked things so that the check script is run on each DSM, including the bao station. The only entries after that are from 300m. Subtracting 6 hours from the times, these occurred at 13:27-13:29 MDT.
Times in UTC
ssh 300m fgrep cycling /var/log/isfs/dsm.log
Apr 23 19:27:33 300m root: temperature is -62.52 . Power cycling port 5
Apr 23 19:28:25 300m root: temperature is -62.52 . Power cycling port 5
Apr 23 19:29:41 300m root: temperature is -62.52 . Power cycling port 5
For example, here is the hiccup from 200m at 19:30:22 UTC. Note that after the first power cycle, things look good for 5 seconds; then the sensor reports a bad temperature of 89.92 at 19:30:50.1491 and is power cycled again, after which it works.
data_dump -i 4,20 -A 200m_20150423_160000.dat | more
...
2015 04 23 19:30:17.3598 1.001 37 TRH30 15.13 27.28 34 0 1377 56 107\r\n
2015 04 23 19:30:18.3691 1.009 37 TRH30 15.09 27.28 33 0 1376 56 105\r\n
2015 04 23 19:30:19.3691 1 37 TRH30 15.13 27.28 34 0 1377 56 108\r\n
2015 04 23 19:30:20.3692 1 37 TRH30 15.09 27.28 33 0 1376 56 103\r\n
2015 04 23 19:30:21.3790 1.01 37 TRH30 15.09 27.28 34 0 1376 56 108\r\n
2015 04 23 19:30:22.6191 1.24 40 TRH30 177.00 260.02 36 0 5510 886 112\r\n
2015 04 23 19:30:23.6290 1.01 40 TRH30 177.00 260.18 35 0 5510 885 109\r\n
2015 04 23 19:30:24.6290 1 40 TRH30 177.04 260.21 34 0 5511 885 106\r\n
2015 04 23 19:30:25.6398 1.011 40 TRH30 177.08 260.40 33 0 5512 884 105\r\n
...
2015 04 23 19:30:37.6898 1.001 40 TRH30 177.26 260.19 32 0 5517 886 102\r\n
2015 04 23 19:30:38.6900 1 40 TRH30 177.23 260.33 34 0 5516 885 108\r\n
2015 04 23 19:30:39.6991 1.009 38 TRH30 177.30 260.87 5 0 5518 882 16\r\n
2015 04 23 19:30:43.7398 4.041 2 \n
2015 04 23 19:30:43.7408 0.001042 80 \r Sensor ID30 I2C ADD: 11 data rate: 1 (secs) fan(0) max current: 80 (ma)\n
2015 04 23 19:30:43.8292 0.08842 44 \rresolution: 12 bits 1 sec MOTE: off\r\n
2015 04 23 19:30:43.8806 0.05133 28 calibration coefficients:\r\n
2015 04 23 19:30:43.9098 0.02924 21 Ta0 = -4.129395E+1\r\n
2015 04 23 19:30:43.9398 0.02995 21 Ta1 = 4.143320E-2\r\n
2015 04 23 19:30:43.9691 0.02937 21 Ta2 = -3.293163E-7\r\n
2015 04 23 19:30:43.9899 0.02073 21 Ha0 = -7.786594E+0\r\n
2015 04 23 19:30:44.0191 0.02922 21 Ha1 = 6.188832E-1\r\n
2015 04 23 19:30:44.0449 0.02582 21 Ha2 = -5.069766E-4\r\n
2015 04 23 19:30:44.0691 0.02418 21 Ha3 = 9.665616E-2\r\n
2015 04 23 19:30:44.0991 0.03 21 Ha4 = 6.398342E-4\r\n
2015 04 23 19:30:44.1191 0.02001 21 Fa0 = 3.222650E-1\r\n
2015 04 23 19:30:45.1098 0.9907 37 TRH30 15.17 26.14 32 0 1378 54 102\r\n
2015 04 23 19:30:46.1191 1.009 37 TRH30 15.17 26.14 33 0 1378 54 103\r\n
2015 04 23 19:30:47.1291 1.01 37 TRH30 15.17 26.14 34 0 1378 54 108\r\n
2015 04 23 19:30:48.1290 0.9999 37 TRH30 15.17 26.14 32 0 1378 54 101\r\n
2015 04 23 19:30:49.1390 1.01 37 TRH30 15.17 26.14 33 0 1378 54 105\r\n
2015 04 23 19:30:50.1491 1.01 32 TRH30 89.92 0.90 0 0 3251 0 0\r\n
2015 04 23 19:30:53.5790 3.43 2 \n
2015 04 23 19:30:53.5801 0.001042 80 \r Sensor ID30 I2C ADD: 11 data rate: 1 (secs) fan(0) max current: 80 (ma)\n
2015 04 23 19:30:53.6699 0.08981 44 \rresolution: 12 bits 1 sec MOTE: off\r\n
2015 04 23 19:30:53.7213 0.05139 28 calibration coefficients:\r\n
2015 04 23 19:30:53.7491 0.0278 21 Ta0 = -4.129395E+1\r\n
2015 04 23 19:30:53.7790 0.02995 21 Ta1 = 4.143320E-2\r\n
2015 04 23 19:30:53.8083 0.02925 21 Ta2 = -3.293163E-7\r\n
2015 04 23 19:30:53.8290 0.02075 21 Ha0 = -7.786594E+0\r\n
2015 04 23 19:30:53.8601 0.03103 21 Ha1 = 6.188832E-1\r\n
2015 04 23 19:30:53.8898 0.02971 21 Ha2 = -5.069766E-4\r\n
2015 04 23 19:30:53.9108 0.02107 21 Ha3 = 9.665616E-2\r\n
2015 04 23 19:30:53.9398 0.02892 21 Ha4 = 6.398342E-4\r\n
2015 04 23 19:30:53.9691 0.02932 21 Fa0 = 3.222650E-1\r\n
2015 04 23 19:30:54.9590 0.99 37 TRH30 15.17 26.14 34 0 1378 54 107\r\n
2015 04 23 19:30:55.9598 1.001 37 TRH30 15.17 26.14 33 0 1378 54 103\r\n
2015 04 23 19:30:56.9691 1.009 37 TRH30 15.21 26.15 34 0 1379 54 108\r\n
2015 04 23 19:30:57.9691 1 37 TRH30 15.17 26.14 33 0 1378 54 103\r\n
Notice the delta-T column after the datetime. I've looked at a few of these, and I think there is always a larger delta-T (in this case 1.24 s instead of 1.0) at the time of the initial bad data, in case that might help in debugging.
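As an aside, one quick way to scan a data_dump listing for these glitches is to filter on that delta-T column; a minimal sketch (column positions taken from the listing above, the 1.1 s threshold chosen arbitrarily):

data_dump -i 4,20 -A 200m_20150423_160000.dat |
  awk '$7 ~ /^TRH/ && $5 > 1.1 { print }'   # flag TRH samples arriving more than ~1.1 s after the previous one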
9am, Apr 25: Some more glitches since yesterday. Notice again that the problems in different sensors seem to occur at approximately the same times:
ck_trh
200m
Apr 24 20:56:05 200m root: temperature is 170.15 . Power cycling port 5
Apr 24 20:57:10 200m root: temperature is 170.22 . Power cycling port 5
300m
Apr 24 20:51:47 300m root: temperature is -62.52 . Power cycling port 5
Apr 24 20:56:52 300m root: temperature is -62.52 . Power cycling port 5
This was done from about 2:30-4:30 with Steve S&O and Kurt. Soil samples were taken; I will update the gravimetric posting. I queried each soil sensor for its ID (since I had to look at the Qsoil values anyway), capturing the output in a minicom capture file (attached: ehsteardown.cap).
Problem soil sensors at this site have been:
Tsoil.0.6cm (SN 12, epoxy-coated, upside down): Didn't work from a few hours after installation until it revived itself ~3 weeks later. Ran fine until tear-down (including the manual reading I took during tear-down). Back in the lab, nothing is visually wrong with this probe.
Qsoil.5cm (SN 12): Worked fine for the first 3 weeks, then started dropping data (for hours at a time) during the last 3 weeks. Worked during the manual reading at tear-down. In the lab, the Binder connector was not fully seated; it was pulled out by about 0.7 mm. Sorry, I forgot to inspect it during the actual tear-down. This is a post-mortem item: verify that each Binder connector is fully seated during probe installation.
It appears that after the EOL systems work yesterday, /scr/isfs didn't come up. The WWW plots died (though there is at least one WWW plots issue that is due to ehs being decommissioned), and the rsync from bao didn't happen last night.
I've just submitted a system help request to correct this, and tonight's rsync task should regenerate the data.
From the plots at http://datavis.eol.ucar.edu/ncharts/projects/CABL/geo_notiltcor, it seems that several of the sonics on the tower were intermittently not reporting data on 4/17. The 5-minute data files also have several '_' points. Is this a permanent data outage, or are the data recoverable?
Sensor status:
T: ok
RH: ok
Ifan: ok
spd: ok
P: ok
co2/h2o: ok
csat u,v,w: ok
csat ldiag: ok
soils: ok
Wetness: ok
Rsw/Rlw/Rpile: ok
Voltages: ok
sstat outputs: ok, ok
We had to recover the DSM and batteries for use in PECAN, so I removed these from ehs from about 1018-1038 today. The rest of the station is still in place, but we are no longer collecting data.
bao and the tower sensors are still up, and currently scheduled to start being removed 1 June. (Actually, I don't know if this extension to 1 June has officially been approved, but I am assuming that this will be the case.)
I introduced a bug in the R code yesterday such that it would try to create an infinitely long netcdf file. This is the code that adds the derived quantities to the netcdf files.
As a result, ncharts was hanging, and the R plots probably weren't working either.
The bug is now fixed, and things should return to normal.
Sensor status:
T: ok
RH: ok
Ifan: ok
spd: ok
P: ok
co2/h2o: ok
csat u,v,w: ok
csat ldiag: ok
soils: Gsoil/Cvsoil.ehs has a lot of missing data during noontime
Wetness: ok
Rsw/Rlw/Rpile: ok
Voltages: ok
sstat outputs: ok, ok
Sensor status:
T: ok
RH: ok
Ifan: ok
spd: ok
P: ok
co2/h2o: ok
csat u,v,w: ok
csat ldiag: ok
soils: ok
Wetness: ok
Rsw/Rlw/Rpile: ok
Voltages: ok
sstat outputs: ok, ok
Sensor status:
T: T.300m missing
RH: RH.300m missing
Ifan: ok
spd: ok
P: ok
co2/h2o: ok
csat u,v,w: ok
csat ldiag: ok
soils: ok
Wetness: ok
Rsw/Rlw/Rpile: ok
Voltages: ok
sstat outputs: ok, ok
Dan noticed that, as predicted a few logbook entries ago, T.300m was bad again. I manually restarted it at about 9 this morning. Since then, Gordon has written a script that checks tower TRH data every 10s and restarts the sensor if needed.
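I haven't copied Gordon's script here; the basic logic is roughly the sketch below (the temperature-reading helper, port number, and thresholds are placeholders, not the actual implementation, and the "tio <port> 0/1" port-power syntax is assumed from the note in the GPS entry above), run every 10 s:

#!/bin/sh
# Sketch only -- not the actual check script.
PORT=5                       # serial port the TRH is on (placeholder)
TEMP=$(get_trh_temp)         # hypothetical helper that prints the latest TRH temperature
# Power cycle the port if the reading is missing or physically implausible
if [ -z "$TEMP" ] || awk -v t="$TEMP" 'BEGIN{exit !(t < -50 || t > 60)}'; then
    logger "temperature is $TEMP . Power cycling port $PORT"
    tio $PORT 0
    sleep 5
    tio $PORT 1
fi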
I also note the METCRAX-II logbook comment: "TRH.40m.rim restarted" where we found that the (bad) data are correctable. My (manual) code to implement this for METCRAXII is at $ISFF/projects/METCRAXII/ISFF/R/fixSHT.qq. It isn't clear that it is worth the effort to implement this fix for the 15 hours of data that are missing during this educational deployment...
If we decide to fix these data, we will need the following coefficients:
Sensor ID3 I2C ADD: 12 data rate: 1 (secs) fan(0) max current: 80 (ma)\n
2015 04 09 15:01:26.7912 0.08542 44 \rresolution: 12 bits 1 sec MOTE: off\r\n
2015 04 09 15:01:26.8472 0.05602 28 calibration coefficients:\r\n
2015 04 09 15:01:26.8755 0.02831 21 Ta0 = -4.112729E+1\r\n
2015 04 09 15:01:26.9047 0.02921 21 Ta1 = 4.153065E-2\r\n
2015 04 09 15:01:26.9348 0.03005 21 Ta2 = -5.198994E-7\r\n
2015 04 09 15:01:26.9555 0.02074 21 Ha0 = -7.871138E+0\r\n
2015 04 09 15:01:26.9848 0.02932 21 Ha1 = 6.237115E-1\r\n
2015 04 09 15:01:27.0148 0.02994 21 Ha2 = -5.446227E-4\r\n
2015 04 09 15:01:27.0348 0.01999 21 Ha3 = 8.683383E-2\r\n
2015 04 09 15:01:27.0648 0.03001 21 Ha4 = 7.886339E-4\r\n
2015 04 09 15:01:27.0947 0.02994 21 Fa0 = 3.222650E-1\r\n
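For what it's worth, these coefficients reproduce the reported T and RH from the raw counts (fields 6 and 7 of the TRH3 messages, e.g. 1296 and 72 in the samples shown earlier in this logbook). The awk sketch below, with the polynomial form inferred from those samples, is only a sanity check of the coefficients, not the fixSHT.qq correction itself:

echo "TRH3 11.82 35.91 329 0 1296 72 1023" |
awk '{ ct = $6; ch = $7                      # raw T and RH counts
       Ta0 = -4.112729E+1; Ta1 = 4.153065E-2; Ta2 = -5.198994E-7
       Ha0 = -7.871138E+0; Ha1 = 6.237115E-1; Ha2 = -5.446227E-4
       Ha3 = 8.683383E-2;  Ha4 = 7.886339E-4
       T  = Ta0 + Ta1*ct + Ta2*ct*ct                       # temperature polynomial
       RH = Ha0 + Ha1*ch + Ha2*ch*ch + T*(Ha3 + Ha4*ch)    # T-compensated RH
       printf "T=%.2f RH=%.2f\n", T, RH }'   # prints T=11.82 RH=35.91, matching the reported values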
P.S. There is now a version of fixSHT.qq in the CABL/ISFF/R directory that implements this fix, though we now have to figure out how to apply it in our data flow. One possibility is to bundle it with the wind-direction computing script to write to both the high-rate and 5-min NetCDF files.
Ultimately, of course, the solution is to fix the sensor/microprocessor. This is an intermittent problem that has affected at least 2 different sensors. Since it is a bit slip, it is likely an issue in the communication between the PIC and the SHT. Thus, it appears that the PIC, the SHT, or the interface circuit is marginal in some respect. Is the timing too fast? Is a pull-up resistor needed? Is some timing wait state needed?