Blog from April, 2015

Tirga test results

We can now make some statements about the performance of our new Tirga measurement.  I'll call this configuration "fan", vs. the original "shield".

Shield was deployed at ehs.  First, I have to remove a bias of +1C from tc.5m to make T.2m and tc.5m agree with the heat flux.  I then find that Tirga is generally within 1C of both T.2m and tc.5m.  On some days and nights (presumably clear skies), Tirga.5m agrees more closely with T.2m than with tc.5m.  This makes sense, because the radiation error would act to raise daytime temps and lower nighttime temps, which is the same effect as measuring closer to the surface.  Generally, the magnitude of this radiation error was about 0.5C.

Fan was deployed at bao.  No tc adjustment was needed.  At this site, large differences between 2m and 5m are seen – typically 5C at night.  When the fan was running, differences from tc are typically within 1C.  When the fan wasn't running (Apr 19 16:30 – Apr 29 17:30), daytime Tirga was typically 4C higher than tc.  Presumably, this is the internal EC100 box temperature heating up.

Considering all of the above and using data only with the fan working, nighttime Tirga.5m-tc.5m differences are about the same between fan and shield – typically within 0.5C.  The daytime Tirga.5m-tc.5m difference for shield is on the order of 70% of that for fan – say 0.9 vs. 1.3C.  Thus, after all this work, fan is still worse than shield (sad).  Perhaps we need a double-shield inside the EC100?
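
For what it's worth, here is a minimal R sketch of the comparison described above.  The variable names, the day/night split hours, and the use of mean absolute differences are illustrative assumptions, not the actual analysis code:

tirga.diff <- function(tirga5, tc5, tt, tc.bias = 1.0) {
    # tirga5, tc5: 5-min Tirga.5m and tc.5m values; tt: POSIXct times (hypothetical names)
    # tc.bias = 1.0 removes the +1C tc.5m bias noted above (use 0 for bao)
    tc5 <- tc5 - tc.bias
    hr  <- as.numeric(format(tt, "%H"))   # hour of day (local time assumed)
    day <- hr >= 8 & hr < 18              # crude day/night split, for illustration
    dT  <- tirga5 - tc5                   # Tirga.5m - tc.5m
    c(day = mean(abs(dT[day]), na.rm = TRUE),
      night = mean(abs(dT[!day]), na.rm = TRUE))
}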

No rad data

Rudy had noted that rad data died Monday afternoon.  Efforts to reset it remotely using mote commands have failed, so there must be a hardware issue.  We'll try to get out there today to replace this with a spare.

We'll replace the EC150 Tirga fan at the same time.

GPS at 200m out

The GPS at 200m has quit reporting. It died around 01:00 UTC, April 27.

I noticed 200m was an outlier in the "chronyc sourcestats" output on flux. This listing shows an offset of 1051 microseconds for 200m, versus within ±4 microseconds for the others:

chronyc sourcestats
210 Number of sources = 6
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
==============================================================================
50m                        39  21   10h     -0.000      0.001  -2198ns    13us
100m                        5   4   68m     +0.002      0.124  +4085ns    27us
150m                       32  15  534m     +0.000      0.001  +2403ns    11us
200m                        7   4  103m     +0.058      0.024  +1051us    16us
250m                       43  21   12h     +0.000      0.001  +1879ns    12us
300m                        5   3   68m     -0.001      0.145  -1918ns    31us

The serial port doesn't show large values of fe (framing errors) or breaks:

root@200m root# cktty 3
3: uart:XR16850 mmio:0x10000000 irq:122 tx:1936 rx:808452280 fe:24 RTS|DTR

I don't think power to serial port 3 can be controlled with "tio 3 1/0". When I tried to power off the GPS on 150m, the output of "rs G" did not stop.

As a workaround, I edited /etc/ntp.conf on 200m and added 150m as a server. So there is no urgency to replace this GPS.

Dan noticed that 300m TRH data were NA last night.  This morning, a power cycle showed that the fan isn't turning on – it acts like the fan is stuck.

We plan to drop off a new TRH this afternoon for Dan/Bruce to swap in (probably on Monday).  The capture below is from the power cycle, showing the sensor's power-up message and the fan readings decaying to zero:

Sensor ID3   I2C ADD: 12   data rate: 1 (secs)  fan(0) max current: 80 (ma)\n
resolution: 12 bits      1 sec MOTE: off\r\n
calibration coefficients:\r\n
Ta0 = -4.112729E+1\r\n
Ta1 =  4.153065E-2\r\n
Ta2 = -5.198994E-7\r\n
Ha0 = -7.871138E+0\r\n
Ha1 =  6.237115E-1\r\n
Ha2 = -5.446227E-4\r\n
Ha3 =  8.683383E-2\r\n
Ha4 =  7.886339E-4\r\n
Fa0 =  3.222650E-1\r\n
TRH3 11.82 35.91 329 0 1296 72 1023\r\n
TRH3 11.82 35.91 221 0 1296 72 687\r\n
TRH3 11.82 35.91 118 0 1296 72 367\r\n
TRH3 11.78 35.90 69 0 1295 72 216\r\n
TRH3 11.82 35.91 39 0 1296 72 123\r\n
TRH3 11.78 35.90 18 0 1295 72 57\r\n
TRH3 11.78 35.90 6 0 1295 72 19\r\n
TRH3 11.78 35.90 0 0 1295 72 0\r\n
TRH3 11.78 35.90 0 0 1295 72 0\r\n
TRH3 11.78 36.45 0 0 1295 73 0\r\n
TRH3 11.74 36.45 0 0 1294 73 0\r\n

TRH hiccups

From the logs of the check_trh process on flux, I see these entries since it was started on April 9.  For some reason, the higher TRHs had some issues yesterday.

TRH problems
Times in MDT:
 
fgrep cycling /var/log/messages*
Apr 18 18:49:09 flux check_trh.sh: 300m temperature is 137.88 . Power cycling port 5
Apr 23 13:03:58 flux check_trh.sh: 300m temperature is 174.1 . Power cycling port 5
Apr 23 13:06:58 flux check_trh.sh: 300m temperature is 174.28 . Power cycling port 5
Apr 23 13:08:18 flux check_trh.sh: 200m temperature is 181.61 . Power cycling port 5
Apr 23 13:08:38 flux check_trh.sh: 300m temperature is 174.28 . Power cycling port 5
Apr 23 13:08:58 flux check_trh.sh: 200m temperature is 181.53 . Power cycling port 5
Apr 23 13:09:48 flux check_trh.sh: 300m temperature is 174.06 . Power cycling port 5
Apr 23 13:16:48 flux check_trh.sh: 200m temperature is 179.15 . Power cycling port 5
Apr 23 13:19:18 flux check_trh.sh: 250m temperature is 173.33 . Power cycling port 5
Apr 23 13:30:38 flux check_trh.sh: 200m temperature is 177.04 . Power cycling port 5
Apr 23 13:48:38 flux check_trh.sh: 250m temperature is 171.63 . Power cycling port 5
Apr 23 13:50:48 flux check_trh.sh: 250m temperature is 173.08 . Power cycling port 5

Yesterday (April 23) I reworked things so that the check script is run on each DSM, including the bao station.  The only entries after that are from 300m. Subtracting 6 hours from the UTC times, these are at 13:27-13:29 MDT.

Times in UTC
 
ssh 300m fgrep cycling /var/log/isfs/dsm.log

Apr 23 19:27:33 300m root: temperature is -62.52 . Power cycling port 5
Apr 23 19:28:25 300m root: temperature is -62.52 . Power cycling port 5
Apr 23 19:29:41 300m root: temperature is -62.52 . Power cycling port 5

For example, here is the hiccup from 200m at 19:30:22 UTC.  Note that after the first power cycle, things look good for 5 seconds; it then reports a bad temperature of 89.92 at 19:30:50.1491, is power cycled again, and works after that.

200m
data_dump -i 4,20 -A 200m_20150423_160000.dat | more
...
2015 04 23 19:30:17.3598   1.001      37 TRH30 15.13 27.28 34 0 1377 56 107\r\n
2015 04 23 19:30:18.3691   1.009      37 TRH30 15.09 27.28 33 0 1376 56 105\r\n
2015 04 23 19:30:19.3691       1      37 TRH30 15.13 27.28 34 0 1377 56 108\r\n
2015 04 23 19:30:20.3692       1      37 TRH30 15.09 27.28 33 0 1376 56 103\r\n
2015 04 23 19:30:21.3790    1.01      37 TRH30 15.09 27.28 34 0 1376 56 108\r\n
2015 04 23 19:30:22.6191    1.24      40 TRH30 177.00 260.02 36 0 5510 886 112\r\n
2015 04 23 19:30:23.6290    1.01      40 TRH30 177.00 260.18 35 0 5510 885 109\r\n
2015 04 23 19:30:24.6290       1      40 TRH30 177.04 260.21 34 0 5511 885 106\r\n
2015 04 23 19:30:25.6398   1.011      40 TRH30 177.08 260.40 33 0 5512 884 105\r\n
...
2015 04 23 19:30:37.6898   1.001      40 TRH30 177.26 260.19 32 0 5517 886 102\r\n
2015 04 23 19:30:38.6900       1      40 TRH30 177.23 260.33 34 0 5516 885 108\r\n
2015 04 23 19:30:39.6991   1.009      38 TRH30 177.30 260.87 5 0 5518 882 16\r\n
2015 04 23 19:30:43.7398   4.041       2 \n
2015 04 23 19:30:43.7408 0.001042      80 \r Sensor ID30   I2C ADD: 11   data rate: 1 (secs)  fan(0) max current: 80 (ma)\n
2015 04 23 19:30:43.8292 0.08842      44 \rresolution: 12 bits      1 sec MOTE: off\r\n
2015 04 23 19:30:43.8806 0.05133      28 calibration coefficients:\r\n
2015 04 23 19:30:43.9098 0.02924      21 Ta0 = -4.129395E+1\r\n
2015 04 23 19:30:43.9398 0.02995      21 Ta1 =  4.143320E-2\r\n
2015 04 23 19:30:43.9691 0.02937      21 Ta2 = -3.293163E-7\r\n
2015 04 23 19:30:43.9899 0.02073      21 Ha0 = -7.786594E+0\r\n
2015 04 23 19:30:44.0191 0.02922      21 Ha1 =  6.188832E-1\r\n
2015 04 23 19:30:44.0449 0.02582      21 Ha2 = -5.069766E-4\r\n
2015 04 23 19:30:44.0691 0.02418      21 Ha3 =  9.665616E-2\r\n
2015 04 23 19:30:44.0991    0.03      21 Ha4 =  6.398342E-4\r\n
2015 04 23 19:30:44.1191 0.02001      21 Fa0 =  3.222650E-1\r\n
2015 04 23 19:30:45.1098  0.9907      37 TRH30 15.17 26.14 32 0 1378 54 102\r\n
2015 04 23 19:30:46.1191   1.009      37 TRH30 15.17 26.14 33 0 1378 54 103\r\n
2015 04 23 19:30:47.1291    1.01      37 TRH30 15.17 26.14 34 0 1378 54 108\r\n
2015 04 23 19:30:48.1290  0.9999      37 TRH30 15.17 26.14 32 0 1378 54 101\r\n
2015 04 23 19:30:49.1390    1.01      37 TRH30 15.17 26.14 33 0 1378 54 105\r\n
2015 04 23 19:30:50.1491    1.01      32 TRH30 89.92 0.90 0 0 3251 0 0\r\n
2015 04 23 19:30:53.5790    3.43       2 \n
2015 04 23 19:30:53.5801 0.001042      80 \r Sensor ID30   I2C ADD: 11   data rate: 1 (secs)  fan(0) max current: 80 (ma)\n
2015 04 23 19:30:53.6699 0.08981      44 \rresolution: 12 bits      1 sec MOTE: off\r\n
2015 04 23 19:30:53.7213 0.05139      28 calibration coefficients:\r\n
2015 04 23 19:30:53.7491  0.0278      21 Ta0 = -4.129395E+1\r\n
2015 04 23 19:30:53.7790 0.02995      21 Ta1 =  4.143320E-2\r\n
2015 04 23 19:30:53.8083 0.02925      21 Ta2 = -3.293163E-7\r\n
2015 04 23 19:30:53.8290 0.02075      21 Ha0 = -7.786594E+0\r\n
2015 04 23 19:30:53.8601 0.03103      21 Ha1 =  6.188832E-1\r\n
2015 04 23 19:30:53.8898 0.02971      21 Ha2 = -5.069766E-4\r\n
2015 04 23 19:30:53.9108 0.02107      21 Ha3 =  9.665616E-2\r\n
2015 04 23 19:30:53.9398 0.02892      21 Ha4 =  6.398342E-4\r\n
2015 04 23 19:30:53.9691 0.02932      21 Fa0 =  3.222650E-1\r\n
2015 04 23 19:30:54.9590    0.99      37 TRH30 15.17 26.14 34 0 1378 54 107\r\n
2015 04 23 19:30:55.9598   1.001      37 TRH30 15.17 26.14 33 0 1378 54 103\r\n
2015 04 23 19:30:56.9691   1.009      37 TRH30 15.21 26.15 34 0 1379 54 108\r\n
2015 04 23 19:30:57.9691       1      37 TRH30 15.17 26.14 33 0 1378 54 103\r\n

Notice the delta-T column after the datetime. I've looked at a few of these, and I think that there is always a larger delta-T (in this case 1.24 sec instead of 1.0) at the time of the initial bad data, in case that might help in debugging.
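
In case it is useful, here is a minimal R sketch of flagging such glitches offline using exactly these two symptoms (an implausible temperature and an enlarged delta-T).  The data-frame layout, column names, and thresholds are assumptions for illustration, not part of check_trh:

flag.trh.glitches <- function(trh, dt.max = 1.1, t.range = c(-40, 60)) {
    # trh: data frame with POSIXct times (time), temperature (T), RH (RH)
    dt  <- c(NA, diff(as.numeric(trh$time)))         # seconds between samples
    bad <- trh$T < t.range[1] | trh$T > t.range[2]   # implausible temperature
    gap <- !is.na(dt) & dt > dt.max                  # larger-than-normal delta-T
    trh[which(bad | gap), ]                          # candidate glitch records
}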

9am, Apr 25: Some more glitches since yesterday. Notice again that the problems in different sensors seem to occur at approximately the same times:

ck_trh
200m
Apr 24 20:56:05 200m root: temperature is 170.15 . Power cycling port 5
Apr 24 20:57:10 200m root: temperature is 170.22 . Power cycling port 5

300m
Apr 24 20:51:47 300m root: temperature is -62.52 . Power cycling port 5
Apr 24 20:56:52 300m root: temperature is -62.52 . Power cycling port 5

ehs removed today

This was done from about 2:30-4:30 with Steve S&O and Kurt.  Soil samples were taken – I will update the gravimetric posting.  I queried each soil sensor for its ID (since I had to look at the Qsoil values anyway) and saved the output to a minicom capture file (attached: ehsteardown.cap).

Problem soil sensors at this site have been:

Tsoil.0.6cm (SN 12, epoxy-coated, upside down):  Didn't work from a few hours after installation until it revived itself ~3 weeks later.  Ran fine until tear-down (including the manual reading I took during tear-down).  Back in the lab, nothing is visually wrong with this probe.

Qsoil.5cm (SN 12): Worked fine for the first 3 weeks, then started dropping data (for hours at a time) during the last 3 weeks.  Worked during the manual reading at tear-down.  In the lab, the Binder connector was not fully seated – pulled out by about 0.7mm.  Sorry, I forgot to inspect it during the actual tear-down.  This is a post-mortem item – verify that each Binder connector is fully seated during probe installation.

/scr/isfs down

It appears that after the EOL systems work yesterday, /scr/isfs didn't come up.  The WWW plots died (though there is at least one WWW plots issue that is due to ehs being decommissioned), and the rsync from bao didn't happen last night.

I've just submitted a system help request to correct this, and tonight's rsync task should regenerate the data.

From the plots at http://datavis.eol.ucar.edu/ncharts/projects/CABL/geo_notiltcor, it seems that several of the sonics on the tower were intermittently not reporting data on 4/17. The 5-minute data files also have several '_' points. Is this a permanent data outage, or are the data recoverable?

Status update Friday

Sensor status:    

T: ok

RH: ok

Ifan: ok

spd: ok

P: ok

co2/h2o: ok

csat u,v,w: ok

csat ldiag: ok

soils: ok

Wetness: ok

Rsw/Rlw/Rpile: ok

Voltages: ok

sstat outputs: ok, ok

ehs data turned off

We had to recover the DSM and batteries for use in PECAN, so I removed them from ehs from about 1018-1038 today.  The rest of the station is still in place, but we are no longer collecting data.

bao and the tower sensors are still up and are currently scheduled to start being removed 1 June.  (Actually, I don't know if this extension to 1 June has officially been approved, but I am assuming that this will be the case.)

I introduced a bug in the R code yesterday such that it would try to create an infinitely long NetCDF file. This is the code that adds the derived quantities to the NetCDF files.

As a result, ncharts was hanging, and probably the R plots weren't working either.

The bug is now fixed, and things should return to normal.

Status update Monday

Sensor status:    

T: ok

RH: ok

Ifan: ok

spd: ok

P: ok

co2/h2o: ok

csat u,v,w: ok

csat ldiag: ok

soils: Gsoil/Cvsoil.ehs has a lot of missing data around noon

Wetness: ok

Rsw/Rlw/Rpile: ok

Voltages: ok

sstat outputs: ok, ok

Status update Saturday

Sensor status:    

T: ok

RH: ok

Ifan: ok

spd: ok

P: ok

co2/h2o: ok

csat u,v,w: ok

csat ldiag: ok

soils: ok

Wetness: ok

Rsw/Rlw/Rpile: ok

Voltages: ok

sstat outputs: ok, ok

Status update Thursday

Sensor status:    

T: T.300m missing

RH: RH.300m missing

Ifan: ok

spd: ok

P: ok

co2/h2o: ok

csat u,v,w: ok

csat ldiag: ok

soils: ok

Wetness: ok

Rsw/Rlw/Rpile: ok

Voltages: ok

sstat outputs: ok, ok

TRH.300m restarted

Dan noticed that, as predicted a few logbook entries ago, T.300m was bad again.  I manually restarted it at about 9 this morning.  Since then, Gordon has written a script that checks tower TRH data every 10s and restarts the sensor if needed.

I also note the METCRAX-II logbook entry "TRH.40m.rim restarted", where we found that the (bad) data are correctable.  My (manual) code to implement this for METCRAXII is at $ISFF/projects/METCRAXII/ISFF/R/fixSHT.qq.  It isn't clear that it is worth the effort to implement this fix for the 15 hours of data that are missing during this educational deployment...

If we decide to fix these data, we will need the following coefficients:

 Sensor ID3   I2C ADD: 12   data rate: 1 (secs)  fan(0) max current: 80 (ma)\n
2015 04 09 15:01:26.7912 0.08542      44 \rresolution: 12 bits      1 sec MOTE: off\r\n
2015 04 09 15:01:26.8472 0.05602      28 calibration coefficients:\r\n
2015 04 09 15:01:26.8755 0.02831      21 Ta0 = -4.112729E+1\r\n
2015 04 09 15:01:26.9047 0.02921      21 Ta1 =  4.153065E-2\r\n
2015 04 09 15:01:26.9348 0.03005      21 Ta2 = -5.198994E-7\r\n
2015 04 09 15:01:26.9555 0.02074      21 Ha0 = -7.871138E+0\r\n
2015 04 09 15:01:26.9848 0.02932      21 Ha1 =  6.237115E-1\r\n
2015 04 09 15:01:27.0148 0.02994      21 Ha2 = -5.446227E-4\r\n
2015 04 09 15:01:27.0348 0.01999      21 Ha3 =  8.683383E-2\r\n
2015 04 09 15:01:27.0648 0.03001      21 Ha4 =  7.886339E-4\r\n
2015 04 09 15:01:27.0947 0.02994      21 Fa0 =  3.222650E-1\r\n
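
For reference, here is a minimal R sketch of how these coefficients would be applied, assuming the usual quadratic calibration for temperature (this form reproduces the good TRH3 values earlier in this log).  The repaired raw count itself would come from fixSHT.qq; the function name below is mine, not from that script:

Ta <- c(-4.112729e+1, 4.153065e-2, -5.198994e-7)   # Ta0, Ta1, Ta2 for sensor ID3
trh.temp <- function(craw) Ta[1] + Ta[2]*craw + Ta[3]*craw^2
trh.temp(1296)   # ~11.82 C, as in the good TRH3 records earlier in this log
# A fix would repair the bit-slipped raw count and then re-apply this
# polynomial (and the corresponding Ha* coefficients for RH).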

P.S. There is now a version of fixSHT.qq in the CABL/ISFF/R directory that implements this fix, though we now have to figure out how to apply it in our data flow.  One possibility is to bundle it with the wind-direction computing script to write to both the high-rate and 5-min NetCDF files.

Ultimately, of course, the solution is to fix the sensor/microprocessor.  This is an intermittent problem that has affected at least 2 different sensors.  Since it is a bit slip, it is likely an issue with communication between the PIC and the SHT.  Thus, it appears that the PIC, the SHT, or the interface circuit is marginal in some respect.  Is the timing too fast?  Is a pull-up resistor needed?  Is some timing wait state needed?