Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

Monitoring the accuracy of the system clock on a real-time data acquisition system provides useful information about the performance of the system. Hence this long discussion.

Data System Clock, NTP and GPS

The data system at the Manitou Forest Observatory The turbulence tower data system (aka, the DSM) uses a GPS receiver and the NTP (Network Time Protocol) software to set its the system clock, which, in addition to the normal uses of the a system clock, is used to time-tag the data samples.

The serial messages from the GPS are received on serial port 3, /dev/ttyS3. The pulse-per-second square-wave signal (PPS) from the GPS is also connected to the CTS DCD line of that serial port. The PPS A patch has been added to the Linux kernel on the data system so that an interrupt function can be registered to run in response to the CTS DCD interrupts. This interrupt function will be called immediately after the rising edge of the PPS signal has been detected by the serial port hardware.

...

The RMC records contain the current date and time, in addition to latitude, longitude, and other quantities. The transmission time of the RMC message is not tightly controlled within the GPS and appears to be primarily effected by lags associated with internal GPS processing, and is also likely effected by what other NMEA messages are enabled for output on the GPS. The exact receipt time of the RMC message is not used for clock adjustments. NTP simply uses the time fields within the RMC message as a an absolute time label for the previous PPS, whose timing is very precise.

Clock Variables

We monitor the following variables to keep track of the DSM timekeeping, and plot them on the daily web plots:

  • GPSdiff: The time difference, in seconds, between the time-tag that was assigned to a RMC message and the date and time that is contained within the message. The time-tag assigned to a message sample is the value of the system clock at the moment the first byte of the message was received. For example, a value of 0.6 secs sec means that the data system assigned a time-tag to the RMC message that was 0.6 seconds later than the time value contained in within the message. GPSdiff will be effected by processing lags within the GPS, DSM data sampling lags, and the drift of the DSM system clock relative to the clock within the GPS receiverAs discussed above, GPSdiff is not a precise measurement of clock differences and is not used to adjust the system clock. It gives a crude value of the agreement of the clocks and possible effects of I/O latency and buffering in the data system. When 5 minute statistics are computed, the maximum and minimum values of GPSdiff for each 5 minute period are written to the output NetCDF files as GPSdiff_max and GPSdiff_min, and then plotted on the daily web plots.
  • GPSnsat: number of satellites being tracked by the receiver, that is, the number of satellites whose signals are used in the its time and location solution. GPSnsat in the NetCDF files and plots is a 5 minute mean.

NTP on the DSM is configured to log information about time-keeping its status in a "loopstats" file. See http://www.eecis.udel.edu/~mills/ntp/html/monopt.htmlImage Removed for information on the NTP monitoring options. The loopstats file includes these variables, which have been merged into the Manitou data archive:

  • NTPClockOffset: the estimated offset of the GPS time from the data system time. A positive value indicates that NTP has determined that the GPS clock is ahead of the system clock, i.e. the GPS is showing a later time than the system clock. The maximum, minimum and mean values of NTPClockOffset in each 5 minute period are computed and written to the NetCDF files and plotted as NTPClockOffset_max, NTPClockOffset_min and NTPClockOffset.
  • NTPFreqOffset: the correction applied to the system clock frequency in parts-per-million, a . A positive value indicates that NTP has determined that the system clock oscillator is slow and the frequency offset is NTPFreqOffset PPM values are being added periodically to the system clock valuescounter. The NetCDF files and plots contain 5 minute means of NTPFreqOffset.

The NTP values logs have not been recorded consistently since the beginning of the project. Data in Year 2010 data from May 3 to August 12th and Oct 14th to November 9th are available, as well as all data from April 9, 2011 onward.

Replacement of Garmin GPS

On April 12, 2011 the old Garmin GPS 25-HVS at the tower was replaced with a much newer Garmin 18x-LVC model. The model numbers are shown in the $PGRMT messages in the archive, where the time is UTC:

Code Block
data_dump -i 1,30 -A manitou_20110412_120000.bz2 | fgrep GPSPGRMT
...
2011 04 12 16:41:39.6568    0.15      49 $PGRMT,GPS 25-HVS VER 2.50 ,P,P,R,R,P,,23,R*08\r\n
2011 04 12 16:42:50.4248  0.1249      51 $PGRMT,GPS 18x-LVC software ver. 3.10,,,,,,,,*6D\r\

...

The following plot is for the old 25-HVS model for 3 days prior to the swap:

The NTPClockOffset _max ranges from approximately -1000 shows spikes between -100000 to 50000 microseconds during that periodthis period, which is much worse than expected for a GPS/NTP reference clock. The upward spikes in NTPClockOffset _max are simultaneous with positive jumps in GPSdiff_max, up to as much as 2.5 seconds. These jumps in GPSdiff_max also events seem to happen when the number of received tracked satellites changes, indicating which indicates that internal processing lags in the 25-HVS cause it to report late. It is unknown if , causing large values of GPSdiff. The extent of this effect on the timing of the PPS signal is effected by these events. unknown.

The following plot shows a close up of one of the clock offset spikes using un-averaged data:
Image Added

The sudden downward jump in NTPClockOffset causes NTP to think that the GPS clock is earlier than the system clock. NTP At these moments, NTP estimates that the system clock is early relative to the GPS, and starts to correct for the error offset by speeding up slowing down the system clock, as seen as the positive spike in NTPClockOffset and in the negative values for NTPFreqOffset. When the GPS recovers from its delayed reporting, NTP then NTP sees that sees positive values for NTPClockOffset and adjusts the system clock has gotten ahead of the GPS, reports a negative NTPClockOffset and gradually returns to the previous frequency offset.

After installing 18x-LVC, the NTPClockOffset is in a much smaller improved range, from -10 70 to 25 microseconds:
Image Removed
. NTPFreqOffset is also in a much tighter range, indicating that NTP is applying smaller corrections to the system clock. GPSdiff is also much better behaved, ranging from a minimum of 0.5 to 1.1 seconds. The number of satellites tracked by the new GPS is also generally higher.
Image Added

Temperature Effects

The frequency offset shows a temperature dependence in the system clock oscillator. We do not measure the temperature inside the data system enclosure, which is at the base of the tower. The nearest temperature measurement is of the ambient air at 2 meters up the tower. The top panel in the plot below shows a time series of the air temperature, along with NTPFreqOffset, for a cool 3 day period in April, after the installation of the new GPS. It appears that when the air temperature is below 5 deg C, the system clock oscillator does not show an obvious temperature relation.

The bottom panel shows a close relationship between the NTPClockOffset and the time derivative of NTPFreqOffset, which, I believe, indicates how NTP adjusts the system clock based on the measured offset. It also enforces the obvious conclusion that we could improve the time-keeping by insulating the CPU from temperature changes.

Image Added

On a warmer 3 day period in July, where the temperatures were all above 5 degC, the temperature effect on the system clock oscillator is very evident.
Image Added

Time Offsets During File Transfers

The periodic spikes in GPSdiff_max up to 1 second that occur at 23:00 local time and last about an hour, are simultaneous with the network transfer of the day's data files from the DSM to the RAL server. These indicate increased sampling suggest that increased sample buffering and latency is happening at these times, which needs to be investigated and improved.

At these times there is also a little bump in NTPFreqOffset. It is likely that this occurs due to increased interrupt load at these times, increasing the latency of interrupt function that is called in response to the PPS interrupts. Increased latency in response to PPS interrupts should cause NTP to think that the system clock is ahead of the GPS clock, but the NTPClockOffset at these times is positive, and the slope of NTPFreqOffset is positive, indicating that NTP thinks the GPS clock is ahead. The bump shows up at cold ambient temperatures, so I don't think it is due to increased heating of the CPU card under the increased load caused by the network transfers.

As expected, the frequency offset shows a temperature dependence in the system clock oscillator. We do not have a measurement of the temperature inside the data system. The top panel in the plot below shows a time series of the ambient air temperature at 2 meters on the tower, along with the NTPFreqOffset, for a cool 3 day period in April. When the ambient air temperatures is below 5 deg C, the system clock oscillator does not show an obvious temperature relation.

The bottom panel shows a close relationship between the NTPClockOffset and the time derivative of NTPFreqOffset, indicating how NTP adjusts the clock.

Image Removed

A close-up of the file transfer on April 14, 23:00, plotted below, shows several events where NTPClockOffset first has a negative spike, indicating that NTP has determined that the GPS clock is behind the system clock and starts to slow down the system clock. These down spikes appear to be due to a delay in the response to a PPS interrupt. The interrupt latency appears to be short lived, because the NTPClockOffset becomes positive, and the system clock is re-adjusted. The April 14 transfer is shown in this plot:

Image Added

In July, the clock behaviour during the file transfer is similar, but the initial increase in NTPClockOffset and a rising slope in NTPFreqOffset might be due to increased heating of the system clock oscillator, due to increased CPU load during the file transfers. Wild conjecture? After a quick scan of the web plots of 5 minute averages, I think these positive bumps in NTPFreqOffset seem to occur during file transfers when the outside air temperatures are above 0 C, and don't occur in colder conditions.
Image Added

ppstest and ntpq

On the DSM, the ppstest program is helpful for gaining an understanding of the system and GPS clocks. It displays the system clock value when the interrupt function is called at the time of the assertion and the clear of the PPS signal. Do ctrl-C to terminate ppstest.

Code Block

root@manitou root# ppstest /dev/ttyS3
trying PPS source "/dev/ttyS3"
found PPS source #3 "serial3" on "/dev/ttyS3"
ok, found 1 source(s), now start fetching data...
source 0 - assert 1315494544.999995675, sequence: 37249847 - clear  1315494544.099998000, sequence: 37249862
source 0 - assert 1315494544.999995675, sequence: 37249847 - clear  1315494545.099995000, sequence: 37249863
source 0 - assert 1315494545.999994675, sequence: 37249848 - clear  1315494545.099995000, sequence: 37249863
source 0 - assert 1315494545.999994675, sequence: 37249848 - clear  1315494546.099993000, sequence: 37249864
source 0 - assert 1315494546.999994675, sequence: 37249849 - clear  1315494546.099993000, sequence: 37249864
ctrl-C

The above sequence shows that the system clock is behind the GPS. The system time when the interrupt function is being called on the PPS assert is 5 microseconds before the exact second (0.999995). This corresponds to a NTPClockOffset of a positive 5 microseconds. This is confirmed with the ntpq program (which reports its offset in milliseconds):

Code Block

root@manitou root# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
xral             38.229.71.1      3 u   34   64  377    0.320    3.804   0.031
 LOCAL(0)        .LOCL.          10 l  93d   64    0    0.000    0.000   0.000
oGPS_NMEA(0)     .GPS.            2 l    6   16  377    0.000    0.005   0.031

The ntpq output indicates (with the leading 'o') that NTP is using the GPS as the system's reference clock. It also displays the offset of the RAL server's clock of 3.804 milliseconds, and indicates with an 'x' that it is not using that clock as a reference. The RAL server uses NTP over a WIFI connection to adjust its clock, so it is not as accurate as the DSM.

The loopstats file also shows the 5 usec offset at this time:

Code Block

55812 54504.454 0.000005000 39.301 0.000030518 0.001408 4
55812 54520.455 0.000006000 39.302 0.000030518 0.001415 4
55812 54536.454 0.000005000 39.303 0.000030518 0.001392 4
55812 54552.454 0.000005000 39.305 0.000030518 0.001372 4

I do not believe I've seen a jitter value less than 31 microseconds. Not sure why that is. I believe the jitter is the standard deviation of the offset, but the NTP documentation is rather unclear to me.On a warmer 3 day period in July, where the temperatures were all above 5 degC, the temperature effect on the system clock oscillator is very evident.
Image Removed