There is a need for metadata to help perform data discovery. There are two approaches I've considered to date:
- Per-variable metadata that marks a variable as "the one" you are looking for. This requires scanning the list of variables first.
- Global metadata that points you to the variables.
This standard should be independent of the DataSource type (e.g. ICARTT, netCDF, HDF5); it should work with any file format that supports global attributes or has a reasonably sophisticated header.
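The two approaches can be compared with a minimal sketch. It assumes the file's contents have already been read into plain dictionaries by some format-specific reader, so nothing here depends on netCDF, HDF5, or ICARTT. The `reference_role` and `source` attribute names and both function names are hypothetical and exist only for illustration.

```python
# Hypothetical stand-in for one file: variable name -> its attributes.
# "reference_role" and "source" are invented names, not an existing convention.
variables = {
    "GGLAT": {"source": "GPS"},
    "LAT": {"source": "IRU"},
    "LATC": {"source": "Blended", "reference_role": "latitude"},
    "CLAT": {"source": "CMIGITS III"},
}

# Global attributes of the same file, as a plain dict.
global_attrs = {"coordinates": "LONC LATC GALT Time"}


def find_by_variable_marker(variables, role):
    """Approach 1: scan every variable and keep those marked with the role."""
    return [name for name, attrs in variables.items()
            if attrs.get("reference_role") == role]


def find_by_global_pointer(global_attrs, key):
    """Approach 2: read one global attribute that names the variables."""
    return global_attrs.get(key, "").split()


print(find_by_variable_marker(variables, "latitude"))
print(find_by_global_pointer(global_attrs, "coordinates"))
```

The first function has to touch every variable in the file, while the second reads a single attribute. That difference is the main trade-off between the two approaches.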
There are several reasons for the need for data discovery:
- Variable names can be cryptic.
- There can be multiple measurements of the same type.
- Automation: software that needs to find its way into a file.
For example, a file from an NCAR aircraft offers the following latitudes to choose from:
Var Name | Source |
GGLAT | GPS |
LAT | IRU |
LATC | Blended |
CLAT | CMIGITS III |
and the following redundant ambient temperature measurements:
- ATHR1, ATHR2, ATFR
To that end, we have defined two global metadata attributes for our netCDF files: one for the aircraft position (coordinate) variables and a second to identify the wind field variables.
:coordinates = "LONC LATC GALT Time" ;
:wind_field = "WSC WDC WIC" ;
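As a rough sketch of how software might use these two attributes, the snippet below splits the space-separated value and checks that every listed name is actually present in the file. The `resolve` helper and the `available` set are assumptions for illustration; in practice the attribute dictionary and variable names would come from whatever netCDF reader is in use.

```python
def resolve(global_attrs, key, available):
    """Return the variables named by a space-separated global attribute.

    Raises KeyError if the attribute points at a variable the file lacks.
    """
    names = global_attrs.get(key, "").split()
    missing = [name for name in names if name not in available]
    if missing:
        raise KeyError(f"{key} lists variables not in the file: {missing}")
    return names


# Stand-in for the global attributes shown above.
global_attrs = {
    "coordinates": "LONC LATC GALT Time",
    "wind_field": "WSC WDC WIC",
}

# Hypothetical set of variable names found in the file.
available = {"LONC", "LATC", "GALT", "Time", "WSC", "WDC", "WIC", "GGLAT", "LAT"}

print(resolve(global_attrs, "coordinates", available))
print(resolve(global_attrs, "wind_field", available))
```

Checking the names up front lets automated software fail early when an attribute is stale, instead of failing later when it tries to read a variable that does not exist.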
This probably needs some work. For example, I should move from space to comma separation, and possibly add a prefix (namespace):
:reference:coordinates = "LONC,LATC,GGALT,Time" ;
:reference:navigation = "PITCH,ROLL,THDG,VEW,VNS,TAS,IAS" ;
:reference:wind_field = "WSC,WDC,WIC" ; // Or should this be the vector UIC & VIC
:reference:thermodynamic = "PSXC,ATX,DPXC" ;
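One possible way to consume the prefixed, comma-separated form is sketched below. It assumes the global attributes are available as a plain dictionary keyed by the full attribute name, prefix included; the `reference_groups` function and the `title` entry are hypothetical.

```python
PREFIX = "reference:"


def reference_groups(global_attrs, prefix=PREFIX):
    """Collect every prefixed attribute into {group name: [variable names]}."""
    groups = {}
    for name, value in global_attrs.items():
        if name.startswith(prefix) and isinstance(value, str):
            group = name[len(prefix):]
            # Split on commas and tolerate stray whitespace around names.
            groups[group] = [v.strip() for v in value.split(",") if v.strip()]
    return groups


# Stand-in for the proposed attributes, plus one unrelated attribute.
global_attrs = {
    "title": "example flight",  # hypothetical, ignored because it has no prefix
    "reference:coordinates": "LONC,LATC,GGALT,Time",
    "reference:navigation": "PITCH,ROLL,THDG,VEW,VNS,TAS,IAS",
    "reference:wind_field": "WSC,WDC,WIC",
    "reference:thermodynamic": "PSXC,ATX,DPXC",
}

for group, names in reference_groups(global_attrs).items():
    print(group, names)
```

With a shared prefix, software can discover every group without knowing the group names in advance, which is harder with the unprefixed `coordinates` and `wind_field` attributes.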
John Caron and Ethan Davis of Unidata made a couple of passes at conventions for observational data, including data discovery:
Unidata Observation Conventions (Draft)