4.7 CCSM data and metadata requirements
Standard data and metadata formats are essential for the automated analysis
necessary to efficiently interact with large data collections.
In its broadest sense, metadata are simply 'structured data about data'
describing important attributes of an information resource. Scientific
metadata examples include descriptions of telescope images or the header
files describing gridded CCSM model output. Metadata can be conceptually
classed into two general types: discovery and use. Discovery metadata
address the information necessary to find a data collection and determine
its availability and appropriateness for the intended application. Use
metadata provide the technical information necessary to actually use the
data in the collection. Of the two types, use metadata are more mature
because the creators and consumers of geodata have converged over the last
decade on a modest number of data storage formats that contain reasonably
well-defined data descriptions. Discovery metadata have only recently
become an issue as operational and science centers have begun to move from
static, in-house data archives to dynamic online data services.
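The distinction between the two metadata types can be made concrete with a
small sketch. The class names, field names, and sample values below are
hypothetical and chosen only for illustration; they are not taken from any
CCSM tool or schema.

```python
from dataclasses import dataclass


@dataclass
class DiscoveryMetadata:
    """Answers: does this collection exist, and is it what I need?"""
    title: str
    institution: str
    summary: str
    access_url: str  # hypothetical field: where the collection can be obtained


@dataclass
class UseMetadata:
    """Answers: how do I read and interpret the values?"""
    file_format: str
    dimensions: dict  # dimension name -> length
    units: dict       # variable name -> unit string


def matches(record: DiscoveryMetadata, keyword: str) -> bool:
    """Case-insensitive keyword search over the discovery fields only."""
    text = f"{record.title} {record.summary}".lower()
    return keyword.lower() in text


# Made-up sample records for a single collection.
discovery = DiscoveryMetadata(
    title="Example control run",
    institution="Example Institute",
    summary="Monthly mean surface temperature from a made-up experiment",
    access_url="https://example.org/data/example-control-run",
)
use = UseMetadata(
    file_format="netCDF",
    dimensions={"time": 12, "lat": 64, "lon": 128},
    units={"TS": "K"},
)
```

A search service would need only the discovery record to locate the
collection; an analysis package would need only the use record to read it.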
1. NetCDF and the CF convention
CCSM selected netCDF as the standard data format for CCSM-related
datasets. All CCSM models either create netCDF history files or provide a
filter to convert files into netCDF. The use of netCDF makes CCSM output
data readily accessible to a variety of existing graphics and analysis
packages. In addition, CCSM3.0 uses the CF1.0 netCDF metadata convention,
which is designed for the representation of gridded geophysical data. CF1.0
is based on, and very similar to, the COARDS Conventions.
While the cost of switching formats is high, this decision should be
periodically re-evaluated in light of changing CCSM needs, data storage
costs and the emergence of new data formats. For example, NetCDF lacks both
a multithreaded output capability and a good compression method. What are
the criteria for deciding when it is worth switching to a new format?
CCSM3 NetCDF datasets comply with the Climate and Forecast (CF) metadata
conventions. The CCSM NetCDF convention follows the COARDS conventions,
with a few exceptions and additions to meet CCSM requirements. Translations
of CF metadata into other metadata conventions, such as the Dublin Core,
ISO and FGDC metadata standards, will be pursued through the Community Data
Portal and Earth System Grid collaborations.
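As a rough illustration of what such a translation involves, the sketch
below maps a few CF global attributes onto Dublin Core elements. The
pairings in the crosswalk table are assumptions made for this example, not
an official or agreed mapping, and the attribute values are made up. A real
translation would be defined by the collaborations named above.

```python
# Hypothetical crosswalk from CF global attribute names to Dublin Core
# element names. These pairings are illustrative assumptions only.
CF_TO_DUBLIN_CORE = {
    "title": "title",
    "institution": "publisher",
    "comment": "description",
    "references": "relation",
}


def cf_to_dublin_core(cf_attrs):
    """Translate CF global attributes into a Dublin Core style record.

    Attributes with no entry in the crosswalk are returned separately so
    that nothing is dropped silently.
    """
    record = {"format": "netCDF"}
    unmapped = {}
    for name, value in cf_attrs.items():
        if name in CF_TO_DUBLIN_CORE:
            record[CF_TO_DUBLIN_CORE[name]] = value
        else:
            unmapped[name] = value
    return record, unmapped


# Made-up global attributes, as they might appear in a file header.
example_attrs = {
    "Conventions": "CF-1.0",
    "title": "Example coupled run",
    "institution": "Example Institute",
    "history": "created by example script",
}
dc_record, leftovers = cf_to_dublin_core(example_attrs)
```

Returning the unmapped attributes alongside the record makes gaps in the
crosswalk visible, which matters when the target standard has no obvious
counterpart for a CF attribute such as history.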
2. Case and File naming conventions
The CDMG also maintains case and file naming conventions to help keep track
of the numerous simulations and their output data.
The CCSM case naming conventions are outlined on the web page:
http://www.cgd.ucar.edu/csm/experiments/csm1/names.html
The CCSM file naming conventions are outlined on the web page:
http://www.cgd.ucar.edu/~njn01/ccsm/draft.html
Recommendations
The CDMG should include an automated system to ensure compliance with the
CF metadata standard as part of its quality control process.
The CDMG should propose adoption of the (evolving) CCSM3.0 file naming
conventions, as outlined at the file naming URL above, for postprocessed
datasets.
Possible Supporting Policy:
CCSM output history data will be in NetCDF format and fully compliant with the CF metadata convention.
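An automated compliance check of the kind recommended above might begin
with something like the following sketch. It assumes the file header has
already been read into a plain dictionary (a hypothetical representation;
the function name and dictionary layout are invented for this example), and
it tests only a deliberately small subset of CF expectations: the
Conventions global attribute, a units attribute on each variable, and the
presence of either long_name or standard_name. Treating units as mandatory
for every variable is a simplification, since CF does not require units for
dimensionless quantities.

```python
def check_cf_subset(header, expected_conventions="CF-1.0"):
    """Return a list of human-readable problems found in the header.

    An empty list means the header passed this small subset of checks;
    it does not by itself establish full CF compliance.
    """
    problems = []

    conventions = header.get("global_attrs", {}).get("Conventions")
    if conventions != expected_conventions:
        problems.append(
            f"global attribute Conventions is {conventions!r}, "
            f"expected {expected_conventions!r}"
        )

    for name, attrs in header.get("variables", {}).items():
        if "units" not in attrs:
            problems.append(f"variable {name}: missing units")
        if "long_name" not in attrs and "standard_name" not in attrs:
            problems.append(
                f"variable {name}: needs long_name or standard_name"
            )

    return problems


# Made-up headers: one that passes the subset, one that fails every check.
good_header = {
    "global_attrs": {"Conventions": "CF-1.0", "title": "Example run"},
    "variables": {
        "TS": {"units": "K", "long_name": "surface temperature"},
    },
}
bad_header = {
    "global_attrs": {"title": "Example run"},
    "variables": {"TS": {}},
}

good_report = check_cf_subset(good_header)
bad_report = check_cf_subset(bad_header)
```

Collecting all problems in a list, rather than stopping at the first one,
lets a quality control step report every issue in a file in a single pass.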