Appendix B. CCSM Datasets
During the course of an integration the CCSM produces three distinct
output data streams: printed, restart and history data. After a CCSM
run finishes, the raw history data is postprocessed into more useful
collections referred to as postprocessed history data.
|Description |Volume |Data Format/Convention |
|a. Input Initial/Boundary data |(small) |NetCDF or raw binary |
|b. Output Printed output |(small) |Plain text files |
|c. Output Restart data |(small) |Raw binary |
|d. Output Raw history data |(large) |NetCDF compliant with CF convention |
|e. Postprocessed History data |(large) |NetCDF/CF, JPG images, HTML pages |
Types of CCSM output data
a. Input Initial and Boundary Condition Data
New CCSM runs are typically started using initial data sets that represent a
known or idealized climate state for each CCSM component. Boundary
condition files may also be used to prescribe time varying states of
nonprognostic variables, such as the annual cycle of ozone in the atmosphere
or emission profiles for future climate change scenarios.
b. Printed Output
The printed output contains diagnostic messages written by the various CCSM
components during the course of a run. This includes a printed log file for
the entire system as well as printed log files from each of the CCSM
components. The primary value of the printed output is as an archive of
details about the model run, such as how long it ran and when it stopped and
restarted.
While the printed output contains little information useful for detailed
model diagnostics, it provides a convenient method for displaying "quick
look" diagnostics.
c. Output Restart Data
The CCSM restart data sets are raw binary files containing sufficient
information for the CCSM to restart exactly. Restart data are usually written
at monthly, half-yearly, or yearly intervals. As the integration progresses,
most old restart data are deleted to save disk space. The usual practice is
to retain restart data at decadal intervals.
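The retention practice described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the CCSM scripts: the function names are hypothetical, restart datasets are identified only by model year, and keeping the most recent restart (so that the run can continue) is an assumption added here.

```python
def restarts_to_keep(years, keep_interval=10):
    """Return the restart years to retain, in ascending order.

    Keeps every year divisible by keep_interval (decadal by default)
    plus the most recent year. Keeping the latest year is an assumption
    of this sketch so that the integration can still be continued.
    """
    if not years:
        return []
    latest = max(years)
    return sorted(y for y in set(years)
                  if y % keep_interval == 0 or y == latest)


def restarts_to_delete(years, keep_interval=10):
    """Return the restart years that may be deleted to save disk space."""
    keep = set(restarts_to_keep(years, keep_interval))
    return sorted(y for y in set(years) if y not in keep)


# Yearly restarts for model years 1 through 25 (illustrative values).
available = list(range(1, 26))
print(restarts_to_keep(available))         # [10, 20, 25]
print(len(restarts_to_delete(available)))  # 22
```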
d. Output Raw History Data
The raw history data contain the model data from each component of the CCSM.
The history data consist of gridpoint representations of the
three-dimensional (latitude, longitude, time) and four-dimensional (latitude,
longitude, height/depth, time) model fields. These fields include such
variables as surface temperature, precipitation, and ocean salinity. Output
frequencies can range from minutes to months or years and the data can
represent either instantaneous values or averages over the output period.
In total, several hundred fields are output by the CCSM components.
e. Postprocessed History Data
Postprocessed history data is the most useful data product from the CCSM.
The CCSM should be viewed as a collection of distinct models optimized for
very high speed multi-processor computing. As a result, the raw output data
streams from each component do not present the data in the most
coordinated or user-friendly manner. While raw history data can be
analyzed, the way the raw data are packaged does not allow for easy
time-series analysis. For example, the atmosphere dumps all the requested
variables into one large file at each requested output period. While this
allows for very fast model execution, it makes it impossible to analyze a
time series of an individual variable without accessing the entire data
volume.
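The repackaging problem can be illustrated with a short Python sketch. Plain dictionaries stand in for the per-period history files, and the variable names (`TS`, `PRECT`), time labels, and values are invented for illustration; a real workflow would read NetCDF files with a suitable library rather than in-memory dictionaries.

```python
from collections import defaultdict


def to_time_series(period_files):
    """Regroup per-period records into one time series per variable.

    period_files is a list of (time_label, {variable_name: value}) pairs
    in time order, mimicking one raw history file per output period.
    Returns {variable_name: [(time_label, value), ...]}.
    """
    series = defaultdict(list)
    for time_label, fields in period_files:
        for name, value in fields.items():
            series[name].append((time_label, value))
    return dict(series)


# Three hypothetical monthly files, each holding every requested variable.
raw = [
    ("0001-01", {"TS": 287.1, "PRECT": 2.9}),
    ("0001-02", {"TS": 287.4, "PRECT": 3.1}),
    ("0001-03", {"TS": 288.0, "PRECT": 3.0}),
]

series = to_time_series(raw)
print(series["TS"])
# [('0001-01', 287.1), ('0001-02', 287.4), ('0001-03', 288.0)]
```

Note that building even one variable's series requires opening every period file, which is the cost the text describes; after repackaging, each variable can be read on its own.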
The process of transforming the raw CCSM history output into data
collections more useful for analysis is called postprocessing. This step may
involve reformatting the data, deriving new fields from the existing data,
averaging along any or all of the data dimensions, or sampling the data in
different ways. Postprocessed history datasets are the actual CCSM
"product". These postprocessed datasets should be made highly visible and
easily accessible to the worldwide scientific community.
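As one example of a postprocessing operation, averaging along a data dimension might look like the following Python sketch. It assumes a field stored as nested lists indexed `[time][latitude][longitude]`, uses made-up values, and omits the area weighting that a latitude or global average would need; the function names are hypothetical.

```python
def time_mean(field):
    """Average field[t][j][i] over time, returning a [j][i] grid."""
    nt = len(field)
    nlat = len(field[0])
    nlon = len(field[0][0])
    return [[sum(field[t][j][i] for t in range(nt)) / nt
             for i in range(nlon)]
            for j in range(nlat)]


def zonal_mean(field):
    """Average field[t][j][i] over longitude, returning a [t][j] grid."""
    return [[sum(row) / len(row) for row in step] for step in field]


# Two time samples on a 2 x 2 (latitude x longitude) grid.
field = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[3.0, 5.0], [7.0, 9.0]],
]

print(time_mean(field))   # [[2.0, 4.0], [6.0, 8.0]]
print(zonal_mean(field))  # [[2.0, 6.0], [4.0, 8.0]]
```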
The CCSM data most used by the greater community will be the postprocessed
data collections. Therefore, effort should be made to coordinate the
preparation and presentation of these data across the different components.
The overall strategy in this area is to put sufficient information in the
raw data files to permit automatic generation of all CCSM
postprocessed products.
f. CCSM Data Output Volume
For the generic CCSM2.0 release, the total data volume is about 7.5 gigabytes
per simulated year, of which roughly 40% is restart files and 60% is history
files. Restart files are used only to restart or initialize subsequent
runs. History files are used for analysis. Broken out by component, the
sizes of the history and restart output data are:
{table:border="0"|width="60%"}<caption>CCSM Data Volumes, T42L18 atm, gx3 ocn, gx3 ice, T42 lnd, cpl</caption>
{tr}{th:align="left"}Component{th}{th:colspan=1|align="left"}History{th}{th:colspan=1|align="left"}Restart{th}{th:colspan=1|align="left"}Total{th}{tr}
{tr}{td}ocn{td}{td}2182MB{td}{td}1999MB{td}{td}4181MB{td}{tr}
{tr}{td}cpl{td}{td}1241MB{td}{td} 233MB{td}{td}1474MB{td}{tr}
{tr}{td}atm{td}{td} 588MB{td}{td} 250MB{td}{td} 838MB{td}{tr}
{tr}{td}ice{td}{td} 413MB{td}{td} 220MB{td}{td} 633MB{td}{tr}
{tr}{td}lnd{td}{td} 123MB{td}{td} 277MB{td}{td} 400MB{td}{tr}
{tr}{td}*total*{td}{td}*4547MB*{td}{td}*2979MB*{td}{td}*7526MB*{td}{tr}
{table}

{table:border="0"|width="60%"}<caption>CCSM Data Volumes, T85L26 atm, gx3 ocn, gx3 ice, T85 lnd, cpl</caption>
{tr}{th:align="left"}Component{th}{th:colspan=1|align="left"}History{th}{th:colspan=1|align="left"}Restart{th}{th:colspan=1|align="left"}Total{th}{tr}
{tr}{td}ocn{td}{td}2448MB{td}{td}1201MB{td}{td}3649MB{td}{tr}
{tr}{td}cpl{td}{td} 714MB{td}{td} 142MB{td}{td} 856MB{td}{tr}
{tr}{td}atm{td}{td}1543MB{td}{td} 490MB{td}{td}2033MB{td}{tr}
{tr}{td}ice{td}{td} 448MB{td}{td} 120MB{td}{td} 568MB{td}{tr}
{tr}{td}lnd{td}{td} 281MB{td}{td} 48MB{td}{td} 329MB{td}{tr}
{tr}{td}*total*{td}{td}*5436MB*{td}{td}*2001MB*{td}{td}*7437MB*{td}{tr}
{table}
Changing model resolution, output frequency or output fields will result in
changes in the output data volume.
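The arithmetic behind these figures can be reproduced with a short Python sketch. The per-component values are copied from the T42 table above; the function names are hypothetical, and the multi-year projection assumes every restart file is kept, so it is an upper bound for runs in which old restart data are deleted.

```python
# (history MB, restart MB) per simulated year, from the T42 table.
VOLUMES_MB = {
    "ocn": (2182, 1999),
    "cpl": (1241, 233),
    "atm": (588, 250),
    "ice": (413, 220),
    "lnd": (123, 277),
}


def summarize(volumes):
    """Return history, restart, and total volume plus their shares."""
    history = sum(h for h, _ in volumes.values())
    restart = sum(r for _, r in volumes.values())
    total = history + restart
    return {
        "history_mb": history,
        "restart_mb": restart,
        "total_mb": total,
        "history_share": history / total,
        "restart_share": restart / total,
    }


def projected_total_mb(volumes, simulated_years):
    """Estimate total output for a run, assuming nothing is deleted."""
    return summarize(volumes)["total_mb"] * simulated_years


summary = summarize(VOLUMES_MB)
print(summary["total_mb"])                    # 7526
print(round(summary["history_share"], 2),
      round(summary["restart_share"], 2))     # 0.6 0.4
print(projected_total_mb(VOLUMES_MB, 100))    # 752600
```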