OVERVIEW
This example analysis was performed for a group of cam-fv assimilations
using real obs to determine whether ensemble size has an important
interaction with the BNRH algorithm.
The procedure for generating the text file summaries and table below
could (should) be streamlined for future cases.
The best solution would be to write the relevant data to dedicated files,
formatted to be read by software and plotted.
I developed several csh scripts
to manipulate the contents of the matlab_nc.out files that come from
diagnostics/matlab/plot_rmse_xxx_evolution.m.
The contents include custom output to print the "grand" statistics
(statistics averaged over the observations in the whole time span,
and averaged over the atmospheric depth if the statistics come from profiles).
There is not yet similar output from the profile Matlab script.
Both of those could (or should) be ported to main from my reanalysis branch,
perhaps as part of adding other good features from that branch:
normalized Q and GPS profiles, better vertical axes for those profiles, ...?
We're not adding csh scripts to DART, so below are descriptions of the scripts,
which can be used to develop equivalents in a preferred language.
The csh scripts themselves are included only as examples (merg*.csh, compare_grand.csh).
[from Kevin's Mac:~/DAI/QCEFF/]
GOAL
The goal here is to compare obs space diagnostics from 2 assimilations,
which used the same obs_seq.out file(s) as input.
The three ways to do that, which are covered here, are::
1) default/standard obs_diag
Different numbers of obs will be used to calculate the statistics for each assimilation,
determined by the QCs in each case.
2) obs_diag with trusted obs specified
Both assimilations will be evaluated against the combined assimilated and outlier obs (QC 0 and 7), which will be the same set for both.
We expect RMSEs from this to be larger than in 1).
3) obs_diag run on obs_seq.finals which have been passed through obs_common_subset
This restricts the obs to those which were assimilated (and optionally evaluated) in both cases.
We expect RMSEs to be smaller than in 1), because the common subset tends to exclude obs which were close to being outliers.
SETUP NOTES
For each case:
a) Gather the obs_seq.final files.
If a "common obs" evaluation is needed and the ensemble sizes of the cases differ,
the obs_seq.finals must be run through obs_sequence_tool to remove the members.
The matlab scripts use only the ensemble means, so the members are not needed.
Trying to keep as many members as the smaller ensemble has opens the question
of which members from the large ensemble should be excluded.
input.nml:obs_sequence_tool:
filename_seq = 'obs_seq.final.all_copies'
filename_out = 'obs_seq.no_ens'
edit_copies = .true.
new_copy_index = 1,2,3,-1
b) edit input.nml:obs_diag_nml to
+ find those files,
+ define the time span, regions, etc.
+ "trusted_obs" must be empty (except for 2), where the trusted obs types are listed).
+ For 3) turn off create_rank_histogram, since the members are not available for generating histograms.
c) Run obs_diag.
If you will use the profile "grand" statistics, make sure that it calculates
the averages over the number of obs, not the number of levels.
d) Run plot_rmse_xxx_evolution.m on the matlab_output.nc files from the 2 cases.
Capture the printed output in a file (matlab_nc.out below).
You can compare the pictures generated by the Matlab scripts, as usual.
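The obs-versus-levels weighting in step c) is easy to get backwards. A minimal sketch with made-up numbers (it illustrates only the weighting choice, not DART's exact combination formula) shows how much the two averages can differ when one level has few obs:

```python
# Two ways to average per-level statistics into a "grand" value.
# Values are invented for illustration.
rmse = [1.2, 1.5, 3.0]     # per-level RMSE
nobs = [5000, 4000, 100]   # obs used at each level

# Averaging over the number of levels lets a sparsely observed level
# (here the one with only 100 obs) pull the result strongly.
by_level = sum(rmse) / len(rmse)

# Averaging over the number of obs weights each level by how many
# observations actually contributed to its statistic.
by_obs = sum(r * n for r, n in zip(rmse, nobs)) / sum(nobs)

print(f"level-weighted: {by_level:.3f}   obs-weighted: {by_obs:.3f}")
```

With these numbers the level-weighted average is 1.9 while the obs-weighted average is about 1.35, so the choice is not cosmetic.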
For a numerical summary of the comparison, diagnostics printed by the matlab scripts are useful.
The printed output to be harvested from plot_rmse_xxx_evolution.m looks like::
region 2 ACARS_TEMPERATURE level 4 nobs_poss 5010 prior 4995 poste 0
level_grand: rmse grand pr = 1.4934; bias grand pr = -0.68356
In comparison 1) the numbers will differ, and the levels with non-zero nobs_poss
may differ between cases, which makes comparison trickier.
> Write a script to extract all of these lines from each matlab_nc.out file.
File comparison tools like diffuse can be used to compare the files of extracted lines.
> Optionally make the script merge the extracted lines from 2 files into a single file
so that they are grouped in a way that the cases can easily be compared.
Label the merged lines with appropriate case names.
> You might want to exclude the 'nobs_poss 0' obs, but that can make comparison
and grouping trickier.
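One possible shape for such an extract-and-merge script (Python here, since the csh originals aren't distributed; the file names, case labels, and the assumption that each "region ..." header line is immediately followed by its grand-statistics line are all illustrative):

```python
# Sketch of the extract-and-merge step described above.
import re
import sys

def extract_pairs(fname):
    """Return (header, grand_stats) pairs from one matlab_nc.out file."""
    pairs = []
    header = None
    with open(fname) as f:
        for line in f:
            line = line.rstrip("\n")
            if line.lstrip().startswith("region "):
                header = line
            elif "grand" in line and header is not None:
                # keep only the statistics after the "level_grand:"-style tag
                stats = re.sub(r"^\s*\S*grand\S*:\s*", "", line)
                pairs.append((header, stats))
                header = None
    return pairs

def merge(file_a, label_a, file_b, label_b, out=sys.stdout):
    """Group the grand statistics from two cases under shared headers."""
    # Match on (region, <n>, <obs type>, level, <n>) and ignore the obs
    # counts, which can differ between the cases (comparison 1 especially).
    key = lambda header: tuple(header.split()[:5])
    stats_b = {key(h): s for h, s in extract_pairs(file_b)}
    for header, stats_a in extract_pairs(file_a):
        print(header, file=out)
        print(f"{label_a}: {stats_a}", file=out)
        if key(header) in stats_b:
            print(f"{label_b}: {stats_b[key(header)]}", file=out)

if __name__ == "__main__" and len(sys.argv) == 5:
    # e.g. merge 80mem/matlab_nc.out "80; trusted" 40mem/matlab_nc.out "40; trusted"
    merge(*sys.argv[1:5])
```

Keying the match on the region/type/level fields rather than the whole header line is what lets the script pair lines whose obs counts differ between the cases.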
Example output text files
The merged file could look like (for cases "80 members" and "40 members")::
region 2 ACARS_TEMPERATURE level 4 nobs_poss 5010 prior 4995 poste 0
80; trusted: rmse grand pr = 1.4934; bias grand pr = -0.68356
40; trusted: rmse grand pr = 1.5207; bias grand pr = -0.75158
... more obs
The tar file `comparison_files_DIME.tgz` has examples of merged files for standard, common, and trusted obs diagnostics.
It also has comparisons between cases with 80 and 40 members. The contents are:
80 member
   Standard
      Fxd_infl_NTrS_2019.12.30-2020.1.1H0_s0/matlab_nc.grand.bias
      Compare to WACCM/Diags_NTrS_2019.12.30-2020.1.1H0_s0/matlab_nc.grand.bias
      There's no file of merged differences.
   Trusted
      Fxd_infl_NTrS_2019.12.30-2020.1.1H0_trusted/bias_rmse_80_v_40
   Common
      Fxd_infl_NTrS_2019.12.30-2020.1.1H0_common/bias_rmse_80_v_40.merged
   Comparison of Trusted and Common diagnostics for both 80 and 40 member cases
      Fxd_infl_NTrS_2019.12.30-2020.1.1H0_trusted/bias_rmse_80_v_40.merg_merged
40 member
   Standard
      WACCM/Diags_NTrS_2019.12.30-2020.1.1H0_s0/matlab_nc.grand.bias
Numbers of obs used in the diagnostics
Nused numbers in the obs space Matlab pictures (and here) may be different from the number assimilated
because trusted obs diagnostics include QC = 7, while standard and common diagnostics do not.
Type   | QC | 80_trusted | 40_trusted |  diff | common                |
-------|----|------------|------------|-------|-----------------------|
GPS    |  0 |     364410 |     361025 | -3385 | 359214                |
       |  7 |      19247 |      22633 |  3386 |                       |
RAD_T  |  0 |      89822 |      89344 |  -478 | 89034                 |
       |  7 |       3134 |       3658 |   524 |                       |
SAT_U  |  0 |     651359 |     650669 |  -700 | 65272 (V is 651380 !) |
       |  7 |       6894 |       7675 |   781 |                       |
AIRS_T |  0 |      59582 |      59541 |   -41 | 59492                 |
       |  7 |        374 |        433 |    59 |                       |
       |  4 |      57584 |      57566 |   -18 |                       |
ACAR_U |  0 |     793251 |     791140 | -2111 | 789212                |
       |  7 |      12025 |      14137 |  2112 |                       |