CCSM/ESMF joint meeting teleconference
Date: Friday January 12th, 1:30PM MST
Invited: Cecelia, Sylvia, Gerhard, Samson, Peggy, Erik, and Mariana
Call 1-800-516-9896 (join with code 888509 #)
Action items from this meeting:
- Erik update requests with ticket numbers – done
- Erik deliver code to ESMF
- Erik characterize the Open-MP problem more carefully
- Erik test if Open-MP problem shows up with CAM only using ESMF time-manager – done
- Mariana/Brian E. prepare survey (give to ESMF folks beforehand)
- Erik do performance tests
- Mariana – develop Stage-2 evaluation plan in 2 weeks
- ESMF-person (Samson?), do Stage-2 work with Erik on consulting basis
Content of meeting:
- Review last meeting
- Stage-1 completed
- ESMF support requests status
- Evaluation tasks to perform
- Stage-2 tasks
- Who does what?
1.) Review last meeting:
- Grid coordinates issues resolved
- Build issues resolved
- Darwin working
2.) Stage-1 completed
Good news:
- Answers are the same to roundoff (would like to see if we can get them bit-for-bit)
- On branch – up to date with cam3_3_38 (cam is now on cam3_3_46)
- Test suite runs on: tempest (IRIX64-SGI), bluesky (AIX-IBM-XLF90), and bangkok (Linux-lahey). This is with the main CAM physics Open-MP loop turned off; the suite does test the different modes (pure-MPI, pure-OMP, hybrid)
https://svn-ccsm-models.cgd.ucar.edu/cam1/branches/esmfStageI_cam3_3_15/
Bad news:
- Main CAM OpenMP loops core dump with a seg-fault on bluevista in pure-OMP mode unless disabled. (This is a critical issue that we want to resolve before releasing the code. We also don't know if this is an ESMF issue or an issue in my user-code superstructure interface layer that uses ESMF.)
- Remove as much CAM-specific code as possible from the mrg_x2a_esmf merge component (will move this to the CAM component)
- Possible memory corruption issue (seen on bluedawn, but not on other platforms)
- Some small issues still being resolved with two test cases (one on tempest and one on bangkok)
- Need to get on phoenix
- Possible performance tuning if performance issues are found
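As a rough illustration of the difference between "same to roundoff" and "bit-for-bit" noted under the good news above, the following minimal sketch compares two lists of double-precision values both ways. The function names and the relative tolerance `rel_tol=1e-12` are assumptions made for illustration only; they are not taken from the CAM test suite.

```python
import math
import struct


def bit_for_bit(a, b):
    """True only if every pair of values has an identical 64-bit representation."""
    if len(a) != len(b):
        return False
    return all(struct.pack("<d", x) == struct.pack("<d", y) for x, y in zip(a, b))


def same_to_roundoff(a, b, rel_tol=1e-12):
    """True if every pair agrees within a relative tolerance (assumed value)."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=0.0) for x, y in zip(a, b))


# Two answer sets that differ only in the last bits of the first value.
reference = [0.3, 1.0]
candidate = [0.1 + 0.2, 1.0]

roundoff_match = same_to_roundoff(reference, candidate)
exact_match = bit_for_bit(reference, candidate)
```

Here `roundoff_match` is true while `exact_match` is false, which is the situation described for the Stage-1 answers.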
3.) ESMF support request status
There's been an absolute ton of requests that you've already implemented, and tons of effort put into improving ESMF robustness and ease of use and developing ESMF_2_2_2r for us. Peggy even found a bug in my code! We want to acknowledge and appreciate that effort.
As such, we don't want you to change ESMF_2_2_2r unless absolutely necessary, but we do have a list of issues that we think are important for the future of ESMF. Most of these I'd like to see in ESMF_3.
Description of categories:
- Critical – must be done or we can't accomplish the task.
- Important – CCSM can live without it – but it's an issue of ESMF robustness, flexibility, and user friendliness.
- Nice to have – Won't impact evaluations. Long-term, non-critical issues that do improve ESMF user friendliness.
Critical for Stage-1 Evaluation with ESMF_2_2_2r
- Use with totalview on tempest and bluedawn. SCD is installing new versions of totalview. Now works on bluedawn. (#1597138)
Critical for Stage-2 (Summer 2007) with ESMF_3_x
- Ability to only use regrid on Ocean data. I think this is done with the SparseMatrix multiply. #907930
Important issues for Stage-2 (Summer 2007) with ESMF_3_x
- Global grid coordinates: Need the ability to have components take a grid from an import state and create a different decomposition from it (required for coupler, lightweight, and merge components) (originally requested Nov 14th). I had to have my merge component use CAM-specific lat/long data to construct a grid, rather than getting grid coordinates from the input ESMF import state. I need an easy way to create a grid on a different decomposition from ESMF. #1388148 (internal ESMF ticket) and #1192688 (generated by CCSM)
- RedistStore check for incompatible grids, #1610955
- Ability to get description of decomposition on a grid. #1647468
- A no-extrapolate option so bad grids abort when global grids aren't matched up correctly on a redist or a regrid. #916798
- Check for duplicated names of fields on states, #1576877
- GridCompInitialize requires clock and import/export state in user code, but not in calling interface. #1582929
Nice to have – doesn't affect evaluations (for ESMF_3)
- Ability to add time-objects to states, sent April 13, 2006 (may be upped in priority) #1634441
- ESMF logical should use FORTRAN logical on the FORTRAN side, #1194557
- Send a global grid to all processors. #1195480
- Remove extra garbage put on names of states #1622467
- SGI linking using -lffio #1627368
- Ability to have a _FillValue for regrid or redist #916794
- gatherv/scatterv communication, #1597180
- Documentation on ESMF_GridDistribute #1191849
Important/Possibly Critical – Stage-4 Concurrent CCSM (late 2008?) – (For ESMF_4)
- Sends and receives on different timesteps. The ability to have gets/puts takes care of this problem. This may also be resolved beforehand in Stage-3. #1230616
Important/Possibly Critical – LONGTERM (several years out) – (For ESMF_4/5/6?)
- Ability to have uneven halo regions so we can use pointers to ESMF states to FV dycore data, when dynamics and physics are split into separate ESMF components. #872566
4.) Evaluation tasks to perform
- Use with totalview on tempest and bluedawn. SCD is installing new versions of totalview. Now works on bluedawn. (#1597138)
- Test memory usage (within 20%)
- Test performance for matrix of resolutions/PE's on bluesky and phoenix (within 5%)
- Release code to public – prepare surveys (this will be done after the performance assessment is done). Brian Eaton will be involved in creating the survey. The survey will be given to ESMF folks beforehand.
- If it passes – update and move to CAM trunk, and add system tests to ensure it still works. Fix SCAM and CCSM/concurrent modes to work.
- If there are issues – ESMF gets 3 months to respond
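The memory (within 20%) and performance (within 5%) criteria listed above could be checked with something like the following minimal sketch. It assumes that "within" means the ESMF-enabled run may exceed a baseline run by at most that fraction; the notes do not state what the baseline is, and the function names and sample numbers here are hypothetical.

```python
MEMORY_TOLERANCE = 0.20       # "within 20%" from the evaluation task list
PERFORMANCE_TOLERANCE = 0.05  # "within 5%" from the evaluation task list


def within_tolerance(baseline, candidate, tolerance):
    """True if candidate exceeds baseline by no more than the given fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline <= tolerance


def evaluate(baseline_mem, esmf_mem, baseline_time, esmf_time):
    """Apply both criteria to one resolution/PE configuration (assumed layout)."""
    return {
        "memory_ok": within_tolerance(baseline_mem, esmf_mem, MEMORY_TOLERANCE),
        "performance_ok": within_tolerance(
            baseline_time, esmf_time, PERFORMANCE_TOLERANCE
        ),
    }


# Hypothetical numbers: 15% more memory passes, 6% more wall-clock time does not.
result = evaluate(baseline_mem=100.0, esmf_mem=115.0,
                  baseline_time=200.0, esmf_time=212.0)
```

Running this over each entry in the matrix of resolutions and PE counts would give a simple pass/fail table for the evaluation.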
5.) Stage-2/3 tasks
- Move couplers to redist/regrid instead of copy/redist
- Put cpl6 code and logic into couplers/mergers
- Averaging over time-steps for model components (how this will be done is under development)
6.) Who does what?
- Erik move to part-time status on ESMF work (75% ESMF until Stage-1 done)
- Erik deliver Stage-1 code and performance testing
- Samson do the Stage-2 work, consult/meet with Erik as needed