problems with cpl_map_mod
Part of the CCSM on BlueGene project.
One of the first obstacles in running cpl6 at high processor counts and resolutions was the amount of memory needed to initialize a mapping.
The old algorithm read all the weights on node 0 and then scattered them. This required too much memory on node 0.
The new algorithm (from Tony) reads in pieces and broadcasts them to each node. Each node then picks out what it needs.
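The read-in-pieces approach can be illustrated with a small simulation. This is only a sketch in plain Python with no MPI, not the cpl6 Fortran code: the names read_chunks, owner_of and init_mapping_chunked, the block decomposition of destination rows, and the synthetic weights are all assumptions made for illustration. It shows that no rank ever needs to hold more than one chunk plus its own weights, whereas the old approach needed the full weight list on node 0.

```python
from typing import Iterator, List, Tuple

# One sparse mapping weight: (destination row, source column, value).
Weight = Tuple[int, int, float]


def read_chunks(weights: List[Weight], chunk_size: int) -> Iterator[List[Weight]]:
    """Stand-in for reading the mapping file piece by piece (hypothetical)."""
    for start in range(0, len(weights), chunk_size):
        yield weights[start:start + chunk_size]


def owner_of(row: int, n_rows: int, n_ranks: int) -> int:
    """Hypothetical block decomposition of destination rows across ranks."""
    rows_per_rank = -(-n_rows // n_ranks)  # ceiling division
    return row // rows_per_rank


def init_mapping_chunked(weights, n_rows, n_ranks, chunk_size):
    """Simulate the chunked read: every rank sees each chunk and keeps its own rows."""
    local = [[] for _ in range(n_ranks)]
    peak_buffer = 0
    for chunk in read_chunks(weights, chunk_size):
        # Largest buffer any rank must hold at once for the broadcast.
        peak_buffer = max(peak_buffer, len(chunk))
        # In a real MPI code each rank would run this filter after the broadcast.
        for rank in range(n_ranks):
            local[rank].extend(
                w for w in chunk if owner_of(w[0], n_rows, n_ranks) == rank
            )
    return local, peak_buffer


# Tiny synthetic example: 8 destination rows, 3 weights per row, 4 ranks.
n_rows, n_ranks, chunk_size = 8, 4, 5
weights = [(r, c, 1.0 / 3.0) for r in range(n_rows) for c in range(r, r + 3)]
local, peak_buffer = init_mapping_chunked(weights, n_rows, n_ranks, chunk_size)
```

In this toy case the broadcast buffer never exceeds chunk_size entries, while a read-everything-then-scatter scheme would need all len(weights) entries on one node at once.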
- Ray started with a cpl_map_mod patched by Jon to replace npfix with npFixNone. This ran fine with T42_gx1v3 on up to 508 processors.
- Ray noted mapping init problems at 1024 processors with T42_gx1v3.
- Tony re-wrote cpl_map_read in cpl_map_mod.F90
- Ray took Tony's version, which did not have Jon's npFixNone. This version did not work even on processor counts that previously worked (like 512).
- Tony verified a problem on bluevista with 32 or more procs, related to the cpl_map_npFix3R subroutine (see http://bugs.cgd.ucar.edu/show_bug.cgi?id=312).
- Tony rewrote cpl_map_npFix3R and now cpl_map_mod, with the new cpl_map_read and the new cpl_map_npFix4R, works on BlueGene. At least the high-processor-count coupler can get past the mapping init on the 1024 case.
Question: did the cpl_map_mod with the new cpl_map_read but the old cpl_map_npFix3R work fine on fewer than 32 procs on BlueGene and other platforms? (Possibly not due to the temporary npMapNone patch.)