Attempts to run CCSM with all dead models on Argonne's BlueGene.

Goal is to test cpl6 scalability.
Updated 2/21/07
Part of CCSM on BlueGene project
The configuration uses 1 processor for each dead model and a varying number of coupler processors. So "32" means 28 coupler processors plus 1 each for the atm, ocn, ice, and land dead models.
Data from Ray Loy
Note: Dead models do not have the decomposition restrictions of full models (e.g., T42 can run on more than 128 processors).
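The processor-count convention described above is simple arithmetic. Here is a minimal Python sketch that makes it explicit, assuming the same layout of exactly one processor per dead model with every remaining processor going to the coupler. The names `coupler_procs` and `layout` are hypothetical illustrations and are not part of CCSM or cpl6.

```python
# Hypothetical helper (not part of CCSM/cpl6): assumes each of the four
# dead models (atm, ocn, ice, land) gets exactly one processor and every
# remaining processor is assigned to the coupler.
DEAD_MODELS = ("atm", "ocn", "ice", "land")


def coupler_procs(total_procs: int) -> int:
    """Return the number of coupler processors for a given total job size."""
    cpl = total_procs - len(DEAD_MODELS)
    if cpl < 1:
        raise ValueError(
            f"need more than {len(DEAD_MODELS)} procs, got {total_procs}"
        )
    return cpl


def layout(total_procs: int) -> dict:
    """Map each component name to its processor count."""
    counts = {model: 1 for model in DEAD_MODELS}
    counts["cpl"] = coupler_procs(total_procs)
    return counts


if __name__ == "__main__":
    # Job sizes taken from the runs recorded on this page.
    for total in (6, 32, 64, 512, 1024, 2048):
        print(total, layout(total))
```

Under this convention, 6 total gives 2 coupler processors and 32 total gives 28, which matches the labels used for the T340_gx1v3 runs.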

T42_gx1v3
 CO and VN modes both run at up to 512 procs.
 1024 VN with old cpl_map_mod: runs out of memory in mapping init
 1024 VN with new cpl_map_mod: runs out of memory in main loop
 1024 CO: runs out of memory during an Av init
 2048 VN (with mph changes): runs out of memory in mapping init

T62_gx1v3

CO mode runs OK with 32, 64, 96, 128, 256, 512, 1024 procs
VN mode runs OK with cpl procs 32, 64, 96, 128, 256, 512
 1024 case exits after reaching cpl_domain_compare
 2048 case still in cpl_comm_init after 20 minutes (killed for time)

T85_gx1v3

CO mode runs OK with 32, 64, 128, 256, 512 procs
  1024: does not complete. It reaches the start of the main integration, then
  runs out of memory while allocating an Av somewhere.
VN mode runs OK with 32, 64, 128, 256, 512 procs
  1024: does not complete (MORE INFO NEEDED)
  2048: runs out of memory in mapping init

T340_gx1v3

2 cpl (6 total): dies before/during bundle initialization in Xc2a
28 cpl (32 total): dies before/during bundle initialization in Xc2i

28 cpl gets further than 2 cpl before dying.
