APE CAM/HOMME Benchmark Results

Benchmark results with CAM 3.5 physics

  • Code branch: https://svn-ccsm-models.cgd.ucar.edu/cam1/branches/homme_cam3_5_29
  • Resolution:
    • Physics dtime = 30min. Subcycled dynamics.
    • NE=16, 26 levels (1.9 degree), dt=3 min (1536 elements, 13826 columns)
    • NE=21, 26 levels (1.43 degree), dt=2 min (2646 elements, 24K columns), 252 points along equator (~T85)
    • NE=30, 26 levels (1.00 degree), dt=1.5 min (5400 elements, 49K columns)
    • NE=60, 26 levels (0.50 degree), dt=45 s (21600 elements, 194K columns)
    • NE=120, 26 levels (0.25 degree), dt=20 s (86400 elements, 778K columns)
  • Each run simulated NCPU/3 days (about 10 minutes of wall-clock time). SYPD (simulated years per day) was taken from stdout, using the best of two simulations.
  • qneg3.F90: tolerance changed to 1e-3 to avoid QNEG3 messages to stdout.
  • most I/O disabled
  • namelists: drv_in atm_in
  • Using standard CAM datasets, except for a symmetric ozone dataset (APE.o3_64x1_L59_c060929.nc) and two grid-dependent datasets (created from EUL T85 datasets via the interpic_new tool):
    • AerosolMass_V_ne21np4_clim_c070730.nc
    • cami_0000-01-01_ne21np4_L26_ape_c071210.nc
  • All datasets are now installed at NCAR (but namelists above need to be updated for NCAR names/locations)
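
The element and column counts in the resolution list can be reproduced from NE alone. The sketch below is illustrative only: it assumes a cubed-sphere grid with np=4 points per element edge (inferred from the "np4" in the dataset names), so that each of the 6 cube faces holds NE x NE elements and shared edge points are counted once. The function name and the returned keys are hypothetical, not part of CAM or HOMME.

```python
# Illustrative sketch; formulas assume a cubed-sphere grid with np=4 (assumption).
NP = 4                    # points per element edge, inferred from "ne21np4"
PHYSICS_DT_S = 30 * 60    # physics time step from the list above (30 min)


def grid_summary(ne, dyn_dt_s, np_=NP):
    """Return grid sizes and dynamics subcycling for a given NE (hypothetical helper)."""
    elements = 6 * ne * ne                       # 6 cube faces, NE x NE elements each
    columns = elements * (np_ - 1) ** 2 + 2      # unique points, shared edges counted once
    equator_points = 4 * ne * (np_ - 1)          # points around one great circle
    return {
        "elements": elements,
        "columns": columns,
        "equator_points": equator_points,
        "degrees": 360.0 / equator_points,       # average grid spacing at the equator
        "subcycles": PHYSICS_DT_S / dyn_dt_s,    # dynamics steps per physics step
    }


# NE -> dynamics time step in seconds, copied from the resolution list
RUNS = {16: 180, 21: 120, 30: 90, 60: 45, 120: 20}

if __name__ == "__main__":
    for ne, dt in RUNS.items():
        s = grid_summary(ne, dt)
        print(f"NE={ne:4d} elements={s['elements']:6d} columns={s['columns']:7d} "
              f"deg={s['degrees']:.2f} subcycles={s['subcycles']:.0f}")
```

Under these assumptions the NE=16 case gives exactly the 1536 elements and 13826 columns listed above, and NE=21 gives the 252 equatorial points.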

Results from LLNL IBM BG/L

Each curve shows the fixed mesh scalability

This is the same data as plotted above, except the x-axis is changed to elements per processor, to show that each resolution has good scalability down to 1 element per processor.

Results from Sandia/Thunderbird

Linux cluster, 4480 nodes, dual 3.6 GHz Intel EM64T processors. These results are from the older homme_cam3_5_23 branch.

1.9 degree NE=16 1536 elements

  NCPU   Elem/CPU   SYPD
    24         64   4.53
    48         32   9.45
    96         16   20.0
   192          8   38.3
   384          4   70.6
   768          2   118

1.4 degree NE=21 2646 elements

  NCPU   Elem/CPU   SYPD
    18        147   1.77
    42         63   4.11
   112         24   11.2
   222         12   20.2
   442          6   40.1
   882          3   71.1
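
One way to read the two Thunderbird tables is as strong-scaling data: for a fixed mesh, compare the SYPD gain against the increase in NCPU. The sketch below is a minimal illustration that takes the smallest run in each table as the baseline; the function name and the choice of baseline are assumptions made here, not something stated in the benchmark description.

```python
# (NCPU, SYPD) pairs copied from the Thunderbird tables
NE16 = [(24, 4.53), (48, 9.45), (96, 20.0), (192, 38.3), (384, 70.6), (768, 118)]
NE21 = [(18, 1.77), (42, 4.11), (112, 11.2), (222, 20.2), (442, 40.1), (882, 71.1)]


def scaling(rows):
    """Return (ncpu, speedup, efficiency) relative to the first row (hypothetical helper)."""
    base_ncpu, base_sypd = rows[0]
    result = []
    for ncpu, sypd in rows:
        speedup = sypd / base_sypd          # measured gain in throughput
        ideal = ncpu / base_ncpu            # gain expected from perfect scaling
        result.append((ncpu, speedup, speedup / ideal))
    return result


if __name__ == "__main__":
    for name, rows in (("NE=16", NE16), ("NE=21", NE21)):
        print(name)
        for ncpu, speedup, eff in scaling(rows):
            print(f"  NCPU={ncpu:4d} speedup={speedup:6.2f} efficiency={eff:5.2f}")
```

An efficiency near 1 means the run scales almost ideally from the baseline. Values slightly above 1 can occur in measured data and are not necessarily an error.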
