CPU load analysis
A JOB may request processors that it does not fully use during execution. Several reasons can explain this behavior:
(a) Errors in SLURM batch files.
(b) The code/application is not well parallelized.
(c) Additional local memory requirements per processor, so that some allocated processors are left idle.
(d) The user intends to run a sequential code.
In all these cases, the user should check the performance of their JOB with a system utility called "jobcpuload". This utility reports, per JOB, the average CPU usage (CPULoad) and the available CPU capacity (CPUTot) of the nodes. If CPULoad is close to CPUTot, the JOB uses almost all processors on the node. If CPULoad is well below CPUTot, the JOB is unbalanced and wastes CPU time and resources on EXPLOR.
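The comparison between CPULoad and CPUTot can be expressed as a percentage. The following is a minimal sketch, not part of "jobcpuload": the function name cpu_efficiency is hypothetical, and the numbers passed to it are illustrative values assumed to have been read by hand from the utility's report.

```shell
# Hypothetical helper: percentage of the available CPU capacity in use.
# Arguments: $1 = CPULoad, $2 = CPUTot (values read from the jobcpuload report).
cpu_efficiency() {
    awk -v load="$1" -v tot="$2" 'BEGIN {
        # Refuse a zero or negative capacity to avoid a division by zero.
        if (tot <= 0) { print "CPUTot must be positive" > "/dev/stderr"; exit 1 }
        printf "%.1f\n", 100 * load / tot
    }'
}

# Illustrative values only, assuming a node with 32 cores.
cpu_efficiency 31.2 32   # close to 100: the node is well used
cpu_efficiency 4.1 32    # far below 100: most allocated cores sit idle
```

A result near 100 corresponds to the balanced case described above; a much lower value suggests one of the causes (a) to (d).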
Users should therefore maximize the CPULoad of their runs and avoid submitting many unbalanced JOBS that leave allocated resources unused. If the code shows poor parallel performance, the user should bundle multiple tasks into a single SLURM batch run rather than submit many JOBS that each allocate only a few processors.
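Bundling can be sketched as a single batch file that starts several independent sequential tasks at once. This is an illustration under assumptions, not the recommended EXPLOR template: the job name, task count, and time limit are placeholders, run_task stands in for the real program, and any partition or account options required on EXPLOR are omitted because they are not given here.

```shell
#!/bin/bash
#SBATCH --job-name=bundle      # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks=4             # one core per bundled task (illustrative)
#SBATCH --time=01:00:00        # illustrative wall-time limit

# Placeholder for the real sequential program; replace with your own command.
run_task() {
    echo "task $1 done" > "task_$1.out"
}

# Start the four independent tasks in the background so they run concurrently,
# each one keeping one of the allocated cores busy.
for i in 1 2 3 4; do
    run_task "$i" &
done

# Block until every background task has finished before the JOB ends.
wait
```

The number of background tasks should match the number of allocated cores so that CPULoad approaches CPUTot. Depending on the site configuration, launching each task with srun may be preferred over plain background processes.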
