User tasks may request a given amount of resources, but at execution time the JOBS do not actually use all of the requested CPUs. Several reasons can explain this behavior:
(a) Mistakes in the SLURM batch files.
(b) The code/application is not well parallelized.
(c) The JOB needs more memory per CPU than the default, so extra CPUs are requested mainly for their associated memory.
(d) The user wants to run a serial code.
In all these cases, the user should check the performance of their JOBS with the system function « jobcpuload ». This function reports, for each node of a JOB, the averaged CPU usage (CPULoad) and the total CPU capacity (CPUTot). If CPULoad is close to CPUTot, the JOB is using almost all the CPUs of the node; if, for any reason, CPULoad stays well below CPUTot, the JOB is unbalanced and is wasting CPU time and resources on EXPLOR.
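For reference, CPULoad and CPUTot are standard per-node fields reported by SLURM itself. The following sketch is not the « jobcpuload » implementation; it only illustrates, with standard SLURM commands and a placeholder <jobid>, how the same figures can be read for the nodes of a running JOB:

    # Sketch only: read CPULoad and CPUTot for the nodes of a running job.
    # Replace <jobid> with the ID of one of your running JOBS.
    nodes=$(squeue --noheader -j <jobid> -o "%N")        # compressed node list of the job
    for node in $(scontrol show hostnames "$nodes"); do  # expand lists such as node[01-04]
        echo -n "$node: "
        scontrol show node "$node" | grep -Eo "CPUTot=[0-9]+|CPULoad=[0-9.]+" | tr '\n' ' '
        echo
    done

If the printed CPULoad is much smaller than CPUTot on the nodes of the JOB, the allocation is underused.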
Hence, users should maximize the CPULoad of their executions and avoid running many unbalanced JOBS that do not use all the allocated resources. When the code has poor parallel performance, the user should pack multiple tasks into a single SLURM batch execution rather than submitting many JOBS that each allocate only a few processors (see the sketch below).
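A minimal sketch of such a packed submission is given below. The job name, time limit, number of tasks, program name (my_serial_code) and input files are placeholders to adapt to your project; the idea is simply to launch several independent serial runs inside one allocation and wait for all of them:

    #!/bin/bash
    # Sketch: pack 8 independent serial runs into one SLURM allocation
    # instead of submitting 8 small JOBS.
    #SBATCH --job-name=packed_tasks
    #SBATCH --nodes=1
    #SBATCH --ntasks=8            # one task per serial run; fills 8 CPUs of the node
    #SBATCH --time=02:00:00       # adjust to the longest of the packed runs

    for i in $(seq 1 8); do
        # each step takes one CPU and runs in the background;
        # on recent SLURM versions, --exact replaces the step-level --exclusive
        srun --ntasks=1 --cpus-per-task=1 --exclusive ./my_serial_code input_${i}.dat &
    done
    wait    # keep the allocation alive until every packed task has finished

With this pattern the allocation is released only once all packed tasks are done, and the CPULoad of the node stays close to the number of CPUs actually requested.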