mpirun: Forwarding signal 18 to job
--------------------------------------------------------------------------
ORTE has lost communication with a remote daemon.

  HNP daemon : [[44229,0],0] on node compute03
  Remote daemon: [[44229,0],3] on node compute06

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------
slurmstepd: *** JOB 29973 ON compute03 CANCELLED AT 2022-01-29T14:55:37 ***