mpirun: Forwarding signal 18 to job
--------------------------------------------------------------------------
ORTE has lost communication with a remote daemon.

HNP daemon : [[44229,0],0] on node compute03
Remote daemon: [[44229,0],3] on node compute06

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------
slurmstepd: *** JOB 29973 ON compute03 CANCELLED AT 2022-01-29T14:55:37 ***