SLURM-optimized parallel jobs will not run under PBS out of the box.
Conversion to PBS standards is necessary. Here we provide hints on how to proceed.
...
...
Note that `mpirun` is used here in place of SLURM's `srun`. The `-n` flag sets the number of tasks spawned by MPI. The path to the script run by MPI must be absolute, and the script's permissions must allow reading and execution.
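
The requirements above can be checked before the launch. The following jobscript fragment is only a sketch: the task count, the placeholder script path, and the helper `check_script` are hypothetical names introduced for illustration, and `mpirun` is only called if it is found on the `PATH`.

```shell
#!/bin/bash
# Hypothetical values; replace them with those of your own job.
NTASKS=4
SCRIPT="/path/to/run_task.sh"   # placeholder, must be an absolute path

# Return 0 only if the path is absolute and the file is readable and executable.
check_script() {
    case "$1" in
        /*) ;;                                   # absolute path, continue
        *)  echo "not an absolute path: $1" >&2
            return 1 ;;
    esac
    if [ -r "$1" ] && [ -x "$1" ]; then
        return 0
    fi
    echo "not readable and executable: $1" >&2
    return 1
}

# Launch NTASKS copies of the script, but only if the checks pass
# and mpirun is available in the current environment.
if check_script "$SCRIPT" && command -v mpirun >/dev/null 2>&1; then
    mpirun -n "$NTASKS" "$SCRIPT"
fi
```

If the check fails, `chmod u+rx` on the script is usually enough to grant the missing permissions.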
PBS provides some useful variables that may be used in jobscripts, such as `PBS_O_WORKDIR` and `PBS_JOBID`.
`PBS_O_WORKDIR` returns the directory from which the `qsub` command was submitted.
`PBS_JOBID` returns the identifier of the job (the numerical job number followed by the server name).
A job submitted with `qsub` always starts execution in the `$HOME` directory.
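
Because execution starts in `$HOME`, a jobscript typically changes to the submission directory first. The fragment below is a minimal sketch of that pattern; the fallback values and the log file name are assumptions added so that the fragment also runs outside PBS.

```shell
#!/bin/bash
# Use the PBS variables when available; the fallbacks are only for
# running this sketch outside of a PBS job.
WORKDIR="${PBS_O_WORKDIR:-$PWD}"
JOBID="${PBS_JOBID:-local}"

# The job starts in $HOME, so change to the submission directory explicitly.
cd "$WORKDIR" || exit 1

# Hypothetical per-job log file name derived from the job identifier.
LOGFILE="job_${JOBID}.log"
echo "Job ${JOBID} runs in ${PWD} and would log to ${LOGFILE}"
```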
## Migrating PyTorch from SLURM
Intel MPI provides some useful variables that may be used in scripts executed via MPI.
These include `PMI_RANK`, `PMI_SIZE`, and `MPI_LOCALRANKID`.
- `PMI_RANK` returns the process rank within the `MPI_COMM_WORLD` communicator (the process number)
- `MPI_LOCALRANKID` returns the rank of the process among the processes running on the same node
- `PMI_SIZE` returns the size of the `MPI_COMM_WORLD` communicator (the number of processes)
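
One way to use these variables is to translate them, in the script started by `mpirun`, into the `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` environment variables commonly read by PyTorch distributed training scripts. The fragment below is a sketch under that assumption; whether your training script reads these names depends on how it initializes `torch.distributed`. The defaults are there only so the fragment runs outside `mpirun`.

```shell
#!/bin/bash
# Values set by Intel MPI for each spawned process; the defaults
# (single process, rank 0) apply only when run outside of mpirun.
export RANK="${PMI_RANK:-0}"
export WORLD_SIZE="${PMI_SIZE:-1}"
export LOCAL_RANK="${MPI_LOCALRANKID:-0}"

echo "process ${RANK} of ${WORLD_SIZE} (local rank ${LOCAL_RANK})"
```

Multi-node training usually also needs a rendezvous address and port, which this sketch does not set.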