Commit f09f7a5d authored by Roman Sliva

Update slurm-job-submission-and-execution.md

parent 7c5d5a20
@@ -12,19 +12,17 @@ A `man` page exists for all Slurm commands, as well as `--help` command option,
### Quick Overview of Common Commands
| Command  | Explanation                                                                                     |
| :------: | :---------------------------------------------------------------------------------------------- |
| sinfo    | View information about nodes and partitions.                                                     |
| squeue   | View information about jobs located in the scheduling queue.                                     |
| sacct    | Display accounting data for all jobs and job steps in the job accounting log or Slurm database.  |
| scontrol | View or modify jobs, nodes, partitions, reservations, and other Slurm objects.                   |
| sbatch   | Submit a batch script to Slurm.                                                                  |
| salloc   | Run an interactive job.                                                                          |
| srun     | Run parallel tasks.                                                                              |
| scancel  | Cancel a job.                                                                                    |
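For example, to list your own queued jobs and then inspect one of them in detail (job ID 1511 is simply the job used later in this section; `squeue --me` assumes a reasonably recent Slurm, while `squeue -u $USER` works everywhere):

```console
$ squeue --me
$ scontrol show job 1511
```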
### Job Submission Options
@@ -54,11 +52,11 @@ To define Slurm job options within the batch script, use `SBATCH` keyword follow
```shell
#SBATCH -A OPEN-00-00
#SBATCH -p qcpu
#SBATCH -n 4
```
Here we asked for 4 tasks in total to be executed on the qcpu partition, using the OPEN-00-00 project's resources.
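The same options may also be passed directly to `sbatch` on the command line, where they take precedence over the in-script directives (`myjob.sh` is a placeholder script name):

```console
$ sbatch -A OPEN-00-00 -p qcpu -n 4 myjob.sh
```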
Job instructions should contain everything you'd like your job to do; that is, every single command the job is supposed to execute:
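The diff elides the instructions block itself here; a minimal sketch consistent with the combined script and the output shown further below (the exact pipeline is an assumption) would be:

```shell
ml OpenMPI/4.1.4-GCC-11.3.0        # load an MPI toolchain
mpirun hostname | sort | uniq -c   # print how many tasks ran on each node
```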
@@ -73,7 +71,7 @@ Combined together, the previous examples make up the following script:
```shell
#!/usr/bin/bash
#SBATCH -A OPEN-00-00
#SBATCH -p qcpu
#SBATCH -n 4
ml OpenMPI/4.1.4-GCC-11.3.0
```
@@ -92,8 +90,8 @@ we get an output file with the following contents:
```console
$ cat slurm-1511.out
1 cn1.barbora.it4i.cz
3 cn2.barbora.it4i.cz
```
Notice that Slurm spread our job across 2 different nodes; by default, Slurm selects the number of nodes so as to minimize the wait time before job execution. However, sometimes you may want to restrict your job to only a certain minimum or maximum number of nodes (or both). You may also require more time for your calculation to finish than the default allocated time. For an overview of such job options, see the table below.
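For instance, standard `sbatch` options can pin the job to an exact node count and extend the walltime (the values below are illustrative):

```shell
#SBATCH -N 2             # exactly 2 nodes; --nodes=<min>-<max> gives a range
#SBATCH --time=04:00:00  # request 4 hours of walltime
```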
@@ -138,7 +136,7 @@ The recommended way to run production jobs is to change to the `/scratch` direct
```shell
#!/bin/bash
#SBATCH -J job_example
#SBATCH -A OPEN-00-00
#SBATCH -p qcpu
#SBATCH -n 4
cd $SLURM_SUBMIT_DIR
```
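The hunk ends here; a production script along the lines this section recommends would typically continue by creating and entering a per-job scratch directory. A sketch, assuming a `/scratch/project/<project>` layout (the path is illustrative, not taken from the diff):

```shell
SCRDIR=/scratch/project/open-00-00/$USER/$SLURM_JOB_ID  # hypothetical scratch path
mkdir -p $SCRDIR && cd $SCRDIR                          # run on scratch, not home
```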
@@ -382,19 +380,27 @@ p04-edge up 1-00:00:00 1 idle p04-edge01
p05-synt up 1-00:00:00 1 idle p05-synt01
```
Here we can see the output of the `sinfo` command run on the Barbora cluster. By default, it shows basic node and partition configurations.
To view partition summary information, use `sinfo -s`, or `sinfo --summarize`:
```console
$ sinfo -s
PARTITION    AVAIL  TIMELIMIT  NODES(A/I/O/T) NODELIST
qcpu*           up 2-00:00:00     0/192/0/192 cn[1-192]
qcpu_biz        up 2-00:00:00     0/192/0/192 cn[1-192]
qcpu_exp        up    1:00:00     0/192/0/192 cn[1-192]
qcpu_free       up   18:00:00     0/192/0/192 cn[1-192]
qcpu_long       up 6-00:00:00     0/192/0/192 cn[1-192]
qcpu_preempt    up   12:00:00     0/192/0/192 cn[1-192]
qgpu            up 2-00:00:00         0/8/0/8 cn[193-200]
qgpu_biz        up 2-00:00:00         0/8/0/8 cn[193-200]
qgpu_exp        up    1:00:00         0/8/0/8 cn[193-200]
qgpu_free       up   18:00:00         0/8/0/8 cn[193-200]
qgpu_preempt    up   12:00:00         0/8/0/8 cn[193-200]
qfat            up 2-00:00:00         0/1/0/1 cn201
qdgx            up 2-00:00:00         0/1/0/1 cn202
qviz            up    8:00:00         0/2/0/2 vizserv[1-2]
```
This lists only a partition state summary with no dedicated column for partition state. Instead, it is summarized in the `NODES(A/I/O/T)` column, where `A/I/O/T` stands for `allocated/idle/other/total`.
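To restrict the summary to a single partition, combine it with `-p`/`--partition`:

```console
$ sinfo -s -p qcpu
```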