From f09f7a5d9f909ec37adead077c1ee6b87e045ff6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roman=20Sl=C3=ADva?= <roman.sliva@vsb.cz>
Date: Tue, 18 Jul 2023 13:16:50 +0200
Subject: [PATCH] Update slurm-job-submission-and-execution.md

---
 .../slurm-job-submission-and-execution.md | 60 ++++++++++---------
 1 file changed, 33 insertions(+), 27 deletions(-)

diff --git a/docs.it4i/general/slurm-job-submission-and-execution.md b/docs.it4i/general/slurm-job-submission-and-execution.md
index 91d4f3853..acf00fe3f 100644
--- a/docs.it4i/general/slurm-job-submission-and-execution.md
+++ b/docs.it4i/general/slurm-job-submission-and-execution.md
@@ -12,19 +12,17 @@ A `man` page exists for all Slurm commands, as well as `--help` command option,

 ### Quick Overview of Common Commands

-| Command | Explanation |
-| :-----: | :-------------------------------------------------------------------------------------------------------------------------- |
-| sinfo | View information about nodes and partitions. |
-| squeue | View information about jobs located in the scheduling queue. |
-| sacct | Display accounting data for all jobs and job steps in the job accounting log or Slurm database. |
-| |
-| salloc | Obtain a job allocation (a set of nodes), execute a command, and then release the allocation when the command is finished. |
-| sattach | Attach to a job step. |
-| sbatch | Submit a batch script to Slurm. |
-| sbcast | Transmit a file to the nodes allocated to a job. |
-| scancel | Used to signal jobs or job steps that are under the control of Slurm. |
-|
-| srun | Run parallel jobs. |
+| Command | Explanation |
+| :------: | :-------------------------------------------------------------------------------------------------------------------------- |
+| sinfo | View information about nodes and partitions. |
+| squeue | View information about jobs located in the scheduling queue. |
+| sacct | Display accounting data for all jobs and job steps in the job accounting log or Slurm database. |
+| scontrol | View or modify jobs, nodes, partitions, reservations, and other Slurm objects. |
+| |
+| sbatch | Submit a batch script to Slurm. |
+| salloc | Run an interactive job. |
+| srun | Run parallel tasks. |
+| scancel | Cancel a job. |

 ### Job Submission Options

@@ -54,11 +52,11 @@ To define Slurm job options within the batch script, use `SBATCH` keyword follow

 ```shell
 #SBATCH -A OPEN-00-00
-#SBATCH -p p03-amd
+#SBATCH -p qcpu
 #SBATCH -n 4
 ```

-Here we asked for 4 tasks in total to be executed on partition p03-amd using OPEN-00-00's project resources.
+Here we asked for 4 tasks in total to be executed on partition qcpu using OPEN-00-00's project resources.

 Job instructions should contain everything you'd like your job to do; that is, every single command the job is supposed to execute:

@@ -73,7 +71,7 @@ Combined together, the previous examples make up a following script:
 ```shell
 #!/usr/bin/bash
 #SBATCH -A OPEN-00-00
-#SBATCH -p p03-amd
+#SBATCH -p qcpu
 #SBATCH -n 4

 ml OpenMPI/4.1.4-GCC-11.3.0
@@ -92,8 +90,8 @@ we get an output file with the following contents:

 ```console
 $ cat slurm-1511.out
- 1 p03-amd01.cs.it4i.cz
- 3 p03-amd02.cs.it4i.cz
+ 1 cn1.barbora.it4i.cz
+ 3 cn2.barbora.it4i.cz
 ```

 Notice that Slurm spread our job across 2 different nodes; by default, Slurm selects the number of nodes to minimize wait time before job execution. However, sometimes you may want to restrict your job to only a certain minimum or maximum number of nodes (or both). You may also require more time for your calculation to finish than the default allocated time. For an overview of such job options, see table below.
@@ -138,7 +136,7 @@ The recommended way to run production jobs is to change to the `/scratch` direct
 #!/bin/bash
 #SBATCH -J job_example
 #SBATCH -A OPEN-00-00
-#SBATCH -p p03-amd
+#SBATCH -p qcpu
 #SBATCH -n 4

 cd $SLURM_SUBMIT_DIR
@@ -382,19 +380,27 @@ p04-edge up 1-00:00:00 1 idle p04-edge01
 p05-synt up 1-00:00:00 1 idle p05-synt01
 ```

-Here we can see output of the `sinfo` command ran on the Complementary System. By default, it shows basic node and partition configurations.
+Here we can see the output of the `sinfo` command run on the Barbora cluster. By default, it shows basic node and partition configurations.

 To view partition summary information, use `sinfo -s`, or `sinfo --summarize`:

 ```console
 $ sinfo -s
-PARTITION AVAIL TIMELIMIT  NODES(A/I/O/T) NODELIST
-p00-arm   up    1-00:00:00 0/1/0/1        p00-arm01
-p01-arm*  up    1-00:00:00 0/8/0/8        p01-arm[01-08]
-p02-intel up    1-00:00:00 0/2/0/2        p02-intel[01-02]
-p03-amd   up    1-00:00:00 0/2/0/2        p03-amd[01-02]
-p04-edge  up    1-00:00:00 0/1/0/1        p04-edge01
-p05-synt  up    1-00:00:00 0/1/0/1        p05-synt01
+PARTITION    AVAIL TIMELIMIT  NODES(A/I/O/T) NODELIST
+qcpu*        up    2-00:00:00 0/192/0/192    cn[1-192]
+qcpu_biz     up    2-00:00:00 0/192/0/192    cn[1-192]
+qcpu_exp     up    1:00:00    0/192/0/192    cn[1-192]
+qcpu_free    up    18:00:00   0/192/0/192    cn[1-192]
+qcpu_long    up    6-00:00:00 0/192/0/192    cn[1-192]
+qcpu_preempt up    12:00:00   0/192/0/192    cn[1-192]
+qgpu         up    2-00:00:00 0/8/0/8        cn[193-200]
+qgpu_biz     up    2-00:00:00 0/8/0/8        cn[193-200]
+qgpu_exp     up    1:00:00    0/8/0/8        cn[193-200]
+qgpu_free    up    18:00:00   0/8/0/8        cn[193-200]
+qgpu_preempt up    12:00:00   0/8/0/8        cn[193-200]
+qfat         up    2-00:00:00 0/1/0/1        cn201
+qdgx         up    2-00:00:00 0/1/0/1        cn202
+qviz         up    8:00:00    0/2/0/2        vizserv[1-2]
 ```

 This lists only a partition state summary with no dedicated column for partition state. Instead, it is summarized in the `NODES(A/I/O/T)` column, where the `A/I/O/T` stands for `allocated/idle/other/total`.
--
GitLab
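
For orientation, a minimal batch-script sketch of the node-count and walltime options referred to at the end of the changed section: the project ID and `qcpu` partition are taken from the patch's examples, while the task count, node range, time limit, and final command are illustrative assumptions only.

```shell
#!/usr/bin/bash
#SBATCH -A OPEN-00-00   # project ID, taken from the patch's examples
#SBATCH -p qcpu         # partition used throughout the updated examples
#SBATCH -n 8            # illustrative: 8 tasks in total
#SBATCH -N 2-4          # illustrative: run on at least 2 and at most 4 nodes
#SBATCH -t 02:00:00     # illustrative: 2-hour walltime instead of the default

srun hostname | sort | uniq -c   # show how many tasks landed on each node
```

Submitted with `sbatch`, such a script lets Slurm pick any node count within the requested 2-4 range while enforcing the two-hour time limit.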