diff --git a/docs.it4i/barbora/introduction.md b/docs.it4i/barbora/introduction.md index 874ccb1f3e951b222a8c8d1231ece872f3855542..0c7c5466d4b170902734b062e96bbfa4f3284b37 100644 --- a/docs.it4i/barbora/introduction.md +++ b/docs.it4i/barbora/introduction.md @@ -8,7 +8,7 @@ The cluster runs with an operating system compatible with the Red Hat [Linux fam The user data shared file system and job data shared file system are available to users. -The [PBS Professional Open Source Project][b] workload manager provides [computing resources allocations and job execution][3]. +The [Slurm][b] workload manager provides [computing resources allocations and job execution][3]. Read more on how to [apply for resources][4], [obtain login credentials][5] and [access the cluster][6]. @@ -22,4 +22,4 @@ Read more on how to [apply for resources][4], [obtain login credentials][5] and [6]: ../general/shell-and-data-access.md [a]: http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg -[b]: https://www.pbspro.org/ +[b]: https://slurm.schedmd.com/ diff --git a/docs.it4i/dgx2/accessing.md b/docs.it4i/dgx2/accessing.md index 6885e4cf1e520e1d8febe0200f8bf88b53d84f6f..ad4f6969c4fe42e30acf20e199cba60fbe5078e3 100644 --- a/docs.it4i/dgx2/accessing.md +++ b/docs.it4i/dgx2/accessing.md @@ -7,7 +7,8 @@ ## How to Access -The DGX-2 machine can be accessed through the scheduler from Barbora login nodes `barbora.it4i.cz` as a compute node cn202. +The DGX-2 machine is integrated into [Barbora cluster][3]. +The DGX-2 machine can be accessed from Barbora login nodes `barbora.it4i.cz` through the Barbora scheduler queue qdgx as a compute node cn202. ## Storage @@ -32,3 +33,4 @@ For more information on accessing PROJECT, its quotas, etc., see the [PROJECT Da [1]: ../../barbora/storage/#home-file-system [2]: ../../storage/project-storage +[3]: ../../barbora/introduction diff --git a/docs.it4i/dgx2/job_execution.md b/docs.it4i/dgx2/job_execution.md index 027bca20015707d5c8c0ddbb455d3b7cb019e062..849cd7469d8762ff740d68e08c8a8a6e22c22597 100644 --- a/docs.it4i/dgx2/job_execution.md +++ b/docs.it4i/dgx2/job_execution.md @@ -2,38 +2,24 @@ To run a job, computational resources of DGX-2 must be allocated. -## Resources Allocation Policy - -The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue. The queue provides prioritized and exclusive access to computational resources. - -The queue for the DGX-2 machine is called **qdgx**. - -!!! note - The qdgx queue is configured to run one job and accept one job in a queue per user with the maximum walltime of a job being **48** hours. - -## Job Submission and Execution - -The `qsub` submits the job into the queue. The command creates a request to the PBS Job manager for allocation of specified resources. The resources will be allocated when available, subject to allocation policies and constraints. After the resources are allocated, the jobscript or interactive shell is executed on the allocated node. - -### Job Submission +The DGX-2 machine is integrated to and accessible through Barbora cluster, the queue for the DGX-2 machine is called **qdgx**. When allocating computational resources for the job, specify: -1. a queue for your job (the default is **qdgx**); -1. the maximum wall time allocated to your calculation (default is **4 hour**, maximum is **48 hour**); -1. a jobscript or interactive switch. - -!!! info - You can access the DGX PBS scheduler by loading the "DGX-2" module. +1. your Project ID +1. 
a queue for your job - **qdgx**; +1. the maximum time allocated to your calculation (default is **4 hour**, maximum is **48 hour**); +1. a jobscript if batch processing is intended. -Submit the job using the `qsub` command: +Submit the job using the `sbatch` (for batch processing) or `salloc` (for interactive session) command: **Example** ```console -[kru0052@login2.barbora ~]$ qsub -q qdgx -l walltime=02:00:00 -I -qsub: waiting for job 258.dgx to start -qsub: job 258.dgx ready +[kru0052@login2.barbora ~]$ salloc -A PROJECT-ID -p qdgx --time=02:00:00 +salloc: Granted job allocation 36631 +salloc: Waiting for resource configuration +salloc: Nodes cn202 are ready for job kru0052@cn202:~$ nvidia-smi Wed Jun 16 07:46:32 2021 @@ -95,7 +81,7 @@ kru0052@cn202:~$ exit ``` !!! tip - Submit the interactive job using the `qsub -I ...` command. + Submit the interactive job using the `salloc` command. ### Job Execution @@ -110,9 +96,10 @@ to download the container via Apptainer/Singularity, see the example below: #### Example - Apptainer/Singularity Run Tensorflow ```console -[kru0052@login2.barbora ~]$ qsub -q qdgx -l walltime=01:00:00 -I -qsub: waiting for job 96.dgx to start -qsub: job 96.dgx ready +[kru0052@login2.barbora ~] $ salloc -A PROJECT-ID -p qdgx --time=02:00:00 +salloc: Granted job allocation 36633 +salloc: Waiting for resource configuration +salloc: Nodes cn202 are ready for job kru0052@cn202:~$ singularity shell docker://nvcr.io/nvidia/tensorflow:19.02-py3 Singularity tensorflow_19.02-py3.sif:~> diff --git a/docs.it4i/environment-and-modules.md b/docs.it4i/environment-and-modules.md index dfb2f13e5a8df0db99e37e7edeb3d623a585a6dc..c467bd70c014aa69df7423c8784c67ff652ea5ed 100644 --- a/docs.it4i/environment-and-modules.md +++ b/docs.it4i/environment-and-modules.md @@ -24,7 +24,7 @@ After logging in, you may want to configure the environment. Write your preferre export MODULEPATH=${MODULEPATH}:/home/$USER/.local/easybuild/modules/all # User specific aliases and functions -alias qs='qstat -a' +alias sq='squeue --me' # load default intel compilator !!! is not recommended !!! ml intel @@ -37,7 +37,7 @@ fi ``` !!! note - Do not run commands outputting to standard output (echo, module list, etc.) in .bashrc for non-interactive SSH sessions. It breaks the fundamental functionality (SCP, PBS) of your account. Take care for SSH session interactivity for such commands as stated in the previous example. + Do not run commands outputting to standard output (echo, module list, etc.) in .bashrc for non-interactive SSH sessions. It breaks the fundamental functionality (SCP) of your account. Take care for SSH session interactivity for such commands as stated in the previous example. ### Application Modules diff --git a/docs.it4i/general/accessing-the-clusters/graphical-user-interface/vnc.md b/docs.it4i/general/accessing-the-clusters/graphical-user-interface/vnc.md index 21267028e6de9b350623902677ead76b7b0993f4..3028f2ae610d99f0124a06ce4ca0d2704f490f59 100644 --- a/docs.it4i/general/accessing-the-clusters/graphical-user-interface/vnc.md +++ b/docs.it4i/general/accessing-the-clusters/graphical-user-interface/vnc.md @@ -227,10 +227,10 @@ Open a Terminal (_Applications -> System Tools -> Terminal_). Run all the follow Allow incoming X11 graphics from the compute nodes at the login node: -Get an interactive session on a compute node (for more detailed info [look here][4]). Forward X11 system using `X` option: +Get an interactive session on a compute node (for more detailed info [look here][4]). 
Forward X11 system using `--x11` option: ```console -$ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ncpus=36 +$ salloc -A PROJECT_ID -q qcpu --x11 ``` Test that the DISPLAY redirection into your VNC session works, by running an X11 application (e.g. XTerm, Intel Advisor, etc.) on the assigned compute node: @@ -249,10 +249,10 @@ For a [better performance][1] an SSH connection can be used. Open two Terminals (_Applications -> System Tools -> Terminal_) as described before. -Get an interactive session on a compute node (for more detailed info [look here][4]). Forward X11 system using `X` option: +Get an interactive session on a compute node (for more detailed info [look here][4]). Forward X11 system using `--x11` option: ```console -$ qsub -I -X -A PROJECT_ID -q qprod -l select=1:ncpus=36 +$ salloc -A PROJECT_ID -q qcpu --x11 ``` In the second terminal connect to the assigned node and run the X11 application diff --git a/docs.it4i/general/accessing-the-clusters/graphical-user-interface/x-window-system.md b/docs.it4i/general/accessing-the-clusters/graphical-user-interface/x-window-system.md index cc041f9ffa14072fb306ee883e89a951167218ed..787a1d0a62bd79e92fa0dae7c8d2ab075f97eabb 100644 --- a/docs.it4i/general/accessing-the-clusters/graphical-user-interface/x-window-system.md +++ b/docs.it4i/general/accessing-the-clusters/graphical-user-interface/x-window-system.md @@ -99,21 +99,21 @@ In this example, we activate the Intel programing environment tools and then sta ## GUI Applications on Compute Nodes -Allocate the compute nodes using the `-X` option on the `qsub` command: +Allocate the compute nodes using the `--x11` option on the `salloc` command: ```console -$ qsub -q qexp -l select=2:ncpus=24 -X -I +$ salloc -A PROJECT-ID -q qcpu_exp --x11 ``` -In this example, we allocate 2 nodes via qexp queue, interactively. We request X11 forwarding with the `-X` option. It will be possible to run the GUI enabled applications directly on the first compute node. +In this example, we allocate one node via qcpu_exp queue, interactively. We request X11 forwarding with the `--x11` option. It will be possible to run the GUI enabled applications directly on the first compute node. For **better performance**, log on the allocated compute node via SSH, using the `-X` option. ```console -$ ssh -X r24u35n680 +$ ssh -X cn245 ``` -In this example, we log on the r24u35n680 compute node, with the X11 forwarding enabled. +In this example, we log on the cn245 compute node, with the X11 forwarding enabled. ## Gnome GUI Environment @@ -143,7 +143,7 @@ xinit /usr/bin/ssh -XT -i .ssh/path_to_your_key yourname@cluster-namen.it4i.cz g ``` However, this method does not seem to work with recent Linux distributions and you will need to manually source -/etc/profile to properly set environment variables for PBS. +/etc/profile to properly set environment variables for Slurm. ### Gnome on Windows diff --git a/docs.it4i/general/accessing-the-clusters/graphical-user-interface/xorg.md b/docs.it4i/general/accessing-the-clusters/graphical-user-interface/xorg.md index 77c3bd00c8c1fc4987b1b056eca770d0bbfcd8d7..2c131e831ae260f3ac354bbab0faecbf9fecd822 100644 --- a/docs.it4i/general/accessing-the-clusters/graphical-user-interface/xorg.md +++ b/docs.it4i/general/accessing-the-clusters/graphical-user-interface/xorg.md @@ -28,7 +28,7 @@ Some applications (e.g. Paraview, Ensight, Blender, Ovito) require not only visu 1. 
Run interactive job in gnome terminal ```console - [loginX.karolina]$ qsub -q qnvidia -l select=1 -IX -A OPEN-XX-XX -l xorg=True + [loginX.karolina]$ salloc --A PROJECT-ID -q qgpu --x11 --comment use:xorg=true ``` 1. Run Xorg server @@ -82,7 +82,7 @@ Some applications (e.g. Paraview, Ensight, Blender, Ovito) require not only visu 1. Run job from terminal: ```console - [loginX.karolina]$ qsub -q qnvidia -l select=1 -A OPEN-XX-XX -l xorg=True ./run_eevee.sh + [loginX.karolina]$ sbatch -A PROJECT-ID -q qcpu --comment use:xorg=true ./run_eevee.sh ``` [1]: ./vnc.md diff --git a/docs.it4i/general/barbora-queues.md b/docs.it4i/general/barbora-queues.md index 990d7505c7238c9c014f3e6f67aa520637404d24..3ccfb5478e68df7ed67e1fdd369802a66dc8f01e 100644 --- a/docs.it4i/general/barbora-queues.md +++ b/docs.it4i/general/barbora-queues.md @@ -1,3 +1,6 @@ +!!!Warning + This page has not been fully updated yet. The page does not reflect the transition from PBS to Slurm. + # Barbora Queues Below is the list of queues available on the Barbora cluster: diff --git a/docs.it4i/general/capacity-computing.md b/docs.it4i/general/capacity-computing.md index d9c98d6f77090336e671fe5ef05bd37da9cdfbbc..f0d36bf30426ae7fa8389289a1c6d7ecb8336787 100644 --- a/docs.it4i/general/capacity-computing.md +++ b/docs.it4i/general/capacity-computing.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Capacity Computing ## Introduction diff --git a/docs.it4i/general/hyperqueue.md b/docs.it4i/general/hyperqueue.md index 0fbd8222da05f28c18771cdaab41722ad8bfa065..458ecfcb6d420842386347874e52adfae8b53480 100644 --- a/docs.it4i/general/hyperqueue.md +++ b/docs.it4i/general/hyperqueue.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # HyperQueue HyperQueue lets you build a computation plan consisting of a large amount of tasks and then execute it transparently over a system like SLURM/PBS. diff --git a/docs.it4i/general/job-arrays.md b/docs.it4i/general/job-arrays.md index def493cb11a8f5567b2020286cd17db3c9764cc9..3a438f0cbc3fae2e10bccbe43a6e25eb326d8bba 100644 --- a/docs.it4i/general/job-arrays.md +++ b/docs.it4i/general/job-arrays.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Job Arrays A job array is a compact representation of many jobs called subjobs. Subjobs share the same job script, and have the same values for all attributes and resources, with the following exceptions: diff --git a/docs.it4i/general/job-priority.md b/docs.it4i/general/job-priority.md index 2e9f7137a9f18ce8865b9a4768276693b49411e8..fa7252d5cd930ad71913672841cfa17b2dcb5016 100644 --- a/docs.it4i/general/job-priority.md +++ b/docs.it4i/general/job-priority.md @@ -1,3 +1,6 @@ +!!!Warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Job Scheduling ## Job Execution Priority diff --git a/docs.it4i/general/job-submission-and-execution.md b/docs.it4i/general/job-submission-and-execution.md deleted file mode 100644 index 071b794f902d0e09ede02e2030649ae8abb2f0ce..0000000000000000000000000000000000000000 --- a/docs.it4i/general/job-submission-and-execution.md +++ /dev/null @@ -1,458 +0,0 @@ -# Job Submission and Execution - -## Job Submission - -When allocating computational resources for the job, specify: - -1. a suitable queue for your job (the default is qprod) -1. 
the number of computational nodes (required) -1. the number of cores per node (not required) -1. the maximum wall time allocated to your calculation, note that jobs exceeding the maximum wall time will be killed -1. your Project ID -1. a Jobscript or interactive switch - -Submit the job using the `qsub` command: - -```console -$ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] jobscript -``` - -The `qsub` command submits the job to the queue, i.e. it creates a request to the PBS Job manager for allocation of specified resources. The resources will be allocated when available, subject to the above described policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.** - -!!! note - `ncpus=y` is usually not required, because the smallest allocation unit is an entire node. The exception are corner cases for `qviz` and `qfat` on Karolina. - -### Job Submission Examples - -```console -$ qsub -A OPEN-0-0 -q qprod -l select=64,walltime=03:00:00 ./myjob -``` - -In this example, we allocate 64 nodes, 36 cores per node, for 3 hours. We allocate these resources via the `qprod` queue, consumed resources will be accounted to the project identified by Project ID `OPEN-0-0`. The jobscript `myjob` will be executed on the first node in the allocation. - -```console -$ qsub -q qexp -l select=4 -I -``` - -In this example, we allocate 4 nodes, 36 cores per node, for 1 hour. We allocate these resources via the `qexp` queue. The resources will be available interactively. - -```console -$ qsub -A OPEN-0-0 -q qnvidia -l select=10 ./myjob -``` - -In this example, we allocate 10 NVIDIA accelerated nodes, 24 cores per node, for 24 hours. We allocate these resources via the `qnvidia` queue. The jobscript `myjob` will be executed on the first node in the allocation. - -```console -$ qsub -A OPEN-0-0 -q qfree -l select=10 ./myjob -``` - -In this example, we allocate 10 nodes, 24 cores per node, for 12 hours. We allocate these resources via the `qfree` queue. It is not required that the project `OPEN-0-0` has any available resources left. Consumed resources are still accounted for. The jobscript `myjob` will be executed on the first node in the allocation. - -All `qsub` options may be [saved directly into the jobscript][1]. In such cases, it is not necessary to specify any options for `qsub`. - -```console -$ qsub ./myjob -``` - -By default, the PBS batch system sends an email only when the job is aborted. Disabling mail events completely can be done as follows: - -```console -$ qsub -m n -``` - -#### Dependency Job Submission - -To submit dependent jobs in sequence, use the `depend` function of `qsub`. - -First submit the first job in a standard manner: - -```console -$ qsub -A OPEN-0-0 -q qprod -l select=64,walltime=02:00:00 ./firstjob -123456[].isrv1 -``` - -Then submit the second job using the `depend` function: - -```console -$ qsub -W depend=afterok:123456 ./secondjob -``` - -Both jobs will be queued, but the second job won't start until the first job has finished successfully. - -Below is the list of arguments that can be used with `-W depend=dependency:jobid`: - -| Argument | Description | -| ----------- | --------------------------------------------------------------- | -| after | This job is scheduled after `jobid` begins execution. | -| afterok | This job is scheduled after `jobid` finishes successfully. | -| afternotok | This job is scheduled after `jobid` finishes unsucessfully. 
| -| afterany | This job is scheduled after `jobid` finishes in any state. | -| before | This job must begin execution before `jobid` is scheduled. | -| beforeok | This job must finish successfully before `jobid` begins. | -| beforenotok | This job must finish unsuccessfully before `jobid` begins. | -| beforeany | This job must finish in any state before `jobid` begins. | - -### Useful Tricks - -All `qsub` options may be [saved directly into the jobscript][1]. In such a case, no options to `qsub` are needed. - -```console -$ qsub ./myjob -``` - -By default, the PBS batch system sends an email only when the job is aborted. Disabling mail events completely can be done like this: - -```console -$ qsub -m n -``` - -<!--- NOT IMPLEMENTED ON KAROLINA YET - -## Advanced Job Placement - -### Salomon - Placement by Network Location - -The network location of allocated nodes in the [InfiniBand network][3] influences efficiency of network communication between nodes of job. Nodes on the same InfiniBand switch communicate faster with lower latency than distant nodes. To improve communication efficiency of jobs, PBS scheduler on Salomon is configured to allocate nodes (from currently available resources), which are as close as possible in the network topology. - -For communication intensive jobs, it is possible to set stricter requirement - to require nodes directly connected to the same InfiniBand switch or to require nodes located in the same dimension group of the InfiniBand network. - -### Salomon - Placement by InfiniBand Switch - -Nodes directly connected to the same InfiniBand switch can communicate most efficiently. Using the same switch prevents hops in the network and provides for unbiased, most efficient network communication. There are 9 nodes directly connected to every InfiniBand switch. - -!!! note - We recommend allocating compute nodes of a single switch when the best possible computational network performance is required to run job efficiently. - -Nodes directly connected to the one InfiniBand switch can be allocated using node grouping on the PBS resource attribute `switch`. - -In this example, we request all 9 nodes directly connected to the same switch using node grouping placement. - -```console -$ qsub -A OPEN-0-0 -q qprod -l select=9 -l place=group=switch ./myjob -``` - ---> - -## Advanced Job Handling - -### Selecting Turbo Boost Off - -!!! note - For Barbora only. - -Intel Turbo Boost Technology is on by default. We strongly recommend keeping the default. - -If necessary (such as in the case of benchmarking), you can disable Turbo for all nodes of the job by using the PBS resource attribute `cpu_turbo_boost`: - -```console -$ qsub -A OPEN-0-0 -q qprod -l select=4 -l cpu_turbo_boost=0 -I -``` - -More information about the Intel Turbo Boost can be found in the TurboBoost section - -### Advanced Examples - -In the following example, we select an allocation for benchmarking a very special and demanding MPI program. We request Turbo off, and 2 full chassis of compute nodes (nodes sharing the same IB switches) for 30 minutes: - -```console -$ qsub -A OPEN-0-0 -q qprod - -l select=18:ibswitch=isw10:mpiprocs=1:ompthreads=16+18:ibswitch=isw20:mpiprocs=16:ompthreads=1 - -l cpu_turbo_boost=0,walltime=00:30:00 - -N Benchmark ./mybenchmark -``` - -The MPI processes will be distributed differently on the nodes connected to the two switches. On the isw10 nodes, we will run 1 MPI process per node with 16 threads per process, on isw20 nodes we will run 16 plain MPI processes. 
- -Although this example is somewhat artificial, it demonstrates the flexibility of the qsub command options. - -## Job Management - -!!! note - Check the status of your jobs using the `qstat` and `check-pbs-jobs` commands - -```console -$ qstat -a -$ qstat -a -u username -$ qstat -an -u username -$ qstat -f 12345.srv11 -``` - -Example: - -```console -$ qstat -a - -srv11: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time ---------------- -------- -- |---|---| ------ --- --- ------ ----- - ----- -16287.srv11 user1 qlong job1 6183 4 64 -- 144:0 R 38:25 -16468.srv11 user1 qlong job2 8060 4 64 -- 144:0 R 17:44 -16547.srv11 user2 qprod job3x 13516 2 32 -- 48:00 R 00:58 -``` - -In this example user1 and user2 are running jobs named `job1`, `job2`, and `job3x`. `job1` and `job2` are using 4 nodes, 128 cores per node each. `job1` has already run for 38 hours and 25 minutes, and `job2` for 17 hours 44 minutes. So `job1`, for example, has already consumed `64 x 38.41 = 2,458.6` core-hours. `job3x` has already consumed `32 x 0.96 = 30.93` core-hours. These consumed core-hours will be [converted to node-hours][10] and accounted for on the respective project accounts, regardless of whether the allocated cores were actually used for computations. - -The following commands allow you to check the status of your jobs using the `check-pbs-jobs` command, check for the presence of user's PBS jobs' processes on execution hosts, display load and processes, display job standard and error output, and continuously display (`tail -f`) job standard or error output. - -```console -$ check-pbs-jobs --check-all -$ check-pbs-jobs --print-load --print-processes -$ check-pbs-jobs --print-job-out --print-job-err -$ check-pbs-jobs --jobid JOBID --check-all --print-all -$ check-pbs-jobs --jobid JOBID --tailf-job-out -``` - -Examples: - -```console -$ check-pbs-jobs --check-all -JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 -Check session id: OK -Check processes -cn164: OK -cn165: No process -``` - -In this example we see that job `35141.dm2` is not currently running any processes on the allocated node cn165, which may indicate an execution error: - -```console -$ check-pbs-jobs --print-load --print-processes -JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 -Print load -cn164: LOAD: 16.01, 16.01, 16.00 -cn165: LOAD: 0.01, 0.00, 0.01 -Print processes - %CPU CMD -cn164: 0.0 -bash -cn164: 0.0 /bin/bash /var/spool/PBS/mom_priv/jobs/35141.dm2.SC -cn164: 99.7 run-task -... -``` - -In this example, we see that job `35141.dm2` is currently running a process run-task on node `cn164`, using one thread only, while node `cn165` is empty, which may indicate an execution error. - -```console -$ check-pbs-jobs --jobid 35141.dm2 --print-job-out -JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 -Print job standard output: -======================== Job start ========================== -Started at : Fri Aug 30 02:47:53 CEST 2013 -Script name : script -Run loop 1 -Run loop 2 -Run loop 3 -``` - -In this example, we see the actual output (some iteration loops) of the job `35141.dm2`. - -!!! 
note - Manage your queued or running jobs, using the `qhold`, `qrls`, `qdel`, `qsig`, or `qalter` commands - -You may release your allocation at any time, using the `qdel` command - -```console -$ qdel 12345.srv11 -``` - -You may kill a running job by force, using the `qsig` command - -```console -$ qsig -s 9 12345.srv11 -``` - -Learn more by reading the PBS man page - -```console -$ man pbs_professional -``` - -## Job Execution - -### Jobscript - -!!! note - Prepare the jobscript to run batch jobs in the PBS queue system - -The Jobscript is a user made script controlling a sequence of commands for executing the calculation. It is often written in bash, though other scripts may be used as well. The jobscript is supplied to the PBS `qsub` command as an argument, and is executed by the PBS Professional workload manager. - -!!! note - The jobscript or interactive shell is executed on first of the allocated nodes. - -```console -$ qsub -q qexp -l select=4 -N Name0 ./myjob -$ qstat -n -u username - -srv11: - Req'd Req'd Elap -Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time ---------------- -------- -- |---|---| ------ --- --- ------ ----- - ----- -15209.srv11 username qexp Name0 5530 4 128 -- 01:00 R 00:00 - cn17/0*32+cn108/0*32+cn109/0*32+cn110/0*32 -``` - -In this example, the nodes `cn17`, `cn108`, `cn109`, and `cn110` were allocated for 1 hour via the qexp queue. The `myjob` jobscript will be executed on the node `cn17`, while the nodes `cn108`, `cn109`, and `cn110` are available for use as well. - -The jobscript or interactive shell is by default executed in the `/home` directory: - -```console -$ qsub -q qexp -l select=4 -I -qsub: waiting for job 15210.srv11 to start -qsub: job 15210.srv11 ready - -$ pwd -/home/username -``` - -In this example, 4 nodes were allocated interactively for 1 hour via the `qexp` queue. The interactive shell is executed in the `/home` directory. - -!!! note - All nodes within the allocation may be accessed via SSH. Unallocated nodes are not accessible to the user. - -The allocated nodes are accessible via SSH from login nodes. The nodes may access each other via SSH as well. - -Calculations on allocated nodes may be executed remotely via the MPI, SSH, pdsh, or clush. You may find out which nodes belong to the allocation by reading the `$PBS_NODEFILE` file - -```console -$ qsub -q qexp -l select=4 -I -qsub: waiting for job 15210.srv11 to start -qsub: job 15210.srv11 ready - -$ pwd -/home/username - -$ sort -u $PBS_NODEFILE -cn17.bullx -cn108.bullx -cn109.bullx -cn110.bullx - -$ pdsh -w cn17,cn[108-110] hostname -cn17: cn17 -cn108: cn108 -cn109: cn109 -cn110: cn110 -``` - -In this example, the hostname program is executed via `pdsh` from the interactive shell. The execution runs on all four allocated nodes. The same result would be achieved if the `pdsh` were called from any of the allocated nodes or from the login nodes. - -### Example Jobscript for MPI Calculation - -!!! note - Production jobs must use the /scratch directory for I/O - -The recommended way to run production jobs is to change to the `/scratch` directory early in the jobscript, copy all inputs to `/scratch`, execute the calculations, and copy outputs to the `/home` directory. - -```bash -#!/bin/bash - -cd $PBS_O_WORKDIR - -SCRDIR=/scratch/project/open-00-00/${USER}/myjob -mkdir -p $SCRDIR - -# change to scratch directory, exit on failure -cd $SCRDIR || exit - -# copy input file to scratch -cp $PBS_O_WORKDIR/input . -cp $PBS_O_WORKDIR/mympiprog.x . 
- -# load the MPI module -# (Always specify the module's name and version in your script; -# for the reason, see https://docs.it4i.cz/software/modules/lmod/#loading-modules.) -ml OpenMPI/4.1.1-GCC-10.2.0-Java-1.8.0_221 - -# execute the calculation -mpirun -pernode ./mympiprog.x - -# copy output file to home -cp output $PBS_O_WORKDIR/. - -#exit -exit -``` - -In this example, a directory in `/home` holds the input file input and the `mympiprog.x` executable. We create the `myjob` directory on the `/scratch` filesystem, copy input and executable files from the `/home` directory where the `qsub` was invoked (`$PBS_O_WORKDIR`) to `/scratch`, execute the MPI program `mympiprog.x` and copy the output file back to the `/home` directory. `mympiprog.x` is executed as one process per node, on all allocated nodes. - -!!! note - Consider preloading inputs and executables onto [shared scratch][6] memory before the calculation starts. - -In some cases, it may be impractical to copy the inputs to the `/scratch` memory and the outputs to the `/home` directory. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such cases, it is the users' responsibility to preload the input files on the shared `/scratch` memory before the job submission, and retrieve the outputs manually after all calculations are finished. - -!!! note - Store the `qsub` options within the jobscript. Use the `mpiprocs` and `ompthreads` qsub options to control the MPI job execution. - -### Example Jobscript for MPI Calculation With Preloaded Inputs - -Example jobscript for an MPI job with preloaded inputs and executables, options for `qsub` are stored within the script: - -```bash -#!/bin/bash -#PBS -q qprod -#PBS -N MYJOB -#PBS -l select=100:mpiprocs=1:ompthreads=16 -#PBS -A OPEN-00-00 - -# job is run using project resources; here ${PBS_ACCOUNT,,} translates to "open-00-00" -SCRDIR=/scratch/project/${PBS_ACCOUNT,,}/${USER}/myjob - -# change to scratch directory, exit on failure -cd $SCRDIR || exit - -# load the MPI module -# (Always specify the module's name and version in your script; -# for the reason, see https://docs.it4i.cz/software/modules/lmod/#loading-modules.) -ml OpenMPI/4.1.1-GCC-10.2.0-Java-1.8.0_221 - -# execute the calculation -mpirun ./mympiprog.x - -#exit -exit -``` - -In this example, input and executable files are assumed to be preloaded manually in the `/scratch/project/open-00-00/$USER/myjob` directory. Because we used the `qprod` queue, we had to specify which project's resources we want to use, and our `PBS_ACCOUNT` variable will be set accordingly (OPEN-00-00). `${PBS_ACCOUNT,,}` uses one of the bash's built-in functions to translate it into lower case. - -Note the `mpiprocs` and `ompthreads` qsub options controlling the behavior of the MPI execution. `mympiprog.x` is executed as one process per node, on all 100 allocated nodes. If `mympiprog.x` implements OpenMP threads, it will run 16 threads per node. - -### Example Jobscript for Single Node Calculation - -!!! note - The local scratch directory is often useful for single node jobs. Local scratch memory will be deleted immediately after the job ends. - -Example jobscript for single node calculation, using [local scratch][6] memory on the node: - -```bash -#!/bin/bash - -# change to local scratch directory -cd /lscratch/$PBS_JOBID || exit - -# copy input file to scratch -cp $PBS_O_WORKDIR/input . -cp $PBS_O_WORKDIR/myprog.x . 
- -# execute the calculation -./myprog.x - -# copy output file to home -cp output $PBS_O_WORKDIR/. - -#exit -exit -``` - -In this example, a directory in `/home` holds the input file input and the executable `myprog.x`. We copy input and executable files from the `/home` directory where the `qsub` was invoked (`$PBS_O_WORKDIR`) to the local `/scratch` memory `/lscratch/$PBS_JOBID`, execute `myprog.x` and copy the output file back to the `/home directory`. `myprog.x` runs on one node only and may use threads. - -### Other Jobscript Examples - -Further jobscript examples may be found in the software section and the [Capacity computing][9] section. - -[1]: #example-jobscript-for-mpi-calculation-with-preloaded-inputs -[2]: resources-allocation-policy.md -[3]: ../salomon/network.md -[5]: ../salomon/7d-enhanced-hypercube.md -[6]: ../salomon/storage.md -[9]: capacity-computing.md -[10]: resources-allocation-policy.md#resource-accounting-policy diff --git a/docs.it4i/general/job-submission-and-execution.md b/docs.it4i/general/job-submission-and-execution.md new file mode 120000 index 0000000000000000000000000000000000000000..752343b84fa6a73dd3f78226f5d4bf5f1b49465b --- /dev/null +++ b/docs.it4i/general/job-submission-and-execution.md @@ -0,0 +1 @@ +slurm-job-submission-and-execution.md \ No newline at end of file diff --git a/docs.it4i/general/karolina-mpi.md b/docs.it4i/general/karolina-mpi.md index 685dbe6a6326252d482ec3b7a2cc17fd945a25bb..1002b798ecdea5ad1b0b8862f9518d7676628543 100644 --- a/docs.it4i/general/karolina-mpi.md +++ b/docs.it4i/general/karolina-mpi.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Parallel Runs Setting on Karolina Important aspect of each parallel application is correct placement of MPI processes diff --git a/docs.it4i/general/karolina-queues.md b/docs.it4i/general/karolina-queues.md index 39ecdb259b56aa1148eaf357f55347b6445c341e..421fc83257099c5c0eabd1b550a3491957ec64cc 100644 --- a/docs.it4i/general/karolina-queues.md +++ b/docs.it4i/general/karolina-queues.md @@ -1,3 +1,6 @@ +!!!Warning + This page has not been fully updated yet. The page does not reflect the transition from PBS to Slurm. 
+ # Karolina Queues Below is the list of queues available on the Karolina cluster: @@ -20,12 +23,3 @@ Below is the list of queues available on the Karolina cluster: | **qviz** | yes | none required | 2 nodes (with NVIDIA® Quadro RTX™ 6000) | 8 | 0 | no | 1 / 8h | | **qfat** | yes | > 0 | 1 (sdf1) | 24 | 0 | yes | 24 / 48h | -## Legacy Queues - -| Queue | Active project | Project resources | Nodes | Min ncpus | Priority | Authorization | Walltime (default/max) | -| ---------------- | -------------- | -------------------- | ------------------------------------------------------------- | --------- | -------- | ------------- | ----------------------- | -| **qfree** | yes | < 150% of allocation | 756 nodes<br>max 4 nodes per job | 128 | -100 | no | 12 / 12h | -| **qexp** | no | none required | 756 nodes<br>max 2 nodes per job | 128 | 150 | no | 1 / 1h | -| **qprod** | yes | > 0 | 756 nodes | 128 | 0 | no | 24 / 48h | -| **qlong** | yes | > 0 | 200 nodes<br>max 20 nodes per job, only non-accelerated nodes allowed | 128 | 0 | no | 72 / 144h | -| **qnvidia** | yes | > 0 | 72 nodes | 128 | 0 | yes | 24 / 48h | diff --git a/docs.it4i/general/pbs-job-submission-and-execution.md b/docs.it4i/general/pbs-job-submission-and-execution.md new file mode 100644 index 0000000000000000000000000000000000000000..f682a621fd07eaa81f558e7dcb2e304a11e5fec6 --- /dev/null +++ b/docs.it4i/general/pbs-job-submission-and-execution.md @@ -0,0 +1,461 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + +# Job Submission and Execution + +## Job Submission + +When allocating computational resources for the job, specify: + +1. a suitable queue for your job (the default is qprod) +1. the number of computational nodes (required) +1. the number of cores per node (not required) +1. the maximum wall time allocated to your calculation, note that jobs exceeding the maximum wall time will be killed +1. your Project ID +1. a Jobscript or interactive switch + +Submit the job using the `qsub` command: + +```console +$ qsub -A Project_ID -q queue -l select=x:ncpus=y,walltime=[[hh:]mm:]ss[.ms] jobscript +``` + +The `qsub` command submits the job to the queue, i.e. it creates a request to the PBS Job manager for allocation of specified resources. The resources will be allocated when available, subject to the above described policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.** + +!!! note + `ncpus=y` is usually not required, because the smallest allocation unit is an entire node. The exception are corner cases for `qviz` and `qfat` on Karolina. + +### Job Submission Examples + +```console +$ qsub -A OPEN-0-0 -q qprod -l select=64,walltime=03:00:00 ./myjob +``` + +In this example, we allocate 64 nodes, 36 cores per node, for 3 hours. We allocate these resources via the `qprod` queue, consumed resources will be accounted to the project identified by Project ID `OPEN-0-0`. The jobscript `myjob` will be executed on the first node in the allocation. + +```console +$ qsub -q qexp -l select=4 -I +``` + +In this example, we allocate 4 nodes, 36 cores per node, for 1 hour. We allocate these resources via the `qexp` queue. The resources will be available interactively. + +```console +$ qsub -A OPEN-0-0 -q qnvidia -l select=10 ./myjob +``` + +In this example, we allocate 10 NVIDIA accelerated nodes, 24 cores per node, for 24 hours. We allocate these resources via the `qnvidia` queue. 
The jobscript `myjob` will be executed on the first node in the allocation. + +```console +$ qsub -A OPEN-0-0 -q qfree -l select=10 ./myjob +``` + +In this example, we allocate 10 nodes, 24 cores per node, for 12 hours. We allocate these resources via the `qfree` queue. It is not required that the project `OPEN-0-0` has any available resources left. Consumed resources are still accounted for. The jobscript `myjob` will be executed on the first node in the allocation. + +All `qsub` options may be [saved directly into the jobscript][1]. In such cases, it is not necessary to specify any options for `qsub`. + +```console +$ qsub ./myjob +``` + +By default, the PBS batch system sends an email only when the job is aborted. Disabling mail events completely can be done as follows: + +```console +$ qsub -m n +``` + +#### Dependency Job Submission + +To submit dependent jobs in sequence, use the `depend` function of `qsub`. + +First submit the first job in a standard manner: + +```console +$ qsub -A OPEN-0-0 -q qprod -l select=64,walltime=02:00:00 ./firstjob +123456[].isrv1 +``` + +Then submit the second job using the `depend` function: + +```console +$ qsub -W depend=afterok:123456 ./secondjob +``` + +Both jobs will be queued, but the second job won't start until the first job has finished successfully. + +Below is the list of arguments that can be used with `-W depend=dependency:jobid`: + +| Argument | Description | +| ----------- | --------------------------------------------------------------- | +| after | This job is scheduled after `jobid` begins execution. | +| afterok | This job is scheduled after `jobid` finishes successfully. | +| afternotok | This job is scheduled after `jobid` finishes unsucessfully. | +| afterany | This job is scheduled after `jobid` finishes in any state. | +| before | This job must begin execution before `jobid` is scheduled. | +| beforeok | This job must finish successfully before `jobid` begins. | +| beforenotok | This job must finish unsuccessfully before `jobid` begins. | +| beforeany | This job must finish in any state before `jobid` begins. | + +### Useful Tricks + +All `qsub` options may be [saved directly into the jobscript][1]. In such a case, no options to `qsub` are needed. + +```console +$ qsub ./myjob +``` + +By default, the PBS batch system sends an email only when the job is aborted. Disabling mail events completely can be done like this: + +```console +$ qsub -m n +``` + +<!--- NOT IMPLEMENTED ON KAROLINA YET + +## Advanced Job Placement + +### Salomon - Placement by Network Location + +The network location of allocated nodes in the [InfiniBand network][3] influences efficiency of network communication between nodes of job. Nodes on the same InfiniBand switch communicate faster with lower latency than distant nodes. To improve communication efficiency of jobs, PBS scheduler on Salomon is configured to allocate nodes (from currently available resources), which are as close as possible in the network topology. + +For communication intensive jobs, it is possible to set stricter requirement - to require nodes directly connected to the same InfiniBand switch or to require nodes located in the same dimension group of the InfiniBand network. + +### Salomon - Placement by InfiniBand Switch + +Nodes directly connected to the same InfiniBand switch can communicate most efficiently. Using the same switch prevents hops in the network and provides for unbiased, most efficient network communication. There are 9 nodes directly connected to every InfiniBand switch. 
+ +!!! note + We recommend allocating compute nodes of a single switch when the best possible computational network performance is required to run job efficiently. + +Nodes directly connected to the one InfiniBand switch can be allocated using node grouping on the PBS resource attribute `switch`. + +In this example, we request all 9 nodes directly connected to the same switch using node grouping placement. + +```console +$ qsub -A OPEN-0-0 -q qprod -l select=9 -l place=group=switch ./myjob +``` + +--> + +## Advanced Job Handling + +### Selecting Turbo Boost Off + +!!! note + For Barbora only. + +Intel Turbo Boost Technology is on by default. We strongly recommend keeping the default. + +If necessary (such as in the case of benchmarking), you can disable Turbo for all nodes of the job by using the PBS resource attribute `cpu_turbo_boost`: + +```console +$ qsub -A OPEN-0-0 -q qprod -l select=4 -l cpu_turbo_boost=0 -I +``` + +More information about the Intel Turbo Boost can be found in the TurboBoost section + +### Advanced Examples + +In the following example, we select an allocation for benchmarking a very special and demanding MPI program. We request Turbo off, and 2 full chassis of compute nodes (nodes sharing the same IB switches) for 30 minutes: + +```console +$ qsub -A OPEN-0-0 -q qprod + -l select=18:ibswitch=isw10:mpiprocs=1:ompthreads=16+18:ibswitch=isw20:mpiprocs=16:ompthreads=1 + -l cpu_turbo_boost=0,walltime=00:30:00 + -N Benchmark ./mybenchmark +``` + +The MPI processes will be distributed differently on the nodes connected to the two switches. On the isw10 nodes, we will run 1 MPI process per node with 16 threads per process, on isw20 nodes we will run 16 plain MPI processes. + +Although this example is somewhat artificial, it demonstrates the flexibility of the qsub command options. + +## Job Management + +!!! note + Check the status of your jobs using the `qstat` and `check-pbs-jobs` commands + +```console +$ qstat -a +$ qstat -a -u username +$ qstat -an -u username +$ qstat -f 12345.srv11 +``` + +Example: + +```console +$ qstat -a + +srv11: + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time +--------------- -------- -- |---|---| ------ --- --- ------ ----- - ----- +16287.srv11 user1 qlong job1 6183 4 64 -- 144:0 R 38:25 +16468.srv11 user1 qlong job2 8060 4 64 -- 144:0 R 17:44 +16547.srv11 user2 qprod job3x 13516 2 32 -- 48:00 R 00:58 +``` + +In this example user1 and user2 are running jobs named `job1`, `job2`, and `job3x`. `job1` and `job2` are using 4 nodes, 128 cores per node each. `job1` has already run for 38 hours and 25 minutes, and `job2` for 17 hours 44 minutes. So `job1`, for example, has already consumed `64 x 38.41 = 2,458.6` core-hours. `job3x` has already consumed `32 x 0.96 = 30.93` core-hours. These consumed core-hours will be [converted to node-hours][10] and accounted for on the respective project accounts, regardless of whether the allocated cores were actually used for computations. + +The following commands allow you to check the status of your jobs using the `check-pbs-jobs` command, check for the presence of user's PBS jobs' processes on execution hosts, display load and processes, display job standard and error output, and continuously display (`tail -f`) job standard or error output. 
+ +```console +$ check-pbs-jobs --check-all +$ check-pbs-jobs --print-load --print-processes +$ check-pbs-jobs --print-job-out --print-job-err +$ check-pbs-jobs --jobid JOBID --check-all --print-all +$ check-pbs-jobs --jobid JOBID --tailf-job-out +``` + +Examples: + +```console +$ check-pbs-jobs --check-all +JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 +Check session id: OK +Check processes +cn164: OK +cn165: No process +``` + +In this example we see that job `35141.dm2` is not currently running any processes on the allocated node cn165, which may indicate an execution error: + +```console +$ check-pbs-jobs --print-load --print-processes +JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 +Print load +cn164: LOAD: 16.01, 16.01, 16.00 +cn165: LOAD: 0.01, 0.00, 0.01 +Print processes + %CPU CMD +cn164: 0.0 -bash +cn164: 0.0 /bin/bash /var/spool/PBS/mom_priv/jobs/35141.dm2.SC +cn164: 99.7 run-task +... +``` + +In this example, we see that job `35141.dm2` is currently running a process run-task on node `cn164`, using one thread only, while node `cn165` is empty, which may indicate an execution error. + +```console +$ check-pbs-jobs --jobid 35141.dm2 --print-job-out +JOB 35141.dm2, session_id 71995, user user2, nodes cn164,cn165 +Print job standard output: +======================== Job start ========================== +Started at : Fri Aug 30 02:47:53 CEST 2013 +Script name : script +Run loop 1 +Run loop 2 +Run loop 3 +``` + +In this example, we see the actual output (some iteration loops) of the job `35141.dm2`. + +!!! note + Manage your queued or running jobs, using the `qhold`, `qrls`, `qdel`, `qsig`, or `qalter` commands + +You may release your allocation at any time, using the `qdel` command + +```console +$ qdel 12345.srv11 +``` + +You may kill a running job by force, using the `qsig` command + +```console +$ qsig -s 9 12345.srv11 +``` + +Learn more by reading the PBS man page + +```console +$ man pbs_professional +``` + +## Job Execution + +### Jobscript + +!!! note + Prepare the jobscript to run batch jobs in the PBS queue system + +The Jobscript is a user made script controlling a sequence of commands for executing the calculation. It is often written in bash, though other scripts may be used as well. The jobscript is supplied to the PBS `qsub` command as an argument, and is executed by the PBS Professional workload manager. + +!!! note + The jobscript or interactive shell is executed on first of the allocated nodes. + +```console +$ qsub -q qexp -l select=4 -N Name0 ./myjob +$ qstat -n -u username + +srv11: + Req'd Req'd Elap +Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time +--------------- -------- -- |---|---| ------ --- --- ------ ----- - ----- +15209.srv11 username qexp Name0 5530 4 128 -- 01:00 R 00:00 + cn17/0*32+cn108/0*32+cn109/0*32+cn110/0*32 +``` + +In this example, the nodes `cn17`, `cn108`, `cn109`, and `cn110` were allocated for 1 hour via the qexp queue. The `myjob` jobscript will be executed on the node `cn17`, while the nodes `cn108`, `cn109`, and `cn110` are available for use as well. + +The jobscript or interactive shell is by default executed in the `/home` directory: + +```console +$ qsub -q qexp -l select=4 -I +qsub: waiting for job 15210.srv11 to start +qsub: job 15210.srv11 ready + +$ pwd +/home/username +``` + +In this example, 4 nodes were allocated interactively for 1 hour via the `qexp` queue. The interactive shell is executed in the `/home` directory. + +!!! 
note + All nodes within the allocation may be accessed via SSH. Unallocated nodes are not accessible to the user. + +The allocated nodes are accessible via SSH from login nodes. The nodes may access each other via SSH as well. + +Calculations on allocated nodes may be executed remotely via the MPI, SSH, pdsh, or clush. You may find out which nodes belong to the allocation by reading the `$PBS_NODEFILE` file + +```console +$ qsub -q qexp -l select=4 -I +qsub: waiting for job 15210.srv11 to start +qsub: job 15210.srv11 ready + +$ pwd +/home/username + +$ sort -u $PBS_NODEFILE +cn17.bullx +cn108.bullx +cn109.bullx +cn110.bullx + +$ pdsh -w cn17,cn[108-110] hostname +cn17: cn17 +cn108: cn108 +cn109: cn109 +cn110: cn110 +``` + +In this example, the hostname program is executed via `pdsh` from the interactive shell. The execution runs on all four allocated nodes. The same result would be achieved if the `pdsh` were called from any of the allocated nodes or from the login nodes. + +### Example Jobscript for MPI Calculation + +!!! note + Production jobs must use the /scratch directory for I/O + +The recommended way to run production jobs is to change to the `/scratch` directory early in the jobscript, copy all inputs to `/scratch`, execute the calculations, and copy outputs to the `/home` directory. + +```bash +#!/bin/bash + +cd $PBS_O_WORKDIR + +SCRDIR=/scratch/project/open-00-00/${USER}/myjob +mkdir -p $SCRDIR + +# change to scratch directory, exit on failure +cd $SCRDIR || exit + +# copy input file to scratch +cp $PBS_O_WORKDIR/input . +cp $PBS_O_WORKDIR/mympiprog.x . + +# load the MPI module +# (Always specify the module's name and version in your script; +# for the reason, see https://docs.it4i.cz/software/modules/lmod/#loading-modules.) +ml OpenMPI/4.1.1-GCC-10.2.0-Java-1.8.0_221 + +# execute the calculation +mpirun -pernode ./mympiprog.x + +# copy output file to home +cp output $PBS_O_WORKDIR/. + +#exit +exit +``` + +In this example, a directory in `/home` holds the input file input and the `mympiprog.x` executable. We create the `myjob` directory on the `/scratch` filesystem, copy input and executable files from the `/home` directory where the `qsub` was invoked (`$PBS_O_WORKDIR`) to `/scratch`, execute the MPI program `mympiprog.x` and copy the output file back to the `/home` directory. `mympiprog.x` is executed as one process per node, on all allocated nodes. + +!!! note + Consider preloading inputs and executables onto [shared scratch][6] memory before the calculation starts. + +In some cases, it may be impractical to copy the inputs to the `/scratch` memory and the outputs to the `/home` directory. This is especially true when very large input and output files are expected, or when the files should be reused by a subsequent calculation. In such cases, it is the users' responsibility to preload the input files on the shared `/scratch` memory before the job submission, and retrieve the outputs manually after all calculations are finished. + +!!! note + Store the `qsub` options within the jobscript. Use the `mpiprocs` and `ompthreads` qsub options to control the MPI job execution. 
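+
+For reference, the same `mpiprocs` and `ompthreads` options may also be passed directly on the `qsub` command line; the project ID, queue, node count, and walltime below are illustrative placeholders:
+
+```console
+$ qsub -A OPEN-00-00 -q qprod -l select=4:mpiprocs=1:ompthreads=16,walltime=01:00:00 ./myjob
+```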
+ +### Example Jobscript for MPI Calculation With Preloaded Inputs + +Example jobscript for an MPI job with preloaded inputs and executables, options for `qsub` are stored within the script: + +```bash +#!/bin/bash +#PBS -q qprod +#PBS -N MYJOB +#PBS -l select=100:mpiprocs=1:ompthreads=16 +#PBS -A OPEN-00-00 + +# job is run using project resources; here ${PBS_ACCOUNT,,} translates to "open-00-00" +SCRDIR=/scratch/project/${PBS_ACCOUNT,,}/${USER}/myjob + +# change to scratch directory, exit on failure +cd $SCRDIR || exit + +# load the MPI module +# (Always specify the module's name and version in your script; +# for the reason, see https://docs.it4i.cz/software/modules/lmod/#loading-modules.) +ml OpenMPI/4.1.1-GCC-10.2.0-Java-1.8.0_221 + +# execute the calculation +mpirun ./mympiprog.x + +#exit +exit +``` + +In this example, input and executable files are assumed to be preloaded manually in the `/scratch/project/open-00-00/$USER/myjob` directory. Because we used the `qprod` queue, we had to specify which project's resources we want to use, and our `PBS_ACCOUNT` variable will be set accordingly (OPEN-00-00). `${PBS_ACCOUNT,,}` uses one of the bash's built-in functions to translate it into lower case. + +Note the `mpiprocs` and `ompthreads` qsub options controlling the behavior of the MPI execution. `mympiprog.x` is executed as one process per node, on all 100 allocated nodes. If `mympiprog.x` implements OpenMP threads, it will run 16 threads per node. + +### Example Jobscript for Single Node Calculation + +!!! note + The local scratch directory is often useful for single node jobs. Local scratch memory will be deleted immediately after the job ends. + +Example jobscript for single node calculation, using [local scratch][6] memory on the node: + +```bash +#!/bin/bash + +# change to local scratch directory +cd /lscratch/$PBS_JOBID || exit + +# copy input file to scratch +cp $PBS_O_WORKDIR/input . +cp $PBS_O_WORKDIR/myprog.x . + +# execute the calculation +./myprog.x + +# copy output file to home +cp output $PBS_O_WORKDIR/. + +#exit +exit +``` + +In this example, a directory in `/home` holds the input file input and the executable `myprog.x`. We copy input and executable files from the `/home` directory where the `qsub` was invoked (`$PBS_O_WORKDIR`) to the local `/scratch` memory `/lscratch/$PBS_JOBID`, execute `myprog.x` and copy the output file back to the `/home directory`. `myprog.x` runs on one node only and may use threads. + +### Other Jobscript Examples + +Further jobscript examples may be found in the software section and the [Capacity computing][9] section. + +[1]: #example-jobscript-for-mpi-calculation-with-preloaded-inputs +[2]: resources-allocation-policy.md +[3]: ../salomon/network.md +[5]: ../salomon/7d-enhanced-hypercube.md +[6]: ../salomon/storage.md +[9]: capacity-computing.md +[10]: resources-allocation-policy.md#resource-accounting-policy diff --git a/docs.it4i/general/resource-accounting.md b/docs.it4i/general/resource-accounting.md index 2444b3dedf0571620238fa2bb42b89e23aa941cf..15607a39a710b11de2a81c48d10c776ea87a6506 100644 --- a/docs.it4i/general/resource-accounting.md +++ b/docs.it4i/general/resource-accounting.md @@ -10,7 +10,7 @@ Starting with the 24<sup>th</sup> open access grant competition, the accounting 1. [Karolina GPU][8a] 1. 
[Karolina FAT][9a] -The accounting runs whenever the nodes are allocated via the PBS Pro workload manager (the `qsub` command), regardless of whether +The accounting runs whenever the nodes are allocated via the Slurm workload manager (the `sbatch`, `salloc` command), regardless of whether the nodes are actually used for any calculation. The same rule applies for unspent [reservations][10a]. ## Conversion Table diff --git a/docs.it4i/general/resource_allocation_and_job_execution.md b/docs.it4i/general/resource_allocation_and_job_execution.md index b28ccd2640ddd1c53512fd0517179b0d5fae3c8a..44191521a9518dcbbd0dc4bcbf779584f151e3d9 100644 --- a/docs.it4i/general/resource_allocation_and_job_execution.md +++ b/docs.it4i/general/resource_allocation_and_job_execution.md @@ -1,58 +1,44 @@ -# Resource Allocation and Job Execution +!!! important "All clusters migrated to Slurm" + We migrated workload managers of all clusters (including Barbora and Karolina) **from PBS to Slurm**! + For more information on how to submit jobs in Slurm, see the [Job Submission and Execution][5] section. -!!! important "Karolina migrating to Slurm" - Starting September 21., we are migrating the Karolina's workload manager **from PBS to Slurm**. - For more information on how to submit jobs in Slurm, see the [Slurm Job Submission and Execution][8] section. - -To run a [job][1], computational resources for this particular job must be allocated. This is done via the [PBS Pro][b] job workload manager software, which distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [PBS Pro User's Guide][2]. - -## Resource Allocation Policy - -Resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share][3] ensures that individual users may consume approximately equal amount of resources per week. The resources are accessible via queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. - -### Resource Reservation - -You can request a reservation of a specific number, range, or type of computational resources at [support@it4i.cz][d]. -Note that unspent reserved node-hours count towards the total computational resources used. - -!!! note - See the queue status for [Karolina][a] or [Barbora][c]. - -Read more on the [Resource Allocation Policy][4] page. +# How to Run Jobs ## Job Submission and Execution -The `qsub` command creates a request to the PBS Job manager for allocation of specified resources. The **smallest allocation unit is an entire node**, with the exception of the `qexp` queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on first of the allocated nodes.** +To run a [job][1], computational resources for this particular job must be allocated. This is done via the [Slurm][a] job workload manager software, which distributes workloads across the supercomputer. -Read more on the [Job Submission and Execution][5] page. +The `sbatch` or `salloc` command creates a request to the Slurm job manager for allocation of specified resources. +The resources will be allocated when available, subject to allocation policies and constraints. 
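+
+A minimal sketch of both forms; the project ID, queue, and time limit below are illustrative placeholders:
+
+```console
+$ sbatch -A PROJECT-ID -q qcpu --time=01:00:00 ./myjob
+$ salloc -A PROJECT-ID -q qcpu --time=01:00:00
+```
+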
+**After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.**
 
-## Capacity Computing
+Read more on the [Job Submission and Execution][5] page.
 
-!!! note
-    Use Job arrays when running huge number of jobs.
+## Resource Allocation Policy
 
-Use GNU Parallel and/or Job arrays when running (many) single core jobs.
+!!! warning
+    Fair-share has not been implemented yet.
 
-In many cases, it is useful to submit a huge (100+) number of computational jobs into the PBS queue system. A huge number of (small) jobs is one of the most effective ways to execute parallel calculations, achieving best runtime, throughput and computer utilization. In this chapter, we discuss the recommended way to run huge numbers of jobs, including **ways to run huge numbers of single core jobs**.
+Resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share][3] ensures that individual users may consume approximately equal amount of resources per week. The resources are accessible via queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources.
 
-Read more on the [Capacity Computing][6] page.
+!!! note
+    See the queue status for [Karolina][d] or [Barbora][e].
 
-## Vnode Allocation
+Read more on the [Resource Allocation Policy][4] page.
 
-The `qgpu` queue on Karolina takes advantage of the division of nodes into vnodes. Accelerated node equipped with two 64-core processors and eight GPU cards is treated as eight vnodes, each containing 16 CPU cores and 1 GPU card. Vnodes can be allocated to jobs individually – through precise definition of resource list at job submission, you may allocate varying number of resources/GPU cards according to your needs.
+## Resource Reservation
 
-Red more on the [Vnode Allocation][7] page.
+You can request a reservation of a specific number, range, or type of computational resources at [support@it4i.cz][c].
+Note that unspent reserved node-hours count towards the total computational resources used.
 
 [1]: ../index.md#terminology-frequently-used-on-these-pages
-[2]: ../pbspro.md
+[2]: https://slurm.schedmd.com/documentation.html
 [3]: job-priority.md#fair-share-priority
 [4]: resources-allocation-policy.md
 [5]: job-submission-and-execution.md
-[6]: capacity-computing.md
-[7]: vnode-allocation.md
-[8]: slurm-job-submission-and-execution.md
-
-[a]: https://extranet.it4i.cz/rsweb/karolina/queues
-[b]: https://www.altair.com/pbs-works/
-[c]: https://extranet.it4i.cz/rsweb/barbora/queues
-[d]: mailto:support@it4i.cz
+
+[a]: https://slurm.schedmd.com/
+[b]: https://slurm.schedmd.com/documentation.html
+[c]: mailto:support@it4i.cz
+[d]: https://extranet.it4i.cz/rsweb/karolina/queues
+[e]: https://extranet.it4i.cz/rsweb/barbora/queues
diff --git a/docs.it4i/general/resources-allocation-policy.md b/docs.it4i/general/resources-allocation-policy.md
index e76c47a8221e8026403ee5be258a720e6ff039fd..e7474a077604f9870b21d5511d85a6f35e665be5 100644
--- a/docs.it4i/general/resources-allocation-policy.md
+++ b/docs.it4i/general/resources-allocation-policy.md
@@ -1,3 +1,6 @@
+!!!Warning
+    This page has not been fully updated yet. The page does not reflect the transition from PBS to Slurm.
+
 # Resource Allocation Policy
 
 ## Job Queue Policies
 
@@ -14,14 +17,14 @@ Computational resources are subject to [accounting policy][7].
 
 !!!
important Queues are divided based on a resource type: `qcpu_` for non-accelerated nodes and `qgpu_` for accelerated nodes. <br><br> - On the Karolina's `qgpu` queue, **you can now allocate 1/8 of the node - 1 GPU and 16 cores**. For more information, see [Allocation of vnodes on qgpu][4].<br><br> + On the Karolina's `qgpu` queue, **you can allocate 1/8 of the node - 1 GPU and 16 cores**. <br><br> -### New Queues +### Queues | <div style="width:86px">Queue</div>| Description | | -------------------------------- | ----------- | | `qcpu` | Production queue for non-accelerated nodes intended for standard production runs. Requires an active project with nonzero remaining resources. Full nodes are allocated. Identical to `qprod`. | -| `qgpu` | Dedicated queue for accessing the NVIDIA accelerated nodes. Requires an active project with nonzero remaining resources. It utilizes 8x NVIDIA A100 with 320GB HBM2 memory per node. The PI needs to explicitly ask support for authorization to enter the queue for all users associated with their project. **On Karolina, you can allocate 1/8 of the node - 1 GPU and 16 cores**. For more information, see [Allocation of vnodes on qgpu][4]. | +| `qgpu` | Dedicated queue for accessing the NVIDIA accelerated nodes. Requires an active project with nonzero remaining resources. It utilizes 8x NVIDIA A100 with 320GB HBM2 memory per node. The PI needs to explicitly ask support for authorization to enter the queue for all users associated with their project. **On Karolina, you can allocate 1/8 of the node - 1 GPU and 16 cores**. For more information, see [Karolina qgpu allocation][4]. | | `qcpu_biz`<br>`qgpu_biz` | Commercial queues, slightly higher priority. | | `qcpu_eurohpc`<br>`qgpu_eurohpc` | EuroHPC queues, slightly higher priority, **Karolina only**. | | `qcpu_exp`<br>`qgpu_exp` | Express queues for testing and running very small jobs. There are 2 nodes always reserved (w/o accelerators), max 8 nodes available per user. The nodes may be allocated on a per core basis. It is configured to run one job and accept five jobs in a queue per user. | @@ -32,18 +35,6 @@ Computational resources are subject to [accounting policy][7]. | `qfat` | Queue for fat node, PI must request authorization to enter the queue for all users associated to their project. | | `qviz` | Visualization queue Intended for pre-/post-processing using OpenGL accelerated graphics. Each user gets 8 cores of a CPU allocated (approx. 64 GB of RAM and 1/8 of the GPU capacity (default "chunk")). If more GPU power or RAM is required, it is recommended to allocate more chunks (with 8 cores each) up to one whole node per user. This is currently also the maximum allowed allocation per one user. One hour of work is allocated by default, the user may ask for 2 hours maximum. | -### Legacy Queues - -Legacy queues stay in production until early 2023. - -| Legacy queue | Replaced by | -| ------------ | ------------------------- | -| `qexp` | `qcpu_exp` & `qgpu_exp` | -| `qprod` | `qcpu` | -| `qlong` | `qcpu_long` | -| `nvidia` | `qgpu` Note that unlike in new queues, only full nodes can be allocated. | -| `qfree` | `qcpu_free` & `qgpu_free` | - See the following subsections for the list of queues: * [Karolina queues][5] @@ -51,28 +42,28 @@ See the following subsections for the list of queues: ## Queue Notes -The job wallclock time defaults to **half the maximum time**, see the table above. Longer wall time limits can be [set manually, see examples][3]. 
+The job time limit defaults to **half the maximum time**, see the table above. Longer time limits can be [set manually, see examples][3]. -Jobs that exceed the reserved wall clock time (Req'd Time) get killed automatically. The wall clock time limit can be changed for queuing jobs (state Q) using the `qalter` command, however it cannot be changed for a running job (state R). +Jobs that exceed the reserved time limit get killed automatically. The time limit can be changed for queuing jobs (state Q) using the `scontrol modify job` command, however it cannot be changed for a running job. ## Queue Status !!! tip Check the status of jobs, queues and compute nodes [here][c]. - + Display the queue status: ```console -$ qstat -q +$ sinfo -s ``` -The PBS allocation overview may also be obtained using the `rspbs` command: +The Slurm allocation overview may also be obtained using the `rsslurm` command: ```console -$ rspbs -Usage: rspbs [options] +$ rsslurm +Usage: rsslurm [options] Options: --version show program's version number and exit @@ -93,7 +84,7 @@ Options: [1]: job-priority.md [2]: #resource-accounting-policy [3]: job-submission-and-execution.md -[4]: ./vnode-allocation.md +[4]: karolina-slurm.md [5]: ./karolina-queues.md [6]: ./barbora-queues.md [7]: ./resource-accounting.md diff --git a/docs.it4i/general/slurm-job-submission-and-execution.md b/docs.it4i/general/slurm-job-submission-and-execution.md index 5f4a57ea9d8af9473573fc2fbf39f65091002f1f..5524b2c579c7856df63523ceb704632b1e6f66c7 100644 --- a/docs.it4i/general/slurm-job-submission-and-execution.md +++ b/docs.it4i/general/slurm-job-submission-and-execution.md @@ -1,9 +1,8 @@ -# Slurm Job Submission and Execution +# Job Submission and Execution ## Introduction -[Slurm][1] workload manager is used to allocate and access Barbora's and Complementary systems' resources. -Slurm on Karolina will be implemented later in 2023. +[Slurm][1] workload manager is used to allocate and access Karolina's, Barbora's and Complementary systems' resources. A `man` page exists for all Slurm commands, as well as the `--help` command option, which provides a brief summary of options. diff --git a/docs.it4i/general/vnode-allocation.md b/docs.it4i/general/vnode-allocation.md deleted file mode 100644 index 413358f7d5a2789a9c14c5355e32711dedf3f378..0000000000000000000000000000000000000000 --- a/docs.it4i/general/vnode-allocation.md +++ /dev/null @@ -1,147 +0,0 @@ -# Allocation of vnodes on qgpu - -## Introduction - -The `qgpu` queue on Karolina takes advantage of the division of nodes into vnodes. -Accelerated node equipped with two 64-core processors and eight GPU cards is treated as eight vnodes, -each containing 16 CPU cores and 1 GPU card. -Vnodes can be allocated to jobs individually – -through precise definition of resource list at job submission, -you may allocate varying number of resources/GPU cards according to your needs. - -!!! important "Vnodes and Security" - Division of nodes into vnodes was implemented to be as secure as possible, but it is still a "multi-user mode", - which means that if two users allocate a portion of the same node, they can see each other's running processes. - If this solution is inconvenient for you, consider allocating a whole node. - -## Selection Statement and Chunks - -Requested resources are specified using a selection statement: - -``` --l select=[<N>:]<chunk>[+[<N>:]<chunk> ...] 
-``` - -`N` specifies the number of chunks; if not specified then `N = 1`.<br> -`chunk` declares the value of each resource in a set of resources which are to be allocated as a unit to a job. - -* `chunk` is seen by the MPI as one node. -* Multiple chunks are then seen as multiple nodes. -* Maximum chunk size is equal to the size of a full physical node (8 GPU cards, 128 cores) - -Default chunk for the `qgpu` queue is configured to contain 1 GPU card and 16 CPU cores, i.e. `ncpus=16:ngpus=1`. - -* `ncpus` specifies number of CPU cores -* `ngpus` specifies number of GPU cards - -### Allocating Single GPU - -Single GPU can be allocated in an interactive session using - -```console -qsub -q qgpu -A OPEN-00-00 -l select=1 -I -``` - -or simply - -```console -qsub -q qgpu -A OPEN-00-00 -I -``` - -In this case, the `ngpus` parameter is optional, since it defaults to `1`. -You can verify your allocation either in the PBS using the `qstat` command, -or by checking the number of allocated GPU cards in the `CUDA_VISIBLE_DEVICES` variable: - -```console -$ qstat -F json -f $PBS_JOBID | grep exec_vnode - "exec_vnode":"(acn53[0]:ncpus=16:ngpus=1)" - -$ echo $CUDA_VISIBLE_DEVICES -GPU-8772c06c-0e5e-9f87-8a41-30f1a70baa00 -``` - -The output shows that you have been allocated vnode acn53[0]. - -### Allocating Single Accelerated Node - -!!! tip "Security tip" - Allocating a whole node prevents other users from seeing your running processes. - -Single accelerated node can be allocated in an interactive session using - -```console -qsub -q qgpu -A OPEN-00-00 -l select=8 -I -``` - -Setting `select=8` automatically allocates a whole accelerated node and sets `mpiproc`. -So for `N` full nodes, set `select` to `N x 8`. -However, note that it may take some time before your jobs are executed -if the required amount of full nodes isn't available. - -### Allocating Multiple GPUs - -!!! important "Security risk" - If two users allocate a portion of the same node, they can see each other's running processes. - When required for security reasons, consider allocating a whole node. - -Again, the following examples use only the selection statement, so no additional setting is required. - -```console -qsub -q qgpu -A OPEN-00-00 -l select=2 -I -``` - -In this example two chunks will be allocated on the same node, if possible. - -```console -qsub -q qgpu -A OPEN-00-00 -l select=16 -I -``` - -This example allocates two whole accelerated nodes. - -Multiple vnodes within the same chunk can be allocated using the `ngpus` parameter. -For example, to allocate 2 vnodes in an interactive mode, run - -```console -qsub -q qgpu -A OPEN-00-00 -l select=1:ngpus=2:mpiprocs=2 -I -``` - -Remember to **set the number of `mpiprocs` equal to that of `ngpus`** to spawn an according number of MPI processes. - -To verify the correctness: - -```console -$ qstat -F json -f $PBS_JOBID | grep exec_vnode - "exec_vnode":"(acn53[0]:ncpus=16:ngpus=1+acn53[1]:ncpus=16:ngpus=1)" - -$ echo $CUDA_VISIBLE_DEVICES | tr ',' '\n' -GPU-8772c06c-0e5e-9f87-8a41-30f1a70baa00 -GPU-5e88c15c-e331-a1e4-c80c-ceb3f49c300e -``` - -The number of chunks to allocate is specified in the `select` parameter. 
-For example, to allocate 2 chunks, each with 4 GPUs, run
-
-```console
-qsub -q qgpu -A OPEN-00-00 -l select=2:ngpus=4:mpiprocs=4 -I
-```
-
-To verify the correctness:
-
-```console
-$ cat > print-cuda-devices.sh <<EOF
-#!/bin/bash
-echo \$CUDA_VISIBLE_DEVICES
-EOF
-
-$ chmod +x print-cuda-devices.sh
-$ ml OpenMPI/4.1.4-GCC-11.3.0
-$ mpirun ./print-cuda-devices.sh | tr ',' '\n' | sort | uniq
-GPU-0910c544-aef7-eab8-f49e-f90d4d9b7560
-GPU-1422a1c6-15b4-7b23-dd58-af3a233cda51
-GPU-3dbf6187-9833-b50b-b536-a83e18688cff
-GPU-3dd0ae4b-e196-7c77-146d-ae16368152d0
-GPU-93edfee0-4cfa-3f82-18a1-1e5f93e614b9
-GPU-9c8143a6-274d-d9fc-e793-a7833adde729
-GPU-ad06ab8b-99cd-e1eb-6f40-d0f9694601c0
-GPU-dc0bc3d6-e300-a80a-79d9-3e5373cb84c9
-```
diff --git a/docs.it4i/index.md b/docs.it4i/index.md
index 1d2d8143f488f869fa2894231aa87a89ab039861..bfde8e5fe63fa605858645537ec1b68fe0f9da55 100644
--- a/docs.it4i/index.md
+++ b/docs.it4i/index.md
@@ -43,7 +43,7 @@ Proficiency in MPI, OpenMP, CUDA, UPC, or GPI2 programming may be gained via [tr
 * **primary investigator (PI):** a person responsible for execution of computational project and utilization of computational resources allocated to that project
 * **collaborator:** a person participating in the execution of a computational project and utilization of computational resources allocated to that project
 * **project:** a computational project under investigation by the PI – the project is identified by the project ID. Computational resources are allocated and charged per project.
-* **jobscript:** a script to be executed by the PBS Professional workload manager
+* **jobscript:** a script to be executed by the Slurm workload manager
 
 ## Conventions
diff --git a/docs.it4i/job-features.md b/docs.it4i/job-features.md
index 858015b254a502b998ac4fbc96654ff7092329e2..6a57b7af1fb6116f1395c99cbf7224e74f112bf3 100644
--- a/docs.it4i/job-features.md
+++ b/docs.it4i/job-features.md
@@ -1,17 +1,31 @@
-# Job Features
+# Job Features
 
-Special features installed/configured on the fly on allocated nodes, features are requested in PBS job.
+Special features installed/configured on the fly on allocated nodes; features are requested in a Slurm job using a specially formatted comment.
 
 ```console
-$ qsub... -l feature=req
+$ salloc... --comment "use:feature=req"
 ```
 
+or, in a jobscript:
+
+```
+#SBATCH --comment "use:feature=req"
+```
+
+or, for multiple features:
+
+```console
+$ salloc ... --comment "use:feature1=req1 use:feature2=req2 ..."
+```
+
+where `feature` is the feature name and `req` is the requested value (true, a version string, etc.).
+
 ## Xorg
 
 [Xorg][2] is a free and open source implementation of the X Window System imaging server maintained by the X.Org Foundation. Xorg is available only for Karolina accelerated nodes Acn[01-72].
 
 ```console
-$ qsub ... -l xorg=True
+$ salloc ... --comment "use:xorg=True"
 ```
 
 ## VTune Support
 
@@ -19,20 +33,23 @@ $ qsub ... -l xorg=True
 Load the VTune kernel modules.
 
 ```console
-$ qsub ... -l vtune=version_string
+$ salloc ... --comment "use:vtune=version_string"
 ```
 
 `version_string` is VTune version e.g. 2019_update4
 
 ## Global RAM Disk
 
+!!! warning
+    The feature has not been implemented on Slurm yet.
+
 The Global RAM disk deploys BeeGFS On Demand parallel filesystem,
 using local (i.e. allocated nodes') RAM disks as a storage backend.
 
 The Global RAM disk is mounted at `/mnt/global_ramdisk`.
 
 ```console
-$ qsub ... -l global_ramdisk=true
+$ salloc ... --comment "use:global_ramdisk=true"
 ```
 
 
@@ -40,18 +57,18 @@ $ qsub ...
-l global_ramdisk=true ### Example ```console -$ qsub -q qprod -l select=4,global_ramdisk=true ./jobscript +$ sbatch -A PROJECT-ID -p qcpu --nodes 4 --comment="use:global_ramdisk=true" ./jobscript ``` -This command submits a 4-node job in the `qprod` queue; -once running, a 440GB RAM disk shared across the 4 nodes will be created. +This command submits a 4-node job in the `qcpu` queue; +once running, a RAM disk shared across the 4 nodes will be created. The RAM disk will be accessible at `/mnt/global_ramdisk` and files written to this RAM disk will be visible on all 4 nodes. The file system is private to a job and shared among the nodes, created when the job starts and deleted at the job's end. -!!! note +!!! warning The Global RAM disk will be deleted immediately after the calculation end. Users should take care to save the output data from within the jobscript. @@ -87,7 +104,7 @@ Load a kernel module that allows saving/restoring values of MSR registers. Uses [LLNL MSR-SAFE][a]. ```console -$ qsub ... -l msr=version_string +$ salloc ... --comment "use:msr=version_string" ``` `version_string` is MSR-SAFE version e.g. 1.4.0 @@ -98,34 +115,12 @@ $ qsub ... -l msr=version_string !!! Warning Available on Barbora nodes only. -## Offlining CPU Cores - -!!! Info - Not available now. - -To offline N CPU cores: - -```console -$ qsub ... -l cpu_offline_cores=N -``` - -To offline CPU cores according to pattern: - -```console -$ qsub ... -l cpu_offline_cores=PATTERN -``` - -where `PATTERN` is a list of core's numbers to offline, separated by the character 'c' (e.g. "5c11c16c23c"). - -!!! Danger - Hazardous, it causes Lustre threads disruption. - ## HDEEM Support Load the HDEEM software stack. The [High Definition Energy Efficiency Monitoring][b] (HDEEM) library is a software interface used to measure power consumption of HPC clusters with bullx blades. ```console -$ qsub ... -l hdeem=version_string +$ salloc ... --comment "use:hdeem=version_string" ``` `version_string` is HDEEM version e.g. 2.2.8-1 @@ -135,25 +130,28 @@ $ qsub ... -l hdeem=version_string ## NVMe Over Fabrics File System +!!! warning + The feature has not been implemented on Slurm yet. + Attach a volume from an NVMe storage and mount it as a file-system. File-system is mounted on /mnt/nvmeof (on the first node of the job). Barbora cluster provides two NVMeoF storage nodes equipped with NVMe disks. Each storage node contains seven 1.6TB NVMe disks and provides net aggregated capacity of 10.18TiB. Storage space is provided using the NVMe over Fabrics protocol; RDMA network i.e. InfiniBand is used for data transfers. ```console -$ qsub ... -l nvmeof=size +$ salloc ... --comment "use:nvmeof=size" ``` -`size` is a size of the requested volume, PBS size conventions are used, e.g. 10t +`size` is a size of the requested volume, size conventions are used, e.g. 10t Create a shared file-system on the attached NVMe file-system and make it available on all nodes of the job. Append `:shared` to the size specification, shared file-system is mounted on /mnt/nvmeof-shared. ```console -$ qsub ... -l nvmeof=size:shared +$ salloc ... --comment "use:nvmeof=size:shared" ``` For example: ```console -$ qsub ... -l nvmeof=10t:shared +$ salloc ... --comment "use:nvmeof=10t:shared" ``` !!! Warning @@ -161,12 +159,15 @@ $ qsub ... -l nvmeof=10t:shared ## Smart Burst Buffer -Accelerate SCRATCH storage using the Smart Burst Buffer (SBB) technology. 
A specific Burst Buffer process is launched and Burst Buffer resources (CPUs, memory, flash storage) are allocated on an SBB storage node for acceleration (I/O caching) of SCRATCH data operations. The SBB profile file `/lscratch/$PBS_JOBID/sbb.sh` is created on the first allocated node of job. For SCRATCH acceleration, the SBB profile file has to be sourced into the shell environment - provided environment variables have to be defined in the process environment. Modified data is written asynchronously to a backend (Lustre) filesystem, writes might be proceeded after job termination.
+!!! warning
+    The feature has not been implemented on Slurm yet.
+
+Accelerate SCRATCH storage using the Smart Burst Buffer (SBB) technology. A specific Burst Buffer process is launched and Burst Buffer resources (CPUs, memory, flash storage) are allocated on an SBB storage node for acceleration (I/O caching) of SCRATCH data operations. The SBB profile file `/lscratch/$SLURM_JOB_ID/sbb.sh` is created on the first allocated node of the job. For SCRATCH acceleration, the SBB profile file has to be sourced into the shell environment - the provided environment variables have to be defined in the process environment. Modified data is written asynchronously to a backend (Lustre) filesystem; writes may continue after job termination.
 
 Barbora cluster provides two SBB storage nodes equipped with NVMe disks. Each storage node contains ten 3.2TB NVMe disks and provides net aggregated capacity of 29.1TiB. Acceleration uses RDMA network i.e. InfiniBand is used for data transfers.
 
 ```console
-$ qsub ... -l sbb=spec
+$ salloc ... --comment "use:sbb=spec"
 ```
 
 `spec` specifies amount of resources requested for Burst Buffer (CPUs, memory, flash storage), available values are small, medium, and large
 
 Loading SBB profile:
 
 ```console
-$ source /lscratch/$PBS_JOBID/sbb.sh
+$ source /lscratch/$SLURM_JOB_ID/sbb.sh
 ```
 
 !!! Warning
diff --git a/docs.it4i/karolina/introduction.md b/docs.it4i/karolina/introduction.md
index d0162bf22ad1406a8568180757138882d36901ed..2bf7eb249aa463f362eec7acc397afeec5f39759 100644
--- a/docs.it4i/karolina/introduction.md
+++ b/docs.it4i/karolina/introduction.md
@@ -8,7 +8,7 @@ The cluster runs with an operating system compatible with the Red Hat [Linux fam
 
 The user data shared file system and job data shared file-system are available to users.
 
-The [PBS Professional Open Source Project][b] workload manager provides [computing resources allocations and job execution][3].
+The [Slurm][b] workload manager provides [computing resources allocations and job execution][3].
 
 Read more on how to [apply for resources][4], [obtain login credentials][5] and [access the cluster][6].
 
@@ -20,4 +20,4 @@ Read more on how to [apply for resources][4], [obtain login credentials][5] and
 [6]: ../general/shell-and-data-access.md
 
 [a]: http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg
-[b]: https://www.pbspro.org/
+[b]: https://slurm.schedmd.com/
diff --git a/docs.it4i/software/bio/omics-master/overview.md b/docs.it4i/software/bio/omics-master/overview.md
index f673c899b1624247021433f4131e10ffe3ac96bb..04145565271700fd2577e29ea995949df7ef23be 100644
--- a/docs.it4i/software/bio/omics-master/overview.md
+++ b/docs.it4i/software/bio/omics-master/overview.md
@@ -1,3 +1,6 @@
+!!!warning
+    This page has not been updated yet. The page does not reflect the transition from PBS to Slurm.
+
 # Overview
 
 A human NGS data processing solution.
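
For batch processing, the SBB request described in the Smart Burst Buffer section above can also be expressed as an `#SBATCH` comment directive. A minimal sketch is shown below; the project ID, partition, and the `small` resource specification are illustrative only, and the warning above still applies (the feature has not been implemented on Slurm yet):

```bash
#!/bin/bash
#SBATCH --account=OPEN-00-00
#SBATCH --partition=qcpu
#SBATCH --nodes=1
#SBATCH --comment="use:sbb=small"

# make the SBB-accelerated SCRATCH available in this shell
source /lscratch/$SLURM_JOB_ID/sbb.sh

# run the I/O-intensive calculation against SCRATCH
./myprog.x
```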
diff --git a/docs.it4i/software/cae/comsol/comsol-multiphysics.md b/docs.it4i/software/cae/comsol/comsol-multiphysics.md index c71bf25b9a17e6ac8201e87de777c95085bcc300..5a997572abed39eb584d58b92ecf530357cca4f0 100644 --- a/docs.it4i/software/cae/comsol/comsol-multiphysics.md +++ b/docs.it4i/software/cae/comsol/comsol-multiphysics.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # COMSOL Multiphysics ## Introduction diff --git a/docs.it4i/software/chemistry/gaussian.md b/docs.it4i/software/chemistry/gaussian.md index a6b951708571b4149865b42dfc18969b8a540a86..638a60cea0d5396f7eee5dc03b0060b1de385b2a 100644 --- a/docs.it4i/software/chemistry/gaussian.md +++ b/docs.it4i/software/chemistry/gaussian.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Gaussian ## Introduction diff --git a/docs.it4i/software/chemistry/molpro.md b/docs.it4i/software/chemistry/molpro.md index 92b9fe55829abc7986184694ae8947abdcf5715c..079babcaf58579b46c92d4e08f07674259e0141c 100644 --- a/docs.it4i/software/chemistry/molpro.md +++ b/docs.it4i/software/chemistry/molpro.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Molpro ## Introduction diff --git a/docs.it4i/software/chemistry/nwchem.md b/docs.it4i/software/chemistry/nwchem.md index 129a4df24d03ade482d83f814131b5a5d6da89aa..b351b347326c6267726d17774270251bf686db3a 100644 --- a/docs.it4i/software/chemistry/nwchem.md +++ b/docs.it4i/software/chemistry/nwchem.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # NWChem ## Introduction diff --git a/docs.it4i/software/chemistry/orca.md b/docs.it4i/software/chemistry/orca.md index 1d7a092b1bcadb0bb254bc0e34a6b4776f1bcdae..e6fc8bcf312c779ac39bcedb2ac54309cf5399dc 100644 --- a/docs.it4i/software/chemistry/orca.md +++ b/docs.it4i/software/chemistry/orca.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # ORCA ## Introduction diff --git a/docs.it4i/software/chemistry/phono3py.md b/docs.it4i/software/chemistry/phono3py.md index a68f38c68158ef0bd703f4cc90c523020cda42c1..65943aab12f27ad97279edc78fb4b895bb103883 100644 --- a/docs.it4i/software/chemistry/phono3py.md +++ b/docs.it4i/software/chemistry/phono3py.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Phono3py ## Introduction diff --git a/docs.it4i/software/chemistry/phonopy.md b/docs.it4i/software/chemistry/phonopy.md index 6488be8b663520ae7417f68459006c161df9006e..c0b31b3296d10114ab8a2bfbc438ebb982fef425 100644 --- a/docs.it4i/software/chemistry/phonopy.md +++ b/docs.it4i/software/chemistry/phonopy.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Phonopy ## Introduction diff --git a/docs.it4i/software/data-science/dask.md b/docs.it4i/software/data-science/dask.md index 0f9120b43783100021a9b1760fe91e13d9b810d9..592701539f35de628ed542dd4228dd81910877e4 100644 --- a/docs.it4i/software/data-science/dask.md +++ b/docs.it4i/software/data-science/dask.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. 
+ # Dask [Dask](https://docs.dask.org/en/latest/) is a popular open-source library that allows you to diff --git a/docs.it4i/software/debuggers/allinea-ddt.md b/docs.it4i/software/debuggers/allinea-ddt.md index 45fa4d8be972f31bd2e83bf8aebbfb3f51284c28..b435f97a58b5180ee4b923587ac4e3b7a729e656 100644 --- a/docs.it4i/software/debuggers/allinea-ddt.md +++ b/docs.it4i/software/debuggers/allinea-ddt.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Allinea Forge (DDT, MAP) ## Introduction diff --git a/docs.it4i/software/debuggers/allinea-performance-reports.md b/docs.it4i/software/debuggers/allinea-performance-reports.md index 81d6f3cfe596bb721fe1077cd8c8f4e90df35e67..cf4af0c2284abfd254db337c6abbad448ad6b661 100644 --- a/docs.it4i/software/debuggers/allinea-performance-reports.md +++ b/docs.it4i/software/debuggers/allinea-performance-reports.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Allinea Performance Reports ## Introduction diff --git a/docs.it4i/software/debuggers/intel-vtune-amplifier.md b/docs.it4i/software/debuggers/intel-vtune-amplifier.md index 629c6280606ef00e8a3a307dcbd3a49af2b0891f..9675e77d3dc5e8974c9135366536ba296afece5d 100644 --- a/docs.it4i/software/debuggers/intel-vtune-amplifier.md +++ b/docs.it4i/software/debuggers/intel-vtune-amplifier.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Intel VTune Amplifier XE ## Introduction diff --git a/docs.it4i/software/debuggers/intel-vtune-profiler.md b/docs.it4i/software/debuggers/intel-vtune-profiler.md index e0bec4276308e789ef8304e1758172bc0fef0bb4..5562557c54ee9d2ecca7d8264c143f5c449ac3d0 100644 --- a/docs.it4i/software/debuggers/intel-vtune-profiler.md +++ b/docs.it4i/software/debuggers/intel-vtune-profiler.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Intel VTune Profiler ## Introduction diff --git a/docs.it4i/software/debuggers/total-view.md b/docs.it4i/software/debuggers/total-view.md index cfaf1a8c746db4953c2276f39f707f69051c8537..3f06d86e20cb60b5090f6fdbf3718046f9650516 100644 --- a/docs.it4i/software/debuggers/total-view.md +++ b/docs.it4i/software/debuggers/total-view.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # TotalView ## Introduction diff --git a/docs.it4i/software/isv_licenses.md b/docs.it4i/software/isv_licenses.md index 544dafa7c89912f32ef49c966a66070bcef488b1..e934e781ea10867297e59351310f8a112c3575aa 100644 --- a/docs.it4i/software/isv_licenses.md +++ b/docs.it4i/software/isv_licenses.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # ISV Licenses ## Guide to Managing Independent Software Vendor Licenses diff --git a/docs.it4i/software/lang/csc.md b/docs.it4i/software/lang/csc.md index 40c10f9788fcf298b2d2fc7879f819412c7c8469..375b3640648619185677e8e2ce5a517226ccee3c 100644 --- a/docs.it4i/software/lang/csc.md +++ b/docs.it4i/software/lang/csc.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # CSharp C# is available on the cluster. 
diff --git a/docs.it4i/software/machine-learning/deepdock.md b/docs.it4i/software/machine-learning/deepdock.md index d912becb7361b36b0da3e7a7a2be38373d9ca24b..e2d9b7b7f4f1a29832404f28ceab7b341893aba5 100644 --- a/docs.it4i/software/machine-learning/deepdock.md +++ b/docs.it4i/software/machine-learning/deepdock.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # DeepDock Adapted from [https://github.com/OptiMaL-PSE-Lab/DeepDock](https://github.com/OptiMaL-PSE-Lab/DeepDock) diff --git a/docs.it4i/software/mpi/mpi4py-mpi-for-python.md b/docs.it4i/software/mpi/mpi4py-mpi-for-python.md index 1fe8efad002f17c65eb40c89525b31ad7b6c4b03..bbd3a36f64266b22a7e28a9454694bafea635719 100644 --- a/docs.it4i/software/mpi/mpi4py-mpi-for-python.md +++ b/docs.it4i/software/mpi/mpi4py-mpi-for-python.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # MPI4Py (MPI for Python) ## Introduction diff --git a/docs.it4i/software/numerical-languages/matlab.md b/docs.it4i/software/numerical-languages/matlab.md index 3a940d712b0f575fead2405096f04b76f28da172..2ece7ce9919110a48da2a423f65b4e32d04f419a 100644 --- a/docs.it4i/software/numerical-languages/matlab.md +++ b/docs.it4i/software/numerical-languages/matlab.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # MATLAB ## Introduction diff --git a/docs.it4i/software/numerical-languages/octave.md b/docs.it4i/software/numerical-languages/octave.md index 7e863cdf804aef6661672524acfd98d29941fb0a..f5d6f5c9d518e74f10214b906d4b2660b066e4f7 100644 --- a/docs.it4i/software/numerical-languages/octave.md +++ b/docs.it4i/software/numerical-languages/octave.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Octave ## Introduction diff --git a/docs.it4i/software/numerical-languages/r.md b/docs.it4i/software/numerical-languages/r.md index 5caa8455c8d9a3c97638c18063eee6cdebac9c97..a562fa5ae4d47a24d1b46afa0497612c413bd9bb 100644 --- a/docs.it4i/software/numerical-languages/r.md +++ b/docs.it4i/software/numerical-languages/r.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # R ## Introduction diff --git a/docs.it4i/software/nvidia-cuda.md b/docs.it4i/software/nvidia-cuda.md index 8ca378244e0e65ebb05926f72894b4bf6683a5e5..10457081f8bdc75e21bd0c206d32b3e3b625c53a 100644 --- a/docs.it4i/software/nvidia-cuda.md +++ b/docs.it4i/software/nvidia-cuda.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # NVIDIA CUDA ## Introduction diff --git a/docs.it4i/software/nvidia-hip.md b/docs.it4i/software/nvidia-hip.md index 373bc5e931ed51c904655ad3884cf05dd3e55a60..a444aae59fe37f5c304458c2eb67c07663a096fa 100644 --- a/docs.it4i/software/nvidia-hip.md +++ b/docs.it4i/software/nvidia-hip.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. 
+ # ROCm HIP ## Introduction diff --git a/docs.it4i/software/tools/ansys/ansys-cfx.md b/docs.it4i/software/tools/ansys/ansys-cfx.md index b1411b0a9a1c2c42e5559f53cf6177ff56aa4f22..e76bdc53f0de566d64a9205e4493411df7b03a05 100644 --- a/docs.it4i/software/tools/ansys/ansys-cfx.md +++ b/docs.it4i/software/tools/ansys/ansys-cfx.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # ANSYS CFX [ANSYS CFX][a] is a high-performance, general purpose fluid dynamics program that has been applied to solve wide-ranging fluid flow problems for over 20 years. At the heart of ANSYS CFX is its advanced solver technology, the key to achieving reliable and accurate solutions quickly and robustly. The modern, highly parallelized solver is the foundation for an abundant choice of physical models to capture virtually any type of phenomena related to fluid flow. The solver and its many physical models are wrapped in a modern, intuitive, and flexible GUI and user environment, with extensive capabilities for customization and automation using session files, scripting and a powerful expression language. diff --git a/docs.it4i/software/tools/ansys/ansys-fluent.md b/docs.it4i/software/tools/ansys/ansys-fluent.md index 378f748b47b1589a0eae73277a3eb361d894b3de..372cae69924b3bce9b372735c726403715713ea2 100644 --- a/docs.it4i/software/tools/ansys/ansys-fluent.md +++ b/docs.it4i/software/tools/ansys/ansys-fluent.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # ANSYS Fluent [ANSYS Fluent][a] software contains the broad physical modeling capabilities needed to model flow, turbulence, heat transfer, and reactions for industrial applications ranging from air flow over an aircraft wing to combustion in a furnace, from bubble columns to oil platforms, from blood flow to semiconductor manufacturing, and from clean room design to wastewater treatment plants. Special models that give the software the ability to model in-cylinder combustion, aeroacoustics, turbomachinery, and multiphase systems have served to broaden its reach. diff --git a/docs.it4i/software/tools/ansys/ansys-ls-dyna.md b/docs.it4i/software/tools/ansys/ansys-ls-dyna.md index 0c68fd65045339f6eefe6a9d7f704ab6a8ff117e..f44d92c259320bd09ec4b2d416bdbaf1f2713e75 100644 --- a/docs.it4i/software/tools/ansys/ansys-ls-dyna.md +++ b/docs.it4i/software/tools/ansys/ansys-ls-dyna.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # ANSYS LS-DYNA [ANSYSLS-DYNA][a] provides convenient and easy-to-use access to the technology-rich, time-tested explicit solver without the need to contend with the complex input requirements of this sophisticated program. Introduced in 1996, ANSYS LS-DYNA capabilities have helped customers in numerous industries to resolve highly intricate design issues. ANSYS Mechanical users have been able to take advantage of complex explicit solutions for a long time utilizing the traditional ANSYS Parametric Design Language (APDL) environment. These explicit capabilities are available to ANSYS Workbench users as well. The Workbench platform is a powerful, comprehensive, easy-to-use environment for engineering simulation. 
CAD import from all sources, geometry cleanup, automatic meshing, solution, parametric optimization, result visualization, and comprehensive report generation are all available within a single fully interactive modern graphical user environment. diff --git a/docs.it4i/software/tools/ansys/ansys-mechanical-apdl.md b/docs.it4i/software/tools/ansys/ansys-mechanical-apdl.md index d027f175f9651b825366d4a8c464629bb47704da..b650839e22e4749005050dd36d59d611d2340200 100644 --- a/docs.it4i/software/tools/ansys/ansys-mechanical-apdl.md +++ b/docs.it4i/software/tools/ansys/ansys-mechanical-apdl.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # ANSYS MAPDL [ANSYS Multiphysics][a] offers a comprehensive product solution for both multiphysics and single-physics analysis. The product includes structural, thermal, fluid, and both high- and low-frequency electromagnetic analysis. The product also contains solutions for both direct and sequentially coupled physics problems including direct coupled-field elements and the ANSYS multi-field solver. diff --git a/docs.it4i/software/tools/ansys/ansys.md b/docs.it4i/software/tools/ansys/ansys.md index fe49e40d161974b8c9e32cbef5f04b163545c964..f9d238b286471130bf1b5806887b585b5e0874ca 100644 --- a/docs.it4i/software/tools/ansys/ansys.md +++ b/docs.it4i/software/tools/ansys/ansys.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Overview of ANSYS Products [SVS FEM][a] as [ANSYS Channel partner][b] for the Czech Republic provided all ANSYS licenses for our clusters and supports all ANSYS Products (Multiphysics, Mechanical, MAPDL, CFX, Fluent, Maxwell, LS-DYNA, etc.) to IT staff and ANSYS users. In case of a problem with ANSYS functionality, contact [hotline@svsfem.cz][c]. diff --git a/docs.it4i/software/tools/ansys/workbench.md b/docs.it4i/software/tools/ansys/workbench.md index fb16259f1d226e79a015607ffcec1db1a57eeb9d..59518f2ad6b86efced066356232b5899ce34da19 100644 --- a/docs.it4i/software/tools/ansys/workbench.md +++ b/docs.it4i/software/tools/ansys/workbench.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Workbench ## Workbench Batch Mode diff --git a/docs.it4i/software/tools/singularity-it4i.md b/docs.it4i/software/tools/singularity-it4i.md index c37ef8382eb4e7ac8447c70e25ef493517b1c6bf..5e6b0c3b2d43b2f768d91c11c7377eef309b782a 100644 --- a/docs.it4i/software/tools/singularity-it4i.md +++ b/docs.it4i/software/tools/singularity-it4i.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # Apptainer on IT4Innovations !!!note "Singularity name change" diff --git a/docs.it4i/software/tools/virtualization.md b/docs.it4i/software/tools/virtualization.md index f18bc793df945c198d610729f48237e6dc63e863..df372328e1690a870da9747fcd0c79befb15f004 100644 --- a/docs.it4i/software/tools/virtualization.md +++ b/docs.it4i/software/tools/virtualization.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. 
+ # Virtualization <!-- diff --git a/docs.it4i/software/viz/NICEDCVsoftware.md b/docs.it4i/software/viz/NICEDCVsoftware.md index 507a60c854404ebaaa8c1e3aa36833a9140bafc3..aef5b0e2576a45757c279c68cdcca85054fb9d7a 100644 --- a/docs.it4i/software/viz/NICEDCVsoftware.md +++ b/docs.it4i/software/viz/NICEDCVsoftware.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # NICE DCV **Install NICE DCV** (user-computer) diff --git a/docs.it4i/software/viz/gpi2.md b/docs.it4i/software/viz/gpi2.md index e459fe1ed49090e8f4f5fea8de47023a7812083c..fce7808c2795fd3d0b1aee976691e4572f294c0d 100644 --- a/docs.it4i/software/viz/gpi2.md +++ b/docs.it4i/software/viz/gpi2.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # GPI-2 ## Introduction diff --git a/docs.it4i/software/viz/insitu.md b/docs.it4i/software/viz/insitu.md index ca9c5203844bcb9b232fa748a76af88fa5646a72..ba97ffa05e01b08b3567f6a00e8e00ca225a4e6c 100644 --- a/docs.it4i/software/viz/insitu.md +++ b/docs.it4i/software/viz/insitu.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # In Situ Visualization ## Introduction diff --git a/docs.it4i/software/viz/openfoam.md b/docs.it4i/software/viz/openfoam.md index 8f35268a57829ea5bc26f8c1d9d39fcfc0209021..d036b2480c4716d4b240ca79b099f7964d333a99 100644 --- a/docs.it4i/software/viz/openfoam.md +++ b/docs.it4i/software/viz/openfoam.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # OpenFOAM OpenFOAM is a free, open source CFD software package. diff --git a/docs.it4i/software/viz/paraview.md b/docs.it4i/software/viz/paraview.md index ff22a94e515ad47a80e04277a539a9f769b92590..1a14ad5f391d3908fc966204b90299e92ac29e5c 100644 --- a/docs.it4i/software/viz/paraview.md +++ b/docs.it4i/software/viz/paraview.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # ParaView An open-source, multi-platform data analysis and visualization application. diff --git a/docs.it4i/software/viz/vgl.md b/docs.it4i/software/viz/vgl.md index 9af22febb93eb37351e0721113751c6f16ceb638..60ce5b26df523c676da9a3baee7f9eace3a4f4eb 100644 --- a/docs.it4i/software/viz/vgl.md +++ b/docs.it4i/software/viz/vgl.md @@ -1,3 +1,6 @@ +!!!warning + This page has not been updated yet. The page does not reflect the transition from PBS to Slurm. + # VirtualGL VirtualGL is an open source program that redirects the 3D rendering commands from Unix and Linux OpenGL applications to 3D accelerator hardware in a dedicated server and displays the rendered output interactively to a thin client located elsewhere on the network. 
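
In practice, an OpenGL application is started through VirtualGL's `vglrun` wrapper, so rendering happens on the server-side GPU and only the rendered frames are sent to the client. A generic invocation (not cluster-specific configuration) looks like:

```console
$ vglrun glxgears
```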
diff --git a/mkdocs.yml b/mkdocs.yml index 1f10b2e38c39a6c49b310b815335816d1cd83afb..3f7eec96c57e260e75d61a1e45d95a3ea0c185cb 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -96,20 +96,19 @@ nav: - VPN Access: general/accessing-the-clusters/vpn-access.md - Run Jobs: - Introduction: general/resource_allocation_and_job_execution.md + - Job Submission and Execution: general/job-submission-and-execution.md - Resources Allocation: - Resource Allocation Policy: general/resources-allocation-policy.md - Karolina Queues: general/karolina-queues.md - Barbora Queues: general/barbora-queues.md - Resource Accounting Policy: general/resource-accounting.md - Job Priority: general/job-priority.md - - Job Submission and Execution: general/job-submission-and-execution.md - - Slurm Job Submission and Execution: general/slurm-job-submission-and-execution.md - - Capacity Computing: - - Introduction: general/capacity-computing.md - - Job Arrays: general/job-arrays.md - - HyperQueue: general/hyperqueue.md - - Parallel Computing and MPI: general/karolina-mpi.md - - Vnode Allocation: general/vnode-allocation.md +# - Slurm Job Submission and Execution: general/slurm-job-submission-and-execution.md +# - Capacity Computing: +# - Introduction: general/capacity-computing.md +# - Job Arrays: general/job-arrays.md +# - HyperQueue: general/hyperqueue.md +# - Parallel Computing and MPI: general/karolina-mpi.md - Other Services: - OpenCode: general/opencode.md - Technical Information: