Commit bb89ba74 authored 1 year ago by Roman Sliva
Remove Vnode Allocataion
Parent: 75ffeeba
Merge request: !440 (PBS eradication)
Pipeline #34144 failed 1 year ago
2 changed files with 0 additions and 148 deletions:
docs.it4i/general/vnode-allocation.md (+0, -147)
mkdocs.yml (+0, -1)
docs.it4i/general/vnode-allocation.md: deleted (100644 → 0), +0 -147, file @ 75ffeeba
# Allocation of vnodes on qgpu
## Introduction
The `qgpu` queue on Karolina takes advantage of the division of nodes into vnodes.
An accelerated node equipped with two 64-core processors and eight GPU cards is treated as eight vnodes,
each containing 16 CPU cores and 1 GPU card.
Vnodes can be allocated to jobs individually:
by precisely defining the resource list at job submission,
you can allocate a varying number of resources/GPU cards according to your needs.
!!! important "Vnodes and Security"
    Division of nodes into vnodes was implemented to be as secure as possible, but it is still a "multi-user mode",
    which means that if two users allocate a portion of the same node, they can see each other's running processes.
    If this solution is inconvenient for you, consider allocating a whole node.

## Selection Statement and Chunks
Requested resources are specified using a selection statement:
```
-l select=[<N>:]<chunk>[+[<N>:]<chunk> ...]
```
`N` specifies the number of chunks; if not specified, then `N = 1`.<br>
`chunk` declares the value of each resource in a set of resources which are to be allocated as a unit to a job.

* `chunk` is seen by the MPI as one node.
* Multiple chunks are then seen as multiple nodes.
* Maximum chunk size is equal to the size of a full physical node (8 GPU cards, 128 cores).

The default chunk for the `qgpu` queue is configured to contain 1 GPU card and 16 CPU cores, i.e. `ncpus=16:ngpus=1`.

* `ncpus` specifies the number of CPU cores
* `ngpus` specifies the number of GPU cards
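
To make the chunk arithmetic concrete, the following is a minimal sketch that computes the total number of GPU cards and CPU cores implied by a selection statement. The `qgpu_totals` helper is hypothetical (it is not a PBS command), and the sketch assumes that 16 CPU cores accompany every GPU card, as in the default chunk.

```shell
#!/bin/sh
# Sketch: resources implied by "-l select=<chunks>:ngpus=<gpus_per_chunk>" on qgpu.
# Assumption: each GPU card comes with 16 CPU cores (default chunk ncpus=16:ngpus=1).

CORES_PER_GPU=16   # taken from the default qgpu chunk
GPUS_PER_NODE=8    # size of a full accelerated node

# qgpu_totals is a hypothetical helper, not part of PBS.
# Usage: qgpu_totals <chunks> [gpus_per_chunk]
qgpu_totals() {
    chunks=$1
    gpus_per_chunk=${2:-1}   # ngpus defaults to 1
    # A chunk cannot be larger than one physical node.
    if [ "$gpus_per_chunk" -gt "$GPUS_PER_NODE" ]; then
        echo "chunk exceeds a full node ($GPUS_PER_NODE GPUs)" >&2
        return 1
    fi
    total_gpus=$((chunks * gpus_per_chunk))
    echo "gpus=$total_gpus cores=$((total_gpus * CORES_PER_GPU))"
}

qgpu_totals 2 4   # corresponds to select=2:ngpus=4
```

For `select=2:ngpus=4`, this prints `gpus=8 cores=128`, i.e. the equivalent of one full node's resources split into two chunks.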
### Allocating Single GPU
A single GPU can be allocated in an interactive session using

```console
qsub -q qgpu -A OPEN-00-00 -l select=1 -I
```

or simply

```console
qsub -q qgpu -A OPEN-00-00 -I
```

In this case, the `ngpus` parameter is optional, since it defaults to `1`.
You can verify your allocation either in the PBS using the `qstat` command,
or by checking the number of allocated GPU cards in the `CUDA_VISIBLE_DEVICES` variable:

```console
$ qstat -F json -f $PBS_JOBID | grep exec_vnode
    "exec_vnode":"(acn53[0]:ncpus=16:ngpus=1)"

$ echo $CUDA_VISIBLE_DEVICES
GPU-8772c06c-0e5e-9f87-8a41-30f1a70baa00
```

The output shows that you have been allocated vnode acn53[0].
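
Counting the entries of `CUDA_VISIBLE_DEVICES` by eye becomes tedious with more GPUs. The following is a small sketch of how the check could be scripted; `count_visible_gpus` is a hypothetical helper name, and the sketch assumes the variable holds a comma-separated list of GPU identifiers, as in the output above.

```shell
#!/bin/sh
# Sketch: count the GPU cards listed in a CUDA_VISIBLE_DEVICES-style string.
# Assumption: entries are separated by commas.

# count_visible_gpus is a hypothetical helper, not part of PBS or CUDA.
# Usage: count_visible_gpus "<comma-separated GPU IDs>"
count_visible_gpus() {
    devices=$1
    # An empty or unset variable means no GPU is visible.
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # One entry per line, then count the lines (strip padding some wc versions add).
    printf '%s\n' "$devices" | tr ',' '\n' | wc -l | tr -d ' '
}

# Inside a job, this reports the number of allocated GPU cards.
count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}"
```

Within the single-GPU job shown above, the helper would be expected to report `1`.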
### Allocating Single Accelerated Node
!!! tip "Security tip"
    Allocating a whole node prevents other users from seeing your running processes.

A single accelerated node can be allocated in an interactive session using

```console
qsub -q qgpu -A OPEN-00-00 -l select=8 -I
```

Setting `select=8` automatically allocates a whole accelerated node and sets `mpiprocs`.
So for `N` full nodes, set `select` to `N x 8`.
However, note that it may take some time before your jobs are executed
if the required number of full nodes isn't available.
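
The `N x 8` rule can be sketched as a one-line calculation. `full_node_select` is a hypothetical helper, and the example only prints the resulting `qsub` command rather than submitting it; the project ID `OPEN-00-00` is the placeholder used throughout this page.

```shell
#!/bin/sh
# Sketch: compute the select value needed for N whole accelerated nodes.
# Assumption: one node equals 8 default chunks (8 vnodes with 1 GPU each).

# full_node_select is a hypothetical helper, not part of PBS.
# Usage: full_node_select <number_of_nodes>
full_node_select() {
    nodes=$1
    echo $((nodes * 8))
}

# Print (do not run) the command for two whole nodes.
echo "qsub -q qgpu -A OPEN-00-00 -l select=$(full_node_select 2) -I"
```

For two nodes, the printed command contains `select=16`.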
### Allocating Multiple GPUs
!!! important "Security risk"
    If two users allocate a portion of the same node, they can see each other's running processes.
    When required for security reasons, consider allocating a whole node.

Again, the following examples use only the selection statement, so no additional setting is required.

```console
qsub -q qgpu -A OPEN-00-00 -l select=2 -I
```

In this example, two chunks will be allocated on the same node, if possible.

```console
qsub -q qgpu -A OPEN-00-00 -l select=16 -I
```

This example allocates two whole accelerated nodes.

Multiple vnodes within the same chunk can be allocated using the `ngpus` parameter.
For example, to allocate 2 vnodes in an interactive mode, run

```console
qsub -q qgpu -A OPEN-00-00 -l select=1:ngpus=2:mpiprocs=2 -I
```

Remember to **set the number of `mpiprocs` equal to that of `ngpus`**
to spawn the corresponding number of MPI processes.
To verify the correctness:

```console
$ qstat -F json -f $PBS_JOBID | grep exec_vnode
    "exec_vnode":"(acn53[0]:ncpus=16:ngpus=1+acn53[1]:ncpus=16:ngpus=1)"

$ echo $CUDA_VISIBLE_DEVICES | tr ',' '\n'
GPU-8772c06c-0e5e-9f87-8a41-30f1a70baa00
GPU-5e88c15c-e331-a1e4-c80c-ceb3f49c300e
```
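
The `exec_vnode` value can also be split into individual vnode names, which is handy when comparing it against the number of GPU identifiers. The following is a sketch only; `list_vnodes` is a hypothetical helper, and it assumes the format shown above: vnodes joined by `+`, wrapped in parentheses, with resources appended after `:`.

```shell
#!/bin/sh
# Sketch: extract vnode names (e.g. acn53[0]) from an exec_vnode string.
# Assumption: the string looks like "(name:res=..+name:res=..)".

# list_vnodes is a hypothetical helper, not part of PBS.
# Usage: list_vnodes "<exec_vnode value>"
list_vnodes() {
    # Drop the parentheses, put each vnode on its own line, keep only the name.
    printf '%s\n' "$1" | tr -d '()' | tr '+' '\n' | cut -d: -f1
}

list_vnodes "(acn53[0]:ncpus=16:ngpus=1+acn53[1]:ncpus=16:ngpus=1)"
```

For the two-GPU allocation above, this prints `acn53[0]` and `acn53[1]` on separate lines.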
The number of chunks to allocate is specified in the `select` parameter.
For example, to allocate 2 chunks, each with 4 GPUs, run

```console
qsub -q qgpu -A OPEN-00-00 -l select=2:ngpus=4:mpiprocs=4 -I
```
To verify the correctness:

```console
$ cat > print-cuda-devices.sh <<EOF
#!/bin/bash
echo \$CUDA_VISIBLE_DEVICES
EOF

$ chmod +x print-cuda-devices.sh
$ ml OpenMPI/4.1.4-GCC-11.3.0
$ mpirun ./print-cuda-devices.sh | tr ',' '\n' | sort | uniq
GPU-0910c544-aef7-eab8-f49e-f90d4d9b7560
GPU-1422a1c6-15b4-7b23-dd58-af3a233cda51
GPU-3dbf6187-9833-b50b-b536-a83e18688cff
GPU-3dd0ae4b-e196-7c77-146d-ae16368152d0
GPU-93edfee0-4cfa-3f82-18a1-1e5f93e614b9
GPU-9c8143a6-274d-d9fc-e793-a7833adde729
GPU-ad06ab8b-99cd-e1eb-6f40-d0f9694601c0
GPU-dc0bc3d6-e300-a80a-79d9-3e5373cb84c9
```
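
Since `mpiprocs` has to match `ngpus` in every multi-GPU chunk, the selection statement can be generated instead of typed by hand. The following is a sketch under that assumption; `qgpu_select` is a hypothetical helper, and the example prints the `qsub` command rather than submitting a job.

```shell
#!/bin/sh
# Sketch: build a selection statement in which mpiprocs always equals ngpus.
# Assumption: one MPI process per GPU card is wanted, as recommended above.

# qgpu_select is a hypothetical helper, not part of PBS.
# Usage: qgpu_select <chunks> [gpus_per_chunk]
qgpu_select() {
    chunks=$1
    gpus=${2:-1}   # ngpus defaults to 1
    echo "select=${chunks}:ngpus=${gpus}:mpiprocs=${gpus}"
}

# Print (do not run) the command for 2 chunks with 4 GPUs each.
echo "qsub -q qgpu -A OPEN-00-00 -l $(qgpu_select 2 4) -I"
```

The printed command matches the two-chunk, four-GPU example above: `-l select=2:ngpus=4:mpiprocs=4`.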
mkdocs.yml: +0 -1, file @ bb89ba74
@@ -109,7 +109,6 @@ nav:
   - Job Arrays: general/job-arrays.md
   - HyperQueue: general/hyperqueue.md
   - Parallel Computing and MPI: general/karolina-mpi.md
-  - Vnode Allocation: general/vnode-allocation.md
   - Other Services:
     - OpenCode: general/opencode.md
   - Technical Information: