Commit 7007f98e authored 7 months ago by Jan Siwiec
Update dask.md
parent 6007103f
docs.it4i/software/data-science/dask.md (9 additions, 12 deletions)
-!!!warning
-    This page has not been updated yet. The page does not reflect the transition from PBS to Slurm.
-
 # Dask

 [Dask](https://docs.dask.org/en/latest/) is a popular open-source library that allows you to
@@ -41,7 +38,7 @@ execute your tasks.

-After you start a PBS job, you should therefore first start the server and the workers on the
+After you start a job, you should therefore first start the server and the workers on the
 available computing nodes and then run your Python program that uses Dask. There are multiple ways
 of deploying the cluster. A common scenario is to run a Dask server on a single computing node,
 run a single worker per node on all remaining computing nodes and then run your program on the node
@@ -53,14 +50,14 @@ with the server.
 !!! note
     All the following deployment methods assume that you are inside a Python environment that has
     Dask installed. Do not forget to load Python and activate the correct virtual environment at
-    the beginning of your PBS job! And also do the same after connecting to any worker nodes
+    the beginning of your job! And also do the same after connecting to any worker nodes
     manually using SSH.

 ### Manual Deployment

 Both the server and the worker nodes can be started using a CLI command. If you prefer manual
 deployment, you can manually start the server on a selected node and then start the workers on
-other nodes available inside your PBS job.
+other nodes available inside your job.

 ```bash
 # Start the server on some node N
@@ -73,15 +70,15 @@ $ dask-worker tcp://<hostname-of-N>:8786
 ### Dask-ssh Deployment

 Dask actually contains [built-in support](https://docs.dask.org/en/latest/setup/ssh.html) for
-automating Dask deployment using SSH. It also supports nodefiles provided by PBS, so inside of your
-PBS job, you can simply run
+automating Dask deployment using SSH. It also supports nodefiles provided by Slurm,
+so inside of your job, you can simply run

 ```bash
-$ dask-ssh --hostfile $PBS_NODEFILE
+$ dask-ssh --hostfile $SLURM_NODELIST
 ```

 to start the Dask cluster on all available computing nodes. This will start the server on the first
-node of your PBS job and then a single worker on each node. The first node will therefore be shared
+node of your job and then a single worker on each node. The first node will therefore be shared
 by a server and a worker, which might not be ideal from a performance point of view.

 > Note that for this to work, the `paramiko` Python library has to be installed inside your Python
@@ -94,8 +91,8 @@ can start the scheduler and the workers on separate nodes to avoid overcrowding
 ### Other Deployment Options

 Dask has a lot of other ways of being deployed, e.g. using MPI, or using a shared file on the
-network file system. It also allows you to create a PBS job directly, wait for it to be started and
-then it starts the whole cluster inside the PBS job. You can find more information about Dask HPC
+network file system. It also allows you to create a job directly, wait for it to be started and
+then it starts the whole cluster inside the job. You can find more information about Dask HPC
 deployment [here](https://docs.dask.org/en/latest/setup/hpc.html).

 ## Connecting to the Cluster
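
Note on the Dask-ssh hunk: `dask-ssh --hostfile` expects a path to a text file with one hostname per line, which is what `$PBS_NODEFILE` pointed to. Slurm, as far as I know, exposes the allocated nodes as a compressed string (for example `cn[1-4]`) in `$SLURM_JOB_NODELIST`, with `$SLURM_NODELIST` as an older alias, so the new command may need a generated hostfile. Below is a minimal sketch of that step, assuming `scontrol` is available inside the job. The function name `write_dask_hostfile` and the file name `hosts.txt` are hypothetical and are not part of the commit.

```bash
#!/usr/bin/env bash
# Sketch only: assumes a Slurm allocation where `scontrol` is on PATH and
# SLURM_JOB_NODELIST holds the compressed node list (e.g. "cn[1-4]").

# Expand the compressed node list into one hostname per line and write it
# to the file given as the first argument.
# The function name and its argument are hypothetical, for illustration only.
write_dask_hostfile() {
    local hostfile="$1"
    scontrol show hostnames "${SLURM_JOB_NODELIST}" > "${hostfile}"
}

# Typical use inside a job script (not executed here):
#   write_dask_hostfile "hosts.txt"
#   dask-ssh --hostfile "hosts.txt"
```

Writing the file explicitly keeps the `dask-ssh` invocation the same shape as the PBS version, where only the hostfile source changes.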