Commit d9094b61 authored by Jan Siwiec
parent 92bcc635
Merge request !468 (Master)
...@@ -11,7 +11,8 @@ which is a free open standard cloud computing platform.
## Access
To access the cloud you must be a member of an active EUROHPC project,
or fall into the **Access Category B**, i.e. [Access For Thematic HPC Resource Utilisation][11].
The dashboard is available at [https://cloud.it4i.cz][6].
...@@ -108,6 +108,8 @@ After logging in, you will see the command prompt with the name of the cluster a
## Data Transfer
### Serial Transfer
Data in and out of the system may be transferred by SCP and SFTP protocols.
| Cluster | Port | Protocol |
...@@ -133,6 +135,14 @@ or
local $ sftp -o IdentityFile=/path/to/id_rsa username@cluster-name.it4i.cz
```
You may request the **aes256-gcm@openssh.com cipher** for more efficient SSH-based transfer:
```console
local $ scp -c aes256-gcm@openssh.com -i /path/to/id_rsa -r my-local-dir username@cluster-name.it4i.cz:directory
```
The `-c` argument may be used with `ssh`, `scp`, and `sftp`; the same cipher selection also applies to `sshfs` and `rsync` below.
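As a minimal sketch of requesting the cipher with the other tools, assuming the same key, host, and directory names as in the examples above (`my-mount-point` is a hypothetical local directory):

```console
local $ sftp -c aes256-gcm@openssh.com -i /path/to/id_rsa username@cluster-name.it4i.cz
local $ sshfs -o Ciphers=aes256-gcm@openssh.com username@cluster-name.it4i.cz:. my-mount-point
local $ rsync -e 'ssh -c aes256-gcm@openssh.com' -r my-local-dir username@cluster-name.it4i.cz:directory
```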
A very convenient way to transfer files in and out of the cluster is via the FUSE filesystem [SSHFS][b].
```console
...@@ -159,9 +169,13 @@ local $ rsync my-local-file
local $ rsync -r my-local-dir username@cluster-name.it4i.cz:directory
```
### Parallel Transfer
!!! note
    The data transfer speed is limited by the single TCP stream and single-core ssh encryption speed to about **250 MB/s** (750 MB/s in the case of the aes256-gcm@openssh.com cipher).
    Run **multiple** streams to overcome this limit.
#### Many Files
Parallel execution of multiple rsync processes utilizes multiple cores to accelerate encryption and multiple TCP streams for enhanced bandwidth.

First, set up ssh-agent single sign-on:
...@@ -182,6 +196,73 @@ local $ ls | xargs -n 2 -P 4 /bin/bash -c 'rsync "$@" username@cluster-name.it4i
The **-n** argument determines the number of files to transfer in one rsync call. Set it according to file size and count (large for many small files).
The **-P** argument determines the number of parallel rsync processes. Set it to the number of cores on your local machine.
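The ssh-agent setup commands are collapsed in this diff; as a minimal sketch, the full sequence might look like this, assuming a four-core local machine and the key path from the examples above:

```console
local $ eval $(ssh-agent)
local $ ssh-add /path/to/id_rsa
local $ cd my-local-dir
local $ ls | xargs -n 2 -P 4 /bin/bash -c 'rsync "$@" username@cluster-name.it4i.cz:directory' sh
```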
Alternatively, use [HyperQueue][11]. First, get the [HyperQueue binary][e], then run:
```console
local $ hq server start &
local $ hq worker start &
local $ find my-local-dir -type f | xargs -n 2 > jobfile
local $ hq submit --log=/dev/null --progress --each-line jobfile \
bash -c 'rsync -R $HQ_ENTRY username@cluster-name.it4i.cz:mydir'
```
Again, the **-n** argument determines the number of files to transfer in one rsync call. Set it according to file size and count (large for many small files).
#### Single Very Large File
To transfer a single very large file efficiently, we need to transfer many blocks of the file in parallel, utilizing multiple cores to accelerate ssh encryption and multiple TCP streams for enhanced bandwidth.
First, set up ssh-agent single sign on as [described above][10].
Second, start the [HyperQueue server and HyperQueue worker][f]:
```console
local $ hq server start &
local $ hq worker start &
```
Once set up, run the `hqtransfer` script listed below:
```console
local $ ./hqtransfer mybigfile username@cluster-name.it4i.cz outputpath/outputfile
```
The hqtransfer script:
```bash
#!/bin/bash
# Read input
if [ -z "$1" ]; then echo "Usage: $0 input_file ssh_destination [output_path/output_file]"; exit 1; fi
INFILE=$1
if [ -z "$2" ]; then echo "Usage: $0 input_file ssh_destination [output_path/output_file]"; exit 1; fi
DEST=$2
OUTFILE=$INFILE
if [ -n "$3" ]; then OUTFILE=$3; fi

# Calculate the number of 1GB transfer blocks
SIZE=$(($(stat --printf %s "$INFILE")/1024/1024/1024))
echo "Transferring $(($SIZE+1)) x 1GB blocks"

# Execute one task per 1GB block: dd cuts the block out of the input file
# and writes it at the same offset on the remote side
hq submit --log=/dev/null --progress --array 0-$SIZE /bin/bash -c \
"dd if=$INFILE bs=1G count=1 skip=\$HQ_TASK_ID | \
ssh -c aes256-gcm@openssh.com $DEST \
dd of=$OUTFILE bs=1G conv=notrunc seek=\$HQ_TASK_ID"
exit 0
```
Copy and paste the script into an `hqtransfer` file and set the executable flag:
```console
local $ chmod u+x hqtransfer
```
The `hqtransfer` script is ready for use.
### Data Transfer From Windows Clients
On Windows, use the [WinSCP client][c] to transfer data. The [win-sshfs client][d] provides a way to mount the cluster filesystems directly as an external disk.
## Connection Restrictions
...@@ -272,7 +353,11 @@ Now, configure the applications proxy settings to `localhost:6000`. Use port for
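The port-forwarding commands themselves are collapsed in this hunk; as a minimal sketch, a local SOCKS proxy matching the `localhost:6000` setting mentioned above could be opened with (key path illustrative):

```console
local $ ssh -i /path/to/id_rsa -D 6000 username@cluster-name.it4i.cz
```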
[7]: ../general/accessing-the-clusters/graphical-user-interface/vnc.md
[8]: ../general/accessing-the-clusters/vpn-access.md
[9]: #port-forwarding-from-compute-nodes
[10]: #many-files
[11]: ../general/hyperqueue.md
[b]: http://linux.die.net/man/1/sshfs
[c]: http://winscp.net/eng/download.php
[d]: http://code.google.com/p/win-sshfs/
[e]: https://github.com/It4innovations/hyperqueue/releases/latest
[f]: https://it4innovations.github.io/hyperqueue/stable/cheatsheet/
---
hide:
- toc
---
# Slurm Batch Jobs Examples
Below is an excerpt from the [2024 e-INFRA CZ conference][1]
describing best practices for Slurm batch calculations and data management, including examples, by Ondrej Meca.
![PDF presentation on Slurm Batch Jobs Examples](../src/5_einfra_meca.pdf){ type=application/pdf style="min-height:100vh;width:100%" }
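For orientation before opening the slides, a minimal Slurm batch script of the kind the presentation covers might look like this; the project ID, partition name, and application are illustrative, not taken from the slides:

```bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=PROJECT-ID
#SBATCH --partition=qcpu
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# load a clean environment and run the application
ml purge
./my-application
```

Submit the script with `sbatch job.sh`.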
[1]: https://www.e-infra.cz/en/e-infra-cz-conference
\ No newline at end of file
docs.it4i/img/cudaq.png: file added (30.2 KiB)
# Introduction
!!! important "KAROLINA UPGRADE AND OUTAGE"
    We are nearing the final stages of the planned upgrade/reinstallation of the Karolina cluster. This includes updates to cluster management tools and new node images with Rocky Linux 8.9. Expect new versions of kernels, libraries, and drivers on compute nodes. **We anticipate Karolina to be fully operational for users by May 9th, 2024.**
Karolina is the latest and most powerful supercomputer cluster built for IT4Innovations in Q2 of 2021. The Karolina cluster consists of 829 compute nodes, totaling 106,752 compute cores with 313 TB RAM, giving over 15.7 PFLOP/s theoretical peak performance.
# CUDA Quantum for Python
## What Is CUDA Quantum?
CUDA Quantum streamlines hybrid application development and promotes productivity and scalability in quantum computing. It offers a unified programming model designed for a hybrid setting—that is, CPUs, GPUs, and QPUs working together.
For more information, see the [official documentation][1].
## How to Install Version Without GPU Acceleration
Use (preferably in a conda environment):
```bash
pip install cuda-quantum
```
## How to Install Version With GPU Acceleration Using Conda
Run:
```bash
conda create -y -n cuda-quantum python=3.10 pip
conda install -y -n cuda-quantum -c "nvidia/label/cuda-11.8.0" cuda
conda install -y -n cuda-quantum -c conda-forge mpi4py openmpi cxx-compiler cuquantum
conda env config vars set -n cuda-quantum LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CONDA_PREFIX/envs/cuda-quantum/lib"
conda env config vars set -n cuda-quantum MPI_PATH=$CONDA_PREFIX/envs/cuda-quantum
conda run -n cuda-quantum pip install cuda-quantum
conda activate cuda-quantum
source $CONDA_PREFIX/lib/python3.10/site-packages/distributed_interfaces/activate_custom_mpi.sh
```
Then configure MPI:

```bash
export OMPI_MCA_opal_cuda_support=true OMPI_MCA_btl='^openib'
```
## How to Test Your Installation
You can test your installation by running the following script:
```python
import cudaq
kernel = cudaq.make_kernel()
qubit = kernel.qalloc()
kernel.x(qubit)
kernel.mz(qubit)
result = cudaq.sample(kernel)
print(result)  # the X gate flips the qubit, so the counts should concentrate on '1'
```
## Further Questions About the Installation
See the CUDA Quantum PyPI page at [https://pypi.org/project/cuda-quantum/][2].
## Example QNN
In *qnn_example.py*, you will find a script that loads the FashionMNIST dataset, keeps only two classes (shirts and pants), and creates a neural network with a quantum layer. This network is then trained on the training data and evaluated on the test dataset. You are free to try it on your own. Download the [QNN example][a] and rename it to `qnn_example.py`.
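Assuming the conda environment from the installation section is active and `torch`, `torchvision`, and `matplotlib` are installed, the script can be run with:

```console
$ python qnn_example.py
```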
![QNN example](../img/cudaq.png)
[1]: https://nvidia.github.io/cuda-quantum/latest/index.html
[2]: https://pypi.org/project/cuda-quantum/
[a]: ../../../src/qnn_example
\ No newline at end of file
src/qnn_example: file added
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch.autograd import Function
from torchvision import datasets, transforms
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import cudaq
from cudaq import spin
# GPU utilities
for tar in cudaq.get_targets():
    print(f'{tar.description} {tar.name} {tar.platform} {tar.simulator} {tar.num_qpus}')

cudaq.set_target("default")  # Set CUDAQ to run on GPUs
torch.cuda.is_available()  # If this is True then the NVIDIA drivers are correctly installed
torch.cuda.device_count()  # Counts the number of GPUs available
torch.cuda.current_device()
torch.cuda.get_device_name(0)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Training set
sample_count = 140

X_train = datasets.FashionMNIST(
    root="./data",
    train=True,
    download=True,
    transform=transforms.Compose([transforms.ToTensor()]),
)

# Leaving only labels 0 and 1
idx = np.append(
    np.where(X_train.targets == 0)[0][:sample_count],
    np.where(X_train.targets == 1)[0][:sample_count],
)
X_train.data = X_train.data[idx]
X_train.targets = X_train.targets[idx]
train_loader = torch.utils.data.DataLoader(X_train, batch_size=1, shuffle=True)
# Test set
sample_count = 70

X_test = datasets.FashionMNIST(
    root="./data",
    train=False,
    download=True,
    transform=transforms.Compose([transforms.ToTensor()]),
)

idx = np.append(
    np.where(X_test.targets == 0)[0][:sample_count],
    np.where(X_test.targets == 1)[0][:sample_count],
)
X_test.data = X_test.data[idx]
X_test.targets = X_test.targets[idx]
test_loader = torch.utils.data.DataLoader(X_test, batch_size=1, shuffle=True)
class QuantumCircuit:
    """This class defines the quantum circuit structure and the run method which is used to calculate an expectation value"""

    def __init__(self, qubit_count: int):
        """Define the quantum circuit in CUDA Quantum"""
        kernel, thetas = cudaq.make_kernel(list)
        self.kernel = kernel
        self.theta = thetas
        qubits = kernel.qalloc(qubit_count)
        self.kernel.h(qubits)
        # Variational gate parameters which are optimised during training
        kernel.ry(thetas[0], qubits[0])
        kernel.rx(thetas[1], qubits[0])

    def run(self, thetas: torch.tensor) -> torch.tensor:
        """Execute the quantum circuit to output an expectation value"""
        expectation = torch.tensor(cudaq.observe(self.kernel, spin.z(0),
                                                 thetas).expectation_z(),
                                   device=device)
        return expectation
class QuantumFunction(Function):
    """Allows the quantum circuit to pass data through it and compute the gradients"""

    @staticmethod
    def forward(ctx, thetas: torch.tensor, quantum_circuit,
                shift) -> torch.tensor:
        # Save shift and quantum_circuit in context to use in backward
        ctx.shift = shift
        ctx.quantum_circuit = quantum_circuit

        # Calculate exp_val
        expectation_z = ctx.quantum_circuit.run(thetas)
        ctx.save_for_backward(thetas, expectation_z)

        return expectation_z

    @staticmethod
    def backward(ctx, grad_output):
        """Backward pass computation via finite difference parameter shift"""
        thetas, expectation_z = ctx.saved_tensors

        gradients = torch.zeros(len(thetas), device=device)
        for i in range(len(thetas)):
            shift_right = torch.clone(thetas)
            shift_right[i] += ctx.shift
            shift_left = torch.clone(thetas)
            shift_left[i] -= ctx.shift

            expectation_right = ctx.quantum_circuit.run(shift_right)
            expectation_left = ctx.quantum_circuit.run(shift_left)

            gradients[i] = 0.5 * (expectation_right - expectation_left)

        return gradients * grad_output.float(), None, None
class QuantumLayer(nn.Module):
    """Encapsulates a quantum circuit and a quantum function into a quantum layer"""

    def __init__(self, shift: torch.tensor):
        super(QuantumLayer, self).__init__()
        self.quantum_circuit = QuantumCircuit(1)  # 1 qubit quantum circuit
        self.shift = shift

    def forward(self, input):
        ans = QuantumFunction.apply(input, self.quantum_circuit, self.shift)
        return ans
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # Neural network structure
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(256, 64)
        self.fc2 = nn.Linear(
            64, 2
        )  # Output a 2D tensor since we have 2 variational parameters in our quantum circuit
        self.hybrid = QuantumLayer(
            torch.tensor(np.pi / 2)
        )  # Input is the magnitude of the parameter shifts to calculate gradients

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = self.dropout(x)
        x = x.view(1, -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x).reshape(
            -1)  # Reshapes required to satisfy input dimensions to CUDAQ
        x = self.hybrid(x).reshape(-1)
        return torch.cat((x, 1 - x), -1).unsqueeze(0)
# We move our model to the CUDA device to minimise data transfer between GPU and CPU
model = Net().to(device)
print(model)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_func = nn.NLLLoss().to(device)
epochs = 20
epoch_loss = []
model.train()
for epoch in range(epochs):
    batch_loss = 0.0
    for batch_idx, (data, target) in enumerate(train_loader):  # batch training
        optimizer.zero_grad()

        data, target = data.to(device), target.to(device)

        # Forward pass
        output = model(data).to(device)

        # Calculating loss
        loss = loss_func(output, target).to(device)

        # Backward pass
        loss.backward()

        # Optimize the weights
        optimizer.step()

        batch_loss += loss.item()

    # Average loss over all batches (batch_idx is zero-based, hence +1)
    epoch_loss.append(batch_loss / (batch_idx + 1))

    print("Training [{:.0f}%]\tLoss: {:.4f}".format(
        100.0 * (epoch + 1) / epochs, epoch_loss[-1]))
plt.plot(epoch_loss)
plt.title("Hybrid NN Training Convergence")
plt.xlabel("Training Iterations")
plt.ylabel("Neg Log Likelihood Loss")
# Testing on the test set
model.eval()
with torch.no_grad():
    correct = 0
    for batch_idx, (data, target) in enumerate(test_loader):
        data, target = data.to(device), target.to(device)
        output = model(data).to(device)

        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()

        loss = loss_func(output, target)
        epoch_loss.append(loss.item())

print("Performance on test data:\n\tAccuracy: {:.1f}%".format(
    correct / len(test_loader) * 100))
...@@ -13,26 +13,11 @@ To access to S3 you must:
### Steps
1. Fill in the application at [https://einfra.cesnet.cz/allfed/registrar/?vo=VO_s3&group=s3_cl4&locale=en][b] to become a member of the VO group. For more information about the S3 object storage, see [https://du.cesnet.cz/en/navody/object_storage/osobni_s3/start][f].
2. Once the application is approved, you will be informed via email. Then please wait at least 30 minutes for our system to synchronize all data. You can then continue to the [Gatekeeper service][g] to generate your credentials; to log in to the Gatekeeper service, use your eduID.cz identity.

IT4I offers two tools for object storage management on Karolina and Barbora:
!!! note
    We recommend using the default versions installed.
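For illustration only, assuming one of those tools is the common `s3cmd` client (the tools list is collapsed in this diff, so this is not confirmed), the credentials generated by the Gatekeeper service would be used roughly like this:

```console
$ s3cmd --configure   # enter the generated access key and secret key when prompted
$ s3cmd ls s3://      # list your buckets to verify the configuration
```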
...@@ -51,9 +36,11 @@ IT4I offers two tools for object storage management on Karolina:
[4]: https://docs.it4i.cz/general/access/project-access/
[a]: https://docs.e-infra.cz/storage/object-storage/s3-service/
[b]: https://einfra.cesnet.cz/allfed/registrar/?vo=VO_s3&group=s3_cl4&locale=en
[c]: https://du.cesnet.cz/cs/navody/object_storage/cesnet_s3/start
[d]: https://www.s3express.com/kb/item26.htm
[e]: https://filesender.cesnet.cz/
[f]: https://du.cesnet.cz/en/navody/object_storage/osobni_s3/start
[g]: https://access.du.cesnet.cz/#/access
[email]: mailto:du-support@cesnet.cz
...@@ -111,6 +111,7 @@ nav:
# - Job Arrays: general/job-arrays.md
- HyperQueue: general/hyperqueue.md
# - Parallel Computing and MPI: general/karolina-mpi.md
- Slurm Batch Examples: general/slurm-batch-examples.md
- Tools:
  - Data Sharing Tools: general/tools/tools-list.md
  - OpenCode: general/tools/opencode.md
...@@ -233,6 +234,7 @@ nav:
- EESSI: software/eessi.md
- GPU:
  - NVIDIA CUDA: software/nvidia-cuda.md
  - NVIDIA CUDA Quantum: software/nvidia-cuda-q.md
  - ROCm HIP: software/nvidia-hip.md
- Intel Suite:
  - Introduction: software/intel/intel-suite/intel-parallel-studio-introduction.md
...@@ -334,6 +336,7 @@ plugins:
- archive/*.md
- prace.md
- salomon/*.md
- mkdocs-pdf
# - optimize
markdown_extensions:
...@@ -348,3 +351,4 @@ markdown_extensions:
- pymdownx.tabbed:
- footnotes
- pymdownx.superfences
- attr_list
...@@ -17,6 +17,7 @@ mkdocs-git-committers-plugin-2==1.1.2
mkdocs-git-revision-date-localized-plugin==1.2.0
mkdocs-material==9.1.12
mkdocs-material-extensions==1.1.1
mkdocs-pdf==0.1.2
nltk==3.5
packaging==20.4
Pygments==2.7.1