Commit d9094b61 authored by Jan Siwiec
parent 92bcc635
Merge request !468 (Master)
...@@ -11,7 +11,8 @@ which is a free open standard cloud computing platform.
## Access
To access the cloud you must be a member of an active EUROHPC project,
or fall into the **Access Category B**, i.e. [Access For Thematic HPC Resource Utilisation][11].
The dashboard is available at [https://cloud.it4i.cz][6].
...@@ -108,6 +108,8 @@ After logging in, you will see the command prompt with the name of the cluster a
## Data Transfer
### Serial Transfer
Data in and out of the system may be transferred by SCP and SFTP protocols.
| Cluster | Port | Protocol |
...@@ -133,6 +135,14 @@ or
local $ sftp -o IdentityFile=/path/to/id_rsa username@cluster-name.it4i.cz
```
You may request the **aes256-gcm@openssh.com cipher** for more efficient SSH-based transfer:
```console
local $ scp -c aes256-gcm@openssh.com -i /path/to/id_rsa -r my-local-dir username@cluster-name.it4i.cz:directory
```
The `-c` argument may be used with `ssh`, `scp`, and `sftp`; the same cipher selection also applies to `sshfs` and `rsync` below.
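As a minimal sketch of requesting the cipher with the other tools, assuming the same key, host, and directory names as in the examples above (`my-mount-point` is a hypothetical local directory):

```console
local $ sftp -c aes256-gcm@openssh.com -i /path/to/id_rsa username@cluster-name.it4i.cz
local $ sshfs -o Ciphers=aes256-gcm@openssh.com username@cluster-name.it4i.cz:. my-mount-point
local $ rsync -e 'ssh -c aes256-gcm@openssh.com' -r my-local-dir username@cluster-name.it4i.cz:directory
```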
A very convenient way to transfer files in and out of the cluster is via the FUSE filesystem [SSHFS][b].
```console
...@@ -159,9 +169,13 @@ local $ rsync my-local-file
local $ rsync -r my-local-dir username@cluster-name.it4i.cz:directory
```
### Parallel Transfer
!!! note
    The data transfer speed is limited by the single TCP stream and single-core ssh encryption speed to about **250 MB/s** (750 MB/s in the case of the aes256-gcm@openssh.com cipher).
    Run **multiple** streams to overcome this limit.
#### Many Files
Parallel execution of multiple rsync processes utilizes multiple cores to accelerate encryption and multiple TCP streams for enhanced bandwidth.

First, set up ssh-agent single sign-on:
...@@ -182,6 +196,73 @@ local $ ls | xargs -n 2 -P 4 /bin/bash -c 'rsync "$@" username@cluster-name.it4i
The **-n** argument determines the number of files to transfer in one rsync call. Set it according to file size and count (large for many small files).
The **-P** argument determines the number of parallel rsync processes. Set it to the number of cores on your local machine.
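The ssh-agent setup commands are collapsed in this diff; as a minimal sketch, the full sequence might look like this, assuming a four-core local machine and the key path from the examples above:

```console
local $ eval $(ssh-agent)
local $ ssh-add /path/to/id_rsa
local $ cd my-local-dir
local $ ls | xargs -n 2 -P 4 /bin/bash -c 'rsync "$@" username@cluster-name.it4i.cz:directory' sh
```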
Alternatively, use [HyperQueue][11]. First, get the [HyperQueue binary][e], then run:
```console
local $ hq server start &
local $ hq worker start &
local $ find my-local-dir -type f | xargs -n 2 > jobfile
local $ hq submit --log=/dev/null --progress --each-line jobfile \
bash -c 'rsync -R $HQ_ENTRY username@cluster-name.it4i.cz:mydir'
```
Again, the **-n** argument determines the number of files to transfer in one rsync call. Set it according to file size and count (large for many small files).
#### Single Very Large File
To transfer a single very large file efficiently, we need to transfer many blocks of the file in parallel, utilizing multiple cores to accelerate ssh encryption and multiple TCP streams for enhanced bandwidth.
First, set up ssh-agent single sign on as [described above][10].
Second, start the [HyperQueue server and HyperQueue worker][f]:
```console
local $ hq server start &
local $ hq worker start &
```
Once set up, run the `hqtransfer` script listed below:
```console
local $ ./hqtransfer mybigfile username@cluster-name.it4i.cz outputpath/outputfile
```
The hqtransfer script:
```bash
#!/bin/bash
# Read input
if [ -z "$1" ]; then echo "Usage: $0 input_file ssh_destination [output_path/output_file]"; exit 1; fi
INFILE=$1
if [ -z "$2" ]; then echo "Usage: $0 input_file ssh_destination [output_path/output_file]"; exit 1; fi
DEST=$2
OUTFILE=$INFILE
if [ -n "$3" ]; then OUTFILE=$3; fi

# Calculate the number of 1GB transfer blocks
SIZE=$(($(stat --printf %s "$INFILE")/1024/1024/1024))
echo "Transferring $(($SIZE+1)) x 1GB blocks"

# Execute one task per 1GB block: dd cuts the block out of the input file
# and writes it at the same offset on the remote side
hq submit --log=/dev/null --progress --array 0-$SIZE /bin/bash -c \
"dd if=$INFILE bs=1G count=1 skip=\$HQ_TASK_ID | \
ssh -c aes256-gcm@openssh.com $DEST \
dd of=$OUTFILE bs=1G conv=notrunc seek=\$HQ_TASK_ID"
exit 0
```
Copy and paste the script into an `hqtransfer` file and set the executable flag:
```console
local $ chmod u+x hqtransfer
```
The `hqtransfer` script is ready for use.
### Data Transfer From Windows Clients
On Windows, use the [WinSCP client][c] to transfer data. The [win-sshfs client][d] provides a way to mount the cluster filesystems directly as an external disk.
## Connection Restrictions
...@@ -272,7 +353,11 @@ Now, configure the applications proxy settings to `localhost:6000`. Use port for
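The port-forwarding commands themselves are collapsed in this hunk; as a minimal sketch, a local SOCKS proxy matching the `localhost:6000` setting mentioned above could be opened with (key path illustrative):

```console
local $ ssh -i /path/to/id_rsa -D 6000 username@cluster-name.it4i.cz
```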
[7]: ../general/accessing-the-clusters/graphical-user-interface/vnc.md
[8]: ../general/accessing-the-clusters/vpn-access.md
[9]: #port-forwarding-from-compute-nodes
[10]: #many-files
[11]: ../general/hyperqueue.md
[b]: http://linux.die.net/man/1/sshfs
[c]: http://winscp.net/eng/download.php
[d]: http://code.google.com/p/win-sshfs/
[e]: https://github.com/It4innovations/hyperqueue/releases/latest
[f]: https://it4innovations.github.io/hyperqueue/stable/cheatsheet/
---
hide:
- toc
---
# Slurm Batch Jobs Examples
Below is an excerpt from the [2024 e-INFRA CZ conference][1]
describing best practices for Slurm batch calculations and data management, including examples, by Ondrej Meca.
![PDF presentation on Slurm Batch Jobs Examples](../src/5_einfra_meca.pdf){ type=application/pdf style="min-height:100vh;width:100%" }
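For orientation before opening the slides, a minimal Slurm batch script of the kind the presentation covers might look like this; the project ID, partition name, and application are illustrative, not taken from the slides:

```bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=PROJECT-ID
#SBATCH --partition=qcpu
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# load a clean environment and run the application
ml purge
./my-application
```

Submit the script with `sbatch job.sh`.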
[1]: https://www.e-infra.cz/en/e-infra-cz-conference
\ No newline at end of file
docs.it4i/img/cudaq.png: file added (30.2 KiB)
# Introduction
!!! important "KAROLINA UPGRADE AND OUTAGE"
    We are nearing the final stages of the planned upgrade/reinstallation of the Karolina cluster. This includes updates to cluster management tools and new node images with Rocky Linux 8.9. Expect new versions of kernels, libraries, and drivers on compute nodes. **We anticipate Karolina to be fully operational for users by May 9th, 2024.**
Karolina is the latest and most powerful supercomputer cluster built for IT4Innovations in Q2 of 2021. The Karolina cluster consists of 829 compute nodes, totaling 106,752 compute cores with 313 TB RAM, giving over 15.7 PFLOP/s theoretical peak performance.
# CUDA Quantum for Python
## What Is CUDA Quantum?
CUDA Quantum streamlines hybrid application development and promotes productivity and scalability in quantum computing. It offers a unified programming model designed for a hybrid setting—that is, CPUs, GPUs, and QPUs working together.
For more information, see the [official documentation][1].
## How to Install Version Without GPU Acceleration
Use (preferably in a conda environment):
```bash
pip install cuda-quantum
```
## How to Install Version With GPU Acceleration Using Conda
Run:
```bash
conda create -y -n cuda-quantum python=3.10 pip
conda install -y -n cuda-quantum -c "nvidia/label/cuda-11.8.0" cuda
conda install -y -n cuda-quantum -c conda-forge mpi4py openmpi cxx-compiler cuquantum
conda env config vars set -n cuda-quantum LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CONDA_PREFIX/envs/cuda-quantum/lib"
conda env config vars set -n cuda-quantum MPI_PATH=$CONDA_PREFIX/envs/cuda-quantum
conda run -n cuda-quantum pip install cuda-quantum
conda activate cuda-quantum
source $CONDA_PREFIX/lib/python3.10/site-packages/distributed_interfaces/activate_custom_mpi.sh
```
Then configure MPI:

```bash
export OMPI_MCA_opal_cuda_support=true OMPI_MCA_btl='^openib'
```
## How to Test Your Installation
You can test your installation by running the following script:
```python
import cudaq
kernel = cudaq.make_kernel()
qubit = kernel.qalloc()
kernel.x(qubit)
kernel.mz(qubit)
result = cudaq.sample(kernel)
print(result)  # the X gate flips the qubit, so the counts should concentrate on '1'
```
## Further Questions About the Installation
See the CUDA Quantum PyPI page at [https://pypi.org/project/cuda-quantum/][2].
## Example QNN
In *qnn_example.py*, you will find a script that loads the FashionMNIST dataset, keeps only two classes (shirts and pants), and creates a neural network with a quantum layer. This network is then trained on the training data and evaluated on the test dataset. You are free to try it on your own. Download the [QNN example][a] and rename it to `qnn_example.py`.
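Assuming the conda environment from the installation section is active and `torch`, `torchvision`, and `matplotlib` are installed, the script can be run with:

```console
$ python qnn_example.py
```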
![QNN example](../img/cudaq.png)
[1]: https://nvidia.github.io/cuda-quantum/latest/index.html
[2]: https://pypi.org/project/cuda-quantum/
[a]: ../../../src/qnn_example
\ No newline at end of file
src/qnn_example: file added
#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch.autograd import Function
from torchvision import datasets, transforms
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import cudaq
from cudaq import spin
# GPU utilities
for tar in cudaq.get_targets():
    print(f'{tar.description} {tar.name} {tar.platform} {tar.simulator} {tar.num_qpus}')

cudaq.set_target("default")  # Set CUDAQ to run on GPUs
torch.cuda.is_available()  # If this is True then the NVIDIA drivers are correctly installed
torch.cuda.device_count()  # Counts the number of GPUs available
torch.cuda.current_device()
torch.cuda.get_device_name(0)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Training set
sample_count = 140

X_train = datasets.FashionMNIST(
    root="./data",
    train=True,
    download=True,
    transform=transforms.Compose([transforms.ToTensor()]),
)

# Leaving only labels 0 and 1
idx = np.append(
    np.where(X_train.targets == 0)[0][:sample_count],
    np.where(X_train.targets == 1)[0][:sample_count],
)
X_train.data = X_train.data[idx]
X_train.targets = X_train.targets[idx]
train_loader = torch.utils.data.DataLoader(X_train, batch_size=1, shuffle=True)
# Test set
sample_count = 70

X_test = datasets.FashionMNIST(
    root="./data",
    train=False,
    download=True,
    transform=transforms.Compose([transforms.ToTensor()]),
)

idx = np.append(
    np.where(X_test.targets == 0)[0][:sample_count],
    np.where(X_test.targets == 1)[0][:sample_count],
)
X_test.data = X_test.data[idx]
X_test.targets = X_test.targets[idx]
test_loader = torch.utils.data.DataLoader(X_test, batch_size=1, shuffle=True)
class QuantumCircuit:
    """This class defines the quantum circuit structure and the run method which is used to calculate an expectation value"""

    def __init__(self, qubit_count: int):
        """Define the quantum circuit in CUDA Quantum"""
        kernel, thetas = cudaq.make_kernel(list)
        self.kernel = kernel
        self.theta = thetas
        qubits = kernel.qalloc(qubit_count)
        self.kernel.h(qubits)
        # Variational gate parameters which are optimised during training
        kernel.ry(thetas[0], qubits[0])
        kernel.rx(thetas[1], qubits[0])

    def run(self, thetas: torch.tensor) -> torch.tensor:
        """Execute the quantum circuit to output an expectation value"""
        expectation = torch.tensor(cudaq.observe(self.kernel, spin.z(0),
                                                 thetas).expectation_z(),
                                   device=device)
        return expectation
class QuantumFunction(Function):
    """Allows the quantum circuit to pass data through it and compute the gradients"""

    @staticmethod
    def forward(ctx, thetas: torch.tensor, quantum_circuit,
                shift) -> torch.tensor:
        # Save shift and quantum_circuit in context to use in backward
        ctx.shift = shift
        ctx.quantum_circuit = quantum_circuit

        # Calculate exp_val
        expectation_z = ctx.quantum_circuit.run(thetas)
        ctx.save_for_backward(thetas, expectation_z)

        return expectation_z

    @staticmethod
    def backward(ctx, grad_output):
        """Backward pass computation via finite difference parameter shift"""
        thetas, expectation_z = ctx.saved_tensors

        gradients = torch.zeros(len(thetas), device=device)
        for i in range(len(thetas)):
            shift_right = torch.clone(thetas)
            shift_right[i] += ctx.shift
            shift_left = torch.clone(thetas)
            shift_left[i] -= ctx.shift

            expectation_right = ctx.quantum_circuit.run(shift_right)
            expectation_left = ctx.quantum_circuit.run(shift_left)

            gradients[i] = 0.5 * (expectation_right - expectation_left)

        return gradients * grad_output.float(), None, None
class QuantumLayer(nn.Module):
    """Encapsulates a quantum circuit and a quantum function into a quantum layer"""

    def __init__(self, shift: torch.tensor):
        super(QuantumLayer, self).__init__()
        self.quantum_circuit = QuantumCircuit(1)  # 1 qubit quantum circuit
        self.shift = shift

    def forward(self, input):
        ans = QuantumFunction.apply(input, self.quantum_circuit, self.shift)
        return ans
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # Neural network structure
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(256, 64)
        self.fc2 = nn.Linear(
            64, 2
        )  # Output a 2D tensor since we have 2 variational parameters in our quantum circuit
        self.hybrid = QuantumLayer(
            torch.tensor(np.pi / 2)
        )  # Input is the magnitude of the parameter shifts to calculate gradients

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = self.dropout(x)
        x = x.view(1, -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x).reshape(
            -1)  # Reshapes required to satisfy input dimensions to CUDAQ
        x = self.hybrid(x).reshape(-1)
        return torch.cat((x, 1 - x), -1).unsqueeze(0)
# We move our model to the CUDA device to minimise data transfer between GPU and CPU
model = Net().to(device)
print(model)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_func = nn.NLLLoss().to(device)
epochs = 20
epoch_loss = []
model.train()
for epoch in range(epochs):
    batch_loss = 0.0
    for batch_idx, (data, target) in enumerate(train_loader):  # batch training
        optimizer.zero_grad()

        data, target = data.to(device), target.to(device)

        # Forward pass
        output = model(data).to(device)

        # Calculating loss
        loss = loss_func(output, target).to(device)

        # Backward pass
        loss.backward()

        # Optimize the weights
        optimizer.step()

        batch_loss += loss.item()

    # Average loss over all batches (batch_idx is zero-based, hence +1)
    epoch_loss.append(batch_loss / (batch_idx + 1))

    print("Training [{:.0f}%]\tLoss: {:.4f}".format(
        100.0 * (epoch + 1) / epochs, epoch_loss[-1]))
plt.plot(epoch_loss)
plt.title("Hybrid NN Training Convergence")
plt.xlabel("Training Iterations")
plt.ylabel("Neg Log Likelihood Loss")
# Testing on the test set
model.eval()
with torch.no_grad():
    correct = 0
    for batch_idx, (data, target) in enumerate(test_loader):
        data, target = data.to(device), target.to(device)
        output = model(data).to(device)

        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()

        loss = loss_func(output, target)
        epoch_loss.append(loss.item())

print("Performance on test data:\n\tAccuracy: {:.1f}%".format(
    correct / len(test_loader) * 100))
...@@ -13,26 +13,11 @@ To access to S3 you must:
### Steps
1. Fill in the application at [https://einfra.cesnet.cz/allfed/registrar/?vo=VO_s3&group=s3_cl4&locale=en][b] to become a member of the VO group. For more information about the S3 object storage, see [https://du.cesnet.cz/en/navody/object_storage/osobni_s3/start][f].
2. Once the application is approved, you will be informed via email. Then please wait at least 30 minutes for our system to synchronize all data. You can then continue to the [Gatekeeper service][g] to generate your credentials; to log in to the Gatekeeper service, use your eduID.cz identity.

IT4I offers two tools for object storage management on Karolina and Barbora:
!!! note
    We recommend using the default versions installed.
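For illustration only, assuming one of those tools is the common `s3cmd` client (the tools list is collapsed in this diff, so this is not confirmed), the credentials generated by the Gatekeeper service would be used roughly like this:

```console
$ s3cmd --configure   # enter the generated access key and secret key when prompted
$ s3cmd ls s3://      # list your buckets to verify the configuration
```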
...@@ -51,9 +36,11 @@ IT4I offers two tools for object storage management on Karolina:
[4]: https://docs.it4i.cz/general/access/project-access/
[a]: https://docs.e-infra.cz/storage/object-storage/s3-service/
[b]: https://einfra.cesnet.cz/allfed/registrar/?vo=VO_s3&group=s3_cl4&locale=en
[c]: https://du.cesnet.cz/cs/navody/object_storage/cesnet_s3/start
[d]: https://www.s3express.com/kb/item26.htm
[e]: https://filesender.cesnet.cz/
[f]: https://du.cesnet.cz/en/navody/object_storage/osobni_s3/start
[g]: https://access.du.cesnet.cz/#/access
[email]: mailto:du-support@cesnet.cz
...@@ -111,6 +111,7 @@ nav:
# - Job Arrays: general/job-arrays.md
- HyperQueue: general/hyperqueue.md
# - Parallel Computing and MPI: general/karolina-mpi.md
- Slurm Batch Examples: general/slurm-batch-examples.md
- Tools:
  - Data Sharing Tools: general/tools/tools-list.md
  - OpenCode: general/tools/opencode.md
...@@ -233,6 +234,7 @@ nav:
- EESSI: software/eessi.md
- GPU:
  - NVIDIA CUDA: software/nvidia-cuda.md
  - NVIDIA CUDA Quantum: software/nvidia-cuda-q.md
  - ROCm HIP: software/nvidia-hip.md
- Intel Suite:
  - Introduction: software/intel/intel-suite/intel-parallel-studio-introduction.md
...@@ -334,6 +336,7 @@ plugins:
- archive/*.md
- prace.md
- salomon/*.md
- mkdocs-pdf
# - optimize
markdown_extensions:
...@@ -348,3 +351,4 @@ markdown_extensions:
- pymdownx.tabbed:
- footnotes
- pymdownx.superfences
- attr_list
...@@ -17,6 +17,7 @@ mkdocs-git-committers-plugin-2==1.1.2
mkdocs-git-revision-date-localized-plugin==1.2.0
mkdocs-material==9.1.12
mkdocs-material-extensions==1.1.1
mkdocs-pdf==0.1.2
nltk==3.5
packaging==20.4
Pygments==2.7.1