# Using IBM Power Partition
To test your application on the IBM Power partition,
you need to prepare a job script for that partition or use an interactive job:
```console
salloc -N 1 -c 192 -A PROJECT-ID -p p07-power --time=08:00:00
```
where:
- `-N 1` allocates a single node,
- `-c 192` allocates 192 cores (threads),
- `-p p07-power` selects the IBM Power partition,
- `--time=08:00:00` requests the allocation for 8 hours.
On the partition, you should reload the list of modules:
```
ml architecture/ppc64le
```
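For non-interactive runs, a job script along these lines can be submitted with `sbatch` (a minimal sketch; the project ID and application path are placeholders):
```
#!/usr/bin/env bash
#SBATCH --account=PROJECT-ID      # replace with your project ID
#SBATCH --partition=p07-power
#SBATCH --nodes=1
#SBATCH --cpus-per-task=192
#SBATCH --time=08:00:00

# Reload the module list for the ppc64le architecture
ml architecture/ppc64le

# Run your application (placeholder path)
./hello
```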
The platform offers both `GNU`-based and proprietary IBM toolchains for building applications. IBM also provides an optimized BLAS routine library ([ESSL](https://www.ibm.com/docs/en/essl/6.1)), which can be used by both toolchains.
## Building Applications
Our sample application depends on `BLAS`, so we start by loading the following modules (regardless of which toolchain we want to use):
```
ml GCC OpenBLAS
```
### GCC Toolchain
In the case of the GCC toolchain, we can go ahead and compile the application using either `g++`
```
g++ -lopenblas hello.cpp -o hello
```
or `gfortran`
```
gfortran -lopenblas hello.f90 -o hello
```
as usual.
### IBM Toolchain
The IBM toolchain requires additional environment setup, as it is installed in `/opt/ibm` and is not exposed as a module:
```
IBM_ROOT=/opt/ibm
OPENXLC_ROOT=$IBM_ROOT/openxlC/17.1.1
OPENXLF_ROOT=$IBM_ROOT/openxlf/17.1.1
export PATH=$OPENXLC_ROOT/bin:$PATH
export LD_LIBRARY_PATH=$OPENXLC_ROOT/lib:$LD_LIBRARY_PATH
export PATH=$OPENXLF_ROOT/bin:$PATH
export LD_LIBRARY_PATH=$OPENXLF_ROOT/lib:$LD_LIBRARY_PATH
```
From there, we can use either `ibm-clang++`
```
ibm-clang++ -lopenblas hello.cpp -o hello
```
or `xlf`
```
xlf -lopenblas hello.f90 -o hello
```
to build the application as usual.
!!! note
    The combination of `xlf` and `openblas` seems to cause severe performance degradation. Therefore, the `ESSL` library should be preferred (see below).
### Using ESSL Library
The [ESSL](https://www.ibm.com/docs/en/essl/6.1) library is installed in `/opt/ibm/math/essl/7.1`, so we define additional environment variables:
```
IBM_ROOT=/opt/ibm
ESSL_ROOT=${IBM_ROOT}/math/essl/7.1
export LD_LIBRARY_PATH=$ESSL_ROOT/lib64:$LD_LIBRARY_PATH
```
The simplest way to utilize `ESSL` in an application that already uses `BLAS` or `CBLAS` routines is to link against the provided `libessl.so`. This can be done by replacing `-lopenblas` with `-lessl`, or with `-lessl -lopenblas` (in case `ESSL` does not provide all required `BLAS` routines).
In practice, this can look like
```
g++ -L${ESSL_ROOT}/lib64 -lessl -lopenblas hello.cpp -o hello
```
or
```
gfortran -L${ESSL_ROOT}/lib64 -lessl -lopenblas hello.f90 -o hello
```
and similarly for the IBM compilers (`ibm-clang++` and `xlf`).
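For illustration, the analogous link step with the IBM compilers might look like this (a sketch; the exact flags may differ):
```
ibm-clang++ -L${ESSL_ROOT}/lib64 -lessl hello.cpp -o hello
xlf -L${ESSL_ROOT}/lib64 -lessl hello.f90 -o hello
```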
## Hello World Applications
The `hello world` example application (written in `C++` and `Fortran`) uses a simple stationary probability vector estimation to illustrate the use of GEMM (a BLAS 3 routine).
Stationary probability vector estimation in `C++`:
```c++
#include <iostream>
#include <vector>
#include <chrono>
#include "cblas.h"
const size_t ITERATIONS = 32;
const size_t MATRIX_SIZE = 1024;
int main(int argc, char *argv[])
{
    const size_t matrixElements = MATRIX_SIZE*MATRIX_SIZE;

    std::vector<float> a(matrixElements, 1.0f / float(MATRIX_SIZE));
    for(size_t i = 0; i < MATRIX_SIZE; ++i)
        a[i] = 0.5f / (float(MATRIX_SIZE) - 1.0f);
    a[0] = 0.5f;

    std::vector<float> w1(matrixElements, 0.0f);
    std::vector<float> w2(matrixElements, 0.0f);
    std::copy(a.begin(), a.end(), w1.begin());

    std::vector<float> *t1, *t2;
    t1 = &w1;
    t2 = &w2;

    auto c1 = std::chrono::steady_clock::now();
    for(size_t i = 0; i < ITERATIONS; ++i)
    {
        std::fill(t2->begin(), t2->end(), 0.0f);
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, MATRIX_SIZE, MATRIX_SIZE, MATRIX_SIZE,
                    1.0f, t1->data(), MATRIX_SIZE,
                    a.data(), MATRIX_SIZE,
                    1.0f, t2->data(), MATRIX_SIZE);
        std::swap(t1, t2);
    }
    auto c2 = std::chrono::steady_clock::now();

    for(size_t i = 0; i < MATRIX_SIZE; ++i)
    {
        std::cout << (*t1)[i*MATRIX_SIZE + i] << " ";
    }
    std::cout << std::endl;
    std::cout << "Elapsed Time: " << std::chrono::duration<double>(c2 - c1).count() << std::endl;

    return 0;
}
```
Stationary probability vector estimation in `Fortran`:
```fortran
program main
    implicit none

    integer :: matrix_size, iterations
    integer :: i
    real, allocatable, target :: a(:,:), w1(:,:), w2(:,:)
    real, dimension(:,:), contiguous, pointer :: t1, t2, tmp
    real, pointer :: out_data(:), out_diag(:)
    integer :: cr, cm, c1, c2

    iterations = 32
    matrix_size = 1024

    call system_clock(count_rate=cr)
    call system_clock(count_max=cm)

    allocate(a(matrix_size, matrix_size))
    allocate(w1(matrix_size, matrix_size))
    allocate(w2(matrix_size, matrix_size))

    a(:,:) = 1.0 / real(matrix_size)
    a(:,1) = 0.5 / real(matrix_size - 1)
    a(1,1) = 0.5

    w1 = a
    w2(:,:) = 0.0

    t1 => w1
    t2 => w2

    call system_clock(c1)
    do i = 0, iterations
        t2(:,:) = 0.0
        call sgemm('N', 'N', matrix_size, matrix_size, matrix_size, 1.0, t1, matrix_size, a, matrix_size, 1.0, t2, matrix_size)
        tmp => t1
        t1 => t2
        t2 => tmp
    end do
    call system_clock(c2)

    out_data(1:size(t1)) => t1
    out_diag => out_data(1::matrix_size+1)

    print *, out_diag
    print *, "Elapsed Time: ", (c2 - c1) / real(cr)

    deallocate(a)
    deallocate(w1)
    deallocate(w2)
end program main
```
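Either example can then be executed directly inside the allocation; a minimal sketch using `srun` (assuming the binary was built as `hello`, as above):
```console
srun -n 1 ./hello
```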
# Complementary Systems
Complementary Systems offer a development environment for users
who need to port and optimize their code and applications
for various hardware architectures and software technologies
that are not available on the standard clusters.
## Complementary Systems 1
The first stage of the Complementary Systems implementation comprises these partitions:
- compute partition 0 – based on ARM technology - legacy
- compute partition 1 – based on ARM technology - A64FX
- compute partition 2 – based on Intel technologies - Ice Lake, NVDIMMs + Bitware FPGAs
- compute partition 3 – based on AMD technologies - Milan, MI100 GPUs + Xilinx FPGAs
- compute partition 4 – reflecting Edge type of servers
- partition 5 – FPGA synthesis server
![](../img/cs1_1.png)
## Complementary Systems 2
The second stage of the Complementary Systems implementation comprises these partitions:
- compute partition 6 - based on ARM technology + CUDA-programmable GPGPU accelerators of the Ampere architecture + DPU network processing units
- compute partition 7 - based on IBM Power10 architecture
- compute partition 8 - modern CPU with a very high L3 cache capacity (over 750 MB)
- compute partition 9 - virtual GPU accelerated workstations
- compute partition 10 - Sapphire Rapids-HBM server
- compute partition 11 - NVIDIA Grace CPU Superchip
![](../img/cs2_2.png)
## Modules and Architecture Availability
Complementary systems list available modules automatically based on the detected architecture.
However, you can load one of the three modules -- `aarch64`, `avx2`, and `avx512` --
to reload the list of modules available for the respective architecture:
```console
[user@login.cs ~]$ ml architecture/aarch64
aarch64 modules + all modules
[user@login.cs ~]$ ml architecture/avx2
avx2 modules + all modules
[user@login.cs ~]$ ml architecture/avx512
avx512 modules + all modules
```
# Complementary System Job Scheduling
## Introduction
The [Slurm][1] workload manager is used to allocate and access Complementary Systems resources.
## Getting Partition Information
Display partitions/queues
```console
$ sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
p00-arm up 1-00:00:00 0/1/0/1 p00-arm01
p01-arm* up 1-00:00:00 0/8/0/8 p01-arm[01-08]
p02-intel up 1-00:00:00 0/2/0/2 p02-intel[01-02]
p03-amd up 1-00:00:00 0/2/0/2 p03-amd[01-02]
p04-edge up 1-00:00:00 0/1/0/1 p04-edge01
p05-synt up 1-00:00:00 0/1/0/1 p05-synt01
p06-arm up 1-00:00:00 0/2/0/2 p06-arm[01-02]
p07-power up 1-00:00:00 0/1/0/1 p07-power01
p08-amd up 1-00:00:00 0/1/0/1 p08-amd01
p10-intel up 1-00:00:00 0/1/0/1 p10-intel01
```
## Getting Job Information
Show jobs
```console
$ squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
104 p01-arm interact user R 1:48 2 p01-arm[01-02]
```
Show job details for a specific job:
```console
$ scontrol -d show job JOBID
```
Show job details for a running job from within the job session:
```console
$ scontrol -d show job $SLURM_JOBID
```
## Running Interactive Jobs
Run an interactive job:
```console
$ salloc -A PROJECT-ID -p p01-arm
```
Run an interactive job with X11 forwarding:
```console
$ salloc -A PROJECT-ID -p p01-arm --x11
```
!!! warning
    Do not use `srun` to initiate interactive jobs; subsequent `srun` and `mpirun` invocations would block forever.
## Running Batch Jobs
Run a batch job:
```console
$ sbatch -A PROJECT-ID -p p01-arm ./script.sh
```
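A minimal `script.sh` might look like the following sketch (the directives and application command are placeholders):
```
#!/usr/bin/env bash
#SBATCH --nodes=1
#SBATCH --time=02:00:00

ml architecture/aarch64   # reload the module list for the node architecture
srun ./my_application     # placeholder application
```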
Useful command options (`salloc`, `sbatch`, `srun`); a combined example follows the list:
* -n, --ntasks
* -c, --cpus-per-task
* -N, --nodes
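For example, a sketch requesting 2 nodes with 8 tasks in total and 12 CPUs per task on the A64FX partition:
```console
$ salloc -A PROJECT-ID -p p01-arm -N 2 -n 8 -c 12
```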
## Slurm Job Environment Variables
Slurm provides useful information to the job via environment variables. Environment variables are available on all nodes allocated to the job when accessed via Slurm-supported means (`srun`, compatible `mpirun`).
See all Slurm variables
```
set | grep ^SLURM
```
### Useful Variables
| variable name | description | example |
| ------ | ------ | ------ |
| SLURM_JOB_ID | job id of the executing job| 593 |
| SLURM_JOB_NODELIST | nodes allocated to the job | p03-amd[01-02] |
| SLURM_JOB_NUM_NODES | number of nodes allocated to the job | 2 |
| SLURM_STEP_NODELIST | nodes allocated to the job step | p03-amd01 |
| SLURM_STEP_NUM_NODES | number of nodes allocated to the job step | 1 |
| SLURM_JOB_PARTITION | name of the partition | p03-amd |
| SLURM_SUBMIT_DIR | submit directory | /scratch/project/open-xx-yy/work |
See [Slurm srun documentation][2] for details.
Get job nodelist
```
$ echo $SLURM_JOB_NODELIST
p03-amd[01-02]
```
Expand the nodelist to a list of nodes:
```
$ scontrol show hostnames $SLURM_JOB_NODELIST
p03-amd01
p03-amd02
```
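This can be used, for example, to iterate over the allocated nodes in a job script (a sketch):
```
for node in $(scontrol show hostnames $SLURM_JOB_NODELIST); do
    echo "allocated node: $node"
done
```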
## Modifying Jobs
```
$ scontrol update JobId=JOBID ATTR=VALUE
```
For example:
```
$ scontrol update JobId=JOBID Comment='The best job ever'
```
## Deleting Jobs
```
$ scancel JOBID
```
## Partitions
| PARTITION | nodes | whole node | cores per node | features |
| --------- | ----- | ---------- | -------------- | -------- |
| p00-arm | 1 | yes | 64 | aarch64,cortex-a72 |
| p01-arm | 8 | yes | 48 | aarch64,a64fx,ib |
| p02-intel | 2 | no | 64 | x86_64,intel,icelake,ib,fpga,bitware,nvdimm |
| p03-amd | 2 | no | 64 | x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx |
| p04-edge | 1 | yes | 16 | x86_64,intel,broadwell,ib |
| p05-synt | 1 | yes | 8 | x86_64,amd,milan,ib,ht |
| p06-arm | 2 | yes | 80 | aarch64,ib |
| p07-power | 1 | yes | 192 | ppc64le,ib |
| p08-amd | 1 | yes | 128 | x86_64,amd,milan-x,ib,ht |
| p10-intel | 1 | yes | 96 | x86_64,intel,sapphire_rapids,ht|
Use the `-t`/`--time` option to specify the job run time limit. The default job time limit is 2 hours; the maximum job time limit is 24 hours.
FIFO scheduling with backfilling is employed.
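For example, a sketch requesting a 12-hour interactive allocation on the A64FX partition:
```console
salloc -A PROJECT-ID -p p01-arm --time=12:00:00
```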
## Partition 00 - ARM (Cortex-A72)
Whole node allocation.
One node:
```console
salloc -A PROJECT-ID -p p00-arm
```
## Partition 01 - ARM (A64FX)
Whole node allocation.
One node:
```console
salloc -A PROJECT-ID -p p01-arm
```
```console
salloc -A PROJECT-ID -p p01-arm -N 1
```
Multiple nodes:
```console
salloc -A PROJECT-ID -p p01-arm -N 8
```
## Partition 02 - Intel (Ice Lake, NVDIMMs + Bitware FPGAs)
FPGAs are treated as resources. See below for more details about resources.
Partial allocation is possible - per FPGA; resource separation is not enforced.
Use only FPGAs allocated to the job!
One FPGA:
```console
salloc -A PROJECT-ID -p p02-intel --gres=fpga
```
Two FPGAs on the same node:
```console
salloc -A PROJECT-ID -p p02-intel --gres=fpga:2
```
All FPGAs:
```console
salloc -A PROJECT-ID -p p02-intel -N 2 --gres=fpga:2
```
## Partition 03 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)
GPUs and FPGAs are treated as resources. See below for more details about resources.
Partial allocation is possible - per GPU and per FPGA; resource separation is not enforced.
Use only GPUs and FPGAs allocated to the job!
One GPU:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu
```
Two GPUs on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu:2
```
Four GPUs on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu:4
```
All GPUs:
```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4
```
One FPGA:
```console
salloc -A PROJECT-ID -p p03-amd --gres=fpga
```
Two FPGAs:
```console
salloc -A PROJECT-ID -p p03-amd --gres=fpga:2
```
All FPGAs:
```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=fpga:2
```
One GPU and one FPGA on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu,fpga
```
Four GPUs and two FPGAs on the same node:
```console
salloc -A PROJECT-ID -p p03-amd --gres=gpu:4,fpga:2
```
All GPUs and FPGAs:
```console
salloc -A PROJECT-ID -p p03-amd -N 2 --gres=gpu:4,fpga:2
```
## Partition 04 - Edge Server
Whole node allocation:
```console
salloc -A PROJECT-ID -p p04-edge
```
## Partition 05 - FPGA Synthesis Server
Whole node allocation:
```console
salloc -A PROJECT-ID -p p05-synt
```
## Partition 06 - ARM
Whole node allocation:
```console
salloc -A PROJECT-ID -p p06-arm
```
## Partition 07 - IBM Power
Whole node allocation:
```console
salloc -A PROJECT-ID -p p07-power
```
## Partition 08 - AMD Milan-X
Whole node allocation:
```console
salloc -A PROJECT-ID -p p08-amd
```
## Partition 10 - Intel Sapphire Rapids
Whole node allocation:
```console
salloc -A PROJECT-ID -p p10-intel
```
## Features
Nodes have feature tags assigned to them.
Users can select nodes based on the feature tags using the `--constraint` option.
| Feature | Description |
| ------ | ------ |
| aarch64 | platform |
| x86_64 | platform |
| ppc64le | platform |
| amd | manufacturer |
| intel | manufacturer |
| icelake | processor family |
| broadwell | processor family |
| sapphire_rapids | processor family |
| milan | processor family |
| milan-x | processor family |
| ib | Infiniband |
| gpu | equipped with GPU |
| fpga | equipped with FPGA |
| nvdimm | equipped with NVDIMMs |
| ht | Hyperthreading enabled |
| noht | Hyperthreading disabled |
```
$ sinfo -o '%16N %f'
NODELIST AVAIL_FEATURES
p00-arm01 aarch64,cortex-a72
p01-arm[01-08] aarch64,a64fx,ib
p02-intel01 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,ht
p02-intel02 x86_64,intel,icelake,ib,fpga,bitware,nvdimm,noht
p03-amd02 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,noht
p03-amd01 x86_64,amd,milan,ib,gpu,mi100,fpga,xilinx,ht
p04-edge01 x86_64,intel,broadwell,ib,ht
p05-synt01 x86_64,amd,milan,ib,ht
p06-arm[01-02] aarch64,ib
p07-power01 ppc64le,ib
p08-amd01 x86_64,amd,milan-x,ib,ht
p10-intel01 x86_64,intel,sapphire_rapids,ht
```
```
$ salloc -A PROJECT-ID -p p02-intel --constraint noht
```
```
$ scontrol -d show node p02-intel02 | grep ActiveFeatures
ActiveFeatures=x86_64,intel,icelake,ib,fpga,bitware,nvdimm,noht
```
## Resources, GRES
Slurm supports the ability to define and schedule arbitrary resources - Generic RESources (GRES) in Slurm's terminology. We use GRES for scheduling/allocating GPUs and FPGAs.
!!! warning
    Use only allocated GPUs and FPGAs. Resource separation is not enforced. If you use non-allocated resources, you may observe strange behavior and run into trouble.
### Node Resources
Get information about GRES on a node:
```
$ scontrol -d show node p02-intel01 | grep Gres=
Gres=fpga:bitware_520n_mx:2
$ scontrol -d show node p02-intel02 | grep Gres=
Gres=fpga:bitware_520n_mx:2
$ scontrol -d show node p03-amd01 | grep Gres=
Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u250:2
$ scontrol -d show node p03-amd02 | grep Gres=
Gres=gpu:amd_mi100:4,fpga:xilinx_alveo_u280:2
```
### Request Resources
To allocate the required resources (GPUs or FPGAs), use the `--gres` option of `salloc`/`srun`.
Example: Allocate one FPGA
```
$ salloc -A PROJECT-ID -p p03-amd --gres fpga:1
```
### Find Out Allocated Resources
Information about allocated resources is available in the Slurm job details, in the `JOB_GRES` and `GRES` attributes.
```
$ scontrol -d show job $SLURM_JOBID |grep GRES=
JOB_GRES=fpga:xilinx_alveo_u250:1
Nodes=p03-amd01 CPU_IDs=0-1 Mem=0 GRES=fpga:xilinx_alveo_u250:1(IDX:0)
```
The IDX field in the GRES attribute specifies the index(es) of the FPGA(s) (or GPUs) allocated to the job on the node. In the given example, the allocated resource is `fpga:xilinx_alveo_u250:1(IDX:0)`, so we should use the FPGA with index 0 on node p03-amd01.
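For GPUs, one possible way to honor the allocated indexes is to restrict device visibility in the runtime (a sketch, assuming the ROCm runtime on the p03-amd MI100 nodes; the application binary is a placeholder):
```
# Restrict the ROCm runtime to the GPU index reported in the GRES IDX field (here GPU 0)
export ROCR_VISIBLE_DEVICES=0
./my_gpu_app   # placeholder application
```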
### Request Specific Resources
It is possible to allocate specific resources. This is useful for the p03-amd partition, which is equipped with FPGAs of different types.
A GRES entry uses the format `name[[:type]:count]`; in the following example, the name is `fpga`, the type is `xilinx_alveo_u280`, and the count is 2.
```
$ salloc -A PROJECT-ID -p p03-amd --gres=fpga:xilinx_alveo_u280:2
salloc: Granted job allocation XXX
salloc: Waiting for resource configuration
salloc: Nodes p03-amd02 are ready for job
$ scontrol -d show job $SLURM_JOBID | grep -i gres
JOB_GRES=fpga:xilinx_alveo_u280:2
Nodes=p03-amd02 CPU_IDs=0 Mem=0 GRES=fpga:xilinx_alveo_u280(IDX:0-1)
TresPerNode=gres:fpga:xilinx_alveo_u280:2
```
[1]: https://slurm.schedmd.com/
[2]: https://slurm.schedmd.com/srun.html#SECTION_OUTPUT-ENVIRONMENT-VARIABLES
# Accessing the DGX-2
## Before You Access
!!! warning
    GPUs are single-user devices. GPU memory is not purged between job runs and can be read (but not written) by any user. Consider the confidentiality of your running jobs.
## How to Access
The DGX-2 machine is integrated into the [Barbora cluster][3].
It can be accessed from the Barbora login nodes `barbora.it4i.cz` through the Barbora scheduler queue `qdgx` as compute node `cn202`.
## Storage
There are three shared file systems on the DGX-2 system: HOME, SCRATCH (LSCRATCH), and PROJECT.
### HOME
The HOME filesystem is realized as an NFS filesystem. This is a shared home from the [Barbora cluster][1].
### SCRATCH
The SCRATCH filesystem is realized on NVMe storage and is mounted in the `/scratch` directory.
The accessible capacity is 22 TB, shared among all users.
!!! warning
    Files on the SCRATCH filesystem that are not accessed for more than 60 days will be automatically deleted.
### PROJECT
The PROJECT data storage is IT4Innovations' central data storage accessible from all clusters.
For more information on accessing PROJECT, its quotas, etc., see the [PROJECT Data Storage][2] section.
[1]: ../../barbora/storage/#home-file-system
[2]: ../../storage/project-storage
[3]: ../../barbora/introduction
# NVIDIA DGX-2
The DGX-2 is a very powerful computational node, featuring high-end x86_64 processors and 16 NVIDIA V100-SXM3 GPUs.
| NVIDIA DGX-2 | |
| --- | --- |
| CPUs | 2 x Intel Xeon Platinum |
| GPUs | 16 x NVIDIA Tesla V100 32GB HBM2 |
| System Memory | Up to 1.5 TB DDR4 |
| GPU Memory | 512 GB HBM2 (16 x 32 GB) |
| Storage | 30 TB NVMe, Up to 60 TB |
| Networking | 8 x Infiniband or 8 x 100 GbE |
| Power | 10 kW |
| Weight | 350 lbs |
| GPU Throughput | Tensor: 1920 TFLOPs, FP16: 520 TFLOPs, FP32: 260 TFLOPs, FP64: 130 TFLOPs |
The [DGX-2][a] introduces NVIDIA’s new NVSwitch, enabling 300 GB/s chip-to-chip communication at 12 times the speed of PCIe.
With NVLink2, it enables 16x NVIDIA V100-SXM3 GPUs in a single system, for a total bandwidth going beyond 14 TB/s.
Featuring a pair of Xeon 8168 CPUs, 1.5 TB of memory, and 30 TB of NVMe storage,
we get a system that consumes 10 kW and weighs 163.29 kg, but offers double-precision performance in excess of 130 TF.
The DGX-2 is designed to be a powerful server in its own right.
On the storage side, the DGX-2 comes with 30 TB of NVMe-based solid-state storage.
For clustering or further inter-system communication, it also offers InfiniBand and 100 GigE connectivity, with up to eight links.
Further, the [DGX-2][b] offers a total of ~2 PFLOPs of half precision performance in a single system, when using the tensor cores.
![](../img/dgx1.png)
With the DGX-2, training AlexNET, the network that 'started' the latest machine learning revolution, now takes 18 minutes.
The DGX-2 is able to complete the training process
for FAIRSEQ – a neural network model for language translation – 10x faster than a DGX-1 system,
bringing it down to less than two days total rather than 15 days.
The new NVSwitches mean that the PCIe lanes of the CPUs can be redirected elsewhere, most notably towards storage and networking connectivity.
The topology of the DGX-2 means that all 16 GPUs are able to pool their memory into a unified memory space,
though with the usual tradeoffs involved if going off-chip.
![](../img/dgx2-nvlink.png)
[a]: https://www.nvidia.com/content/dam/en-zz/es_em/Solutions/Data-Center/dgx-2/nvidia-dgx-2-datasheet.pdf
[b]: https://www.youtube.com/embed/OTOGw0BRqK0
# Software Deployment
Software deployment on the DGX-2 is based on containers. NVIDIA provides a wide range of prepared Docker containers with a variety of different software. Users can easily download these containers and use them directly on the DGX-2.
The catalog of all container images can be found on the [NVIDIA site][a]. Supported software includes:
* TensorFlow
* MATLAB
* GROMACS
* Theano
* Caffe2
* LAMMPS
* ParaView
* ...
## Running Containers on DGX-2
NVIDIA expects Docker to be used as the containerization tool, but Docker is not a suitable solution in a multiuser environment. For this reason, the [Apptainer/Singularity][b] container solution is used.
Singularity can be used similarly to Docker; just change the image URL. For example, the original Docker command `docker run -it nvcr.io/nvidia/theano:18.08` becomes `singularity shell docker://nvcr.io/nvidia/theano:18.08`. More about Apptainer/Singularity [here][1].
For fast container deployment, all images are cached in the *lscratch* directory after first use. This behavior can be changed via the *SINGULARITY_CACHEDIR* environment variable, but the container start time will then increase significantly.
```console
$ ml av Singularity
---------------------------- /apps/modules/tools ----------------------------
Singularity/3.3.0
```
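For example, a container from the NGC catalog can be started like this (a sketch; the image tag and script name are illustrative, and the `--nv` flag enables GPU support in Singularity):
```console
$ ml Singularity
$ singularity exec --nv docker://nvcr.io/nvidia/tensorflow:18.08-py3 python my_script.py
```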
## MPI Modules
```console
$ ml av MPI
---------------------------- /apps/modules/mpi ----------------------------
OpenMPI/2.1.5-GCC-6.3.0-2.27 OpenMPI/3.1.4-GCC-6.3.0-2.27 OpenMPI/4.0.0-GCC-6.3.0-2.27 (D) impi/2017.4.239-iccifort-2017.7.259-GCC-6.3.0-2.27
```
## Compiler Modules
```console
$ ml av gcc
---------------------------- /apps/modules/compiler ----------------------------
GCC/6.3.0-2.27 GCCcore/6.3.0 icc/2017.7.259-GCC-6.3.0-2.27 ifort/2017.7.259-GCC-6.3.0-2.27
```
[1]: ../software/tools/singularity.md
[a]: https://ngc.nvidia.com/catalog/landing
[b]: https://www.sylabs.io/
# Migration to e-INFRA CZ
## Introduction
IT4Innovations is a part of [e-INFRA CZ][1], a strategic research infrastructure of the Czech Republic which provides capacities and resources for the transmission, storage, and processing of scientific and research data. In January 2022, IT4I began the process of integrating its services.
As a part of the process, a joint e-INFRA CZ user base has been established. This included a migration of eligible IT4I accounts.
## Who Has Been Affected
The migration affected all accounts of users affiliated with an academic organization in the Czech Republic who also have an OPEN-XX-XX project. Affected users have received an email with information about changes in personal data processing.
## Who Has Not Been Affected
Commercial users, training accounts, suppliers, and service accounts were **not** affected by the migration.
## Process
During the process, additional steps may have been required for a successful migration.
These may have included:
1. e-INFRA CZ registration, if one did not already exist.
2. e-INFRA CZ password reset, if one did not already exist.
## Steps After Migration
After the migration, you must use your **e-INFRA CZ credentials** to access all IT4I services as well as [e-INFRA CZ services][5].
Successfully migrated accounts tied to e-INFRA CZ can be self-managed at [e-INFRA CZ User profile][4].
!!! tip "Recommendation"
    We recommend [verifying your SSH keys][6] for cluster access.
## Troubleshooting
If you have a problem with your account migrated to the e-INFRA CZ user base, contact [CESNET support][7].
If you have questions or a problem with an IT4I account (i.e., an account not eligible for migration), contact [IT4I support][2].
[1]: https://www.e-infra.cz/en
[2]: mailto:support@it4i.cz
[3]: https://www.cesnet.cz/?lang=en
[4]: https://profile.e-infra.cz/
[5]: https://www.e-infra.cz/en/services
[6]: https://profile.e-infra.cz/profile/settings/sshKeys
[7]: mailto:support@cesnet.cz
# Introduction
This section provides basic information on how to gain access to IT4Innovations Information systems and project membership.
## Account Types
There are two types of accounts at IT4Innovations:
* [**e-INFRA CZ Account**][1]
  intended for all persons affiliated with an academic institution from the Czech Republic ([eduID.cz][a]).
* [**IT4I Account**][2]
  intended for all persons who are not eligible for an e-INFRA CZ account.
Once you create an account, you can use it only for communication with IT4I support and for accessing the SCS information system.
If you want to access IT4I clusters, your account must also be **assigned to a project**.
For more information, see the section:
* [**Get Project Membership**][3]
  if you want to become a collaborator on a project, or
* [**Get Project**][4]
  if you want to become a project owner.
[1]: ./einfracz-account.md
[2]: ../obtaining-login-credentials/obtaining-login-credentials.md
[3]: ../access/project-access.md
[4]: ../applying-for-resources.md
[a]: https://www.eduid.cz/