Commit 7abc5431 authored by Lukáš Krupčík's avatar Lukáš Krupčík

new file: content/docs/anselm/compute-nodes.mdx

	new file:   content/docs/anselm/hardware-overview.mdx
	new file:   content/docs/anselm/introduction.mdx
	new file:   content/docs/anselm/network.mdx
	new file:   content/docs/anselm/storage.mdx
	new file:   content/docs/apiv1.mdx
	new file:   content/docs/archive/archive-intro.mdx
	new file:   content/docs/barbora/compute-nodes.mdx
	new file:   content/docs/barbora/hardware-overview.mdx
	new file:   content/docs/barbora/introduction.mdx
	new file:   content/docs/barbora/network.mdx
	new file:   content/docs/barbora/storage.mdx
	new file:   content/docs/barbora/visualization.mdx
	new file:   content/docs/cloud/einfracz-cloud.mdx
	new file:   content/docs/cloud/it4i-cloud.mdx
	new file:   content/docs/cloud/it4i-quotas.mdx
	new file:   content/docs/cs/accessing.mdx
	new file:   content/docs/cs/guides/amd.mdx
	new file:   content/docs/cs/guides/arm.mdx
	new file:   content/docs/cs/guides/grace.mdx
	new file:   content/docs/cs/guides/hm_management.mdx
	new file:   content/docs/cs/guides/horizon.mdx
	new file:   content/docs/cs/guides/power10.mdx
	new file:   content/docs/cs/guides/xilinx.mdx
	new file:   content/docs/cs/introduction.mdx
	new file:   content/docs/cs/job-scheduling.mdx
	new file:   content/docs/cs/specifications.mdx
	new file:   content/docs/dgx2/accessing.mdx
	new file:   content/docs/dgx2/introduction.mdx
	new file:   content/docs/dgx2/job_execution.mdx
	new file:   content/docs/dgx2/software.mdx
	new file:   content/docs/dice.mdx
	new file:   content/docs/einfracz-migration.mdx
	new file:   content/docs/environment-and-modules.mdx
	new file:   content/docs/general/access/account-introduction.mdx
	new file:   content/docs/general/access/einfracz-account.mdx
	new file:   content/docs/general/access/project-access.mdx
	new file:   content/docs/general/accessing-the-clusters/graphical-user-interface/ood.mdx
	new file:   content/docs/general/accessing-the-clusters/graphical-user-interface/vnc.mdx
	new file:   content/docs/general/accessing-the-clusters/graphical-user-interface/x-window-system.mdx
	new file:   content/docs/general/accessing-the-clusters/graphical-user-interface/xorg.mdx
	new file:   content/docs/general/accessing-the-clusters/shell-access-and-data-transfer/putty.mdx
	new file:   content/docs/general/accessing-the-clusters/shell-access-and-data-transfer/ssh-key-management.mdx
	new file:   content/docs/general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys.mdx
	new file:   content/docs/general/accessing-the-clusters/tmux.mdx
	new file:   content/docs/general/accessing-the-clusters/vpn-access.mdx
	new file:   content/docs/general/applying-for-resources.mdx
	new file:   content/docs/general/aup.mdx
	new file:   content/docs/general/barbora-partitions.mdx
	new file:   content/docs/general/capacity-computing.mdx
	new file:   content/docs/general/energy.mdx
	new file:   content/docs/general/feedback.mdx
	new file:   content/docs/general/hyperqueue.mdx
	new file:   content/docs/general/job-arrays.mdx
	new file:   content/docs/general/job-priority.mdx
	new file:   content/docs/general/job-submission-and-execution.mdx
	new file:   content/docs/general/karolina-mpi.mdx
	new file:   content/docs/general/karolina-partitions.mdx
	new file:   content/docs/general/karolina-slurm.mdx
	new file:   content/docs/general/management/einfracz-profile.mdx
	new file:   content/docs/general/management/it4i-profile.mdx
	new file:   content/docs/general/obtaining-login-credentials/certificates-faq.mdx
	new file:   content/docs/general/obtaining-login-credentials/obtaining-login-credentials.mdx
	new file:   content/docs/general/pbs-job-submission-and-execution.mdx
	new file:   content/docs/general/resource-accounting.mdx
	new file:   content/docs/general/resource_allocation_and_job_execution.mdx
	new file:   content/docs/general/resources-allocation-policy.mdx
	new file:   content/docs/general/services-access.mdx
	new file:   content/docs/general/shell-and-data-access.mdx
	new file:   content/docs/general/slurm-batch-examples.mdx
	new file:   content/docs/general/slurm-job-submission-and-execution.mdx
	new file:   content/docs/general/support.mdx
	new file:   content/docs/general/tools/cicd.mdx
	new file:   content/docs/general/tools/codeit4i.mdx
	new file:   content/docs/general/tools/opencode.mdx
	new file:   content/docs/general/tools/portal-clients.mdx
	new file:   content/docs/general/tools/tools-list.mdx
	new file:   content/docs/index.mdx
	new file:   content/docs/job-features.mdx
	new file:   content/docs/karolina/compute-nodes.mdx
	new file:   content/docs/karolina/hardware-overview.mdx
	new file:   content/docs/karolina/introduction.mdx
	new file:   content/docs/karolina/network.mdx
	new file:   content/docs/karolina/storage.mdx
	new file:   content/docs/karolina/visualization.mdx
	new file:   content/docs/lumi/about.mdx
	new file:   content/docs/lumi/lumiai.mdx
	new file:   content/docs/lumi/openfoam.mdx
	new file:   content/docs/lumi/pytorch.mdx
	new file:   content/docs/lumi/software.mdx
	new file:   content/docs/lumi/support.mdx
	new file:   content/docs/prace.mdx
	new file:   content/docs/salomon/7d-enhanced-hypercube.mdx
	new file:   content/docs/salomon/compute-nodes.mdx
	new file:   content/docs/salomon/hardware-overview.mdx
	new file:   content/docs/salomon/ib-single-plane-topology.mdx
	new file:   content/docs/salomon/introduction.mdx
	new file:   content/docs/salomon/network.mdx
	new file:   content/docs/salomon/software/numerical-libraries/Clp.mdx
	new file:   content/docs/salomon/storage.mdx
	new file:   content/docs/salomon/visualization.mdx
	new file:   content/docs/software/bio/omics-master/diagnostic-component-team.mdx
	new file:   content/docs/software/bio/omics-master/overview.mdx
	new file:   content/docs/software/bio/omics-master/priorization-component-bierapp.mdx
	new file:   content/docs/software/cae/comsol/comsol-multiphysics.mdx
	new file:   content/docs/software/cae/comsol/licensing-and-available-versions.mdx
	new file:   content/docs/software/chemistry/gaussian.mdx
	new file:   content/docs/software/chemistry/molpro.mdx
	new file:   content/docs/software/chemistry/nwchem.mdx
	new file:   content/docs/software/chemistry/orca.mdx
	new file:   content/docs/software/chemistry/phono3py.mdx
	new file:   content/docs/software/chemistry/phonopy.mdx
	new file:   content/docs/software/chemistry/vasp.mdx
	new file:   content/docs/software/compilers.mdx
	new file:   content/docs/software/data-science/dask.mdx
	new file:   content/docs/software/debuggers/allinea-ddt.mdx
	new file:   content/docs/software/debuggers/allinea-performance-reports.mdx
	new file:   content/docs/software/debuggers/cube.mdx
	new file:   content/docs/software/debuggers/intel-performance-counter-monitor.mdx
	new file:   content/docs/software/debuggers/intel-vtune-amplifier.mdx
	new file:   content/docs/software/debuggers/intel-vtune-profiler.mdx
	new file:   content/docs/software/debuggers/introduction.mdx
	new file:   content/docs/software/debuggers/papi.mdx
	new file:   content/docs/software/debuggers/scalasca.mdx
	new file:   content/docs/software/debuggers/score-p.mdx
	new file:   content/docs/software/debuggers/total-view.mdx
	new file:   content/docs/software/debuggers/valgrind.mdx
	new file:   content/docs/software/debuggers/vampir.mdx
	new file:   content/docs/software/eessi.mdx
	new file:   content/docs/software/intel/intel-suite/intel-advisor.mdx
	new file:   content/docs/software/intel/intel-suite/intel-compilers.mdx
	new file:   content/docs/software/intel/intel-suite/intel-inspector.mdx
	new file:   content/docs/software/intel/intel-suite/intel-integrated-performance-primitives.mdx
	new file:   content/docs/software/intel/intel-suite/intel-mkl.mdx
	new file:   content/docs/software/intel/intel-suite/intel-parallel-studio-introduction.mdx
	new file:   content/docs/software/intel/intel-suite/intel-tbb.mdx
	new file:   content/docs/software/intel/intel-suite/intel-trace-analyzer-and-collector.mdx
	new file:   content/docs/software/isv_licenses.mdx
	new file:   content/docs/software/karolina-compilation.mdx
	new file:   content/docs/software/lang/conda.mdx
	new file:   content/docs/software/lang/csc.mdx
	new file:   content/docs/software/lang/java.mdx
	new file:   content/docs/software/lang/julialang.mdx
	new file:   content/docs/software/lang/python.mdx
	new file:   content/docs/software/machine-learning/alphafold.mdx
	new file:   content/docs/software/machine-learning/deepdock.mdx
	new file:   content/docs/software/machine-learning/introduction.mdx
	new file:   content/docs/software/machine-learning/netket.mdx
	new file:   content/docs/software/machine-learning/tensorflow.mdx
	new file:   content/docs/software/modules/lmod.mdx
	new file:   content/docs/software/modules/new-software.mdx
	new file:   content/docs/software/mpi/jupyter_mpi.mdx
	new file:   content/docs/software/mpi/mpi.mdx
	new file:   content/docs/software/mpi/mpi4py-mpi-for-python.mdx
	new file:   content/docs/software/mpi/ompi-examples.mdx
	new file:   content/docs/software/numerical-languages/introduction.mdx
	new file:   content/docs/software/numerical-languages/matlab.mdx
	new file:   content/docs/software/numerical-languages/octave.mdx
	new file:   content/docs/software/numerical-languages/opencoarrays.mdx
	new file:   content/docs/software/numerical-languages/r.mdx
	new file:   content/docs/software/numerical-libraries/fftw.mdx
	new file:   content/docs/software/numerical-libraries/gsl.mdx
	new file:   content/docs/software/numerical-libraries/hdf5.mdx
	new file:   content/docs/software/numerical-libraries/intel-numerical-libraries.mdx
	new file:   content/docs/software/numerical-libraries/petsc.mdx
	new file:   content/docs/software/nvidia-cuda-q.mdx
	new file:   content/docs/software/nvidia-cuda.mdx
	new file:   content/docs/software/nvidia-hip.mdx
	new file:   content/docs/software/sdk/nvhpc.mdx
	new file:   content/docs/software/sdk/openacc-mpi.mdx
	new file:   content/docs/software/tools/ansys/ansys-cfx.mdx
	new file:   content/docs/software/tools/ansys/ansys-fluent.mdx
	new file:   content/docs/software/tools/ansys/ansys-ls-dyna.mdx
	new file:   content/docs/software/tools/ansys/ansys-mechanical-apdl.mdx
	new file:   content/docs/software/tools/ansys/ansys.mdx
	new file:   content/docs/software/tools/ansys/licensing.mdx
	new file:   content/docs/software/tools/ansys/setting-license-preferences.mdx
	new file:   content/docs/software/tools/ansys/workbench.mdx
	new file:   content/docs/software/tools/apptainer.mdx
	new file:   content/docs/software/tools/easybuild-images.mdx
	new file:   content/docs/software/tools/easybuild.mdx
	new file:   content/docs/software/tools/singularity.mdx
	new file:   content/docs/software/tools/spack.mdx
	new file:   content/docs/software/tools/virtualization.mdx
	new file:   content/docs/software/viz/NICEDCVsoftware.mdx
	new file:   content/docs/software/viz/gpi2.mdx
	new file:   content/docs/software/viz/insitu.mdx
	new file:   content/docs/software/viz/openfoam.mdx
	new file:   content/docs/software/viz/ovito.mdx
	new file:   content/docs/software/viz/paraview.mdx
	new file:   content/docs/software/viz/qtiplot.mdx
	new file:   content/docs/software/viz/vesta.mdx
	new file:   content/docs/software/viz/vgl.mdx
	new file:   content/docs/storage/awscli.mdx
	new file:   content/docs/storage/cesnet-s3.mdx
	new file:   content/docs/storage/cesnet-storage.mdx
	new file:   content/docs/storage/nfs4-file-acl.mdx
	new file:   content/docs/storage/proj4-storage.mdx
	new file:   content/docs/storage/project-storage.mdx
	new file:   content/docs/storage/s3cmd.mdx
	new file:   content/docs/storage/standard-file-acl.mdx
	new file:   public/it4i/barbora/img/BullSequanaX.png
	new file:   public/it4i/barbora/img/BullSequanaX1120.png
	new file:   public/it4i/barbora/img/BullSequanaX410E5GPUNVLink.jpg
	new file:   public/it4i/barbora/img/BullSequanaX808.jpg
	new file:   public/it4i/barbora/img/QM8700.jpg
	new file:   public/it4i/barbora/img/XH2000.png
	new file:   public/it4i/barbora/img/bullsequanaX450-E5.png
	new file:   public/it4i/barbora/img/gpu-v100.png
	new file:   public/it4i/barbora/img/hdr.jpg
	new file:   public/it4i/barbora/img/quadrop6000.jpg
	new file:   public/it4i/cloud/.gitkeep
	new file:   public/it4i/cs/.gitkeep
	new file:   public/it4i/general/AUP-final.pdf
	new file:   public/it4i/general/Energy_saving_Karolina.pdf
	new file:   public/it4i/general/access/.gitkeep
	new file:   public/it4i/general/capacity.zip
	new file:   public/it4i/general/management/.gitkeep
	new file:   public/it4i/general/tools/.gitkeep
	new file:   public/it4i/img/49213048_2722927791082867_3152356642071248896_n.png
	new file:   public/it4i/img/7D_Enhanced_hypercube.png
	new file:   public/it4i/img/AMsetPar1.png
	new file:   public/it4i/img/Anselm-Schematic-Representation.png
	new file:   public/it4i/img/Anselmprofile.jpg
	new file:   public/it4i/img/Ansys-lic-admin.jpg
	new file:   public/it4i/img/Authorization_chain.png
	new file:   public/it4i/img/B2ACCESS_chrome_eng.jpg
	new file:   public/it4i/img/Fluent_Licence_1.jpg
	new file:   public/it4i/img/Fluent_Licence_2.jpg
	new file:   public/it4i/img/Fluent_Licence_3.jpg
	new file:   public/it4i/img/Fluent_Licence_4.jpg
	new file:   public/it4i/img/IBsingleplanetopologyAcceleratednodessmall.png
	new file:   public/it4i/img/IBsingleplanetopologyICEXMcellsmall.png
	new file:   public/it4i/img/Matlab.png
	new file:   public/it4i/img/PageantV.png
	new file:   public/it4i/img/PuTTY_host_Salomon.png
	new file:   public/it4i/img/PuTTY_host_cluster.png
	new file:   public/it4i/img/PuTTY_keyV.png
	new file:   public/it4i/img/PuTTY_open_Salomon.png
	new file:   public/it4i/img/PuTTY_open_cluster.png
	new file:   public/it4i/img/PuTTY_save_Salomon.png
	new file:   public/it4i/img/PuTTY_save_cluster.png
	new file:   public/it4i/img/PuttyKeygeneratorV.png
	new file:   public/it4i/img/PuttyKeygenerator_001V.png
	new file:   public/it4i/img/PuttyKeygenerator_002V.png
	new file:   public/it4i/img/PuttyKeygenerator_003V.png
	new file:   public/it4i/img/PuttyKeygenerator_004V.png
	new file:   public/it4i/img/PuttyKeygenerator_005V.png
	new file:   public/it4i/img/PuttyKeygenerator_006V.png
	new file:   public/it4i/img/Salomon_IB_topology.png
	new file:   public/it4i/img/Snmekobrazovky20141204v12.56.36.png
	new file:   public/it4i/img/Snmekobrazovky20151204v15.35.12.png
	new file:   public/it4i/img/Snmekobrazovky20160211v14.27.45.png
	new file:   public/it4i/img/Snmekobrazovky20160708v12.33.35.png
	new file:   public/it4i/img/TightVNC_login.png
	new file:   public/it4i/img/aai.jpg
	new file:   public/it4i/img/aai2.jpg
	new file:   public/it4i/img/aai3-passwd.jpg
	new file:   public/it4i/img/addsshkey.png
	new file:   public/it4i/img/altair_logo.svg
	new file:   public/it4i/img/anyconnectcontextmenu.jpg
	new file:   public/it4i/img/anyconnecticon.jpg
	new file:   public/it4i/img/application.png
	new file:   public/it4i/img/b2access-fill.jpg
	new file:   public/it4i/img/b2access-no_account.jpg
	new file:   public/it4i/img/b2access-select.jpg
	new file:   public/it4i/img/b2access-social.jpg
	new file:   public/it4i/img/b2access-univerzity.jpg
	new file:   public/it4i/img/b2access.jpg
	new file:   public/it4i/img/barbora_cluster_usage.png
	new file:   public/it4i/img/bio-graphs.png
	new file:   public/it4i/img/blender.png
	new file:   public/it4i/img/blender1.png
	new file:   public/it4i/img/blender2.png
	new file:   public/it4i/img/bullxB510.png
	new file:   public/it4i/img/client.jpg
	new file:   public/it4i/img/cn_m_cell.jpg
	new file:   public/it4i/img/cn_mic-1.jpg
	new file:   public/it4i/img/cn_mic.jpg
	new file:   public/it4i/img/copy_of_vpn_web_install_3.png
	new file:   public/it4i/img/crypto_v2.jpg
	new file:   public/it4i/img/cs/guides/p10_numa_sc4_flat.png
	new file:   public/it4i/img/cs/guides/p10_stream_dram.png
	new file:   public/it4i/img/cs/guides/p10_stream_hbm.png
	new file:   public/it4i/img/cs/guides/p10_stream_memkind.png
	new file:   public/it4i/img/cs1.png
	new file:   public/it4i/img/cs1_1.png
	new file:   public/it4i/img/cs2_1.png
	new file:   public/it4i/img/cs2_2.png
	new file:   public/it4i/img/cudaq.png
	new file:   public/it4i/img/cygwinX11forwarding.png
	new file:   public/it4i/img/dcv_5911_1.png
	new file:   public/it4i/img/dcv_5911_2.png
	new file:   public/it4i/img/dcvtest_5911.png
	new file:   public/it4i/img/ddt1.png
	new file:   public/it4i/img/desktop.ini
	new file:   public/it4i/img/dgx-htop.png
	new file:   public/it4i/img/dgx1.png
	new file:   public/it4i/img/dgx2-nvlink.png
	new file:   public/it4i/img/dgx2.png
	new file:   public/it4i/img/dgx3.png
	new file:   public/it4i/img/dgx4.png
	new file:   public/it4i/img/dis_clluster.png
	new file:   public/it4i/img/download.png
	new file:   public/it4i/img/downloadfilesuccessfull.jpeg
	new file:   public/it4i/img/eosc-marketplace-active.jpg
	new file:   public/it4i/img/eosc-providers.jpg
	new file:   public/it4i/img/eudat_request.jpg
	new file:   public/it4i/img/eudat_v2.jpg
	new file:   public/it4i/img/executionaccess.jpeg
	new file:   public/it4i/img/executionaccess2.jpeg
	new file:   public/it4i/img/external.png
	new file:   public/it4i/img/fairshare_formula.png
	new file:   public/it4i/img/favicon.ico
	new file:   public/it4i/img/fc_vpn_web_login.png
	new file:   public/it4i/img/fc_vpn_web_login_2_1.png
	new file:   public/it4i/img/fc_vpn_web_login_3_1.png
	new file:   public/it4i/img/fig1.png
	new file:   public/it4i/img/fig2.png
	new file:   public/it4i/img/fig3.png
	new file:   public/it4i/img/fig4.png
	new file:   public/it4i/img/fig5.png
	new file:   public/it4i/img/fig6.png
	new file:   public/it4i/img/fig7.png
	new file:   public/it4i/img/fig7x.png
	new file:   public/it4i/img/fig8.png
	new file:   public/it4i/img/fig9.png
	new file:   public/it4i/img/firstrun.jpg
	new file:   public/it4i/img/floatingip.png
	new file:   public/it4i/img/gdmdisablescreensaver.png
	new file:   public/it4i/img/gdmscreensaver.png
	new file:   public/it4i/img/git.png
	new file:   public/it4i/img/global_ramdisk.png
	new file:   public/it4i/img/glxgears.jpg
	new file:   public/it4i/img/gnome-compute-nodes-over-vnc.png
	new file:   public/it4i/img/gnome-terminal.png
	new file:   public/it4i/img/gnome_screen.png
	new file:   public/it4i/img/gpu.png
	new file:   public/it4i/img/hdl_net.jpg
	new file:   public/it4i/img/hdl_pid.jpg
	new file:   public/it4i/img/horizon.png
	new file:   public/it4i/img/hq-architecture.png
	new file:   public/it4i/img/hq-idea-s.png
	new file:   public/it4i/img/instalationfile.jpeg
	new file:   public/it4i/img/instance.png
	new file:   public/it4i/img/instance1.png
	new file:   public/it4i/img/instance2.png
	new file:   public/it4i/img/instance3.png
	new file:   public/it4i/img/instance4.png
	new file:   public/it4i/img/instance5.png
	new file:   public/it4i/img/irods-cyberduck.jpg
	new file:   public/it4i/img/irods_linking_link.jpg
	new file:   public/it4i/img/it4i-ci.png
	new file:   public/it4i/img/it4i-ci.svg
	new file:   public/it4i/img/it4i-cz-128.png
	new file:   public/it4i/img/it4i-cz-256.png
	new file:   public/it4i/img/it4i-cz-512.png
	new file:   public/it4i/img/it4i-cz.png
	new file:   public/it4i/img/it4i-en-128.png
	new file:   public/it4i/img/it4i-en-256.png
	new file:   public/it4i/img/it4i-en-512.png
	new file:   public/it4i/img/it4i-en.png
	new file:   public/it4i/img/java_detection.jpeg
	new file:   public/it4i/img/job.jpg
	new file:   public/it4i/img/job_sort_formula.png
	new file:   public/it4i/img/keypairs.png
	new file:   public/it4i/img/keypairs1.png
	new file:   public/it4i/img/legend.png
	new file:   public/it4i/img/login.jpeg
	new file:   public/it4i/img/login.png
	new file:   public/it4i/img/logingui.jpg
	new file:   public/it4i/img/loginwithprofile.jpeg
	new file:   public/it4i/img/logo.png
	new file:   public/it4i/img/logo2.png
	new file:   public/it4i/img/monitor_job.png
	new file:   public/it4i/img/mount.png
	new file:   public/it4i/img/node_gui_sshx.png
	new file:   public/it4i/img/node_gui_xwindow.png
	new file:   public/it4i/img/ood-ansys.png
	new file:   public/it4i/img/ovito_data_pipeline.png
	new file:   public/it4i/img/paraview.png
	new file:   public/it4i/img/paraview1.png
	new file:   public/it4i/img/paraview2.png
	new file:   public/it4i/img/paraview_connect.png
	new file:   public/it4i/img/paraview_connect_salomon.png
	new file:   public/it4i/img/paraview_ssh_tunnel.png
	new file:   public/it4i/img/paraview_ssh_tunnel_salomon.png
	new file:   public/it4i/img/pdf.png
	new file:   public/it4i/img/putty-tunnel.png
	new file:   public/it4i/img/puttygen.png
	new file:   public/it4i/img/puttygenconvert.png
	new file:   public/it4i/img/quality1.png
	new file:   public/it4i/img/quality2.png
	new file:   public/it4i/img/quality3.png
	new file:   public/it4i/img/report.png
	new file:   public/it4i/img/rsweb.png
	new file:   public/it4i/img/rswebsalomon.png
	new file:   public/it4i/img/salomon-1.jpeg
	new file:   public/it4i/img/salomon-2.jpg
	new file:   public/it4i/img/salomon-3.jpeg
	new file:   public/it4i/img/salomon-4.jpeg
	new file:   public/it4i/img/salomon.jpg
	new file:   public/it4i/img/scheme.png
	new file:   public/it4i/img/search_icon.png
	new file:   public/it4i/img/securityg.png
	new file:   public/it4i/img/securityg1.png
	new file:   public/it4i/img/securityg2.png
	new file:   public/it4i/img/sgi-c1104-gp1.jpeg
	new file:   public/it4i/img/sh.png
	new file:   public/it4i/img/ssh.jpg
	new file:   public/it4i/img/sshfs.png
	new file:   public/it4i/img/sshfs1.png
	new file:   public/it4i/img/sshfs2.png
	new file:   public/it4i/img/successfullconnection.jpg
	new file:   public/it4i/img/successfullinstalation.jpeg
	new file:   public/it4i/img/totalview1.png
	new file:   public/it4i/img/totalview2.png
	new file:   public/it4i/img/turbovncclientsetting.png
	new file:   public/it4i/img/uv-2000.jpeg
	new file:   public/it4i/img/virtualization-job-workflow.png
	new file:   public/it4i/img/viz1-win.png
	new file:   public/it4i/img/viz1.png
	new file:   public/it4i/img/viz2-win.png
	new file:   public/it4i/img/viz2.png
	new file:   public/it4i/img/viz3-win.png
	new file:   public/it4i/img/viz3.png
	new file:   public/it4i/img/viz4-win.png
	new file:   public/it4i/img/viz5-win.png
	new file:   public/it4i/img/viz6-win.png
	new file:   public/it4i/img/viz7-win.png
	new file:   public/it4i/img/vizsrv_5911.png
	new file:   public/it4i/img/vizsrv_logout.png
	new file:   public/it4i/img/vmware.png
	new file:   public/it4i/img/vnc.jpg
	new file:   public/it4i/img/vncviewer.png
	new file:   public/it4i/img/vpnuiV.png
	new file:   public/it4i/img/vtune-amplifier.png
	new file:   public/it4i/irods.cyberduckprofile
	new file:   public/it4i/irods_environment.json
	new file:   public/it4i/karolina/.gitkeep
	new file:   public/it4i/karolina/img/.gitkeep
	new file:   public/it4i/karolina/img/apolloproliant.png
	new file:   public/it4i/karolina/img/compute_network_topology_v2.png
	new file:   public/it4i/karolina/img/hpeapollo6500.png
	new file:   public/it4i/karolina/img/proliantdl385.png
	new file:   public/it4i/karolina/img/qrtx6000.png
	new file:   public/it4i/karolina/img/superdomeflex.png
	new file:   public/it4i/lumi/.gitkeep
	new file:   public/it4i/software/chemistry/files-nwchem/h2o.nw
	new file:   public/it4i/software/chemistry/files-phono3py/INCAR.txt
	new file:   public/it4i/software/chemistry/files-phono3py/KPOINTS.txt
	new file:   public/it4i/software/chemistry/files-phono3py/POSCAR.txt
	new file:   public/it4i/software/chemistry/files-phono3py/POTCAR.txt
	new file:   public/it4i/software/chemistry/files-phono3py/gofree-cond1.sh
	new file:   public/it4i/software/chemistry/files-phono3py/prepare.sh
	new file:   public/it4i/software/chemistry/files-phono3py/run.sh
	new file:   public/it4i/software/chemistry/files-phono3py/submit.sh
	new file:   public/it4i/software/chemistry/files-phonopy/INCAR.txt
	new file:   public/it4i/software/chemistry/files-phonopy/KPOINTS.txt
	new file:   public/it4i/software/chemistry/files-phonopy/POSCAR.txt
	new file:   public/it4i/software/chemistry/files-phonopy/mesh.conf
	new file:   public/it4i/software/data-science/imgs/dask-arch.svg
	new file:   public/it4i/software/debuggers/mympiprog_32p_2014-10-15_16-56.html
	new file:   public/it4i/software/debuggers/mympiprog_32p_2014-10-15_16-56.txt
	new file:   public/it4i/software/lang/winston.svg
	new file:   public/it4i/software/mpi/img/jupyter_new.png
	new file:   public/it4i/software/mpi/img/jupyter_ood_start.png
	new file:   public/it4i/software/mpi/img/jupyter_run.png
	new file:   public/it4i/software/mpi/img/ood_jupyter.png
	new file:   public/it4i/software/viz/insitu/CMakeLists.txt
	new file:   public/it4i/software/viz/insitu/FEAdaptor.cxx
	new file:   public/it4i/software/viz/insitu/FEAdaptor.h
	new file:   public/it4i/software/viz/insitu/FEDataStructures.cxx
	new file:   public/it4i/software/viz/insitu/FEDataStructures.h
	new file:   public/it4i/software/viz/insitu/FEDriver.cxx
	new file:   public/it4i/software/viz/insitu/feslicescript.py
	new file:   public/it4i/software/viz/insitu/img/Catalyst_connect.png
	new file:   public/it4i/software/viz/insitu/img/CoProcess.png
	new file:   public/it4i/software/viz/insitu/img/Data_shown.png
	new file:   public/it4i/software/viz/insitu/img/Extract_input.png
	new file:   public/it4i/software/viz/insitu/img/FEDriver.png
	new file:   public/it4i/software/viz/insitu/img/Finalize.png
	new file:   public/it4i/software/viz/insitu/img/Initialize.png
	new file:   public/it4i/software/viz/insitu/img/Input_pipeline.png
	new file:   public/it4i/software/viz/insitu/img/Result.png
	new file:   public/it4i/software/viz/insitu/img/Show_velocity.png
	new file:   public/it4i/software/viz/insitu/img/Simulator_response.png
	new file:   public/it4i/software/viz/insitu/img/UpdateFields.png
	new file:   public/it4i/software/viz/insitu/img/feslicescript.png
	new file:   public/it4i/software/viz/insitu/insitu.tar.gz
	new file:   public/it4i/src/IB_single-plane_topology_-_Accelerated_nodes.pdf
	new file:   public/it4i/src/IB_single-plane_topology_-_ICEX_Mcell.pdf
	new file:   public/it4i/src/css.css
	new file:   public/it4i/src/mympiprog_32p_2014-10-15_16-56.html
	new file:   public/it4i/src/mympiprog_32p_2014-10-15_16-56.txt
	new file:   public/it4i/src/ompi/Hello.java
	new file:   public/it4i/src/ompi/Ring.java
	new file:   public/it4i/src/ompi/connectivity_c.c
	new file:   public/it4i/src/ompi/hello_c.c
	new file:   public/it4i/src/ompi/hello_cxx.cc
	new file:   public/it4i/src/ompi/hello_mpifh.f
	new file:   public/it4i/src/ompi/hello_oshmem_c.c
	new file:   public/it4i/src/ompi/hello_oshmem_cxx.cc
	new file:   public/it4i/src/ompi/hello_oshmemfh.f90
	new file:   public/it4i/src/ompi/hello_usempi.f90
	new file:   public/it4i/src/ompi/hello_usempif08.f90
	new file:   public/it4i/src/ompi/ompi.tar.gz
	new file:   public/it4i/src/ompi/oshmem_circular_shift.c
	new file:   public/it4i/src/ompi/oshmem_max_reduction.c
	new file:   public/it4i/src/ompi/oshmem_shmalloc.c
	new file:   public/it4i/src/ompi/oshmem_strided_puts.c
	new file:   public/it4i/src/ompi/oshmem_symmetric_data.c
	new file:   public/it4i/src/ompi/ring_c.c
	new file:   public/it4i/src/ompi/ring_cxx.cc
	new file:   public/it4i/src/ompi/ring_mpifh.f
	new file:   public/it4i/src/ompi/ring_oshmem_c.c
	new file:   public/it4i/src/ompi/ring_oshmemfh.f90
	new file:   public/it4i/src/ompi/ring_usempi.f90
	new file:   public/it4i/src/ompi/ring_usempif08.f90
	new file:   public/it4i/src/ompi/spc_example.c
	new file:   public/it4i/src/qnn_example.txt
	new file:   public/it4i/src/srun_karolina.pdf
	new file:   public/it4i/storage/.gitkeep
	new file:   public/it4i/storage/img/file-storage-block4.png
	new file:   public/it4i/storage/img/project-storage-overview2.png
	new file:   public/it4i/storage/img/project-storage-overview3.png
	new file:   scripts/maketitle.py
	new file:   scripts/movefiles.sh
	new file:   scripts/movepublic.sh
	deleted:    scripts/preklopeni_dokumentace/html_md.sh
parent bb890de2
1 merge request: !486 new file: content/docs/anselm/compute-nodes.mdx
Showing with 2502 additions and 0 deletions
---
title: "Compute Nodes"
---
## Node Configuration
Anselm is a cluster of x86-64 Intel-based nodes built with the Bull Extreme Computing bullx technology. The cluster contains four types of compute nodes.
### Compute Nodes Without Accelerators
* 180 nodes
* 2880 cores in total
* two Intel Sandy Bridge E5-2665, 8-core, 2.4 GHz processors per node
* 64 GB of physical memory per node
* one 500 GB SATA 2.5" 7.2k RPM HDD per node
* bullx B510 blade servers
* cn[1-180]
### Compute Nodes With a GPU Accelerator
* 23 nodes
* 368 cores in total
* two Intel Sandy Bridge E5-2470, 8-core, 2.3 GHz processors per node
* 96 GB of physical memory per node
* one 500 GB SATA 2.5" 7.2k RPM HDD per node
* one NVIDIA Tesla K20m (Kepler) GPU accelerator per node
* bullx B515 blade servers
* cn[181-203]
### Compute Nodes With a MIC Accelerator
* 4 nodes
* 64 cores in total
* two Intel Sandy Bridge E5-2470, 8-core, 2.3 GHz processors per node
* 96 GB of physical memory per node
* one 500 GB SATA 2.5" 7.2k RPM HDD per node
* one Intel Xeon Phi 5110P MIC accelerator per node
* bullx B515 blade servers
* cn[204-207]
### Fat Compute Nodes
* 2 nodes
* 32 cores in total
* two Intel Sandy Bridge E5-2665, 8-core, 2.4 GHz processors per node
* 512 GB of physical memory per node
* two 300 GB SAS 3.5" 15k RPM HDDs (RAID 1) per node
* two 100 GB SLC SSDs per node
* bullx R423-E3 servers
* cn[208-209]
![](/it4i/img/bullxB510.png)
**Anselm bullx B510 servers**
### Compute Node Summary
| Node type                    | Count | Range       | Memory | Cores        | Queues                                    |
| ---------------------------- | ----- | ----------- | ------ | ------------ | ----------------------------------------- |
| Nodes without an accelerator | 180   | cn[1-180]   | 64 GB  | 16 @ 2.4 GHz | qexp, qprod, qlong, qfree, qprace, qatlas |
| Nodes with a GPU accelerator | 23    | cn[181-203] | 96 GB  | 16 @ 2.3 GHz | qnvidia, qexp                             |
| Nodes with a MIC accelerator | 4     | cn[204-207] | 96 GB  | 16 @ 2.3 GHz | qmic, qexp                                |
| Fat compute nodes            | 2     | cn[208-209] | 512 GB | 16 @ 2.4 GHz | qfat, qexp                                |
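The Queues column lists the PBS queues that may place jobs on each node type; access to some of them is granted upon request, see the Resources Allocation Policy. As a sketch, a GPU node could be requested interactively through the qnvidia queue, reusing the OPEN-0-0 placeholder project ID from the examples below:
```console
$ qsub -A OPEN-0-0 -q qnvidia -l select=1:ncpus=16 -I
```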
## Processor Architecture
Anselm is equipped with Intel Sandy Bridge processors: Intel Xeon E5-2665 (nodes without accelerators and fat nodes) and Intel Xeon E5-2470 (nodes with accelerators). The processors support the 256-bit Advanced Vector Extensions (AVX) instruction set.
### Intel Sandy Bridge E5-2665 Processor
* eight-core
* speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology
* peak performance: 19.2 GFLOP/s per core
* caches:
* L2: 256 KB per core
* L3: 20 MB per processor
* memory bandwidth at the level of the processor: 51.2 GB/s
### Intel Sandy Bridge E5-2470 Processor
* eight-core
* speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology
* peak performance: 18.4 GFLOP/s per core
* caches:
* L2: 256 KB per core
* L3: 20 MB per processor
* memory bandwidth at the level of the processor: 38.4 GB/s
Nodes equipped with the Intel Xeon E5-2665 CPU have the PBS resource attribute cpu_freq = 24 set; nodes equipped with the Intel Xeon E5-2470 CPU have cpu_freq = 23 set.
```console
$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16:cpu_freq=24 -I
```
In this example, we allocate 4 nodes with 16 cores at 2.4 GHz per node.
Intel Turbo Boost Technology is used by default. You can disable it for all nodes of a job by using the cpu_turbo_boost resource attribute.
```console
$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I
```
## Memory Architecture
In terms of memory configuration, the cluster contains three types of compute nodes.
### Compute Nodes Without Accelerators
* 2 sockets
* Memory Controllers are integrated into processors.
* 8 DDR3 DIMMs per node
* 4 DDR3 DIMMs per CPU
* 1 DDR3 DIMM per channel
* Data rate support: up to 1600MT/s
* Populated memory: 8 x 8 GB DDR3 DIMM 1600 MHz
### Compute Nodes With a GPU or MIC Accelerator
* 2 sockets
* Memory Controllers are integrated into processors.
* 6 DDR3 DIMMs per node
* 3 DDR3 DIMMs per CPU
* 1 DDR3 DIMM per channel
* Data rate support: up to 1600MT/s
* Populated memory: 6 x 16 GB DDR3 DIMM 1600 MHz
### Fat Compute Nodes
* 2 sockets
* Memory Controllers are integrated into processors.
* 16 DDR3 DIMMs per node
* 8 DDR3 DIMMs per CPU
* 2 DDR3 DIMMs per channel
* Data rate support: up to 1600MT/s
* Populated memory: 16 x 32 GB DDR3 DIMM 1600 MHz
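The memory configurations above can be checked on an allocated node with standard Linux tools; a minimal sketch:
```console
$ numactl --hardware    # NUMA nodes and per-socket memory sizes
$ free -g               # total and available memory in GB
```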
---
title: "Hardware Overview"
---
The Anselm cluster consists of 209 computational nodes named cn[1-209] of which 180 are regular compute nodes, 23 are GPU Kepler K20 accelerated nodes, 4 are MIC Xeon Phi 5110P accelerated nodes, and 2 are fat nodes. Each node is a powerful x86-64 computer, equipped with 16 cores (two eight-core Intel Sandy Bridge processors), at least 64 GB of RAM, and a local hard drive. User access to the Anselm cluster is provided by two login nodes login[1,2]. The nodes are interlinked through high speed InfiniBand and Ethernet networks. All nodes share a 320 TB /home disk for storage of user files. The 146 TB shared /scratch storage is available for scratch data.
The Fat nodes are equipped with a large amount (512 GB) of memory. Virtualization infrastructure provides resources to run long-term servers and services in virtual mode. Fat nodes and virtual servers may access 45 TB of dedicated block storage. Accelerated nodes, fat nodes, and virtualization infrastructure are available [upon request][a] from a PI.
Schematic representation of the Anselm cluster. Each box represents a node (computer) or storage capacity:
![](/it4i/img/Anselm-Schematic-Representation.png)
The cluster compute nodes cn[1-207] are organized within 13 chassis.
There are four types of compute nodes:
* 180 compute nodes without an accelerator
* 23 compute nodes with a GPU accelerator - an NVIDIA Tesla Kepler K20m
* 4 compute nodes with a MIC accelerator - an Intel Xeon Phi 5110P
* 2 fat nodes - equipped with 512 GB of RAM and two 100 GB SSD drives
[More about Compute nodes][1].
GPU and accelerated nodes are available upon request, see the [Resources Allocation Policy][2].
All of these nodes are interconnected through fast InfiniBand and Ethernet networks. [More about the Network][3].
Every chassis provides an InfiniBand switch, marked **isw**, connecting all nodes in the chassis, as well as connecting the chassis to the upper level switches.
All of the nodes share a 320 TB /home disk for storage of user files. The 146 TB shared /scratch storage is available for scratch data. These file systems are provided by the Lustre parallel file system. There is also local disk storage available on all compute nodes in /lscratch. [More about Storage][4].
User access to the Anselm cluster is provided by two login nodes login1, login2, and data mover node dm1. [More about accessing the cluster][5].
The parameters are summarized in the following tables:
| **In general** | |
| ------------------------------------------- | -------------------------------------------- |
| Primary purpose | High Performance Computing |
| Architecture of compute nodes | x86-64 |
| Operating system | Linux (CentOS) |
| [**Compute nodes**][1] | |
| Total | 209 |
| Processor cores | 16 (2 x 8 cores) |
| RAM | min. 64 GB, min. 4 GB per core |
| Local disk drive | yes - usually 500 GB |
| Compute network | InfiniBand QDR, fully non-blocking, fat-tree |
| w/o accelerator | 180, cn[1-180] |
| GPU accelerated | 23, cn[181-203] |
| MIC accelerated | 4, cn[204-207] |
| Fat compute nodes | 2, cn[208-209] |
| **In total** | |
| Total theoretical peak performance (Rpeak) | 94 TFLOP/s |
| Total max. LINPACK performance (Rmax) | 73 TFLOP/s |
| Total amount of RAM | 15.136 TB |
| Node | Processor | Memory | Accelerator |
| ---------------- | --------------------------------------- | ------ | -------------------- |
| w/o accelerator | 2 x Intel Sandy Bridge E5-2665, 2.4 GHz | 64 GB | - |
| GPU accelerated | 2 x Intel Sandy Bridge E5-2470, 2.3 GHz | 96 GB | NVIDIA Kepler K20m |
| MIC accelerated | 2 x Intel Sandy Bridge E5-2470, 2.3 GHz | 96 GB | Intel Xeon Phi 5110P |
| Fat compute node | 2 x Intel Sandy Bridge E5-2665, 2.4 GHz | 512 GB | - |
For more details, refer to [Compute nodes][1], [Storage][4], and [Network][3].
[1]: compute-nodes.md
[2]: ../general/resources-allocation-policy.md
[3]: network.md
[4]: storage.md
[5]: ../general/shell-and-data-access.md
[a]: https://support.it4i.cz/rt
---
title: "Introduction"
---
Welcome to the Anselm supercomputer cluster. The Anselm cluster consists of 209 compute nodes, totaling 3344 compute cores with 15 TB RAM, giving over 94 TFLOP/s theoretical peak performance. Each node is a powerful x86-64 computer, equipped with 16 cores, at least 64 GB of RAM, and a 500 GB hard disk drive. Nodes are interconnected through a fully non-blocking fat-tree InfiniBand network and are equipped with Intel Sandy Bridge processors. A few nodes are also equipped with NVIDIA Kepler GPU or Intel Xeon Phi MIC accelerators. Read more in [Hardware Overview][1].
Anselm runs with an operating system compatible with the Red Hat [Linux family][a]. We have installed a wide range of software packages targeted at different scientific domains. These packages are accessible via the [modules environment][2].
The user data shared file-system (HOME, 320 TB) and job data shared file-system (SCRATCH, 146 TB) are available to users.
The PBS Professional workload manager provides [computing resources allocations and job execution][3].
Read more on how to [apply for resources][4], [obtain login credentials][5] and [access the cluster][6].
[1]: hardware-overview.md
[2]: ../environment-and-modules.md
[3]: ../general/resources-allocation-policy.md
[4]: ../general/applying-for-resources.md
[5]: ../general/obtaining-login-credentials/obtaining-login-credentials.md
[6]: ../general/shell-and-data-access.md
[a]: http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg
---
title: "Network"
---
All of the compute and login nodes of Anselm are interconnected through an [InfiniBand][a] QDR network and a Gigabit [Ethernet][b] network. Both networks may be used to transfer user data.
## InfiniBand Network
All of the compute and login nodes of Anselm are interconnected through a high-bandwidth, low-latency [InfiniBand][a] QDR network (IB 4 x QDR, 40 Gbps). The network topology is a fully non-blocking fat-tree.
The compute nodes may be accessed via the InfiniBand network using the ib0 network interface, in the address range 10.2.1.1-209. MPI may be used to establish native InfiniBand connections among the nodes.
!!! note
The network provides **2170 MB/s** transfer rates via the TCP connection (single stream) and up to **3600 MB/s** via the native InfiniBand protocol.
The Fat tree topology ensures that peak transfer rates are achieved between any two nodes, independent of network traffic exchanged among other nodes concurrently.
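To get a rough idea of the achievable TCP throughput between two allocated nodes yourself, a simple point-to-point test can be run over the IPoIB interface; a sketch assuming the iperf3 utility is available on the nodes (treat this as an assumption; it may need to be installed or loaded first):
```console
cn110$ iperf3 -s                # start a server on one node
cn108$ iperf3 -c 10.2.1.110     # measure TCP throughput from another node over ib0 (IPoIB)
```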
## Ethernet Network
The compute nodes may be accessed via the regular Gigabit Ethernet network interface eth0, in the address range 10.1.1.1-209, or by using aliases cn1-cn209. The network provides **114 MB/s** transfer rates via the TCP connection.
## Example
In this example, we access the node cn110 through the InfiniBand network via the ib0 interface, then from cn110 to cn108 through the Ethernet network.
```console
$ qsub -q qexp -l select=4:ncpus=16 -N Name0 ./myjob
$ qstat -n -u username
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
15209.srv11     username qexp     Name0        5530   4  64    --  01:00 R 00:00
   cn17/0*16+cn108/0*16+cn109/0*16+cn110/0*16
$ ssh 10.2.1.110
$ ssh 10.1.1.108
```
[a]: http://en.wikipedia.org/wiki/InfiniBand
[b]: http://en.wikipedia.org/wiki/Ethernet
---
title: "API Placeholder"
---
This page is created automatically from the API source code.
---
title: "Introduction"
---
This section contains documentation of IT4Innovations' decommissioned supercomputers and services.
## Salomon
The second supercomputer, built by SGI (now Hewlett Packard Enterprise), was launched in 2015. With a performance of 2 PFLOP/s, it was immediately included in the TOP500 list, which ranks the world's most powerful supercomputers. It stayed there until November 2020, falling from 40th place to 460th.
Salomon was decommissioned at the end of 2021, after six years of operation.
### Interesting Facts
| Salomon's facts | |
| ---------------------------- | ------------------ |
| In operation | Q2 2015 - Q4 2021 |
| Theoretical peak performance | 2 PFLOP/s |
| Number of nodes | 1,008 |
| HOME storage capacity | 500 TB |
| SCRATCH storage capacity | 1,638 TB |
| Projects computed | 1,085 |
| Computing jobs run | ca. 8,700,000 |
| Corehours used | ca. 1,014,000,000 |
## Anselm
The first supercomputer, built by Atos, was launched in 2013. For the first 3 years, it was placed in makeshift containers on the campus of VSB – Technical University of Ostrava, and was subsequently moved to the data room of the newly constructed IT4Innovations building. Anselm's computational resources were available to Czech and foreign students and scientists in fields such as material sciences, computational chemistry, biosciences, and engineering.
At the end of January 2021, after more than seven years, its operation permanently ceased. In the future, it will be a part of the [World of Civilization exhibition][a] in Lower Vitkovice.
### Interesting Facts
| Anselm's facts | |
| ---------------------------- | ------------------ |
| Cost | 90,000,000 CZK |
| In operation | Q2 2013 - Q1 2021 |
| Theoretical peak performance | 94 TFLOP/s |
| Number of nodes | 209 |
| HOME storage capacity | 320 TB |
| SCRATCH storage capacity | 146 TB |
| Projects computed | 725 |
| Computing jobs run | 2,630,567 |
| Corehours used | 134,130,309 |
| Power consumption | 77 kW |
## PRACE
The Partnership for Advanced Computing in Europe (PRACE) aims to facilitate access to a research infrastructure that enables high-impact scientific discovery and engineering research and development across all disciplines, enhancing European competitiveness for the benefit of society. For more information, see the [official website][b].
[a]: https://www.dolnivitkovice.cz/en/science-and-technology-centre/exhibitions/
[b]: https://prace-ri.eu/
---
title: "Compute Nodes"
---
Barbora is a cluster of x86-64 Intel-based nodes built with the BullSequana Computing technology.
The cluster contains three types of compute nodes.
## Compute Nodes Without Accelerators
* 192 nodes
* 6912 cores in total
* 2x Intel Cascade Lake 6240, 18-core, 2.6 GHz processors per node
* 192 GB DDR4 2933 MT/s of physical memory per node (12x16 GB)
* BullSequana X1120 blade servers
* 2995.2 GFLOP/s per compute node
* 1x 1 Gb Ethernet port
* 1x HDR100 IB port
* 3 compute nodes per X1120 blade server
* cn[1-192]
![](/it4i/barbora/img/BullSequanaX1120.png)
## Compute Nodes With a GPU Accelerator
* 8 nodes
* 192 cores in total
* two Intel Skylake Gold 6126, 12-core, 2.6 GHz processors per node
* 192 GB DDR4 2933 MT/s ECC physical memory per node (12x16 GB)
* 4x NVIDIA Tesla V100-SXM2 GPU accelerators per node
* BullSequana X410-E5 NVLink-V blade servers
* 1996.8 GFLOP/s per compute node
* GPU-to-GPU all-to-all NVLink 2.0, GPUDirect
* 1x 1 Gb Ethernet port
* 2x HDR100 IB ports
* cn[193-200]
![](/it4i/barbora/img/BullSequanaX410E5GPUNVLink.jpg)
## Fat Compute Node
* 1x BullSequana X808 server
* 128 cores in total
* 8x Intel Skylake Platinum 8153, 16-core, 2.0 GHz, 125 W
* 6144 GiB DDR4 2667 MT/s of physical memory per node (96x64 GB)
* 2x HDR100 IB ports
* 8192 GFLOP/s
* cn[201]
![](/it4i/barbora/img/BullSequanaX808.jpg)
## Compute Node Summary
| Node type | Count | Range | Memory | Cores |
| ---------------------------- | ----- | ----------- | -------- | ------------- |
| Nodes without an accelerator | 192 | cn[1-192] | 192 GB | 36 @ 2.6 GHz |
| Nodes with a GPU accelerator | 8 | cn[193-200] | 192 GB | 24 @ 2.6 GHz |
| Fat compute nodes | 1 | cn[201] | 6144 GiB | 128 @ 2.0 GHz |
## Processor Architecture
Barbora is equipped with Intel Cascade Lake Xeon Gold 6240 processors (nodes without accelerators),
Intel Skylake Gold 6126 processors (nodes with GPU accelerators), and Intel Skylake Platinum 8153 processors (the fat node).
### Intel [Cascade Lake 6240][d]
The Cascade Lake core is largely identical to that of [Skylake][a].
For in-depth detail of the Skylake core/pipeline, see [Skylake (client) § Pipeline][b].
The Xeon Gold 6240 is a 64-bit 18-core x86 multi-socket high-performance server microprocessor introduced by Intel in 2019. This chip supports up to 4-way multiprocessing. The Gold 6240, which is based on the Cascade Lake microarchitecture and is manufactured on a 14 nm process, sports two AVX-512 FMA units as well as three Ultra Path Interconnect links. This microprocessor operates at 2.6 GHz with a TDP of 150 W and a turbo boost frequency of up to 3.9 GHz, and supports up to 1 TB of hexa-channel DDR4-2933 ECC memory.
* **Family**: Xeon Gold
* **Cores**: 18
* **Threads**: 36
* **L1I Cache**: 576 KiB, 18x32 KiB, 8-way set associative
* **L1D Cache**: 576 KiB, 18x32 KiB, 8-way set associative, write-back
* **L2 Cache**: 18 MiB, 18x1 MiB, 16-way set associative, write-back
* **L3 Cache**: 24.75 MiB, 18x1.375 MiB, 11-way set associative, write-back
* **Instructions**: x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA3, F16C, BMI, BMI2, VT-x, VT-d, TXT, TSX, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVE, SGX, MPX, AVX-512 (New instructions for [Vector Neural Network Instructions][c])
* **Frequency**: 2.6 GHz
* **Max turbo**: 3.9 GHz
* **Process**: 14 nm
* **TDP**: 150 W
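The instruction-set extensions listed above can be verified directly on a compute node; a minimal sketch using standard Linux tools:
```console
$ lscpu | grep -i "model name"
$ grep -m1 flags /proc/cpuinfo | tr ' ' '\n' | grep -E "avx512f|avx512_vnni"   # AVX-512 and VNNI flags
```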
### Intel [Skylake Gold 6126][e]
Xeon Gold 6126 is a 64-bit dodeca-core x86 multi-socket high performance server microprocessor introduced by Intel in mid-2017. This chip supports up to 4-way multiprocessing. The Gold 6126, which is based on the server configuration of the Skylake microarchitecture and is manufactured on a 14 nm+ process, sports 2 AVX-512 FMA units as well as three Ultra Path Interconnect links. This microprocessor, which operates at 2.6 GHz with a TDP of 125 W and a turbo boost frequency of up to 3.7 GHz, supports up to 768 GiB of hexa-channel DDR4-2666 ECC memory.
* **Family**: Xeon Gold
* **Cores**: 12
* **Threads**: 24
* **L1I Cache**: 384 KiB, 12x32 KiB, 8-way set associative
* **L1D Cache**: 384 KiB, 12x32 KiB, 8-way set associative, write-back
* **L2 Cache**: 12 MiB, 12x1 MiB, 16-way set associative, write-back
* **L3 Cache**: 19.25 MiB, 14x1.375 MiB, 11-way set associative, write-back
* **Instructions**: x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA3, F16C, BMI, BMI2, VT-x, VT-d, TXT, TSX, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVE, SGX, MPX, AVX-512
* **Frequency**: 2.6 GHz
* **Max turbo**: 3.7 GHz
* **Process**: 14 nm
* **TDP**: 125 W
### Intel [Skylake Platinum 8153][f]
Xeon Platinum 8153 is a 64-bit 16-core x86 multi-socket highest performance server microprocessor introduced by Intel in mid-2017. This chip supports up to 8-way multiprocessing. The Platinum 8153, which is based on the server configuration of the Skylake microarchitecture and is manufactured on a 14 nm+ process, sports 2 AVX-512 FMA units as well as three Ultra Path Interconnect links. This microprocessor, which operates at 2 GHz with a TDP of 125 W and a turbo boost frequency of up to 2.8 GHz, supports up to 768 GiB of hexa-channel DDR4-2666 ECC memory.
* **Family**: Xeon Platinum
* **Cores**: 16
* **Threads**: 32
* **L1I Cache**: 512 KiB, 16x32 KiB, 8-way set associative
* **L1D Cache**: 512 KiB, 16x32 KiB, 8-way set associative, write-back
* **L2 Cache**: 16 MiB, 16x1 MiB, 16-way set associative, write-back
* **L3 Cache**: 22 MiB, 16x1.375 MiB, 11-way set associative, write-back
* **Instructions**: x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA3, F16C, BMI, BMI2, VT-x, VT-d, TXT, TSX, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVE, SGX, MPX, AVX-512
* **Frequency**: 2.0 GHz
* **Max turbo**: 2.8 GHz
* **Process**: 14 nm
* **TDP**: 125 W
## GPU Accelerator
Each GPU-accelerated node of Barbora is equipped with four [NVIDIA Tesla V100-SXM2][g] accelerators.
![](/it4i/barbora/img/gpu-v100.png)
| NVIDIA Tesla V100-SXM2 | |
| ---------------------------- | -------------------------------------- |
| GPU Architecture | NVIDIA Volta |
| NVIDIA Tensor Cores | 640 |
| NVIDIA CUDA® Cores | 5120 |
| Double-Precision Performance | 7.8 TFLOP/s |
| Single-Precision Performance | 15.7 TFLOP/s |
| Tensor Performance | 125 TFLOP/s |
| GPU Memory | 16 GB HBM2 |
| Memory Bandwidth | 900 GB/sec |
| ECC | Yes |
| Interconnect Bandwidth | 300 GB/sec |
| System Interface | NVIDIA NVLink |
| Form Factor | SXM2 |
| Max Power Consumption | 300 W |
| Thermal Solution | Passive |
| Compute APIs                 | CUDA, DirectCompute, OpenCL, OpenACC   |
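Once a GPU node is allocated, the V100 accelerators and their NVLink topology can be inspected with `nvidia-smi`; a minimal sketch (driver and CUDA versions in the output depend on the node image):
```console
$ nvidia-smi            # list the four V100-SXM2 GPUs, their memory, and utilization
$ nvidia-smi topo -m    # show the NVLink/PCIe topology matrix between GPUs
```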
[a]: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)#Core
[b]: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Pipeline
[c]: https://en.wikichip.org/wiki/x86/avx512vnni
[d]: https://en.wikichip.org/wiki/intel/xeon_gold/6240
[e]: https://en.wikichip.org/wiki/intel/xeon_gold/6126
[f]: https://en.wikichip.org/wiki/intel/xeon_platinum/8153
[g]: https://images.nvidia.com/content/technologies/volta/pdf/tesla-volta-v100-datasheet-letter-fnl-web.pdf
---
title: "Hardware Overview"
---
The Barbora cluster consists of 201 computational nodes named **cn[001-201]**
of which 192 are regular compute nodes, 8 are GPU Tesla V100 accelerated nodes and 1 is a fat node.
Each node is a powerful x86-64 computer equipped with 36, 24, or 128 cores
(2x 18-core Intel Cascade Lake 6240, 2x 12-core Intel Skylake Gold 6126, or 8x 16-core Intel Skylake Platinum 8153) and at least 192 GB of RAM.
User access to the Barbora cluster is provided by two login nodes **login[1,2]**.
The nodes are interlinked through high speed InfiniBand and Ethernet networks.
The fat node is equipped with 6144 GB of memory.
Virtualization infrastructure provides resources for running long-term servers and services in virtual mode.
The Accelerated nodes, fat node, and virtualization infrastructure are available [upon request][a] from a PI.
**There are three types of compute nodes:**
* 192 compute nodes without an accelerator
* 8 compute nodes with a GPU accelerator - 4x NVIDIA Tesla V100-SXM2
* 1 fat node - equipped with 6144 GB of RAM
[More about compute nodes][1].
GPU and accelerated nodes are available upon request, see the [Resources Allocation Policy][2].
All of these nodes are interconnected through fast InfiniBand and Ethernet networks.
[More about the computing network][3].
Every chassis provides an InfiniBand switch, marked **isw**, connecting all nodes in the chassis,
as well as connecting the chassis to the upper level switches.
User access to Barbora is provided by two login nodes: login1 and login2.
[More about accessing the cluster][5].
The parameters are summarized in the following tables:
| **In general** | |
| ------------------------------------------- | -------------------------------------------- |
| Primary purpose | High Performance Computing |
| Architecture of compute nodes | x86-64 |
| Operating system | Linux |
| [**Compute nodes**][1] | |
| Total | 201 |
| Processor cores | 36/24/128 (2x18 cores/2x12 cores/8x16 cores) |
| RAM | min. 192 GB |
| Local disk drive | no |
| Compute network | InfiniBand HDR |
| w/o accelerator | 192, cn[001-192] |
| GPU accelerated | 8, cn[193-200] |
| Fat compute nodes | 1, cn[201] |
| **In total** | |
| Total theoretical peak performance (Rpeak) | 848.8448 TFLOP/s |
| Total amount of RAM | 44.544 TB |
| Node | Processor | Memory | Accelerator |
| ---------------- | --------------------------------------- | ------ | ---------------------- |
| Regular node     | 2x Intel Cascade Lake 6240, 2.6 GHz     | 192 GB  | -                      |
| GPU accelerated  | 2x Intel Skylake Gold 6126, 2.6 GHz     | 192 GB  | NVIDIA Tesla V100-SXM2 |
| Fat compute node | 2x Intel Skylake Platinum 8153, 2.0 GHz | 6144 GB | -                      |
For more details refer to the sections [Compute Nodes][1], [Storage][4], [Visualization Servers][6], and [Network][3].
[1]: compute-nodes.md
[2]: ../general/resources-allocation-policy.md
[3]: network.md
[4]: storage.md
[5]: ../general/shell-and-data-access.md
[6]: visualization.md
[a]: https://support.it4i.cz/rt
---
title: "Introduction"
---
Welcome to Barbora supercomputer cluster. The Barbora cluster consists of 201 compute nodes, totaling 7232 compute cores with 44544 GB RAM, giving over 848 TFLOP/s theoretical peak performance.
Nodes are interconnected through a fully non-blocking fat-tree InfiniBand network, and are equipped with Intel Cascade Lake processors. A few nodes are also equipped with NVIDIA Tesla V100-SXM2. Read more in [Hardware Overview][1].
The cluster runs with an operating system compatible with the Red Hat [Linux family][a]. We have installed a wide range of software packages targeted at different scientific domains. These packages are accessible via the [modules environment][2].
The user data shared file system and job data shared file system are available to users.
The [Slurm][b] workload manager provides [computing resources allocations and job execution][3].
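As a brief illustration of the Slurm workflow described on the linked pages, jobs are requested with `salloc` (interactive) or `sbatch` (batch); a minimal sketch, where the project ID OPEN-00-00 and the partition name qcpu are placeholders to be replaced by your own project and a partition from the Barbora partitions overview:
```console
$ salloc -A OPEN-00-00 -p qcpu -N 1 --time=01:00:00          # interactive allocation of one node
$ sbatch -A OPEN-00-00 -p qcpu -N 1 --time=01:00:00 job.sh   # submit the batch script job.sh
```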
Read more on how to [apply for resources][4], [obtain login credentials][5] and [access the cluster][6].
![](/it4i/barbora/img/BullSequanaX.png)
[1]: hardware-overview.md
[2]: ../environment-and-modules.md
[3]: ../general/resources-allocation-policy.md
[4]: ../general/applying-for-resources.md
[5]: ../general/obtaining-login-credentials/obtaining-login-credentials.md
[6]: ../general/shell-and-data-access.md
[a]: http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg
[b]: https://slurm.schedmd.com/
---
title: "Network"
---
All of the compute and login nodes of Barbora are interconnected through an [InfiniBand][a] HDR 200 Gbps network and a Gigabit Ethernet network.
Compute nodes and the service infrastructure are connected by HDR100 technology,
which allows one 200 Gbps HDR port (an aggregation of 4x 50 Gbps) to be divided into two HDR100 ports, each with 100 Gbps (2x 50 Gbps) bandwidth.
The cabling between the L1 and L2 layers is realized with HDR cabling;
the end devices are connected with so-called Y (splitter) cables (1x HDR200 to 2x HDR100).
![](/it4i/barbora/img/hdr.jpg)
**The computing network implemented in this way has the following parameters:**
* 100 Gbps bandwidth
* Latencies below 10 microseconds (0.6 μs end-to-end, <90 ns per switch hop)
* Adaptive routing support
* MPI communication support
* IP protocol support (IPoIB)
* Support for SCRATCH Data Storage and NVMe over Fabric Data Storage.
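On an allocated node, the HDR100 link state and the IPoIB interface described above can be checked with common InfiniBand tools; a minimal sketch, assuming the standard Mellanox/OFED utilities are present on the node:
```console
$ ibstat                # adapter and port state, link rate (100 Gb/s for HDR100)
$ ip addr show ib0      # IP address assigned to the IPoIB interface
```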
## Mellanox QM8700 40-Ports Switch
**Performance**
* 40x HDR 200 Gb/s ports in a 1U switch
* 80x HDR100 100 Gb/s ports in a 1U switch
* 16 Tb/s aggregate switch throughput
* Up to 15.8 billion messages per second
* 90 ns switch latency
**Optimized Design**
* 1+1 redundant & hot-swappable power
* 80 PLUS Gold and Energy Star certified power supplies
* Dual-core x86 CPU
**Advanced Design**
* Adaptive routing
* Collective offloads (Mellanox SHARP technology)
* VL mapping (VL2VL)
![](/it4i/barbora/img/QM8700.jpg)
## BullSequana XH2000 HDRx WH40 MODULE
* Mellanox QM8700 switch modified for direct liquid cooling (Atos Cold Plate), with a form factor for installation in the BullSequana XH2000 rack
![](/it4i/barbora/img/XH2000.png)
[a]: http://en.wikipedia.org/wiki/InfiniBand
---
title: "Storage"
---
There are three main shared file systems on the Barbora cluster: [HOME][1], [SCRATCH][2], and [PROJECT][5]. All login and compute nodes may access the same data on the shared file systems. Compute nodes are also equipped with local (non-shared) scratch, RAM disk, and tmp file systems.
## Archiving
Do not use the shared filesystems as a backup for large amounts of data or as a means of long-term archiving. The academic staff and students of research institutions in the Czech Republic can use the [CESNET storage service][3], which is available via SSHFS.
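A hedged sketch of mounting the CESNET storage over SSHFS from a login node; the host name and remote path below are placeholders only, the actual endpoints are listed on the linked CESNET storage service page:
```console
$ mkdir -p ~/cesnet
$ sshfs username@ssh.du1.cesnet.cz:/home/username ~/cesnet   # placeholder host and remote path
$ fusermount -u ~/cesnet                                     # unmount when finished
```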
## Shared Filesystems
The Barbora cluster provides three main shared filesystems: the [HOME filesystem][1], the [SCRATCH filesystem][2], and the [PROJECT filesystem][5].
All filesystems are accessible via the InfiniBand network.
The HOME and PROJECT filesystems are realized as NFS filesystems.
The SCRATCH filesystem is realized as a parallel Lustre filesystem.
Extended ACLs are provided on these filesystems for sharing data with other users with fine-grained control.
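For example, on the SCRATCH (Lustre) filesystem the standard POSIX ACL tools can be used; a minimal sketch with a hypothetical collaborator login and directory (see the Standard File ACL and NFS4 File ACL pages in the Storage section for details):
```console
$ setfacl -m u:collaborator:r-x /scratch/projname/shared   # grant read and traverse rights to one user
$ getfacl /scratch/projname/shared                         # display the resulting ACL
```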
### Understanding the Lustre Filesystems
A user file on the [Lustre filesystem][a] can be divided into multiple chunks (stripes) and stored across a subset of the object storage targets (OSTs) (disks). The stripes are distributed among the OSTs in a round-robin fashion to ensure load balancing.
When a client (a compute node from your job) needs to create or access a file, the client queries the metadata server (MDS) and the metadata target (MDT) for the layout and location of the [file's stripes][b]. Once the file is opened and the client obtains the striping information, the MDS is no longer involved in the file I/O process. The client interacts directly with the object storage servers (OSSes) and OSTs to perform I/O operations such as locking, disk allocation, storage, and retrieval.
If multiple clients try to read and write the same part of a file at the same time, the Lustre distributed lock manager enforces coherency, so that all clients see consistent results.
There is a default stripe configuration for the Barbora Lustre filesystems. However, users can set the following stripe parameters for their own directories or files to get optimum I/O performance:
1. `stripe_size` - the size of the chunk in bytes; specify with k, m, or g to use units of KB, MB, or GB, respectively; the size must be an even multiple of 65,536 bytes; the default is 1MB for all Barbora Lustre filesystems.
1. `stripe_count` - the number of OSTs to stripe across; the default is 1 for the Barbora Lustre filesystems; specify -1 to use all OSTs in the filesystem.
1. `stripe_offset` - the index of the OST where the first stripe is to be placed; the default is -1, which results in random selection; using a non-default value is NOT recommended.
!!! note
Setting stripe size and stripe count correctly for your needs may significantly affect the I/O performance.
Use the `lfs getstripe` command for getting the stripe parameters. Use `lfs setstripe` for setting the stripe parameters to get optimal I/O performance. The correct stripe setting depends on your needs and file access patterns.
```console
$ lfs getstripe dir|filename
$ lfs setstripe -s stripe_size -c stripe_count -o stripe_offset dir|filename
```
Example:
```console
$ lfs getstripe /scratch/projname
$ lfs setstripe -c -1 /scratch/projname
$ lfs getstripe /scratch/projname
```
In this example, we view the current stripe setting of the /scratch/projname/ directory. The stripe count is changed to use all OSTs and verified. All files written to this directory will be striped over all 5 OSTs.
Use `lfs check osts` to see the number and status of active OSTs for each filesystem on Barbora. Learn more by reading the man page:
```console
$ lfs check osts
$ man lfs
```
### Hints on Lustre Striping
!!! note
Increase the `stripe_count` for parallel I/O to the same file.
When multiple processes are writing blocks of data to the same file in parallel, the I/O performance for large files will improve when the `stripe_count` is set to a larger value. The stripe count sets the number of OSTs to which the file will be written. By default, the stripe count is set to 1. While this default setting provides for efficient access of metadata (for example to support the `ls -l` command), large files should use stripe counts of greater than 1. This will increase the aggregate I/O bandwidth by using multiple OSTs in parallel instead of just one. A rule of thumb is to use a stripe count approximately equal to the number of gigabytes in the file.
Another good practice is to make the stripe count an integral factor of the number of processes performing the write in parallel, so that you achieve load balance among the OSTs. For example, set the stripe count to 16 instead of 15 when you have 64 processes performing the writes.
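For example, with 64 writer processes and the 5 OSTs available on Barbora, a stripe count of 4 divides the number of writers evenly; a sketch with a placeholder path:

```console
$ lfs setstripe -c 4 /scratch/project/PROJECT_ID/parallel_output
$ lfs getstripe /scratch/project/PROJECT_ID/parallel_output
```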
!!! note
Using a large stripe size can improve performance when accessing very large files.
Large stripe size allows each client to have exclusive access to its own part of a file. However, it can be counterproductive in some cases if it does not match your I/O pattern. The choice of stripe size has no effect on a single-stripe file.
Read more [here][c].
### Lustre on Barbora
The architecture of Lustre on Barbora is composed of two metadata servers (MDS) and two data/object storage servers (OSS).
Configuration of the SCRATCH storage:
* 2x Metadata server
* 2x Object storage server
* Lustre object storage
    * One disk array NetApp E2800
    * 54x 8TB 10kRPM 2.5" SAS HDDs
    * 5x RAID6 (8+2) OSTs (Object Storage Targets)
    * 4x hot-spare disks
* Lustre metadata storage
    * One disk array NetApp E2600
    * 12x 300GB 15kRPM SAS disks
    * 2 groups of 5 disks in RAID5 (Metadata Targets)
    * 2x hot-spare disks
### HOME File System
The HOME filesystem is mounted in the /home directory. Users' home directories /home/username reside on this filesystem. The accessible capacity is 28TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 25GB per user. Should 25GB prove insufficient, contact [support][d]; the quota may be lifted upon request.
!!! note
The HOME filesystem is intended for preparation, evaluation, processing and storage of data generated by active Projects.
The HOME filesystem should not be used to archive data of past Projects or other unrelated data.
The files on HOME filesystem will not be deleted until the end of the [user's lifecycle][4].
The filesystem is backed up, so that it can be restored in case of a catastrophic failure resulting in significant data loss. However, this backup is not intended to restore old versions of user data or to restore (accidentally) deleted files.
| HOME filesystem | |
| -------------------- | --------------- |
| Accesspoint | /home/username |
| Capacity | 28TB |
| Throughput | 1GB/s |
| User space quota | 25GB |
| User inodes quota | 500K |
| Protocol | NFS |
### SCRATCH File System
The SCRATCH filesystem is realized as a Lustre parallel filesystem and is available from all login and compute nodes. There are 5 OSTs dedicated to the SCRATCH filesystem.
The SCRATCH filesystem is mounted in the `/scratch/project/PROJECT_ID` directory, created automatically with the `PROJECT_ID` project. The accessible capacity is 310TB, shared among all users. Individual users are restricted by filesystem usage quotas, set to 10TB per user. The purpose of this quota is to prevent runaway programs from filling the entire filesystem and denying service to other users. Should 10TB prove insufficient, contact [support][d]; the quota may be lifted upon request.
!!! note
The Scratch filesystem is intended for temporary scratch data generated during the calculation as well as for high-performance access to input and output files. All I/O intensive jobs must use the SCRATCH filesystem as their working directory.
Users are advised to save the necessary data from the SCRATCH filesystem to HOME filesystem after the calculations and clean up the scratch files.
!!! warning
Files on the SCRATCH filesystem that are **not accessed for more than 90 days** will be automatically **deleted**.
The default stripe size is 1MB and the default stripe count is 1.
!!! note
Setting stripe size and stripe count correctly for your needs may significantly affect the I/O performance.
| SCRATCH filesystem | |
| -------------------- | --------- |
| Mountpoint | /scratch |
| Capacity | 310TB |
| Throughput | 5GB/s |
| Throughput [Burst] | 38GB/s |
| User space quota | 10TB |
| User inodes quota | 10M |
| Default stripe size | 1MB |
| Default stripe count | 1 |
| Number of OSTs | 5 |
### PROJECT File System
The PROJECT data storage is a central storage for projects'/users' data on IT4Innovations that is accessible from all clusters.
For more information, see the [PROJECT storage][6] section.
### Disk Usage and Quota Commands
Disk usage and user quotas can be checked and reviewed using the `it4ifsusage` command. You can see an example output [here][9].
To better understand where exactly the space is used, you can use the following command:
```console
$ du -hs dir
```
Example for your HOME directory:
```console
$ cd /home
$ du -hs * .[a-zA-Z0-9]* | grep -E "[0-9]*G|[0-9]*M" | sort -hr
258M cuda-samples
15M .cache
13M .mozilla
5,5M .eclipse
2,7M .idb_13.0_linux_intel64_app
```
This will list all directories consuming megabytes or gigabytes of space in your current directory (in this example, HOME). The list is sorted in descending order from the largest to the smallest files/directories.
### Extended ACLs
Extended ACLs provide another security mechanism beside the standard POSIX ACLs, which are defined by three entries (for owner/group/others). Extended ACLs have more than the three basic entries. In addition, they also contain a mask entry and may contain any number of named user and named group entries.
ACLs on a Lustre filesystem work exactly like ACLs on any Linux filesystem. They are manipulated with the standard tools in the standard manner.
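For illustration, a minimal sketch using the standard `setfacl`/`getfacl` tools; the username and path are hypothetical:

```console
$ setfacl -m u:colleague:rx /scratch/project/PROJECT_ID/shared
$ getfacl /scratch/project/PROJECT_ID/shared
```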
For more information, see the [Access Control List][7] section of the documentation.
## Local Filesystems
### TMP
Each node is equipped with a local /tmp RAMDISK directory. The /tmp directory should be used for working with temporary files. Old files in the /tmp directory are automatically purged.
### SCRATCH and RAMDISK
Each node is equipped with RAMDISK storage accessible at /tmp, /lscratch, and /ramdisk. The RAMDISK capacity is 180GB. Data placed on the RAMDISK occupies the node's RAM memory (192GB total). The RAMDISK directory should only be used for working with temporary files where very high throughput or I/O performance is required. Old files in the RAMDISK directory are automatically purged at the job's end.
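A common job pattern is to stage data into the RAM disk, compute there, and copy the results back before the job ends. The sketch below assumes the data fits into the 180GB RAM disk; the paths and application name are placeholders:

```console
cp /scratch/project/PROJECT_ID/input.dat /lscratch/
cd /lscratch
./my_io_intensive_app input.dat output.dat
cp output.dat /scratch/project/PROJECT_ID/
```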
#### Global RAM Disk
The Global RAM disk spans the local RAM disks of all the allocated nodes within a single job.
For more information, see the [Job Features][8] section.
## Summary
| Mountpoint | Usage | Protocol | Net Capacity | Throughput | Limitations | Access | Services |
| ---------- | ------------------------- | -------- | -------------- | ------------------------------ | ----------- | ----------------------- | ------------------------------- |
| /home | home directory | NFS | 28TB | 1GB/s | Quota 25GB | Compute and login nodes | backed up |
| /scratch | scratch temporary | Lustre | 310TB | 5GB/s, 30GB/s burst buffer | Quota 10TB | Compute and login nodes | files older than 90 days auto-removed |
| /lscratch | local scratch ramdisk | tmpfs | 180GB | 130GB/s | none | Node local | auto purged after job end |
[1]: #home-file-system
[2]: #scratch-file-system
[3]: ../storage/cesnet-storage.md
[4]: ../general/obtaining-login-credentials/obtaining-login-credentials.md
[5]: #project-file-system
[6]: ../storage/project-storage.md
[7]: ../storage/standard-file-acl.md
[8]: ../job-features.md#global-ram-disk
[9]: ../storage/project-storage.md#project-quotas
[a]: http://www.nas.nasa.gov
[b]: http://www.nas.nasa.gov/hecc/support/kb/Lustre_Basics_224.html#striping
[c]: http://doc.lustre.org/lustre_manual.xhtml#managingstripingfreespace
[d]: https://support.it4i.cz/rt
[e]: http://man7.org/linux/man-pages/man1/nfs4_setfacl.1.html
---
title: "Visualization Servers"
---
Remote visualization with [VirtualGL][3] is available on two nodes.
* 2 nodes
* 32 cores in total
* 2x Intel Skylake Gold 6130 16-core @ 2.1 GHz processors per node
* 192 GB DDR4 2667 MT/s of physical memory per node (12x 16 GB)
* BullSequana X450-E5 blade servers
* 2150.4 GFLOP/s per compute node
* 1x 1 Gb Ethernet and 2x 10 Gb Ethernet
* 1x HDR100 IB port
* 2x SSD 240 GB
![](/it4i/barbora/img/bullsequanaX450-E5.png)
## NVIDIA Quadro P6000
* GPU Memory: 24 GB GDDR5X
* Memory Interface: 384-bit
* Memory Bandwidth: Up to 432 GB/s
* NVIDIA CUDA® Cores: 3840
* System Interface: PCI Express 3.0 x16
* Max Power Consumption: 250 W
* Thermal Solution: Active
* Form Factor: 4.4”H x 10.5” L, Dual Slot, Full Height
* Display Connectors: 4x DP 1.4 + DVI-D DL
* Max Simultaneous Displays: 4 direct, 4 DP1.4 Multi-Stream
* Max DP 1.4 Resolution: 7680 x 4320 @ 30 Hz
* Max DVI-D DL Resolution: 2560 x 1600 @ 60 Hz
* Graphics APIs: Shader Model 5.1, OpenGL 4.5, DirectX 12.0, Vulkan 1.0,
* Compute APIs: CUDA, DirectCompute, OpenCL™
* Floating-Point Performance-Single Precision: 12.6 TFLOP/s, Peak
![](/it4i/barbora/img/quadrop6000.jpg)
## Resource Allocation Policy
| queue | active project | project resources | nodes | min ncpus | priority | authorization | walltime |
|-------|----------------|-------------------|-------|-----------|----------|---------------|----------|
| qviz Visualization queue | yes | none required | 2 | 4 | 150 | no | 1h/8h |
## References
* [Graphical User Interface][1]
* [VPN Access][2]
[1]: ../general/shell-and-data-access.md#graphical-user-interface
[2]: ../general/shell-and-data-access.md#vpn-access
[3]: ../software/viz/vgl.md
---
title: "e-INFRA CZ Cloud Ostrava"
---
The Ostrava cloud site consists of 22 nodes from the [Karolina][a] supercomputer.
The cloud site is built on top of OpenStack,
a free, open-standard cloud computing platform.
## Access
To access the cloud you must:
* have an [e-Infra CZ account][3],
* be a member of an [active project][b].
The dashboard is available at [https://ostrava.openstack.cloud.e-infra.cz/][6].
You can specify resources/quotas for your project.
For more information, see the [Quota Limits][5] section.
## Creating First Instance
To create your first VM instance, follow the [e-INFRA CZ guide][4].
Note that the guide is similar for the clouds in Brno and Ostrava,
so make sure that you follow the steps for the Ostrava cloud where applicable.
### Process Automation
You can automate the process using Terraform or the OpenStack client.
#### Terraform
Prerequisites:
* Linux/Mac/WSL terminal BASH shell
* installed Terraform and sshuttle
* downloaded [application credentials][9] from OpenStack Horizon dashboard and saved as a `project_openrc.sh.inc` text file
Follow the guide: [https://code.it4i.cz/terraform][8]
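In broad strokes, the Terraform workflow from the guide looks like the following sketch; the actual configuration files come from the repository linked above:

```console
source project_openrc.sh.inc
terraform init
terraform plan
terraform apply
```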
#### OpenStack
Prerequisites:
* Linux/Mac/WSL terminal BASH shell
* installed [OpenStack client][7]
Follow the guide: [https://code.it4i.cz/commandline][10]
Run commands:
```console
source project_openrc.sh.inc
```
```console
./cmdline-demo.sh basic-infrastructure-1
```
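As a quick check that the sourced credentials work, you can list existing resources with the OpenStack client before running the demo script, for example:

```console
openstack server list
openstack network list
```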
## Technical Reference
For the list of deployed OpenStack services, see the [list of components][1].
More information can be found on the [e-INFRA CZ website][2].
[1]: https://docs.e-infra.cz/compute/openstack/technical-reference/ostrava-site/openstack-components/
[2]: https://docs.e-infra.cz/compute/openstack/technical-reference/ostrava-site/
[3]: https://docs.e-infra.cz/account/
[4]: https://docs.e-infra.cz/compute/openstack/getting-started/creating-first-infrastructure/
[5]: https://docs.e-infra.cz/compute/openstack/technical-reference/ostrava-g2-site/quota-limits/
[6]: https://ostrava.openstack.cloud.e-infra.cz/
[7]: https://docs.fuga.cloud/how-to-use-the-openstack-cli-tools-on-linux
[8]: https://code.it4i.cz/dvo0012/infrastructure-by-script/-/tree/main/openstack-infrastructure-as-code-automation/clouds/g2/ostrava/general/terraform
[9]: https://docs.e-infra.cz/compute/openstack/how-to-guides/obtaining-api-key/
[10]: https://code.it4i.cz/dvo0012/infrastructure-by-script/-/tree/main/openstack-infrastructure-as-code-automation/clouds/g2/ostrava/general/commandline
[a]: ../karolina/introduction.md
[b]: ../general/access/project-access.md
---
title: "IT4I Cloud"
---
The IT4I cloud consists of 14 nodes from the [Karolina][a] supercomputer.
The cloud site is built on top of OpenStack,
a free, open-standard cloud computing platform.
!!! Note
The guide describes steps for personal projects.<br>
Some steps may differ for large projects.<br>
For large projects, apply for resources to the [Allocation Committee][11].
## Access
To access the cloud, you must be a member of an active EUROHPC project
or fall into **Access Category B**, i.e. [Access For Thematic HPC Resource Utilisation][11].
A personal OpenStack project is required; request one by contacting [IT4I Support][12].
The dashboard is available at [https://cloud.it4i.cz][6].
You can see quotas set for the IT4I Cloud in the [Quota Limits][f] section.
## Creating First Instance
To create your first VM instance, follow the steps below:
### Log In
Go to [https://cloud.it4i.cz][6], enter your LDAP username and password and choose the `IT4I_LDAP` domain. After you sign in, you will be redirected to the dashboard.
![](/it4i/img/login.png)
### Create Key Pair
SSH key is required for remote access to your instance.
1. Go to **Project > Compute > Key Pairs** and click the **Create Key Pair** button.
![](/it4i/img/keypairs.png)
1. In the Create Key Pair window, name your key pair, select `SSH Key` for key type and confirm by clicking Create Key Pair.
![](/it4i/img/keypairs1.png)
1. Download and manage the private key according to your operating system.
### Update Security Group
To be able to remotely access your VM instance, you have to allow access in the security group.
1. Go to **Project > Network > Security Groups** and click on **Manage Rules** for the default security group.
![](/it4i/img/securityg.png)
1. Click on **Add Rule**, choose **SSH**, and leave the remaining fields unchanged.
![](/it4i/img/securityg1.png)
### Create VM Instance
1. In **Compute > Instances**, click **Launch Instance**.
![](/it4i/img/instance.png)
1. Choose Instance Name, Description, and number of instances. Click **Next**.
![](/it4i/img/instance1.png)
1. Choose an image from which to boot the instance. Choose to delete the volume after instance delete. Click **Next**.
![](/it4i/img/instance2.png)
1. Choose the hardware resources of the instance by selecting a flavor. Additional volumes for data can be attached later on. Click **Next**.
![](/it4i/img/instance3.png)
1. Select the network and continue to **Security Groups**.
![](/it4i/img/instance4.png)
1. Allocate the security group with SSH rule that you added in the [Update Security Group](it4i-cloud.md#update-security-group) step. Then click **Next** to go to the **Key Pair**.
![](/it4i/img/securityg2.png)
1. Select the key that you created in the [Create Key Pair][g] section and launch the instance.
![](/it4i/img/instance5.png)
### Associate Floating IP
1. Click on the **Associate** button next to the floating IP.
![](/it4i/img/floatingip.png)
1. Select Port to be associated with the instance, then click the **Associate** button.
Now you can join the VM using your preferred SSH client.
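For example, using the downloaded private key and the associated floating IP; the default username depends on the chosen image (e.g. `ubuntu` for Ubuntu cloud images), so treat it as an assumption:

```console
ssh -i ~/.ssh/my_keypair.pem ubuntu@FLOATING_IP
```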
## Process Automation
You can automate the process using the OpenStack client.
### OpenStack
Prerequisites:
* Linux/Mac/WSL terminal BASH shell
* installed [OpenStack client][7]
Follow the guide: [https://code.it4i.cz/commandline][10]
Run commands:
```console
source project_openrc.sh.inc
```
```console
./cmdline-demo.sh basic-infrastructure-1
```
[1]: https://docs.e-infra.cz/compute/openstack/technical-reference/ostrava-site/openstack-components/
[2]: https://docs.e-infra.cz/compute/openstack/technical-reference/ostrava-site/
[3]: https://docs.e-infra.cz/account/
[4]: https://docs.e-infra.cz/compute/openstack/getting-started/creating-first-infrastructure/
[5]: https://docs.e-infra.cz/compute/openstack/technical-reference/ostrava-g2-site/quota-limits/
[6]: https://cloud.it4i.cz
[7]: https://docs.fuga.cloud/how-to-use-the-openstack-cli-tools-on-linux
[8]: https://code.it4i.cz/dvo0012/infrastructure-by-script/-/tree/main/openstack-infrastructure-as-code-automation/clouds/g2/ostrava/general/terraform
[9]: https://docs.e-infra.cz/compute/openstack/how-to-guides/obtaining-api-key/
[10]: https://code.it4i.cz/dvo0012/infrastructure-by-script/-/tree/main/openstack-infrastructure-as-code-automation/clouds/g2/ostrava/general/commandline
[11]: https://www.it4i.cz/en/for-users/computing-resources-allocation
[12]: mailto:support@it4i.cz
[a]: ../karolina/introduction.md
[b]: ../general/access/project-access.md
[c]: einfracz-cloud.md
[d]: ../general/accessing-the-clusters/vpn-access.md
[e]: ../general/obtaining-login-credentials/obtaining-login-credentials.md
[f]: it4i-quotas.md
[g]: it4i-cloud.md#create-key-pair
---
title: "IT4I Cloud Quotas"
---
| Resource | Quota |
|---------------------------------------|-------|
| Instances | 10 |
| VCPUs | 20 |
| RAM | 32GB |
| Volumes | 20 |
| Volume Snapshots | 12 |
| Volume Storage | 500 |
| Floating-IPs | 1 |
| Security Groups | 10 |
| Security Group Rules | 100 |
| Networks | 1 |
| Ports | 10 |
| Routers | 1 |
| Backups | 12 |
| Groups | 10 |
| rbac_policies | 10 |
| Subnets | 1 |
| Subnet_pools | -1 |
| Fixed-ips | -1 |
| Injected-file-size | 10240 |
| Injected-path-size | 255 |
| Injected-files | 5 |
| Key-pairs | 100 |
| Properties | 128 |
| Server-groups | 10 |
| Server-group-members | 10 |
| Backup-gigabytes | 1002 |
| Per-volume-gigabytes | -1 |
---
title: "Accessing Complementary Systems"
---
Complementary systems can be accessed at `login.cs.it4i.cz`
by any user with an active account assigned to an active project.
**SSH is required** to access Complementary systems.
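For example, replace `USERNAME` with your IT4I login:

```console
$ ssh USERNAME@login.cs.it4i.cz
```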
## Data Storage
### Home
The `/home` file system is shared across all Complementary systems. Note that this file system is **not** shared with the file system on IT4I clusters.
### Scratch
Local `/lscratch` storage is available on individual nodes.
### PROJECT
Complementary systems are connected to the [PROJECT storage][1].
[1]: ../storage/project-storage.md
---
title: "Using AMD Partition"
---
For testing your application on the AMD partition,
you need to prepare a job script for that partition or use an interactive job:
```console
salloc -N 1 -c 64 -A PROJECT-ID -p p03-amd --gres=gpu:4 --time=08:00:00
```
where:
- `-N 1` means allocating one server,
- `-c 64` means allocating 64 cores,
- `-A` is your project,
- `-p p03-amd` is AMD partition,
- `--gres=gpu:4` means allocating all 4 GPUs of the node,
- `--time=08:00:00` means allocation for 8 hours.
You also have the option to allocate only a subset of the resources
by reducing the `-c` and `--gres=gpu` values:
```console
salloc -N 1 -c 48 -A PROJECT-ID -p p03-amd --gres=gpu:3 --time=08:00:00
salloc -N 1 -c 32 -A PROJECT-ID -p p03-amd --gres=gpu:2 --time=08:00:00
salloc -N 1 -c 16 -A PROJECT-ID -p p03-amd --gres=gpu:1 --time=08:00:00
```
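For non-interactive runs, an equivalent batch script might look like this minimal sketch; the job name, project ID, and executable are placeholders:

```console
#!/usr/bin/env bash
#SBATCH --job-name=amd-gpu-test
#SBATCH --account=PROJECT-ID
#SBATCH --partition=p03-amd
#SBATCH --nodes=1
#SBATCH --cpus-per-task=64
#SBATCH --gres=gpu:4
#SBATCH --time=08:00:00

./vector_add.x
```

Submit it with `sbatch script.sh`.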
!!! Note
p03-amd01 server has hyperthreading **enabled** therefore htop shows 128 cores.<br>
p03-amd02 server has hyperthreading **disabled** therefore htop shows 64 cores.
## Using AMD MI100 GPUs
The AMD GPUs can be programmed using the [ROCm open-source platform](https://docs.amd.com/).
ROCm and related libraries are installed directly in the system.
You can find them here:
```console
/opt/rocm/
```
The actual version can be found here:
```console
[user@p03-amd02.cs]$ cat /opt/rocm/.info/version
5.5.1-74
```
## Basic HIP Code
The first way to program AMD GPUs is to use HIP.
The basic vector addition code in HIP looks like this.
This is a full code and you can copy and paste it into a file.
For this example, we use `vector_add.hip.cpp`.
```cpp
#include <cstdio>
#include <hip/hip_runtime.h>
__global__ void add_vectors(float * x, float * y, float alpha, int count)
{
long long idx = blockIdx.x * blockDim.x + threadIdx.x;
if(idx < count)
y[idx] += alpha * x[idx];
}
int main()
{
// number of elements in the vectors
long long count = 10;
// allocation and initialization of data on the host (CPU memory)
float * h_x = new float[count];
float * h_y = new float[count];
for(long long i = 0; i < count; i++)
{
h_x[i] = i;
h_y[i] = 10 * i;
}
// print the input data
printf("X:");
for(long long i = 0; i < count; i++)
printf(" %7.2f", h_x[i]);
printf("\n");
printf("Y:");
for(long long i = 0; i < count; i++)
printf(" %7.2f", h_y[i]);
printf("\n");
// allocation of memory on the GPU device
float * d_x;
float * d_y;
hipMalloc(&d_x, count * sizeof(float));
hipMalloc(&d_y, count * sizeof(float));
// copy the data from host memory to the device
hipMemcpy(d_x, h_x, count * sizeof(float), hipMemcpyHostToDevice);
hipMemcpy(d_y, h_y, count * sizeof(float), hipMemcpyHostToDevice);
int tpb = 256;
int bpg = (count - 1) / tpb + 1;
// launch the kernel on the GPU
add_vectors<<< bpg, tpb >>>(d_x, d_y, 100, count);
// hipLaunchKernelGGL(add_vectors, bpg, tpb, 0, 0, d_x, d_y, 100, count);
// copy the result back to CPU memory
hipMemcpy(h_y, d_y, count * sizeof(float), hipMemcpyDeviceToHost);
// print the results
printf("Y:");
for(long long i = 0; i < count; i++)
printf(" %7.2f", h_y[i]);
printf("\n");
// free the allocated memory
hipFree(d_x);
hipFree(d_y);
delete[] h_x;
delete[] h_y;
return 0;
}
```
To compile the code, we use the `hipcc` compiler.
For compiler information, use `hipcc --version`:
```console
[user@p03-amd02.cs ~]$ hipcc --version
HIP version: 5.5.30202-eaf00c0b
AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.5.1 23194 69ef12a7c3cc5b0ccf820bc007bd87e8b3ac3037)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.5.1/llvm/bin
```
The code is compiled as follows:
```console
hipcc vector_add.hip.cpp -o vector_add.x
```
The correct output of the code is:
```console
[user@p03-amd02.cs ~]$ ./vector_add.x
X: 0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00
Y: 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00 90.00
Y: 0.00 110.00 220.00 330.00 440.00 550.00 660.00 770.00 880.00 990.00
```
More details on HIP programming are available in the [HIP Programming Guide](https://docs.amd.com/bundle/HIP-Programming-Guide-v5.5/page/Introduction_to_HIP_Programming_Guide.html).
## HIP and ROCm Libraries
The list of official AMD libraries can be found [here](https://docs.amd.com/category/libraries).
The libraries are installed in the same directory as ROCm:
```console
/opt/rocm/
```
The following libraries are installed:
```console
drwxr-xr-x 4 root root 44 Jun 7 14:09 hipblas
drwxr-xr-x 3 root root 17 Jun 7 14:09 hipblas-clients
drwxr-xr-x 3 root root 29 Jun 7 14:09 hipcub
drwxr-xr-x 4 root root 44 Jun 7 14:09 hipfft
drwxr-xr-x 3 root root 25 Jun 7 14:09 hipfort
drwxr-xr-x 4 root root 32 Jun 7 14:09 hiprand
drwxr-xr-x 4 root root 44 Jun 7 14:09 hipsolver
drwxr-xr-x 4 root root 44 Jun 7 14:09 hipsparse
```
and
```console
drwxr-xr-x 4 root root 32 Jun 7 14:09 rocalution
drwxr-xr-x 4 root root 44 Jun 7 14:09 rocblas
drwxr-xr-x 4 root root 44 Jun 7 14:09 rocfft
drwxr-xr-x 4 root root 32 Jun 7 14:09 rocprim
drwxr-xr-x 4 root root 32 Jun 7 14:09 rocrand
drwxr-xr-x 4 root root 44 Jun 7 14:09 rocsolver
drwxr-xr-x 4 root root 44 Jun 7 14:09 rocsparse
drwxr-xr-x 3 root root 29 Jun 7 14:09 rocthrust
```
## Using hipBLAS Library
The basic code in HIP that uses hipBLAS looks like this.
This is a full code and you can copy and paste it into a file.
For this example, we use `hipblas.hip.cpp`.
```cpp
#include <cstdio>
#include <vector>
#include <cstdlib>
#include <hip/hip_runtime.h>
#include <hipblas/hipblas.h>
int main()
{
srand(9600);
int width = 10;
int height = 7;
int elem_count = width * height;
// initialization of data in CPU memory
float * h_A;
hipHostMalloc(&h_A, elem_count * sizeof(*h_A));
for(int i = 0; i < elem_count; i++)
h_A[i] = (100.0f * rand()) / (float)RAND_MAX;
printf("Matrix A:\n");
for(int r = 0; r < height; r++)
{
for(int c = 0; c < width; c++)
printf("%6.3f ", h_A[r + height * c]);
printf("\n");
}
float * h_x;
hipHostMalloc(&h_x, width * sizeof(*h_x));
for(int i = 0; i < width; i++)
h_x[i] = (100.0f * rand()) / (float)RAND_MAX;
printf("vector x:\n");
for(int i = 0; i < width; i++)
printf("%6.3f ", h_x[i]);
printf("\n");
float * h_y;
hipHostMalloc(&h_y, height * sizeof(*h_y));
for(int i = 0; i < height; i++)
h_y[i] = 100.0f + i;
printf("vector y:\n");
for(int i = 0; i < height; i++)
printf("%6.3f ", h_y[i]);
printf("\n");
// initialization of data in GPU memory
float * d_A;
size_t pitch_A;
hipMallocPitch((void**)&d_A, &pitch_A, height * sizeof(*d_A), width);
hipMemcpy2D(d_A, pitch_A, h_A, height * sizeof(*d_A), height * sizeof(*d_A), width, hipMemcpyHostToDevice);
int lda = pitch_A / sizeof(float);
float * d_x;
hipMalloc(&d_x, width * sizeof(*d_x));
hipMemcpy(d_x, h_x, width * sizeof(*d_x), hipMemcpyHostToDevice);
float * d_y;
hipMalloc(&d_y, height * sizeof(*d_y));
hipMemcpy(d_y, h_y, height * sizeof(*d_y), hipMemcpyHostToDevice);
// basic calculation of the result on the CPU
float alpha=2.0f, beta=10.0f;
for(int i = 0; i < height; i++)
h_y[i] *= beta;
for(int r = 0; r < height; r++)
for(int c = 0; c < width; c++)
h_y[r] += alpha * h_x[c] * h_A[r + height * c];
printf("result y CPU:\n");
for(int i = 0; i < height; i++)
printf("%6.3f ", h_y[i]);
printf("\n");
// calculation of the result on the GPU using the hipBLAS library
hipblasHandle_t blas_handle;
hipblasCreate(&blas_handle);
hipblasSgemv(blas_handle, HIPBLAS_OP_N, height, width, &alpha, d_A, lda, d_x, 1, &beta, d_y, 1);
hipDeviceSynchronize();
hipblasDestroy(blas_handle);
// copy the GPU result to CPU memory and print it
hipMemcpy(h_y, d_y, height * sizeof(*d_y), hipMemcpyDeviceToHost);
printf("result y BLAS:\n");
for(int i = 0; i < height; i++)
printf("%6.3f ", h_y[i]);
printf("\n");
// free all the allocated memory
hipFree(d_A);
hipFree(d_x);
hipFree(d_y);
hipHostFree(h_A);
hipHostFree(h_x);
hipHostFree(h_y);
return 0;
}
```
The code compilation can be done as follows:
```console
hipcc hipblas.hip.cpp -o hipblas.x -lhipblas
```
## Using hipSOLVER Library
The basic code in HIP that uses hipSOLVER looks like this.
This is a full code and you can copy and paste it into a file.
For this example, we use `hipsolver.hip.cpp`.
```cpp
#include <cstdio>
#include <vector>
#include <cstdlib>
#include <algorithm>
#include <hipsolver/hipsolver.h>
#include <hipblas/hipblas.h>
int main()
{
srand(63456);
int size = 10;
// allocation and initialization of data on host. this time we use std::vector
int h_A_ld = size;
int h_A_pitch = h_A_ld * sizeof(float);
std::vector<float> h_A(size * h_A_ld);
for(int r = 0; r < size; r++)
for(int c = 0; c < size; c++)
h_A[r * h_A_ld + c] = (10.0 * rand()) / RAND_MAX;
printf("System matrix A:\n");
for(int r = 0; r < size; r++)
{
for(int c = 0; c < size; c++)
printf("%6.3f ", h_A[r * h_A_ld + c]);
printf("\n");
}
std::vector<float> h_b(size);
for(int i = 0; i < size; i++)
h_b[i] = (10.0 * rand()) / RAND_MAX;
printf("RHS vector b:\n");
for(int i = 0; i < size; i++)
printf("%6.3f ", h_b[i]);
printf("\n");
std::vector<float> h_x(size);
// memory allocation on the device and initialization
float * d_A;
size_t d_A_pitch;
hipMallocPitch((void**)&d_A, &d_A_pitch, size * sizeof(float), size);
int d_A_ld = d_A_pitch / sizeof(float);
float * d_b;
hipMalloc(&d_b, size * sizeof(float));
float * d_x;
hipMalloc(&d_x, size * sizeof(float));
int * d_piv;
hipMalloc(&d_piv, size * sizeof(int));
int * info;
hipMallocManaged(&info, sizeof(int));
hipMemcpy2D(d_A, d_A_pitch, h_A.data(), h_A_pitch, size * sizeof(float), size, hipMemcpyHostToDevice);
hipMemcpy(d_b, h_b.data(), size * sizeof(float), hipMemcpyHostToDevice);
// solving the system using hipSOLVER
hipsolverHandle_t solverHandle;
hipsolverCreate(&solverHandle);
int wss_trf, wss_trs; // wss = WorkSpace Size
hipsolverSgetrf_bufferSize(solverHandle, size, size, d_A, d_A_ld, &wss_trf);
hipsolverSgetrs_bufferSize(solverHandle, HIPSOLVER_OP_N, size, 1, d_A, d_A_ld, d_piv, d_b, size, &wss_trs);
float * workspace;
int wss = std::max(wss_trf, wss_trs);
hipMalloc(&workspace, wss * sizeof(float));
hipsolverSgetrf(solverHandle, size, size, d_A, d_A_ld, workspace, wss, d_piv, info);
hipsolverSgetrs(solverHandle, HIPSOLVER_OP_N, size, 1, d_A, d_A_ld, d_piv, d_b, size, workspace, wss, info);
hipMemcpy(d_x, d_b, size * sizeof(float), hipMemcpyDeviceToDevice);
hipMemcpy(h_x.data(), d_x, size * sizeof(float), hipMemcpyDeviceToHost);
printf("Solution vector x:\n");
for(int i = 0; i < size; i++)
printf("%6.3f ", h_x[i]);
printf("\n");
hipFree(workspace);
hipsolverDestroy(solverHandle);
// perform matrix-vector multiplication A*x using hipBLAS to check if the solution is correct
hipblasHandle_t blasHandle;
hipblasCreate(&blasHandle);
float alpha = 1;
float beta = 0;
hipMemcpy2D(d_A, d_A_pitch, h_A.data(), h_A_pitch, size * sizeof(float), size, hipMemcpyHostToDevice);
hipblasSgemv(blasHandle, HIPBLAS_OP_N, size, size, &alpha, d_A, d_A_ld, d_x, 1, &beta, d_b, 1);
hipDeviceSynchronize();
hipblasDestroy(blasHandle);
for(int i = 0; i < size; i++)
h_b[i] = 0;
hipMemcpy(h_b.data(), d_b, size * sizeof(float), hipMemcpyDeviceToHost);
printf("Check multiplication vector Ax:\n");
for(int i = 0; i < size; i++)
printf("%6.3f ", h_b[i]);
printf("\n");
// free all the allocated memory
hipFree(info);
hipFree(d_piv);
hipFree(d_x);
hipFree(d_b);
hipFree(d_A);
return 0;
}
```
The code compilation can be done as follows:
```console
hipcc hipsolver.hip.cpp -o hipsolver.x -lhipblas -lhipsolver
```
## Using OpenMP Offload to Program AMD GPUs
The ROCm™ installation includes an LLVM-based implementation that fully supports the OpenMP 4.5 standard
and a subset of the OpenMP 5.0 standard.
Fortran, C/C++ compilers, and corresponding runtime libraries are included.
The OpenMP toolchain is automatically installed as part of the standard ROCm installation
and is available under `/opt/rocm/llvm`. The sub-directories are:
- `bin` : Compilers (flang and clang) and other binaries.
- `examples` : The usage section below shows how to compile and run these programs.
- `include` : Header files.
- `lib` : Libraries including those required for target offload.
- `lib-debug` : Debug versions of the above libraries.
More information can be found in the [AMD OpenMP Support Guide](https://docs.amd.com/bundle/OpenMP-Support-Guide-v5.5/page/Introduction_to_OpenMP_Support_Guide.html).
## Compilation of OpenMP Code
A basic example that uses OpenMP offload is shown below.
Again, the code is complete and can be copied and pasted into a file.
Here we use `vadd.cpp`.
```cpp
#include <cstdio>
#include <cstdlib>
int main(int argc, char ** argv)
{
long long count = 1 << 20;
if(argc > 1)
count = atoll(argv[1]);
long long print_count = 16;
if(argc > 2)
print_count = atoll(argv[2]);
long long * a = new long long[count];
long long * b = new long long[count];
long long * c = new long long[count];
#pragma omp parallel for
for(long long i = 0; i < count; i++)
{
a[i] = i;
b[i] = 10 * i;
}
printf("A: ");
for(long long i = 0; i < print_count; i++)
printf("%3lld ", a[i]);
printf("\n");
printf("B: ");
for(long long i = 0; i < print_count; i++)
printf("%3lld ", b[i]);
printf("\n");
#pragma omp target map(to: a[0:count],b[0:count]) map(from: c[0:count])
#pragma omp teams distribute parallel for
for(long long i = 0; i < count; i++)
{
c[i] = a[i] + b[i];
}
printf("C: ");
for(long long i = 0; i < print_count; i++)
printf("%3lld ", c[i]);
printf("\n");
delete[] a;
delete[] b;
delete[] c;
return 0;
}
```
This code can be compiled like this:
```console
/opt/rocm/llvm/bin/clang++ -O3 -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908 vadd.cpp -o vadd.x
```
These options are required for target offload from an OpenMP program:
- `-target x86_64-pc-linux-gnu`
- `-fopenmp`
- `-fopenmp-targets=amdgcn-amd-amdhsa`
- `-Xopenmp-target=amdgcn-amd-amdhsa`
The `-march` flag specifies the architecture of the targeted GPU.
You need to change this when moving, for instance, to LUMI with its MI250X GPUs.
The MI100 GPUs present in the Complementary systems use the code `gfx908`:
- `-march=gfx908`
Note: You also have to include an optimization flag (`-O0`, `-O2`, or `-O3`).
Without an optimization flag, the execution of the compiled code fails.
---
title: "Using ARM Partition"
---
For testing your application on the ARM partition,
you need to prepare a job script for that partition or use an interactive job:
```console
salloc -A PROJECT-ID -p p01-arm
```
On the partition, you should reload the list of modules:
```console
ml architecture/aarch64
```
For compilation, `gcc` and `OpenMPI` compilers are available.
Hence, the compilation process should be the same as on the `x64` architecture.
Let's have the following `hello world` example:
```cpp
#include "mpi.h"
#include "omp.h"
int main(int argc, char **argv)
{
int rank;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
#pragma omp parallel
{
printf("Hello on rank %d, thread %d\n", rank, omp_get_thread_num());
}
MPI_Finalize();
}
```
You can compile and run the example:
```console
ml OpenMPI/4.1.4-GCC-11.3.0
mpic++ -fopenmp hello.cpp -o hello
mpirun -n 4 ./hello
```
Please see [gcc options](https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html) for more advanced compilation settings.
No complications are expected as long as the application does not use any intrinsics for the `x64` architecture.
If you want to use intrinsics,
the [SVE](https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics) instruction set is available, as sketched below.
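If you do use SVE intrinsics, the code has to be compiled with SVE enabled; a hedged example, as the exact `-march` value depends on the CPU and GCC version:

```console
g++ -O3 -march=armv8.2-a+sve -c sve_code.cpp
```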
---
title: "Using NVIDIA Grace Partition"
---
For testing your application on the NVIDIA Grace partition,
you need to prepare a job script for that partition or use an interactive job:
```console
salloc -N 1 -c 144 -A PROJECT-ID -p p11-grace --time=08:00:00
```
where:
- `-N 1` means allocating a single node,
- `-c 144` means allocating 144 cores,
- `-p p11-grace` is the NVIDIA Grace partition,
- `--time=08:00:00` means allocation for 8 hours.
## Available Toolchains
The platform offers three toolchains:
- Standard GCC (as a module `ml GCC`)
- [NVHPC](https://developer.nvidia.com/hpc-sdk) (as a module `ml NVHPC`)
- [Clang for NVIDIA Grace](https://developer.nvidia.com/grace/clang) (installed in `/opt/nvidia/clang`)
!!! note
The NVHPC toolchain showed strong results with a minimal amount of tuning necessary in our initial evaluation.
### GCC Toolchain
The GCC compiler seems to struggle with the vectorization of short (constant-length) loops, which tend to get completely unrolled/eliminated instead of being vectorized. For example, a simple nested loop such as
```cpp
for(int i = 0; i < 1000000; ++i) {
// Iterations dependent in "i"
// ...
for(int j = 0; j < 8; ++j) {
// but independent in "j"
// ...
}
}
```
may emit scalar code for the inner loop leading to no vectorization being used at all.
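One way to check what GCC did with such a loop is to request the vectorizer reports; the source file name is a placeholder:

```console
ml GCC
g++ -O3 -march=native -fopt-info-vec-optimized -fopt-info-vec-missed -c loops.cpp
```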
### Clang (For Grace) Toolchain
The Clang/LLVM tends to behave similarly, but can be guided to properly vectorize the inner loop with either flags `-O3 -ffast-math -march=native -fno-unroll-loops -mllvm -force-vector-width=8` or pragmas such as `#pragma clang loop vectorize_width(8)` and `#pragma clang loop unroll(disable)`.
```cpp
for(int i = 0; i < 1000000; ++i) {
// Iterations dependent in "i"
// ...
#pragma clang loop unroll(disable) vectorize_width(8)
for(int j = 0; j < 8; ++j) {
// but independent in "j"
// ...
}
}
```
!!! note
Our basic experiments show that fixed-width vectorization (NEON) tends to perform better than SVE in the case of short (register-length) loops. In cases like the above, where the specified `vectorize_width` is larger than the available vector unit width, Clang will emit multiple NEON instructions (e.g., 4 instructions will be emitted to process 8 64-bit operations in the 128-bit units of Grace).
### NVHPC Toolchain
The NVHPC toolchain handled the aforementioned case without any additional tuning. A simple `-O3 -march=native -fast` should therefore be sufficient.
## Basic Math Libraries
The basic libraries (BLAS and LAPACK) are included in NVHPC toolchain and can be used simply as `-lblas` and `-llapack` for BLAS and LAPACK respectively (`lp64` and `ilp64` versions are also included).
!!! note
The Grace platform doesn't include a CUDA-capable GPU, therefore `nvcc` will fail with an error. This means that `nvc`, `nvc++`, and `nvfortran` should be used instead.
### NVIDIA Performance Libraries
The [NVPL](https://developer.nvidia.com/nvpl) package includes a more extensive set of libraries in both sequential and multi-threaded versions:
- BLACS: `-lnvpl_blacs_{lp64,ilp64}_{mpich,openmpi3,openmpi4,openmpi5}`
- BLAS: `-lnvpl_blas_{lp64,ilp64}_{seq,gomp}`
- FFTW: `-lnvpl_fftw`
- LAPACK: `-lnvpl_lapack_{lp64,ilp64}_{seq,gomp}`
- ScaLAPACK: `-lnvpl_scalapack_{lp64,ilp64}`
- RAND: `-lnvpl_rand` or `-lnvpl_rand_mt`
- SPARSE: `-lnvpl_sparse`
This package should be compatible with all available toolchains and includes CMake module files for easy integration into CMake-based projects. For further documentation, see also [NVPL](https://docs.nvidia.com/nvpl).
### Recommended BLAS Library
We recommend using the multi-threaded BLAS library from the NVPL package.
!!! note
It is important to pin the processes using **OMP_PROC_BIND=spread**
Example:
```console
$ ml NVHPC
$ nvc -O3 -march=native myprog.c -o myprog -lnvpl_blas_lp64_gomp
$ OMP_PROC_BIND=spread ./myprog
```
## Basic Communication Libraries
The OpenMPI 4 implementation is included with the NVHPC toolchain and is exposed as a module (`ml OpenMPI`). The following example
```cpp
#include <cstdio>
#include <mpi.h>
#include <sched.h>
#include <omp.h>
int main(int argc, char **argv)
{
int rank;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
#pragma omp parallel
{
printf("Hello on rank %d, thread %d on CPU %d\n", rank, omp_get_thread_num(), sched_getcpu());
}
MPI_Finalize();
}
```
can be compiled and run as follows
```console
ml OpenMPI
mpic++ -fast -fopenmp hello.cpp -o hello
OMP_PROC_BIND=close OMP_NUM_THREADS=4 mpirun -np 4 --map-by slot:pe=36 ./hello
```
In this configuration, we run 4 ranks, each bound to one quarter of the cores, with 4 OpenMP threads per rank.
## Simple BLAS Application
The `hello world` example application (written in `C++` and `Fortran`) uses a simple stationary probability vector estimation to illustrate the use of GEMM (a BLAS 3 routine).
Stationary probability vector estimation in `C++`:
```cpp
#include <iostream>
#include <vector>
#include <chrono>
#include "cblas.h"
const size_t ITERATIONS = 32;
const size_t MATRIX_SIZE = 1024;
int main(int argc, char *argv[])
{
const size_t matrixElements = MATRIX_SIZE*MATRIX_SIZE;
std::vector<float> a(matrixElements, 1.0f / float(MATRIX_SIZE));
for(size_t i = 0; i < MATRIX_SIZE; ++i)
a[i] = 0.5f / (float(MATRIX_SIZE) - 1.0f);
a[0] = 0.5f;
std::vector<float> w1(matrixElements, 0.0f);
std::vector<float> w2(matrixElements, 0.0f);
std::copy(a.begin(), a.end(), w1.begin());
std::vector<float> *t1, *t2;
t1 = &w1;
t2 = &w2;
auto c1 = std::chrono::steady_clock::now();
for(size_t i = 0; i < ITERATIONS; ++i)
{
std::fill(t2->begin(), t2->end(), 0.0f);
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, MATRIX_SIZE, MATRIX_SIZE, MATRIX_SIZE,
1.0f, t1->data(), MATRIX_SIZE,
a.data(), MATRIX_SIZE,
1.0f, t2->data(), MATRIX_SIZE);
std::swap(t1, t2);
}
auto c2 = std::chrono::steady_clock::now();
for(size_t i = 0; i < MATRIX_SIZE; ++i)
{
std::cout << (*t1)[i*MATRIX_SIZE + i] << " ";
}
std::cout << std::endl;
std::cout << "Elapsed Time: " << std::chrono::duration<double>(c2 - c1).count() << std::endl;
return 0;
}
```
Stationary probability vector estimation in `Fortran`:
```fortran
program main
implicit none
integer :: matrix_size, iterations
integer :: i
real, allocatable, target :: a(:,:), w1(:,:), w2(:,:)
real, dimension(:,:), contiguous, pointer :: t1, t2, tmp
real, pointer :: out_data(:), out_diag(:)
integer :: cr, cm, c1, c2
iterations = 32
matrix_size = 1024
call system_clock(count_rate=cr)
call system_clock(count_max=cm)
allocate(a(matrix_size, matrix_size))
allocate(w1(matrix_size, matrix_size))
allocate(w2(matrix_size, matrix_size))
a(:,:) = 1.0 / real(matrix_size)
a(:,1) = 0.5 / real(matrix_size - 1)
a(1,1) = 0.5
w1 = a
w2(:,:) = 0.0
t1 => w1
t2 => w2
call system_clock(c1)
do i = 0, iterations
t2(:,:) = 0.0
call sgemm('N', 'N', matrix_size, matrix_size, matrix_size, 1.0, t1, matrix_size, a, matrix_size, 1.0, t2, matrix_size)
tmp => t1
t1 => t2
t2 => tmp
end do
call system_clock(c2)
out_data(1:size(t1)) => t1
out_diag => out_data(1::matrix_size+1)
print *, out_diag
print *, "Elapsed Time: ", (c2 - c1) / real(cr)
deallocate(a)
deallocate(w1)
deallocate(w2)
end program main
```
### Using NVHPC Toolchain
The C++ version of the example can be compiled with NVHPC and run as follows:
```console
ml NVHPC
nvc++ -O3 -march=native -fast -I$NVHPC/Linux_aarch64/$EBVERSIONNVHPC/compilers/include/lp64 -lblas main.cpp -o main
OMP_NUM_THREADS=144 OMP_PROC_BIND=spread ./main
```
The Fortran version is just as simple:
```console
ml NVHPC
nvfortran -O3 -march=native -fast -lblas main.f90 -o main.x
OMP_NUM_THREADS=144 OMP_PROC_BIND=spread ./main.x
```
!!! note
It may be advantageous to use the NVPL libraries instead of the NVHPC ones. For example, the DGEMM BLAS 3 routine from NVPL is almost 30% faster than the NVHPC one.
### Using Clang (For Grace) Toolchain
Similarly, the Clang for Grace toolchain with NVPL BLAS can be used to compile the C++ version of the example.
```console
ml NVHPC
/opt/nvidia/clang/17.23.11/bin/clang++ -O3 -march=native -ffast-math -I$NVHPC/Linux_aarch64/$EBVERSIONNVHPC/compilers/include/lp64 -lnvpl_blas_lp64_gomp main.cpp -o main
```
!!! note
NVHPC module is used just for the `cblas.h` include in this case. This can be avoided by changing the code to use `nvpl_blas.h` instead.
## Additional Resources
- [https://www.nvidia.com/en-us/data-center/grace-cpu-superchip/][1]
- [https://developer.nvidia.com/hpc-sdk][2]
- [https://developer.nvidia.com/grace/clang][3]
- [https://docs.nvidia.com/nvpl][4]
[1]: https://www.nvidia.com/en-us/data-center/grace-cpu-superchip/
[2]: https://developer.nvidia.com/hpc-sdk
[3]: https://developer.nvidia.com/grace/clang
[4]: https://docs.nvidia.com/nvpl