Shell access and data transfer
==============================
Interactive Login
-----------------
The Anselm cluster is accessed via the SSH protocol through login nodes login1
and login2 at the address anselm.it4i.cz. The login nodes may be addressed
specifically by prepending the login node name to the address.
| Login address         | Port | Protocol | Login node                                   |
| --------------------- | ---- | -------- | -------------------------------------------- |
| anselm.it4i.cz        | 22   | ssh      | round-robin DNS record for login1 and login2 |
| login1.anselm.it4i.cz | 22   | ssh      | login1                                       |
| login2.anselm.it4i.cz | 22   | ssh      | login2                                       |
The authentication is by the [private
key](https://docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys).
Please verify the SSH fingerprints during the first logon. They are
identical on all login nodes:

```
29:b3:f4:64:b0:73:f5:6f:a7:85:0f:e0:0d:be:76:bf (DSA)
d4:6f:5c:18:f4:3f:70:ef:bc:fc:cc:2b:fd:13:36:b7 (RSA)
```
Private key authentication:
On **Linux** or **Mac**, use
```
local $ ssh -i /path/to/id_rsa username@anselm.it4i.cz
```
If you see the warning message "UNPROTECTED PRIVATE KEY FILE!", use this
command to restrict the permissions of your private key file:
```
local $ chmod 600 /path/to/id_rsa
```
On **Windows**, use the [PuTTY SSH
client](https://docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/putty/putty).
After logging in, you will see the command prompt:

```
                                            _
                       /\                  | |
                      /  \   _ __  ___  ___| |_ __ ___
                     / /\ \ | '_ \/ __|/ _ \ | '_ ` _ \
                    / ____ \| | | \__ \  __/ | | | | | |
                   /_/    \_\_| |_|___/\___|_|_| |_| |_|

                        http://www.it4i.cz/?lang=en

Last login: Tue Jul  9 15:57:38 2013 from your-host.example.com
[username@login2.anselm ~]$
```
The environment is **not** shared between login nodes, except for
[shared
filesystems](https://docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster/storage-1#section-1).
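If you need to return to the environment of a previous session, connect to the same login node directly; a minimal sketch, using the same placeholder key path as above:

```
local $ ssh -i /path/to/id_rsa username@login2.anselm.it4i.cz
```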
Data Transfer
-------------
Data in and out of the system may be transferred by the
[scp](http://en.wikipedia.org/wiki/Secure_copy) and sftp
protocols. In case large volumes of data are transferred, use the dedicated
data mover node dm1.anselm.it4i.cz for increased performance. (Not available yet.)
| Address                                | Port | Protocol  |
| -------------------------------------- | ---- | --------- |
| anselm.it4i.cz                         | 22   | scp, sftp |
| login1.anselm.it4i.cz                  | 22   | scp, sftp |
| login2.anselm.it4i.cz                  | 22   | scp, sftp |
| dm1.anselm.it4i.cz (not available yet) | 22   | scp, sftp |
The authentication is by the [private
key](https://docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys).
Data transfer rates of up to **160MB/s** can be achieved with scp or sftp;
1TB may be transferred in about 1 hour 50 minutes.
To achieve 160MB/s transfer rates, the end user must be connected by a 10G
line all the way to IT4Innovations and use a computer with a fast processor
for the transfer. Over a Gigabit Ethernet connection, up to 110MB/s may
be expected. A fast cipher (aes128-ctr) should be used.
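For example, the cipher can be selected explicitly with the -c option of scp (a sketch; the key path and file names are placeholders):

```
local $ scp -i /path/to/id_rsa -c aes128-ctr my-local-file username@anselm.it4i.cz:directory/file
```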
If you experience degraded data transfer performance, consult your local
network provider.
On Linux or Mac, use an scp or sftp client to transfer the data to Anselm:
```
local $ scp -i /path/to/id_rsa my-local-file username@anselm.it4i.cz:directory/file
```
```
local $ scp -i /path/to/id_rsa -r my-local-dir username@anselm.it4i.cz:directory
```
or
```
local $ sftp -o IdentityFile=/path/to/id_rsa username@anselm.it4i.cz
```
A very convenient way to transfer files in and out of Anselm
is via the FUSE filesystem
[sshfs](http://linux.die.net/man/1/sshfs):
```
local $ sshfs -o IdentityFile=/path/to/id_rsa username@anselm.it4i.cz:. mountpoint
```
Using sshfs, the user's Anselm home directory will be mounted on your
local computer, just like an external disk.
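When you are done, unmount it again; a minimal sketch (fusermount is the usual FUSE helper on Linux; on a Mac, plain umount is used instead):

```
local $ fusermount -u mountpoint
```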
Learn more on ssh, scp and sshfs by reading the manpages
```
$ man ssh
$ man scp
$ man sshfs
```
On Windows, use the [WinSCP
client](http://winscp.net/eng/download.php) to transfer
the data. The [win-sshfs
client](http://code.google.com/p/win-sshfs/) provides a
way to mount the Anselm filesystems directly as an external disk.
More information about the shared file systems is available
[here](https://docs.it4i.cz/anselm-cluster-documentation/storage-1/storage).
Storage
=======
There are two main shared file systems on the Anselm cluster, the
[HOME](#home) and [SCRATCH](#scratch) filesystems. All
login and compute nodes may access the same data on the shared filesystems.
Compute nodes are also equipped with local (non-shared) scratch, ramdisk
and tmp filesystems.
Archiving
---------
Please do not use the shared filesystems as a backup for large amounts of data
or as a long-term archiving solution. The academic staff and students of research
institutions in the Czech Republic can use the [CESNET storage
service](https://docs.it4i.cz/anselm-cluster-documentation/storage-1/cesnet-data-storage),
which is available via SSHFS.
Shared Filesystems
------------------
The Anselm cluster provides two main shared filesystems, the [HOME
filesystem](#home) and the [SCRATCH
filesystem](#scratch). Both HOME and SCRATCH filesystems
are realized as a parallel Lustre filesystem. Both shared file systems
are accessible via the Infiniband network. Extended ACLs are provided on
both Lustre filesystems for the purpose of sharing data with other users
using fine-grained control.
### Understanding the Lustre Filesystems
(source <http://www.nas.nasa.gov>)
A user file on the Lustre filesystem can be divided into multiple chunks
(stripes) and stored across a subset of the object storage targets
(OSTs) (disks). The stripes are distributed among the OSTs in a
round-robin fashion to ensure load balancing.
When a client (a compute node from your job) needs to create
or access a file, the client queries the metadata server (MDS)
and the metadata target (MDT) for the layout and location of the
[file's
stripes](http://www.nas.nasa.gov/hecc/support/kb/Lustre_Basics_224.html#striping).
Once the file is opened and the client obtains the striping information,
the MDS is no longer involved in the
file I/O process. The client interacts directly with the object storage
servers (OSSes) and OSTs to perform I/O operations such as locking, disk
allocation, storage, and retrieval.
If multiple clients try to read and write the same part of a file at the
same time, the Lustre distributed lock manager enforces coherency so
that all clients see consistent results.
There is a default stripe configuration for the Anselm Lustre filesystems.
However, users can set the following stripe parameters for their own
directories or files to get optimum I/O performance:

1. stripe_size: the size of the chunk in bytes; specify with k, m, or
   g to use units of KB, MB, or GB, respectively; the size must be an
   even multiple of 65,536 bytes; the default is 1MB for all Anselm Lustre
   filesystems
2. stripe_count: the number of OSTs to stripe across; the default is 1 for
   Anselm Lustre filesystems; one can specify -1 to use all OSTs in
   the filesystem.
3. stripe_offset: the index of the OST where the first stripe is to be
   placed; the default is -1, which results in random selection; using a
   non-default value is NOT recommended.
Setting stripe size and stripe count correctly for your needs may
significantly impact the I/O performance you experience.
Use the lfs getstripe command to read the stripe parameters. Use the lfs
setstripe command to set the stripe parameters for optimal I/O
performance. The correct stripe setting depends on your needs and file
access patterns.
```
$ lfs getstripe dir|filename
$ lfs setstripe -s stripe_size -c stripe_count -o stripe_offset dir|filename
```
Example:
```
$ lfs getstripe /scratch/username/
/scratch/username/
stripe_count:   1 stripe_size:    1048576 stripe_offset:  -1
$ lfs setstripe -c -1 /scratch/username/
$ lfs getstripe /scratch/username/
/scratch/username/
stripe_count:  10 stripe_size:    1048576 stripe_offset:  -1
```
In this example, we view the current stripe setting of the
/scratch/username/ directory. The stripe count is changed to all OSTs
and verified. All files written to this directory will be striped over
10 OSTs.
Use lfs check osts to see the number and status of active OSTs for each
filesystem on Anselm. Learn more by reading the man page:
```
$ lfs check osts
$ man lfs
```
### Hints on Lustre Striping
Increase the stripe_count for parallel I/O to the same file.
When multiple processes are writing blocks of data to the same file in
parallel, the I/O performance for large files will improve when the
stripe_count is set to a larger value. The stripe count sets the number
of OSTs the file will be written to. By default, the stripe count is set
to 1. While this default setting provides for efficient access of
metadata (for example to support the ls -l command), large files should
use stripe counts of greater than 1. This will increase the aggregate
I/O bandwidth by using multiple OSTs in parallel instead of just one. A
rule of thumb is to use a stripe count approximately equal to the number
of gigabytes in the file.
Another good practice is to make the stripe count be an integral factor
of the number of processes performing the write in parallel, so that you
achieve load balance among the OSTs. For example, set the stripe count
to 16 instead of 15 when you have 64 processes performing the writes.
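For illustration, a sketch only (the bigrun directory name is a placeholder): striping a SCRATCH subdirectory over 8 OSTs, an integral factor of 64 writer processes that also fits within the 10 OSTs of the Anselm SCRATCH filesystem:

```
$ mkdir -p /scratch/username/bigrun
$ lfs setstripe -c 8 /scratch/username/bigrun
$ lfs getstripe /scratch/username/bigrun
```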
Using a large stripe size can improve performance when accessing very
large files.
Large stripe size allows each client to have exclusive access to its own
part of a file. However, it can be counterproductive in some cases if it
does not match your I/O pattern. The choice of stripe size has no effect
on a single-stripe file.
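For example (a sketch using the -s size option shown above; newer Lustre releases name it -S), a 4MB stripe size may be combined with a larger stripe count on a directory holding very large files:

```
$ lfs setstripe -s 4m -c 8 /scratch/username/hugefiles
```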
Read more on
<http://wiki.lustre.org/manual/LustreManual20_HTML/ManagingStripingFreeSpace.html>
### Lustre on Anselm
The architecture of Lustre on Anselm is composed of two metadata
servers (MDS) and four data/object storage servers (OSS). Two object
storage servers are used for file system HOME and another two object
storage servers are used for file system SCRATCH.
**Configuration of the storages**

- **HOME Lustre object storage**
  - One disk array NetApp E5400
  - 22 OSTs
  - 227 2TB NL-SAS 7.2krpm disks
  - 22 groups of 10 disks in RAID6 (8+2)
  - 7 hot-spare disks
- **SCRATCH Lustre object storage**
  - Two disk arrays NetApp E5400
  - 10 OSTs
  - 106 2TB NL-SAS 7.2krpm disks
  - 10 groups of 10 disks in RAID6 (8+2)
  - 6 hot-spare disks
- **Lustre metadata storage**
  - One disk array NetApp E2600
  - 12 300GB SAS 15krpm disks
  - 2 groups of 5 disks in RAID5
  - 2 hot-spare disks
### HOME
The HOME filesystem is mounted in the directory /home. Users' home
directories /home/username reside on this filesystem. The accessible
capacity is 320TB, shared among all users. Individual users are
restricted by filesystem usage quotas, set to 250GB per user. If
250GB proves insufficient for a particular user, please
contact [support](https://support.it4i.cz/rt);
the quota may be lifted upon request.
The HOME filesystem is intended for preparation, evaluation, processing
and storage of data generated by active Projects.
The HOME filesystem should not be used to archive data of past Projects
or other unrelated data.
The files on the HOME filesystem will not be deleted until the end of the [user's
lifecycle](https://docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials/obtaining-login-credentials).
The filesystem is backed up, such that it can be restored in case of a
catastrophic failure resulting in significant data loss. This backup,
however, is not intended to restore old versions of user data or to
restore (accidentally) deleted files.
The HOME filesystem is realized as a Lustre parallel filesystem and is
available on all login and computational nodes.
The default stripe size is 1MB, the stripe count is 1. There are 22 OSTs
dedicated to the HOME filesystem.
Setting stripe size and stripe count correctly for your needs may
significantly impact the I/O performance you experience.
| HOME filesystem      |        |
| -------------------- | ------ |
| Mountpoint           | /home  |
| Capacity             | 320TB  |
| Throughput           | 2GB/s  |
| User quota           | 250GB  |
| Default stripe size  | 1MB    |
| Default stripe count | 1      |
| Number of OSTs       | 22     |
### SCRATCH
The SCRATCH filesystem is mounted in the directory /scratch. Users may
freely create subdirectories and files on the filesystem. The accessible
capacity is 146TB, shared among all users. Individual users are
restricted by filesystem usage quotas, set to 100TB per user. The
purpose of this quota is to prevent runaway programs from filling the
entire filesystem and denying service to other users. If 100TB
proves insufficient for a particular user, please contact
[support](https://support.it4i.cz/rt); the quota may be
lifted upon request.
The SCRATCH filesystem is intended for temporary scratch data generated
during the calculation as well as for high performance access to input
and output files. All I/O intensive jobs must use the SCRATCH filesystem
as their working directory.
Users are advised to save the necessary data from the SCRATCH filesystem
to the HOME filesystem after the calculations and clean up the scratch
files.
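A minimal jobscript fragment illustrating this pattern (the myjob paths are placeholders, not a prescribed layout):

```
# at the end of the jobscript: keep the results, drop the temporary data
cp -r /scratch/$USER/myjob/results $HOME/projects/myjob/
rm -rf /scratch/$USER/myjob
```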
Files on the SCRATCH filesystem that are **not accessed for more than 90
days** will be automatically **deleted**.
The SCRATCH filesystem is realized as a Lustre parallel filesystem and is
available from all login and computational nodes.
The default stripe size is 1MB, the stripe count is 1. There are 10 OSTs
dedicated to the SCRATCH filesystem.
Setting stripe size and stripe count correctly for your needs may
significantly impact the I/O performance you experience.
| SCRATCH filesystem   |          |
| -------------------- | -------- |
| Mountpoint           | /scratch |
| Capacity             | 146TB    |
| Throughput           | 6GB/s    |
| User quota           | 100TB    |
| Default stripe size  | 1MB      |
| Default stripe count | 1        |
| Number of OSTs       | 10       |
### Disk usage and quota commands
User quotas on the file systems can be checked and reviewed using the
following command:
```
$ lfs quota dir
```
Example for Lustre HOME directory:
```
$ lfs quota /home
Disk quotas for user user001 (uid 1234):
Filesystem kbytes quota limit grace files quota limit grace
/home 300096 0 250000000 - 2102 0 500000 -
Disk quotas for group user001 (gid 1234):
Filesystem kbytes quota limit grace files quota limit grace
/home 300096 0 0 - 2102 0 0 -
```
In this example, we view the current quota size limit of 250GB, with 300MB
currently used by user001.
Example for Lustre SCRATCH directory:
```
$ lfs quota /scratch
Disk quotas for user user001 (uid 1234):
Filesystem kbytes quota limit grace files quota limit grace
/scratch 8 0 100000000000 - 3 0 0 -
Disk quotas for group user001 (gid 1234):
Filesystem kbytes quota limit grace files quota limit grace
/scratch 8 0 0 - 3 0 0 -
```
In this example, we view the current quota size limit of 100TB, with 8KB
currently used by user001.
To have a better understanding of where exactly the space is used, you
can use the following command:
```
$ du -hs dir
```
Example for your HOME directory:
```
$ cd /home
$ du -hs * .[a-zA-Z0-9]* | grep -E "[0-9]*G|[0-9]*M" | sort -hr
258M cuda-samples
15M .cache
13M .mozilla
5,5M .eclipse
2,7M .idb_13.0_linux_intel64_app
```
This will list all directories consuming megabytes or gigabytes
of space in your current (in this example HOME) directory. The list
is sorted in descending order from largest to smallest
files/directories.
To have a better understanding of the previous commands, you can read the
manpages:
```
$ man lfs
```
```
$ man du
```
### Extended ACLs
Extended ACLs provide another security mechanism besides the standard
POSIX ACLs, which are defined by three entries (for
owner/group/others). Extended ACLs have more than the three basic
entries. In addition, they also contain a mask entry and may contain any
number of named user and named group entries.
ACLs on a Lustre file system work exactly like ACLs on any Linux file
system. They are manipulated with the standard tools in the standard
manner. Below, we create a directory and allow a specific user access.
```
[vop999@login1.anselm ~]$ umask 027
[vop999@login1.anselm ~]$ mkdir test
[vop999@login1.anselm ~]$ ls -ld test
drwxr-x--- 2 vop999 vop999 4096 Nov 5 14:17 test
[vop999@login1.anselm ~]$ getfacl test
# file: test
# owner: vop999
# group: vop999
user::rwx
group::r-x
other::---
[vop999@login1.anselm ~]$ setfacl -m user:johnsm:rwx test
[vop999@login1.anselm ~]$ ls -ld test
drwxrwx---+ 2 vop999 vop999 4096 Nov 5 14:17 test
[vop999@login1.anselm ~]$ getfacl test
# file: test
# owner: vop999
# group: vop999
user::rwx
user:johnsm:rwx
group::r-x
mask::rwx
other::---
```
The default ACL mechanism can be used to replace setuid/setgid permissions
on directories. Setting a default ACL on a directory (the -d flag to
setfacl) will cause the ACL permissions to be inherited by any newly
created file or subdirectory within the directory. Refer to this page
for more information on Linux ACLs:
[http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html](http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html)
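Continuing the example above (a sketch; johnsm is the same illustrative user, and the exact getfacl output may differ slightly), a default ACL can be added so that files and subdirectories created later inside test inherit the extra access rights:

```
[vop999@login1.anselm ~]$ setfacl -d -m user:johnsm:rwx test
[vop999@login1.anselm ~]$ getfacl test
# file: test
# owner: vop999
# group: vop999
user::rwx
user:johnsm:rwx
group::r-x
mask::rwx
other::---
default:user::rwx
default:user:johnsm:rwx
default:group::r-x
default:mask::rwx
default:other::---
```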
Local Filesystems
-----------------
### Local Scratch
Every computational node is equipped with a 330GB local scratch disk.
Use the local scratch in case you need to access a large number of small files
during your calculation.
The local scratch disk is mounted as /lscratch and is accessible to the
user at the /lscratch/$PBS_JOBID directory.
The local scratch filesystem is intended for temporary scratch data
generated during the calculation as well as for high performance access
to input and output files. All I/O intensive jobs that access a large
number of small files within the calculation must use the local scratch
filesystem as their working directory. This is required for performance
reasons, as frequent access to a large number of small files may overload the
metadata servers (MDS) of the Lustre filesystem.
The local scratch directory /lscratch/$PBS_JOBID will be deleted
immediately after the calculation ends. Users should take care to save
the output data from within the jobscript.
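A minimal jobscript sketch of this workflow (the input, output and my_program names are placeholders): copy the input to the local scratch, run there, and copy the results back before the job ends:

```
#!/bin/bash
SCRDIR=/lscratch/$PBS_JOBID

# stage in, compute on the node-local disk, stage out
cp -r $HOME/myjob/input $SCRDIR/
cd $SCRDIR
$HOME/myjob/my_program input > output
cp output $HOME/myjob/
```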
| local SCRATCH filesystem |                      |
| ------------------------ | -------------------- |
| Mountpoint               | /lscratch            |
| Accesspoint              | /lscratch/$PBS_JOBID |
| Capacity                 | 330GB                |
| Throughput               | 100MB/s              |
| User quota               | none                 |
### RAM disk
Every computational node is equipped with a filesystem realized in memory,
the so-called RAM disk.
Use the RAM disk in case you need really fast access to your data of limited
size during your calculation.
Be very careful: use of the RAM disk filesystem is at the expense of
operational memory.
The local RAM disk is mounted as /ramdisk and is accessible to the user
at the /ramdisk/$PBS_JOBID directory.
The local RAM disk filesystem is intended for temporary scratch data
generated during the calculation as well as for high performance access
to input and output files. The size of the RAM disk filesystem is limited,
and its use comes at the expense of operational memory. It is not recommended
to allocate a large amount of memory and use a large amount of data in
the RAM disk filesystem at the same time.
The local RAM disk directory /ramdisk/$PBS_JOBID will be deleted
immediately after the calculation ends. Users should take care to save
the output data from within the jobscript.
| RAM disk    |                                                                                                           |
| ----------- | --------------------------------------------------------------------------------------------------------- |
| Mountpoint  | /ramdisk                                                                                                    |
| Accesspoint | /ramdisk/$PBS_JOBID                                                                                         |
| Capacity    | 60GB at compute nodes without accelerator, 90GB at compute nodes with accelerator, 500GB at fat nodes       |
| Throughput  | over 1.5 GB/s write, over 5 GB/s read (single thread); over 10 GB/s write, over 50 GB/s read (16 threads)   |
| User quota  | none                                                                                                        |
### tmp
Each node is equipped with a local /tmp directory of a few GB capacity. The
/tmp directory should be used to work with small temporary files. Old
files in the /tmp directory are automatically purged.
Summary
-------

| Mountpoint | Usage                     | Protocol | Net Capacity   | Throughput | Limitations | Access                  | Services                    |
| ---------- | ------------------------- | -------- | -------------- | ---------- | ----------- | ----------------------- | --------------------------- |
| /home      | home directory            | Lustre   | 320 TiB        | 2 GB/s     | Quota 250GB | Compute and login nodes | backed up                   |
| /scratch   | cluster shared jobs' data | Lustre   | 146 TiB        | 6 GB/s     | Quota 100TB | Compute and login nodes | files older 90 days removed |
| /lscratch  | node local jobs' data     | local    | 330 GB         | 100 MB/s   | none        | Compute nodes           | purged after job ends       |
| /ramdisk   | node local jobs' data     | local    | 60, 90, 500 GB | 5-50 GB/s  | none        | Compute nodes           | purged after job ends       |
| /tmp       | local temporary files     | local    |                | 100 MB/s   | none        | Compute and login nodes | auto purged                 |
VPN Access
==========
Accessing IT4Innovations internal resources via VPN
---------------------------------------------------
**Failed to initialize connection subsystem (Windows 8.1, 02-10-15 MS
patch)**
A workaround can be found at
<https://docs.it4i.cz/vpn-connection-fail-in-win-8.1>.
To use resources and licenses which are located on the IT4Innovations
local network, it is necessary to connect to this network via VPN.
We use the Cisco AnyConnect Secure Mobility Client, which is supported on
the following operating systems:
- Windows XP
- Windows Vista
- Windows 7
- Windows 8
- Linux
- MacOS
It is not possible to connect to the VPN from other operating systems.
VPN client installation
-----------------------
You can install the VPN client from the web interface after a successful login
with LDAP credentials at <https://vpn1.it4i.cz/anselm>.
![](https://docs.it4i.cz/anselm-cluster-documentation/login.jpg/@@images/30271119-b392-4db9-a212-309fb41925d6.jpeg)
Depending on your Java settings, after login the client either
installs automatically or downloads an installation file for your
operating system. It is necessary to allow the installation tool
to start for the automatic installation to proceed.
![Java
detection](https://docs.it4i.cz/anselm-cluster-documentation/java_detection.jpg/@@images/5498e1ba-2242-4b9c-a799-0377a73f779e.jpeg "Java detection")
![Execution
access](https://docs.it4i.cz/anselm-cluster-documentation/executionaccess.jpg/@@images/4d6e7cb7-9aa7-419c-9583-6dfd92b2c015.jpeg "Execution access")![Execution
access
2](https://docs.it4i.cz/anselm-cluster-documentation/executionaccess2.jpg/@@images/bed3998c-4b82-4b40-83bd-c3528dde2425.jpeg "Execution access 2")
After successful installation, the VPN connection will be established and
you can use the available resources from the IT4I network.
![Successfull
instalation](https://docs.it4i.cz/anselm-cluster-documentation/successfullinstalation.jpg/@@images/c6d69ffe-da75-4cb6-972a-0cf4c686b6e1.jpeg "Successfull instalation")
If your Java settings do not allow automatic installation, you can
download the installation file and install the VPN client manually.
![Installation
file](https://docs.it4i.cz/anselm-cluster-documentation/instalationfile.jpg/@@images/202d14e9-e2e1-450b-a584-e78c018d6b6a.jpeg "Installation file")
After you click on the link, the download of the installation file will start.
![Download file
successfull](https://docs.it4i.cz/anselm-cluster-documentation/downloadfilesuccessfull.jpg/@@images/69842481-634a-484e-90cd-d65e0ddca1e8.jpeg "Download file successfull")
After a successful download of the installation file, you have to execute the
tool with administrator's rights and install the VPN client manually.
Working with VPN client
-----------------------
You can use the graphical user interface or the command line interface to run
the VPN client on all supported operating systems. We suggest using the GUI.
![Icon](https://docs.it4i.cz/anselm-cluster-documentation/icon.jpg "Icon")
Before the first login to the VPN, you have to fill in the
URL **https://vpn1.it4i.cz/anselm** in the text field.
![First
run](https://docs.it4i.cz/anselm-cluster-documentation/firstrun.jpg "First run")
After you click on the Connect button, you must fill in your login
credentials.
![Login -
GUI](https://docs.it4i.cz/anselm-cluster-documentation/logingui.jpg "Login - GUI")
After a successful login, the client will minimize to the system tray.
If everything works, you can see a lock in the Cisco tray icon.
![Successfull
connection](https://docs.it4i.cz/anselm-cluster-documentation/anyconnecticon.jpg "Successfull connection")
If you right-click on this icon, you will see a context menu in which
you can control the VPN connection.
![Context
menu](https://docs.it4i.cz/anselm-cluster-documentation/anyconnectcontextmenu.jpg "Context menu")
When you connect to the VPN for the first time, the client downloads the
profile and creates a new item "ANSELM" in the connection list. For
subsequent connections, it is not necessary to re-enter the URL address,
but just select the corresponding item.
![Anselm
profile](https://docs.it4i.cz/anselm-cluster-documentation/Anselmprofile.jpg "Anselm profile")
Then AnyConnect automatically proceeds as in the case of the first logon.
![Login with
profile](https://docs.it4i.cz/anselm-cluster-documentation/loginwithprofile.jpg/@@images/a6fd5f3f-bce4-45c9-85e1-8d93c6395eee.jpeg "Login with profile")
After a successful logon, you can see a green circle with a tick mark on
the lock icon.
![successful
login](https://docs.it4i.cz/anselm-cluster-documentation/successfullconnection.jpg "successful login")
For disconnecting, right-click on the AnyConnect client icon in the
system tray and select **VPN Disconnect**.
Graphical User Interface
========================
X Window System
---------------
The X Window system is a principal way to get GUI access to the
clusters.
Read more about configuring [**X Window
System**](https://docs.it4i.cz/get-started-with-it4innovations/accessing-the-clusters/graphical-user-interface/x-window-system/x-window-and-vnc).
VNC
---
**Virtual Network Computing** (**VNC**) is a graphical [desktop
sharing](http://en.wikipedia.org/wiki/Desktop_sharing "Desktop sharing")
system that uses the [Remote Frame Buffer protocol
(RFB)](http://en.wikipedia.org/wiki/RFB_protocol "RFB protocol")
to remotely control another
[computer](http://en.wikipedia.org/wiki/Computer "Computer").
Read more about configuring
**[VNC](https://docs.it4i.cz/salomon/accessing-the-cluster/graphical-user-interface/vnc)**.
Compute Nodes
=============
Nodes Configuration
-------------------
Anselm is a cluster of x86-64 Intel based nodes built on the Bull Extreme
Computing bullx technology. The cluster contains four types of compute
nodes.
### **Compute Nodes Without Accelerator**

- 180 nodes
- 2880 cores in total
- two Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node
- 64 GB of physical memory per node
- one 500GB SATA 2.5" 7.2krpm HDD per node
- bullx B510 blade servers
- cn[1-180]
### **Compute Nodes With GPU Accelerator**

- 23 nodes
- 368 cores in total
- two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node
- 96 GB of physical memory per node
- one 500GB SATA 2.5" 7.2krpm HDD per node
- GPU accelerator 1x NVIDIA Tesla Kepler K20 per node
- bullx B515 blade servers
- cn[181-203]
### **Compute Nodes With MIC Accelerator**

- 4 nodes
- 64 cores in total
- two Intel Sandy Bridge E5-2470, 8-core, 2.3GHz processors per node
- 96 GB of physical memory per node
- one 500GB SATA 2.5" 7.2krpm HDD per node
- MIC accelerator 1x Intel Xeon Phi 5110P per node
- bullx B515 blade servers
- cn[204-207]
### **Fat Compute Nodes**

- 2 nodes
- 32 cores in total
- 2 Intel Sandy Bridge E5-2665, 8-core, 2.4GHz processors per node
- 512 GB of physical memory per node
- two 300GB SAS 3.5" 15krpm HDD (RAID1) per node
- two 100GB SLC SSD per node
- bullx R423-E3 servers
- cn[208-209]
![](https://docs.it4i.cz/anselm-cluster-documentation/bullxB510.png)

**Figure: Anselm bullx B510 servers**

### Compute Nodes Summary
| Node type                  | Count | Range       | Memory | Cores       | [Access](https://docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/resources-allocation-policy) |
| -------------------------- | ----- | ----------- | ------ | ----------- | ------------------------- |
| Nodes without accelerator  | 180   | cn[1-180]   | 64GB   | 16 @ 2.4GHz | qexp, qprod, qlong, qfree |
| Nodes with GPU accelerator | 23    | cn[181-203] | 96GB   | 16 @ 2.3GHz | qgpu, qprod               |
| Nodes with MIC accelerator | 4     | cn[204-207] | 96GB   | 16 @ 2.3GHz | qmic, qprod               |
| Fat compute nodes          | 2     | cn[208-209] | 512GB  | 16 @ 2.4GHz | qfat, qprod               |
Processor Architecture
----------------------
Anselm is equipped with Intel Sandy Bridge processors: Intel Xeon E5-2665
(nodes without accelerator and fat nodes) and Intel Xeon E5-2470 (nodes
with accelerators). The processors support the Advanced Vector Extensions (AVX)
256-bit instruction set.
### Intel Sandy Bridge E5-2665 Processor

- eight-core
- speed: 2.4 GHz, up to 3.1 GHz using Turbo Boost Technology
- peak performance: 19.2 Gflop/s per core
- caches:
  - L2: 256 KB per core
  - L3: 20 MB per processor
- memory bandwidth at the level of the processor: 51.2 GB/s
### Intel Sandy Bridge E5-2470 Processor

- eight-core
- speed: 2.3 GHz, up to 3.1 GHz using Turbo Boost Technology
- peak performance: 18.4 Gflop/s per core (see the derivation sketch after this list)
- caches:
  - L2: 256 KB per core
  - L3: 20 MB per processor
- memory bandwidth at the level of the processor: 38.4 GB/s
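The per-core peak figures above follow from the clock rate under the assumption (not spelled out in this document) that a Sandy Bridge core can retire one 256-bit AVX add and one 256-bit AVX multiply per cycle, i.e. 8 double-precision flops per cycle:

```latex
% peak per core = (flops per cycle) x (clock frequency)
P_{\text{E5-2665}} = 8 \times 2.4\,\text{GHz} = 19.2\ \text{Gflop/s}
P_{\text{E5-2470}} = 8 \times 2.3\,\text{GHz} = 18.4\ \text{Gflop/s}
```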
Nodes equipped with the Intel Xeon E5-2665 CPU have the PBS resource
attribute cpu_freq = 24; nodes equipped with the Intel Xeon E5-2470 CPU
have the PBS resource attribute cpu_freq = 23.
```
$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16:cpu_freq=24 -I
```
In this example, we allocate 4 nodes, 16 cores at 2.4GHz per node.
Intel Turbo Boost Technology is used by default; you can disable it for
all nodes of a job by using the resource attribute cpu_turbo_boost:
```
$ qsub -A OPEN-0-0 -q qprod -l select=4:ncpus=16 -l cpu_turbo_boost=0 -I
```
Memory Architecture
-------------------
### Compute Node Without Accelerator

- 2 sockets
- Memory Controllers are integrated into processors.
  - 8 DDR3 DIMMs per node
  - 4 DDR3 DIMMs per CPU
  - 1 DDR3 DIMM per channel
  - Data rate support: up to 1600MT/s
- Populated memory: 8x 8GB DDR3 DIMM 1600MHz
### Compute Node With GPU or MIC Accelerator

- 2 sockets
- Memory Controllers are integrated into processors.
  - 6 DDR3 DIMMs per node
  - 3 DDR3 DIMMs per CPU
  - 1 DDR3 DIMM per channel
  - Data rate support: up to 1600MT/s
- Populated memory: 6x 16GB DDR3 DIMM 1600MHz
### Fat Compute Node

- 2 sockets
- Memory Controllers are integrated into processors.
  - 16 DDR3 DIMMs per node
  - 8 DDR3 DIMMs per CPU
  - 2 DDR3 DIMMs per channel
  - Data rate support: up to 1600MT/s
- Populated memory: 16x 32GB DDR3 DIMM 1600MHz
Environment and Modules
=======================
### Environment Customization
After logging in, you may want to configure the environment. Write your
preferred path definitions, aliases, functions and module loads in the
.bashrc file
```
# ~/.bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
alias qs='qstat -a'
module load PrgEnv-gnu
# Display information to standard output - only in interactive ssh session
if [ -n "$SSH_TTY" ]
then
module list # Display loaded modules
fi
```
Do not run commands outputting to standard output (echo, module list,
etc.) in .bashrc for non-interactive SSH sessions. It breaks fundamental
functionality (scp, PBS) of your account! Guard such commands with a test
for SSH session interactivity, as shown in the previous example.
### Application Modules
In order to configure your shell for running a particular application on
Anselm, we use the Module package interface.
The modules set up the application paths, library paths and environment
variables for running a particular application.
We also have a second modules repository. This modules repository is
created using a tool called EasyBuild. On the Salomon cluster, all modules
will be built by this tool. If you want to use software from this
modules repository, please follow the instructions in the section [Application
Modules Path Expansion](#EasyBuild).
The modules may be loaded, unloaded and switched, according to momentary
needs.
To check available modules use
```
$ module avail
```
To load a module, for example the octave module use
```
$ module load octave
```
Loading the octave module will set up paths and environment variables of
your active shell such that you are ready to run the octave software.
To check loaded modules use
```
$ module list
```
To unload a module, for example the octave module use
```
$ module unload octave
```
Learn more on modules by reading the module man page
```
$ man module
```
The following modules set up the development environment (a usage sketch
follows this list):

- PrgEnv-gnu sets up the GNU development environment in conjunction with
  the bullx MPI library
- PrgEnv-intel sets up the Intel development environment in conjunction
  with the Intel MPI library
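For example, the usual Environment Modules commands can be used to load one and then switch to the other (a sketch; module names as listed above):

```
$ module load PrgEnv-gnu
$ module swap PrgEnv-gnu PrgEnv-intel
```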
### Application Modules Path Expansion
All application modules on the Salomon cluster (and further) will be built
using a tool called
[EasyBuild](http://hpcugent.github.io/easybuild/ "EasyBuild").
In case you want to use applications that are already built by
EasyBuild, you have to modify your MODULEPATH environment
variable.
```
export MODULEPATH=$MODULEPATH:/apps/easybuild/modules/all/
```
This command expands the paths searched for modules. You can also add
this command to the .bashrc file to expand the paths permanently. After this
command, you can use the same commands to list/add/remove modules as
described above.
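A sketch of making the change permanent by appending the export to your own .bashrc (adjust if you manage that file differently):

```
$ echo 'export MODULEPATH=$MODULEPATH:/apps/easybuild/modules/all/' >> ~/.bashrc
```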
Hardware Overview
=================
The Anselm cluster consists of 209 computational nodes named cn[1-209],
of which 180 are regular compute nodes, 23 are GPU Kepler K20 accelerated
nodes, 4 are MIC Xeon Phi 5110P accelerated nodes and 2 are fat nodes. Each node
is a powerful x86-64 computer,
equipped with 16 cores (two eight-core Intel Sandy Bridge processors),
at least 64GB RAM, and a local hard drive. User access to the Anselm
cluster is provided by two login nodes login[1,2]. The nodes are
interlinked by high speed InfiniBand and Ethernet networks. All nodes
share the 320TB /home disk storage to store user files. The 146TB shared
/scratch storage is available for scratch data.
The fat nodes are equipped with a large amount (512GB) of memory.
The virtualization infrastructure provides resources to run long-term
servers and services in virtual mode. Fat nodes and virtual servers may
access 45 TB of dedicated block storage. Accelerated nodes, fat nodes,
and the virtualization infrastructure are available [upon
request](https://support.it4i.cz/rt) made by a PI.
Schematic overview of the Anselm cluster: the user-oriented infrastructure
consists of the login nodes login1 and login2 and the data mover node dm1.
The storage comprises the Lustre /home filesystem (320TB), the Lustre
/scratch filesystem (146TB) and 45 TB of block storage. The management
infrastructure consists of management nodes and virtualization
infrastructure servers. The compute nodes cn[1-207] are spread across
racks 01-05 and connected through the chassis InfiniBand switches
(marked isw); the fat nodes cn208 and cn209 stand apart from the chassis.
The cluster compute nodes cn[1-207] are organized within 13 chassis.
There are four types of compute nodes:
- 180 compute nodes without the accelerator
- 23 compute nodes with GPU accelerator - equipped with NVIDIA Tesla
Kepler K20
- 4 compute nodes with MIC accelerator - equipped with Intel Xeon Phi
5110P
- 2 fat nodes - equipped with 512GB RAM and two 100GB SSD drives
[More about Compute
nodes](https://docs.it4i.cz/anselm-cluster-documentation/compute-nodes).
GPU and accelerated nodes are available upon request, see the [Resources
Allocation
Policy](https://docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution/resources-allocation-policy).
All these nodes are interconnected by a fast InfiniBand QDR
network and an Ethernet network. [More about the
Network](https://docs.it4i.cz/anselm-cluster-documentation/network).
Every chassis provides an InfiniBand switch, marked **isw**, connecting all
nodes in the chassis, as well as connecting the chassis to the upper
level switches.
All nodes share the 320TB /home disk storage to store user files. The 146TB
shared /scratch storage is available for scratch data. These file
systems are provided by the Lustre parallel file system. There is also local
disk storage /lscratch available on all compute nodes. [More about
Storage](https://docs.it4i.cz/anselm-cluster-documentation/storage-1/storage).
The user access to the Anselm cluster is provided by two login nodes
login1, login2, and data mover node dm1. [More about accessing
cluster.](https://docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster)
The parameters are summarized in the following tables:
| **In general**                |                            |
| ----------------------------- | -------------------------- |
| Primary purpose               | High Performance Computing |
| Architecture of compute nodes | x86-64                     |
| Operating system              | Linux                      |

| [**Compute nodes**](https://docs.it4i.cz/anselm-cluster-documentation/compute-nodes) | |
| ----------------- | -------------------------------------------- |
| Total             | 209                                          |
| Processor cores   | 16 (2x8 cores)                               |
| RAM               | min. 64 GB, min. 4 GB per core               |
| Local disk drive  | yes - usually 500 GB                         |
| Compute network   | InfiniBand QDR, fully non-blocking, fat-tree |
| w/o accelerator   | 180, cn[1-180]                               |
| GPU accelerated   | 23, cn[181-203]                              |
| MIC accelerated   | 4, cn[204-207]                               |
| Fat compute nodes | 2, cn[208-209]                               |

| **In total**                               |            |
| ------------------------------------------ | ---------- |
| Total theoretical peak performance (Rpeak) | 94 Tflop/s |
| Total max. LINPACK performance (Rmax)      | 73 Tflop/s |
| Total amount of RAM                        | 15.136 TB  |
| Node             | Processor                             | Memory | Accelerator          |
| ---------------- | ------------------------------------- | ------ | -------------------- |
| w/o accelerator  | 2x Intel Sandy Bridge E5-2665, 2.4GHz | 64GB   | -                    |
| GPU accelerated  | 2x Intel Sandy Bridge E5-2470, 2.3GHz | 96GB   | NVIDIA Kepler K20    |
| MIC accelerated  | 2x Intel Sandy Bridge E5-2470, 2.3GHz | 96GB   | Intel Xeon Phi 5110P |
| Fat compute node | 2x Intel Sandy Bridge E5-2665, 2.4GHz | 512GB  | -                    |
For more details please refer to the [Compute
nodes](https://docs.it4i.cz/anselm-cluster-documentation/compute-nodes),
[Storage](https://docs.it4i.cz/anselm-cluster-documentation/storage-1/storage),
and
[Network](https://docs.it4i.cz/anselm-cluster-documentation/network).
Introduction
============
Welcome to the Anselm supercomputer cluster. The Anselm cluster consists of
209 compute nodes, totaling 3344 compute cores with 15TB RAM, giving
over 94 Tflop/s theoretical peak performance. Each node is a
powerful x86-64 computer, equipped with 16
cores, at least 64GB RAM, and a 500GB hard drive. Nodes are interconnected
by a fully non-blocking fat-tree InfiniBand network and equipped with
Intel Sandy Bridge processors. A few nodes are also equipped with NVIDIA
Kepler GPU or Intel Xeon Phi MIC accelerators. Read more in the [Hardware
Overview](https://docs.it4i.cz/anselm-cluster-documentation/hardware-overview).
The cluster runs the [bullx
Linux](http://www.bull.com/bullx-logiciels/systeme-exploitation.html) [operating
system](https://docs.it4i.cz/anselm-cluster-documentation/software/operating-system),
which is compatible with the RedHat
[Linux
family](http://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg).
We have installed a wide range of
[software](https://docs.it4i.cz/anselm-cluster-documentation/software)
packages targeted at different scientific domains. These packages are
accessible via the [modules
environment](https://docs.it4i.cz/anselm-cluster-documentation/environment-and-modules).
User data shared file-system (HOME, 320TB) and job data shared
file-system (SCRATCH, 146TB) are available to users.
The PBS Professional workload manager provides [computing resources
allocations and job
execution](https://docs.it4i.cz/anselm-cluster-documentation/resource-allocation-and-job-execution).
Read more on how to [apply for
resources](https://docs.it4i.cz/get-started-with-it4innovations/applying-for-resources),
[obtain login
credentials,](https://docs.it4i.cz/get-started-with-it4innovations/obtaining-login-credentials)
and [access the
cluster](https://docs.it4i.cz/anselm-cluster-documentation/accessing-the-cluster).