diff --git a/docs.it4i/software/numerical-languages/introduction.md b/docs.it4i/software/numerical-languages/introduction.md
new file mode 100644
index 0000000000000000000000000000000000000000..92251d6dfa3fe1fecf0c3cf8510adb7607923052
--- /dev/null
+++ b/docs.it4i/software/numerical-languages/introduction.md
@@ -0,0 +1,40 @@
+# Numerical languages
+
+Interpreted languages for numerical computations and analysis
+
+## Introduction
+
+This section contains a collection of high-level interpreted languages, primarily intended for numerical computations.
+
+## Matlab
+
+MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming.
+
+```console
+$ ml MATLAB
+$ matlab
+```
+
+Read more at the [Matlab page](matlab/).
+
+## Octave
+
+GNU Octave is a high-level interpreted language, primarily intended for numerical computations. The Octave language is quite similar to Matlab, so most programs are easily portable.
+
+```console
+$ ml Octave
+$ octave
+```
+
+Read more at the [Octave page](octave/).
+
+## R
+
+R is an interpreted language and environment for statistical computing and graphics.
+
+```console
+$ ml R
+$ R
+```
+
+Read more at the [R page](r/).
diff --git a/docs.it4i/software/numerical-languages/matlab.md b/docs.it4i/software/numerical-languages/matlab.md
new file mode 100644
index 0000000000000000000000000000000000000000..c1d52e46fc6c21e669fc5e5488d4004d743af1dc
--- /dev/null
+++ b/docs.it4i/software/numerical-languages/matlab.md
@@ -0,0 +1,281 @@
+# Matlab
+
+## Introduction
+
+Matlab is available in versions R2015a and R2015b. There are always two variants of each release:
+
+* Non-commercial, or so-called EDU variant, which can be used for common research and educational purposes.
+* Commercial, or so-called COM variant, which can also be used for commercial activities. Licenses for the COM variant are much more expensive, so the COM variant usually offers only a subset of the features available in the EDU variant.
+
+To load the latest version of Matlab, load the module:
+
+```console
+$ ml MATLAB
+```
+
+The EDU variant is loaded by default. If you need another version or variant, load that particular version. To obtain the list of available versions, use:
+
+```console
+$ ml av MATLAB
+```
+
+If you need the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. For all computations, however, use Matlab on the compute nodes via the PBS Pro scheduler.
+
+If you require the Matlab GUI, follow the general information about [running graphical applications](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/).
+
+The Matlab GUI is quite slow over the X forwarding built into PBS (qsub -X), so X11 display redirection either via SSH or directly via xauth (see the "GUI Applications on Compute Nodes over VNC" part [here](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/)) is recommended.
+
+To run Matlab with the GUI, use:
+
+```console
+$ matlab
+```
+
+To run Matlab in text mode, without the Matlab Desktop GUI environment, use:
+
+```console
+$ matlab -nodesktop -nosplash
+```
+
+Plots, images, etc. will still be available.
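+
+In text mode, you can also execute a single command or short script non-interactively via the -r switch; the magic-square call below is only a hypothetical illustration:
+
+```console
+$ matlab -nodesktop -nosplash -r "disp(magic(4)); quit"
+```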
+
+## Running Parallel Matlab Using Distributed Computing Toolbox / Engine
+
+The Distributed Computing Toolbox is available only for the EDU variant.
+
+The MPIEXEC mode available in previous versions is no longer available in MATLAB 2015, and the programming interface has changed. Refer to the [Release Notes](http://www.mathworks.com/help/distcomp/release-notes.html#buanp9e-1).
+
+Delete the previously used mpiLibConf.m file; we have observed crashes when using Intel MPI.
+
+To use Distributed Computing, you first need to set up a parallel profile. We have provided the profile for you; you can either import it on the MATLAB command line:
+
+```console
+> parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonPBSPro.settings')
+
+ans =
+
+SalomonPBSPro
+```
+
+Or, in the GUI, go to tab HOME -> Parallel -> Manage Cluster Profiles..., click Import, and navigate to:
+
+/apps/all/MATLAB/2015b-EDU/SalomonPBSPro.settings
+
+With the new mode, MATLAB itself launches the workers via PBS, so you can use either interactive mode or batch mode on one node, but the actual parallel processing will be done in a separate job started by MATLAB itself. Alternatively, you can use the "local" mode to run parallel code on just a single node.
+
+!!! note
+    The profile is confusingly named Salomon, but you can use it also on Anselm.
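+
+To verify that the import succeeded, you can list the profiles known to your MATLAB session; SalomonPBSPro should appear in the output (a hypothetical quick check using a standard Parallel Computing Toolbox call):
+
+```console
+> parallel.clusterProfiles()
+```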
+
+### Parallel Matlab Interactive Session
+
+The following example shows how to start an interactive session with support for the Matlab GUI. For more information about GUI-based applications on Anselm, see [this page](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/).
+
+```console
+$ xhost +
+$ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=1 -l walltime=00:30:00 -l feature__matlab__MATLAB=1
+```
+
+This qsub command example shows how to run Matlab on a single node.
+
+The second part of the command shows how to request all necessary licenses, in this case 1 Matlab-EDU license and 48 Distributed Computing Engines licenses.
+
+Once access to the compute nodes is granted by PBS, the user can load the following modules and start Matlab:
+
+```console
+$ ml MATLAB/2015a-EDU
+$ matlab &
+```
+
+### Parallel Matlab Batch Job in Local Mode
+
+To run Matlab in batch mode, write a Matlab script, then write a bash jobscript and execute it via the qsub command. By default, Matlab will execute one Matlab worker instance per allocated core.
+
+```bash
+#!/bin/bash
+#PBS -A PROJECT ID
+#PBS -q qprod
+#PBS -l select=1:ncpus=24:mpiprocs=24:ompthreads=1 # Anselm: ncpus=16:mpiprocs=16
+
+# change to shared scratch directory
+SCR=/scratch/.../$USER/$PBS_JOBID # change the path according to the cluster
+mkdir -p $SCR ; cd $SCR || exit
+
+# copy input file to scratch
+cp $PBS_O_WORKDIR/matlabcode.m .
+
+# load modules
+module load MATLAB/2015a-EDU
+
+# execute the calculation
+matlab -nodisplay -r matlabcode > output.out
+
+# copy output file to home
+cp output.out $PBS_O_WORKDIR/.
+```
+
+This script may be submitted directly to the PBS workload manager via the qsub command. The inputs and the Matlab script are in the matlabcode.m file, the outputs in the output.out file. Note the missing .m extension in the matlab -r matlabcodefile call, **the .m must not be included**. Note that the **shared /scratch must be used**. Further, it is **important to include a quit** statement at the end of the matlabcode.m script.
+
+Submit the jobscript using qsub:
+
+```console
+$ qsub ./jobscript
+```
+
+### Parallel Matlab Local Mode Program Example
+
+The last part of the configuration is done directly in the user's Matlab script, before the Distributed Computing Toolbox is started:
+
+```matlab
+cluster = parcluster('local')
+```
+
+This line creates a scheduler object "cluster" of type "local" that starts the workers locally.
+
+!!! hint
+    Every Matlab script that needs to initialize/use the parallel pool has to contain this line prior to calling the parpool(cluster, ...) function.
+
+The last step is to start the parallel pool with the "cluster" object and the correct number of workers. We have 24 cores per node, so we start 24 workers:
+
+```matlab
+pool = parpool(cluster,24); % Anselm: parpool(cluster,16)
+
+... parallel code ...
+
+delete(pool)
+```
+
+The complete example showing how to use the Distributed Computing Toolbox in local mode is shown here:
+
+```matlab
+cluster = parcluster('local');
+cluster
+
+pool = parpool(cluster,24);
+
+n=2000;
+
+W = rand(n,n);
+W = distributed(W);
+x = (1:n)';
+x = distributed(x);
+spmd
+    [~, name] = system('hostname')
+
+    T = W*x; % Calculation performed on labs, in parallel.
+             % T and W are both codistributed arrays here.
+end
+T;
+whos         % T and W are both distributed arrays here.
+
+delete(pool)
+quit
+```
+
+You can copy and paste the example into a .m file and execute it. Note that the parpool size should correspond to the **total number of cores** available on the allocated nodes.
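+
+As a smaller self-contained test of local mode, the following sketch estimates π with a parfor loop; the Monte Carlo code itself is only a hypothetical illustration, the point is the parcluster/parpool/delete pattern:
+
+```matlab
+cluster = parcluster('local');
+pool = parpool(cluster,24); % Anselm: parpool(cluster,16)
+
+n = 1e6;
+hits = 0;
+parfor i = 1:n
+    p = rand(1,2);                        % random point in the unit square
+    hits = hits + (p(1)^2 + p(2)^2 <= 1); % count points inside the quarter circle
+end
+fprintf('pi is approximately %f\n', 4*hits/n);
+
+delete(pool)
+quit
+```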
+
+### Parallel Matlab Batch Job Using PBS Mode (Workers Spawned in a Separate Job)
+
+This mode uses the PBS scheduler to launch the parallel pool. It uses the SalomonPBSPro profile that needs to be imported to Cluster Manager, as mentioned before. This method uses MATLAB's PBS Scheduler interface - it spawns the workers in a separate job submitted by MATLAB using qsub.
+
+This is an example of an m-script using PBS mode:
+
+```matlab
+cluster = parcluster('SalomonPBSPro');
+set(cluster, 'SubmitArguments', '-A OPEN-0-0');
+set(cluster, 'ResourceTemplate', '-q qprod -l select=10:ncpus=24');
+set(cluster, 'NumWorkers', 240);
+
+pool = parpool(cluster,240);
+
+n=2000;
+
+W = rand(n,n);
+W = distributed(W);
+x = (1:n)';
+x = distributed(x);
+spmd
+    [~, name] = system('hostname')
+
+    T = W*x; % Calculation performed on labs, in parallel.
+             % T and W are both codistributed arrays here.
+end
+whos         % T and W are both distributed arrays here.
+
+% shut down parallel pool
+delete(pool)
+```
+
+Note that we first construct a cluster object using the imported profile and then set some important options, namely: SubmitArguments, where you need to specify the accounting id, and ResourceTemplate, where you need to specify the number of nodes for the job.
+
+You can start this script in batch mode the same way as in the local mode example.
+
+### Parallel Matlab Batch With Direct Launch (Workers Spawned Within the Existing Job)
+
+This method is a "hack" invented by us to emulate the mpiexec functionality found in previous MATLAB versions. We leverage the MATLAB Generic Scheduler interface, but instead of submitting the workers to PBS, we launch them directly within the running job. We thus avoid the issues with the master script and the workers running in separate jobs (issues with the license not being available, waiting for the worker job to spawn, etc.).
+
+!!! warning
+    This method is experimental.
+
+For this method, you need to use the SalomonDirect profile; import it [the same way as SalomonPBSPro](matlab.md#running-parallel-matlab-using-distributed-computing-toolbox---engine).
+
+This is an example of an m-script using direct mode:
+
+```matlab
+parallel.importProfile('/apps/all/MATLAB/2015b-EDU/SalomonDirect.settings')
+cluster = parcluster('SalomonDirect');
+set(cluster, 'NumWorkers', 48);
+
+pool = parpool(cluster, 48);
+
+n=2000;
+
+W = rand(n,n);
+W = distributed(W);
+x = (1:n)';
+x = distributed(x);
+spmd
+    [~, name] = system('hostname')
+
+    T = W*x; % Calculation performed on labs, in parallel.
+             % T and W are both codistributed arrays here.
+end
+whos         % T and W are both distributed arrays here.
+
+% shut down parallel pool
+delete(pool)
+```
+
+### Non-Interactive Session and Licenses
+
+If you want to run batch jobs with Matlab, be sure to request the appropriate license features from the PBS Pro scheduler, at least `-l __feature__matlab__MATLAB=1` for the EDU variant of Matlab. For more information on how to check the license feature states and how to request them with PBS Pro, [look here](../../../anselm/software/isv_licenses/).
+
+The licensing feature of PBS is currently disabled.
+
+In case of a non-interactive session, read the [following information](../../../anselm/software/isv_licenses/) on how to modify the qsub command to test for available licenses prior to getting the resource allocation.
+
+### Matlab Distributed Computing Engines Start Up Time
+
+Starting Matlab workers is an expensive process that requires a certain amount of time. For reference, see the following table:
+
+| compute nodes | number of workers | start-up time [s] |
+| ------------- | ----------------- | ----------------- |
+| 16            | 384               | 831               |
+| 8             | 192               | 807               |
+| 4             | 96                | 483               |
+| 2             | 48                | 16                |
+
+## MATLAB on UV2000
+
+The UV2000 machine, available in the "qfat" queue, can be used for MATLAB computations. This is an SMP NUMA machine with a large amount of RAM, which can be beneficial for certain types of MATLAB jobs. CPU cores are allocated in chunks of 8 on this machine.
+
+You can use MATLAB on the UV2000 in two parallel modes:
+
+### Threaded Mode
+
+Since this is an SMP machine, you can completely avoid using the Parallel Toolbox and use only MATLAB's threading. MATLAB will automatically detect the number of cores you have allocated and will set maxNumCompThreads accordingly; certain operations, such as fft, eig, svd, etc., will then automatically run in threads. The advantage of this mode is that you don't need to modify your existing sequential code.
+
+### Local Cluster Mode
+
+You can also use the Parallel Toolbox on the UV2000. Use [local cluster mode](matlab/#parallel-matlab-batch-job-in-local-mode); the "SalomonPBSPro" profile will not work.
diff --git a/docs.it4i/software/numerical-languages/matlab_1314.md b/docs.it4i/software/numerical-languages/matlab_1314.md
new file mode 100644
index 0000000000000000000000000000000000000000..5c0c1bc7e2004a0ab0b86d342fc92a7a6e9af5cf
--- /dev/null
+++ b/docs.it4i/software/numerical-languages/matlab_1314.md
@@ -0,0 +1,206 @@
+# Matlab 2013-2014
+
+## Introduction
+
+!!! note
+    This document relates to the old versions R2013 and R2014. For MATLAB 2015, use [this documentation instead](matlab/).
+
+Matlab is available in the latest stable version. There are always two variants of the release:
+
+* Non-commercial, or so-called EDU variant, which can be used for common research and educational purposes.
+* Commercial, or so-called COM variant, which can also be used for commercial activities. Licenses for the COM variant are much more expensive, so the COM variant usually offers only a subset of the features available in the EDU variant.
+
+To load the latest version of Matlab, load the module:
+
+```console
+$ ml matlab
+```
+
+The EDU variant is loaded by default. If you need another version or variant, load that particular version. To obtain the list of available versions, use:
+
+```console
+$ ml av matlab
+```
+
+If you need the Matlab GUI to prepare your Matlab programs, you can use Matlab directly on the login nodes. For all computations, however, use Matlab on the compute nodes via the PBS Pro scheduler.
+
+If you require the Matlab GUI, follow the general information about running graphical applications.
+
+The Matlab GUI is quite slow over the X forwarding built into PBS (qsub -X), so X11 display redirection either via SSH or directly via xauth (see the "GUI Applications on Compute Nodes over VNC" part) is recommended.
+
+To run Matlab with the GUI, use:
+
+```console
+$ matlab
+```
+
+To run Matlab in text mode, without the Matlab Desktop GUI environment, use:
+
+```console
+$ matlab -nodesktop -nosplash
+```
+
+Plots, images, etc. will still be available.
+
+## Running Parallel Matlab Using Distributed Computing Toolbox / Engine
+
+The recommended parallel mode for running parallel Matlab on Anselm is the MPIEXEC mode. In this mode, the user allocates resources through PBS prior to starting Matlab. Once resources are granted, the main Matlab instance is started on the first compute node assigned to the job by PBS, and workers are started on all remaining nodes. The user can use both interactive and non-interactive PBS sessions. This mode guarantees that no data processing is performed on the login nodes; all processing is done on the compute nodes.
+
+For performance reasons, Matlab should use the system MPI. On Anselm, the supported MPI implementation for Matlab is Intel MPI. To switch to the system MPI, the user has to override the default Matlab setting by creating a new configuration file in their home directory. The path and file name have to be exactly the same as in the following listing:
+
+```console
+$ vim ~/matlab/mpiLibConf.m
+```
+
+```matlab
+function [lib, extras] = mpiLibConf
+%MATLAB MPI Library overloading for Infiniband Networks
+
+mpich = '/opt/intel/impi/4.1.1.036/lib64/';
+
+disp('Using Intel MPI 4.1.1.036 over Infiniband')
+
+lib = strcat(mpich, 'libmpich.so');
+mpl = strcat(mpich, 'libmpl.so');
+opa = strcat(mpich, 'libopa.so');
+
+extras = {};
+```
+
+The system MPI library allows Matlab to communicate through the 40 Gbit/s InfiniBand QDR interconnect instead of the slower 1 Gbit Ethernet network.
+
+!!! note
+    The path to the MPI library in "mpiLibConf.m" has to match the version of the loaded Intel MPI module. In this example, version 4.1.1.036 of Intel MPI is used by Matlab, therefore the module impi/4.1.1.036 has to be loaded prior to starting Matlab.
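+
+Once the file is in place, you can sanity-check it by calling the function directly from the Matlab prompt; assuming ~/matlab is on the Matlab path, the call should print the "Using Intel MPI 4.1.1.036 over Infiniband" message from the disp() line above (this check is a hypothetical suggestion, not a required step):
+
+```console
+> [lib, extras] = mpiLibConf
+```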
+
+### Parallel Matlab Interactive Session
+
+Once this file is in place, the user can request resources from PBS. The following example shows how to start an interactive session with support for the Matlab GUI. For more information about GUI-based applications on Anselm, see [this page](../../../general/accessing-the-clusters/graphical-user-interface/x-window-system/).
+
+```console
+$ xhost +
+$ qsub -I -v DISPLAY=$(uname -n):$(echo $DISPLAY | cut -d ':' -f 2) -A NONE-0-0 -q qexp -l select=2:ncpus=16:mpiprocs=16 -l walltime=00:30:00 -l feature__matlab__MATLAB=1
+```
+
+This qsub command example shows how to run Matlab with 32 workers in the following configuration: 2 nodes (using all 16 cores per node) and 16 workers = mpiprocs per node (-l select=2:ncpus=16:mpiprocs=16). If the user wants to run a smaller number of workers per node, the "mpiprocs" parameter has to be changed accordingly.
+
+The second part of the command shows how to request all necessary licenses, in this case 1 Matlab-EDU license and 32 Distributed Computing Engines licenses.
+
+Once access to the compute nodes is granted by PBS, the user can load the following modules and start Matlab:
+
+```console
+$ ml matlab/R2013a-EDU
+$ ml impi/4.1.1.036
+$ matlab &
+```
+
+### Parallel Matlab Batch Job
+
+To run Matlab in batch mode, write a Matlab script, then write a bash jobscript and execute it via the qsub command. By default, Matlab will execute one Matlab worker instance per allocated core.
+
+```bash
+#!/bin/bash
+#PBS -A PROJECT ID
+#PBS -q qprod
+#PBS -l select=2:ncpus=16:mpiprocs=16:ompthreads=1
+
+# change to shared scratch directory
+SCR=/scratch/$USER/$PBS_JOBID
+mkdir -p $SCR ; cd $SCR || exit
+
+# copy input file to scratch
+cp $PBS_O_WORKDIR/matlabcode.m .
+
+# load modules
+module load matlab/R2013a-EDU
+module load impi/4.1.1.036
+
+# execute the calculation
+matlab -nodisplay -r matlabcode > output.out
+
+# copy output file to home
+cp output.out $PBS_O_WORKDIR/.
+```
+
+This script may be submitted directly to the PBS workload manager via the qsub command. The inputs and the Matlab script are in the matlabcode.m file, the outputs in the output.out file. Note the missing .m extension in the matlab -r matlabcodefile call, **the .m must not be included**. Note that the **shared /scratch must be used**. Further, it is **important to include a quit** statement at the end of the matlabcode.m script.
+
+Submit the jobscript using qsub:
+
+```console
+$ qsub ./jobscript
+```
+
+### Parallel Matlab Program Example
+
+The last part of the configuration is done directly in the user's Matlab script, before the Distributed Computing Toolbox is started:
+
+```matlab
+sched = findResource('scheduler', 'type', 'mpiexec');
+set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun');
+set(sched, 'EnvironmentSetMethod', 'setenv');
+```
+
+This script creates a scheduler object "sched" of type "mpiexec" that starts the workers using the mpirun tool. To use the correct version of mpirun, the second line specifies the path to the correct version of the system Intel MPI library.
+
+!!! note
+    Every Matlab script that needs to initialize/use matlabpool has to contain these three lines prior to calling the matlabpool(sched, ...) function.
+
+The last step is to start matlabpool with the "sched" object and the correct number of workers. In this case, qsub asked for a total of 32 cores, therefore the number of workers is also set to 32:
+
+```matlab
+matlabpool(sched,32);
+
+... parallel code ...
+
+matlabpool close
+```
+
+The complete example showing how to use the Distributed Computing Toolbox is shown here:
+
+```matlab
+sched = findResource('scheduler', 'type', 'mpiexec');
+set(sched, 'MpiexecFileName', '/apps/intel/impi/4.1.1/bin/mpirun')
+set(sched, 'EnvironmentSetMethod', 'setenv')
+set(sched, 'SubmitArguments', '')
+sched
+
+matlabpool(sched,32);
+
+n=2000;
+
+W = rand(n,n);
+W = distributed(W);
+x = (1:n)';
+x = distributed(x);
+spmd
+    [~, name] = system('hostname')
+
+    T = W*x; % Calculation performed on labs, in parallel.
+             % T and W are both codistributed arrays here.
+end
+T;
+whos         % T and W are both distributed arrays here.
+
+matlabpool close
+quit
+```
+
+You can copy and paste the example into a .m file and execute it. Note that the matlabpool size should correspond to the **total number of cores** available on the allocated nodes.
+
+### Non-Interactive Session and Licenses
+
+If you want to run batch jobs with Matlab, be sure to request the appropriate license features from the PBS Pro scheduler, at least `-l __feature__matlab__MATLAB=1` for the EDU variant of Matlab. For more information on how to check the license feature states and how to request them with PBS Pro, [look here](../isv_licenses/).
+
+In case of a non-interactive session, read the [following information](../isv_licenses/) on how to modify the qsub command to test for available licenses prior to getting the resource allocation.
+
+### Matlab Distributed Computing Engines Start Up Time
+
+Starting Matlab workers is an expensive process that requires a certain amount of time. For reference, see the following table:
+
+| compute nodes | number of workers | start-up time [s] |
+| ------------- | ----------------- | ----------------- |
+| 16            | 256               | 1008              |
+| 8             | 128               | 534               |
+| 4             | 64                | 333               |
+| 2             | 32                | 210               |
diff --git a/docs.it4i/software/numerical-languages/octave.md b/docs.it4i/software/numerical-languages/octave.md
new file mode 100644
index 0000000000000000000000000000000000000000..e41a465ae87d98cfca8a3ff59d21f3ee2dbf5a83
--- /dev/null
+++ b/docs.it4i/software/numerical-languages/octave.md
@@ -0,0 +1,108 @@
+# Octave
+
+## Introduction
+
+GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The Octave language is quite similar to Matlab, so most programs are easily portable. Read more on <http://www.gnu.org/software/octave/>.
+
+To list the available modules, type:
+
+```console
+$ ml av octave
+```
+
+## Modules and Execution
+
+```console
+$ ml Octave
+```
+
+Octave on the clusters is linked with the highly optimized MKL mathematical library. This provides threaded parallelization to many Octave kernels, notably the linear algebra subroutines. Octave runs these heavy calculation kernels without any penalty. By default, Octave parallelizes to 16 (Anselm) or 24 (Salomon) threads. You may control the number of threads by setting the OMP_NUM_THREADS environment variable.
+
+To run Octave interactively, log in with the ssh -X parameter for X11 forwarding. Run Octave:
+
+```console
+$ octave
+```
+
+To run Octave in batch mode, write an Octave script, then write a bash jobscript and execute it via the qsub command. By default, Octave will use 16 (Anselm) or 24 (Salomon) threads when running MKL kernels.
+
+```bash
+#!/bin/bash
+
+# change to local scratch directory
+cd /lscratch/$PBS_JOBID || exit
+
+# copy input file to scratch
+cp $PBS_O_WORKDIR/octcode.m .
+
+# load octave module
+module load octave
+
+# execute the calculation
+octave -q --eval octcode > output.out
+
+# copy output file to home
+cp output.out $PBS_O_WORKDIR/.
+
+#exit
+exit
+```
+
+This script may be submitted directly to the PBS workload manager via the qsub command. The inputs are in the octcode.m file, the outputs in the output.out file. See the single node jobscript example in the [Job execution section](../../job-submission-and-execution/).
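+
+If you need to limit the number of threads the MKL kernels use, set OMP_NUM_THREADS before launching Octave; the value and the timed expression below are only a hypothetical illustration:
+
+```console
+$ export OMP_NUM_THREADS=8
+$ octave -q --eval "tic; A = rand(4000); B = A*A'; toc"
+```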
+
+The Octave C compiler mkoctfile calls GNU gcc 4.8.1 for compiling native C code. This is very useful for running native C subroutines in the Octave environment.
+
+```console
+$ mkoctfile -v
+```
+
+Octave may use MPI for interprocess communication. This functionality is currently not supported on the Anselm cluster. In case you require the Octave interface to MPI, contact [Anselm support](https://support.it4i.cz/rt/).
+
+## Xeon Phi Support
+
+Octave may take advantage of the Xeon Phi accelerators. This will only work on the [Intel Xeon Phi](../intel-xeon-phi/) [accelerated nodes](../../compute-nodes/).
+
+### Automatic Offload Support
+
+Octave can accelerate BLAS-type operations (in particular matrix-matrix multiplications) on the Xeon Phi accelerator via [Automatic Offload using the MKL library](../intel-xeon-phi/#section-3).
+
+Example:
+
+```octave
+$ export OFFLOAD_REPORT=2
+$ export MKL_MIC_ENABLE=1
+$ ml octave
+$ octave -q
+    octave:1> A=rand(10000); B=rand(10000);
+    octave:2> tic; C=A*B; toc
+    [MKL] [MIC --] [AO Function]    DGEMM
+    [MKL] [MIC --] [AO DGEMM Workdivision]    0.32 0.68
+    [MKL] [MIC 00] [AO DGEMM CPU Time]    2.896003 seconds
+    [MKL] [MIC 00] [AO DGEMM MIC Time]    1.967384 seconds
+    [MKL] [MIC 00] [AO DGEMM CPU->MIC Data]    1347200000 bytes
+    [MKL] [MIC 00] [AO DGEMM MIC->CPU Data]    2188800000 bytes
+    Elapsed time is 2.93701 seconds.
+```
+
+In this example, the calculation was automatically divided among the CPU cores and the Xeon Phi MIC accelerator, reducing the total runtime from 6.3 secs down to 2.9 secs.
+
+### Native Support
+
+A version of [native](../intel-xeon-phi/#section-4) Octave is compiled for the Xeon Phi accelerators. Some limitations apply to this version:
+
+* Only command line support. GUI, graph plotting, etc. are not supported.
+* Command history in interactive mode is not supported.
+
+Octave is linked with the parallel Intel MKL, so it is best suited for batch processing of tasks that utilize BLAS, LAPACK, and FFT operations. By default, the number of threads is set to 120; you can control this with the OMP_NUM_THREADS environment variable.
+
+!!! note
+    Calculations that do not employ parallelism (either by using parallel MKL, e.g. via matrix operations, the fork() function, the [parallel package](http://octave.sourceforge.net/parallel/), or another mechanism) will actually run slower than on the host CPU.
+
+To use Octave on a node with Xeon Phi:
+
+```console
+$ ssh mic0                                               # login to the MIC card
+$ source /apps/tools/octave/3.8.2-mic/bin/octave-env.sh  # set up environment variables
+$ octave -q /apps/tools/octave/3.8.2-mic/example/test0.m # run an example
+```
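+
+To reduce the thread count of the native build, export OMP_NUM_THREADS on the MIC card before starting Octave; the value of 60 is only a hypothetical choice:
+
+```console
+$ ssh mic0
+$ export OMP_NUM_THREADS=60
+$ source /apps/tools/octave/3.8.2-mic/bin/octave-env.sh
+$ octave -q
+```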
diff --git a/docs.it4i/software/numerical-languages/opencoarrays.md b/docs.it4i/software/numerical-languages/opencoarrays.md
new file mode 100644
index 0000000000000000000000000000000000000000..ada43753a67e0a949cd8d2b6995cf3f56e2b736f
--- /dev/null
+++ b/docs.it4i/software/numerical-languages/opencoarrays.md
@@ -0,0 +1,127 @@
+# OpenCoarrays
+
+## Introduction
+
+Coarray Fortran (CAF) is an extension of the Fortran language that offers a simple interface for parallel processing and memory sharing.
+The advantage is that only small changes are required to convert existing Fortran code to support a robust and potentially efficient parallelism.
+
+A CAF program is interpreted as if it was replicated a number of times and all copies were executed asynchronously.
+The number of copies is decided at execution time. Each copy (called an *image*) has its own private variables.
+The variable syntax of the Fortran language is extended with indexes in square brackets (called a *co-dimension*), representing a reference to data distributed across images.
+
+By default, CAF uses the Message Passing Interface (MPI) for lower-level communication, so there are some similarities with MPI.
+
+Read more on <http://www.opencoarrays.org/>
+
+## Coarray Basics
+
+### Indexing of Coarray Images
+
+Indexing of individual images can be shown on the simple *Hello World* program:
+
+```fortran
+program hello_world
+   implicit none
+   print *, 'Hello world from image ', this_image() , 'of', num_images()
+end program hello_world
+```
+
+* num_images() - returns the number of all images
+* this_image() - returns the image index - numbered from 1 to num_images()
+
+### Co-dimension Variables Declaration
+
+Coarray variables can be declared with the **codimension[*]** attribute or by adding the trailing index **[*]** after the variable name.
+Note that the `*` character always has to be inside the square brackets.
+
+```fortran
+integer, codimension[*] :: scalar
+integer :: scalar[*]
+real, dimension(64), codimension[*] :: vector
+real :: vector(64)[*]
+```
+
+### Images Synchronization
+
+Because each image runs on its own, image synchronization is needed to ensure that all altered data is distributed to all images.
+Synchronization can be done across all images or only between selected images. Be aware that selective synchronization can lead to race condition problems such as deadlock.
+
+Example program:
+
+```fortran
+program synchronization_test
+   implicit none
+   integer :: i          ! Local variable
+   integer :: numbers[*] ! Scalar coarray
+
+   ! Generate a random number on image 1
+   if (this_image() == 1) then
+      numbers = floor(rand(1) * 1000)
+      ! Distribute information to other images
+      do i = 2, num_images()
+         numbers[i] = numbers
+      end do
+   end if
+
+   sync all ! Barrier to synchronize all images
+
+   print *, 'The random number is', numbers
+end program synchronization_test
+```
+
+* sync all - synchronize all images between each other
+* sync images(*) - synchronize this image with all others
+* sync images(*index*) - synchronize this image with the image *index*
+
+!!! note
+    **number** is the local variable while **number[*index*]** accesses the variable in a specific image.
+    **number[this_image()]** is the same as **number**.
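+
+To illustrate the note above, here is a minimal hypothetical sketch in which image 1 reads the co-indexed variable from every image after a barrier:
+
+```fortran
+program gather_test
+   implicit none
+   integer :: i
+   integer :: val[*]       ! scalar coarray
+
+   val = 10 * this_image() ! each image writes its own copy
+
+   sync all                ! make sure every image has written val
+
+   if (this_image() == 1) then
+      do i = 1, num_images()
+         print *, 'Image', i, 'holds', val[i] ! co-indexed read from image i
+      end do
+   end if
+end program gather_test
+```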
+
+## Compile and Run
+
+Currently, version 1.8.10 compiled with the OpenMPI 1.10.7 library is installed on the cluster. The OpenCoarrays module can be loaded as follows:
+
+```console
+$ ml OpenCoarrays/1.8.10-GCC-6.3.0-2.27
+```
+
+### Compile CAF Program
+
+The preferred method for compiling a CAF program is by invoking the *caf* compiler wrapper.
+The above mentioned *Hello World* program can be compiled as follows:
+
+```console
+$ caf hello_world.f90 -o hello_world.x
+```
+
+!!! warning
+    The input file extensions **.f90** and **.F90** are interpreted as *Fortran 90*.
+    If the input file extension is **.f** or **.F**, the source code will be interpreted as *Fortran 77*.
+
+Another method for compiling is by invoking the *mpif90* compiler wrapper directly:
+
+```console
+$ mpif90 hello_world.f90 -o hello_world.x -fcoarray=lib -lcaf_mpi
+```
+
+### Run CAF Program
+
+A CAF program can be run by invoking the *cafrun* wrapper or directly by *mpiexec*:
+
+```console
+$ cafrun -np 4 ./hello_world.x
+ Hello world from image 1 of 4
+ Hello world from image 2 of 4
+ Hello world from image 3 of 4
+ Hello world from image 4 of 4
+
+$ mpiexec -np 4 ./synchronization_test.x
+ The random number is 242
+ The random number is 242
+ The random number is 242
+ The random number is 242
+```
+
+**-np 4** is the number of images to run. The parameters of **cafrun** and **mpiexec** are the same.
+
+For more information about running a CAF program, follow [Running OpenMPI](../mpi/Running_OpenMPI.md).
diff --git a/docs.it4i/software/numerical-languages/r.md b/docs.it4i/software/numerical-languages/r.md
new file mode 100644
index 0000000000000000000000000000000000000000..3bf569fe7ca99f4f25e99c11df2835ae8373552c
--- /dev/null
+++ b/docs.it4i/software/numerical-languages/r.md
@@ -0,0 +1,405 @@
+# R
+
+## Introduction
+
+R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.
+
+One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
+
+Another convenience is the ease with which C code or third party libraries may be integrated within R.
+
+Extensive support for parallel computing is available within R.
+
+Read more on <http://www.r-project.org/>, <http://cran.r-project.org/doc/manuals/r-release/R-lang.html>
+
+## Modules
+
+R version 3.1.1 is available on the cluster, along with the GUI interface Rstudio.
+
+| Application | Version           | module              |
+| ----------- | ----------------- | ------------------- |
+| **R**       | R 3.1.1           | R/3.1.1-intel-2015b |
+| **Rstudio** | Rstudio 0.98.1103 | Rstudio             |
+
+```console
+$ ml R
+```
+
+## Execution
+
+R on the cluster is linked with the highly optimized MKL mathematical library. This provides threaded parallelization to many R kernels, notably the linear algebra subroutines. R runs these heavy calculation kernels without any penalty. By default, R parallelizes to 24 (Salomon) or 16 (Anselm) threads. You may control the number of threads by setting the OMP_NUM_THREADS environment variable.
+
+### Interactive Execution
+
+To run R interactively, using the Rstudio GUI, log in with the ssh -X parameter for X11 forwarding. Run rstudio:
+
+```console
+$ ml Rstudio
+$ rstudio
+```
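+
+If you do not need the GUI, you can run plain R in the terminal instead; the one-liner below is only a hypothetical smoke test:
+
+```console
+$ ml R
+$ Rscript -e 'sessionInfo()'
+```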
+
+### Batch Execution
+
+To run R in batch mode, write an R script, then write a bash jobscript and execute it via the qsub command. By default, R will use 24 (Salomon) or 16 (Anselm) threads when running MKL kernels.
+
+Example jobscript:
+
+```bash
+#!/bin/bash
+
+# change to local scratch directory
+cd /lscratch/$PBS_JOBID || exit
+
+# copy input file to scratch
+cp $PBS_O_WORKDIR/rscript.R .
+
+# load R module
+module load R
+
+# execute the calculation
+R CMD BATCH rscript.R routput.out
+
+# copy output file to home
+cp routput.out $PBS_O_WORKDIR/.
+
+#exit
+exit
+```
+
+This script may be submitted directly to the PBS workload manager via the qsub command. The inputs are in the rscript.R file, the outputs in the routput.out file. See the single node jobscript example in the [Job execution section](../../job-submission-and-execution/).
+
+## Parallel R
+
+Parallel execution of R may be achieved in many ways. One approach is the implied parallelization due to linked libraries or specially enabled functions, as [described above](r/#interactive-execution). In the following sections, we focus on explicit parallelization, where parallel constructs are stated directly within the R script.
+
+## Package Parallel
+
+The package parallel provides support for parallel computation, including by forking (taken from package multicore), by sockets (taken from package snow), and random-number generation.
+
+The package is activated this way:
+
+```console
+$ R
+> library(parallel)
+```
+
+More information and examples may be obtained directly by reading the documentation available in R:
+
+```r
+> ?parallel
+> library(help = "parallel")
+> vignette("parallel")
+```
+
+Download the package [parallel](package-parallel-vignette.pdf) vignette.
+
+Forking is the simplest to use. The forking family of functions provides a parallelized, drop-in replacement for the serial apply() family of functions.
+
+!!! warning
+    Forking via package parallel provides functionality similar to the OpenMP construct omp parallel for.
+
+    Only cores of a single node can be utilized this way!
+
+Forking example:
+
+```r
+library(parallel)
+
+# integrand function
+f <- function(i,h) {
+  x <- h*(i-0.5)
+  return (4/(1 + x*x))
+}
+
+# initialize
+size <- detectCores()
+
+while (TRUE)
+{
+  # read number of intervals
+  cat("Enter the number of intervals: (0 quits) ")
+  fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
+
+  if(n<=0) break
+
+  # run the calculation
+  n <- max(n,size)
+  h <- 1.0/n
+
+  i <- seq(1,n);
+  pi3 <- h*sum(simplify2array(mclapply(i,f,h,mc.cores=size)));
+
+  # print results
+  cat(sprintf("Value of PI %16.14f, diff= %16.14f\n",pi3,pi3-pi))
+}
+```
+
+The above example is the classic parallel example for calculating the number π. Note the **detectCores()** and **mclapply()** functions. Execute the example as:
+
+```console
+$ R --slave --no-save --no-restore -f pi3p.R
+```
+
+Every evaluation of the integrand function runs in parallel in a different process.
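+
+For a quick interactive check that forking works on your node, a snippet like the following (hypothetical) can be pasted into an R session:
+
+```r
+library(parallel)
+detectCores()   # number of cores available on the node
+# square the numbers 1..8 in parallel, forking one process per core
+unlist(mclapply(1:8, function(x) x^2, mc.cores = detectCores()))
+```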
+
+## Package Rmpi
+
+The package Rmpi provides an interface (wrapper) to MPI APIs.
+
+It also provides an interactive R slave environment. On the cluster, Rmpi provides an interface to [OpenMPI](../mpi/Running_OpenMPI/).
+
+Read more on Rmpi at <http://cran.r-project.org/web/packages/Rmpi/>; the reference manual is available at <http://cran.r-project.org/web/packages/Rmpi/Rmpi.pdf>.
+
+When using package Rmpi, both the openmpi and R modules must be loaded:
+
+```console
+$ ml OpenMPI
+$ ml R
+```
+
+Rmpi may be used in three basic ways. The static approach is identical to executing any other MPI program. In addition, there is the Rslaves dynamic MPI approach and the mpi.apply approach. In the following sections, we will use the number π integration example to illustrate all these concepts.
+
+### Static Rmpi
+
+Static Rmpi programs are executed via mpiexec, as any other MPI programs. The number of processes is static - given at launch time.
+
+Static Rmpi example:
+
+```r
+library(Rmpi)
+
+# integrand function
+f <- function(i,h) {
+  x <- h*(i-0.5)
+  return (4/(1 + x*x))
+}
+
+# initialize
+invisible(mpi.comm.dup(0,1))
+rank <- mpi.comm.rank()
+size <- mpi.comm.size()
+n<-0
+
+while (TRUE)
+{
+  # read number of intervals
+  if (rank==0) {
+    cat("Enter the number of intervals: (0 quits) ")
+    fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
+  }
+
+  # broadcast the intervals
+  n <- mpi.bcast(as.integer(n),type=1)
+
+  if(n<=0) break
+
+  # run the calculation
+  n <- max(n,size)
+  h <- 1.0/n
+
+  i <- seq(rank+1,n,size);
+  mypi <- h*sum(sapply(i,f,h));
+
+  pi3 <- mpi.reduce(mypi)
+
+  # print results
+  if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14f\n",pi3,pi3-pi))
+}
+
+mpi.quit()
+```
+
+The above is the static MPI example for calculating the number π. Note the **library(Rmpi)** and **mpi.comm.dup()** function calls. Execute the example as:
+
+```console
+$ mpirun R --slave --no-save --no-restore -f pi3.R
+```
+
+### Dynamic Rmpi
+
+Dynamic Rmpi programs are executed by calling R directly. The OpenMPI module must still be loaded. The R slave processes will be spawned by a function call within the Rmpi program.
+
+Dynamic Rmpi example:
+
+```r
+# integrand function
+f <- function(i,h) {
+  x <- h*(i-0.5)
+  return (4/(1 + x*x))
+}
+
+# the worker function
+workerpi <- function()
+{
+  # initialize
+  rank <- mpi.comm.rank()
+  size <- mpi.comm.size()
+  n<-0
+
+  while (TRUE)
+  {
+    # read number of intervals
+    if (rank==0) {
+      cat("Enter the number of intervals: (0 quits) ")
+      fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
+    }
+
+    # broadcast the intervals
+    n <- mpi.bcast(as.integer(n),type=1)
+
+    if(n<=0) break
+
+    # run the calculation
+    n <- max(n,size)
+    h <- 1.0/n
+
+    i <- seq(rank+1,n,size);
+    mypi <- h*sum(sapply(i,f,h));
+
+    pi3 <- mpi.reduce(mypi)
+
+    # print results
+    if (rank==0) cat(sprintf("Value of PI %16.14f, diff= %16.14f\n",pi3,pi3-pi))
+  }
+}
+
+# main
+library(Rmpi)
+
+cat("Enter the number of slaves: ")
+fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp)
+
+mpi.spawn.Rslaves(nslaves=ns)
+mpi.bcast.Robj2slave(f)
+mpi.bcast.Robj2slave(workerpi)
+
+mpi.bcast.cmd(workerpi())
+workerpi()
+
+mpi.quit()
+```
+
+The above example is the dynamic MPI example for calculating the number π. Both the master and the slave processes carry out the calculation. Note the **mpi.spawn.Rslaves()**, **mpi.bcast.Robj2slave()**, and **mpi.bcast.cmd()** function calls.
+
+Execute the example as:
+
+```console
+$ mpirun -np 1 R --slave --no-save --no-restore -f pi3Rslaves.R
+```
+
+Note that this method uses MPI_Comm_spawn (the dynamic process feature of MPI-2) to start the slave processes - the master process needs to be launched with MPI. In general, dynamic processes are not well supported among MPI implementations, so some issues might arise. Also, environment variables are not propagated to the spawned processes, so they will not see paths from modules.
+
+### mpi.apply Rmpi
+
+mpi.apply is a specific way of executing dynamic Rmpi programs.
+
+The mpi.apply() family of functions provides an MPI-parallelized, drop-in replacement for the serial apply() family of functions.
+
+Execution is identical to other dynamic Rmpi programs.
+
+mpi.apply Rmpi example:
+
+```r
+# integrand function
+f <- function(i,h) {
+  x <- h*(i-0.5)
+  return (4/(1 + x*x))
+}
+
+# the worker function
+workerpi <- function(rank,size,n)
+{
+  # run the calculation
+  n <- max(n,size)
+  h <- 1.0/n
+
+  i <- seq(rank,n,size);
+  mypi <- h*sum(sapply(i,f,h));
+
+  return(mypi)
+}
+
+# main
+library(Rmpi)
+
+cat("Enter the number of slaves: ")
+fp<-file("stdin"); ns<-scan(fp,nmax=1); close(fp)
+
+mpi.spawn.Rslaves(nslaves=ns)
+mpi.bcast.Robj2slave(f)
+mpi.bcast.Robj2slave(workerpi)
+
+while (TRUE)
+{
+  # read number of intervals
+  cat("Enter the number of intervals: (0 quits) ")
+  fp<-file("stdin"); n<-scan(fp,nmax=1); close(fp)
+  if(n<=0) break
+
+  # run workerpi
+  i=seq(1,2*ns)
+  pi3=sum(mpi.parSapply(i,workerpi,2*ns,n))
+
+  # print results
+  cat(sprintf("Value of PI %16.14f, diff= %16.14f\n",pi3,pi3-pi))
+}
+
+mpi.quit()
+```
+
+The above is the mpi.apply MPI example for calculating the number π. Only the slave processes carry out the calculation. Note the **mpi.parSapply()** function call. The package parallel [example](r/#package-parallel) [above](r/#package-parallel) may be trivially adapted to this structure (for much better performance), using mclapply() in place of mpi.parSapply().
+
+Execute the example as:
+
+```console
+$ mpirun -np 1 R --slave --no-save --no-restore -f pi3parSapply.R
+```
+
+## Combining Parallel and Rmpi
+
+Currently, the two packages cannot be combined for hybrid calculations.
+
+## Parallel Execution
+
+R parallel jobs are executed via the PBS queue system exactly as any other parallel jobs. The user must create an appropriate jobscript and submit it via **qsub**.
+
+Example jobscript for [static Rmpi](r/#static-rmpi) parallel R execution, running 1 process per core:
+
+```bash
+#!/bin/bash
+#PBS -q qprod
+#PBS -N Rjob
+#PBS -l select=100:ncpus=24:mpiprocs=24:ompthreads=1 # Anselm: ncpus=16:mpiprocs=16
+
+# change to scratch directory
+SCRDIR=/scratch/work/user/$USER/myjob # Anselm: SCRDIR=/scratch/$USER/myjob
+cd $SCRDIR || exit
+
+# copy input file to scratch
+cp $PBS_O_WORKDIR/rscript.R .
+
+# load R and openmpi module
+module load R
+module load OpenMPI
+
+# execute the calculation, redirecting the output so it can be copied back
+mpiexec -bycore -bind-to-core R --slave --no-save --no-restore -f rscript.R > routput.out
+
+# copy output file to home
+cp routput.out $PBS_O_WORKDIR/.
+
+#exit
+exit
+```
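+
+Submit the jobscript the usual way, as with any other PBS job:
+
+```console
+$ qsub ./jobscript
+```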
+
+For more information about jobscripts and MPI execution, refer to the [Job submission](../../anselm/job-submission-and-execution/) and general [MPI](../mpi/mpi/) sections.
+
+## Xeon Phi Offload
+
+By leveraging MKL, R can accelerate certain computations, most notably linear algebra operations, on the Xeon Phi accelerator by using Automatic Offload. To use MKL Automatic Offload, you first need to set this environment variable before starting R:
+
+```console
+$ export MKL_MIC_ENABLE=1
+```
+
+[Read more about automatic offload](../intel-xeon-phi/)
diff --git a/mkdocs.yml b/mkdocs.yml
index ef4b539168f9913be25a4479ef7b3b01b5b78f6c..466cf0cc790ebd2eaa06a18c3766a780ca5dfa6d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -64,6 +64,13 @@ pages:
   - Singularity Container: software/singularity.md
   - EasyBuild: software/easybuild.md
   - Spack: software/spack.md
+  - 'Numerical languages':
+    - Introduction: software/numerical-languages/introduction.md
+    - R: software/numerical-languages/r.md
+    - Matlab: software/numerical-languages/matlab.md
+    - Matlab 2013-2014: software/numerical-languages/matlab_1314.md
+    - Octave: software/numerical-languages/octave.md
+    - OpenCoarrays: software/numerical-languages/opencoarrays.md
   - Bioinformatics: software/bioinformatics.md
   - Java: software/java.md
   - Salomon Software: