## Examples
Download the examples in [capacity.zip](capacity.zip), illustrating the ways listed above to run a huge number of jobs. We recommend trying out the examples before using them for running production jobs.
Unzip the archive in an empty directory on Anselm and follow the instructions in the README file.
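A minimal sketch of the workflow (the directory name is illustrative):
```bash
$ mkdir capacity && cd capacity
$ unzip ../capacity.zip
$ cat README
```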
# Resource Allocation and Job Execution
To run a [job](anselm/job-submission-and-execution/), [computational resources](anselm/resources-allocation-policy/) for this particular job must be allocated. This is done via the PBS Pro job workload manager software, which efficiently distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [official documentation here](pbspro/), especially in the PBS Pro User's Guide.
## Resource Allocation Policy
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share](anselm/job-priority/) system of Anselm ensures that individual users may consume an approximately equal amount of resources per week. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following queues are available to Anselm users:
* **qexp**, the Express queue
* **qprod**, the Production queue
* **qlong**, the Long queue
* **qnvidia**, **qmic**, **qfat**, the Dedicated queues
* **qfree**, the Free resource utilization queue
!!! note
Check the queue status at <https://extranet.it4i.cz/anselm/>
Read more on the [Resource Allocation Policy](anselm/resources-allocation-policy/) page.
## Job Submission and Execution
!!! note
Use the **qsub** command to submit your jobs.
The qsub command submits the job into the queue and creates a request to the PBS Job manager for allocation of the specified resources. The **smallest allocation unit is an entire node, 16 cores**, with the exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.**
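For illustration, a sketch of a batch submission (the project ID, node count, walltime, and jobscript name are placeholders):
```bash
$ qsub -A PROJECT-ID -q qprod -l select=4:ncpus=16,walltime=03:00:00 ./myjob.sh
```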
Read more on the [Job submission and execution](anselm/job-submission-and-execution/) page.
## Capacity Computing
!!! note
Use Job arrays when running a huge number of jobs.
Use GNU Parallel and/or Job arrays when running (many) single core jobs.
In many cases, it is useful to submit a huge number (100+) of computational jobs into the PBS queue system. A huge number of (small) jobs is one of the most effective ways to execute embarrassingly parallel calculations, achieving the best runtime, throughput, and computer utilization. In this chapter, we discuss the recommended way to run a huge number of jobs, including **ways to run a huge number of single core jobs**.
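A hedged sketch of a PBS job array submission (the job name, array range, and jobscript are placeholders; see the Capacity computing page for the complete recipe):
```bash
$ qsub -N JOBNAME -J 1-900 jobscript
```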
Read more on the [Capacity computing](anselm/capacity-computing/) page.
# Allinea Forge (DDT,MAP)
Allinea Forge consists of two tools: the DDT debugger and the MAP profiler.
Allinea DDT is a commercial debugger primarily for debugging parallel MPI or OpenMP programs. It also has support for GPU (CUDA) and Intel Xeon Phi accelerators. DDT provides all the standard debugging features (stack trace, breakpoints, watches, viewing variables, threads, etc.) for every thread running as part of your program, or for every process, even if these processes are distributed across a cluster using an MPI implementation.
Allinea MAP is a profiler for C/C++/Fortran HPC codes. It is designed for profiling parallel code, which uses Pthreads, OpenMP or MPI.
## License and Limitations for Anselm Users
On Anselm, users can debug OpenMP or MPI code that runs up to 64 parallel processes. When debugging GPU or Xeon Phi accelerated codes, the limit is 8 accelerators. These limitations mean that:
* 1 user can debug up to 64 processes, or
* 32 users can debug 2 processes each, etc.
When debugging on accelerators:
* 1 user can debug on up to 8 accelerators, or
* 8 users can debug on a single accelerator each.
## Compiling Code to Run With DDT
### Modules
Load all necessary modules to compile the code. For example:
```bash
$ ml intel
$ ml impi ... or ... ml openmpi/X.X.X-icc
```
Load the Allinea DDT module:
```bash
$ ml Forge
```
Compile the code:
```bash
$ mpicc -g -O0 -o test_debug test.c
$ mpif90 -g -O0 -o test_debug test.f
```
### Compiler Flags
Before debugging, you need to compile your code with these flags:
!!! note
\* **-g** : Generates extra debugging information usable by GDB. -g3 includes even more debugging information. This option is available for GNU and Intel C/C++ and Fortran compilers.
\* **-O0** : Suppress all optimizations.
## Starting a Job With DDT
Be sure to log in with X window forwarding enabled. This could mean using the -X option with ssh:
```bash
$ ssh -X username@anselm.it4i.cz
```
Another option is to access the login node using VNC. Please see the detailed information on how to [use the graphical user interface on Anselm](general/accessing-the-clusters/graphical-user-interface/x-window-system/).
From the login node, an interactive session **with X windows forwarding** (-X option) can be started by the following command:
```bash
$ qsub -I -X -A NONE-0-0 -q qexp -lselect=1:ncpus=16:mpiprocs=16,walltime=01:00:00
```
Then launch the debugger with the ddt command followed by the name of the executable to debug:
```bash
$ ddt test_debug
```
The submission window that appears has a prefilled path to the executable to debug. You can select the number of MPI processes and/or OpenMP threads on which to run, and press Run. Command line arguments to the program can be entered in the "Arguments" box.
![](../../../img/ddt1.png)
To start the debugging directly, without the submission window, the user can specify the debugging and execution parameters on the command line. For example, the number of MPI processes is set by the "-np 4" option. Skipping the dialog is done with the "-start" option. To see the list of "ddt" command line parameters, run "ddt --help".
```bash
ddt -start -np 4 ./hello_debug_impi
```
## Documentation
Users can find the original User Guide after loading the DDT module:
```bash
$DDTPATH/doc/userguide.pdf
```
[1] Discipline, Magic, Inspiration and Science: Best Practice Debugging with Allinea DDT, Workshop conducted at LLNL by Allinea on May 10, 2013, [link](https://computing.llnl.gov/tutorials/allineaDDT/index.html)
# Allinea Performance Reports
## Introduction
Allinea Performance Reports characterize the performance of HPC application runs. After executing your application through the tool, a synthetic HTML report is generated automatically, containing information about several metrics along with clear behavior statements and hints to help you improve the efficiency of your runs.
Allinea Performance Reports is most useful for profiling MPI programs.
Our license is limited to 64 MPI processes.
## Modules
Allinea Performance Reports version 6.0 is available:
```bash
$ ml PerformanceReports/6.0
```
The module sets up the environment variables required for using Allinea Performance Reports.
## Usage
!!! note
Use the perf-report wrapper on your (MPI) program.
Instead of [running your MPI program the usual way](anselm/software/mpi/), use the perf-report wrapper:
```bash
$ perf-report mpirun ./mympiprog.x
```
The MPI program will run as usual. The perf-report creates two additional files, in \*.txt and \*.html format, containing the performance report. Note that [demanding MPI codes should be run within the queue system](anselm/job-submission-and-execution/).
## Example
In this example, we will profile the mympiprog.x MPI program using Allinea Performance Reports. Assume that the code is compiled with Intel compilers and linked against the Intel MPI library.
First, we allocate some nodes via the express queue:
```bash
$ qsub -q qexp -l select=2:ncpus=16:mpiprocs=16:ompthreads=1 -I
qsub: waiting for job 262197.dm2 to start
qsub: job 262197.dm2 ready
```
Then we load the modules and run the program the usual way:
```bash
$ ml intel impi allinea-perf-report/4.2
$ mpirun ./mympiprog.x
```
Now let's profile the code:
```bash
$ perf-report mpirun ./mympiprog.x
```
Performance report files [mympiprog_32p\*.txt](src/mympiprog_32p_2014-10-15_16-56.txt) and [mympiprog_32p\*.html](src/mympiprog_32p_2014-10-15_16-56.html) were created. We can see that the code is very efficient on MPI and is CPU bound.
# Intel Compilers
The Intel compilers version 13.1.1 are available via the module intel. The compilers include the ICC C and C++ compiler and the IFORT Fortran 77/90/95 compiler.
```bash
$ ml intel
$ icc -v
$ ifort -v
```
The Intel compilers provide vectorization of the code via the AVX instructions and support threading parallelization via OpenMP.
For maximum performance on the Anselm cluster, compile your programs using the AVX instructions, with reporting on where vectorization was used. We recommend the following compilation options for high performance:
```bash
$ icc -ipo -O3 -vec -xAVX -vec-report1 myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -vec -xAVX -vec-report1 myprog.f mysubroutines.f -o myprog.x
```
In this example, we compile the program enabling interprocedural optimizations between source files (-ipo), aggressive loop optimizations (-O3) and vectorization (-vec -xAVX).
The compiler recognizes the omp, simd, vector and ivdep pragmas for OpenMP parallelization and AVX vectorization. Enable OpenMP parallelization with the **-openmp** compiler switch.
```bash
$ icc -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -vec -xAVX -vec-report1 -openmp myprog.f mysubroutines.f -o myprog.x
```
Read more [here](http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm).
## Sandy Bridge/Haswell Binary Compatibility
Anselm nodes are currently equipped with Sandy Bridge CPUs, while Salomon will use the Haswell architecture. The new processors are backward compatible with the Sandy Bridge nodes, so all programs that ran on the Sandy Bridge processors should also run on the new Haswell nodes. To get optimal performance out of the Haswell processors, a program should make use of the special AVX2 instructions for this processor. One can do this by recompiling codes with the compiler flags designated to invoke these instructions. For the Intel compiler suite, there are two ways of doing this:
* Using compiler flag (both for Fortran and C): **-xCORE-AVX2**. This will create a binary with AVX2 instructions, specifically for the Haswell processors. Note that the executable will not run on Sandy Bridge nodes.
* Using compiler flags (both for Fortran and C): **-xAVX -axCORE-AVX2**. This will generate multiple, feature specific auto-dispatch code paths for Intel® processors, if there is a performance benefit. So this binary will run both on Sandy Bridge and Haswell processors. During runtime it will be decided which path to follow, dependent on which processor you are running on. In general this will result in larger binaries.
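For example, a sketch of building a dual-path binary that runs on both Sandy Bridge and Haswell, following the compile examples above (file names are illustrative):
```bash
$ ml intel
$ icc -ipo -O3 -xAVX -axCORE-AVX2 myprog.c mysubroutines.c -o myprog.x
$ ifort -ipo -O3 -xAVX -axCORE-AVX2 myprog.f mysubroutines.f -o myprog.x
```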
# Intel Debugger
## Debugging Serial Applications
The Intel debugger version 13.0 is available via the module intel. The debugger works for applications compiled with the C and C++ compilers and the ifort Fortran 77/90/95 compiler. The debugger provides a Java GUI environment. Use an X display for running the GUI.
```bash
$ ml intel
$ idb
```
The debugger may run in text mode. To debug in text mode, use
```bash
$ idbc
```
To debug on the compute nodes, the module intel must be loaded. The GUI on the compute nodes may be accessed in the same way as described in the GUI section.
Example:
```bash
$ qsub -q qexp -l select=1:ncpus=16 -X -I
qsub: waiting for job 19654.srv11 to start
qsub: job 19654.srv11 ready
$ ml intel
$ ml java
$ icc -O0 -g myprog.c -o myprog.x
$ idb ./myprog.x
```
In this example, we allocate 1 full compute node, compile the program myprog.c with the debugging options -O0 -g and run the idb debugger interactively on the myprog.x executable. The GUI access is via X11 port forwarding provided by the PBS workload manager.
## Debugging Parallel Applications
Intel debugger is capable of debugging multithreaded and MPI parallel programs as well.
### Small Number of MPI Ranks
To debug a small number of MPI ranks, you may execute and debug each rank in a separate xterm terminal (do not forget the X display). Using Intel MPI, this may be done in the following way:
```bash
$ qsub -q qexp -l select=2:ncpus=16 -X -I
qsub: waiting for job 19654.srv11 to start
qsub: job 19655.srv11 ready
$ ml intel impi
$ mpirun -ppn 1 -hostfile $PBS_NODEFILE --enable-x xterm -e idbc ./mympiprog.x
```
In this example, we allocate 2 full compute nodes, run xterm on each node and start the idb debugger in command line mode, debugging two ranks of the mympiprog.x application. The xterm will pop up for each rank, with the idb prompt ready. The example is not limited to the use of Intel MPI.
### Large Number of MPI Ranks
Run the idb debugger using the MPI debug option. This will cause the debugger to bind to all ranks and provide aggregated outputs across the ranks, pausing execution automatically just after startup. You may then set break points and step the execution manually. Using Intel MPI:
```bash
$ qsub -q qexp -l select=2:ncpus=16 -X -I
qsub: waiting for job 19654.srv11 to start
qsub: job 19655.srv11 ready
$ ml intel impi
$ mpirun -n 32 -idb ./mympiprog.x
```
### Debugging Multithreaded Application
Run the idb debugger in GUI mode. The Parallel menu contains a number of tools for debugging multiple threads. One of the most useful tools is the **Serialize Execution** tool, which serializes execution of concurrent threads for easy orientation and identification of concurrency related bugs.
## Further Information
An exhaustive manual on IDB features and usage is published on the [Intel website](http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/debugger/user_guide/index.htm).
# MPI
## Setting Up MPI Environment
The Anselm cluster provides several implementations of the MPI library:
| MPI Library | Thread support |
| ---------------------------------------------------- | --------------------------------------------------------------- |
| The highly optimized and stable **bullxmpi 1.2.4.1** | Partial thread support up to MPI_THREAD_SERIALIZED |
| The **Intel MPI 4.1** | Full thread support up to MPI_THREAD_MULTIPLE |
| The [OpenMPI 1.6.5](http://www.open-mpi.org) | Full thread support up to MPI_THREAD_MULTIPLE, BLCR c/r support |
| The OpenMPI 1.8.1 | Full thread support up to MPI_THREAD_MULTIPLE, MPI-3.0 support |
| The **mpich2 1.9** | Full thread support up to MPI_THREAD_MULTIPLE, BLCR c/r support |
MPI libraries are activated via the environment modules.
Look up the modulefiles/mpi section in the output of ml av:
```bash
$ ml av
------------------------- /opt/modules/modulefiles/mpi -------------------------
bullxmpi/bullxmpi-1.2.4.1 mvapich2/1.9-icc
impi/4.0.3.008 openmpi/1.6.5-gcc(default)
impi/4.1.0.024 openmpi/1.6.5-gcc46
impi/4.1.0.030 openmpi/1.6.5-icc
impi/4.1.1.036(default) openmpi/1.8.1-gcc
openmpi/1.8.1-gcc46
mvapich2/1.9-gcc(default) openmpi/1.8.1-gcc49
mvapich2/1.9-gcc46 openmpi/1.8.1-icc
```
There are default compilers associated with any particular MPI implementation. The defaults may be changed; the MPI libraries may be used in conjunction with any compiler. The defaults are selected via the modules in the following way:
| Module | MPI | Compiler suite |
| ------------ | ---------------- | ------------------------------------------------------------------------------ |
| PrgEnv-gnu | bullxmpi-1.2.4.1 | bullx GNU 4.4.6 |
| PrgEnv-intel | Intel MPI 4.1.1 | Intel 13.1.1 |
| bullxmpi | bullxmpi-1.2.4.1 | none, select via module |
| impi | Intel MPI 4.1.1 | none, select via module |
| openmpi | OpenMPI 1.6.5 | GNU compilers 4.8.1, GNU compilers 4.4.6, Intel Compilers |
| openmpi | OpenMPI 1.8.1 | GNU compilers 4.8.1, GNU compilers 4.4.6, GNU compilers 4.9.0, Intel Compilers |
| mvapich2 | MPICH2 1.9 | GNU compilers 4.8.1, GNU compilers 4.4.6, Intel Compilers |
Examples:
```bash
$ ml openmpi
```
In this example, we activate the latest openmpi module with the latest GNU compilers.
To use openmpi with the Intel compiler suite, use:
```bash
$ ml intel
$ ml openmpi/1.6.5-icc
```
In this example, openmpi 1.6.5 built with the Intel compilers is activated.
## Compiling MPI Programs
!!! note
After setting up your MPI environment, compile your program using one of the MPI wrappers:
```bash
$ mpicc -v
$ mpif77 -v
$ mpif90 -v
```
Example program:
```cpp
// helloworld_mpi.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv) {
int len;
int rank, size;
char node[MPI_MAX_PROCESSOR_NAME];
// Initiate MPI
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
// Get hostname and print
MPI_Get_processor_name(node,&len);
printf("Hello world! from rank %d of %d on host %sn",rank,size,node);
// Finalize and exit
MPI_Finalize();
return 0;
}
```
Compile the above example with
```bash
$ mpicc helloworld_mpi.c -o helloworld_mpi.x
```
## Running MPI Programs
!!! note
The MPI program executable must be compatible with the loaded MPI module.
Always compile and execute using the very same MPI module.
It is strongly discouraged to mix MPI implementations. Linking an application with one MPI implementation and running mpirun/mpiexec from another implementation may result in unexpected errors.
The MPI program executable must be available within the same path on all nodes. This is automatically fulfilled on the /home and /scratch file systems. You need to preload the executable if running on the local scratch /lscratch file system.
### Ways to Run MPI Programs
The optimal way to run an MPI program depends on its memory requirements, memory access pattern and communication pattern.
!!! note
Consider these ways to run an MPI program:
1. One MPI process per node, 16 threads per process
2. Two MPI processes per node, 8 threads per process
3. 16 MPI processes per node, 1 thread per process.
**One MPI** process per node, using 16 threads, is most useful for memory demanding applications that make good use of processor cache memory and are not memory bound. This is also a preferred way for communication intensive applications, as one process per node enjoys full bandwidth access to the network interface.
**Two MPI** processes per node, using 8 threads each, bound to a processor socket, is most useful for memory bandwidth bound applications such as BLAS1 or FFT, with scalable memory demand. However, note that the two processes will share access to the network interface. The 8 threads and socket binding should ensure maximum memory access bandwidth and minimize communication, migration and NUMA effect overheads.
!!! note
Important! Bind every OpenMP thread to a core!
In the previous two cases with one or two MPI processes per node, the operating system might still migrate OpenMP threads between cores. You want to avoid this by setting the KMP_AFFINITY or GOMP_CPU_AFFINITY environment variables.
**16 MPI** processes per node, using 1 thread each, bound to a processor core, is most suitable for highly scalable applications with low communication demand.
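A minimal sketch of the two-processes-per-node case with bound threads (the executable name and the exact affinity string are illustrative; consult the Intel documentation for the affinity syntax of your compiler version):
```bash
$ export OMP_NUM_THREADS=8
$ export KMP_AFFINITY=granularity=fine,compact,1,0
$ mpirun -np 2 ./mymixedprog.x
```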
### Running OpenMPI
The **bullxmpi-1.2.4.1** and [**OpenMPI 1.6.5**](http://www.open-mpi.org/) are both based on OpenMPI. Read more on [how to run OpenMPI](software/Running_OpenMPI/) based MPI.
### Running MPICH2
The **Intel MPI** and **mpich2 1.9** are MPICH2 based implementations. Read more on [how to run MPICH2](software/running-mpich2/) based MPI.
The Intel MPI may run on the Intel Xeon Phi accelerators as well. Read more on [how to run Intel MPI on accelerators](software/intel/intel-xeon-phi-anselm/).
# Graphical User Interface
## X Window System
The X Window system is a principal way to get GUI access to the clusters.
Read more about configuring [X Window System](general/accessing-the-clusters/graphical-user-interface/x-window-system/).
## VNC
The **Virtual Network Computing** (**VNC**) is a graphical [desktop sharing](http://en.wikipedia.org/wiki/Desktop_sharing "Desktop sharing") system that uses the [Remote Frame Buffer protocol (RFB)](http://en.wikipedia.org/wiki/RFB_protocol "RFB protocol") to remotely control another [computer](http://en.wikipedia.org/wiki/Computer "Computer").
Read more about configuring [VNC](general/accessing-the-clusters/graphical-user-interface/vnc/).
# Accessing the Clusters
The IT4Innovations clusters are accessed by SSH protocol via login nodes.
!!! note
Read more on the [Accessing the Salomon Cluster](salomon/shell-and-data-access.md) or [Accessing the Anselm Cluster](anselm/shell-and-data-access.md) pages.
## PuTTY
On **Windows**, use [PuTTY ssh client](general/accessing-the-clusters/shell-access-and-data-transfer/putty/).
## SSH Keys
Read more about [SSH keys management](general/accessing-the-clusters/shell-access-and-data-transfer/ssh-keys/).
## Graphical User Interface
Read more about [X Window System](general/accessing-the-clusters/graphical-user-interface/x-window-system/).
Read more about [Virtual Network Computing (VNC)](general/accessing-the-clusters/graphical-user-interface/vnc/).
## Accessing IT4Innovations Internal Resources via VPN
Read more about [VPN Access](general/accessing-the-clusters/vpn-access/).
# Resource Allocation and Job Execution
To run a [job](salomon/job-submission-and-execution/), [computational resources](salomon/resources-allocation-policy/) for this particular job must be allocated. This is done via the PBS Pro job workload manager software, which efficiently distributes workloads across the supercomputer. Extensive information about PBS Pro can be found in the [official documentation here](software/pbspro/), especially in the PBS Pro User's Guide.
## Resources Allocation Policy
The resources are allocated to the job in a fair-share fashion, subject to constraints set by the queue and resources available to the Project. [The Fair-share](salomon/job-priority/) system of Salomon ensures that individual users may consume an approximately equal amount of resources per week. The resources are accessible via several queues for queueing the jobs. The queues provide prioritized and exclusive access to the computational resources. The following queues are available to Salomon users:
* **qexp**, the Express queue
* **qprod**, the Production queue
* **qlong**, the Long queue
* **qmpp**, the Massively parallel queue
* **qmic**, the 864 MIC nodes queue
* **qfat**, the queue to access SMP UV2000 machine
* **qfree**, the Free resource utilization queue
!!! note
Check the queue status at [https://extranet.it4i.cz/rsweb/salomon/](https://extranet.it4i.cz/rsweb/salomon/)
Read more on the [Resource Allocation Policy](salomon/resources-allocation-policy/) page.
## Job Submission and Execution
!!! note
Use the **qsub** command to submit your jobs.
The qsub command submits the job into the queue and creates a request to the PBS Job manager for allocation of the specified resources. The **smallest allocation unit is an entire node, 24 cores**, with the exception of the qexp queue. The resources will be allocated when available, subject to allocation policies and constraints. **After the resources are allocated, the jobscript or interactive shell is executed on the first of the allocated nodes.**
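For illustration, a sketch of a whole-node batch submission on Salomon (the project ID, node count, walltime, and jobscript name are placeholders):
```bash
$ qsub -A PROJECT-ID -q qprod -l select=2:ncpus=24,walltime=03:00:00 ./myjob.sh
```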
Read more on the [Job submission and execution](salomon/job-submission-and-execution/) page.
# ANSYS CFX
[ANSYS CFX](http://www.ansys.com/products/fluids/ansys-cfx) software is a high-performance, general purpose fluid dynamics program that has been applied to solve wide-ranging fluid flow problems for over 20 years. At the heart of ANSYS CFX is its advanced solver technology, the key to achieving reliable and accurate solutions quickly and robustly. The modern, highly parallelized solver is the foundation for an abundant choice of physical models to capture virtually any type of phenomena related to fluid flow. The solver and its many physical models are wrapped in a modern, intuitive, and flexible GUI and user environment, with extensive capabilities for customization and automation using session files, scripting and a powerful expression language.
To run ANSYS CFX in batch mode you can utilize/modify the default cfx.pbs script and execute it via the qsub command.
```bash
#!/bin/bash
#PBS -l nodes=2:ppn=16
#PBS -q qprod
#PBS -N $USER-CFX-Project
#PBS -A XX-YY-ZZ
#! Mail to user when job terminates or aborts
#PBS -m ae
#!change the working directory (default is home directory)
#cd <working directory> (working directory must exist)
WORK_DIR="/scratch/$USER/work"
cd $WORK_DIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`
ml ansys
#### Set number of processors per host listing
#### (set to 1 as $PBS_NODEFILE lists each node twice if :ppn=2)
procs_per_host=1
#### Create host list
hl=""
for host in `cat $PBS_NODEFILE`
do
if ["$hl" = "" ]
then hl="$host:$procs_per_host"
else hl="${hl}:$host:$procs_per_host"
fi
done
echo Machines: $hl
#-dev input.def includes the input of CFX analysis in DEF format
#-P the name of preferred license feature (aa_r=ANSYS Academic Research, ane3fl=Multiphysics(commercial))
/ansys_inc/v145/CFX/bin/cfx5solve -def input.def -size 4 -size-ni 4x -part-large -start-method "Platform MPI Distributed Parallel" -par-dist $hl -P aa_r
```
The header of the PBS file (above) is common; its description can be found on [this site](salomon/job-submission-and-execution/). SVS FEM recommends allocating resources via the keywords nodes and ppn. These keywords allow you to directly address the number of nodes (computers) and cores (ppn) to be utilized by the job. The rest of the script also assumes this structure of allocated resources.
The working directory has to be created before submitting the PBS job into the queue. The input file should be in the working directory, or the full path to the input file has to be specified. The input file has to be a common CFX .def file, which is passed to the CFX solver via the -def parameter.
The **license** should be selected via the -P parameter (capital **P**). Licensed products are the following: aa_r (ANSYS **Academic** Research), ane3fl (ANSYS Multiphysics, **commercial**).
# ANSYS Fluent
[ANSYS Fluent](http://www.ansys.com/products/fluids/ansys-fluent)
software contains the broad physical modeling capabilities needed to model flow, turbulence, heat transfer, and reactions for industrial applications ranging from air flow over an aircraft wing to combustion in a furnace, from bubble columns to oil platforms, from blood flow to semiconductor manufacturing, and from clean room design to wastewater treatment plants. Special models that give the software the ability to model in-cylinder combustion, aeroacoustics, turbomachinery, and multiphase systems have served to broaden its reach.
1. Common way to run Fluent via a PBS file
To run ANSYS Fluent in batch mode you can utilize/modify the default fluent.pbs script and execute it via the qsub command.
```bash
#!/bin/bash
#PBS -S /bin/bash
#PBS -l nodes=2:ppn=16
#PBS -q qprod
#PBS -N $USER-Fluent-Project
#PBS -A XX-YY-ZZ
#! Mail to user when job terminates or aborts
#PBS -m ae
#!change the working directory (default is home directory)
#cd <working directory> (working directory must exist)
WORK_DIR="/scratch/$USER/work"
cd $WORK_DIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`
#### Load ansys module so that we find the cfx5solve command
ml ansys
# Use following line to specify MPI for message-passing instead
NCORES=`wc -l $PBS_NODEFILE |awk '{print $1}'`
/ansys_inc/v145/fluent/bin/fluent 3d -t$NCORES -cnf=$PBS_NODEFILE -g -i fluent.jou
```
The header of the PBS file (above) is common; its description can be found on [this site](salomon/resources-allocation-policy/). [SVS FEM](http://www.svsfem.cz) recommends allocating resources via the keywords nodes and ppn. These keywords allow you to directly address the number of nodes (computers) and cores (ppn) to be utilized by the job. The rest of the script also assumes this structure of allocated resources.
The working directory has to be created before submitting the PBS job into the queue. The input file should be in the working directory, or the full path to the input file has to be specified. The input file has to be a common Fluent journal file, which is passed to the Fluent solver via the -i fluent.jou parameter.
A journal file with the definition of the input geometry, boundary conditions and the solution process may have, for example, the following structure:
```bash
/file/read-case aircraft_2m.cas.gz
/solve/init
init
/solve/iterate
10
/file/write-case-dat aircraft_2m-solution
/exit yes
```
The appropriate dimension of the problem has to be set by the parameter (2d/3d).
2. Fast way to run Fluent from the command line
```bash
fluent solver_version [FLUENT_options] -i journal_file -pbs
```
This syntax will start the ANSYS FLUENT job under PBS Professional using the qsub command in a batch manner. When resources are available, PBS Professional will start the job and return a job ID, usually in the form of _job_ID.hostname_. This job ID can then be used to query, control, or stop the job using standard PBS Professional commands, such as qstat or qdel. The job will be run out of the current working directory, and all output will be written to the file fluent.o _job_ID_.
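For example (the journal file name is illustrative):
```bash
fluent 3d -i fluent.jou -pbs
```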
3. Running Fluent via user's config file
The sample script uses a configuration file called pbs_fluent.conf if no command line arguments are present. This configuration file should be present in the directory from which the jobs are submitted (which is also the directory in which the jobs are executed). The following is an example of what the content of pbs_fluent.conf can be:
```bash
input="example_small.flin"
case="Small-1.65m.cas"
fluent_args="3d -pmyrinet"
outfile="fluent_test.out"
mpp="true"
```
The following is an explanation of the parameters:
* input is the name of the input file.
* case is the name of the .cas file that the input file will utilize.
* fluent_args are extra ANSYS FLUENT arguments. As shown in the previous example, you can specify the interconnect by using the -p interconnect command. The available interconnects include ethernet (the default), myrinet, Infiniband, vendor, altix, and crayx. The MPI is selected automatically, based on the specified interconnect.
* outfile is the name of the file to which the standard output will be sent.
* mpp="true" will tell the job script to execute the job across multiple processors.
To run ANSYS Fluent in batch mode with the user's config file, you can utilize/modify the following script and execute it via the qsub command.
```bash
#!/bin/sh
#PBS -l nodes=2:ppn=4
#PBS -q qprod
#PBS -N $USER-Fluent-Project
#PBS -A XX-YY-ZZ
cd $PBS_O_WORKDIR
#We assume that if they didn't specify arguments then they should use the
#config file
if [ "xx${input}${case}${mpp}${fluent_args}zz" = "xxzz" ]; then
if [ -f pbs_fluent.conf ]; then
. pbs_fluent.conf
else
printf "No command line arguments specified, "
printf "and no configuration file found. Exiting n"
fi
fi
#Augment the ANSYS FLUENT command line arguments
case "$mpp" in
true)
#MPI job execution scenario
num_nodes=`cat $PBS_NODEFILE | sort -u | wc -l`
cpus=`expr $num_nodes \* $NCPUS`
#Default arguments for mpp jobs, these should be changed to suit your
#needs.
fluent_args="-t${cpus} $fluent_args -cnf=$PBS_NODEFILE"
;;
*)
#SMP case
#Default arguments for smp jobs, should be adjusted to suit your
#needs.
fluent_args="-t$NCPUS $fluent_args"
;;
esac
#Default arguments for all jobs