Commit 8ba253f3 authored by Jan Siwiec

Merge branch 'tf-gpu' into 'master'

Add section about using TensorFlow with GPUs

See merge request sccs/docs.it4i.cz!411
parents ede3d82f 6140f612
1 merge request: !411 Add section about using TensorFlow with GPUs
Pipeline #28722 passed with warnings
......@@ -21,7 +21,7 @@ For more information, see the [official website][d] or [GitHub][e].
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications. For more information, see the [official website][a].
For the list of available versions, see the [TensorFlow][1] section:
For more information, see the [TensorFlow][1] section.
## Theano
......
# TensorFlow
TensorFlow is an open-source software library for machine intelligence.
For searching available modules type:
TensorFlow (TF) is an open-source software library which can compile tensor operations to execute
very quickly on both CPUs and GPUs. It is often used as a backend for machine learning libraries
and models.
We strongly recommend using `TensorFlow 2.x`. TensorFlow 1 has long been deprecated, and it
will probably be difficult to make it run on GPUs on our clusters.
## Installation
For TensorFlow to work with GPUs, you have to use several libraries (CUDA, cuDNN, NCCL, etc.)
in mutually compatible versions.
You can load the correct modules with the following command:
```console
$ ml av Tensorflow
$ ml TensorFlow
```
<!---
If you want to upgrade the TensorFlow version used in this package or install additional Python
modules, you can simply create a virtual environment and install a different TensorFlow version
inside it:
```console
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ python3 -m pip install -U setuptools wheel pip
(venv) $ python3 -m pip install tensorflow
```
However, if you use a newer TensorFlow version than the one included in the `TensorFlow` module,
you should make sure that it is still compatible with the CUDA version provided by the module.
You can find the required `CUDA`/`cuDNN` versions for the latest TF
[here](https://www.tensorflow.org/install/pip).
## Salomon Modules
## TensorFlow Example
Salomon provides (among others) these TensorFlow modules:
After loading TensorFlow, you can check its functionality by running the following Python script.
**Tensorflow/1.1.0** (not recommended), module built with:
```python
import tensorflow as tf
* GCC/4.9.3
* Python/3.6.1
a = tf.constant([1, 2, 3])
b = tf.constant([2, 4, 6])
c = a + b
print(c.numpy())
```
**Tensorflow/1.2.0-GCC-7.1.0-2.28** (default, recommended), module built with:
## Using TensorFlow With GPUs
* TensorFlow 1.2 with SIMD support. TensorFlow build taking advantage of the Salomon CPU architecture.
* GCC/7.1.0-2.28
* Python/3.6.1
* protobuf/3.2.0-GCC-7.1.0-2.28-Python-3.6.1
With TensorFlow, you can use either a single GPU or multiple GPUs within a single process,
for example to train neural networks much faster.
-->
Loading the available `TensorFlow` module should ensure that these libraries are loaded in compatible versions.
### Selecting GPUs
You can select how many and which (NVIDIA) GPUs will be used by TensorFlow with the
`CUDA_VISIBLE_DEVICES` environment variable.
```console
# Do not use any GPUs
$ CUDA_VISIBLE_DEVICES=-1 python3 my_script.py
# Use a single GPU with ID 0
$ CUDA_VISIBLE_DEVICES=0 python3 my_script.py
# Use multiple GPUs
$ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 my_script.py
```
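The same selection can be made from a Python launcher instead of the shell. The following is a minimal sketch, not part of the original documentation: `run_with_gpus` is a hypothetical helper name, and the child command here only echoes the variable it received so that the example runs without TensorFlow. In practice, the child command would be something like `python3 my_script.py`.

```python
import os
import subprocess
import sys


def run_with_gpus(command, gpu_ids):
    """Run `command` with CUDA_VISIBLE_DEVICES restricted to `gpu_ids`.

    An empty list hides all GPUs by setting the variable to -1.
    (Hypothetical helper, shown for illustration only.)
    """
    env = dict(os.environ)
    if gpu_ids:
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    else:
        env["CUDA_VISIBLE_DEVICES"] = "-1"
    return subprocess.run(command, env=env, check=True,
                          capture_output=True, text=True)


# Stand-in for a real training script: print the variable seen by the child.
child = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]

result = run_with_gpus(child, [0, 1])
print(result.stdout.strip())  # prints 0,1
```

The variable is set only in the child's environment, so the launcher's own environment stays unchanged.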
## TensorFlow Application Example
By default, if you do not specify the environment variable, all available GPUs will be used by
TensorFlow.
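To check which GPUs TensorFlow actually sees under the current `CUDA_VISIBLE_DEVICES` setting, you can query the device list. This is a small sketch under the assumption that TensorFlow 2.1 or newer is loaded. `visible_gpus` is a hypothetical helper name, and the import guard is only there so that the snippet reports a missing module instead of failing.

```python
def visible_gpus():
    """Return the names of the GPUs visible to TensorFlow.

    Returns None if TensorFlow cannot be imported (e.g. the module is not loaded).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices('GPU')]


gpus = visible_gpus()
if gpus is None:
    print('TensorFlow is not available; load the TensorFlow module first')
else:
    print('Visible GPUs ({}): {}'.format(len(gpus), gpus))
```

An empty list means that TensorFlow was imported but found no usable GPU, which is expected with `CUDA_VISIBLE_DEVICES=-1`.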
After loading one of the available TensorFlow modules, you can check the functionality by running the following Python script.
### Multi-GPU TensorFlow Example
This script uses `keras` and `TensorFlow` to train a simple neural network on the
[MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset. It assumes that you have the
`tensorflow` (2.x), `keras`, and `tensorflow_datasets` Python packages installed. The training
is performed on multiple GPUs.
```python
import tensorflow_datasets as tfds
import tensorflow as tf
c = tf.constant('Hello World!')
sess = tf.Session()
print(sess.run(c))
datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']
# Use NCCL reduction if NCCL is available; it should be the most efficient strategy
strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.NcclAllReduce())
# Different reduction strategy, use if NCCL causes errors
# strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.ReductionToOneDevice())
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
num_train_examples = info.splits['train'].num_examples
num_test_examples = info.splits['test'].num_examples
BUFFER_SIZE = 10000
BATCH_SIZE_PER_REPLICA = 64
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label
train_dataset = mnist_train.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
eval_dataset = mnist_test.map(scale).batch(BATCH_SIZE)
# The following line makes sure that the model will run on multiple GPUs (if they are available)
# Without `strategy.scope()`, the model would only be trained on a single GPU
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
model.fit(train_dataset, epochs=100)
```
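In the example above, the global batch size is the per-replica batch size multiplied by the number of replicas, so adding GPUs reduces the number of training steps per epoch. The following sketch only illustrates this arithmetic and does not need TensorFlow. The helper names are hypothetical, and the figure of 60,000 examples is the size of the standard MNIST training split.

```python
import math

NUM_TRAIN_EXAMPLES = 60000   # size of the MNIST training split
BATCH_SIZE_PER_REPLICA = 64  # same value as in the example above


def global_batch_size(per_replica, num_replicas):
    """Batch size seen by the whole MirroredStrategy in one step."""
    return per_replica * num_replicas


def steps_per_epoch(num_examples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


for replicas in (1, 2, 4, 8):
    batch = global_batch_size(BATCH_SIZE_PER_REPLICA, replicas)
    steps = steps_per_epoch(NUM_TRAIN_EXAMPLES, batch)
    print('{} GPU(s): global batch {}, {} steps per epoch'.format(replicas, batch, steps))
```

A larger global batch may call for retuning the learning rate. That is a general training consideration and is not covered by the example above.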
!!! note
    If using the `NCCL` strategy causes runtime errors, try to run your application with the
    environment variable `TF_FORCE_GPU_ALLOW_GROWTH` set to `true`.
!!! tip
    For real-world multi-GPU training, it might be better to use a dedicated multi-GPU framework such
    as [Horovod](https://github.com/horovod/horovod).
<!---
2022-10-14
Add multi-GPU example script.
2021-04-08
It's necessary to load the correct NumPy module along with the Tensorflow one.
......