# CI/CD
## Introduction
Continuous Integration (CI) is the practice of automatically running a compilation script and a set of test cases to ensure that the integrated codebase is in a workable state. Integration is often followed by Continuous Benchmarking (CB), which evaluates the impact of a code change on application performance, and Continuous Deployment (CD), which distributes a new version of the developed code.
IT4I offers its users the possibility to set up CI for their projects and to execute their dedicated CI jobs directly on the compute nodes of the production HPC clusters (Karolina, Barbora) and the Complementary systems. The Complementary systems make it possible to run tests on emerging, non-traditional, and highly specialized hardware architectures. They consist of compute nodes built on Intel Sapphire Rapids + HBM, NVIDIA Grace CPU, IBM Power10, A64FX, and many more.
In addition, CI jobs can be executed in a customizable virtual environment (Docker containers). This allows the code to be tested in a clean build environment. It also makes dependency management more straightforward, since all dependencies for building the project can be put in the Docker image from which the corresponding containers are created.
## CI Infrastructure Deployed at IT4I
IT4Innovations maintains a GitLab server (code.it4i.cz), which has built-in support for CI/CD. It provides a set of GitLab runners. A GitLab runner is an application that executes the jobs specified in a project's CI pipeline. A pipeline consists of jobs and stages, where a stage is a collection of jobs grouped together. Stages run in sequence, while all jobs in a stage can run in parallel.
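As a rough illustration of how stages and jobs relate, the following sketch defines two stages. The stage names, job names, and `echo` commands are placeholders chosen for this example and are not taken from any IT4I project. The runner-specific keywords described later on this page are omitted for brevity.

```yaml
# Hypothetical pipeline: names and commands are illustrative only.
stages:
  - build
  - test

compile:
  stage: build
  script:
    - echo "compiling"

unit-tests:
  stage: test
  script:
    - echo "running unit tests"

integration-tests:
  stage: test
  script:
    - echo "running integration tests"
```

In this sketch, `compile` runs first. Once it succeeds, `unit-tests` and `integration-tests` become eligible to run in parallel because they belong to the same stage.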
Detailed documentation about GitLab CI/CD is available [here][1].
A unified solution is provided to all users so that they can execute their CI jobs on Karolina, Barbora, and the Complementary systems without having to create their own project runners. For each of the HPC clusters, a GitLab instance runner has been deployed. The runners run on the login nodes, are visible to all projects on the IT4I GitLab server, and are shared by all users.
These runners use the **Jacamar CI driver**, an HPC-focused open-source CI/CD driver for GitLab runners. It allows a GitLab runner to interact directly with the job scheduler of a given cluster. One of the main benefits of this driver is its downscoping mechanism, which ensures that every command within each CI job is executed as the user who triggered the CI pipeline to which the job belongs.
For more information about the Jacamar CI driver, please visit [the official documentation][2].
The execution of CI pipelines works as follows. First, a user of the IT4I GitLab server triggers a CI pipeline (for example, by pushing to a repository). Then, the jobs that make up the pipeline are sent to the corresponding runner, which runs on the login node. Lastly, for every CI job, the runner clones the repository (or just fetches changes into an already cloned one, if there are any), restores the [cache][3], downloads [artifacts][4] (if specified), and submits the job as a Slurm job to the corresponding HPC cluster using the `sbatch` command. After each job finishes, the runner reports the results back to the server, creates the cache, and uploads artifacts (if specified).
<img src="../../../img/it4i-ci.svg" title="IT4I CI" width="750">
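The artifact steps of this workflow only take place when a job definition asks for them. The sketch below shows one way a job might publish a directory as an artifact for a job in a later stage. The job names, the `build/` path, the `make` and `./build/run_tests` commands, and the expiry period are assumptions made for this example, and the IT4I-specific keywords described later on this page are left out.

```yaml
# Hypothetical jobs: paths and commands are illustrative only.
build:
  stage: build
  script:
    - make
  artifacts:
    paths:
      - build/          # uploaded by the runner after the job finishes
    expire_in: 1 day

test:
  stage: test
  script:
    - ./build/run_tests # artifacts from earlier stages are downloaded by default
```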
!!! note
    The GitLab runners on Karolina and Barbora can submit (as Slurm jobs) and execute up to 32 CI jobs concurrently, while the runner on the Complementary systems can submit at most 16 jobs concurrently. Jobs above this limit are not submitted to the respective Slurm queue until a previous job has finished.
### Virtual Environment (Docker Containers)
There are also 5 GitLab instance runners configured with the Docker executor, which have been deployed in the local virtual infrastructure (each runs in a dedicated virtual machine). The runners use Docker Engine to execute each job in a separate, isolated container created from the image specified beforehand. These runners are also visible to all projects on the IT4I GitLab server.
Detailed information about the Docker executor and its workflow (the execution of CI pipelines) can be found [here][5].
In addition, these runners have distributed caching enabled. This feature uses a pre-configured object storage server and allows the [cache][3] to be shared between subsequent CI jobs (of the same project) executed on multiple runners (2 or more of the 5 deployed). Refer to [Caching in GitLab CI/CD][6] for information about the cache and how it differs from artifacts.
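To make the caching behavior more concrete, here is a minimal sketch of a job that requests a cache. The `python:3.12` image, the `requirements.txt` file, the cached path, and the choice of cache key are assumptions made for this example. `CI_COMMIT_REF_SLUG` and `CI_PROJECT_DIR` are predefined GitLab CI/CD variables, and the tags are the ones required for the Docker runners, as described later on this page.

```yaml
# Hypothetical job: the image, cache key, and paths are illustrative only.
install-deps:
  image: python:3.12
  tags:
    - centos7
    - docker
  variables:
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  cache:
    key: "$CI_COMMIT_REF_SLUG"   # one cache per branch or tag
    paths:
      - .cache/pip
  script:
    - pip install -r requirements.txt
```

With distributed caching, a later job that uses the same cache key should be able to restore `.cache/pip` even if it is picked up by a different one of the 5 runners.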
## How to Set Up Continuous Integration for Your Project
To begin with, the CI pipeline of a project must be defined in a YAML file. This file is usually named `.gitlab-ci.yml` and should be located in the top level of the repository. For detailed information, see the [tutorial][7] on how to create your first pipeline. Additionally, the [CI/CD YAML syntax reference][8] lists all keywords that can be specified in the definition of CI/CD pipelines and jobs.
!!! note
    By default, a CI job can run for at most 1 hour before it times out. This limit can be changed in the [project's CI/CD settings][9]. Jobs that exceed the specified timeout are marked as failed. Pending jobs are dropped after 24 hours of inactivity.
Every CI job in the project CI pipeline that is intended to be submitted as a Slurm job to one of the HPC clusters must have the following 3 keywords specified in its definition.
* `id_tokens`, in which `SITE_ID_TOKEN` must be defined with `aud` set to the URL of the IT4I GitLab server.
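Besides the project-level setting, GitLab CI/CD also has a job-level `timeout` keyword. The sketch below assumes a job that needs more than the default hour. The job name, the duration, and the `./run_benchmark.sh` script are placeholders, and the effective limit may still be capped by the runner configuration, which is not described on this page.

```yaml
# Hypothetical job: the name, duration, and command are illustrative only.
long-benchmark:
  timeout: 2 hours
  script:
    - ./run_benchmark.sh
```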
```yaml
id_tokens:
  SITE_ID_TOKEN:
    aud: https://code.it4i.cz/
```
* `tags`, by which the appropriate runner for the CI job is selected. Exactly 3 tags must be specified in the `tags` clause of the CI job. Two of these are `it4i` and `slurmjob`. The third is the name of the target cluster: `karolina`, `barbora`, or `compsys`.
```yaml
tags:
  - it4i
  - karolina  # or barbora, or compsys
  - slurmjob
```
* `variables`, where the `SCHEDULER_PARAMETERS` variable must be specified. This variable should contain all the arguments that the developer wants to pass to the `sbatch` command when the CI job is submitted, such as the project, queue, and partition. Some arguments are set automatically by the Jacamar CI driver: `--wait`, `--job-name`, and `--output`.
```yaml
variables:
  SCHEDULER_PARAMETERS: "-A ... -p ... -N ..."
```
Optionally, a custom build directory can also be specified. By default, the deployed GitLab runners store all files and directories for the CI job in the home directory of the user who triggers the associated CI pipeline (the repository is also cloned there, in a unique subpath). This behavior can be changed by specifying the `CUSTOM_CI_BUILDS_DIR` variable in the `variables` clause of the CI job.
```yaml
variables:
  SCHEDULER_PARAMETERS: ...
  CUSTOM_CI_BUILDS_DIR: /path/to/custom/build/dir/
```
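Putting the keywords above together, a complete job definition might look like the following sketch. The job name and the `script` commands are assumptions made for this example, and `<project-id>` and `<partition>` are placeholders that must be replaced with values valid for your project. `-A`, `-p`, and `-N` are the standard `sbatch` options for account, partition, and number of nodes.

```yaml
# Hypothetical job: replace the <...> placeholders before use.
check-node:
  id_tokens:
    SITE_ID_TOKEN:
      aud: https://code.it4i.cz/
  tags:
    - it4i
    - karolina
    - slurmjob
  variables:
    SCHEDULER_PARAMETERS: "-A <project-id> -p <partition> -N 1"
    CUSTOM_CI_BUILDS_DIR: /path/to/custom/build/dir/  # optional
  script:
    - whoami    # expected to show the user who triggered the pipeline
    - hostname  # expected to show the node on which the Slurm job runs
```

A small job like this can serve as a first check that the token, tags, and scheduler parameters are accepted before real build and test commands are added.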
A GitLab repository with examples of CI jobs can be found [here][10].
### Execution of CI Pipelines in Docker Containers
Every CI job in the project CI pipeline that is intended to be executed by one of the 5 runners configured with the Docker executor must have the following 2 keywords specified in its definition.
* `image`, where the name of the Docker image must be specified. Image requirements are listed [here][11]. See also [the description][12] in the CI/CD YAML syntax reference for information about all possible name formats. The runners are configured to pull the images from [Docker Hub][13].
```yaml
image: <image-name-in-one-of-the-accepted-formats>
# or
image:
  name: <image-name-in-one-of-the-accepted-formats>
```
* `tags`, by which one of the 5 runners is selected (the selection is done automatically). There are exactly 2 tags that must be specified in the `tags` clause of the CI job. Those are `centos7` and `docker`.
```yaml
tags:
  - centos7
  - docker
```
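Combining both keywords, a job for the Docker runners might look like the sketch below. The job name, the `gcc:13` image from Docker Hub, and the commands are assumptions made for this example. Any image that meets the requirements linked above could be used instead.

```yaml
# Hypothetical job: the image and commands are illustrative only.
build-in-container:
  image: gcc:13
  tags:
    - centos7
    - docker
  script:
    - gcc --version
    - make
```
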
[1]: https://docs.gitlab.com/ee/topics/build_your_application.html
[2]: https://ecp-ci.gitlab.io/docs/admin/jacamar/introduction.html
[3]: https://docs.gitlab.com/ee/ci/yaml/#cache
[4]: https://docs.gitlab.com/ee/ci/yaml/#artifacts
[5]: https://docs.gitlab.com/runner/executors/docker.html
[6]: https://docs.gitlab.com/ee/ci/caching/index.html
[7]: https://docs.gitlab.com/ee/ci/quick_start/
[8]: https://docs.gitlab.com/ee/ci/yaml/index.html
[9]: https://docs.gitlab.com/ee/ci/pipelines/settings.html#set-a-limit-for-how-long-jobs-can-run
[10]: https://code.it4i.cz/nie0056/it4i-cicd-example
[11]: https://docs.gitlab.com/ee/ci/docker/using_docker_images.html#image-requirements
[12]: https://docs.gitlab.com/ee/ci/yaml/index.html#image
[13]: https://hub.docker.com/