Software emulation mode is preferable for development, as HLS synthesis is very time-consuming. To build the following applications in this mode, we set:
```console
$ export IT4I_BUILD_MODE=sw_emu
```
and run each application with `XCL_EMULATION_MODE` set to `sw_emu`:
```console
$ XCL_EMULATION_MODE=sw_emu <application>
```
### Hardware Synthesis Mode
!!! note
    The HLS of these simple applications **can take up to 2 hours** to finish.
To allow the application to utilize real hardware, we have to synthesize the FPGA design for the accelerator. This can be done by repeating the same steps used to build the kernels in emulation mode, but with `IT4I_BUILD_MODE` set to `hw`:
```console
$ export IT4I_BUILD_MODE=hw
```
The host application binary can be reused, but it has to be run without `XCL_EMULATION_MODE`:
```console
$ <application>
```
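The mode selection described above can also be made visible from the host side. The following is a minimal sketch, assuming the host only inspects the value of the `XCL_EMULATION_MODE` environment variable; the helper name `describe_mode` is hypothetical and is not part of XRT.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of XRT): maps the value of the
// XCL_EMULATION_MODE environment variable to a short description.
// A null pointer or an empty string means the variable is not set,
// which is how the application is run on real hardware.
std::string describe_mode(const char* value) {
    if (value == nullptr || std::string(value).empty()) {
        return "hardware";
    }
    const std::string mode(value);
    if (mode == "sw_emu") {
        return "software emulation";
    }
    // Any other value is reported as-is rather than interpreted.
    return "other mode: " + mode;
}

// Convenience wrapper that reads the variable from the environment.
std::string current_mode() {
    return describe_mode(std::getenv("XCL_EMULATION_MODE"));
}
```

Printing the result of `current_mode()` at the start of a host application is one way to avoid accidentally launching a `hw` build with the emulation variable still exported.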
## Sample Applications
The first two samples illustrate the two main approaches to building FPGA-accelerated applications on the Xilinx platform: **XRT** and **OpenCL**.
The final example combines **HIP** with **XRT** to show the basics necessary to build an application that utilizes both GPU and FPGA accelerators.
### Using HLS and XRT
The applications are typically separated into a host side and an accelerator/kernel side.
This process should result in `vadd.xclbin`, which can be loaded by the host-side application.
### Running the Application
With both the host application and the kernel binary at hand, the application can be launched in emulation mode as
```console
$ XCL_EMULATION_MODE=sw_emu ./host vadd.xclbin
```
## Building Application for Real HW
So far we have assumed software emulation (`sw_emu`); however, the same steps can be used to build the application for real hardware.
To do so, we have to rebuild the kernel binaries in `hw` mode by setting `IT4I_BUILD_MODE=hw`. With the kernels compiled this way, the application is launched without `XCL_EMULATION_MODE`:
```console
$ ./host vadd.xclbin
```
## Hybrid GPU and FPGA Application (HIP+XRT)
This simple 8-bit quantized dot product (`R = sum(X[i]*Y[i])`) example illustrates a basic approach to utilizing both GPU and FPGA accelerators in a single application.
The application takes the simplest approach, where both synchronization and data transfers are handled explicitly by the host.
The HIP toolchain is used to compile the single-source host/GPU code as usual, but the code is also linked with the XRT runtime, which allows the host to control the FPGA accelerator.
The FPGA kernels are built separately, as in the previous examples.
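When checking accelerator results, a plain CPU reference for `R = sum(X[i]*Y[i])` can be useful. The following is a minimal sketch, not the code used in this example: it assumes signed 8-bit inputs and a 32-bit accumulator, and the function name `dot_i8` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference for R = sum(X[i] * Y[i]) over signed 8-bit inputs
// (signedness and accumulator width are assumptions of this sketch).
// Products are accumulated in 32 bits, because a single int8 product
// can already reach 16384 and would not fit into 8 bits.
std::int32_t dot_i8(const std::vector<std::int8_t>& x,
                    const std::vector<std::int8_t>& y) {
    // Only the common prefix is used if the lengths differ.
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<std::int32_t>(x[i]) * static_cast<std::int32_t>(y[i]);
    }
    return acc;
}
```

Comparing the value returned by the accelerators against such a reference on a small input is a simple way to validate both the emulated and the hardware build.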
The host/GPU HIP code should be saved as `main.hip`