The HLS of these simple applications **can take up to 2 hours** to finish.
To allow the application to use real hardware, we have to synthesize the FPGA design for the accelerator. This can be done by repeating the same steps used to build the kernels in emulation mode, but with `IT4I_BUILD_MODE` set to `hw`, like so:
```console
$ export IT4I_BUILD_MODE=hw
```
The host application binary can be reused, but it has to be run without `XCL_EMULATION_MODE` set:
```console
$ <application>
```
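
Since the same host binary serves both emulation and hardware runs, it can help to have the host report which mode the environment requests. The following is a minimal sketch, not part of the original samples: `emulation_mode_requested` is a hypothetical helper name, and the only assumption is that emulation is selected through the `XCL_EMULATION_MODE` environment variable mentioned above.

```cpp
#include <cstdlib>

// Hypothetical helper (not part of XRT): returns true when the
// XCL_EMULATION_MODE environment variable is set to a non-empty value,
// i.e. when the run is expected to use emulation rather than real hardware.
bool emulation_mode_requested() {
    const char* mode = std::getenv("XCL_EMULATION_MODE");
    return mode != nullptr && mode[0] != '\0';
}
```

A host application could call this at startup and print a warning when a hardware run is intended but the variable is still set.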
## Sample Applications
The first two samples illustrate the two main approaches to building FPGA-accelerated applications on the Xilinx platform: **XRT** and **OpenCL**.
The final example combines **HIP** with **XRT** to show the basics needed to build an application that uses both GPU and FPGA accelerators.
### Using HLS and XRT
...
## Hybrid GPU and FPGA Application (HIP+XRT)
This simple 8-bit quantized dot product (`R = sum(X[i]*Y[i])`) example illustrates a basic approach to using both GPU and FPGA accelerators in a single application.
The application takes the simplest approach, where both synchronization and data transfers are handled explicitly by the host.
The HIP toolchain is used to compile the single-source host/GPU code as usual, but the code is also linked with the XRT runtime, which allows the host to control the FPGA accelerator.
The FPGA kernels are built separately, as in the previous examples.
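
For orientation, the computation `R = sum(X[i]*Y[i])` can be sketched as a plain CPU reference. This is an illustrative sketch only, not the sample's actual code: the function name `dot_q8` is hypothetical, and it assumes signed 8-bit inputs with a 32-bit accumulator.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU-only reference for R = sum(X[i] * Y[i]).
// Assumptions (not taken from the sample): inputs are signed 8-bit values
// and the result is accumulated in a 32-bit integer.
std::int32_t dot_q8(const std::vector<std::int8_t>& x,
                    const std::vector<std::int8_t>& y) {
    std::int32_t r = 0;
    // Use the shorter length so mismatched inputs never read out of bounds.
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Widen both operands before multiplying so the product fits.
        r += static_cast<std::int32_t>(x[i]) * static_cast<std::int32_t>(y[i]);
    }
    return r;
}
```

A reference like this can be useful for checking the results returned by the GPU and FPGA paths against a known-good value on the host.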
The host/GPU HIP code should be saved as `main.hip`