By Maxim Vorontsov — Jul 3, 2024

My experience with Vortex GPGPU: RISC-V GPGPU on FPGA

Picture Source: Blaise Tine et al. Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics

Introduction

RISC-V is an open instruction set architecture (ISA): the specification is freely available, and it benefits from a large community of developers and users who contribute to its development and improvement.

General-Purpose Graphics Processing Units (GPGPUs) are specialized processors designed to handle a wide range of computational tasks beyond just rendering graphics.
They are based on the architecture of GPUs (Graphics Processing Units), which were originally designed for rendering images and animations in video games and movies.
However, GPUs have been adapted to perform general-purpose computations, making them highly versatile for various applications.

GPGPUs play a pivotal role in the field of Artificial Intelligence (AI) due to their ability to efficiently process large amounts of data and perform complex calculations required for AI algorithms.

An FPGA (Field-Programmable Gate Array) is a semiconductor device composed of an array of configurable logic blocks (CLBs) and programmable interconnects. Its functionality can be customized after manufacturing by loading a binary file called a bitstream (sometimes loosely referred to as firmware), which specifies the configuration of the CLBs, interconnects, and other components for the desired digital circuit.

Although building a GPGPU on an FPGA is not optimal for performance or power consumption, it lets developers create hardware that is specifically designed for the tasks they need.

Vortex GPGPU

At some point I discovered a fascinating project, Vortex GPGPU, and decided to take a look at it. Vortex is an open-source RISC-V GPGPU written in Verilog, with an FPGA implementation.

While digging into the project, I found plenty of articles and academic research papers dedicated to it, which I recommend reading as well.

Project building

Attempt 1: master branch

Initially I tried to work with the master branch, following the instructions in VortexGPGPU's README.md file. It mostly went well, except that I needed to manually patch the $VORTEX_ROOT/tests/opencl/common.mk file:

-POCL_CC_PATH ?= $(TOOLDIR)/pocl/compiler
-POCL_RT_PATH ?= $(TOOLDIR)/pocl/runtime
+POCL_CC_PATH ?= $(TOOLDIR)/compiler
+POCL_RT_PATH ?= $(TOOLDIR)/runtime

(However, I later found that this "fix" doesn't address the root cause; it merely allows the project to build. More on this below.)
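
Before applying a patch like this, it can help to check which directory layout the toolchain install actually produced. The following is only a minimal sketch: pocl_layout is a hypothetical helper of mine, not part of the Vortex scripts, and it assumes TOOLDIR points at the toolchain directory used by common.mk.

```shell
# pocl_layout DIR: report which POCL layout DIR contains.
# Hypothetical helper for illustration; not part of the Vortex repository.
pocl_layout() {
  if [ -d "$1/pocl/compiler" ] && [ -d "$1/pocl/runtime" ]; then
    echo nested    # DIR/pocl/{compiler,runtime}: what the unpatched common.mk expects
  elif [ -d "$1/compiler" ] && [ -d "$1/runtime" ]; then
    echo flat      # DIR/{compiler,runtime}: what the patched common.mk expects
  else
    echo missing   # neither layout found
  fi
}

# Only run the check if TOOLDIR is set in the current environment.
if [ -n "${TOOLDIR:-}" ]; then
  pocl_layout "$TOOLDIR"
fi
```

If this reports flat, the unpatched paths in common.mk cannot resolve, which matches what I saw.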

In any case, at this point the build was successful, and we can try to run some tests!

However, while running the first VortexGPGPU test from the "Quick demo running vecadd" section of the README, I quickly ran into another issue:

$ ./ci/blackbox.sh --cores=2 --app=vecadd
...
Workload size=64
invalid caps id: 6
Aborted
make: *** [../common.mk:94: run-simx] Error 134

Interestingly, I wasn't alone: the issue had been reported by others too, unfortunately without any resolution.

After some more digging, I came to the conclusion that the capabilities interface in runtime/*/vortex.cpp is broken or unfinished, and because of that it was not obvious how to resolve the issue quickly. At that point the issue looked irreparable to me, and I gave up for the day.

Attempt 2: turning to the develop branch

After some thinking, the next day I decided to look at the branch list, and found a more active branch: develop!

So I started working on the develop branch, which at that time was at commit daf1360d83fae9e8725689049a6f3d1b38687629. This is important to mention, because it later turned out that more recent commits break the tests again. :)

On the develop branch I still needed the POCL fix mentioned above, and with it applied the branch built successfully and... the vector addition example worked perfectly fine!

$ ./ci/blackbox.sh --cores=2 --app=vecadd
Workload size=64
[VXDRV] WAIT
[VXDRV] DCR_WRITE: addr=0x1, value=0x80000000
[VXDRV] WAIT
[VXDRV] DCR_WRITE: addr=0x2, value=0x0
[VXDRV] WAIT
[VXDRV] DCR_WRITE: addr=0x3, value=0x0
[VXDRV] WAIT
[VXDRV] DCR_WRITE: addr=0x4, value=0x0
[VXDRV] WAIT
[VXDRV] DCR_WRITE: addr=0x5, value=0x0
[VXDRV] WAIT
[VXDRV] DCR_WRITE: addr=0x5, value=0x0
[VXDRV] MEM_ALLOC: size=1048580
[VXDRV] COPY_TO_DEV: dev_addr=0x100040, host_addr=0x0x7ffc23d15a54, size=4
Create context
Allocate device buffers
Create program from kernel source
Upload source buffers
[VXDRV] MEM_ALLOC: size=256
[VXDRV] COPY_TO_DEV: dev_addr=0x100080, host_addr=0x0x558cefaba0d0, size=256
[VXDRV] MEM_ALLOC: size=256
[VXDRV] COPY_TO_DEV: dev_addr=0x100180, host_addr=0x0x558cefab1420, size=256
Execute the kernel
[VXDRV] MEM_ALLOC: size=256
[VXDRV] MEM_ALLOC: size=76
[VXDRV] COPY_TO_DEV: dev_addr=0x100380, host_addr=0x0x558cefab7c90, size=76
[VXDRV] WAIT
[VXDRV] DCR_READ: addr=0x1, value=0x80000000
[VXDRV] WAIT
[VXDRV] DCR_READ: addr=0x2, value=0x0
[VXDRV] COPY_TO_DEV: dev_addr=0x80000000, host_addr=0x0x558cef7fa048, size=4764
[VXDRV] START: krnl_addr=0x80000000, args_addr=0x100380
[VXDRV] WAIT
36: [sim] run()
[VXDRV] MEM_FREE: dev_addr=0x100380
[VXDRV] COPY_FROM_DEV: dev_addr=0x100040, host_addr=0x0x7ffc23d1718c, size=4
Elapsed time: 3054 ms
Download destination buffer
[VXDRV] COPY_FROM_DEV: dev_addr=0x100280, host_addr=0x0x558cefab1530, size=256
Verify result
PASSED!
[VXDRV] MEM_FREE: dev_addr=0x100080
[VXDRV] MEM_FREE: dev_addr=0x100180
[VXDRV] MEM_FREE: dev_addr=0x100280
[VXDRV] COPY_FROM_DEV: dev_addr=0xff004040, host_addr=0x0x558cefab1530, size=256
PERF: core0: instrs=5403, cycles=14527, IPC=0.371928
[VXDRV] COPY_FROM_DEV: dev_addr=0xff004140, host_addr=0x0x558cefab1530, size=256
PERF: core1: instrs=5406, cycles=14526, IPC=0.372160
PERF: instrs=10809, cycles=14527, IPC=0.744063

Hooray!
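
As a sanity check on the PERF lines above, IPC is simply instructions divided by cycles. The sketch below recomputes the three values from the log. The ipc function is a hypothetical helper of mine; the aggregate formula (summed instructions over the larger per-core cycle count) is inferred from the printed numbers, not taken from the Vortex sources.

```shell
# ipc INSTRS CYCLES: print instructions per cycle with six decimals.
# Hypothetical helper; LC_ALL=C keeps the decimal separator a dot.
ipc() {
  LC_ALL=C awk -v i="$1" -v c="$2" 'BEGIN { printf "%.6f\n", i / c }'
}

ipc 5403 14527               # core0
ipc 5406 14526               # core1
# Aggregate: total instructions over the larger of the two cycle counts.
ipc $((5403 + 5406)) 14527
```

The results agree with the IPC values reported in the log, which suggests the two cores ran almost perfectly in parallel.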

Back to the POCL fix

One thing was still unclear: why was the POCL fix needed in the first place?
At first, it looked like the VortexGPGPU authors had just changed the compiler/runtime paths
but had not updated the make scripts. But since the issue had existed for a long time,
a simple oversight seemed unlikely.

Funnily enough, while debugging the issue I found that GitHub user gedatsu217 had faced the same problem and proposed a fix that similarly changes the POCL paths. So I wasn't alone after all!

But why didn't the authors face the issue? I started looking more closely at the toolchain_install.sh script, suspecting that it either has some special options or contains a bug. And yes, it does contain a very subtle bug, which took me some time to identify, although in retrospect it is quite obvious.

For example, with the --all argument, if TOOLDIR does not exist, cp -r pocl $TOOLDIR effectively copies pocl as $TOOLDIR, instead of copying the pocl directory into $TOOLDIR. That is, the resulting structure is tools/runtime and tools/compiler instead of tools/pocl/runtime and tools/pocl/compiler.
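
The cp behaviour is easy to reproduce in a scratch directory. The sketch below uses made-up directory names (tools_missing, tools_existing, tools_fixed) and an empty stand-in for the pocl package. The last case shows one possible way to make the copy independent of whether the destination exists; I am not claiming this is what the upstream fix does.

```shell
#!/bin/sh
# Shows how `cp -r SRC DST` depends on whether DST already exists.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Stand-in for the unpacked pocl package (subdirectory names from the post).
mkdir -p pocl/compiler pocl/runtime

# Case 1: destination is missing, so the copy of pocl BECOMES the destination.
cp -r pocl tools_missing
ls tools_missing      # lists compiler and runtime, no pocl level

# Case 2: destination exists, so pocl is copied INTO it.
mkdir tools_existing
cp -r pocl tools_existing
ls tools_existing     # lists pocl

# Case 3: creating the destination first gives the nested layout either way.
mkdir -p tools_fixed
cp -r pocl tools_fixed/
ls tools_fixed        # lists pocl
```

Case 1 is the situation that produces tools/runtime and tools/compiler on a fresh machine.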

The VortexGPGPU authors probably installed the tools into an already existing directory, so the bug never showed up for them.

In any case, the issue has a fix in a GitHub PR; hopefully it will save others some time and effort!

So, at this point we have VortexGPGPU fully up and running -- which is awesome!

Next, as I am mostly interested in the hardware aspect of the project, I will take a little dip into the RTL of VortexGPGPU.