ArchInteriors 13, scene 8
ArchInteriors 13, scene 11
ArchExteriors 25, scene 2
ArchInteriors 33, scene 8
For this first set of tests, we used the regular CUDA version of V-Ray GPU Next, which doesn’t use the RT Cores yet. We measured the pure CUDA performance of the RTX 2080 and RTX 2080 Ti cards and compared them to the GeForce GTX 1080 Ti, which is quite popular for GPU rendering right now.
For the three scenes, the RT Cores provide a speedup of 1.78x, 1.53x and 1.47x respectively compared to the pure CUDA version. We expect these results to get better as we get closer to the official builds in the coming months.
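To read these factors as render time, note that a speedup of s means the RT Core build finishes in 1/s of the pure CUDA time. The short Python sketch below does that arithmetic for the three factors quoted above. The geometric mean is simply one reasonable way to summarize ratios; it is not a figure measured in these tests.

```python
from statistics import geometric_mean

# Speedup factors quoted above (RT Cores vs. pure CUDA), in scene order.
speedups = [1.78, 1.53, 1.47]

def time_saved(speedup: float) -> float:
    """Fraction of render time saved for a given speedup factor."""
    return 1.0 - 1.0 / speedup

for s in speedups:
    print(f"{s:.2f}x faster -> {time_saved(s):.1%} less render time")

# Summary across scenes (an illustrative aggregate, not a benchmark result).
print(f"geometric mean speedup: {geometric_mean(speedups):.2f}x")
```

In other words, a 1.78x speedup removes a little under half of the render time, while 1.47x removes roughly a third.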
For this set of tests, we used our Project Lavina real-time ray-tracing engine, which we first unveiled at SIGGRAPH 2018. It is based on the DXR ray-tracing extension for DirectX 12 and is written from the ground up for real-time ray-tracing performance. The engine is based entirely on ray tracing, including shadows, reflections, refractions and a couple of bounces for GI, with a denoising pass to smooth the results. There is no rasterization involved at all and the RT Cores are utilized quite heavily. DXR is not supported on older GPU generations like the GTX 1080 Ti, so right now we can compare only the two available GeForce RTX cards.
The results for the DXR tests are in frames per second for HD resolution, so higher results are better:
From these results, the RTX 2080 Ti card provides, on average, a 1.35x performance improvement over the RTX 2080 card; this translates into noticeably better responsiveness of the engine.
In addition to the RT Cores, the new RTX cards also support NVLink, which gives V-Ray GPU the ability to share the memory between two GPUs. This has some impact on rendering speed, which we aim to measure in this benchmark. To enable NVLink, the cards need to be connected with a special NVLink connector (also called an NVLink Bridge). There are two types of connectors for GeForce RTX cards: three-slot wide and four-slot wide, depending on how far apart the cards are physically. The NVLink Bridges for Quadro RTX cards will come in two-slot and three-slot widths.
Three- and four-slot NVLink connectors for GeForce RTX cards:
Two RTX 2080 Ti cards connected with a four-slot NVLink connector:
For NVLink to work on Windows, GeForce RTX cards must be put in SLI mode from the NVIDIA control panel (this is not required for Quadro RTX cards, nor is it needed on Linux, and it’s not recommended for older GPUs). If the SLI mode is disabled, NVLink will not be active. This means that the motherboard must support SLI, otherwise you will not be able to use NVLink with GeForce cards. Also note that in an SLI group, only monitors connected to the primary GPU will work. Additionally, if two GeForce GPUs are linked in SLI mode, at least one of them must have a monitor attached (or a dummy plug) so that Windows can recognize them (this is not required for Quadro RTX cards nor is it necessary on Linux).
Screenshot of NVIDIA control panel with SLI mode enabled (SLI mode is required for NVLink with GeForce RTX cards on Windows):
The NVLink speed is also different between the RTX 2080 and the RTX 2080 Ti cards, so we expect different performance hits from using NVLink.
In the tests below, we rendered several scenes with the cards in SLI mode versus non-SLI mode to see what the performance impact of NVLink is. We used the regular CUDA version of V-Ray GPU for these tests. In some instances, the scene failed to render in non-SLI mode because it did not fit in the memory of a single GPU.
Note that the available memory for GPU rendering is not exactly doubled with NVLink; V-Ray GPU needs to duplicate some data on each GPU for performance reasons, and it needs to reserve some memory on each GPU as a scratchpad for calculations during rendering. Still, using NVLink allows us to render much larger scenes than would fit on each GPU alone.
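As a rough illustration of why the pooled capacity falls short of double, the sketch below models each GPU as holding its own copy of the duplicated data plus a scratchpad reservation, with only the remainder available for data stored once. The function name, the model itself, and the duplicated and scratchpad sizes are hypothetical placeholders, not V-Ray GPU internals or measured values. Only the 11 GB capacity of the RTX 2080 Ti is a published figure.

```python
def pooled_scene_capacity_gb(per_gpu_gb, duplicated_gb, scratch_gb, num_gpus=2):
    """Estimate the largest scene (GB) that fits across NVLink-connected GPUs.

    Hypothetical model: every GPU keeps its own copy of `duplicated_gb` and
    reserves `scratch_gb`; whatever is left holds data stored only once.
    """
    unique_per_gpu = per_gpu_gb - duplicated_gb - scratch_gb
    if unique_per_gpu < 0:
        raise ValueError("duplicated data and scratchpad exceed GPU memory")
    # The scene is the duplicated part (counted once) plus the unique data
    # spread over all GPUs.
    return duplicated_gb + num_gpus * unique_per_gpu

# Illustrative numbers only: 11 GB cards, 2 GB duplicated, 1 GB scratchpad.
single = pooled_scene_capacity_gb(11, 2, 1, num_gpus=1)
pooled = pooled_scene_capacity_gb(11, 2, 1, num_gpus=2)
print(f"single GPU: {single} GB, NVLink pair: {pooled} GB ({pooled / single:.2f}x)")
```

Under these made-up numbers the pair holds noticeably more than one card but less than twice as much, which matches the qualitative behavior described above.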
The last scene, Lake Lavina, could only be rendered in NVLink mode with the RTX 2080 Ti cards and failed to render in the other testing scenarios due to insufficient GPU memory. As can be seen, NVLink does introduce some performance hit compared to rendering on the GPUs separately, but it allows the rendering of far larger scenes. In many cases, the slowdown is only a few percent. Fine-tuning the way data is distributed between the two cards may provide even better performance in the future.
Important note: At the time of this writing, the regular GPU memory reporting API provided by NVIDIA does not appear to work correctly in SLI mode. This means that programs like GPU-Z, MSI Afterburner, nvidia-smi, etc. might not show accurate memory usage for each GPU. Knowing this, we have modified the memory statistics shown in the V-Ray frame buffer so you can track actual GPU memory usage there. We expect NVIDIA will correct these reporting issues in the future.
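To compare against the numbers in the V-Ray frame buffer, the per-GPU figures that nvidia-smi reports can be read with a short script. This is a minimal sketch: `parse_gpu_memory` and `query_gpu_memory` are hypothetical helper names, and the script assumes nvidia-smi is on the PATH and that GPU names contain no commas. As noted above, the values it prints may be inaccurate in SLI mode.

```python
import subprocess

# Standard nvidia-smi query flags; with "nounits", memory values are in MiB.
QUERY = ["nvidia-smi",
         "--query-gpu=index,name,memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV output into a list of per-GPU dicts (MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, used, total = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name,
                     "used_mib": int(used), "total_mib": int(total)})
    return gpus

def query_gpu_memory():
    """Run nvidia-smi and return its per-GPU memory report."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(out.stdout)

if __name__ == "__main__":
    try:
        for gpu in query_gpu_memory():
            print(f"GPU {gpu['index']} ({gpu['name']}): "
                  f"{gpu['used_mib']} / {gpu['total_mib']} MiB")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"nvidia-smi is not available: {err}")
```

Keeping the parsing separate from the subprocess call makes it easy to check the script against a saved nvidia-smi report on a machine without an NVIDIA GPU.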