This will further accelerate Ray Tracing in future GPUs from AMD and NVIDIA



Ray Tracing has undoubtedly become the future of real-time rendering, stepping in where rasterization cannot solve certain visual problems and has nothing more to give. But its implementation is proving difficult, and performance is far from ideal even on the most advanced GPUs. This is where the next step comes in: Coherent Ray Tracing. What does it contribute, why is it necessary, and what does it consist of?

The Ray Tracing used in games today is what we call hybrid rendering, in which the coherent part of the scene is rendered using the rasterization algorithm and the incoherent part through ray tracing. So, despite what the marketing of the different companies says, the era in which games are rendered purely through ray tracing has not yet arrived.


To make this statement more understandable, let's say the scene is rendered using rasterization while completely ignoring indirect lighting, which is produced when light falls on an object and the object reflects that light in new directions.


Ray Tracing renders the incoherent elements of the scene more accurately, quickly and efficiently than rasterization, but there is a performance problem associated with rendering that incoherent part that makes its computational cost very high. That is precisely the next big challenge for companies like NVIDIA and AMD: optimizing the performance of the incoherent part of the scene in ray tracing.

Coherent Ray Tracing and Incoherent Ray Tracing


Let's put aside the hybrid rendering used in games for the moment and turn our attention to pure Ray Tracing, where rays can be classified in two different ways.

  • Coherent rays are those that leave the camera and follow the view frustum of the scene, and those emitted directly by a primary light source; that is, rays that have not been generated by the impact of a previous ray on an object. In hybrid rendering these rays are not traced, since rasterization already covers them.
  • Incoherent rays are those that originate from the impact of a previous ray on an object.
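To make the distinction concrete, here is a minimal C++ sketch, with types and names of our own invention rather than from any real graphics API, that generates both kinds of rays: a coherent primary ray shot from the camera through the view frustum, and an incoherent bounce ray born at the point where a previous ray hit something.

```cpp
#include <cmath>
#include <cstdlib>

struct Vec3 { float x, y, z; };

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

struct Ray { Vec3 origin, dir; };

// Coherent: primary rays leave the camera and march through the view
// frustum in lockstep; neighboring pixels produce almost identical rays.
Ray primaryRay(int px, int py, int width, int height) {
    float u = (px + 0.5f) / width  * 2.0f - 1.0f;   // screen x in [-1, 1]
    float v = (py + 0.5f) / height * 2.0f - 1.0f;   // screen y in [-1, 1]
    return { {0.0f, 0.0f, 0.0f}, normalize({u, v, -1.0f}) };
}

// Incoherent: a bounce ray is born where a previous ray hit an object and
// scatters in an essentially arbitrary direction.
Ray bounceRay(Vec3 hitPoint) {
    auto r = [] { return std::rand() / (float)RAND_MAX * 2.0f - 1.0f; };
    return { hitPoint, normalize({r(), r(), r()}) };
}
```

Two primary rays for adjacent pixels differ only by a tiny angle, while two bounce rays from the same surface can head anywhere; that difference is the whole story of this article.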
At a visual level, if we talk only about direct lighting, there is no difference in quality between rendering a scene with rasterization and rendering it with ray tracing. Add to this the fact that all game engines work via rasterization and you will understand why ray tracing is not used to render the coherent part of the scene.

The performance of the incoherent part of Ray Tracing on a GPU

The problem is that, although ray tracing is much better than rasterization at rendering the incoherent part of a scene, computing incoherent rays performs far worse than computing the coherent rays of the scene.

The reason for this disparity in performance is that not all the scene's information fits in the GPU caches, which are what the ray intersection units access. Incoherent rays do not hit the same area of the scene and therefore do not invoke the same shader, which stalls a huge number of threads on the GPU and causes performance to drop.


This is a problem that the film industry solves with ray reordering algorithms, and it can do so easily because the position of the camera is known in advance, so all the incoherent rays of the scene can be converted into coherent rays through a sorting algorithm.

When rendering a movie they have all the time in the world: they do not have to produce an image every few milliseconds, and the sorting algorithms exist mainly to save time and, with it, the cost of their powerful render farms. The situation in video games is different.
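As a rough illustration of what such a reordering pass does, here is a toy C++ sketch, a simplification of our own rather than any studio's production algorithm, that sorts rays by the octant of their direction so that rays travelling the same way through the scene are processed together.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };

// Rays whose directions fall in the same octant travel through roughly the
// same region of the scene, so processing them together keeps the relevant
// geometry and acceleration-structure nodes hot in the cache.
uint32_t directionOctant(const Ray& r) {
    return (r.dir.x >= 0.0f ? 1u : 0u)
         | (r.dir.y >= 0.0f ? 2u : 0u)
         | (r.dir.z >= 0.0f ? 4u : 0u);
}

// Offline renderers can afford a full sorting pass like this before shading;
// a real-time renderer has only a few milliseconds per frame for everything.
void reorderRays(std::vector<Ray>& rays) {
    std::sort(rays.begin(), rays.end(),
              [](const Ray& a, const Ray& b) {
                  return directionOctant(a) < directionOctant(b);
              });
}
```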


In a video game, where each frame is unique, this cannot be done; moreover, it would take extremely powerful hardware for reordering the rays of the scene not to hurt its high frame rate. This is, right now, the next great challenge for GPU manufacturers, and a crucial one if Ray Tracing is not to stagnate in terms of performance.

Current GPUs are not designed for Incoherent Ray Tracing


The graphics processors we use in our PCs were designed for rasterization, a rendering algorithm that benefits greatly from the spatial and temporal locality of memory accesses.

Most of the work the GPU carries out during rasterization, especially during the pixel shader stage, has the particularity that the data of the pixels and triangles being processed is shared with their closest neighbors in the scene.

So there is a very good chance that, if the GPU accesses the data for a group of triangles and pixels and pulls everything nearby in memory into the caches, it will already have the data for the neighboring pixels and triangles. Any changes must therefore aim to exploit this characteristic common to all GPUs.
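That locality can be sketched in a few lines of C++; this is a deliberately simplified, hypothetical example, but it shows why shading the screen tile by tile lets consecutive pixels reuse the cache lines their neighbors already pulled in.

```cpp
#include <vector>

// Rasterization walks the screen in small tiles: every pixel shaded within
// a tile touches the same triangle data and adjacent framebuffer memory, so
// after the first cache miss its neighbors hit in the cache.
void shadeTiled(std::vector<float>& framebuffer, int width, int height) {
    constexpr int kTile = 8;  // hypothetical tile size
    for (int ty = 0; ty < height; ty += kTile)
        for (int tx = 0; tx < width; tx += kTile)
            for (int y = ty; y < ty + kTile && y < height; ++y)
                for (int x = tx; x < tx + kTile && x < width; ++x)
                    framebuffer[y * width + x] = 1.0f;  // stand-in for the pixel shader
}
```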

The spatial data structure


In order to speed up Ray Tracing, a spatial data structure is built; this structure is nothing more than an ordered map of the positions of the objects in the scene.

The scene is converted into a kind of cube with several subdivisions that indicate where the objects are, and there are two types:
  • The scene is divided into regular blocks of space.
  • The scene is divided only into those parts where there is geometry.
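A minimal C++ sketch of these two kinds of structures, with field names invented purely for illustration, could look like this:

```cpp
#include <vector>

struct AABB { float min[3], max[3]; };  // axis-aligned bounding box

// Type 1: a uniform grid divides the scene into regular blocks of space,
// whether or not each block contains anything.
struct UniformGrid {
    int cells[3];                               // resolution per axis
    std::vector<std::vector<int>> cellObjects;  // object indices per cell
};

// Type 2: a BVH (bounding volume hierarchy) only subdivides where there is
// geometry; every node wraps its contents in a bounding box, forming a tree.
struct BVHNode {
    AABB bounds;
    int  left  = -1;     // child indices; -1 on both marks a leaf
    int  right = -1;
    int  firstPrim = 0;  // range of primitives when this node is a leaf
    int  primCount = 0;
};
```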

In games, the second type has been chosen in the form of the BVH, mainly because NVIDIA includes dedicated hardware in its GPUs to traverse this tree data structure quickly. There are, in turn, two types of BVH:

  • Static BVHs need to be rebuilt from scratch after any object in the scene is modified; however, once built, they speed up the rendering of the scene.
  • Dynamic BVHs allow objects to be updated individually, so rebuilding the BVH takes much less time, but in return the subsequent rendering time increases.
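The trade-off between the two types can be seen in the "refit" operation that makes a BVH dynamic. The following is a sketch of the general idea only, not any vendor's actual implementation:

```cpp
#include <algorithm>
#include <vector>

struct AABB { float min[3], max[3]; };

AABB merge(const AABB& a, const AABB& b) {
    AABB out;
    for (int i = 0; i < 3; ++i) {
        out.min[i] = std::min(a.min[i], b.min[i]);
        out.max[i] = std::max(a.max[i], b.max[i]);
    }
    return out;
}

struct BVHNode {
    AABB bounds;
    int  left = -1, right = -1;  // -1 marks a leaf
};

// Dynamic update ("refit"): walk the tree bottom-up and re-expand each
// node's box so it covers its children again. No node is re-created, so the
// update is fast, but the boxes grow looser over time, and looser boxes
// mean more wasted intersection tests during rendering.
void refit(std::vector<BVHNode>& nodes, int idx) {
    BVHNode& n = nodes[idx];
    if (n.left < 0) return;  // leaf: its bounds come from its geometry
    refit(nodes, n.left);
    refit(nodes, n.right);
    n.bounds = merge(nodes[n.left].bounds, nodes[n.right].bounds);
}
```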
And why does this matter? If we want to order the rays according to their trajectory through the scene, we first need a map of that same scene in which to store the rays' trajectories.

Mapping the path of the rays


One solution is to have the rays pre-traverse the scene without modifying it, simply to find out which objects each ray will affect and which rays will cross the scene. Once this pre-traversal is finished, the rays that affect a particular part of the scene are stored together in a memory buffer, although they are not related to each other.

Although there is no direct relationship between the different rays in the same place, there is a spatial relationship, which helps exploit the common architecture of all GPUs when rendering a scene with incoherent rays. The idea is to pre-render the scene without computing the shaders that alter the color values of the different objects; we are simply interested in knowing which parts of the scene each ray will affect.
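In code, those per-region buffers might look like the following C++ sketch, where the bucketing function is a deliberately naive placeholder for the real pre-traversal of the spatial data structure:

```cpp
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };

// Toy stand-in for the pre-traversal: a real pass would walk the spatial
// data structure; here we just bucket by the sign pattern of the direction.
uint32_t regionOf(const Ray& r, size_t regionCount) {
    uint32_t oct = (r.dir.x >= 0.0f ? 1u : 0u)
                 | (r.dir.y >= 0.0f ? 2u : 0u)
                 | (r.dir.z >= 0.0f ? 4u : 0u);
    return oct % static_cast<uint32_t>(regionCount);
}

// One buffer per region of the scene: rays stored in the same bucket will
// hit the same region, so they share cache lines when they are shaded.
void prePass(const std::vector<Ray>& rays,
             std::vector<std::vector<Ray>>& buckets) {
    for (const Ray& r : rays)
        buckets[regionOf(r, buckets.size())].push_back(r);
    // Shading then runs bucket by bucket instead of ray by ray.
}
```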

Rays pre-traversing the scene


The rays that pre-traverse the scene execute only one shader, the Ray Generation Shader, which indicates that a given object in the scene can generate an indirect ray of light. As for the rays themselves, they carry a series of parameters to prevent them from bouncing around the scene forever like ping-pong balls.

To do this, a series of parameters must be associated with the rays and the objects, namely the following:
  • A constant giving the number of bounces a ray can make in the scene; once this number of bounces has been reached, the ray stops bouncing regardless of any other condition.
  • A refraction constant in each material, ranging from 0 to 1; at each intersection the ray's energy value is multiplied by this constant, and when a ray's energy drops low enough, it is discarded.
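Put together, the two stopping conditions amount to something like this small C++ sketch, where the constants are made-up values chosen only for illustration:

```cpp
struct Vec3 { float x, y, z; };

struct Ray {
    Vec3  origin, dir;
    int   bounces = 0;    // bounces taken so far
    float energy  = 1.0f; // attenuated at every intersection
};

constexpr int   kMaxBounces = 4;     // hypothetical per-scene bounce limit
constexpr float kMinEnergy  = 0.05f; // below this the ray is discarded

// Called at each intersection against a material whose refraction constant
// lies in [0, 1]; returns true while the ray may still keep bouncing.
bool continueBouncing(Ray& r, float refractionConstant) {
    r.bounces += 1;
    r.energy  *= refractionConstant;  // each hit drains energy
    return r.bounces < kMaxBounces && r.energy >= kMinEnergy;
}
```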

With this we can already make the rays bounce through the scene in a preliminary pass, which helps order the data, since it tells us which parts of the scene the different rays will affect. This will greatly speed up performance, but it requires two changes to the hardware.

Embedded memory to store the spatial data structure


What remains is to store the entire spatial data structure, along with the pre-traversal data, in a memory as close as possible to the processor. But this structure cannot fit in the limited caches of a few megabytes; not even the Infinity Cache, despite its 128 MB, would be able to hold such an amount of data.

What is needed is a way to place as much memory as possible close to the GPU to store the entire spatial data structure. This memory would not be a cache and would not be part of the processor's memory hierarchy; it would simply hold the entire spatial data structure.

One way to achieve this would be to use SRAM connected vertically to the GPU, although its implementation could come with additional extras that take advantage of its future presence in GPUs. There are other ways to do it as well; it could even take the form of a new high-density last-level cache.

The next fixed-function units


There will be two of them, and they will be crucial for increasing performance:
  • The first will be in charge of generating the spatial data structure from the position of the geometry in the scene.
  • The second will take note of where each ray hits during the pre-traversal, before Ray Tracing is applied.
Both units will take advantage of the enormous embedded memory that GPUs will include to store the scene's spatial data structure. Thanks to them we will see a great increase in performance as far as Ray Tracing is concerned.


Units of this kind are already found in hardware such as Imagination's PowerVR Wizard, in the form of its Scene Hierarchy Generator and Coherency Engine. Their usefulness has been more than demonstrated, though not in extremely complex environments, where the implementation of embedded memory will be necessary.
