RTX 3090 takes on AAA games at 8K resolution again: exquisite, clear, and sharp


At its launch event in September this year, NVIDIA CEO Jensen Huang enthusiastically introduced a monster of a graphics card, the RTX 3090, and said that this card makes 8K gaming possible. To keep 8K games running smoothly, NVIDIA equipped the RTX 3090 with 24GB of GDDR6X video memory and paired it with the new DLSS Ultra Performance mode, which puts the AI Tensor Cores to work to raise the card's performance at 8K resolution. But as the old saying goes, seeing is believing. How striking is the experience at this ultra-high resolution? How good is 8K gaming at this stage? Can games still be played at 8K with ray tracing turned on? To answer these questions, I decided to try 8K gaming for myself and see whether the picture is as sharp as it is said to be.

What is 8K? What's so good about high-resolution games?

Before we start today's 8K experience, we should first understand what makes 8K attractive. 8K is shorthand for a resolution of 7680x4320. That is equivalent to sixteen 1080p (1920x1080) images, or four images at the current mainstream 4K (3840x2160) standard. To use an analogy: if 8K is a full sheet of paper, then 1080p is a sheet one sixteenth of that size and 4K is a sheet one quarter of that size. Neither can hold as much information as the full sheet. Of course, this also means that when the graphics card outputs an 8K image, it has to process 16 times as many pixels as at 1080p, or 4 times as many as at 4K, which is no small challenge for today's graphics cards.
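The ratios quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the names `RESOLUTIONS`, `pixel_count`, and `ratio_to_8k` are illustrative and do not come from any tool used in this test.

```python
# Resolutions named in the article, as (width, height) in pixels.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}

def pixel_count(name):
    """Total number of pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height

def ratio_to_8k(name):
    """How many frames of the named resolution fit into one 8K frame."""
    return pixel_count("8K") / pixel_count(name)

if __name__ == "__main__":
    for name in RESOLUTIONS:
        print(f"{name}: {pixel_count(name):,} pixels, "
              f"8K has {ratio_to_8k(name):g}x as many")
```

One 8K frame holds a little over 33 million pixels, which is where the 16x and 4x figures come from.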

A game image that looks smooth turns out, when zoomed in, to be made up of individual pixels

Because the resolution is higher, if the physical size of the display stays the same, there are more pixels per inch; that is, the PPI (pixels per inch) goes up. Our eyes then perceive the 8K image as finer and more lifelike. In addition, with so many more pixels available, the edges of each object in the game are drawn with far more pixels, so they follow curves and diagonals more closely instead of breaking into blocky stair-steps. The picture looks more comfortable, and at 8K resolution aliasing is almost negligible.
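To put a number on the PPI argument, here is a small sketch that computes pixel density from a resolution and a screen diagonal. The 31.5-inch diagonal is an assumption chosen for illustration (the paragraph above only says that the screen size stays fixed), and the function name `ppi` is hypothetical.

```python
import math

def ppi(width_px, height_px, diagonal_inches):
    """Pixels per inch: diagonal length in pixels divided by diagonal in inches."""
    return math.hypot(width_px, height_px) / diagonal_inches

# Assumed screen diagonal, for illustration only.
DIAGONAL_INCHES = 31.5

if __name__ == "__main__":
    for label, (w, h) in {
        "1080p": (1920, 1080),
        "4K": (3840, 2160),
        "8K": (7680, 4320),
    }.items():
        print(f"{label}: {ppi(w, h, DIAGONAL_INCHES):.1f} PPI")
```

On a fixed-size screen, doubling the resolution along each axis doubles the PPI, so 8K has twice the pixel density of 4K and four times that of 1080p.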

8K and mainstream resolution game screenshot comparison

We chose two games for the screenshot comparison. The first is "Death Stranding", which has TAA turned on by default. This game from Kojima Productions is very well optimized on PC and is very friendly to players. We zoom in and compare screenshots at 1080p, 1440p, 4K (2160p), and 8K (4320p) to see how large the gap is between mainstream resolutions and 8K ultra-high-definition resolution.

"Forza Horizon" is Microsoft's well-known casual racing series, and its fourth installment, "Forza Horizon 4", is set in Britain. Its exquisite graphics and excellent optimization make it an ideal choice for 8K testing. At 8K resolution the game is so realistic that you feel as if you are really driving through Britain. We zoomed in and compared screenshots here as well. This game does not enable TAA by default, so aliasing at the lower resolutions is more obvious.

Ray tracing has become a popular technology in games. It makes light and shadow in the virtual world more realistic, and combined with 8K ultra-high resolution, the sense of immersion has never been stronger. Ray-traced light and shadow bring the image closer to reality. With the Turing architecture, NVIDIA introduced the RT Core, its first unit able to accelerate real-time ray tracing. For ray tracing work such as finding the points where rays intersect objects, modern SIMD-based CUDA cores are too inefficient; special-purpose modules based on a MIMD architecture handle it better. NVIDIA's RT Core is such a dedicated hardware unit, designed to accelerate real-time ray tracing calculations.

The RT Core in Ampere GPUs mainly adds hardware acceleration for motion blur. Without ray tracing, motion blur is usually just a post-processing filter applied to the image, and the result is not realistic. With real-time ray tracing, motion blur is produced by computing the interaction between moving objects and light in real time. This calculation is very expensive, and even the RT Core in Turing struggles with it. In the NVIDIA Ampere architecture, the second-generation RT Core adds an interpolation algorithm designed by NVIDIA that improves ray tracing efficiency in this case while keeping the motion blur accurate. According to NVIDIA, it can be up to 8 times as fast as the previous generation. In addition, for basic BVH calculations, the new RT Core can be twice as fast. It is this significant performance improvement in the second-generation RT Core that makes 8K ray-traced gaming possible.

Third-generation Tensor Cores bring a leap in AI performance

Of course, relying only on the traditional SM compute units, running ray-traced games at 8K is not realistic at this stage. Starting with the Volta architecture, NVIDIA added Tensor Cores, optimized for AI computing, to the SM. These units improve the efficiency of the graphics card in machine learning workloads. In the NVIDIA Ampere architecture, the Tensor Core has reached its third generation. The previously released A100 compute card already uses the new third-generation Tensor Core, which can deliver 4 times the performance of the second-generation Tensor Core. However, the Tensor Core on the gaming cards has been cut down somewhat: its FP16 FMA throughput is only half that of the Tensor Core in the GA100 chip.


Beyond the performance improvement, the third-generation Tensor Core also adds support for sparse matrix operations. For a detailed introduction, see our earlier analysis of the compute-oriented NVIDIA Ampere architecture: "A simple interpretation of NVIDIA's new-generation Ampere architecture: An improved and revolutionary structure upgrade." Overall, even though the gaming-oriented NVIDIA Ampere architecture reduces the number of Tensor Cores per SM from 8 to 4, overall Tensor performance is still greatly improved. This improvement directly helps the RTX 3090 open the door to 8K gaming.

DLSS 8K is a big step forward

The stronger AI computing power of the new Tensor Cores benefits DLSS. Earlier this year, NVIDIA began promoting DLSS 2.0 across the board. Compared with the original DLSS, DLSS 2.0 is greatly improved in both image quality and rendering efficiency. It is no longer a feature of marginal value; it can actually let a mid-range graphics card reach 4K at 60 fps in AAA titles. Ampere GPUs did not bring a "DLSS 3.0", but they still push the technology one step further with DLSS 8K. As the name suggests, this is a new DLSS mode that uses deep learning to upscale the image to 8K.

Although NVIDIA did not use the name DLSS 3.0, DLSS 8K is still a technical breakthrough. Specifically, it takes a frame actually rendered at 1440p and derives an 8K output image from it, a full 9-fold increase in pixel count (2560x1440 => 7680x4320). Previously, the largest upscale DLSS offered was 4-fold (1920x1080 => 3840x2160). The jump to 9-fold shows the huge potential of AI upscaling technology.
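The 9-fold and 4-fold figures follow directly from the render and output resolutions. A minimal sketch, using only the resolutions quoted above (the helper name `upscale_factors` is hypothetical and is not part of any DLSS API):

```python
def upscale_factors(render, output):
    """Return (per-axis scale, pixel-count scale) between two (width, height) pairs."""
    (rw, rh), (ow, oh) = render, output
    axis_scale = ow / rw
    if axis_scale != oh / rh:
        raise ValueError("render and output aspect ratios differ")
    return axis_scale, (ow * oh) / (rw * rh)

# 1440p rendered frame upscaled to 8K, as described for DLSS 8K.
dlss_8k = upscale_factors((2560, 1440), (7680, 4320))
# 1080p rendered frame upscaled to 4K, the previous maximum.
dlss_4k = upscale_factors((1920, 1080), (3840, 2160))

print(dlss_8k, dlss_4k)
```

Tripling each axis multiplies the pixel count by nine, so for every pixel actually rendered, eight more have to be inferred.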

A more parallel rendering pipeline

Handing different types of calculations to different units is an approach NVIDIA has followed since the Volta architecture. The Tensor Core introduced then took over many AI-related operations, and the RT Core introduced later took over the calculations for real-time ray tracing. Can these units run in parallel? Yes, but not every operation can be executed in parallel.

As shown in the figure above, when a Turing GPU has real-time ray tracing and DLSS turned on, its RT Cores and Tensor Cores do not work in parallel. The Tensor Cores are invoked near the end of the rendering process and do not run at the same time as the RT Cores.

In the NVIDIA Ampere architecture, NVIDIA has improved the parallelism between the units inside the GPU. The traditional compute units, the RT Cores, and the Tensor Cores can now all work at the same time, which shortens frame rendering time further.

GDDR6X video memory allows bandwidth to take off

Texture data and video memory usage at 8K resolution are enormous; it is common for AAA games to consume up to 20GB of video memory. Today's flagship GPUs also still depend heavily on their memory hierarchy: not only the various caches inside the GPU, but also the external memory system that serves as the "warehouse", which faces very high demands. The higher the rendering resolution, the more video memory the GPU needs to hold rendering assets, and the more memory bandwidth it needs to read that data quickly. Since the earliest 3D accelerator cards, video memory has moved from GDDR2, which differed little from conventional DDR, to GDDR3 and GDDR5, and then to HBM, which was designed for ultra-high bandwidth. Video memory changes type and gets upgraded much faster than conventional DDR memory.

NVIDIA first used GDDR6 video memory on Turing graphics cards in 2018. At that time, GDDR6 offered much higher bandwidth than GDDR5, which had reached its limits, and it also outclassed GDDR5X. However, with the rapid growth in GPU size in the RTX 30 series, the original GDDR6 falls somewhat short, so NVIDIA and Micron introduced an upgraded version, GDDR6X. The name differs only by an X suffix, but the underlying signaling has changed significantly, and for the first time GDDR-series memory bandwidth has been pushed to the 1TB/s level.

The major change in GDDR6X is its signaling scheme. Earlier GDDR memory uses a very basic binary signal. More specifically, it uses NRZ (Non-Return-to-Zero) modulation. This scheme is very simple: a high level represents 1 and a low level represents 0. To increase bandwidth, you raise the memory clock frequency. However, because of factors such as process technology, the memory clock is hard to raise further at this stage. So what can be done? Manufacturers turned to a different modulation scheme that makes signal transmission more efficient, and they chose PAM4, which is already widely used.

PAM (pulse-amplitude modulation) is a signaling method that encodes information in analog signal pulses, and PAM4 is one of its simpler forms. Unlike the binary NRZ signal, which has only two states (high and low), PAM4 has 4 distinct levels, that is, 4 different states. Each state corresponds to one combination of two bits, so each symbol carries 2 bits of data, double that of NRZ.
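The difference between the two schemes can be shown with a toy encoder. This is only a sketch: the mapping from bit pairs to level indices below is an assumption made for illustration, since the article does not give the mapping GDDR6X actually uses, and all function names are hypothetical.

```python
# Assumed mapping from bit pairs to level indices (0 = lowest, 3 = highest).
# Illustrative only; the real GDDR6X mapping is not given in the article.
PAM4_LEVELS = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}
PAM4_BITS = {level: bits for bits, level in PAM4_LEVELS.items()}

def nrz_encode(bits):
    """NRZ: one symbol per bit, high level (1) or low level (0)."""
    return list(bits)

def pam4_encode(bits):
    """PAM4: one symbol per pair of bits, chosen from four levels."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def pam4_decode(symbols):
    """Recover the original bit sequence from PAM4 level indices."""
    bits = []
    for level in symbols:
        bits.extend(PAM4_BITS[level])
    return bits

if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 0, 1]
    print(len(nrz_encode(data)), "NRZ symbols")
    print(len(pam4_encode(data)), "PAM4 symbols:", pam4_encode(data))
```

For the same eight bits, NRZ needs eight symbols and PAM4 needs four, which is why PAM4 doubles the data carried at a given symbol rate.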

If this sounds a bit abstract, PAM4 can be compared to the way MLC flash memory stores data. Each MLC flash cell stores 2 bits of data. Electrically, those 2 bits are represented by 4 different levels with a fixed spacing between them. When the controller reads or writes, it converts between data and electrical levels according to fixed rules.

The same is true for GDDR6X. According to information released by NVIDIA, GDDR6X uses four signal levels, with a voltage difference of 250mV between adjacent levels. In addition, NVIDIA has introduced MTA encoding to reduce loss during signal transmission and ensure stability. The RTX 3090 has 24GB of GDDR6X video memory with a memory bandwidth of 936GB/s, close to the 1000GB/s mark. This high bandwidth is a necessary condition for the RTX 3090 to run games at 8K.
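The 936GB/s figure can be reproduced from the memory interface parameters. Note that the 384-bit bus width and the 19.5 Gbps per-pin data rate are not stated in this article; they are the commonly published RTX 3090 specifications and are treated here as assumptions. The function names are hypothetical.

```python
# Assumptions not stated in the article: published RTX 3090 memory specs.
BUS_WIDTH_BITS = 384
DATA_RATE_GBPS_PER_PIN = 19.5

# From the article: PAM4 carries 2 bits per symbol, with 4 levels 250 mV apart.
PAM4_BITS_PER_SYMBOL = 2
PAM4_LEVEL_COUNT = 4
LEVEL_SPACING_MV = 250

def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Total bandwidth in GB/s: pins times per-pin rate, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps_per_pin / 8

def symbol_rate_gbaud(data_rate_gbps_per_pin, bits_per_symbol):
    """Symbols per second on each pin needed to reach the given bit rate."""
    return data_rate_gbps_per_pin / bits_per_symbol

def level_span_mv(level_count, spacing_mv):
    """Voltage difference between the lowest and highest signal level."""
    return (level_count - 1) * spacing_mv

if __name__ == "__main__":
    print(memory_bandwidth_gb_s(BUS_WIDTH_BITS, DATA_RATE_GBPS_PER_PIN), "GB/s")
    print(symbol_rate_gbaud(DATA_RATE_GBPS_PER_PIN, PAM4_BITS_PER_SYMBOL), "Gbaud per pin")
    print(level_span_mv(PAM4_LEVEL_COUNT, LEVEL_SPACING_MV), "mV between lowest and highest level")
```

Under these assumptions, each pin only has to switch at half the rate an NRZ link would need for the same throughput, which is the point of moving to PAM4.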

Test platform and description

Testing Platform
Hardware Platform
CPU: AMD Ryzen 9 5950X
Motherboard: ASUS ROG CROSSHAIR VIII Formula | AMD X570
Graphics card: NVIDIA RTX 3090 Founders Edition
RAM: G.SKILL Trident Z Royal DDR4-3200 8GB x 2 CL14
Hard disk: Samsung 980 PRO NVMe Gen 4 1TB SSD
Cooler: ASUS ROG STRIX LC 360
Power supply: EVGA 1200 P2 PLATINUM
Monitor: Dell UltraSharp UP3218K
Software Configuration
Operating system: Microsoft Windows 10 64-bit, version 20H2
Driver: GeForce Game Ready Driver 460.79
Produced by Expreview

An RTX 3090 alone is not enough to drive 8K, so I built a flagship gaming platform around it. The CPU is the Ryzen 9 5950X, the motherboard is the solid X570-based ROG CROSSHAIR VIII Formula, and the rest of the components are all in place. Note that the monitor this time is the Dell UltraSharp UP3218K, an 8K display that handles our 8K display task perfectly.

8K in traditional rasterized games: DLSS helps deliver 8K at 60 fps

For traditional rasterized games, we selected several well-optimized titles for testing at 8K resolution. For those that support DLSS 2.0, we also ran a comparison with Ultra Performance mode to see how much of a gain it gives the RTX 3090 at 8K.

The test results above show that the RTX 3090 can deliver around 60 fps at 8K resolution in well-optimized games, while in some very demanding titles the frame rate is only 20 to 30 fps. In those cases, we can turn on DLSS Ultra Performance mode to raise the frame rate further. With its help, most AAA games reach a smooth 60 fps.

Some players, however, will wonder how much sharpness is lost after turning on DLSS Ultra Performance mode.

Because real-time ray tracing puts great pressure on the graphics card, the RTX 3090 is overwhelmed at 8K resolution, and the frame rate in "Control" even drops to just over 10 fps. Once again we turned on DLSS Ultra Performance mode, which brings 8K gaming back to life, and frame rates in ray-traced games at 8K improved greatly. Except for the poorly optimized "Cyberpunk 2077", every game passed the 60 fps mark. For 8K ray-traced gaming to become a reality, DLSS Ultra Performance mode is indispensable.

That players can have this kind of experience today is, of course, thanks to the RTX 3090. Huang affectionately calls it the "BFGPU", and the performance of this heavyweight product in 8K games is truly impressive. At this stage, for players who want to be among the first to experience 8K gaming, the RTX 3090 is the only graphics card to choose. Its large GDDR6X memory capacity and bandwidth, second-generation RT Cores, and third-generation Tensor Cores let it stand out in 8K games and give players an experience that is hard to put into words. If you play 8K games on an 8K monitor yourself, you will find the whole picture truly fine, clear, and sharp.

The texture on the door of Destiny 2 under 8K is clearly visible

The RTX 3090 is also a very friendly GPU for content creators, and it performed very well in our earlier content creation tests. It is therefore a very well-rounded flagship graphics card that gives users the strongest support in both gaming and productivity. Its reputation as the "king of cards" is well earned. Finally, let's look at two sets of comparison images from the currently popular "Cyberpunk 2077".
