Rendering & RAM: How do frequency, number of channels and capacity affect ray tracing in Blender?
A while back I made a video about testing a simple upgrade, making my 8GB DDR3 PC a 16GB DDR3 PC to test what the performance improvements were. You can watch that video right here. The video is just four minutes long, so it won't take you long to get the idea.
There were some comments I need to address though: I wasn't as thorough as I should have been in the tests.
The contention was that when I upgraded to 16GB, the resulting configuration of RAM on my motherboard meant that the PC now had two channels for accessing data from memory, where before it only had one. It was a fair criticism: you should only test one variable at a time and hold the others constant. No Nobel prize in science for me I guess.
To remedy the situation we ended up building a whole new PC recently (which you can watch the making of here). Lol! Actually, we mostly did it to do crowdrender software development with modern hardware. But it also allowed us to properly test a whole stack of configurations and get the data on what matters for rendering in Blender when it comes to RAM.
So let's get stuck in :)
The Setup and method
The config we used was a Ryzen 9 3900X, with a Gigabyte Aorus Elite wifi motherboard. We used Crucial Ballistix RAM with 8GB per module. Since the motherboard has four RAM slots, two for channel A, and two for channel B, we had everything we needed to test number of channels and total capacity independently. We threw in tests of different frequency as a bonus :)
Before going any further, a huge thanks to Mwave Australia for supplying the parts necessary for this test. You can find the RAM and motherboard we used in their store :)
Software wise, we used Blender 2.83 LTS on Windows 10 and downloaded the benchmark files from blender.org.
Let's look at the results then.
Does the number of channels matter?
First of all, what are channels where RAM is involved?
Channels for memory can be thought of like having multiple queues at the bank or supermarket: having more than one teller or checkout means there's a higher capacity for requests for data to be serviced. They increase the ability to get data to and from the CPU. For a primer on memory subsystems in a lot more detail, check this link.
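To put rough numbers on the queue analogy, here is a back-of-envelope sketch in Python of theoretical peak bandwidth. It assumes the standard 64-bit (8 byte) data bus per DDR channel. The function name is made up for this illustration, and real-world throughput will be lower than these peak figures.

```python
# Theoretical peak memory bandwidth: a back-of-envelope sketch.
# Assumption: each DDR channel has a 64-bit (8 byte) data bus.

BYTES_PER_TRANSFER = 8  # 64 bits per transfer, per channel


def peak_bandwidth_gb_s(data_rate_mt_s: float, channels: int) -> float:
    """Peak bandwidth in GB/s for a data rate in MT/s and a channel count."""
    bytes_per_second = data_rate_mt_s * 1e6 * BYTES_PER_TRANSFER * channels
    return bytes_per_second / 1e9


if __name__ == "__main__":
    for channels in (1, 2):
        gb_s = peak_bandwidth_gb_s(3200, channels)
        print(f"{channels} channel(s) at 3200 MT/s: {gb_s:.1f} GB/s")
```

For RAM running at 3200 this works out to roughly 25.6 GB/s for one channel and 51.2 GB/s for two. That is the extra capacity on offer, whether or not a given workload actually uses it.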
So, does having more than one channel affect rendering performance in Blender?
According to our tests it does not! Turns out it makes a negligible difference, as you can see from the charts.
I guess you might be wondering why not? Well, as a programmer, one thing I know I should avoid like the plague (or any pandemic) is excessively accessing system memory or, heaven forbid, the hard disk. As the diagram below shows, accessing memory is a lot slower than accessing the CPU's cache, and the hard drive is slower still.
Because trips out to the RAM modules for data are costly, ray tracing engines try to make use of something called coherence. That is, they try to load the data they need from RAM as few times as possible. As programmers we tend not to do this explicitly, but we take advantage of the memory system, which includes the fast cache memory that is on the CPU itself.
The memory system tries to keep frequently accessed items in the cache, only loading from RAM when a cache 'miss' happens, meaning the data needed for the next instruction isn't in the cache and has to be loaded from RAM. Ray tracing engines can keep the number of misses down by exploiting coherence. That means making multiple uses of the same data, for example by instancing meshes, or by bunching together rays which may bounce and land in a similar location, so that the same data is sampled for each ray in the bunch.
The above tactics aren't perfect, but they help to reduce cache misses and avoid main memory references. This helps ray tracing engines be efficient with the slow trips out to RAM.
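To see why coherence matters, here is a toy simulation in Python. It is only a sketch, not how Cycles or any real CPU cache works: it models the cache as a simple least-recently-used (LRU) store of item ids, and it uses sorting as a crude stand-in for bunching rays that touch the same data. All the names and sizes are made up for illustration.

```python
from collections import OrderedDict
import random


def count_misses(accesses, cache_size):
    """Count cache misses for a sequence of item ids using a toy LRU cache."""
    cache = OrderedDict()
    misses = 0
    for item in accesses:
        if item in cache:
            cache.move_to_end(item)  # hit: mark as most recently used
        else:
            misses += 1  # miss: pretend we fetched the item from RAM
            cache[item] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used item
    return misses


def make_patterns(num_items=10_000, num_accesses=100_000, seed=0):
    """Return (coherent, incoherent) access patterns over the same items."""
    rng = random.Random(seed)
    incoherent = [rng.randrange(num_items) for _ in range(num_accesses)]
    # Same accesses, reordered so that uses of the same item sit together.
    coherent = sorted(incoherent)
    return coherent, incoherent


if __name__ == "__main__":
    coherent, incoherent = make_patterns()
    print("coherent misses:  ", count_misses(coherent, cache_size=1_000))
    print("incoherent misses:", count_misses(incoherent, cache_size=1_000))
```

The two patterns request exactly the same data the same number of times. Only the order differs, yet the grouped version misses once per distinct item while the random version misses on most accesses.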
So what are multiple channels for then?
As far as I can tell, servers. A server can have terabytes of RAM installed, and have many programs running simultaneously, serving web requests, calculating, querying databases that are in memory. Since many uncoordinated tasks are potentially trying to get access to the memory, it's hard to have coherence. Any of those programs might be requesting data from memory at any time in a completely chaotic way. Therefore, there are many requests, and the need for multiple channels is more clear.
Conclusions on channels and render performance?
None of the tests we did in Blender showed much, if any, difference between results when testing with single or dual channel configured. So it turns out that for ray tracing, having more than one channel of memory is simply overkill and doesn't do much for performance.
That said, maybe we just didn't try hard enough. Could a larger scene benefit? We'll look into it. Not being experts in Cycles, it's speculation, but I certainly notice a lot of single threaded activity going on during the sync part of the render, when it's getting ready to start drawing pixels. Multiple channels are like having two queues at the checkout: the cores can get at the data in memory faster this way, but it only helps if one of the queues is full. Like I said, we try not to fill those queues, and it seems Blender doesn't get close to doing this either.
Does Frequency/speed of RAM matter?
When you buy RAM you'll frequently notice (see what I did there? What? I am a dad, ok, I'm allowed to make puns like that) that there is a timing and a frequency quoted for the modules you can buy. Generally, you want higher frequencies and lower timings. The timings, usually given in clock cycles, together with the frequency, give you the latency of the RAM or, in layman's terms, how long it takes the RAM to respond to a request from the CPU for data.
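The relationship between timings, frequency and latency can be sketched in a few lines of Python. It relies on the fact that DDR memory makes two transfers per clock cycle, so the actual clock is half the quoted data rate. The modules listed are illustrative examples only, not the timings of the kit we tested, and the function name is made up for this sketch.

```python
def first_word_latency_ns(cas_latency_cycles: int, data_rate_mt_s: float) -> float:
    """Approximate time in nanoseconds for RAM to start answering a request.

    DDR transfers twice per clock, so one clock cycle lasts
    2000 / data_rate_mt_s nanoseconds.
    """
    return cas_latency_cycles * 2000 / data_rate_mt_s


if __name__ == "__main__":
    # (label, CAS latency in cycles, data rate in MT/s): example values only
    examples = [
        ("2133 CL15", 15, 2133),
        ("3200 CL16", 16, 3200),
        ("3600 CL18", 18, 3600),
    ]
    for label, cl, rate in examples:
        print(f"{label}: {first_word_latency_ns(cl, rate):.2f} ns")
```

Note how a faster module with a higher timing number can end up with the same latency in nanoseconds as a slower one with a lower timing number, which is why you need to look at both figures together.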
Ray tracing benefits from low latency, as you can see from our results, though only in the Victor benchmark. This might seem a bit weird at first, but thinking (or theorising) about it for a second, Victor is a large scene, weighing in at around 12GB, while the other scenes are all less than 1GB in size. So I assume that the hardware has to move 12GB of data around when rendering Victor. Since CPU caches are small by comparison (the 3900X has 64MB of L3 cache), the 12GB worth of scene data will have to be transferred from RAM to the CPU at some stage.
Being able to transfer that data faster clearly has a benefit to the render times but, in my opinion, it's likely to be the sync portion of the render that benefits most. More tests to come on that particular nuance.
Increasing the speed also has diminishing returns according to the data from the Victor scene: going from 1600 to 2133 gives a much larger decrease in total render time than going from 2800 to 3266 (a comparable increase in speed of about 500 ish). Also, 3200 is the recommended speed for memory according to the 3900X specs, so memory overclocking beyond that might not be amazing in terms of risk vs reward.
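One plausible contributor to the diminishing returns is simple arithmetic: the same step of roughly 500 is a much smaller relative change at the top of the range than at the bottom. A quick sketch in Python using the two steps quoted above (the function name is just for illustration, and this is not a claim about everything going on inside the render):

```python
def relative_increase_percent(old_speed: float, new_speed: float) -> float:
    """Percentage increase when going from old_speed to new_speed."""
    return (new_speed - old_speed) / old_speed * 100


if __name__ == "__main__":
    for old, new in [(1600, 2133), (2800, 3266)]:
        pct = relative_increase_percent(old, new)
        print(f"{old} -> {new}: +{new - old} ({pct:.1f}% faster)")
```

The first step is about a third faster, while the second is only around a sixth faster, so a smaller payoff from the second step is not too surprising.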
So, frequency matters, if your scene is large. You can get more speed from your system if you make sure your RAM is running at the appropriate frequency. Speed here is a function of both the number of cycles it takes for the RAM to respond (lower is better) and the frequency (higher is better). RAM modules with higher frequency and lower latency will likely cost more though.
Does the amount of RAM matter?
Looking back, our first experiment with RAM was meant to test the effect of doubling RAM on performance. Now that we have a computer where we can test this in isolation, it was heartening to see that our original statements about the amount of RAM still hold. We just tested them in a more controlled experiment this time!
So, as you can see from the diagrams, yes, how many bytes you can store in RAM matters, but only in cases where you run out of it! Once you start to need more RAM than is available in the system, your computer has no choice but to start saving data from dormant processes to the hard disk to make space. It's like your computer has to tidy its room before it can let you in to play. This takes time of course, and with hard drives typically responding many times slower than RAM (see that diagram above again :) ), it's going to cost time.
So, yes, the amount of RAM matters a lot: it can make or break your render. Have enough RAM and you'll get the full speed of the CPU or GPU that you bought for the system. Run out and you'll be waiting far longer for those pixels to show up.
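As a rough way of thinking about "enough RAM", here is a small Python sketch that estimates how much data would have to spill to disk for a given scene. The 2GB reserved for the operating system and other programs is purely an assumption for illustration, as is the function name. In practice you would check actual memory use while rendering.

```python
def spill_to_disk_gb(scene_gb: float, installed_gb: float, reserved_gb: float = 2.0) -> float:
    """Estimate how many GB will not fit in RAM and must go to disk.

    reserved_gb is an assumed allowance for the OS and other programs.
    """
    needed = scene_gb + reserved_gb
    return max(0.0, needed - installed_gb)


if __name__ == "__main__":
    scene_gb = 12  # roughly the size of the Victor scene mentioned earlier
    for installed in (8, 16, 32):
        spill = spill_to_disk_gb(scene_gb, installed)
        status = "fits in RAM" if spill == 0 else f"about {spill:.0f}GB spills to disk"
        print(f"{installed}GB installed: {status}")
```

Once the spill is zero, adding more RAM does nothing for render time, which matches what we saw: capacity only matters when you run out.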
Why does this happen?
According to people more expert than me, ray tracing engines can spend a lot of their time shuffling data between memory and processor; this paper studied such phenomena. When you see what proportion of its time ray tracing spends accessing RAM, it begins to make sense that lower latency and higher bandwidth should make a difference to speed. It also explains why render times suffer so badly when you make that latency many times greater (cause your system ran out of space in RAM and is now using your hard drive).
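A very simple model makes the point. Suppose some fraction of render time is spent waiting on memory and the rest is pure computation. If memory access gets slower by some factor, only the memory part stretches. The fractions and factors below are hypothetical inputs chosen for illustration, not measurements from our tests or from the paper.

```python
def relative_render_time(memory_fraction: float, latency_factor: float) -> float:
    """Render time relative to baseline (1.0) when memory access time is
    multiplied by latency_factor and compute time is unchanged."""
    compute_part = 1.0 - memory_fraction
    memory_part = memory_fraction * latency_factor
    return compute_part + memory_part


if __name__ == "__main__":
    # Hypothetical values: fraction of time spent on memory, slowdown factor
    for fraction in (0.1, 0.5):
        for factor in (4, 100):
            t = relative_render_time(fraction, factor)
            print(f"memory fraction {fraction:.0%}, {factor}x slower memory: "
                  f"{t:.1f}x the render time")
```

The bigger the share of time spent on memory, the harder the render is hit when that memory suddenly becomes a hard drive.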
Final Conclusions
Basically, have enough RAM, and make sure it is as fast as the speed your CPU manufacturer recommends. Single channel is probably fine unless you get into territory that only professionals or hyper enthusiasts are likely to go into (cause rea$on$). Even then, I'm not sure it matters, since ray tracing by its very nature tries its best to optimise how it accesses RAM.
Next time we'll overclock it? Leave your thoughts in the comments section!