Sunday, March 16, 2008

Multithreading in Clouds

The Clouds demo posed an interesting problem for me. It required large amounts of CPU power, while at the same time I was finding it difficult to optimise the code any further. The cloud generation and render setup were hitting the CPU hard - and only 50% of my dual-core CPU was being utilised.

While I had dabbled in multithreading before, I'd never applied it to any real-life applications. Clouds was in a unique situation: it was a CPU-intensive application, but had absolutely no requirements regarding latency. So I was free to design the most ridiculously input-latency-heavy system I could think of.

I immediately split the application into two threads - simulation and render. Simulation would generate Perlin noise, do the day/night cycles, and handle everything relating to the world itself. The render thread would, well, render. The render thread and the simulation thread are completely decoupled - each can run at any speed without affecting the other.

I chose a buffered approach to communicate between the two threads. The simulation thread would write information to a buffer, and the render thread would then take that information and use it to draw stuff to the screen. I started off with 3 buffers - one for simulation and two for render. Since the simulation thread was locked to 10 Hz and the render thread ran as fast as it could, the render thread needed to "fill in the blanks" somehow between each 100 ms timestep. So the render thread interpolates between its two buffers to produce smooth-looking results, while the simulation thread writes to the third buffer. Here's what it looks like:
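To make the "fill in the blanks" step concrete, here is a minimal sketch of interpolating between two buffers. It is not the demo's actual code: the `Snapshot` class, its `cloud_density` field and the `interpolate` helper are all hypothetical stand-ins, and plain linear interpolation is an assumption on my part.

```python
from dataclasses import dataclass

SIM_STEP = 0.1  # seconds per simulation timestep (10 Hz, as described above)


@dataclass
class Snapshot:
    """Hypothetical contents of one buffer: the world state at one timestep."""
    time: float           # simulation time this snapshot represents, in seconds
    cloud_density: float  # stand-in for the real per-timestep data


def lerp(a, b, t):
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return a + (b - a) * t


def interpolate(prev, nxt, render_time):
    """Blend two snapshots for a render time lying between them."""
    t = (render_time - prev.time) / (nxt.time - prev.time)
    t = max(0.0, min(1.0, t))  # clamp so a late frame never extrapolates
    return lerp(prev.cloud_density, nxt.cloud_density, t)


# The render thread's two buffers: the timestep it is leaving and the one
# it is heading towards.
prev = Snapshot(time=0.0, cloud_density=0.2)
nxt = Snapshot(time=SIM_STEP, cloud_density=0.6)

# Four rendered frames inside one 100 ms timestep.
for frame in range(4):
    render_time = frame * 0.025
    print(render_time, interpolate(prev, nxt, render_time))
```

The render thread can call something like `interpolate` as often as it likes; it only needs a new buffer once its render time passes `nxt.time`.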

Here's the problem: this leaves the system open to data starvation. If the simulation thread finishes its simulation before the render thread is done, then the simulation thread has to sit there and wait until the render thread is done with one of its two buffers. However, if the render thread finishes before the simulation thread is done, then the render thread has to sit there and wait for the simulation thread to finish with its current timestep before it can continue interpolating.

So I added a fourth buffer. The addition of the fourth buffer meant that input latency had grown to a ridiculously massive 400-500 ms. But that didn't matter, since I didn't have any input that could affect the simulation! The fourth buffer sits in between the simulation thread and render thread, and it donates itself to whoever needs it most. If the simulation thread is done before the render thread, the fourth buffer goes to the simulation thread so it can begin writing the next timestep's data. If the render thread is done before the simulation thread, the fourth buffer goes to the render thread so it can continue interpolating and rendering stuff. So here's the final flow:
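One way to sketch this four-buffer handoff is with two thread-safe queues: a `free` queue of buffers the simulation may write to, and a `ready` queue of finished timesteps for the renderer. This is an illustration under my own assumptions, not the demo's code. The queue names, the dict-based buffers and the fixed four sub-frames per timestep are all made up, and the simulation here runs flat out instead of being locked to 10 Hz.

```python
import queue
import threading

NUM_BUFFERS = 4   # two held by render, one by simulation, one floating
NUM_STEPS = 20    # timesteps to simulate before shutting down (must be >= 2)
SUB_FRAMES = 4    # rendered frames per timestep (hypothetical fixed count)


def simulate(free, ready):
    """Simulation thread: fill a free buffer, then hand it to the renderer."""
    for step in range(NUM_STEPS):
        buf = free.get()            # blocks only when no buffer is free
        buf["step"] = step
        buf["value"] = step * 10.0  # stand-in for the real simulation output
        ready.put(buf)
    ready.put(None)                 # sentinel: no more timesteps


def render(free, ready, frames):
    """Render thread: hold two buffers and interpolate between them."""
    prev = ready.get()
    nxt = ready.get()
    while True:
        for i in range(SUB_FRAMES):
            t = i / SUB_FRAMES
            frames.append(prev["value"] + (nxt["value"] - prev["value"]) * t)
        newest = ready.get()        # blocks only when simulation is behind
        if newest is None:
            break
        free.put(prev)              # the oldest buffer goes back to simulation
        prev, nxt = nxt, newest


def run():
    free = queue.Queue()
    ready = queue.Queue()
    for _ in range(NUM_BUFFERS):
        free.put({"step": -1, "value": 0.0})
    frames = []
    sim = threading.Thread(target=simulate, args=(free, ready))
    ren = threading.Thread(target=render, args=(free, ready, frames))
    sim.start()
    ren.start()
    sim.join()
    ren.join()
    return frames


frames = run()
print(len(frames), frames[:5])
```

In steady state the renderer holds two buffers and the simulation holds one, so the fourth is always sitting in either `free` or `ready`, whichever side is running ahead. A buffer is only ever touched by the thread that currently holds it, so the queues are the only synchronisation needed.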

Because of this buffered approach and the fact that the two threads are completely decoupled, synchronisation overhead is basically nonexistent. However, this system would not be suitable for use in a real game, since it introduces far too much input latency.

Thanks to the helpful members of GameDev.Net, though, I now have some ideas for how to implement this system in a real game.