Chair Warming Facility: Text Rendering Optimisation

I finally got round to optimising the text rendering in my PSM game. I was seeing around 75 ms frame time on my text-heavy stats screen; it's now down to around 3 ms.

Text Rendering Introduction

PSM comes with a Font class that lets you load TTF fonts. To get them on screen they need to be drawn to an Image object, which is then copied into a Texture2D object, which can finally be displayed using a draw call and vertex buffer of your choice. Drawing to Image and copying Image to Texture2D are both slow operations.

You need an Image and a Texture2D that are big enough to contain your text. You can use the Font class to find out how big a piece of text will be, but querying the Font class like this can be slow when done many times in a single frame.

Before

My initial implementation created one texture for each string I wanted to render. It also had a cache, so duplicate strings in the same frame shared a texture, and the cache persisted across frames too.

The downside is this resulted in 1 draw call per string. The draw call count was the only problem I was aware of before I started optimising. In hindsight I should have spent more time profiling to find the bottlenecks before starting on the optimisation.

After

A lot of the things I draw generate a bunch of filled rectangle and font draw calls all mixed together, which leads to a lot of renderer state changes and draw calls. I decided to put these into a custom display list where they could be sorted into batches while preserving the order of overlapping objects. If two things overlap then whatever was drawn last should always appear on top.
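One way to sort a display list into batches without breaking overlap order is sketched below in Python. This is only an illustration, not the code from the game: the `Item` type, the `key` field (standing in for "rectangle" versus "text" render state) and `build_batches` are all hypothetical names. Each item gets a layer that is one higher than any earlier item it overlaps with a different key, items are then sorted by layer and key, and neighbours with the same key are merged.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Item:
    key: str   # hypothetical batch key, e.g. "rect" or "text"
    x: float
    y: float
    w: float
    h: float


def overlaps(a, b):
    """True if the two axis-aligned rectangles intersect."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def build_batches(items):
    """Group items (given in draw order) into batches of equal key.

    An item that overlaps an earlier item with a different key is pushed
    to a later layer, so it is still drawn on top after sorting.
    """
    layers = []
    for i, item in enumerate(items):
        layer = 0
        for j in range(i):
            if overlaps(items[j], item):
                bump = 0 if items[j].key == item.key else 1
                layer = max(layer, layers[j] + bump)
        layers.append(layer)

    # sorted() is stable, so items with the same layer and key keep
    # their original draw order.
    order = sorted(range(len(items)),
                   key=lambda i: (layers[i], items[i].key))

    # Neighbouring items with the same key can share one draw call.
    return [(key, [items[i] for i in group])
            for key, group in groupby(order, key=lambda i: items[i].key)]
```

The overlap test here compares every pair of items, which is fine for a small UI but would need a spatial structure for long lists.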

This reduced all the rectangles to a few draw calls. The worst case for the text still needed 1 draw call per string because my shaders need a new draw call each time you set the texture. The next change was to put all the text into a single texture atlas so text could use fewer draw calls. I used a similar cache to the one I had before: duplicate text shares the same space in the texture, and the texture can be re-used across frames.
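A text atlas with this kind of cache could look roughly like the Python sketch below. It assumes a simple "shelf" packer (fill a row left to right, then start a new row), which is a swapped-in technique, since the post doesn't say how space in the atlas is allocated. `TextAtlas` and `place` are hypothetical names, and a real cache key would probably include the font as well as the string.

```python
class TextAtlas:
    """Hands out rectangles in one shared texture, one per distinct string."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self._slots = {}   # text -> (x, y, w, h)
        self._x = 0        # next free x on the current shelf
        self._y = 0        # top of the current shelf
        self._row_h = 0    # height of the tallest item on the shelf

    def place(self, text, w, h):
        """Return (slot, is_new). Only new slots need the text drawn in."""
        slot = self._slots.get(text)
        if slot is not None:
            return slot, False          # duplicate text shares its slot

        if w > self.width:
            raise ValueError("text is wider than the atlas")
        if self._x + w > self.width:    # shelf is full, start a new one
            self._x = 0
            self._y += self._row_h
            self._row_h = 0
        if self._y + h > self.height:
            raise ValueError("atlas is full")

        slot = (self._x, self._y, w, h)
        self._slots[text] = slot
        self._x += w
        self._row_h = max(self._row_h, h)
        return slot, True
```

Because `place` reports whether the slot is new, the slow draw-and-upload step only has to run for strings the atlas hasn't seen before.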

These changes got rid of most of the draw calls, but didn't reduce the frame time as much as I hoped. More profiling showed the culprit was Font.GetTextWidth. My display list needs to know the extents of each object so it can handle overlapping objects, which means calling Font.GetTextWidth a lot. I could have cached the entire display list, but I decided to cache the result of Font.GetTextWidth instead because it was easier than modifying my UI rendering code.
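Caching the measurement amounts to memoising on the font and the string. A minimal Python sketch follows. The `measure` callable is a hypothetical stand-in for the real Font.GetTextWidth call, and `font_key` is an assumed identifier for whichever font and size are in use.

```python
class TextWidthCache:
    """Remembers text widths so each (font, string) pair is measured once."""

    def __init__(self, measure):
        # measure(font_key, text) -> width; stands in for the slow call.
        self._measure = measure
        self._widths = {}

    def width(self, font_key, text):
        key = (font_key, text)
        if key not in self._widths:
            self._widths[key] = self._measure(font_key, text)
        return self._widths[key]

    def clear(self):
        """Drop all cached widths, e.g. when leaving a UI screen."""
        self._widths.clear()
```

With this in place the first frame on a screen still pays for every measurement, but later frames only do dictionary lookups.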

For the final touch I turned on vertex buffer buckets in my renderer. If a draw call only has a few vertices you don't want to be uploading a massive, mostly unused vertex buffer to the GPU. I had left buckets off during development because they increase app startup time, but startup takes so long now that I probably won't notice any difference.
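The post doesn't describe how the buckets work, so the Python sketch below is one plausible reading: allocate a handful of vertex buffers of increasing capacity up front (hence the startup cost), then give each draw call the smallest one that fits. `VertexBufferBuckets`, `create_buffer` and the capacities are all assumptions made for illustration.

```python
import bisect


class VertexBufferBuckets:
    """Pre-allocated buffers of several sizes; picks the smallest that fits."""

    def __init__(self, create_buffer, capacities=(64, 256, 1024, 4096)):
        # create_buffer(capacity) is a hypothetical factory for a GPU buffer.
        self._capacities = sorted(capacities)
        self._buffers = [create_buffer(c) for c in self._capacities]

    def pick(self, vertex_count):
        """Return (capacity, buffer) for the smallest bucket that fits."""
        i = bisect.bisect_left(self._capacities, vertex_count)
        if i == len(self._capacities):
            raise ValueError("draw call is larger than the biggest bucket")
        return self._capacities[i], self._buffers[i]
```

A six-vertex quad would then go through the 64-vertex buffer rather than the 4096-vertex one.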

Conclusion

With enough caching, all the really slow things can be moved out of the regular frame update and into initialisation time, which in my case means when moving between UI screens.

Using the correct size vertex buffer and batching things into fewer draw calls helps too.

I suggest reducing the number of times you call these functions:

Font.GetTextWidth (~500 us per call)
Image.DrawText (~1.5 ms per call)
Texture2D.SetPixels
GraphicsContext.DrawArrays

I found that calling Texture2D.SetPixels a few times on large textures is about 10 to 100 times more expensive than calling it lots of times on small textures. I might have done something wrong, but I've reached my target frame time so I'll look at it another time.

Sunday, 6 October 2013
