
Using bitmasks for draw call sorting

As we wanted our graphics engine to be completely separate from the rest of the code, we needed a way to guarantee order-independent draw calls. To achieve this, we queue every draw call made to the graphics engine and store it for later; the actual drawing then happens at a single point during the frame rather than at the moment the call is made.

We decided to use a bitmask sorting system as a tool for batching our draw calls. Our method is inspired by a blog post by Christer Ericson.

The idea is to pack every piece of information needed for a draw call into a single key.

In our case we use a 64-bit key describing object type, mesh, material, depth, etc. Other information, such as position, scale and rotation, is sent as a pointer along with the key for use in the actual rendering. We use this key/value interface for all draw calls to our graphics, including lights.

The fields are ordered by significance. If we want to sort by mesh before material, we put the mesh bits in more significant positions than the material bits. With the key packed this way, a simple sorting algorithm can then be used to sort all keys.

For example, imagine we want to sort our draw calls by mesh, and for each mesh we want to draw all instances with the same material together to minimize state changes. Sorting the bitmask keys before rendering will give us the result we want.

So why did we decide to use this method of sending draw calls?

Using this method allows us to have a relatively “dumb” renderer, i.e., a renderer with minimal logic and branching, making it faster as well as making it easier to maintain. It also works well with a data-oriented system.

As we started implementing bitmask sorting, we were surprised by how easy the system was to extend. For example, implementing instancing pretty much amounted to skipping a few of the draw calls in the sorted list. This even applied to lighting, which was easily implemented through the same interface.

Kravall Level Editor

To save on development time we decided to make our editor as a plugin for a 3D modeling application. This prevents us from having to reinvent the wheel, as we receive most of the basic functionality expected from a level editor for free. We have people familiar with 3DS Max, Maya and Blender, but in the end we decided to go with Blender.

Here is a video explaining some of the features of the level editor.

There are several reasons that we decided to go with Blender.

  • It’s free and open source, so if we want to continue using the editor after school is over, we won’t have to invest in expensive software.
  • It works on both Windows and Linux, which is pretty important since 3 out of 11 of us develop on Linux.
  • It has hotkey presets for Maya and 3DS Max, so our artists, who all come from different packages, don’t have to relearn all their hotkeys.

Core Engine Design: Our Entity-Component Based Framework

Even before we began development of Kravall, we knew the game was going to be resource intensive, primarily due to AI systems relying heavily on calculation-intensive algorithms. One way we mitigate this is by moving some of the heaviest algorithms over to the GPU; another is by keeping a performance-focused design philosophy for the primary code paths of the game’s recurring (per-frame) calculations.

In the architecture of Kravall’s engine we follow a design philosophy called DOD (Data Oriented Design). Its primary idea is to minimize memory access delays by designing the program to have memory access patterns that maximize utilization of the CPU cache. This is achieved mainly by allocating memory linearly and tightly packed, having the program access it linearly to minimize cache misses, and exploiting modern processors’ automatic memory-prefetching mechanisms.

A practical implementation of this philosophy, and the primary way this thinking is realized, is our Entity-Component Based Framework, heavily inspired by the Artemis Entity System Framework.

The Entity Component Framework is the central core for all logic driving the game. An Entity is a generic object in the world, consisting of one or more instances of different components, each of which contains a set of data. This setup of components is the identity of the entity and defines how the game engine treats it. As an example: the engine might have a WorldPositionComponent and a VelocityComponent. If we instance (create) a new entity and give it an instance of each, the engine will assume, simply from the topology of the entity, that the velocity values in the VelocityComponent should be applied to the WorldPositionComponent’s world position data each frame, making the entity move in world space (which is then visualized when the rendering system uses the WorldPositionComponent together with a GraphicsComponent). If the components are well designed, this form of programming allows for some interesting and dynamic combinations (e.g., add a sound component to an entity and it will give off a sound from its position).

The manipulation of the components occurs in what we call Systems. Systems manipulate data and move it between the components within (and sometimes between) entities. Each system defines an Aspect, a list of component types it is interested in, essentially subscribing to all entities that match the Aspect, and performs its specific calculation on them in sequential order. For the previous example, we would define a system, run each frame, that performs the VelocityComponent-to-WorldPositionComponent calculation, moving the entity.


The reason an Entity-Component Based Framework works so well in combination with Data Oriented Design is the assumption that one component instance will usually be accessed and processed close in time to other component instances of the same type. The assumption comes from the way systems are designed: each one only manipulates a single setup of components with some specific transformation. We exploit this by allocating all instances of a component type sequentially in one large memory block, one block for each type of component.

Another boost comes from running the same code block in quick succession, without jumping too far around in the program’s executable memory, which can also cause cache misses if the program is exceedingly large. In contrast, large Object Oriented designs can perform very poorly when execution chains become long and inheritance chains deep, possibly causing cache misses that cost many unnecessary CPU cycles spent waiting for memory.

In practice we allocate a large block of memory for each type of component at the start of the program (reallocating when we run out), and then assign parts of these memory blocks to entities as they are created and given a component setup, minimizing allocation cost at run time. As a result, creating, destroying and reclaiming entities and component data involves no system memory allocation. Depending on the creation and destruction patterns of entities, these memory blocks might become fragmented and the iteration order might suffer, which can create a performance loss. The impact and scope of this problem under common usage patterns is unknown, so our implementation does not yet address it, but it is of interest in our future work.

Max Danielsson (Technical Lead)