Saturday, March 30, 2013

Using SDK Meshes from DX11 Tutorials

The DX Tutorials come with a bunch of assets in the SDK Mesh format. The issue with this format, compared to the older .X format, is that SDK Mesh is binary only, whereas .X at least had the option of being in text. The good news is that you can still use all of these assets in your engine. I managed to reuse the SDK Mesh models that the DX Tutorials provide with my own engine.

You could just use the DXUT framework to load them, or, like me, incorporate the loading into your own engine. All of the code is in SDKMesh.h and SDKMesh.cpp in any of the DX samples that use the DXUT framework. If you look at the code, the framework loads the binary file as an array of bytes and then fixes up pointers to point into regions of that array. All of the pointer fixups are based on the offsets in the SDK Mesh header definitions.
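
To make that concrete, here is a minimal sketch of the load-and-fixup pattern. The struct and field names (MeshFileHeader, VertexDataOffset, and so on) are hypothetical placeholders and not the actual SDK Mesh layout; the real headers and offsets live in SDKMesh.h.

#include <cstdint>
#include <fstream>
#include <vector>

// Hypothetical header layout: the real SDK Mesh header has many more fields,
// but the fixup idea is the same -- offsets stored in the file are turned
// into pointers into the loaded byte array.
struct MeshFileHeader
{
    uint64_t VertexDataOffset;   // byte offset of vertex data in the file
    uint64_t IndexDataOffset;    // byte offset of index data in the file
};

struct LoadedMesh
{
    std::vector<uint8_t> bytes;  // whole file, kept alive for the mesh's lifetime
    const uint8_t* vertexData = nullptr;
    const uint8_t* indexData  = nullptr;
};

bool LoadMeshFile(const char* path, LoadedMesh& out)
{
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) return false;

    // Read the entire file into one contiguous byte array.
    const std::streamsize size = file.tellg();
    out.bytes.resize(static_cast<size_t>(size));
    file.seekg(0);
    file.read(reinterpret_cast<char*>(out.bytes.data()), size);

    // "Pointer fixup": interpret the header at the start of the array and
    // convert its stored offsets into pointers into the same array.
    const auto* header = reinterpret_cast<const MeshFileHeader*>(out.bytes.data());
    out.vertexData = out.bytes.data() + header->VertexDataOffset;
    out.indexData  = out.bytes.data() + header->IndexDataOffset;
    return true;
}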

The main thing to remember is the vertex format. The following is the format you have to use for these meshes (at least for the Power Plant model):

struct TexNormal_VS_Input
{
    float3 pos    : POSITION;
    float3 normal : NORMAL0;
    float2 tex    : TEXCOORD0;
};
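
On the C++ side, the matching input layout would look something like this. It is just a sketch under the assumption of a single interleaved vertex buffer with a 32-byte stride; vsBytecode and vsBytecodeSize stand in for your compiled vertex shader blob, and the actual layout should be checked against the vertex buffer header in the mesh.

// Input layout matching TexNormal_VS_Input (assumes one interleaved
// vertex buffer: float3 position, float3 normal, float2 texcoord = 32 bytes).
D3D11_INPUT_ELEMENT_DESC layoutDesc[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0,  0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    0, 24, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};

ID3D11InputLayout* inputLayout = nullptr;
device->CreateInputLayout(layoutDesc, _countof(layoutDesc),
                          vsBytecode, vsBytecodeSize, &inputLayout);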

If there's a difference for other models, I will definitely update this post.
 
Once you have this set up, the rest is the same as the framework. The good thing about the SDK Mesh format is that there are pointer slots reserved for the actual buffers and texture handles, so you do not have to allocate extra memory. All of the work is already done when you load the binary data and do the pointer fixup based on the headers. All you have to do is make sure the buffers and textures are bound to the right slots before you make the draw call. If you have already used shader reflection to determine the binding points for each of your shader resources and built a framework around that (like I did), then setting up the draw call should be straightforward.
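
For reference, a rough sketch of the per-subset binding and draw might look like this. The variable names (vertexBuffer, diffuseSRV, indexCount, and so on) are placeholders, and the index format and binding slots depend on the mesh and on what your shader reflection reports.

// Bind the mesh's buffers and diffuse texture, then draw.
UINT stride = 32;   // size of one TexNormal_VS_Input vertex
UINT offset = 0;

context->IASetInputLayout(inputLayout);
context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
context->IASetIndexBuffer(indexBuffer, DXGI_FORMAT_R32_UINT, 0);  // or R16_UINT, per the mesh
context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);

context->PSSetShaderResources(0, 1, &diffuseSRV);   // slot from shader reflection
context->PSSetSamplers(0, 1, &linearSampler);

context->DrawIndexed(indexCount, 0, 0);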

Following is a screenshot from the engine loading the Power Plant SDK Mesh provided by the DX Tutorials. The shader just displays the diffuse part of the mesh.
 


Next stop: experimenting with different lighting methods (forward vs. deferred vs. tiled forward vs. tiled deferred).

Sunday, March 3, 2013

Task Based Game Engines

It's been a few months since my last blog post, because work has really picked up and I haven't had much time to work on my own engine. But whenever I do get a chance, I poke around my engine a bit to see how crazy I can make it.

So far I managed to do the following:

1. Shader Reflection
2. Add a web server for debug controls through a web page
3. Redo all the DX-based tutorials using a roughly platform-independent architecture (restrict all the platform-dependent stuff to a very low level and keep the higher levels the same)
4. Multi-threading

Today I'm going to talk about the last point: Multi-threading.

I had zero experience in multi-threading. Sure, there was that course during my undergrad where we memorized a few multithreading concepts, but I never had a chance to actually work with it. And now, with the advent of multi-core machines, making a game engine multi-threaded is probably the way to go, because we need to make every cycle count and perform as many computations as possible per frame to reach our goal: creating a perfect world.

A recent discussion with my colleagues inspired me to investigate Task-Based Architectures. After searching around the net, I came across this link:

http://software.intel.com/sites/default/files/m/d/4/1/d/8/Efficient_scaling_in_a_Task-Based_Game_Engine16-9.pdf

After reading this and watching the presentation, I realized that we don't have to spend much time thinking about how to order the threads for the best CPU utilization. This architecture makes it possible to keep all the cores busy, no matter how many there are, executing all of the updates and also the rendering. So I built a basic task-assignment system, and the results are amazing.

Following is an easy way to implement this system (a sketch follows the list):
1. Detect the number of cores
2. Assign the main thread's affinity to Core 0
3. Create N-1 threads (one for each core except Core 0) in a suspended state
4. Have a task scheduler assign tasks to a free thread
5. Once a task is done, let the thread pick up the next task from the task queue
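
Here is a minimal C++11 sketch of steps 4 and 5: a task queue drained by worker threads. This is the general pattern rather than my engine's actual scheduler; detecting cores and pinning threads to them (for example with SetThreadAffinityMask on Windows) are left out for brevity.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class TaskScheduler
{
public:
    explicit TaskScheduler(unsigned workerCount)
    {
        // One worker per core except the main thread's core would be workerCount = N - 1.
        for (unsigned i = 0; i < workerCount; ++i)
            workers.emplace_back([this] { WorkerLoop(); });
    }

    ~TaskScheduler()
    {
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            quit = true;
        }
        wakeUp.notify_all();
        for (auto& w : workers) w.join();
    }

    void Submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            tasks.push(std::move(task));
        }
        wakeUp.notify_one();   // a free thread picks the task up
    }

private:
    void WorkerLoop()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(queueMutex);
                wakeUp.wait(lock, [this] { return quit || !tasks.empty(); });
                if (quit && tasks.empty()) return;
                task = std::move(tasks.front());
                tasks.pop();
            }
            task();   // once done, loop back and grab the next task
        }
    }

    std::vector<std::thread> workers;
    std::queue<std::function<void()>> tasks;
    std::mutex queueMutex;
    std::condition_variable wakeUp;
    bool quit = false;
};

// Usage: unsigned cores = std::thread::hardware_concurrency();
//        TaskScheduler scheduler(cores > 1 ? cores - 1 : 1);
//        scheduler.Submit([] { /* update or render work */ });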

On my laptop with 8 logical cores, it was pretty cool to see all the cores getting used (in Task Manager). Now you don't really have to worry about changing code based on the number of cores; this architecture automatically takes care of that. Also, if you're planning to try out deferred context rendering, this is an awesome way to use it, as you can assign different tasks to build draw commands in parallel and then use the immediate context to execute the deferred contexts in order! This is the plan for me. Watch out for more updates on this.
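
I haven't built this part yet, but the deferred-context pattern in D3D11 looks roughly like the sketch below: each worker records commands into its own deferred context, and the main thread replays the command lists in order on the immediate context. The variable names are placeholders.

// Per worker: record draw commands into a deferred context.
ID3D11DeviceContext* deferredCtx = nullptr;
device->CreateDeferredContext(0, &deferredCtx);

// ... issue the usual state setup and draw calls on deferredCtx ...

ID3D11CommandList* commandList = nullptr;
deferredCtx->FinishCommandList(FALSE, &commandList);

// On the main thread, after all workers are done: execute in order.
immediateContext->ExecuteCommandList(commandList, FALSE);
commandList->Release();
deferredCtx->Release();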