Monday, August 17, 2020

Path Trace Visualization - Part 2

This is a continuation from Part 1. The main limitation of the results from Part 1 is that you can only capture for a certain number of frames. Also, if you move the camera during the capture, the rays will get recorded from a new point based on where the camera is. For this part I will focus on getting a continuous capture and detaching the camera during the visualization. There are going to be some slight modifications to the algorithm from Part 1. The aim is to replicate the results shown in this video.


Resources Needed 

The text in bold is updated from Part 1.

1. 2 Large structured buffers that store a PathCaptureEntry for each line segment in the path. 

2. 2 "Counter" buffer to maintain how many line segments are there. 

3. Indirect arguments buffer for the instanced indirect line draw call. Initialize the values to {2,0,0,0} (2 is the vertex count and 0 is the instance count; the last two 0's are the vertex/instance offsets).

4. Linear depth buffer to store the depth of the first hit from the camera.

Continuous capture

This feature enabled me to capture the rays continuously so I could observe the patterns when changing the material properties. In order to achieve this, the constant buffer is updated with the fields in bold:

enum PathTraceFlags
{
    PATHTRACE_FLAGS_CONTINUOUSCAPTURE = (1<<0),
};

struct PathTraceVisualizationConstants
{
    uint2 resolution;
    uint2 mousePosition;
    uint maxPathEntryCount;
    uint maxPathFrameCollection;
    uint pathIdFrameNumber;
    int pathDebugId;
    int boundDebugId;

    uint flags;
    float pathCaptureFade;
    //add padding if needed
};

We need 2 buffers and counters in order to pass the previous frame's data from one frame to the next. This way we can add a nice fade to the older rays; for example, since each entry's alpha starts at 1.0, a pathCaptureFade of 0.05 would fade an entry out over roughly 20 frames. The algorithm for recording the path entries is slightly tweaked from the previous part:

1. If this is the first frame of the capture, we can clear pathCaptureEntryCountUav[0] to 0. Otherwise we ping-pong between the 2 buffers/counters: one is from the previous frame and one is for the current frame. Each entry from the previous frame has its alpha reduced based on "pathCaptureFade", and if the alpha reaches 0 or less, it gets rejected. The rest are appended to the current buffer. Following is the compute shader for that.

[numthreads(64, 1, 1)] 
void CS_PathTraceUpdateCapturedPaths(uint3 threadId : SV_DispatchThreadID) 
{
    if (threadId.x < pathCaptureEntryCount[0]) 
    {
        PathCaptureEntry entry = pathCaptureEntries[threadId.x]; 
        entry.alpha -= constantsCB.pathCaptureFade; 
        //we don't care about pathId if it's a continuous capture
        uint pathId = (constantsCB.flags & PATHTRACE_FLAGS_CONTINUOUSCAPTURE) ? 0 : entry.pathId; 
        //do not copy entry if alpha <= 0.0f
        if (entry.alpha > 0.0f) 
        {
            uint currentIndex = 0; 
            InterlockedAdd(pathCaptureEntryCountUav[0], 1, currentIndex); 
            if (currentIndex < constantsCB.maxPathEntryCount) 
            {
                pathCaptureEntriesUav[currentIndex] = entry; 
            }
        }
    }
}

2. Select a pixel on the screen (mouse click/text entry/hard coded) and pass that info to the shader.  During path tracing, when you get a hit and the compute shader thread Id matches the pixel position, add the path entry to the buffer if there's enough space in it. This is the same as Part 1. 


3. When updating the indirect args, I reset the previous frame's counter to avoid doing an extra dispatch at the start of the frame. 

[numthreads(1,1,1)]
void CS_UpdateCapturePathIndirectArgs(uint3 threadId : SV_DispatchThreadID)
{
    pathCaptureEntryCountUav[0] = 0; //previous frame's counter
    indirectPathDrawArgsUav[1] = pathCaptureEntryCount[0]; //copying from current frame's counter
}

The rest is the same as Part 1. The current frame's path entries and counter are used for the rest of the passes, and we flip the index that accesses these buffers at the end of the frame. There might be a better way to keep it as a single buffer instead of a double buffer.
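
To make the ping-ponging concrete, here is a minimal sketch of how the two sets of views used by CS_PathTraceUpdateCapturedPaths above could be declared; the SRV register slots are my assumption, and the CPU side simply swaps which physical buffer is bound as "previous" (read) and which as "current" (write) each frame.

//previous frame's entries and counter, read this frame (SRV slots are assumptions)
StructuredBuffer<PathCaptureEntry> pathCaptureEntries : register(t0);
Buffer<uint> pathCaptureEntryCount : register(t1);

//current frame's entries and counter, written this frame
RWStructuredBuffer<PathCaptureEntry> pathCaptureEntriesUav : register(u0);
RWBuffer<uint> pathCaptureEntryCountUav : register(u1);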

Once you reach this point, you'll get a nice fade for the older rays and you do not have to stop capturing. But you'll notice that once you start moving the camera, the selected hit points move along with it, resulting in the following video:


This will be resolved in the next section.

Detaching the camera

In order to fix this issue of the newly spawned rays being attached to the camera, we need to tweak the algorithm a bit more. Instead of having one camera for the scene, we need two cameras: one for the scene, and a second for capturing the paths. Here's the tweaked (again) algorithm.

1. Make sure the camera is not detached yet. This means that both the scene camera and the path capture camera are identical. Select the pixel to be captured. Now you should be able to detach the camera and move freely. This means that only the scene camera is updated based on input, while the path capture camera is not updated. If "detach camera" is unchecked, the scene camera is snapped back to the path capture camera and both are identical and updated all the time.

2. Clearing/updating paths is the same as before.

3. Path trace the scene from the scene camera; skip writing out path entries, but write out the distance to distanceBufferUav.

4. Path trace the scene again, but from the path capture camera. Skip writing out color/distance, but call the AddPathEntry function.

Steps 3/4 can reuse the same function but take an additional bool:

void CS_PathTrace_Common(uint3 threadId, bool capturePaths).

and this can be used in the following places:

if (capturePaths)
{
    AddPathEntry(threadId.xy, entry, true);
}

and

if (!capturePaths)
{
    output[threadId.xy] = float4(result, 0.0f); //color result from path trace
    linearDepthUav[threadId.xy] = firstHitDepth; //distance write out
}

and then you can have 2 compute shaders using the same function

[numthreads(8, 8, 1)] 
void CS_PathTrace(uint3 threadId : SV_DispatchThreadID) 
{
    CS_PathTrace_Common(threadId, false); 
}

[numthreads(8, 8, 1)] 
void CS_PathTrace_CapturePaths(uint3 threadId : SV_DispatchThreadID) 
{
    CS_PathTrace_Common(threadId, true); 
}

This way you do not have to duplicate the code, and the compiler will optimize out the paths that do not get executed for each of those entry points.

5. Use the indirect draw as before.

Once you have this setup, you should be able to get the same results as the very first video.

If you want to see the paths of the rays up to a certain bounce, instead of isolating only that bounce, you can update the vertex shader to the following (update in bold):

PS_PathDrawInput VS_PathDraw(VS_PathDrawInput input) 
{
    PathCaptureEntry pathEntry = pathCaptureEntries[input.instanceID]; 
    //choose between start/end based on vertexID
    float3 position = (input.vertexID & 1) ? pathEntry.endPosition : pathEntry.startPosition; 
    if (constantsCB.pathDebugId >= 0 && constantsCB.pathDebugId != pathEntry.pathId) 
        position = 0.0f; 
    if (constantsCB.boundDebugId >= 0 && constantsCB.boundDebugId < pathEntry.bounce) //this will reject bounces after the specified one 
        position = 0.0f; 
    PS_PathDrawInput output; 
    output.pos = mul(float4(position, 1.f), CameraConstantsCB.viewProjectionMtx);
    output.worldPos = float4(position, 1.0f); 
    output.col = float4(pathEntry.color, pathEntry.alpha); 
    return output; 
}

The following video demonstrates this effect. You can see bounces 0 and 1 together while the rest are rejected:


Now there's still one limitation left to be solved (there could be others too, but I'm ignoring them). The camera cannot be detached when starting a capture. The reason is that the selected pixel is based on the scene camera, but those same values are used for the path capture camera, which could be in a completely different spot. This is handled in the next section.

Start capture while detached

In order for us to start a capture with a detached camera, we need to transform the selected pixel from the scene camera to a pixel coordinate in path capture camera space. Following are the transformations required:

PixelSceneCam ⇒ World Space ⇒ pixelPathCaptureCam

In order to go from PixelSceneCam ⇒ World Space, we need to make use of the distance buffer. This means that the transformation has to be done on the GPU in order to avoid reading back the distance buffer to the CPU. To achieve this, I added an extra buffer:

RWBuffer<uint2> debugMousePositionUav : register(u5);

And an extra dispatch right after the "Path Trace the scene from the scene Camera" step, after the distance buffer is updated, and before the "second path trace from path capture camera" step.

[numthreads(1, 1, 1)] 
void CS_UpdateMousePosition(uint3 threadId : SV_DispatchThreadID) 
{
    //all this can probably be simplified.
    const CameraData camera = CameraConstantsCB; 
    const CameraData cameraCapture = CameraCaptureConstantsCB; 
    //convert mouse pos to world space selected position 
    float2 mousePos = (float2)constantsCB.mousePosition + 0.5f; 
    float2 ndcPos = mousePos * constantsCB.invResolution; 
    ndcPos.y = 1.0f - ndcPos.y; //flip y 
    ndcPos = ndcPos * 2.0f - 1.0f; //convert from [0 1] to [-1 1] 
    ndcPos.x *= camera.aspectRatio; //apply aspect ratio 
    ndcPos *= camera.tanFOV; //apply field of view 
    float3 viewSpaceRay = float3(ndcPos, 1.0f);
    viewSpaceRay = normalize(viewSpaceRay); 
    float3 worldSpaceRay = mul(viewSpaceRay, (float3x3)camera.inverseViewMtx); 
    float linearDepthSample = linearDepth.Load(uint3(mousePos, 0)).x; 
    float3 worldSpacePos = camera.eye.xyz + worldSpaceRay * linearDepthSample; 
    //convert worldSpace pos to capture camera space 
    float4 capturePos = mul(float4(worldSpacePos, 1.0f), cameraCapture.viewProjectionMtx); 
    capturePos.xy /= capturePos.w; 
    capturePos.xy = capturePos.xy * 0.5f + 0.5f; 
    capturePos.y = 1.0f - capturePos.y; 
    float2 mouseCapturePos = capturePos.xy * PathTraceConstantsCB.resolution; 
    //update mouse position 
    debugMousePositionUav[0].xy = uint2(mouseCapturePos); 
}

Now that the transformed pixel position is stored on the GPU, we need to update the AddPathEntry function to use the GPU resource instead (updated section in bold below).

void AddPathEntry(uint2 threadId, PathCaptureEntry entry) 
{
    //this will ensure only 1 thread writes to the instance count and appends to the list 
    if (all(threadId.xy == debugMousePosition[0])) //instead of constantsCB.mousePosition
    {
        if (pathId < constantsCB.maxPathFrameCollection) //i set this to 1000
        {
            uint currentIndex = 0; 
            InterlockedAdd(pathCaptureEntryCountUav[0], 1, currentIndex); 
            if (currentIndex < constantsCB.maxPathEntryCount) 
            {
                pathCaptureEntriesUav[currentIndex] = entry; 
            }
        }
    }
}
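
For completeness, the buffer written by CS_UpdateMousePosition also needs a read view in the shader that calls AddPathEntry; a plausible pair of declarations is shown below (the SRV slot is an assumption).

RWBuffer<uint2> debugMousePositionUav : register(u5); //written by CS_UpdateMousePosition
Buffer<uint2> debugMousePosition : register(t5); //read in AddPathEntry above (slot assumed)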

The neat feature this adds is that you can check if a specific point has any direct hits from the path capture camera. Following is a video that demonstrates this.





That completes the tutorial for visualizing the path trace. This trick has been very helpful for me in catching bugs and understanding how the specular GGX BRDF works. I hope this trick works out for you also. You can use this same trick to collect any sort of debug info for any technique.

Here is the same video from above that uses this feature to visualize how roughness affects the microfacet normal distribution using GGX.




And here is an example of catching a bug.




Thanks for taking the time to check this out. If you have any comments/feedback, feel free to comment below.

Path Trace Visualization - Part 1

It's been some time (again) since I wrote my last post, as I've been super busy with work/family. Due to the pandemic times, I finally got some time to work on my very own personal GPU path tracer. I've been tweeting regular updates on Twitter under the handle @createthematrix. There was one particular update that caught the attention of many folks on Twitter, and that is related to visualizing the path trace itself. Here is the link to that post. So I decided to write a blog post about the implementation, as I feel it could help many folks and could also be applied to other purposes. This feature is very useful for catching issues, understanding how different BRDFs work, sampling schemes, and so on in terms of path tracing.

This is by far not the best way to implement the visualization, but it is an implementation. I have implemented this in C++/HLSL for both the Vulkan and DX12 APIs. My engine abstracts out the explicit Vulkan/DX12 API calls into a graphics interface. Hopefully this page explains the details of the implementation in such a way that you can apply this to your engine as well.

I'll start with explaining how to do a capture from the current camera itself for a certain number of frames. I'm going to assume that the reader already has a path tracer implemented as a compute shader dispatch where each thread represents a pixel on the output. I am also assuming that you have the framework to send data to the GPU via constant buffers.

This is the data used for the path entry:

struct PathCaptureEntry 
{
    float3 startPosition;
    uint pathId;
    float3 endPosition;
    uint bounce;
    float3 color;
    float alpha; 
};

And here are the resources used in the shader:

struct PathTraceVisualizationConstants
{
    uint2 resolution;
    uint2 mousePosition;
    uint maxPathEntryCount;
    uint maxPathFrameCollection;
    uint pathIdFrameNumber;
    int pathDebugId;
    int boundDebugId;
    //add padding if needed
};

ConstantBuffer<PathTraceVisualizationConstants> constantsCB : register(b0);
RWStructuredBuffer<PathCaptureEntry> pathCaptureEntriesUav : register(u0);
RWBuffer<uint> pathCaptureEntryCountUav : register(u1);
RWTexture2D<float> distanceBufferUav : register(u2);

I've excluded all the other resources from the path tracer itself as we're focusing only on the visualization portion. The idea is to generate path entries during the path trace, and then draw them.

Resources Needed

1. Large structured buffer that stores a PathCaptureEntry for each line segment in the path. 

2. "Counter" buffer to maintain how many line segments are there. 

3. Indirect arguments buffer for the instanced indirect line draw call. Initialize the values to {2,0,0,0} (2 is the vertex count and 0 is the instance count; the last two 0's are the vertex/instance offsets).

4. Distance buffer to store the distance of the first hit from the camera. Not calling it a "depth buffer" as this stores distance from the camera and not "nonlinear projected depth".

Storing the path entries

Here is the high level algorithm for writing out the path entries to the buffer :

1. Reset the counter to 0 at start of frame. 

2. Select a pixel on the screen (mouse click/text entry/hard coded) and pass that info to the shader. During path tracing, when you get a hit and the compute shader thread ID matches the pixel position, add the path entry to the buffer if there's enough space in it. Following is the function for it:

void AddPathEntry(uint2 threadId, PathCaptureEntry entry) 
{
    //this will ensure only 1 thread writes to the instance count and appends to the list 
    if (all(threadId.xy == constantsCB.mousePosition)) 
    {
        if (pathId < constantsCB.maxPathFrameCollection) //i set this to 1000
        {
            uint currentIndex = 0; 
            InterlockedAdd(pathCaptureEntryCountUav[0], 1, currentIndex); 
            if (currentIndex < constantsCB.maxPathEntryCount) 
            {
                pathCaptureEntriesUav[currentIndex] = entry; 
            }
        }
    }
}

3. Also write out the distance from the camera to the first hit position into distanceBufferUav. This will be used later.
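
As a rough sketch of what step 3 could look like inside the bounce loop (the hit flag, the fallback miss distance, and the exact placement are my assumptions; the variable names follow the snippets below):

//write the first-hit distance only for the primary ray (bounce 0)
if (i == 0)
{
    float firstHitDistance = hit ? length(hitInfo.worldPosition - ray.startPos) : 1e30f;
    distanceBufferUav[threadId.xy] = firstHitDistance;
}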

Now you can add path entries whenever you have a hit position like this:

PathCaptureEntry hitEntry; 
hitEntry.startPosition = ray.startPos; 
hitEntry.endPosition = hitInfo.worldPosition; 
hitEntry.bounce = i; 
hitEntry.pathId = pathId; 
hitEntry.alpha = 1.0f; 
hitEntry.color = float3(1.0f, 1.0f, 0.0f); 
AddPathEntry(threadId.xy, hitEntry);

For a miss it's this: 

PathCaptureEntry missEntry;
missEntry.startPosition = ray.startPos;
missEntry.endPosition = ray.startPos + ray.direction * 100.0f; //or any distance you want
missEntry.bounce = i;
missEntry.pathId = pathId;
missEntry.alpha = 1.0f;
missEntry.color = float3(0.0f, 1.0f, 1.0f);
AddPathEntry(threadId.xy, missEntry);

For surface normals you could have this:

PathCaptureEntry surfaceNormalEntry;
surfaceNormalEntry.startPosition = ray.startPos;
surfaceNormalEntry.endPosition = ray.startPos + hitInfo.worldNormal * 0.5f; //or any distance you want
surfaceNormalEntry.bounce = i;
surfaceNormalEntry.pathId = pathId;
surfaceNormalEntry.alpha = 1.0f;
surfaceNormalEntry.color = float3(1.0f, 0.0f, 1.0f);
AddPathEntry(threadId.xy, surfaceNormalEntry);

If you have made it this far, then at the end of your path trace dispatch you should have a counter set to some value and some buffer entries, collected for up to constantsCB.maxPathFrameCollection frames, provided you selected a pixel on screen with valid info. 

Here's a screenshot of the path entries buffer in renderdoc:


This is the total count (an example):

And here is the distance buffer:

Now how do we render this info?

Rendering the path entries

This is done through an InstancedIndirect draw call using lines as the primitive. Please do not leave it as a triangle primitive type. Since the indirect args already have the values {2,0,0,0}, we'll just need to update the instance count. Following is a compute shader to do just that.

[numthreads(1,1,1)] 
void CS_UpdateCapturePathIndirectArgs(uint3 threadId : SV_DispatchThreadID) 
{
    indirectPathDrawArgsUav[1] = pathCaptureEntryCount[0]; 
}

As an option you can directly increment indirectPathDrawArgsUav[1] in AddPathEntry, but I decided to keep the counter and indirect arguments separate for easier debugging.
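
For reference, the four uints in indirectPathDrawArgsUav line up with the standard draw-indirect arguments (D3D12_DRAW_ARGUMENTS / VkDrawIndirectCommand); the register slot below is an assumption on my part:

//layout matching the {2,0,0,0} initialization:
//  [0] vertexCountPerInstance = 2 (a line segment is 2 vertices)
//  [1] instanceCount          = filled from the counter each frame
//  [2] startVertexLocation    = 0
//  [3] startInstanceLocation  = 0
RWBuffer<uint> indirectPathDrawArgsUav : register(u3);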

Now that we have the indirect args, the path entries buffer, and a distance buffer, we can do an instanced indirect draw with lines. I added this draw after the tonemapping pass, directly on the swapchain itself. The distance buffer is used for discarding pixels in the pixel shader if they fail the depth test. You could generate a non-linear depth buffer similar to how the graphics pipeline does it and then use the HW depth test, but in my case, I kept it simple. The pathID/bounceID stored from the path trace comes into use here for isolation. This means that you can specify which path or bounce you want. Following are the shader details:

struct VS_PathDrawInput 
{
    uint vertexID : SV_VertexID; 
    uint instanceID : SV_InstanceID; 
}; 

struct PS_PathDrawInput 
{
    float4 pos : SV_POSITION; 
    float4 worldPos : POSITION0; 
    float4 col : COLOR0; 
}; 

PS_PathDrawInput VS_PathDraw(VS_PathDrawInput input) 
{
    PathCaptureEntry pathEntry = pathCaptureEntries[input.instanceID]; 
    //choose between start/end based on vertexID
    float3 position = (input.vertexID & 1) ? pathEntry.endPosition : pathEntry.startPosition; 
    if (constantsCB.pathDebugId >= 0 && constantsCB.pathDebugId != pathEntry.pathId) 
        position = 0.0f; 
    if (constantsCB.boundDebugId >= 0 && constantsCB.boundDebugId != pathEntry.bounce) 
        position = 0.0f; 
    PS_PathDrawInput output; 
    output.pos = mul(float4(position, 1.f), CameraConstantsCB.viewProjectionMtx);
    output.worldPos = float4(position, 1.0f); 
    output.col = float4(pathEntry.color, pathEntry.alpha); 
    return output; 
}

float4 PS_PathDraw(PS_PathDrawInput input) : SV_Target 
{
    float dist = length(input.worldPos.xyz - CameraConstantsCB.eye.xyz); //more optimal to use distSqr
    uint2 pixelPos = (uint2)input.pos.xy; 
    float distanceSample = distanceBuffer.Load(uint3(pixelPos, 0)).x; //depth test in shader
    if (distanceSample < dist) 
        discard; 
    return input.col; 
}

Here's a screenshot of all the captured rays:

With path isolation:



If you have followed these steps, you'll be able to get results similar to the above. While capturing the frames, it is best not to move the camera, as the starting point for the rays will change. We will talk about how to fix this and how to get a continuous capture going with a detached camera in Part 2.

If you have any questions/feedback (or find any issues), feel free to add comments. Thanks for taking the time to read this.