Friday, September 13, 2013

Accurate Luminance calculation using Pixel Shader

I recently completed my implementation of tonemapping using a few techniques. My aim was to compare the performance of the two luminance calculation implementations: Pixel Shader vs Compute Shader. For the pixel shader version, I used the usual downsample technique, in which a single bilinear tap is taken to get the average of 4 pixels at a time into a render target a quarter the size of the previous one, until it reaches a 1*1 size. For the compute shader technique, I followed the Microsoft sample and implemented it using 2 passes (for an 800*600 resolution). I shall talk about this in my next post. Let's focus on the pixel shader version.
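The idea of the chain is easy to show outside the shader. Here is a small sketch in Python rather than HLSL (a CPU simulation I wrote for this post, not the actual shader code), for a power-of-two square target:

```python
def downsample_2x2(img):
    """Average each 2*2 block of a square power-of-two image --
    the value a single bilinear tap at the block centre returns."""
    n = len(img)
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(n // 2)]
            for y in range(n // 2)]

# Reduce an 8*8 luminance target down to 1*1.
lum = [[(y * 8 + x) * 0.01 for x in range(8)] for y in range(8)]
target = lum
while len(target) > 1:
    target = downsample_2x2(target)

# For power-of-two sizes, the chain reproduces the true average.
true_avg = sum(map(sum, lum)) / 64.0
print(target[0][0], true_avg)
```

For power-of-two sizes the two printed values agree; the rest of this post is about why they stop agreeing for other sizes.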

After I completed the implementations, I noticed a difference between the average luminance calculated by the two techniques, and it was hard to determine which one was giving me the wrong result. So I made a third technique, purely for debugging: I dispatched a single group with a single thread, and in that shader I looped over all the pixels, took the total sum and divided it by the total number of pixels. I used a 200*200 target for debugging, which was small enough to keep track of what each pixel was doing. Using tools like AMD PerfStudio, I stepped through the shader and kept track of the running total/average.
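That debug pass is trivial to mirror on the CPU; assuming the luminance is available as a 2D array, it is just the brute-force sum (again a Python sketch, not the shader itself):

```python
def brute_force_average(lum):
    """What the single-thread debug shader does: walk every pixel,
    accumulate the sum, divide by the pixel count."""
    total = 0.0
    count = 0
    for row in lum:
        for value in row:
            total += value
            count += 1
    return total / count

print(brute_force_average([[0.25, 0.5], [0.75, 1.0]]))  # prints 0.625
```

Slow, but there is nothing to get wrong, which is exactly what you want from a reference.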

Now, in order to make sure that I was getting the right value, I used an Excel sheet: I entered the luminance calculated at each pixel, and from there I calculated the total and the average. From this exercise, I was able to determine that my debug shader was working correctly. I also simulated how the optimized compute shader version worked on the Excel sheet, and it looked like I was getting the expected value (after fixing a couple of bugs :) )

The pixel shader version did not work the way I expected it to. The Microsoft sample uses a 3*3 reduction pass and stores the average in each pass until we reach a 1*1 texture, which means we take 9 samples in the shader. If you're using a 2*2 reduction shader, you can either take 4 samples and average them, or take a single bilinear tap at the right spot to get the average of 4 pixels, which is what I used.

The pixel shader method is simple enough, and the results looked alright, but it will not be entirely correct if the resolution is not divisible by the reduction pass size. For example, for a 2*2 reduction pass, the resolution should be a power of 2 to end up with the right result; for a 3*3, the resolution should be a power of 3. If that is not the case, the result you end up with will not be correct! I plotted the luminance values for a 200*200 target on an Excel sheet and simulated each of the 2*2 reduction passes until I reached a single pixel. I was surprised that the average I got from this method did not match the true average. The error started to appear as soon as I reduced from an odd-sized target (for a 2*2 reduction pass), and the more times I reduced from an odd-sized target, the worse the error got. I tried the same exercise with a power-of-2 size and the answer was correct. Why does the error occur? For such textures, there are a few issues:

1) When we halve the resolution on each dimension, we usually divide and truncate. So if we have a 3*3 texture and we want to perform a 2*2 reduction, we end up with a 1*1 texture, because we truncated 1.5 -> 1. This is incorrect: now we have only one pixel, and the bilinear tap gives us the wrong value in the 2*2 neighbourhood that the UV coordinate points us to. So, when setting up the reduction pass, use ceil instead of truncating when dividing the resolution by 2. That way you ensure that all the pixels from the higher-res texture are taken into consideration. This means we have to reduce from 3*3 to 2*2 and then to 1*1, instead of going to 1*1 directly.
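You can see the truncation problem numerically with a small CPU model of a GPU bilinear tap with CLAMP addressing (my own Python sketch; the hardware does the same weighting). Truncating 3*3 straight to 1*1 means one tap at UV (0.5, 0.5), which lands exactly on the centre pixel, so the other 8 pixels never contribute. The snippet also prints what the chain of target sizes looks like for the 200*200 debug target once you switch to ceil:

```python
import math

def bilinear_clamp(img, u, v):
    """Model of a GPU bilinear sample with CLAMP addressing.
    Pixel centres sit at (i + 0.5) / size in UV space."""
    n = len(img)
    x, y = u * n - 0.5, v * n - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    def tex(ix, iy):
        return img[min(max(iy, 0), n - 1)][min(max(ix, 0), n - 1)]
    return ((1 - fx) * (1 - fy) * tex(x0, y0)
            + fx * (1 - fy) * tex(x0 + 1, y0)
            + (1 - fx) * fy * tex(x0, y0 + 1)
            + fx * fy * tex(x0 + 1, y0 + 1))

img = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0],
       [7.0, 8.0, 100.0]]

# Truncating 3/2 -> 1 means a single tap at UV (0.5, 0.5)...
tap = bilinear_clamp(img, 0.5, 0.5)
true_avg = sum(map(sum, img)) / 9.0
print(tap, true_avg)  # 5.0 vs ~15.1 -- only the centre pixel survives

# With ceil, a 200*200 target reduces through these sizes:
chain = [200]
while chain[-1] > 1:
    chain.append(math.ceil(chain[-1] / 2))
print(chain)  # [200, 100, 50, 25, 13, 7, 4, 2, 1]
```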

2) The UV coordinates for the current target do not line up with the centre of the 2*2 neighbourhood in the higher resolution texture when reducing from a 3*3 to a 2*2.
If you use the UV coordinates that are passed in as input, chances are you end up with incorrect UVs for sampling the higher-res texture. For example, in the 3*3 texture, the top-left pixel has a pixel coordinate of 0.5, 0.5, which corresponds to the UV coordinate 0.16666667, 0.16666667. So the 3*3 texture has the following texture coordinates, compared to a 2*2 texture:


So when we take a bilinear tap on the 3*3 texture using the UV coordinates from the 2*2 texture, we end up with incorrect weights again, and adding it all up wouldn't match the correct result. The solution is to calculate our own UV coordinates based on the actual dimensions, without applying the ceil or truncate. In this case, if the pixel coordinate is 0.5, 0.5 on the 2*2 texture, we first calculate the actual resolution, which is 1.5 * 1.5 (half of 3*3), and then divide the pixel coordinate by this resolution. We end up with a UV coordinate of 0.3333, 0.3333, which is right in the middle of the top-left quadrant of pixels in the 3*3. Following is the illustration:


The red circles show the incorrect sample spots when taking the UV coordinates directly from the 2*2 texture. I actually thought this would be right and that it would somehow be adjusted by the weights, but if you take a look at the actual values (or the total), they don't match up. The blue circles show the right spots to sample, based on the calculated UV coordinates.
For example: we have a 3*3 texture, and we try to reduce it to a 2*2 using a 2*2 reduction pass. The numbers in the above image show which samples on the 3*3 texture affect which pixels in the 2*2 grid.
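The UV fix boils down to one line: divide the pixel centre by the actual, non-ceiled half resolution of the source. A quick sketch (my own Python illustration) for the 3*3 -> 2*2 case from above:

```python
def tap_uv(px, py, src_size):
    """UV for the bilinear tap on the higher-res texture: pixel centre
    divided by the actual half resolution, which may be fractional."""
    actual = src_size / 2.0   # 1.5 when the source is 3*3
    return ((px + 0.5) / actual, (py + 0.5) / actual)

# Reducing a 3*3 source onto the (ceil-ed) 2*2 target:
print(tap_uv(0, 0, 3))  # ~(0.3333, 0.3333), centre of the top-left quadrant
print(tap_uv(1, 1, 3))  # (1.0, 1.0), the bottom-right corner
# The naive UV, (px + 0.5) / 2, would instead give (0.25, 0.25)
# and (0.75, 0.75) -- the red circles in the illustration.
```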

3) Even with the above fix, the bilinear tap still produces incorrect results, because a sample taken on the edge assumes you are averaging 4 pixels when really only 2 (or even 1) contribute. In the case of the pixels tagged '2' or '3', only 2 samples are required, whereas for '4', only one sample is required. This gives us an incorrect average, and the error grows every time we reduce across an odd edge.

There are a couple of options to fix this error. One option is to find the smallest number the resolution is divisible by and use that as the reduction pass size. For example, let's say you're working with a 10*10 texture: we could first reduce it with a 2*2 pass, and then use a 5*5 pass. The issue with this method is that we would have to make all of those shader variants available, and that could mean a lot of memory consumption. The second option is that instead of storing the average, we keep track of the total on each reduction pass, and at the end we divide that total by the number of pixels in the original target to get the right average. In the case of a 2*2 pass, there are 2 extra steps we need to take:

a. Take a single bilinear tap and multiply the result by 4 before storing it (same as adding the 4 samples up).
b. Use a sampler with BORDER as the addressing mode (instead of the usual CLAMP), with the border color set to black.

Let's use an example to understand why this works. Consider a reduction from a 3*3 texture to a 2*2 texture (using a 2*2 reduction pass):





The numbers signify which pixels from the 3*3 target affect which pixel in the 2*2 target. The blue dots on the 3*3 illustration show the spot where the bilinear sample was taken for each pixel. In the case of pixel 1, the UV coordinate was right at the centre of the 4 pixels tagged '1'. The red boxes show the pixels belonging to the border. In the case of pixels 2 and 3, 2 samples were taken from the texture and the other 2 from the border (signified by the red text). For pixel 4, one sample was taken from the texture, and the other 3 were taken from the border.

If you take the average using the CLAMP addressing mode, you will get the wrong result compared to adding everything up and then averaging the total sum. Since the border color is zero, the border samples don't affect the sum of the pixels on the edges, as illustrated in the figure.

Once you have added up all the values, divide the total by the pixel count of the original target to get the correct average.



Following is a video that shows the difference between not using this method and using it:




Here are the links to the Excel sheets I talked about earlier. You can take a look to understand the kind of values you get from using the average vs using the total:

https://docs.google.com/file/d/0BwsMeT-EawQrVzl0ek1BYVMycGM/edit?usp=sharing
https://docs.google.com/file/d/0BwsMeT-EawQrdk1uaXpqRGZMR0E/edit?usp=sharing