Many moons ago at Insomniac, we used to use a partial derivative scheme to encode normals into texture maps. Artists complained that it was having detrimental effects on their normal maps, and rightly so. Then one day, our resident math guru Mike Day turned me onto this little trick in order to make amends. I hadn’t come across it before so thought it might be worth sharing.
A common technique for storing normals in a texture map is to compact them down to a two-component xy pair and reconstruct z in the pixel shader. This is commonly done in conjunction with DXT5 compression, since you can store one component in the DXT rgb channels and the other in the DXT alpha channel, and they won't cross-pollute each other during DXT compression. DirectX 10 and 11 go a step further and introduce a new texture format, BC5, which caters explicitly to storing two components in a texture map.
The code to reconstruct a 3 component normal from 2 components typically looks like this:
float3 MakeNormalHemisphere(float2 xy)
{
    float3 n;
    n.x = xy.x;
    n.y = xy.y;
    n.z = sqrt(1 - saturate(dot(n.xy, n.xy)));
    return n;
}
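For the DXT5 route, the shader-side fetch might look something like this (a sketch assuming the common DXT5nm-style layout, with x in the alpha channel and y in the green channel; the exact channel choice is an assumption and may differ from your packing):

float3 SampleNormalHemisphere(Texture2D normalMap, SamplerState samp, float2 uv)
{
    // x was stored in alpha, y in green; unpack from [0, 1] back to [-1, 1]
    float2 xy = normalMap.Sample(samp, uv).ag * 2 - 1;
    return MakeNormalHemisphere(xy);
}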
Graphically, what we're doing is projecting points on the xy plane straight up and using the height of intersection with a unit hemisphere as our z value. Since the resulting point (x, y, z) lies on the surface of the unit sphere, no normalization is required.
Is this doing a good job of encoding our normals? Ultimately the answer depends on the source data we're trying to encode, but let's assume that we make use of the entire range of directions over the hemisphere, with a good proportion of those nearing the horizon. In the previous plot, the quads on the surface of the sphere are proportional to the solid angle covered by each texel in the normal map. As our normals near the horizon, we can see these quads are getting larger and larger, meaning that we are losing more and more precision as we try to encode them. This can be illustrated in 3D by plotting the ratio of the differential solid angle over the corresponding 2D area we'd get from projecting it straight down onto the xy plane. Or equivalently, since the graph is symmetrical around the z-axis, we can take a cross section and show it on a 2D plot.
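For the hemisphere scheme, this ratio has a simple closed form. A texel at distance r = sqrt(x*x + y*y) from the center lands on the sphere at height z = sqrt(1 - r*r), and the area it picks up on the sphere grows as 1/z relative to its footprint on the xy plane, so:

ratio(r) = 1 / sqrt(1 - r*r)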
The vertical asymptotes at -1 and 1 (or 0 and 1 in uv space if you prefer) indicate that as we near the borders of our normal map texture, our precision gets worse and worse until eventually we have none. Your mileage may vary, but in our case, we tended to use wide ranges of directions in our normal maps and would like to trade off some of the precision in the mid-regions for some extra precision at the edges. Precision in the center is the most important so let’s try and keep that intact. What options do we have? Well it turns out we can improve the precision distribution to better meet our needs with a couple of simple changes…
Nothing is constraining us to using a hemisphere to generate our new z component. By experimenting with different functions which map xy to z, better options become apparent. One function we have had success with is that of a simplified inverted elliptic paraboloid (that's a mouthful, but it's simpler than it sounds).
z = 1 - (x*x + y*y)
This is essentially a paraboloid with the a and b terms both set to 1, and inverted by subtracting the result from 1. Here is a graphical view:
Notice that we don't have the large quads at the horizon anymore and, overall, the distribution looks a lot more evenly spread over the surface. Wait a second though... this doesn't give us a normalized vector anymore! Fortunately, the math to reconstruct the normalized version costs roughly the same as before: we now need a normalize operation, but we save on the explicit square root.
float3 MakeNormalParaboloid(float2 xy)
{
    float3 n;
    n.x = xy.x;
    n.y = xy.y;
    n.z = 1 - saturate(dot(n.xy, n.xy));
    return normalize(n);
}
Below, the overlaid green plot shows the angle to texture area ratio of the new scheme. The precision is still highest in the center where we want it, and we’re trading off some precision in the mid-ranges for the ability to better store normals near the horizon.
Looking at the 3D graphs, another observation is that we're wasting chunks of the valid range in our encodings. The encodable xy pairs only fill the unit disc (area pi) inside the full [-1, 1] square (area 4), so a quick calculation, (4 - pi) / 4, shows that ~21% of the possible xy combinations are going to waste. A straightforward extension to the scheme would be to use a function which covers the entire [-1, 1] range in x and y to make use of this lost space. There are an infinite number of such functions, but a particularly simple one is this sort of dual inverted paraboloid shape.
z = (1 - x*x) * (1 - y*y)
which looks like:
One downside of this is that we need a little more math in the shader to reconstruct the normal. Because of this, on current-generation consoles, we went with the basic inverted paraboloid encoding. Going forward though, with the ratio of ALU ops to memory accesses still getting higher and higher, the latter scheme might make more sense to squeeze a little extra precision out.
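For reference, the reconstruction for the dual version might look something like this (my sketch, not the code we shipped):

float3 MakeNormalDualParaboloid(float2 xy)
{
    float3 n;
    n.x = xy.x;
    n.y = xy.y;
    // evaluate z = (1 - x*x) * (1 - y*y) on the surface, then normalize
    n.z = (1 - xy.x * xy.x) * (1 - xy.y * xy.y);
    return normalize(n);
}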
Finally, here is the code used when generating the new normal map texels (the initial circular version, not the second square version). The input is the normal you want to encode and the output is the 2D xy pair which is written into the final normal map.
Vec2 NormalToInvertedParaboloid(Vec3 const &n)
{
    // Solve a*t^2 + b*t + c = 0 for the t where the ray from the origin
    // through n hits the paraboloid z = 1 - (x*x + y*y)
    float a = (n.x * n.x) + (n.y * n.y);
    float b = n.z;
    float c = -1.f;

    // guard against dividing by zero for the straight-up normal (0, 0, 1),
    // which encodes to (0, 0)
    Vec2 p;
    if (a < 1e-8f)
    {
        p.x = 0.f;
        p.y = 0.f;
        return p;
    }

    float discriminant = b*b - 4.f*a*c;
    float t = (-b + sqrtf(discriminant)) / (2.f * a);
    p.x = n.x * t;
    p.y = n.y * t;
    return p; // scale and bias as necessary
}
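As a quick sanity check (a hypothetical test, not part of the original tool), you can round-trip a normal through the encoder and the CPU-side equivalent of MakeNormalParaboloid:

// unit-length test normal: 0.48^2 + 0.6^2 + 0.64^2 = 1
Vec3 n;
n.x = 0.48f; n.y = -0.6f; n.z = 0.64f;
Vec2 p = NormalToInvertedParaboloid(n);

// decode exactly as the shader would
float z = 1.f - (p.x * p.x + p.y * p.y);
float len = sqrtf(p.x * p.x + p.y * p.y + z * z);
// (p.x / len, p.y / len, z / len) should match n to within float precision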
A question came up asking whether the paraboloid normal representation combines easily with other normals, like partial derivative normals do. I would say not as easily, but combining normals isn't a lot of work regardless of format. One approach I've used in the past is to represent primary normal maps in paraboloid format and represent further detail maps with partial derivatives (it seems a fair tradeoff since horizontal normals aren't as critical for detail maps). This leads to the following code for combining them together:
float3 CombineParaboloidNormalWithPD(float2 pn, float2 pd)
{
    // reconstruct the base height, then offset the base xy by the detail
    // map's partial derivatives scaled by that height
    float nz = saturate(1 - pn.x * pn.x - pn.y * pn.y);
    float3 combined;
    combined.xy = (pd * -nz) + pn;
    combined.z = nz;
    return normalize(combined);
}
Hi,
Using the very nice tool from http://aras-p.info/texts/CompactNormalStorage.html, I (tried to) implement your method, but I ended up with wrong results, especially in "corners" (you'll see). You can try it yourself by downloading the tool and adding a text file in /data:
//=================================================
half4 encode (half3 n, float3 view)
{
    float3 src = n;
    float2 srcSq = src.xy * src.xy;
    float _4a = 4 * (srcSq.x + srcSq.y);
    float b = src.z;
    float discriminant = (b*b) + _4a; // -4ac changed to +4a since c == -1
    //float t = (-b + sqrt(discriminant)) / (2.f * a); // /2a becomes /4a since we should mult. res by 0.5
    float t = (-b + sqrt(discriminant)) / _4a;
    float2 res = float2(src.xy) * t; // *0.5
    res += float2(0.5, 0.5);
    return half4 (res, 0.0, 0.0);
}
half3 decode (half2 enc, float3 view)
{
    float2 src = enc;
    src = src - float2(0.5, 0.5);
    src *= 2.0;
    // the encode above targets the circular paraboloid, so reconstruct
    // z = 1 - (x*x + y*y) rather than the dual (1 - x*x) * (1 - y*y) version
    float3 res = normalize(float3(src, 1.0 - dot(src, src)));
    return res;
}
//=================================================
Note, the error is way smaller with an orthographic view: replace line 366 in tester.cpp with
D3DXMatrixOrthoLH ( &s_ProjMatrix, float(d3d::SCR_WIDTH)/150.0, float(d3d::SCR_HEIGHT)/150.0, 0.1f, 100.0f );
That's annoying because I really liked your idea :/
Hi Kaseigan
Thanks for the comment. I had read that page before but hadn't played around with the test app - nice little app. Your implementation is spot on, but the issue is that Aras is looking at encoding schemes for viewspace normals, whereas this method (and method #1 on that page) only concerns itself with normals which live strictly within a hemisphere. The common use case for this scheme would be encoding tangent-space normal maps. Viewspace normals are different in that they can and do go outside the hemisphere; the wider the FOV, the wider the range of normals you need to be able to encode. That would explain why you were seeing less error when you switched to an orthographic view, in which case viewspace normals would be constrained to within a hemisphere again. The pdf link on that page in the description of Method #1 goes into detail on why viewspace normals combined with perspective projection may go outside the hemisphere. Hope that helps.