Thursday, September 29, 2016

libsquish's DXT1 "Cluster Fit" method applied to ETC1

libsquish (a popular DXT encoding library) internally uses a total-ordering-based method to find high-quality DXT endpoints. The same method can be applied to ETC1 encoding: using the equations in the remarks in rg_etc1's optimizer, you can solve for the optimal subblock color given each possible selector distribution in the total ordering, the current best intensity table index, and the current best subblock color.

I don't actually compute the total ordering; instead, I iterate over all selector distributions present in the total ordering, because the actual per-pixel selector values don't matter to the solver. A hash table prevents the optimizer from evaluating a trial solution more than once.
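
Here's a minimal sketch of the idea in C++ (my reconstruction, not the actual basislib code; quantization and error evaluation are elided). For a given selector distribution (n0,n1,n2,n3) summing to 8, the least-squares-optimal subblock color is just the average pixel color minus the average intensity modifier:

#include <cstdint>
#include <unordered_set>

// ETC1's standard intensity modifier tables.
static const int g_etc1_mods[8][4] = {
    {   -8,  -2,  2,   8 }, {  -17,  -5,  5,  17 }, {  -29,  -9,  9,  29 },
    {  -42, -13, 13,  42 }, {  -60, -18, 18,  60 }, {  -80, -24, 24,  80 },
    { -106, -33, 33, 106 }, { -183, -47, 47, 183 }
};

struct Color3F { float r, g, b; };

// Visit every selector distribution in the total ordering for one 8-pixel
// subblock, given a fixed intensity table index.
void cluster_fit_subblock(const Color3F pix[8], int table)
{
    Color3F avg = { 0, 0, 0 };
    for (int i = 0; i < 8; i++) { avg.r += pix[i].r; avg.g += pix[i].g; avg.b += pix[i].b; }
    avg.r /= 8; avg.g /= 8; avg.b /= 8;

    std::unordered_set<uint32_t> tried; // skip duplicate quantized trial colors

    for (int n0 = 0; n0 <= 8; n0++)
        for (int n1 = 0; n1 <= 8 - n0; n1++)
            for (int n2 = 0; n2 <= 8 - n0 - n1; n2++)
            {
                const int n3 = 8 - n0 - n1 - n2;
                const int* m = g_etc1_mods[table];
                const float avg_mod = (n0 * m[0] + n1 * m[1] + n2 * m[2] + n3 * m[3]) / 8.0f;

                // Least-squares-optimal (unquantized) subblock color for this
                // distribution: average pixel color minus average modifier.
                const float r = avg.r - avg_mod, g = avg.g - avg_mod, b = avg.b - avg_mod;

                // Next: quantize (r,g,b) to 5:5:5 or 4:4:4, pack it into a key,
                // and only if tried.insert(key).second evaluate the trial's
                // actual error against the best found so far.
                (void)r; (void)g; (void)b;
            }
}

The triple loop visits all C(11,3) = 165 distributions per intensity table, and many of them quantize to the same trial color, which is why the hash table pays off.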

Single-threaded results:

perceptual: 0 etc2: 0 rec709: 1
Source filename: kodak\kodim03.png 768x512

--- basislib Quality: 4
basislib time: 5.644
basislib ETC image Error: Max:  70, Mean: 1.952, MSE: 8.220, RMSE: 2.867, PSNR: 38.982, SSIM: 0.964853

--- etc2comp effort: 10
etc2comp time: 75.792851
etc2comp Error: Max:  75, Mean: 1.925, MSE: 8.009, RMSE: 2.830, PSNR: 39.095, SSIM: 0.965339

--- etcpak time: 0.006
etcpak Error: Max:  80, Mean: 2.492, MSE: 12.757, RMSE: 3.572, PSNR: 37.073, SSIM: 0.944697

--- ispc_etc time: 1.021655
ispc_etc1 Error: Max:  75, Mean: 1.965, MSE: 8.280, RMSE: 2.877, PSNR: 38.951, SSIM: 0.963916



After enabling multithreading (40 threads) in those encoders that support it:

J:\dev\basislib1\bin>texexp kodak\kodim03.png
perceptual: 0 etc2: 0 rec709: 1
Source filename: kodak\kodim03.png 768x512

--- basislib Quality: 4
basislib pack time: 0.266
basislib ETC image Error: Max:  70, Mean: 1.952, MSE: 8.220, RMSE: 2.867, PSNR: 38.982, SSIM: 0.964853

--- etc2comp effort: 10
etc2comp time: 3.608819
etc2comp Error: Max:  75, Mean: 1.925, MSE: 8.009, RMSE: 2.830, PSNR: 39.095, SSIM: 0.965339

--- etcpak time: 0.006
etcpak Error: Max:  80, Mean: 2.492, MSE: 12.757, RMSE: 3.572, PSNR: 37.073, SSIM: 0.944697

--- ispc_etc time: 1.054324
ispc_etc1 Error: Max:  75, Mean: 1.965, MSE: 8.280, RMSE: 2.877, PSNR: 38.951, SSIM: 0.963916

Intel is doing some kind of amazing SIMD dark magic in there. The ETC1 cluster fit method is around 10-12x faster than both rg_etc1 (which uses my previous method) and etc2comp in ETC1 mode. RGB Avg. PSNR is usually within ~0.1 dB of Intel's.

I'm so tempted to update rg_etc1 with this algorithm, if only I had the time.

Wednesday, September 28, 2016

LodePNG

So far this is a nice-looking library, and I've heard it reliably handles 16-bit/component .PNGs:

http://lodev.org/lodepng/
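
For instance, decoding a 16-bit/component PNG into RGBA with its C++ API looks roughly like this (a sketch from my reading of the header; lodepng returns 16-bit samples in big-endian order):

#include <vector>
#include "lodepng.h"

bool load_png16(const char* filename, std::vector<unsigned char>& pixels,
                unsigned& width, unsigned& height)
{
    // Request RGBA output at 16 bits/component (8 bytes per pixel).
    unsigned error = lodepng::decode(pixels, width, height, filename, LCT_RGBA, 16);
    return error == 0;
}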

An interesting ETC1/2 encoding test vector

Here's the 4x4 test vector image (zoomed in 32X for ease of visibility), provided to me by John Brooks and Victor Reynolds at Blue Shift:

Red pixel: 255,0,0
Blue pixel: 0,0,255

Seems simple enough, right? Here's what happens with the various encoders (in non-perceptual mode if the encoder supports the flag), using up-to-date versions from early last week, and non-perceptual RGB avg. metrics for both PSNR and SSIM:

etcpak (PSNR: 15.612, SSIM: 0.265737):

Red pixel: 93,60,93
Blue pixel: 51,18,51

etc2comp ETC1 (PSNR: 17.471, SSIM: 0.372446):


Red pixel: 111,60,60
Blue pixel: 60,60,111

Intel ISPC (PSNR: 24.968, SSIM: 0.587142):


Red pixel: 234,47,47
Blue pixel: 47,47,234

basislib_etc1 from yesterday (PSNR: 19.987, SSIM: 0.511227):

Red pixel: 149,47,47
Blue pixel: 47,47,149

etc2comp ETC2 (PSNR: 19.779, SSIM: 0.517508):



Red pixel: 255, 0, 0
Blue pixel: 64,64,98

This is an example of a well-tuned ETC1 encoder (Intel's) holding its own vs. etc2comp in ETC2 mode.

Want a little challenge? Try to figure out how Intel's encoder produced the best output.

John Brooks, the lead on etc2comp, told me that BSI is working with that test image because it's a known low-quality encoding pattern for etc2comp. It wasn't in their test corpus, so the PSNRs of 17 and 19 should improve with future etc2comp iterations.

I've improved basislib's handling of this test vector, but the results now need an optimization pass. I've prototyped a version of squish's total ordering method in ETC1 by applying the equations in the remarks in rg_etc1.cpp. Amazingly, it was quality-competitive with rg_etc1's current algorithm on my first try of the new method, but it's slower.

Tuesday, September 27, 2016

How to use crunch's GPU block encoder test vector generator

This option selects a different mode of operation from crunch's usual texture file conversion role. It causes the tool to crawl through a directory and load every .PNG file there. It then randomly selects a percentage of the 4x4 pixel blocks from each image and appends the results into one or more 4096x4096 output images. These output images can then be used as test vectors to compare different block encoders.

crunch -corpus_gen -deep .035 -width 4096 -height 4096 -in J:\dev\test_images\*.png

You can specify multiple -in arguments, and -in @file.txt loads a textual listing file of files/directories to load or scan.
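
For example, a listing file passed as -in @file.txt is just one file or directory per line (hypothetical paths):

J:\dev\test_images\
J:\dev\more_images
J:\dev\single_image.png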

The -corpus_test option can be used to compare the different DXT encoders supported by crunch, using images generated using -corpus_gen.

Here's a very zoomed-in example from the test vector generator:



Notice how the blocks are sorted using the sum of the R, G, and B standard deviations as the key.
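
A sketch of that sort key (my reconstruction, not crunch's actual code): compute each channel's standard deviation over the block's 16 pixels and sum the three results.

#include <algorithm>
#include <cmath>
#include <cstdint>

// Sort key for one 4x4 block: stddev(R) + stddev(G) + stddev(B).
float block_sort_key(const uint8_t* rgb /* 16 pixels * 3 components */)
{
    float key = 0.0f;
    for (int c = 0; c < 3; c++)
    {
        float sum = 0.0f, sum_sq = 0.0f;
        for (int i = 0; i < 16; i++)
        {
            const float v = rgb[i * 3 + c];
            sum += v;
            sum_sq += v * v;
        }
        const float mean = sum / 16.0f;
        key += std::sqrt(std::max(0.0f, sum_sq / 16.0f - mean * mean));
    }
    return key;
}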

Sunday, September 25, 2016

More on SSIM

This paper is referenced in the SSIM article on Wikipedia:

"A comprehensive assessment of the structural similarity index"
http://link.springer.com/article/10.1007/s11760-009-0144-1
"In this paper, it is shown, both empirically and analytically, that the index is directly related to the conventional, and often unreliable, mean squared error. In the first evaluation, the two metrics are statistically compared with one another. Then, in the second, a pair of functions that algebraically connects the two is derived. These results suggest a much closer relationship between the structural similarity index and mean squared error."
"This research, however, appears to be the first to directly consider the statistical relationships between the two methods. As well, this work develops a pair of mathematical functions that directly link the two. Given these findings, one is left to question whether the structural similarity index is ready for widespread adoption."
Interesting! I get the feeling there's more to SSIM than meets the eye. Unfortunately, this paper is behind a paywall. Another quote from the paper:
"These findings suggest a reasonably significant level of correlation between the SSIM and MSE. Values range from r = 0.6364 to r = 1.0000, with an average of r = 0.9116 and a variance of 0.007. An average this large, along with a small variance, suggests that most of the correlations are decidedly significant. Clearly, when ordering coded images, the SSIM and MSE often choose similar arrangements. Results such as this are likely a sign of a deeper relationship between the two methods."
Hmm, okay. So MSE and SSIM are highly correlated. The paper even has simple algorithms to convert between MSE<->SSIM. Perhaps I could use these algorithms to help optimize my SSIM code. (Just joking.) From the conclusion:
"Collectively, these findings suggest that the performance of the SSIM is perhaps much closer to that of the MSE than some might claim. Consequently, one is left to question the legitimacy of many of the applications of the SSIM."
Got it. Here's another interesting paper, this one not behind a paywall:

"Mystery behind similarity measures MSE and SSIM"
https://pdfs.semanticscholar.org/8a92/541e46fc4b8237c4e611401d601c8ecc6893.pdf

Some quotes:
"We see that it is based on the same sample moments and correlation coefficient as MSE. So this is the first observation/property or mystery revealed about MSE and SSIM: both measures are composed of the same parameters which are only combined in a different way."
"So the third observation for SSIM is its instability around zero point (0,0) and the fourth one – it can be used only for data of the same sign. The authors of SSIM solve these problems by introducing small constants and restricting the usage to non-negative data only, respectively."
"The fifth observation for Dice measure and thus for SSIM too is that it depends on the absolute values of input parameters. First, it is insensitive at all if one of the parameters is equal 0. Secondly, its sensitivity is decreasing by the increase of absolute parameter values."
Hmm, none of that sounds great to me. They go on to introduce their own metric they call CMSC, and claim "all proposed measures are free of drawbacks of MSE and SSIM and thus are more suitable as objective similarity/quality measures not only for the images but any signals."
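
For reference, the standard SSIM definition really is built from the same sample moments that MSE decomposes into:

SSIM(x, y) = ((2*mu_x*mu_y + C1) * (2*sigma_xy + C2)) / ((mu_x^2 + mu_y^2 + C1) * (sigma_x^2 + sigma_y^2 + C2))

MSE(x, y) = (mu_x - mu_y)^2 + sigma_x^2 + sigma_y^2 - 2*sigma_xy

where mu_x, mu_y are the window means, sigma_x^2, sigma_y^2 the variances, sigma_xy the covariance, and C1, C2 the small stabilizing constants mentioned above.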

John Brooks at Blue Shift experimented with using SSIM in his new ETC1/2 encoder, etc2comp. In a conversation about SSIM, he said that:
"It [SSIM] becomes insensitive in high-contrast areas. SSIM is all about matching contrast & structure. But Block Truncation Coding by its nature is increasing contrast because it posterizes color transitions to 4 selector values. This made the encoder freak out and try to reduce contrast to compensate, making the encoding look crappy. I think it might be the right tool for high-level jobs, but was a poor tool for driving low-level encoder behavior."
"BTC trades 16 shades for 4 which means sharper transitions and more contrast when measured against the original. It also usually means less structure than the original due to posterizing 16-to-4. But neither artifact can be controlled by the encoder as they are a result of the encoding, so it's very hard to navigate the encoding search space when SSIM is so outside its design parameters."
Sounds pretty reasonable to me. I'm going to be doing some testing using an ETC1 encoder optimized for SSIM very soon. Let's see what happens.

Image error metrics

While developing and refining crunch I used a matrix of statistics like this:

RGB Total   Error: Max:  73, Mean: 17.404, MSE: 176.834, RMSE: 13.298, PSNR: 25.655, SSIM: 0.000000
RGB Average Error: Max:  73, Mean: 5.801, MSE: 58.945, RMSE: 7.678, PSNR: 30.426, SSIM: 0.907993
Luma        Error: Max:  64, Mean: 4.640, MSE: 37.593, RMSE: 6.131, PSNR: 32.380, SSIM: 0.945000
Red         Error: Max:  69, Mean: 5.387, MSE: 52.239, RMSE: 7.228, PSNR: 30.951, SSIM: 0.921643
Green       Error: Max:  70, Mean: 5.052, MSE: 48.298, RMSE: 6.950, PSNR: 31.291, SSIM: 0.934051
Blue        Error: Max:  73, Mean: 6.966, MSE: 76.296, RMSE: 8.735, PSNR: 29.306, SSIM: 0.868285

I computed these stats from a PNG image uploaded by @dougallj showing the progress he's been making on his experimental ETC1 encoder with kodim18, originally from here:


The code that computes this stuff is actually used by the DXT1 front-end to determine how the 8x8 "macroblocks" should be tiled.

The per-channel stats are useful for debugging, and for tuning the encoder's perceptual RGB weights (which are only used when the compressor is in perceptual mode). They're also handy for getting a rough idea of what weights a closed-source block encoder uses.
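
Here's roughly how each row of that matrix gets computed (a simplified sketch, not the exact crunch code). For 8-bit components, PSNR = 20*log10(255/RMSE):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct ErrorStats { int max_err; double mean, mse, rmse, psnr; };

// Error stats between two 8-bit component arrays of length n
// (one channel, or all channels interleaved for the "total" row).
ErrorStats compute_error_stats(const uint8_t* a, const uint8_t* b, size_t n)
{
    int max_err = 0;
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; i++)
    {
        const int e = std::abs((int)a[i] - (int)b[i]);
        max_err = std::max(max_err, e);
        sum += e;
        sum_sq += (double)e * e;
    }
    ErrorStats s;
    s.max_err = max_err;
    s.mean = sum / n;
    s.mse = sum_sq / n;
    s.rmse = std::sqrt(s.mse);
    // Clamp PSNR when the images are identical (RMSE of 0).
    s.psnr = (s.rmse > 0.0) ? 20.0 * std::log10(255.0 / s.rmse) : 999.0;
    return s;
}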

Here's a useful PCA paper I found while writing HW1's renderer

I used this technique in a real-time GPU DXT1 encoder I wrote around 10 years ago:

"Candid Covariance-Free Incremental Principal Component Analysis"
http://www.cse.msu.edu/~weng/research/CCIPCApami.pdf

With this approach you can compute a decent-enough PCA in a few lines of shader code.
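
Here's a rough CPU-side sketch of the paper's first-component update, applied to mean-subtracted sample colors (my paraphrase, without the paper's amnesic parameter; on the GPU it boils down to a few MADs per sample):

#include <cmath>

struct Vec3 { float x, y, z; };

static float dot3(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Candid covariance-free incremental PCA, first principal component only.
// 'samples' must already have the mean subtracted; assumes n >= 1.
Vec3 estimate_principal_axis(const Vec3* samples, int n)
{
    Vec3 v = samples[0]; // initial eigenvector estimate
    for (int i = 1; i < n; i++)
    {
        const float len = std::sqrt(dot3(v, v));
        if (len < 1e-10f) { v = samples[i]; continue; }

        // v <- (i/(i+1)) * v + (1/(i+1)) * (u . v / |v|) * u
        const float w = dot3(samples[i], v) / len;
        const float t = 1.0f / (i + 1);
        v.x = (1.0f - t) * v.x + t * w * samples[i].x;
        v.y = (1.0f - t) * v.y + t * w * samples[i].y;
        v.z = (1.0f - t) * v.z + t * w * samples[i].z;
    }
    return v; // direction is the principal axis; length estimates the eigenvalue
}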

HW1 used this encoder to compress all of the GPU splatted terrain textures into a GPU texture cache. One of my coworkers, Colt McAnlis, designed and wrote the game's amazing terrain texture caching system.