Friday, October 01, 2010

Not again...

Google decided to use the intra-frame coding algorithm of WebM to create a new image format for the web called WebP. And, of course, they make bold claims like "...an average 39% reduction in file size." There's nothing wrong with bold claims, but after reading the details of how they arrived at that conclusion, there are a few problems.

First, Google should stop using PSNR as a metric for image quality. PSNR does not correlate well with mean opinion scores of image quality. Furthermore, this error is particularly egregious because the WebM encoder appears to be PSNR-optimized and it isn't clear that the JPEG/JPEG2000 encoders were. If the JPEG encoder were optimized for SSIM, it may very well score lower on a PSNR test despite being visually superior. This is the No Child Left Behind approach to encoding: Google is engineering encoders to pass the test instead of encoders that actually produce visually stunning results. A much better choice for an objective metric is MS-SSIM. It's not perfect, but it correlates better with mean opinion scores than PSNR does.
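To see why PSNR can mislead, here is a minimal sketch in Python. The test signal and both distortions are made up for illustration, and `global_ssim` is a simplification that applies the SSIM formula once to the whole signal; it is not MS-SSIM and not the windowed SSIM used in practice. The two distortions have exactly the same mean squared error, so PSNR cannot tell them apart, while even this crude structural measure scores them differently.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length pixel lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit samples."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

def global_ssim(a, b, peak=255.0):
    """SSIM formula applied once to the whole signal (no sliding window)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

# Hypothetical test signal: a smooth ramp of 64 pixel values (50..176).
original = [50 + 2 * i for i in range(64)]
# Distortion A: every pixel brightened by 10.
shifted = [p + 10 for p in original]
# Distortion B: +10 / -10 alternating, i.e. high-frequency noise.
noisy = [p + (10 if i % 2 == 0 else -10) for i, p in enumerate(original)]

# Both distortions have the same MSE, so the two PSNR values are identical.
print(psnr(original, shifted), psnr(original, noisy))
# The structural measure rates the uniform shift as closer to the original.
print(global_ssim(original, shifted), global_ssim(original, noisy))
```

A uniform brightness shift and pixel-level noise are not the same thing to look at, yet PSNR reports one number for both. That is the gap a structural metric tries to close.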

The most meaningful measure, of course, comes from actually collecting mean opinion scores, but that involves actual humans, tallying lots of data and generally "making science." In the absence of this, though, why not use the most meaningful objective image quality metric available?

Second, if Google is going to try and woo us with pretty graphs, they should get their axis labels right (see Figure 1).

Third, Google basically took existing JPEG images (90% of all samples), decoded them, and re-encoded them with different encoders to determine which encoder worked best. This method is totally bogus. Once an image is encoded to JPEG, it's going to have block-based artifacts on an 8 by 8 pixel grid due to JPEG's block splitting. For example, this grid is clearly visible in a lot of JPEG images:


You can clearly see the 8 by 8 blocking, particularly around the woman's chest (stop staring, perv!).

Here's another example from the Wikipedia article on JPEG, blown up to better illustrate the blocking artifacts:


Again, you can clearly see the 8x8 grid. This could seriously skew results for the second JPEG compression, particularly if the encode settings are not identical to the first compression (see question 10 here). The data being used in the comparison has already been skewed by a first JPEG compression; thus, the JPEG results (i.e. the most important comparison) are potentially flawed, and it's not clear to me if this would be an advantage or disadvantage for JPEG in the comparison.
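A rough sketch of how that grid could be detected in a decoded image, in Python. Everything here is an assumption for illustration: the synthetic ramp stands in for an uncompressed source, and `block_average` is a deliberately crude stand-in for heavy JPEG quantization (it keeps only each block's mean, which real JPEG does not do). The idea is simply to compare pixel-to-pixel steps that fall on 8-pixel column boundaries against steps that fall elsewhere.

```python
def make_ramp(width, height):
    """Smooth diagonal gradient as a stand-in for an uncompressed source."""
    return [[3 * x + 2 * y for x in range(width)] for y in range(height)]

def block_average(img, block=8):
    """Replace each block x block tile with its mean (crude DC-only stand-in)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            tile = [img[y][x] for y in ys for x in xs]
            mean = sum(tile) / len(tile)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out

def edge_energy(img, block=8):
    """Mean horizontal |step| on block-boundary columns vs. all other columns."""
    on, off = [], []
    for row in img:
        for x in range(1, len(row)):
            step = abs(row[x] - row[x - 1])
            (on if x % block == 0 else off).append(step)
    return sum(on) / len(on), sum(off) / len(off)

source = make_ramp(32, 32)
blocky = block_average(source)

# In the smooth source, boundary and non-boundary steps are the same size.
print(edge_energy(source))
# After the block-wise step, the change is concentrated on the 8-pixel grid.
print(edge_energy(blocky))
```

Only vertical block edges are measured, to keep the sketch short. An image that has already been through JPEG will tend to show this kind of imbalance, which is exactly the structure a second encoder then has to deal with.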

Luckily, this can be fixed--Google should do the comparison on source material in lossless formats like PNG or BMP, which hopefully will not contain existing JPEG artifacts.

4 comments:

Ram Mohan said...

Hello,

The article JPEG vs JPEG XR vs WebP is interesting. The results for JPEG XR in comparison with JPEG and WebP are very poor. Somehow I believe that this is not the case.

Configuring JPEG XR is not so easy. In baseline JPEG, the number of options that need to be configured is minimal. For instance, there is one quantization table for the entire channel.

In JPEG XR, quantization tables can be modeled per tile and per macroblock as well. Also, at low bit rates, it is best to enable the pre-filter option. At high bit rates, an option called the scaled flag might automatically get set (unnecessarily), which might inflate the compressed file size. What I mean to say is that your test environment may not be well tailored for JPEG XR, which would explain its poor performance.

WebP doesn't pick on JPEG XR because it notices that JPEG XR is superior to Weppy.

kidjan said...

@Ram,

You're not making a distinction between encoder and codec. I have every reason to believe JPEG XR is better than my testing showed (in fact, I made note of that point--I felt something must have been wrong, or I must have been using the API incorrectly), but those are the results I got with the encoder built into .NET.

As for whether or not WebP is better than JPEG XR, I disagree: they don't pick on JPEG XR because that format doesn't have any market share. You don't gain market share by picking on the 48 lbs. weakling, hence the obvious focus on JPEG as a competitor.

Ram Mohan said...

@kidjan
I believe the performance of any codec is solely dependent on the encoder (it is what supplies the compression algorithm and the strategy for compression). So when estimating the performance of a codec (which is what is done in this article), it is essential to configure the encoder accurately, or else the codec may underperform (this explains my stress on the encoder instead of the codec).

I also noticed that you mentioned in the article that you might be doing something wrong in the case of XR [:)].

I agree that the market share of XR is very small. But JPEG has been around for 15 years in the web world, so replacing JPEG with XR won't be simple. But one must admit XR is superior, as the JPEG committee itself standardised HD Photo as an official JPEG standard. I have seen WebP's features; they are little match for HD Photo.

kidjan said...

No, the article is estimating the performance of a specific encoder implementation, _not_ a codec. This is what I'm trying to tell you.

For example, the JPEG implementation was the one built into .NET by Microsoft, although I also tested the IJG's reference encoder, which is commonly used in most software.

And no, I absolutely disagree: it _isn't_ clear that JPEG XR is "superior" to JPEG. Nor is it clear that WebP is superior. And this actually relates to your "configure the codec correctly!" comment. For example, JPEG can optionally use arithmetic coding instead of Huffman coding, and in my experience this reaps very, very significant performance gains--enough that WebP's gap all but evaporates. Most JPEG encoders also use very crude settings--non-optimized Huffman tables, integer math and simplistic quantization tables are the norm.

Beyond that, there are a bunch of extensions to JPEG that the community could be using, and they could greatly improve encoder performance.

So I disagree: I think JPEG could easily perform on par with WebP and JPEG XR. It's just a matter of improving encoder performance.