Thursday, November 05, 2009

Statistics Fail: NPR

The news media is all aflutter with the budding Obama Administration/Fox News stand-off. I'll let my biases be known: "Faux" News is great for entertainment, but it is a lousy way to get informed. I would no sooner watch Faux News to inform myself than I would trust Michael Moore to present an argument that doesn't represent the triumph of image over substance, but I suppose that level of critical thinking is beyond Fox's entire target demographic. Glenn Beck is compelling evidence that our country is getting more stupid by the hour; the only thing more shocking than the vapidity of his arguments is that any substantial cross section of the American population would consider them valid. Oh well. C'est la vie.

The above said, my snotty liberal love of nuance and fair-play was accosted by an offensively stupid "poll" posted on NPR:


Yes, the "STATISTIC FAIL YOU FAIL N00B" was necessary. What, can't one geek verbally accost another geek's statistically bogus methodology for conducting a poll?? What, you may ask, is the issue with this? Well, first of all, let's look at the results:

Right away, it's pretty clear that the results have been skewed. In this country, if a president wins the popular vote by two percent or more, it's considered a landslide. So there's obviously something else at play here.

The issue with this poll (and many other common online polls, such as the utterly abysmal and hopelessly misinformed Facebook polls) is, as always, the sampling methodology. Sampling is that portion of statistics concerned with the selection of data points out of a population. The importance of sampling correctly cannot be overstated.

Example: let's say you want to find out if people are in favor of universal health care. One method of sampling would be to go to downtown Portland, OR and talk to people on the street. Another method of sampling would be to go to downtown Provo, UT and talk to people on the street. Now I haven't actually conducted this survey, but I think it is safe to say that I would get completely different results from these two methods. To make observations on the general population of the United States, I would have to take a random (i.e. unbiased) sample that broadly represented the entire country. But because my sample is biased, I can only infer results about my sample population (which, in this case, consists of either Provo, UT or Portland, OR).

Another example: The advice columnist Ann Landers once asked her readers, "If you had it to do over again, would you have children?" A few weeks later, her column was headlined "70% OF PARENTS SAY KIDS NOT WORTH IT." Indeed 70% of the nearly 10,000 parents who wrote in said they would not have children if they could make the choice again. The people who responded felt strongly enough to take the trouble to write Ann Landers. Their letters showed that many of them were angry at their children. These people did not fairly represent all parents. It is not surprising that a statistically designed opinion poll on the same issue a few months later found that 91% of parents would have children again. Ann Landers announced a 70% "No" result when the truth about parents was close to 90% "Yes." Needless to say, these data are worthless as indicators of opinion among all American parents.

And this second example is exactly what this shitty NPR poll amounts to: a voluntary response sample. It is a poll in which the participants themselves decide whether to answer, which precludes it from A) being a random sample and B) supporting any statistically valid inference about the wider population. The only thing this poll establishes is that Faux News is capable of mobilizing its viewers to skew the results of a poll that was statistically bogus to begin with.
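
To put a number on how lopsided a voluntary response sample can get, here is a back-of-the-envelope sketch in C++. It is my own arithmetic, not anything Ann Landers or NPR published: the function names are made up, and the only inputs taken from the example above are the 70% "no" result and the 91% "yes" figure from the designed poll (so roughly 9% true "no").

```cpp
#include <cassert>

// Share of "no" answers you would observe when "no" people and "yes" people
// respond at different rates. Shares and rates are fractions in [0, 1].
double observedNoShare(double trueNoShare, double noResponseRate, double yesResponseRate) {
    assert(trueNoShare >= 0.0 && trueNoShare <= 1.0);
    const double noReplies  = trueNoShare * noResponseRate;
    const double yesReplies = (1.0 - trueNoShare) * yesResponseRate;
    assert(noReplies + yesReplies > 0.0);
    return noReplies / (noReplies + yesReplies);
}

// How many times more likely a "no" person must be to respond than a "yes"
// person for the observed "no" share to reach observedNo.
double requiredResponseRatio(double trueNoShare, double observedNo) {
    assert(trueNoShare > 0.0 && trueNoShare < 1.0);
    assert(observedNo >= 0.0 && observedNo < 1.0);
    return (observedNo * (1.0 - trueNoShare)) / (trueNoShare * (1.0 - observedNo));
}
```

Calling requiredResponseRatio(0.09, 0.70) gives roughly 23.6: if unhappy parents were about 24 times more likely to write in than happy ones, a population that is 9% "no" would produce a mailbag that is 70% "no". Nobody lied; the sample simply selected itself.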

I have some parting comments for each news organization:

NPR: don't be moronic and waste people's time with shitty sampling methods. The last thing this country needs is to be inundated with (more) bad statistics. You do honesty and statistical significance a great disservice. Sure, you can hide behind claims that "these kinds of surveys aren't meant to be scientific," but if this is true, what are "these kinds of surveys" even for??? If it isn't scientific, don't bother, you are wasting bandwidth on the Internet. Leave statistics up to people who understand the importance of a random sample.

Faux News (and/or biased, partisan hacks who have discovered the joy that is "tweeting"): you have the audacity to unabashedly skew the results of a poll that was already statistically irrelevant? Clearly this says a lot about your "news" organization. Suggestion: head down to the local community college and sign up for that intro to statistics class you drooled through (presumably due to practicing your keg stand technique) during your freshman year of college. Or take five minutes to read Wikipedia. You are collectively making this nation stupid; you are hurting America. Please stop.

Tuesday, September 29, 2009

Programming Fail: Directory.GetFiles()

I went to demo some shiny new code for a friend, and we both had a laugh when my program pretty much puked all over itself.

Admittedly I had not run this code on this particular machine before, and it was running Vista and .NET 3.5 (neither of which I'd tested against). But upon finding the bug, I find it difficult to understand what sort of design decision would involve such arbitrary behavior.

Directory.GetFiles() seems like a pretty straightforward function. You hand it a directory to search, and a search pattern (e.g. "*.exe"), and it returns a list of all the files in that directory which match the search. OK, I can handle that. Or so I thought.

The fine print of the documentation contains the gotcha:

The following list shows the behavior of different lengths for the searchPattern parameter:

* "*.abc" returns files having an extension of .abc, .abcd, .abcde, .abcdef, and so on.
* "*.abcd" returns only files having an extension of .abcd.
* "*.abcde" returns only files having an extension of .abcde.
* "*.abcdef" returns only files having an extension of .abcdef.


Ummm. OK. So, MSFT, basically your search pattern violates decades of common convention with regard to wildcard patterns, and the developer gets to figure this out when the app bombs? OK, sure. Brilliant. Ship it, yo.

Let's say you're looking for TIFF files, which can have either "tif" or "tiff" as the file extension. What happens is: calling GetFiles() once per extension and concatenating the results adds the "tiff" files twice, because "*.tif" already matches them. Bzzt, fail MSFT, fail.

So, the fix is: either go through the aggregate list and remove duplicates, or remove the "*.tiff" search pattern (by the way, I got my supported extensions through the image handling classes in .NET). But it'd be a lot easier to simply have this method behave in a manner that is normal and predictable and sane: if I ask for "*.tif," I only want "*.tif". But I guess that's asking for too much.
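
For what it's worth, both workarounds are only a few lines. The sketch below is in portable C++ rather than .NET, and it does not call Directory.GetFiles() at all: hasExactExtension and mergeUnique are hypothetical helpers I made up to show the idea, operating on bare file names that something else has already listed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// True only when the file name ends with exactly ext (e.g. ".tif").
// Assumes a bare file name and a case-sensitive comparison; a real Windows
// version would want to compare case-insensitively.
bool hasExactExtension(const std::string& name, const std::string& ext) {
    assert(!ext.empty() && ext[0] == '.');
    const std::size_t dot = name.rfind('.');
    return dot != std::string::npos &&
           name.compare(dot, std::string::npos, ext) == 0;
}

// Concatenate several result lists, dropping duplicates while keeping the
// order in which each name was first seen.
std::vector<std::string> mergeUnique(const std::vector<std::vector<std::string>>& lists) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> merged;
    for (const auto& list : lists) {
        for (const auto& name : list) {
            if (seen.insert(name).second) {
                merged.push_back(name);
            }
        }
    }
    return merged;
}
```

Either filter each result list with hasExactExtension before merging, or just merge with mergeUnique and stop caring which pattern matched what.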

Monday, September 21, 2009

More fun with AM_MEDIA_TYPE

While messing around with CMediaType (a wrapper class for AM_MEDIA_TYPE), I came across an inconsistency/bug. If you execute the following code:

 
pmt->cbFormat = sizeof(WAVEFORMATEX);
WAVEFORMATEX *wfex = (WAVEFORMATEX*)pmt->AllocFormatBuffer(sizeof(WAVEFORMATEX));


...you will find that wfex (which is a pointer to pbFormat) is NULL. My first thought was "out of memory?!" and my next thought was that this simply wasn't possible. This call was about 30k lines of code deep in my application, so clearly a memory allocation failure would have nailed me long before this point. So I poked around in mtype.cpp, which is the file that implements CMediaType, and found the bug:



// allocate length bytes for the format and return a read/write pointer
// If we cannot allocate the new block of memory we return NULL leaving
// the original block of memory untouched (as does ReallocFormatBuffer)
BYTE*
CMediaType::AllocFormatBuffer(ULONG length)
{
    ASSERT(length);

    // do the types have the same buffer size

    if (cbFormat == length) {
        return pbFormat;
    }

    // allocate the new format buffer

    BYTE *pNewFormat = (PBYTE)CoTaskMemAlloc(length);
    if (pNewFormat == NULL) {
        if (length <= cbFormat) return pbFormat; //reuse the old block anyway.
        return NULL;
    }

    // delete the old format

    if (cbFormat != 0) {
        ASSERT(pbFormat);
        CoTaskMemFree((PVOID)pbFormat);
    }

    cbFormat = length;
    pbFormat = pNewFormat;
    return pbFormat;
}


It becomes pretty clear what the issue is: if the requested length is the same as cbFormat, then it simply returns pbFormat, but since pbFormat was never allocated, it simply returns 0x00000000. Bzzt.

I guess the obvious moral of this story is: when using wrapper classes, it's probably not a good idea to manually set the underlying structure parameters. But still, this is a clear bug: the method trusts cbFormat alone and never checks whether pbFormat actually points at anything. So the size check should probably read like so:


if (pbFormat != NULL && cbFormat == length) {
    return pbFormat;
}


If someone is allocating the same amount of memory, and pbFormat isn't null, it's acceptable to return pbFormat.
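
If you don't have the DirectShow base classes handy, the difference is easy to reproduce with a stand-in. The sketch below is not CMediaType: MockMediaType is a hypothetical struct I made up with the same two fields, and std::malloc/std::free stand in for CoTaskMemAlloc/CoTaskMemFree so it builds anywhere. It also leaves out the "reuse the old block" fallback to keep things short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for CMediaType: just the size and the buffer pointer.
struct MockMediaType {
    std::size_t    cbFormat = 0;
    unsigned char* pbFormat = nullptr;

    MockMediaType() = default;
    MockMediaType(const MockMediaType&) = delete;            // avoid double free
    MockMediaType& operator=(const MockMediaType&) = delete;
    ~MockMediaType() { std::free(pbFormat); }

    // Mirrors the original early-out: trusts cbFormat alone.
    unsigned char* allocUnguarded(std::size_t length) {
        assert(length);
        if (cbFormat == length) return pbFormat;   // may well be nullptr
        return reallocate(length);
    }

    // Early-out only when a buffer really exists.
    unsigned char* allocGuarded(std::size_t length) {
        assert(length);
        if (pbFormat != nullptr && cbFormat == length) return pbFormat;
        return reallocate(length);
    }

private:
    unsigned char* reallocate(std::size_t length) {
        unsigned char* fresh = static_cast<unsigned char*>(std::malloc(length));
        if (fresh == nullptr) return nullptr;      // leave the old block alone
        std::free(pbFormat);                       // free(nullptr) is a no-op
        cbFormat = length;
        pbFormat = fresh;
        return pbFormat;
    }
};
```

Set cbFormat by hand on a fresh MockMediaType and allocUnguarded hands back nullptr without ever allocating, while allocGuarded notices there is no buffer and allocates one.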

Again, I realize it doesn't make sense to set cbFormat if AllocFormatBuffer is going to do it for you, but that argument assumes one is aware that AllocFormatBuffer is going to do that for you. Nowhere in the documentation does it mention that I am forbidden from interacting with the underlying structure, so the only way you figure this out is by kicking yourself in the teeth, which I'd rather not do because it hurts.

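Lastly, the same issue is present if the allocation itself fails: the fallback returns pbFormat whenever length <= cbFormat, without checking that pbFormat is non-NULL. So a better implementation of the method is: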


// allocate length bytes for the format and return a read/write pointer
// If we cannot allocate the new block of memory we return NULL leaving
// the original block of memory untouched (as does ReallocFormatBuffer)
BYTE*
CMediaType::AllocFormatBuffer(ULONG length)
{
    ASSERT(length);

    // do the types have the same buffer size, _and_
    // do we have a valid pbFormat pointer???
    if (pbFormat != NULL && cbFormat == length) {
        return pbFormat;
    }

    // allocate the new format buffer
    BYTE *pNewFormat = (PBYTE)CoTaskMemAlloc(length);
    if (pNewFormat == NULL) {
        // If the current pbFormat works, reuse it. Otherwise, fail.
        return (pbFormat != NULL && length <= cbFormat) ?
            pbFormat : NULL;
    }

    // delete the old format
    if (pbFormat != NULL && cbFormat != 0) {
        CoTaskMemFree((PVOID)pbFormat);
    }

    cbFormat = length;
    pbFormat = pNewFormat;
    return pbFormat;
}


Slightly better, but I still hate AM_MEDIA_TYPE.

Tuesday, September 08, 2009

What stands between me and my doctor is...

...Kaiser Permanente. Seriously. I live in a small town in Oregon. To receive any non-emergency medical/vision services, I must drive 50+ minutes (one way) just to see an "approved doctor" in my "coverage network." Any service not in this "coverage network" (which, so far as I can see, extends as far as Kaiser's front door) causes me to incur 100% of the expenses.

There is a doctor I like ten minutes walking from my house--he is my preferred choice. There is an eye doctor two minutes walking from my house--he is my preferred choice. Why must I drive two hours to see a doctor I don't particularly like in a facility that--quite frankly--was a little grungy?

The question we need to ask ourselves is: where is the competition? With the current situation, private insurers inhibit competition by forcing me to use their facilities and approved networks of doctors. With the government plan, no such limitation would exist anymore--I would see whatever doctor I want. In the current system, my insurance provider dictates in detail which doctors and facilities I may frequent. Lo and behold, they prefer "their" doctors. This is done under the guise of "providing comprehensive service at reduced costs," but clearly this is complete and utter bullshit. This sort of vertical-tying is the stuff that anti-trust lawsuits are made of, but nobody in Congress seems to have the balls to make this observation about the way our health care system works.

I don't understand where "competition" exists in this system; who exactly is Kaiser competing against? I get to choose from a whopping two health care providers (wow, what fierce competition that is), who then offer me a doctor in any color I want so long as it's black. If the choice is between a government monopoly that will allow me to choose the doctor I want, and an oligopoly of insurance providers limiting my choice, then the choice is clear.

Would you go with State Farm if they dictated from whom you purchased your next car? Would you go with Allstate if they dictated what sort of house you could purchase in order to insure it? This sort of ludicrous inversion of power is basically how Kaiser functions: I pay them money, and they tell me what to do.

It is unclear how the United States arrived at a system that completely disempowers the people receiving a service they pay for, but I want this to change. If single-payer is how that happens, so be it. If new anti-trust regulation forbidding insurance providers from forcing a doctor on me is how it works, so be it. But something needs to change, and it needs to cover the people in this country who lack insurance for so many obvious reasons that it still blows my mind that we're debating it.

I am absolutely open to any plan, but I am not open to one thing: whatever we have now. Some serious changes must happen.


Disclaimer to stupid people: I am not against competition, capitalism, making money, democracy, industry, I am not a communist, I do not drown puppies, etc. I am against companies engaging in uncompetitive practices, and I am hard-pressed to see how I myself and others are not getting completely screwed by Kaiser. I am not necessarily in favor of "government run" health care, although I seriously doubt they could craft a system that functions worse than the one I just described.

Sunday, July 12, 2009

Ten things you didn't know about rock climbing

I've been a climber for almost twelve years now, and I think it's fair to say that the sport is mostly misunderstood by the public at large. So, I decided it might be fun to outline a few common misconceptions about climbing.

1. Climbing is not actually that dangerous.

Yes, this will come as a surprise to some, but the statistics don't lie. You are more likely to die in your car, giving birth to a child, hang-gliding, scuba diving or during anesthesia. The sad truth about climbing is the public's perceived risk is vastly greater than the actual risk, and I think it demonstrates just how bad people are at objectively evaluating risk.

And this brings up an interesting point: a substantial part of climbing is learning to identify and mitigate risk. A huge amount of effort goes into making the activity safe and predictable. The equipment climbers use is so stupidly over-engineered (and obviously so: abundant failures equal pricey wrongful death lawsuits) that the vast majority of failures occur due to misuse and/or human error.

From my own anecdotal experience, I have seen a single major accident where a climber was critically injured (and from my vantage point, it was clearly human error and not equipment failure), and a handful of minor accidents (none of which I would attribute to equipment failure). In 12 years of climbing on a weekly--if not daily--basis, I have never seen a single piece of climbing equipment fail. If only I could say the same about my morning commute.

2. Climbers are not thrill-seeking adrenaline junkies

This is a big one; from early Gatorade commercials to X Games coverage in the late nineties, climbing was cast as an extreme sport. The wiki article pretty much says it all: the term is applied to sports that are perceived as having a "high level of inherent danger." Yet statistically, there really isn't that much danger involved in the activity. The uncomfortable truth is if climbing is an "extreme" sport, that would make one's morning commute something ESPN could potentially work with.

Most climbers seem to get caught up in the challenge of the task in front of them; getting past the "fear factor" is absolutely crucial to climbing hard routes. If someone spends all their time worrying about taking a fall that--as illustrated above--is completely safe, then they're not going to get a lot of climbing done. Fear is something we climbers work hard to get past--not embrace. When climbing, fear is an irrational (and unhelpful) response that typically causes people to make bad decisions.

3. Climbers are athletes.

People spend years pulling on plastic, pushing the limits of their footwork, working through technically and physically demanding sequences of climbing moves, and engaging in often brutal training sessions. A professional climber is every bit as dedicated as the Armstrongs and Phelps of the world, and climbers will often spend months or years attempting to do a single route.

Too often this is lost. I think a lot of people see climbing not as a sport, but something Boy Scouts get merit badges in. And possibly a totally suitable alternative to Chuck E. Cheese when Sally has her 9th birthday. It's hard to see someone at a climbing gym as a serious athlete when unruly packs of adolescent boys are scaling the walls like the Persian army storming the Acropolis.

4. Not all climbing involves tall things. In fact, most of it doesn't.

Most climbing involves short routes (less than 30 meters). A significant amount of climbing involves short routes that are rarely higher than 5 meters; this is called bouldering. In general, climbers are not a bunch of pie-eyed Mallory types staring at the Eiger or Mt. Everest; actually, most seem perfectly happy climbing something about as tall as the first story of a typical suburban home.

5. There is absolutely no way someone's harness would fail like it did in Cliffhanger.

Simply put, climbing gear rarely fails because it is stupidly over-engineered. If it does fail, someone was (likely) doing something incredibly stupid or their equipment was seriously neglected. So when I see something like this scene out of Cliffhanger (which, for what it's worth, made up for its poor understanding of climbing physics by having Wolfgang Güllich be Sly Stallone's stunt double):




...I just have to roll my eyes. There is no way a climbing harness will fail while being subjected to someone's body weight. Period. In Cliffhanger, the buckle on the woman's harness is shown coming apart as though it were made of paperclips and bubblegum.

Climbing gear is designed to withstand huge dynamic forces (15+ kilonewtons; for reference, the cruising thrust of a single Boeing 737 jet engine is about 17 kilonewtons. Some carabiners are rated to 30+ kilonewtons). For a climbing harness to fail in the way portrayed in Cliffhanger is not just statistically unlikely; she might as well have won the lottery, been hit by lightning and contracted Ebola all in the same hour (okay, okay, so that's a bit of hyperbole, but it was pretty stupid).

6. Climbing isn't about having a strong upper body, although it helps.

One of the best climbers I knew could do a grand total of seven (7) pull-ups. He made up for it with amazing technique, superb footwork and a great work ethic.

Climbing is all about the feet. Climbing is all about the feet. Climbing is all about the feet. Let me say it one more time, just to make it clear how I feel: climbing is all about the feet. If you want to be a good climber, start looking down, because you need to find good feet. Your legs are much stronger than your arms, and almost all upward momentum should be generated with your lower body. Your arms tire quickly and aren't as good at dragging your body up a sheer rock face as one would think. I particularly enjoy watching a "tough guy" who has clearly spent the last few years under a bench press attempt to thug his way up the wall, only to be completely blown twelve feet off the deck.

Yes, some climbers have massive upper bodies and can rattle off twelve one-arm pull-ups, but that isn't what ultimately makes them successful as climbers.

7. Being a professional climber (typically) pays like crap

Most sponsored climbers get free gear. Some of them might get a few hundred bucks here or there. A few climbers (e.g. Sharma, Graham, Caldwell and Rodden) might do well enough to make a living at their sport, but for the most part, climbing is a sport completely detached from a reliable source of income (I'm told this isn't the case in Europe, although I'm unclear about the validity of that statement).

I say this as someone who was once sponsored by multiple companies: I always had free shoes, harnesses and quickdraws, but good luck trying to eat aluminum and shoe leather for breakfast. This is not to say I wish climbing was more like golf or tennis or soccer; I'd only like to point out that if you're in it for the money, consider trading in the 70m rope and trad rack for a set of golf clubs. Hell, there's more money in disc golf than there is in rock climbing.

8. Tommy Caldwell has climbed some of the most difficult routes in the world--missing a finger.

You'd think cutting off a finger with a table saw would be the end of one's climbing career:



...yes, that missing pointer finger ain't photoshop. Homeboy climbs just as hard as, if not harder than, he did before the accident. Who knew?

9. Climbers do not wear socks with their climbing shoes.

'nuff said.

10. Climbing hurts.

I think part of the real barrier the general public encounters when they try any climbing is they quickly find that a lot of it is...how shall I say...rather uncomfortable. It's often more about enduring some pretty uncomfortable shit (razor sharp holds, disfiguring footwear, extreme physical exertion, long approach hikes up poorly established trails, brutal descents to get back to the ground, repeated failure on your project, bloody stumps for hands, being burnt to a crisp in the summer, freezing your ass off in the winter, waking up at the crack of dawn to chase shade, bouldering at night by headlamp to avoid the heat, some rather interesting psychological barriers, and if you're in Lander WY, possibly some binge-drinking). Often I come home from a day of climbing feeling as though I've had the living crap kicked out of me, and one has to wonder why I'd want to go back.

Simply put, I think you need to really, truly love climbing to overlook some of the discomfort (er...pain?) that comes with the activity. And after twelve years of bloody-stump goodness, I keep coming back for more. I don't know what that says about me, but it's what I do.

Thursday, May 14, 2009

HD Video Standard

Despite the last couple of years being a time when "high definition" video has really gained traction, there's one surprising thing about HD video: it doesn't have an obvious definition. Dan Rayburn brings up this observation in a recent blog post:
For an entire industry that defines itself based on the word "quality", today there is still no agreed upon standard for what classifies HD quality video on the web....If the industry wants to progress with HD quality video, we're going to have to agree on a standard - and fast.
He's absolutely right. Many companies attempt to pass off 480p as HD video, but most video enthusiasts would reject such an assertion--after all, if it isn't HD for an analog signal, why would it be HD for a digital signal? Likewise, lots of video is encoded at an unacceptably low bit rate which results in obvious artifacts. Why would such poor quality video be considered "high definition?"

Wikipedia's definition for High-definition television is a decent start:
High-definition television (or HDTV) is a digital television broadcasting system with higher resolution than traditional television systems (standard-definition TV, or SDTV). HDTV is digitally broadcast; the earliest implementations used analog broadcasting, but today digital television (DTV) signals are used, requiring less bandwidth due to digital video compression.
This is still lacking. What exactly is "higher resolution than traditional television systems?" And just what SDTV system, since there were many of them? And is resolution all there is to it? What if I encode video to 1080p but at a horrible bit rate, which causes lots of blocking artifacts? What about video with an odd aspect ratio, where the number of vertical lines doesn't pass muster? Clearly this definition is lacking.

Some aspects of creating a standard are fairly straightforward: most people seem to be fairly comfortable with 720p being the "minimum" resolution at which video can be encoded. Ben Waggoner had an interesting proposal where 720p was acceptable, but it was also acceptable to generalize it to anything with "at least 16 million pixels per second," which takes into account both frame rate and resolution. He also brought up the issue of using horizontal resolution as a criterion, since not everything is 16:9.
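
The pixels-per-second idea is simple enough to write down. This is my own sketch of the arithmetic, not code from Ben's post; the function names are made up, and the 16 million default is just the figure quoted above.

```cpp
#include <cassert>

// Pixel throughput of a video stream: width x height x frames per second.
double pixelsPerSecond(int width, int height, double framesPerSecond) {
    assert(width > 0 && height > 0 && framesPerSecond > 0.0);
    return static_cast<double>(width) * height * framesPerSecond;
}

// True when the stream meets a minimum pixel rate.
bool meetsPixelRate(int width, int height, double framesPerSecond,
                    double minPixelsPerSecond = 16e6) {
    return pixelsPerSecond(width, height, framesPerSecond) >= minPixelsPerSecond;
}
```

For example, 1280x720 at 24 FPS works out to 22,118,400 pixels per second and passes, while VGA (640x480) at 30 FPS is 9,216,000 and does not. Note that this says nothing about how good those pixels look, which is the whole problem.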

But on the question of "quality," most simply punted, and I find this odd. Ben Waggoner mentions:

Hassan Wharton-Ali brought up another good point on the thread - HD should actually be HD quality. It can’t be a lousy, over-quantized encode using a suboptimally high resolution just so it can be called HD.

A good test is the video should look worse (due to less detail), not better (due to less artifacts), if encoded at a lower resolution at the same data rate. If reducing your frame size makes the video look better when scaled to the same size, then the frame size is too high!

It is a good point, and I don't completely disagree with Ben's proposal--it should look worse due to less detail if encoded at a lower resolution. But this is the crux of the issue: what does it mean to look worse? Is this just a subjective judgment call on behalf of the person encoding the video? I don't think this addresses the problem of having a minimum acceptable "quality" for HD video.

Dan Rayburn's suggestion is even less desirable, in my opinion:

To me, the term HD should refer to and be defined by the resolution and a minimum bitrate requirement. Since you could have a 1080p HD video encoded at a very low bitrate, which could result in a poor viewing experience inferior to that of a higher-bitrate video in SD resolution, the resolution and bitrate is the only way to define HD.

The first issue with this is the "minimum bit rate" requirement would have to somehow scale with the resolution and frame rate. It would have to account for the codec being used. This would result in an impossibly complicated system, endless arguments, etc. (for example, would we impose the same bit rate requirement on H.264 as we would on VC1? What about "future" codecs?)

A bigger issue is not all video content is the same. The resulting "quality" of a video encoded to a given bitrate absolutely has a relationship with the video being encoded. A video with very little movement can often be encoded with a low bit rate and look fantastic, so the bit rate requirement would essentially amount to wasted bandwidth. Conversely, a video with a lot of motion and scene changes may require a lot more bits to get an acceptable, block-free viewing experience--and it's not clear what that acceptable threshold would be.

I find both of these suggestions insufficient. I propose an alternative: objective video quality algorithms. The idea is straightforward: by comparing the source material with the output material, we can objectively establish a score that at least has some meaningful relationship with Mean Opinion Scores. In a nutshell, a MOS is how "good" the average person thinks some piece of video appears.

Peak Signal-to-Noise Ratio (PSNR), which is computed from the Mean Squared Error (MSE), is the most common metric, albeit a crude one that many engineers and scientists consider deficient. But it's 2009, baby--we can do better. We have better.
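
For reference, PSNR really is as crude as it sounds. Here is a minimal sketch for 8-bit samples, assuming both frames arrive as flat buffers of the same length; the function names are mine, and a real implementation would typically work per plane (Y, U, V) rather than on one big buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Mean squared error between a reference frame and a decoded frame.
double meanSquaredError(const std::vector<std::uint8_t>& reference,
                        const std::vector<std::uint8_t>& decoded) {
    assert(!reference.empty() && reference.size() == decoded.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double diff = static_cast<double>(reference[i]) -
                            static_cast<double>(decoded[i]);
        sum += diff * diff;
    }
    return sum / static_cast<double>(reference.size());
}

// PSNR in decibels for 8-bit samples (peak value 255).
// Identical frames have zero error, so the result is +infinity.
double psnrDb(const std::vector<std::uint8_t>& reference,
              const std::vector<std::uint8_t>& decoded) {
    const double mse = meanSquaredError(reference, decoded);
    if (mse == 0.0) {
        return std::numeric_limits<double>::infinity();
    }
    return 10.0 * std::log10((255.0 * 255.0) / mse);
}
```

Every pixel's error counts the same no matter where it lands in the picture, which is a big part of why PSNR tracks human opinion so loosely.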

My suggestion would be the Structural SIMilarity (SSIM) index, which is relatively inexpensive to compute (its closely related brother, MSSIM, is much more pricey) and definitely correlates better with MOS.
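
To give a feel for what SSIM computes, here is a deliberately simplified sketch. Real SSIM is evaluated over small sliding windows (usually Gaussian weighted) and the per-window results are averaged; this version treats the whole buffer as a single window, so take it as an illustration of the formula rather than a usable metric. The constants are the commonly used ones for 8-bit data, (0.01 x 255)^2 and (0.03 x 255)^2, and globalSsim is a name I made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-window SSIM over two equally sized 8-bit luma buffers.
double globalSsim(const std::vector<std::uint8_t>& x,
                  const std::vector<std::uint8_t>& y) {
    assert(!x.empty() && x.size() == y.size());
    const double n = static_cast<double>(x.size());

    // Means of each signal.
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;

    // Variances and covariance (population form, dividing by n).
    double varX = 0.0, varY = 0.0, covXY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        varX += dx * dx;
        varY += dy * dy;
        covXY += dx * dy;
    }
    varX /= n;
    varY /= n;
    covXY /= n;

    // Stabilizing constants for a dynamic range of 255.
    const double c1 = (0.01 * 255.0) * (0.01 * 255.0);
    const double c2 = (0.03 * 255.0) * (0.03 * 255.0);

    return ((2.0 * meanX * meanY + c1) * (2.0 * covXY + c2)) /
           ((meanX * meanX + meanY * meanY + c1) * (varX + varY + c2));
}
```

Identical buffers score 1.0, and the score drops as the luminance, contrast and structure of the two buffers drift apart.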

How would this work?

  1. During the encode process, a SSIM score is computed for each frame using the input as a reference image.
  2. This process is repeated for every input frame, and every output frame.
  3. The lowest observed SSIM score is the resulting quality score for that piece of encoded video. (I suppose another alternative is to use the average. Yet another option is to use the variance. I'd avoid the median, since it's robust against outliers, and outliers matter)
  4. If the lowest observed SSIM score is less than some threshold, then the video cannot be considered High Definition.
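
The steps above can be sketched in a few lines, assuming the per-frame SSIM scores have already been computed somewhere else. The function names are hypothetical and the threshold is left as a parameter, since picking it is the hard part.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Step 3: the quality score of the clip is its worst frame.
double worstFrameScore(const std::vector<double>& perFrameSsim) {
    assert(!perFrameSsim.empty());
    return *std::min_element(perFrameSsim.begin(), perFrameSsim.end());
}

// Step 4: the clip qualifies only if even its worst frame meets the threshold.
bool qualifiesAsHd(const std::vector<double>& perFrameSsim, double threshold) {
    return worstFrameScore(perFrameSsim) >= threshold;
}
```

With scores of 0.95, 0.88 and 0.97 and a threshold of 0.9, the clip fails: one bad frame is enough. Swapping std::min_element for an average would be the more forgiving variant mentioned in step 3.
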
For a visual representation of how this works, take a look at this graph:


This is a graph of SSIM over time, displaying multiple bit rates. The x-axis is frame number, and the y-axis is SSIM score. My input was a VGA, 30 FPS, ~30 second raw-RGB video clip. Each line corresponds with a bit rate requested of the encoder (x264's H.264 implementation, using a baseline profile and single-pass encoding--you can see the latter in the low SSIM scores at the beginning of the video). Notice the clear relationship between SSIM scores and bit rate. Also note how much variance there is in video quality: clearly certain portions of this clip are "more difficult" to encode, and this results in a degradation of video quality. Also, notice a clear law of diminishing returns: as more and more bits are thrown at the video clip, the SSIM scores converge on 1.0--SSIM scores at 2 Mbit/sec aren't substantially different from the SSIM scores at 500 kbit/sec.

There are a few gotchas with this plan: what if we're changing the frame rate (e.g. 3:2 pulldown) and there is no clear reference frame to which we compare the output? How do we determine the SSIM threshold? Do we really want to use SSIM, or is some other algorithm better?

The first question is answered relatively easily: we compare only what was input to the encoder and the resulting output. Presumably the process of manipulating the frame rate is separate from the process of encoding. What we're talking about is how good a job our encoder does of matching the input.

The second question is easier, but it requires someone conducting subjective video quality assessment tests to determine what SSIM number corresponds with the minimum quality viewers will accept. In effect, someone has to do some statistical analysis on data captured during viewing sessions of actual people watching actual footage encoded with an actual compression algorithm, and determine a threshold that correlates well with people's perception of "High Definition." But at least with SSIM, this is a manageable process: once a threshold is determined, it's really independent of a whole slew of factors, like codec, the video being encoded, etc.

Let's say we decide that any SSIM score below 0.9 disqualifies the video from being called "high definition"--for the above video, this would mean 500 kbps would be just slightly too poor to call HD (notice the poor quality at the beginning of the clip). And 1000 kbps would be more than acceptable.

Lastly, even though PSNR is an outdated method, I see no reason a high-definition standard could not include metrics for both objective quality tests. There are other objective video quality algorithms, and certainly more will be developed in the future, so any standard should be open to extension at a later date.

(side note: maybe part of our problem is this emphasis on bit rate--which has no relationship to quality beyond "more is probably better"--when our real emphasis should be on a metric that correlates with quality, but I digress)

I don't really care what objective metric is used, and certainly there is plenty of debate over which objective method correlates best with MOS, and what threshold should be used--but let's at least be scientific about this. If there's going to be a "standard" for High Definition video, then let's choose a standard that will carry us forward and not create a quagmire.

Monday, May 04, 2009

Dealing with Image Formats

One of the most common tasks when working with video is dealing with colorspaces and image formats. In this post, I'll discuss the two major colorspaces commonly used in Microsoft code, along with converting between different formats within a given colorspace. In some future post, I might talk about converting one colorspace to a totally separate colorspace, but that topic is worthy of its own discussion.

In the Microsoft world, there are two colorspaces that we're concerned about: YUV and RGB.

RGB Color Space
RGB is generally the easiest colorspace to visualize, since most of us have dabbled with finger paints or crayons. By mixing various amounts of red, green, and blue, the result is a broad spectrum of colors. Here is a simple illustration to convey this colorspace:

The top image of the barn is what you see. The three pictures below it are the red, green and blue components, respectively. When you add them together, voila, you get barnyard goodness. (sidenote: because you "add" colors together in the RGB colorspace, we call this an "additive" color model)

In the digital world, we have a convenient representation for RGB. Typically 0, 0, 0 corresponds with black (i.e. red, green and blue values are set to 0), and 255, 255, 255 is white. Intermediate values result in a large palette of colors. A common RGB format is RGB24, which allocates three 8 bit channels for red, green, and blue values. Since each channel has 256 possible values, the total number of colors this format can represent is 256^3, or 16,777,216 colors. There are also other RGB formats that use less/more data per channel (and thus, less/more data per pixel), but the general idea is the same. To get an idea of how many RGB formats exist, one need not go any farther than fourcc.org.

Despite the multitude of RGB formats, in the MSFT world, you can basically count on dealing with RGB24 or RGB32. RGB32 is simply RGB24, but with 8 bits devoted to an "alpha" channel specifying how translucent a given value is.
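As a quick sanity check on the numbers above, here is a small sketch. Both function names are hypothetical, and it assumes tightly packed pixels with the alpha byte stored last in each 4-byte pixel and 0xFF meaning fully opaque.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of distinct colors when each of the three channels has
// bitsPerChannel bits.
std::uint64_t ColorCount(unsigned bitsPerChannel)
{
    assert(bitsPerChannel >= 1 && bitsPerChannel <= 16);
    const std::uint64_t perChannel = std::uint64_t(1) << bitsPerChannel;
    return perChannel * perChannel * perChannel;
}

// Expands 3-byte pixels to 4-byte pixels by appending an alpha byte.
// The order of the three color bytes within each pixel is left untouched.
std::vector<std::uint8_t> Rgb24ToRgb32(const std::vector<std::uint8_t>& rgb24,
                                       std::uint8_t alpha = 0xFF)
{
    assert(rgb24.size() % 3 == 0);
    std::vector<std::uint8_t> rgb32;
    rgb32.reserve(rgb24.size() / 3 * 4);
    for (std::size_t i = 0; i < rgb24.size(); i += 3)
    {
        rgb32.push_back(rgb24[i]);
        rgb32.push_back(rgb24[i + 1]);
        rgb32.push_back(rgb24[i + 2]);
        rgb32.push_back(alpha); // assumed: alpha is the fourth byte
    }
    return rgb32;
}
```

ColorCount(8) gives the 16,777,216 figure quoted for RGB24, and the RGB32 buffer is simply one byte per pixel larger.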

YUV Color Space
YUV is substantially different from RGB. Instead of mixing three different colors, YUV separates out the luminance and chroma into separate values, whereas RGB implicitly contains this information in the combination of its channels. Y represents the luminance component (think of this as a "black and white" channel, much like black and white television) and U and V are the chrominance (color) components. There are several advantages to this format over RGB that make it desirable in a number of situations:
  • The primary advantage of luminance/chrominance systems such as YUV is that they remain compatible with black and white analog television.
  • Another advantage is that the signal in YUV can be easily manipulated to deliberately discard some information in order to reduce bandwidth.
  • The human eye is more sensitive to luminance than chroma; in this sense, YUV is generally considered to be "more efficient" than RGB because more information is spent on data that the human eye is sensitive to.
  • It is more efficient to perform many common operations in the YUV colorspace than in RGB--for example, image/video compression. By nature, these operations occur more easily in a YUV colorspace. Often, the heavy lifting in many image processing algorithms is applied only to the luminance channel.
Thus far, the best way I've seen to visualize the YUV colorspace was on this site.

Original image on the left, and the single Y (luminance) channel on the right:



...And here are the U and V channels combined:



Notice that the Y channel is simply a black and white picture. All of the color information is contained in the U and V channels.

Like RGB, YUV has a number of sub-formats. Another quick trip to fourcc.org reveals a plethora of YUV types, and Microsoft also has this article on a handful of the different YUV types used in Windows. YUV types are even more varied than RGB when it comes to different formats.

The bad news is there are a lot of redundant YUV image formats. For example, YUY2 and YUYV are the exact same format, but merely have different fourcc names. YUY2 and UYVY are nearly the same thing (both 16 bpp, "packed" formats) but have the byte order within each 16-bit pair swapped. IMC4 and IMC2 are nearly the same thing (both 12 bpp, "planar" formats) but have the U and V "planes" swapped. (more on planar/packed in a moment)

The good news is that it's pretty easy to go between the different formats without too much trouble, as we'll demonstrate later.
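To show how small the gap between two "different" formats can be, here is a minimal sketch of going between YUY2 and UYVY. YUY2 stores each pixel pair as Y0 U0 Y1 V0 and UYVY stores it as U0 Y0 V0 Y1, so the conversion is a swap of every adjacent byte pair. The function name is hypothetical, and the buffer is assumed to be tightly packed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Swaps each adjacent byte pair in place. Applied to YUY2 data it yields
// UYVY, and applied to UYVY data it yields YUY2.
void SwapYuy2AndUyvy(std::vector<std::uint8_t>& buffer)
{
    // Each two-pixel group occupies four bytes in both formats.
    assert(buffer.size() % 4 == 0);
    for (std::size_t i = 0; i + 1 < buffer.size(); i += 2)
    {
        std::swap(buffer[i], buffer[i + 1]);
    }
}
```

Because the operation is its own inverse, one function covers both directions.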

Packed/Planar Image Formats
The majority of image formats (in both the RGB and YUV colorspaces) are in either a packed or a planar format. These terms refer to how the image is formatted in computer memory:
  • Packed: the channels (either YUV or RGB) are stored in a single array, and all of the values are mixed together in one monolithic chunk of memory.
  • Planar: the channels are stored as three separate planes.
For example, the following image shows a packed format:



This is YUY2. Notice that the different Y, U, and V values simply sit alongside one another; they are not segregated in memory in any way. Also note that the above represents six pixels. RGB24, RGB32, and YUY2 are all examples of packed formats.

This image shows a planar format:



This is YV12. Notice that the three planes have been separated in memory, rather than being in a single, monolithic array. Often this format is desirable (especially in the YUV colorspace, where the luminance values can then easily be extracted). YV12 is an example of a planar format.
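Here is a rough sketch of why planar layouts make the luminance easy to get at. Both helper names are hypothetical, and both assume 8-bit samples and rows with no padding. For YV12 the Y plane is just the first width * height bytes; for packed YUY2 every other byte has to be picked out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Planar case: the Y plane is the first width * height bytes of the buffer.
std::vector<std::uint8_t> LumaFromYv12(const std::vector<std::uint8_t>& yv12,
                                       std::size_t width, std::size_t height)
{
    assert(yv12.size() >= width * height);
    return std::vector<std::uint8_t>(
        yv12.begin(),
        yv12.begin() + static_cast<std::ptrdiff_t>(width * height));
}

// Packed case: Y samples sit at every even byte offset (Y0 U0 Y1 V0 ...).
std::vector<std::uint8_t> LumaFromYuy2(const std::vector<std::uint8_t>& yuy2,
                                       std::size_t width, std::size_t height)
{
    assert(yuy2.size() >= width * height * 2);
    std::vector<std::uint8_t> luma(width * height);
    for (std::size_t i = 0; i < luma.size(); ++i)
    {
        luma[i] = yuy2[2 * i];
    }
    return luma;
}
```

Either result is a grayscale version of the image; the planar version gets there with a single contiguous copy.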


Converting Between Different Formats in the Same Color Space
Within a given colorspace are multiple formats. For example, YUV has multiple formats with differing amounts of information per pixel and layout in memory (planar vs. packed). Additionally, you may have different amounts of information for the individual Y, U, and V values, but most Microsoft formats typically allocate no more than 8 bits per channel.

As long as the Y, U, and V values for the source and destination images have equivalent allocation, converting between various YUV formats is reduced to copying memory around. For this section we'll deal with YUV formats, since RGB will follow the same general principles. As an example, let's convert from YUY2 to AYUV.

YUY2 is a packed, 16 bits/pixel format. In memory, it looks like so:

The above would represent the first six pixels of the image. Notice that each pixel ends up with a Y value, and every other pixel contains a U and a V value. There is no alpha channel. The image contains a 2:1 horizontal down sampling.

A common misconception is that the # of bits per pixel is directly related to the color depth (i.e. the # of colors that can be represented). In YUY2, our color depth is 24 bits (there are 2^24 possible color combinations), but it's only 16 bits/pixel because the U and V channels have been down sampled.
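The arithmetic behind that can be sketched in a few lines. The function and parameter names are hypothetical, and 8-bit samples are assumed throughout. The idea is to count the bits in one block of pixels that shares a single U and V sample, then divide by the number of pixels in the block.

```cpp
#include <cassert>

// Average bits per pixel for a YUV format with 8-bit samples.
// hSub and vSub are the horizontal and vertical chroma subsampling factors
// (1 = no subsampling, 2 = one chroma sample per two pixels in that direction).
unsigned BitsPerPixel(unsigned hSub, unsigned vSub, bool hasAlpha)
{
    assert(hSub >= 1 && vSub >= 1);
    const unsigned pixelsPerBlock = hSub * vSub;
    unsigned bits = 8 * pixelsPerBlock; // one Y sample per pixel
    bits += 8 * 2;                      // one U and one V sample per block
    if (hasAlpha)
    {
        bits += 8 * pixelsPerBlock;     // one alpha sample per pixel
    }
    return bits / pixelsPerBlock;
}
```

BitsPerPixel(2, 1, false) gives 16 for YUY2 and BitsPerPixel(1, 1, true) gives 32 for AYUV, yet every Y, U, and V sample is still 8 bits in both cases.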

AYUV, on the other hand, is a 32 bits/pixel packed format. Each pixel contains a Y, U, V, and Alpha channel. In memory, it ends up looking like so:

The above would represent the first three pixels of the image. Notice that each pixel has three full 8 bit values for the Y, U and V channels. There is no down sampling. There is also a fourth channel for an alpha value.

In going from YUY2 to AYUV, notice that the YUY2 image contains 16 bits/pixel whereas the AYUV image contains 32 bits/pixel. We have a couple of options for the conversion, but the easiest is to simply reuse the U and V values shared by each pair of YUY2 pixels for both of the corresponding AYUV pixels. Thus, we have to do no interpolation at all to go from YUY2 to AYUV--it's simply a matter of re-arranging memory. Since all the values are 8 bit, there isn't any additional massaging to do; they can simply be reused as is.

Here is a sample function to convert YUY2 to AYUV:
  
#include <windows.h> // HRESULT, E_INVALIDARG, S_OK

// Converts an image from YUY2 to AYUV. Input and output images must
// be of identical size. Function does not deal with any potential stride
// issues.
HRESULT ConvertYUY2ToAYUV( char * pYUY2Buffer, char * pAYUVBuffer, int IMAGEHEIGHT, int IMAGEWIDTH )
{
    if( pYUY2Buffer == NULL || pAYUVBuffer == NULL || IMAGEHEIGHT < 2
        || IMAGEWIDTH < 2 )
    {
        return E_INVALIDARG;
    }

    char * pSource = pYUY2Buffer; // Note: this buffer will be w * h * 2 bytes (16 bpp)
    char * pDest = pAYUVBuffer;   // Note: this buffer will be w * h * 4 bytes (32 bpp)
    char Y0, U0, Y1, V0;          // these are going to be our YUY2 values

    for( int rows = 0; rows < IMAGEHEIGHT; rows++ )
    {
        for( int columns = 0; columns < (IMAGEWIDTH / 2); columns++ )
        {
            // we'll copy two pixels at a time, since it's easier to deal with that way.
            Y0 = *pSource;
            pSource++;
            U0 = *pSource;
            pSource++;
            Y1 = *pSource;
            pSource++;
            V0 = *pSource;
            pSource++;

            // So, we have the first two pixels--because the U and V values are subsampled, we *reuse* them when converting
            // to 32 bpp. AYUV bytes are laid out in memory as V, U, Y, A.
            // First pixel
            *pDest = V0;
            pDest++;
            *pDest = U0;
            pDest++;
            *pDest = Y0;
            pDest++;
            *pDest = (char) 0xFF; // YUY2 carries no alpha, so write 0xFF (conventionally opaque) rather than leave the byte uninitialized.
            pDest++;

            // Second pixel
            *pDest = V0;
            pDest++;
            *pDest = U0;
            pDest++;
            *pDest = Y1;
            pDest++;
            *pDest = (char) 0xFF; // same assumption as above: treat the pixel as fully opaque.
            pDest++;
        }
    }

    return S_OK;
}

Note that the inner "for" loop processes two pixels at a time.
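The sample function above says up front that it ignores stride, meaning the number of bytes from the start of one row to the start of the next, which can be larger than the width times the bytes per pixel when rows are padded. Here is a hedged sketch of what a stride-aware version might look like. The function name and parameters are hypothetical, it uses plain fixed-width types rather than the Windows ones, and it writes 0xFF into the alpha byte on the assumption that the output should be fully opaque.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stride-aware YUY2 -> AYUV conversion. srcStride and dstStride
// are the byte distances between the starts of consecutive rows.
void Yuy2ToAyuvWithStride(const std::uint8_t* src, std::size_t srcStride,
                          std::uint8_t* dst, std::size_t dstStride,
                          std::size_t width, std::size_t height)
{
    assert(src != nullptr && dst != nullptr);
    assert(width % 2 == 0);
    assert(srcStride >= width * 2 && dstStride >= width * 4);

    for (std::size_t row = 0; row < height; ++row)
    {
        // Recompute the row start from the stride so padding is skipped.
        const std::uint8_t* s = src + row * srcStride;
        std::uint8_t* d = dst + row * dstStride;

        for (std::size_t pair = 0; pair < width / 2; ++pair)
        {
            const std::uint8_t y0 = s[0];
            const std::uint8_t u  = s[1];
            const std::uint8_t y1 = s[2];
            const std::uint8_t v  = s[3];

            // AYUV bytes in memory: V, U, Y, A.
            d[0] = v; d[1] = u; d[2] = y0; d[3] = 0xFF; // alpha assumed opaque
            d[4] = v; d[5] = u; d[6] = y1; d[7] = 0xFF;

            s += 4;
            d += 8;
        }
    }
}
```

The only real change from the simple version is that each row's start is derived from the stride instead of carrying one running pointer across the whole image.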

For a second example, let's convert from YV12 to YUY2. YV12 is a 12 bit/pixel, planar format. In memory, it looks like so:

...notice that every four pixel Y block has one corresponding U and V value, or to put it a different way, each 2*2 Y block has a U and V value associated with it. And, yet another way to visualize it: the U and V planes are one quarter the size of the Y plane.
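Those plane sizes translate directly into byte offsets. A minimal sketch, assuming 8-bit samples, even dimensions, no row padding, and the Y, V, U plane order that gives YV12 its name; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Sizes and byte offsets of the three planes in a tightly packed YV12 buffer.
struct Yv12Layout
{
    std::size_t ySize;      // bytes in the Y plane
    std::size_t chromaSize; // bytes in each of the V and U planes
    std::size_t vOffset;    // where the V plane starts
    std::size_t uOffset;    // where the U plane starts
    std::size_t totalSize;  // bytes in the whole image
};

Yv12Layout ComputeYv12Layout(std::size_t width, std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    Yv12Layout layout;
    layout.ySize = width * height;
    layout.chromaSize = (width / 2) * (height / 2); // one quarter of the Y plane
    layout.vOffset = layout.ySize;                  // V comes right after Y
    layout.uOffset = layout.ySize + layout.chromaSize;
    layout.totalSize = layout.ySize + 2 * layout.chromaSize;
    return layout;
}
```

The total works out to 1.5 bytes per pixel, which is where the 12 bits/pixel figure comes from.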

Since every Y, U, and V sample is 8 bits, it again comes down to selectively moving memory around. No interpolation is required:
  
// Converts an image from YV12 to YUY2. Input and output images must
// be of identical size. Function does not deal with any potential stride
// issues.
HRESULT ConvertYV12ToYUY2( char * pYV12Buffer, char * pYUY2Buffer, int IMAGEHEIGHT, int IMAGEWIDTH )
{
    if( pYUY2Buffer == NULL || pYV12Buffer == NULL || IMAGEHEIGHT < 2
        || IMAGEWIDTH < 2 )
    {
        return E_INVALIDARG;
    }

    // Let's start out by getting pointers to the individual planes in our
    // YV12 image. Note that the Y plane in a YV12 image's size is
    // simply the image height * image width. This is because all values
    // are 8 bits. Also notice that the U and V planes are one quarter
    // the size of the Y plane (hence the division by 4).
    BYTE * pYV12YPlane = (BYTE *) pYV12Buffer;
    BYTE * pYV12VPlane = pYV12YPlane + ( IMAGEHEIGHT * IMAGEWIDTH );
    BYTE * pYV12UPlane = pYV12VPlane + ( ( IMAGEHEIGHT * IMAGEWIDTH ) / 4 );

    BYTE * pYUY2BufferCursor = (BYTE *) pYUY2Buffer;

    // Keep in mind that YV12 has only half of the U and V information that
    // a YUY2 image contains. Because of that, we need to reuse the U and
    // V plane values, so we only increment that buffer every other row
    // of pixels.
    bool bMustIncrementUVPlanes = false;

    for( int ImageHeight = 0; ImageHeight < IMAGEHEIGHT; ImageHeight++ )
    {
        // Two temporary cursors for our U and V planes, which are the weird ones to deal with.
        BYTE * pUCursor = pYV12UPlane;
        BYTE * pVCursor = pYV12VPlane;

        // We process two pixels per pass through this loop,
        // hence the (IMAGEWIDTH/2).
        for( int ImageWidth = 0; ImageWidth < ( IMAGEWIDTH / 2 ) ; ImageWidth++ )
        {
            // first things first: copy our Y0 value.
            *pYUY2BufferCursor = *pYV12YPlane;
            pYUY2BufferCursor++;
            pYV12YPlane++;

            // Copy U0 value
            *pYUY2BufferCursor = *pUCursor;
            pYUY2BufferCursor++;
            pUCursor++;

            // Copy Y1 value
            *pYUY2BufferCursor = *pYV12YPlane;
            pYUY2BufferCursor++;
            pYV12YPlane++;

            // Copy V0 value
            *pYUY2BufferCursor = *pVCursor;
            pYUY2BufferCursor++;
            pVCursor++;
        }

        // Since YV12 has half the UV data that YUY2 has, we reuse these
        // values--so we only increment these planes every other pass
        // through.
        if( bMustIncrementUVPlanes )
        {
            pYV12VPlane += IMAGEWIDTH / 2;
            pYV12UPlane += IMAGEWIDTH / 2;
            bMustIncrementUVPlanes = false;
        }
        else
        {
            bMustIncrementUVPlanes = true;
        }
    }

    return S_OK;
}


This code is a little more complicated than the previous sample. Because YV12 is a planar format and contains half of the U and V information contained in a YUY2 image, we end up reusing U and V values. Still, the code itself isn't particularly daunting.

One thing to realize: neither of the above functions is optimized in any way, and there are multiple ways of doing the conversion. For example, here's an in-depth article about converting YV12 to YUY2 and some performance implications on P4 processors. Some people have also recommended doing interpolation on pixel values, but in my (limited and likely anecdotal) experience, it doesn't make a substantial difference.
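For the curious, here is roughly what the interpolation option could look like for one row of U (or V) samples when expanding 2:1 subsampled chroma to one sample per pixel. This is only a sketch: the function name is hypothetical, and it assumes each chroma sample lines up with the even pixel of its pair, which is not true of every format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Doubles the horizontal chroma resolution of one row. Even output pixels
// reuse the stored sample; odd output pixels get the rounded average of the
// stored sample and the next one. The last sample is repeated at the edge.
std::vector<std::uint8_t> UpsampleChromaLinear(const std::vector<std::uint8_t>& chroma)
{
    std::vector<std::uint8_t> out;
    out.reserve(chroma.size() * 2);
    for (std::size_t i = 0; i < chroma.size(); ++i)
    {
        const unsigned current = chroma[i];
        const unsigned next = (i + 1 < chroma.size()) ? chroma[i + 1] : current;
        out.push_back(static_cast<std::uint8_t>(current));
        out.push_back(static_cast<std::uint8_t>((current + next + 1) / 2));
    }
    assert(out.size() == chroma.size() * 2);
    return out;
}
```

Compared with plain reuse, the only difference is the value given to every odd pixel, which is part of why the visual gain tends to be small.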