Saturday, December 06, 2008

It's big, it's heavy, it's wood!


Everybody loves log:


...this posting concerns a different type of log: the type we programmers use to keep a history of what our application did.

I love Jeff Atwood. I really do. I read his blog on a regular basis. I enjoy his style of writing, and both of us are unabashed nerds. We both post on Silent PC Review. His blog, Coding Horror, has possibly been my favorite blog over the last few years. Most of the time, he's spot on, but occasionally he stumbles, as is the case in The Problem With Logging. Atwood makes it clear how little respect he has for logging:
So is logging a giant waste of time? I'm sure some people will read about this far and draw that conclusion, no matter what else I write. I am not anti-logging. I am anti-abusive-logging. Like any other tool in your toolkit, when used properly and appropriately, it can help you create better programs. The problem with logging isn't the logging, per se -- it's the seductive OCD "just one more bit of data in the log" trap that programmers fall into when implementing logging. Logging gets a bad name because it's so often abused. It's a shame to end up with all this extra code generating volumes and volumes of logs that aren't helping anyone.

We've since removed all logging from Stack Overflow, relying exclusively on exception logging. Honestly, I don't miss it at all. I can't even think of a single time since then that I'd wished I'd had a giant verbose logfile to help me diagnose a problem.

When it comes to logging, the right answer is not "yes, always, and as much as possible." Resist the tendency to log everything. Start small and simple, logging only the most obvious and critical of errors. Add (or ideally, inject) more logging only as demonstrated by specific, verifiable needs.
This sounds perfectly reasonable, until you consider a few facts:
  1. Jeff's application, Stack Overflow, is a web application, so logging may or may not be a valuable tool there; for that class of application, a verbose log often isn't all that useful.
  2. Nobody wants a giant, verbose logfile. At least nobody sane, that is. The goal of logging has never been to mindlessly "log" every conceivable message, and anyone who thinks that is a good thing either A) has never spent time parsing a log file or B) is prone to wasting their own time.
I, for the record, have spent an inordinate amount of time sifting through our customers' log files. Our application is real-time, so debugging can be very difficult, and logging is a crucial method by which we solve problems. We go to great pains to log only things that are useful, and to even greater pains to remove messages that are not.

Atwood claims he's not anti-logging, but that's hard to believe given that Jeff's application no longer contains any logging beyond exceptions. The impetus for this rash decision?

Oh, sure, logging seems harmless enough, but let me tell you, it can deal some serious hurt. We ran into a particularly nasty recursive logging bug:

  • On thread #1, our code was doing Log (lock) / DB stuff (lock)
  • On thread #2, our code was doing DB stuff (lock) / log stuff (lock)

If these things happened close together enough under heavy load, this resulted in -- you guessed it -- a classic out-of-order deadlock scenario. I'm not sure you'd ever see it on a lightly loaded app, but on our website it happened about once a day on average.

I don't blame log4net for this, I blame our crappy code. We spent days troubleshooting these deadlocks by .. wait for it .. adding more logging! Which naturally made the problem worse and even harder to figure out. We eventually were forced to take memory dumps and use dump analysis tools. With the generous assistance of Greg Varveris, we were finally able to identify the culprit: our logging strategy. How ironic. And I mean real irony, not the fake Alanis Morrissette kind.

Correction: with the generous assistance of Greg Varveris, you were finally able to identify the culprit: your locking strategy.

Furthermore, why would you troubleshoot a deadlock with log messages? Think about this for a moment: logging only happens if the threads are active. Deadlocked threads: not active. The only sane conclusion one can arrive at in the event of a deadlock is that logging isn't going to help you much. Taking a crash dump and doing post-mortem debugging is what you do in the event of a deadlock (Varveris++). Reviewing your locking strategy is what you do in the event of a deadlock. But logging can't help you with a thread that is stuck waiting for a lock to free up because it is incapable of logging. It may be possible to add logging to figure out how the threads got in the state they're in, but the mere act of doing this implies your application has a threading/locking model that you don't understand.

Atwood's post is ultimately unconvincing because the basic problems are A) failing to acknowledge a flawed locking strategy and B) misusing a tool (in this case, logging). And then, when said tool is (unfairly) implicated as the source of the problem, removing it from the application entirely. That is more like mindless thrashing than calculated debugging.

I do agree with Jeff that overzealous logging is a bad thing, and I would strongly advise against it. But I can't even tell you how many times logging has helped me solve a problem in our application. Here are some general rules of thumb I follow for logging:
  1. Keep a high signal-to-noise ratio; logging unnecessary messages only makes it more difficult to find the interesting ones. Be brutal; remove anything you can deduce without the message.
  2. Keep a high signal-to-noise ratio; you do not have an infinite amount of disk space, so logs must be purged at some point. Excessive logging shortens how much prior history you keep, and sometimes messages that were useful get purged and replaced by stuff that is not.
  3. Do not print out periodic messages to the log. If you must, print them as infrequently as possible.
  4. Keep messages as short as possible without making them cryptic.
  5. Always include variable values in messages (e.g. given the choice between "blah failed!" and "blah failed with error code: E_INVALIDARG", the second is vastly preferable).
  6. If you are a real-time programmer (e.g. we do live video streaming), logging is almost always preferable to interactive debugging. Stepping through a debugger perturbs the system's timing enough that logging is the far better option.
  7. There must be a way to access your logs. For a web application, this is obvious. You get them from the web server. For an application you distribute to customers, it can be more difficult. Implement a mechanism to fetch your logs, or give customers an easy way to generate and send them.
  8. We use two log levels: info and error. Some people prefer to add a dozen different levels, but in practice nobody knows the difference between INFO and IMPORTANT and ERROR and FATAL and SYSTEMEXPLODED and OMFG and JUSTKILLMENOW, etc.
  9. Always log date/time with each log message.
  10. Make errors visually distinguishable from info messages.
  11. On application start, log crucial information like application version, OS type, system info, etc. This can be really useful, especially if you have a versioned desktop application. (for a web app, not so much).
  12. If you're using C#, System.Reflection.MethodBase.GetCurrentMethod().Name is your friend. Avoid hard-coding class/method names in your log messages; when you go to refactor, it is a royal pain.
  13. Do implement a log rotation scheme. My favorite is to close logs at xx MB (we use 5 MB) and then zip them. We allocate yy MB of space for archived, zipped logs and delete the oldest archives once that space is exceeded, so the maximum space occupied by logs is xx + yy. There's also a ping-pong method, but it is less efficient overall.
  14. Provide a method to disable all logging.
Logging is valuable. Unfortunately, there is no general rule of thumb for determining if a given message is useful or just noise. It takes practice and maintenance to make a logfile a useful addition to an application. But knowing how to log messages in a way that is useful is a crucial skill for any programmer.
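To make a few of these rules concrete--two log levels, a timestamp on every line, and visually distinct errors--here's a minimal C++ sketch. The class name and format are illustrative only (this is not our production logger), and rotation plus the disable switch are left out for brevity:

#include <cstdio>
#include <ctime>

class MiniLog
{
public:
    explicit MiniLog( const char * path ) : m_fp( std::fopen( path, "a" ) ) {}
    ~MiniLog() { if( m_fp ) std::fclose( m_fp ); }

    void Info( const char * msg )  { Write( "INFO ", msg ); }     // rule 8: just two levels
    void Error( const char * msg ) { Write( "*** ERROR", msg ); } // rule 10: errors stand out

private:
    void Write( const char * level, const char * msg )
    {
        if( !m_fp )
            return;

        // rule 9: date/time on every message
        char stamp[ 32 ];
        std::time_t now = std::time( NULL );
        std::strftime( stamp, sizeof( stamp ), "%Y-%m-%d %H:%M:%S", std::localtime( &now ) );
        std::fprintf( m_fp, "%s %s %s\n", stamp, level, msg );
        std::fflush( m_fp );
    }

    std::FILE * m_fp;
};

// Usage--keep messages short (rule 4) and include the variable values (rule 5):
// MiniLog log( "app.log" );
// log.Error( "Decoder init failed with error code: E_INVALIDARG" );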

Wednesday, November 05, 2008

Ye Olde Rippin' Budget Build

I have to be honest: one of my favorite things to do is go on newegg and see the best budget system I can piece together.

Anyone can toss a buttload of money at a computer and end up with something really nice. It doesn't take any creativity, really--money solves a lot of problems. The real talent comes in making a shoe-string-budget system that is decently fast, efficient, has room to grow, uses quality components (i.e. your case that looks like it was from The Fast and the Furious? Just say no), and provides a delightful user experience.

I place a big emphasis on spending money to get a good case, power supply and peripherals. The reason is that these are the only components in the computing industry that don't obsolete themselves in mere months. Video cards have an eight-month release cycle. Desktop processors have about a year between releases. Purchasing cutting edge often nets you 15-50% more performance, but at 2-4 times the cost. Completely not worth it, IMO, unless you're a die-hard enthusiast and demand nothing short of the best.

A few other quirks about my PC preferences:
  • I believe my computers should be seen and not heard. I will never recommend a product that does not take your ears into consideration.
  • Energy efficiency is important. Not only does an efficient computer save you money on your electric bill, but I believe they are typically more stable (because they run cooler), quieter (because they require less cooling) and less prone to hardware failure (because you are not pushing crucial components like the power supply).
  • Riced out computer cases are for boys. Real men want sophisticated, proven industrial design. If SilentPCReview hasn't endorsed it, I generally don't consider it a viable option. If it has a spoiler, flaming decals, ground effects packages, seriously...it's just fugly. Please, no.
  • I hate cable clutter. It makes your case look like spaghetti, and reduces airflow/cooling performance. I opt for PSUs with modular cables, thinner SATA cabling, etc.
  • Death to the floppy drive!
  • SLI: sort of dumb unless you have a gigantic monitor (1920*1200, minimum) and want to play cutting edge games with uncompromising graphics at absurd frame rates. Very few people fit into this category, and the cost is substantial (two bleeding edge video cards + power supply capable of driving them = a whole lot of top ramen).
  • Nothing delights like a quality case, keyboard, mouse, speakers and monitor. These are the components that you truly interact with, and they are the biggest difference between a "cheap" system and something that feels refined.

Anyway, without further ado, my latest budget system.

Case: Antec Mini P180

This is an attractive, smallish case with a segregated partition for the power supply and hard drives, and an upper partition with room for a mATX motherboard. Read the SPCR review here.

Eye candy:

...normally this case is quite pricey, at ~$150, but occasionally newegg discounts the hell out of it. Right now it can be had for $79.00, which is a ridiculous deal for a very high quality case. See the SPCR review for more details, but it would be difficult to get a better case for the money.

Power Supply: OCZ ModXStream Pro 600W

Ample power (actually, far more than this build would ever need; far more than almost any build would need, for that matter), modular cables, reputable vendor...not much to dislike here. Better yet, bundled with the memory in a combo deal, it's $50!

Motherboard: MSI G31M3-F

You might be wondering what this motherboard has going for it, but consider this: it costs less than $50. It uses an Intel chipset, so stability is a given. It will accept modern, 45 nm quad-core processors, so there's plenty of room for upgrades in the future...what's not to like? And as was previously noted, motherboards obsolete themselves so quickly that I see no point in splurging. Next year's motherboard is guaranteed to be shinier, more enticing, more attractive, etc. Meanwhile, if you spend all your dough on a motherboard that costs three (or four) times as much, the actual measurable performance difference is often only 2-5%. It's simply not worth the money.

So my strategy with motherboards is to spend as little as possible. There are much more meaningful ways money can be spent to improve user experience and/or performance.

Processor: Intel Pentium E5200

A nice, overclockable, dual-core, somewhere-between-E2200-and-E7200 processor that is miserly when it comes to pulling watts. Should you want something more meaty, I'd suggest the quad-core Q9550, but that will set you back an additional ~$225. And it's all going to be obsolete in a year (actually, more like two months), so don't even bother buying bleeding edge.

Memory: OCZ 2 GB DDR2 800

2 GB seems to be a goodly amount for a system these days. Go for four if you really think you need it (doubtful), but it's hard to beat simple math: $36 - $25 = $11 for 2 GB of memory. Plus, with the combo deal on the PSU, newegg basically pays you to take this memory off their hands. Suhweeeeet.

Video Card: HIS Radeon 4670

Budget gaming has never looked so good. Across the board, the 4670 is by far the best budget gaming card available, benchmarking somewhere between last year's ~$200+ offerings, the AMD 3850 and 3870. It should be capable of driving even high-resolution monitors in most recent games and decoding 1080p H.264/VC-1 with image quality enhancements, and it is amazingly efficient at idle and load. The HIS card in particular comes with a more effective cooling solution than other video card vendors', so it runs quieter and cooler. And at $70 after rebate, it's impossible to argue with the value. For a beefier solution, consider a 48xx version, but expect to pay at least another $50-100.

Hard Drive: Samsung Spinpoint F1

Cheap, roomy and speedy. I dream of a day when these aren't $600, but that day will have to wait.

Keyboard: Logitech Wired Wave Keyboard

Ergonomic: check. Nice feeling: check. Full featured: check. Reasonable price for quality: check.

Mouse: Logitech G9 Gaming Mouse

Even if you aren't a gamer, having a precise mouse makes working with your computer a pleasure. You don't realize how much effort you spend trying to click on things and position the cursor until you go from using a shitty mouse to a good one. In my opinion, the absolute worst place to skimp is the mouse, since it is the one device we use where precision and ergonomics really matter. Think about the difference between a laptop touchpad and a normal mouse; the difference between a mediocre mouse and a quality one is, IMO, just as large.

Speakers: Logitech Z-2300

I prefer two-channel audio for my computer setups because running wires for rear channels is always a mess. Sub is a must. Nice sound is a must. The Z-2300 pleases without fail.

Another option is to buy a nice pair of headphones. The sound quality can't be beat, and if you're trying to tune out distractions they can be a life-saver.

DVD Burner: SAMSUNG SH-S203N


SATA, the lowest published access times of any drive, numerous customer satisfaction awards, and $26. Sold!

Cost for core components:
Peripherals:
Considering speakers, mouse and keyboard could be had for as little as $20-30, $246 is admittedly pricey, so use your better judgment. But for ~$655 after rebates, this system is totally beyond decent. It has a very nice case and an excellent power supply. The CPU and video card combined are easily enough to do some decent gaming.

Total investment in what I call "replaceable" parts (CPU, video card, motherboard and RAM) is about $200, so in a year or two, this entire system could be upgraded to the latest, greatest crap and still be completely competitive. It's the advantage of building upon quality peripherals that don't obsolete themselves in mere months!

Thursday, October 16, 2008

Why Windows Media Player is not VLC

ExtremeTech had an article by John Dvorak outlining his Windows 7 wish list.

First, let me say that I am not a Dvorak fan. I think most of his columns are prone to hyperbole, and his technical expertise is questionable at best. I think of him as a Bill O'Reilly/Michael Moore of the computing world: basically a talking head who represents the triumph of outrage over substance.

That said, in the article, he actually poses a good question:

4. Fix the media player so it actually is a universal player—The media player should function as a universal player and actually play everything from MOV files from my camera to old AVI files. Windows Media Player wants everything to be a WMV file, and half of those don't always work. It plays some MP4 files and not others; it's spotty as can be. I just hate when I get some message or other saying that I do not have the right codec or whatever. What's the point of having a media player embedded in the system that doesn't work as well as VLC media player, a free player actively given away by VideoLAN.org? How is it that this VLC product, coded by a few guys in their spare time, can do a job 20,000 Microsoft coders cannot make Media Player do? I'd seriously like to know. Does this make any sense to anyone?
Unfortunately, yes, it does make sense to some of us. Let's go through each question, starting at the top.

What is the point of having a media player that doesn't work as well as a free player actively given away by VideoLAN.org? This is actually a great question. What Dvorak doesn't know is that audio, video and container formats are a veritable intellectual property minefield. Microsoft cannot just toss an H.264 decoder into Windows 7 without paying royalties to an organization called MPEG-LA, and these royalties are not cheap. The sole difference between Apple and Microsoft in this domain is that Apple bit the bullet and paid the royalty fees to license codecs like H.264, MPEG-2 and MPEG-4.

Now consider VLC: they are an open-source project released under the GPL. They are providing implementations of audio and video codecs (including H.264, MPEG2, MPEG4, WMV, MP3, AAC, etc., all of which require license fees), but who exactly is paying the licensing costs? The answer is: absolutely nobody.

The VLC FAQ has a breezy explanation of this in section 3.4:
Some of the codecs distributed with VLC are patented and require you to pay royalties to their licensors. These are mostly the MPEG style codecs.

With many products the producer pays the license body (in this case MPEG LA) so the user (commercial or personal) does not have to take care of this. VLC...cannot do this because they are Free and Open Source implementations of these codecs. The software is not sold and therefore the end-user becomes responsible for complying to the licensing and royalty requirements. You will need to contact the licensor on how to comply to these licenses.

This goes for playing a DVD with VLC for your personal joy ($2.50 one time payment to MPEG LA) as well as for using VLC for streaming a live event in MPEG-4 over the Internet.
So, let me restate this: if you use VLC, technically it is your responsibility to contact MPEG-LA and any other licensing organization and figure out how you should best comply with their licensing. VLC claims they cannot do this because they are "Free and Open Source implementations of these codecs."

This is one instance where Free and Open Source Software sounds more like a selective disregard for intellectual property rights than a legitimate enterprise. What's really going on is MPEG-LA has no financial incentive to sue the pants off the creators of VLC because they don't have any money. But this doesn't change the fact that the VLC project is arguably "stealing" intellectual property that belongs to other people and then hiding behind some weak FOSS excuse (and, I suppose, the non-existence of revenue). The notion that any of their users are actually going to navigate MPEG-LA's licensing conditions and pay up is patently absurd.

(side note: these are the same people who would shit sideways if Microsoft was caught violating the GPL, so apparently their ethical standards of ownership are selectively applied wherever it makes sense, but I digress.)

Were Microsoft to dump Windows Media Player and start distributing VLC, MPEG-LA would be all over Microsoft like a fly on shit. Measly open source project with no revenue stream? Meh, I'll pass. Fortune-50 company?

Show me the money!
Show me the money, baby. Show me the money. (and no, I did not pay royalties to Tom Cruise. I guess we all fudge it on occasion)

How is it that this VLC product, coded by a few guys in their spare time, can do a job 20,000 Microsoft coders cannot make Media Player do? Again, the problem isn't a lack of programming talent; it's that the same standards of intellectual property that Microsoft is held to do not apply to the owners of the VLC project. Plain and simple. One option Microsoft has is to pay the licensing fees for a few dozen codecs, but this isn't cheap. It's one reason XP never had an MPEG-2 decoder (and thus millions of us threw up our arms in despair whenever we tried to play a DVD on Windows XP) and also the reason it doesn't have an H.264 decoder. Apple absorbs those costs to help prevent their customers from going insane. In any event, the fundamental issue has nothing to do with a lack of programmers, and everything to do with licensing costs and legal minefields.

A few other pot-shots before I close out this sorry tirade:
  • Windows doesn't play any MP4 files out of the box, because it lacks an MP4 parser entirely. (MP4 is a container format, and not an actual payload--I'm not sure if this is because of a licensing issue, or if MSFT's 20k programmers never got around to it) Me 1, Dvorak 0.
  • Windows Media Player is a universal player, because the underlying framework (DirectShow) can be extended to support almost any container or format. There is no technical reason WMP cannot play back any media file (see the sketch after this list). Me 2, Dvorak 0
  • Windows Media Player does not want everything to be a WMV file. In fact, it doesn't even care what the file is. All it cares about is whether or not it can locate a parser to open the file, and a decoder for content inside of the file. The Windows Media codecs themselves are implemented as DirectShow (DMO) filters that WMP loads up when you hand it a file. Me 3, Dvorak 0
  • What's the point in ranting about WMP when you can just download VLC and give MPEG-LA the middle finger? Me 4, Dvorak -12
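For the curious, the "universal player" point above is easy to demonstrate with a few lines of DirectShow. This is a hedged sketch, not WMP's actual code: the function name and file path are illustrative, and error handling is minimal. The point is that RenderFile() will happily build a playback graph for anything that has a registered parser and decoder on the system.

#include <atlbase.h> // CComPtr, CComQIPtr
#include <dshow.h>   // IGraphBuilder, IMediaControl; link with strmiids.lib

// Build a playback graph for an arbitrary file. DirectShow picks whatever
// registered parser/decoder filters can handle it, which is essentially what
// WMP relies on under the hood. Assumes CoInitialize has already been called
// on this thread.
HRESULT PlayFile( const wchar_t * pszFile )
{
    CComPtr<IGraphBuilder> pGraph;
    HRESULT hr = pGraph.CoCreateInstance( CLSID_FilterGraph );
    if( FAILED( hr ) )
        return hr;

    // RenderFile() doesn't care what the container is; it just needs filters
    // registered on the system that can parse and decode it.
    hr = pGraph->RenderFile( pszFile, NULL );
    if( FAILED( hr ) )
        return hr;

    CComQIPtr<IMediaControl> pControl( pGraph );
    if( !pControl )
        return E_FAIL;

    return pControl->Run();
}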
And this concludes my rant. Apologies in advance to FOSS (which I am still a proponent of) and the makers of VLC (which is by any measure, a superb but ethically perilous product).

Friday, August 01, 2008

RAID-0 and Amdahl's Law

When I see comments like this:

Ok, yeah, a 10k drive is fast. Guess what? Two 7200 drives in Raid 0 are faster. Four drives in RAID 0+1 are bigger, cheaper, faster, and... yeah. Better. You have a measly 300GB drive - I have a 1TB (2TB counting mirroring) array that's faster and with automatic mirroring for less than what you spent on one 10k drive. And your cost doesn't include the second slower drive for archiving.

...the little nerd-alarm in my head goes off, and I feel compelled to present the case against RAID-0. For those who aren't aware, RAID stands for Redundant Array of Independent or Inexpensive Drives. The basic idea is simple: by spreading data out over several drives in varying configurations, one can theoretically improve performance and/or protect against data loss. For example, by duplicating data across two separate drives, one drive can fail and there will be no loss of data. This is commonly called RAID-1.

RAID-0, on the other hand, stripes data evenly across two (or more) drives. By doing this, you can read data from both drives simultaneously. However, this also doubles the propensity for data loss--a single drive failure takes out the entire array.

First of all, I should note that the ground I am about to cover has been discussed before. Anandtech evaluated RAID-0 performance using two Western Digital Raptor drives by running them through numerous benchmarks, and concluded that:
If you haven't gotten the hint by now, we'll spell it out for you: there is no place, and no need for a RAID-0 array on a desktop computer. The real world performance increases are negligible at best and the reduction in reliability, thanks to a halving of the mean time between failure, makes RAID-0 far from worth it on the desktop.
StorageReview.com, long the most trusted source with respect to drive performance, has also made it quite clear how unthrilled they are with RAID-0. The administrator of the site, Eugene, states:
Given the recent return of various "should I raid?!?!??!" posts in the community, I should also take time to point out that the FarCry data presented above is relatively STR-heavy compared to the Office and even the High-End patterns that are broken down in the TB4 article.

Despite this, however, note that while sequential transfer rates represent about 70% of all accesses in FarCry, the array spends only 15% of its time on sequential transfers. Doubling STR through a two-drive array halves this 15%.

Hence, the 10-20% improvement we see in SR's tests when going from a single drive to 2xRAID0 comes from the doubling in capacity (which, in the past, has more or less established itself as a 7-10% performance boost in our tests) + this small improvement.

In other words, sequential transfers already complete so quickly that they have in effect written themselves out of the performance equation. Doubling the performance of a factor that exerts such a small effect nets a small improvement... as one should expect.
What Eugene is getting at is that drive performance can be broken down into two distinct factors: physically moving the actuator arm to the location of the data (i.e. seek time), and then actually reading the data. The former is quite slow, while the latter happens rapidly. RAID-0 has no effect on seek times and effectively doubles sequential read speeds--which means it doubles something that already happens quickly.

There is a formal name for this rule: Amdahl's Law. Consider a program consisting of two distinct routines, A and B:

Picture each routine as a bar whose length corresponds to how much time it takes. Doubling the speed of A results in a far greater improvement than making B five times faster. Programmers frequently refer to Amdahl's Law when optimizing software--they first identify the longest-running task, and then seek to optimize that task. Bad Programmers™ (yes, we exist) often spend time optimizing B, which is clearly less effective.
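For reference, Amdahl's Law can be written down directly. If p is the fraction of total time spent in the part being sped up, and s is how much faster that part gets, the overall speedup is:

\[
S_{\text{overall}} = \frac{1}{(1 - p) + \frac{p}{s}}
\]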

Now consider this graph (compliments of Storage Review):

The first bar shows the amount of time it takes a single Western Digital Raptor drive to perform a random read. The second bar shows the amount of time it takes two Raptor drives in RAID-0 to perform a random read. The yellow section is the actual process of reading data, and the red section is the time it takes to seek to the location on disk. And this brings us to an important conclusion: RAID-0 really only has a big effect on sequential reads, but sequential reads take so little time on modern hard drives that they are completely dwarfed by seek times. Doubling sequential read speeds means jack squat. Again, Eugene states that:
In a typical access pattern that features significant localization such as the Office DriveMark, an adept drive such as the WD740GD can achieve about 600 I/Os per second. Inversely stated, each I/O (which, again, consists of positioning + transfer) takes about 1.7 milliseconds. In a single-drive, highly-localized scenario, the Raptor average 1.7 milliseconds per I/O. Of this 1.7 ms, 0.3 ms, or 18%, is the transfer of data to or from the platter. The other 82% of the operation consists of moving the actuator to or waiting for the platter to spin to the desired location. The situation further polarizes itself as transfer rates rise. At 126 MB/sec, transfers consist of just 11% of the total service time. In effect, sequential transfer rates ranging from 50 MB/sec to 130 MB/sec and higher "write themselves out of the equation" by trivializing the time it takes to read and write data when contrasted with the time it takes to position the read/write heads to the desired location.

...

RAID helps multi-user applications far more than it does single-user scenarios. The enthusiasm of the power user community combined with the marketing apparatus of firms catering to such crowds has led to an extraordinarily erroneous belief that striping data across two or more drives yields significant performance benefits for the majority of non-server uses. This could not be farther from the truth! Non-server use, even in heavy multitasking situations, generates lower-depth, highly-localized access patterns where read-ahead and write-back strategies dominate. Theory has told those willing to listen that striping does not yield significant performance benefits. Some time ago, a controlled, empirical test backed what theory suggested. Doubts still lingered- irrationally, many believed that results would somehow be different if the array was based off of an SATA or SCSI interface. As shown above, the results are the same. Save your time, money and data- leave RAID for the servers!

...or for those of us who don't like verbosity, Eugene's article can be distilled to: Amdahl's Law, Baby. There are situations where sequential read speed genuinely matters (e.g. video editing), but the increased risk of data loss needs to be weighed carefully against the potential gains. For the average desktop user, RAID-0 does not provide a substantial performance benefit, does come with an increased chance of data loss, and should be avoided at all costs.
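To put rough numbers on that distillation: in Eugene's Raptor figures above, about 0.3 ms of a 1.7 ms I/O is spent transferring data, so p is roughly 0.18. Doubling sequential transfer rate with a two-drive RAID-0 array (s = 2) gives, by Amdahl's Law:

\[
S = \frac{1}{0.82 + \frac{0.18}{2}} = \frac{1}{0.91} \approx 1.10
\]

...roughly a 10% improvement, which lines up with the 10-20% StorageReview measured once the capacity-doubling effect is added back in.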

Monday, June 16, 2008

The Joy of Smart Pointers

One of the biggest mistakes people new to DirectShow (and COM, for that matter) make is using raw COM pointers. This article aims to present a compelling alternative.

All COM objects implement the IUnknown interface. IUnknown defines three standard methods, of which two concern this article: AddRef() and Release(). When a COM object is created, its "reference count" (the number of clients holding a claim on the newly created object) is set to 1. A program never directly deletes these objects; instead, it calls Release(), which decrements the reference count. When the reference count reaches 0, the object self-destructs.

The benefit of this is an object will remain in memory only as long as it needs to; the lifetime of an object is managed by its reference count. A programmer cannot accidentally delete an object from memory that another module or portion of the application may be using. Only when the object's reference count goes to 0 (as a result of all involved parties calling Release() on their interface pointers) does the object automatically destroy itself.

This is, sadly, a double-edged sword; by mismanaging reference counts, you can easily get yourself in trouble, and leak memory in a very real way.

Take the following function as an example:


HRESULT StartGraph( IGraphBuilder * pIGraphBuilder )
{
    IMediaControl * pIMediaControl;
    HRESULT hr = pIGraphBuilder->QueryInterface( IID_IMediaControl, ( void ** ) &pIMediaControl );
    if( FAILED( hr ) )
        return hr;

    return pIMediaControl->Run(); // THIS WILL LEAK A REFERENCE COUNT
}

This is how we would normally start a filter graph in DirectShow. The code technically works, but it has a serious flaw: querying the filter graph manager for the IMediaControl interface increments the reference count on the filter graph manager, and because pIMediaControl is a raw pointer that is never Released before we return, that reference is never given back. The above code leaks a reference count, which means abandoned COM objects galore!

One way to (sort of) fix the above code would be:


HRESULT StartGraph( IGraphBuilder * pIGraphBuilder )
{
    IMediaControl * pIMediaControl;
    HRESULT hr = pIGraphBuilder->QueryInterface( IID_IMediaControl, ( void ** ) &pIMediaControl );
    if( FAILED( hr ) )
        return hr;

    hr = pIMediaControl->Run();
    if( FAILED( hr ) )
        return hr; // THIS WILL LEAK A REFERENCE COUNT!

    // Release our interface....
    SAFE_RELEASE( pIMediaControl );
    return hr;
}

There are still major flaws with this code. We release the interface before exiting at the end of the function, which decrements the reference count on the filter graph manager. But what if the call to pIMediaControl->Run() fails? We return from the middle of the function without releasing pIMediaControl, and again leak a reference count. We could easily add another SAFE_RELEASE() call to the middle return, but consider this: what if this method were 100 lines long? 200 lines long? What if it had elaborate loops or other conditional logic? It becomes easy to miss a release, and the code gets visually convoluted by clean-up logic.

The solution to this problem is to use Smart Pointers to manage reference counting. When a COM pointer wrapped with a Smart Pointer goes out of scope, it automatically calls Release(), thus alleviating the pain of manually releasing:


#include <atlbase.h> // For Smart Pointers (CComPtr)!

...

HRESULT StartGraph( IGraphBuilder * pIGraphBuilder )
{
    CComPtr<IMediaControl> pIMediaControl;
    HRESULT hr = pIGraphBuilder->QueryInterface( IID_IMediaControl, ( void ** ) &pIMediaControl );
    if( FAILED( hr ) )
        return hr;

    return pIMediaControl->Run();
}

Notice that we don't have to release the pointer at the end of the method; upon going out of scope, the Smart Pointer will automatically call Release().

Alternately, you can also use CComQIPtr (QI = Query Interface) for slightly cleaner code; the above could be written like so:


HRESULT StartGraph( IGraphBuilder * pIGraphBuilder )
{
    CComQIPtr<IMediaControl> pIMediaControl( pIGraphBuilder );
    if( !pIMediaControl )
        return E_FAIL;

    return pIMediaControl->Run();
}

Either method will work; the second is slightly cleaner but I prefer the former for more accurate HRESULTs.

There are still instances where you'll need to manually release a smart pointer--take the following GetPin() function for example:


//---------------------------------------------------------------//
// GetPin - returns a pin of the specified direction on the
// specified filter. Note that this will return connected or
// unconnected pins.
HRESULT GetPin( IBaseFilter * pFilter, PIN_DIRECTION PinDir, IPin ** ppPin )
{
    if( ppPin == NULL || pFilter == NULL )
    {
        return E_POINTER;
    }

    CComPtr<IEnumPins> pEnum;
    HRESULT hr = pFilter->EnumPins( &pEnum );
    if( FAILED( hr ) )
    {
        return hr;
    }

    CComPtr<IPin> pPin;
    hr = pEnum->Next( 1, &pPin, 0 );
    while( hr == S_OK )
    {
        PIN_DIRECTION ThisPinDirection;
        hr = pPin->QueryDirection( &ThisPinDirection );
        if( FAILED( hr ) )
        {
            return hr;
        }

        if( PinDir == ThisPinDirection )
        {
            // Found a match. Return the IPin pointer to the caller.
            return pPin.CopyTo( ppPin );
        }
        // Release the pin for the next time through the loop.
        pPin.Release();
        hr = pEnum->Next( 1, &pPin, 0 );
    }

    // No more pins. We did not find a match.
    if( hr == VFW_E_ENUM_OUT_OF_SYNC )
    {
        return VFW_E_ENUM_OUT_OF_SYNC;
    }
    else
    {
        return E_FAIL;
    }
}

The above function is the same function defined in the DirectShow documentation, with several errors corrected. And, of course, no more sloppy raw COM pointers.

Note the IPin smart pointer pPin--because we're enumerating the filter's pins in a loop, we have to manually call pPin.Release() before reusing it on the next iteration. Also of note is the pPin.CopyTo() call, which is another convenient feature of smart pointers: CopyTo() AddRef()s and copies the underlying pointer, and should be used when handing a smart pointer's contents out to a dumb pointer. Since the caller's ppPin is ultimately a dumb IPin pointer, CopyTo() is exactly what we want here.

One other feature of Smart Pointers is the overloaded assignment operator. Observe:


CComPtr<IBaseFilter> pSomeFilter;


//Somewhere else
HRESULT CreateFauxFilter()
{
    CComPtr<IBaseFilter> pSomeOtherFilter;
    HRESULT hr = pSomeOtherFilter.CoCreateInstance( CLSID_OfSomeOtherFilter );
    if( FAILED( hr ) )
    {
        return hr;
    }
    else
    {
        // Here, we can use the "=" operator to copy one smart pointer
        // to another smart pointer.
        pSomeFilter = pSomeOtherFilter;
        return hr;
    }
}

So, pSomeOtherFilter is assigned to pSomeFilter; the overloaded assignment operator AddRef()s for us, so when pSomeOtherFilter goes out of scope and releases its own reference, pSomeFilter still holds the object. It's a handy way to only keep a pointer around once we know everything has succeeded. A common use of this is to instantiate all of your filter graphs with locally declared smart pointers, and only assign the smart pointers to your member variables at the end of the function.

You may think this is equivalent to the CopyTo() method of a CComPtr, but the assignment operator should only be used when explicitly copying one smart pointer to another; CopyTo() should be used when copying a smart pointer to a dumb pointer.

Also, be mindful of accidentally mixing smart pointers, dumb pointers, and the assignment operator:


IBaseFilter * pDumbPointer; // assume this points at a real filter
CComPtr<IBaseFilter> pSmartPointer;

pSmartPointer = pDumbPointer; // THIS IS OKAY (sort of)

pDumbPointer = pSmartPointer; // THIS IS VERY, VERY BAD!!

The really nasty thing about the latter assignment is that pDumbPointer will not equal NULL (thus evading many common checks for validity), so once pSmartPointer goes out of scope and releases the object, any subsequent attempt to call methods through the dumb pointer will result in Very Bad Things™. If you're mixing smart pointers with dumb pointers, consider removing the dumb pointers entirely; the use of dumb pointers is usually a sign of a greater problem, and using both is confusing at best.
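If you genuinely need to hand a raw interface pointer out to a caller (who will then be responsible for Release()-ing it), here is a hedged sketch of the safer way, with illustrative names:

#include <atlbase.h> // CComPtr
#include <dshow.h>   // IGraphBuilder, CLSID_FilterGraph; link with strmiids.lib

HRESULT CreateGraph( IGraphBuilder ** ppGraph )
{
    if( ppGraph == NULL )
        return E_POINTER;

    CComPtr<IGraphBuilder> pGraph;
    HRESULT hr = pGraph.CoCreateInstance( CLSID_FilterGraph );
    if( FAILED( hr ) )
        return hr;

    // CopyTo() AddRef()s and copies the raw pointer; the caller now owns a
    // reference and must eventually Release() it. (Detach() is an alternative
    // that hands over the smart pointer's own reference instead.)
    return pGraph.CopyTo( ppGraph );
}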

Wednesday, April 02, 2008

The N Habits of Highly Defective DirectShow Applications

First off, let me pay proper homage by linking to the inspiration for this post, The n Habits of Highly Defective Windows Applications. If you do Windows development, I highly recommend reading through this article; it covers numerous pitfalls in Windows application development.

Anyway, I stumbled upon that article and thought that it could use a DirectShow version, since there are many poor examples of DirectShow coding floating around. Many of the defects discussed in this post are common and have been addressed in the newsgroups numerous times. Generally, when something is going amiss in a DirectShow program, it's for one of the following reasons.

If you find that I've listed something you're doing (or not doing) in your program, don't take it personally. I know most of the below are defects from personal experience, which is why I've been nice enough to compile them so you don't have to suffer like I did (and trust me, a messed up DirectShow program is a way to truly suffer). I think of this article as a means of helping people improve their code, rather than an attempt on my behalf to belittle other people's coding practices.

In keeping with the original article's overall layout, I've assigned a severity to defects:
  • Potentially Fatal: The use of this technique shouldn't work at all. If it appears to, it is an illusion.
  • Potentially Harmful: There is an excellent chance that minor changes in the program will result in a malfunctioning program, or necessitate gratuitous changes that a proper program construction would not have required.
  • Inefficient: this covers most polling situations. You will be wasting vast amounts of CPU time and accomplishing nothing.
  • Poor Style: While the code works, it represents poor style of programming. It should be cleaned up to something that is manageable.
  • Deeply Suspect: While the code works, there is almost certainly a better way to accomplish the same task which would be simpler, faster, cleaner, or some other suitably positive adjective. There are sometimes valid reasons for using the technique specified, but they are fairly rare and easily identified. You probably don't have one of them.


Not using Smart Pointers
Severity: Potentially Harmful, Poor Style, Deeply Suspect

The most beneficial thing you can do for your program is use Smart Pointers instead of declaring regular COM pointers.

What are smart pointers, you ask? Smart Pointers are a simple wrapper class around a COM pointer which help manage the object's scope, in addition to several other nice perks. Instead of doing this:

IGraphBuilder * pIGraphBuilder; // A regular stupid pointer. This is bad.
HRESULT hr = CoCreateInstance( CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER, IID_IGraphBuilder, ( void ** ) &pIGraphBuilder ); // yuk!

...try doing this:

CComPtr<IGraphBuilder> pIGraphBuilder; // A smarter pointer--much better.
HRESULT hr = pIGraphBuilder.CoCreateInstance( CLSID_FilterGraph ); // just a tad less ugly than the one above

Using smart pointers will:

  • Make your code much cleaner.
  • Make your code much safer.
  • Enable you to do things that normally would be difficult, clumsy, and/or dangerous with normal pointers.

Since this is such a huge subject and so many people make the mistake of not using Smart Pointers (this is unsurprising, given that the DirectShow documentation itself does not use smart pointers--shame on MSFT!), I've written an entire article devoted to this very subject. Also, The March Hare (another newsgroup junkie) has a quick tutorial on his site--click here to visit that article.

Not using Smart Pointers will make any substantial DirectShow program buggy, unreliable, unstable, and prone to leaking memory. Using them also makes DirectShow code more maintainable and easy to read.

Using wrong synchronization primitives (i.e.: mutex, critical sections, etc.) for a given task.
Severity: Inefficient, Poor Style

Programming audio and video is a threaded endeavor, by its very nature. As such, Microsoft has provided us with all sorts of ways to manage threads. The three most common ways of doing this are:

  1. Mutexes - A mutex is a synchronization primitive that prevents two threads from executing the same code simultaneously. The two threads do not need to be in the same process space. See here for more information.
  2. Critical Sections - A Critical Section is a synchronization primitive that prevents two threads from executing the same code simultaneously (yes, its function is identical to that of a mutex). The two threads need to be in the same process space. See here for more information.
  3. Semaphores - A semaphore is a synchronization primitive that maintains a count between zero and a specified maximum value. The count is decremented each time a thread completes a wait for the semaphore and incremented each time a thread releases the semaphore. When the count reaches zero, no more threads can successfully wait for the semaphore object state to become signaled. The state of a semaphore is set to signaled when its count is greater than zero, and non signaled when its count is zero. The semaphore object is useful in controlling a shared resource that can support a limited number of users. It acts as a gate that limits the number of threads sharing the resource to a specified maximum number. See here for more information.

Phew!

Because a mutex can be shared between processes (rather than being confined to one process space), it involves more overhead than a critical section, which is scoped to the host process.

Semaphores aren't typically used in DirectShow coding, because they're more applicable to applications where the need to limit access to a given number of threads is a requirement (e.g.: database applications, web servers, etc.). While a semaphore with a one-thread limit is functionally the same thing as a mutex or critical section, the performance is not.

Generally, a critical section is the correct primitive to be using in DirectShow. Using anything else is suspect.
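For reference, here is what using a critical section looks like with the raw Win32 API (a hedged sketch with illustrative names); it also hints at why the wrapper classes discussed next are worth using:

#include <windows.h>

CRITICAL_SECTION g_cs; // one per resource you want to protect

void Init()    { InitializeCriticalSection( &g_cs ); }
void Cleanup() { DeleteCriticalSection( &g_cs ); }

void TouchSharedState()
{
    EnterCriticalSection( &g_cs );

    // ...touch the shared state. Every exit path below this point has to
    // remember to call LeaveCriticalSection--which is exactly the kind of
    // bookkeeping CCritSec/CAutoLock eliminate.

    LeaveCriticalSection( &g_cs );
}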

Lastly, using critical sections in filter and DirectShow code is easier as some very kind person has been nice enough to write the two classes that are the subject of our next bad habit:


Not using CCritSec and CAutoLock
Severity: Potentially Harmful, Inefficient, Poor Style, Deeply Suspect

CCritSec and CAutoLock are two classes provided by the DirectShow SDK that help simplify the implementation of thread synchronization.

The CCritSec class wraps a critical section object and contains methods to lock and unlock the critical section at will--additionally, it handles creation and destruction of the critical section, making it very handy for keeping your code leak free.

However, the CAutoLock is where things get really slick. Observe the following code:


HRESULT CMyClass::SomeMethod()
{
    // We can't have threads accessing this at will!
    CAutoLock MyLock( &myCCritSecInstance );

    // do something...

    // Here we can return without releasing the lock--because the
    // destructor of the CAutoLock class takes care of it for us
    return S_OK;
}

...The beauty of the above code is that you can't forget to release the lock to your critical section because it's managed by our CAutoLock instance. This code, on the other hand, is heinously unsafe:

HRESULT CMyClass::SomeMethod()
{
    myCCritSecInstance.Lock();

    // Do something for a few hundred lines of code where we potentially
    // forget to release the above lock, encounter an exception, or
    // have some sort of failure that we didn't account for. This results in a
    // deadlock.

    myCCritSecInstance.Unlock();

    return S_OK;
}
...the bad thing about it is that the programmer must do their own accounting throughout the code. It can be done, but the probability of programmer error increases substantially, and the resulting code is difficult to maintain and troubleshoot.

Why make an accounting nightmare? It's much easier and more reliable to let C++ handle scope than to attempt to deal with manually locking and unlocking critical sections. There's also an entire article devoted to using these classes for thread protection.


Not protecting graph construction and destruction with a synchronization primitive.
Severity: Potentially Fatal


Because graph construction and destruction can each take a substantial amount of time, the possibility of the two procedures overlapping (on different threads) is quite real.

I've seen this error several times. Somebody will post code that looks something like:


// Somewhere, Graph Builder is defined...
CComPtr<IGraphBuilder> m_pGraph;

// Function to build graph
HRESULT BuildMyGraph()
{
    HRESULT hr = m_pGraph.CoCreateInstance( CLSID_FilterGraph );

    // ...a couple hundred lines of code and setup pass

    hr = m_pGraph->Render( pSomeIPin ); // pSomeIPin obtained during setup
    return hr;
}


// Function to tear down graph
HRESULT TearDownGraph()
{
    m_pGraph.Release();
    return S_OK;
}

...there's nothing wrong with the above code, except for the fact that m_pGraph isn't protected by any thread synchronization mechanism. If your calls to the functions are protected by a critical section, great--read no further. Otherwise, the above code is dangerous because someone may call TearDownGraph() right in the middle of a call to BuildMyGraph(), which results in m_pGraph being a null pointer and ultimately an application crash.

If a programmer is really unlucky, the above code will always work until someone runs it on a dual-core machine, or on a machine faster or slower than the development box. Then they have something that worked without fail in the office, but crashes and burns out in the field.

Ideally, the above should be protected with a critical section that makes it impossible to call the two functions simultaneously--one would have to wait for the other to finish, and vice-versa. The code should resemble something like this in order to be thread safe:

// Function to build graph
HRESULT BuildMyGraph()
{
    // LOCK THIS CODE
    CAutoLock cAutoLock( &m_MyCCritSecInstance );

    HRESULT hr = m_pGraph.CoCreateInstance( CLSID_FilterGraph );

    // ...a couple hundred lines of code and setup pass

    hr = m_pGraph->Render( pSomeIPin ); // pSomeIPin obtained during setup
    return hr;
}


// Function to tear down graph
HRESULT TearDownGraph()
{
    // LOCK THIS CODE
    CAutoLock cAutoLock( &m_MyCCritSecInstance );

    m_pGraph.Release();
    return S_OK;
}

This is a pretty simplified example, but the point stands.

Assigning smart pointer to a regular pointer.
Severity: Potentially Fatal

One of the benefits of smart pointers is that you can assign one smart pointer to another smart pointer because someone has cleverly overloaded the "=" operator. This is very useful; you can use local smart pointers to build a graph and only assign them to member variables at the end of the build process, thus never keeping around any permanent COM pointers unless necessary. However, assigning a smart pointer to a regular (dumb) pointer is a big, big mistake.

I've seen programmers do this several times. A lot of this comes from the fact that people typically don't use smart pointers to begin with because none of the DirectShow samples use them. So you'll see something that ends up like so:


// Somewhere:
IGraphBuilder * m_pIGraphBuilder;

// Somewhere else:
CComPtr<IGraphBuilder> pIGraphBuilder;
pIGraphBuilder.CoCreateInstance( CLSID_FilterGraph );

// And, somewhere else...
m_pIGraphBuilder = pIGraphBuilder; // this is bad!
...The insidious thing about this error is that m_pIGraphBuilder will appear to be a valid pointer, but no reference was added on its behalf, so once pIGraphBuilder goes out of scope and releases the object, the member is left dangling--and using it will ultimately result in a crash.

Not calling CoInitialize and CoUninitialize on new threads.
Severity: Potentially Fatal or Harmful

One of the fundamental rules of modern-day COM is that every thread that uses COM should first initialize COM by calling either CoInitialize or CoInitializeEx. Not doing so can result in some strange behavior--if you're lucky, the behavior will just be failed COM API calls. If you're not so lucky, the problems can be rather subtle.

An example of correctly calling CoInitialize on a new thread:


unsigned int dwThreadId = 0;
HANDLE hThreadHandle;
hThreadHandle = (HANDLE) _beginthreadex( NULL, 0, BuildMyFilterGraph, pMyData, 0, &dwThreadId );

...the above code launches a new thread, calling some generic method/function called BuildMyFilterGraph. Now, in BuildMyFilterGraph:

unsigned int __stdcall BuildMyFilterGraph( LPVOID lpParam )
{
    CoInitializeEx( NULL, COINIT_APARTMENTTHREADED );

    // Build our graph, do something, etc....

    // When we're ready to exit the thread
    CoUninitialize();
    return 1;
}
So, the above calls CoInitializeEx when the thread is launched, and calls CoUninitialize when it's all done. Great.

However, the above code has the same issue that not using smart pointers has--what if the thread exits before calling CoUninitialize? What if the function is modified by another programmer who exits the thread and forgets to call CoUninitialize? We could account for each and every one of these exits by calling CoUninitialize directly, but such a solution is clumsy at best, especially given that DirectShow code tends to be sequential and the accounting often snowballs out of control.

The solution to this is actually quite simple:


class CInitCom
{
public:
    CInitCom( DWORD dwCoInit = COINIT_APARTMENTTHREADED )
    {
        CoInitializeEx( NULL, dwCoInit );
    }
    ~CInitCom()
    {
        CoUninitialize();
    }
};
A coworker of mine clued me in on this class, and it's quite elegant: the constructor calls CoInitializeEx, and the subsequent call to CoUninitialize is managed entirely by the scope of the class.

Using this class, our BuildMyFilterGraph method/function now becomes:


unsigned int __stdcall BuildMyFilterGraph( LPVOID lpParam )
{
    CInitCom cInitCom;

    // Build our graph, do something, etc....

    // Here, we just exit. The call to cInitCom's destructor will take care
    // of calling CoUninitialize for us.
    return 1;
}
We no longer have to clean up after ourselves. The code is protected from people exiting or modifying the function/method in strange ways, and much cleaner overall.


Deriving a transform filter from CTransInPlaceFilter when CTransformFilter should be used.
Severity: Inefficient, Deeply Suspect


Probably once a month or so you'll see someone post a plea for help on the newsgroups about how their transform filter works, but it's really slow. This is almost always a consequence of people deriving their filter from CTransInPlaceFilter instead of CTransformFilter.

The difference between CTransInPlaceFilter and CTransformFilter can appear to be subtle, but they're really two very different creatures. Many people are attracted to the very simple implementations of CTransInPlaceFilter derived filters, which can be as short as fifty lines of code (see the null filter sample in the DirectShow samples for a good example of this). CTransInPlaceFilter derived implementations modify the frame data of passing samples in their original buffers and don't have their own allocator.

To paraphrase the notes in the base classes defining In-Place Transform Filters: An in-place transform tries to do its work in someone else's buffers. It tries to persuade the filters on either side to use the same allocator (and for that matter the same media type). So, essentially, an In-Place Transform filter that is positioned directly upstream from a video rendering filter typically ends up having access to frame data after it already resides in video memory.

Writing to video memory is really, really fast--reading from video memory is really, really slow. So if you read from the buffer in the IMediaSample supplied in the Transform method of a CTransInPlaceFilter, you can expect to have seriously slow performance. Writing to the buffer is very quick and incurs no additional penalty, but any sort of read operation will incur steep penalties.

Moral of the story: don't read from video memory. The most common way this happens is people deriving their transform filters from CTransInPlaceFilter instead of CTransformFilter, which has its own allocators. CTransformFilter is the appropriate base class to use if you're going to be reading from a buffer as it passes through your transform filter.
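To make the distinction concrete, here is a hedged sketch of a copy-style transform. The class name and the trivial "invert" operation are purely illustrative, and the other required CTransformFilter overrides (CheckInputType, CheckTransform, GetMediaType, DecideBufferSize) are omitted:

// CMyTransform derives from CTransformFilter, so it has its own output
// allocator: pIn is the upstream filter's buffer, pOut is ours.
HRESULT CMyTransform::Transform( IMediaSample * pIn, IMediaSample * pOut )
{
    BYTE * pSrc = NULL;
    BYTE * pDst = NULL;

    HRESULT hr = pIn->GetPointer( &pSrc );
    if( FAILED( hr ) )
        return hr;

    hr = pOut->GetPointer( &pDst );
    if( FAILED( hr ) )
        return hr;

    long cbData = pIn->GetActualDataLength();

    // Reading from pSrc is fine here: it is the upstream filter's system
    // memory buffer, not the renderer's video memory.
    for( long i = 0; i < cbData; ++i )
        pDst[ i ] = ( BYTE )( 255 - pSrc[ i ] ); // illustrative processing only

    return pOut->SetActualDataLength( cbData );
}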


Not Checking returned HRESULT values
Severity: Potentially Harmful, Poor Style, Deeply Suspect

Almost all DirectShow functions (and COM functions, for that matter) return an HRESULT data type. This result indicates whether a given function succeeded or failed. Oftentimes the failure code is very specific and can lead to easy troubleshooting of a problem. Other times a function succeeds, but there are multiple distinct success codes.

Knowing that, it's very clear that not checking an HRESULT for failure or success is a great way to write a program that fails miserably.

Observe the following function:


// Code creates a Filter graph manager and assigns it to a member
// variable.
HRESULT InstantiateGraphbuilder()
{
    CComPtr<IGraphBuilder> pIGraphBuilder;
    pIGraphBuilder.CoCreateInstance( CLSID_FilterGraph );

    m_pIGraphBuilder = pIGraphBuilder;
    return S_OK;
}
So, the above code has several flaws. For starters, CoCreateInstance returns an HRESULT, which the above code does absolutely nothing with. It falls right on the floor. If you forgot to initialize COM and called the above code, it would fail, attempt to assign the pointer, and the programmer would have no idea why the program failed, simply because there was no check for the HRESULT return value. This is a somewhat simplified example of forgetting to check the HRESULT, but the general idea is the same.

The second flaw is returning S_OK, the generic HRESULT value for success, unconditionally--this is also pointless. It prevents the calling function from verifying that things were actually successful or taking any action based on a more specific return value. As it stands, the above code might as well have a void return value.

A much better implementation:


// Code creates a Filter graph manager and assigns it to a member
// variable.
HRESULT InstantiateGraphbuilder()
{
    CComPtr<IGraphBuilder> pIGraphBuilder;
    HRESULT hr = pIGraphBuilder.CoCreateInstance( CLSID_FilterGraph );
    if( FAILED( hr ) ) // CHECK RETURN VALUE USING FAILED MACRO
    {
        return hr;
    }
    else
    {
        m_pIGraphBuilder = pIGraphBuilder;
        return hr;
    }
}

The above code checks the return value of the CoCreateInstance using the FAILED() macro. You can also check an HRESULT for success using the SUCCEEDED() macro, which also takes an HRESULT as a parameter and functions in exactly the opposite way as FAILED().

This is really just a basic example; there are literally thousands of possible HRESULT values, and certain methods return their own specific set of HRESULTs that should be checked consistently.

Lastly, you can do the following to get more information about a given error:

#include <dxerr9.h>   // DXGetErrorString9 / DXGetErrorDescription9
#include <atlstr.h>   // CString

HRESULT hr = S_OK;

// we encounter some error which has been assigned to hr

CString Error( ::DXGetErrorString9( hr ) );
Error += ::DXGetErrorDescription9( hr );

...this is nice because it's easy to send to a message log, or to display the error in plain English rather than some weird hex value. Additionally, there's a utility in the DXSDK called "DirectX Error Lookup"; as the name implies, it will look up a given hex or decimal value and display the corresponding error for you.

Forgetting to catch and examine HRESULT values consistently is, without a doubt, deeply suspect. Additionally, make sure your own functions return HRESULT values that are actually useful to their callers.

Using a do...while(0) loop
Severity: Potentially Harmful, Poor Style, Deeply Suspect

Observe the following function:

HRESULT ShowFilterPropertyPage(IBaseFilter *pFilter, HWND hwndParent)
{
    HRESULT hr;
    ISpecifyPropertyPages *pSpecify=0;

    if (!pFilter)
        return E_NOINTERFACE;

    // Discover if this filter contains a property page
    hr = pFilter->QueryInterface(IID_ISpecifyPropertyPages, (void **)&pSpecify);
    if (SUCCEEDED(hr))
    {
        do
        {
            FILTER_INFO FilterInfo;
            hr = pFilter->QueryFilterInfo(&FilterInfo);
            if (FAILED(hr))
                break;

            CAUUID caGUID;
            hr = pSpecify->GetPages(&caGUID);
            if (FAILED(hr))
                break;

            // Display the filter's property page
            OleCreatePropertyFrame(
                hwndParent,             // Parent window
                0,                      // x (Reserved)
                0,                      // y (Reserved)
                FilterInfo.achName,     // Caption for the dialog box
                1,                      // Number of filters
                (IUnknown **)&pFilter,  // Pointer to the filter
                caGUID.cElems,          // Number of property pages
                caGUID.pElems,          // Pointer to property page CLSIDs
                0,                      // Locale identifier
                0,                      // Reserved
                NULL                    // Reserved
                );

            CoTaskMemFree(caGUID.pElems);
            FilterInfo.pGraph->Release();

        } while(0);
    }

    // Release interfaces
    if (pSpecify)
        pSpecify->Release();

    pFilter->Release();
    return hr;
}
This is an actual function taken out of the DirectShow samples provided by Microsoft. The use of the do...while(0) loop is strange for two reasons:

  1. If you're like me and chronically dyslexic, it looks at first glance as though the loop never exits, because the break statements will rarely (if ever) get hit, and
  2. It will exit regardless, because it's a while(0) loop--which raises the very simple question: WHY?

Why would a programmer do such a strange thing? The loop adds nothing to the logic; it exists only because the programmer didn't use smart pointers and needed some way to bail out to a common cleanup path. So in the event of an error, they "break" out of the loop and clean up after themselves (albeit poorly--the code leaks the graph reference held inside the FILTER_INFO structure if the second break is executed).

If you're trying to find creative ways to clean up after COM pointers, please, just take a look at smart pointers to avoid the above, which is absolute foolishness. A much, much better rewrite of this function is:


HRESULT ShowFilterPropertyPage( IBaseFilter * pFilter, HWND hwndParent )
{
    if( !pFilter )
    {
        return E_POINTER;
    }

    CComPtr<ISpecifyPropertyPages> pSpecify;

    // Discover if this filter contains a property page
    HRESULT hr = pFilter->QueryInterface( IID_ISpecifyPropertyPages, ( void ** ) &pSpecify );
    if( SUCCEEDED( hr ) )
    {
        FILTER_INFO FilterInfo;
        hr = pFilter->QueryFilterInfo( &FilterInfo );
        if( FAILED( hr ) )
        {
            return hr;
        }

        CAUUID caGUID;
        hr = pSpecify->GetPages( &caGUID );
        if( FAILED( hr ) )
        {
            FilterInfo.pGraph->Release();
            return hr;
        }

        // Display the filter's property page
        OleCreatePropertyFrame(
            hwndParent,             // Parent window
            0,                      // x (Reserved)
            0,                      // y (Reserved)
            FilterInfo.achName,     // Caption for the dialog box
            1,                      // Number of filters
            (IUnknown **)&pFilter,  // Pointer to the filter
            caGUID.cElems,          // Number of property pages
            caGUID.pElems,          // Pointer to property page CLSIDs
            0,                      // Locale identifier
            0,                      // Reserved
            NULL                    // Reserved
            );

        CoTaskMemFree( caGUID.pElems );
        FilterInfo.pGraph->Release();
    }

    return hr;
}

...no strange while loop, and no leaks.

Using a goto Statement
Severity: Potentially Harmful, Poor Style, Deeply Suspect


This is really a variant on the above issue (use of a do...while(0) loop) and not using smart pointers.

Statements like goto should not be used, because they produce unmanageable code. More to the point, in DirectShow code people almost exclusively use the goto statement as a means of cleaning up COM pointers in the event of an error--something much like this:


HRESULT SomeFunction()
{
    IBaseFilter * pSomeFilter = NULL;   // a very friendly dumb pointer
    HRESULT hr = ::CoCreateInstance( CLSID_SomeFilter, NULL, CLSCTX_INPROC_SERVER,
                                     IID_IBaseFilter, ( void ** ) &pSomeFilter );
    if( FAILED( hr ) )
        goto Exit;

    // similar error checking for another couple hundred lines of code...

Exit:
    SAFE_RELEASE( pSomeFilter );   // the usual "release if non-NULL" macro
    return hr;
}

None of this nonsense is necessary with smart pointers. Much to my dismay, Microsoft even has this macro available in the WMFSDK and it is commonly used throughout their samples:

#ifndef GOTO_EXIT_IF_FAILED
#define GOTO_EXIT_IF_FAILED(hr) if(FAILED(hr)) goto Exit;
#endif
...this is employed liberally throughout the WMF samples to help clean up errant COM pointers, and I've seen this exact macro used in DirectShow code as well. Needless to say, this is poor code! Not only does it lean on an already suspect goto statement, but it ultimately produces unmanageable, error-prone code that's difficult to support.

Never use a goto statement in your DirectShow code. If you're tempted to use one, ask yourself the following questions:

  1. Why would I use a goto statement to clean up my COM pointers?
  2. Is there something better to use?

...and then go read up on smart pointers. Again, if you're having issues cleaning up after COM pointers, the simple, effective solution to that problem is to use Smart Pointers.
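For completeness, here's the same pattern sketched with a smart pointer (CLSID_SomeFilter is still hypothetical): CComPtr releases the interface on every return path, so there's nothing left for a goto to clean up.

HRESULT SomeFunction()
{
    CComPtr<IBaseFilter> pSomeFilter;   // a very friendly smart pointer
    HRESULT hr = pSomeFilter.CoCreateInstance( CLSID_SomeFilter );
    if( FAILED( hr ) )
    {
        return hr;   // nothing to release by hand
    }

    // ...the real work, with ordinary early returns on error...

    return hr;
}   // pSomeFilter releases the filter automatically here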


Leaking memory or ending up with invalid pointers while using AM_MEDIA_TYPE
Severity: Potentially Fatal


Because of the way AM_MEDIA_TYPE has been defined, it is very easy to leak memory using it. This problem is surprisingly pervasive, with examples of potential leaks riddled throughout the SDK samples.

First, let's look at how AM_MEDIA_TYPE is defined:


typedef struct _MediaType
{
    GUID majortype;
    GUID subtype;
    BOOL bFixedSizeSamples;
    BOOL bTemporalCompression;
    ULONG lSampleSize;
    GUID formattype;
    IUnknown * pUnk;
    ULONG cbFormat;
    BYTE * pbFormat;
} AM_MEDIA_TYPE;
AM_MEDIA_TYPE is a structure used to describe the format of a given audio or video stream. Most of the members are pretty self-explanatory. However, note the GUID member formattype: it identifies a secondary structure that actually describes the format in detail. That "other" structure is referenced through a BYTE pointer, pbFormat, with cbFormat giving its length in bytes. For example, pbFormat often points to a VIDEOINFOHEADER or WAVEFORMATEX structure.
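As a hedged sketch of how pbFormat is normally interpreted (pmt here is assumed to be a valid AM_MEDIA_TYPE pointer obtained elsewhere), check formattype and cbFormat before casting:

// pbFormat is just a BYTE*, so validate before reinterpreting it.
if( pmt->formattype == FORMAT_VideoInfo &&
    pmt->cbFormat >= sizeof( VIDEOINFOHEADER ) &&
    pmt->pbFormat != NULL )
{
    VIDEOINFOHEADER * pVih = reinterpret_cast<VIDEOINFOHEADER *>( pmt->pbFormat );
    LONG lWidth  = pVih->bmiHeader.biWidth;
    LONG lHeight = pVih->bmiHeader.biHeight;
    // ...use the video dimensions, frame rate, etc...
}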

pbFormat can, in theory, point to just about anything, and of just about any length. This can lead to some pretty elaborate memory leaks. For example, this code excerpt is from the contrast filter sample:


AM_MEDIA_TYPE * pAdjustedType = NULL;
pMediaSample->GetMediaType(&pAdjustedType);

if(pAdjustedType != NULL)
{
    if(CheckInputType(&CMediaType(*pAdjustedType)) == S_OK)
    {
        m_pInput->CurrentMediaType() = *pAdjustedType;
        CoTaskMemFree(pAdjustedType);
    }
    else
    {
        CoTaskMemFree(pAdjustedType);
        return E_FAIL;
    }
}

The above code looks fine--the AM_MEDIA_TYPE pointer is created using GetMediaType, and if created, should be destroyed using CoTaskMemFree().

Sadly, this is not the case--the above code leaks cbFormat bytes for every AM_MEDIA_TYPE that is successfully returned. The reason is that GetMediaType hands back a full AM_MEDIA_TYPE structure and internally calls CoTaskMemAlloc to allocate pbFormat; calling CoTaskMemFree on the structure frees only the AM_MEDIA_TYPE itself and does not clean up the associated pbFormat block. To delete the format block as well, DeleteMediaType should be called: it deletes both the AM_MEDIA_TYPE structure and the pbFormat block. For reference, it looks like so:

void WINAPI DeleteMediaType(AM_MEDIA_TYPE *pmt)
{
    // allow NULL pointers for coding simplicity

    if (pmt == NULL)
    {
        return;
    }

    FreeMediaType(*pmt);
    CoTaskMemFree((PVOID)pmt);
}

// DeleteMediaType calls FreeMediaType, although FreeMediaType can
// be used to only delete the pbFormat structure for an AM_MEDIA_TYPE
void WINAPI FreeMediaType(AM_MEDIA_TYPE& mt)
{
    if (mt.cbFormat != 0)
    {
        CoTaskMemFree((PVOID)mt.pbFormat);

        // Strictly unnecessary but tidier
        mt.cbFormat = 0;
        mt.pbFormat = NULL;
    }

    if (mt.pUnk != NULL)
    {
        mt.pUnk->Release();
        mt.pUnk = NULL;
    }
}

...DeleteMediaType calls FreeMediaType (which deletes pbFormat) and then calls CoTaskMemFree, which deletes the rest of the AM_MEDIA_TYPE structure.
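Here is the same excerpt with the leak plugged (a sketch of the fix, not the shipping sample): the only change is swapping CoTaskMemFree for DeleteMediaType, so the format block goes away along with the structure.

AM_MEDIA_TYPE * pAdjustedType = NULL;
pMediaSample->GetMediaType( &pAdjustedType );

if( pAdjustedType != NULL )
{
    if( CheckInputType( &CMediaType( *pAdjustedType ) ) == S_OK )
    {
        // CMediaType's assignment operator deep-copies the format block,
        // so the original can be deleted in full.
        m_pInput->CurrentMediaType() = *pAdjustedType;
        DeleteMediaType( pAdjustedType );
    }
    else
    {
        DeleteMediaType( pAdjustedType );
        return E_FAIL;
    }
}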

Another major problem is using the assignment operator between two raw AM_MEDIA_TYPE structures. This copies only the value of the pbFormat pointer, not the data it points to. If the right-hand structure is then freed correctly (and if someone is using "=" between AM_MEDIA_TYPE structures, they may well not be freeing it correctly anyway), the pbFormat pointer in the copy dangles, pointing at freed memory--which can result in some rather strange behavior when DirectShow is asked to handle a bogus format block.

A better solution to all of this is to avoid raw AM_MEDIA_TYPE whenever possible and use CMediaType instead. CMediaType is a class derived from the AM_MEDIA_TYPE structure whose destructor frees the format block, making it much harder to leak memory. It also overloads the assignment operator so that, in most instances, the pbFormat block is deep-copied automatically. Basically, it provides a lot of protection against leaked memory and bogus pointers.

Mixing AM_MEDIA_TYPE and CMediaType can still be nasty, however:

STDMETHODIMP SomeClass::GetAMMediaType( AM_MEDIA_TYPE * pAMMEDIATYPE )
{
    *pAMMEDIATYPE = *m_pCMediaType;
    return S_OK;
}

The above will work, but only so long as m_pCMediaType remains a valid CMediaType. If m_pCMediaType is destroyed, the pbFormat pointer inside pAMMEDIATYPE is no longer valid, because only the pointer value was copied, not the contents it points to. It would be much better to do this:

STDMETHODIMP SomeClass::GetAMMediaType( CMediaType * pCMediaType )
{
    *pCMediaType = *m_pCMediaType;
    return S_OK;
}

...now the format block will be copied automatically, and because the type here is CMediaType, deallocation is managed automatically.

Not listening for and responding to DirectShow events
Severity: Potentially Harmful, Deeply Suspect

DirectShow provides a mechanism for listening for and responding to events from the underlying filter graph. Via the IMediaEvent and IMediaEventEx interfaces, an application can receive events, many of which are highly useful and informative.

Not all of the events sent by the underlying filter graph and the filter graph manager are useful, but in many instances, failures in the baseclasses or the underlying graph are communicated exclusively by the DirectShow eventing. The underlying implication here is pretty straightforward. If an application isn't listening for events, those failures:

  1. Fall on deaf ears--the application doesn't know something has gone wrong, and:
  2. Are almost always impossible to debug after the fact, because the specific event, and the HRESULT failure it carried, are long gone. Had the events been received, the application could potentially have recovered from the error, and logged the problem for future troubleshooting.

To give one example, I troubleshot a DirectShow application with several custom filters, including a custom source filter derived from CSource and CSourceStream. The source filter was logging errors with a custom logging interface. In one run of the application, the Deliver() call in the overridden DoBufferProcessingLoop returned 0x8004020B (VFW_E_RUNTIME_ERROR: "A run-time error occurred." How informative!). Going through the baseclasses, I quickly noticed that this error was commonly associated with render and transform filters, and that whenever it occurred, the filter would send EC_ERRORABORT along with an HRESULT detailing the failure. Some filter downstream from the source filter had encountered a runtime error and was communicating it both via a failed Deliver call and a much more informative event to the controlling application.

Because the application was not listening for events, it had no idea that a real issue had even occurred. The source filter continued chugging away for hours, doing almost nothing (well, nothing besides logging ridiculous amounts of VFW_E_RUNTIME_ERROR failures), oblivious to the fact that a serious error had occurred. To add insult to injury, because the event was never received (and thus no HRESULT was available for further troubleshooting), it's almost impossible to know why or where the problem even occurred. It could have been in any of several downstream filters, and for any number of reasons.

Mitigating this is very, very easy:

  • Make sure your application listens for events (a minimal sketch of such a loop follows this list).
  • Make sure it responds to them.
  • Make sure you log even seemingly extraneous events for post-run analysis. Not doing so will result in erratic behavior and strange failures that are impossible to isolate after they've occurred.
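As a minimal sketch of that loop (pEvent is an assumed IMediaEvent pointer queried from the graph manager, and LogFailure is a hypothetical logging helper; a real application would more likely use IMediaEventEx::SetNotifyWindow and drain the queue in response to the window message):

long     lEventCode = 0;
LONG_PTR lParam1 = 0, lParam2 = 0;

while( SUCCEEDED( pEvent->GetEvent( &lEventCode, &lParam1, &lParam2, 0 ) ) )
{
    switch( lEventCode )
    {
    case EC_ERRORABORT:
        // lParam1 carries the HRESULT describing the failure--log it now,
        // because it will not be available after the fact.
        LogFailure( ( HRESULT ) lParam1 );
        break;

    case EC_COMPLETE:
        // Playback reached the end of the stream.
        break;

    default:
        // Log anything unexpected for post-run analysis.
        break;
    }

    pEvent->FreeEventParams( lEventCode, lParam1, lParam2 );
}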



Not using DirectShow eventing in custom filters, or using a secondary eventing scheme
Severity: Potentially Harmful, Deeply Suspect

I've seen several custom filters now that implement their own home-baked eventing scheme. This is bad for several reasons:

  • DirectShow provides an eventing mechanism, free of charge, that filters can leverage. Not using it means someone is developing something that already exists. Re-inventing the wheel = bad
  • The DirectShow eventing is asynchronous. Not using asynchronous eventing (which I've seen several instances of now) means the possibility of very elaborate deadlocks becomes much greater.
  • An application can consolidate two eventing interfaces into one. Really, why make things difficult? Multithreading is complicated as it is; having redundant eventing handlers (e.g.: the standard DShow eventing, as well as some other custom eventing) is literally begging for a few extra late nights at the office.

To elaborate on all of the above:

DirectShow filters can leverage the IMediaEventSink interface, which allows filters to notify the filter graph manager of events via the Notify() method. Applications should not use this interface; it is specifically for filters. Filters may also define custom events with the range of EC_USER and higher, with a few relatively minor caveats:

  • The Filter Graph Manager cannot free the event parameters using the normal IMediaEvent::FreeEventParams method. The application must free any memory or reference counts associated with the event parameters.
  • The filter should only send the event from within an application that is prepared to handle the event. (Possibly the application can set a custom property on the filter to indicate that it is safe to send the event.)

That being said, there is a very comprehensive listing of events, and most filters would probably not need to define custom events.

The other really helpful thing about IMediaEventSink::Notify is not having to worry about delivering events synchronously. Sending a synchronous notification from CSourceStream's worker thread that causes the graph to get torn down is an instantaneous deadlock. DirectShow is heavily threaded; between implementing my own asynchronous eventing service and simply leveraging the standard one provided to me, the choice is rather obvious.
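A hedged sketch of the filter side of this (CMyFilter and ReportRuntimeError are hypothetical; m_pGraph is the graph pointer the CBaseFilter base class already keeps): query the graph manager for IMediaEventSink and call Notify, which queues the event asynchronously. The base classes also wrap this up for you as CBaseFilter::NotifyEvent, so check what you already have before writing anything.

HRESULT CMyFilter::ReportRuntimeError( HRESULT hrFailure )
{
    CComPtr<IMediaEventSink> pSink;
    HRESULT hr = m_pGraph->QueryInterface( IID_IMediaEventSink, ( void ** ) &pSink );
    if( FAILED( hr ) )
    {
        return hr;
    }

    // EC_ERRORABORT carries the failing HRESULT in its first parameter;
    // the controlling application can now see exactly what went wrong.
    return pSink->Notify( EC_ERRORABORT, ( LONG_PTR ) hrFailure, 0 );
}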

Lastly, the calling application can centralize graph handling in one routine: there aren't multiple eventing interfaces to reconcile, and no synchronization issues between two separate handlers.

Make life easy: always use the standard DirectShow eventing in custom filters unless there is absolutely a need to do otherwise. Quite frankly, I can't think of an instance where the need would be absolute, but always and never are two words to always remember to never use.


Registering Filters with the Operating System
Severity: Potentially Harmful, Deeply Suspect

First off, I hesitate to list this defect. After all, I am essentially telling you not to do what the documentation tells you to do. But after supporting a production application for the last several years, I am of the opinion that anywhere you are using Intelligent Connect, your application will (sooner or later) fail in the wild. By using this feature of DirectShow, you are exposed to any arbitrary filter (or filters) the user installs (coughffdshowcough). Other downsides to filter registration include people being able to re-use your filters, making your application easier to reverse-engineer, and exposing settings in the registry that can be modified by third-party applications. You cannot count on filter merit to save you.

Simply put, you're asking for trouble. In the case of my application, it took a year or two, but one day a support case came in where the application was crashing. The source? I had forgotten to change a small piece of the application to use ConnectDirect() rather than Connect(), which resulted in my filter graph picking up some rogue filter. I was crashing inside someone else's code. I had previously removed all of the Connect() calls, since those had resulted in crashing in the wild via the same mechanism, but I missed one spot when I added a new feature to the application. And surprisingly, it caught up with me.

One other substantial pain point with registering filters is that registration typically happens at install time, so it's one more thing that can cause the installer to fail. DirectShow filters often end up with numerous dependencies (many of my filters depend on several other runtimes), and if any of them are unfulfilled, the filter will not register, resulting in a failed installer or error messages the user doesn't understand. Sometimes all that needs to happen is for the filters to be re-registered after the installer has run, but if you avoid the registry, you avoid this problem completely.

The solution? Do not register your filters. Simply instantiate them with new.

(addendum: this advice doesn't apply if you are in fact attempting to integrate with existing applications. If your filter is specific to your application, then consider not registering and using ConnectDirect() to manually build your graph)
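A minimal sketch of what that looks like (CMyFilter and AddPrivateFilter are hypothetical, and the filter's constructor is assumed to follow the (LPUNKNOWN, HRESULT*) pattern used throughout the SDK samples):

HRESULT AddPrivateFilter( IGraphBuilder * pGraph, IBaseFilter ** ppFilter )
{
    HRESULT hr = S_OK;

    // Created with new--no CoCreateInstance, no registry lookup.
    CMyFilter * pMyFilter = new CMyFilter( NULL, &hr );
    if( pMyFilter == NULL )
    {
        return E_OUTOFMEMORY;
    }

    // Take a real COM reference; the base-class object starts with a
    // reference count of zero.
    CComPtr<IBaseFilter> pFilter = pMyFilter;
    if( FAILED( hr ) )
    {
        return hr;   // pFilter's destructor deletes the half-built filter
    }

    hr = pGraph->AddFilter( pFilter, L"My Private Filter" );
    if( FAILED( hr ) )
    {
        return hr;
    }

    // Connections are then made explicitly with ConnectDirect(), never
    // Connect(), so no registered third-party filter can wander into the graph.
    *ppFilter = pFilter.Detach();
    return S_OK;
}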


Not using IFileSourceFilter in Custom Filters
Severity: Poor Style, Deeply Suspect

IFileSourceFilter is an interface "...to set the file name and media type of the media file that they are to render." This may sound like a minor interface, but the advantages of using IFileSourceFilter are huge. First of all, it is the standard interface for configuring source filters, so suddenly any code that uses IFileSourceFilter will be capable of loading up your filter. Second, testing your filters in applications like GraphEdit becomes trivial. Lastly, should you want to register a custom protocol or file type and associate it with your filter, you must implement IFileSourceFilter.

Do you have to implement IFileSourceFilter? No, of course not. But without this interface, your filter is largely incompatible with most existing DirectShow applications, and people using it will have to do a lot of extra work to make everything happen. Your filter is also much more difficult to test, because you can't simply fire it up in GraphEdit. Skipping the interface also precludes using your filter in Windows Media Player, Windows Media Center, and a whole host of other applications that depend on it. In general, source filters should always implement IFileSourceFilter.
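A sketch of what the host side looks like (pSourceFilter is an assumed IBaseFilter pointer to the custom source filter, and the path is purely illustrative):

CComPtr<IFileSourceFilter> pFileSource;
HRESULT hr = pSourceFilter->QueryInterface( IID_IFileSourceFilter,
                                            ( void ** ) &pFileSource );
if( SUCCEEDED( hr ) )
{
    // A NULL media type lets the filter choose; a specific AM_MEDIA_TYPE
    // can be supplied instead.
    hr = pFileSource->Load( L"C:\\media\\example.wmv", NULL );
}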

Wednesday, March 12, 2008

IsBad(Enum.IsDefined) == true

There isn't much to say about Enum.IsDefined that hasn't been said before. Much of the confusion about why this method is bad stems from the fact that many people don't understand how Enum types have been implemented in .NET. So let's start by reviewing .NET's Enum implementation.
Consider the following enum:

enum Animal
{
    cat = 0,
    dog,
    bird,
    cow,
    pig,
}

...we've set cat to be equivalent to integer value 0, so dog will be 1, bird is 2, etc. .NET allows us to specify enum values explicitly (as has been done with cat), or have them be implied (as has been done with all of the other values in the enum). This is sort of nice; we can now write code that looks like:

Animal hopefullyIAmACat = (Animal)0;
Console.WriteLine(hopefullyIAmACat);

This prints:

cat

Super. We've successfully taken an integer value and converted it to our enum type.

Now for something completely different:

Animal notSoSureAboutThisAnimal = (Animal)(-1);
Animal orThisOne = (Animal)5;
Animal definitelyNotSureAboutThisAnimal = (Animal)0.0M;
Animal ohNoes = (Animal)1.1;

Console.WriteLine(notSoSureAboutThisAnimal);
Console.WriteLine(orThisOne);
Console.WriteLine(definitelyNotSureAboutThisAnimal);
Console.WriteLine(ohNoes);

To the surprise of many, this not only compiles, but runs exception free:

-1
5
cat
dog

You might be thinking this is a bug, but this is by design. To get at why such a design decision was made, we need to fully understand C#'s enum implementation. From the C# language specification:
11.1.9 Enumeration types
An enumeration type is a distinct type with named constants. Every enumeration type has an underlying type, which shall be byte, sbyte, short, ushort, int, uint, long or ulong. Enumeration types are defined through enumeration declarations (§21.1). The direct base type of every enumeration type is the class System.Enum. The direct base class of System.Enum is System.ValueType.


There are also specific sections which explain implicit and explicit conversions with respect to enumerations:
13.1.3 Implicit enumeration conversions
An implicit enumeration conversion permits the decimal-integer-literal 0 to be converted to any enum-type.

13.2.2 Explicit enumeration conversions
The explicit enumeration conversions are:
•From sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, or decimal to any enum-type.
•From any enum-type to sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, or decimal.
•From any enum-type to any other enum-type.
An explicit enumeration conversion between two types is processed by treating any participating enum-type as the underlying type of that enum-type, and then performing an implicit or explicit numeric conversion between the resulting types. [Example: Given an enum-type E with an underlying type of int, a conversion from E to byte is processed as an explicit numeric conversion (§13.2.1) from int to byte, and a conversion from byte to E is processed as an implicit numeric conversion (§13.1.2) from byte to int.]


By definition, enums can be explicitly cast to and from almost all of .NET's fundamental value types, so assigning my Animal enum from "0.0M" invokes a conversion from decimal to int. The fractional part gets hacked off, resulting in a cat.

This malleability brings up a couple of huge questions. Brad Abrams brings up this point:
It's a known issue that adding values to enums is bad (from a breaking change perspective), WHEN someone is exhaustively switching over that enum.

Case in point: assume someone is iterating over my Animal enum, and writes the following code:

switch (ohNoes)
{
    case Animal.pig:
        break;
    case Animal.dog:
        break;
    case Animal.cow:
        break;
    case Animal.cat:
        break;
    default:
        // This one MUST be a bird!
        break;
}

...of course, it won't be a bird when someone changes the enum to include an elephant entry; suddenly default maps to two values. More importantly, in the above code default is a catch-all: everything that doesn't fall into the defined range--for example, negative one--is going to hit the default case.

This brings us back around to Enum.IsDefined, which returns true if the supplied value is defined by the enum. Writing code like the following is very tempting:

if (!Enum.IsDefined(typeof(Animal), 5))
    throw new InvalidEnumArgumentException();

...but this is, again, fraught with peril: the check runs against whatever version of the Animal type is loaded at runtime. What if someone later adds elephant to our enum? The code following this check still needs to be capable of dealing with elephant, or any other value defined in the future.

Furthermore, Enum.IsDefined is pricey. And by pricey, I mean all sorts of reflection and boxing and junk under the covers. I found this call being used in SharpMap, and removing it resulted in some very respectable performance gains in a tight loop used to parse binary data:
I tested your (suggestion to remove Enum.IsDefined) with the method below. First with the polygons countries.shp that is in the DemoWebSite App_Data. This was 16% faster. Then with points in cities.shp. This was 85% faster, nice.

Pretty clear win: the project loses some potential versioning dilemmas and gains sizable performance in one of its most heavily used routines.

Moral of the story: enums do not handle versioning well. I personally believe they should only be used where the enumeration is clear and unlikely to be expanded in the future, and Enum.IsDefined should be avoided.