Tuesday, September 29, 2009

Programming Fail: Directory.GetFiles()

I went to demo some shiny new code for a friend, and we both had a laugh when my program pretty much puked all over itself.

Admittedly I had not run this code on this particular machine before, and it was running Vista and .NET 3.5 (neither of which I'd tested against) But upon finding the bug, I find it difficult to understand what sort of design decision would involve such an arbitrary behavior.

Directory.GetFiles() seems like a pretty straight-forward function. You hand it a directory to search, and a search pattern (e.g. "*.exe"), and it returns a list of all the files in that directory which match the search. OK, I can handle that. Or so I thought.

The fine print of the documentation contains the gotcha:

The following list shows the behavior of different lengths for the searchPattern parameter:

* "*.abc" returns files having an extension of .abc, .abcd, .abcde, .abcdef, and so on.
* "*.abcd" returns only files having an extension of .abcd.
* "*.abcde" returns only files having an extension of .abcde.
* "*.abcdef" returns only files having an extension of .abcdef.


Ummm. OK. So, MSFT, basically your search pattern violates decades of common convention with regards to regular expressions, and the developer gets to figure this out when the app bombs? OK, sure. Brilliant. Ship it, yo.

Let's say you're looking for TIFF files, which can have either "tif" or "tiff" as the file extension. What happens is: calling GetFiles() with both of those extensions will result in the "tiff" files being added twice. Bzzt, fail MSFT, fail.

So, the fix is: either go through the aggregate list and remove duplicates, or remove the "*.tiff" search pattern (by the way, I got my supported extensions through the image handling classes in .NET). But it'd be a lot easier to simply have this method behave in a manner that is normal and predictable and sane: if I ask for "*.tif," I only want "*.tif". But I guess that's asking for too much.

Monday, September 21, 2009

More fun with AM_MEDIA_TYPE

While messing around with CMediaType (a wrapper class for AM_MEDIA_TYPE), I came across an inconsistency/bug. If you execute the following code:

 
pmt->cbFormat = sizeof(WAVEFORMATEX);
WAVEFORMATEX *wfex = (WAVEFORMATEX*)pmt->AllocFormatBuffer(sizeof(WAVEFORMATEX));


...you will find that wfex (which is a pointer to pbFormat) is NULL. My first thought was "out of memory?!" and the next thought was that simply wasn't possible. This call was about 30k lines of code deep in my application, so clearly memory allocation would have nailed me long before this point in time. So I poked around in mtype.cpp, which is the file that implements CMediaSource, and found the bug:



// allocate length bytes for the format and return a read/write pointer
// If we cannot allocate the new block of memory we return NULL leaving
// the original block of memory untouched (as does ReallocFormatBuffer)
BYTE*
CMediaType::AllocFormatBuffer(ULONG length)
{
ASSERT(length);

// do the types have the same buffer size

if (cbFormat == length) {
return pbFormat;
}

// allocate the new format buffer

BYTE *pNewFormat = (PBYTE)CoTaskMemAlloc(length);
if (pNewFormat == NULL) {
if (length <= cbFormat) return pbFormat; //reuse the old block anyway.
return NULL;
}

// delete the old format

if (cbFormat != 0) {
ASSERT(pbFormat);
CoTaskMemFree((PVOID)pbFormat);
}

cbFormat = length;
pbFormat = pNewFormat;
return pbFormat;
}


It becomes pretty clear what the issue is: if the requested length is the same as cbFormat, then it simply returns pbFormat, but since pbFormat was never allocated, it simply returns 0x00000000. Bzzt.

I guess the obvious moral of this story is: when using wrapper classes, it's probably not a good idea to manually set the underlying structure parameters. But still, this is a clear bug: this method will CoTaskMemFree pbFormat if it already exists. So this class should probably be like so:


if (pbFormat != NULL && cbFormat == length) {
return pbFormat;
}


If someone is allocating the same amount of memory, and pbFormat isn't null, it's acceptable to return pbFormat.

Again, I realize it doesn't make sense to set cbFormat if AllocFormatBuffer is going to do it for you, but that argument assumes one is aware that AllocFormatBuffer is going to do that for you. Nowhere in the documentation does it mention that I am forbidden from interacting with the underlying structure, so the only way you figure this out is by kicking yourself in the teeth, which I'd rather not do because it hurts.

Lastly, the same issue is present if the allocation itself fails. So a better implementation of the method is:


// allocate length bytes for the format and return a read/write pointer
// If we cannot allocate the new block of memory we return NULL leaving
// the original block of memory untouched (as does ReallocFormatBuffer)
BYTE*
CMediaType::AllocFormatBuffer(ULONG length)
{
ASSERT(length);

// do the types have the same buffer size, _and_
// do we have a valid pbFormat pointer???
if (pbFormat != NULL && cbFormat == length) {
return pbFormat;
}

// allocate the new format buffer
BYTE *pNewFormat = (PBYTE)CoTaskMemAlloc(length);
if (pNewFormat == NULL) {
// If the current pbFormat works, reuse it. Otherwise, fail.
return (pbFormat != NULL && length <= cbFormat) ?
pbFormat : NULL;
}

// delete the old format
if (pbFormat != NULL && cbFormat != 0) {
CoTaskMemFree((PVOID)pbFormat);
}

cbFormat = length;
pbFormat = pNewFormat;
return pbFormat;
}


Slightly better, but I still hate AM_MEDIA_TYPE.