Tuesday, September 18, 2007

Pseudo Language


I'm currently converting a substantial application to be internationalized and localized. Internationalization basically means we're completely abstracting the language out of the application, so we can easily display French, German, Italian, and other languages.

In addition to internationalization, we also do localization, which means the way we format dates, times, currency and other "things" is specific to a given locale. For example, there are French-speaking people in both Canada and France, but the way they represent numbers, dates, times, and currency can vary drastically, so it's important to have formatting specific to each locale.
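To make the language/locale distinction concrete, here is a minimal sketch that formats the same number, currency amount, and date for two French locales. The class and method names (LocaleFormattingSketch, Describe) are made up for illustration, and the exact output depends on the culture data installed on the machine, so none is shown.

```csharp
using System;
using System.Globalization;

static class LocaleFormattingSketch
{
    // Formats the same values using the conventions of the given culture:
    // a number with two decimals, a currency amount, and a short date.
    public static string Describe(CultureInfo culture, decimal amount, DateTime when)
    {
        return string.Format(culture, "{0:N2} | {0:C} | {1:d}", amount, when);
    }

    static void Main()
    {
        decimal amount = 1234567.89m;
        DateTime when = new DateTime(2007, 9, 18);

        // Same language (French), two different locales.
        foreach (string name in new[] { "fr-FR", "fr-CA" })
        {
            CultureInfo culture = CultureInfo.GetCultureInfo(name);
            Console.WriteLine("{0}: {1}", name, Describe(culture, amount, when));
        }
    }
}
```

Passing the culture explicitly, rather than relying on the current thread's culture, keeps the formatting decision visible at the call site.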

To help us with this, we have a bunch of cool, nifty tools. One of the tools we have is some code that generates a pseudo language. The pseudo language is basically a fake language that reads like English, but looks obviously different. The pseudo strings end up looking sort of like the original strings, but with heavily modified characters, and additional padding on the beginning and end of the strings. For example, the string:

The quick brown fox jumps over the lazy dog.

...becomes:

[!!! Ťħĕ qǔĭčĸ þřŏŵʼn ƒŏχ ĵǔmpš ŏvĕř ťħĕ ľäžЎ đŏğ. !!!]

The idea behind the additional length (which, by the way, is variable depending upon the length of the original string) is to ensure there's enough room on the forms in case the translation ends up being substantially longer than the original. The idea behind the modified language is to have some obvious way of determining that our application isn't loading English strings (which, for reasons that shall remain unexplained, can be sort of a pain in the butt on .NET). It's also nice because it gives a visible indicator of places that aren't being translated.
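A minimal sketch of what such a generator might look like follows. It is not our actual tool: the substitution table is only a partial set read off the sample above, the names (NaivePseudoLocalizer, Translate, Pad, Pseudo) are invented for the example, and the padding rule of one "!" per 20 source characters with a minimum of three is purely an assumption.

```csharp
using System;
using System.Text;

static class NaivePseudoLocalizer
{
    // Partial substitution table read off the sample output above.
    // Characters not listed (including 'n') pass through unchanged.
    const string Plain = "TSDabcdefghijklorstuwxyz";
    const string Fancy = "ŤŠĐäþčđĕƒğħĭĵĸľŏřšťǔŵχЎž";

    // Replaces every mapped character, with no awareness of format items.
    public static string Translate(string source)
    {
        var sb = new StringBuilder(source.Length);
        foreach (char c in source)
        {
            int index = Plain.IndexOf(c);
            sb.Append(index >= 0 ? Fancy[index] : c);
        }
        return sb.ToString();
    }

    // ASSUMPTION: one '!' per 20 source characters, never fewer than 3.
    public static string Pad(string translated, int sourceLength)
    {
        string bangs = new string('!', Math.Max(3, sourceLength / 20));
        return "[" + bangs + " " + translated + " " + bangs + "]";
    }

    public static string Pseudo(string source)
    {
        return Pad(Translate(source), source.Length);
    }

    static void Main()
    {
        Console.WriteLine(Pseudo("The quick brown fox jumps over the lazy dog."));
    }
}
```

Because Translate looks at one character at a time and nothing else, it has no idea whether a character belongs to ordinary text or to something the runtime needs to read.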

One issue we quickly found, however, was that our pseudo translator was rather stupid. The following string, which contains .NET String.Format arguments:

This is a string with String.Format parameters: {0:D}

...would end up looking something like:

[!!! Ťħĭš ĭš ä šťřĭʼnğ ŵĭťħ Šťřĭʼnğ.Fŏřmäť päřämĕťĕřš: {0:Đ} !!!]

The issue with this, of course, is that {0:Đ} doesn't mean anything to String.Format; {0:D} does. So we had to modify the code to be aware of {} blocks and leave their contents untranslated. In theory this sounded straightforward, but it ended up being a bit involved. For one thing, "{{" and "}}" are how String.Format escapes the "{" and "}" characters, so a perfectly valid String.Format string is:

This {{{0}}} is a {{ totally valid {{ String.Format statement.

...or maybe:

This is yet another perfectly valid String.Format argument: {{{0:D}{{Hello World!}}

...in the first, String.Format produces "This {0} is a { totally valid { String.Format statement." (the "0" stands for whatever the argument formats to, but the point is that it's going to be wrapped in {} in the output). The second ends up looking like "This is yet another perfectly valid String.Format argument: {0{Hello World!}", again with the "0" standing for the formatted argument. So a single { opens a format item that runs to the next }, while each doubled {{ or }} becomes one literal brace in the resulting string.
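The two strings above can be checked directly. This small sketch feeds both to String.Format with the integer 42 as the argument; the class and method names are invented for the example.

```csharp
using System;

static class BraceEscapingDemo
{
    // "{{" and "}}" become literal braces; "{0}" is the format item.
    public static string First(object arg)
    {
        return string.Format(
            "This {{{0}}} is a {{ totally valid {{ String.Format statement.", arg);
    }

    // "{{" then the format item "{0:D}", then "{{", text, and "}}".
    public static string Second(object arg)
    {
        return string.Format(
            "This is yet another perfectly valid String.Format argument: {{{0:D}{{Hello World!}}", arg);
    }

    static void Main()
    {
        Console.WriteLine(First(42));
        // This {42} is a { totally valid { String.Format statement.
        Console.WriteLine(Second(42));
        // This is yet another perfectly valid String.Format argument: {42{Hello World!}
    }
}
```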

The solution I used kept track of the previous character. If I encountered {{ or }}, I reset the previous character, so the pair was treated as an escaped brace rather than the start or end of a format item. Any single { turned translation off until the closing } was encountered.
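The brace handling described above can be sketched roughly as follows. This is a lookahead-based variation, not the exact previous-character bookkeeping from the real code, and the names and the partial substitution table are assumptions made for the example. It also assumes format items do not themselves contain escaped braces.

```csharp
using System;
using System.Text;

static class BraceAwarePseudoLocalizer
{
    // Partial, illustrative substitution table; unlisted characters pass through.
    const string Plain = "TSDabcdefghijklorstuwxyz";
    const string Fancy = "ŤŠĐäþčđĕƒğħĭĵĸľŏřšťǔŵχЎž";

    public static string Translate(string source)
    {
        var sb = new StringBuilder(source.Length);
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            bool doubled = (c == '{' || c == '}')
                && i + 1 < source.Length && source[i + 1] == c;

            if (doubled)
            {
                // "{{" or "}}": an escaped literal brace, copied as a pair.
                sb.Append(c).Append(c);
                i += 2;
            }
            else if (c == '{')
            {
                // A single '{' starts a format item: copy it untouched
                // through the closing '}' (or to the end if none exists).
                int close = source.IndexOf('}', i);
                if (close < 0) close = source.Length - 1;
                sb.Append(source, i, close - i + 1);
                i = close + 1;
            }
            else
            {
                int index = Plain.IndexOf(c);
                sb.Append(index >= 0 ? Fancy[index] : c);
                i++;
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        string pseudo = Translate("This is a string with String.Format parameters: {0:D}");
        // The format item survives, so String.Format still accepts the result.
        Console.WriteLine(string.Format(pseudo, 42));
    }
}
```

Checking for the doubled brace before checking for a single { is what makes {{{0}}} come out right: the first two braces are consumed as an escape, and only the third opens a format item.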

Wednesday, September 12, 2007

Leaking Memory with Marshal.AllocHGlobal

The other day I was going through the source code in our application and tidying up various internationalization tasks, and I came across the following code:

IntPtr BinaryValue = Marshal.AllocHGlobal(8 * Marshal.SizeOf(new byte()));

I quickly noticed that this call didn't have the requisite Marshal.FreeHGlobal call, which basically meant it would leak 8 bytes every time this code was executed. Going through the application, I then found ~15 other places, each of which would leak anywhere from a few bytes up to a couple hundred.

Leaking memory in .NET? Say it isn't so! According to the documentation, the Marshal class:
"Provides a collection of methods for allocating unmanaged memory, copying unmanaged memory blocks, and converting managed to unmanaged types, as well as other miscellaneous methods used when interacting with unmanaged code."

So one important thing to keep in mind with the Marshal class is that it was specifically crafted to allow .NET to interact with unmanaged memory, among other things. This means particular care must be exercised, because many of the methods have requisite cleanup steps, or deal directly with memory. Admittedly this can be difficult to cope with, especially after programming in a language where memory management is typically handled entirely under the covers.

Anyway, after establishing that the above code was leaking memory, the question quickly became: what is the best way to make sure the memory is always released? Anybody who migrated to the .NET world from C++ is acutely aware of the perils of non-deterministic finalization, so wrapping the allocation in a class and relying on a finalizer (what C# calls a destructor) is out of the question. One quick/dirty solution a co-worker of mine proposed was a try/catch/finally block:

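IntPtr BinaryValue = IntPtr.Zero;
try
{
    BinaryValue = Marshal.AllocHGlobal(8 * Marshal.SizeOf(new byte()));
    // ... use BinaryValue here ...
}
catch (Exception)
{
    // Swallows every exception; omit this block to let the caller handle it.
}
finally
{
    // Runs on both the normal and the exceptional path.
    if (BinaryValue != IntPtr.Zero)
    {
        Marshal.FreeHGlobal(BinaryValue);
    }
}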

...the benefit of this is that the finally block executes whether the try block completes normally or throws. Also, the catch block can be omitted and the finally block will still execute, so this pattern is also nice for places where it'd be better for the caller to handle the exception.
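For the catch-less variant, a minimal sketch might look like this. The class and method names are invented for the example, and the 8-byte buffer simply mirrors the allocation size from the original code.

```csharp
using System;
using System.Runtime.InteropServices;

static class TryFinallySketch
{
    // Writes a value into an 8-byte unmanaged buffer and reads it back.
    // There is no catch block: any exception propagates to the caller,
    // but the finally block still frees the buffer first.
    public static long RoundTrip(long value)
    {
        IntPtr buffer = IntPtr.Zero;
        try
        {
            buffer = Marshal.AllocHGlobal(8);
            Marshal.WriteInt64(buffer, value);
            return Marshal.ReadInt64(buffer);
        }
        finally
        {
            if (buffer != IntPtr.Zero)
            {
                Marshal.FreeHGlobal(buffer);
            }
        }
    }

    static void Main()
    {
        Console.WriteLine(RoundTrip(42L));
    }
}
```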

Another option is the using statement (not to be confused with the using directive). The using statement is one way to make cleanup in .NET more deterministic: as soon as control leaves the using block, Dispose is called on the object declared in the using statement. The managed object itself is still reclaimed later by the garbage collector, but whatever Dispose releases is released right away. The only requirement is that the object declared in the using statement implements IDisposable, which means Marshal.AllocHGlobal has to be wrapped in a class. I wrote a quick and dirty wrapper class:

using System;
using System.Runtime.InteropServices;

class UnmanagedMemory : IDisposable
{
    private IntPtr _ptrToUnmanagedMemory = IntPtr.Zero;

    // Allocates the requested number of bytes from unmanaged memory.
    public UnmanagedMemory(int amountToAllocate)
    {
        _ptrToUnmanagedMemory = Marshal.AllocHGlobal(amountToAllocate);
    }

    public IntPtr PtrToUnmanagedMemory
    {
        get { return _ptrToUnmanagedMemory; }
    }

    // Frees the memory; safe to call more than once.
    public void Dispose()
    {
        if (_ptrToUnmanagedMemory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_ptrToUnmanagedMemory);
            _ptrToUnmanagedMemory = IntPtr.Zero;
        }
    }
}

...pretty straightforward. Using this class then looks like:

using (UnmanagedMemory um = new UnmanagedMemory(8 * Marshal.SizeOf(new byte())))
{
    // Do something here with um.PtrToUnmanagedMemory
}

...if you set a breakpoint right after the using block exits, and one in the Dispose method of UnmanagedMemory, you can see the "deterministic" cleanup in action: Dispose is hit before the statement following the block. This method is a little more involved, but probably preferable; the UnmanagedMemory class could be expanded to provide any number of useful features, and it's cleaner and more object oriented than the try/catch/finally block.
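As one idea of where such a wrapper could go, here is a hedged sketch of an expanded version. Everything in it beyond Marshal.AllocHGlobal and Marshal.FreeHGlobal is an assumption for illustration: the name UnmanagedBuffer, the Size and IsDisposed properties, the guard on the pointer, and the finalizer used purely as a safety net for callers who forget to dispose.

```csharp
using System;
using System.Runtime.InteropServices;

sealed class UnmanagedBuffer : IDisposable
{
    private IntPtr _ptr;

    public int Size { get; private set; }

    public UnmanagedBuffer(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        _ptr = Marshal.AllocHGlobal(size);
        Size = size;
    }

    public bool IsDisposed
    {
        get { return _ptr == IntPtr.Zero; }
    }

    // Refuses to hand out a pointer to memory that was already freed.
    public IntPtr Pointer
    {
        get
        {
            if (IsDisposed) throw new ObjectDisposedException("UnmanagedBuffer");
            return _ptr;
        }
    }

    public void Dispose()
    {
        Free();
        // The memory is gone, so the finalizer has nothing left to do.
        GC.SuppressFinalize(this);
    }

    // Safety net only: runs at some unspecified time if Dispose was never called.
    ~UnmanagedBuffer()
    {
        Free();
    }

    private void Free()
    {
        if (_ptr != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_ptr);
            _ptr = IntPtr.Zero;
        }
    }
}
```

The finalizer does not replace the using statement; it only limits the damage when somebody forgets it.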

(Worth noting is that the using statement is essentially a try/finally block under the covers, with the Dispose call in the finally and no catch. It is still fundamentally different to wrap unmanaged memory in a class and interface with it like that, and I believe this method is preferable.)
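The equivalence is easy to see side by side. In this sketch, the Tracked class and both method names are invented for the example; the hand-written try/finally is an approximation of what the compiler generates for a using statement, not a copy of its actual output.

```csharp
using System;
using System.Collections.Generic;

sealed class Tracked : IDisposable
{
    private readonly List<string> _log;

    public Tracked(List<string> log)
    {
        _log = log;
        _log.Add("created");
    }

    public void Dispose()
    {
        _log.Add("disposed");
    }
}

static class UsingExpansionSketch
{
    // The using statement: Dispose runs even though the body throws.
    public static List<string> WithUsing()
    {
        var log = new List<string>();
        try
        {
            using (var t = new Tracked(log))
            {
                log.Add("body");
                throw new InvalidOperationException("boom");
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }

    // Roughly what the using statement expands to: try/finally, no catch.
    public static List<string> WithTryFinally()
    {
        var log = new List<string>();
        try
        {
            Tracked t = new Tracked(log);
            try
            {
                log.Add("body");
                throw new InvalidOperationException("boom");
            }
            finally
            {
                if (t != null) t.Dispose();
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", WithUsing()));
        Console.WriteLine(string.Join(", ", WithTryFinally()));
    }
}
```

Both methods record the same sequence of events, with "disposed" logged before the exception reaches the outer catch.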

In our application, we ended up using the try/catch/finally block, mostly because of circumstances inside the company (release coming up, can't do anything too wild) and because I needed tight control over the exception flow so I didn't have to fundamentally alter any of our error handling code. Eventually, though, I'll head over to using something like UnmanagedMemory.