Thursday, November 29, 2007

How not to benchmark code.

The other day, I was searching for C# benchmarks. I came across this blog post. The "ultimate java verses C# benchmark" claims that:

The C# results? Well they're just too embarrassing to post. Suffice to say, it's several orders of magnitude slower! Furthermore, the memory usage exploded to who knows where! I can't believe that people may be thinking of deploying mission critical applications on this virtual machine!

Just like Cameron's benchmark, everything is self contained, you can cut and paste it, compile it and run it yourself. Better yet, send it to your resident C# expert, have him tweak it as much as he can. Bring a digital camera, capturing the anguish in his face when he realizes the truth, would be priceless.


...unfortunately my digital camera's batteries are hosed, so there will be nobody to capture the smile on my face as I tear this "benchmark" apart. The program put forth by the author "matches a DNA sequence against a million DNA patterns expressed as regular expressions." Both Java and C# variants of the program use java.util.regex.* and System.Text.RegularExpressions, respectively. Based on the authors's implementation, he concludes that "Java is at least 7,733x faster than C#."

Now I'm a pretty reasonable person. Seriously. I know that the .NET regex implementation has been shown to be slower than other implementations, but seven thousand times slower? Something is seriously wrong with the test.

The problem lies in how the C# benchmark was implemented:
Regex regexpr = new Regex(matchthis, RegexOptions.Compiled);

...what is this RegexOptions.Compiled flag? A blog post from the BCL team at Microsoft sheds some light:
In [the case of RegexOptions.Compiled], we first do the work to parse into opcodes. Then we also do more work to turn those opcodes into actual IL using Reflection.Emit. As you can imagine, this mode trades increased startup time for quicker runtime: in practice, compilation takes about an order of magnitude longer to startup, but yields 30% better runtime performance. There are even more costs for compilation that should mentioned, however. Emitting IL with Reflection.Emit loads a lot of code and uses a lot of memory, and that's not memory that you'll ever get back. In addition. in v1.0 and v1.1, we couldn't ever free the IL we generated, meaning you leaked memory by using this mode. We've fixed that problem in Whidbey. But the bottom line is that you should only use this mode for a finite set of expressions which you know will be used repeatedly.

Furthermore, the MSDN page on RegexOptions.Compiled has some very interesting commentary:
If a Regex object is constructed with the RegexOptions.Compiled option, it compiles the regular expression to explicit MSIL code instead of high-level regular expression internal instructions. This allows the.NET Framework's just-in-time (JIT) compiler to convert the expression to native machine code for higher performance.

However, generated MSIL cannot be unloaded. The only way to unload code is to unload an entire application domain (that is, to unload all of your application's code.). Effectively, once a regular expression is compiled with the RegexOptions.Compiled option, the .NET Framework never releases the resources used by the compiled expression, even if the Regex object itself is released to garbage collection.

You must be careful to limit the number of different regular expressions you compile with the RegexOptions.Compiled option to avoid consuming too many resources.
If an application must use a large or unbounded number of regular expressions, each expression should be interpreted, not compiled. However, if a small number of regular expressions are used repeatedly, they should be compiled with RegexOptions.Compiled for higher performance. An alternative is to use precompiled regular expressions. You can compile all of your expressions into a reusable DLL. This avoids the need to compile at runtime while still benefiting from the speed of compiled regular expressions.

What we can take away from this is:
  • RegexOptions.Compiled is intended for a small number of regular expressions which are used repeatedly, and in that instance, results in higher performance.
  • If an application must use a large or unbounded number of regular expressions, RegexOptions.Compiled should not be used.
What is our "benchmark" doing? It is creating a million Regex objects using RegexOptions.Compiled! If we correct this error, what is our performance like? I remeasured using the following C# program:
using System.Text.RegularExpressions;
using System.Text;
using System;
using System.Diagnostics;

namespace TestRegex
{

public class RegexBench2
{
private static String _doc = "CGAATCTAAAAATAGATTCGGACGTGATGTAGTCGTACAAATGAAAAAGTAAGCC";
private static int ITERATIONS = 1000000;

public static void Main()
{
Stopwatch sw = new Stopwatch();
sw.Start();

int length = 1;

for (int i = ITERATIONS; i <= ITERATIONS * 2; i++)
{
length = (int)(Math.Log((double)i) / Math.Log(4));
String matchthis = generateWord(i, length + 1);
Regex regexpr = new Regex(matchthis);

if (regexpr.IsMatch(_doc))
{
Console.WriteLine("found {0} at {1} it took {2} miliseconds", matchthis, i, sw.ElapsedMilliseconds);
}
}

Console.WriteLine(".NET regex took {0} miliseconds", sw.ElapsedMilliseconds);

}

public static String generateWord(int value, int length)
{
StringBuilder buf = new StringBuilder();
int current = value;
for (int i = 0; i < length; i++)
{
buf.Append(convert(current % 4));
current = current / 4;
}

return buf.ToString();
}

private static String convert(int value)
{
switch (value)
{
case 0: return "A";case 1: return "G";case 2: return "T";case 3: return "C";default: return "0";
}
}
}
}


My performance for the C# program was substantially better (as in, it actually completes) than the previous incarnation:



For the java program, I compiled using Eclipse 3.3.1.1 and the previous code the author used. The results were:
found AAAGTAAGCC at 1000000 it took 0 miliseconds
found AAATGAAAAAG at 1048960 it took 172 miliseconds
found GAAAAAGTAAG at 1085441 it took 265 miliseconds
found TCTAAAAATAG at 1179694 it took 547 miliseconds
found ACGTGATGTAG at 1204636 it took 625 miliseconds
found AATAGATTCGG at 1548576 it took 1578 miliseconds
found TCGTACAAATG at 1576094 it took 1656 miliseconds
found CGGACGTGATG at 1599255 it took 1734 miliseconds
found ATTCGGACGTG at 1689064 it took 1984 miliseconds
found AGATTCGGACG at 1859204 it took 2453 miliseconds
found TGATGTAGTCG at 1984902 it took 2797 miliseconds
found AAATAGATTCG at 2000000 it took 2843 miliseconds
Java regex took 2843 miliseconds

The java program is now roughly twice the speed of the C# program, which seems far more reasonable. The discrepancy resulted from misuse of the API; not from the C# regular expression implementation being seven thousand times slower than the Java implementation. The worst possible way to benchmark two language implementations is to incorrectly use one, and then compare it to a correct usage of the second. Obviously, the results are going to be skewed towards the language being used correctly.

The author's confusion probably stems over how the Java regex implementation uses a .Compile() method, but one has to wonder how he could seriously think the results were valid. The honest response to this would have been "Gee, 7,000x slower? Something must be wrong in my C# program!". The dishonest response is: "C# is a dog! 7000x slower? Java forever!"

On a more fundamental level, I completely disagree with the overall tone of the article. As the Computer Language Benchmarks Game points out,
Benchmarking programming languages?

How can we benchmark a programming language?
We can't - we benchmark programming language implementations.

How can we benchmark language implementations?
We can't - we measure particular programs.

You cannot benchmark a language, nor can you benchmark a language implementation. There is only performance for a specific program. So it's entirely possible that this one example posted isn't necessarily a strong indicator of overall performance. It's also possible that the results will change as both the Java and .NET implementations change over time.