Thursday, November 29, 2007

How not to benchmark code.

The other day, I was searching for C# benchmarks. I came across this blog post. The "ultimate java verses C# benchmark" claims that:

The C# results? Well they're just too embarrassing to post. Suffice to say, it's several orders of magnitude slower! Furthermore, the memory usage exploded to who knows where! I can't believe that people may be thinking of deploying mission critical applications on this virtual machine!

Just like Cameron's benchmark, everything is self contained, you can cut and paste it, compile it and run it yourself. Better yet, send it to your resident C# expert, have him tweak it as much as he can. Bring a digital camera, capturing the anguish in his face when he realizes the truth, would be priceless.

...unfortunately my digital camera's batteries are hosed, so there will be nobody to capture the smile on my face as I tear this "benchmark" apart. The program put forth by the author "matches a DNA sequence against a million DNA patterns expressed as regular expressions." Both Java and C# variants of the program use java.util.regex.* and System.Text.RegularExpressions, respectively. Based on the authors's implementation, he concludes that "Java is at least 7,733x faster than C#."

Now I'm a pretty reasonable person. Seriously. I know that the .NET regex implementation has been shown to be slower than other implementations, but seven thousand times slower? Something is seriously wrong with the test.

The problem lies in how the C# benchmark was implemented:
Regex regexpr = new Regex(matchthis, RegexOptions.Compiled);

...what is this RegexOptions.Compiled flag? A blog post from the BCL team at Microsoft sheds some light:
In [the case of RegexOptions.Compiled], we first do the work to parse into opcodes. Then we also do more work to turn those opcodes into actual IL using Reflection.Emit. As you can imagine, this mode trades increased startup time for quicker runtime: in practice, compilation takes about an order of magnitude longer to startup, but yields 30% better runtime performance. There are even more costs for compilation that should mentioned, however. Emitting IL with Reflection.Emit loads a lot of code and uses a lot of memory, and that's not memory that you'll ever get back. In addition. in v1.0 and v1.1, we couldn't ever free the IL we generated, meaning you leaked memory by using this mode. We've fixed that problem in Whidbey. But the bottom line is that you should only use this mode for a finite set of expressions which you know will be used repeatedly.

Furthermore, the MSDN page on RegexOptions.Compiled has some very interesting commentary:
If a Regex object is constructed with the RegexOptions.Compiled option, it compiles the regular expression to explicit MSIL code instead of high-level regular expression internal instructions. This allows the.NET Framework's just-in-time (JIT) compiler to convert the expression to native machine code for higher performance.

However, generated MSIL cannot be unloaded. The only way to unload code is to unload an entire application domain (that is, to unload all of your application's code.). Effectively, once a regular expression is compiled with the RegexOptions.Compiled option, the .NET Framework never releases the resources used by the compiled expression, even if the Regex object itself is released to garbage collection.

You must be careful to limit the number of different regular expressions you compile with the RegexOptions.Compiled option to avoid consuming too many resources.
If an application must use a large or unbounded number of regular expressions, each expression should be interpreted, not compiled. However, if a small number of regular expressions are used repeatedly, they should be compiled with RegexOptions.Compiled for higher performance. An alternative is to use precompiled regular expressions. You can compile all of your expressions into a reusable DLL. This avoids the need to compile at runtime while still benefiting from the speed of compiled regular expressions.

What we can take away from this is:
  • RegexOptions.Compiled is intended for a small number of regular expressions which are used repeatedly, and in that instance, results in higher performance.
  • If an application must use a large or unbounded number of regular expressions, RegexOptions.Compiled should not be used.
What is our "benchmark" doing? It is creating a million Regex objects using RegexOptions.Compiled! If we correct this error, what is our performance like? I remeasured using the following C# program:
using System.Text.RegularExpressions;
using System.Text;
using System;
using System.Diagnostics;

namespace TestRegex

public class RegexBench2
private static int ITERATIONS = 1000000;

public static void Main()
Stopwatch sw = new Stopwatch();

int length = 1;

for (int i = ITERATIONS; i <= ITERATIONS * 2; i++)
length = (int)(Math.Log((double)i) / Math.Log(4));
String matchthis = generateWord(i, length + 1);
Regex regexpr = new Regex(matchthis);

if (regexpr.IsMatch(_doc))
Console.WriteLine("found {0} at {1} it took {2} miliseconds", matchthis, i, sw.ElapsedMilliseconds);

Console.WriteLine(".NET regex took {0} miliseconds", sw.ElapsedMilliseconds);


public static String generateWord(int value, int length)
StringBuilder buf = new StringBuilder();
int current = value;
for (int i = 0; i < length; i++)
buf.Append(convert(current % 4));
current = current / 4;

return buf.ToString();

private static String convert(int value)
switch (value)
case 0: return "A";case 1: return "G";case 2: return "T";case 3: return "C";default: return "0";

My performance for the C# program was substantially better (as in, it actually completes) than the previous incarnation:

For the java program, I compiled using Eclipse and the previous code the author used. The results were:
found AAAGTAAGCC at 1000000 it took 0 miliseconds
found AAATGAAAAAG at 1048960 it took 172 miliseconds
found GAAAAAGTAAG at 1085441 it took 265 miliseconds
found TCTAAAAATAG at 1179694 it took 547 miliseconds
found ACGTGATGTAG at 1204636 it took 625 miliseconds
found AATAGATTCGG at 1548576 it took 1578 miliseconds
found TCGTACAAATG at 1576094 it took 1656 miliseconds
found CGGACGTGATG at 1599255 it took 1734 miliseconds
found ATTCGGACGTG at 1689064 it took 1984 miliseconds
found AGATTCGGACG at 1859204 it took 2453 miliseconds
found TGATGTAGTCG at 1984902 it took 2797 miliseconds
found AAATAGATTCG at 2000000 it took 2843 miliseconds
Java regex took 2843 miliseconds

The java program is now roughly twice the speed of the C# program, which seems far more reasonable. The discrepancy resulted from misuse of the API; not from the C# regular expression implementation being seven thousand times slower than the Java implementation. The worst possible way to benchmark two language implementations is to incorrectly use one, and then compare it to a correct usage of the second. Obviously, the results are going to be skewed towards the language being used correctly.

The author's confusion probably stems over how the Java regex implementation uses a .Compile() method, but one has to wonder how he could seriously think the results were valid. The honest response to this would have been "Gee, 7,000x slower? Something must be wrong in my C# program!". The dishonest response is: "C# is a dog! 7000x slower? Java forever!"

On a more fundamental level, I completely disagree with the overall tone of the article. As the Computer Language Benchmarks Game points out,
Benchmarking programming languages?

How can we benchmark a programming language?
We can't - we benchmark programming language implementations.

How can we benchmark language implementations?
We can't - we measure particular programs.

You cannot benchmark a language, nor can you benchmark a language implementation. There is only performance for a specific program. So it's entirely possible that this one example posted isn't necessarily a strong indicator of overall performance. It's also possible that the results will change as both the Java and .NET implementations change over time.

Friday, October 26, 2007

Hardware Assisted Brute Force Attacks

Jeff Atwood recently blogged about a hardware assisted brute force password cracking product developed by Elcomsoft.

Using Elcomsoft's figures from a pdf they released, Atwood concludes that we can attempt this many passwords a second:

Dual Core CPU 10,000,000
GPU 200,000,000

...and using Elcomsoft's relatively weak 8-character alpha-based password (a-A, z-Z), there are this many passwords:

52^8 = 53,459,728,531,456

Some more basic math gives us:

53,459,728,531,456 / 10,000,000 pps / 60 / 60 / 24 = 61.9 days
53,459,728,531,456 / 200,000,000 pps / 60 / 60 / 24 = 3.1 days

To his credit, Atwood correctly assaults Elcomsoft's most questionable assumptions: allowing people only 8 characters, and using an extremely limited character set. Adding additional characters, it becomes clear that the exponent plays a rather sizable role. Here's the amount of time for twelve characters:

52^12 / 200,000,000 pps / 60 / 60 / 24 = 22,620,197 days

We can all agree that ~22 million days (about ~61,000 years) is an awful long time. But this is where Atwood makes an error:

For those of you keeping score at home, with a 12 character password this hardware assisted brute-force attack would take 61,973 years. Even if we increased the brute force attack rate by a factor of a thousand, it would still take 62 years.

The error here is pretty basic: over the next 62 years, what sort of new computing techniques will we develop? How much faster will GPUs run five years from now? Ten years from now? What did computing resemble 62 years ago?

I'll give you a hint:

This is ENIAC. It was the "first large-scale, electronic, digital computer capable of being reprogrammed to solve a full range of computing problems." In one second, it could perform:
  • 5,000 add or subtract operations
  • 357 multiply operations with two ten digit operands
  • 35 division or square root operations
...all of this, at a svelte 30 tons, while consuming 150 kW. This is state of the art in computing, roughly ~61 years ago.

Considering the operations being done to hash a password are substantially more complicated than simple arithmetic, how many passwords could ENIAC hash in a second? To be honest, I think it's more likely that this unit would be expressed in number of seconds per password rather than the number of passwords per second, but I'll be the first to admit that I'm not familiar enough with the NTLM hashing algorithm to say with certainty.

For the sake of argument (and not having to discuss a computing architecture that dates me by a good 35 years) let's assume that ENIAC can do one hash a second. This means our modern CPU is ten million times faster than ENIAC, or somewhere between 2^23 and 2^24 times the speed of ENIAC. The GPU is 2^27 to 2^28 times the speed of ENIAC. These are both commodity products, and ENIAC was state of the art, so a more fair comparison would be with something like Blue Gene, but I digress.

Using the above assumptions, this means that the number of hashes we can do should double every 2.25 years. This also means that it'd take 22.5 years for our current computing power to increase 2^10 (or 1024) times. Let's reread Jeff's statement:

Even if we increased the brute force attack rate by a factor of a thousand, it would still take 62 years. this means that 22.5 years from now, a computer would still need another 62 years to crack the 12-character password, for a grand total of 84.5 years. But, if we increase our computing power by a million (2^20, or 45 years from now--it's no big deal, we just push the date out), it will only take .o62 years (roughly 22 days), or 45.062 years from today. If we increase our computing power by 2^27, it ends up being roughly 4.25 hours to crack a single 12 character password.

Obviously all of this is total speculation, with wild figures and estimates on my behalf that are obviously open to interpretation, huge questions about where computing will be 40 years from now, etc. But the point I'm trying to make is: it's a fallacy to assume today's computing power is what you'll have access to in a year. It's a gigantic fallacy to assume today's computing power is what you'll have access to in 20 years (what were you doing in 1987?). Discussing what we'll have at our disposal in 62 years is mind-boggling.

When we're computing the amount of time it takes to do something, it makes no sense to hold computing power constant, especially when we start talking about any amount of time in excess of a year. If we think it's going to take ten years to perform some computation, one viable option is to wait five years (at which point in time your computing power has quadrupled), kick off that batch job, and take a 2.5 year coffee break. It still took you 7.5 years, but that was 2.5 years saved.

Sunday, October 07, 2007

The Sorry State of Online Translation Tools

As of late, I've been working on a set of automated tools that query free online translation services to perform translations for some of our target languages. For example, all of our string data is in en-US, so we might request a fr translation from babelfish.

Yes, if you were paying attention, I am doing the unspeakable: I've developed an application to automate translation using AltaVista's translation services. Is it theft or abuse of their free services? Yeah, probably--but seriously: cry me a river. (If you're reading this, BabelFish, take note: your programmers are stupid. Yes, I said it. Stupid. You have the potential to offer a for-pay service, but your API is too deficient for it to be of any real use without jumping through ridiculous hoops.)

Anyways, this task has yielded some extremely interesting observations. AltaVista (BabelFish) doesn't expose any web services for doing translation, although I believe they may have in the past. So, what we were left with was good old HTTP requests. We submit a request through their web form, and parse the translated resource out of the response. Needless to say, this is problematic. We've glued together some extremely flexible, object-oriented C# classes to manage this.

The page for translation looks something like:

...okay, great. So it managed to do this basic translation, pretty easily. But maybe I don't want "fox" translated. AltaVista instructs users to wrap "fox" in "x" characters, which results in something like:

...seems obvious enough. But wait--notice that the no-translate specifiers (x) are left in the output. I have to manually remove the x's from the output. Slop, baby, slop. How about this:

Call me crazy, but I don't remember "l'examplesx de xextra" being in the French lexicon. And something tells me they aren't going to recall it either. And when it comes to language, the French may be the last people in the world you want to piss off. But I digess.

So one thing is pretty obvious: the characters that BabelFish has chosen to designate sections of text as "do not translate" are insufficient; this should be totally obvious to even the most stupid programmer, since attempting to parse out x's from English text is going to be problematic, given that character is used (somewhat infrequently, I must admit, but why use a letter in the language itself?). And why not go with the (more) obvious (but still incredibly stupid) choice of "z", which is encountered half as often as "x" in everyday language? Perhaps their justification was that there aren't words in the English language that start and end with x, but there are phrases like "xextra examplesx", which seems to make BabelFish's parser throw a temper tantrum.

This character selection snafu probably also explains why they're left in the output; good luck removing something as unspecific as "x" from the output. Because of this, our application has a post-processing step where we search for our wrapped phrases, and then replace them with the originals before the x's were inserted.

All of this begs the question: why not use something that is obviously not an element of the English language, or any other linguistic language, for that matter? Personally, I would propose something like:

<NoTranslate lexcat="">section to not translate</NoTranslate>

...which would have made the aforementioned sentence:

I need <NoTranslate>extra examples</NoTranslate> of this leaping fox.

...or maybe:

I need extra examples of this leaping <NoTranslate lexcat=noun>fox</NoTranslate>.

..."lexcat" is an idea I had (no idea how feasible or practical it is) to give the translator some idea of what is contained in the parenthesis to perform a more concise translation. The context would be for the source language, which would (hopefully) aid in translation to the target language. I suppose other attributes could be added based on the source language (maybe gender would be useful for French, for example). Anyways, with this way of marking text, it is clearly obvious what part you don't want translated. The odds of someone encountering <NoTranslate></NoTranslate> in everyday parlance are pretty damn low.

Unfortunately, this is hardly the beginning of the crap translation service offered. Consider multi-line translations, such as:

Here's the first line to be translated.

This is a leaping fox.

...translates to:

Voici la première ligne à traduire. C'est un renard de saut.

...the problem here is pretty obvious: the translator didn't preserve line breaks. The output should have had a break between the two sentences. This is almost an even bigger issue for automating translation--one option we've kicked around is inserting some xLINEBREAKx word where there's a \r\n; without modifying the input, the location of line breaks is lost in the output. Yet another pre/post processing step to add.

The really sad thing is we've looked for better translation services, and this is actually one of the better ones. Google's service is (surprisingly) even more deficient; rather than using x's as no mark, you use a single "." at the front of a word or phrase you don't want translated. Gee, who uses periods these days? I think I'll go and remove all of the periods from my writing from now on.

Tuesday, September 18, 2007

Pseudo Language

I'm currently converting a substantial application to be Internationalized and Localized. This basically means we're completely abstracting the language from the application, so we can easily have French, German, Italian, and other languages displayed in our application.

In addition to internationalization, we also do localization, which means the way we format dates, times, currency and other "things" are specific to a given locale. For example, there are French speaking people in both Canada and France, but the way they represent numbers, dates, times, and currency can drastically vary, so it's important to have formatting specific to both locales.

To help us with this, we have a bunch of cool, nifty tools. One of the tools we have is some code that generates a pseudo language. The pseudo language is basically a fake language that reads like English, but looks obviously different. The pseudo strings end up looking sort of like the original strings, but with heavily modified characters, and additional padding on the beginning and end of the strings. For example, the string:

The quick brown fox jumps over the lazy dog.


[!!! Ťħĕ qǔĭčĸ þřŏŵʼn ƒŏχ ĵǔmpš ŏvĕř ťħĕ ľäžЎ đŏğ. !!!]

The idea behind the additional length (which, by the way, is variable depending upon the length of the original string) is to ensure there's enough room on the forms in case the translation ends up being substantially longer than the original. The idea behind the modified language is to have some obvious way of determining that our application isn't loading English strings (which, for reasons that shall remain unexplained, can be sort of a pain in the butt on .NET). It's also nice because it gives a visible indicator of places that aren't being translated.

One issue we quickly found, however, was that our pseudo translator was rather stupid. The following string, which contains .NET String.Format arguments:

This is a string with String.Format parameters: {0:D}

...would end up looking something like:

[!!! Ťħĭš ĭš ä šťřĭʼnğ ŵĭťħ Šťřĭʼnğ.Fŏřmäť päřämĕťĕřš: {0:Đ} !!!]

The issue with this, of course, is that {0:Đ} doesn't mean anything to String.Format--{0:D} means something to String.Format. So, we had to modify the code to be aware of {} blocks and not attempt to translate the contents. In theory, this sounded straight forward, but it ended up being a bit involved. One thing is that "{{" and "}}" is how String.Format escapes the "{" and "}" characters, so a perfectly valid String.Format statement is:

This {{{0}}} is a {{ totally valid {{ String.Format statement.

...or maybe:

This is yet another perfectly valid String.Format argument: {{{0:D}{{Hello World!}} the first, String.Format produces "This {0} is a { totally valid { String.Format statement." (the "0" can end up being any number of things, but the point is it's going to be wrapped in {} in the output) The second ends up looking like "This is yet another perfectly valid String.Format argument:{0:D{Hello World!}" So for inner {}, String.Format interprets those as places for format arguments. Any outside {{ or }} get interpreted as braces in the resulting string.

The solution I used kept track of the previous character. If I encountered {{ or }}, I reset the previous character. Any single instance of { encountered resulted in no translation until a } was encountered.

Wednesday, September 12, 2007

Leaking Memory with Marshal.AllocHGlobal

The other day I was going through the source code in our application and tidying up various internationalization tasks, and I came across the following code:

IntPtr BinaryValue = Marshal.AllocHGlobal(8 * Marshal.SizeOf(new byte()));

I quickly noticed that this call didn't have the requisite Marshal.FreeHGlobal call, which basically meant it would leak 8 bytes every time this code was executed. Going through the application, I then found ~15 other places, each of which would leak anywhere from a few bytes up to a couple hundred.

Leaking memory in .NET? Say it isn't so! According to the documentation, the Marshal class:
Provides a collection of methods for allocating unmanaged memory, copying unmanaged memory blocks, and converting managed to unmanaged types, as well as other miscellaneous methods used when interacting with unmanaged code.

So one important thing to keep in mind with the Marshal class is that it was specifically crafted to allow .NET to interact with unmanaged memory, among other things. This means particular care must be exercised, because many of the methods have requisite cleanup steps, or deal directly with memory. Admittedly this can be difficult to cope with, especially after programming in a language where memory management is typically handled entirely under the covers.

Anyways, after finding and establishing that the above code was leaking memory, the question quickly moved on to: what is the best way to make sure memory is always released? Anybody who matriculated to the .NET world from C++ is acutely aware of the perils of non-deterministic finalization, so wrapping it in a class and relying on a destructor is out of the question. One quick/dirty solution a co-worker of mine proposed was a try/catch/finally block:

IntPtr BinaryValue = IntPtr.Zero;
BinaryValue= Marshal.AllocHGlobal(8 * Marshal.SizeOf(new byte()));
if (BinaryValue != IntPtr.Zero)

...the benefit of this is, the finally block will execute regardless of how this code exits. Also, the catch block can be omitted and the finally block will still execute, so this code is also nice for places where it'd be better for the caller to handle the exception.

Another option is the using statement (not to be confused with the using directive). The using statement is actually one way to make .NET behave in a more deterministic manner; once the using block goes out of scope, objects declared in the using block will be zapped from memory. The only requirement is that the object declared in the using statement has to implement IDisposable, so this means we have to wrap Marshal.AllocHGlobal in a wrapper class. I wrote a quick and dirty wrapper class:

class UnmanagedMemory : IDisposable
private IntPtr _ptrToUnmanagedMemory = IntPtr.Zero;

public UnmanagedMemory(int amountToAllocate )
_ptrToUnmanagedMemory = Marshal.AllocHGlobal(amountToAllocate);

public IntPtr PtrToUnmanagedMemory
get { return _ptrToUnmanagedMemory; }

public void Dispose()
if (_ptrToUnmanagedMemory != IntPtr.Zero)
_ptrToUnmanagedMemory = IntPtr.Zero;

...pretty straight forward. Using this class then looks like:

using (UnmanagedMemory um = new UnmanagedMemory(8 * Marshal.SizeOf(new byte())))
// Do something here with um.PtrToUnmanagedMemory

...if you set a break point right after the using block exits, and one in the Dispose method of UnmanagedMemory, you can see the "deterministic" finalization in action. This method is a little more involved, but probably preferable; the UnmanagedMemory class could be expounded upon to provide any number of useful features, and it's cleaner and more object oriented than the try/catch/finally block.

(worth noting is that the using statement is essentially a try/catch/finally block, under the covers, but it is still fundamentally different to wrap unmanaged memory in a class and interface with it like that. I believe this method is preferable)

In our application, we ended up using the try/catch/finally block, mostly because of circumstances inside the company (release coming up, can't do anything too wild) and because I needed tight control over the exception flow so I didn't have to fundamentally alter any of our error handling code. Eventually, though, I'll head over to using something like UnmanagedMemory.

Monday, June 04, 2007

Population of Rome, Continued

Another question often pondered when people discuss the population of ancient cities is: what was the average life expectancy? The formula for computing the average expectation of life is pretty simple:

AverageExpectationOfLife = SumOfTotalYearsLived / TotalIndividualsRecorded

The above formula is basic enough, but Tim G. Parkin, author of Demography and Roman Society, points out the rub:

The simple equation...may often produce what appears to be a realistic figure for average life expectancy, but when ages at death are sorted into age groups (say, 0-1 years, 1-9, 10-19, etc.), serious problems become evident. That infant mortality (i.e., deaths in the first year of life) is grossly underrepresented in the tombstone records has long been recognized; clearly infant deaths, even when burial took place, went largely unrecorded, or at least any such records have not survived in large numbers. other words, infant mortality skews the number so heavily that one thinks of the average lifespan as being low, but people surviving past childhood have a good shot at living much more reasonable (by modern standards, anyway) lifespans. This book also contained some eye opening information on Roman demographics, and it seems the same uncertainty surrounding the estimation of water abounds.

Another interesting point discussed in the book is the ratio between sexes; Beloch and previous authors (Carcopino's Daily Life in Ancient Rome: The People and the City at the Height of the Empire., for example) assumed an equal sex ratio. The evidence put forth by Parkin suggests otherwise, although he makes it quite clear that even his evidence may be off:

Another very striking feature when considering the thousands of epitaphs collected is the preponderance of males over females. If we assume for the moment that in reality the sex ratio over all age groups was approximately equal, this means that in our sample three males were commemorated for every two females...It is true that there is some other evidence...that suggests that males really did outnumber females in certain periods of antiquity.

Another question: how many children did Roman families have? And were the Romans generally monogamous, or did males sometime take multiple wives? The latter question was answered with relative ease: the Romans were monogamous, and generally frowned on polygamy. The former is perilous. Given an average life expectancy of 25 years, a gross reproductive rate of over 2.5 is required to require a stationary population. In other words, each woman needs to bear 2.5 daughters, or about five children. More than that would be population growth; less would be decline. That said, Parkin brings up many issues with estimating Roman family size:
  • It was known how may children were born (or often only sons), but not how many survived to adulthood.
  • Large families may be mentioned because they were abnormal, skewing modern perceptions.
  • Daughters often remained anonymous or unmentioned.
  • Ancient authors rarely show any demographic interest in average family size.

Through Roman Egyption census records, the average size of households was calculated to around six, including parents and slaves.

Based on the information Parkin brings up, it seems like it should be possible to refine Beloch's

Saturday, June 02, 2007

The Population Of Rome

Determining the population of Rome is a rather tricky issue. It's been much debated, and the estimates vary substantially depending upon who's numbers you look at. The research also tends to focus on a relatively narrow window of Roman history (46 BC to AD 15), and tends to focus on a subset of the overall population: recipients of the corn dole. Beloch's 1886 paper, Die Bevölkerung der griechisch-römischen Welt, is the starting point of discussion surrounding the corn dole.

The corn dole (sometimes referred to as the grain dole) was a handout by the Roman government to citizens of Rome. The number of people who received the dole varied, but research shows it varying between 150,000 to 300,000+. The dole started in 123 BC by Gaius Gracchus, and in 58 BC Publius Clodius Pulcher made it completely free. Beloch used the number of qualified male citizens who received the dole as a starting point for estimating the population of Rome. Based on this, he then estimated the number of dependent women, children, and slaves. He also estimated the number of foreigners in the city as well. This resulted in a figure of ~800,000 people inhabiting Rome.

As a verification of the above estimate, Beloch estimated the grain consumption of his computed population and compared that with estimates of the total amount of grain flowing into Rome each year. Here is a table from Gerda De Kleijn's book, The Water Supply Of Ancient Rome - City Area, Water, and Population, which shows the years of the dole and number of recipients:

TODO scan table

There are several issues with this, and especially so when attempting to reconcile the figures with Rome's water supply. For starters, the narrow window of data points is particularly troubling: Beloch's data spans 60 years, but aqueduct information spans in excess of five hundred. Superimposed on our previous graphic of Rome's water supply, it's easy to get a sense of just how small the window really is:


Another question often pondered when people discuss the population of ancient cities is what the average expectation of life was. The formula for computing the average expectation of life is pretty simple:

AverageExpectationOfLife = SumOfTotalYearsLived / TotalIndividualsRecorded

Friday, June 01, 2007

Here is a graph showing how much the water capacity in Rome increased over time. The y-axis is in quinariae, and the x-axis is time:

...the above numbers were computed using Blackman and Hodge's figures. Notice that this graph is similar to the one I previous posted that tabulated totals in Excel. The big difference with this graph is the x-axis is appropriately scaled with respect to the aqueduct's construction date.

In the perfect world, I'd have enough data to correctly taper each of the horizontal lines, clip out sections where the aqueduct was incapacitated or otherwise non-functional, etc. But it isn't a perfect world, so I have to settle for the above. (which for all intents and purposes isn't that bad)

I recently reviewed another paper by H. Chanson, Hydraulics of Roman Aqueducts:
Steep Chutes, Cascades, and Dropshafts
, which had a totally different approach to estimating water capacity. Chanson actually evaluated archaeological evidence to determine the hydraulic flow in ancient Roman aqueducts. His findings are quite interesting: "maximum" flow for the nine major aqueducts was 1,013,960 cubic meters per day. Adding in the Traiana and Alexandrina (I guess there's enough evidence to find their hydraulic potential), Rome was supplied with 1,148,960 cubic meters, per day. The graph for the data tabulated in Chanson's paper looks like so (in this graph, the y-axis is cubic meters per day):

Chanson's data also makes me think that Grimal's data is probably based on hydraulic potential rather than Frontinus' records. And this is probably fine. One interesting thing about these graphs is regardless of the magnitude of the units, they look pretty similar. As in, the overall progression of water flow into Rome is pretty obvious; the overall magnitude seems to be of contention, but not the fact that it A) increased, and B) increased by some proportional amount.

One important thing to note with Chanson's data is that this is the maximum capacity, so it gives us a well-defined upper bound for the aqueduct's delivery potential. However, I doubt the aqueducts were ever running at capacity, and it's probably safe to say that it was common for them to run at half-capacity (or less) during dry months (being spring fed means the water source wasn't constant). So the estimates around ~600,000 cubic meters per day still seem more reasonable to me.

I've also recently been reviewing sources on population; I'll be posting more about that very soon.

Thursday, May 17, 2007

I decided to look a bit more into the water quantity figures Hodge listed in his 2002 work, Roman Aqueducts and Water Supply. It turns out that his figures are derived from the work of a previous author, Grimal. I know little about Grimal and some google searching didn't yield much information beyond basic facts: he's French, and most of his publications seems to have occurred in the 60s and 70s.

Purportedly, his figures are based on the records of Frontinus, but I don't possibly see how that could be: Grimal's cumulative total of the nine aqueducts for which Frontinus supplied measurements is fully double the total of other authors, including Hodge's figures in his collaborative work with Blackman. Almost all estimates are based on the figured supplied by Frontinus, and tend to be in the range of 12,000 to 15,000 quinariae. Grimal is hovering around 25,000.

The generally accepted conversion from a quinaria to some unit of volume over time appears to be about 40 m³/day. As always, estimates vary; some seem to have a more conservative figure of 20 m³/day, and I saw a few estimates above 40 as well. To get a sense of what 40 cubic meters of water actually looks like, think of it as being a cube of water ~3.41 meters on all sides, or about ~10,000 gallons of water. A swimming pool 24 feet round and four feet deep has about 12,000 gallons in it.

Knowing this, Grimal's total for the water supply of Rome is 24,805 quinaria per day, which Hodge seems to happily accept and convert to cubic meters using the 40 m³ estimate: 992,200 m³ per day. To get a sense of how much water this is, this would be a cube of water 99.74 meters on all three sides. This is 262 million gallons of water. Per day. Assuming Rome contained a million inhabitants (which seems to be the generally accepted estimate), this means that the water supplied 262 gallons of water for each person in Rome, per day.

Personally, I find Grimal's figures wildly suspect. For starters, Grimal arrives at totals double of other estimates. An "alternative estimate," according to Hodge, was H. Fahlbusch's proposed range of 520,000 to 635,000 m³. Other estimates seem to fall in this range as well.

Second, looking at what typical US cities used, it seems really unlikely to me. In a paper by Morgan, he gives the following table for U.S. cities in 1900:

...assuming the above figures are correct (they seem reasonable, but I haven't verified them myself), Hodge/Grimal are stating that Rome supplied more water per person than most major cities in the U.S. did by the start of the 20th century. I find that a very hard pill to swallow. I'm not alone on this: in Morgan's paper, his estimates are based on Frontinus' figures, and he arrives at a more typical ~14,000 quinaria, which using the 40 m³ conversion would end up being 560,000 m³ of water per day. However, Morgan settles on a more conservative figure of 6000 gallons/day per quinaria (22.7 m³/day), which makes his estimate 318,000 m³ per day. This is fully one-third of Hodge/Grimal's estimates.

Worth note is that when the Aqua Traiana and Aqua Alexandrina are added in (which Grimal estimates at 113,920 m³ and 21,160 m³ per day, respectively, although one what basis these estimates were made is unknown), it pushes the Rome total to 1.127 million m³ per day. Considering most other authors don't even bother supplying figures for the Traiana and Alexandrina, I wonder A) where Grimal got this information, and B) on the basis of what estimations. It certainly wasn't Frontinus' estimates.

What do modern aqueducts supply? The Catskill Aqueduct, which begins at the Ashokan Reservoir in Olivebridge, New York, in Ulster County and runs 120 miles to New York City, has an operational capacity of about 580 million gallons (219,240 m³) per day. It typically carries less than this, with flows averaging around 350 - 400 million gallons (132,200 - 151, 200 m³) per day.

Another example--the Colorado River Aqueduct--runs 242 miles from the Arizona-California border. Its capacity is 1.3 million acre-feet (1.6 km³) per year, or 1.6 billion cubic meters per year, or ~4.4 million cubic meters per day. Average flow is probably less than this, but it's a staggering number to think about. The construction project lasted eight years, and employed a total of 30,000 people.

I won't even list the figures for the California Aqueduct. It'd make your head explode. :)

Perhaps the most appropriate comparison, however, is with another Roman Aqueduct: Pont Du Gard. The water conduit, which is 1.8 meters (6 feet) high and 1.2 meters (4 feet) wide and has an average gradient of 0.4 percent, supplied the Roman city of Nemausus (now Nîmes) with an estimated 20,000 cubic meters of water per day (note: this is an uncited figure from wikipedia; I'll have to look into how that number was arrived at). This is also 500 quinaria, using the typical 40 cubic meter conversion rate.

Using the cube analogy, Pont Du Gard supplied Nemausus with a daily 27.1 meter cube, or 5.3 million gallons a day. Estimates for the population of Nemausus were around 50,000 inhabitants, so that would be roughly 106 gallons per person, per day, which is a pretty far cry from Hodge/Grimal's estimate of 262 gallons person/day in Rome.

Thursday, May 10, 2007


I'm currently taking a class on Roman aqueducts. We discuss and research the aqueducts that supplied Rome (and other portions of the Roman empire) from a number of different perspectives, including historical, hydrological, cultural, economic, logistical, etc. Basically, it's a class covering a broad range of topics pertaining to Rome's water infrastructure, with the aqueducts at the core of discussion.

As part of the class, we have to do research on a topic of our choice. Originally, I wanted to do research on how the aqueducts reduced the instance of water-borne disease, but my preliminary research revealed a rudely obvious fact: getting data on that subject was going to be close to impossible. I'd also made some crude assumptions, such as assuming that water-borne diseases like Cholera that produced epidemics during the 19th century were present (they were not--Cholera was endemic to the Indian subcontinent until the 18th and 19th centuries). Basically, it was doomed from the get-go.

With that fresh defeat behind me, I got to thinking--how did the water supply effect the population of Rome? How did the population of Rome effect the water supply? The aqueducts were built over a span of 500 years, so one thing I plan on examining is how the water supply grew in relationship to the population growth of Rome during 312 B.C. and A.D. 200. I'd like to attempt to answer whether or not new capacity was added in response to demand, and if the added water in turn sparked increased population growth.

Admittedly, this task is also fraught with peril: gigantic discrepancies in the capacity of the aqueducts exist in current research. Archaeological evidence and historical accounts of their capacity are obviously suspect, because stuff generally isn't built to last 2000+ years (even the Romans, notorious for their over-engineering, aren't up to snuff), and historical records are sparse and sometimes cryptic.

Another huge stumbling block is the actual population of Rome--again, estimates vary greatly, from a peak population of 200,000 inhabitants (absurdly low) to 2,000,000+ (absurdly high). The general consensus among reading seems to be around a million, but it's an exceptionally wide range. Also unknown is the rate of growth, epidemics that may have caused lapses in growth, immigration, the effects of being sacked seven or eight times, famines, etc. The demographic information leaves much to be desired.

Still, the estimates are good enough to attempt to draw some conclusions. And the idea sounds entertaining to me, so...without further ado:

...the above is a chronological listing of Rome's aqueducts. I have the dates as negative numbers because I wanted to do some basic graphing in Excel. Anyways, this chart shows their rough completion date, overall length, altitude of water source, altitude of the water level in Rome, and the total portion of the aqueduct that was above ground. Much of this information is from the meticulous records of a model public servant by the name of Sextus Julius Frontinus, from which most of the "hard" data today is derived. Because Frontinus died before the completion of the Aqua Traiana and the Aqua Alexandrina, little is known about those aqueducts other than some scant archaeological evidence.

Here is another chart, showing the "output" (in quotes for a damn good reason--more on this in a bit) of each aqueduct in quinariae, according to several sources:

Overall B&H is "Blackman and Hodge," who had estimates based around modified figures derived from Frontinus' records. Hodge later tabulated greatly increased values in a subsequent publication, hence the separate column. The last column are figures from Gerda de Kleijn, who also made estimates based on modified figures from Frontinus.

The first thing to notice about this table is the vast discrepancy between the last four columns. All four columns represent estimates by published sources. I'm going to do more research into this discrepancy, but for now, I'll let it be. The second thing to notice is that these units are all in quinariae.

Much of the controversy over the total water "output" supplied by the aqueducts stems around the rather nebulous nature of a quinaria. A quinaria was a Roman unit of area, roughly equal to 4.2 square centimeters. Its primary use was to measure the cross-sectional area of pipes in Roman water distribution systems. Even in Roman times, there was controversy over the value of a quinaria--according to Frontinus:

Those who refer (the quinaria) to Vitruvius and the plumbers, declare that it was so named from the fact that a flat sheet of lead 5 digits wide, made up into a round pipe, forms this ajutage. But this is indefinite, because the plate, when made up into a round shape, will be extended on the exterior surface and contracted on the interior surface. The most probable explanation is that the quinaria received its name from having a diameter of 5/4 of a digit... other words, Frontinus disagreed with Vitruvius about the actual value of a quinaria (primarily on the basis of Vitruvius' estimate being "imprecise" with respect to inner and outer circumference). Regardless of the Romans haggling over the value of their own units, one thing is certain: using quinariae to measure flow would be somewhat like using feet to measure acres, or kilograms to express newtons. The Romans did have units of volume, but the issue with the water flow is that it's a measurement of volume over time, which the Romans had no accurate or practical means of measuring. Culleus per elapsed sundial just doesn't roll of the tongue quite like cubic meters per second. Considering they had no accurate temporal measuring devices, it's debatable if the Romans even grasped the concept of units expressed over time.

To get a better sense of how the flow of water changed over time, I did some graphing in Excel:

The x-axis is time, and the y-axis is cumulative water flow in quinariae. The different colors correspond with a specific aqueduct. This is my first stab at graphing some of these data, so I plan on doing much more here; I want the temporal axis to be to scale, rather than the regular intervals in the above graph, but Excel seems suited only for the most basic charts. I suspect I'm going to have to draw much of these by hand.

Here's another interesting graph that shows how dependent Rome was on each aqueduct over time:

...considering the scale of aqueduct construction, new projects were not to be taken lightly. So there must have been a practical reason to add new capacity--could it be population? Who knows. :)

Sunday, January 21, 2007

The cost of owning a 'puter (or maybe 250 of them)

I recently read several papers evaluating the Total Cost of Ownership of Windows and Linux solutions. Thus far, the experience has been enlightening, although I'm sad to report that I don't have any obvious answers to the million dollar question: which is cheaper?

This is due to several things:
  1. Often times, the authors have a clear industry bias. Two of the papers I reviewed had clear conflicts of interest. This doesn't automatically invalidate findings, but it is difficult to believe a company would be dumb enough to trash their own products and services.
  2. Completely different methodologies for arriving at total-cost figures. As always, there are numerous ways to Fudge The Numbers™
  3. Completely different use cases. Depending upon the required task, Windows and Linux may have different TCO numbers. No two companies are identical, and different companies have totally different needs. Companies that develop software, for example, are going to have totally different requirements than a company that sells insurance.
  4. Dubious sources, claims, or arguments. Many papers had sources that were improperly cited, or that I could no longer locate.
  5. For certain comparisons, objective figures are not easily expressed. For example, evaluating "usability" is a difficult proposition. Evaluating "stability" is a comparable dilemma. Some studies attempt to factor these details in, but do so in a very superficial manner.

Here are some of the articles I read:

...that said, I'd like to demonstrate the above uncertainty by brutally harassing one of the above articles. I didn't find this article much better or worse than the others (in general, the quality is poor); this is for illustrative purposes only.

The CyberSource paper concluded that open source software saved businesses 19 to 36%, depending upon various factors and conditions. The most important factoid I drew out of this study was the fact that CyberSource is "the second longest running open source solutions company in the world." Immediately, it's hard to be objective about their findings, because there's a clear conflict of interest. For the same reason I wouldn't trust MSFT to release honest, accurate information about TCO figures, it is hard to take this evaluation seriously.

Still, there are some compelling arguments in the paper that should be discussed. CyberSource does clearly state they are vendors (and thus proponents) of Open Source Software solutions, so they let their biases be known early on. Furthermore, they make the claim that the scales have been tipped in MSFT's favor by:
  1. Not applying survey information indicating it takes less resources to support Linux
  2. Not factoring in viruses, malware, spyware, etc.
  3. Not counting system downtime from "reboots" and "crashes"
  4. Tripling the budget for Linux support to counter MSFT's claim that external experts would be required to support OSS solutions.

At first these would seem to be gigantic "favors," but let's take a moment to look each gift horse in the mouth.

The first scale-tipper isn't particularly compelling, given that other TCO reports had "survey information" stating the exact opposite: Windows requires less support resources than Linux. I was also unable to verify the source of this claim because it was a dead hyperlink; since the veracity cannot be established, it shall be struck.

Second, viruses....what can be said of viruses on Windows? As a "source," the article links a listing of viruses that have been seen in the last month. The source does not state how many times a virus was found, nor what platform the virus was targeting, nor if the underlying security vulnerability has been patched. It is an interesting list; it is not, however, a useful source to illustrate the security of Windows XP or Vista. In the context of TCO, it's almost worthless as a resource.

(also worth noting is that the listing of viruses is misleading because the list enummerates viruses rather than the specific exploit being targeted, so it is possible the list gives a false sense of insecurity)

Furthermore, other studies have little qualm with factoring in security costs for Linux--so the claim that these sorts of costs are not attributable to Linux is immediately suspect. I believe the reason for this is security in a mid to large business has more to do with protecting internal resources from employees, rather than from outside threats. At my last job, which was a fortune-50 company, our biggest threat vectors were clearly from within.

Regarding system downtime and "crashes," the article bases its claim that Linux is more stable and less prone to downtime with a single source. CyberSource claims that "our research indicates that open source systems rarely if ever suffers such problems," and further claims that "none of the most robust systems tracked by research firm Netcraft UK are Windows." Looks like they haven't viewed their own source in a while: the top three positions are occupied by Windows Server/IIS configurations. Also, this listing is almost exclusively based on web servers, and the dominant web server is Apache (60% market share, verses 30% market share for IIS, as of December 07); assuming equal weighting in terms of stability, Apache should dominate the list.

Personally, I don't think the netcraft numbers are actually relevant to stability/downtime. However, they are the only figures CyberSource provides as evidence of Windows being more prone to downtime and crashes. I personally find it to be unconvincing.

The final advantage is much more tangible. CyberSource triples the budget for Linux support in response to the claim that Linux may require more external consultants than Windows. This is a honest scale-tipper, in my opinion; they factor the costs of outside consultants in the Windows system at $45k, and the numbers rise to $135k for OSS solutions.

Cybersource then dives into the actual numbers, and there's definitely some funny math going on. Here is a compilation of costs for the Windows system:

...the first item is in direct conflict with CyberSource's claim that they weren't going to factor in the cost of viruses, spyware, malware, etc. For a 250-computer company, that comes to $9,475. To be honest, this number should be deducted. Either that, or they need to estimate the whole cost (including support). There are also free anti-virus solutions available; CyberSource glosses over the fact that free software isn't just for Linux. (more on this later)

The second item--Window Server 2003, is assigned a dollar figure of $3,999. This is not entirely clear, because Cybersource is specifically referring to the cost of Windows Server 2003 Enterprise Edition. Other versions of Windows Server 2003 have much lower costs. The only sizable difference between Enterprise and Standard is the number of physical processors and the amount of RAM supported; Standard costs $999. It is not necessarily true that a business of 250 employees would need Enterprise.

The third item is Microsoft Commerce Server. Again, omitted is the fact that the $20k figure is for the Enterprise Edition. Standard Edition is $7k.

Regarding Microsoft ISA 2004, for some inexplicable reason CyberSource goes with the cheaper version. The pricey Enterprise Edition is $6k per processor, or a cool $75k for 25 processors.

SQL Server has more editions with more complicated pricing schemes than I care to ennumerate in this post; prices range anywhere from $4k up to $25k; at any rate, the Cybersource figures are likely inflated.

The quoted price for Visual Studio .NET is for some sort of Enterprise Edition; MSFT has moved on from 2003 and 2005 is the latest in sliced bread. Prices for VS.NET 2005 range anywhere from $300 for the lowly Standard edition, up to $10k+ for Team Suite with MSDN Premium Subscription. Considering the CyberSource TCO figures only a single copy of VS.NET into the figures, there is literally no point in buying anything beyond the standard edition. Hell, the absolutely free Express Editions might suffice.

Next, we have the largest costs in software: operating systems and office software. First off, the CyberSource article specs out full retail pricing for Windows XP Pro; this is stupid. If a business is buying 250 copies, that's well into the world of bulk licensing. Ditto for office. The office software, however, is even more troublesome. One thing CyberSource did was create an analogous software package for their Linux-based solution. Included as the office suite is the venerable Open Office application (which I am actually a big fan of). What CyberSource doesn't make clear is that you can run Open Office on Windows, and it works great. So charging the Windows based configuration $300 times 250 computers is weird, and possibly unfair. Either MSFT Office has something that Open Office lacks (which would be the only good reason to cough up that much money), or these TCO figures reek of dead fish.

Finally, the icing on the cake: the Microsoft Software Insurance Program, which CyberSource computes to be a recurring cost of 25% a year for server OSs, and 29% a year for desktop OSs. I know of no company that purchases this program. Even a complete idiot can see this is a bad investment: CyberSource estimates it to be $211k per year, which is more than the cost of operating systems or office software. Given that MSFT releases a new operating system or office platform every two to five years, it doesn't take a rocket scientist to see this is a bad deal.

The total cost for the MSFT Software package ended up being $504,712, verses $90 for the OSS solution. $211,000 was the Software Insurance Program. $173,000 was the office software. Removing those two items gets us to $120k in software costs, which completely obliterates the lead for the OSS solution. This is also not taking into account the fact that volume licensing can further reduce that figure, and that Apache, PHP, MySql, and other industrial grade OSS products are available on a Windows platform.

This is a consistent trend in all the studies I reviewed: why no hybrid environment? Why can't we run Windows Operating systems, Firefox browsers, Open Office, VS.NET 2005, and Unix servers running Apache, MySql, and PHP? Why must consumers continually be presented with this "us verses them" reality? To see that sort of attitude from MSFT is not totally surprising; to see it from the open source community, on the other hand, is disturbing.

Even administrative jobs get broken into this "us" verses "them" concept; is it outside the realm of reason that a single individual could be good at administering both Windows and Linux computers? Most nerds I know are proficient with several operating systems; on my desk, at this very moment, are Windows XP, Vista, and Ubuntu OS installs. At school, I typically use machines running Red Hat, unless I'm interested in using VS.NET, in which case I use XP Pro. I like all of them, more or less. Were I an employer, I would want to hire IT administrators who knew all of it--not just half of the picture.

I should reiterate that this study was pretty bad, but so were the rest of the studies I read. Some of them weren't even studies; they were surveys pretending to be legitimate TCO comparisons, which is equally as debatable as Cybersource's number fudging.