Saturday, December 06, 2008

It's big, it's heavy, it's wood!


Everybody loves log:


...this posting concerns a different type of log: the type we programmers use to keep a history of what our application did.

I love Jeff Atwood. I really do. I read his blog on a regular basis. I enjoy his style of writing, and both of us are unabashed nerds. We both post on Silent PC Review. His blog, Coding Horror, has possibly been my favorite blog over the last few years. Most of the time, he's spot on, but occasionally he stumbles, as is the case in The Problem With Logging. Atwood makes it clear how little respect he has for logging:
So is logging a giant waste of time? I'm sure some people will read about this far and draw that conclusion, no matter what else I write. I am not anti-logging. I am anti-abusive-logging. Like any other tool in your toolkit, when used properly and appropriately, it can help you create better programs. The problem with logging isn't the logging, per se -- it's the seductive OCD "just one more bit of data in the log" trap that programmers fall into when implementing logging. Logging gets a bad name because it's so often abused. It's a shame to end up with all this extra code generating volumes and volumes of logs that aren't helping anyone.

We've since removed all logging from Stack Overflow, relying exclusively on exception logging. Honestly, I don't miss it at all. I can't even think of a single time since then that I'd wished I'd had a giant verbose logfile to help me diagnose a problem.

When it comes to logging, the right answer is not "yes, always, and as much as possible." Resist the tendency to log everything. Start small and simple, logging only the most obvious and critical of errors. Add (or ideally, inject) more logging only as demonstrated by specific, verifiable needs.
This sounds perfectly reasonable, until you consider a few facts:
  1. Jeff's application, Stack Overflow, is a web application, so logging may or may not be a valuable tool. For many applications, logging isn't that useful.
  2. Nobody wants a giant, verbose logfile. At least nobody sane, that is. The goal of logging has never been to mindlessly "log" every conceivable message, and anyone who thinks that is a good thing has A) never spent time parsing a log file or B) is prone to wasting their own time.
I, for the record, have spent an inordinate amount of time sifting through our customer's log files. Our application is real-time so debugging can be very difficult and logging provides a crucial method by which we solve problems. We go to great pains to only log things that are useful, and even more pain to remove messages that are not useful.

Atwood claims he's not anti-logging, but that's hard to believe given that Jeff's application no longer contains any logging beyond exceptions. The impetus for this rash decision?

Oh, sure, logging seems harmless enough, but let me tell you, it can deal some serious hurt. We ran into a particularly nasty recursive logging bug:

  • On thread #1, our code was doing Log (lock) / DB stuff (lock)
  • On thread #2, our code was doing DB stuff (lock) / log stuff (lock)

If these things happened close together enough under heavy load, this resulted in -- you guessed it -- a classic out-of-order deadlock scenario. I'm not sure you'd ever see it on a lightly loaded app, but on our website it happened about once a day on average.

I don't blame log4net for this, I blame our crappy code. We spent days troubleshooting these deadlocks by .. wait for it .. adding more logging! Which naturally made the problem worse and even harder to figure out. We eventually were forced to take memory dumps and use dump analysis tools. With the generous assistance of Greg Varveris, we were finally able to identify the culprit: our logging strategy. How ironic. And I mean real irony, not the fake Alanis Morrissette kind.

Correction: with the generous assistance of Greg Varveris, you were finally able to identify the culprit: your locking strategy.

Furthermore, why would you troubleshoot a deadlock with log messages? Think about this for a moment: logging only happens if the threads are active. Deadlocked threads: not active. The only sane conclusion one can arrive at in the event of a deadlock is that logging isn't going to help you much. Taking a crash dump and doing post-mortem debugging is what you do in the event of a deadlock (Varveris++). Reviewing your locking strategy is what you do in the event of a deadlock. But logging can't help you with a thread that is stuck waiting for a lock to free up because it is incapable of logging. It may be possible to add logging to figure out how the threads got in the state they're in, but the mere act of doing this implies your application has a threading/locking model that you don't understand.

This post was highly ineffective because the basic problems are A) failing to acknowledge a flawed locking strategy and B) misusing a tool (in this case, logging). And then when said tool is (unfairly) implicated as being the source of the problem, removing it from the application entirely. It's more like mindless thrashing than calculated debugging.

I do agree with Jeff that overzealous logging is a bad thing, and I would strongly advise against it. But I can't even tell you how many times logging has helped me solve a problem in our application. Here are some general rules of thumb I follow for logging:
  1. Keep a low signal-to-noise ratio; logging unnecessary messages only makes it more difficult for you to find interesting ones. Be brutal; remove anything you can deduce without the message.
  2. Keep a low signal-to-noise ratio; you do not have an infinite amount of disk space, so logs must be purged at some point in time. This means excessive logging impacts how much prior history you have. Sometimes it means log messages that were useful get purged and replaced by stuff that is not.
  3. Do not print out periodic messages to the log. If you must, print them as infrequently as possible.
  4. Keep messages as short as possible without making them cryptic.
  5. Always include variable values in messages (e.g. if you have the option between "blah failed!" and "blah failed with error code: E_INVALIDARG", the second is vastly preferable)
  6. If you are a real-time programmer (e.g. we do live video streaming), logging is always preferable to debugging. The act of debugging influences the system enough that logging is a far better option.
  7. There must be a way to access your logs. For a web application, this is obvious. You get them from the web server. For an application you distribute to customers, it can be more difficult. Implement a mechanism to fetch your logs, or give customers an easy way to generate and send them.
  8. We use two log levels: info and error. Some people prefer to add a dozen different levels, but in practice nobody knows the difference between INFO and IMPORTANT and ERROR and FATAL and SYSTEMEXPLODED and OMFG and JUSTKILLMENOW, etc.
  9. Always log date/time with each log message.
  10. Make errors visually distinguishable from info messages.
  11. On application start, log crucial information like application version, OS type, system info, etc. This can be really useful, especially if you have a versioned desktop application. (for a web app, not so much).
  12. If you're using C#, System.Reflection.MethodBase.GetCurrentMethod().Name is your friend. Avoid hard-coding class/method names in your log messages; when you go to refactor, it is a royal pain.
  13. Do implement a log rotation scheme. My favorite is to close logs at xx MB (we use 5 MB), and then zip them. We allocate yy MB of space for archived, zipped logs. Delete logs that exceed the allocated space (so maximum space occupied by logs is xx + yy; for us). There's also a ping-pong method, but it is less efficient overall.
  14. Provide a method to disable all logging.
Logging is valuable. Unfortunately, there is no general rule of thumb for determining if a given message is useful or just noise. It takes practice and maintenance to make a logfile a useful addition to an application. But knowing how to log messages in a way that is useful is a crucial skill for any programmer.