Monday, December 10, 2012

Error Handling: Or, How to Start an Argument Among Programmers

Lately I see a lot of discussions come up about error handling. For example: Dr. Dobbs "The Scourge of Error Handling" covering mainly C-descended languages; this blog post stemming from OTN discussions of Oracle exception handling in PL/SQL. I've been on both sides of the fence (cowboy coder and database programmer) so I'll offer my take. I'll write mostly from the perspective of PL/SQL stored procedures, because that's where I've coded the biggest variety of error handlers. The same opinions apply to my programming in Python, Javascript, Java, or C++, that's just a lower volume of code.

I have a lot of sympathy for the Dr. Dobbs article's position that we've neglected error codes too much. I also don't mind PL/SQL's WHEN OTHERS clause as much as some people do.

It's true that exception handling is often used as a kludge that winds up simulating GOTOs and producing spaghetti code. In fact, I don't think I've ever seen a substantial body of code that used "admirable" exception-handling practices. People know some but not all of the things that could throw exceptions, so their try/catch blocks are insufficient. If you really want to protect and debug every operation that could throw an error, you need way more exception handling than anyone does in practice. (In theory, you could get some sort of out of memory, memory corrupted, or disk full error at any random time.) If you wind up maintaining someone else's code, basically any new bug is going to highlight the need for more granular exception handling, or dealing with some never-before-seen condition, or better recovery code in the caller.

For example, I was the author of the Oracle PL/SQL User's Guide for a time, and wrote more than 100K lines of PL/SQL code. I had many first-time encounters with hard-to-debug exceptions in purely PL/SQL operations, for things like nonexistent collection items, index out of range, bad datatype conversions, and so on. Those were often in query-intensive pieces of code where there was no need for a rollback, just an error message for a user or a slightly less-than-optimal query result for a search engine result page. The structure of stored procedures dealing with collection types often required putting separate try/catch blocks around every reference to an index variable, or constructing some debugging metadata using extra variables designating how far the routine had gotten. That didn't feel right; once the language leads you down a path that seems clunky, coders are less likely to stick to best practices in all other areas.

I also dealt with built-in PL/SQL stored procedures where I didn't know in advance what exceptions could be raised, and so learned over time what were the most likely ones to come back based on my imperfect understanding of the internal workings of some package. But how could I ever predict in advance all the things that could go wrong when settings were changed, the database upgraded, or some hardware limit was reached that I couldn't anticipate in a test environment? The people writing such routines, who have greater expertise with the internal operations, can just RAISE any exceptions and put the burden of predicting and handling them on you, the ill-informed caller. That feels like the wrong way round.

For those reasons, I wouldn't demand that WHEN OTHERS always propagate an exception back to the caller. It could be protecting a purely optional section of code, for example one that logs error conditions. (Which is already likely to be called when things are messed up in unexpected ways.) Or that produces some optional UI adornment that can be skipped in the name of keeping things running in the face of moderate problems. Personally, I'm happy when PL/SQL stored procedures tackle some new problem domain where not every code path is mission critical or represents a DML operation inside a transaction.

As a developer, you also don't necessarily get to control how unhandled exceptions are displayed or logged all the way back at the web layer. For example, the PL/SQL web gateway by default shows a stack trace, which makes unhandled exceptions easy to debug in a development/test environment. But that might be turned into a generic 404-style error page on the production web server, meaning it's just as well to handle the exception entirely yourself (WHEN OTHERS with no RAISE) and put up your own error page or make subtle changes to your page of search engine results or whatever.
The best argument I've seen against WHEN OTHERS THEN <something besides raise> is that the system could be so hosed that another exception occurs doing something you thought was completely innocuous inside the handler, leading to an infinite loop of exceptions. I've never encountered that myself, so I can't say how that works in practice.

Getting away from PL/SQL for a second...

In Javascript, you're almost always dealing with non-critical code paths. Which makes it more practical to do less granular error handling. (If errors X, Y, or Z occur during this function, don't display the ad widget we were trying to retrieve.)

Python's nice robust standard library and introspection makes it very pleasant to write exception handlers for routines you didn't write yourself. There are plenty of things you can do with scripting languages where, if you run out of memory or out of disk space, you can live with a sudden halt and an error report rather than extensive recovery actions. So in Python (and Perl, et. al.) I don't mind having a top-level exception handler and not going overboard with try/catch blocks for every operation that could fail.