Sunday, September 28, 2008

Thoughts on Debugging PL/SQL Web Applications

At OOW, I ran into Steven Feuerstein after seeing him demonstrate Quest Software's "Quest Code Tester" product. Considering how I might use a product like that for testing web-based applications, I suggested a couple of enhancements.

The biggest, most important procedures that I test in PL/SQL are those that generate entire web pages. For that kind of testing, you can't just look at whether data has changed in a table; you have to look at the HTML output of the procedure. In a testing scenario, that output would be in the internal buffer used by the HTP package and the others in the PL/SQL Web Toolkit.

An important procedure could generate a big web page. For that reason, I'd like to be able to compare at a finer granularity than whether the generated page exactly matches some version stored for testing purposes. I think the ideal technique would be to run a regular expression test over each line of output, and be able to check "does any part of the page match this pattern?". It's that kind of flexibility that's missing in a lot of test environments -- e.g. the anxiety that changing some text in an error message might break a test case that does an exact match on all output, instead of just looking for the error number.
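A minimal sketch of what I have in mind: a checker that scans each entry of the toolkit buffer with REGEXP_LIKE. The procedure name and error number are my inventions. (One caveat: the buffer holds chunks of up to 256 characters rather than true lines, so a pattern straddling a chunk boundary wouldn't match.)

create or replace procedure assert_page_matches (
    p_page         in htp.htbuf_arr,  -- page text drained from the toolkit buffer
    p_nrows        in integer,        -- number of buffer entries actually fetched
    p_pattern      in varchar2,       -- regular expression to look for
    p_should_match in boolean default true
) is
    l_found boolean := false;
begin
    for i in 1 .. p_nrows loop
        if regexp_like(p_page(i), p_pattern) then
            l_found := true;
            exit;
        end if;
    end loop;
    if l_found <> p_should_match then
        raise_application_error(-20100,
            'Page assertion failed for pattern "' || p_pattern || '"');
    end if;
end assert_page_matches;
/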

The contents of a web page could be unpredictable. For example, a page of search results might not be exactly the same after the search index has been refreshed. And Web 2.0-style pages could have random elements: a "Tip of the Day", a list of online friends, Twitter messages, or some set of recently viewed links. Or even just personalized text like "Hello John".

In testing, I would like to ignore all those things and just focus on the parts that vary according to the parameters. For example, in a search of Oracle documentation, if the search term is "oracle", I expect that somewhere on the page will be a "Next>>" link. If I pass in the right parameters to retrieve page 1 of results, I expect that nowhere on the page will be a "<<Previous" link. If I pass in a nonsensical search term, I expect that the page will contain a particular message saying there aren't any results. For intentional misspellings, I might want to confirm that the right "Did you mean?" message comes up.
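As a test script, those checks might look like the following, fetching the page once with owa.get_page (the call the mod_plsql gateway itself uses to drain the buffer, if memory serves) and then probing it with several assertions. The doc_search procedure, its parameter names, and the exact link text are all invented for illustration:

declare
    l_page htp.htbuf_arr;
    l_rows integer := 999999999;  -- in: max entries to fetch; out: entries fetched
begin
    doc_search(p_term => 'oracle', p_page_num => 1);
    owa.get_page(l_page, l_rows);
    assert_page_matches(l_page, l_rows, 'Next&gt;&gt;');             -- somewhere: a Next link
    assert_page_matches(l_page, l_rows, '&lt;&lt;Previous', false);  -- nowhere: a Previous link

    l_rows := 999999999;
    doc_search(p_term => 'zzxqwzzy', p_page_num => 1);
    owa.get_page(l_page, l_rows);
    assert_page_matches(l_page, l_rows, 'did not match any documents');
end;
/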

In addition, I might want to test certain invisible properties of the page, like the attributes of meta tags, links to stylesheets, or the little coded messages I embed inside HTML comment tags when I catch exceptions instead of showing them to the user.
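The same assertion mechanism covers the invisible parts; only the patterns change. These particular attributes and the ERR-nnnn comment marker are made-up examples, reusing l_page and l_rows from the block above:

assert_page_matches(l_page, l_rows, '<meta name="robots"');    -- a meta-tag attribute
assert_page_matches(l_page, l_rows, 'rel="stylesheet"');       -- the stylesheet hookup
assert_page_matches(l_page, l_rows, '<!-- ERR-[0-9]{4} -->');  -- coded exception markers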

Now, of course, someone could write a set of test cases in Perl or Python, or even shell scripts, to retrieve URLs using some combination of parameters, then do this scanning of the content. But a PL/SQL-aware tool could be more convenient by hooking into the data dictionary to see the available parameters and the procedure source, which is why I'm intrigued by the Code Tester product.

Of course, any big procedure is composed of little parts. It's those little parts that do things like returning a string in a certain format, computing what number to display on a page, opening a cursor for a query. Those procedures and functions are the easy ones to unit test in an automated way, which is what the demo focused on. If you pass this number in, do you get this number out? If you pass zero, a negative value, a too-high number, do you get back the appropriate result or the expected exception? And so on for strings, collections, and other datatypes.
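A sketch of what one of those little-part tests could look like. Here results_per_page is a made-up function that computes how many hits to display, and the expected values (a default of 10, a cap of 100, VALUE_ERROR for a negative) are assumptions about its contract:

declare
    l_n number;
    procedure check_eq(p_label varchar2, p_actual number, p_expected number) is
    begin
        if p_actual = p_expected then
            dbms_output.put_line('PASS ' || p_label);
        else
            dbms_output.put_line('FAIL ' || p_label || ': got ' ||
                nvl(to_char(p_actual), 'NULL'));
        end if;
    end;
begin
    check_eq('typical', results_per_page(25),  25);
    check_eq('zero',    results_per_page(0),   10);   -- expect the default
    check_eq('too big', results_per_page(500), 100);  -- expect the cap
    begin
        l_n := results_per_page(-1);                  -- expect an exception
        dbms_output.put_line('FAIL negative: no exception raised');
    exception
        when value_error then
            dbms_output.put_line('PASS negative');
    end;
end;
/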

The wrinkle I throw into that testing scenario is internal procedures and functions. If something is directly callable, it'll be a top-level procedure or function, or inside a package. If it's reusable, chances are high it'll be in a package. But if something is abstracted purely so that my procedure body can be written like pseudocode:

prepare_for_action();
run_query();
while results_left_to_process() loop
    process_results();
end loop;

then I'll nest those procedures and functions inside the current one. No danger of them being called by accident (or by a malicious attacker, in a web scenario with mod_plsql). No name conflict if I want to use the same step name in a different procedure. They can access all the variables from the current procedure, so I don't need to load down the calls with lots of parameters.

Automated testing for those internal procedures and functions could be tricky. They can't be called directly from unit tests outside the original procedure. Steven suggested using conditional compilation. The first idea that jumped to my mind was to generate instrumentation code and use some SQL*Plus hackery to embed it in the main procedure:

procedure test_me is
    procedure step1 is...
    procedure step2 is...
    function func1...
    -- This source file could be generated by a testing tool.
    -- But using @ halfway through a procedure is a SQL*Plus-ism
    -- that's doubtless unsupported.
    @unit_tests_for_test_me
begin
    -- Conditionally compile this block under non-testing circumstances...
    step1();
    step2();
    -- Conditionally compile this block under testing circumstances...
    run_unit_tests();
end;
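By contrast, here's roughly how Steven's conditional-compilation suggestion could work, using a made-up inquiry flag named "testing" (10gR2 or later; compile after: alter session set plsql_ccflags = 'testing:true'):

create or replace procedure test_me is
    procedure step1 is begin null; end;  -- real bodies go here
    procedure step2 is begin null; end;
$if $$testing $then
    procedure run_unit_tests is
    begin
        -- exercise the internal steps and check their effects
        step1();
        step2();
    end;
$end
begin
$if $$testing $then
    run_unit_tests();
$else
    step1();
    step2();
$end
end test_me;
/

Compiled without the flag, the test harness doesn't even exist inside the procedure; compiled with it, the unit tests can reach the internal subprograms directly, no SQL*Plus hackery required.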

Saturday, September 27, 2008

HP Oracle Database Machine

Here's the HP Oracle Database Machine from Larry's keynote. On the way out, some audience members said they were drooling over it. I don't know if that's a good idea; I didn't see anything about moisture resistance in the tech specs.

I noticed that the box was about the same height as Larry. In the same way we talk about pizza boxes, 5U vs. 10U servers, etc., will we one day measure the form factor of big servers in Larrys?

Thursday, September 11, 2008

The Roads Must Roll

I have this theory. It's a variation on the Sapir-Whorf hypothesis, which says that the language someone speaks influences their thought patterns. That is, if a language has lots of words for XYZ, the people who speak it might be preoccupied with XYZ. Science fiction fans would naturally glom onto the idea that Klingon has lots of words for war and fighting; but that would be the Sapir-Worf hypothesis.

I was reminded of this theory the other day, reading this post comparing the usability of Apple's campus in Cupertino with Microsoft's campus in Redmond. It seems logical that the founders who planned out a company headquarters would leave some mark of their own personality. And that environment would reinforce the same company culture on future generations of workers, forming a closed loop.

My theory, and I hesitate to name it after myself because maybe I'll have some better theory later, is that software (particularly networking) companies are defined by the roads around their headquarters.

Now I'm from the sparsely populated east coast of Canada. In my home town, there's not much excitement on the roads except once or twice a year, when a regatta or air show backs things up so much that it takes an hour to get out of the parking lots. There's only one major east-west highway across the province, and many communities are only reachable by boat. So it's not surprising that network-oriented companies from Newfoundland concentrate mostly on marine navigation, sonar, radar, etc.

I spent some time in Toronto. Very grid-like structure for the surface streets. Most highways are also essentially east-west or north-south. In fact, take one wrong turn or miss one exit and you'll never make it to your destination. There's also a deeply nested system of lanes on the major '401' highway, with "collector" lanes on the outside and "express" lanes in the middle. But that's a trap! The collector lanes are the ones that flow freely, the express lanes are jammed up with big trucks. Don't get too deeply nested or it'll take forever to get somewhere.

What software technology do we associate with Toronto and area? SGML and XML, via Tim Bray, Yuri Rubinsky (RIP) and SoftQuad. Lots of nesting; error detection but not a lot of error correction.

I've only visited Boston a few times, but I remember driving through the Big Dig tunnel when an ambulance sped by, siren blaring. And a dozen cars were tailgating the ambulance, passing all the drivers who kindly got out of the way. Who's in Boston? Right, Akamai, with big pipes transferring traffic as fast as they can.

Now the Bay Area has some geographical quirks that are reflected in the roadways. You can get on the wrong highway entirely, and still get to your destination. Try to figure out the interconnections between 101 and 280 through San Francisco. Cross the Bay Bridge and you're on 80, 880, and 580 simultaneously. Take 580 east from Oakland -- it's the one that goes south -- and even if you meant to take 880, in 20 miles you'll see a sign saying "880 this way", and everything is OK again. Go south on the east side of the Bay or the west side, and you'll end up in San Jose either way. It's impossible to go through the 92/880N interchange without fantasizing some better way to organize it.

Which explains why Google is so hot for directed acyclic graphs. And why Google maps will doubtless keep getting more and more features revolving around traffic routing.

Wednesday, September 3, 2008

First Thoughts about Google's Chrome Browser

I haven't tried Google's new Chrome browser yet. (Mac version if you please!)

But that doesn't stop me from having opinions. :-) For me, the most interesting aspect is the architecture behind the tabs. This whole issue has been driving me crazy for some time now. All the tabbed browsers have one shortcoming or another. Let's see if Chrome can fix some things.

For example, have you noticed that the best way to quit Firefox is to run 'kill -9 <pid>' from the command line? That's fast. Using the real Quit menu item takes forever. When I want to reboot my Mac, OS X asks Firefox nicely to do its Quit action; but it takes so long that the reboot process times out, every time. Killing the operating system process is instantaneous. It's all the same in the end; all the memory gets released. Yet I trust 'kill' to be more thorough about releasing memory, given all the leaks and fragmentation while Firefox is running. Also, Firefox is more reliable about offering to restore all the closed tabs upon restart if it was killed rather than Quit. Usually I get two offers to restore the tabs, one from Firefox and one from SessionSaver. It's a complicated dance to know which one to accept: skip the first one, and maybe the second one won't happen this time, because a Firefox upgrade has made certain extensions incompatible or because you don't get the same offer after a Quit. So for me it's kill kill kill.

I'm typically restoring ~100 tabs whenever I restart Firefox (or Safari for that matter). What does that mean? No performance or responsiveness until they're pretty much all loaded. Is it too much to ask to devote more attention to the frontmost tab in each window, and particularly the active window that I'm reading? Page down, right click, hello...? Safari tries to do this a little, for example when it defers loading things like Flash until you actually switch to that tab. The worst is when a tab has some auto-play movie or ad with audio; you hear it but you can't find the right tab to stop it. Sometimes half a dozen tabs try to play audio at once, producing complete babble.

So, it makes sense to favor some tabs over others. But which ones? Like I say, usually the frontmost tab deserves the most CPU. But just because a tab is deselected doesn't mean it shouldn't run. I want this background tab to keep running because it's playing a teaser ad before a video clip, and I'm reading something else until the ad is finished. I want that tab to get little to no CPU, because it's just playing ads with animated GIFs and sending AJAX requests for more ads in an endless loop. I'm hoping that Chrome comes with the equivalent of "Task Manager for tabs", where I can freeze certain ones, maybe blacklist sites so their tabs always freeze when backgrounded, see which tabs have gone nuts with memory and CPU, maybe clamp an upper limit on memory use for some tabs.

The question is, will Chrome simplify or complicate my browsing equation, which currently goes like this:

- Playing a Flash-based game on Facebook? Use Safari because Flash in Firefox tends to stall for a few seconds every now and then. No good for 3-minute rounds of Scramble on Facebook.

- Reading Slashdot? Use Firefox, because Safari has a Javascript bug that sometimes causes crashes when closing or even just leaving a page on that site. It's unpredictable which page, but it happens consistently for the same page. Because I occasionally slip up, I usually have half a dozen old Slashdot tabs that I have to leave open until I'm done with every other single tab and can let the browser crash without consequence. Since Chrome uses the same WebKit renderer as Safari, that's something for their team to take note of.

- Using Windows? Stay away from Firefox once it starts to get slow; switch to Safari or Opera instead. When doing Ctrl-click to open links in new tabs, sometimes there's a wait of 30 seconds or more between clicking the link and when the click is recognized. In the meantime, you have to sit there like a fool with your finger on the Ctrl key; otherwise, when the click is finally processed, the browser will forget it was a Ctrl-click and will replace the current window. I don't know if this sluggishness in registering modifier keys is something inherent to Windows event processing, a problem in Firefox's coding, or a combination (since I don't see the same effect on OS X).

- Doing research, opening several related sites in adjacent tabs? Use Firefox, because it has better tools for bookmarking a group of tabs at once. I hear there's a plugin for Safari that offers the same feature, but I'm keeping Safari plugin-free to avoid performance issues. Also, that particular plugin appears to be several years old.

- Using some business application, like one for net meetings, that only runs on IE? Well, I guess I have to run IE, via a Remote Desktop connection to a Windows machine. Not exactly the most efficient use of network bandwidth to have animation and audio streamed to a remote computer and then streamed again to a local one, but hey.

- Firefox grinds to a halt because its virtual memory size exceeds the amount of RAM on my computer? Start up Safari. One browser can apparently be in its death throes while the machine still has plenty of CPU and real memory to run a different one at full speed.

- Debugging some web code? Run Firefox for the Firebug debugger. Curse various sites that load up the error log with tons of Javascript warnings and errors from background tabs. Often, I can't tell which background tabs are causing the continuous stream of errors. Even if I can recognize which site is misbehaving, I can't necessarily find the relevant tab to shut it down. (That would be a nice addition to Firebug.)

- Reading ad-heavy sites? Use Firefox for its combination of ad-blocking and Flash-blocking plugins.

- Reading certain other sites, like the Dilbert comic strip? Use Safari. Sometimes all the ad- and Flash-blocking prevents the desired content from showing up. Even with whitelisting and the ability to selectively load Flash, I can't read the Flash-based Dilbert strips anymore in Firefox.

- Switching a lot between computers? Use Safari. Its bare-bones bookmarking features mean I don't even bother with bookmarks for it, preferring Del.icio.us instead. I will use Firefox to bookmark groups of tabs, but it's too much bother to try and synchronize them between machines. Also, I find that the Firefox auto-update system and compatibility checking for plugins mean that at any one time, several machines that from my perspective should all be on the same level of Firefox have different combinations of disabled plugins that I can never predict. One machine will check for plugin updates and re-enable something, while a different machine will report no compatible plugins found. (Might be a Mac vs. PC thing, but why would that make a difference for Firefox plugins?)