Saturday, June 21, 2008

For Want of a %s...

Today's post involves a three-way grudge match among usability, security, and inertia.

One of the basic tenets of usability is to answer user questions clearly and concisely. One concrete technique is to use placeholder strings liberally in messages (typically coded in C-like languages as "%s" for strings or "%d" for numbers).

This notion pops into my head whenever I'm doing anything significant in SQL*Plus. Here's a condensed excerpt from a typical log file of mine. The questions that run through my mind aren't that deep -- what table is being updated? What index, package, procedure, etc. is being created? It seems redundant for me to have to put that information into PROMPT commands.

34094 rows updated.
Commit complete.
Index created.
Table truncated.
Package created.
Package body created.
No errors.
Package created.
No errors.
Function created.
No errors.
Procedure created.
No errors.

I never pushed too hard though to get a change like this into SQL*Plus, because just imagine how many test cases and automated builds expect to see some exact sequence of output that would suddenly be different. I expect it would require some sort of SET option within SQL*Plus, which in turn would involve new documentation, a long adoption cycle, scripts that used the new SET option would throw errors for older versions of SQL*Plus... all of a sudden it's not so clean.

Another possibility for this usability technique is the set of database error messages. How many messages say that some limit has been exceeded, or there's a problem with some object, where the erroneous thing that was encountered, or the correct thing that was expected, is never specified?

Again, test cases are difficult to change. Extensive error messages are also tricky for translation, because maybe the placeholder needs to be moved around because of different sentence structures in different languages. Maybe the value of the placeholder causes a ripple effect elsewhere in the message, as with the "le / la" distinction in French depending on whether a noun is considered masculine or feminine.

The lesson here is that these little touches are important to bake in from the beginning.

Security is another consideration that wasn't so obvious when the usability bibles were being written. If someone can induce an application to print a < character where one isn't expected, and the output is written to a web page, presto an attacker can construct a string to be echoed in an error message with a <script>, <iframe>, or other tag that can change the contents of the page as part of a cross-site scripting attack. So there's an extra level of scrutiny for every message string, to ensure that it doesn't have disallowed characters, that the substitution text isn't too long, and so on.

Saturday, June 7, 2008

The Goals of a Computer Science Education

I'm far enough along in my UC Berkeley Comp Sci refresher podcasts -- 61A (Structure and Interpretation of Computer Programs), 61B (Data Structures), and most of 162 (Operating Systems) -- that it feels like time to make some blog posts about computer science in general.

After all, one of the perennial topics for argum^H^H^H^H^H discussion is whether it's better to hire a Comp Sci major, a naturally talented coder, or someone with a lot of vocational training and experience. What anyway are the goals of a Comp Sci degree?

IBM has a multidimensional measure of software quality and/or customer satisfaction called CUPRIMD [that link is a PDF file]. Standing for Capability, Usability, Performance, Reliability, Installability, Maintainability, and Documentation. Never one to leave well enough alone, IBM later extended this acronym to CUPRIMDSO by adding measurements for Support and Overall. I contend that the goal of computer science is really all about the CPR in that acronym: capability, performance, and reliability.

For example, CS 61A focuses on capability (how do you write a program to accomplish X?) and reliability (what practices are needed to ensure the answer is right and cover all the possible inputs?). CS 61B focuses on capability (how do you sort, find the highest/lowest value, or transform a data structure?) and performance (is it faster to sort this way or that, what are the best and worst cases to do such-and-such a transformation on this type of data structure?). And CS 162 focuses on reliability (how do we keep a process from writing over another's memory, how do we avoid deadlock?) and performance (how do we keep address translation, resource tracking, et al from grinding the system to a halt?).

I know that all the while, students are exhorted about documentation ("write good comments", "learn Javadoc notation"), and later on, depending on the institution, there might be optional branches off into specialties for technical writing and usability. But those are a bit squishy for the core Comp Sci goals. The best technical writers put their deep knowledge aside to write things that are mostly correct and comprehensive, in words that an eight-grader can understand. The questions "is Ruby on Rails usable?" and "is this ATM touchscreen usable?" lead off in very different directions.

The goal of teaching someone who can make the most reliable, fastest, and fullest-featured program or system runs into some conundrums in the real world.

Today, if you want to enable your company to do something it couldn't do before, you might do a "View Source" on a web page to see how to do a certain effect in CSS, or adopt a new language or code library that solves your toughest problem in one line of code. So capability is not really such a distinguishing factor between people with higher education, natural skill, or on-the-job training.

That leaves us with performance and reliability. That's where the real paradoxes come in.

You might call in someone with a CS degree, who turns an O(N**3) solution into O(N  log N) and so saves untold hours of waiting. Or they might reduce memory requirements so that the underpowered machine devoted to an application suddenly runs like a champ. Or they might network together several servers to turn formerly idle time into productive work.

On the other hand, if you have something that works most of the time but occasionally gives a wrong answer or hangs, the Comp Sci solution may add quite a lot of complexity and overhead to make things 100% reliable. UCB's operating systems course really puts that into perspective, taking me down memory lane from the Commodore 64 days of gleefully smashing the stack, to Motorola 68000 and the Amiga 1000 where there was no paging or swapping and we dreamed of the 68020 with MMU, to the world of today with robust multitasking and memory management all accelerated by hardware. Not a day goes by that I don't say to myself, things were a lot faster in the '80s. :-)

I see echoes of these CS goals in the "tough tech interview"-style word problems that people post from time to time. (C'mon, admit it, you've looked at them.) Hey, this one is really recursive unwinding, that one's a merge sort, I'll bet the clock algorithm would work here, and so on. For the most part, all drawn from those early fundamental CS courses. As an employer, you want your recent grads to really have internalized those lessons, rather than deciding that this theory stuff isn't important in the real world. And you want your industry hires to not have completely forgotten the basics after years of dealing with COTS ("commercial off-the-shelf") software and office politics.

Tuesday, June 3, 2008

How PL/SQL Could Be More Like PHP

This is really a trick subject line -- PL/SQL already has the capability to be very much like PHP. PL/SQL Server Pages lets you write template HTML files including PL/SQL parameters, expressions, and blocks, and turn them into stored procedures that produce HTML output using HTP.P() procedure calls. The angle-bracket syntax around the expressions and blocks is very similar to that of PHP. And a PL/SQL coder has a headstart when it comes to optimizing DB performance and avoiding security problems.

PL/SQL Server Pages was the first web-related feature that I documented at Oracle. When I started using it myself, eating my own dogfood so to speak, that's when the potential functionality enhancements sprang to mind.

PHP is so popular because it's all based around text files, you can modify your source with any kind of HTML tool, and you don't have to do anything special to get the code to run, just put the files in the right places.

PL/SQL Server pages are also based around text files, you can also run them through HTML Tidy, and edit them in DreamWeaver or what have you. However, to use them requires the loadpsp command. With this command, you specify the user ID and password on the command line. That's not the ideal mode of loading code if you normally sit at the SQL*Plus command prompt, already logged in, and load new/changed stored procedures via '@script_name'.

With PHP, the way to load code into different parts of the application structure is to put the files in different subdirectories. The equivalent for PL/SQL Server Pages would be to use a template file to generate a packaged subprogram, an internal procedure, or a method for an object type. But, to my knowledge, loadpsp can only create a top-level procedure, which best practice says you should keep to a minimum.

Now, all of this could be worked around if loadpsp could generate the source code for the procedure and stop there, leaving the file behind instead of loading it into the database. You could set up a makefile to transform each .psp file into a .sql file. You could load the .sql files into the database from inside SQL*Plus without doing a separate login each time. You could set up a script using a SQL*Plus HOST command to go through both steps. Perhaps that would incite the SQL*Plus dev team to create a new directive similar to '@' that would run loadpsp and then '@' the resulting output file. If the output file could start with 'PROCEDURE ... IS', leaving off the initial 'CREATE OR REPLACE', you could use my favorite trick of:

package body foo is

to load PSP-generated pages as packaged procedures or object methods.

I never quite bought into the idea that with PHP, coders and designers would happily edit the same files, passing them back and forth and limiting their edits to either the HTML tags or the code blocks depending on their role. In my experience, every expert tends to have ingrained assumptions that don't work well with top-down planning. The designer tasked with making a more attractive-looking page will go off and write HTML that's structured very differently from the existing page, that PL/SQL blocks and widgets can't be neatly dropped into. The PL/SQL coder tasked with adding some dynamic elements to the page will chafe at working inside markup produced by some HTML authoring tool they disdain; sometimes, it's too painful even to read past the DOCTYPE tag. :-)