Thursday, May 29, 2008

How PL/SQL Could Be More Like Perl

An idea has been rattling around in my head for a while now. I know, I know, there's a lot of empty space so it makes a lot of noise. :-) Could PL/SQL become more popular and easier to use by taking a couple of points from the Perl playbook?

I read a blog post about how scripting languages are or aren't gaining on Java. I think supporting curly-brace syntax would be a nifty way to encourage PL/SQL adoption among both scripters and Java coders. But maybe that's just me.

My first impressions of Ada and then PL/SQL were somewhat negative: hey, this verbose BEGIN/END, IF...THEN...END IF syntax reminds me of Pascal. I first programmed in Pascal on the Commodore 64, and it was deadly slow to compile and run, owing to the whole UCSD P-Code business. Then at university, it was the first language for coursework, and even though it ran really fast on UNIX boxes, I always felt that behind the scenes, more deadly slowness was occurring.

These days, 95% of the languages I code in use curly braces for blocks:

if ... { ... }
function ... { ... }
for ... { ... }

Imagine if PL/SQL did the same. Suddenly, brace matching in vi/vim would work. (Hit % while cursor is on a brace character to find the matching brace at the other end of the block.) It would be unlikely to introduce logic errors; in my experience, mis-nesting of blocks is not a frequent source of errors in Perl or Java, and in deeply nested PL/SQL, even today a careless coder can just add END IF statements until the syntax error goes away. I sometimes write little comments after the END IF, END LOOP, etc. to remind myself of which block each one closes.

On the documentation end, the PL/SQL User's Guide and Reference, aka PL/SQL Language Reference in 11g, has never devoted reference topics specifically to "BEGIN Statement", "END Statement", "END IF Statement", etc. Instead there are generic topics like "Block Declaration", or a section on "LOOP Statements" that doesn't split out the "END..." part as a separate item. That's always made it hard for me to write auto-linking pretty printers for PL/SQL (i.e. no "BEGIN" topic to link from the "BEGIN" keyword), but it would make syntax enhancements slot right into the docs.

From a parsing standpoint, I think it's pretty straightforward. If you leave out a THEN or put an END IF where an END LOOP is expected, PL/SQL gives you a straightforward error message. It knows what block is open and what the expected closing symbol is. So I imagine that it's practical to parse syntax like:

procedure foo is
x number;
{
if 1 > 0
{
for bar in (select * from t1)
{
x := 3.1;
declare
x varchar2(64);
{
x := 'hello';
}
}
}
}


{ takes the place of THEN, LOOP, and BEGIN. If a { is seen where one of those tokens is expected, treat it like the equivalent block beginning keyword. When a } is seen, treat it like whatever block closing keyword is valid at that spot. When a { is seen with nothing special preceding it, treat it like a nested BEGIN.

When I think about it more deeply, I see potential problems around ELSE and WHEN clauses, where those keywords act as both the end of one block and the beginning of another, but anyway this whole idea is probably just a fantasy on my part.

One other Perlism that really struck home lately. Perl lets you get away with a comma after the last item within an array-style declaration. This seems like a minor point, but when writing programs that generate other programs (or anyway JSON-style data structures), it is sometimes difficult to give special treatment to the first or last data item. For example, with SQL or PL/SQL it's simple to write out 1000 INSERT statements, each ending with a semicolon. But let's say you have an XSLT transform with a FOR-EACH loop that produces an initialization statement for a data structure. The output goes

'item1',
'item2',
'item3',
...
'itemN',

and then you're left with that final trailing comma that you need to get rid of somehow. I'll often put the comma at the beginning of each line for just that reason, and to allow easier reordering of parameters and such. But then the first item is a special case. Think of how many times you have to introduce a Boolean variable to make the output slightly different for the first or last pass through a loop.

I used to think Perl's optional no-op trailing comma was just a flippant design choice, but now I think it was a very shrewd way to make dynamic programming easier. One that some other languages could emulate.

Tuesday, May 27, 2008

Doubleplusgood vi Shortcut for PL/SQL Coders

One thing that bugs me (no pun intended) about PL/SQL syntax is the lack of the ++ and similar operators. If you use long variable names, you're penalized by having to type them twice in a simple increment operation:

rows_processed := rows_processed + 1;

I use vim, a cross-platform vi clone, for editing PL/SQL code. vim has a lot of extensions that the old-school vi user might never realize. One such is Ctrl-P when in insert mode. It looks backward ("previous") for the last matching word, and fills it in. For example, you could write the line above by entering:

rows_processed := <Ctrl-P>

vim fills in the variable name again, then you continue entering "+ 1;".

If the variable name you want is not the very last word, you can enter one or more letters first and vim will fill in the last word that starts with that prefix. For example:

toggle_switch := initial_value - t<Ctrl-P>;

produces:

toggle_switch := initial_value - toggle_switch;

The matching isn't limited to the current line, so you could fill in variable, procedure, etc. names from much earlier in the file.

Vim has a whole set of such extensions. Some of them like 'ab' are familiar from the original vi days, but you can boost your productivity by brushing up on the newer techniques.

Wednesday, May 14, 2008

Query vs. CREATE TABLE AS SELECT

Half the time, I want to know some answer that can be determined with a query. The other half of the time, I want to start from that answer and use it as an intermediate step towards some larger solution. (That's not even considering the third half... :-)

So, I have a bunch of .sql files that do SELECT statements. Sometimes, I end up editing those .sql files into new ones that do CREATE TABLE AS SELECT... using the same query. I don't like the proliferation of .sql files that are slight variations of each other. That's the aspect of SQL coding that I would most like to see be object-oriented, the many SQL scripts that differ from each other only in terms of connection string, table name, or other minor detail.

Now, I know the WITH clause is there for pulling out a query to reuse its results as part of a bigger query. But sometimes the natural-feeling technique is to do CREATE TABLE AS SELECT colname... and then several other queries with clauses WHERE colname IN SELECT * FROM newly_created_table.

I had a sense there must be some shortcut to turning the original query into a CTAS construct. And there is! Remember how SQL*Plus will happily substitute the contents of a script anywhere you start a line with an '@' sign? Well, turns out you can use this technique to substitute a script as the last half of a SQL statement:

- Create a .sql file that has a single SELECT statement.
- Issue a CTAS statement like so:

create table t1 as
@script_with_select_statement

The trick is to make sure the SQL script has no blank lines, as they will cause SQL*Plus to stop processing the command prematurely. '--' comments are OK, so you can comment and format the script using -- lines where you would normally use blank lines.

CTAS is a bit of a blunt instrument, especially if the same situation comes up again and again. I may adapt this technique to use with CREATE GLOBAL TEMPORARY TABLE t1... and INSERT INTO t1 SELECT... constructs.