Thursday, May 29, 2008

How PL/SQL Could Be More Like Perl

An idea has been rattling around in my head for a while now. I know, I know, there's a lot of empty space so it makes a lot of noise. :-) Could PL/SQL become more popular and easier to use by taking a couple of points from the Perl playbook?

I read a blog post about how scripting languages are or aren't gaining on Java. I think supporting curly-brace syntax would be a nifty way to encourage PL/SQL adoption among both scripters and Java coders. But maybe that's just me.

My first impressions of Ada and then PL/SQL were somewhat negative: hey, this verbose BEGIN/END, IF...THEN...END IF syntax reminds me of Pascal. I first programmed in Pascal on the Commodore 64, and it was deadly slow to compile and run, owing to the whole UCSD P-Code business. Then at university, it was the first language for coursework, and even though it ran really fast on UNIX boxes, I always felt that behind the scenes, more deadly slowness was occurring.

These days, 95% of the languages I code in use curly braces for blocks:

if ... { ... }
function ... { ... }
for ... { ... }

Imagine if PL/SQL did the same. Suddenly, brace matching in vi/vim would work. (Hit % while cursor is on a brace character to find the matching brace at the other end of the block.) It would be unlikely to introduce logic errors; in my experience, mis-nesting of blocks is not a frequent source of errors in Perl or Java, and in deeply nested PL/SQL, even today a careless coder can just add END IF statements until the syntax error goes away. I sometimes write little comments after the END IF, END LOOP, etc. to remind myself of which block each one closes.

On the documentation end, the PL/SQL User's Guide and Reference, aka PL/SQL Language Reference in 11g, has never devoted reference topics specifically to "BEGIN Statement", "END Statement", "END IF Statement", etc. Instead there are generic topics like "Block Declaration", or a section on "LOOP Statements" that doesn't split out the "END..." part as a separate item. That's always made it hard for me to write auto-linking pretty printers for PL/SQL (i.e. no "BEGIN" topic to link from the "BEGIN" keyword), but it would make syntax enhancements slot right into the docs.

From a parsing standpoint, I think it's pretty straightforward. If you leave out a THEN or put an END IF where an END LOOP is expected, PL/SQL gives you a straightforward error message. It knows what block is open and what the expected closing symbol is. So I imagine that it's practical to parse syntax like:

procedure foo is
x number;
{
if 1 > 0
{
for bar in (select * from t1)
{
x := 3.1;
declare
x varchar2(64);
{
x := 'hello';
}
}
}
}


{ takes the place of THEN, LOOP, and BEGIN. If a { is seen where one of those tokens is expected, treat it like the equivalent block beginning keyword. When a } is seen, treat it like whatever block closing keyword is valid at that spot. When a { is seen with nothing special preceding it, treat it like a nested BEGIN.

When I think about it more deeply, I see potential problems around ELSE and WHEN clauses, where those keywords act as both the end of one block and the beginning of another, but anyway this whole idea is probably just a fantasy on my part.

One other Perlism that really struck home lately. Perl lets you get away with a comma after the last item within an array-style declaration. This seems like a minor point, but when writing programs that generate other programs (or anyway JSON-style data structures), it is sometimes difficult to give special treatment to the first or last data item. For example, with SQL or PL/SQL it's simple to write out 1000 INSERT statements, each ending with a semicolon. But let's say you have an XSLT transform with a FOR-EACH loop that produces an initialization statement for a data structure. The output goes

'item1',
'item2',
'item3',
...
'itemN',

and then you're left with that final trailing comma that you need to get rid of somehow. I'll often put the comma at the beginning of each line for just that reason, and to allow easier reordering of parameters and such. But then the first item is a special case. Think of how many times you have to introduce a Boolean variable to make the output slightly different for the first or last pass through a loop.

I used to think Perl's optional no-op trailing comma was just a flippant design choice, but now I think it was a very shrewd way to make dynamic programming easier. One that some other languages could emulate.

2 comments:

thomas.strub said...

To remove the initial extra comma i usally use syntax like:

v_sql_stmt := 'select --' ;
for cur in (select name, kriterium from adressguard where aktiv = 'ja' order by id)
loop
v_sql_stmt := v_sql_stmt || ', ' ||chr(10)
||' s_' || cur.name || chr(10)
|| ', ' || cur.kriterium || ' ' || cur.name ;
end loop;

Problem is that it doesn't help in the case where you produce output which has no comment starting in the line.

knut said...

Note that this is also allowed in many c-style languages including Java.