Sunday, December 30, 2007

What Is Scripting, Anyway?

[Some thoughts prompted by the article "Programming is Hard, Let's Go Scripting" by Larry Wall, father of Perl.]

How to define scripting and scripting languages? For me, as a long-time delver into compiler details, there's one key distinction between scripting languages and other ones. Scripting languages bypass all the low-level conventions for interlanguage calls, and take the shortcut of encoding everything as text. Usually, such conventions take up one or more chapters in a compiler manual -- how is a float represented, are integers big-endian, what's the memory layout of an array, structure, etc. The scripting language expects that all of this will be passed around in ASCII format via pipes, text files, and command-line flags.

Unix shell scripting is the natural building block for all of this. Consider a pipeline like this:

du -k | sort -n | grep -i Downloads

The first command gives the size, in kilobytes, of each subdirectory starting with the current directory. The second command sorts this output while treating the first field as a number; for example, "99 Directory 1" comes before "100 Directory 2", even though a strict alpha sort would compare the leading characters and put "1" before "9". And the third command plucks out lines containing a certain string, ignoring case; the directory names didn't get lost while we were doing that numeric sorting.
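To make the sorting point concrete, here is a small sketch that feeds a few made-up du-style lines (the sizes and directory names are invented for illustration) through the same kind of pipeline, once with a numeric sort and once with a plain one:

```shell
# Made-up sample of du-style output: a size, then a directory name.
sample='100 ./Downloads/music
99 ./Documents
7 ./downloads'

# Numeric sort compares the leading field as a number: 7 < 99 < 100.
numeric=$(printf '%s\n' "$sample" | sort -n)

# Plain sort compares character by character, so "100" lands before "7".
# LC_ALL=C pins the collation so the result does not depend on the locale.
lexical=$(printf '%s\n' "$sample" | LC_ALL=C sort)

# Filtering after the sort keeps whole lines, so size and name stay together.
matches=$(printf '%s\n' "$numeric" | grep -i downloads)

printf 'numeric:\n%s\n\n' "$numeric"
printf 'lexical:\n%s\n\n' "$lexical"
printf 'matches:\n%s\n' "$matches"
```

Every stage sees nothing but lines of text; no stage needs to know how another one represents a number internally.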

Perl descends from this lineage with its backticks, built-ins like eval and grep, and here-documents. All very familiar. But Perl's object orientation really strays from its roots as a scripting language. All of a sudden, you have to think intently about how hashes are laid out. You can write script-like programs using Perl's object-oriented notation with packages written by someone else, but the OO code behind your own packages doesn't feel very scriptish, in the same way that C isn't a scripting language even though grep et al. are written in C.

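The nicest Perl touch I see in this area is in conversion between strings and numbers. For example, you can say '$foo = "00000";', then apply $foo++ repeatedly, and you'll find the value gets printed as "00001", "00002", etc. (This is specific to the ++ operator on a variable that has only been used as a string; ordinary arithmetic such as $foo += 1 treats the value as a number and yields plain 1.) Very nice for formatting sequence numbers for IDs.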

Early vs. late binding. Now we're getting into familiar territory for Oracle developers. By default, most PL/SQL uses early binding -- it's easy to know ahead of time which tables, columns, types, etc. a PL/SQL subprogram will use, and whether it can use them successfully. (Just look at whether the stored procedure or function is valid.) Venture into dynamic SQL, such as the EXECUTE IMMEDIATE statement, and now it's not so easy to guarantee correctness or reliability. One mistake in string concatenation logic, and you'll try to reference a non-existent table or column, send badly formed literals, and so on -- whole new classes of errors you never had to deal with before. Normally, I come down on the side that favors greater flexibility, but I haven't found PL/SQL's restrictions to be too onerous considering how important it is to ensure data integrity. Other aspects of PL/SQL that incorporate late binding -- and hence are more comfortable for long-time scripters -- are the anonymous block (and it's always fun to generate the code for an anonymous block from some other program written in Perl or what have you), and the %TYPE and %ROWTYPE qualifiers, which insulate programs to some degree against changes wrought by ALTER TABLE.
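As a rough sketch of generating an anonymous block from a script, the following builds the block's text with a here-document. The table name employees and the variable names are hypothetical, and the script only produces text; actually running the block would mean feeding it to a client such as SQL*Plus, which is not shown.

```shell
# Hypothetical table name; in practice it might arrive as a command-line flag.
table=employees

# Build the anonymous block as plain text. The shell substitutes $table
# before the database ever sees the code.
block=$(cat <<EOF
DECLARE
  n NUMBER;
BEGIN
  SELECT COUNT(*) INTO n FROM $table;
  DBMS_OUTPUT.PUT_LINE('$table rows: ' || n);
END;
/
EOF
)

printf '%s\n' "$block"
```

Because the table name is spliced in as text, a typo in it would only surface when the generated block is submitted to the database, which is the late-binding trade-off described above.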

Wordiness vs. use of symbols. Here is where I could imagine some improvements in PL/SQL. Why not allow { } instead of BEGIN/END, LOOP/END LOOP, and THEN/END IF for blocks? As Larry notes, scripting languages occupy the whole spectrum when it comes to verbosity.

PL/SQL also has the historical capitalization convention of all-caps for built-ins, and lowercase for user-defined names. Here, I firmly agree with Larry:

"Actually, there are languages that do it even worse than COBOL. I remember one Pascal variant that required your keywords to be capitalized so that they would stand out. No, no, no, no, no! You don't want your functors to stand out. It's shouting the wrong words: IF! foo THEN! bar ELSE! baz END! END! END! END!"

It's really the subprogram calls and variable references you want to see clearly when skimming code. As someone involved with doc, I know how nice it is to write "the ALTER TABLE statement" and know the emphasis will be preserved in all printed incarnations, even down to man pages, in a way that isn't guaranteed for bold, italics, or monospace. Yet a concern for code clarity means I'm perfectly happy writing INSERT, BEGIN, etc. in uppercase in running text, and lowercase in my own code. PL/SQL also has the conundrum of so many Oracle-supplied packages. Since DBMS_OUTPUT.PUT_LINE comes from Oracle, should it be written all-caps? Or since it obeys the same lexical rules as some package and subprogram you write, should it be lowercase? I prefer to stick with lowercase for everything in source code, even if that means terms can't be cut and pasted between code and running text.
