Saturday, April 21, 2007

Perl vs. PL/SQL

When I started learning PL/SQL, I was gratified to find that my whole Perl way of working translated easily into doing stored procedures.

Most small Perl scripts that do transformations are built around a loop like this:

@lines = ; # read entire contents of file
for $line (@lines)
{
# do some search/replace on each line
# print the result, or write to another file
}

PL/SQL has similar compact notation for doing a query and looping through the results:

for item in (select * from some_table)
loop
# the name "item" is arbitrary; that variable doesn't need to be declared
# refer to columns using the notation .
end loop;

As the stored procedures get more elaborate, you can get into different syntax like the FORALL statement for issuing lots of INSERT/UPDATE/DELETE statements, or doing BULK COLLECT to pull the full results of a query into a PL/SQL collection variable to inspect the whole thing. But the construct above is fine for most typical query / reporting scenarios.

The similarity between the 2 languages is the reason that I've never used the Perl DBI modules for database work. It's too easy to write out real SQL statements or PL/SQL blocks using a HERE document, invoke SQL*Plus using backticks, do one-liners in Perl to just call PL/SQL stored procedures... just generally hand off the database logic. I even see parallels in the way object-oriented constructs were added to both languages in ways that don't especially appeal to me. (If I want to do object-oriented programming, Ruby is the most appealing choice for me.)

First thoughts about Python vs. PL/SQL

I successfully ignored Python for a long time -- it has the curse of starting with the letter "P", and I already know too many languages like that! But Google uses it a lot, there must be something to it, they even hired its creator Guido van Rossum. So when I saw this post from a Googler with a spelling checker in 21 lines of Python code, I thought that would be a good starting point. Long ago, I wrote a subset of this functionality (spelling correction by checking transpositions) in 64 lines of PL/SQL, but didn't find the general technique efficient enough to be worth putting into my production code. (Although if I devoted a whole server to nothing but that code, who knows.)

I'm only on my second pass through the Python tutorial, but already I'm gaining some perspectives on the language.

One good way to assess what a language is for is, what can you do with a one-liner. Python has some pretty good and easy to understand one-liners for composing and decomposing data structures. For example, in a search engine you might see a search term like:

"create table" oracle

and want to turn that into a list structure like:

create table
oracle

In Python, the decomposition part is a one-liner:

term = '"create table" oracle'
decompose = [x.strip() for x in term.split('"') if len(x) > 0]

(OK, you would do more processing than this to account for cases with no quotes in the term, but let's ignore that for the moment.) The FOR and IF parts of the expression above are really working like clauses in a SQL query. We're stepping through items in a data structure, deciding which ones to take, and running a transformation function on each one as it's pulled out.

That's a typical thing to do in a SQL query, where the original data structure is pretty solidly defined. You can also do it on an in-memory data structure using the PL/SQL construct TABLE(CAST ...)). But the PL/SQL technique is kind of obscure. You have to define local types, which makes the code verbose. I didn't know the details of this technique until after I stopped writing the PL/SQL Guide; I would see the idiom used but there was nowhere in the book that documented this syntax. (For examples of this PL/SQL usage, see these code listings.)

When I look at certain other languages, they clearly complement SQL and PL/SQL. You would handle in the other language the things that are difficult to do in the database, and let SQL and PL/SQL do the queries, joins, hashing, etc. for large-scale data manipulation.

Python strikes me as different, because a lot of its operations mirror typical database operations. I've never been comfortable with the attitude of doing all the logic on the client side and just using the database for a dumb storage engine; seen too much bad Java code to trust that the logic will produce squeaky clean data! But with Python I could imagine getting the logic right, serializing the data back into the DB, and manipulating it equally well in memory or on disk.

The one-line decomposition from my example could be done with PL/SQL. Just with more verbosity, various SUBSTR() functions and || concatenation operators, and yet more local type declarations. And those type declarations have to be moved around if the program expands; say you want to pass around procedure parameters of complex types, now you have to declare those types in a package. It's all just a big Turing machine after all, but there is something about the way a language puts barriers around certain operations and makes you reinvent the wheel with low-level code, that locks people into expectations of what is and isn't practical.

Another example. Python has a simple way to transform hash-like tuples into a true hash table:

tuples = [('Canada','Ottawa'), ('Greece','Athens')]
hash = dict(tuples)
print hash

gives:

{'Canada': 'Ottawa', 'Greece': 'Athens'}


Hash tables (aka associative arrays) are relatively new to PL/SQL. One feature that I really would like is the ability to query data directly into the key/value parts of an associative array, something like:

select country, capital bulk collect into ;

Right now, you can query all the countries into one array and all the capitals into another, but then you have to loop through them all and construct a new hash table from the 2 arrays. Seems like a logical shortcut to allow queries to produce hash tables of the form { : : }.

Database Language Smackdown

I've tried just about every language I could get my hands on, at one time or another. For the last decade, the big focus was usually to see how well they worked in conjunction with a database.

My big epiphany was taking ownership of the Oracle Database PL/SQL User's Guide and Reference, which I owned for the 9i release and have since relinquished. At the same time, I was writing what now comprises about 100KLOC of PL/SQL code for "Project Tahiti", the database that provides search, hypertext navigation, remote lookup, and related capabilities for the Oracle documentation. (Of course, as is often the case, my biggest insights about the strengths, weaknesses, and best practices for PL/SQL came after I stopped owning the book.)

One thing I always wanted to do was some compare-and-contrast essays for different languages, with my opinions of how they complemented Oracle's SQL dialect and PL/SQL procedural language, and the easiest way for a programmer in language X to approach PL/SQL.

So, that's what I'll do in a series of blog posts. Stay tuned.