Saturday, April 21, 2007

First thoughts about Python vs. PL/SQL

I successfully ignored Python for a long time -- it has the curse of starting with the letter "P", and I already know too many languages like that! But Google uses it a lot, there must be something to it, they even hired its creator Guido van Rossum. So when I saw this post from a Googler with a spelling checker in 21 lines of Python code, I thought that would be a good starting point. Long ago, I wrote a subset of this functionality (spelling correction by checking transpositions) in 64 lines of PL/SQL, but didn't find the general technique efficient enough to be worth putting into my production code. (Although if I devoted a whole server to nothing but that code, who knows.)

I'm only on my second pass through the Python tutorial, but already I'm gaining some perspectives on the language.

One good way to assess what a language is for is, what can you do with a one-liner. Python has some pretty good and easy to understand one-liners for composing and decomposing data structures. For example, in a search engine you might see a search term like:

"create table" oracle

and want to turn that into a list structure like:

create table
oracle

In Python, the decomposition part is a one-liner:

term = '"create table" oracle'
decompose = [x.strip() for x in term.split('"') if len(x) > 0]

(OK, you would do more processing than this to account for cases with no quotes in the term, but let's ignore that for the moment.) The FOR and IF parts of the expression above are really working like clauses in a SQL query. We're stepping through items in a data structure, deciding which ones to take, and running a transformation function on each one as it's pulled out.

That's a typical thing to do in a SQL query, where the original data structure is pretty solidly defined. You can also do it on an in-memory data structure using the PL/SQL construct TABLE(CAST ...)). But the PL/SQL technique is kind of obscure. You have to define local types, which makes the code verbose. I didn't know the details of this technique until after I stopped writing the PL/SQL Guide; I would see the idiom used but there was nowhere in the book that documented this syntax. (For examples of this PL/SQL usage, see these code listings.)

When I look at certain other languages, they clearly complement SQL and PL/SQL. You would handle in the other language the things that are difficult to do in the database, and let SQL and PL/SQL do the queries, joins, hashing, etc. for large-scale data manipulation.

Python strikes me as different, because a lot of its operations mirror typical database operations. I've never been comfortable with the attitude of doing all the logic on the client side and just using the database for a dumb storage engine; seen too much bad Java code to trust that the logic will produce squeaky clean data! But with Python I could imagine getting the logic right, serializing the data back into the DB, and manipulating it equally well in memory or on disk.

The one-line decomposition from my example could be done with PL/SQL. Just with more verbosity, various SUBSTR() functions and || concatenation operators, and yet more local type declarations. And those type declarations have to be moved around if the program expands; say you want to pass around procedure parameters of complex types, now you have to declare those types in a package. It's all just a big Turing machine after all, but there is something about the way a language puts barriers around certain operations and makes you reinvent the wheel with low-level code, that locks people into expectations of what is and isn't practical.

Another example. Python has a simple way to transform hash-like tuples into a true hash table:

tuples = [('Canada','Ottawa'), ('Greece','Athens')]
hash = dict(tuples)
print hash

gives:

{'Canada': 'Ottawa', 'Greece': 'Athens'}


Hash tables (aka associative arrays) are relatively new to PL/SQL. One feature that I really would like is the ability to query data directly into the key/value parts of an associative array, something like:

select country, capital bulk collect into ;

Right now, you can query all the countries into one array and all the capitals into another, but then you have to loop through them all and construct a new hash table from the 2 arrays. Seems like a logical shortcut to allow queries to produce hash tables of the form { : : }.

No comments: