Saturday, August 16, 2008

How PL/SQL Could Be More Like Python

In previous posts, I suggested a couple of syntactic or thematic items from Perl and PHP that I thought would make PL/SQL friendlier and more accessible. Hey, maybe Python has some things too!

One thing to like about Python is the way its composite data structures build on top of each other. A list or a tuple can hold a bunch of numbers or strings in an array-like structure, but those things can also hold heterogeneous data items like an assortment of numbers, strings, other lists, other tuples, etc.

Another powerful idea is that a composite data structure can be used as the lookup key for a hash table / associative array / Python dictionary.

Finally, complex data structures with different numbers of elements can be passed as parameters or function return values, without the need to declare explicit types.

Underlying all this functionality, at least from a user's perspective, is that these complex data structures can be represented nicely as strings. Here's a little interactive Python output showing some list and tuple values. The >>> lines are what I entered, the other lines are how Python mangles the escape and quote characters for output.

>>> ['i do not care']
['i do not care']
>>> ['i don\'t care']
["i don't care"]
>>> ['i don\'t care about "that"']
['i don\'t care about "that"']
>>> ('i','don\'t','care')
('i', "don't", 'care')
>>> [ 1, 2, ("a", "b", ["x", 3] ) ]
[1, 2, ('a', 'b', ['x', 3])]

So we can see that there is a fairly well-defined way of making these data types into strings.

PL/SQL gained a lot of power when it got the ability to use string values as keys for INDEX-BY tables, which soon became known as associative arrays to be more familiar to people who used such things in other languages. It would be even more powerful if you could use VARRAYs, nested tables, etc. as lookup keys. You could solve problems that require a higher order of thinking than just "have I already encountered this integer" or "I will assign the symbolic name 'XYZ' to such-and-such a DATE value". You could say, "I know my web site is under attack when the counter corresponding to this combination of IP address, URL, and time interval is greater than 100".

PL/SQL can express something like this already, using nested collections. But it requires a lot of type definitions, which aren't readily reusable across different procedures without a lot of advance planning. There is a point where foresight crosses the threshold into rigidity. When dealing with in-memory data structures, I like the flexible approach of Python.

I don't think this kind of processing really requires much, if any, change in PL/SQL itself. All you would need are functions that could turn a VARRAY, nested table, etc. into a single string as in the examples above, so the string could be used as a key into an associative array. And the contents of the associative array element could be another string that represented a tuple or other data type that isn't directly supported by PL/SQL.

With a set of functions to go back and forth between strings and built-in types, you could emulate some features that are limited now by PL/SQL's type strictness. For example, it's difficult to write a function that accepts a VARRAY with 5 elements, and returns a modified VARRAY with 4, or 6, or 3 elements. So certain classes of recursive operations are not practical.

If the data structures were all passed around as strings, and unpacked into associative arrays behind the scenes, that would be powerful but slow. Perhaps the best compromise is to use a small number of uniform associative array types at the top level:

type python_dictionary_t is table of varchar2(32767) index by varchar2(32767);
type python_list_t is table of varchar2(32767) index by pls_integer;

Each string element could encode a single value, a list, a tuple, a dictionary, or whatever. But the basic datum being passed around could be one of these user-defined types.

Notice that a list in this scheme has numeric indices, yet isn't fixed size like a VARRAY. You could pass to a function a "list" of 4 elements with indices 0..3, and get back another "list" of 5, 1, 10, or whatever elements. There's nothing intrinsically forcing the index values to be sequential, that would have to be enforced behind the scenes.

Imagining how all this could be coded up inside the existing PL/SQL language, it occurs to me that the part that would be a real performance roadblock would be reassigning all the numeric index values when doing list operations like pop, push, etc. Element 0 becomes element 1, element 1 becomes element 2, element N becomes element N+1, or vice versa. So perhaps the biggest change needed to PL/SQL itself is a fast way of changing the key for an associative array element, without actually copying the element itself under a new key. Something like:

element.key(i) := j;

There could be a performance or memory hit when stringifying entire complex datatypes to use as keys for hash tables, if every complex value was passed through TO_CHAR() while evaluating it as a key. Perhaps the right approach is another built-in function like TO_KEY(), which could take something like a varray of gigantic VARCHAR2s, or an object type with member functions, and create a composite key with a bunch of MD5 or similar checksums, instead of the entire source text of the strings or member functions.

No comments: