Sunday, August 24, 2008

I'm Not Dead Yet...

This piece on the always-dying-but-never-quite-dead mainframes led me down memory lane:

Interop Systems - The Open Enterprise - The mainframe isn’t dead after all

My earliest full-time job was at IBM, developing and publishing in a mainframe environment on VM/CMS. (TOROLAB4 to be precise.) Coming from a UNIX background at university, I knew there was something better. Then IBM tried moving off mainframes to OS/2. Then Windows ate OS/2's lunch. The last time I had a pleasant thought about Windows, it was Windows 95. Now I think the mainframe wasn't so bad after all.

Let's look at some of the things that were OK on the mainframe, things that have gone downhill since then and that the industry is still trying to recapture.

The storage was amazingly fast. The job queue system could take a snapshot of your entire user account in a couple of seconds, after which you could continue working on your hundreds of interlinked source files without worrying that ongoing edits would mess up the build. You might make some fixes, submit another 1/2-hour job, rinse and repeat several times while the first job was being processed. When the output files started arriving in the inbox, you would just throw out all but the ones from the last build. Today, you might have a dozen terminal windows open at once, but that's little consolation if all of them are tied up packing or transferring files, and you have to hold off on changes until that operation is finished. Those little progress indicators in sftp/scp are hypnotic. (Don't forget that scp is faster than sftp.)

Storage was also somewhat abstracted from us as users. Need more? Sure, you just have to request it. We were never sure how much it really was, as it was all in terms of cylinders. A newbie would get 10 cylinders; for a big project, maybe you'd need 40. These days, I spend far too much time running "du -k" and "df -k" -- and that's just on my home machines! I remember in the early days of DB2 on workstations, users didn't like that the performance wizard asked for the RPMs of their hard drives; that was considered arcane information. These days, any tech-savvy 12-year-old knows how many RPMs their desktop, laptop, and MP3 player drives spin at.

In the early days of AIX, they carried over some of that abstraction. You could have tons of free space on a drive, yet all filesystems showed as full. You could add space from that free pool to any filesystem; it was just a deliberate decision that couldn't be undone, so the temptation was to allocate only as much as needed. Contrast that with the need to repartition on typical PC-scale OSes.

Having a small set of storage areas (equivalent to C:, D:, etc. drives) without any subdirectories was a pain, but enforced discipline when it came to naming conventions. Today, there's plenty of wasted time trying to track down duplicate DLLs in an execution path, or arrange Java class files in a 5-level-deep directory tree. In my experience, whenever people are given the chance to make something into a hierarchy, they'll always use 1 or 2 levels too many.

Editing on VM/CMS was pretty cool too. I was never an ISPF guy, that was for the MVS crowd; for me, XEDIT was the bomb. I was heavily into automating the hide/show settings for ranges of lines, which was much more productive than a standard tree view in today's editors. The integrated REXX language allowed infinite customization and automation for editing chores.

Strangely, when REXX came to the Amiga in the form of ARexx, it didn't interest me the same way. And AppleScript, don't get me started. The major mainframe software packages that could be automated through REXX just had a more interesting or extensive set of actions.

One thing you could always do was customize the function keys. That's why they were called programmable function keys. "set pf3 = save" etc. Very convenient to have 24 custom actions available in any app. (I think you could set up both regular and shifted function keys.) The biggest problem was, too much freedom! With all those custom function keys, no standardization, not enough of an assembly line feel. Thus the rise of GUI standards in the form of Common User Access (CUA). So things everybody knows now for usability and accessibility, like "F1 = help", "F10 = activate menu bar", "tab between all controls", got their start on the mainframe.

That must have been the apex of function key usage. X Windows never seemed to do much with 'em. These days, outside of Photoshop of course, the only function key I use much is F8: on Windows to recall commands in a DOS window, or on OS X to activate features like Exposé and Spaces. Since F8 is also the default to bring up the Spaces display, I can't even rely on it when remotely connecting to Windows from OS X!

The publishing software on VM/CMS, BookMaster, deserves its own whole blog post, so I won't go into it here. Let's just say that SGML clearly suffers from the "second system" effect when compared with BookMaster, the same way XML could be said to suffer the second system effect when compared with SGML. (That is, it does a lot more, but is needlessly complicated and overstuffed with features.)

The one aspect of BookMaster that I've never seen replicated anywhere else is its tag-based language for doing Gantt charts. Say what you like about the waterfall model versus other forms of software development. Any good project run out of IBM in the '90s ran well partly because the project schedules were all managed via text files printed out as Gantt charts. I had my own REXX-based extension to that tag language too, with shortcuts for moving any combination of planned, actual, start, and end dates, or confirming when activities started on time, late, or (very occasionally) early. When IBM moved us all off the mainframe, and people started investigating other options like MS Project, that was the first time I thought someone had taken an activity that you used to be able to assign to a summer student and turned it into something for which you'd need a PhD in a related field.

If I could recapture the best points of the mainframe, I'd use the Hercules emulator and the THE editor or one of its relatives, and write stuff in BookMaster and then downconvert to DocBook, DITA, or what have you. Unfortunately, although IBM released the B2H HTML converter, I've never encountered any free BookMaster workalike that (a) would do the nice hardcopy formatting, since this was pre-PDF days, or (b) included Gantt charting capability.

IBM has opened up a lot of former internal packages from the mainframe days. However, my 2 big contributions, LOOKIT and SCHEDIT, are still locked up because I don't have a current IBM employee who could be designated as the owner. So if you are at IBM and still have access to the old VMTOOLS and TXTTOOLS repositories, and wouldn't mind being listed as the contact for those packages, please let me know.

Saturday, August 16, 2008

How PL/SQL Could Be More Like Python

In previous posts, I suggested a couple of syntactic or thematic items from Perl and PHP that I thought would make PL/SQL friendlier and more accessible. Hey, maybe Python has some things too!

One thing to like about Python is the way its composite data structures build on top of each other. A list or a tuple can hold a bunch of numbers or strings in an array-like structure, but lists and tuples can also hold heterogeneous items: an assortment of numbers, strings, other lists, other tuples, and so on.

Another powerful idea is that a composite data structure can be used as the lookup key for a hash table / associative array / Python dictionary. (In Python, the key has to be an immutable type such as a tuple, but the principle stands.)

Finally, complex data structures with different numbers of elements can be passed as parameters or function return values, without the need to declare explicit types.

Underlying all this functionality, at least from a user's perspective, is that these complex data structures can be represented nicely as strings. Here's a little interactive Python output showing some list and tuple values. The >>> lines are what I entered; the other lines are how Python normalizes the escape and quote characters for output.

>>> ['i do not care']
['i do not care']
>>> ['i don\'t care']
["i don't care"]
>>> ['i don\'t care about "that"']
['i don\'t care about "that"']
>>> ('i','don\'t','care')
('i', "don't", 'care')
>>> [ 1, 2, ("a", "b", ["x", 3] ) ]
[1, 2, ('a', 'b', ['x', 3])]

So we can see that there is a fairly well-defined way of making these data types into strings.

PL/SQL gained a lot of power when it got the ability to use string values as keys for INDEX-BY tables, which soon became known as associative arrays to be more familiar to people who used such things in other languages. It would be even more powerful if you could use VARRAYs, nested tables, etc. as lookup keys. You could solve problems that require a higher order of thinking than just "have I already encountered this integer" or "I will assign the symbolic name 'XYZ' to such-and-such a DATE value". You could say, "I know my web site is under attack when the counter corresponding to this combination of IP address, URL, and time interval is greater than 100".
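
You can get partway there today by flattening the combination into a delimited string yourself. Here's a minimal sketch of that last example, assuming a '|' delimiter and hourly intervals (the type name and values here are made up for illustration):

declare
    type hit_counter_t is table of pls_integer index by varchar2(500);
    hits hit_counter_t;
    k    varchar2(500);
begin
    -- Flatten (IP address, URL, time interval) into one delimited string key.
    k := '192.0.2.1' || '|' || '/login' || '|'
         || to_char(sysdate, 'YYYY-MM-DD HH24');
    if hits.exists(k) then
        hits(k) := hits(k) + 1;
    else
        hits(k) := 1;
    end if;
    if hits(k) > 100 then
        dbms_output.put_line('Possible attack: ' || k);
    end if;
end;
/

That works, but every caller has to agree on the delimiter and the field order, which is exactly the kind of bookkeeping a built-in stringification could take over.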

PL/SQL can express something like this already, using nested collections. But it requires a lot of type definitions, which aren't readily reusable across different procedures without a lot of advance planning. There is a point where foresight crosses the threshold into rigidity. When dealing with in-memory data structures, I like the flexible approach of Python.

I don't think this kind of processing really requires much, if any, change in PL/SQL itself. All you would need are functions that could turn a VARRAY, nested table, etc. into a single string as in the examples above, so the string could be used as a key into an associative array. And the contents of the associative array element could be another string that represented a tuple or other data type that isn't directly supported by PL/SQL.
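
As a sketch of the simplest case, here's what such a function might look like for a VARRAY of strings, borrowing Python's bracket-and-quote notation. (The names string_list_t and stringify are my inventions; a real version would also have to handle nesting, NULLs, and non-string element types.)

create or replace type string_list_t as varray(100) of varchar2(4000);
/

create or replace function stringify(v string_list_t) return varchar2 is
    result varchar2(32767) := '[';
begin
    for i in 1 .. v.count loop
        if i > 1 then
            result := result || ', ';
        end if;
        -- Quote each element, doubling any embedded single quotes.
        result := result || '''' || replace(v(i), '''', '''''') || '''';
    end loop;
    return result || ']';
end;
/

With that, stringify(string_list_t('a', 'b')) comes back as ['a', 'b'], a perfectly good VARCHAR2 key for an associative array.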

With a set of functions to go back and forth between strings and built-in types, you could emulate some features that are limited now by PL/SQL's type strictness. For example, it's difficult to write a function that accepts a VARRAY with 5 elements, and returns a modified VARRAY with 4, or 6, or 3 elements. So certain classes of recursive operations are not practical.

If the data structures were all passed around as strings, and unpacked into associative arrays behind the scenes, that would be powerful but slow. Perhaps the best compromise is to use a small number of uniform associative array types at the top level:

type python_dictionary_t is table of varchar2(32767) index by varchar2(32767);
type python_list_t is table of varchar2(32767) index by pls_integer;

Each string element could encode a single value, a list, a tuple, a dictionary, or whatever. But the basic datum being passed around could be one of these user-defined types.

Notice that a list in this scheme has numeric indices, yet isn't fixed size like a VARRAY. You could pass to a function a "list" of 4 elements with indices 0..3, and get back another "list" of 5, 1, 10, or whatever elements. There's nothing intrinsically forcing the index values to be sequential; that would have to be enforced behind the scenes.
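
To make that concrete, here's a sketch of a function that accepts one of these "lists" and returns one with a different number of elements, which is exactly the operation that's awkward with VARRAY parameters. (It assumes the elements happen to be numeric strings and that indices run densely from 0; as noted, those conventions would have to be enforced behind the scenes.)

declare
    type python_list_t is table of varchar2(32767) index by pls_integer;
    input  python_list_t;
    output python_list_t;

    -- Return only the even values; the result can be shorter than the input.
    function evens_only(l python_list_t) return python_list_t is
        result python_list_t;
        n      pls_integer := 0;
    begin
        for i in 0 .. l.count - 1 loop
            if mod(to_number(l(i)), 2) = 0 then
                result(n) := l(i);  -- keep the result's indices dense from 0
                n := n + 1;
            end if;
        end loop;
        return result;
    end;
begin
    input(0) := '1'; input(1) := '2'; input(2) := '3'; input(3) := '4';
    output := evens_only(input);  -- 4 elements in, 2 out
    dbms_output.put_line(output.count || ' elements back');
end;
/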

Imagining how all this could be coded up inside the existing PL/SQL language, it occurs to me that the real performance roadblock would be reassigning all the numeric index values when doing list operations like pop, push, etc. Element 0 becomes element 1, element 1 becomes element 2, element N becomes element N+1, or vice versa. So perhaps the biggest change needed to PL/SQL itself is a fast way of changing the key for an associative array element, without actually copying the element itself under a new key. Something like:

element.key(i) := j;
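
For contrast, here's roughly what a front-of-list push costs today: every element gets copied under a new key, one by one. (python_list_t is the same hypothetical type as above.)

declare
    type python_list_t is table of varchar2(32767) index by pls_integer;
    l python_list_t;

    -- Without a re-key primitive, a front push is O(n) element copies.
    procedure push_front(lst in out nocopy python_list_t, val varchar2) is
    begin
        for i in reverse 0 .. lst.count - 1 loop
            lst(i + 1) := lst(i);  -- copy each element under its new index
        end loop;
        lst(0) := val;
    end;
begin
    l(0) := 'b'; l(1) := 'c';
    push_front(l, 'a');
    dbms_output.put_line(l(0) || l(1) || l(2));  -- prints abc
end;
/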

There could be a performance or memory hit when stringifying entire complex datatypes to use as keys for hash tables, if every complex value was passed through TO_CHAR() while evaluating it as a key. Perhaps the right approach is another built-in function like TO_KEY(), which could take something like a varray of gigantic VARCHAR2s, or an object type with member functions, and create a composite key with a bunch of MD5 or similar checksums, instead of the entire source text of the strings or member functions.
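
A user-level approximation of that hypothetical TO_KEY() might look like this, reusing the string_list_t type from the earlier sketch. (DBMS_CRYPTO is real but needs an EXECUTE grant from a DBA; the TO_KEY name and the digest-concatenation scheme are my own inventions.)

create or replace function to_key(v string_list_t) return varchar2 is
    key varchar2(32767);
begin
    -- One fixed-size MD5 digest per element, so long strings collapse
    -- to a short, uniform key; accidental collisions are vanishingly rare.
    for i in 1 .. v.count loop
        key := key || rawtohex(dbms_crypto.hash(
                          utl_raw.cast_to_raw(v(i)), dbms_crypto.hash_md5));
    end loop;
    return key;
end;
/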

Friday, August 1, 2008

Ruminations on Metadata

Metadata is a great subject when it comes to web development and publishing, but metadata for music is more a subject for my more personal blog. Thoughts here.