Sunday, December 30, 2007

What Is Scripting, Anyway?

[Some thoughts prompted by the article "Programming is Hard, Let's Go Scripting" by Larry Wall, father of Perl.]

How to define scripting and scripting languages? For me, as a long-time delver into compiler details, there's one key distinction between scripting languages and other ones. Scripting languages bypass all the low-level conventions for interlanguage calls, and take the shortcut of encoding everything as text. Usually, such conventions take up one or more chapters in a compiler manual -- how is a float represented, are integers big-endian, what's the memory layout of an array, structure, etc. The scripting language expects that all of this will be passed around in ASCII format via pipes, text files, and command-line flags.

Unix shell scripting is the natural building block for all of this. Consider a pipeline like this:

du -k | sort -n | grep -i Downloads

The first command gives the size of each subdirectory starting with the current directory. The second command sorts this output while treating the first field as a number; for example, "99 Directory 1" comes before "100 Directory 2", even though a strict alpha sort would put "1" before "9". And the third command plucks out lines containing a certain string; the directory names didn't get lost while we were doing that numeric sorting.

Perl descends from this lineage with its backticks, commands like eval and grep, and HERE documents. All very familiar. But Perl's object orientation really strays from its roots as a scripting language. All of a sudden, you have to think intently about how hashes are laid out. You can write script-like programs using Perl's object-oriented notation with packages written by someone else, but the OO code behind your own packages doesn't feel very scriptish. (In the same way that C isn't a scripting language, even though grep et al are written in C.)

The nicest Perl touch I see in this area is in conversion between strings and numbers. For example, you can say '$foo = "00000";', then do various operations like $foo++, and you'll find the value gets printed as "00001", "00002", etc. Very nice for formatting sequence numbers for IDs.

Early vs. late binding. Now we're getting into familiar territory for Oracle developers. By default, most PL/SQL uses early binding -- it's easy to know ahead of time which tables, columns, types, etc. a PL/SQL subprogram will use, and whether it can use them successfully. (Just look at whether the stored procedure or function is valid.) Venture into dynamic SQL, such as the EXECUTE IMMEDIATE command, and now it's not so easy to guarantee correctness or reliability. One mistake in string concatenation logic, and you'll try to reference a non-existent table or column, send badly-formed literals, and so on -- whole new classes of errors you never had to deal with before. Normally, I come down on the side that favors greater flexibility, but I haven't found PL/SQL's restrictions to be too onerous considering how important it is to ensure data integrity. Other aspects of PL/SQL that incorporate late binding -- hence are more comfortable for long-time scripters -- are the anonymous block (and it's always fun to generate the code for an anonymous block from some other program written in Perl or what have you), and the %TYPE and %ROWTYPE qualifiers, which insulate programs to some degree against changes wrought by ALTER TABLE.

Wordiness vs. use of symbols. Here is where I could imagine some improvements in PL/SQL. Why not allow { } instead of BEGIN/END, LOOP/END LOOP, and THEN/END IF for blocks? As Larry notes, scripting languages occupy the whole spectrum when it comes to verbosity.

PL/SQL also has the historical capitalization convention of all-caps for built-ins, and lowercase for user-defined names. Here, I firmly agree with Larry:

Actually, there are languages that do it even worse than COBOL. I remember one Pascal variant that required your keywords to be capitalized so that they would stand out. No, no, no, no, no! You don't want your functors to stand out. It's shouting the wrong words: IF! foo THEN! bar ELSE! baz END! END! END! END!

It's really the subprogram calls and variable references you want to see clearly when skimming code. As someone involved with doc, I know how nice it is to write "the ALTER TABLE statement" and know the emphasis will be preserved in all printed incarnations, even down to man pages, in a way that isn't guaranteed for bold, italics, or monospace. Yet a concern for code clarity means I'm perfectly happy writing INSERT, BEGIN, etc. in uppercase in running text, and lowercase in my own code. PL/SQL also has the conundrum of so many Oracle-supplied packages. Since DBMS_OUTPUT.PUT_LINE comes from Oracle, should it be written all-caps? Or since it obeys the same lexical rules as some package and subprogram you write, should it be lowercase? I prefer to stick with lowercase for everything in source code, even if that means terms can't be cut and pasted between code and running text.

Friday, December 28, 2007

Demystifying AJAX for the PL/SQL Developer

Raise your hand if you haven't heard that AJAX is the web development buzzword du jour. Despite the word "Javascript" in the acronym -- Asynchronous Javascript and XML -- PL/SQL can play a role in the front and particularly the back end. PL/SQL developers have the kind of background that's needed to understand what's happening at the lowest levels of network requests, perceived performance, and dealing with result sets. Don't be intimidated by the conventional wisdom that you should leave all the details to someone else's Javascript library. The code you need to write isn't all that big or complicated compared with 2-phase commits and bulk SQL.

Why AJAX at all? Think of it purely as an aspect of performance. You know how you run the UNIX 'time' command and see that a process took 10 minutes of user time, but only a few seconds of system time? Or how a PL/SQL anonymous block that runs for a long time produces no output from DBMS_OUTPUT.PUT_LINE until the very last moment, then dumps all the text at super speed? What you're dealing with is actual versus perceived performance, and web pages suffer from the same conundrum.

For example, when you load a page with a lot of graphics, the way the page is structured determines whether you'll see the text appear while the graphics are still loading, or if the page will be blank until every graphic is ready to display. Ditto with Javascript. If you have a big block of Javascript code on the page, the page might not render -- or might render but not respond to keyboard or mouse input -- until the Javascript is finished loading or (in worst case) until it's finished running. You can do some things to partially work around this, like hooking into the onload event handler, but still the page can seem slow, even though you know the back-end processing that generates it takes no time at all. The slowdown is in transmitting the code and data for the Javascript part of the page, and the browser parsing it and acting on it.

AJAX deals with that problem by making just-in-time requests for additional data, not when the page is loaded but when some action is taken, like clicking a button or link. The Javascript code receives the data as a block of text. Ideally, that text is supposed to be XML (hence the X in AJAX). But often, the most pragmatic approach is to send plain text, or HTML tagged text that is only a partial document. These approaches have even spawned their own acronyms; I like one known as AHAH, where the request returns a block of HTML that is inserted directly into the page at a designated spot.

If you're still waiting for me to mention something related to PL/SQL, I have good news and bad news. Bad news first: to do any of the "modern" web techniques, you will need to use a combination of at least 2 languages, in this case PL/SQL, Javascript, and optionally something else like PHP. The good news is, all of the components of an AJAX solution can be generated as PL/SQL stored procedures exposed through mod_plsql. That is, one stored procedure can generate the original HTML page that does the fancy AJAX stuff; another stored procedure can generate the Javascript loaded by the <script src="..."> tag; and yet another can do a database query or grab information through UTL_HTTP, UTL_FILE, etc. and return the text that gets passed to Javascript.

That last part is the most powerful, but also the biggest mental hurdle to get past. You are used to writing a procedure that generates an entire HTML page. Now you are writing a procedure that acts like a function, using HTP.P() to build a result set that the other end will parse as a single text string. (That's why a developer used to enterprise-class programming would probably choose to use a real XML document as the medium of exchange, to guard against the carelessness that comes with free-format text.)

On the Javascript end, you need a function to create the object that does the network request. You need a function that responds to some event on the page and initiates the request. And you also need a callback function that wakes up when the data is ready, receives the output, and then does the needful to change the page content and/or form elements. PL/SQL is really only required to do the database query and return the text of the results; the actual page and the Javascript code could come from static files, PHP, or some other option.

Within the HTML page that the user interacts with, you might also find that you need certain structures to be in place, or certain elements to have IDs. For example, one popular technique uses an empty <div> tag like so:

<div id="content_goes_here"></div>

The Javascript can locate the div element by looking up its ID, then set its innerHTML property to make an elaborate HTML structure like a table appear at that location, instead of a blank space.

You'll quickly find when you start implementing AJAX solutions, that you need a ton of these little Javascript functions, 2 for every kind of AJAXy change you're making on the page. The need to either auto-generate such functions, or to parameterize them to handle all kinds of cases, is why people recommend using the popular Javascript libraries. But there's nothing magic happening.

This post is going to be a bit light on source code, because the PL/SQL aspect of the front end is both so simple:

htp.p(q'{
/* enormous block of Javascript */
}');

and so complex and varied depending on your circumstances:

htp.p(q'{
try ... /* throws exception only in browser XYZ */
catch ...
try ... /* throws exception if HTML not structured as expected */
catch ...
}');

The techniques I use in my own code are probably lower level than you would use in yours, because I'm not using Javascript libraries like Prototype. So I'll refer you to mainstream AJAX tutorials to learn that part.

It's more straightforward on the back end. Here is PL/SQL code for a minimal AJAX procedure:

create or replace procedure ajax_procedure
(
param1 integer := null
, param2 varchar2 := null
)
as
begin
owa_util.mime_header('text/html');
htp.p('Some text from the procedure...');
end;
/

The XMLHTTPRequest object's open() method specifies the mod_plsql URL to the PL/SQL procedure, including any parameters as name/value pairs in the query string. Once the request has been fully satisfied (readyState == 4, status == 200), the object's responseText() method will return all the text that the procedure printed with htp.p(), which typically would be built up from some database logic. If the output was a real XML document, you would retrieve it through the object's responseXML method.

Alternative Firefox shortcut for Find In Page

For a long time, I've used the Firefox option "Search for text when I start typing" to avoid having to type Ctrl-F / Command-F every time to find something on a page. But with more interactive Web 2.0-style pages perhaps using keyboard controls, now I've turned that option off.

However, you don't have to do Ctrl-F / Command-F to activate Firefox's find-in-page dialog. The / key also activates it; that's an easy mnemonic for me, because it is the same key that kicks off search mode in the vi editor. It's the best of all possible worlds -- using alpha keys as shortcuts in web apps, easy access to the Find dialog, and no carpal tunnel syndrome from multi-key "Find" shortcuts.

Monday, November 26, 2007

Posting Reports from SQL*Plus

It used to be that all you needed to do with a SQL*Plus report was save it into a file, then you had everything you needed. (Or maybe just transfer the file to a web server afterwards.) These days, with blogs, wikis, etc., you might have to be a bit more creative.

For example, I was posting the results of a SQL*Plus report onto a wiki page. At the end of the SQL*Plus script, I had PROMPT commands to state the path of the report file and the URL of the wiki page. Then I'd copy and paste those items so I could open the report in an editor, open and edit the wiki page in a browser, and copy the contents of the file into the wiki edit box.

But if you're just doing the same thing every time, why not automate it? I hadn't made extensive use of the HOST command in SQL*Plus, so I held off a little longer than necessary. Here's the procedure I settled on:

Create a vim macro file that copies the contents of the edited file to the clipboard. (vim is a multiplatform vi clone with extra features like an extensive macro language, record/playback of keystrokes, and integration with features like the clipboard.) The procedure was something like:

vim -w clipboard.vi dummy_file.txt
[Then inside vim...]
"*yG
:q

That produces a macro file clipboard.vi with the keystrokes I did while editing the file. In this case, yanking the entire contents into the special buffer named *, which puts the results on the clipboard, then quitting.

Instead of just echoing the path of the report file, use the HOST command in SQL*Plus to run vim and load the report into the clipboard.

Instead of just echoing the URL of the wiki page, use the HOST command to open the default browser to that page, optionally using the URL that goes straight to the edit form.

-- Comment out the old PROMPT commands.
-- prompt Report is in: long_name_of_report.txt
-- prompt Wiki page is: http://...
-- Replace with fully automated HOST commands.
host vim -s clipboard.vi long_name_of_report.txt
host start http://...

Couple of wrinkles that I ran into along the way...

Although I use vim interchangeably between Windows and OS X, I'm finding that on OS X, the * buffer does not seem to replace the real contents of the clipboard; it gets retained somewhere between invocations of vim, but Cmd-V pastes the previous clipboard contents. Anyway, I use this technique primarily on Windows, where the * buffer really is the system clipboard.

On OS X, you would use 'host open ' to launch the browser, as opposed to 'host start ' for Windows.

I had the Cygwin command START.EXE in my Windows %PATH% where it was being run instead of the default Windows START command. For some reason it didn't like being run via HOST, even though other Cygwin commands like LS worked fine. I renamed it to CYGSTART.EXE to keep it out of the way.

While diagnosing my problem with HOST START, I ran across a blog post by Tanel Poder, who describes the technique of spooling the report in HTML format and then using HOST START to open it in a browser.

When you're using the Wikimedia markup format (i.e. the same as used on the Wikipedia site), you might find that SQL*Plus reports map naturally to the Wikimedia table markup.

When I'm faced with automating a braindead simple editing job, or even a more complicated situation involving searching, moving, and replacing variable amounts of text, my first instinct is to look for a solution involving a vim macro, rather than writing a Perl program. I have vim on all the same platforms as I do Perl, and vim lets me try out the solution interactively and "debug" the sequence of edits more effectively than Perl does.

Sometimes, this approach is looked down on as not "real" programming. I'm curious, what's the most complex or important business process you've ever run as an editor macro (vim, emacs, or other editor of your choice)?

Friday, November 16, 2007

OpenWorld at 20,000 Feet

I hadn't been to OpenWorld in a while until this year. Definitely a different tone than in years past. In the demo grounds, less "here are features we have in individual products", more "here are products built on top of those nifty features". Lots of demo booths from industry-specific acquisitions. Reminded me of IBM's SHARE conferences from the '90s, very customer focused.

Plus, Tom Kyte has a beard now. :-)

Wednesday, October 24, 2007

Why Keynote is Better than Powerpoint

I had the opportunity to try Apple's Keynote for a presentation where I might normally have used Powerpoint or even OpenOffice. Rather than do a lengthy review, why don't I just summarize why the results are better than I might get from Powerpoint? After all, that's the ultimate goal with presentation software -- to do something Powerpointish, but faster and/or higher quality.

Let's leave aside interface polish in the UI (the Inspector windows) and the output (slick animated transitions). I don't think those are fundamental to the results.

IMHO, the strong points for Keynote are:

The outline view with slides down the left side in filmstrip style lets you nest slides into an expanding/collapsing tree. You can see the outline showing only the major sections of a long presentation, or see all the slides, or just expand certain sections. I immediately noticed when I exported in PPT format and viewed in Powerpoint, that the equivalent view there was incredibly cluttered, with all the slides plus all the bullet points from the slides visible in a single long list. The expanding/collapsing tree let me organize the slides more easily than with Powerpoint. Nesting related slides under a "key frame" (a slide with just a giant-sized title) helped propel the actual presentation, i.e. establishing that some slides were just going to come up for a second or two, with no exposition.

The font controls lead you very gently into the habit of writing concise text, and elaborating only as needed. Let me explain what I mean. With Powerpoint, it's a race to see how many bullet points you can cram onto each slide, by resizing the text box to allow more text. The generous spacing on Keynote slides sends the subliminal message that you're only going to get a few bullets on each slide, so make the text punchier. Then, if you're desperate to fit 1-2 more lines, you can adjust the paragraph and line spacing in very small increments until the extra lines fit. The page still doesn't look cluttered, and you don't have to endlessly shrink the font into unreadability.

The export options let you choose output formats like PPT, Quicktime movie, PDF file, individual images, even Flash. Although I didn't use these, I can see that they would be tremendously useful. I exported to PPT so my audience could view the slides with Powerpoint or OpenOffice. Depending on the context of the presentation, I might have chosen Quicktime movie or Flash, to post the presentation on the web for viewing directly in the browser.

Sunday, October 14, 2007

Learn PHP In 5 Minutes

Maybe you've noticed a theme in some of my blog posts, the idea of switching back and forth between languages, or quickly transitioning from one language to another.

Recently, I found myself writing some PHP code after a long stretch of doing nothing but Perl, Python, and Javascript. I decided to make it more challenging by writing the syntax from hazy, long-ago memory, and only looking anything up after the fact, to fix the syntax errors. That experience reinforced my feeling ever since Perl came out, that every recent language can be thought of as a series of small deltas to existing languages. (C'mon, don't tell me you didn't look at the little-used report format of Perl, and say to yourself "hey, this is straight out of COBOL!" :-)

So, here's my 5-minute quickstart for PHP. (No APIs, just basic language features.)

Semicolons are mandatory as in Perl, not optional as in Javascript.

Inclusion is via the 'require' keyword, or 'require_once' if everything is requiring everything else. What goes in the included files is flexible as in C and Perl.

Variables are done like Perl, introduced by $. Unlike Perl, arrays and hashes are also notated with $, no @ or % to be found. All subscripts are square brackets.

If, braces, for, while, arithmetic, etc. are like C, C++, Java, and Javascript. No "elsif", it's "else if". Comparison operators are == et. al.

Just to confuse you, HERE documents are introduced by a <<<, 3 instead of 2 like normal. There's a semicolon after the final HERE / EOF / END / etc., but none on the initial line.

Output is largely echo as in shell scripts. Variable interpolation the same as shell scripts and Perl, i.e., the meaning of double quotes and single quotes. String concatenation is . like in Perl, not + like in Javascript. (Use + with strings and you'll get a numeric 0 result.)

Ternary conditions require more quoting than in Javascript, if you're used to stringing together several object-detection tests in a single assignment statement.

Comments are a mishmash of several languages. /* */ like everything except Perl, // like Javascript, # like Perl.

Functions are introduced with the 'function' keyword, and parameters are named (including the $) but not typed. E.g. 'function myfunc($myparm) {...}'.

And that's it. Regular expressions, the other interesting aspect of most language syntax, are done with function-style notation, so on a technicality I don't consider them part of the 5-minute tour.

Tuesday, September 18, 2007

Separation of...

You hear a lot about the merits of "separation of X and Y", for many values of X and Y. Content and presentation. Church and State. Running people and scissors. I thought I would just touch on this subject briefly before delving into more detail on the PL/SQL APIs for web development, like the HTP and HTF packages.

In web development generally, it's conventional wisdom that you want to separate content (HTML) and presentation (CSS). It's useful to understand how that works if your HTML is generated from PL/SQL. Essentially you're abstracting out certain parts of the markup so that maintenance is easier down the road.

Instead of:

<p style="border: 1px solid black;">

you're writing:

<p class="thing_that_deserves_solid_black_border">

filling in, of course, a semantically meaningful name for the class. It's self-documenting, smaller file size (assuming you don't use ridiculously long names), and the definition of how the classes look goes into a CSS file. The CSS in turn can be a static file, generated by mod_plsql the same as the HTML page, or even produced via some other language like Perl, so there's more flexibility in making global changes.

The same notion applies in pages generated from PL/SQL, but the simplicity of HTP.P() makes it easy to overlook. The Java world is hot on the "model/view/controller" architecture. How does that play out in PL/SQL?

Basically, it just means that you have separate procedures for manipulating data, and rendering the data as a web page. Something as simple as:

create or replace procedure generate_some page as
begin
build_giant_data_structure();
format_giant_data_structure();
end;

The conundrum is that you know in your head that HTP.P() is just stuffing the strings in a big data structure anyway, so why not code something like:

create or replace procedure generate_some page as
begin
generate_start_of_page();
for item in (select * from some_table)
loop
generate_piece_of_page(item);
end loop;
generate_end_of_page();
end;

But, what if you want to write stored procedures that can be called from Java, Perl, VB, or some other environment? In that case, they'll have all their own printing logic, and it will be convenient to have stored procedures that purely work on the data and don't do any HTP.P() calls of their own. You'll want to centralize things like cursor declarations, type declarations, and constants in packages, where otherwise you might feel fine with putting those inside a particular procedure, if only that procedure uses them.

Friday, July 20, 2007

PL/SQL Web Programming 1.0 - The URL is the API

Back in the transition between 8i and 9i, I almost had the chance to write a book about developing database-driven web apps for Oracle. But with so much web stuff moving to Java and the middle tier, it was too big and amorphous of a subject. I wrote about specific features, like PL/SQL Server Pages, but not a whole soup-to-nuts treatment.

Ironically, now that things are more sorted out, I know the subject a million times better from writing code, but I'm no longer writing doc. So my (unofficial) thoughts on the subject go here on the blog.

The notion that will really get you started with PL/SQL web programming is, realize that with typical PL/SQL web apps, your API is really the URL. You write stored procedures or packages as normal, and a component in the web server translates between URL format and the PL/SQL parameter types. If you have a full-blown Application Server installation, that component is the mod_plsql module. If you take the simpler way to web-enable your database, e.g. for a departmental application, that component is called the Embedded PL/SQL Gateway and comes with the database.

What do I mean by "the URL is the API"? Well, by looking at the URL, you can see how the application code is structured. Consider this URL that I own on Oracle.com:

http://www.oracle.com/pls/db102/homepage

/pls is a common convention for the virtual path under which all the URLs represent PL/SQL stored procedures, similar to the way you see /cgi-bin for Perl or other types of scripts.

db102 means the stored procedure is owned by the DB102 schema.

homepage means that's the name of the stored procedure. It could also be a synonym pointing to the stored procedure. If the stored procedure was part of a package, the last part of the URL would be pkg_name.proc_name instead of just proc_name.

I find it a little annoying the way dots are significant in the last part of the URL, because it keeps me from using URLs that end with "filename.ext", as is typical of other web development environments. There's always the Apache module mod_rewrite to mangle the paths of URLs into some friendlier form, but that introduces complexity, and involves coordination with other groups if you don't own the web server.

But the path part of the URL is just the appetizer. The real API is the query string, the part of the URL after the "?" character. For example, in a URL like this:

http://www.oracle.com/pls/db102/search?remark=quick_search&word=john+russell&tab_id=&format=ranked

the part after the ? represents the parameters to the procedure. (The top-level subprograms that you call to generate a page in a PL/SQL web app are always procedures, not functions.) We can see that the procedure takes a REMARK parameter, a WORD, parameter, a TAB_ID parameter, and a FORMAT parameter.

Perhaps there are other parameters, but if so, they must have default values, which makes them optional for the URL. We can't infer anything about the order of the parameters; they can be specified in the URL in a different order than they're declared in the procedure, and mod_plsql takes care of it.

Each parameter is part of a name/value pair separated by the = sign. The & character is the separator between each name/value pair. Since there's nothing between the = sign and the & character following TAB_ID=, that parameter is getting passed a null.

The + in the parameter value "john+russell" above represents a space, in the standard web URL encoding scheme. Because we need a way to pass special characters like &, =, space, /, characters outside the ASCII range, etc. we can represent them as %xx, where xx is 2 hex digits representing the character code. Space is represented by either + or %20 (ASCII 32). And it follows that real % and + characters are represented by their own %xx sequences.

Your PL/SQL procedure can receive VARCHAR2 values with all of these characters with no trouble, the mod_plsql module translates all the values for you when it invokes your procedure. But if you generate a page of HTML from a PL/SQL procedure, and that page includes links to other PL/SQL procedures including parameters, then you do have to URL-encode the value part of each name/value pair.

Encoding most special characters is a relatively straightforward matter, but you have to take care with the characters that are special within the URL. For example, you wouldn't URL-encode the part of the URL that normally includes / characters, or up to the ? character that begins the parameters. (That last portion of the URL is known as the "query string".) Within the query string, you wouldn't URL-encode the = characters in the name/value pairs, and you especially wouldn't URL-encode the & characters that separate the name/value pairs. In fact, to be strictly correct, you must turn each & separator character into the sequence &, so that the browser or whatever user agent parses your HTML file doesn't find a lone unescaped & character in the middle of a string literal. That makes for invalid XML and will trip you up when you start using XHTML content-type declarations, strict file validators or link checkers, and so on.

The & business is the trickiest part of basic PL/SQL web development. Normally the & character is used by SQL*Plus for substitution variables. So even to use it in a program compiled by SQL*Plus, you either have to issue SET DEFINE OFF to turn off substitution variables entirely, or do something like

define amp = &
define urlamp = '&'

and put &. in your strings everywhere you want a normal &, and &urlamp. everywhere you want &. So your search-and-replace code that replaces & with & winds up looking like:

expanded_separators := replace(uneexpanded_separators,'&.','&urlamp.');

When a link coded like <a href="proc?parm1=val1&parm2=val2"> is displayed on a web page, any preview you see in the status bar will just show you regular & characters. There is a bit of a Catch-22 in that some older crawler software will fail to parse the & characters correctly, so your Apache log might fill up with invalid URL requests even though you coded the links according to the standards.

You must order any substitutions so that once you have introduced & or % characters into your encoded string, you don't substitute those characters a second time.

So, let's deconstruct an imaginary URL:

http://www.example.com/pls/my_schema/my_proc?parm3=val3&parm1=weird+string%28value%29%26%3D%2F%3C%3E

That means somewhere in your code, you have CREATE OR REPLACE PROCEDURE MY_PROC...

We put param3 first in the URL because the order of parameters in the URL doesn't need to match the order of parameters in the procedure; you just have to fill in values for all parameters that don't have default values.

The fact that we omitted parm2=val2 means the procedure will just use the default value for that parameter.

The value for parm1 includes some unusual characters that get quoted as + (space) or %xx (punctuation and non-ASCII characters). Your procedure doesn't have to decode the escape sequences, but it does have to encode them if you write out such a link.

The above URL with the & characters is what you would literally open up in your browser. But if you were writing out a link tag in your own page, you would write it as:

http://www.example.com/pls/my_schema/my_proc?parm3=val3&parm1=weird+string%28value%29%26%3D%2F%3C%3E

with & instead of & as the separator.

What about datatypes, you ask? mod_plsql will helpfully convert values to PL/SQL datatypes where possible. Most parameters can be declared VARCHAR2 if you want. NUMBER and INTEGER are also OK.

I stay away from anything with too exotic a representation, e.g. DATE. Better to pass it as a VARCHAR2, then call TO_DATE(param_value, 'format_string') inside the procedure. If there is a problem with the conversion inside the procedure, you can trap the exception and handle the error yourself. But if the error happens while mod_plsql is converting the value to the type of your parameter, you can't trap the exception and you will see an unhelpful (to the user) PL/SQL stack dump instead of the regular page.

Boolean is one common type that isn't supported for parameters in a mod_plsql URL. Use INTEGER and pass a 0 or 1 instead. The non-support for Boolean is handy in that you can put a required Boolean parameter on a procedure, which ensures that it can't be called unexpectedly by someone constructing a mod_plsql URL, trying to hack into your pages. Although I wouldn't rely on that as the primary security for a procedure, since mod_plsql could always add support for Boolean parameters.

You also commonly see URLs with the same parameter repeated multiple times, such as:

http://www.example.com/pls/my_schema/my_proc?name=Joe&name=Fred&name=Sue

It feels intuitive that the procedure is getting an array-like data structure with 3 elements. But what type should it be exactly? That's not quite so intuitive. If you use a TYPE declaration to make a new kind of VARRAY in your own procedure, mod_plsql can't see it early enough for you to declare parameters of that type. So what you really need is a package that declares a type that's a VARRAY or nested table of VARCHAR2s, then declare the procedure parameter as YOUR_PACKAGE.YOUR_TYPE. Luckily, there is a predefined type like this, OWA_UTIL.IDENT_ARR. But the elements of that type aren't especially long, so if you need to pass sizeable values, you'll need to make your own type.

A further complication comes if you want to make a multiple-occurrence parameter optional, that is, to make both of these URLs work:

http://www.example.com/pls/my_schema/my_proc?name=Joe&name=Fred&name=Sue
http://www.example.com/pls/my_schema/my_proc?nickname=Wheezy

You need to declare a default value in your procedure for the NAME parameter, but how do you do that for a type that you just declared? You can't just make the default be NULL. You need to make a real but empty instance of that type.

The answer is to declare inside the same package as the type, an empty instance of that type (for index-by tables a.k.a. associative arrays), or a function that returns an empty instance of that type (for nested tables or varrays). Something like:

-- In the package spec.
type type1_t is table of varchar2(100);

type type2_t is table of varchar2(100) index by pls_integer;
empty_type2 type2_t;

-- In the package body.
function empty_type1 return type1_t is
begin
return type1_t();
end;

Then you can declare your procedure like so:

create or replace procedure my_proc
(
name my_package.type1_t := my_package.empty_type1(), -- Construct nested table with no elements
nickname varchar2(100) := null
)
...

or

create or replace procedure my_proc
(
name my_package.type2_t := my_package.empty_type2, -- Construct associative array with no elements
nickname varchar2(100) := null
)
...

With parameters set up like this, you can call the procedure with one or more NAME values, or leave out the NAME parameter and pass NICKNAME instead.

A useful pattern for procedures that generate a web page is to have null or empty defaults for all parameters, and make the procedure display a minimal page (typically with HTML form elements) when invoked with no parameters, and display a more elaborate page when invoked with parameters. For example, a search page invoked with no parameters can display just the search box. When invoked with parameters, it can display results too. Instead of pointing the form tag to a different URL, point it to the same procedure. For example, if the procedure name is SEARCH, generate a form tag like so:

<form action="search" ...>

This example also illustrates how procedures within the same schema can link to each other without elaborate paths or qualifiers in the URLs. Just put the procedure name in the action= attribute of a form tag, or the procedure name + the query string in the href= attribute of an anchor ("a") tag.

Planning and preparation like this is all well and good, but how do you actually generate the HTML for the page? Ah, that's another post coming up!

Friday, July 6, 2007

Search Engine Optimization for Programming Languages

Why didn't language X take over the world? Why does language Y enjoy broad adoption despite clear shortcomings? While there are lots of factors, for example the degree of strictness vs. whimsy and how that matches up with the mindset of programmers, I am convinced that it largely comes down to how easy it is to search for sample code in a given language.

For example, once Java became popular, if you ran across a mention of some class or method name, you could always find lots of example by searching for those names. Or if you wanted to see how to use a certain GUI method in combination with a certain container type, you could search for pages with both names. If any name was ambiguous, you could add some other keywords -- class, extends, etc. -- to make sure you really found a full code sample and not some general discussion.

So, Java's adoption was helped by a confluence of factors: a standard set of relatively verbose names, and a bunch of students posting their homework solutions online.

Perl, for me, didn't fulfill its promise and I lament sometimes that it didn't take over the world like it could have. When your source code is full of idioms like $_ and <>, it's hard to search for code samples. (By search, I mean using Google et al, rather than using vi, emacs, or grep on your own source.) When I look at my Perl code, the words inside are mostly generic like for, if, print, and die. Searching for them, you would come up with lots of non-programming discussion, or source code from other languages. For example, what does "unset local $/" do? Google will find instances of the phrase "unset local", but not the full line of code.

So, Perl's adoption was hindered by factors related to searchability. The best Perl source on the web is posted as CGIs, where a search engine crawler can only spider the output, not the underlying source. It's hard to find sample code generally. And the way Perl modules are documented, when you search for popular module or method names, the top hits tend to be dozens of copies of the same Perldoc files. (And if the Perldoc was sufficient, you wouldn't be doing the search in the first place.)

I see SQL and PL/SQL as somewhere in between Java and Perl in terms of searchability. There are lots of unique names from PL/SQL packages and methods. Some of the words that might otherwise be too common are grouped into phrases like "connect by" and "order by". PL/SQL has some words like begin and end that aren't especially searchable by themselves, but if you combine them with others like declare, exception, etc. you can devise searches that home in on PL/SQL source code.

Where SQL and PL/SQL get into trouble with searchability is in assigning specialized behavior to single, common words. PL/SQL has a function called TABLE(); good luck in finding that by searching given that TABLE is used in so many other contexts like CREATE TABLE or "Doing xyz to a Table". There also used to be a SQL function called THE(), now happily gone, but I still get requests from time to time "how come I can't search for the THE function?".

When I get some free time, I'm going to look into the idea of auto-classifying source code via Bayesian analysis. Oracle Text has the CTX_CLS package, where you feed it training data consisting of already-categorized documents, and it deduces rules so that it can analyze unknown documents and match them up with those same categories. The Oracle doc library has tons of files with source code in a single known language, multiple languages in the same file, or source code from one language in a book that is primarily for a different language. I'd like to compare and contrast the CTX_CLS approach with a "homegrown" one similar to Peter Norvig's spelling corrector.

Thursday, June 28, 2007

Upgrading SQL*Plus Client

Sometimes the old ways are best, sometimes not. I have some rarely-used machines where I had never bothered to upgrade my client installation, and so were still running the 9i SQL*Plus. The older SQL*Plus could still compile all my PL/SQL source that used the latest and greatest 10g features, so what's the big deal?

As it turns out, there are a number of cases where having a downlevel SQL*Plus produces cryptic errors. I went looking in the docs, but I don't think there is any comprehensive listing of these cases. (It would be difficult to make an exhaustive list; it might not help much with troubleshooting because the errors are so cryptic; and I'm sure the workaround -- just upgrade! -- is stated many times in many places. Still, I would have liked to have found a list that showed all the errors that could occur, so I could find it by searching.)

For me, the feature that prompted me to finally upgrade was the alternative quoting syntax. It worked fine in a PL/SQL block:

begin
  insert into some_table (title) values (q'{Administrator's Guide}');

But not if I used the quoting feature directly in SQL, i.e. in a query from the SQL*Plus command line or in a SQL script with INSERT etc. not wrapped inside a PL/SQL block:

SQL> select title from some_table where title like q'{Administrator's}';
ERROR:
ORA-01756: quoted string not properly terminated

Tuesday, June 26, 2007

What's the Capital of Abyssinia?

Few subjects provoke as much passion as the question of how to capitalize programming code.

You've got C-like languages where case is significant, but the discussion continues whether to string_words_along or JamWordsTogether. My own convention for PL/SQL is a bit idiosyncratic. I decide whether a particular subprogram is more procedural or more object-oriented, and use underscores or Camel case as appropriate for each one.

Then you've got lots of other languages, like PL/SQL, where case is not significant. But you'd still like to format source code for readability and comprehensibility. I commonly see the convention of reserved words in all caps, and lowercase for names of tables, variables, etc. That allows for some degree of consistency in documentation -- you can talk about "the IF statement" and the reserved word is easy to pick out even if the text is stripped of all formatting in a man page or an e-mail. But if Oracle ships packages like DBMS_OUTPUT, does it make sense to capitalize those names as if they're reserved, or treat them more like user-defined objects? Rather than drive myself to distraction, I use lowercase for everything in source code (after all, nobody ever accused C or Perl code of being hard to read... er... :-). When referring to reserved words in text, I use all-caps.

Now, I am well aware that there is an uppercase stigma attached to some languages. FORTRAN 77 was case-sensitive and got endless grief for that. (Why, I left my small-town roots rather than accept a job working in FORTRAN.) Fortran 90 enabled the use of lowercase, which was such a big deal that it was reflected in the language name. At IBM, where I eventually worked my way up to working on XL Fortran :-), it was common to switch within the same sentence: "The IBM Toronto Lab took over the FORTRAN mission, and released several Fortran compilers, while the mainframe FORTRAN products continued to be developed by the Santa Teresa lab". This after years of working on Ada, which also had an uppercase stigma because everyone assumes it should be ADA. (It's Ada, because it's an eponym not an acronym.)

There is some uppercase lurking inside the database. Issue create table foo and it will dutifully create a table named FOO, as you can see by 'select table_name from user_tables'. You can make the table be named FooBar by enclosing the name in double quotes: create table "FooBar". But then you have to put the quotes everywhere you refer to the table, so it's not worth the trouble except in unusual circumstances where you want to obscure the name, or have names identical except for case.

The tilt towards uppercase means you more often see code like this for doing case-insensitive matches:

select last_name from my_table where upper(last_name) like 'A%';

You could just as well write:

...lower(last_name) like 'a%';

and people might regard that line with more respect. These days, out of principle, I always use a regular expression instead:

select last_name from my_table where regexp_like(last_name,'^a','i');

In real code, anyway. On the command line, I still slip back into LIKE out of habit. It puzzles me a bit that SQL can't handle the BOOL... bool... Boolean datatype of PL/SQL, yet regexp_like is testing a Boolean condition without any comparison operator.

Sunday, June 24, 2007

Why Is It So Hard to Print?

A knock that seems to apply to any "enterprise"-class language is, why is it so hard to actually print anything? For purposes of this post, we'll consider PL/SQL and Java.

In Java, a hello world program must call System.out.println(). That 3-level call is kind of verbose, plus in the CamelCased world of Java you might forget whether the second part is "Out" or the third part is "PrintLn".

In PL/SQL, a hello world program must call dbms_output.put_line. Don't forget to issue "set serveroutput on size 1000000" in SQL*Plus first, or the output is held in a buffer instead of printed immediately. When you see this call in the docs, it will probably look like DBMS_OUTPUT.PUT_LINE, which will make you think it's combining the keywords of Pascal and the capitalization of FORTRAN 77. In the Oracle9i PL/SQL Guide, I had deep metaphysical anguish over whether to show such a hello world example in Chapter 1, or if it had to wait until much later after discussions of stored procedures and packages. Of course, I chose to break with linearity and show it early.

In contrast, scripting languages like Perl and Python let you just issue: print "hello world". Why make it hard to do simple I/O? How is that a selling point for a language?

You can think of it as an implicit nudge towards a particular style of programming. Both Java and PL/SQL would prefer that you write your own function named print(), or msg(), or debug(), or whatever that is essentially a 1-liner that calls System.out.println or dbms_output.put_line. That's because the program that today just prints something to the screen, tomorrow might store the output in a database, or send it in an e-mail, or transmit it in some other way, or maybe send it to multiple output sources. The idea being that it's easier to change your one print() function than to go back and change every system-defined print call to your own function name.

With Java, you might have several kinds of objects, and in each one print() might do something different.

With PL/SQL, you might define print() as a nested procedure inside a bunch of other procedures, or each package might have its own print() function. PL/SQL makes the "store in a database" option so easy, that you might find it's the most convenient way to go for any function like debug(), log(), etc.

Unix hackers might find all of this unnatural, because in the environment where you would normally be doing Perl or Python scripts, you could easily redefine where the output goes by way of pipes, redirects like > or 2>&1, or the tee command.

However, the zen of writing a bunch of seemingly trivial one-line functions is part and parcel of both object-oriented and functional programming, so just cast your mind back into LISP hacker mode if necessary and it won't seem so painful.

Thursday, May 31, 2007

Less is More, More or Less

Do you fret every time you declare a PL/SQL variable, that you've carelessly allocated a few unnecessary bytes, and that this wastage will be replicated thousands of times as your code is called from a bazillion sessions, and that that will be the straw that breaks the camel's back and makes your server thrash to a halt?

Well, calm down. It's not that bad.

If you don't know exactly how much space is needed for a PL/SQL variable, the most space-efficient thing you can do is counterintuitive: declare a gigantic variable. For example:
declare
email_address varchar2(32); -- Could give error from being too small
email_address2 varchar2(512); -- Probably no error, but wastes space
email_address3 varchar2(32000); -- No error, no wasted space

Once your variable is declared as 2000 characters or more, the variable is no longer statically allocated to accomodate its maximum size; instead, it's dynamically allocated every time it's assigned, based on the exact size that's really needed.
email_address2 := 'someone@example.com'; -- Rest of 512 characters are wasted empty space
email_address3 := 'someone_else@example.com'; -- Although declared with length 32000, only 24 characters are allocated

VARCHAR2s declared with huge lengths make great stocking stuffers, such as collection elements. You can make complex data structures that potentially can represent big documents (each line up to 32K), yet don't take up any more space than is really needed.

Now let's look at a complementary technique, where again we keep the allocated size under control.

When you want a variable to hold the contents of a table column, there is a simple way to match the right length:
declare
var1 my_table.my_column%type;

If my_table.my_column is declared VARCHAR2(600), then that's what var1 will be too. No worries about selecting a value from the table and getting a length error. If you run ALTER TABLE to change the length of the column, next time your code is run, it will pick up that change automatically and adjust to the new length.

Maybe your selects are not quite so simple. Say you want to select col1 || col2 and store the results in a variable. How...?
declare
cursor my_cursor is select col1 || col2 concatenated from my_table;
var2 my_cursor%rowtype;

We let Oracle figure out the maximum length of the concatenated columns, then piggyback on that result to make a variable of the same type. (Actually, it's a record variable that has a field of the same type; to refer to the field, we'd specify var2.concatenated.) Again, we're safe from ALTER TABLE.




Addendum



Eddie Awad linked to this post from his blog, and I see that post attracted some comments that I should address here.

Looks like, in 10g, the limit has been raised from 2000 characters to 4000 characters. I like it when PL/SQL limits align with SQL limits, and the number 4000 is significant because that's the biggest VARCHAR2 you can stuff in a table. So you don't have to worry about declaring a variable of type TABLE_NAME.VARCHAR2_COLUMN%TYPE and having this kind of dynamic allocation happen or not depending on the size of the column.

Anyway, I'll just reiterate my advice from the PL/SQL Guide: don't try to finesse the VARCHAR2 length depending on knowledge of the exact limit; if you prefer dynamic allocation for a variable, use a huge length. I always use 32000. A PL/SQL guru I know uses 32500. (I haven't seen anyone use 3276(7|8), proof that nobody likes to tempt the gods of boundary conditions.)

Performance guru Jonathan Lewis discusses some detailed performance / scalability implications of using big or different-sized VARCHAR2 variables. The kind of code that would use huge VARCHAR2s, in my experience anyway, is not inserting into or querying those values from SQL tables; rather it's building data structures like the buffers used in HTP.P() or DBMS_OUTPUT.PUT_LINE(). So I'll keep that performance tip in mind for DML-intensive code, but not worry so much for code that mainly does string manipulation and displays the results as a web page via mod_plsql.

Friday, May 18, 2007

Quotable Quotes

Spare a kind thought for the hard-working yet underappreciated quotation mark. Single quotes or double quotes, you ask? Exactly!

If you're familiar with other languages such as Perl or Javascript, you probably intermix single and double quotes as needed. Double quotes around strings containing single quotes. Single quotes around Perl strings containing $ or @ characters, so they aren't interpolated as embedded variables.

With PL/SQL, you'll almost always use single quotes. The main exception is around identifier names, which you can enclose in double quotes if they contain special characters or if you need to force them into mixed case or lowercase:

execute immediate 'create table "My_Table" ...';

Without those double quotes, the new table would be known internally (in the USER_TABLES view, etc.) as MY_TABLE. The double quotes force the object name to be exactly as entered, meaning you would get an error from "select * from MY_TABLE" or "select * from my_table". (In this last case, the name my_table is translated silently to MY_TABLE.)

So, that's it then. Use single quotes for string literals, nothing more to learn.

var1 := 'Hello world!';

But what happens when we start dealing with real-world cases?

var2 := 'Oracle Database Administrator''s Guide';

Wait a second! What's with the 2 consecutive single quotes? That's the notation to escape a single quote inside a string literal. What a bother. If you want to generate a text file with a bunch of insert statements, you have to run a search/replace on each string to create 2 of each single quote. And you have to do it before wrapping single quotes around the data. When done in PL/SQL, this leads to one of my favourite PL/SQL statements of all time:

var3 := replace(var2,'''','''''');

We are only replacing 1 single quote with 2 single quotes, but between the surrounding quotes for the literals and the doubled quotes to escape them inside the literals, it looks like an entry in the Obfuscated C Contest.

Luckily, in Oracle 10g there is a way to avoid all the escaping. It still requires you to predict what characters your string might contain, but within that constraint it is much more readable:

var4 := q'{Oracle Database Administrator's Guide}';
var5 := q'[I'm the king of the world!]';
var6 := q'#Mommy's all right, Daddy's all right...#';

With the q literal notation, the first and last characters inside the quotes are not part of the string value, and single quotes are OK since you are making sure that the closing sequence (in the examples above, }', ]', and #') do not appear anywhere in the string. If the opening character after the q' is a "bracket"-style character, the matching end character is the corresponding close bracket. If the opening character is some other punctuation mark or what have you, the same character is used immediately before a single quote to end the literal.

Where this really comes into play is in writing out code of one kind or another from PL/SQL. In a web application, you might be writing out HTML full of double quotes, Javascript full of single quotes, or a combination:

obj.innerHTML = '<div style="float: right;">';

It's very comforting to know you can wrap the whole thing in a PL/SQL string literal without doing any replacements inside the string:

var7 := q'|obj.innerHTML = '<div style="float: right;">';|';

Again, you'll have to know based on your own coding conventions or a priori that the closing delimiter character doesn't appear immediately before a single quote.

I like to combine this technique with a SQL*Plus trick I mentioned in an earlier post:

var8 :=
q'|
@javascript_source.js;
|';

Now that Javascript is available to PL/SQL and can be plunked into a web page, concatenated with other source, searched/replaced, etc. It doesn't have to contain any escaped '' sequences and so can be reused with other languages, edited by a non-PL/SQL programmer, run through a Javascript validator, or anything else that strikes your fancy.

Sunday, May 6, 2007

The Humble IF Statement

If I have one piece of wisdom to impart, it's this: pay attention to your IF statements!

Seems like the hardest part of knowing a dozen languages is keeping track of the different syntaxes for simple IF statements. (I took a tour of an Internet startup pre-dotcom crash, and saw my PL/SQL User's Guide open on a desk -- not to BULK COLLECT or the exception model, but to the page for the IF statement.)

PL/SQL is relatively verbose but straightforward: IF this THEN that END IF;

Python is a little more concise, leaving aside the line breaks and indentation:

if this:
that
elif the_other:
something_else

I could wish for a little more Perl-style forgiveness; I keep leaving "then" in by mistake.

My simultaneous favourite and least-favourite IF syntax is from Korn shell:

if [ this ]; then
that;
fi;

"then" is its own insignificant statement, and so requires the semicolon after the condition to format the lines the intuitive way. The conditions are a weird mix of UNIX-style flags (-f "$filename" to check for file existence) and FORTRAN-style comparison operators ("$string1" eq "$string2").

I can never remember whether the semicolon is needed, or the details of the comparison operators, or sometimes even whether the condition is enclosed in [ ] or [[ ]]. Don't forget that spaces are needed after the [ and before the ], it's an error otherwise. So for me, ksh is the toughest language to switch back into on a moment's notice.

Now this post isn't just a language comparison. There are some shortcuts you can do with IF logic in PL/SQL.

If you find yourself writing statements like:

if x is not null then
y := x;
else
y := 'some default value';
end if;

...you can use the NVL() function instead:

y := nvl(x,'some default value');

Because this is a SQL function too, you can use it in queries the same way:

select name, course, nvl(grade,'Incomplete') grade from coursework;

For any row where GRADE is null, the value that comes back will be 'Incomplete'. Since we're plugging a function in instead of a column value, the trailing "grade" is helpful to give a name to the expression value, in case we wanted to wrap the query into a nested query or implicit FOR loop in PL/SQL.

The next level of IF is the CASE expression. This lets you replace a whole sequence of IF/THEN/ELSE clauses with a single expression:

letter := case
when grade between 85 and 100 then 'A'
when grade between 75 and 84 then 'B'
else 'Kiss the honor roll goodbye'
end;

Old-time PL/SQL programmers might be in the habit of doing:

select nvl(something,something_else) into variable from dual;

as a way to do this kind of IF-NULL-THEN assignment for a single value. You can get rid of the query by using NVL in a regular PL/SQL assignment, or (my preference) turning the NVL into a CASE.

It's not the compactness that makes CASE a win for me, it's the fact that (as an expression) you can use it in the middle of some other statement where normally it would break up the flow of thought or require you to create dummy variables to hold intermediate values:

declare
something varchar2(32000) :=
case
when release = 1 then 'value for release 1'
when release = 2 then 'value for release 2'
else 'some other value'
end;

procedure my_proc
(
param1 varchar2 := 'xyz',
param2 varchar2 := 'abc',
param3 number := case when to_char(sysdate,'Day') = 'Tuesday' then 7 else 2 end
) is ...

PL/SQL's CASE comes in 2 forms, the expression that you can plug in anywhere as shown above, and the statement that must stand alone and ends with END CASE instead of just END:

case
when percent_full < 75 then
perform_routine_maintenance();
when percent_full between 75 and 99 then
stop_nonessential_jobs();
send_email_alert();
when percent_full = 100 or cpu_too_high = true then
panic_in_the_streets();
else
dbms_output.put_line('Some anomalous condition!');
end case;

The statement format lets you do multiple things in response to any condition, and you can test different variables in different WHEN clauses. You aren't just returning a value, but running full statements in each WHEN block. I don't use the statement format very often though. I find it is a little confusing to distinguish between the two forms (usually by checking whether the last part is END or END CASE) when reading existing source. And if you don't properly account for all possibilities in a CASE statement and execution falls off the end, you get a CASE_NOT_FOUND exception, so it requires extra care in constructing the conditions to always match one of the WHEN conditions. (You can trap the exception, but it can be a shock the first time you get one when you hadn't intended the CASE statement to cover every possibility.)

Saturday, May 5, 2007

PL/SQL vs. LISP, Scheme, and friends

Getting my Comp. Sci. degree from about as far away from Silicon Valley as you can get in North America, I was curious what the prestigious US schools teach that's so great. So just for kicks, I started going through podcasts for the CS courses at UC Berkeley. The teaching language they start out with is Scheme, a LISP-like language. Walking through the coursework in my head, I couldn't help but draw comparisons with PL/SQL.

There is a section in the PL/SQL User's Guide and Reference on recursion. It was there before I owned the book, and it's still there after I gave it up. Listening to UCB focus on functional programming from the get-go clarified for me exactly what questions that blurb about PL/SQL recursion is trying to answer.

Yes, PL/SQL can do recursion like Fibonacci sequences, where you're passing numbers around. But it isn't really set up for recursion in the LISP or Scheme sense, where you're recursing through a list by chopping elements off at the front or back, and possibly returning a value that is some new set of elements. When you have a PL/SQL collection, the only practical alternative is to loop through all the elements and do arithmetic on the subscripts. (After all, the P in PL/SQL is for "procedural".)

You could fake this kind of recursion to some degree, but PL/SQL's type strictness makes it awkward. Plus with the types of problems you're solving in PL/SQL, you like to visualize exactly how much memory is being allocated and keep things simple so your programs scale well and play nicely with other work happening on the same database server. Functional programming asks you to take a leap of faith that you won't run out of memory, and a logic bug that causes an infinite recursive loop is something to be expected during debugging. Database programming occasionally asks for the same type of faith -- say when querying several big tables joined together -- but the DB programmer expects to be able to look at the distribution of data and the optimizer plan for the query.

Another interesting aspect in Scheme is that functions are first-class items. You can pass a function (rather than a function result) around as a parameter, assign it to a variable, and use it as a member of a composite data structure. It's not quite like the Wild West days of function pointers in K&R C, but it's close.

PL/SQL stored procedures are intimately entangled with the database in ways that don't mesh with that way of thinking. Creating a PL/SQL subprogram is a relatively expensive operation: CREATE OR REPLACE PROCEDURE/FUNCTION is a DDL statement that might block if someone is calling the previous version of that subprogram; DDL commits, so your transaction is over; the source of the subprogram gets squirreled away in a form that can be queried; DDL is syntactically more complicated to issue from a PL/SQL subprogram than DML statements like INSERT or UPDATE. If you have a bunch of subprograms defined and just want to pass symbolic references, you might wind up passing around the procedure or function names as strings and calling them via:

execute immediate variable_name || parameter_list;

But with all the bad guys trying to find ways to hack into a database, that approach raises the possibility of SQL injection attacks, by stuffing something more elaborate into the string used with EXECUTE IMMEDIATE.

Another approach is to define SQL types that inherit different variations of the same subprogram. Instead of passing around a variable X and having your code call X(), you pass around an object X and your code calls X.MEMBER_FUNCTION(). This can do the job if you know in advance all the variations of MEMBER_FUNCTION and can declare them in your code. But the LISP/SCHEME notion of functions returning functions likes to be able to make such things up on the fly, which would involve EXECUTE IMMEDIATE statements running CREATE OR REPLACE statements with big blocks of code. Again, I would stay away from this approach for security and performance reasons.

Perl vs. PL/SQL: Regular Expressions

One recent addition to Oracle's SQL dialect, and also PL/SQL, is Perl-like regular expressions.

It's very convenient to declare a CHECK constraint as a regular expression, using the REGEXP_LIKE operator. The data has to match a pattern, or it doesn't get into the database. The pattern could be all caps or all lowercase, some set of allowed characters, alphanumeric values with certain patterns, and so on.

It's also very convenient to set up a trigger that does a replace using regular expressions, using REGEXP_REPLACE instead of a combination of SUBSTR, TRIM, and REPLACE stretching out over dozens of lines. Back-references are particularly useful, say for recognizing a flexible pattern within a block of HTML and enclosing that pattern in tags for highlighting.

A couple of things are not quite ideal yet in the PL/SQL implementation. I have found that running REGEXP_REPLACE on certain moderately large strings can fail (and I've opened a bug for that). I wish there were more metacharacters supported; in the name of NLS, some \ sequences familiar from Perl are left out. And I am having an issue that I haven't fully diagnosed yet, doing backreferences in a pattern containing multiply nested parentheses.

Don't have code at hand to post samples; I'll try to come back to that later.

Saturday, April 21, 2007

Perl vs. PL/SQL

When I started learning PL/SQL, I was gratified to find that my whole Perl way of working translated easily into doing stored procedures.

Most small Perl scripts that do transformations are built around a loop like this:

@lines = ; # read entire contents of file
for $line (@lines)
{
# do some search/replace on each line
# print the result, or write to another file
}

PL/SQL has similar compact notation for doing a query and looping through the results:

for item in (select * from some_table)
loop
# the name "item" is arbitrary; that variable doesn't need to be declared
# refer to columns using the notation .
end loop;

As the stored procedures get more elaborate, you can get into different syntax like the FORALL statement for issuing lots of INSERT/UPDATE/DELETE statements, or doing BULK COLLECT to pull the full results of a query into a PL/SQL collection variable to inspect the whole thing. But the construct above is fine for most typical query / reporting scenarios.

The similarity between the 2 languages is the reason that I've never used the Perl DBI modules for database work. It's too easy to write out real SQL statements or PL/SQL blocks using a HERE document, invoke SQL*Plus using backticks, do one-liners in Perl to just call PL/SQL stored procedures... just generally hand off the database logic. I even see parallels in the way object-oriented constructs were added to both languages in ways that don't especially appeal to me. (If I want to do object-oriented programming, Ruby is the most appealing choice for me.)

First thoughts about Python vs. PL/SQL

I successfully ignored Python for a long time -- it has the curse of starting with the letter "P", and I already know too many languages like that! But Google uses it a lot, there must be something to it, they even hired its creator Guido van Rossum. So when I saw this post from a Googler with a spelling checker in 21 lines of Python code, I thought that would be a good starting point. Long ago, I wrote a subset of this functionality (spelling correction by checking transpositions) in 64 lines of PL/SQL, but didn't find the general technique efficient enough to be worth putting into my production code. (Although if I devoted a whole server to nothing but that code, who knows.)

I'm only on my second pass through the Python tutorial, but already I'm gaining some perspectives on the language.

One good way to assess what a language is for is, what can you do with a one-liner. Python has some pretty good and easy to understand one-liners for composing and decomposing data structures. For example, in a search engine you might see a search term like:

"create table" oracle

and want to turn that into a list structure like:

create table
oracle

In Python, the decomposition part is a one-liner:

term = '"create table" oracle'
decompose = [x.strip() for x in term.split('"') if len(x) > 0]

(OK, you would do more processing than this to account for cases with no quotes in the term, but let's ignore that for the moment.) The FOR and IF parts of the expression above are really working like clauses in a SQL query. We're stepping through items in a data structure, deciding which ones to take, and running a transformation function on each one as it's pulled out.

That's a typical thing to do in a SQL query, where the original data structure is pretty solidly defined. You can also do it on an in-memory data structure using the PL/SQL construct TABLE(CAST ...)). But the PL/SQL technique is kind of obscure. You have to define local types, which makes the code verbose. I didn't know the details of this technique until after I stopped writing the PL/SQL Guide; I would see the idiom used but there was nowhere in the book that documented this syntax. (For examples of this PL/SQL usage, see these code listings.)

When I look at certain other languages, they clearly complement SQL and PL/SQL. You would handle in the other language the things that are difficult to do in the database, and let SQL and PL/SQL do the queries, joins, hashing, etc. for large-scale data manipulation.

Python strikes me as different, because a lot of its operations mirror typical database operations. I've never been comfortable with the attitude of doing all the logic on the client side and just using the database for a dumb storage engine; seen too much bad Java code to trust that the logic will produce squeaky clean data! But with Python I could imagine getting the logic right, serializing the data back into the DB, and manipulating it equally well in memory or on disk.

The one-line decomposition from my example could be done with PL/SQL. Just with more verbosity, various SUBSTR() functions and || concatenation operators, and yet more local type declarations. And those type declarations have to be moved around if the program expands; say you want to pass around procedure parameters of complex types, now you have to declare those types in a package. It's all just a big Turing machine after all, but there is something about the way a language puts barriers around certain operations and makes you reinvent the wheel with low-level code, that locks people into expectations of what is and isn't practical.

Another example. Python has a simple way to transform hash-like tuples into a true hash table:

tuples = [('Canada','Ottawa'), ('Greece','Athens')]
hash = dict(tuples)
print hash

gives:

{'Canada': 'Ottawa', 'Greece': 'Athens'}


Hash tables (aka associative arrays) are relatively new to PL/SQL. One feature that I really would like is the ability to query data directly into the key/value parts of an associative array, something like:

select country, capital bulk collect into ;

Right now, you can query all the countries into one array and all the capitals into another, but then you have to loop through them all and construct a new hash table from the 2 arrays. Seems like a logical shortcut to allow queries to produce hash tables of the form { : : }.

Database Language Smackdown

I've tried just about every language I could get my hands on, at one time or another. For the last decade, the big focus was usually to see how well they worked in conjunction with a database.

My big epiphany was taking ownership of the Oracle Database PL/SQL User's Guide and Reference, which I owned for the 9i release and have since relinquished. At the same time, I was writing what now comprises about 100KLOC of PL/SQL code for "Project Tahiti", the database that provides search, hypertext navigation, remote lookup, and related capabilities for the Oracle documentation. (Of course, as is often the case, my biggest insights about the strengths, weaknesses, and best practices for PL/SQL came after I stopped owning the book.)

One thing I always wanted to do was some compare-and-contrast essays for different languages, with my opinions of how they complemented Oracle's SQL dialect and PL/SQL procedural language, and the easiest way for a programmer in language X to approach PL/SQL.

So, that's what I'll do in a series of blog posts. Stay tuned.

Saturday, February 10, 2007

Separating PL/SQL Declarations and Assignments

For variables with default or derived values, I'm normally a big fan of declaring and assigning it in one go. In PL/SQL, that looks like:

create or replace procedure foo as
v1 varchar2(16) := 'hello';
n1 integer := 7;
cond1 varchar2(32) := case
when n1 > 3 then 'greater than 3'
when n1 < 3 then 'less than 3'
else 'exactly 3'
end;
begin
...

(That last technique with CASE is quite a powerful one by the way, you can encode a whole function's worth of logic into a single statement.)

However, I've found through the school of hard knocks that in many cases, it's better to declare the variable but not assign it until the code starts after BEGIN.

If the assignment inside a declaration causes an exception, your EXCEPTION block inside the same procedure or function won't catch it. That is, you'll get different behaviour from:

declare
v1 varchar2(8) := function_that_returns_a_long_string();
begin
...
exception
...

and:

declare
v1 varchar2(8);
begin
v1 := function_that_returns_a_long_string();
exception
...

The second example can trap the exception itself. In the first example, the exception could only be trapped by whatever called the procedure.

Another case is when you make an assignment that ends up inside a PL/SQL package. One of the good points of packages is that you can recompile the body, and none of the procedures or functions that calls the package is invalidated. However, if you put a declaration like:

v1 varchar2(32) := 'some string literal';

in the package spec, you can't change the value without recompiling the spec. So if you're writing code in a procedure that might one day evolve into a package, you might as well split up the declaration and the assignment now.

Thursday, January 18, 2007

SQL*Plus Tip: Turn a File into a String Literal

Consider for a moment this code, using SQL*Plus's '@' command to import a file:

select
'
@foo.htm;
'
from dual;

We've just turned the entire contents of foo.htm into a string literal. Of course, it won't work if foo.htm contains any single quote characters. No problem in 10g, using the alternative quoting mechanism:

select
q'{
@bar;
}'
from dual;

Now we're OK as long as the file doesn't contain the sequence }'. We could pick other characters if needed to avoid collisions with the file contents.

Think about this for a minute. Any file less than 32K could be turned into a PL/SQL VARCHAR2 variable. We could run REGEXP_REPLACE functions to substitute variable content inside the string.

This could be very useful. There is always some tension between coders and designers on web projects. Everyone wants to be able to freely edit their pieces without interference from the other. This technique allows someone to mark up a div, table, style, or other HTML block in its own file, and a PL/SQL coder to pick it up and use it without any extra overhead.

I've never been entirely comfortable with solutions that start with an HTML file and embed little code blocks inside, like with PL/SQL Server Pages. That seems to me to give too much opportunity for fatal error to the person editing the HTML file. When was the last time you asked somebody to make a tiny edit to an HTML file, and they only changed the part you wanted them to. My point exactly; chances are, they went wild changing all your list tags, adding style attributes, and who knows what else. I'd rather have my program logic pumping out divs, table cells, form fields, etc. with only the small block of tagging for each container subject to outside editing.

SQL*Plus Tip: @ vs. @@

I built huge amounts of PL/SQL code with SQL*Plus, using the '@' command to import source files, without ever running across the '@@' command. But one day I found a limitation in '@', and in one fell swoop changed all my SQL and PL/SQL scripts to use '@@' and have never looked back.

You need '@@' if you ever want to compile source code by running SQL*Plus from a different directory. For example, I had source files that would call each other like so:

@variables;
@file1;
@file2;

and it all worked fine, as long as I went into the directory where all the files were and ran SQL*Plus from there.

But to automate my build process, I started running various scripts from a different directory, such as:

sqlplus $credentials source/file1.sql

For file1.sql to be able to import file2.sql and so on, I needed to use '@@'. With '@@', all the import commands are processed relative to the directory where the original file sits, not the directory where you run SQL*Plus.

SQL*Plus Tip: Splitting Up Package Code

With PL/SQL packages, I run into the perennial problem. Logically, a package works best as a single huge thing, grouping perhaps hundreds of related procedures together. But logistically, source code sometimes works better as small individual files.

First up, always put the package spec and package body in separate files. That way, you can recompile the body without invalidating other procedures that call into the package.

You can take advantage of SQL*Plus's lack of a true parser by splitting big PL/SQL package files (typically the bodies) into smaller ones. The '@' and '@@' commands work inside a package, where you might not expect them to. For example:

create or replace package foo
as
@foo_declarations;
@foo_procedures;
@foo_functions;
end;

Tuesday, January 16, 2007

Tahiti Views, My Oracle Blog

I figured I should keep my work-related blog posts separate from my personal ones, so here's my new Oracle-related blog, "Tahiti Views". Project Tahiti is the code name for my work at Oracle, hardly a secret anymore since tahiti.oracle.com debuted in 2000.

Here I'll blog about all things Oracle, documentation, usability, web application development, and combinations thereof.

As you might suspect, all material here is my own personal opinion, not anything official from Oracle.