Friday, July 20, 2007

PL/SQL Web Programming 1.0 - The URL is the API

Back in the transition between 8i and 9i, I almost had the chance to write a book about developing database-driven web apps for Oracle. But with so much web stuff moving to Java and the middle tier, it was too big and amorphous of a subject. I wrote about specific features, like PL/SQL Server Pages, but not a whole soup-to-nuts treatment.

Ironically, now that things are more sorted out, I know the subject a million times better from writing code, but I'm no longer writing doc. So my (unofficial) thoughts on the subject go here on the blog.

The notion that will really get you started with PL/SQL web programming is, realize that with typical PL/SQL web apps, your API is really the URL. You write stored procedures or packages as normal, and a component in the web server translates between URL format and the PL/SQL parameter types. If you have a full-blown Application Server installation, that component is the mod_plsql module. If you take the simpler way to web-enable your database, e.g. for a departmental application, that component is called the Embedded PL/SQL Gateway and comes with the database.

What do I mean by "the URL is the API"? Well, by looking at the URL, you can see how the application code is structured. Consider this URL that I own on Oracle.com:

http://www.oracle.com/pls/db102/homepage

/pls is a common convention for the virtual path under which all the URLs represent PL/SQL stored procedures, similar to the way you see /cgi-bin for Perl or other types of scripts.

db102 means the stored procedure is owned by the DB102 schema.

homepage means that's the name of the stored procedure. It could also be a synonym pointing to the stored procedure. If the stored procedure was part of a package, the last part of the URL would be pkg_name.proc_name instead of just proc_name.

I find it a little annoying the way dots are significant in the last part of the URL, because it keeps me from using URLs that end with "filename.ext", as is typical of other web development environments. There's always the Apache module mod_rewrite to mangle the paths of URLs into some friendlier form, but that introduces complexity, and involves coordination with other groups if you don't own the web server.

But the path part of the URL is just the appetizer. The real API is the query string, the part of the URL after the "?" character. For example, in a URL like this:

http://www.oracle.com/pls/db102/search?remark=quick_search&word=john+russell&tab_id=&format=ranked

the part after the ? represents the parameters to the procedure. (The top-level subprograms that you call to generate a page in a PL/SQL web app are always procedures, not functions.) We can see that the procedure takes a REMARK parameter, a WORD, parameter, a TAB_ID parameter, and a FORMAT parameter.

Perhaps there are other parameters, but if so, they must have default values, which makes them optional for the URL. We can't infer anything about the order of the parameters; they can be specified in the URL in a different order than they're declared in the procedure, and mod_plsql takes care of it.

Each parameter is part of a name/value pair separated by the = sign. The & character is the separator between each name/value pair. Since there's nothing between the = sign and the & character following TAB_ID=, that parameter is getting passed a null.

The + in the parameter value "john+russell" above represents a space, in the standard web URL encoding scheme. Because we need a way to pass special characters like &, =, space, /, characters outside the ASCII range, etc. we can represent them as %xx, where xx is 2 hex digits representing the character code. Space is represented by either + or %20 (ASCII 32). And it follows that real % and + characters are represented by their own %xx sequences.

Your PL/SQL procedure can receive VARCHAR2 values with all of these characters with no trouble, the mod_plsql module translates all the values for you when it invokes your procedure. But if you generate a page of HTML from a PL/SQL procedure, and that page includes links to other PL/SQL procedures including parameters, then you do have to URL-encode the value part of each name/value pair.

Encoding most special characters is a relatively straightforward matter, but you have to take care with the characters that are special within the URL. For example, you wouldn't URL-encode the part of the URL that normally includes / characters, or up to the ? character that begins the parameters. (That last portion of the URL is known as the "query string".) Within the query string, you wouldn't URL-encode the = characters in the name/value pairs, and you especially wouldn't URL-encode the & characters that separate the name/value pairs. In fact, to be strictly correct, you must turn each & separator character into the sequence &, so that the browser or whatever user agent parses your HTML file doesn't find a lone unescaped & character in the middle of a string literal. That makes for invalid XML and will trip you up when you start using XHTML content-type declarations, strict file validators or link checkers, and so on.

The & business is the trickiest part of basic PL/SQL web development. Normally the & character is used by SQL*Plus for substitution variables. So even to use it in a program compiled by SQL*Plus, you either have to issue SET DEFINE OFF to turn off substitution variables entirely, or do something like

define amp = &
define urlamp = '&'

and put &. in your strings everywhere you want a normal &, and &urlamp. everywhere you want &. So your search-and-replace code that replaces & with & winds up looking like:

expanded_separators := replace(uneexpanded_separators,'&.','&urlamp.');

When a link coded like <a href="proc?parm1=val1&parm2=val2"> is displayed on a web page, any preview you see in the status bar will just show you regular & characters. There is a bit of a Catch-22 in that some older crawler software will fail to parse the & characters correctly, so your Apache log might fill up with invalid URL requests even though you coded the links according to the standards.

You must order any substitutions so that once you have introduced & or % characters into your encoded string, you don't substitute those characters a second time.

So, let's deconstruct an imaginary URL:

http://www.example.com/pls/my_schema/my_proc?parm3=val3&parm1=weird+string%28value%29%26%3D%2F%3C%3E

That means somewhere in your code, you have CREATE OR REPLACE PROCEDURE MY_PROC...

We put param3 first in the URL because the order of parameters in the URL doesn't need to match the order of parameters in the procedure; you just have to fill in values for all parameters that don't have default values.

The fact that we omitted parm2=val2 means the procedure will just use the default value for that parameter.

The value for parm1 includes some unusual characters that get quoted as + (space) or %xx (punctuation and non-ASCII characters). Your procedure doesn't have to decode the escape sequences, but it does have to encode them if you write out such a link.

The above URL with the & characters is what you would literally open up in your browser. But if you were writing out a link tag in your own page, you would write it as:

http://www.example.com/pls/my_schema/my_proc?parm3=val3&parm1=weird+string%28value%29%26%3D%2F%3C%3E

with & instead of & as the separator.

What about datatypes, you ask? mod_plsql will helpfully convert values to PL/SQL datatypes where possible. Most parameters can be declared VARCHAR2 if you want. NUMBER and INTEGER are also OK.

I stay away from anything with too exotic a representation, e.g. DATE. Better to pass it as a VARCHAR2, then call TO_DATE(param_value, 'format_string') inside the procedure. If there is a problem with the conversion inside the procedure, you can trap the exception and handle the error yourself. But if the error happens while mod_plsql is converting the value to the type of your parameter, you can't trap the exception and you will see an unhelpful (to the user) PL/SQL stack dump instead of the regular page.

Boolean is one common type that isn't supported for parameters in a mod_plsql URL. Use INTEGER and pass a 0 or 1 instead. The non-support for Boolean is handy in that you can put a required Boolean parameter on a procedure, which ensures that it can't be called unexpectedly by someone constructing a mod_plsql URL, trying to hack into your pages. Although I wouldn't rely on that as the primary security for a procedure, since mod_plsql could always add support for Boolean parameters.

You also commonly see URLs with the same parameter repeated multiple times, such as:

http://www.example.com/pls/my_schema/my_proc?name=Joe&name=Fred&name=Sue

It feels intuitive that the procedure is getting an array-like data structure with 3 elements. But what type should it be exactly? That's not quite so intuitive. If you use a TYPE declaration to make a new kind of VARRAY in your own procedure, mod_plsql can't see it early enough for you to declare parameters of that type. So what you really need is a package that declares a type that's a VARRAY or nested table of VARCHAR2s, then declare the procedure parameter as YOUR_PACKAGE.YOUR_TYPE. Luckily, there is a predefined type like this, OWA_UTIL.IDENT_ARR. But the elements of that type aren't especially long, so if you need to pass sizeable values, you'll need to make your own type.

A further complication comes if you want to make a multiple-occurrence parameter optional, that is, to make both of these URLs work:

http://www.example.com/pls/my_schema/my_proc?name=Joe&name=Fred&name=Sue
http://www.example.com/pls/my_schema/my_proc?nickname=Wheezy

You need to declare a default value in your procedure for the NAME parameter, but how do you do that for a type that you just declared? You can't just make the default be NULL. You need to make a real but empty instance of that type.

The answer is to declare inside the same package as the type, an empty instance of that type (for index-by tables a.k.a. associative arrays), or a function that returns an empty instance of that type (for nested tables or varrays). Something like:

-- In the package spec.
type type1_t is table of varchar2(100);

type type2_t is table of varchar2(100) index by pls_integer;
empty_type2 type2_t;

-- In the package body.
function empty_type1 return type1_t is
begin
return type1_t();
end;

Then you can declare your procedure like so:

create or replace procedure my_proc
(
name my_package.type1_t := my_package.empty_type1(), -- Construct nested table with no elements
nickname varchar2(100) := null
)
...

or

create or replace procedure my_proc
(
name my_package.type2_t := my_package.empty_type2, -- Construct associative array with no elements
nickname varchar2(100) := null
)
...

With parameters set up like this, you can call the procedure with one or more NAME values, or leave out the NAME parameter and pass NICKNAME instead.

A useful pattern for procedures that generate a web page is to have null or empty defaults for all parameters, and make the procedure display a minimal page (typically with HTML form elements) when invoked with no parameters, and display a more elaborate page when invoked with parameters. For example, a search page invoked with no parameters can display just the search box. When invoked with parameters, it can display results too. Instead of pointing the form tag to a different URL, point it to the same procedure. For example, if the procedure name is SEARCH, generate a form tag like so:

<form action="search" ...>

This example also illustrates how procedures within the same schema can link to each other without elaborate paths or qualifiers in the URLs. Just put the procedure name in the action= attribute of a form tag, or the procedure name + the query string in the href= attribute of an anchor ("a") tag.

Planning and preparation like this is all well and good, but how do you actually generate the HTML for the page? Ah, that's another post coming up!

4 comments:

Dimitri Gielis said...

Hi John,

Did you already have a look at Oracle Application Express (APEX)?

You can see that as a whole API to build PLSQL web applications ;-)

Ofcourse it goes a lot further...

Dimitri

Charles D said...

One way to avoid exposing all this is to use AJAX techniques. I've done some stuff where the main page is static html and all the guts are hidden in javascript. I also find that the more I do of this stuff, the more I prefer writing good ol' HTML and wrapping it in htp.p(). One friend of mine has applications that store the HTML in the database and just uses htp.p to assemble the page.

John Russell said...

Dimitri, yes, I have used Application Express and like it. I'll get to that a little in a followup posting, about the different ways to actually generate the pages. This first posting is to lay some groundwork, which will be useful whether you roll your own pages, use APEX, or integrate your own procedures with APEX pages.

John Russell said...

D.L., yes, AJAX is a subject for a followup post. Notice I put "1.0" in the title of this one, when you're talking AJAX you're talking "2.0". :-)