Thursday, May 31, 2007

Less is More, More or Less

Do you fret every time you declare a PL/SQL variable, that you've carelessly allocated a few unnecessary bytes, and that this wastage will be replicated thousands of times as your code is called from a bazillion sessions, and that that will be the straw that breaks the camel's back and makes your server thrash to a halt?

Well, calm down. It's not that bad.

If you don't know exactly how much space is needed for a PL/SQL variable, the most space-efficient thing you can do is counterintuitive: declare a gigantic variable. For example:
email_address varchar2(32); -- Could give error from being too small
email_address2 varchar2(512); -- Probably no error, but wastes space
email_address3 varchar2(32000); -- No error, no wasted space

Once your variable is declared as 2000 characters or more, the variable is no longer statically allocated to accomodate its maximum size; instead, it's dynamically allocated every time it's assigned, based on the exact size that's really needed.
email_address2 := ''; -- Rest of 512 characters are wasted empty space
email_address3 := ''; -- Although declared with length 32000, only 24 characters are allocated

VARCHAR2s declared with huge lengths make great stocking stuffers, such as collection elements. You can make complex data structures that potentially can represent big documents (each line up to 32K), yet don't take up any more space than is really needed.

Now let's look at a complementary technique, where again we keep the allocated size under control.

When you want a variable to hold the contents of a table column, there is a simple way to match the right length:
var1 my_table.my_column%type;

If my_table.my_column is declared VARCHAR2(600), then that's what var1 will be too. No worries about selecting a value from the table and getting a length error. If you run ALTER TABLE to change the length of the column, next time your code is run, it will pick up that change automatically and adjust to the new length.

Maybe your selects are not quite so simple. Say you want to select col1 || col2 and store the results in a variable. How...?
cursor my_cursor is select col1 || col2 concatenated from my_table;
var2 my_cursor%rowtype;

We let Oracle figure out the maximum length of the concatenated columns, then piggyback on that result to make a variable of the same type. (Actually, it's a record variable that has a field of the same type; to refer to the field, we'd specify var2.concatenated.) Again, we're safe from ALTER TABLE.


Eddie Awad linked to this post from his blog, and I see that post attracted some comments that I should address here.

Looks like, in 10g, the limit has been raised from 2000 characters to 4000 characters. I like it when PL/SQL limits align with SQL limits, and the number 4000 is significant because that's the biggest VARCHAR2 you can stuff in a table. So you don't have to worry about declaring a variable of type TABLE_NAME.VARCHAR2_COLUMN%TYPE and having this kind of dynamic allocation happen or not depending on the size of the column.

Anyway, I'll just reiterate my advice from the PL/SQL Guide: don't try to finesse the VARCHAR2 length depending on knowledge of the exact limit; if you prefer dynamic allocation for a variable, use a huge length. I always use 32000. A PL/SQL guru I know uses 32500. (I haven't seen anyone use 3276(7|8), proof that nobody likes to tempt the gods of boundary conditions.)

Performance guru Jonathan Lewis discusses some detailed performance / scalability implications of using big or different-sized VARCHAR2 variables. The kind of code that would use huge VARCHAR2s, in my experience anyway, is not inserting into or querying those values from SQL tables; rather it's building data structures like the buffers used in HTP.P() or DBMS_OUTPUT.PUT_LINE(). So I'll keep that performance tip in mind for DML-intensive code, but not worry so much for code that mainly does string manipulation and displays the results as a web page via mod_plsql.

Friday, May 18, 2007

Quotable Quotes

Spare a kind thought for the hard-working yet underappreciated quotation mark. Single quotes or double quotes, you ask? Exactly!

If you're familiar with other languages such as Perl or Javascript, you probably intermix single and double quotes as needed. Double quotes around strings containing single quotes. Single quotes around Perl strings containing $ or @ characters, so they aren't interpolated as embedded variables.

With PL/SQL, you'll almost always use single quotes. The main exception is around identifier names, which you can enclose in double quotes if they contain special characters or if you need to force them into mixed case or lowercase:

execute immediate 'create table "My_Table" ...';

Without those double quotes, the new table would be known internally (in the USER_TABLES view, etc.) as MY_TABLE. The double quotes force the object name to be exactly as entered, meaning you would get an error from "select * from MY_TABLE" or "select * from my_table". (In this last case, the name my_table is translated silently to MY_TABLE.)

So, that's it then. Use single quotes for string literals, nothing more to learn.

var1 := 'Hello world!';

But what happens when we start dealing with real-world cases?

var2 := 'Oracle Database Administrator''s Guide';

Wait a second! What's with the 2 consecutive single quotes? That's the notation to escape a single quote inside a string literal. What a bother. If you want to generate a text file with a bunch of insert statements, you have to run a search/replace on each string to create 2 of each single quote. And you have to do it before wrapping single quotes around the data. When done in PL/SQL, this leads to one of my favourite PL/SQL statements of all time:

var3 := replace(var2,'''','''''');

We are only replacing 1 single quote with 2 single quotes, but between the surrounding quotes for the literals and the doubled quotes to escape them inside the literals, it looks like an entry in the Obfuscated C Contest.

Luckily, in Oracle 10g there is a way to avoid all the escaping. It still requires you to predict what characters your string might contain, but within that constraint it is much more readable:

var4 := q'{Oracle Database Administrator's Guide}';
var5 := q'[I'm the king of the world!]';
var6 := q'#Mommy's all right, Daddy's all right...#';

With the q literal notation, the first and last characters inside the quotes are not part of the string value, and single quotes are OK since you are making sure that the closing sequence (in the examples above, }', ]', and #') do not appear anywhere in the string. If the opening character after the q' is a "bracket"-style character, the matching end character is the corresponding close bracket. If the opening character is some other punctuation mark or what have you, the same character is used immediately before a single quote to end the literal.

Where this really comes into play is in writing out code of one kind or another from PL/SQL. In a web application, you might be writing out HTML full of double quotes, Javascript full of single quotes, or a combination:

obj.innerHTML = '<div style="float: right;">';

It's very comforting to know you can wrap the whole thing in a PL/SQL string literal without doing any replacements inside the string:

var7 := q'|obj.innerHTML = '<div style="float: right;">';|';

Again, you'll have to know based on your own coding conventions or a priori that the closing delimiter character doesn't appear immediately before a single quote.

I like to combine this technique with a SQL*Plus trick I mentioned in an earlier post:

var8 :=

Now that Javascript is available to PL/SQL and can be plunked into a web page, concatenated with other source, searched/replaced, etc. It doesn't have to contain any escaped '' sequences and so can be reused with other languages, edited by a non-PL/SQL programmer, run through a Javascript validator, or anything else that strikes your fancy.

Sunday, May 6, 2007

The Humble IF Statement

If I have one piece of wisdom to impart, it's this: pay attention to your IF statements!

Seems like the hardest part of knowing a dozen languages is keeping track of the different syntaxes for simple IF statements. (I took a tour of an Internet startup pre-dotcom crash, and saw my PL/SQL User's Guide open on a desk -- not to BULK COLLECT or the exception model, but to the page for the IF statement.)

PL/SQL is relatively verbose but straightforward: IF this THEN that END IF;

Python is a little more concise, leaving aside the line breaks and indentation:

if this:
elif the_other:

I could wish for a little more Perl-style forgiveness; I keep leaving "then" in by mistake.

My simultaneous favourite and least-favourite IF syntax is from Korn shell:

if [ this ]; then

"then" is its own insignificant statement, and so requires the semicolon after the condition to format the lines the intuitive way. The conditions are a weird mix of UNIX-style flags (-f "$filename" to check for file existence) and FORTRAN-style comparison operators ("$string1" eq "$string2").

I can never remember whether the semicolon is needed, or the details of the comparison operators, or sometimes even whether the condition is enclosed in [ ] or [[ ]]. Don't forget that spaces are needed after the [ and before the ], it's an error otherwise. So for me, ksh is the toughest language to switch back into on a moment's notice.

Now this post isn't just a language comparison. There are some shortcuts you can do with IF logic in PL/SQL.

If you find yourself writing statements like:

if x is not null then
y := x;
y := 'some default value';
end if; can use the NVL() function instead:

y := nvl(x,'some default value');

Because this is a SQL function too, you can use it in queries the same way:

select name, course, nvl(grade,'Incomplete') grade from coursework;

For any row where GRADE is null, the value that comes back will be 'Incomplete'. Since we're plugging a function in instead of a column value, the trailing "grade" is helpful to give a name to the expression value, in case we wanted to wrap the query into a nested query or implicit FOR loop in PL/SQL.

The next level of IF is the CASE expression. This lets you replace a whole sequence of IF/THEN/ELSE clauses with a single expression:

letter := case
when grade between 85 and 100 then 'A'
when grade between 75 and 84 then 'B'
else 'Kiss the honor roll goodbye'

Old-time PL/SQL programmers might be in the habit of doing:

select nvl(something,something_else) into variable from dual;

as a way to do this kind of IF-NULL-THEN assignment for a single value. You can get rid of the query by using NVL in a regular PL/SQL assignment, or (my preference) turning the NVL into a CASE.

It's not the compactness that makes CASE a win for me, it's the fact that (as an expression) you can use it in the middle of some other statement where normally it would break up the flow of thought or require you to create dummy variables to hold intermediate values:

something varchar2(32000) :=
when release = 1 then 'value for release 1'
when release = 2 then 'value for release 2'
else 'some other value'

procedure my_proc
param1 varchar2 := 'xyz',
param2 varchar2 := 'abc',
param3 number := case when to_char(sysdate,'Day') = 'Tuesday' then 7 else 2 end
) is ...

PL/SQL's CASE comes in 2 forms, the expression that you can plug in anywhere as shown above, and the statement that must stand alone and ends with END CASE instead of just END:

when percent_full < 75 then
when percent_full between 75 and 99 then
when percent_full = 100 or cpu_too_high = true then
dbms_output.put_line('Some anomalous condition!');
end case;

The statement format lets you do multiple things in response to any condition, and you can test different variables in different WHEN clauses. You aren't just returning a value, but running full statements in each WHEN block. I don't use the statement format very often though. I find it is a little confusing to distinguish between the two forms (usually by checking whether the last part is END or END CASE) when reading existing source. And if you don't properly account for all possibilities in a CASE statement and execution falls off the end, you get a CASE_NOT_FOUND exception, so it requires extra care in constructing the conditions to always match one of the WHEN conditions. (You can trap the exception, but it can be a shock the first time you get one when you hadn't intended the CASE statement to cover every possibility.)

Saturday, May 5, 2007

PL/SQL vs. LISP, Scheme, and friends

Getting my Comp. Sci. degree from about as far away from Silicon Valley as you can get in North America, I was curious what the prestigious US schools teach that's so great. So just for kicks, I started going through podcasts for the CS courses at UC Berkeley. The teaching language they start out with is Scheme, a LISP-like language. Walking through the coursework in my head, I couldn't help but draw comparisons with PL/SQL.

There is a section in the PL/SQL User's Guide and Reference on recursion. It was there before I owned the book, and it's still there after I gave it up. Listening to UCB focus on functional programming from the get-go clarified for me exactly what questions that blurb about PL/SQL recursion is trying to answer.

Yes, PL/SQL can do recursion like Fibonacci sequences, where you're passing numbers around. But it isn't really set up for recursion in the LISP or Scheme sense, where you're recursing through a list by chopping elements off at the front or back, and possibly returning a value that is some new set of elements. When you have a PL/SQL collection, the only practical alternative is to loop through all the elements and do arithmetic on the subscripts. (After all, the P in PL/SQL is for "procedural".)

You could fake this kind of recursion to some degree, but PL/SQL's type strictness makes it awkward. Plus with the types of problems you're solving in PL/SQL, you like to visualize exactly how much memory is being allocated and keep things simple so your programs scale well and play nicely with other work happening on the same database server. Functional programming asks you to take a leap of faith that you won't run out of memory, and a logic bug that causes an infinite recursive loop is something to be expected during debugging. Database programming occasionally asks for the same type of faith -- say when querying several big tables joined together -- but the DB programmer expects to be able to look at the distribution of data and the optimizer plan for the query.

Another interesting aspect in Scheme is that functions are first-class items. You can pass a function (rather than a function result) around as a parameter, assign it to a variable, and use it as a member of a composite data structure. It's not quite like the Wild West days of function pointers in K&R C, but it's close.

PL/SQL stored procedures are intimately entangled with the database in ways that don't mesh with that way of thinking. Creating a PL/SQL subprogram is a relatively expensive operation: CREATE OR REPLACE PROCEDURE/FUNCTION is a DDL statement that might block if someone is calling the previous version of that subprogram; DDL commits, so your transaction is over; the source of the subprogram gets squirreled away in a form that can be queried; DDL is syntactically more complicated to issue from a PL/SQL subprogram than DML statements like INSERT or UPDATE. If you have a bunch of subprograms defined and just want to pass symbolic references, you might wind up passing around the procedure or function names as strings and calling them via:

execute immediate variable_name || parameter_list;

But with all the bad guys trying to find ways to hack into a database, that approach raises the possibility of SQL injection attacks, by stuffing something more elaborate into the string used with EXECUTE IMMEDIATE.

Another approach is to define SQL types that inherit different variations of the same subprogram. Instead of passing around a variable X and having your code call X(), you pass around an object X and your code calls X.MEMBER_FUNCTION(). This can do the job if you know in advance all the variations of MEMBER_FUNCTION and can declare them in your code. But the LISP/SCHEME notion of functions returning functions likes to be able to make such things up on the fly, which would involve EXECUTE IMMEDIATE statements running CREATE OR REPLACE statements with big blocks of code. Again, I would stay away from this approach for security and performance reasons.

Perl vs. PL/SQL: Regular Expressions

One recent addition to Oracle's SQL dialect, and also PL/SQL, is Perl-like regular expressions.

It's very convenient to declare a CHECK constraint as a regular expression, using the REGEXP_LIKE operator. The data has to match a pattern, or it doesn't get into the database. The pattern could be all caps or all lowercase, some set of allowed characters, alphanumeric values with certain patterns, and so on.

It's also very convenient to set up a trigger that does a replace using regular expressions, using REGEXP_REPLACE instead of a combination of SUBSTR, TRIM, and REPLACE stretching out over dozens of lines. Back-references are particularly useful, say for recognizing a flexible pattern within a block of HTML and enclosing that pattern in tags for highlighting.

A couple of things are not quite ideal yet in the PL/SQL implementation. I have found that running REGEXP_REPLACE on certain moderately large strings can fail (and I've opened a bug for that). I wish there were more metacharacters supported; in the name of NLS, some \ sequences familiar from Perl are left out. And I am having an issue that I haven't fully diagnosed yet, doing backreferences in a pattern containing multiply nested parentheses.

Don't have code at hand to post samples; I'll try to come back to that later.