Sunday, November 2, 2008

Don't Fence Me In: All About Constraints


Constraints are simultaneously one of my most favorite and least favorite Oracle Database features. They're great for keeping bad data out of the database. They're a terrible imposition on object-oriented, agile, or <insert your favorite buzzword here> coding style. They save a ton of repetitive coding, writing the same logic in different languages. Hey, we already wrote all that redundant code in 10 languages and we'd hate for all those weeks of debugging to be wasted. Etc. etc.

If you're coming from a Java background, you might think of constraints like a straitjacket that forces your exception handling code to be organized a certain way. But that's the wrong way to think of it. Robust database code must TRY/CATCH for unexpected conditions anyway, in case the power goes out or the hard drive fills up just as you ask to do a completely innocuous INSERT or UPDATE. What if some maverick team member fires up SQL*Plus to make bulk updates to one table, bypassing all the Java application code and its error checking, without understanding the conventions that everyone agreed on for the data values, or the corresponding changes that must be made in other tables? The exception from a constraint is just another error that you expect never to happen, but you don't rely on it not happening.

If you're coming from a SQL, PL/SQL, or maybe more of a scripting background, or if your team programs in different languages or consists of fallible human beings, you can think of constraints like your very own bridge-keeper from "Monty Python and the Holy Grail". Whoever would insert or update table T must answer me these questions three, else an exception raised shall be.

For example, on a command like INSERT INTO GRAIL_KNIGHTS SELECT * FROM ROUND_TABLE_KNIGHTS...

  • If Sir Lancelot and Sir Robin both give the same answers, a UNIQUE constraint will catapult Sir Robin into the Gorge of Eternal Peril.
  • If Sir Galahad can't name a favorite color, a NOT NULL constraint will send him to the same fate.
  • If King Arthur gives the wrong airspeed velocity of an unladen swallow, a CHECK constraint will spell his end.

As a bonus...

  • If you don't count correctly when intoning the ritual, the Holy Hand Grenade of Antioch won't work because of its PRIMARY KEY constraint. (It's probably best to count to 3 via a sequence, and not write a psalm about that special value. In the database world, the ritual is different every time.)
  • If you are a French knight and receive a gift of a Trojan Rabbit, a FOREIGN KEY constraint will keep it out of your fort.

When I first joined Oracle, almost the first thing I had to do was document constraints in 8i, in the Application Developer's Guide - Fundamentals. That was a frustrating experience for me as an Oracle newbie. There was information about "benefits of constraints" -- but who exactly were we trying to convince, or was that just marketing fluff? You could set up constraints with CREATE TABLE or ALTER TABLE -- why one or the other? Who exactly would be issuing all these DDL statements to work with constraints, a developer or a DBA? Here was technology leaps and bounds ahead of the competition, but as a result, there was maybe less depth of understanding about how that feature would be used in the real world.

Years later, now I (think I) know some of the answers, but it's too late to rearrange and rewrite the documentation to my whims. (I make suggestions with the reader comment forms in the 11g library, same as anyone else.)

The "benefits" information is aimed at the crowd who like to put all their error-checking logic in the middle tier. If anything, I'd like to have more simple examples to extol the benefits, particularly for scenarios involving Java client code, to get a feel for best practices for exception handling.

The CREATE TABLE syntax is used by the experienced enterprise types who plan everything out in advance. The ALTER TABLE syntax is used by those who came around to the idea of constraints after their coding was already underway, or who want to experiment before deciding. (I call this the "try it and see" style of development.) I'd like to be able to bridge from one style to the other: here is the ALTER TABLE syntax, and if you decide this is a worthy change, here's the equivalent CREATE TABLE syntax to make in your installation script.

There's at least one counterintuitive case, where careful advance planning requires you to use ALTER TABLE instead of CREATE TABLE. If you want to periodically turn off constraint checking for a primary key, you must create the table without the primary key clause, create an index on the primary key column, and do an ALTER TABLE to add the primary key. The primary key will then use your index, which must not be a UNIQUE index so that later you can "defer" the primary key checking.

The extra keywords in the CREATE TABLE / ALTER TABLE syntax for constraints indicate how dogmatic you are about keeping bad data out of your tables. DEFERRABLE means you may some time in the future allow bad data in the table for a brief period, but you're confident you can fix any problems before committing. INITIALLY DEFERRED means you expect to do this cleanup the very first time you insert data, or that the table already contains bad data and you'll clean it up immediately after creating the constraint. NOVALIDATE means you don't want to test the data that's already in the table -- either you're really confident that it's good, or you don't care if there are problems -- but you do want to check future data that's inserted or updated.

Once upon a time, you would do ALTER SESSION SET CONSTRAINTS = DEFERRED. These days, you could instead use SQL, as in SET CONSTRAINTS ALL DEFERRED or SET CONSTRAINT MY_TABLE_FK DEFERRED.

When/why would you use all of those keywords? I guess it boils down to a combination of convenience and performance. Consider a development scenario, where you insert personal data that has minor problems with the data, perhaps bad capitalization in some of the names. You could run a fixer function as the data goes into the table, either by calling the function in the INSERT statement, coding it into a SQL*Plus datafile, or by setting up a trigger. But (a) each of those is less convenient than just loading the data and then running 1-2 UPDATE statements afterward, and (b) maybe you want to start crunching numbers that don't rely on the textual columns ASAP, so you don't want the cleanup processing to slow down the initial data load.

My favorite analogy here is to XML processing. We've all been given XML documents that were invalid for some ticky-tacky reason... an unquoted attribute like width=100, or an uppercase tag name like <P> in an XHTML document that requires lowercase tags. It's cumbersome to do anything with those documents as XML, the "constraint checking" is so tightly integrated into the parsers. Oracle SQL lets you choose how "constraining" such easily detected/fixed data inconsistencies should be -- fatal error, try again, or don't check until I've cleaned up the data.

The ability to rename constraints is another offshoot of the "try it and see" development methodology. If you create a constraint without giving it a name, it might be years before bit rot sets in and somebody actually tries to insert bad data. At that point, the error message will cite some unintelligible name like SYS_C000123. After some detective work querying the USER_/DBA_/ALL_CONSTRAINTS views, you'll track down that this name refers to some primary key, check constraint, or what have you, and you'll use the ALTER TABLE ... RENAME CONSTRAINT syntax to give the constraint a descriptive name, so future error messages will be easier to understand. Again, I haven't found a clear path in the documentation that leads from troubleshooting examples of ALTER TABLE syntax, back to the equivalent CREATE TABLE examples or syntax so you can plan ahead next time.

As to who actually needs the information about constraints, that's still fuzzy for me. The world is full of application developers who don't take enough advantage of stored procedures, triggers, and constraints. It's also full of DBAs who aren't allowed to poke around with business logic in application code. So anywhere the constraint best practices are documented, it never feels to me like they are reaching quite the right receptive and empowered audience. There's no "Oracle Database Jack-of-All-Trades User's Guide" for the person who fills both developer and DBA roles.

Further reading:

1 comment:

Brian Tkatch said...

Wow, good read.

Data types are nothing but formalized constraints. Imagine:

CREATE TABLE Tahiti_Views(
What CHECK (What IS DATA_TYPE VARCHAR2(100), When CHECK(When IS DATA_TYPE DATE); Ranges, FKs, and all the other were added later, and are considered a separate group. Too bad.

As for the difference between CREATE and ALTER, i like your definition; i think it is pretty much it. I would add that ALTER is used so script generation does not have to deal with dependencies. Just make all FKs in ALTER statements after all TABLEs have been CREATEd.

I have spoken to coders as to why they want to put rules in thier code as opposed to CONSTRAINTs in the DB. It boiled down to control. The coder wants to control everything. And wants other coders who can touch the DB to also control everything. That means the constraints are in the code. The DBA wants the DB to be reliable. That requires CONSTRAINTs in the DB itself. On the MBTI, this is basically a P/J difference. Ps want it in the code, Js want it in the DB. I tell coders that we should have a deal. I'll respect your program designs even if i disagree, but you should respect my decisions in the DB even if you disagree. Then i add CONSTRAINTs wherever i can. If only to force definitions.

DEFFERABLE CONSTRAINTs are at times a must. Imagine an INSERT ALL statement that INSERTs INTO two TABLEs that are FKed on a generated id. This can generate FK errors if Oracle INSERTs the data out of order. INSERTing them separately is impossible because the generated id must be consistent but there is no way to know which record got which id in the first TABLE to use the same id in the second TABLE. Unless a LOOP is used to generate one id and INSERT INTO both TABLEs one record at a time, but that is inefficient.

NOVALIDATE is also good when there is historical data before the CONSTRAINT was kept. The easiest way to not change the data but ADD the CONSTRAINT would be to use NOVALIDATE for the old data. Perhaps that cases does not come up often, but it is useful.

Commenting on why a CONSTRAINT is needed is a good idea. TRIGGERs, however, can include comments. Then again, TRIGGERs should be avoided in most cases.