Tuesday, April 8, 2008

Unlikely Combos: Deleting Data

One of the challenges I see in understanding how the Oracle Database works is when features for similar tasks don't fall into neatly organized categories. It's obvious that this feature should be documented over here, and that one over there, but where can you discuss them side-by-side? Maybe the features are from different functional areas, and it's tough to find a single reviewer who can verify them all. Maybe the feature is being used in ways not originally intended, which prevents a technique from getting officially endorsed.

Well, that's where a blog comes in. Personal opinions all the way!

I'll write some posts with features that I think of as addressing similar tasks, falling along a spectrum of convenience, capacity, and complexity. First up: deleting data.

The basic method for deleting data is the SQL DELETE statement. Well duh! You can delete one or a zillion rows in a table, or a subset of rows based on simple or complicated conditions. Everyone knows that.

delete from the_table;
delete from the_table where col_value > 10;
delete from the_table where col_value in (select some_other_col from some_other_table);

But as the volume of data grows or the conditions get more tangled, other techniques start to look attractive.

Next up is the PL/SQL FORALL statement. You can use this loop-like technique in situations where normally you would issue a sequence of regular DELETE statements, because it's too complicated to construct an IN list or what have you. You'd put the construct inside an anonymous block, stored procedure, or trigger.

forall i in 1..the_count
  delete from the_table
  where col_value = collection_element(i);

In real life, your WHERE condition would probably be more complicated, with different clauses referencing the i'th value of different collections.

forall i in 1..the_count
  delete from the_table
  where col_value between lower_bound(i) and upper_bound(i)
  and col_string not like '%' || string_pattern(i) || '%';

When I would normally write a little Perl script to spit out thousands of DELETE statements, instead I make the script generate an anonymous block that sets up one or more big PL/SQL collections, then runs a little FORALL-DELETE loop at the end. FORALL batches all the requests, so there's a lot less network traffic and other overhead. You can use the same technique to speed up large groups of related INSERT or UPDATE statements too.
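For example, the generated block might look something like this. (the_table, col_value, and the hard-coded values are made-up stand-ins for whatever your script actually spits out.)

declare
  type num_tab is table of number index by pls_integer;
  ids num_tab;
begin
  ids(1) := 101;
  ids(2) := 205;
  ids(3) := 999;
  -- ...thousands more assignments generated by the script...
  forall i in 1..ids.count
    delete from the_table
    where col_value = ids(i);
  commit;
end;
/

The whole thing goes over the wire and runs as one unit, instead of thousands of round trips.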

Both DELETE and FORALL do all the nice Oracle behind-the-scenes bookkeeping that we all know and love. But sometimes, all you want is for a lot of data to be gone, ASAP. For example, in a data warehousing scenario, you might want to keep emptying a table and immediately re-filling it. That's where the TRUNCATE statement comes in. It skips some of the bookkeeping: it's DDL rather than DML, so you can't roll it back, it doesn't fire DELETE triggers, and the table's optimizer statistics go stale until you regather them. But it's fast.

truncate table the_table;

The opposite scenario is when you only have a little data to delete, but you want to delete it frequently. Maybe a stored procedure needs a little work area that isn't needed after the COMMIT, or after a web session finishes generating an HTML page and disconnects. That's when the temporary table comes in. It empties itself with no fuss, no muss, either when the transaction commits or when the session ends, your choice.

Oracle temporary tables are faster to populate and clear than normal tables, although they don't totally avoid behind-the-scenes overhead. The key thing to remember, if you're a SQL Server expert, is that creating and dropping them is just as expensive as for a regular table, so you want to create a single temporary table as part of application setup, and let it live forever. Oracle takes care of deleting the data when you're through with it, and also keeps different sessions from seeing each other's data. So several sessions can be messing with the same temporary table at once, and each only sees the data it put there.

-- To clear the table as soon as the transaction ends...
create global temporary table t1 (scratch_value number) on commit delete rows;
-- To clear the table as soon as the session ends...
create global temporary table t2 (key_value varchar2(16)) on commit preserve rows;
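With the ON COMMIT DELETE ROWS version, the cleanup happens at the end of the transaction, like so:

insert into t1 (scratch_value) values (42);
-- This session sees its one row; other sessions see nothing.
select count(*) from t1;
commit;
-- Now the row is gone, with no DELETE or TRUNCATE needed.
select count(*) from t1;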

OK, back to TRUNCATE. Sometimes, you want a lot of data to disappear quickly, but then the lack of bookkeeping bites you later. I've had tables where I truncated a couple of hundred thousand rows before loading new data, and then anything involving the new data was slow. I could go through the process of generating new stats, but then the whole operation takes as long as using DELETE for the whole table. Sometimes -- again usually in data warehousing situations where nobody else is using a table -- the practical thing to do is DROP TABLE followed by CREATE TABLE to recreate it.
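The sequence looks something like this. (staging_table and its columns are made up; in real life you'd also recreate any indexes and grants that went away with the DROP.)

drop table staging_table;
create table staging_table (
  load_date  date,
  col_value  number,
  col_string varchar2(100)
);
-- ...reload the data, then gather fresh stats:
exec dbms_stats.gather_table_stats(user, 'STAGING_TABLE')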

Another kind of table that you can think of like a temporary table is the external table. The data comes from an outside text file, so you can imagine that the data ceases to exist (within the Oracle sphere of control, anyway) as soon as you finish querying it. It takes up space on the filesystem, but not in the normal tablespace and datafile regime. I haven't personally used external tables, but I've considered them for data that I thought of as "read once and forget".
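Since I haven't used one myself, take this as a sketch rather than a recipe, but a definition looks roughly like this. (The directory path, file name, and columns are all made up, and the access parameters assume a simple comma-delimited file.)

create directory ext_dir as '/data/feeds';

create table ext_feed (
  col_value  number,
  col_string varchar2(100)
)
organization external (
  type oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    fields terminated by ','
  )
  location ('feed.csv')
);

After that, a plain SELECT against ext_feed reads the file on the fly; there's nothing to delete afterwards except the file itself.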

1 comment:

Gary Myers said...

Nice. If you're up for a followup, how about the term "space".
Had an issue raised by the DBAs once which, after being filtered up and down two layers of management, informed us developers that the database was "running low on space".
Was this a problem with OS files (log files, trace files, or whatever)? Was it space in a table or in a tablespace? Or did we need to shrink one tablespace to let another grow?
And this was before 10g and its recyclebin.