
I've always been intrigued by the FIRST_ROWS hint, so I paid special attention when we reached it in the 11g SQL Tuning class. But I'm still puzzled.
The course notes said that although you shouldn't be using hints generally, when you do, FIRST_ROWS is the most useful of the hints (in the form FIRST_ROWS(n) where you specify how many rows to optimize for). Also that it's not effective when the query block contains any "sorting or grouping" operations.
Now, I always assumed that this hint would be used like so:
select /*+ FIRST_ROWS(10) */ * from
(
select count(*) howmany, x from t group by x order by count(*) desc
)
where rownum < 11;
which would get you the top 10 items from a particular table. Notice that the ORDER BY is in an inner block, so presumably not covered by the "no sorts in the block with the hint" restriction.
However, when you put a WHERE ROWNUM < (N+1) clause in the query, the plan will show a line with COUNT STOPKEY that presumably means the optimizer knows only N rows are needed. So shouldn't the hint be a no-op in that case? I always figured there must be some other case where it's needed. I've been trying to find some authoritative information.
If the WHERE clause uses a variable, e.g. ROWNUM < N, maybe the hint is just a way of suggesting what the likely value of N is, for purposes of COUNT STOPKEY. But that's just speculation on my part.
I got one suggestion that the FIRST_ROWS hint makes the optimizer spend less time looking for the ideal query plan, since it's not going to traverse the whole table anyway, speeding up the hard parse phase. Could that be the meaning of "optimized for throughput"? One of the SQL internals guys didn't think so.
I got another suggestion that it could really change the execution plan, but you would use it not in a top N situation, rather where you wanted arbitrary values. E.g. I wonder if certain values in my table are uppercase or lowercase, so I'll just look at a few, don't care which ones (and in real life might be some devilishly complicated join):
select /*+ FIRST_ROWS */ my_col from my_table where rownum < 6;
In this scenario, I could imagine a change in the execution plan producing different results, depending on how exactly the query ran through the index. But I don't have confirmation that could ever happen.
If you are getting the top N values from an inner query block, maybe the hint helps figure out how to efficiently process the inner results, which after all aren't a real table with indexes and such. That's just my speculation again.
The
SQL Reference states:
The optimizer ignores this hint ... in SELECT statement blocks that include any blocking operations, such as sorts or groupings. However, I find explicit advice and examples featuring ORDER BY
here in the Oracle Text docs. The
PSOUG page on hints is a little more specific in its language -- it mentions a bunch of things that render FIRST_ROWS inoperative, including GROUP BY and DISTINCT but
not ORDER BY; and it also uses ORDER BY in its examples.
So, after much thought... I'm back where I started. FIRST_ROWS is useful for doing top N queries... or for filtering top N queries done inside an inner block... or only for getting arbitrary values, not top N at all. It does this by speeding up parsing... or choosing a different explain plan (which might or might not change the order since no sorts are allowed)... or just confirming to the COUNT STOPKEY operation how many results are likely to come back from WHERE ROWNUM < N.
Any performance mavens who can say from hands-on experience which, if any, of these likely stories are true?
Addendum
I will update the original post, as new answers (or new questions) come in, and with responses to suggestions in the comments.
One other top-N-almost-but-not-quite scenario I sometimes encounter is this: display the top N results, but if there is a tie for Nth place, keep going until you've exhausted all the tied values. It's common for traffic analysis on new or lightly visited sites, where many pages may be tied with 1 visitor, and the value 1 makes the top N rankings. The intuitive, strictly correct thing to do is to include
AND howmany >= (subquery to find the value of Nth item; then either bump up the value of N in
WHERE ROWNUM < (N+1), or leave out ROWNUM entirely and just stop fetching after getting the top N values plus all successive values tied for Nth. I've never delved deep into what is the best performing way to do this, since like I say it usually happens where relatively small amounts of data are being analyzed. But I could surmise that FIRST_ROWS might help in cases where you omit
WHERE ROWNUM < (N+1), yet you stop fetching before exhausting the whole result set.
One of the reasons I stopped including FIRST_ROWS hints in my new queries was that mod_plsql always processes the whole procedure before returning any HTML output. So if there is a query that takes 5 minutes, it doesn't matter if FIRST_ROWS starts sending back output after a few seconds; the user won't see anything until the full 5 minutes are elapsed. (Unlike the client-side GUI situation mentioned by one commenter, where presumably the output could start appearing before all the result set rows were available.)
The
Jonathan Lewis blog post referenced by a commenter provides some useful information, but still leaves me wanting more information. I feel like I need a FIRST_ROWS FAQ, with short direct answers -- even if some of them are "it depends". Here are my FIRST_ROWS FAQs and the best answers I have:
Q: Any version dependencies?
A: The optimizer is smarter in 9i, meaning less need for the hint. It's smarter again in 10g (especially as regards
WHERE ROWNUM < (N+1) so even less need for the hint.
Q: Any effect on parse time, does FIRST_ROWS result in faster hard parses?
A: Have heard both "think so" and "don't think so".
Q: Does ORDER BY totally negate the performance benefits?
A: It depends. Sometimes the query results come back from an index already in the ORDER BY order, in which case ORDER BY doesn't hurt the potential FIRST_ROWS optimization. If the FIRST_ROWS hint goes on the outer block and ORDER BY is on a subquery... don't know if FIRST_ROWS might still help, or if the sorting in the subquery has already lost the chance to make the outer query any faster.
Q: Are those longstanding Oracle Text examples with ORDER BY wrong or obsolete?
A: Don't know for sure, see above. Might depend on results already being sorted and ORDER BY being a no-op. Might be obsolete based on optimizer improvements in 9i and 10g. Might be immaterial when using Oracle Text for a web-based application through mod_plsql, due to the lack of incremental output.
Q: Any difference when a query includes
WHERE ROWNUM < (N+1) using a variable N instead of a constant?
A: Don't know if bind variable peeking plays any role here. I know that I am tempted to use e.g. FIRST_ROWS_10 in cases where the number of items is a variable, but the default or most common case is top 10. However, given all the other factors above (particularly the interaction with ORDER BY), this might just be superstition on my part.
The comments in the Jonathan Lewis blog post discuss different approaches to displaying "top N results" combined with "page X of Y" like in a search engine. I personally prefer the technique "figure out the top N * pages results, then skip all pages prior to the current one". What I do in the Tahiti search engine is just grab the ROWIDs of those top N * pages results, find the desired range of values, then query based on ROWID; so I'm only retrieving the actual data for the 10 or so on the page that's really displayed.