Random notes on database features, code, practice, optimization and techniques.
Recently, there was this interesting case where deleting duplicate rows from a large history table (10M+ rows) in batches of 1,000 was super slow.
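The driver loop itself isn't shown in the traces below, but a batched duplicate-delete like this would typically be run from a PL/SQL loop along these lines (a sketch, using the table and column names from the trace):

BEGIN
  LOOP
    -- keep the row with the lowest ROWID for each key; delete at most 1000 others per pass
    DELETE FROM E1
    WHERE E1.ROWID != (SELECT MIN(E3.ROWID) FROM E1 E3
                       WHERE E1.SOURCE_ID = E3.SOURCE_ID
                         AND E1.SOURCE_ENTITY_ID = E3.SOURCE_ENTITY_ID)
    AND ROWNUM < 1001;
    EXIT WHEN SQL%ROWCOUNT = 0;  -- no duplicates left
    COMMIT;                      -- commit each batch to keep undo small
  END LOOP;
  COMMIT;
END;
/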
Here's what the trace output looked like (when it was slow) --
DELETE FROM E1
WHERE E1.ROWID != (SELECT MIN(E3.ROWID) FROM E1 E3
                   WHERE E1.SOURCE_ID = E3.SOURCE_ID
                     AND E1.SOURCE_ENTITY_ID = E3.SOURCE_ENTITY_ID)
AND ROWNUM < 1001
- 8.06 seconds
Statistics
----------------------------------------------------------
0 recursive calls
7082 db block gets
1724270 consistent gets
4947 physical reads
666564 redo size
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 9.35 10.22 4929 1724270 7082 1000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        2      9.35      10.23       4929    1724270       7082       1000
Rows Row Source Operation
------- ---------------------------------------------------
0 DELETE
1000 COUNT STOPKEY
1000 FILTER
855489 TABLE ACCESS FULL E1
855489 SORT AGGREGATE
1219705 INDEX RANGE SCAN E1_N1 (object id 56159)
After changing MIN to MAX, the query did much less work (and was thus much faster). Statistics below -
DELETE FROM E1
WHERE E1.ROWID != (SELECT MAX(E3.ROWID) FROM E1 E3
                   WHERE E1.SOURCE_ID = E3.SOURCE_ID
                     AND E1.SOURCE_ENTITY_ID = E3.SOURCE_ENTITY_ID)
AND ROWNUM < 1001
- 2.44 seconds
Statistics
----------------------------------------------------------
0 recursive calls
7101 db block gets
496066 consistent gets
1417 physical reads
671584 redo size
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 2.45 2.75 1389 496066 7085 1000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 2.45 2.75 1389 496066 7085 1000
Rows Row Source Operation
------- ---------------------------------------------------
0 DELETE
1000 COUNT STOPKEY
1000 FILTER
246650 TABLE ACCESS FULL E1
246650 SORT AGGREGATE
248221 INDEX RANGE SCAN E1_N1 (object id 56159)
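For reference, output like the blocks above is easy to reproduce from SQL*Plus - the "Statistics" section comes from AUTOTRACE, and the call/row-source breakdown comes from a SQL trace formatted with TKPROF (a sketch; trace file names and locations vary by version and platform):

SET AUTOTRACE ON STATISTICS           -- prints the "Statistics" section per statement
ALTER SESSION SET SQL_TRACE = TRUE;   -- starts writing a raw trace file

-- ... run the MIN or MAX DELETE here ...

ALTER SESSION SET SQL_TRACE = FALSE;

-- then format the trace file on the database server:
--   tkprof <tracefile>.trc report.txt sys=no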
So, why did just changing MIN to MAX reduce the number of blocks read from 1.7 million to 0.5 million?
My explanation - the difference in work done (and thus time) comes entirely from the first step of execution: the INDEX RANGE SCAN E1_N1 (object id 56159) returns 248K rows to the next step when we use MAX, whereas the same range scan returns 1.2 million rows when we use MIN.
SORT AGGREGATE is defined in the Oracle docs as "retrieval of a single row that is the result of applying a group function to a group of selected rows".
Here the "group function" is MIN or MAX.
E1_N1 is a non-unique index on (SOURCE_ENTITY_ID, SOURCE_ID).
Stats on this index -
BLevel = 2
Leaf Blocks = 69012
Distinct Keys = 6.4 million
Avg. Leaf Blocks per key = 1
Avg. Data Blocks per key = 3 (i.e., the rows for a given key are spread over about 3 data blocks in the table)
Clustering Factor = 23M
Num Rows = 23 Million
Size = 545MB
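These numbers come straight from the data dictionary. A query along these lines pulls them (assuming the index is in the current schema and its statistics are up to date):

SELECT blevel,
       leaf_blocks,
       distinct_keys,
       avg_leaf_blocks_per_key,
       avg_data_blocks_per_key,
       clustering_factor,
       num_rows
FROM   user_indexes
WHERE  index_name = 'E1_N1';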
A little theory -
The structure of a numeric B-tree index looks somewhat like this -
+---------------+
| 50 and less--|--------+
+------|--more than 50 | |
| +---------------+ |
| |
| |
| |
+---------------+ +---------------+
+-----|-->= 100 | | >40 to 50 |
| | >90 to 100---|---+ | >30 to 40 |
| | >80 to 90 | | +---|-->20 to 30 |
| | >70 to 80 | | | | >10 to 20----|---+
| | >60 to 70 | | | | >0 to 10 | |
| | >50 to 60 | | | | less than 0 | |
| +---------------+ | | +---------------+ |
| | | |
| | | |
| | | |
| | | |
+---------------+ +---------------+ +---------------+ +---------------+
| 101, rowid | | 91, rowid | | 21, rowid | | 11, rowid |
| 102, rowid | | 92, rowid | | 22, rowid | | 12, rowid |
| 103, rowid |<->| 93, rowid |<->| 23, rowid |<->| 13, rowid |
| 104, rowid | | 94, rowid | | 24, rowid | | 14, rowid |
| .... | | .... | | .... | | .... |
+---------------+ +---------------+ +---------------+ +---------------+
The lowest level blocks are "leaf" blocks. The second level blocks are "branch" blocks.
As we know, each entry in a B-tree index "leaf" block (where the actual keys are stored) holds two things: the index key value (here, the concatenated value of SOURCE_ENTITY_ID, SOURCE_ID) and the physical address (ROWID) of the data row containing that key. The leaf blocks themselves are chained together as a doubly linked list (the <-> links in the diagram above).
Each "branch" (the higher level blocks than "leaf") index block exntry contains a maximum key value and a data block address (not a ROWID, but just BLOCKID).
This particular combination of two columns, as we see from the index stats, has a lot of repeating values - only about 6.4 million distinct keys across 23 million total rows. So when we use MAX, there is a good chance Oracle can resolve it from the "branch" blocks and skip many of the leaf blocks, since each branch entry already holds the maximum key value for the range of leaf blocks beneath it. Thus, if a leaf block has 10 repeating values, MAX can be read straight off the branch entry without ever visiting the leaf.
Due to this particular data skew, that results in far fewer IOs. Very roughly, about 3.6 times fewer (23M total rows / 6.4M distinct keys). In reality, from the TKPROF output we see 3.47 times less logical IO (1724K "MIN" gets / 496K "MAX" gets)!!
However, this was very random "test" data generated to try out the performance issue. On the production instance of the database I was actually analyzing, the MIN and MAX queries not only followed the same plan but, smartly enough, did a similar amount of IO.
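One quick way to verify that on a live instance is to read the cursor statistics straight out of the shared pool (a sketch; assumes both cursors are still cached and you have access to the V$ views):

SELECT sql_text,
       executions,
       buffer_gets,   -- logical IO
       disk_reads     -- physical IO
FROM   v$sql
WHERE  sql_text LIKE 'DELETE FROM E1%';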