Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Wednesday, April 19, 2006

 

Deleting Duplicate Records

I was recently asked to review some "possible interview questions". One of it was -

Suppose you have a table that looks like:

Name Null? Type
------------------------------- -------- ----
OWNER NOT NULL VARCHAR2(30)
TABLE_NAME NOT NULL VARCHAR2(30)
SEQUENCE_NAME NOT NULL VARCHAR2(30)
DERIVED_COLUMN_NAME NOT NULL VARCHAR2(30)

Because of a logic bug, the table has been filled with numerous duplicate rows (for example, there might be 120 distinct values of the four columns, but the table has 1200 rows).


Write an anonymous PL/SQL block that would remove all the duplicate rows.

ANSWER:


declare
cursor ccc is
select owner, table_name, sequence_name, derived_column_name,
count(*) cnt
from iview
group by owner, table_name, sequence_name, derived_column_name;


begin
for c in ccc loop
delete from trm_tab_sequences
where table_name = c.table_name
and sequence_name = c.sequence_name
and derived_column_name = c.derived_column_name
and rownum < c.cnt;
end loop;
end;
/


Now, I do not really agree with the answer. In fact, I do not agree with the question too.

No one should write a "PL/SQL block" to do things like this. This should ALWAYS be done in SQL.

There are at least two quick ways to delete the duplicate records -

Method ONE -
--------------


DELETE FROM iview t1
WHERE ROWID <> ( SELECT MAX(ROWID)
FROM iview b
WHERE b.owner = t1.owner
AND b.table_name = t1.table_name
AND b.sequence_name = t1.sequence_name
AND b.derived_column_name = t1.derived_column_name);



Method TWO -
--------------


DELETE FROM iview WHERE ROWID IN
(SELECT rid
FROM
(SELECT ROWID rid,
row_number() over (PARTITION BY owner, table_name, sequence_name, derived_column_name ORDER BY ROWID) rn
FROM iview )
WHERE rn <> 1 );



The second solution uses "analytic function" and is more elegant. However, based on some tests - depending on what columns are duplicated and are indexed - the first solution could indeed be faster in some cases.


Comments: Post a Comment



<< Home

Archives

January 2005   February 2005   March 2005   April 2005   May 2005   March 2006   April 2006   August 2006   September 2006   October 2006   January 2007   May 2008  

This page is powered by Blogger. Isn't yours?