Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Wednesday, April 19, 2006

Deleting Duplicate Records

I was recently asked to review some "possible interview questions". One of it was -

Suppose you have a table that looks like:

Name Null? Type
------------------------------- -------- ----
OWNER NOT NULL VARCHAR2(30)
TABLE_NAME NOT NULL VARCHAR2(30)
SEQUENCE_NAME NOT NULL VARCHAR2(30)
DERIVED_COLUMN_NAME NOT NULL VARCHAR2(30)

Because of a logic bug, the table has been filled with numerous duplicate rows (for example, there might be 120 distinct values of the four columns, but the table has 1200 rows).

Write an anonymous PL/SQL block that would remove all the duplicate rows.

ANSWER:



declare
   cursor ccc is
   select owner, table_name, sequence_name, derived_column_name,
          count(*) cnt
     from iview
 group by owner, table_name, sequence_name, derived_column_name;


begin
  for c in ccc loop
    delete from trm_tab_sequences
     where table_name = c.table_name
       and sequence_name = c.sequence_name
       and derived_column_name = c.derived_column_name
       and rownum < c.cnt;
    end loop;
end;
/

Now, I do not really agree with the answer. In fact, I do not agree with the question too.

No one should write a "PL/SQL block" to do things like this. This should ALWAYS be done in SQL.

There are at least two quick ways to delete the duplicate records -

Method ONE -
--------------


DELETE FROM iview t1
   WHERE ROWID <> ( SELECT MAX(ROWID)
                       FROM iview b
                      WHERE b.owner = t1.owner
                        AND b.table_name = t1.table_name
                        AND b.sequence_name = t1.sequence_name
                        AND b.derived_column_name = t1.derived_column_name);

Method TWO -
--------------


DELETE FROM iview WHERE ROWID IN
  (SELECT rid
     FROM
    (SELECT ROWID rid,
          row_number() over (PARTITION BY owner, table_name, sequence_name, derived_column_name ORDER BY ROWID) rn
         FROM iview )
   WHERE rn <> 1 );

The second solution uses "analytic function" and is more elegant. However, based on some tests - depending on what columns are duplicated and are indexed - the first solution could indeed be faster in some cases.

posted by Nilendu # 2:28 PM

Comments: Post a Comment

<< Home

Oracle Performance and Engineering

Wednesday, April 19, 2006

Deleting Duplicate Records

Archives