Random notes on database features, code, practice, optimization and techniques.
I was recently asked to review some "possible interview questions". One of it was -
Suppose you have a table that looks like:
Name Null? Type
------------------------------- -------- ----
OWNER NOT NULL VARCHAR2(30)
TABLE_NAME NOT NULL VARCHAR2(30)
SEQUENCE_NAME NOT NULL VARCHAR2(30)
DERIVED_COLUMN_NAME NOT NULL VARCHAR2(30)
Because of a logic bug, the table has been filled with numerous duplicate rows (for example, there might be 120 distinct values of the four columns, but the table has 1200 rows).
Write an anonymous PL/SQL block that would remove all the duplicate rows.
ANSWER:
declare
cursor ccc is
select owner, table_name, sequence_name, derived_column_name,
count(*) cnt
from iview
group by owner, table_name, sequence_name, derived_column_name;
begin
for c in ccc loop
delete from trm_tab_sequences
where table_name = c.table_name
and sequence_name = c.sequence_name
and derived_column_name = c.derived_column_name
and rownum < c.cnt;
end loop;
end;
/
Now, I do not really agree with the answer. In fact, I do not agree with the question too.
No one should write a "PL/SQL block" to do things like this. This should ALWAYS be done in SQL.
There are at least two quick ways to delete the duplicate records -
Method ONE -
--------------
DELETE FROM iview t1
WHERE ROWID <> ( SELECT MAX(ROWID)
FROM iview b
WHERE b.owner = t1.owner
AND b.table_name = t1.table_name
AND b.sequence_name = t1.sequence_name
AND b.derived_column_name = t1.derived_column_name);
Method TWO -
--------------
DELETE FROM iview WHERE ROWID IN
(SELECT rid
FROM
(SELECT ROWID rid,
row_number() over (PARTITION BY owner, table_name, sequence_name, derived_column_name ORDER BY ROWID) rn
FROM iview )
WHERE rn <> 1 );
The second solution uses "analytic function" and is more elegant. However, based on some tests - depending on what columns are duplicated and are indexed - the first solution could indeed be faster in some cases.