天天看點

[引]在Oracle中如何利用Rowid查找和删除表中的重複記錄

平時工作中可能會遇到當試圖對庫表中的某一列或幾列建立唯一索引時,系統提示 ORA-01452 :不能建立唯一索引,發現重複記錄。

下面總結一下幾種查找和删除重複記錄的方法(以表CZ為例):

表CZ的結構如下:

SQL> desc cz

Name Null? Type

----------------------------------------- -------- ------------------

C1 NUMBER(10)

C10 NUMBER(5)

C20 VARCHAR2(3)

删除重複記錄的方法原理:

(1).在Oracle中,每一條記錄都有一個rowid,rowid在整個資料庫中是唯一的,rowid确定了每條記錄是在Oracle中的哪一個資料檔案、塊、行上。

(2).在重複的記錄中,可能所有列的内容都相同,但rowid不會相同,是以隻要确定出重複記錄中那些具有最大rowid的就可以了,其餘全部删除。

重複記錄判斷的标準是:

C1,C10和C20這三列的值都相同才算是重複記錄。

經檢視表CZ總共有16條記錄:

SQL>set pagesize 100

SQL>select * from cz;

C1 C10 C20

---------- ---------- ---

1 2 dsf

1 2 dsf

1 2 dsf

1 2 dsf

2 3 che

1 2 dsf

1 2 dsf

1 2 dsf

1 2 dsf

2 3 che

2 3 che

2 3 che

2 3 che

3 4 dff

3 4 dff

3 4 dff

4 5 err

5 3 dar

6 1 wee

7 2 zxc

20 rows selected.

1.查找重複記錄的幾種方法:

(1).SQL>select * from cz group by c1,c10,c20 having count(*) >1;

C1 C10 C20

---------- ---------- ---

1 2 dsf

2 3 che

3 4 dff

(2).SQL>select distinct * from cz;

C1 C10 C20

---------- ---------- ---

1 2 dsf

2 3 che

3 4 dff

(3).SQL>select * from cz a where rowid=(select max(rowid) from cz where c1=a.c1 and c10=a.c10 and c20=a.c20);

C1 C10 C20

---------- ---------- ---

1 2 dsf

2 3 che

3 4 dff

2.删除重複記錄的幾種方法:

(1).适用于有大量重複記錄的情況(在C1,C10和C20列上建有索引的時候,用以下語句效率會很高):

SQL>delete cz where (c1,c10,c20) in (select c1,c10,c20 from cz group by c1,c10,c20 having count(*)>1) and rowid not in

(select min(rowid) from cz group by c1,c10,c20 having count(*)>1);

SQL>delete cz where rowid not in(select min(rowid) from cz group by c1,c10,c20);

(2).适用于有少量重複記錄的情況(注意,對于有大量重複記錄的情況,用以下語句效率會很低):

SQL>delete from cz a where a.rowid!=(select max(rowid) from cz b where a.c1=b.c1 and a.c10=b.c10 and a.c20=b.c20);

SQL>delete from cz a where a.rowid<(select max(rowid) from cz b where a.c1=b.c1 and a.c10=b.c10 and a.c20=b.c20);

SQL>delete from cz a where rowid <(select max(rowid) from cz where c1=a.c1 and c10=a.c10 and c20=a.c20);

(3).适用于有少量重複記錄的情況(臨時表法):

SQL>create table test as select distinct * from cz; (建一個臨時表test用來存放重複的記錄)

SQL>truncate table cz; (清空cz表的資料,但保留cz表的結構)

SQL>insert into cz select * from test; (再将臨時表test裡的内容反插回來)

(4).适用于有大量重複記錄的情況(Exception into 子句法):

采用alter table 指令中的 Exception into 子句也可以确定出庫表中重複的記錄。這種方法稍微麻煩一些,為了使用“excepeion into ”子句,必須首先建立 EXCEPTIONS 表。建立該表的 SQL 腳本檔案為 utlexcpt.sql 。對于win2000系統和 UNIX 系統, Oracle 存放該檔案的位置稍有不同,在win2000系統下,該腳本檔案存放在$ORACLE_HOMEOra90rdbmsadmin 目錄下;而對于 UNIX 系統,該腳本檔案存放在$ORACLE_HOME/rdbms/admin 目錄下。

具體步驟如下:

SQL>@?/rdbms/admin/utlexcpt.sql

Table created.

SQL>desc exceptions

Name Null? Type

----------------------------------------- -------- --------------

ROW_ID ROWID

OWNER VARCHAR2(30)

TABLE_NAME VARCHAR2(30)

CONSTRAINT VARCHAR2(30)

SQL>alter table cz add constraint cz_unique unique(c1,c10,c20) exceptions into exceptions;

*

ERROR at line 1:

ORA-02299: cannot validate (TEST.CZ_UNIQUE) - duplicate keys found

SQL>create table dups as select * from cz where rowid in (select row_id from exceptions);

Table created.

SQL>select * from dups;

C1 C10 C20

---------- ---------- ---

1 2 dsf

1 2 dsf

1 2 dsf

1 2 dsf

2 3 che

1 2 dsf

1 2 dsf

1 2 dsf

1 2 dsf

2 3 che

2 3 che

2 3 che

2 3 che

3 4 dff

3 4 dff

3 4 dff

16 rows selected.

SQL>select row_id from exceptions;

ROW_ID

------------------

AAAHD/AAIAAAADSAAA

AAAHD/AAIAAAADSAAB

AAAHD/AAIAAAADSAAC

AAAHD/AAIAAAADSAAF

AAAHD/AAIAAAADSAAH

AAAHD/AAIAAAADSAAI

AAAHD/AAIAAAADSAAG

AAAHD/AAIAAAADSAAD

AAAHD/AAIAAAADSAAE

AAAHD/AAIAAAADSAAJ

AAAHD/AAIAAAADSAAK

AAAHD/AAIAAAADSAAL

AAAHD/AAIAAAADSAAM

AAAHD/AAIAAAADSAAN

AAAHD/AAIAAAADSAAO

AAAHD/AAIAAAADSAAP

16 rows selected.

SQL>delete from cz where rowid in ( select row_id from exceptions);

16 rows deleted.

SQL>insert into cz select distinct * from dups;

3 rows created.

SQL>select *from cz;

C1 C10 C20

---------- ---------- ---

1 2 dsf

2 3 che

3 4 dff

4 5 err

5 3 dar

6 1 wee

7 2 zxc

7 rows selected.

從結果裡可以看到重複記錄已經删除。

本文來自CSDN部落格,轉載請标明出處:http://blog.csdn.net/iltaek/archive/2009/06/12/4263503.aspx