Implementing Oracle's MERGE INTO (matched / not matched) in Hive

2023-06-22 19:23:12

create database cc_test;
use cc_test;
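Think of table1 as the table that records each student's best score, and table2 as the scores from each individual exam.
We want table1 to always stay up to date.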
create table table1 (
                        id string ,
                        maxScore string
);

create table table2 (
                        id string ,
                        score string
);
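One thing worth a second look before going on: both score columns are declared as string. Comparing text is lexicographic, so "90" sorts above "100". Whether Hive's greatest coerces such values to numbers depends on the version and the argument types, so I would treat that as an open question and either declare the columns as int or wrap the values in cast(... as int). The tiny Python sketch below is only an analogy for the text-versus-number difference, not Hive itself.

```python
# Analogy only: how "max" behaves on text versus numbers.
scores_as_text = ["100", "90"]
scores_as_int = [int(s) for s in scores_as_text]

text_max = max(scores_as_text)  # compared character by character: "9" > "1"
int_max = max(scores_as_int)    # compared numerically

print(text_max, int_max)  # 90 100
```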

insert into table1 values
(1,100),
(2,100),
(3,100),
(4,100);

insert into table2 values
(2,100),
(3,90),
(4,120),
(5,100);

----- Note: id 2 repeats with the same score, id 3's score went down, id 4's score went up, and id 5 is a brand-new row
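To make the target concrete before writing any SQL, the rule "matched ids keep the higher score, unmatched ids are added" can be sketched in plain Python. The dict names and the `merge_best` helper are made up for illustration, and the scores are treated as integers, which is an assumption since the tables store them as string.

```python
# Sample rows from the post, as id -> score.
table1 = {1: 100, 2: 100, 3: 100, 4: 100}  # best score so far
table2 = {2: 100, 3: 90, 4: 120, 5: 100}   # latest exam


def merge_best(best, latest):
    """Return a new dict: matched ids keep the higher score, new ids are added."""
    merged = dict(best)
    for student_id, score in latest.items():
        merged[student_id] = max(merged.get(student_id, score), score)
    return merged


expected = merge_best(table1, table2)
print(expected)  # {1: 100, 2: 100, 3: 100, 4: 120, 5: 100}
```

This is the result the Hive statements below should leave in table1.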

insert overwrite table table1
select
    t1.id ,
    greatest(t1.maxScore,nvl(t2.score,0))
from table1 t1
         left join table2 t2
                   on t1.id =t2.id
union all
select
    t2.id ,
    t2.score
from table2 t2
where not exists (
    select 1  from table1 t1 where  t1.id = t2.id
);

---------------------------------- or, alternatively, written like this

select
    t2.id ,
    greatest(nvl(t1.maxScore,0),t2.score)
from table2 t2
         left join table1 t1
                   on t1.id =t2.id
union all
select
    t1.id ,
    t1.maxScore
from table1 t1
where not exists (
    select 1  from table2 t2 where  t1.id = t2.id
);

Both queries return the correct final result.
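If you have no Hive cluster at hand, the claim that both forms agree can be checked roughly with Python's built-in sqlite3 module. This is a stand-in, not Hive: SQLite's scalar `max(a, b)` replaces `greatest`, `coalesce` replaces `nvl`, and the columns are declared INTEGER (all assumptions of this sketch).

```python
import sqlite3

# In-memory SQLite stands in for Hive so the sketch can run locally.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, maxScore INTEGER);
    CREATE TABLE table2 (id INTEGER, score INTEGER);
    INSERT INTO table1 VALUES (1,100),(2,100),(3,100),(4,100);
    INSERT INTO table2 VALUES (2,100),(3,90),(4,120),(5,100);
""")

# First form: drive from table1, then append ids found only in table2.
DRIVE_FROM_TABLE1 = """
    SELECT t1.id, max(t1.maxScore, coalesce(t2.score, 0))
    FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id
    UNION ALL
    SELECT t2.id, t2.score FROM table2 t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 t1 WHERE t1.id = t2.id)
"""

# Second form: drive from table2, then append ids found only in table1.
DRIVE_FROM_TABLE2 = """
    SELECT t2.id, max(coalesce(t1.maxScore, 0), t2.score)
    FROM table2 t2 LEFT JOIN table1 t1 ON t1.id = t2.id
    UNION ALL
    SELECT t1.id, t1.maxScore FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t1.id = t2.id)
"""

result_1 = sorted(conn.execute(DRIVE_FROM_TABLE1).fetchall())
result_2 = sorted(conn.execute(DRIVE_FROM_TABLE2).fetchall())
print(result_1)              # [(1, 100), (2, 100), (3, 100), (4, 120), (5, 100)]
print(result_1 == result_2)  # True
```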

-------------------------------------------------------

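Finally, a word on the idea behind all this. Picture table1 and table2 as two overlapping sets.

t2 and t3 are the part where the ids overlap (t2 on the table1 side, t3 on the table2 side). That leaves t1 as the rows found only in table1 and t4 as the rows found only in table2.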

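Hive has no update (on ordinary, non-transactional tables at least), so an update generally means delete + insert. But Hive has no delete either...

So what Oracle's matched / not matched does boils down to: delete t2, insert t3, then insert t4.

We can look at that as inserting t1 and inserting t3+t4 (drive from table2, then add what exists only in table1).

Or as inserting t4 and inserting t1+t2, with the overlapping rows taking the higher score (drive from table1, then add what exists only in table2).

These two views correspond to the two SQL statements above: the latter is the first statement, the former is the second.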
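The region bookkeeping can be sketched with Python set operations on the sample ids. The variable names are hypothetical; the point is only that both ways of slicing cover every id exactly once.

```python
table1 = {1: 100, 2: 100, 3: 100, 4: 100}
table2 = {2: 100, 3: 90, 4: 120, 5: 100}

only_in_table1 = table1.keys() - table2.keys()  # region t1
overlap = table1.keys() & table2.keys()         # regions t2 / t3 (same ids)
only_in_table2 = table2.keys() - table1.keys()  # region t4

# View A: everything on the table1 side (t1 + t2) plus t4.
view_a = sorted(table1.keys()) + sorted(only_in_table2)
# View B: everything on the table2 side (t3 + t4) plus t1.
view_b = sorted(table2.keys()) + sorted(only_in_table1)

all_ids = sorted(table1.keys() | table2.keys())
print(sorted(view_a) == all_ids, sorted(view_b) == all_ids)  # True True
```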

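Think that's the end of it? No way we wrap up on something this basic. Let's dig a little deeper.

What is the difference between the two, and which one should we pick?

Generally, table1 is far larger than table2. For example, the number of students a school takes in each year stays roughly the same (table2), while the school's historical student data is large (table1).

Of course, it could also be a newly founded school with 100 students in its first year and 1,000 in its second...

Still, in general table1 >> table2. So which form is more efficient?

As a rule of thumb: use in when the outer table is large and the inner (subquery) table is small; use exists when the outer table is small and the inner table is large.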

exists

insert overwrite table table1
select
    t1.id ,
    greatest(t1.maxScore,nvl(t2.score,0))
from table1 t1
         left join table2 t2
                   on t1.id =t2.id
union all
select
    t2.id ,
    t2.score
from table2 t2
where not exists (
    select 1 from table1 t1 where t1.id = t2.id
);

in

insert overwrite table table1
select
    t1.id ,
    greatest(t1.maxScore,nvl(t2.score,0))
from table1 t1
         left join table2 t2
                   on t1.id =t2.id
union all
select
    t2.id ,
    t2.score
from table2 t2
where t2.id not in (
    select id from table1
);

join

insert overwrite table table1
select
    t1.id ,
    greatest(t1.maxScore,nvl(t2.score,0))
from table1 t1
         left join table2 t2
                   on t1.id =t2.id
union all
select
    t2.id ,
    t2.score
from table2 t2
         left join table1 t1
                   on t1.id =t2.id
where t1.id is null;   -- test the join key, so a matched row with a null maxScore is not mistaken for a miss

Personally, I recommend the exists and join forms.
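One concrete reason to be wary of the not in form: under standard SQL three-valued logic, `x NOT IN (subquery)` returns no rows at all once the subquery yields a NULL. I believe Hive follows the same semantics, but treat that as an assumption and check it on your version. The sketch below uses Python's sqlite3 as a stand-in for Hive and adds a hypothetical row with a NULL id to table1 to show the effect on the "rows only in table2" branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, maxScore INTEGER);
    CREATE TABLE table2 (id INTEGER, score INTEGER);
    -- The (NULL, 0) row is hypothetical, added only to trigger the pitfall.
    INSERT INTO table1 VALUES (1,100),(2,100),(3,100),(4,100),(NULL,0);
    INSERT INTO table2 VALUES (2,100),(3,90),(4,120),(5,100);
""")

# Three ways to find table2 rows whose id is absent from table1.
ANTI_JOINS = {
    "not_exists": """
        SELECT t2.id, t2.score FROM table2 t2
        WHERE NOT EXISTS (SELECT 1 FROM table1 t1 WHERE t1.id = t2.id)
    """,
    "not_in": """
        SELECT t2.id, t2.score FROM table2 t2
        WHERE t2.id NOT IN (SELECT id FROM table1)
    """,
    "left_join": """
        SELECT t2.id, t2.score FROM table2 t2
        LEFT JOIN table1 t1 ON t1.id = t2.id
        WHERE t1.id IS NULL
    """,
}

rows = {name: conn.execute(sql).fetchall() for name, sql in ANTI_JOINS.items()}
for name, found in rows.items():
    print(name, found)
```

With the NULL id present, the not exists and left join forms still find id 5, while not in comes back empty, so the new student would silently be dropped.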
