Offline Data Warehouse 7 - Data Warehouse Development: DIM Layer Design Essentials - Zipper Table Synchronization & Loading Scripts
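- I. Key Points in DIM Layer Dimensional Model Design
  - 6. User Dimension Table: Zipper Table
    - 1. User dimension table: preliminary analysis
    - 2. User dimension table: DDL design analysis
    - 3. User dimension table: data loading analysis
      - 1. Zipper table first-day load SQL
      - 2. Zipper table daily load SQL
        - 1. Daily load: approach one
        - 2. Daily load: approach two
  - 7. DIM Layer Dimensional Model: Data Loading Scripts
    - 1. First-day loading script
    - 2. Daily loading script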
I. Key Points in DIM Layer Dimensional Model Design
6. User Dimension Table: Zipper Table
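- Zipper table recap:
- What it is: a zipper table records the life cycle of each piece of information. Once a record's life cycle ends, a new record is started, with the current date as its effective start date.
- Why it matters: it stores the historical states of dimension data more efficiently than keeping a full snapshot every day.
- Suitable scenario: slowly changing dimensions, i.e., business tables whose data changes infrequently.
- How to use it: a row is valid for an analysis date when start date <= analysis date and analysis date <= end date.
- Summary of the earlier post: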
https://blog.csdn.net/weixin_38136584/article/details/129137583?spm=1001.2014.3001.5501
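The usage rule above can be sketched in Python as an illustration only. It assumes rows carry ISO `YYYY-MM-DD` strings (which compare correctly as plain strings) and uses a hypothetical `UserState` type with made-up sample history:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserState:
    """One row of a zipper table (hypothetical subset of dim_user_zip columns)."""
    user_id: str
    user_level: str
    start_date: str  # 'YYYY-MM-DD', inclusive
    end_date: str    # 'YYYY-MM-DD', inclusive; '9999-12-31' means still current


def states_as_of(rows, analysis_date):
    """Rows valid on analysis_date: start_date <= analysis_date <= end_date."""
    return [r for r in rows if r.start_date <= analysis_date <= r.end_date]


# Hypothetical history: user 1 moved from level 1 to level 2 on 2020-06-15.
history = [
    UserState("1", "1", "2020-06-14", "2020-06-14"),
    UserState("1", "2", "2020-06-15", "9999-12-31"),
]

print([s.user_level for s in states_as_of(history, "2020-06-14")])  # ['1']
print([s.user_level for s in states_as_of(history, "2020-06-20")])  # ['2']
```

Because each closed row ends the day before its successor starts, any analysis date matches at most one row per user.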
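1. User dimension table: preliminary analysis
- Find the primary dimension table and the related dimension tables for the user dimension in the business system.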
1. The user-related table is as follows:
ods_user_info_inc
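- 2. Analyze the fields of each related table and extract the fields the user dimension table needs.
- 3. Determine which of these tables is the primary dimension table. The table with the finest granularity is the primary one; here that is the user table (user), so it serves as the primary dimension table.
- 4. Once the primary dimension table is chosen:
- Previous rule: for a daily full (snapshot) dimension table, one row of the dimension table represents whatever one row of the primary dimension table represents.
- Here, however, the user table is the primary dimension table and the user dimension table is also a zipper table. In a zipper table each row is one state record, so each row here represents one state of a user.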
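2. User dimension table: DDL design analysis
- Pay attention to the zipper table's start date and end date.
- What should the zipper table be partitioned by?
- Why partition tables in Hive: after partitioning, different data of one table is stored under different paths. When later queries filter on the partition column, the amount of data read drops sharply and queries run much faster.
- For the zipper table, we have to analyze its future query patterns. A zipper table is also a dimension table.
- A dimension table is queried either for the latest full data or for the data as of some point in history.
- Of the two, queries for the latest data are the more frequent.
- So when designing the zipper table, the question is how to make queries for the latest data fast.
- The partition plan is as follows:
- The partition with the largest dt (9999-12-31) stores the latest full user data.
- Each earlier dt partition stores the user records that expired on that day.
- The SQL is as follows: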
DROP TABLE IF EXISTS dim_user_zip;
CREATE EXTERNAL TABLE dim_user_zip
(
`id` STRING COMMENT 'user id',
`login_name` STRING COMMENT 'user name',
`nick_name` STRING COMMENT 'nickname',
`name` STRING COMMENT 'real name',
`phone_num` STRING COMMENT 'phone number',
`email` STRING COMMENT 'email',
`user_level` STRING COMMENT 'user level',
`birthday` STRING COMMENT 'birthday',
`gender` STRING COMMENT 'gender',
`create_time` STRING COMMENT 'create time',
`operate_time` STRING COMMENT 'operate time',
`start_date` STRING COMMENT 'start date',
`end_date` STRING COMMENT 'end date'
) COMMENT 'user table'
PARTITIONED BY (`dt` STRING)
STORED AS ORC
LOCATION '/warehouse/gmall/dim/dim_user_zip/'
TBLPROPERTIES ('orc.compress' = 'snappy');
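A minimal sketch of this partition layout, assuming rows are plain Python dicts with hypothetical sample data. In the loading SQL of this article a row's `dt` always equals its `end_date`, so grouping by `end_date` reproduces the layout: current rows gather in `dt='9999-12-31'`, and each expired row sits in the partition of the day it expired.

```python
from collections import defaultdict

MAX_DATE = "9999-12-31"


def partition_rows(rows):
    """Group rows by the dt value the loading SQL gives them (dt == end_date)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["end_date"]].append(row)
    return dict(partitions)


# Hypothetical data after the 2020-06-15 load: user 1 changed, user 2 did not.
rows = [
    {"id": "1", "user_level": "1", "start_date": "2020-06-14", "end_date": "2020-06-14"},
    {"id": "1", "user_level": "2", "start_date": "2020-06-15", "end_date": MAX_DATE},
    {"id": "2", "user_level": "1", "start_date": "2020-06-14", "end_date": MAX_DATE},
]

parts = partition_rows(rows)
# Reading the latest full data touches a single partition, like dt='9999-12-31'.
print([r["id"] for r in parts[MAX_DATE]])      # ['1', '2']
print([r["id"] for r in parts["2020-06-14"]])  # ['1']
```

The latest-data query only needs `where dt='9999-12-31'`, while a historical lookup still filters on `start_date` and `end_date` across partitions.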
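3. User dimension table: data loading analysis
- For a zipper table, incremental synchronization is enough once the initial full load is done.
- The zipper table is loaded as follows:
- 1. Initial load: synchronize the full data into the zipper table and add a start date and an end date.
- Start date: which date should be used? If the real start time cannot be obtained, use the date of the zipper table's initial synchronization.
- End date: simply use the maximum date (9999-12-31).
- 2. Loading on the second day:
- 1. Analyze the day's change records of the user table, captured from the binlog.
- 2. Build the day's user change table and add the start date and end date.
- 3. Merge the day's user change table with the previous day's full user zipper table to obtain the latest full user zipper table.
- 3. Every later day is loaded the same way as the second day.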
- Data flow: following the loading process above, two loading procedures are needed, one for the first day and one for every day after it.
1. Zipper table first-day load SQL
- Private information such as the real name, phone number, and email is masked with md5().
insert overwrite table dim_user_zip partition (dt='9999-12-31')
select
data.id,
data.login_name,
data.nick_name,
md5(data.name),
md5(data.phone_num),
md5(data.email),
data.user_level,
data.birthday,
data.gender,
data.create_time,
data.operate_time,
'2020-06-14' start_date,
'9999-12-31' end_date
from ods_user_info_inc
where dt='2020-06-14'
and type='bootstrap-insert';
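The first-day load can be mimicked in Python as a rough sketch. It assumes the records were already read from the `dt='2020-06-14'` partition and are dicts with `type` and `data` keys (a hypothetical shape modeled on the columns the SQL reads). `hashlib.md5(...).hexdigest()` gives the same 32-digit lowercase hex form that Hive's `md5()` returns, and NULL stays NULL.

```python
import hashlib

MAX_DATE = "9999-12-31"
MASKED_COLUMNS = ("name", "phone_num", "email")


def md5_hex(value):
    """Like Hive's md5(): 32 lowercase hex digits; NULL (None) stays NULL."""
    if value is None:
        return None
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def first_day_load(records, do_date):
    """Mirror the first-day SQL for records already taken from partition dt=do_date."""
    loaded = []
    for rec in records:
        # where type='bootstrap-insert': drop records of any other type.
        if rec["type"] != "bootstrap-insert":
            continue
        row = dict(rec["data"])
        for col in MASKED_COLUMNS:
            row[col] = md5_hex(row.get(col))
        row["start_date"] = do_date  # first sync date used as the start date
        row["end_date"] = MAX_DATE   # every row starts out as current
        loaded.append(row)
    return loaded


# Hypothetical sample: one non-insert record and one bootstrap-insert record.
sample = [
    {"type": "bootstrap-start", "data": {}},
    {"type": "bootstrap-insert",
     "data": {"id": "1", "name": "abc", "phone_num": None, "email": "a@b.c"}},
]
print(first_day_load(sample, "2020-06-14")[0]["name"])
# 900150983cd24fb0d6963f7d28e17f72
```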
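2. Zipper table daily load SQL
1. Daily load: approach one
- Steps of approach one:
- 1. Subquery: the previous day's full data.
- 2. Subquery: the day's new and changed data.
- 3. Join the new and old data with a full outer join.
- 4. Put the join into a CTE (WITH clause) so the later queries can share it.
- 5. Use if() to decide whether each row goes to the latest full partition or to the previous day's partition:
- 1. Latest full data: after the full join, if the new id is not null, take every field from the new side, otherwise take every field from the old side. The result is the day's latest full data and is written to the largest partition (dt='9999-12-31').
- 2. Expired data: after the full join, if both the old id and the new id are not null, the old row has expired. It is written to the previous day's partition, and its end date is also set to the previous day.
- 6. Combine the two result sets with union all.
- 7. Use Hive dynamic partitioning so that a single insert statement sends different rows to different partitions.
- SQL for the previous day's full data: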
select
id,
login_name,
nick_name,
name,
phone_num,
email,
user_level,
birthday,
gender,
create_time,
operate_time,
start_date,
end_date
from dim_user_zip
where dt='9999-12-31'
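- SQL for the day's new and changed data. We need each record's final state for the day: because the data comes from the binlog, every state a record passes through during the day is recorded, so one record can appear several times in a day and only its last state is kept. This is a per-group top-N query: a window function plus a filter.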
select
id,
login_name,
nick_name,
md5(name) name,
md5(phone_num) phone_num,
md5(email) email,
user_level,
birthday,
gender,
create_time,
operate_time,
'2020-06-15' start_date,
'9999-12-31' end_date
from
(
select
data.id,
data.login_name,
data.nick_name,
data.name,
data.phone_num,
data.email,
data.user_level,
data.birthday,
data.gender,
data.create_time,
data.operate_time,
row_number() over (partition by data.id order by ts desc) rn
from ods_user_info_inc
where dt='2020-06-15'
)t1
where rn=1
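The per-id top-N step (`row_number()` ordered by `ts desc`, keeping `rn=1`) boils down to "keep the record with the largest `ts` for each id". A small Python sketch, assuming each change is a dict with `ts` and `data` keys (hypothetical sample data):

```python
def latest_per_id(changes):
    """Keep each id's record with the largest ts, like
    row_number() over (partition by data.id order by ts desc) ... where rn=1.
    On equal ts the first record seen wins; row_number() would pick one arbitrarily."""
    latest = {}
    for rec in changes:
        key = rec["data"]["id"]
        if key not in latest or rec["ts"] > latest[key]["ts"]:
            latest[key] = rec
    return [rec["data"] for rec in latest.values()]


# Hypothetical binlog records for one day: user 1 changed three times.
changes = [
    {"ts": 100, "data": {"id": "1", "user_level": "1"}},
    {"ts": 300, "data": {"id": "1", "user_level": "3"}},
    {"ts": 200, "data": {"id": "1", "user_level": "2"}},
    {"ts": 150, "data": {"id": "2", "user_level": "1"}},
]
print({d["id"]: d["user_level"] for d in latest_per_id(changes)})
# {'1': '3', '2': '1'}
```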
- Determining the day's latest full data:
with
tmp as
(
select
old.id old_id,
old.login_name old_login_name,
old.nick_name old_nick_name,
old.name old_name,
old.phone_num old_phone_num,
old.email old_email,
old.user_level old_user_level,
old.birthday old_birthday,
old.gender old_gender,
old.create_time old_create_time,
old.operate_time old_operate_time,
old.start_date old_start_date,
old.end_date old_end_date,
new.id new_id,
new.login_name new_login_name,
new.nick_name new_nick_name,
new.name new_name,
new.phone_num new_phone_num,
new.email new_email,
new.user_level new_user_level,
new.birthday new_birthday,
new.gender new_gender,
new.create_time new_create_time,
new.operate_time new_operate_time,
new.start_date new_start_date,
new.end_date new_end_date
from
(
select
id,
login_name,
nick_name,
name,
phone_num,
email,
user_level,
birthday,
gender,
create_time,
operate_time,
start_date,
end_date
from dim_user_zip
where dt='9999-12-31'
)old
full outer join
(
select
id,
login_name,
nick_name,
md5(name) name,
md5(phone_num) phone_num,
md5(email) email,
user_level,
birthday,
gender,
create_time,
operate_time,
'2020-06-15' start_date,
'9999-12-31' end_date
from
(
select
data.id,
data.login_name,
data.nick_name,
data.name,
data.phone_num,
data.email,
data.user_level,
data.birthday,
data.gender,
data.create_time,
data.operate_time,
row_number() over (partition by data.id order by ts desc) rn
from ods_user_info_inc
where dt='2020-06-15'
)t1
where rn=1
)new
on old.id=new.id
)
select
if(new_id is not null,new_id,old_id),
if(new_id is not null,new_login_name,old_login_name),
if(new_id is not null,new_nick_name,old_nick_name),
if(new_id is not null,new_name,old_name),
if(new_id is not null,new_phone_num,old_phone_num),
if(new_id is not null,new_email,old_email),
if(new_id is not null,new_user_level,old_user_level),
if(new_id is not null,new_birthday,old_birthday),
if(new_id is not null,new_gender,old_gender),
if(new_id is not null,new_create_time,old_create_time),
if(new_id is not null,new_operate_time,old_operate_time),
if(new_id is not null,new_start_date,old_start_date),
if(new_id is not null,new_end_date,old_end_date),
if(new_id is not null,new_end_date,old_end_date) dt
from tmp
- Determining the expired data:
with
tmp as
(
select
old.id old_id,
old.login_name old_login_name,
old.nick_name old_nick_name,
old.name old_name,
old.phone_num old_phone_num,
old.email old_email,
old.user_level old_user_level,
old.birthday old_birthday,
old.gender old_gender,
old.create_time old_create_time,
old.operate_time old_operate_time,
old.start_date old_start_date,
old.end_date old_end_date,
new.id new_id,
new.login_name new_login_name,
new.nick_name new_nick_name,
new.name new_name,
new.phone_num new_phone_num,
new.email new_email,
new.user_level new_user_level,
new.birthday new_birthday,
new.gender new_gender,
new.create_time new_create_time,
new.operate_time new_operate_time,
new.start_date new_start_date,
new.end_date new_end_date
from
(
select
id,
login_name,
nick_name,
name,
phone_num,
email,
user_level,
birthday,
gender,
create_time,
operate_time,
start_date,
end_date
from dim_user_zip
where dt='9999-12-31'
)old
full outer join
(
select
id,
login_name,
nick_name,
md5(name) name,
md5(phone_num) phone_num,
md5(email) email,
user_level,
birthday,
gender,
create_time,
operate_time,
'2020-06-15' start_date,
'9999-12-31' end_date
from
(
select
data.id,
data.login_name,
data.nick_name,
data.name,
data.phone_num,
data.email,
data.user_level,
data.birthday,
data.gender,
data.create_time,
data.operate_time,
row_number() over (partition by data.id order by ts desc) rn
from ods_user_info_inc
where dt='2020-06-15'
)t1
where rn=1
)new
on old.id=new.id
)
select
old_id,
old_login_name,
old_nick_name,
old_name,
old_phone_num,
old_email,
old_user_level,
old_birthday,
old_gender,
old_create_time,
old_operate_time,
old_start_date,
cast(date_add('2020-06-15',-1) as string) old_end_date,
cast(date_add('2020-06-15',-1) as string) dt
from tmp
where old_id is not null
and new_id is not null
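- Putting it together, the final SQL for approach one. Because the partition value comes from the query (dynamic partitioning), Hive must be in nonstrict mode:
set hive.exec.dynamic.partition.mode=nonstrict;
with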
tmp as
(
select
old.id old_id,
old.login_name old_login_name,
old.nick_name old_nick_name,
old.name old_name,
old.phone_num old_phone_num,
old.email old_email,
old.user_level old_user_level,
old.birthday old_birthday,
old.gender old_gender,
old.create_time old_create_time,
old.operate_time old_operate_time,
old.start_date old_start_date,
old.end_date old_end_date,
new.id new_id,
new.login_name new_login_name,
new.nick_name new_nick_name,
new.name new_name,
new.phone_num new_phone_num,
new.email new_email,
new.user_level new_user_level,
new.birthday new_birthday,
new.gender new_gender,
new.create_time new_create_time,
new.operate_time new_operate_time,
new.start_date new_start_date,
new.end_date new_end_date
from
(
select
id,
login_name,
nick_name,
name,
phone_num,
email,
user_level,
birthday,
gender,
create_time,
operate_time,
start_date,
end_date
from dim_user_zip
where dt='9999-12-31'
)old
full outer join
(
select
id,
login_name,
nick_name,
md5(name) name,
md5(phone_num) phone_num,
md5(email) email,
user_level,
birthday,
gender,
create_time,
operate_time,
'2020-06-15' start_date,
'9999-12-31' end_date
from
(
select
data.id,
data.login_name,
data.nick_name,
data.name,
data.phone_num,
data.email,
data.user_level,
data.birthday,
data.gender,
data.create_time,
data.operate_time,
row_number() over (partition by data.id order by ts desc) rn
from ods_user_info_inc
where dt='2020-06-15'
)t1
where rn=1
)new
on old.id=new.id
)
insert overwrite table dim_user_zip partition(dt)
select
if(new_id is not null,new_id,old_id),
if(new_id is not null,new_login_name,old_login_name),
if(new_id is not null,new_nick_name,old_nick_name),
if(new_id is not null,new_name,old_name),
if(new_id is not null,new_phone_num,old_phone_num),
if(new_id is not null,new_email,old_email),
if(new_id is not null,new_user_level,old_user_level),
if(new_id is not null,new_birthday,old_birthday),
if(new_id is not null,new_gender,old_gender),
if(new_id is not null,new_create_time,old_create_time),
if(new_id is not null,new_operate_time,old_operate_time),
if(new_id is not null,new_start_date,old_start_date),
if(new_id is not null,new_end_date,old_end_date),
if(new_id is not null,new_end_date,old_end_date) dt
from tmp
union all
select
old_id,
old_login_name,
old_nick_name,
old_name,
old_phone_num,
old_email,
old_user_level,
old_birthday,
old_gender,
old_create_time,
old_operate_time,
old_start_date,
cast(date_add('2020-06-15',-1) as string) old_end_date,
cast(date_add('2020-06-15',-1) as string) dt
from tmp
where old_id is not null
and new_id is not null;
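To check the full-join logic of approach one on a small example, here is a Python sketch that mirrors the two branches of the `union all`. It assumes both inputs fit in memory as lists of dicts with an `id` key (hypothetical sample data), with today's changes already deduplicated and masked; `previous_day` plays the role of `cast(date_add(do_date,-1) as string)`.

```python
from datetime import date, timedelta

MAX_DATE = "9999-12-31"


def previous_day(do_date):
    """Same result as cast(date_add(do_date, -1) as string) for 'YYYY-MM-DD'."""
    return (date.fromisoformat(do_date) - timedelta(days=1)).isoformat()


def merge_full_join(old_current, new_changes, do_date):
    """Approach one: full outer join on id, then the two union-all branches.

    old_current: rows of partition dt=MAX_DATE before today's load.
    new_changes: today's deduplicated, masked rows (start_date=do_date).
    Returns (dt, row) pairs, as the dynamic-partition insert would write them."""
    old_by_id = {r["id"]: r for r in old_current}
    new_by_id = {r["id"]: r for r in new_changes}
    out = []
    for key in old_by_id.keys() | new_by_id.keys():  # full outer join on id
        old, new = old_by_id.get(key), new_by_id.get(key)
        # Branch 1: if(new_id is not null, new.*, old.*) -> current partition.
        current = new if new is not None else old
        out.append((current["end_date"], current))
        # Branch 2: old_id and new_id both present -> old state expires yesterday.
        if old is not None and new is not None:
            expired = dict(old, end_date=previous_day(do_date))
            out.append((expired["end_date"], expired))
    return out


old = [
    {"id": "1", "user_level": "1", "start_date": "2020-06-14", "end_date": MAX_DATE},
    {"id": "2", "user_level": "1", "start_date": "2020-06-14", "end_date": MAX_DATE},
]
new = [
    {"id": "1", "user_level": "2", "start_date": "2020-06-15", "end_date": MAX_DATE},
    {"id": "3", "user_level": "1", "start_date": "2020-06-15", "end_date": MAX_DATE},
]
for dt, row in sorted(merge_full_join(old, new, "2020-06-15"),
                      key=lambda p: (p[0], p[1]["id"])):
    print(dt, row["id"], row["user_level"], row["start_date"], row["end_date"])
# 2020-06-14 1 1 2020-06-14 2020-06-14
# 9999-12-31 1 2 2020-06-15 9999-12-31
# 9999-12-31 2 1 2020-06-14 9999-12-31
# 9999-12-31 3 1 2020-06-15 9999-12-31
```

User 1 changed, so its old state is closed and moved to the 2020-06-14 partition; user 2 is carried over unchanged; user 3 is new.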
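2. Daily load: approach two
- 1. Combine the two subqueries (the previous day's full data and the day's new and changed data) with union all.
- 2. Apply a window function over the combined data, ranking each id's rows by start date in descending order:
- Rows with rk = 1 go to the full (latest) partition and keep the maximum end date;
- Rows with rk = 2 go to the previous day's partition, with the end date set to the previous day.
- Hive dynamic partitioning writes the rows to different partitions.
- When several SELECTs are combined with union all and then queried by an outer SELECT, the column names come from the first SELECT.
- SQL for approach two:
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dim_user_zip partition(dt)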
select
id,
login_name,
nick_name,
name,
phone_num,
email,
user_level,
birthday,
gender,
create_time,
operate_time,
start_date,
if(rk=2, cast(date_sub('2020-06-15',1) as string), end_date) end_date,
if(rk=1, '9999-12-31', cast(date_sub('2020-06-15',1) as string)) dt
from (
select
id,
login_name,
nick_name,
name,
phone_num,
email,
user_level,
birthday,
gender,
create_time,
operate_time,
start_date,
end_date,
rank() over (partition by id order by start_date desc) rk
from
( select
id,
login_name,
nick_name,
name,
phone_num,
email,
user_level,
birthday,
gender,
create_time,
operate_time,
start_date,
end_date
from dim_user_zip
where dt='9999-12-31'
union all
select
id,
login_name,
nick_name,
md5(name) name,
md5(phone_num) phone_num,
md5(email) email,
user_level,
birthday,
gender,
create_time,
operate_time,
'2020-06-15' start_date,
'9999-12-31' end_date
from
(
select
data.id,
data.login_name,
data.nick_name,
data.name,
data.phone_num,
data.email,
data.user_level,
data.birthday,
data.gender,
data.create_time,
data.operate_time,
row_number() over (partition by data.id order by ts desc) rn
from ods_user_info_inc
where dt='2020-06-15'
)t1
where rn=1
) t2
) t3;
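Approach two can be sketched in Python the same way: union the two inputs, compute `rank()` per id by `start_date` descending, then apply the two `if()` expressions. This is an in-memory illustration that assumes both inputs are lists of dicts with `id`, `start_date`, and `end_date` keys (hypothetical sample data), not a replacement for the Hive query.

```python
from datetime import date, timedelta

MAX_DATE = "9999-12-31"


def previous_day(do_date):
    """Same result as date_sub(do_date, 1) rendered as 'YYYY-MM-DD'."""
    return (date.fromisoformat(do_date) - timedelta(days=1)).isoformat()


def merge_union_rank(old_current, new_changes, do_date):
    """Approach two: union all, rank() per id by start_date desc, then two if()s."""
    by_id = {}
    for row in list(old_current) + list(new_changes):  # union all
        by_id.setdefault(row["id"], []).append(row)
    prev = previous_day(do_date)
    out = []
    for rows in by_id.values():
        for row in rows:
            # rank(): 1 + number of rows of the same id with a later start_date.
            rk = 1 + sum(1 for other in rows if other["start_date"] > row["start_date"])
            end_date = prev if rk == 2 else row["end_date"]  # if(rk=2, prev, end_date)
            dt = MAX_DATE if rk == 1 else prev               # if(rk=1, MAX, prev)
            out.append((dt, dict(row, end_date=end_date)))
    return out


old = [
    {"id": "1", "user_level": "1", "start_date": "2020-06-14", "end_date": MAX_DATE},
    {"id": "2", "user_level": "1", "start_date": "2020-06-14", "end_date": MAX_DATE},
]
new = [
    {"id": "1", "user_level": "2", "start_date": "2020-06-15", "end_date": MAX_DATE},
    {"id": "3", "user_level": "1", "start_date": "2020-06-15", "end_date": MAX_DATE},
]
for dt, row in sorted(merge_union_rank(old, new, "2020-06-15"),
                      key=lambda p: (p[0], p[1]["id"])):
    print(dt, row["id"], row["user_level"], row["start_date"], row["end_date"])
```

Because `rank()` gives tied rows the same rank, two rows of one id with the same `start_date` would both count as current; the sketch keeps that behavior rather than hiding it.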
7. DIM Layer Dimensional Model: Data Loading Scripts
1. First-day loading script
- Run it once, on the day the data warehouse goes live.
#!/bin/bash
APP=gmall
if [ -n "$2" ] ;then
do_date=$2
else
echo "Please pass in a date argument"
exit 1
fi
dim_user_zip="
insert overwrite table ${APP}.dim_user_zip partition (dt='9999-12-31')
select
data.id,
data.login_name,
data.nick_name,
md5(data.name),
md5(data.phone_num),
md5(data.email),
data.user_level,
data.birthday,
data.gender,
data.create_time,
data.operate_time,
'$do_date' start_date,
'9999-12-31' end_date
from ${APP}.ods_user_info_inc
where dt='$do_date'
and type='bootstrap-insert';
"
dim_sku_full="
with
sku as
(
select
id,
price,
sku_name,
sku_desc,
weight,
is_sale,
spu_id,
category3_id,
tm_id,
create_time
from ${APP}.ods_sku_info_full
where dt='$do_date'
),
spu as
(
select
id,
spu_name
from ${APP}.ods_spu_info_full
where dt='$do_date'
),
c3 as
(
select
id,
name,
category2_id
from ${APP}.ods_base_category3_full
where dt='$do_date'
),
c2 as
(
select
id,
name,
category1_id
from ${APP}.ods_base_category2_full
where dt='$do_date'
),
c1 as
(
select
id,
name
from ${APP}.ods_base_category1_full
where dt='$do_date'
),
tm as
(
select
id,
tm_name
from ${APP}.ods_base_trademark_full
where dt='$do_date'
),
attr as
(
select
sku_id,
collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
from ${APP}.ods_sku_attr_value_full
where dt='$do_date'
group by sku_id
),
sale_attr as
(
select
sku_id,
collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
from ${APP}.ods_sku_sale_attr_value_full
where dt='$do_date'
group by sku_id
)
insert overwrite table ${APP}.dim_sku_full partition(dt='$do_date')
select
sku.id,
sku.price,
sku.sku_name,
sku.sku_desc,
sku.weight,
sku.is_sale,
sku.spu_id,
spu.spu_name,
sku.category3_id,
c3.name,
c3.category2_id,
c2.name,
c2.category1_id,
c1.name,
sku.tm_id,
tm.tm_name,
attr.attrs,
sale_attr.sale_attrs,
sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
"
dim_province_full="
insert overwrite table ${APP}.dim_province_full partition(dt='$do_date')
select
province.id,
province.name,
province.area_code,
province.iso_code,
province.iso_3166_2,
region_id,
region_name
from
(
select
id,
name,
region_id,
area_code,
iso_code,
iso_3166_2
from ${APP}.ods_base_province_full
where dt='$do_date'
)province
left join
(
select
id,
region_name
from ${APP}.ods_base_region_full
where dt='$do_date'
)region
on province.region_id=region.id;
"
dim_coupon_full="
insert overwrite table ${APP}.dim_coupon_full partition(dt='$do_date')
select
id,
coupon_name,
coupon_type,
coupon_dic.dic_name,
condition_amount,
condition_num,
activity_id,
benefit_amount,
benefit_discount,
case coupon_type
when '3201' then concat('滿',condition_amount,'元減',benefit_amount,'元')
when '3202' then concat('滿',condition_num,'件打',10*(1-benefit_discount),'折')
when '3203' then concat('減',benefit_amount,'元')
end benefit_rule,
create_time,
range_type,
range_dic.dic_name,
limit_num,
taken_count,
start_time,
end_time,
operate_time,
expire_time
from
(
select
id,
coupon_name,
coupon_type,
condition_amount,
condition_num,
activity_id,
benefit_amount,
benefit_discount,
create_time,
range_type,
limit_num,
taken_count,
start_time,
end_time,
operate_time,
expire_time
from ${APP}.ods_coupon_info_full
where dt='$do_date'
)ci
left join
(
select
dic_code,
dic_name
from ${APP}.ods_base_dic_full
where dt='$do_date'
and parent_code='32'
)coupon_dic
on ci.coupon_type=coupon_dic.dic_code
left join
(
select
dic_code,
dic_name
from ${APP}.ods_base_dic_full
where dt='$do_date'
and parent_code='33'
)range_dic
on ci.range_type=range_dic.dic_code;
"
dim_activity_full="
insert overwrite table ${APP}.dim_activity_full partition(dt='$do_date')
select
rule.id,
info.id,
activity_name,
rule.activity_type,
dic.dic_name,
activity_desc,
start_time,
end_time,
create_time,
condition_amount,
condition_num,
benefit_amount,
benefit_discount,
case rule.activity_type
when '3101' then concat('滿',condition_amount,'元減',benefit_amount,'元')
when '3102' then concat('滿',condition_num,'件打',10*(1-benefit_discount),'折')
when '3103' then concat('打',10*(1-benefit_discount),'折')
end benefit_rule,
benefit_level
from
(
select
id,
activity_id,
activity_type,
condition_amount,
condition_num,
benefit_amount,
benefit_discount,
benefit_level
from ${APP}.ods_activity_rule_full
where dt='$do_date'
)rule
left join
(
select
id,
activity_name,
activity_type,
activity_desc,
start_time,
end_time,
create_time
from ${APP}.ods_activity_info_full
where dt='$do_date'
)info
on rule.activity_id=info.id
left join
(
select
dic_code,
dic_name
from ${APP}.ods_base_dic_full
where dt='$do_date'
and parent_code='31'
)dic
on rule.activity_type=dic.dic_code;
"
case $1 in
"dim_user_zip")
hive -e "$dim_user_zip"
;;
"dim_sku_full")
hive -e "$dim_sku_full"
;;
"dim_province_full")
hive -e "$dim_province_full"
;;
"dim_coupon_full")
hive -e "$dim_coupon_full"
;;
"dim_activity_full")
hive -e "$dim_activity_full"
;;
"all")
hive -e "$dim_user_zip$dim_sku_full$dim_province_full$dim_coupon_full$dim_activity_full"
;;
esac
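How the first-day script picks its SQL can be restated in Python as a sketch: the date argument is mandatory, a table name selects one statement, and `all` concatenates every statement in the script's order (this works because each statement ends with `;`). The function names and the dict of SQL strings are hypothetical stand-ins for the shell variables.

```python
# Order in which the script's "all" branch concatenates its statements.
ORDER = ["dim_user_zip", "dim_sku_full", "dim_province_full",
         "dim_coupon_full", "dim_activity_full"]


def first_day_date(argv):
    """argv mirrors $0 $1 $2; without a non-empty $2 the script refuses to run."""
    if len(argv) > 2 and argv[2]:
        return argv[2]
    raise SystemExit("Please pass in a date argument")


def build_hive_sql(target, sql_by_table):
    """Text handed to `hive -e` for $1, or None when no case branch matches."""
    if target == "all":
        # Each statement ends with ';', so plain concatenation is a valid script.
        return "".join(sql_by_table[name] for name in ORDER)
    return sql_by_table.get(target)


argv = ["script", "all", "2020-06-14"]  # hypothetical $0 $1 $2
do_date = first_day_date(argv)
# Hypothetical placeholders for the real SQL strings built with do_date.
sqls = {name: f"-- {name} for {do_date};\n" for name in ORDER}
sql = build_hive_sql(argv[1], sqls)
if sql is not None:
    print(sql)
```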
2. Daily loading script
- After the data warehouse goes live, run it once every day.
#!/bin/bash
APP=gmall
# Use the date argument if one is passed; otherwise use the day before the current date
if [ -n "$2" ] ;then
do_date=$2
else
do_date=`date -d "-1 day" +%F`
fi
dim_user_zip="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp as
(
select
old.id old_id,
old.login_name old_login_name,
old.nick_name old_nick_name,
old.name old_name,
old.phone_num old_phone_num,
old.email old_email,
old.user_level old_user_level,
old.birthday old_birthday,
old.gender old_gender,
old.create_time old_create_time,
old.operate_time old_operate_time,
old.start_date old_start_date,
old.end_date old_end_date,
new.id new_id,
new.login_name new_login_name,
new.nick_name new_nick_name,
new.name new_name,
new.phone_num new_phone_num,
new.email new_email,
new.user_level new_user_level,
new.birthday new_birthday,
new.gender new_gender,
new.create_time new_create_time,
new.operate_time new_operate_time,
new.start_date new_start_date,
new.end_date new_end_date
from
(
select
id,
login_name,
nick_name,
name,
phone_num,
email,
user_level,
birthday,
gender,
create_time,
operate_time,
start_date,
end_date
from ${APP}.dim_user_zip
where dt='9999-12-31'
)old
full outer join
(
select
id,
login_name,
nick_name,
md5(name) name,
md5(phone_num) phone_num,
md5(email) email,
user_level,
birthday,
gender,
create_time,
operate_time,
'$do_date' start_date,
'9999-12-31' end_date
from
(
select
data.id,
data.login_name,
data.nick_name,
data.name,
data.phone_num,
data.email,
data.user_level,
data.birthday,
data.gender,
data.create_time,
data.operate_time,
row_number() over (partition by data.id order by ts desc) rn
from ${APP}.ods_user_info_inc
where dt='$do_date'
)t1
where rn=1
)new
on old.id=new.id
)
insert overwrite table ${APP}.dim_user_zip partition(dt)
select
if(new_id is not null,new_id,old_id),
if(new_id is not null,new_login_name,old_login_name),
if(new_id is not null,new_nick_name,old_nick_name),
if(new_id is not null,new_name,old_name),
if(new_id is not null,new_phone_num,old_phone_num),
if(new_id is not null,new_email,old_email),
if(new_id is not null,new_user_level,old_user_level),
if(new_id is not null,new_birthday,old_birthday),
if(new_id is not null,new_gender,old_gender),
if(new_id is not null,new_create_time,old_create_time),
if(new_id is not null,new_operate_time,old_operate_time),
if(new_id is not null,new_start_date,old_start_date),
if(new_id is not null,new_end_date,old_end_date),
if(new_id is not null,new_end_date,old_end_date) dt
from tmp
union all
select
old_id,
old_login_name,
old_nick_name,
old_name,
old_phone_num,
old_email,
old_user_level,
old_birthday,
old_gender,
old_create_time,
old_operate_time,
old_start_date,
cast(date_add('$do_date',-1) as string) old_end_date,
cast(date_add('$do_date',-1) as string) dt
from tmp
where old_id is not null
and new_id is not null;
"
dim_sku_full="
with
sku as
(
select
id,
price,
sku_name,
sku_desc,
weight,
is_sale,
spu_id,
category3_id,
tm_id,
create_time
from ${APP}.ods_sku_info_full
where dt='$do_date'
),
spu as
(
select
id,
spu_name
from ${APP}.ods_spu_info_full
where dt='$do_date'
),
c3 as
(
select
id,
name,
category2_id
from ${APP}.ods_base_category3_full
where dt='$do_date'
),
c2 as
(
select
id,
name,
category1_id
from ${APP}.ods_base_category2_full
where dt='$do_date'
),
c1 as
(
select
id,
name
from ${APP}.ods_base_category1_full
where dt='$do_date'
),
tm as
(
select
id,
tm_name
from ${APP}.ods_base_trademark_full
where dt='$do_date'
),
attr as
(
select
sku_id,
collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
from ${APP}.ods_sku_attr_value_full
where dt='$do_date'
group by sku_id
),
sale_attr as
(
select
sku_id,
collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
from ${APP}.ods_sku_sale_attr_value_full
where dt='$do_date'
group by sku_id
)
insert overwrite table ${APP}.dim_sku_full partition(dt='$do_date')
select
sku.id,
sku.price,
sku.sku_name,
sku.sku_desc,
sku.weight,
sku.is_sale,
sku.spu_id,
spu.spu_name,
sku.category3_id,
c3.name,
c3.category2_id,
c2.name,
c2.category1_id,
c1.name,
sku.tm_id,
tm.tm_name,
attr.attrs,
sale_attr.sale_attrs,
sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
"
dim_province_full="
insert overwrite table ${APP}.dim_province_full partition(dt='$do_date')
select
province.id,
province.name,
province.area_code,
province.iso_code,
province.iso_3166_2,
region_id,
region_name
from
(
select
id,
name,
region_id,
area_code,
iso_code,
iso_3166_2
from ${APP}.ods_base_province_full
where dt='$do_date'
)province
left join
(
select
id,
region_name
from ${APP}.ods_base_region_full
where dt='$do_date'
)region
on province.region_id=region.id;
"
dim_coupon_full="
insert overwrite table ${APP}.dim_coupon_full partition(dt='$do_date')
select
id,
coupon_name,
coupon_type,
coupon_dic.dic_name,
condition_amount,
condition_num,
activity_id,
benefit_amount,
benefit_discount,
case coupon_type
when '3201' then concat('滿',condition_amount,'元減',benefit_amount,'元')
when '3202' then concat('滿',condition_num,'件打',10*(1-benefit_discount),'折')
when '3203' then concat('減',benefit_amount,'元')
end benefit_rule,
create_time,
range_type,
range_dic.dic_name,
limit_num,
taken_count,
start_time,
end_time,
operate_time,
expire_time
from
(
select
id,
coupon_name,
coupon_type,
condition_amount,
condition_num,
activity_id,
benefit_amount,
benefit_discount,
create_time,
range_type,
limit_num,
taken_count,
start_time,
end_time,
operate_time,
expire_time
from ${APP}.ods_coupon_info_full
where dt='$do_date'
)ci
left join
(
select
dic_code,
dic_name
from ${APP}.ods_base_dic_full
where dt='$do_date'
and parent_code='32'
)coupon_dic
on ci.coupon_type=coupon_dic.dic_code
left join
(
select
dic_code,
dic_name
from ${APP}.ods_base_dic_full
where dt='$do_date'
and parent_code='33'
)range_dic
on ci.range_type=range_dic.dic_code;
"
dim_activity_full="
insert overwrite table ${APP}.dim_activity_full partition(dt='$do_date')
select
rule.id,
info.id,
activity_name,
rule.activity_type,
dic.dic_name,
activity_desc,
start_time,
end_time,
create_time,
condition_amount,
condition_num,
benefit_amount,
benefit_discount,
case rule.activity_type
when '3101' then concat('滿',condition_amount,'元減',benefit_amount,'元')
when '3102' then concat('滿',condition_num,'件打',10*(1-benefit_discount),'折')
when '3103' then concat('打',10*(1-benefit_discount),'折')
end benefit_rule,
benefit_level
from
(
select
id,
activity_id,
activity_type,
condition_amount,
condition_num,
benefit_amount,
benefit_discount,
benefit_level
from ${APP}.ods_activity_rule_full
where dt='$do_date'
)rule
left join
(
select
id,
activity_name,
activity_type,
activity_desc,
start_time,
end_time,
create_time
from ${APP}.ods_activity_info_full
where dt='$do_date'
)info
on rule.activity_id=info.id
left join
(
select
dic_code,
dic_name
from ${APP}.ods_base_dic_full
where dt='$do_date'
and parent_code='31'
)dic
on rule.activity_type=dic.dic_code;
"
case $1 in
"dim_user_zip")
hive -e "$dim_user_zip"
;;
"dim_sku_full")
hive -e "$dim_sku_full"
;;
"dim_province_full")
hive -e "$dim_province_full"
;;
"dim_coupon_full")
hive -e "$dim_coupon_full"
;;
"dim_activity_full")
hive -e "$dim_activity_full"
;;
"all")
hive -e "$dim_user_zip$dim_sku_full$dim_province_full$dim_coupon_full$dim_activity_full"
;;
esac
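The daily script differs from the first-day script in how it defaults the date: without `$2` it uses yesterday, which is what `date -d "-1 day" +%F` prints. A Python sketch with a hypothetical `daily_do_date` helper; `today` can be injected so the result is testable:

```python
from datetime import date, timedelta


def daily_do_date(argv, today=None):
    """argv mirrors $0 $1 $2: use $2 when it is non-empty, else yesterday
    (local date, YYYY-MM-DD), like `date -d "-1 day" +%F`."""
    if len(argv) > 2 and argv[2]:
        return argv[2]
    today = today or date.today()
    return (today - timedelta(days=1)).isoformat()


print(daily_do_date(["script", "all", "2020-06-15"]))             # 2020-06-15
print(daily_do_date(["script", "all"], today=date(2020, 6, 16)))  # 2020-06-15
```

Because the default is yesterday, a scheduler that runs the script shortly after midnight loads the day that just ended.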