
Enterprise Invoice Anomaly Analysis: Import and Cleaning

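Today I worked on the enterprise invoice anomaly analysis assignment. I successfully imported the data into the Hive data warehouse and did an initial round of cleaning.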

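The workflow is as follows:

1. Import the data from the three sample table files into the Hive data warehouse

First, create the three tables: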

create table xxfpb(
  hydm string,
  xf_id string,
  djzclx_dm string,
  kydjrq string,
  xgrq string,
  label string,
  fp_nid string,
  je double,
  se double,
  jshj double,
  kpyf string,
  kprq string,
  zfbz string
) row format delimited fields terminated by ',';

The CREATE TABLE statements for the other two tables (nsrxx and zzsfp_hwmx) are similar.
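As a rough sketch of what one of the other tables and the load step might look like: the column names for nsrxx below are taken from the cleaning statement later in this post, but the all-string column types and the file path `/path/to/nsrxx.csv` are assumptions and do not come from the original.

```sql
-- Sketch only: column names come from the nsrxx cleaning statement;
-- the string types are an assumption.
create table nsrxx(
  hydm string,
  nsr_id string,
  djzclx_dm string,
  kydjrq string,
  xgrq string,
  label string
) row format delimited fields terminated by ',';

-- Hypothetical local path; replace it with the real location of the sample file.
load data local inpath '/path/to/nsrxx.csv' into table nsrxx;
```

With `local`, Hive copies the file from the client's filesystem; without it, the path is treated as an HDFS location and the file is moved into the table's directory.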

Then clean the data:

insert overwrite table nsrxx
select
  substring(hydm, 2, length(hydm) - 1) as hydm,
  nsr_id,
  djzclx_dm,
  kydjrq,
  xgrq,
  substring(label, 1, length(label) - 1) as label
from nsrxx;

insert overwrite table zzsfp_hwmx
select
  substring(fp_nid, 2, length(fp_nid) - 1) as fp_nid,
  date_kry,
  hwmc,
  ggxh,
  dw,
  sl,
  dj,
  je,
  se,
  substring(spbm, 1, length(spbm) - 1) as spbm
from zzsfp_hwmx;
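The two statements above cover nsrxx and zzsfp_hwmx; a matching statement for xxfpb is not shown in this post. The following is only a sketch that assumes the same pattern holds for xxfpb, that is, the opening bracket sits on the first column (hydm) and the closing bracket on the last column (zfbz). Check the raw file before relying on it.

```sql
-- Assumption: the leading bracket is attached to hydm and the trailing
-- bracket to zfbz, mirroring the pattern used for the other two tables.
insert overwrite table xxfpb
select
  substring(hydm, 2, length(hydm) - 1) as hydm,
  xf_id,
  djzclx_dm,
  kydjrq,
  xgrq,
  label,
  fp_nid,
  je,
  se,
  jshj,
  kpyf,
  kprq,
  substring(zfbz, 1, length(zfbz) - 1) as zfbz
from xxfpb;
```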

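The purpose of the cleaning is to strip the leading and trailing brackets from the three tables: the opening bracket is attached to the first column of each row and the closing bracket to the last column.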
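To see why these substring calls strip the brackets, here is a small worked example. Hive's `substring(str, pos, len)` counts positions from 1, so starting at position 2 skips the first character, and taking `length - 1` characters from position 1 drops the last one. The literal values and the `(` / `)` characters are made up for illustration; the post does not show the raw data.

```sql
-- Drop the first character: start at position 2, keep the remaining length - 1 characters.
select substring('(1234', 2, length('(1234') - 1);   -- returns 1234

-- Drop the last character: start at position 1, keep length - 1 characters.
select substring('abcd)', 1, length('abcd)') - 1);   -- returns abcd

-- Sanity check after cleaning (assumes the brackets are round parentheses):
-- counts rows in nsrxx that still carry a bracket.
select count(*) from nsrxx where hydm like '(%' or label like '%)';
```

Because the trimming is purely positional, running the same `insert overwrite` twice would remove one more character from each side, so it is worth running a check like the last query before re-running a cleaning statement.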

Screenshots of the cleaned results:

[Two screenshots; images not available]
