天天看點

5 大資料實戰-hive實戰分析

15年項目使用spark+hadoop,最近公司分享給小夥伴們,就整理出文檔供大家學習交流。整理hdfs+hive+spark stream+spark sql實戰系列,有些資料是網上搜尋下載下傳的,同時整理時也參考其他博文以及《spark實戰》...

1 内部表

Show databses;
Use hive_data;           
  • 1.1 建立内部表
CREATE TABLE SOGOUQ2(DT STRING,WEBSESSION STRING,WORD STRING,S_SEQ INT,C_SEQ INT,WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY \'\t\' LINES TERMINATED BY \'\n\' ;           
  • 1.2 加載資料

    Load local data:

LOAD DATA LOCAL INPATH \'/data/software/sougou/SogouQ2.txt\' INTO TABLE SOGOUQ2;
Load hdfs data:
LOAD DATA INPATH \'hdfs://shulaibao2:9010/home/hadoop/upload/test/sougou/SogouQ1.txt
\' INTO TABLE SOGOUQ2;           
  • 1.3 檢視hdfs資料
hadoop fs -ls /user/hive/warehouse/hive_data.db           
  • 1.4 操作資料庫
select count(*) from SOGOUQ2;
select count(*) from SOGOUQ2 where WEBSITE like \'%baidu%\';           

2 外部表

  • 2.1 建立hdfs資料存儲目錄
hadoop fs -mkdir -p  /home/hadoop/upload/hive/sogouq1
hdfs dfs -ls  /home/hadoop/upload/hive/sogouq1           
  • 2.2 建立表
CREATE EXTERNAL TABLE SOGOUQ1(DT STRING,WEBSESSION STRING,WORD STRING,S_SEQ INT,C_SEQ INT,WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY \'\t\' LINES TERMINATED BY \'\n\' STORED AS TEXTFILE LOCATION \'/home/hadoop/upload/hive/sogouq1\';
Show tables;           
  • 2.3 加載資料
Hadoop fs -copy /home/hadoop/upload/test/sougou/SogouQ1.txt
 /home/hadoop/upload/hive/sogouq1
hdfs關聯:copy、mv
本地系統資料關聯:copyFromLocal           
  • 2.4 操作資料庫
select count(*) from SOGOUQ1;           

總結:【注】在删除表的時候,内部表将删除表的中繼資料和資料檔案;而删除外部表的時候,僅僅删除外部表的中繼資料,不删除資料檔案

3 交易資料統計實戰

  • 3.1 資料準備

*資料下載下傳: 

https://pan.baidu.com/s/1o7HpDEy#list/path=%2F58.%E5%8D%9A%E5%AE%A2%E8%B5%84%E6%96%99%2F20150901Spark%E5%85%A5%E9%97%A8%E5%AE%9E%E6%88%98%E7%B3%BB%E5%88%97%2Fdata%2Fclass5%2Fsaledata&parentPath=%2F58.%E5%8D%9A%E5%AE%A2%E8%B5%84%E6%96%99%2F20150901Spark%E5%85%A5%E9%97%A8%E5%AE%9E%E6%88%98%E7%B3%BB%E5%88%97*

tbDate:日期、年月、年、月、日、周幾、第幾周、季度、旬、半月;
tbStock:訂單号、交易位置、交易日期;
tbStockDetail:訂單号、行号、貨品、數量、金額:           
  • 3.2 建立表
CREATE TABLE tbDate(dateID string,theyearmonth string,theyear string,themonth string,thedate string,theweek string,theweeks string,thequot string,thetenday string,thehalfmonth string) ROW FORMAT DELIMITED FIELDS TERMINATED BY \',\' LINES TERMINATED BY \'\n\' ;

CREATE TABLE tbStock(ordernumber STRING,locationid string,dateID string) ROW FORMAT DELIMITED FIELDS TERMINATED BY \',\' LINES TERMINATED BY \'\n\' ;

CREATE TABLE tbStockDetail(ordernumber STRING,rownum int,itemid string,qty int,price int ,amount int) ROW FORMAT DELIMITED FIELDS TERMINATED BY \',\' LINES TERMINATED BY \'\n\' ;           
  • 3.3 加載資料
LOAD DATA LOCAL INPATH \'/data/software/testdata/tbDate.txt\' INTO TABLE tbDate;

LOAD DATA LOCAL INPATH \'/data/software/testdata/tbStock.txt\' INTO TABLE tbStock;

LOAD DATA LOCAL INPATH \'/data/software/testdata/tbStockDetail.txt\' INTO TABLE tbStockDetail;           
  • 3.4 資料統計分析
  • 3.4.1按年統計銷售額,年份升序
select c.theyear, sum(b.amount) from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumber
left join tbDate c on a.dateid=c.dateid group by c.theyear order by c.theyear;           
  • 3.4.2按交易日期-訂單号分組統計銷售額
select a.dateid,a.ordernumber,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumber group by a.dateid,a.ordernumber;           
  • 3.4.3統計年度銷售額最大的交易日期-訂單号
select c.theyear,max(d.sumofamount) from tbDate c inner join (select a.dateid,a.ordernumber,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumber group by a.dateid,a.ordernumber)
 d  on c.dateid=d.dateid group by c.theyear sort by c.theyear;           
  • 3.4.4統計季度銷售額前10位
select c.theyear,c.thequot,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumber
left join tbDate c on a.dateid=c.dateid group by c.theyear,c.thequot order by sumofamount desc limit 10;           
  • 3.4.5銷售金額在100000以上的單據
select a.ordernumber,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumber group by a.ordernumber having sumofamount>100000;           
  • 3.4.6按交易日統銷售額
select c.theyear,b.itemid,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumber
left join tbDate c on a.dateid=c.dateid group by c.theyear,b.itemid;           
  • 3.4.7統計每個年度銷售額最大的交易日
select d.theyear,max(d.sumofamount) as maxofamount from (select c.theyear,b.itemid,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumber
left join tbDate c on a.dateid=c.dateid group by c.theyear,b.itemid;) d group by d.theyear ;           
  • 3.4.8統計年度最暢銷的商品
select distinct  e.theyear,e.itemid,f.maxofamount from (select c.theyear,b.itemid,sum(b.amount) as sumofamount from tbStock a,tbStockDetail b,tbDate c where a.ordernumber=b.ordernumber and a.dateid=c.dateid group by c.theyear,b.itemid) e ,
 (select d.theyear,max(d.sumofamount) as maxofamount from (select c.theyear,b.itemid,sum(b.amount) as sumofamount from tbStock a,tbStockDetail b,tbDate c where a.ordernumber=b.ordernumber and a.dateid=c.dateid group by c.theyear,
b.itemid) d group by d.theyear) f where e.theyear=f.theyear and e.sumofamount=f.maxofamount order by e.theyear;