天天看點

大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

使用Zeppelin分析電子商務消費行為分析

  • 任務描述
    • 需求概述
  • 問題分析
    • Customer表
    • Transaction表
    • Store表
    • Review表
    • 資料結構
  • 連接配接至Zeppelin
    • 1.資料擷取
      • Download Data
    • 2.檢查行數和header行
      • Understand the Data
    • 3.上傳資料到HDFS
      • Upload the file to HDFS
    • 4.建表查表
      • 4.1 Clear all tables if exists
      • 4.2 Verify all Tables are Created
    • 5.資料清洗
      • 5.1 Clean and Mask customer_details
      • 5.2 Clean transaction_details into partition table
      • 5.3 Clean store_review table
    • 6.Customer分析
      • 6.1找出顧客最常用的信用卡
      • 6.2找出客戶資料中排名前五的職位名稱
      • 6.3在美國女性最常用的信用卡
      • 6.4按性别和國家進行客戶統計
    • 7.Transaction分析
      • 7.1計算每月總收入
      • 7.2計算每個季度的總收入
      • 7.3按年計算總收入
      • 7.4按工作日計算總收入
      • 7.5按時間段計算總收入(需要清理資料)
      • 7.6按時間段計算平均消費
      • 7.7按工作日計算平均消費
      • 7.8計算年、月、日的交易總數
      • 7.9找出交易量最大的10個客戶
      • 7.10找出消費最多的前10位顧客
      • 7.11統計該期間交易數量最少的使用者
      • 7.12計算每個季度的獨立客戶總數
      • 7.13計算每周的獨立客戶總數
      • 7.14計算整個活動客戶平均花費的最大值
      • 7.15統計每月花費最多的客戶
      • 7.16統計每月通路次數最多的客戶
      • 7.17按總價找出最受歡迎的5種産品
      • 7.18根據購買頻率找出最暢銷的5種産品
      • 7.19根據客戶數量找出最受歡迎的5種産品
      • 7.20驗證前5個details
    • 8.Store分析
      • 8.1按客流量找出最受歡迎的商店
      • 8.2根據顧客消費價格找出最受歡迎的商店
      • 8.3根據顧客交易情況找出最受歡迎的商店
      • 8.4根據商店和唯一的顧客id擷取最受歡迎的産品
      • 8.5擷取每個商店的員工與顧客比
      • 8.6按年和月計算每家店的收入
      • 8.7按店鋪制作總收益餅圖
      • 8.8找出每個商店最繁忙的時間段
      • 8.9找出每家店的忠實顧客
      • 8.10根據每位員工的最高收入找出明星商店
    • 9.Review分析
      • 9.1在ext_store_review中找出存在沖突的交易映射關系
      • 9.2了解客戶評價的覆寫率
      • 9.3根據評分了解客戶的分布情況
      • 9.4根據交易了解客戶的分布情況
      • 9.5客戶給出的最佳評價是否總是同一家門店

電子商務消費行為分析檔案

提取碼:39r1

任務描述

需求概述

對某零售企業最近一年門店收集的資料進行資料分析

  • 潛在客戶畫像
  • 使用者消費統計
  • 門店的資源使用率
  • 消費的特征人群定位
  • 資料的可視化展現

問題分析

Customer表

customer_details details
customer_id Int,1-500
first_name string
last_name string
email string,such as [email protected]
gender string,Male or Female
address string
country string
language string
job string,job title/position
credit_type string,credit card type。such as visa
credit_no strin,credit card number

問題:language字段資料存在錯誤

Transaction表

transaction_details details
transaction_id Int,1-1000
customer_id Int,1-500
store_id Int,1-5
price decimal,such as 5.08
product string,things bought
date string,when to purchase
time string,what time to purchase

問題:表中transa_id有重複,但資料有效,需要修複資料

Store表

store_details details
store_id Int,1-5
store_name string
employee_number Int,在store有多少employee

Review表

store_review details
stransaction_id Int,1-8000
store_id Int,1-5
review_store Int,1-5

問題:表中有效的score資料

表中有獎transaction_id映射到錯誤的store_id

資料結構

Customer表

  • customer_details

Transaction表

  • transaction_details

Store表

  • store_details

Review表

  • store_review
    大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

連接配接至Zeppelin

Zeppelin安裝和配置

打開Zeppelin網站

大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

導入電子商務消費行為分析資料及模闆

1.資料擷取

Download Data

%sh
--建立資料庫
mkdir /mnt/hivetest
cd /mnt/hivetest
wget -p https://raw.githubusercontent.com/datafibers/big_data_training/master/data/customer_details.csv
wget -p https://raw.githubusercontent.com/datafibers/big_data_training/master/data/transaction_details.csv
wget -p https://raw.githubusercontent.com/datafibers/big_data_training/master/data/store_details.csv
wget -p https://raw.githubusercontent.com/datafibers/big_data_training/master/data/store_review.csv
           

2.檢查行數和header行

Understand the Data

%sh
## /tmp/data/
cd /mnt/hivetest
head -2 customer_details.csv
#head -2 store_details.csv
#head -2 store_review.csv
#head -2 transaction_details.csv
           

3.上傳資料到HDFS

Upload the file to HDFS

%sh
hdfs dfs -mkdir -p /apps/shopping/customer
hdfs dfs -put /mnt/hivetest/customer_details.csv /apps/shopping/customer
hdfs dfs -mkdir -p /apps/shopping/transaction
hdfs dfs -put /mnt/hivetest/transaction_details.csv /apps/shopping/transaction
hdfs dfs -mkdir -p /apps/shopping/store
hdfs dfs -put /mnt/hivetest/store_details.csv /apps/shopping/store
hdfs dfs -mkdir -p /apps/shopping/review
hdfs dfs -put /mnt/hivetest/store_review.csv /apps/shopping/review
##/tmp/shopping/data/customer/
##/tmp/shopping/data/transaction/
##/tmp/shopping/data/store/
##/tmp/shopping/data/review/
           

4.建表查表

4.1 Clear all tables if exists

%hive
create database shopping
use shopping
           
%hive
--建立顧客表
drop table ext_transaction_details
%hive
create external table if not exists ext_customer_details(
customer_id string,
first_name string,
last_name string,
email string,
gender string,
address string,
country string,
language string,
job string,
credit_type string,
credit_no string)
row format SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties(
"separatorChar"=",",
"escapeChar"="\\")
stored as textfile
location '/apps/shopping/customer'
           
%hive
--建立交易流水表
create external table if not exists ext_transaction_details(
transaction_id string,
customer_id string,
store_id string,
price decimal(10,2),
product string,
purchase_date string,
purchase_time string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties(
"separatorChar"=",",
"quoteChar"="\"",
"escapeChar"="\\")
stored as textfile
location '/apps/shopping/transaction'
           
%hive
--建立商店詳情表
create external table if not exists ext_store_details(
store_id string,
store_name string,
employee_number int)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties(
"separatorChar"=",",
"escapeChar"="\\")
stored as textfile
location '/apps/shopping/store'
tblproperties("skip.head.line.count"="1")
           
%hive
--建立評價表
create external table if not exists ext_store_review(
stransaction_id string,
store_id string,
review_score int)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties(
"separatorChar"=",",
"escapeChar"="\\")
stored as textfile
location '/apps/shopping/review'
           

4.2 Verify all Tables are Created

%hive
--select * from ext_customer_details limit 20
--select distinct language from ext_customer_details
--select * from ext_transaction_details limit 20
select transaction_id,count(transaction_id) cnt from ext_transaction_details group by transaction_id order by cnt desc 
--select * from ext_store_details limit 20
--select * from ext_store_review limit 20
           

5.資料清洗

解決以下有問題的資料

  • 對transaction_details中的重複資料生成新ID
  • 過濾掉store_review中沒有評分的資料
  • 可以把清洗好的資料放到另一個表或者用View表示
  • 找出PII(personal information identification)或PCI(personal confidential information)資料進行加密或hash
  • 重新組織transaction資料按照日期YYYY-MM做分區

5.1 Clean and Mask customer_details

%hive
-- 敏感資訊加密
-- drop view vm_customer_details
create view if not exists vm_customer_details as
select
customer_id ,
first_name ,
unbase64(last_name) lastname,
unbase64(email) email,
gender ,
unbase64(address) address,
country ,
language,
job ,
credit_type ,
unbase64(credit_no) credit_no
from 
ext_customer_details
           

5.2 Clean transaction_details into partition table

%hive
-- 建立流水詳情表
create table if not exists transaction_details
(
transaction_id string,
customer_id string,
store_id string,
price decimal(8,2),
product string,
purchase_date date,
purchase_time string
)
partitioned by(purchase_month string)
-- select transaction_id,count(1) from ext_transaction_details group by transaction_id having count(1)>1
-- select * from ext_transaction_details where transaction_id=8001
set hive.exec.dynamic.partition.mode=nonstrict -- 開啟動态分區
-- 重寫資料
with base as (
select
transaction_id,
customer_id ,
store_id ,
price ,
product,
purchase_date,
purchase_time,
from_unixtime(unix_timestamp(purchase_date,'yyyy-MM-dd'),'yyyy-MM') as purchase_month,
row_number() over (partition by transaction_id order by store_id) as rn
from ext_transaction_details
)
insert overwrite table transaction_details partition(purchase_month)
select
if(rn=1,transaction_id,concat_ws('-',transaction_id,'_fix')) ,
customer_id ,
store_id ,
price ,
product,
purchase_date ,
purchase_time,
purchase_month
from base
-- 檢視修複資訊
select * from transaction_details where transaction_id like '%fix%'
           

5.3 Clean store_review table

%hive
create view if not exists vw_store_review as
select
transaction_id,
review_score
from  ext_store_review where review_score <> ''
show tables
           

最終會有七個表

大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

6.Customer分析

6.1找出顧客最常用的信用卡

%hive
select credit_type,count(credit_type) cnt from customer_details group by credit_type order by cnt desc
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

6.2找出客戶資料中排名前五的職位名稱

%hive
select job ,count(1) as pn from customer_details group by job
order by pn desc limit 5
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

6.3在美國女性最常用的信用卡

%hive
select credit_type,count(credit_type) cnt from customer_details where gender='Female' and country='United States' group by credit_type order by cnt desc limit 10
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

6.4按性别和國家進行客戶統計

%hive
select country,gender,count(1) cnt from customer_details group by country,gender
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.Transaction分析

7.1計算每月總收入

%hive
select month,store_id,sum(price) total from transaction_details group by store_id,month 
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.2計算每個季度的總收入

%hive
select year(purchase_date) year,concat
("第",floor((month(purchase_date)-1)/3)+1,"季度") season,sum(price)
 from transaction_details group by year(purchase_date) ,
 concat("第",floor((month(purchase_date)-1)/3)+1,"季度") order by year desc,season
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.3按年計算總收入

%hive
select year(purchase_date) yy,sum(price) toatal
from transaction_details group by year(purchase_date) order by yy desc
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.4按工作日計算總收入

%hive
select dayofweek(purchase_date) work_date,sum(price) from transaction_details
 where dayofweek(purchase_date) between 1 and 5 group by dayofweek(purchase_date)
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.5按時間段計算總收入(需要清理資料)

%hive
-- 使用正規表達式清理資料然後使用case when 分組查詢
with
t1 as(
select *, if(instr(purchase_time,'PM')>0,
				if(cast(regexp_extract(purchase_time,'([0-9]{1,2}):([0-9]{2}\\w*)',1)as int)+12>=24,
					0,
					cast(regexp_extract(purchase_time,'([0-9]{1,2}):([0-9]{2}\\w*)',1)as int)+12),
				cast(regexp_extract(purchase_time,'([0-9]{1,2}):([0-9]{2}\\w*)',1)as int)) as timeTrans
from transaction_details), t2 as(
select t1.*,case when t1.timeTrans<=8 and t1.timeTrans>5 then 'early morning'
				 when t1.timeTrans<=11 and t1.timeTrans>8 then 'morning'
				 when t1.timeTrans<=13 and t1.timeTrans>11 then 'noon'
				 when t1.timeTrans<=18 and t1.timeTrans>13 then 'afternoon'
				 when t1.timeTrans<=22 and t1.timeTrans>18 then 'evening'
				 else 'night'
			end as timeSplit
from t1)
select t2.timeSplit,sum(price)
from t2 
group by t2.timeSplit

           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.6按時間段計算平均消費

%hive
-- 使用正規表達式清理資料然後使用case when 分組查詢
with
t1 as(
select *, if(instr(purchase_time,'PM')>0,
				if(cast(regexp_extract(purchase_time,'([0-9]{1,2}):([0-9]{2}\\w*)',1)as int)+12>=24,
					0,
					cast(regexp_extract(purchase_time,'([0-9]{1,2}):([0-9]{2}\\w*)',1)as int)+12),
				cast(regexp_extract(purchase_time,'([0-9]{1,2}):([0-9]{2}\\w*)',1)as int)) as timeTrans
from transaction_details), t2 as(
select t1.*,case when t1.timeTrans<=8 and t1.timeTrans>5 then 'early morning'
				 when t1.timeTrans<=11 and t1.timeTrans>8 then 'morning'
				 when t1.timeTrans<=13 and t1.timeTrans>11 then 'noon'
				 when t1.timeTrans<=18 and t1.timeTrans>13 then 'afternoon'
				 when t1.timeTrans<=22 and t1.timeTrans>18 then 'evening'
				 else 'night'
			end as timeSplit
from t1)
select t2.timeSplit,avg(price)
from t2 
group by t2.timeSplit
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.7按工作日計算平均消費

%hive
select dayofweek(purchase_date) work_date,avg(price) from transaction_details
 where dayofweek(purchase_date) between 1 and 5 group by dayofweek(purchase_date)
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.8計算年、月、日的交易總數

%hive
--select purchase_date,count(1) from transaction_details group by purchase_date
--select concat(year(purchase_date),"-",month(purchase_date)),count(1) from transaction_details group by year(purchase_date),month(purchase_date)
--select year(purchase_date),count(1) from transaction_details group by year(purchase_date)\
select count(1) over(partition by year(purchase_date)) year,
       count(1) over(partition by year(purchase_date),month(purchase_date)) month,
       count(1) over(partition by year(purchase_date),month(purchase_date),day(purchase_date))day from transaction_details
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.9找出交易量最大的10個客戶

%hive
select customer_id,count(transaction_id)a from transaction_details group by customer_id order by a desc limit 10
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.10找出消費最多的前10位顧客

%hive
select customer_id,sum(price)a from transaction_details group by customer_id order by a desc limit 10
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.11統計該期間交易數量最少的使用者

%hive
select customer_id,count(transaction_id)a from transaction_details group by customer_id order by a limit 1
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.12計算每個季度的獨立客戶總數

%hive
select year(purchase_date) yy,
concat("季度",floor((month(purchase_date)-1)/3)+1) season,
count(distinct customer_id) from transaction_details group by year(purchase_date),
concat("季度",floor((month(purchase_date)-1)/3)+1) 
order by yy desc,season
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.13計算每周的獨立客戶總數

%hive
select concat(year(purchase_date),'年第',weekofyear(purchase_date),'周'),count(distinct customer_id) from transaction_details group by year(purchase_date),weekofyear(purchase_date)
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.14計算整個活動客戶平均花費的最大值

%hive
select a.customer_id,max(a.av) from (select customer_id,avg(price)av from
 transaction_details group by customer_id)a group by a.customer_id
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.15統計每月花費最多的客戶

%hive
select b.m,b.id,b.s
from(
select a.m,a.id,a.s ,row_number() over(partition by  a.m order by a.s desc) as win1
from(
select concat(year(purchase_date),'-',month(purchase_date)) m,customer_id id,sum(price) s
from transaction_details
group by year(purchase_date),month(purchase_date),customer_id)a) b 
where b.win1=1
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.16統計每月通路次數最多的客戶

%hive
select b.m,b.id,b.c
from(
select a.m,a.id,a.c,row_number() over(partition by a.m order by a.c desc) as win1 
from(
select concat(year(purchase_date),'-',month(purchase_date)) m,customer_id id, count(1) c
from transaction_details
group by year(purchase_date),month(purchase_date),customer_id) a) b 
where b.win1=1
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.17按總價找出最受歡迎的5種産品

%hive
select product,sum(price)a from transaction_details group by product order by a desc limit 5 
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.18根據購買頻率找出最暢銷的5種産品

%hive
select product,count(transaction_id)a from transaction_details group by product order by a desc limit 5
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.19根據客戶數量找出最受歡迎的5種産品

%hive
select product,count(distinct customer_id)a from transaction_details group by product order by a desc limit 5
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

7.20驗證前5個details

%hive
select * from transaction_details where product in ('Goat - Whole Cut')
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

8.Store分析

8.1按客流量找出最受歡迎的商店

%hive
select store_id,count(transaction_id) a from transaction_details group by store_id order by a desc limit 1
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

8.2根據顧客消費價格找出最受歡迎的商店

%hive
select store_id,sum(price)a from transaction_details group by store_id order by a desc limit 1
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

8.3根據顧客交易情況找出最受歡迎的商店

%hive
select store_id,count(transaction_id)a,sum(price)b from transaction_details group by store_id order by a desc,b desc limit 1
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

8.4根據商店和唯一的顧客id擷取最受歡迎的産品

%hive
select b.store_id,b.product
from (
select a.store_id,a.product,a.c ,row_number() over(partition by store_id order by a.c desc )as win1 
from(
select store_id,product,count(distinct customer_id) c 
from transaction_details
group by store_id,product) a )b 
where b.win1 =1
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

8.5擷取每個商店的員工與顧客比

%hive
select a.store_id,concat_ws(':',cast(ceil(round(s.employee_number/a.c*100))as string),'100')
from(
select t.store_id,count(distinct customer_id) c
from transaction_details t 
group by t.store_id)a join ext_store_details s 
on a.store_id=s.store_id
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

8.6按年和月計算每家店的收入

%hive
select store_id,year(purchase_date),month(purchase_date), sum(price) from transaction_details group by store_id,year(purchase_date),month(purchase_date)
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

8.7按店鋪制作總收益餅圖

%hive
select store_id,sum(price)a from transaction_details group by store_id 
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

8.8找出每個商店最繁忙的時間段

%hive
with
t1 as(
select *, if(instr(purchase_time,'PM')>0,
				if(cast(regexp_extract(purchase_time,'([0-9]{1,2}):([0-9]{2}\\w*)',1)as int)+12>=24,
					0,
					cast(regexp_extract(purchase_time,'([0-9]{1,2}):([0-9]{2}\\w*)',1)as int)+12),
				cast(regexp_extract(purchase_time,'([0-9]{1,2}):([0-9]{2}\\w*)',1)as int)) as timeTrans
from transaction_details), t2 as(
select t1.*,case when t1.timeTrans<=8 and t1.timeTrans>5 then 'early morning'
				 when t1.timeTrans<=11 and t1.timeTrans>8 then 'morning'
				 when t1.timeTrans<=13 and t1.timeTrans>11 then 'noon'
				 when t1.timeTrans<=18 and t1.timeTrans>13 then 'afternoon'
				 when t1.timeTrans<=22 and t1.timeTrans>18 then 'evening'
				 else 'night'
			end as timeSplit
from t1),
t3 as(
select t2.store_id,t2.timeSplit,count(1) c 
from t2 
group by t2.store_id,t2.timeSplit),
t4 as(
select t3.store_id,t3.timeSplit,row_number() over(partition by store_id order by t3.timeSplit desc)as win1
from t3 )
select t4.store_id,t4.timeSplit
from t4
where t4.win1=1
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

8.9找出每家店的忠實顧客

%hive
select b.* from (select store_id,customer_id,count(transaction_id)a from transaction_details group by store_id,customer_id)b where b.a>=5
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

8.10根據每位員工的最高收入找出明星商店

%hive
-- 求總收入與雇員比值的最大值
with
t1 as (
select  store_id,sum(price) s 
from transaction_details 
group by store_id)
select t1.store_id,t1.s/s.employee_number ss
from t1 join ext_store_details s  on s.store_id= t1.store_id
order by ss desc 
limit 1
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

9.Review分析

9.1在ext_store_review中找出存在沖突的交易映射關系

%hive
select stransaction_id
from store_review
group by stransaction_id
having count(1)>1
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

9.2了解客戶評價的覆寫率

%hive
-- 求各個店共有多少顧客評價
with 
t1 as(
select t2.store_id,t1.transaction_id,t2.customer_id
from store_review t1 join transaction_details t2 on t1.transaction_id=t2.transaction_id)
select t1.store_id,count(distinct t1.customer_id)
from t1 
group by t1.store_id
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

9.3根據評分了解客戶的分布情況

%hive
-- 求每家店每個評分有多少個客戶給的
with
t1 as(
select  t2.store_id ,t1.review_score,t2.customer_id
from store_review t1 join  transaction_details t2 on t1.transaction_id=t2.transaction_id)
select t1.store_id,t1.review_score,count(distinct customer_id)
from t1
group by t1.store_id,t1.review_score
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

9.4根據交易了解客戶的分布情況

%hive
-- 求每家店每個客戶的訂單數
select store_id,customer_id,count(1)
from transaction_details
group by store_id,customer_id
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

9.5客戶給出的最佳評價是否總是同一家門店

%hive
with yes as(select customer_id,count(distinct s.store_id)cnt,"yes" from store_review s join transaction_details on substr(transaction_id,0,length(transaction_id)-2)=stransaction_id where substr(transaction_id,length(transaction_id),length(transaction_id))<>'2' group by customer_id having cnt=1),
 no as(select customer_id,count(distinct s.store_id)cnt,"no" from store_review s join transaction_details on substr(transaction_id,0,length(transaction_id)-2)=stransaction_id where substr(transaction_id,length(transaction_id),length(transaction_id))<>'2' group by customer_id having cnt>1)
select * from yes union all select * from no
           
大資料——項目實戰(使用Zeppelin分析電子商務消費行為分析)任務描述問題分析連接配接至Zeppelin

繼續閱讀