需求
查詢前20%時間的訂單資訊
複制
建表語句
create table business(
name string,
orderdate string,
cost int
)
row format delimited fields terminated by '\t'
;
複制
資料
insert overwrite table business values
("jack","2017-01-01",10),
("tony","2017-01-02",15),
("jack","2017-02-03",23),
("tony","2017-01-04",29),
("jack","2017-01-05",46),
("jack","2017-04-06",42),
("tony","2017-01-07",50),
("jack","2017-01-08",55),
("mart","2017-04-08",62),
("mart","2017-04-09",68),
("neil","2017-05-10",12),
("mart","2017-04-11",75),
("neil","2017-06-12",80),
("mart","2017-04-13",94);
複制
實作
select
name,
orderdate,
cost,
ntile(5) over(order by orderdate) n
from
business
;
複制
結果
Total MapReduce CPU Time Spent: 3 seconds 190 msec
OK
t1.name t1.orderdate t1.cost t1.n
jack 2017-01-01 10 1
tony 2017-01-02 15 1
tony 2017-01-04 29 1
Time taken: 14.82 seconds, Fetched: 3 row(s)
複制
分析
1、查詢前20%時間的訂單資訊,沒有特指某個使用者,是以不用指定分區
2、開窗裡面對時間排序
3、ntile進行分桶,按照時間順序分成5份,因為這裡說的是20%
4、先用子查詢分桶,然後再進行過濾即n=1
複制
擴充
如果不進行排序,則按讀取的資料是預設有序
select
name,
orderdate,
cost,
ntile(5) over() n
from
business
;
Total MapReduce CPU Time Spent: 2 seconds 260 msec
OK
name orderdate cost n
mart 2017-04-13 94 1
neil 2017-06-12 80 1
mart 2017-04-11 75 1
neil 2017-05-10 12 2
mart 2017-04-09 68 2
mart 2017-04-08 62 2
jack 2017-01-08 55 3
tony 2017-01-07 50 3
jack 2017-04-06 42 3
jack 2017-01-05 46 4
tony 2017-01-04 29 4
jack 2017-02-03 23 4
tony 2017-01-02 15 5
jack 2017-01-01 10 5
Time taken: 14.127 seconds, Fetched: 14 row(s)
複制
知識點
ntile:把有序的資料集合平均配置設定到指定的資料量個桶中,将桶号配置設定給每一行。
如果不能平均配置設定,則優先配置設定較小編号的桶,并且各個桶中能放的行數最多相差1。
複制