視窗函數應用場景:
(1)用于分區排序
(2)動态Group By
(3)Top N
(4)累計計算
(5)層次查詢
一、分析函數
用于等級、百分點、n分片等。
函數 說明
RANK() 傳回資料項在分組中的排名,排名相等會在名次中留下空位
DENSE_RANK() 傳回資料項在分組中的排名,排名相等會在名次中不會留下空位
NTILE() 傳回n分片後的值
ROW_NUMBER() 為每條記錄傳回一個數字
Rank、DENSE_RANK
RANK()在出現等級相同的元素時預留為空,DENSE_RANK()不會。
Eg:某産品類型有兩個并列第一
RANK():第一二為1,第三位3
DENSE_RANK():第一二為1,第三位2
Sql代碼 收藏代碼
SELECT
column_name,
RANK() OVER (ORDER BY column_name DESC) AS rank,
DENSE_RANK() OVER (ORDER BY SUM(column_name) DESC) AS dense_rank
FROM table_name
OVER 需要,括号内為編号順序
注意:order by 時,desc NULL 值排在首位,ASC時NULL值排在末尾
可以通過NULLS LAST、NULLS FIRST 控制
Java代碼 收藏代碼
RANK() OVER (ORDER BY column_name DESC NULLS LAST)
PARTITION BY 分組排列順序
Java代碼 收藏代碼
RANK() OVER(PARTITION BY month ORDER BY column_name DESC)
這樣,就會按照month 來分,即所需要排列的資訊先以month 的值來分組,在分組中排序,各個分組間不幹涉
CUBE,ROLLUP,GROUPING SETS() 詳見:HIVE增強的聚合,也可以結合RANK()使用實作具體邏輯。
NTILE
按層次查詢,如一年中,統計出工資前1/5之的人員的名單,使用NTILE分析函數,把所有工資分為5份,為1的哪一份就是我們想要的結果:
Sql代碼 收藏代碼
select empno,ename,sum(sal),ntile(5) over (order by sum(sal) desc nulls last) til from emp group by empno,ename;
ROW_NUMBER
ROW_NUMBER()從1開始,為每條記錄傳回一個數字
Sql代碼 收藏代碼
SELECT
ROW_NUMBER() OVER (ORDER BY column_name DESC)AS row_name
FROM table_name;
二、視窗函數
可以計算一定範圍内、一定值域内、或者一段時間内的累積和以及移動平均值等。
可以結合聚集函數SUM() 、AVG() 等使用。
可以結合FIRST_VALUE() 和LAST_VALUE(),傳回視窗的第一個和最後一個值
(1)計算累計和
eg:統計1-12月的累積銷量,即1月為1月份的值,2月為1.2月份值的和,3月為123月份的和,12月為1-12月份值的和
Java代碼 收藏代碼
SELECT
month,SUM(amount) month_amount,
SUM( SUM(amount)) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_amount
FROM table_name
GROUP BY month
ORDER BY month;
其中:
SUM( SUM(amount)) 内部的SUM(amount)為需要累加的值,在上述可以換為 month_amount
ORDER BY month 按月份對查詢讀取的記錄進行排序,就是視窗範圍内的排序
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 定義起點和終點,UNBOUNDED PRECEDING 為起點,表明從第一行開始, CURRENT ROW為預設值,就是這一句等價于:
ROWS UNBOUNDED PRECEDING
PRECEDING:在前 N 行的意思。
FOLLOWING:在後 N 行的意思。
計算前3個月之間的和
Sql代碼 收藏代碼
SUM( SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS cumulative_amount
也可以
Java代碼 收藏代碼
SUM( SUM(amount)) OVER (ORDER BY month 3 PRECENDING) AS cumulative_amount
前後一個月之間的和
Sql代碼 收藏代碼
SUM( SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS cumulative_amount
窗體第一條和最後一條的值
Java代碼 收藏代碼
FIRST_VALUE(SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS xxxx;
LAST_VALUE(SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS xxxx;
三、LAG、LEAD
獲得相對于目前記錄指定距離的那條記錄的資料
LAG()為向前、LEAD()為向後
Sql代碼 收藏代碼
LAG(column_name1,1) OVER(ORDER BY column_name2)
LEAG(column_name1,1) OVER(ORDER BY column_name2)
這樣就獲得前一條、後一條的資料
四、FIRST、LAST
獲得一個排序分組中的第一個值群組後一個值。可以與分組函數結合
Java代碼 收藏代碼
SELECT
MIN(month) KEEP(DENSE_RANK FIRST ORDER BY SUM(amount)) AS highest_sales_month,
MIN(month) KEEP(DENSE_RANK LAST ORDER BY SUM(amount)) AS lows_sales_month
FROM table_name
GROUP BY month
ORDER BY month;
這樣就可以求得一年中銷量最高和最低的月份。
輸出的是月份,但是用SUM(amount)來判斷。
示例
PARTITION BY with one partitioning column, no ORDER BY or window specification
SELECT a, COUNT(b) OVER (PARTITION BY c) FROM T;
根據c分派任務, 選擇a和每個c下b的個數
PARTITION BY with two partitioning columns, no ORDER BY or window specification
SELECT a, COUNT(b) OVER (PARTITION BY c, d) FROM T;
根據c, d分派任務, 選擇a和每個c, d下b的個數
PARTITION BY with one partitioning column, one ORDER BY column, and no window specification
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d) FROM T;
根據c分派任務, 每個任務組依據d排序, 選擇a和每個c下字段b的和
PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification
SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f) FROM T;
根據c,d分派任務, 每個任務組依據e,f排序, 選擇a和每個c, d下字段的b的和
PARTITION BY with partitioning, ORDER BY, and window specification
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM T;
根據c分派任務, 每個任務組依據d排序, 選擇a和每個c下, 從最開始到目前列的字段b的和
SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) FROM T;
根據c分派任務, 每個任務組依據d排序, 選擇a和每個c下, 從目前列之前3個到目前列的字段b的平均數
SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) FROM T;
根據c分派任務, 每個任務組依據d排序, 選擇a和每個c下, 目前列前後各3列的字段b的平均數
SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM T;
根據c分派任務, 每個任務組依據d排序, 選擇a和每個c下, 目前列到最後一列的字段b的平均數
WINDOW clause
SELECT a, SUM(b) OVER w FROM T; WINDOW w AS (PARTITION BY c ORDER BY d ROWS UNBOUNDED PRECEDING)
根據c分派任務, 每個任務組依據d排序, 選擇a和每個c下, 第一列到目前列字段b的和
LEAD using default 1 row lead and not specifying default value
SELECT a, LEAD(a) OVER (PARTITION BY b ORDER BY C ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) FROM T;
根據b分派任務, 每個任務組依據c排序, 選擇a和下一條記錄的a
LAG specifying a lag of 3 rows and default value of 0
SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C ROWS 3 PRECEDING) FROM T;
根據b分派任務, 每個任務組依據c排序, 選擇a和前面第三條記錄的a, 如果超出視窗, 傳回0
本文轉自:http://yugouai.iteye.com/blog/1908121