天天看點

hive:函數:排名函數:Rank(筆記)

Rank

1.函數說明

RANK() 排序(排名)相同時會重複,總數不會變

DENSE_RANK() 排序(排名)相同時會重複,總數會減少

ROW_NUMBER() 依次進行排名
           

2.資料準備

表6-7 資料準備

name subject score
孫悟空 國文 87
孫悟空 數學 95
孫悟空 英語 68
大海 國文 94
大海 數學 56
大海 英語 84
宋宋 國文 64
宋宋 數學 86
宋宋 英語 84
婷婷 國文 65
婷婷 數學 85
婷婷 英語 78

3.需求

計算每門學科成績排名

4.建立本地score.txt,導入資料

 vi score.txt

5.建立hive表并導入資料

create table score(
name string,
subject string,
score int)
row format delimited fields terminated by "\t";

load data local inpath '/opt/module/datas/score.txt' into table score;
           

6.按需求查詢資料

select name,
	subject,
	score,
	rank() over(partition by subject order by score desc) rank,
	dense_rank() over(partition by subject order by score desc) dense_rank,
	row_number() over(partition by subject order by score desc) row_number
from tmp.score;
           
hive:函數:排名函數:Rank(筆記)

擴充:求出每門學科前三名的學生?

select * from
(
	select
	name,
	subject,
	score,
	rank() over(partition by subject order by score desc ) rank,
	dense_rank() over(partition by subject order by score desc) dense_rank,
	row_number() over(partition by subject order by score desc) row_number
	from tmp.score
) s where row_number<=3
           
hive:函數:排名函數:Rank(筆記)

注意:partition後邊有哪個字段隻會影響最後的rank的排名,跟去重沒關系,select後邊字段可以指定多個,不一定跟partition後邊的字段一緻。

舉例:

=================sql1:

select 
					create_time,creator_id,fullname,expectCity,positionName,keywords,row_number() over(partition by creator_id,fullname order by create_time desc) as rank
				from ( 
					select from_unixtime(ceil(c.create_time/1000),'yyyy-MM-dd') create_time,c.creator_id,e.fullname,
					c.expectCity,c.positionName,c.keywords,c.degree,c.minage,c.maxage,c.isNationalUnified,c.isFamous,c.companyName,c.expectIndustry,c.workCity,c.gender,c.minCurrentSalary,c.maxCurrentSalary,c.minWorkYear,c.maxWorkYear
					--,concat(expectCity,':',positionName,'(',keywords,' ',degree,' ',minage,'-',maxage,')') conditions 
					from tmp.t_conditions c
					--select c.creator_id,explode(split(regexp_replace(regexp_replace(expectCity,'\\[',''),'\\]',''),',')) city from tmp.t_conditions c
					left join ods.ods_aimsen_walre_anlle_base_employees e
					on c.creator_id=e.number
					where c.creator_id not in ('130000062','130000063','130000071','430000003')
					and c.tag in(2)
					and c.phone = ''
					and c.resumeId = ''
					and c.name = ''
					and from_unixtime(ceil(c.create_time/1000),'yyyy-MM-dd') > '2020-07-20'
					--and expectCity !='' and degree !=''  and (maxAge!='' or minAge!='') and (positionName !='' or keyWords!='')
					and creator_id=100400287
					group by create_time,c.creator_id,e.fullname,c.expectCity,c.positionName,c.keywords,c.degree,c.minage,c.maxage,c.isNationalUnified,c.isFamous,c.companyName,c.expectIndustry,c.workCity,c.gender,c.minCurrentSalary,c.maxCurrentSalary,c.minWorkYear,c.maxWorkYear--,concat(expectCity,':',positionName,'(',keywords,' ',degree,' ',minage,'-',maxage,')')
				) x


=================sql2:

select 
					create_time,creator_id,fullname,expectCity,positionName,keywords,row_number() over(partition by creator_id,fullname,expectCity order by create_time desc) as rank
				from ( 
					select from_unixtime(ceil(c.create_time/1000),'yyyy-MM-dd') create_time,c.creator_id,e.fullname,
					c.expectCity,c.positionName,c.keywords,c.degree,c.minage,c.maxage,c.isNationalUnified,c.isFamous,c.companyName,c.expectIndustry,c.workCity,c.gender,c.minCurrentSalary,c.maxCurrentSalary,c.minWorkYear,c.maxWorkYear
					--,concat(expectCity,':',positionName,'(',keywords,' ',degree,' ',minage,'-',maxage,')') conditions 
					from tmp.t_conditions c
					--select c.creator_id,explode(split(regexp_replace(regexp_replace(expectCity,'\\[',''),'\\]',''),',')) city from tmp.t_conditions c
					left join ods.ods_aimsen_walre_anlle_base_employees e
					on c.creator_id=e.number
					where c.creator_id not in ('130000062','130000063','130000071','430000003')
					and c.tag in(2)
					and c.phone = ''
					and c.resumeId = ''
					and c.name = ''
					and from_unixtime(ceil(c.create_time/1000),'yyyy-MM-dd') > '2020-07-20'
					--and expectCity !='' and degree !=''  and (maxAge!='' or minAge!='') and (positionName !='' or keyWords!='')
					and creator_id=100400287
					group by create_time,c.creator_id,e.fullname,c.expectCity,c.positionName,c.keywords,c.degree,c.minage,c.maxage,c.isNationalUnified,c.isFamous,c.companyName,c.expectIndustry,c.workCity,c.gender,c.minCurrentSalary,c.maxCurrentSalary,c.minWorkYear,c.maxWorkYear--,concat(expectCity,':',positionName,'(',keywords,' ',degree,' ',minage,'-',maxage,')')
				) x
           

sql1和sql2差別為:

sql2多個expectCity字段

create_time,creator_id,fullname,expectCity,positionName,keywords,row_number() over(partition by creator_id,fullname,expectCity order by create_time desc) as rank

hive:函數:排名函數:Rank(筆記)

sql1查詢效果:

hive:函數:排名函數:Rank(筆記)

sql2查詢效果:

hive:函數:排名函數:Rank(筆記)

​rank排名發生了變化,但最終查詢出的結果條數并沒有改變。

繼續閱讀