天天看點

基于Hive+sparkSQL的人力資源系統執行個體

1.功能介紹:

基于Hive+sparkSQL的人力資源系統執行個體

2.資料源介紹:

2.1employee職工資訊:

職工姓名,職工id,職工性别,職工年齡,入職年份,職位,部門id

Michael,1,male,37,2001,developer,2
Andy,2,female,33,2003,manager,1
Justin,3,female,23,2013,specialist,3
John,4,male,22,2014,developer,2
Herry,5,male,23,2010,developer,1
Brewster,6,male,27,2001,manager,2
Brice,7,female,33,2003,manager,3
Justin,8,male,23,2013,specialist,3
John,9,male,22,2014,developer,1
Herry,10,female,27,2010,specialist,3
           

2.2dept部門資訊:

部門名稱,部門編号

management,1
development,2
HumanResouce,3
           

2.3checkwork職工考勤資訊

職工id,年,月,加班小時數,遲到小時數,曠工小時數,早退小時數

1,2015,12,0,2,4,0
2,2015,8,5,0,5,3
3,2015,3,16,4,1,5
4,2015,3,0,0,0,0
5,2015,3,0,3,0,0
6,2015,3,32,0,0,0
7,2015,3,0,16,32,3
8,2015,19,0,0,0,3
9,2015,6,0,2,4,0
10,2015,10,30,0,2,2
1,2014,12,56,40,0,22
2,2014,12,4,2,4,11
3,2014,12,5,8,8,20
4,2014,12,7,0,12,8
5,2014,12,1,3,7,9
6,2014,12,0,2,11,10
7,2014,12,2,1,11,14
8,2014,12,0,0,6,18
9,2014,12,6,1,8,17
10,2014,12,0,2,2,2
           

2.4salary工資資訊

職工id,工資

1,5000
2,10000
3,6000
4,8000
5,5000
6,11000
7,15000
8,5500
9,6500
10,4500
           

3.建立資料庫

//建立與使用HRS資料庫
 sqlContext.sql("create database HRS")

 //使用資料庫
 sqlContext.sql("use HRS")
           

4.建立資料表

特别說明:

row format delimited fields terminated by ‘,’

lines terminated by ‘\n’

每一行字段以,分隔 每一行以\n分隔

以後不再說明

4.1建立員工資料表

sqlContext.sql("create table if not exists employee" 
           +"(name String,id Int,gender String, age Int,year                  int,position String, deptId int)"
  +"row format delimited fields terminated by ','lines terminated by '\n'")
           

4.2建立 部門資料表

sqlContext.sql("create table if not exists deparment
(name String,deptID INT)"
+"row format delimited fields terminated by ','lines terminated by '\n'")
           

4.3建立考勤表

sqlContext.sql("create table if not exists attendance(id int,year int,"
 +"month int,overtime int,latetime int,absenteeism int,leaveearlytime int)"
  +"row format delimited fields terminated by ','lines terminated by '\n'")
           

4.4建立工資表

sqlContext.sql("create table if not exists salary(id int,salary int)"
  +"row format delimited fields terminated by ','lines terminated by '\n'")
           

5.加載資料

說明:

從本地的路徑下加載資料,不從hdfs上加載資料 overwrite表示覆寫目前表的資料

5.1本地加載員工資訊

sqlContext.sql("load data local inpath '/home/data/hrms/employee.txt'" 
                    +"overwrite into table employee")
           

5.2本地加載部門資訊

sqlContext.sql("load data local inpath '/home/data/hrms/dept.txt'" 
                    +"overwrite into table deparment")
           

5.3本地加載考勤資訊

sqlContext.sql("load data local inpath '/home/data/hrms/checkWork.txt'" 
                    +"overwrite into table attendance")
           

5.4本地加載工資資訊

sqlContext.sql("load data local inpath '/home/data/hrms/salary.txt'" 
                    +"overwrite into table salary")
           

6查詢

6.1檢視各個表的schema資訊

//員工
     sqlContext.sql("select *from employee")  

     //部門
     sqlContext.sql("select *from deparment")   

     //考勤
     sqlContext.sql("select *from attendance")   

     //工資
     sqlContext.sql("select *from salary")  
           

部門職工數的查詢,将employee表與department join操作,根據部門名稱分組,統計部門員工數量

結果:

基于Hive+sparkSQL的人力資源系統執行個體

對各個部門的薪資的 總數排序

結果:

基于Hive+sparkSQL的人力資源系統執行個體

對各個部門的薪資的 平均值排序

結果:

基于Hive+sparkSQL的人力資源系統執行個體

查詢各部門員工的考勤資訊

sqlContext.sql("select d.name,sum(info.attinfo) 
  as deptattinfo,info.year from"+ 
+"overtime-latetime-absenteeism-leaveearlytime as attinfo" 
   +" from employee e join   attendance a on e.id=a.id )"
 +" info join deparment d on info.deptid=d.deptid group by d.name,info.year").show
           

結果:

基于Hive+sparkSQL的人力資源系統執行個體

合并前面所有的查詢

sqlContext.sql("select edept.name, edept.empNum,edst.totalSalary,edsa.avgSalary,eda.deptattinfo,eda.year from "
                    +"(select  d.name ,count(*) as empNum from employee e join deparment d on e.deptid=d.deptid group by d.name) edept"
                    +" join "
                    +"(select d.name ,sum(s.salary) as totalSalary from employee e join deparment d on e.deptid=d.deptid join salary s on e.id=s.id group by d.name order by totalSalary) edst"
                    +" on (edept.name=edst.name)"
                    +" join "
                    +"(select d.name ,avg(s.salary) as avgSalary from employee e join deparment d on e.deptid=d.deptid join salary s on e.id=s.id group by d.name order by avgSalary) edsa"
                    +" on (edst.name=edsa.name) "
                    +" join "
                    +"(select d.name,sum(info.attinfo) as deptattinfo,info.year from" 
                    +"(select  e.id,e.deptid,a.year,a.month,overtime-latetime-absenteeism-leaveearlytime as attinfo from employee e join attendance a on e.id=a.id )"
                    +" info join deparment d on info.deptid=d.deptid group by d.name,info.year) eda"
                    +" on (edsa.name=eda.name)"
                    +" order by edept.name").show
           

結果:

基于Hive+sparkSQL的人力資源系統執行個體

就是使用sparkSQL進行簡單的建立,加載資料,查詢資料的基本操作。

程式運作在spark叢集,hive上,是以必須搭建spark與hive的環境。具體環境的搭建,參考網上教程,這裡就不在示範環境的搭建過程了。

繼續閱讀