基于Hive+sparkSQL的人力資源系統執行個體

1.功能介紹:

2.資料源介紹:

2.1employee職工資訊：

職工姓名，職工id，職工性别，職工年齡，入職年份，職位，部門id

Michael,1,male,37,2001,developer,2
Andy,2,female,33,2003,manager,1
Justin,3,female,23,2013,specialist,3
John,4,male,22,2014,developer,2
Herry,5,male,23,2010,developer,1
Brewster,6,male,27,2001,manager,2
Brice,7,female,33,2003,manager,3
Justin,8,male,23,2013,specialist,3
John,9,male,22,2014,developer,1
Herry,10,female,27,2010,specialist,3

2.2dept部門資訊:

部門名稱，部門編号

management,1
development,2
HumanResouce,3

2.3checkwork職工考勤資訊

職工id，年，月，加班小時數，遲到小時數，曠工小時數，早退小時數

1,2015,12,0,2,4,0
2,2015,8,5,0,5,3
3,2015,3,16,4,1,5
4,2015,3,0,0,0,0
5,2015,3,0,3,0,0
6,2015,3,32,0,0,0
7,2015,3,0,16,32,3
8,2015,19,0,0,0,3
9,2015,6,0,2,4,0
10,2015,10,30,0,2,2
1,2014,12,56,40,0,22
2,2014,12,4,2,4,11
3,2014,12,5,8,8,20
4,2014,12,7,0,12,8
5,2014,12,1,3,7,9
6,2014,12,0,2,11,10
7,2014,12,2,1,11,14
8,2014,12,0,0,6,18
9,2014,12,6,1,8,17
10,2014,12,0,2,2,2

2.4salary工資資訊

職工id，工資

3.建立資料庫

//建立與使用HRS資料庫
 sqlContext.sql("create database HRS")

 //使用資料庫
 sqlContext.sql("use HRS")

4.建立資料表

特别說明：

row format delimited fields terminated by ‘,’

lines terminated by ‘\n’

每一行字段以,分隔每一行以\n分隔

以後不再說明

4.1建立員工資料表

sqlContext.sql("create table if not exists employee" 
           +"(name String,id Int,gender String, age Int,year                  int,position String, deptId int)"
  +"row format delimited fields terminated by ','lines terminated by '\n'")

4.2建立部門資料表

sqlContext.sql("create table if not exists deparment
(name String,deptID INT)"
+"row format delimited fields terminated by ','lines terminated by '\n'")

4.3建立考勤表

sqlContext.sql("create table if not exists attendance(id int,year int,"
 +"month int,overtime int,latetime int,absenteeism int,leaveearlytime int)"
  +"row format delimited fields terminated by ','lines terminated by '\n'")

4.4建立工資表

sqlContext.sql("create table if not exists salary(id int,salary int)"
  +"row format delimited fields terminated by ','lines terminated by '\n'")

5.加載資料

說明:

從本地的路徑下加載資料，不從hdfs上加載資料 overwrite表示覆寫目前表的資料

5.1本地加載員工資訊

sqlContext.sql("load data local inpath '/home/data/hrms/employee.txt'" 
                    +"overwrite into table employee")

5.2本地加載部門資訊

sqlContext.sql("load data local inpath '/home/data/hrms/dept.txt'" 
                    +"overwrite into table deparment")

5.3本地加載考勤資訊

sqlContext.sql("load data local inpath '/home/data/hrms/checkWork.txt'" 
                    +"overwrite into table attendance")

5.4本地加載工資資訊

sqlContext.sql("load data local inpath '/home/data/hrms/salary.txt'" 
                    +"overwrite into table salary")

6查詢

6.1檢視各個表的schema資訊

//員工
     sqlContext.sql("select *from employee")  

     //部門
     sqlContext.sql("select *from deparment")   

     //考勤
     sqlContext.sql("select *from attendance")   

     //工資
     sqlContext.sql("select *from salary")

部門職工數的查詢,将employee表與department join操作，根據部門名稱分組，統計部門員工數量

結果:

基于Hive+sparkSQL的人力資源系統執行個體

對各個部門的薪資的總數排序

結果:

基于Hive+sparkSQL的人力資源系統執行個體

對各個部門的薪資的平均值排序

結果:

基于Hive+sparkSQL的人力資源系統執行個體

查詢各部門員工的考勤資訊

sqlContext.sql("select d.name,sum(info.attinfo) 
  as deptattinfo,info.year from"+ 
+"overtime-latetime-absenteeism-leaveearlytime as attinfo" 
   +" from employee e join   attendance a on e.id=a.id )"
 +" info join deparment d on info.deptid=d.deptid group by d.name,info.year").show

結果：

基于Hive+sparkSQL的人力資源系統執行個體

合并前面所有的查詢

sqlContext.sql("select edept.name, edept.empNum,edst.totalSalary,edsa.avgSalary,eda.deptattinfo,eda.year from "
                    +"(select  d.name ,count(*) as empNum from employee e join deparment d on e.deptid=d.deptid group by d.name) edept"
                    +" join "
                    +"(select d.name ,sum(s.salary) as totalSalary from employee e join deparment d on e.deptid=d.deptid join salary s on e.id=s.id group by d.name order by totalSalary) edst"
                    +" on (edept.name=edst.name)"
                    +" join "
                    +"(select d.name ,avg(s.salary) as avgSalary from employee e join deparment d on e.deptid=d.deptid join salary s on e.id=s.id group by d.name order by avgSalary) edsa"
                    +" on (edst.name=edsa.name) "
                    +" join "
                    +"(select d.name,sum(info.attinfo) as deptattinfo,info.year from" 
                    +"(select  e.id,e.deptid,a.year,a.month,overtime-latetime-absenteeism-leaveearlytime as attinfo from employee e join attendance a on e.id=a.id )"
                    +" info join deparment d on info.deptid=d.deptid group by d.name,info.year) eda"
                    +" on (edsa.name=eda.name)"
                    +" order by edept.name").show

結果：

基于Hive+sparkSQL的人力資源系統執行個體

就是使用sparkSQL進行簡單的建立，加載資料，查詢資料的基本操作。

程式運作在spark叢集，hive上，是以必須搭建spark與hive的環境。具體環境的搭建，參考網上教程，這裡就不在示範環境的搭建過程了。

基于Hive+sparkSQL的人力資源系統執行個體

繼續閱讀

一篇文章讓你精通Java JSP規範

世界因大資料而改變

hive sql通過具體位址解析出行政區劃(省＞市＞區＞縣＞鄉＞鎮＞村)

Spark的RDD轉換算子-雙value型Spark的RDD轉換算子-雙value型

SparkSQL項目練習1 準備資料2 需求：各區域熱門商品Top3

Hive最全常見錯誤及解決方案hive --service metastore &

延雲行業搜尋資料庫在大資料生态中位置和重要性大資料的挑戰大資料技術的現狀延雲行業搜尋資料庫

Spark在windows環境裡跑時報錯找不到org.apache.hadoop.fs.FSDataInputStream

《Hive權威指南》第八章：HiveQL索引8 HiveQL：索引

Spark流式分析系統實作流式實時日志分析系統

Scala和Java二種方式實戰Spark Streaming開發

Spark基礎:Spark簡介及特點,運作模式,安裝Spark,Driver與Executor,Local模式,Standalone模式,Yarn模式,Mesos模式,WordCount案例,HA配置第1章 Spark概述第2章 Spark運作模式第3章案例實操

HiveQl語句應用執行個體：WordCount具體步驟如下：

Spark實作wordcount

大資料排錯SparkSpark叢集啟動時候，JAVA_HOME is not sethadoop叢集，某台伺服器jps無任何輸出IDEAkafkahadoopspark sqlfile permissionsIDEA本地測試 - OutOfMemoryError: GC overhead limit exceededhdfs負載均衡

spark/scala關于【資源檔案】加載方法概述外部檔案加載方案測試資源檔案打包入jar包中小結