1.功能介紹:
2.資料源介紹:
2.1employee職工資訊:
職工姓名,職工id,職工性别,職工年齡,入職年份,職位,部門id
Michael,1,male,37,2001,developer,2
Andy,2,female,33,2003,manager,1
Justin,3,female,23,2013,specialist,3
John,4,male,22,2014,developer,2
Herry,5,male,23,2010,developer,1
Brewster,6,male,27,2001,manager,2
Brice,7,female,33,2003,manager,3
Justin,8,male,23,2013,specialist,3
John,9,male,22,2014,developer,1
Herry,10,female,27,2010,specialist,3
2.2dept部門資訊:
部門名稱,部門編号
management,1
development,2
HumanResouce,3
2.3checkwork職工考勤資訊
職工id,年,月,加班小時數,遲到小時數,曠工小時數,早退小時數
1,2015,12,0,2,4,0
2,2015,8,5,0,5,3
3,2015,3,16,4,1,5
4,2015,3,0,0,0,0
5,2015,3,0,3,0,0
6,2015,3,32,0,0,0
7,2015,3,0,16,32,3
8,2015,19,0,0,0,3
9,2015,6,0,2,4,0
10,2015,10,30,0,2,2
1,2014,12,56,40,0,22
2,2014,12,4,2,4,11
3,2014,12,5,8,8,20
4,2014,12,7,0,12,8
5,2014,12,1,3,7,9
6,2014,12,0,2,11,10
7,2014,12,2,1,11,14
8,2014,12,0,0,6,18
9,2014,12,6,1,8,17
10,2014,12,0,2,2,2
2.4salary工資資訊
職工id,工資
1,5000
2,10000
3,6000
4,8000
5,5000
6,11000
7,15000
8,5500
9,6500
10,4500
3.建立資料庫
//建立與使用HRS資料庫
sqlContext.sql("create database HRS")
//使用資料庫
sqlContext.sql("use HRS")
4.建立資料表
特别說明:
row format delimited fields terminated by ‘,’
lines terminated by ‘\n’
每一行字段以,分隔 每一行以\n分隔
以後不再說明
4.1建立員工資料表
sqlContext.sql("create table if not exists employee"
+"(name String,id Int,gender String, age Int,year int,position String, deptId int)"
+"row format delimited fields terminated by ','lines terminated by '\n'")
4.2建立 部門資料表
sqlContext.sql("create table if not exists deparment
(name String,deptID INT)"
+"row format delimited fields terminated by ','lines terminated by '\n'")
4.3建立考勤表
sqlContext.sql("create table if not exists attendance(id int,year int,"
+"month int,overtime int,latetime int,absenteeism int,leaveearlytime int)"
+"row format delimited fields terminated by ','lines terminated by '\n'")
4.4建立工資表
sqlContext.sql("create table if not exists salary(id int,salary int)"
+"row format delimited fields terminated by ','lines terminated by '\n'")
5.加載資料
說明:
從本地的路徑下加載資料,不從hdfs上加載資料 overwrite表示覆寫目前表的資料
5.1本地加載員工資訊
sqlContext.sql("load data local inpath '/home/data/hrms/employee.txt'"
+"overwrite into table employee")
5.2本地加載部門資訊
sqlContext.sql("load data local inpath '/home/data/hrms/dept.txt'"
+"overwrite into table deparment")
5.3本地加載考勤資訊
sqlContext.sql("load data local inpath '/home/data/hrms/checkWork.txt'"
+"overwrite into table attendance")
5.4本地加載工資資訊
sqlContext.sql("load data local inpath '/home/data/hrms/salary.txt'"
+"overwrite into table salary")
6查詢
6.1檢視各個表的schema資訊
//員工
sqlContext.sql("select *from employee")
//部門
sqlContext.sql("select *from deparment")
//考勤
sqlContext.sql("select *from attendance")
//工資
sqlContext.sql("select *from salary")
部門職工數的查詢,将employee表與department join操作,根據部門名稱分組,統計部門員工數量
結果:
對各個部門的薪資的 總數排序
結果:
對各個部門的薪資的 平均值排序
結果:
查詢各部門員工的考勤資訊
sqlContext.sql("select d.name,sum(info.attinfo)
as deptattinfo,info.year from"+
+"overtime-latetime-absenteeism-leaveearlytime as attinfo"
+" from employee e join attendance a on e.id=a.id )"
+" info join deparment d on info.deptid=d.deptid group by d.name,info.year").show
結果:
合并前面所有的查詢
sqlContext.sql("select edept.name, edept.empNum,edst.totalSalary,edsa.avgSalary,eda.deptattinfo,eda.year from "
+"(select d.name ,count(*) as empNum from employee e join deparment d on e.deptid=d.deptid group by d.name) edept"
+" join "
+"(select d.name ,sum(s.salary) as totalSalary from employee e join deparment d on e.deptid=d.deptid join salary s on e.id=s.id group by d.name order by totalSalary) edst"
+" on (edept.name=edst.name)"
+" join "
+"(select d.name ,avg(s.salary) as avgSalary from employee e join deparment d on e.deptid=d.deptid join salary s on e.id=s.id group by d.name order by avgSalary) edsa"
+" on (edst.name=edsa.name) "
+" join "
+"(select d.name,sum(info.attinfo) as deptattinfo,info.year from"
+"(select e.id,e.deptid,a.year,a.month,overtime-latetime-absenteeism-leaveearlytime as attinfo from employee e join attendance a on e.id=a.id )"
+" info join deparment d on info.deptid=d.deptid group by d.name,info.year) eda"
+" on (edsa.name=eda.name)"
+" order by edept.name").show
結果:
就是使用sparkSQL進行簡單的建立,加載資料,查詢資料的基本操作。
程式運作在spark叢集,hive上,是以必須搭建spark與hive的環境。具體環境的搭建,參考網上教程,這裡就不在示範環境的搭建過程了。