
MySQL Table Joins in Detail (with the Hive Data Warehouse in Hadoop)

While working through the Heima (黑马) Hadoop log-analysis project, I had to join several tables. This post uses Hive to explain MySQL-style table joins in detail:

1. Count the daily PV (page views)

hive> create table hmbbs_pv
    > as select count(1) as pv from hmbbs_table;

View the result:

hive> describe hmbbs_pv;
OK
pv      bigint
Time taken: … seconds


hive> select pv from hmbbs_pv;
Total MapReduce jobs = …
Starting Job = job_1469064014798_0058, Tracking URL = http://hadoop22:…/proxy/application_1469064014798_0058/
Kill Command = /usr/local/hadoop/bin/hadoop job  -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0058
(map/reduce progress and CPU-time lines elided)
Ended Job = job_1469064014798_0058
OK
…
Time taken: … seconds
           
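The CTAS statement above simply counts every row in the log table. As a quick sanity check outside Hive, the same logic can be sketched with Python's built-in sqlite3; `hmbbs_table` and its column names come from the post, while the three sample rows are invented for illustration.

```python
import sqlite3

# Sketch of the CTAS above using SQLite instead of Hive.
# hmbbs_table and its columns come from the post; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT);
INSERT INTO hmbbs_table VALUES
  ('1.1.1.1', '/forum.php'),
  ('1.1.1.1', '/member.php?mod=register'),
  ('2.2.2.2', '/index.php');
-- CREATE TABLE ... AS SELECT works the same way as Hive's CTAS here.
CREATE TABLE hmbbs_pv AS SELECT count(1) AS pv FROM hmbbs_table;
""")
pv = conn.execute("SELECT pv FROM hmbbs_pv").fetchone()[0]
print(pv)  # → 3
```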

2. Count the daily registrations (register: new user sign-ups)

hive> create table hmbbs_register
    > as select count(1) as register
    > from hmbbs_table
    > where instr(urllog, 'member.php?mod=register') > 0;

View the result:

hive> describe hmbbs_register;
OK
register        bigint
Time taken: … seconds


hive> select register from hmbbs_register;
Total MapReduce jobs = …
Starting Job = job_1469064014798_0061, Tracking URL = http://hadoop22:…/proxy/application_1469064014798_0061/
Kill Command = /usr/local/hadoop/bin/hadoop job  -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0061
(map/reduce progress and CPU-time lines elided)
Ended Job = job_1469064014798_0061
OK
…
Time taken: … seconds
           
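The register count keeps only requests whose URL contains the registration path. `instr()` returns the 1-based position of a substring, or 0 when it is absent, which is why the filter is `> 0`. The same filter can be sketched with sqlite3 (which also has `instr()`); the sample rows are invented.

```python
import sqlite3

# Filter requests whose URL contains the registration path, then count them.
# instr() returns the 1-based match position, or 0 if the substring is absent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT);
INSERT INTO hmbbs_table VALUES
  ('1.1.1.1', '/member.php?mod=register'),
  ('2.2.2.2', '/forum.php'),
  ('3.3.3.3', '/member.php?mod=register&step=2');
""")
register = conn.execute(
    "SELECT count(1) FROM hmbbs_table "
    "WHERE instr(urllog, 'member.php?mod=register') > 0"
).fetchone()[0]
print(register)  # → 2
```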

3. Count the daily unique IPs

hive> create table hmbbs_ip as
    > select count(distinct iplog) as ip
    > from hmbbs_table;

View the result:

hive> describe hmbbs_ip;
OK
ip      bigint
Time taken: … seconds


hive> select ip from hmbbs_ip;
Total MapReduce jobs = …
Starting Job = job_1469064014798_0063, Tracking URL = http://hadoop22:…/proxy/application_1469064014798_0063/
Kill Command = /usr/local/hadoop/bin/hadoop job  -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0063
(map/reduce progress and CPU-time lines elided)
Ended Job = job_1469064014798_0063
OK
…
Time taken: … seconds
           
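`count(distinct iplog)` collapses repeated visits from the same address, so two hits from one IP still count as a single unique visitor. A minimal sqlite3 sketch, with invented rows:

```python
import sqlite3

# Two hits from 1.1.1.1 plus one from 2.2.2.2 give 2 unique IPs, not 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT);
INSERT INTO hmbbs_table VALUES
  ('1.1.1.1', '/a'),
  ('1.1.1.1', '/b'),
  ('2.2.2.2', '/c');
""")
ip = conn.execute(
    "SELECT count(DISTINCT iplog) AS ip FROM hmbbs_table").fetchone()[0]
print(ip)  # → 2
```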

4. Count the daily bounces (single-page visits, used to compute the bounce rate)

hive> CREATE TABLE hmbbs_jumper AS
    > SELECT COUNT(1) AS jumper
    > FROM (SELECT COUNT(iplog) AS times
    >       FROM hmbbs_table
    >       GROUP BY iplog
    >       HAVING times = 1) e;

View the result:

hive> describe hmbbs_jumper;
OK
jumper  bigint
Time taken: … seconds


hive> select jumper from hmbbs_jumper;
Total MapReduce jobs = …
Starting Job = job_1469064014798_0066, Tracking URL = http://hadoop22:…/proxy/application_1469064014798_0066/
Kill Command = /usr/local/hadoop/bin/hadoop job  -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0066
(map/reduce progress and CPU-time lines elided)
Ended Job = job_1469064014798_0066
OK
…
Time taken: … seconds
           
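A "bounce" here is an IP that produced exactly one request: group the log by IP, keep the groups whose row count is 1, then count those groups. The nested query can be sketched with sqlite3; the rows are invented.

```python
import sqlite3

# Group by IP, keep single-hit groups (HAVING times = 1), count the groups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT);
INSERT INTO hmbbs_table VALUES
  ('1.1.1.1', '/a'),
  ('1.1.1.1', '/b'),  -- two hits: not a bounce
  ('2.2.2.2', '/c'),  -- one hit: a bounce
  ('3.3.3.3', '/d');  -- one hit: a bounce
""")
jumper = conn.execute("""
    SELECT count(1) FROM (
        SELECT count(iplog) AS times
        FROM hmbbs_table
        GROUP BY iplog
        HAVING times = 1) e
""").fetchone()[0]
print(jumper)  # → 2
```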

At this point, all four tables above hold their results:

hive> show tables;
OK
hmbbs_ip
hmbbs_jumper
hmbbs_pv
hmbbs_register
hmbbs_table
Time taken: … seconds
hive> select * from hmbbs_ip;
OK
…
Time taken: … seconds
hive> select * from hmbbs_jumper;
OK
…
Time taken: … seconds
hive> select * from hmbbs_pv;
OK
…
Time taken: … seconds
hive> select * from hmbbs_register;
OK
…
Time taken: … seconds

Next, join the tables, building the query up step by step:

Join, step 1 (sketch the chain of joins):

         select from hmbbs_pv
         join hmbbs_register on
         join hmbbs_ip       on
         join hmbbs_jumper   on

Join, step 2 (each table holds exactly one row, so join on the always-true condition 1=1):

         select from hmbbs_pv
         join hmbbs_register on 1=1
         join hmbbs_ip       on 1=1
         join hmbbs_jumper   on 1=1

Join, step 3 (give each table an alias: hmbbs_pv a, hmbbs_register b, hmbbs_ip c, hmbbs_jumper d):

         select from hmbbs_pv  a
         join hmbbs_register   b  on 1=1
         join hmbbs_ip         c  on 1=1
         join hmbbs_jumper     d  on 1=1

Join, step 4 (pick the wanted column from each table):

         select a.pv, b.register, c.ip, d.jumper
         from hmbbs_pv  a
         join hmbbs_register   b  on 1=1
         join hmbbs_ip         c  on 1=1
         join hmbbs_jumper     d  on 1=1

Join, step 5 (add a date literal as one more column, for 5 columns in total):

         select '2013_05_30', a.pv, b.register, c.ip, d.jumper
         from hmbbs_pv  a
         join hmbbs_register   b  on 1=1
         join hmbbs_ip         c  on 1=1
         join hmbbs_jumper     d  on 1=1
           
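Since each summary table holds exactly one row, the inner joins with the always-true condition 1=1 simply glue the four rows side by side into a single daily summary row. The final query can be sketched with sqlite3; the numeric values below are invented, and only the table/column names and the '2013_05_30' date literal come from the post.

```python
import sqlite3

# Four single-row summary tables, cross-joined via ON 1=1 into one wide row.
# The numbers are invented sample values, not results from the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hmbbs_pv       (pv INTEGER);       INSERT INTO hmbbs_pv       VALUES (100);
CREATE TABLE hmbbs_register (register INTEGER); INSERT INTO hmbbs_register VALUES (8);
CREATE TABLE hmbbs_ip       (ip INTEGER);       INSERT INTO hmbbs_ip       VALUES (60);
CREATE TABLE hmbbs_jumper   (jumper INTEGER);   INSERT INTO hmbbs_jumper   VALUES (40);
""")
row = conn.execute("""
    SELECT '2013_05_30', a.pv, b.register, c.ip, d.jumper
    FROM hmbbs_pv a
    JOIN hmbbs_register b ON 1 = 1
    JOIN hmbbs_ip       c ON 1 = 1
    JOIN hmbbs_jumper   d ON 1 = 1
""").fetchone()
print(row)  # → ('2013_05_30', 100, 8, 60, 40)
```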

If you spot any mistakes, corrections are welcome!