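Introduction to Basic Hive Commands

Using the Hive command line

hive --help: getting help information. In the session below the option is mistyped as --hlep; Hive rejects the unknown option and then prints the same usage summary.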

[[email protected] data]# hive --hlep --service cli

    Unrecognized option: --hlep
    usage: hive
     -d,--define <key=value>          Variable subsitution to apply to hive
                                      commands. e.g. -d A=B or --define A=B
        --database <databasename>     Specify the database to use
     -e <quoted-query-string>         SQL from command line
     -f <filename>                    SQL from files
     -H,--help                        Print help information
        --hiveconf <property=value>   Use value for given property
        --hivevar <key=value>         Variable subsitution to apply to hive
                                      commands. e.g. --hivevar A=B
     -i <filename>                    Initialization SQL file
     -S,--silent                      Silent mode in interactive shell
     -v,--verbose                     Verbose mode (echo executed SQL to the
                                      console)

Using hive -e

    hive -e 'create table hive_cli2(id int)'  # the HiveQL statement must be wrapped in quotes
    hive -e 'desc hive_cli2'
    OK
    id                      int
    hive -e 'show tables'
    hive_cli2

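Using hive -f: put the HiveQL statements in a file, then run hive -f H-SQL_FILENAME

Comments in a Hive script start the line with "--"; the interactive CLI does not support comments of this kind.

    [[email protected] ~]# cat hive_file.txt
    -- comment in a Hive script file, used to test the hive -f option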
    CREATE TABLE userinfo
    (
        id int,
        name string,
        age string
    )
    row format delimited fields terminated by '\t'
    lines terminated by '\n';

    show tables

    hive -f hive_file.txt  # run this way, Hive executes several statements in one go
    Output:
    userinfo
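
The same workflow can be sketched as one shell script. This is only an illustration: it writes a small HiveQL file and hands it to hive -f when a hive binary is on the PATH. The names demo_script.hql and demo_userinfo are placeholders invented for this sketch, not taken from the session above.

```shell
#!/bin/sh
# Sketch: build a HiveQL script and run it non-interactively with hive -f.
# demo_script.hql and demo_userinfo are hypothetical names used for illustration.
workdir=$(mktemp -d)
script="$workdir/demo_script.hql"

# The quoted 'EOF' keeps the shell from touching the file content.
cat > "$script" <<'EOF'
-- comment lines in a script file start with "--"
CREATE TABLE IF NOT EXISTS demo_userinfo (id int, name string)
row format delimited fields terminated by '\t';
show tables;
EOF

if command -v hive >/dev/null 2>&1; then
    # -f runs every statement in the file and then exits
    hive -f "$script"
else
    echo "hive not found on PATH; script written to $script"
fi
```

Keeping the statements in a file rather than in a long hive -e string avoids shell quoting problems once a script grows beyond one statement.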

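hive -i and hive -f are invoked the same way; they differ as follows:
    hive -i filename  runs the file filename on the way into the Hive terminal. The file usually holds configuration, such as setting system properties or adding the JAR files of custom Hive extensions to Hadoop's distributed cache
    hive -f filename  runs the HiveQL statements in the file and exits without entering the Hive CLI terminal

On startup, Hive by default looks for the file $HOME/.hiverc and runs the settings in it, which is equivalent to hive -i ~/.hiverc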
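
For illustration, a minimal init file might look like the sketch below. The file name demo_init.hql is an assumption made for this example; the two properties are ordinary Hive CLI settings (showing the current database in the prompt and printing column headers). The script only starts Hive if a hive binary is found.

```shell
#!/bin/sh
# Sketch: an init file that Hive runs before showing the prompt.
# demo_init.hql is a hypothetical file name used for illustration.
init_file="$(mktemp -d)/demo_init.hql"

cat > "$init_file" <<'EOF'
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;
EOF

if command -v hive >/dev/null 2>&1; then
    # -i runs the file first, then drops into the interactive CLI
    hive -i "$init_file"
else
    echo "hive not found on PATH; init file written to $init_file"
fi
```

Putting the same two set lines into $HOME/.hiverc should apply them on every start, given the default lookup described above.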
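
${HOME}/.hivehistory stores the history of commands run in the Hive CLI

The Hive CLI terminal supports Tab completion of commands. One caveat: if some input lines begin with a Tab character, a common and confusing error results. The user sees a "display all possibilities?" prompt, and the characters that follow in the input stream are treated as the reply to that prompt, so the command fails

hive -S starts the Hive CLI in silent mode: log output is filtered out (informational messages such as OK and the time taken, though error messages are still shown) and only results are printed. After creating or dropping a table, nothing is returned at all

hive (default)> !pwd;  # the Hive CLI terminal can run shell commands; prefix them with "!"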
/root

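The Hive CLI terminal can also run HDFS dfs commands: drop the leading hdfs and run dfs directly. Note that typing hadoop fs -ls / is not supported. Running Hadoop commands this way is actually more efficient than running them in bash, because in bash every hadoop command starts a new JVM instance, whereas Hive runs these commands inside the same JVM process
hive (default)> dfs -ls /;  # equivalent to hdfs dfs -ls / or hadoop fs -ls / in bash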
Found 4 items
-rw-r--r--   3 root supergroup        114 2018-01-25 07:59 /b.txt
drwx-wx-wx   - root supergroup          0 2018-01-26 03:34 /data
drwx-wx-wx   - root supergroup          0 2018-01-24 09:24 /tmp
drwxr-xr-x   - root supergroup          0 2018-01-24 09:21 /user
# get help for the dfs command
dfs -help;

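hive -d a=b  # define key=value for variable substitution in the Hive CLI
    hive> set a;
    a=b
    hive> set hivevar:a;
    hivevar:a=b

hive --define key=value is in fact equivalent to hive --hivevar key=value. Both let the user define custom variables on the command line that Hive scripts can then reference, so one script can serve different situations. Note that this feature is only supported in Hive 0.8 and later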

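Variable and property namespaces in Hive:

| Namespace | Access | Description |
| --- | --- | --- |
| hivevar | read/write | User-defined variables (Hive 0.8 and later) |
| hiveconf | read/write | Hive configuration properties |
| system | read/write | Configuration properties defined by Java |
| env | read-only | Environment variables defined by the shell environment |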

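Hive stores variables internally as Java strings. Users can reference variables in queries: Hive first replaces each variable reference in the query with its value, and only then submits the statement to the query processor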
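
A minimal sketch of this substitution step in practice, assuming a word_count table with a count column exists: one script file is parameterised with hivevar references and the values are supplied on the command line. The names demo_query.hql, tbl and min_count are invented for this example.

```shell
#!/bin/sh
# Sketch: one parameterised HiveQL script, values supplied per run.
# demo_query.hql, tbl and min_count are hypothetical names; word_count is assumed to exist.
query_file="$(mktemp -d)/demo_query.hql"

# The quoted 'EOF' matters: it stops the shell from expanding ${...} itself,
# so the references reach Hive untouched.
cat > "$query_file" <<'EOF'
select * from ${hivevar:tbl} where count >= ${hivevar:min_count};
EOF

if command -v hive >/dev/null 2>&1; then
    # Hive replaces both references with the values below before running the query
    hive --hivevar tbl=word_count --hivevar min_count=5 -f "$query_file"
else
    echo "hive not found on PATH; query written to $query_file"
fi
```

Because the values are substituted as plain text, the same file can be pointed at another table or threshold simply by changing the --hivevar arguments.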

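In the Hive CLI, the set command prints every variable in the hivevar, hiveconf, system and env namespaces. With the -v flag (set -v) it also prints the properties defined by Hadoop, for example HDFS and MapReduce properties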

[[email protected] ~]# hive --define filed=username

    hive> set filed;
    filed=username
    hive> set hivevar:filed;
    hivevar:filed=username
    hive> set filed=user_name;
    hive> set filed;
    filed=user_name
    hive> set hivevar:filed;
    hivevar:filed=username
    hive> create table userinfo(id int,${filed} string);
    OK
    Time taken:  seconds
    hive> create table userinfo1(id int,${hivevar:filed} string);
    OK
    Time taken:  seconds
    hive> desc userinfo;
    OK
    id                      int                                         
    username                string                                      
    Time taken:  seconds, Fetched:  row(s)
    hive> desc userinfo1;
    OK
    id                      int                                         
    username                string                                      
    Time taken:  seconds, Fetched:  row(s)
    hive>

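As the session above shows, a variable defined with --define when launching hive from the shell is equivalent to one defined with set hivevar:variable_name=value inside the Hive CLI terminal. Inside the terminal, however, changing the value of an existing variable requires the hivevar: prefix, otherwise the change does not take effect on that variable; reading the value of an existing variable does not need the prefix. Likewise, defining a new variable inside the terminal requires the hivevar: prefix

Both commands below read the value of the variable foo from the hivevar namespace
set foo; # if the variable was never defined (set hivevar:foo=... has not been run), this reads from the hiveconf namespace by default
set hivevar:foo;

set hivevar:foo=test
This command sets foo in the hivevar namespace to test: if the variable already exists its value is changed, otherwise the variable is created and assigned test
set foo=test1
This command is the same as set hiveconf:foo=test1
set key=value sets a Hive configuration property, not a Hive variable. It belongs to the hiveconf namespace and is equivalent to an entry in hive-site.xml, except that the setting is lost once you leave the Hive terminal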

See the example below for details:

    hive> set var_name;
    var_name=test5
    hive> set hiveconf:var_name;
    hiveconf:var_name=test5

    hive> set hive.exec.scratchdir=/tmp/mydir;
     # equivalent to adding the <property> block below to ${HIVE_HOME}/conf/hive-site.xml
      <configuration>
          <!--  前面配置省略  -->
           <property>
               <name>hive.exec.scratchdir</name>
               <value>/tmp/mydir</value>
               <description>Scratch space for Hive jobs</description>
          </property>
          <!--  後面配置省略  -->
      </configuration>

[[email protected] ~]# hive --hiveconf y=5  


    Logging initialized using configuration in jar:file:/data/tools/hive/lib/hive-common-.jar!/hive-log4j.properties
    hive> set y;
    y=5
    hive> select * from word_count where count=${hiveconf:y};  # the hiveconf: prefix cannot be omitted
    OK
    by      
    day     
    life    
    what    
    when    
    Time taken:  seconds, Fetched:  row(s)

--hiveconf has the same effect as setting the property with set inside the Hive CLI terminal

    hive> set y;
    y is undefined
    hive> set y=5;
    hive> select * from word_count where count=${hiveconf:y};
    OK
    by      
    day     
    life    
    what    
    when    
    Time taken:  seconds, Fetched:  row(s)

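Passing a file name or directory to --hiveconf does not appear to work, which fits the usage text above where --hiveconf takes <property=value>. This still needs investigation; here is an example:

    [[email protected] ~]# cat /tmp/hive-site.xml
    <configuration>

        <property>
            <name>hive.exec.compress.output</name>
            <value>true</value>
        </property>

        <property>
            <name>map.input.file</name>
            <value>/b.txt</value>
        </property>

    </configuration>

Start Hive:

    [[email protected] ~]# hive --hiveconf /tmp/hive-site.xml

    hive> set hiveconf:hive.exec.compress.output; # the default is false and the file above sets it to true, yet the setting did not take effect
    hiveconf:hive.exec.compress.output=false
    hive> set map.input.file; # this property has no default and the file above sets it, yet it did not take effect here either
    map.input.file is undefined

Environment variables can be passed into the Hive CLI through the env namespace:

[[email protected] ~]# export a=10 # export is required here to turn a into an environment variable; without the export keyword, a cannot be referenced from Hive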
[[email protected] ~]# echo $a
10
hive> set env:a;
env:a=10
hive> select * from word_count where count = ${env:a};
OK
a       10
we      10
Time taken: 0.501 seconds, Fetched: 2 row(s)

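Another way is to use hive -e with the variable assigned on the same command line; the shell then places it in the environment of that one command, so the export keyword is not needed

COUNT=5 hive -e 'select * from word_count where count=${env:COUNT}';
# Output:
by      5
day     5
life    5
what    5
when    5
# Check that COUNT was substituted correctly in Hive
hive -e 'select * from word_count where count=5';
# Output:
by      5
day     5
life    5
what    5
when    5
# Same result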
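
The difference between the two approaches comes from the shell rather than from Hive, and it can be checked without Hive at all. In the sketch below, sh -c stands in for the hive process and the DEMO_* variable names are made up for the example.

```shell
#!/bin/sh
# Sketch: which shell variables reach a child process (and therefore Hive's env namespace).
# sh -c stands in for the hive process; the DEMO_* names are hypothetical.
DEMO_PLAIN=10               # assigned but not exported: child processes do not see it
export DEMO_EXPORTED=10     # exported: every child process sees it

plain=$(sh -c 'echo "${DEMO_PLAIN:-unset}"')
exported=$(sh -c 'echo "${DEMO_EXPORTED:-unset}"')
# An assignment placed in front of a command is put into that command's environment only
inline=$(DEMO_INLINE=5 sh -c 'echo "${DEMO_INLINE:-unset}"')

echo "plain=$plain exported=$exported inline=$inline"
# prints: plain=unset exported=10 inline=5
```

The inline form is handy for one-off queries because the variable does not linger in the calling shell afterwards.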

When you cannot recall a property name in full, you can look it up like this:

[[email protected] ~]# hive -S -e 'set' | grep warehouse

    hive.metastore.warehouse.dir=/user/hive/warehouse
    hive.warehouse.subdir.inherit.perms=true

Showing column names in query results

hive (default)> set hive.cli.print.header=true;
hive (default)> select * from word_count limit 3;
OK
word_count.word word_count.count
        81
a       10
about   2
Time taken: 0.396 seconds, Fetched: 3 row(s)
