
Configuring the Hive Storage Plugin in Drill 1.0 and Testing It

Tags: drill, hive

As of the time of this post, the latest Apache Drill release is 1.0.0. It supports the following data sources and file formats:

  • Avro
  • Parquet
  • Hive
  • HBase
  • CSV, TSV, PSV
  • File system

My current need is to query data stored on HDFS as snappy-compressed sequencefiles. Drill has no direct support for this combination, but Hive can query snappy + sequencefile, and Drill supports Hive, which raises the question: can such data be read through Drill's Hive storage plugin instead? It turns out it can. A short sketch of what such a table looks like on the Hive side is given below, followed by the configuration steps.
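For context, the data in question is a Hive table stored as sequencefile with snappy output compression. A minimal, purely illustrative sketch of creating such a table (the table names `demo_metrics` / `demo_metrics_text`, the columns, and the partition value are my own placeholders, not the table queried later in this post):

```shell
# Illustrative only: build a snappy-compressed sequencefile table in Hive.
# demo_metrics_text is assumed to be an existing plain-text staging table.
hive -e "
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

CREATE TABLE demo_metrics (metric_id BIGINT, metric_value DOUBLE)
PARTITIONED BY (pt STRING)
STORED AS SEQUENCEFILE;

INSERT OVERWRITE TABLE demo_metrics PARTITION (pt='2015080510')
SELECT metric_id, metric_value FROM demo_metrics_text;
"
```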

1. Enable the Hive metastore thrift service by adding the following to hive-site.xml:

```xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://10.170.250.47:9083</value>
</property>
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
```

Start the metastore service:

```shell
[[email protected] local]$ ../hive-1.2.1/bin/hive --service metastore &
```
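To confirm that the thrift service actually came up, a quick port check can be run first (a sketch, assuming the host and port 9083 from the hive-site.xml above; any equivalent tool works):

```shell
# Check that the Hive metastore thrift service is listening on 9083
netstat -tlnp 2>/dev/null | grep 9083

# Or probe the port directly from another machine (bash's /dev/tcp trick)
(echo > /dev/tcp/10.170.250.47/9083) && echo "metastore port reachable"
```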

2. Configure the Hive plugin from the Drill web UI:

```json
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://10.170.250.47:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxx:3306/hive_database",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    "fs.default.name": "hdfs://xxx:9000",
    "hive.metastore.sasl.enabled": "false"
  }
}
```

Here `hive.metastore.uris` is the address and port of the Hive metastore service started above, and `hive.metastore.warehouse.dir` is Hive's warehouse directory on HDFS.

After saving the plugin, restart the drillbit service:

```shell
[[email protected] drill-1.1.0]$ bin/drillbit.sh restart
```
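Once the drillbit is back up, the stored plugin definition can be read back over Drill's REST interface as a sanity check (a sketch; I am assuming the web UI/REST port is the default 8047 and that curl runs on the drillbit host):

```shell
# Fetch the saved 'hive' storage plugin configuration from the Drill REST API;
# the returned JSON should contain "enabled" : true
curl http://localhost:8047/storage/hive.json
```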

3. Test querying the sequencefile:

``` shell
[[email protected] drill-1.1.0]$ bin/sqlline -u jdbc:drill:zk=10.172.171.229:2181
apache drill 1.0.0
"the only truly happy people are children, the creative minority and drill users"
0: jdbc:drill:zk=10.172.171.229:2181> use hive.ai;
+-------+--------------------------------------+
|  ok   |               summary                |
+-------+--------------------------------------+
| true  | Default schema changed to [hive.ai]  |
+-------+--------------------------------------+
1 row selected (0.188 seconds)
0: jdbc:drill:zk=10.172.171.229:2181> !table
+------------+---------------------+---------------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
| TABLE_CAT  |     TABLE_SCHEM     |     TABLE_NAME      | TABLE_TYPE  | REMARKS  | TYPE_CAT  | TYPE_SCHEM  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  |
+------------+---------------------+---------------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
| DRILL      | INFORMATION_SCHEMA  | CATALOGS            | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | COLUMNS             | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | SCHEMATA            | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | TABLES              | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | VIEWS               | TABLE       |          |           |             |            |                            |                 |
| DRILL      | hive.ai             | metric_data_entity  | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | boot                | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | drillbits           | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | memory              | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | options             | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | threads             | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | version             | TABLE       |          |           |             |            |                            |                 |
+------------+---------------------+---------------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
0: jdbc:drill:zk=10.172.171.229:2181> SELECT count(1) FROM metric_data_entity where pt='2015080510';
+-----------+
|  EXPR$0   |
+-----------+
| 40455402  |
+-----------+
1 row selected (14.482 seconds)
0: jdbc:drill:zk=10.172.171.229:2181>
```

The queries above show that sequencefile is now queryable, but querying the snappy-compressed files fails with an error:

```
2015-08-05 16:34:49,067 [WorkManager-2] ERROR o.apache.drill.exec.work.WorkManager - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_85]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_85]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
2015-08-05 16:39:05,781 [UserServer-1] INFO o.a.drill.exec.work.foreman.Foreman - State change requested. RUNNING --> CANCELLATION_REQUESTED
```

This clearly means the snappy native library has to be made visible through the LD_LIBRARY_PATH environment variable, which is step 4 below.

4. Set the system environment variable LD_LIBRARY_PATH=/oneapm/local/hadoop-2.7.1/lib/native and also add that directory to the CLASSPATH.
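One way to wire this up is sketched below; putting the exports in conf/drill-env.sh and using `hadoop checknative` to verify the native snappy library are my own suggestions, so adjust the paths to your layout:

```shell
# Verify that the Hadoop native libraries (including snappy) can be loaded at all
hadoop checknative -a

# Export the native library path for the drillbit process, e.g. in conf/drill-env.sh
export LD_LIBRARY_PATH=/oneapm/local/hadoop-2.7.1/lib/native:$LD_LIBRARY_PATH
export CLASSPATH=/oneapm/local/hadoop-2.7.1/lib/native:$CLASSPATH

# Restart the drillbit so it picks up the new environment
bin/drillbit.sh restart
```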

References:

  • https://drill.apache.org/docs/hive-storage-plugin/
  • https://gist.github.com/vicenteg/7e060e79603f1e7ed3b4
  • http://blog.csdn.net/reesun/article/details/8556078
