Learning Apache Drill

Introduction

Apache Drill is a low latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data.

Official site:

http://drill.apache.org/

What is Apache Drill for? Drill is a SQL query engine that can be built on top of almost any NoSQL store or file system (e.g. Hive, HDFS, MongoDB, Amazon S3) to speed up queries. For example, Hive is widely used to run SQL-like queries over HDFS, but Hive itself is fairly slow, so a query engine such as Drill can be used to accelerate those queries, which makes it suitable for scenarios like real-time querying of distributed big data.

Architecture

Drill builds its query engine on top of different data sources through the storage plugin interface, i.e. as plugins.

Installation comes in two modes: embedded and distributed.

Here we cover installing embedded mode on Linux:

Embedded mode needs no special configuration and is the simpler option; first, install JDK 7;

Change to the directory where Drill is to be installed and open a shell;

Download the installation package by running one of the following commands:

wget http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz

or:

curl -o apache-drill-1.1.0.tar.gz http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz

This places the archive in the installation directory (or download it elsewhere and move it there);

Extract the package with the command tar -xvzf <.tar.gz file name>

After extracting, change into the resulting directory, here apache-drill-1.1.0, and run the command bin/drill-embedded, as shown in the figure.

You may get an error at this point complaining about insufficient memory. In that case, lower the default memory allocation in conf/drill-env.sh; the default is 4 GB, which is bound to fail on an ordinary home machine.
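A minimal sketch of the relevant lines in conf/drill-env.sh (the variable names are the ones shipped in the 1.x distribution; the reduced values below are only illustrative for a low-memory machine):

# conf/drill-env.sh: reduce memory for a small machine (illustrative values)
DRILL_MAX_DIRECT_MEMORY="2G"   # passed to the JVM as -XX:MaxDirectMemorySize
DRILL_HEAP="1G"                # passed to the JVM as -Xms/-Xmx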

Once the command succeeds, Drill is running in embedded mode.

The last line in the figure above shows that Drill has started and queries can be executed. In the prompt on that line, 0 is the connection count, jdbc is the connection type, and zk=local means the local ZooKeeper node is being used.
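To verify the installation, you can query the sample data bundled on Drill's classpath through the built-in cp storage plugin (enabled by default); a minimal example at the prompt, output omitted:

0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` LIMIT 3;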

To exit, run !quit

The Drill web interface is reached by entering http://<IP address or host name>:8047 in a browser, as shown in the figure:

Drill is now installed, but it is not yet tied to any specific data source of ours. The configuration below hooks it up so that queries can run against concrete data.

1. Memory configuration: as above, edit the -XX:MaxDirectMemorySize parameter via drill-env.sh.

2. Configuring multi-user settings

3. Configuring user privileges and roles

4. ... to be continued

Connecting to data sources

Drill connects to data sources through storage plugins, which adds flexibility: each kind of source is handled by its own plugin, so Drill can connect to databases, files, distributed file systems, the Hive metastore, and more.

A storage plugin configuration can be specified in three ways (see the example after this list):

(1) In the FROM clause of a query

(2) With a USE command issued before the query

(3) When starting Drill
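A sketch of (1) and (2), assuming the default dfs plugin with its tmp workspace and a hypothetical file /tmp/example.json:

SELECT * FROM dfs.`/tmp/example.json` LIMIT 10;   -- (1) plugin named directly in FROM
USE dfs.tmp;                                      -- (2) switch to a plugin workspace first
SELECT * FROM `example.json` LIMIT 10;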

Configuring plugins through the Web UI

Storage plugins can be viewed and configured at http://<IP address>:8047/storage. The following entries are present:

cp: connects to JAR files on the classpath

dfs: connects to the local file system or any distributed file system, such as Hadoop or Amazon S3

hbase: connects to HBase

hive: connects to the Hive metastore

mongo: connects to MongoDB

Clicking the update option next to an entry lets you configure settings such as data formats.

You can create a new storage plugin by entering a name for it, as shown in the figure.

An example dfs plugin configuration is shown in the figure.
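For reference, a default-style dfs configuration looks roughly like the following sketch (the root and tmp workspaces mirror the shipped defaults; the formats section is trimmed, and paths should be adjusted to your system):

{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "root": { "location": "/", "writable": false, "defaultInputFormat": null },
    "tmp": { "location": "/tmp", "writable": true, "defaultInputFormat": null }
  },
  "formats": {
    "json": { "type": "json" },
    "csv": { "type": "text", "extensions": ["csv"], "delimiter": "," }
  }
}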

Configurable attributes of a Drill storage plugin

Each attribute is listed below with its example values, whether it is required, and a description.

"type" (example values: "file", "hbase", "hive", "mongo"; required: yes): a valid storage plugin type name.

"enabled" (example values: true, false): state of the storage plugin.

"connection" (example values: "classpath:///", "file:///", "mongodb://localhost:27017/", "hdfs://"; required: implementation-dependent): the type of distributed file system, such as HDFS, Amazon S3, or files in your file system, plus an address/path name.

"workspaces" (example values: null, "logs"; required: no): one or more unique workspace names. If a workspace name is used more than once, only the last definition is effective.

"workspaces" . . . "location" (example values: "location": "/Users/johndoe/mydata", "location": "/tmp"): full path to a directory on the file system.

"workspaces" . . . "writable" (example values: true, false): whether the workspace allows writes, for example for CREATE TABLE AS.

"workspaces" . . . "defaultInputFormat" (example values: "parquet", "csv", "json"): format for reading data, regardless of extension. Default = "parquet".

"formats" (example values: "psv", "tsv", "avro", "maprdb"): one or more valid file formats for reading. Drill implicitly detects the formats of some files based on extension or bits of data in the file; others require configuration.

"formats" . . . "type" (example value: "text"): format type. You can define two formats, csv and psv, as type "text" but with different delimiters.

"formats" . . . "extensions" (example value: ["csv"]; required: format-dependent): file name extensions that Drill can read.

"formats" . . . "delimiter" (example values: "\t", ","): a sequence of one or more characters that serves as the record separator in a delimited text file such as CSV. Use the 4-digit hex code syntax \uXXXX for a non-printable delimiter.

"formats" . . . "quote" (example value: """): a single character that starts/ends a value in a delimited text file.

"formats" . . . "escape" (example value: "`"): a single character that escapes a quotation mark inside a value.

"formats" . . . "comment" (example value: "#"): the line decoration that starts a comment line in the delimited text file.

"formats" . . . "skipFirstLine" (example values: true, false): whether to include or omit the header when reading a delimited text file. Set to true to avoid reading headers as data.

Plugins can also be configured through the Drill REST API by sending a POST with two attributes, the name and the config, for example:

curl -X POST -H "Content-Type: application/json" -d '{"name":"myplugin", "config": {"type": "file", "enabled": false, "connection": "file:///", "workspaces": { "root": { "location": "/", "writable": false, "defaultInputFormat": null}}, "formats": null}}' http://localhost:8047/storage/myplugin.json

The command above creates a plugin named myplugin for querying files of an unknown type under the root of the local file system.
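To confirm that the plugin was stored, its configuration can be read back from the same endpoint (assuming the REST API's GET route for a named plugin, which returns the plugin's JSON):

curl http://localhost:8047/storage/myplugin.json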

Next, the configuration for connecting to Hive. First make sure the Hive metastore service is running; its address is the value that the hive.metastore.uris property refers to. Start the service with:

hive --service metastore

Open the Drill web interface and go to the Storage page: http://<IP address>:8047/storage

Click the update option next to hive to configure it, as shown in the figure.

On the configuration screen, add the following to the default content:
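A remote-metastore hive configuration typically looks roughly like the following sketch (the thrift and HDFS addresses are placeholders for your own environment):

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://<metastore host>:9083",
    "fs.default.name": "hdfs://<namenode host>:8020/",
    "hive.metastore.sasl.enabled": "false"
  }
}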