
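[4] CarbonData integration: querying CarbonData from Presto

1. Build CarbonData to obtain the Presto connector jars.

Reference: "Building CarbonData and possible dependency errors"

In the plugin directory of the Presto installation, create a carbondata directory and copy the jars produced by the CarbonData build into it. Presto 0.210 or later is recommended; with older versions the SPI interface does not match and Presto cannot recognize carbondata.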

cd plugin
mkdir carbondata
cp <carbon-data-installation-directory>/integration/presto/target/carbondata-presto-1.5.1-SNAPSHOT/* <presto-installation-directory>/plugin/carbondata
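
The copy step above can be wrapped in a small helper that refuses to continue when the build output contains no jars. This is only a sketch: the function name install_carbondata_plugin and its two arguments (the build output directory and the Presto installation directory) are assumptions for illustration, not part of CarbonData or Presto.

```shell
# Hypothetical helper (not part of CarbonData or Presto).
# $1: build output directory, e.g.
#     <carbon-data-installation-directory>/integration/presto/target/carbondata-presto-1.5.1-SNAPSHOT
# $2: Presto installation directory
install_carbondata_plugin() {
  src_dir="$1"
  presto_home="$2"
  dest_dir="$presto_home/plugin/carbondata"

  # Make sure the build actually produced at least one jar
  found=0
  for jar in "$src_dir"/*.jar; do
    if [ -f "$jar" ]; then
      found=1
      break
    fi
  done
  if [ "$found" -ne 1 ]; then
    echo "no jars found in $src_dir" >&2
    return 1
  fi

  # Create plugin/carbondata and copy the jars into it
  mkdir -p "$dest_dir" || return 1
  cp "$src_dir"/*.jar "$dest_dir"/
}
```

Call it with the two directories used in the commands above; it returns a non-zero status if nothing could be copied.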

           

2. Presto configuration

A single-node Presto setup is shown here.

Create carbondata.properties under etc/catalog:

connector.name=carbondata
hive.metastore.uri=thrift://localhost:9083

           
CarbonData is supported as one of the formats of the Presto Hive plugin, so its configuration and setup are similar to those of the Presto Hive connector. See https://prestodb.io/docs/current/connector/hive.html for details.
Note: Because CarbonData works only with the Hive metastore, Spark must connect to the same metastore database to create and update tables. All operations performed in Spark are reflected in Presto immediately. Carbon tables must be created from Spark with CarbonData 1.5.2 or later, because the input/output formats are set correctly in the Carbon table only from that version onward.
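
The catalog file described above can also be generated from a script, which helps keep the metastore URI identical to the one Spark uses. A minimal sketch follows. The function name write_carbondata_catalog and its arguments are assumptions for illustration; only the two property lines come from the configuration above.

```shell
# Hypothetical helper (not part of Presto).
# $1: Presto installation directory
# $2: Hive metastore URI (defaults to the value used in this article)
write_carbondata_catalog() {
  presto_home="$1"
  metastore_uri="${2:-thrift://localhost:9083}"

  mkdir -p "$presto_home/etc/catalog" || return 1

  # Write the two properties with no trailing whitespace after the values
  cat > "$presto_home/etc/catalog/carbondata.properties" <<EOF
connector.name=carbondata
hive.metastore.uri=$metastore_uri
EOF
}
```

Pass the same URI that the Spark side is configured with, so that both engines read and write the same metastore.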

Other configuration files, for reference

(1) config.properties

coordinator=true
datasources=mysql,hive,carbondata
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080

           

(2) jvm.config

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Dcarbon.properties.filepath=/xxxxx/carbon.properties
           
The carbon.properties.filepath property sets the path to the carbon.properties file. Setting it is recommended; otherwise some features may not work. See the example above.
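
Since a wrong path in this option is easy to miss, a quick check before starting Presto can help. The sketch below reads the -Dcarbon.properties.filepath line from a jvm.config file and verifies that the file it points to is readable. Both function names are hypothetical and not part of Presto or CarbonData.

```shell
# Hypothetical helpers (not part of Presto or CarbonData).

# Print the path configured via -Dcarbon.properties.filepath in the given jvm.config
carbon_properties_path() {
  sed -n 's/^-Dcarbon\.properties\.filepath=//p' "$1" | tail -n 1
}

# Fail if the option is missing or the file it points to is not readable
check_carbon_properties() {
  path="$(carbon_properties_path "$1")"
  if [ -z "$path" ]; then
    echo "carbon.properties.filepath is not set in $1" >&2
    return 1
  fi
  if [ ! -r "$path" ]; then
    echo "carbon.properties is not readable at $path" >&2
    return 1
  fi
  echo "$path"
}
```

For example, check_carbon_properties etc/jvm.config prints the configured path on success and returns a non-zero status otherwise.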

(3) node.properties

node.environment=test
node.id=1-1-1-1-1
node.data-dir=/xxx/software/presto212/data
           

(4) log.properties

com.facebook.presto=DEBUG
com.facebook.presto.server.PluginManager=DEBUG
           

3. Querying with Presto

Download and set up the Presto CLI:

wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.212/presto-cli-0.212-executable.jar
mv presto-cli-0.212-executable.jar presto
chmod +x presto
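
In the Maven Central layout, the version appears both in the directory and in the file name, and the two must match. Building the URL from a single version value avoids a mismatch. This is a sketch only: the function name presto_cli_url and the PRESTO_VERSION variable are assumptions for illustration.

```shell
# Hypothetical helper: build the Maven Central URL of the Presto CLI for one version.
# The version is used twice, once for the directory and once for the file name.
presto_cli_url() {
  printf 'https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/%s/presto-cli-%s-executable.jar\n' "$1" "$1"
}

# Assumed default; set PRESTO_VERSION to match the installed server
PRESTO_VERSION="${PRESTO_VERSION:-0.212}"

# Usage (requires network access):
#   wget -O presto "$(presto_cli_url "$PRESTO_VERSION")" && chmod +x presto
```

With the default value, presto_cli_url "$PRESTO_VERSION" yields the 0.212 CLI jar, the same file that the mv command above renames.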

           

A table, default.test_table, has already been created in CarbonData through Spark. For details, see: Installing and Configuring CarbonData to run locally with Spark Shell

Start Presto and connect through the CLI:

./presto --server localhost:8080 --catalog carbondata --schema default


presto:default> show catalogs;
  Catalog
------------
 carbondata
 hive
 jmx
 mysql
 system
(5 rows)

Query 20190322_135336_00000_5vn59, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

presto:default> use carbondata ;
USE

presto:default> show schemas;
       Schema
--------------------
 default
 hive_test
 information_schema
(3 rows)

Query 20190322_135550_00006_5vn59, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 49B] [13 rows/s, 224B/s]

presto:default> show tables from default;
                                   Table
----------------------------------------------------------------------------
 test_table
(2 rows)

Query 20190322_135623_00007_5vn59, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [2 rows, 118B] [5 rows/s, 320B/s]

presto:default> select * from default.test_table;
 id | name  |   city   | age
----+-------+----------+-----
 1  | david | shenzhen |  31
 2  | eason | shenzhen |  27
 3  | jarry | wuhan    |  35
(3 rows)

Query 20190322_135655_00008_5vn59, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:03 [3 rows, 122B] [1 rows/s, 45B/s]