CitusDB, PostgreSQL Using Hadoop for Distributed Queries - 4 : Querying Data in HDFS


The previous articles in this series covered installing and configuring CitusDB on a single server, installing and using file_fdw, and installing hadoop_sync. This article sets up a CitusDB + HDFS environment and uses file_fdw, hadoop_sync, and the CitusDB SQL engine to query structured data in HDFS in parallel. Test environment for this article:

OS      : CentOS 5.x x64
Hadoop  : 1.1.2
CitusDB : 2.0

4 hosts:

Hadoop namenode, CitusDB master : 172.16.3.150
Hadoop datanode, CitusDB worker : 172.16.3.33, 172.16.3.39, 172.16.3.40

      

First, create the hadoop runtime/admin user on all nodes.

[root@db-172-16-3-150 data06]# useradd digoal

Repeat on the other nodes.

On all nodes, download the hadoop package:

[root@db-172-16-3-150 opt]# cd /opt
[root@db-172-16-3-150 opt]# wget http://ftp.cuhk.edu.hk/pub/packages/apache.org/hadoop/common/hadoop-1.1.2/hadoop-1.1.2.tar.gz
[root@db-172-16-3-150 opt]# tar -zxvf hadoop-1.1.2.tar.gz
[root@db-172-16-3-150 ~]# chown -R digoal:digoal /opt/hadoop-1.1.2

      

Repeat on the other nodes.

hadoop installation prerequisites: install the rsync, ssh, and sshd packages on all nodes:

yum install openssh-server
yum install openssh-clients
yum install rsync

      

On all nodes, allow public key authentication; this is how the hadoop namenode remotely invokes scripts on the datanodes, similar to how Greenplum is managed:

/etc/ssh/sshd_config
PubkeyAuthentication yes

      

On the namenode, generate a key pair under the hadoop admin user:

su - digoal
digoal@db-172-16-3-150-> ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
digoal@db-172-16-3-150-> cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
digoal@db-172-16-3-150-> chmod 700 ~
digoal@db-172-16-3-150-> chmod 700 ~/.ssh
digoal@db-172-16-3-150-> chmod 600 ~/.ssh/authorized_keys

digoal@db-172-16-3-150-> cd .ssh
digoal@db-172-16-3-150-> cat id_dsa.pub 
ssh-dss AAAAB3NzaC1kc3MAAACBANBTKT4G4PZ5PC8SWrScHaB1PpVQ38BRQU2TiqeAZLU2Du2c8gP1h4XhO703R1v7K1r5isSyaVt14MIMBX5FKRUfwC7FNFMzr9VpjeprJRbDByYhsyXiMj5ziDamXq8VwUiaexfqPXbnE1n43od3M03bI1vOvXr7u8uG2BXKZXddAAAAFQCsSXXqmnwr9zlSp5Bc56WH98B67wAAAIEAoXahSqPMCpedHO5gEQcrZ00JGrnXVRx1HL7XHqNg+00ZvgqCmWr8WzYQJS5SzZ1E7PgPDLLJ/ym1AIrirYUfCttry8/YJBUO1B1jTWIvsHDblnKQPNeS4opuTcaJdklyfdeFS/sQrUU+EQfcmwpCVIY6gOduKm5PCE7xlaSZMm8AAACAEvWgs402kRZhWgvXCqOoOZhdFHLJ7h+53gmEpHa+Rhu4i7ag1RsK15q/aTt3eGFt3xbZS4GYT6LnBYM0TOB0yO3cmKoohNx1A7SZIYxnA1x48G3HFddwJdATP4xnK0VURI5JbMjkgoY2F91r5xwKdwf+Ypk7CBDVm3kjcJ+UCrU= digoal@db-172-16-3-150.sky-mobi.com
  

      

On the datanodes, copy the contents of the namenode's public key id_dsa.pub into the local ~/.ssh/authorized_keys:

digoal@db-172-16-3-33-> vi ~/.ssh/authorized_keys
ssh-dss AAAAB3NzaC1kc3MAAACBAItGIu3Uf+vg9DatqoJt35J3cOSTJnlt8uqoRF9InRH7tWN5g4j9WE+Ol5pLjCSlpIysTxolmbZBj9muhcH4qmpLnI5Y4COVv975woHdgCUeXPeWUJ8J56cNEHLQS0KdEtd6eqQrIRpNnHEiLyv/75ID6HjOIld+JueSEQQCDxl/AAAAFQDyBVg8sB+e5zhRixuiSSEJzkiTqwAAAIBN+tkwON6kH26bNZWLK287GiZi6ymr1AdGTNVLW3cdrliFN8ENI3XQ5T7APz8bS+sgteg+Hwyz1gIfuaCw4srCInh1a3MTb2Mk4KvHK7DdplgaMWmQUjvSeoQoV6qeDJPeUoaIR5JnX4HZQLEqYO+NjPeLc4/uUKGSNycXZIqQqwAAAIBqAMd3YP6Pvu1BFyRZGslRu0us+xhEE375mpxLX1Csj4/WHWZvPHVYm3jiJVKS9s5So9a/7uKYKwTJCPZ6bBONtKEUEgu0oPMDwBlFbqj0VIf3zVaiWo34h+w2UQE6bb/pYstScqmWSn5A+mQ7uJY3HCgdKxGhm6B8k3kgc7faKw== digoal@db-172-16-3-150.sky-mobi.com

digoal@db-172-16-3-33-> chmod 700 ~
digoal@db-172-16-3-33-> chmod 700 ~/.ssh
digoal@db-172-16-3-33-> chmod 600 ~/.ssh/authorized_keys

Repeat on the other two datanodes.
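The same distribution can also be scripted from the namenode with ssh-copy-id. A minimal sketch, assuming the openssh-clients package provides ssh-copy-id on these hosts and that password logins are still allowed at this point:

# run as digoal on the namenode; prompts once per datanode for the password,
# then appends id_dsa.pub to that host's ~/.ssh/authorized_keys
for h in db-172-16-3-33 db-172-16-3-39 db-172-16-3-40; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub digoal@$h.sky-mobi.com
done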
  

      

On all nodes, in an environment without DNS, configure the hostnames of the namenode and datanodes (similar to Greenplum). Do not map one IP to multiple hostnames, or you will run into unnecessary trouble; see the end of this article.

[root@db-172-16-3-150 ~]# vi /etc/hosts
172.16.3.150 db-172-16-3-150.sky-mobi.com
172.16.3.33 db-172-16-3-33.sky-mobi.com
172.16.3.39 db-172-16-3-39.sky-mobi.com
172.16.3.40 db-172-16-3-40.sky-mobi.com
  

      

From the hadoop admin user on the namenode, verify passwordless authentication to the admin user on each datanode:

digoal@db-172-16-3-150-> ssh db-172-16-3-150.sky-mobi.com date
Thu Mar 21 16:21:58 CST 2013
digoal@db-172-16-3-150-> ssh db-172-16-3-33.sky-mobi.com date
Thu Mar 21 16:21:59 CST 2013
digoal@db-172-16-3-150-> ssh db-172-16-3-39.sky-mobi.com date
Thu Mar 21 16:22:00 CST 2013
digoal@db-172-16-3-150-> ssh db-172-16-3-40.sky-mobi.com date
Thu Mar 21 15:49:33 CST 2013

      

On all nodes, install the Java environment. Note that here JAVA_HOME=/usr:

[root@db-172-16-3-33 opt]# yum install java
[root@db-172-16-3-33 conf]# java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.9) (rhel-1.36.1.11.9.el5_9-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
[root@db-172-16-3-33 conf]# which java
/usr/bin/java
  

      

On all nodes, set up the hadoop directories:

hadoop runtime log directory:
mkdir /var/log/hadoop
chown -R digoal:digoal /var/log/hadoop

namenode transaction log directory:
mkdir /hadoop/tdata
chown -R digoal:digoal /hadoop

datanode data block directory:
mkdir /hadoop/data
chown -R digoal:digoal /hadoop
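Since these commands need root, they can be looped over the hosts from one machine; a sketch, assuming root can SSH to each node (the digoal trust set up earlier does not cover root):

for h in db-172-16-3-150 db-172-16-3-33 db-172-16-3-39 db-172-16-3-40; do
    ssh root@$h.sky-mobi.com '
        mkdir -p /var/log/hadoop /hadoop/tdata /hadoop/data &&
        chown -R digoal:digoal /var/log/hadoop /hadoop
    '
done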
  

      

On the namenode, configure hadoop core-site.xml; 0.0.0.0 means listen on all interfaces:

[root@db-172-16-3-150 opt]# vi /opt/hadoop-1.1.2/conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9000</value>
  </property>
</configuration>
   
  

      

# On the namenode, make sure port 9000 is not already in use.

[root@db-172-16-3-150 conf]# netstat -anp|grep 9000
[root@db-172-16-3-150 conf]# 
  

      

On the datanodes, configure hadoop core-site.xml. Note that here the value is not 0.0.0.0 but the namenode's hostname (or its DNS name):

[root@db-172-16-3-33 opt]# vi /opt/hadoop-1.1.2/conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://db-172-16-3-150.sky-mobi.com:9000</value>
  </property>
</configuration>

# Repeat on the other datanodes.
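Rather than editing the file by hand on every datanode, it can be pushed from one of them; a sketch, assuming the file above was written on db-172-16-3-33 and that the digoal user owns /opt/hadoop-1.1.2 everywhere, as set earlier (the namenode keeps its own 0.0.0.0 variant):

# run as digoal on db-172-16-3-33
for h in db-172-16-3-39 db-172-16-3-40; do
    scp /opt/hadoop-1.1.2/conf/core-site.xml digoal@$h.sky-mobi.com:/opt/hadoop-1.1.2/conf/
done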
   

      

# On all nodes, configure hadoop hdfs-site.xml:

[root@db-172-16-3-150 conf]# vi /opt/hadoop-1.1.2/conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/tdata</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/data</value>
  </property>
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>digoal</value>
  </property>
</configuration>
   
  

      

# On the namenode, configure the datanode host list.

[root@db-172-16-3-150 ~]# su - digoal 
digoal@db-172-16-3-150-> vi /opt/hadoop-1.1.2/conf/slaves

db-172-16-3-33.sky-mobi.com
db-172-16-3-39.sky-mobi.com
db-172-16-3-40.sky-mobi.com
   
  

      

# On all nodes, configure the hadoop runtime environment file.

digoal@db-172-16-3-150-> which java
/usr/bin/java
digoal@db-172-16-3-150-> vi /opt/hadoop-1.1.2/conf/hadoop-env.sh

export JAVA_HOME=/usr
export HADOOP_LOG_DIR=/var/log/hadoop
   

      

# Confirm that the environment variables take effect after sourcing hadoop-env.sh.

digoal@db-172-16-3-150-> . /opt/hadoop-1.1.2/conf/hadoop-env.sh
digoal@db-172-16-3-150-> echo $JAVA_HOME
/usr
  

      

# On the namenode, initialize the namespace. Note that Hadoop 1.x only accepts an uppercase Y at the re-format prompt; the lowercase y in the session below aborted the re-format of the already-formatted /hadoop/tdata.

digoal@db-172-16-3-150-> /opt/hadoop-1.1.2/bin/hadoop namenode -format
13/03/21 17:27:49 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = db-172-16-3-150.sky-mobi.com/172.16.3.150
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.1.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
Re-format filesystem in /hadoop/tdata ? (Y or N) y
Format aborted in /hadoop/tdata
13/03/21 17:28:07 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at db-172-16-3-150.sky-mobi.com/172.16.3.150
************************************************************/
  

      

# On the namenode, start HDFS. start-all.sh is used here, but start-dfs.sh alone would suffice, since CitusDB does not need Hadoop's MapReduce; see the sketch after the output below.

digoal@db-172-16-3-150-> /opt/hadoop-1.1.2/bin/start-all.sh
starting namenode, logging to /var/log/hadoop/hadoop-digoal-namenode-db-172-16-3-150.sky-mobi.com.out
db-172-16-3-40: starting datanode, logging to /var/log/hadoop/hadoop-digoal-datanode-db-172-16-3-40.sky-mobi.com.out
db-172-16-3-39: starting datanode, logging to /var/log/hadoop/hadoop-digoal-datanode-db-172-16-3-39.sky-mobi.com.out
db-172-16-3-33: starting datanode, logging to /var/log/hadoop/hadoop-digoal-datanode-db-172-16-3-33.sky-mobi.com.out
localhost: starting secondarynamenode, logging to /var/log/hadoop/hadoop-digoal-secondarynamenode-db-172-16-3-150.sky-mobi.com.out
starting jobtracker, logging to /var/log/hadoop/hadoop-digoal-jobtracker-db-172-16-3-150.sky-mobi.com.out
db-172-16-3-39: starting tasktracker, logging to /var/log/hadoop/hadoop-digoal-tasktracker-db-172-16-3-39.sky-mobi.com.out
db-172-16-3-40: starting tasktracker, logging to /var/log/hadoop/hadoop-digoal-tasktracker-db-172-16-3-40.sky-mobi.com.out
db-172-16-3-33: starting tasktracker, logging to /var/log/hadoop/hadoop-digoal-tasktracker-db-172-16-3-33.sky-mobi.com.out
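As noted above, MapReduce is not needed for CitusDB, so the HDFS daemons alone can be started and stopped:

# starts only the namenode, the datanodes listed in conf/slaves, and the secondary namenode
/opt/hadoop-1.1.2/bin/start-dfs.sh

# matching shutdown
/opt/hadoop-1.1.2/bin/stop-dfs.sh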
   

      

# After startup, check that port 9000 is listening.

digoal@db-172-16-3-150-> netstat -anp|grep 9000
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 0.0.0.0:9000           0.0.0.0:*                   LISTEN      18985/java          
tcp        0      0 172.16.3.150:13243          172.16.3.150:9000           ESTABLISHED 18985/java          
tcp        0      0 172.16.3.150:13246          172.16.3.150:9000           ESTABLISHED 19195/java          
tcp        0      0 172.16.3.150:9000           172.16.3.40:22747           ESTABLISHED 18985/java          
tcp        0      0 172.16.3.150:9000           172.16.3.150:13243          ESTABLISHED 18985/java          
tcp        0      0 172.16.3.150:9000           172.16.3.150:13246          ESTABLISHED 18985/java          
tcp        0      0 172.16.3.150:9000           172.16.3.33:60047           ESTABLISHED 18985/java          
tcp        0      0 172.16.3.150:9000           172.16.3.39:17978           ESTABLISHED 18985/java  
   

      

# On the namenode, download the test data files used below.

[root@db-172-16-3-150 data06]# wget http://examples.citusdata.com/customer_reviews_1998.csv.gz
[root@db-172-16-3-150 data06]# wget http://examples.citusdata.com/customer_reviews_1999.csv.gz
[root@db-172-16-3-150 data06]# gzip -d customer_reviews_1998.csv.gz
[root@db-172-16-3-150 data06]# gzip -d customer_reviews_1999.csv.gz
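A quick local sanity check before loading; the byte counts should match the HDFS listing shown a few steps below:

ls -l customer_reviews_1998.csv customer_reviews_1999.csv   # expect roughly 101MB and 198MB
wc -l customer_reviews_1998.csv customer_reviews_1999.csv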
  

      

# On the namenode, create an HDFS directory.

[root@db-172-16-3-150 citusdb]# su - digoal
digoal@db-172-16-3-150-> /opt/hadoop-1.1.2/bin/hadoop fs -mkdir /user/data/reviews
  

      

# On the namenode, load the downloaded data into HDFS.

[root@db-172-16-3-150 citusdb]# su - digoal
digoal@db-172-16-3-150-> /opt/hadoop-1.1.2/bin/hadoop fs -put /data06/customer_reviews_1998.csv /user/data/reviews
digoal@db-172-16-3-150-> /opt/hadoop-1.1.2/bin/hadoop fs -put /data06/customer_reviews_1999.csv /user/data/reviews
digoal@db-172-16-3-150-> /opt/hadoop-1.1.2/bin/hadoop fs -ls /user/data/reviews
Found 2 items
-rw-r--r--   3 digoal supergroup  101299118 2013-03-21 17:57 /user/data/reviews/customer_reviews_1998.csv
-rw-r--r--   3 digoal supergroup  198247156 2013-03-21 17:57 /user/data/reviews/customer_reviews_1999.csv
  

      

# On each datanode, check the datanode listening port.

[root@db-172-16-3-33 hadoop]# netstat -anp|grep 50020
tcp        0      0 0.0.0.0:50020               0.0.0.0:*                   LISTEN      10948/java          
[root@db-172-16-3-33 hadoop]# ps -ewf|grep 10948
digoal   10948     1  2 17:53 ?        00:00:05 /usr/bin/java -Dproc_datanode -Xmx1000m -server -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/var/log/hadoop -Dhadoop.log.file=hadoop-digoal-datanode-db-172-16-3-33.sky-mobi.com.log -Dhadoop.home.dir=/opt/hadoop-1.1.2/libexec/.. -Dhadoop.id.str=digoal -Dhadoop.root.logger=INFO,DRFA -Dhadoop.security.logger=INFO,NullAppender -Djava.library.path=/opt/hadoop-1.1.2/libexec/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath /opt/hadoop-1.1.2/libexec/../conf:/usr/lib/tools.jar:/opt/hadoop-1.1.2/libexec/..:/opt/hadoop-1.1.2/libexec/../hadoop-core-1.1.2.jar:/opt/hadoop-1.1.2/libexec/../lib/asm-3.2.jar:/opt/hadoop-1.1.2/libexec/../lib/aspectjrt-1.6.11.jar:/opt/hadoop-1.1.2/libexec/../lib/aspectjtools-1.6.11.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-beanutils-1.7.0.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-beanutils-core-1.8.0.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-cli-1.2.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-codec-1.4.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-collections-3.2.1.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-configuration-1.6.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-daemon-1.0.1.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-digester-1.8.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-el-1.0.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-httpclient-3.0.1.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-io-2.1.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-lang-2.4.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-logging-1.1.1.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-logging-api-1.0.4.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-math-2.1.jar:/opt/hadoop-1.1.2/libexec/../lib/commons-net-3.1.jar:/opt/hadoop-1.1.2/libexec/../lib/core-3.1.1.jar:/opt/hadoop-1.1.2/libexec/../lib/hadoop-capacity-scheduler-1.1.2.jar:/opt/hadoop-1.1.2/libexec/../lib/hadoop-fairscheduler-1.1.2.jar:/opt/hadoop-1.1.2/libexec/../lib/hadoop-thriftfs-1.1.2.jar:/opt/hadoop-1.1.2/libexec/../lib/hsqldb-1.8.0.10.jar:/opt/hadoop-1.1.2/libexec/../lib/jackson-core-asl-1.8.8.jar:/opt/hadoop-1.1.2/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/opt/hadoop-1.1.2/libexec/../lib/jasper-compiler-5.5.12.jar:/opt/hadoop-1.1.2/libexec/../lib/jasper-runtime-5.5.12.jar:/opt/hadoop-1.1.2/libexec/../lib/jdeb-0.8.jar:/opt/hadoop-1.1.2/libexec/../lib/jersey-core-1.8.jar:/opt/hadoop-1.1.2/libexec/../lib/jersey-json-1.8.jar:/opt/hadoop-1.1.2/libexec/../lib/jersey-server-1.8.jar:/opt/hadoop-1.1.2/libexec/../lib/jets3t-0.6.1.jar:/opt/hadoop-1.1.2/libexec/../lib/jetty-6.1.26.jar:/opt/hadoop-1.1.2/libexec/../lib/jetty-util-6.1.26.jar:/opt/hadoop-1.1.2/libexec/../lib/jsch-0.1.42.jar:/opt/hadoop-1.1.2/libexec/../lib/junit-4.5.jar:/opt/hadoop-1.1.2/libexec/../lib/kfs-0.2.2.jar:/opt/hadoop-1.1.2/libexec/../lib/log4j-1.2.15.jar:/opt/hadoop-1.1.2/libexec/../lib/mockito-all-1.8.5.jar:/opt/hadoop-1.1.2/libexec/../lib/oro-2.0.8.jar:/opt/hadoop-1.1.2/libexec/../lib/servlet-api-2.5-20081211.jar:/opt/hadoop-1.1.2/libexec/../lib/slf4j-api-1.4.3.jar:/opt/hadoop-1.1.2/libexec/../lib/slf4j-log4j12-1.4.3.jar:/opt/hadoop-1.1.2/libexec/../lib/xmlenc-0.52.jar:/opt/hadoop-1.1.2/libexec/../lib/jsp-2.1/jsp-2.1.jar:/opt/hadoop-1.1.2/libexec/../lib/jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.hdfs.server.datanode.DataNode
  

      

# On the namenode, the HDFS report:

digoal@db-172-16-3-150-> /opt/hadoop-1.1.2/bin/hadoop dfsadmin -report
Configured Capacity: 229023145984 (213.29 GB)
Present Capacity: 126058467328 (117.4 GB)
DFS Remaining: 125152571392 (116.56 GB)
DFS Used: 905895936 (863.93 MB)
DFS Used%: 0.72%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Name: 172.16.3.33:50010
Decommission Status : Normal
Configured Capacity: 144471613440 (134.55 GB)
DFS Used: 301965312 (287.98 MB)
Non DFS Used: 96595206144 (89.96 GB)
DFS Remaining: 47574441984(44.31 GB)
DFS Used%: 0.21%
DFS Remaining%: 32.93%
Last contact: Thu Mar 21 20:41:48 CST 2013

Name: 172.16.3.39:50010
Decommission Status : Normal
Configured Capacity: 42275766272 (39.37 GB)
DFS Used: 301965312 (287.98 MB)
Non DFS Used: 2332004352 (2.17 GB)
DFS Remaining: 39641796608(36.92 GB)
DFS Used%: 0.71%
DFS Remaining%: 93.77%
Last contact: Thu Mar 21 20:41:48 CST 2013

Name: 172.16.3.40:50010
Decommission Status : Normal
Configured Capacity: 42275766272 (39.37 GB)
DFS Used: 301965312 (287.98 MB)
Non DFS Used: 4037468160 (3.76 GB)
DFS Remaining: 37936332800(35.33 GB)
DFS Used%: 0.71%
DFS Remaining%: 89.74%
Last contact: Thu Mar 21 20:41:48 CST 2013
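Block placement matters here, because the CitusDB workers will later read HDFS blocks from local disk. hadoop fsck shows where each file's blocks landed; a sketch:

# run as digoal on the namenode
/opt/hadoop-1.1.2/bin/hadoop fsck /user/data/reviews -files -blocks -locations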
  

      

# Next, configure CitusDB. The CitusDB master is co-located with the Hadoop namenode, and each CitusDB worker is co-located with a Hadoop datanode.

# On all nodes, create the citusdb runtime user:

[root@db-172-16-3-150 ~]# useradd citusdb
# Repeat on the other nodes.
        

# On all nodes, download citusdb:

[root@db-172-16-3-150 ~]# su - citusdb
[citusdb@db-172-16-3-150 ~]$ wget http://packages.citusdata.com/readline-5.0/citusdb-2.0.0-1.x86_64.rpm

# Copy the package to the other nodes
[root@db-172-16-3-150 citusdb]# scp citusdb-2.0.0-1.x86_64.rpm 172.16.3.33:/home/citusdb/
[root@db-172-16-3-150 citusdb]# scp citusdb-2.0.0-1.x86_64.rpm 172.16.3.39:/home/citusdb/
[root@db-172-16-3-150 citusdb]# scp citusdb-2.0.0-1.x86_64.rpm 172.16.3.40:/home/citusdb/
        

# On all nodes, install citusdb:

[root@db-172-16-3-150 ~]# rpm -ivh /home/citusdb/citusdb-2.0.0-1.x86_64.rpm 
[root@db-172-16-3-33 ~]# rpm -ivh /home/citusdb/citusdb-2.0.0-1.x86_64.rpm 
[root@db-172-16-3-39 ~]# rpm -ivh /home/citusdb/citusdb-2.0.0-1.x86_64.rpm 
[root@db-172-16-3-40 ~]# rpm -ivh /home/citusdb/citusdb-2.0.0-1.x86_64.rpm 
  

      

PostgreSQL9.2.1将安裝在/opt/citusdb/2,0目錄,  資料目錄在/opt/citusdb/2.0/data

# On all nodes, fix the citusdb ownership:

[root@db-172-16-3-150 opt]# chown -R citusdb:citusdb /opt/citusdb
# Repeat on the other nodes.
        

# The next two steps (creating the data directories on all nodes, preferably at identical paths, and initializing the databases) are skipped here, because the citusdb-2.0.0-1.x86_64.rpm install initializes the database automatically. Do them manually if yours did not.

# On the master, configure the worker list:

[root@db-172-16-3-150 opt]# vi /opt/citusdb/2.0/data/pg_worker_list.conf

db-172-16-3-33.sky-mobi.com 9900
db-172-16-3-39.sky-mobi.com 9900
db-172-16-3-40.sky-mobi.com 9900

# postgresql.conf below sets the worker databases to listen on port 9900.
        

# On the master, configure postgresql.conf and pg_hba.conf:

[root@db-172-16-3-150 opt]# vi /opt/citusdb/2.0/data/postgresql.conf

listen_addresses = '0.0.0.0'
port = 9900
unix_socket_directory = '.'
unix_socket_permissions = 0700
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 2048MB
maintenance_work_mem = 1024MB
max_stack_depth = 8MB
synchronous_commit = off
wal_buffers = 16384kB
wal_writer_delay = 10ms
checkpoint_segments = 32
random_page_cost = 1.0
effective_cache_size = 81920MB
log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 10MB
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
autovacuum = on
log_autovacuum_min_duration = 0

[root@db-172-16-3-150 data]# vi /opt/citusdb/2.0/data/pg_hba.conf

host all all 127.0.0.1/32 trust
host all all 172.16.3.150/32 trust
host all all 172.16.3.33/32 trust
host all all 172.16.3.39/32 trust
host all all 172.16.3.40/32 trust
host all all 0.0.0.0/0 md5

# 127.0.0.1 is trusted mainly for the hadoop-sync process.
        

# On the workers, configure postgresql.conf and pg_hba.conf:

[root@db-172-16-3-33 opt]# vi /opt/citusdb/2.0/data/postgresql.conf

# (identical to the master's postgresql.conf above)

[root@db-172-16-3-33 data]# vi /opt/citusdb/2.0/data/pg_hba.conf

host all all 172.16.3.150/32 trust
host all all 172.16.3.33/32 trust
host all all 172.16.3.39/32 trust
host all all 172.16.3.40/32 trust
host all all 0.0.0.0/0 md5

# Repeat on the other workers.
     
    
         

# On all nodes, configure the citusdb user's environment variables:

[root@db-172-16-3-150 data]# vi /home/citusdb/.bash_profile

export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=9900
export PGUSER=postgres
export PGDATA=/opt/citusdb/2.0/data
export PGHOST=$PGDATA
export PGDATABASE=digoal
export LANG=en_US.utf8
export PGHOME=/opt/citusdb/2.0
export LD_LIBRARY_PATH=$PGHOME/lib:$PGHOME/lib/postgresql:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
#export MANPATH=$PGHOME/share/man:$MANPATH
export LD_PRELOAD=/usr/lib64/libncurses.so
alias rm='rm -i'
alias ll='ls -lh'
   
        

# Start the citusdb cluster:

[root@db-172-16-3-150 data]# su - citusdb
citusdb@db-172-16-3-150-> pg_ctl start -D $PGDATA
server starting

[root@db-172-16-3-33 data]# su - citusdb
citusdb@db-172-16-3-33-> pg_ctl start -D $PGDATA
server starting

[root@db-172-16-3-39 data]# su - citusdb
citusdb@db-172-16-3-39-> pg_ctl start -D $PGDATA
server starting

[root@db-172-16-3-40 data]# su - citusdb
citusdb@db-172-16-3-40-> pg_ctl start -D $PGDATA
server starting
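Before going further, it is worth confirming that the master can reach each worker's postmaster on port 9900. A sketch relying on the trust entries added to pg_hba.conf above:

# run as citusdb on the master; each iteration should print the server version
for h in db-172-16-3-33 db-172-16-3-39 db-172-16-3-40; do
    psql -h $h.sky-mobi.com -p 9900 -U postgres -d postgres -c 'select version();'
done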
   
        

# On all nodes, install file_fdw in the postgres database (hadoop_sync currently does not support configuring the database name or schema). Details: http://blog.163.com/digoal@126/blog/static/16387704020132192011747/

# Download
[root@db-172-16-3-150 soft_bak]# su - citusdb
citusdb@db-172-16-3-150-> wget --no-check-certificate https://github.com/citusdata/file_fdw/archive/master.zip
citusdb@db-172-16-3-150-> unzip master
Archive:  master

# Build and install
su - root
[root@db-172-16-3-150 data05]# . /home/citusdb/.bash_profile 
root@db-172-16-3-150-> which pg_config
/opt/citusdb/2.0/bin/pg_config
root@db-172-16-3-150-> cd /home/citusdb/file_fdw-master/
root@db-172-16-3-150-> gmake USE_PGXS=1 clean
root@db-172-16-3-150-> gmake USE_PGXS=1
root@db-172-16-3-150-> gmake USE_PGXS=1 install

# Repeat on the other nodes.
        

# On the master, create the foreign table in the postgres database:

citusdb@db-172-16-3-150-> psql postgres postgres
psql (9.2.1)
Type "help" for help.

postgres=# create extension file_fdw;
CREATE EXTENSION

postgres=# CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER

postgres=# CREATE FOREIGN TABLE customer_reviews
(
    customer_id TEXT not null,
    review_date DATE not null,
    review_rating INTEGER not null,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10) not null,
    product_title TEXT not null,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
)
DISTRIBUTE BY APPEND (review_date)
SERVER file_server
OPTIONS (filename '', hdfs_directory_path '/user/data/reviews', format 'csv');
CREATE FOREIGN TABLE

# Note: /user/data/reviews is a directory in HDFS.
        

# On the master, install hadoop_sync. Details: http://blog.163.com/digoal@126/blog/static/16387704020132189835346/

wget http://mirror.bjtu.edu.cn/apache/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
tar -zxvf apache-maven-3.0.5-bin.tar.gz
mv apache-maven-3.0.5 /opt/
yum install java
java -version
which java
yum install java-devel
javac -version
which javac
vi ~/.bash_profile
export M2_HOME=/opt/apache-maven-3.0.5
export M2=$M2_HOME/bin
export MAVEN_OPTS="-Xms256m -Xmx512m"
export JAVA_HOME=/usr
export PATH=$M2:$JAVA_HOME/bin:$PATH:.

. ~/.bash_profile
mvn --version
cd /home/citusdb/
wget --no-check-certificate https://github.com/citusdata/hadoop-sync/archive/master.zip -O hadoop-sync.zip
unzip hadoop-sync.zip
cd hadoop-sync-master
mvn install
[root@db-172-16-3-150 hadoop-sync-master]# ll
total 16
-rw-r--r-- 1 root root 2533 Feb 16 04:56 pom.xml
-rw-r--r-- 1 root root 3047 Feb 16 04:56 README.md
drwxr-xr-x 3 root root 4096 Feb 16 04:56 src
drwxr-xr-x 7 root root 4096 Mar 22 09:15 target
         

# On the master, configure hadoop-sync. It must run as the user that started hadoop, or you will hit hadoop block get permission problems:

[root@db-172-16-3-150 hadoop-sync-master]# mkdir /opt/hadoop-sync
[root@db-172-16-3-150 hadoop-sync-master]# mv /home/citusdb/hadoop-sync-master/target /opt/hadoop-sync/
[root@db-172-16-3-150 hadoop-sync-master]# chown -R digoal:digoal /opt/hadoop-sync

[root@db-172-16-3-150 hadoop-sync-master]# su - digoal
digoal@db-172-16-3-150-> cd /opt/hadoop-sync/
digoal@db-172-16-3-150-> cp target/classes/sync.properties .
digoal@db-172-16-3-150-> vi /opt/hadoop-sync/sync.properties

# HDFS related cluster configuration settings
HdfsMasterNodeName = 127.0.0.1
HdfsMasterNodePort = 9000
HdfsWorkerNodePort = 50020

# CitusDB related cluster configuration settings
CitusMasterNodeName = 127.0.0.1
CitusMasterNodePort = 9900
CitusWorkerNodePort = 9900
         

# On the master, run hadoop_sync to synchronize the metadata from the namenode:

[root@db-172-16-3-150 hadoop-sync-master]# su - digoal
digoal@db-172-16-3-150-> java -jar /opt/hadoop-sync/target/hadoop-sync-0.1.jar customer_reviews --fetch-min-max
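hadoop-sync registers the HDFS files it finds at the time it runs, so it presumably has to be re-run whenever new files land in /user/data/reviews. A hypothetical cron entry for the digoal user that re-syncs every 10 minutes (the log path is illustrative):

# crontab -e as digoal
*/10 * * * * java -jar /opt/hadoop-sync/target/hadoop-sync-0.1.jar customer_reviews --fetch-min-max >> /tmp/hadoop-sync.log 2>&1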
        

# SQL queries:

[root@db-172-16-3-150 hdfs]# su - citusdb
citusdb@db-172-16-3-150-> psql -h 127.0.0.1 -p 9900 -U digoal postgres
psql (9.2.1)
Type "help" for help.

postgres=# select count(*) from customer_reviews ;
  count  
---------
 1762499
(1 row)

Time: 3101.759 ms

postgres=# select * from customer_reviews limit 1;
  customer_id  | review_date | review_rating | review_votes | review_helpful_votes | product_id |      product_title      | product_sales_rank | product_group | product_category | product_subcategory |                   similar_product_ids                    
---------------+-------------+---------------+--------------+----------------------+------------+-------------------------+--------------------+---------------+------------------+---------------------+-----------------------------------------------------------
 AQV87I9Y4CIQF | 1998-09-06  |             5 |           56 |                   55 | 1561580368 | Building for a Lifetime |             215662 | Book          | Home & Garden    | Home Design         | {1589230612,1881955656,0140258094,1931498113,0070171513}
(1 row)

Time: 604.151 ms

postgres=# select * from pg_dist_partition ;
 logicalrelid | partmethod |                                                          partkey                                                           
--------------+------------+----------------------------------------------------------------------------------------------------------------------------
        16403 | a          | {VAR :varno 1 :varattno 2 :vartype 1082 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 2 :location 436}
(1 row)

Time: 0.386 ms

postgres=# select * from pg_dist_shard;
 logicalrelid |       shardid        | shardstorage | shardalias | shardminvalue | shardmaxvalue 
--------------+----------------------+--------------+------------+---------------+---------------
        16403 | -1221512698612183530 | f            |            | 1999-01-01    | 1999-05-13
        16403 |  2698407172708592541 | f            |            | 1999-05-13    | 1999-09-11
        16403 |   631600631725577683 | f            |            | 1998-09-06    | 1998-12-31
        16403 | -2048097437225231483 | f            |            | 1999-09-11    | 1999-12-31
        16403 |  3598423008504059948 | f            |            | 1970-12-30    | 1998-09-06
(5 rows)

Time: 0.502 ms

postgres=# select * from pg_dist_shard_placement ;
       shardid        | shardstate | shardlength |          nodename           | nodeport 
----------------------+------------+-------------+-----------------------------+----------
 -2048097437225231483 |          1 |    64029428 | db-172-16-3-33.sky-mobi.com |     9900
 -2048097437225231483 |          1 |    64029428 | db-172-16-3-39.sky-mobi.com |     9900
 -2048097437225231483 |          1 |    64029428 | db-172-16-3-40.sky-mobi.com |     9900
 -1221512698612183530 |          1 |    67108864 | db-172-16-3-33.sky-mobi.com |     9900
 -1221512698612183530 |          1 |    67108864 | db-172-16-3-39.sky-mobi.com |     9900
 -1221512698612183530 |          1 |    67108864 | db-172-16-3-40.sky-mobi.com |     9900
   631600631725577683 |          1 |    34190254 | db-172-16-3-33.sky-mobi.com |     9900
   631600631725577683 |          1 |    34190254 | db-172-16-3-39.sky-mobi.com |     9900
   631600631725577683 |          1 |    34190254 | db-172-16-3-40.sky-mobi.com |     9900
  2698407172708592541 |          1 |    67108864 | db-172-16-3-33.sky-mobi.com |     9900
  2698407172708592541 |          1 |    67108864 | db-172-16-3-39.sky-mobi.com |     9900
  2698407172708592541 |          1 |    67108864 | db-172-16-3-40.sky-mobi.com |     9900
  3598423008504059948 |          1 |    67108864 | db-172-16-3-33.sky-mobi.com |     9900
  3598423008504059948 |          1 |    67108864 | db-172-16-3-39.sky-mobi.com |     9900
  3598423008504059948 |          1 |    67108864 | db-172-16-3-40.sky-mobi.com |     9900
(15 rows)

Time: 0.541 ms
    
   

      
postgres=# SELECT
    customer_id, review_date, review_rating, product_id, product_title
FROM
    customer_reviews
WHERE
    customer_id ='A27T7HVDXA3K2A' AND
    product_title LIKE '%Dune%' AND
    review_date >= '1998-01-01' AND
    review_date <= '1998-12-31';
  customer_id   | review_date | review_rating | product_id |                 product_title                 
----------------+-------------+---------------+------------+-----------------------------------------------
 A27T7HVDXA3K2A | 1998-04-10  |             5 | 0399128964 | Dune (Dune Chronicles (Econo-Clad Hardcover))
 A27T7HVDXA3K2A | 1998-04-10  |             5 | 044100590X | Dune
 A27T7HVDXA3K2A | 1998-04-10  |             5 | 0441172717 | Dune (Dune Chronicles, Book 1)
 A27T7HVDXA3K2A | 1998-04-10  |             5 | 0881036366 | Dune (Dune Chronicles (Econo-Clad Hardcover))
 A27T7HVDXA3K2A | 1998-04-10  |             5 | 1559949570 | Dune Audio Collection
(5 rows)

Time: 3102.624 ms

postgres=# SELECT
    width_bucket(length(product_title), 1, 50, 5) title_length_bucket,
    round(avg(review_rating), 2) AS review_average,
    count(*)
FROM
   customer_reviews
WHERE
    product_group = 'Book'
GROUP BY
    title_length_bucket
ORDER BY
    title_length_bucket;
 title_length_bucket | review_average | count  
---------------------+----------------+--------
                   1 |           4.26 | 139034
                   2 |           4.24 | 411317
                   3 |           4.34 | 245670
                   4 |           4.32 | 167361
                   5 |           4.30 | 118421
                   6 |           4.40 | 116411
(6 rows)

Time: 3403.387 ms
   
  

      

[Notes] 1. Do not use a symlink for the namenode's transaction log directory, or namenode format may fail. 2. hadoop_sync currently does not support configuring the database role, database name, or schema, because the URL used to connect to worker nodes is hard-coded in CitusWorkerNode.java:

/* connection string format used in connecting to worker node */
private static final String CONNECTION_STRING_FORMAT =
        "jdbc:postgresql://%s:%d/postgres";
  

      

The database role cannot be specified either:

digoal@db-172-16-3-150-> java -jar /opt/hadoop-sync/target/hadoop-sync-0.1.jar customer_reviews --fetch-min-max
2013-03-22 09:35:29,626 [main] ERROR hdfs.HdfsSynchronizer - could not synchronize table metadata
org.postgresql.util.PSQLException: FATAL: role "citusdb" does not exist
        at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:469)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:112)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:125)
        at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
        at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:22)
        at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:30)
        at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24)
        at org.postgresql.Driver.makeConnection(Driver.java:393)
        at org.postgresql.Driver.connect(Driver.java:267)
        at java.sql.DriverManager.getConnection(DriverManager.java:620)
        at java.sql.DriverManager.getConnection(DriverManager.java:222)
        at com.citusdata.sync.hdfs.CitusMasterNode.<init>(CitusMasterNode.java:81)
        at com.citusdata.sync.hdfs.HdfsSynchronizer.calculateMetadataDifference(HdfsSynchronizer.java:152)
        at com.citusdata.sync.hdfs.HdfsSynchronizer.main(HdfsSynchronizer.java:69)

Workaround: on the master and the workers, create a database role matching the OS user that invokes the hadoop-sync script (digoal in this example).

[root@db-172-16-3-150 hadoop]# su - citusdb
citusdb@db-172-16-3-150-> psql postgres postgres
psql (9.2.1)
Type "help" for help.
postgres=# create role digoal superuser login encrypted password 'DigoAL';
CREATE ROLE

# Repeat on the other worker nodes.
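Since the role must exist on every node, creating it can be scripted from the master; a sketch using the trust entries in pg_hba.conf above:

# run as citusdb on the master
for h in db-172-16-3-150 db-172-16-3-33 db-172-16-3-39 db-172-16-3-40; do
    psql -h $h.sky-mobi.com -p 9900 -U postgres -d postgres \
         -c "create role digoal superuser login encrypted password 'DigoAL';"
done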
        

3. hadoop's getBlockLocalPathInfo can only be called by the user that started Hadoop. Calling it as another user, such as citusdb, fails like this:

2013-03-22 10:45:00,859 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:citusdb cause:org.apache.hadoop.security.AccessControlException: Can't continue with getBlockLocalPathInfo() authorization. The user citusdb is not allowed to call getBlockLocalPathInfo

Workaround: run the hadoop-sync script as the hadoop user.

[root@db-172-16-3-150 opt]# chown -R digoal:digoal hadoop-sync
[root@db-172-16-3-150 opt]# su - digoal
digoal@db-172-16-3-150-> java -jar /opt/hadoop-sync/target/hadoop-sync-0.1.jar customer_reviews --fetch-min-max
2013-03-22 10:54:18,453 [main] INFO  hdfs.HdfsSynchronizer - synchronized metadata for table: customer_reviews
        

4. Mapping one IP to multiple hostnames in /etc/hosts can cause problems. For example, with hostnames configured as:

cat /etc/hosts
172.16.3.150 db-172-16-3-150.sky-mobi.com db-172-16-3-150
172.16.3.33 db-172-16-3-33.sky-mobi.com db-172-16-3-33
172.16.3.39 db-172-16-3-39.sky-mobi.com db-172-16-3-39
172.16.3.40 db-172-16-3-40.sky-mobi.com db-172-16-3-40
   
  

      

while pg_worker_list.conf uses the short names:

[root@db-172-16-3-150 opt]# vi /opt/citusdb/2.0/data/pg_worker_list.conf
db-172-16-3-33 9900
db-172-16-3-39 9900
db-172-16-3-40 9900
  

      

queries may fail with:

postgres=# select * from customer_reviews limit 1;
ERROR:  failed to assign 5 task(s) to worker nodes
  

      

The cause is that the nodename values recorded in the metadata do not match the entries in pg_worker_list.conf:

postgres=# select * from pg_dist_shard_placement ;
       shardid        | shardstate | shardlength |          nodename           | nodeport 
----------------------+------------+-------------+-----------------------------+----------
 -2048097437225231483 |          1 |    64029428 | db-172-16-3-33.sky-mobi.com |     9900
 -2048097437225231483 |          1 |    64029428 | db-172-16-3-39.sky-mobi.com |     9900
 -2048097437225231483 |          1 |    64029428 | db-172-16-3-40.sky-mobi.com |     9900
 -1221512698612183530 |          1 |    67108864 | db-172-16-3-33.sky-mobi.com |     9900
 -1221512698612183530 |          1 |    67108864 | db-172-16-3-39.sky-mobi.com |     9900
 -1221512698612183530 |          1 |    67108864 | db-172-16-3-40.sky-mobi.com |     9900
   631600631725577683 |          1 |    34190254 | db-172-16-3-33.sky-mobi.com |     9900
   631600631725577683 |          1 |    34190254 | db-172-16-3-39.sky-mobi.com |     9900
   631600631725577683 |          1 |    34190254 | db-172-16-3-40.sky-mobi.com |     9900
  2698407172708592541 |          1 |    67108864 | db-172-16-3-33.sky-mobi.com |     9900
  2698407172708592541 |          1 |    67108864 | db-172-16-3-39.sky-mobi.com |     9900
  2698407172708592541 |          1 |    67108864 | db-172-16-3-40.sky-mobi.com |     9900
  3598423008504059948 |          1 |    67108864 | db-172-16-3-33.sky-mobi.com |     9900
  3598423008504059948 |          1 |    67108864 | db-172-16-3-39.sky-mobi.com |     9900
  3598423008504059948 |          1 |    67108864 | db-172-16-3-40.sky-mobi.com |     9900
(15 rows)
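A quick way to catch this mismatch is to compare the nodenames in the metadata against the worker list; a sketch run on the master:

# the two outputs must use exactly the same hostnames
psql -h 127.0.0.1 -p 9900 -U postgres -d postgres \
     -c 'select distinct nodename, nodeport from pg_dist_shard_placement;'
cat /opt/citusdb/2.0/data/pg_worker_list.conf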
  

      

The fix is to make them consistent: update /etc/hosts, pg_worker_list.conf, and the hostname-related parts of the hadoop configuration together.

[References]
1. http://www.citusdata.com/docs/sql-on-hadoop
2. http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html
3. http://blog.163.com/digoal@126/blog/static/1638770402013219840831/
4. http://blog.163.com/digoal@126/blog/static/16387704020132192011747/
5. http://blog.163.com/digoal@126/blog/static/16387704020132189835346/