
Hadoop study notes, day 4: setting up the Pig environment

Installing and configuring Pig 0.12.0

1. Download Pig 0.12.0.

2. Unpack the archive and set the environment variables:

export JAVA_HOME=/usr/java/jdk1.7.0_45

export HADOOP_HOME=/home/hdpuser/hadoop-2.2.0

export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:/home/hdpuser/pig-0.12.0/bin:/home/hdpuser/apache-ant-1.9.2/bin

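export PIG_HADOOP_VERSION=23

3. Run pig -x local.

4. A simple file load fails with the following warning:

WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher

5. After consulting various sources, I took the following actions (some of them may have had no effect):

a. Following RELEASE_NOTES.txt, perform these steps: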

1. Download pig-0.12.0.tar.gz
2. Unpack the file: tar -xzvf pig-0.12.0.tar.gz
3. Move into the installation directory: cd pig-0.12.0
4. To run pig without Hadoop cluster, execute the command below. This will
take you into an interactive shell called grunt that allows you to navigate
the local file system and execute Pig commands against the local files
    bin/pig -x local
5. To run on your Hadoop cluster, you need to set PIG_CLASSPATH environment
variable to point to the directory with your hadoop-site.xml file and then run
pig. The commands below will take you into an interactive shell called grunt
that allows you to navigate Hadoop DFS and execute Pig commands against it
    export PIG_CLASSPATH=/hadoop/conf
    bin/pig
6. To build your own version of pig.jar run
    ant
7. To run unit tests run
    ant test
8. To build jar file with available user defined functions run commands below.
    cd contrib/piggybank/java
    ant
9. To build the tutorial:
    cd tutorial
    ant
10. To run tutorial follow instructions in http://wiki.apache.org/pig/PigTutorial
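
Step 6 above runs a plain ant build. The Pig 0.12.0 build also takes a hadoopversion property, which is the setting discussed in step b. As far as I know it accepts 20 (Hadoop 0.20.x and 1.x) and 23 (Hadoop 0.23.x and 2.x). The sketch below shows one way to pick the value from a Hadoop release string. The function name pig_hadoopversion is hypothetical and not part of Pig or Hadoop, and older values such as 18 are deliberately not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): map a Hadoop release string to the
# value passed as -Dhadoopversion when building Pig 0.12.0 with ant.
pig_hadoopversion() {
    case "$1" in
        0.23.*|2.*) echo 23 ;;   # Hadoop 0.23.x and 2.x
        0.20.*|1.*) echo 20 ;;   # Hadoop 0.20.x and 1.x
        *) echo "unsupported Hadoop version: $1" >&2; return 1 ;;
    esac
}

# Build the ant command line for the Hadoop 2.2.0 setup used in these notes.
hv=$(pig_hadoopversion 2.2.0)
echo "ant clean jar-withouthadoop -Dhadoopversion=$hv"
# prints: ant clean jar-withouthadoop -Dhadoopversion=23
```

The script only prints the command, so it can be reviewed before it is run from the pig-0.12.0 directory.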
           

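b. The problem persisted. I then found the following suggestion online:

ant clean jar-withouthadoop -Dhadoopversion=23

I was unsure what value to use: my environment is Hadoop 2.2.0, and I had only seen 18 used for 0.18 and 20 for 0.20. I ran the command as given to see what would happen, and the problem went away. What I still do not know is whether step a was actually necessary.

6. Copy /etc/passwd into the current directory and run pig -x local (local mode).

7. Following the official documentation (http://pig.apache.org/docs/r0.12.0/start.html), run: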

grunt> A = load 'passwd' using PigStorage(':'); 
grunt> B = foreach A generate $0 as id; 
grunt> dump B; 
           

The result is returned correctly.
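
The same local-mode check can also be run non-interactively. The sketch below is one possible way to do it and is not from the original notes. It writes the three statements to a script file and passes that file to pig -x local. The file name passwd_ids.pig and the temporary working directory are assumptions made for illustration, and the run is skipped when pig is not on the PATH.

```shell
#!/bin/sh
# Work in a scratch directory so the current directory is left untouched.
workdir=$(mktemp -d)
cp /etc/passwd "$workdir/passwd"

# Quoted heredoc: the shell must not expand $0, which Pig reads as
# "the first field of each record".
cat > "$workdir/passwd_ids.pig" <<'EOF'
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
EOF

if command -v pig >/dev/null 2>&1; then
    # The relative path 'passwd' resolves against the directory pig starts in.
    (cd "$workdir" && pig -x local passwd_ids.pig)
else
    echo "pig not found on PATH; script left in $workdir/passwd_ids.pig"
fi
```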

Testing that Pig works

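Next, test whether Pig works in Hadoop mode. For this test I switched the cluster back to pseudo-distributed mode. In fully distributed mode there are still errors that I have not resolved; my initial guess is that they are related to the cluster configuration.

1. Copy the data file that ships with the Pig tutorial to HDFS:

hadoop fs -copyFromLocal /home/hdpuser/pig-0.12.0/tutorial/data/excite-small.log /user/hdpuser/excite-small.log

2. Run pig -x mapreduce to enter Pig's Hadoop mode.

3. grunt> cd /user

4. grunt> cd hdpuser

5. Run ls to check that the copied file exists.

6. grunt> log = LOAD '/user/hdpuser/excite-small.log' AS (user:chararray, time:long, query:chararray);

7. grunt> lmt = LIMIT log 4;

8. grunt> DUMP lmt;

9. The result is returned correctly: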

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias Feature  Outputs
job_local1555930682_0002        1       1       0       0       0       0       0       0       0       0       lmt,log
job_local1844986210_0003        1       1       0       0       0       0       0       0       0       0       log             hdfs://localhost:9000/tmp/temp1201738014/tmp-313826598,


Input(s):
Successfully read 0 records from: "/user/hdpuser/excite-small.log"


Output(s):
Successfully stored 0 records in: "hdfs://localhost:9000/tmp/temp1201738014/tmp-313826598"


Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0


Job DAG:
job_local1555930682_0002        ->      job_local1844986210_0003,
job_local1844986210_0003




2013-12-08 04:45:37,849 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-12-08 04:45:37,852 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2013-12-08 04:45:37,852 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2013-12-08 04:45:37,852 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-12-08 04:45:37,860 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-12-08 04:45:37,860 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(2A9EABFB35F5B954,970916105432,+md foods +proteins)
(BED75271605EBD0C,970916001949,yahoo chat)
(BED75271605EBD0C,970916001954,yahoo chat)
(BED75271605EBD0C,970916003523,yahoo chat)
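
A mistyped HDFS path is an easy mistake in the LOAD statement, so it can help to keep the path in one place and confirm the file exists before starting Pig. The following is a minimal sketch under stated assumptions and not part of the original session: the script name excite_limit.pig and the temporary directory are made up for illustration, and both checks are skipped when hadoop or pig is not on the PATH.

```shell
#!/bin/sh
# HDFS location the tutorial file was copied to in step 1.
input=/user/hdpuser/excite-small.log

workdir=$(mktemp -d)

# Unquoted heredoc on purpose: $input is expanded by the shell, so the LOAD
# path always matches the path checked below.
cat > "$workdir/excite_limit.pig" <<EOF
log = LOAD '$input' AS (user:chararray, time:long, query:chararray);
lmt = LIMIT log 4;
DUMP lmt;
EOF

if command -v hadoop >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
    # 'hadoop fs -test -e' exits with status 0 only if the path exists in HDFS.
    if hadoop fs -test -e "$input"; then
        pig -x mapreduce "$workdir/excite_limit.pig"
    else
        echo "input not found in HDFS: $input" >&2
    fi
else
    echo "hadoop or pig not on PATH; script left in $workdir/excite_limit.pig"
fi
```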
           
