
Hadoop learning notes -- Day 7 -- Reading and writing HDFS programmatically

    Although I don't plan to do serious MapReduce development in Java, after these few days of study I keep feeling that without writing a few MapReduce programs I may never really grasp some of Hadoop's ideas. When using Pig or Hive, much of the time you don't even notice that HDFS is there.

    I found a piece of HDFS read/write code online, taken from the book Hadoop in Action. Since I had no Java programming background at all, today's goal was mainly to set up a Java development environment in which the copied code could be compiled and run.

    The material I found online kept mentioning hadoop-eclipse-plugin, but unfortunately I could not find one for Hadoop 2.2.0. I did find one on CSDN, but it would not work in my environment. Then I came across this at http://stackoverflow.com/questions/20021169/hadoop-2-2-0-eclipse-plugin: "There is no eclipse plugin for 2.2.0 its still in development stage but you cna try intellj which is also nice dev tool for java".

    Following that link to http://vichargrave.com/intellij-project-for-building-hadoop-the-definitive-guide-examples/, I downloaded IntelliJ IDEA 13 and made the project settings described in that article. It felt a lot like an include when compiling C.


     Then I copied the following code in, read through it to understand what it does, and added comments:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutMerge {

    public static void main(String[] args) throws IOException {

        Configuration conf = new Configuration();
        FileSystem hdfs  = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf); // get a FileSystem object for the local file system

        Path inputDir = new Path(args[0]); // argument 1: the local input directory
        Path hdfsFile = new Path(args[1]); // argument 2: the target file on HDFS

        try {
            FileStatus[] inputFiles = local.listStatus(inputDir); // list the files in the local input directory
            FSDataOutputStream out = hdfs.create(hdfsFile); // create the output file on HDFS
            // loop over the files in inputDir, appending each one's contents to the HDFS file
            for (int i=0; i<inputFiles.length; i++) {
                System.out.println(inputFiles[i].getPath().getName());
                FSDataInputStream in = local.open(inputFiles[i].getPath());
                byte buffer[] = new byte[256];
                int bytesRead = 0;
                while( (bytesRead = in.read(buffer)) > 0) {
                    out.write(buffer, 0, bytesRead);
                }
                in.close();
            }
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
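
    A side note from reading the code: the manual buffer loop can also be handled by Hadoop's own helper, org.apache.hadoop.io.IOUtils.copyBytes(). Below is a minimal sketch of that variant (my own rewrite, not from the book, and untested against this exact setup); copyBytes(in, out, 4096, false) copies with a 4 KB buffer and leaves the streams for the caller to close.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Hypothetical variant of PutMerge that lets IOUtils do the byte copying.
public class PutMergeIOUtils {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem hdfs  = FileSystem.get(conf);       // default (HDFS) file system
        FileSystem local = FileSystem.getLocal(conf);  // local file system

        FileStatus[] inputFiles = local.listStatus(new Path(args[0]));
        FSDataOutputStream out = hdfs.create(new Path(args[1]));
        for (FileStatus file : inputFiles) {
            System.out.println(file.getPath().getName());
            FSDataInputStream in = local.open(file.getPath());
            // 4096-byte buffer; 'false' = do not close the streams for us
            IOUtils.copyBytes(in, out, 4096, false);
            in.close();
        }
        out.close();
    }
}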
           

    Compile. This produces the corresponding PutMerge.class file, which ends up in /home/hdpuser/IdeaProjects/putmerg/out/production/putmerg.

   Then:  jar cvf PutMerge.jar -C /home/hdpuser/IdeaProjects/putmerg/out/production/putmerg .

   This produces the PutMerge.jar package.

   Next, create an input directory under /home/hdpuser and put two files in it, as follows:

[[email protected] ~]$ ls input
testdt  tetdt2
[[email protected] ~]$ cat input/*
wudi 123 aaa
cg 321 bbb
mm 111 ccc
a       b       c
x       y       z
           

   Test-run the program:

[[email protected] ~]$ hadoop jar PutMerge.jar PutMerge ~/input /output
testdt
tetdt2
           

  Check the generated output file in Hadoop (a programmatic version of the same check is sketched after the listing):

[[email protected] ~]$ hadoop fs -cat /output
wudi 123 aaa
cg 321 bbb
mm 111 ccc
a       b       c
x       y       z
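
    For the "read" half of today's topic, the same check can also be done from Java instead of with hadoop fs -cat. Here is a minimal sketch (the class name CatHdfs is mine, not from the book; it would be compiled and packaged the same way as PutMerge):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Hypothetical reader: prints an HDFS file to stdout, like 'hadoop fs -cat'.
public class CatHdfs {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);

        FSDataInputStream in = hdfs.open(new Path(args[0])); // argument 1: HDFS file to print
        try {
            IOUtils.copyBytes(in, System.out, 4096, false); // stream the contents to stdout
        } finally {
            in.close();
        }
    }
}

    Something like hadoop jar CatHdfs.jar CatHdfs /output should then print the same merged contents.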
           

   Afterwards, an old habit kicked in: I still wanted to compile the Java file from the command line. After reading up on it, I did the following:

   1. Figure out which JAR the classes imported in the code live in. (With no real Java background, I could not tell from the docs at http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/package-summary.html which JAR they belong to, and could only guess from the names.)

       My luck held: I guessed hadoop-common-2.2.0.jar right away and confirmed it with jar tvf (a small Java check, sketched after the listing below, would have worked too).

[[email protected] common]$ pwd
/home/hdpuser/hadoop-2.2.0/share/hadoop/common
[[email protected] common]$ ls
hadoop-common-2.2.0.jar  hadoop-common-2.2.0-tests.jar  hadoop-nfs-2.2.0.jar  jdiff  lib  sources  templates
[[email protected] common]$ jar tvf hadoop-common-2.2.0.jar | grep -w conf
     0 Fri Dec 06 21:35:02 CST 2013 org/apache/hadoop/conf/
   953 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$ParsedTimeDuration$5.class
  2470 Fri Dec 06 21:35:00 CST 2013 org/apache/hadoop/conf/ReconfigurableBase.class
   531 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Reconfigurable.class
  1901 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/ReconfigurationException.class
  3014 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$ParsedTimeDuration.class
  2351 Fri Dec 06 21:35:02 CST 2013 org/apache/hadoop/conf/ReconfigurationUtil.class
   451 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$NegativeCacheSentinel.class
   719 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/conf/Configurable.class
   770 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$IntegerRanges$Range.class
  3857 Fri Dec 06 21:35:02 CST 2013 org/apache/hadoop/conf/ConfServlet.class
   675 Fri Dec 06 21:35:02 CST 2013 org/apache/hadoop/conf/ReconfigurationUtil$PropertyChange.class
 46708 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration.class
   980 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$Resource.class
  2032 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$DeprecatedKeyInfo.class
  8355 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/ReconfigurationServlet.class
  2126 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$IntegerRanges$RangeNumberIterator.class
   950 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$ParsedTimeDuration$7.class
   951 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$ParsedTimeDuration$6.class
   953 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$ParsedTimeDuration$4.class
   958 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$ParsedTimeDuration$1.class
   959 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$ParsedTimeDuration$2.class
   236 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$1.class
   561 Fri Dec 06 21:35:02 CST 2013 org/apache/hadoop/conf/ConfServlet$BadFormatException.class
   959 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$ParsedTimeDuration$3.class
  1231 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/conf/Configured.class
  3611 Fri Dec 06 21:34:58 CST 2013 org/apache/hadoop/conf/Configuration$IntegerRanges.class
[[email protected] common]$ jar tvf hadoop-common-2.2.0.jar | grep -w FileSystem
  2347 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/fs/FileSystem$Cache$Key.class
  1535 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/fs/FileSystem$Cache$ClientFinalizer.class
   586 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/fs/FileSystem$3.class
  6182 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/fs/FileSystem$Cache.class
 46009 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/fs/FileSystem.class
  2764 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/fs/FileSystem$5.class
  2850 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/fs/FileSystem$Statistics.class
  1255 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/fs/FileSystem$1.class
  2541 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/fs/FileSystem$4.class
  1263 Fri Dec 06 21:34:56 CST 2013 org/apache/hadoop/fs/FileSystem$2.class
  1022 Fri Dec 06 21:34:56 CST 2013 META-INF/services/org.apache.hadoop.fs.FileSystem
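
    Incidentally, once a class is resolvable on the classpath, the JVM itself can report which JAR it was loaded from, which saves grepping jar tvf output by hand. A throwaway sketch (my own, compiled and run with the same -classpath as in step 2):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Throwaway utility: print the code source (normally the JAR) each class was loaded from.
public class WhichJar {

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] { Configuration.class, FileSystem.class }) {
            // getCodeSource() is non-null for classes loaded from a JAR on the classpath
            System.out.println(c.getName() + " -> "
                    + c.getProtectionDomain().getCodeSource().getLocation());
        }
    }
}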
           

     2. Compile on the command line. This produced 6 warnings, which I simply ignored (they only say that javac cannot find the annotation class org.apache.hadoop.classification.InterfaceAudience on this minimal classpath).

[[email protected] src]$ javac -classpath /home/hdpuser/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar PutMerge.java 
/home/hdpuser/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar(org/apache/hadoop/fs/FSDataInputStream.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
/home/hdpuser/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar(org/apache/hadoop/fs/FSDataOutputStream.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate'
/home/hdpuser/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar(org/apache/hadoop/fs/FileSystem.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate'
/home/hdpuser/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar(org/apache/hadoop/fs/FileSystem.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate'
/home/hdpuser/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar(org/apache/hadoop/fs/FileSystem.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate'
/home/hdpuser/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate'
6 warnings
           

      3. This produced the corresponding class file; after packaging it the same way and running it on Hadoop: OK.

      Summary:

      Today's content actually had little to do with Hadoop itself; it was really just Java compilation.

      I have never played with Java and never planned to touch it; I still prefer relatively "low-level" things like C. Hadoop does offer a way to write code in C++, but that never quite feels like the real thing. The next step is still to write some MapReduce programs; without doing that, I feel Hadoop loses a lot of its fun and hands-on value.
