
Big Data Platform Operations: Flume

Big Data Series: Operations (on a self-built big data platform)

(8) Flume Operations

  1. Install and start the Flume component on the master node, then open a Linux shell and run the flume-ng help command to view usage information for flume-ng.
[root@master ~]# flume-ng help
Usage: /usr/flume/apache-flume-1.9.0-bin/bin/flume-ng <command> [options]...

commands:
  help                      display this help text
  agent                     run a Flume agent
  avro-client               run an avro Flume client
  version                   show Flume version info

global options:
  --conf,-c <conf>          use configs in <conf> directory
  --classpath,-C <cp>       append to the classpath
  --dryrun,-d               do not actually start Flume, just print the command
  --plugins-path <dirs>     colon-separated list of plugins.d directories. See the
                            plugins.d section in the user guide for more details.
                            Default: $FLUME_HOME/plugins.d
  -Dproperty=value          sets a Java system property value
  -Xproperty=value          sets a Java -X option

agent options:
  --name,-n <name>          the name of this agent (required)
  --conf-file,-f <file>     specify a config file (required if -z missing)
  --zkConnString,-z <str>   specify the ZooKeeper connection to use (required if -f missing)
  --zkBasePath,-p <path>    specify the base path in ZooKeeper for agent configs
  --no-reload-conf          do not reload config file if changed
  --help,-h                 display help text

avro-client options:
  --rpcProps,-P <file>   RPC client properties file with server connection params
  --host,-H <host>       hostname to which events will be sent
  --port,-p <port>       port of the avro source
  --dirname <dir>        directory to stream to avro source
  --filename,-F <file>   text file to stream to avro source (default: std input)
  --headerFile,-R <file> File containing event headers as key/value pairs on each new line
  --help,-h              display help text

  Either --rpcProps or both --host and --port must be specified.

Note that if <conf> directory is specified, then it is always included first
in the classpath.

  2. Using the provided template file log-example.conf, use the Flume NG tool to collect the system log /var/log/secure on the master node, prefix the names of the collected log files with "xiandian-sec", and store them in the /1daoyun/file/flume directory of the HDFS file system, with the timestamp of files produced in HDFS rounded to 10 minutes. After collection, list the contents of /1daoyun/file/flume in HDFS.
[root@master Flume]# vi log-example.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://master:9000/1daoyun/file/flume
a1.sinks.k1.hdfs.filePrefix = xiandian-sec
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

The HDFS directory in a1.sinks.k1.hdfs.path = hdfs://master:9000/1daoyun/file/flume must be created in advance.
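The hdfs.round, hdfs.roundValue, and hdfs.roundUnit settings round each event's timestamp down before it is substituted into any time escape sequences in hdfs.path (the path above contains none, so the rounding is not visible in the directory layout here). A minimal Python sketch of the 10-minute rounding, using the epoch-millisecond timestamps Flume works with (the helper name is mine, not Flume's):

```python
def round_down(ts_ms, round_value=10, unit_ms=60_000):
    """Round an epoch-millis timestamp down to a round_value * unit
    boundary, mimicking hdfs.round=true, roundValue=10, roundUnit=minute."""
    bucket_ms = round_value * unit_ms
    return ts_ms - (ts_ms % bucket_ms)

# An event stamped 2020-04-02 15:15:01 falls into the 15:10 bucket
print(round_down(1585811701465))  # 1585811400000
```

With a time escape such as %H%M in hdfs.path, this is what would send all events from a 10-minute window into the same HDFS directory.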
[root@master Flume]# flume-ng agent --conf . -f /root/tiku/Flume/log-example.conf -n a1 -Dflume.root.logger=INFO,console

After the Flume service starts, the shell stays on this screen (the agent runs in the foreground).

[root@master ~]# hadoop fs -ls /1daoyun/file/flume
-rw-r--r--   2 root supergroup       1237 2020-04-02 15:15 /1daoyun/file/flume/xiandian-sec.1585811701465
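The file name in the listing follows the HDFS sink's naming scheme: filePrefix, a dot, then a millisecond counter seeded from the epoch time (an optional fileSuffix would follow, unused here). A small sketch of that scheme (the function name is mine):

```python
def hdfs_sink_name(prefix, counter_ms, suffix=""):
    """Build a file name the way Flume's HDFS sink does:
    <filePrefix>.<millisecond counter><fileSuffix>."""
    return f"{prefix}.{counter_ms}{suffix}"

print(hdfs_sink_name("xiandian-sec", 1585811701465))
# xiandian-sec.1585811701465
```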

  3. Using the provided template file hdfs-example.conf, use the Flume NG tool to set the system path /opt/xiandian/ on the master node as a real-time upload path to the HDFS file system, set the HDFS storage path to /data/flume/, keep the uploaded file names unchanged, set the file type to DataStream, and then start the flume-ng agent.
[root@master Flume]# vi hdfs-example.conf
# Name the components on this agent
master.sources = webmagic
master.sinks = k1
master.channels = c1
# Describe/configure the source
master.sources.webmagic.type = spooldir
master.sources.webmagic.fileHeader = true
master.sources.webmagic.fileHeaderKey = fileName
master.sources.webmagic.fileSuffix = .COMPLETED
master.sources.webmagic.deletePolicy = never
master.sources.webmagic.spoolDir = /opt/xiandian/
master.sources.webmagic.ignorePattern = ^$
master.sources.webmagic.consumeOrder = oldest
master.sources.webmagic.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
master.sources.webmagic.batchSize = 5
master.sources.webmagic.channels = c1
# Use a channel which buffers events in memory
master.channels.c1.type = memory
# Describe the sink
master.sinks.k1.type = hdfs
master.sinks.k1.channel = c1
master.sinks.k1.hdfs.path = hdfs://master:9000/data/flume/%{dicName}
master.sinks.k1.hdfs.filePrefix = %{fileName}
master.sinks.k1.hdfs.fileType = DataStream

master.sources.webmagic.spoolDir = /opt/xiandian/
master.sinks.k1.hdfs.path = hdfs://master:9000/data/flume/%{dicName}
Check on your own node whether these two paths exist, and create them if not. The first is a local file-system path; the second is an HDFS path.
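To see what the spooling-directory source does with files dropped into /opt/xiandian/, here is a toy Python model of the lifecycle (this is an illustration, not Flume code): each new file is delivered to the destination with its name unchanged, and the original is renamed with the fileSuffix (.COMPLETED) so it is not consumed again.

```python
import os
import shutil

def consume_spool_dir(spool_dir, dest_dir, suffix=".COMPLETED"):
    """Toy model of a spooling-directory source feeding a file sink:
    deliver each unconsumed file (name unchanged), then mark the
    original as done by appending the suffix. Files are taken in
    sorted-name order here; real Flume with consumeOrder=oldest
    orders by modification time."""
    delivered = []
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(suffix):
            continue  # already consumed on a previous pass
        src = os.path.join(spool_dir, name)
        shutil.copy(src, os.path.join(dest_dir, name))
        os.rename(src, src + suffix)
        delivered.append(name)
    return delivered
```

This mirrors the quick test later in the post: after the agent picks up abc, only abc.COMPLETED remains in the spool directory.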

[root@master Flume]# flume-ng agent --conf . -f /root/tiku/Flume/hdfs-example.conf -n master -Dflume.root.logger=INFO,console

The value after -n is the agent name, i.e. the leading element of lines such as master.sources = webmagic.


A quick test:

[root@master ~]# cd /opt/xiandian/
[root@master xiandian]# vi abc
aaa
bbb
ccc

[root@master xiandian]# hadoop fs -put /opt/xiandian/abc.COMPLETED /data/flume

Now check the server side again:


Compare the time when the service was started with the server's time after the test; it has changed, showing that the agent processed the new file.

Thanks to XianDian Cloud (先電雲) for providing the question bank.

Thanks to Apache for its open-source technology and support.

Thanks to the bloggers 抛物線, mn525520, and 菜鳥一枚2019 for their related blog posts.
