Flume 1.5.0: Installation, Deployment, and Simple Applications (Including Pseudo-Distributed Mode and Examples with Hadoop 2.2.0 and HBase 0.96)

—————————————

Blog post author: 迦壹

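Contents

1. What is Flume?

1) Features of Flume

2) Reliability of Flume

3) Recoverability of Flume

4) Some core concepts of Flume

2. Where is Flume's official website?

3. Where to download

4. How to install

5. Flume examples

1) Case 1: avro

2) Case 2: spool

3) Case 3: exec

4) Case 4: syslogtcp

5) Case 5: JSONHandler

6) Case 6: Hadoop sink

7) Case 7: File roll sink

8) Case 8: Replicating channel selector

9) Case 9: Multiplexing channel selector

10) Case 10: Flume sink processors

11) Case 11: Load balancing sink processor

12) Case 12: HBase sink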

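1. What is Flume?

Flume is a real-time log collection system developed by Cloudera, and it has won industry recognition and wide adoption. Its initial releases are now collectively called Flume OG (original generation) and belonged to Cloudera. As Flume's functionality grew, however, Flume OG's weaknesses became apparent:

- a bloated codebase;
- poorly designed core components;
- non-standard core configuration.

Unstable log transmission was especially severe in 0.94.0, the last Flume OG release. To solve these problems, on October 22, 2011, Cloudera completed FLUME-728. This milestone change refactored Flume's core components, core configuration, and code architecture. The refactored versions are collectively called Flume NG (next generation). Another reason for the change was to bring Flume under the Apache umbrella: Cloudera Flume was renamed Apache Flume.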

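1) Features of Flume

Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting massive amounts of log data. It lets you plug custom data senders into a logging system to collect data. It can also apply simple processing to the data and write it to a variety of data receivers, such as text files, HDFS, and HBase.

Events run through Flume's entire data flow. An event is Flume's basic unit of data: it carries log data (as a byte array) together with header information. Events are produced by a source from data that originates outside the agent. When a source captures an event, it applies source-specific formatting and then pushes the event into one or more channels. You can think of a channel as a buffer that holds the event until a sink has finished processing it. The sink is responsible for persisting the log or pushing the event on to another source.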

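2) Reliability of Flume

When a node fails, logs can be delivered to other nodes without being lost. Flume offers three levels of reliability guarantee, from strongest to weakest:

end-to-end: the agent that receives the data first writes the event to disk. It deletes the event only after the data has been delivered successfully, and if delivery fails the event can be resent.

store on failure: this is also the strategy Scribe uses. When the data receiver crashes, the data is written locally, and sending resumes once the receiver recovers.

best effort: data is sent to the receiver without any acknowledgment.

3) Recoverability of Flume

Recoverability also relies on the channel. FileChannel is recommended: it persists events on the local file system, at the cost of lower performance.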
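The end-to-end level above is essentially a write-ahead pattern. The sender persists the event locally, sends it, and deletes the local copy only after the receiver acknowledges it. Anything still on disk can be resent after a failure, which is also the idea behind FileChannel's recoverability.

The following Python sketch illustrates only that pattern and is not Flume code. The following are all hypothetical:

- the `EndToEndSender` class;
- the one-JSON-file-per-event layout;
- the `send` callback that returns True as its acknowledgment.

```python
import json
import os
import tempfile
import uuid


class EndToEndSender:
    """Write-ahead delivery: persist, send, delete only after an ack."""

    def __init__(self, spool_dir):
        self.spool_dir = spool_dir

    def _path(self, event_id):
        return os.path.join(self.spool_dir, event_id + ".json")

    def submit(self, body, send):
        """Write the event to disk first, then attempt delivery."""
        event_id = uuid.uuid4().hex
        with open(self._path(event_id), "w", encoding="utf-8") as f:
            json.dump({"id": event_id, "body": body}, f)
        self._try_send(event_id, send)
        return event_id

    def _try_send(self, event_id, send):
        with open(self._path(event_id), encoding="utf-8") as f:
            event = json.load(f)
        try:
            acked = send(event)
        except ConnectionError:
            return False  # keep the file on disk so it can be resent later
        if acked:
            os.remove(self._path(event_id))  # delete only after the ack
        return bool(acked)

    def resend_pending(self, send):
        """Retry every event still on disk; returns how many were acked."""
        pending = sorted(n[:-len(".json")] for n in os.listdir(self.spool_dir)
                         if n.endswith(".json"))
        return sum(1 for event_id in pending if self._try_send(event_id, send))


received = []
online = [False]


def send(event):
    if not online[0]:
        raise ConnectionError("receiver is down")
    received.append(event["body"])
    return True  # acknowledgment


with tempfile.TemporaryDirectory() as d:
    sender = EndToEndSender(d)
    sender.submit("hello world", send)     # receiver down: event stays on disk
    pending_before = len(os.listdir(d))
    online[0] = True                       # receiver recovers
    resent = sender.resend_pending(send)   # delivered, acked, then deleted
    pending_after = len(os.listdir(d))

print(received, pending_before, resent, pending_after)  # ['hello world'] 1 1 0
```

Flume NG achieves this with transactional channels rather than per-event files. The sketch only shows why "delete after acknowledgment" prevents loss.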

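4) Some core concepts of Flume

Agent: runs Flume inside a JVM. Each machine runs one agent, but one agent can contain multiple sources and sinks.

Client: produces data; runs in its own thread.

Source: collects data from the client and passes it to the channel.

Sink: collects data from the channel; runs in its own thread.

Channel: connects sources and sinks; it works somewhat like a queue.

Events: can be log records, Avro objects, and so on.

The agent is Flume's smallest independently running unit, and one agent is one JVM. A single agent is made up of three main components: source, sink, and channel.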
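To make the source → channel → sink relationship concrete, here is a minimal in-process Python sketch. It is not Flume's API. The class names (`Event`, `MemoryChannel`, `ListSink`) and their methods are hypothetical. They are modeled loosely on the `capacity` and `transactionCapacity` settings used in the configurations below, and on the rule that an event stays in the channel until the sink has processed it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """A Flume-style event: a byte-array body plus string headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Hypothetical bounded buffer between a source and a sink.

    capacity: maximum number of events held in the channel.
    transaction_capacity: maximum number of events taken in one sink batch.
    """

    def __init__(self, capacity=1000, transaction_capacity=100):
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self._queue = deque()

    def put(self, event):
        # A real channel would fail the source's transaction when full.
        if len(self._queue) >= self.capacity:
            raise OverflowError("channel is full")
        self._queue.append(event)

    def take_batch(self):
        # Remove up to transaction_capacity events for one sink "transaction".
        batch = []
        while self._queue and len(batch) < self.transaction_capacity:
            batch.append(self._queue.popleft())
        return batch

    def rollback(self, batch):
        # Return events to the front of the queue in their original order.
        for event in reversed(batch):
            self._queue.appendleft(event)

    def __len__(self):
        return len(self._queue)


class ListSink:
    """Stand-in sink that "persists" events into a Python list."""

    def __init__(self):
        self.delivered = []
        self.fail_next = False  # set to True to simulate one failed delivery

    def process(self, channel):
        batch = channel.take_batch()
        if self.fail_next:
            self.fail_next = False
            channel.rollback(batch)  # the events stay buffered in the channel
            return 0
        self.delivered.extend(batch)
        return len(batch)


def source_push(lines, channel):
    """Stand-in source: wrap each raw line in an Event and put it in the channel."""
    for line in lines:
        channel.put(Event(body=line.encode("utf-8")))


channel = MemoryChannel(capacity=1000, transaction_capacity=100)
sink = ListSink()
source_push([f"line-{i}" for i in range(250)], channel)
sink.fail_next = True
batches = []
while len(channel):
    batches.append(sink.process(channel))
print(batches)  # [0, 100, 100, 50]
```

The first batch fails and is rolled back, so nothing is lost. The remaining batches are capped at the transaction capacity of 100.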

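Note that Flume ships with a large number of built-in source, channel, and sink types, which can be combined freely. The combination is defined by a user-written configuration file, which makes it very flexible. For example, a channel can hold events temporarily in memory or persist them to local disk. A sink can write logs to HDFS, HBase, or even another source.

Flume also supports multi-hop flows, meaning several agents can work together, with support for fan-in, fan-out, contextual routing, and backup routes. This is where Flume really shines, as shown in the figure below: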

2. Where is Flume's official website?

<a href="http://flume.apache.org/">http://flume.apache.org/</a>

3. Where to download

<a href="http://www.apache.org/dyn/closer.cgi/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz">http://www.apache.org/dyn/closer.cgi/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz</a>

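4. How to install

1) Extract the downloaded Flume package into /home/hadoop, and you are already 50% done. Simple, right? (The commands below assume the extracted directory is /home/hadoop/flume-1.5.0-bin.)

2) Edit the flume-env.sh configuration file; the main change is setting the JAVA_HOME variable.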

root@m1:/home/hadoop/flume-1.5.0-bin# cp conf/flume-env.sh.template conf/flume-env.sh

root@m1:/home/hadoop/flume-1.5.0-bin# vi conf/flume-env.sh

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
# during Flume startup.

# Environment variables can be set here.

JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
#JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"

# Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""

3) Verify the installation

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng version

Flume 1.5.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 8633220df808c4cd0c13d1cf0320454a94f1ea97
Compiled by hshreedharan on Wed May 7 14:49:18 PDT 2014
From source with checksum a01fe726e4380ba0c9f7a7d222db961f

root@m1:/home/hadoop#

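If you see the output above, the installation succeeded.

5. Flume examples

1) Case 1: avro

Avro can send a given file to Flume; the Avro source uses the Avro RPC mechanism.

a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/avro.conf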

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/avro.conf -n a1 -Dflume.root.logger=INFO,console

c) Create the file to send

root@m1:/home/hadoop# echo "hello world" > /home/hadoop/flume-1.5.0-bin/log.00

d) Send the file with avro-client

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng avro-client -c . -h m1 -p 4141 -f /home/hadoop/flume-1.5.0-bin/log.00

e) On m1's console you can see the following output; note the last line

root@m1:/home/hadoop/flume-1.5.0-bin/conf# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/avro.conf -n a1 -Dflume.root.logger=INFO,console

Info: Sourcing environment configuration script /home/hadoop/flume-1.5.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/hadoop/hadoop-2.2.0/bin/hadoop) for HDFS access
Info: Excluding /home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
…
2014-08-10 10:43:25,112 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x92464c4f, /192.168.1.50:59850 :> /192.168.1.50:4141] UNBOUND
2014-08-10 10:43:25,112 (New I/O worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.channelClosed(NettyServer.java:209)] Connection to /192.168.1.50:59850 disconnected.
2014-08-10 10:43:26,718 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 68 65 6c 6c 6f 20 77 6f 72 6c 64 hello world }
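The logger sink prints each event body as hex bytes followed by the readable text. When the text column is hard to read (binary data, non-ASCII text), decoding the hex part yourself confirms what actually arrived. The Python helper below is a hypothetical convenience, not part of Flume. It assumes the `body: <hex bytes> <text>` layout seen in the line above.

```python
import re

# Matches the hex part of "body: 68 65 ... 64 hello world }" (assumed layout).
BODY_HEX = re.compile(r"body: ((?:[0-9a-fA-F]{2} )+)")


def decode_logger_body(line):
    """Decode the hex bytes that the logger sink prints for an event body."""
    match = BODY_HEX.search(line)
    if match is None:
        raise ValueError("no hex body found in line")
    return bytes.fromhex(match.group(1).strip()).decode("utf-8")


line = ("2014-08-10 10:43:26,718 (SinkRunner-PollingRunner-DefaultSinkProcessor) "
        "[INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] "
        "Event: { headers:{} body: 68 65 6c 6c 6f 20 77 6f 72 6c 64 hello world }")
print(decode_logger_body(line))  # hello world
```

The regex stops at the first token that is not a two-digit hex byte followed by a space. A body whose text itself starts with such tokens (for example "ab cd") would confuse it.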

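2) Case 2: spool

The spooling directory source watches the configured directory for newly added files and reads the data out of them. Two points to note:

1) A file copied into the spool directory must not be opened and edited afterwards.

2) The spool directory must not contain subdirectories.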
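Both rules follow from how this source consumes files. It reads each new file once and then renames it with the `.COMPLETED` suffix, which you can see in the log output of this case.

The Python sketch below imitates that behavior under simplifying assumptions; it is not Flume code. The function name is hypothetical. Silently skipping subdirectories is this sketch's own choice, made so the example runs.

```python
import os
import tempfile

COMPLETED_SUFFIX = ".COMPLETED"  # Flume's default fileSuffix for consumed files


def poll_spool_dir(spool_dir):
    """Read each new regular file once, then rename it as consumed.

    Returns (file_path, line) pairs; the path plays the role of the
    "file" header that fileHeader = true adds.
    """
    events = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        # Ignore subdirectories and files that were already consumed.
        if not os.path.isfile(path) or name.endswith(COMPLETED_SUFFIX):
            continue
        with open(path, encoding="utf-8") as f:
            for line in f:
                events.append((path, line.rstrip("\n")))
        # After the rename, later edits to the file are never read again.
        os.rename(path, path + COMPLETED_SUFFIX)
    return events


with tempfile.TemporaryDirectory() as spool:
    with open(os.path.join(spool, "spool_text.log"), "w", encoding="utf-8") as f:
        f.write("spool test1\n")
    os.mkdir(os.path.join(spool, "subdir"))  # skipped by this sketch
    first = poll_spool_dir(spool)
    second = poll_spool_dir(spool)  # nothing new: the file was renamed
    remaining = sorted(os.listdir(spool))

print([line for _, line in first], second, remaining)
```

Because the file is read once and then renamed, appending to it afterwards has no effect in this model. This is why files should be complete before they are dropped into the directory.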

a) Create the agent configuration file. The agent, sink, and channel definitions are the same as in avro.conf; only the source section changes:

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/spool.conf

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flume-1.5.0-bin/logs
a1.sources.r1.fileHeader = true

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/spool.conf -n a1 -Dflume.root.logger=INFO,console

c) Add a file to the /home/hadoop/flume-1.5.0-bin/logs directory

root@m1:/home/hadoop# echo "spool test1" > /home/hadoop/flume-1.5.0-bin/logs/spool_text.log

d) On m1's console you can see the following related output

14/08/10 11:37:13 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/hadoop/flume-1.5.0-bin/logs/spool_text.log to /home/hadoop/flume-1.5.0-bin/logs/spool_text.log.COMPLETED
14/08/10 11:37:14 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO sink.LoggerSink: Event: { headers:{file=/home/hadoop/flume-1.5.0-bin/logs/spool_text.log} body: 73 70 6f 6f 6c 20 74 65 73 74 31 spool test1 }
14/08/10 11:37:15 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:16 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:17 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.

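3) Case 3: exec

The exec source runs a given command and uses its output as the data source. If you use the tail command, the file needs to contain enough content before you see any output.

a) Create the agent configuration file. Only the source section differs from avro.conf:

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/exec_tail.conf

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/flume-1.5.0-bin/log_exec_tail

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/exec_tail.conf -n a1 -Dflume.root.logger=INFO,console

c) Generate enough content in the file

root@m1:/home/hadoop# for i in {1..100};do echo "exec tail$i" >> /home/hadoop/flume-1.5.0-bin/log_exec_tail;echo $i;sleep 0.1;done

d) On m1's console you can see the following output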

2014-08-10 10:59:25,513 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6c 20 74 65 73 74 exec tail test }
2014-08-10 10:59:34,535 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6c 20 74 65 73 74 exec tail test }
2014-08-10 11:01:40,557 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6c 31 exec tail1 }
2014-08-10 11:01:41,180 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6c 32 exec tail2 }
2014-08-10 11:01:41,181 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6c 34 exec tail4 }
…
2014-08-10 11:01:51,550 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6c 39 36 exec tail96 }
2014-08-10 11:01:51,551 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6c 39 38 exec tail98 }

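4) Case 4: syslogtcp

The syslogtcp source listens on a TCP port and uses it as the data source.

a) Create the agent configuration file. Only the source section differs from avro.conf:

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/syslog_tcp.conf

a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/syslog_tcp.conf -n a1 -Dflume.root.logger=INFO,console

c) Generate a test syslog message

root@m1:/home/hadoop# echo "hello idoall.org syslog" | nc localhost 5140

d) On m1's console you can see the following output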

14/08/10 11:41:45 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/home/hadoop/flume-1.5.0-bin/conf/syslog_tcp.conf
14/08/10 11:41:45 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
14/08/10 11:41:45 INFO conf.FlumeConfiguration: Processing:k1
14/08/10 11:41:45 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
14/08/10 11:41:45 INFO node.AbstractConfigurationProvider: Creating channels
14/08/10 11:41:45 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
14/08/10 11:41:45 INFO node.AbstractConfigurationProvider: Created channel c1
14/08/10 11:41:45 INFO source.DefaultSourceFactory: Creating instance of source r1, type syslogtcp
14/08/10 11:41:45 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
14/08/10 11:41:45 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
14/08/10 11:41:45 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.SyslogTcpSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@6538b14 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
14/08/10 11:41:45 INFO node.Application: Starting Channel c1
14/08/10 11:41:45 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/08/10 11:41:45 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/08/10 11:41:45 INFO node.Application: Starting Sink k1
14/08/10 11:41:45 INFO node.Application: Starting Source r1
14/08/10 11:41:45 INFO source.SyslogTcpSource: Syslog TCP Source starting...
14/08/10 11:42:15 WARN source.SyslogUtils: Event created from Invalid Syslog data.
14/08/10 11:42:15 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 68 65 6c 6c 6f 20 69 64 6f 61 6c 6c 2e 6f 72 67 hello idoall.org }
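Two details in this output are worth explaining.

First, the WARN line and the `flume.syslog.status=Invalid` header appear because the text sent with nc has no syslog header. Flume therefore cannot parse a facility or severity, marks the event invalid, and sets both to 0.

Second, the body shows only "hello idoall.org". The logger sink prints at most the first 16 bytes of a body, so the rest of the message is simply not displayed.

A sender that adds an RFC 3164-style header (`<PRI>` plus timestamp and hostname) should be parsed as a valid syslog message. The Python sketch below builds and sends such a line. The helper names and the default hostname "m1" are hypothetical. It assumes the agent from syslog_tcp.conf is listening on localhost:5140.

```python
import socket
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_syslog_line(message, facility=1, severity=5, hostname="m1", when=None):
    """Build an RFC 3164-style line: <PRI>Mmm dd hh:mm:ss HOST MSG.

    PRI = facility * 8 + severity (facility 1 = user, severity 5 = notice).
    """
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    if when is None:
        when = datetime.now()
    # RFC 3164 pads a single-digit day with a space, e.g. "Aug  9".
    timestamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{facility * 8 + severity}>{timestamp} {hostname} {message}\n"


def send_syslog(line, host="localhost", port=5140):
    """Send one newline-terminated line over TCP, like `echo ... | nc`."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("utf-8"))


line = build_syslog_line("hello idoall.org syslog",
                         when=datetime(2014, 8, 10, 11, 42, 15))
print(repr(line))  # '<13>Aug 10 11:42:15 m1 hello idoall.org syslog\n'
```

With the agent running, `send_syslog(build_syslog_line("hello idoall.org syslog"))` should produce an event with Facility=1 and Severity=5 instead of the Invalid status. The month names are hard-coded so the result does not depend on the system locale.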

5) Case 5: JSONHandler

a) Create the agent configuration file. Only the source section differs from avro.conf:

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/post_json.conf

a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 8888

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/post_json.conf -n a1 -Dflume.root.logger=INFO,console

c) Send a JSON-formatted POST request

root@m1:/home/hadoop# curl -X POST -d '[{ "headers" :{"a" : "a1","b" : "b1"},"body" : "idoall.org_body"}]' http://localhost:8888

d) On m1's console you can see the following output

14/08/10 11:49:59 INFO node.Application: Starting Channel c1
14/08/10 11:49:59 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/08/10 11:49:59 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/08/10 11:49:59 INFO node.Application: Starting Sink k1
14/08/10 11:49:59 INFO node.Application: Starting Source r1
14/08/10 11:49:59 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/08/10 11:49:59 INFO mortbay.log: jetty-6.1.26
14/08/10 11:50:00 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:8888
14/08/10 11:50:00 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 11:50:00 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 12:14:32 INFO sink.LoggerSink: Event: { headers:{b=b1, a=a1} body: 69 64 6f 61 6c 6c 2e 6f 72 67 5f 62 6f 64 79 idoall.org_body }
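The HTTP source's default handler (JSONHandler) expects a JSON array of objects, each with a `headers` map and a string `body`, which is exactly what the curl command sends. The same request can be made from Python's standard library. This sketch assumes the agent from post_json.conf is listening on localhost:8888; `build_payload` and `post_events` are hypothetical helper names.

```python
import json
import urllib.request


def build_payload(events):
    """Serialize (headers, body) pairs into the JSON array JSONHandler expects."""
    return json.dumps(
        [{"headers": headers, "body": body} for headers, body in events]
    ).encode("utf-8")


def post_events(events, url="http://localhost:8888"):
    """POST events to the HTTP source; returns the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(events),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


payload = build_payload([({"a": "a1", "b": "b1"}, "idoall.org_body")])
print(payload.decode("utf-8"))
# [{"headers": {"a": "a1", "b": "b1"}, "body": "idoall.org_body"}]
```

With the agent running, `post_events([({"a": "a1", "b": "b1"}, "idoall.org_body")])` should log the same event as the curl command. Passing a different `url` lets the helper target other HTTP sources as well.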

6) Case 6: Hadoop sink

a) Create the agent configuration file. It uses the same syslogtcp source (port 5140) as syslog_tcp.conf; only the sink section changes:

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/hdfs_sink.conf

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://m1:9000/user/flume/syslogtcp
a1.sinks.k1.hdfs.filePrefix = syslog
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/hdfs_sink.conf -n a1 -Dflume.root.logger=INFO,console

c) Generate a test syslog message

root@m1:/home/hadoop# echo "hello idoall flume -> hadoop testing one" | nc localhost 5140

d) On m1's console you can see the following output

14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/08/10 12:20:39 INFO node.Application: Starting Sink k1
14/08/10 12:20:39 INFO node.Application: Starting Source r1
14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
14/08/10 12:20:39 INFO source.SyslogTcpSource: Syslog TCP Source starting...
14/08/10 12:21:46 WARN source.SyslogUtils: Event created from Invalid Syslog data.
14/08/10 12:21:49 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
14/08/10 12:21:49 INFO hdfs.BucketWriter: Creating hdfs://m1:9000/user/flume/syslogtcp//syslog.1407644509504.tmp
14/08/10 12:22:20 INFO hdfs.BucketWriter: Closing hdfs://m1:9000/user/flume/syslogtcp//syslog.1407644509504.tmp
14/08/10 12:22:20 INFO hdfs.BucketWriter: Close tries incremented
14/08/10 12:22:20 INFO hdfs.BucketWriter: Renaming hdfs://m1:9000/user/flume/syslogtcp/syslog.1407644509504.tmp to hdfs://m1:9000/user/flume/syslogtcp/syslog.1407644509504
14/08/10 12:22:20 INFO hdfs.HDFSEventSink: Writer callback called.

e) Open another window on m1 and check on Hadoop whether the file has been created

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hadoop fs -ls /user/flume/syslogtcp

Found 1 items
-rw-r--r--   3 root supergroup        155 2014-08-10 12:22 /user/flume/syslogtcp/syslog.1407644509504

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hadoop fs -cat /user/flume/syslogtcp/syslog.1407644509504

SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable^;>gv$hello idoall flume -> hadoop testing one
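Two notes on this output.

First, the file is a Hadoop SequenceFile, the default `hdfs.fileType`. That is why `hadoop fs -cat` shows the SEQ header and Writable class names before the text. Setting `a1.sinks.k1.hdfs.fileType = DataStream` writes the raw event bodies instead.

Second, `hdfs.round`, `hdfs.roundValue` and `hdfs.roundUnit` only affect time-based escape sequences such as `%H` or `%M` in `hdfs.path`. The path used here contains none, so they have no visible effect.

The Python sketch below shows what rounding down to 10-minute buckets does to an event timestamp. It imitates the documented behavior and is not Flume code. The `%Y%m%d/%H%M` path template is a hypothetical example.

```python
from datetime import datetime

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def round_down(ts, round_value=10, round_unit="minute"):
    """Round a timestamp down to a multiple of round_value units within the day."""
    step = round_value * UNIT_SECONDS[round_unit]
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    rounded = seconds - seconds % step
    return ts.replace(hour=rounded // 3600, minute=rounded % 3600 // 60,
                      second=rounded % 60, microsecond=0)


def bucket_path(template, ts):
    """Expand %Y %m %d %H %M in a path; these map directly onto strftime codes."""
    return ts.strftime(template)


event_time = datetime(2014, 8, 10, 12, 21, 49)
bucket = round_down(event_time)  # roundValue = 10, roundUnit = minute
print(bucket_path("/user/flume/syslogtcp/%Y%m%d/%H%M", bucket))
# /user/flume/syslogtcp/20140810/1220
```

All events between 12:20:00 and 12:29:59 would land in the same directory. This is the usual reason to enable rounding.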

7) Case 7: File roll sink

a) Create the agent configuration file. Only the lines that differ from the earlier examples are shown:

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/file_roll.conf

a1.sources.r1.port = 5555
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/flume-1.5.0-bin/logs

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/file_roll.conf -n a1 -Dflume.root.logger=INFO,console

c) Generate test log messages

root@m1:/home/hadoop# echo "hello idoall.org syslog" | nc localhost 5555

root@m1:/home/hadoop# echo "hello idoall.org syslog 2" | nc localhost 5555

d) Check whether files were generated under /home/hadoop/flume-1.5.0-bin/logs. By default a new file is started every 30 seconds (the sink.rollInterval setting).

root@m1:/home/hadoop# ll /home/hadoop/flume-1.5.0-bin/logs

total 272
drwxr-xr-x 3 root root 4096 Aug 10 12:50 ./
drwxr-xr-x 9 root root 4096 Aug 10 10:59 ../
-rw-r--r-- 1 root root   50 Aug 10 12:49 1407646164782-1
-rw-r--r-- 1 root root    0 Aug 10 12:49 1407646164782-2
-rw-r--r-- 1 root root    0 Aug 10 12:50 1407646164782-3

root@m1:/home/hadoop# cat /home/hadoop/flume-1.5.0-bin/logs/1407646164782-1 /home/hadoop/flume-1.5.0-bin/logs/1407646164782-2

hello idoall.org syslog
hello idoall.org syslog 2

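8) Case 8: Replicating channel selector

Flume supports fanning out a flow from one source to multiple channels. There are two fan-out modes: replicating and multiplexing. In replicating mode, each event in the flow is sent to all of the configured channels. In multiplexing mode, each event is sent to only a subset of the available channels. A fan-out flow requires you to specify the source and the fan-out rules for the channels.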
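The difference between the two modes comes down to which channels the selector returns for each event. This Python sketch models only that decision; the function names are hypothetical. The multiplexing values mirror the multiplexing configuration later in this post: header `type`, with baidu mapped to c1, ali to c2, and c1 as the default.

```python
def replicating_select(headers, channels):
    """Replicating: every event is copied to all configured channels.

    headers is unused; it is kept for symmetry with multiplexing_select.
    """
    return list(channels)


def multiplexing_select(headers, header_name, mapping, default):
    """Multiplexing: pick channels by the value of one event header.

    mapping: header value -> list of channels (lists may overlap).
    default: channels for events whose value is missing or unmapped.
    """
    return list(mapping.get(headers.get(header_name), default))


print(replicating_select({}, ["c1", "c2"]))  # ['c1', 'c2']
MAPPING = {"baidu": ["c1"], "ali": ["c2"]}
for value in ("baidu", "ali", "qq"):
    print(value, multiplexing_select({"type": value}, "type", MAPPING, ["c1"]))
# baidu ['c1']
# ali ['c2']
# qq ['c1']
```

This is why, in the multiplexing test, the "qq" event ends up on the same machine as "baidu": it has no mapping and falls back to the default channel.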

This time we need two machines, m1 and m2.

a) Create the replicating_channel_selector configuration file on m1. Only the lines that differ from the earlier examples are shown:

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector.conf

a1.sinks = k1 k2
a1.channels = c1 c2
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = m1
a1.sinks.k1.port = 5555
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = m2
a1.sinks.k2.port = 5555
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

b) Create the replicating_channel_selector_avro configuration file on m1. Its contents are not shown; judging from the logs, it defines an Avro source listening on port 5555 and a logger sink.

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector_avro.conf

c) On m1, copy both configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector.conf

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector_avro.conf

d) Open four windows and start the two Flume agents on both m1 and m2

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector_avro.conf -n a1 -Dflume.root.logger=INFO,console

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/replicating_channel_selector.conf -n a1 -Dflume.root.logger=INFO,console

e) Then generate a test syslog message on either m1 or m2

f) In the sink windows on m1 and m2 you can see the following output, which shows that the message was replicated to both machines

14/08/10 14:08:18 INFO ipc.NettyServer: Connection to /192.168.1.51:46844 disconnected.
14/08/10 14:08:52 INFO ipc.NettyServer: [id: 0x90f8fe1f, /192.168.1.50:35873 => /192.168.1.50:5555] OPEN
14/08/10 14:08:52 INFO ipc.NettyServer: [id: 0x90f8fe1f, /192.168.1.50:35873 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 14:08:52 INFO ipc.NettyServer: [id: 0x90f8fe1f, /192.168.1.50:35873 => /192.168.1.50:5555] CONNECTED: /192.168.1.50:35873
14/08/10 14:08:59 INFO ipc.NettyServer: [id: 0xd6318635, /192.168.1.51:46858 => /192.168.1.50:5555] OPEN
14/08/10 14:08:59 INFO ipc.NettyServer: [id: 0xd6318635, /192.168.1.51:46858 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 14:08:59 INFO ipc.NettyServer: [id: 0xd6318635, /192.168.1.51:46858 => /192.168.1.50:5555] CONNECTED: /192.168.1.51:46858
14/08/10 14:09:20 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 68 65 6c 6c 6f 20 69 64 6f 61 6c 6c 2e 6f 72 67 hello idoall.org }

9) Case 9: Multiplexing channel selector

a) Create the multiplexing_channel_selector configuration file on m1. Only the lines that differ from the earlier examples are shown. Judging from the test requests, the source is the HTTP source from the JSONHandler example, listening on port 5140.

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector.conf

a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
# The mapping allows the channels for different values to overlap. The default can contain any number of channels.
a1.sources.r1.selector.mapping.baidu = c1
a1.sources.r1.selector.mapping.ali = c2
a1.sources.r1.selector.default = c1

b) Create the multiplexing_channel_selector_avro configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector_avro.conf

c) Copy both configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector.conf

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector_avro.conf

d) Open four windows and start the two Flume agents on both m1 and m2

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector_avro.conf -n a1 -Dflume.root.logger=INFO,console

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/multiplexing_channel_selector.conf -n a1 -Dflume.root.logger=INFO,console

e) Send test requests from m1

root@m1:/home/hadoop# curl -X POST -d '[{ "headers" :{"type" : "baidu"},"body" : "idoall_TEST1"}]' http://localhost:5140 && curl -X POST -d '[{ "headers" :{"type" : "ali"},"body" : "idoall_TEST2"}]' http://localhost:5140 && curl -X POST -d '[{ "headers" :{"type" : "qq"},"body" : "idoall_TEST3"}]' http://localhost:5140

f) In m1's sink window you can see the following output

14/08/10 14:32:21 INFO node.Application: Starting Sink k1
14/08/10 14:32:21 INFO node.Application: Starting Source r1
14/08/10 14:32:21 INFO source.AvroSource: Starting Avro source r1: { bindAddress: 0.0.0.0, port: 5555 }...
14/08/10 14:32:21 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 14:32:21 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 14:32:21 INFO source.AvroSource: Avro source r1 started.
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0xcf00eea6, /192.168.1.50:35916 => /192.168.1.50:5555] OPEN
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0xcf00eea6, /192.168.1.50:35916 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0xcf00eea6, /192.168.1.50:35916 => /192.168.1.50:5555] CONNECTED: /192.168.1.50:35916
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x432f5468, /192.168.1.51:46945 => /192.168.1.50:5555] OPEN
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x432f5468, /192.168.1.51:46945 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x432f5468, /192.168.1.51:46945 => /192.168.1.50:5555] CONNECTED: /192.168.1.51:46945
14/08/10 14:34:11 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6f 61 6c 6c 5f 54 45 53 54 31 idoall_TEST1 }
14/08/10 14:34:57 INFO sink.LoggerSink: Event: { headers:{type=qq} body: 69 64 6f 61 6c 6c 5f 54 45 53 54 33 idoall_TEST3 }

g) In m2's sink window you can see the following output

14/08/10 14:32:27 INFO node.Application: Starting Sink k1
14/08/10 14:32:27 INFO node.Application: Starting Source r1
14/08/10 14:32:27 INFO source.AvroSource: Starting Avro source r1: { bindAddress: 0.0.0.0, port: 5555 }...
14/08/10 14:32:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 14:32:27 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 14:32:27 INFO source.AvroSource: Avro source r1 started.
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0x7c2f0aec, /192.168.1.50:38104 => /192.168.1.51:5555] OPEN
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0x7c2f0aec, /192.168.1.50:38104 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0x7c2f0aec, /192.168.1.50:38104 => /192.168.1.51:5555] CONNECTED: /192.168.1.50:38104
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x3d36f553, /192.168.1.51:48599 => /192.168.1.51:5555] OPEN
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x3d36f553, /192.168.1.51:48599 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x3d36f553, /192.168.1.51:48599 => /192.168.1.51:5555] CONNECTED: /192.168.1.51:48599
14/08/10 14:34:33 INFO sink.LoggerSink: Event: { headers:{type=ali} body: 69 64 6f 61 6c 6c 5f 54 45 53 54 32 idoall_TEST2 }

As you can see, events are distributed to different channels according to the value of the header.

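10) Case 10: Flume sink processors

With the failover processor, events are always sent to one sink. When that sink becomes unavailable, they are automatically sent to the next sink.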
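The Python sketch below models the failover behavior. It sends each event to the highest-priority sink that is not currently being penalized. When a sink fails, it skips that sink for a back-off period that grows with consecutive failures and is capped by a value playing the role of `maxpenalty`.

This is a simplified model, not Flume's implementation. The class name and the doubling back-off schedule are assumptions made for illustration. The simulated run mirrors this case: k2 has priority 10, k1 has priority 5.

```python
import time


class FailoverProcessor:
    """Send each event to the highest-priority sink that is not backed off.

    priorities: sink name -> priority (a larger number wins).
    max_penalty_ms: cap on how long a failed sink is skipped, like maxpenalty.
    """

    def __init__(self, priorities, max_penalty_ms=10000, clock=time.monotonic):
        self.priorities = dict(priorities)
        self.max_penalty_ms = max_penalty_ms
        self.clock = clock
        self.failures = {}  # sink -> consecutive failures
        self.retry_at = {}  # sink -> clock time after which it is tried again

    def deliver(self, event, send):
        """Try sinks by descending priority; send(sink, event) raises on failure."""
        now = self.clock()
        for sink in sorted(self.priorities, key=self.priorities.get, reverse=True):
            if self.retry_at.get(sink, 0) > now:
                continue  # still serving its penalty
            try:
                send(sink, event)
            except ConnectionError:
                self.failures[sink] = self.failures.get(sink, 0) + 1
                # Simplified exponential back-off: 2 s, 4 s, 8 s, ... capped.
                penalty_ms = min(1000 * 2 ** self.failures[sink], self.max_penalty_ms)
                self.retry_at[sink] = now + penalty_ms / 1000
                continue
            self.failures.pop(sink, None)
            self.retry_at.pop(sink, None)
            return sink
        raise RuntimeError("no sink available")


fake_now = [0.0]
down = set()
log = []


def send(sink, event):
    if sink in down:
        raise ConnectionError(sink)
    log.append((sink, event))


proc = FailoverProcessor({"k1": 5, "k2": 10}, clock=lambda: fake_now[0])
proc.deliver("test1", send)  # k2 has the higher priority
down.add("k2")               # stop the agent behind k2
proc.deliver("test2", send)  # k2 fails, k1 takes over
down.discard("k2")           # restart it
fake_now[0] += 10            # wait out the penalty
proc.deliver("test3", send)  # back on k2
print(log)  # [('k2', 'test1'), ('k1', 'test2'), ('k2', 'test3')]
```

Injecting the clock keeps the example deterministic. With the default `time.monotonic`, the same class would use real time.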

a) Create the flume_sink_processors configuration file on m1. Only the lines that differ from the earlier examples are shown:

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors.conf

# The key to configuring failover: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The processor type is failover
a1.sinkgroups.g1.processor.type = failover
# Priority: a larger number means a higher priority, and every sink must have a different priority
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Maximum back-off period for a failed sink, in milliseconds (10 seconds here); adjust it to suit your situation
a1.sinkgroups.g1.processor.maxpenalty = 10000

b) Create the flume_sink_processors_avro configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors_avro.conf

c) Copy both configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors.conf

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors_avro.conf

d) Open four windows and start the two Flume agents on both m1 and m2

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors_avro.conf -n a1 -Dflume.root.logger=INFO,console

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/flume_sink_processors.conf -n a1 -Dflume.root.logger=INFO,console

e) Then generate a test log message on either m1 or m2

root@m1:/home/hadoop# echo "idoall.org test1 failover" | nc localhost 5140

f) Because m2 has the higher priority, the following output appears in m2's sink window but not in m1's

14/08/10 15:02:46 INFO ipc.NettyServer: Connection to /192.168.1.51:48692 disconnected.
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0x09a14036, /192.168.1.51:48704 => /192.168.1.51:5555] OPEN
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0x09a14036, /192.168.1.51:48704 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0x09a14036, /192.168.1.51:48704 => /192.168.1.51:5555] CONNECTED: /192.168.1.51:48704
14/08/10 15:03:26 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6f 61 6c 6c 2e 6f 72 67 20 74 65 73 74 31 idoall.org test1 }

g) Now stop the sink on m2 (Ctrl+C) and send test data again

root@m1:/home/hadoop# echo "idoall.org test2 failover" | nc localhost 5140

h) In m1's sink window you can see that the two test messages just sent were received

14/08/10 15:02:46 INFO ipc.NettyServer: Connection to /192.168.1.51:47036 disconnected.
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0xbcf79851, /192.168.1.51:47048 => /192.168.1.50:5555] OPEN
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0xbcf79851, /192.168.1.51:47048 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0xbcf79851, /192.168.1.51:47048 => /192.168.1.50:5555] CONNECTED: /192.168.1.51:47048
14/08/10 15:07:56 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6f 61 6c 6c 2e 6f 72 67 20 74 65 73 74 31 idoall.org test1 }
14/08/10 15:07:56 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6f 61 6c 6c 2e 6f 72 67 20 74 65 73 74 32 idoall.org test2 }

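i) Start the sink again in m2's sink window

j) Send two more batches of test data

root@m1:/home/hadoop# echo "idoall.org test3 failover" | nc localhost 5140 && echo "idoall.org test4 failover" | nc localhost 5140

k) In m2's sink window you can see the following output; because of the priorities, log messages land on m2 again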

14/08/10 15:09:47 INFO node.Application: Starting Sink k1
14/08/10 15:09:47 INFO node.Application: Starting Source r1
14/08/10 15:09:47 INFO source.AvroSource: Starting Avro source r1: { bindAddress: 0.0.0.0, port: 5555 }...
14/08/10 15:09:47 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 15:09:47 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 15:09:47 INFO source.AvroSource: Avro source r1 started.
14/08/10 15:09:54 INFO ipc.NettyServer: [id: 0x96615732, /192.168.1.51:48741 => /192.168.1.51:5555] OPEN
14/08/10 15:09:54 INFO ipc.NettyServer: [id: 0x96615732, /192.168.1.51:48741 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 15:09:54 INFO ipc.NettyServer: [id: 0x96615732, /192.168.1.51:48741 => /192.168.1.51:5555] CONNECTED: /192.168.1.51:48741
14/08/10 15:09:57 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6f 61 6c 6c 2e 6f 72 67 20 74 65 73 74 32 idoall.org test2 }
14/08/10 15:10:43 INFO ipc.NettyServer: [id: 0x12621f9a, /192.168.1.50:38166 => /192.168.1.51:5555] OPEN
14/08/10 15:10:43 INFO ipc.NettyServer: [id: 0x12621f9a, /192.168.1.50:38166 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 15:10:43 INFO ipc.NettyServer: [id: 0x12621f9a, /192.168.1.50:38166 => /192.168.1.51:5555] CONNECTED: /192.168.1.50:38166
14/08/10 15:10:43 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6f 61 6c 6c 2e 6f 72 67 20 74 65 73 74 33 idoall.org test3 }
14/08/10 15:10:43 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6f 61 6c 6c 2e 6f 72 67 20 74 65 73 74 34 idoall.org test4 }

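11) Case 11: Load balancing sink processor

The load_balance processor differs from failover in that it offers two selection modes: round-robin and random. In both modes, if the selected sink is unavailable, Flume automatically tries to send the event to the next available sink.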
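Round-robin selection with back-off can be modeled in a few lines of Python. Each event starts at the next sink in rotation. If that sink fails, the event falls through to the following one, and with back-off enabled the failed sink is skipped for a while.

This is a simplified sketch, not Flume's implementation. The class name is hypothetical, and `penalty_s` is a hypothetical fixed back-off standing in for Flume's `backoff` option.

```python
import time


class RoundRobinBalancer:
    """Rotate the starting sink for each event; optionally back off failures.

    penalty_s is a hypothetical fixed back-off, a simplification of
    Flume's backoff option.
    """

    def __init__(self, sinks, backoff=True, penalty_s=1.0, clock=time.monotonic):
        self.sinks = list(sinks)
        self.backoff = backoff
        self.penalty_s = penalty_s
        self.clock = clock
        self.next_start = 0
        self.blocked_until = {}  # sink -> clock time it becomes eligible again

    def deliver(self, event, send):
        """Try sinks in rotated order; send(sink, event) raises on failure."""
        now = self.clock()
        n = len(self.sinks)
        start = self.next_start
        self.next_start = (start + 1) % n  # rotation advances once per event
        for offset in range(n):
            sink = self.sinks[(start + offset) % n]
            if self.backoff and self.blocked_until.get(sink, 0) > now:
                continue
            try:
                send(sink, event)
            except ConnectionError:
                if self.backoff:
                    self.blocked_until[sink] = now + self.penalty_s
                continue  # fall through to the next sink
            return sink
        raise RuntimeError("all sinks failed")


fake_now = [0.0]
down = set()
log = []


def send(sink, event):
    if sink in down:
        raise ConnectionError(sink)
    log.append((sink, event))


lb = RoundRobinBalancer(["k1", "k2"], clock=lambda: fake_now[0])
for i in range(1, 5):
    lb.deliver(f"test{i}", send)
print(log)  # [('k1', 'test1'), ('k2', 'test2'), ('k1', 'test3'), ('k2', 'test4')]

down.add("k1")                        # take one sink offline
fallback = lb.deliver("test5", send)  # k1's turn, but the event goes to k2
print(fallback)  # k2
```

The alternating k1/k2 pattern matches the split observed below, where test1/test3 and test2/test4 land on different machines.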

a) Create the load_balancing_sink_processors configuration file on m1. Only the lines that differ from the earlier examples are shown:

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors.conf

# The key to configuring load balancing: a sink group is required
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
a1.sinks.k2.channel = c1

b) Create the load_balancing_sink_processors_avro configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors_avro.conf

c) Copy both configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors.conf

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors_avro.conf

d) Open four windows and start the two Flume agents on both m1 and m2

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors_avro.conf -n a1 -Dflume.root.logger=INFO,console

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/load_balancing_sink_processors.conf -n a1 -Dflume.root.logger=INFO,console

e) Then generate test log messages on either m1 or m2. Enter them one line at a time; if you send them too quickly, they tend to land on the same machine.

root@m1:/home/hadoop# echo "idoall.org test1" | nc localhost 5140

root@m1:/home/hadoop# echo "idoall.org test2" | nc localhost 5140

root@m1:/home/hadoop# echo "idoall.org test3" | nc localhost 5140

root@m1:/home/hadoop# echo "idoall.org test4" | nc localhost 5140

f) One machine's sink window shows:

14/08/10 15:35:29 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6f 61 6c 6c 2e 6f 72 67 20 74 65 73 74 32 idoall.org test2 }
14/08/10 15:35:33 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6f 61 6c 6c 2e 6f 72 67 20 74 65 73 74 34 idoall.org test4 }

g) The other machine's sink window shows:

14/08/10 15:35:27 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6f 61 6c 6c 2e 6f 72 67 20 74 65 73 74 31 idoall.org test1 }
14/08/10 15:35:29 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6f 61 6c 6c 2e 6f 72 67 20 74 65 73 74 33 idoall.org test3 }

This shows that round-robin mode is working.

12) Case 12: HBase sink

b) Copy the following JAR files from HBase into Flume's lib directory

cp /home/hadoop/hbase-0.96.2-hadoop2/lib/protobuf-java-2.5.0.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-client-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-common-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-protocol-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-server-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/htrace-core-2.04.jar /home/hadoop/flume-1.5.0-bin/lib

d) Create the hbase_simple configuration file on m1. Only the lines that differ from the earlier examples are shown:

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/hbase_simple.conf

a1.sinks.k1.type = hbase
a1.sinks.k1.table = test_idoall_org
a1.sinks.k1.columnFamily = name
a1.sinks.k1.column = idoall
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.k1.channel = memorychannel

e) Start the Flume agent

/home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/hbase_simple.conf -n a1 -Dflume.root.logger=INFO,console

f) Generate a test syslog message

root@m1:/home/hadoop# echo "hello idoall.org from flume" | nc localhost 5140

g) Log in to HBase and you can see that the new data has been inserted

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell

2014-08-10 16:09:48,984 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hbase2hive_idoall
hive2hbase_idoall
test_idoall_org
3 row(s) in 2.6880 seconds

=> ["hbase2hive_idoall", "hive2hbase_idoall", "test_idoall_org"]

hbase(main):002:0> scan "test_idoall_org"
ROW                          COLUMN+CELL
 10086                       column=name:idoall, timestamp=1406424831473, value=idoallvalue
1 row(s) in 0.0550 seconds

hbase(main):003:0> scan "test_idoall_org"
ROW                          COLUMN+CELL
 10086                       column=name:idoall, timestamp=1406424831473, value=idoallvalue
 1407658495588-xbqcozrkk8-0  column=name:payload, timestamp=1407658498203, value=hello idoall.org from flume
2 row(s) in 0.0200 seconds

hbase(main):004:0> quit
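In the second scan, the row written by Flume landed in column name:payload rather than name:idoall. As far as I can tell from the Flume user guide, `column` is not one of the HBase sink's settings. RegexHbaseEventSerializer takes its column names from `serializer.colNames` (default `payload`) and its pattern from `serializer.regex` (default `(.*)`, i.e. the whole body). It builds row keys from a timestamp, a random string, and a counter, which matches the 1407658495588-...-0 key above.

The Python sketch below is a simplified model of that mapping, useful for trying a regex before putting it into the config. The class name is hypothetical, and this is not the serializer's actual code.

```python
import itertools
import random
import re
import string


class RegexSerializerSketch:
    """Simplified model of how a regex-based HBase serializer maps a body.

    Each capture group of a full match becomes one column, named in order by
    col_names; a body that does not match (or has the wrong number of groups)
    yields no cells.
    """

    def __init__(self, regex=r"(.*)", col_names=("payload",), family="name", rng=None):
        self.pattern = re.compile(regex)
        self.col_names = list(col_names)
        self.family = family
        rng = rng or random.Random()
        alphabet = string.ascii_letters + string.digits
        self.random_key = "".join(rng.choice(alphabet) for _ in range(10))
        self.counter = itertools.count()

    def row_key(self, now_ms):
        # timestamp-randomKey-counter, the shape of the row key in the scan.
        return f"{now_ms}-{self.random_key}-{next(self.counter)}"

    def cells(self, body):
        match = self.pattern.fullmatch(body)
        if match is None or len(match.groups()) != len(self.col_names):
            return {}
        return {f"{self.family}:{col}": value
                for col, value in zip(self.col_names, match.groups())}


default = RegexSerializerSketch(rng=random.Random(0))
first_key = default.row_key(1407658495588)
print(first_key, default.cells("hello idoall.org from flume"))
# e.g. 1407658495588-<10 random chars>-0 {'name:payload': 'hello idoall.org from flume'}

split = RegexSerializerSketch(r"(\S+) (.*)", ("word", "rest"))
print(split.cells("hello idoall.org from flume"))
# {'name:word': 'hello', 'name:rest': 'idoall.org from flume'}
```

In this model, getting a column named idoall would mean listing it in the column names rather than setting `column`. In the real sink, that corresponds to `serializer.colNames`.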

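After working through all of these Flume examples, you will see how powerful Flume is. Its components can be combined in many ways to do the job you need. As the saying goes, the master can only lead you to the door; the rest is up to you. Think about how to apply Flume to your own products and business, and start practicing!

This article is a set of notes; I hope it helps those who are just getting started.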

