
Configuring Flume NG Load Balancing

In real projects, to avoid every agent writing to HDFS directly at once, data is often written in tiers so that the load is spread out. The tiered load-balancing topology looks like this:

[Figure: tiered Flume NG load-balancing topology]

Four machines are used: one acts as agent1, which collects the data and distributes it across multiple sinks; the other three form the second tier that shares the load.

agent1 configuration:

a1.sources=r1
a1.sinks=k1 k2 k3
a1.channels=c1

#source
a1.sources.r1.type=exec
a1.sources.r1.command=tail -F /usr/test/a.txt
#sink group 
a1.sinkgroups=g1
a1.sinkgroups.g1.sinks=k1 k2 k3
a1.sinkgroups.g1.processor.type=load_balance
a1.sinkgroups.g1.processor.backoff=true
#use round-robin scheduling
a1.sinkgroups.g1.processor.selector=round_robin

#sink1
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=
a1.sinks.k1.port=

#sink2
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=
a1.sinks.k2.port=

#sink3
a1.sinks.k3.type=avro
a1.sinks.k3.hostname=
a1.sinks.k3.port=

#channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=
a1.channels.c1.transactionCapacity=

#bind

a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c1
a1.sinks.k3.channel=c1
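The hostname, port, and channel-capacity fields above are left blank to be filled in per deployment. As a sketch only, assuming the three second-tier collectors are hosts node2 through node4 (hypothetical names), each running an avro source on port 4141, the blanks might be filled in like this:

```properties
# Hypothetical values for illustration; adjust to your cluster.
a1.sinks.k1.hostname=node2
a1.sinks.k1.port=4141
a1.sinks.k2.hostname=node3
a1.sinks.k2.port=4141
a1.sinks.k3.hostname=node4
a1.sinks.k3.port=4141

# Channel sizing: transactionCapacity must not exceed capacity.
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100
```

Note that Flume config files use Java properties syntax, so comments must start at the beginning of a line; inline comments after a value would become part of the value.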
           

agents 2-4 all use the same configuration:

a1.sources=r1
a1.channels=c1
a1.sinks=k1

#source
a1.sources.r1.type=avro
a1.sources.r1.bind=
a1.sources.r1.port=

#channels
a1.channels.c1.type=memory
a1.channels.c1.capacity=
a1.channels.c1.transactionCapacity=

#interceptor: adds a timestamp header, used by the %y%m%d escape in the HDFS path
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=org.apache.flume.interceptor.TimestampInterceptor$Builder
#sinks
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://ns1/flume/%y%m%d
a1.sinks.k1.hdfs.filePrefix=events-
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.rollCount=
a1.sinks.k1.hdfs.rollSize= 
a1.sinks.k1.hdfs.rollInterval=
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
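Here too the blanks are deployment-specific. One plausible fill-in, consistent with the 60-second roll interval mentioned later in this article (bind address and port are assumptions):

```properties
# Hypothetical values for illustration.
# Listen on all interfaces.
a1.sources.r1.bind=0.0.0.0
a1.sources.r1.port=4141

a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100

# Roll a new HDFS file every 60 s; setting rollSize and rollCount
# to 0 disables size- and count-based rolling, so only time applies.
a1.sinks.k1.hdfs.rollInterval=60
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.rollCount=0
```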
           

With the configuration in place, experiments show that writing data into a.txt produces a large number of files, since each file covers a 60-second roll interval. Screenshot below:

[Figure: HDFS file listing produced by the 60 s roll interval]

Note: start agents 2-4 first; otherwise agent1 will fail to connect and report that the remote ports cannot be reached. Then change into the Flume home directory and run:

bin/flume-ng agent -n a1 -c conf/ -f conf/a1.conf -Dflume.root.logger=INFO,console

The agents communicate with each other over Avro RPC; remember to bind ports above 1024, since lower ports require root privileges.
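Since agent1 can only start cleanly once the second tier is listening, a quick pre-flight check can save a failed startup. A minimal sketch using `nc`, with the same hypothetical node2-node4 hosts and port 4141 assumed above:

```shell
# Check that each tier-2 avro source is listening before starting agent1.
# Hosts and ports below are hypothetical placeholders.
for hp in node2:4141 node3:4141 node4:4141; do
  host=${hp%%:*}   # part before the colon
  port=${hp##*:}   # part after the colon
  if nc -z "$host" "$port" 2>/dev/null; then
    echo "$hp is up"
  else
    echo "$hp is NOT reachable -- start that agent first"
  fi
done
```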
