Big Data Series: Operations and Maintenance (on a Self-Built Big Data Platform)
(8) Flume Operations
- Install and start the Flume component on the master node, then open a Linux shell and run the flume-ng help command to view usage information for flume-ng.
[root@master ~]# flume-ng help
Usage: /usr/flume/apache-flume-1.9.0-bin/bin/flume-ng <command> [options]...
commands:
help display this help text
agent run a Flume agent
avro-client run an avro Flume client
version show Flume version info
global options:
--conf,-c <conf> use configs in <conf> directory
--classpath,-C <cp> append to the classpath
--dryrun,-d do not actually start Flume, just print the command
--plugins-path <dirs> colon-separated list of plugins.d directories. See the
plugins.d section in the user guide for more details.
Default: $FLUME_HOME/plugins.d
-Dproperty=value sets a Java system property value
-Xproperty=value sets a Java -X option
agent options:
--name,-n <name> the name of this agent (required)
--conf-file,-f <file> specify a config file (required if -z missing)
--zkConnString,-z <str> specify the ZooKeeper connection to use (required if -f missing)
--zkBasePath,-p <path> specify the base path in ZooKeeper for agent configs
--no-reload-conf do not reload config file if changed
--help,-h display help text
avro-client options:
--rpcProps,-P <file> RPC client properties file with server connection params
--host,-H <host> hostname to which events will be sent
--port,-p <port> port of the avro source
--dirname <dir> directory to stream to avro source
--filename,-F <file> text file to stream to avro source (default: std input)
--headerFile,-R <file> File containing event headers as key/value pairs on each new line
--help,-h display help text
Either --rpcProps or both --host and --port must be specified.
Note that if <conf> directory is specified, then it is always included first
in the classpath.
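As a quick sanity check after installation, the version subcommand listed above can also be run; on this platform the first line of output should report the 1.9.0 build (exact revision details vary per build):
[root@master ~]# flume-ng version
Flume 1.9.0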
- Based on the provided log-example.conf template, use the Flume NG tool to collect the master node's system log /var/log/secure. Name the collected log files with the prefix "xiandian-sec", store them in the /1daoyun/file/flume directory of the HDFS file system, and round the timestamp used for files generated in HDFS down to 10-minute intervals. After collection, list the contents of /1daoyun/file/flume in HDFS.
[root@master Flume]# vi log-example.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://master:9000/1daoyun/file/flume
a1.sinks.k1.hdfs.filePrefix = xiandian-sec
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
Note: the HDFS directory referenced by a1.sinks.k1.hdfs.path = hdfs://master:9000/1daoyun/file/flume must be created in advance.
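If it does not exist yet, a minimal sketch of creating it with the HDFS shell:
[root@master ~]# hadoop fs -mkdir -p /1daoyun/file/flume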
[root@master Flume]# flume-ng agent --conf . -f /root/tiku/Flume/log-example.conf -n a1 -Dflume.root.logger=INFO,console
After the Flume service starts, the agent stays in the foreground, continuously printing log output to the console.
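If you don't want the agent to occupy the shell, it can also be launched in the background (a standard shell pattern, not specific to Flume; the log file name flume.log here is just an example):
[root@master Flume]# nohup flume-ng agent --conf . -f /root/tiku/Flume/log-example.conf -n a1 > flume.log 2>&1 &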
[root@master ~]# hadoop fs -ls /1daoyun/file/flume
-rw-r--r-- 2 root supergroup 1237 2020-04-02 15:15 /1daoyun/file/flume/xiandian-sec.1585811701465
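To inspect the collected events, the file can be printed directly (the timestamp suffix in the file name will differ on each run):
[root@master ~]# hadoop fs -cat /1daoyun/file/flume/xiandian-sec.1585811701465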
- Based on the provided hdfs-example.conf template, use the Flume NG tool to set the master node's system path /opt/xiandian/ as a directory whose files are uploaded to the HDFS file system in real time, set the HDFS storage path to /data/flume/, keep the uploaded file names unchanged, and set the file type to DataStream; then start the flume-ng agent.
[root@master Flume]# vi hdfs-example.conf
# Name the components on this agent
master.sources = webmagic
master.sinks = k1
master.channels = c1
# Describe/configure the source
master.sources.webmagic.type = spooldir
master.sources.webmagic.fileHeader = true
master.sources.webmagic.fileHeaderKey = fileName
master.sources.webmagic.fileSuffix = .COMPLETED
master.sources.webmagic.deletePolicy = never
master.sources.webmagic.spoolDir = /opt/xiandian/
master.sources.webmagic.ignorePattern = ^$
master.sources.webmagic.consumeOrder = oldest
master.sources.webmagic.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
master.sources.webmagic.batchSize = 5
master.sources.webmagic.channels = c1
# Use a channel which buffers events in memory
master.channels.c1.type = memory
# Describe the sink
master.sinks.k1.type = hdfs
master.sinks.k1.channel = c1
master.sinks.k1.hdfs.path = hdfs://master:9000/data/flume/%{dicName}
master.sinks.k1.hdfs.filePrefix = %{fileName}
master.sinks.k1.hdfs.fileType = DataStream
Check on your own node that the two paths referenced above exist; if not, create them. The spoolDir value (/opt/xiandian/) is a local filesystem path, while the hdfs.path value (hdfs://master:9000/data/flume/%{dicName}) is an HDFS path; see the sketch below.
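A minimal sketch of creating both, using the path names from the config above:
[root@master ~]# mkdir -p /opt/xiandian
[root@master ~]# hadoop fs -mkdir -p /data/flume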
[root@master Flume]# flume-ng agent --conf . -f /root/tiku/Flume/hdfs-example.conf -n master -Dflume.root.logger=INFO,console
The value passed to -n here is the agent name, i.e. the leading component of lines such as master.sources = webmagic (in this case, master).
A quick test:
[root@master ~]# cd /opt/xiandian/
[root@master xiandian]# vi abc
aaa
bbb
ccc
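Once the agent has consumed the file, the spooldir source renames it with the fileSuffix configured above, which can be confirmed locally (a quick check, assuming the agent is still running):
[root@master xiandian]# ls /opt/xiandian/
abc.COMPLETED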
[root@master xiandian]# hadoop fs -put /opt/xiandian/abc.COMPLETED /data/flume
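To verify what the agent itself delivered, the HDFS target directory can also be listed (the resulting file name follows the filePrefix setting, i.e. the original file name):
[root@master ~]# hadoop fs -ls /data/flume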
Now switch back to the agent console: comparing the log timestamps from when the service was started with those after the test shows new activity, confirming that the file was picked up.
Thanks to Xiandian Cloud for providing the question bank.
Thanks to the Apache open-source community for its technology and support.
Thanks to the bloggers 抛物线, mn525520, and 菜鸟一枚2019 for their related posts.