1. Build
Refer to the official guide: http://eagle.apache.org/docs/quick-start.html
1.1 Prerequisites
1. Eagle is currently tested on **JDK 1.7.x**; the current release (v0.4.0) does not support JDK 1.8.
2. **NPM** must be installed (on Mac OS try "brew install node"). Without npm the build fails with:
[INFO] eagle-webservice ................................... FAILURE [03:03 min]
Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:exec (exec-ui-install) on project eagle-webservice: Command execution failed. Process exited with an error: 1 (Exit value: 1)
3. Eagle is built using [Apache Maven](https://maven.apache.org/).
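The three prerequisites above can be checked before starting the build; a minimal sketch (it only looks for the tools on PATH, and the version check is a simple string match):

```shell
# Sketch: verify the Eagle 0.4.0 build prerequisites (JDK 1.7.x, npm, Maven).
missing=""
for tool in java npm mvn; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "Missing tools:$missing"
else
  echo "All build tools present"
fi
# Eagle 0.4.0 is tested on JDK 1.7 only; warn on any other version.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1 | grep -q '1\.7\.' || echo "WARNING: JDK 1.7.x required"
fi
```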
1.2 Download the source and build
http://www-us.apache.org/dist/incubator/eagle/apache-eagle-0.4.0-incubating/apache-eagle-0.4.0-incubating-src.tar.gz
$ tar -zxvf apache-eagle-0.4.0-incubating-src.tar.gz
$ cd apache-eagle-0.4.0-incubating-src
$ curl -O https://patch-diff.githubusercontent.com/raw/apache/incubator-eagle/pull/268.patch
$ git apply 268.patch
$ mvn clean package -DskipTests
2. Installation
2.0 Environment dependencies
- For streaming platform dependencies
- Storm: 0.9.3 or later
# Install Storm and set the STORM_HOME environment variable
JAVA_HOME=/data/jdk1.7.0_79
STORM_HOME=/data/storm
PATH=$PATH:$JAVA_HOME/bin:$STORM_HOME/bin
CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME STORM_HOME CLASSPATH PATH
# Configure Storm; three of these settings are used by Eagle
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
    - "172.17.32.99"
#    - "server2"
nimbus.host: "172.17.32.99"
nimbus.thrift.port: 6627
storm.local.dir: "/var/storm"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
ui.port: 8099
# Start Storm
nohup bin/storm nimbus >> /dev/null &
nohup bin/storm supervisor >> /dev/null &
nohup bin/storm ui >> /dev/null &
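A quick way to confirm the daemons came up is to probe the Nimbus Thrift port (6627) and the UI port (8099, as configured above); a sketch using nc, which reports "closed" when the port is not listening or nc itself is missing:

```shell
# Probe the Storm ports from the configuration above.
for port in 6627 8099; do
  if nc -z localhost "$port" 2>/dev/null; then
    echo "port $port open"
  else
    echo "port $port closed"
  fi
done
```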
- Kafka: 0.8.x or later  # requires ZooKeeper; either a cluster or a single node works; mandatory
- Java: 1.7.x
- NPM (On MAC OS try “brew install node”)
- For database dependencies (choose one of them; installation is required)
- HBase: 0.98 or later (Hadoop 2.6.x is required)
- MySQL (chosen for this guide)
MySQL is used here, so first create the eagle database:
create database eagle;
grant all privileges on eagle.* to 'eagle'@'%' identified by 'eagle';
flush privileges;
2.1 Unpack
$ tar -zxvf apache-eagle-0.4.0-incubating-bin.tar.gz
$ mv apache-eagle-0.4.0-incubating eagle
$ mv eagle /usr/
$ cd /usr/eagle
2.2 Configure conf/eagle-service.conf
eagle {
service {
storage-type="jdbc"
storage-adapter="mysql"
storage-username="eagle"
storage-password="eagle"
storage-database="eagle"
storage-connection-url="jdbc:mysql://hadoop.slave1:3306/eagle"
storage-connection-props="encoding=UTF-8"
storage-driver-class="com.mysql.jdbc.Driver"
storage-connection-max=8
}
}
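A common startup failure is an unreachable or mistyped JDBC URL. The sketch below pulls the MySQL endpoint out of the config for inspection; for illustration it writes the fragment above to a temporary file:

```shell
# Extract the MySQL endpoint from an eagle-service.conf style file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
eagle {
  service {
    storage-connection-url="jdbc:mysql://hadoop.slave1:3306/eagle"
  }
}
EOF
url=$(sed -n 's/.*storage-connection-url="\([^"]*\)".*/\1/p' "$conf")
host_port=${url#jdbc:mysql://}   # strip the scheme
host_port=${host_port%/*}        # strip the database name
echo "MySQL endpoint: $host_port"
rm -f "$conf"
```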
2.3 Configure bin/eagle-env.sh
# set EAGLE_HOME
export EAGLE_HOME=$(dirname $0)/..
# The Java implementation to use; JDK 1.7 or later
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/share/jdk1.7.0_79
# nimbus.host, default is localhost
export EAGLE_NIMBUS_HOST=localhost
# EAGLE_SERVICE_HOST, default is `hostname -f`
export EAGLE_SERVICE_HOST=localhost
# EAGLE_SERVICE_PORT, default is 9099
export EAGLE_SERVICE_PORT=9099
# EAGLE_SERVICE_USER
export EAGLE_SERVICE_USER=admin
# EAGLE_SERVICE_PASSWORD
export EAGLE_SERVICE_PASSWD=secret
export EAGLE_CLASSPATH=$EAGLE_HOME/conf
# Add eagle shared library jars
for file in $EAGLE_HOME/lib/share/*;do
EAGLE_CLASSPATH=$EAGLE_CLASSPATH:$file
done
# Add eagle storm library jars
# Separate out of share directory because of asm version conflict
export EAGLE_STORM_CLASSPATH=$EAGLE_CLASSPATH
for file in $EAGLE_HOME/lib/storm/*;do
EAGLE_STORM_CLASSPATH=$EAGLE_STORM_CLASSPATH:$file
done
2.4 Configure conf/eagle-scheduler.conf, mainly the Storm-related settings (see the environment dependencies above for the actual values)
### scheduler properties
appCommandLoaderEnabled = false
appCommandLoaderIntervalSecs = 1
appHealthCheckIntervalSecs = 5
### execution platform properties
envContextConfig.env = "storm"
envContextConfig.url = "http://hadoop.slave1:8744"  # Storm UI address
envContextConfig.nimbusHost = "hadoop.slave1"  # Storm host; do not use localhost
envContextConfig.nimbusThriftPort = 6627  # Thrift service port
envContextConfig.jarFile = "/usr/eagle/lib/topology/eagle-topology-0.4.0-incubating-assembly.jar"  # actual path of the jar
### default topology properties
eagleProps.mailHost = "mailHost.com"
eagleProps.mailSmtpPort = "25"
eagleProps.mailDebug = "true"
eagleProps.eagleService.host = "localhost"
eagleProps.eagleService.port = 9099
eagleProps.eagleService.username = "admin"
eagleProps.eagleService.password = "secret"
eagleProps.dataJoinPollIntervalSec = 30
dynamicConfigSource.enabled = true
dynamicConfigSource.initDelayMillis = 0
dynamicConfigSource.delayMillis = 30000
$ bin/eagle-service.sh start
Starting eagle service ...
Eagle service started.
Open a browser (use the IP of the host running Eagle): http://192.168.222.136:9099/eagle-service
Log in with username/password admin/secret.
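The same check can be scripted; a sketch with curl, using the host, port, and admin/secret credentials configured above (prints 000 when nothing is listening):

```shell
# Probe the Eagle web service and print the HTTP status code.
if command -v curl >/dev/null 2>&1; then
  code=$(curl -s -o /dev/null -w '%{http_code}' -u admin:secret \
    "http://localhost:9099/eagle-service/" || true)
else
  code="curl-not-installed"
fi
echo "status: ${code:-000}"
```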
3. Streaming HDFS audit log data into Kafka
See the official guide: http://eagle.apache.org/docs/import-hdfs-auditLog.html. It offers two approaches; this guide uses the logstash-kafka approach.
Download and unpack Logstash ahead of time.
3.1 Create a Kafka topic as the streaming input.
$ bin/kafka-topics.sh --create --zookeeper hadoop.master:2181,hadoop.slave1:2181,hadoop.slave2:2181 --replication-factor 1 --partitions 1 --topic sandbox_hdfs_audit_log
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic "sandbox_hdfs_audit_log".
$ bin/kafka-topics.sh --list --zookeeper hadoop.master:2181,hadoop.slave1:2181,hadoop.slave2:2181
sandbox_hdfs_audit_log
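The WARNING printed by kafka-topics.sh means that two topics collide in metric names when they differ only by '.' versus '_'. That rule can be sketched directly (the topic names are just examples):

```shell
# Two topic names collide in Kafka metric names if they are equal
# after replacing every '.' with '_'.
collides() {
  a=$(printf '%s' "$1" | tr '.' '_')
  b=$(printf '%s' "$2" | tr '.' '_')
  [ "$a" = "$b" ]
}
if collides sandbox.hdfs_audit_log sandbox_hdfs_audit_log; then
  echo "collision"
else
  echo "no collision"
fi
```

The sample pair above prints `collision`, which is exactly why the warning recommends sticking to one separator.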
3.2 Install the logstash-kafka plugin
For Logstash 1.5.x, logstash-kafka has been integrated into logstash-input-kafka and logstash-output-kafka and ships with Logstash 1.5, so it can be used directly.
For Logstash 1.4.x, install logstash-kafka first. Note that this version does not support partition_key_format.
This guide uses Logstash 2.4, which already includes the plugin.
3.3 Create a Logstash configuration file under ${LOGSTASH_HOME}/conf
$ pwd
/root/logstash-2.4.0/conf
$ ls
hdfs-audit.conf
$ cat hdfs-audit.conf
input {
file {
type => "hdp-nn-audit"
path => "/var/log/audit/audit.log" # path to the HDFS audit log
start_position => end
sincedb_path => "/var/log/logstash"
}
}
filter{
if [type] == "hdp-nn-audit" {
grok {
match => ["message", "ugi=(?<user>([\w\d\-]+))@|ugi=(?<user>([\w\d\-]+))/[\w\d\-.][email protected]|ugi=(?<user>([\w\d.\-_]+))[\s(]+"]
}
}
}
output {
if [type] == "hdp-nn-audit" {
kafka {
codec => plain {
format => "%{message}"
}
bootstrap_servers => "192.168.222.136:9092" # Kafka broker address
topic_id => "sandbox_hdfs_audit_log" # topic created above
timeout_ms => 10000
retries => 3
client_id => "hdp-nn-audit"
}
# stdout { codec => rubydebug }
}
}
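The grok filter above pulls the user out of the audit log's ugi= field. The same extraction can be sanity-checked from the shell with sed; the sample audit line below is made up for the demonstration:

```shell
# Extract the user from an HDFS audit log line, mirroring the grok pattern.
line='2016-09-20 10:01:02,345 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/192.168.222.136 cmd=listStatus src=/tmp dst=null perm=null'
user=$(printf '%s\n' "$line" | sed -n 's/.*ugi=\([A-Za-z0-9._-]*\).*/\1/p')
echo "user=$user"
```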
3.4 Start Logstash
$ bin/logstash -f conf/hdfs-audit.conf
Settings: Default pipeline workers: 1
Pipeline main started
3.5 Troubleshooting
If log data never reaches Kafka, a likely cause is the Kafka configuration file config/server.properties:
the advertised address must be changed to the host's own IP (newer versions use the settings below).
$ cat server.properties
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = security_protocol://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
advertised.listeners=PLAINTEXT://192.168.222.136:9092  # change to the host's IP
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
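A quick sanity check on advertised.listeners: if the advertised host is empty, localhost, or 127.0.0.1, remote producers such as Logstash cannot reach the broker. A sketch of the check, with the value copied from above:

```shell
# Warn when the advertised listener is not routable from other hosts.
advertised='PLAINTEXT://192.168.222.136:9092'
hostpart=${advertised#*://}   # drop the protocol
hostpart=${hostpart%%:*}      # drop the port
case "$hostpart" in
  ""|localhost|127.0.0.1) echo "WARNING: advertised host '$hostpart' is only reachable locally" ;;
  *) echo "advertised host: $hostpart" ;;
esac
```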
4.1 Initialize the topology
Before initializing, edit the init parameters:
vi bin/eagle-topology-init.sh
# HDFS address
classification.fs.defaultFS=hdfs://hadoop.master:8020
# HBase
classification.hbase.zookeeper.property.clientPort=2181\nclassification.hbase.zookeeper.quorum=localhost
# Hive metastore database
classification.accessType=metastoredb_jdbc\nclassification.password=hive\nclassification.user=hive\nclassification.jdbcDriverClassName=com.mysql.jdbc.Driver\nclassification.jdbcUrl=jdbc:mysql://hadoop.slave1/hive?createDatabaseIfNotExist=true
# Oozie
classification.accessType=oozie_api\nclassification.oozieUrl=http://hadoop.master:11000/oozie\nclassification.filter=status=RUNNING\nclassification.authType=SIMPLE
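The literal `\n` sequences above are intentional: each classification.* bundle is a single string inside eagle-topology-init.sh (part of a JSON payload, where `\n` presumably becomes a newline on the service side). A sketch of how one bundle expands, using printf '%b' to interpret the escapes:

```shell
# Expand a \n-joined property bundle into one property per line.
props='classification.accessType=metastoredb_jdbc\nclassification.user=hive\nclassification.password=hive'
printf '%b\n' "$props"
```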
4.2 Configure conf/sandbox-hdfsAuditLog-application.conf
{
"envContextConfig" : {
"env" : "storm",
"mode" : "cluster",
"topologyName" : "sandbox-hdfsAuditLog-topology",
"stormConfigFile" : "security-auditlog-storm.yaml",
"parallelismConfig" : {
"kafkaMsgConsumer" : 1,
"hdfsAuditLogAlertExecutor*" : 1
}
},
"dataSourceConfig": {
"topic" : "sandbox_hdfs_audit_log", <strong><span style="color:#FF6666;"># 和logstash 建立的topic一緻</span></strong>
"zkConnection" : "hadoop.master:2181,hadoop.slave1:2181,hadoop.slave2:2181", # zk 位址
"brokerZkPath" : "/brokers",
"zkConnectionTimeoutMS" : 15000,
"fetchSize" : 1048586,
"deserializerClass" : "org.apache.eagle.security.auditlog.HdfsAuditLogKafkaDeserializer",
"transactionZKServers" : "hadoop.master,hadoop.slave1,hadoop.slave2",
"transactionZKPort" : 2181,
"transactionZKRoot" : "/consumers",
"consumerGroupId" : "eagle.hdfsaudit.consumer",
"transactionStateUpdateMS" : 2000
},
"alertExecutorConfigs" : {
"hdfsAuditLogAlertExecutor" : {
"parallelism" : 1,
"partitioner" : "org.apache.eagle.policy.DefaultPolicyPartitioner",
"needValidation" : "true"
}
},
"eagleProps" : {
"site" : "sandbox",
"application": "hdfsAuditLog",
"dataJoinPollIntervalSec" : 30,
"mailHost" : "mailHost.com",
"mailSmtpPort":"25",
"mailDebug" : "true",
"eagleService": {
"host": "hadoop.slave1", <strong><span style="color:#FF6666;"> # eagle服務位址,不要寫localhost,這個配置檔案是給storm的worker線程用的</span></strong>
"port": 9099
"username": "admin",
"password": "secret"
}
},
"dynamicConfigSource" : {
"enabled" : true,
"initDelayMillis" : 0,
"delayMillis" : 30000
}
}
4.3 Start the topology
bin/eagle-topology.sh start
By default this runs:
bin/eagle-topology.sh --main org.apache.eagle.security.auditlog.HdfsAuditLogProcessorMain --config conf/sandbox-hdfsAuditLog-application.conf start
To start another topology, specify --main and --config, for example (importing HIVE query logs into the Eagle platform must be set up first):
bin/eagle-topology.sh --main org.apache.eagle.security.hive.jobrunning.HiveJobRunningMonitoringMain --config conf/sandbox-hiveQueryLog-application.conf start