
Apache Eagle: Complete Installation and Configuration Guide

1. Build (reference: the official quick start at http://eagle.apache.org/docs/quick-start.html)

  1.1 Prerequisites

1. Eagle is currently tested on **JDK 1.7.x**; as of v0.4.0 it does not support JDK 1.8.
2. **NPM** must be installed (on Mac OS try "brew install node"); otherwise the build fails with:
   [INFO] eagle-webservice ................................... FAILURE [03:03 min]
   Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:exec (exec-ui-install) on project eagle-webservice: Command execution failed. Process exited with an error: 1 (Exit value: 1)
3. Eagle is built using [Apache Maven](https://maven.apache.org/).
           

1.2 Download the source and build: http://www-us.apache.org/dist/incubator/eagle/apache-eagle-0.4.0-incubating/apache-eagle-0.4.0-incubating-src.tar.gz

$ tar -zxvf apache-eagle-0.4.0-incubating-src.tar.gz
$ cd apache-eagle-0.4.0-incubating-src 
$ curl -O https://patch-diff.githubusercontent.com/raw/apache/incubator-eagle/pull/268.patch
$ git apply 268.patch
$ mvn clean package -DskipTests
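If the build succeeds, the binary tarball used in section 2 is produced by the assembly module; on 0.4.0 it should land under eagle-assembly/target (the exact path is an assumption, check your own build output):

$ ls eagle-assembly/target/*-bin.tar.gz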
           
2. Install

2.0 Environment dependencies

         
  • For streaming platform dependencies
    • Storm: 0.9.3 or later  
          # Install Storm and set the STORM_HOME environment variable
           

JAVA_HOME=/data/jdk1.7.0_79
STORM_HOME=/data/storm
PATH=$PATH:$JAVA_HOME/bin:$STORM_HOME/bin
CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME
export STORM_HOME
export CLASSPATH
export PATH

# Configure Storm; three of these settings are used by Eagle (nimbus.host, nimbus.thrift.port, ui.port)

########### These MUST be filled in for a storm configuration
 storm.zookeeper.servers:
     - "172.17.32.99"
#     - "server2"
# 
 nimbus.host: "172.17.32.99" 
 nimbus.thrift.port: 6627
 storm.local.dir: "/var/storm" 
 supervisor.slots.ports: 
     - 6700 
     - 6701 
     - 6702 
     - 6703 
 ui.port: 8099  
           

# Start Storm

nohup bin/storm nimbus >> /dev/null &
nohup bin/storm supervisor >> /dev/null &  
nohup bin/storm ui >> /dev/null &  
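As a quick sanity check (a suggestion, not from the official docs), confirm nimbus answers and the UI is reachable on the ui.port configured above:

bin/storm list          # should print an empty topology list without errors
curl -s http://172.17.32.99:8099 >/dev/null && echo "storm ui ok"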
           
    • Kafka: 0.8.x or later  # depends on ZooKeeper (cluster or standalone); required
    • Java: 1.7.x
    • NPM (On MAC OS try “brew install node”)
  • For database dependencies (choose one of them)
    • HBase: 0.98 or later
      • Hadoop: 2.6.x is required
    • MySQL (chosen for this guide)
      • Installation is required
We use MySQL here; first create the eagle database and account:
create database eagle;
grant all privileges on eagle.* to 'eagle'@'%' identified by 'eagle';
flush privileges;
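To verify the account before going further, connect from the Eagle host (hadoop.slave1 is the MySQL host used later in eagle-service.conf; the table list stays empty until the Eagle service first starts):

mysql -h hadoop.slave1 -u eagle -peagle -e 'use eagle; show tables;'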
           
2.1 Unpack

$ tar -zxvf apache-eagle-0.4.0-incubating-bin.tar.gz
$ mv apache-eagle-0.4.0-incubating eagle
$ mv eagle /usr/
$ cd /usr/eagle
           
2.2 Configure conf/eagle-service.conf

           
eagle {
    service {
        storage-type="jdbc"
        storage-adapter="mysql"
        storage-username="eagle"
        storage-password="eagle"
        storage-database="eagle"
        storage-connection-url="jdbc:mysql://hadoop.slave1:3306/eagle"
        storage-connection-props="encoding=UTF-8"
        storage-driver-class="com.mysql.jdbc.Driver"
        storage-connection-max=8
    }
}
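Note: the Eagle binary may not bundle the MySQL JDBC driver. If the service fails to start with ClassNotFoundException: com.mysql.jdbc.Driver, copy the connector jar into lib/share, which eagle-env.sh (section 2.3) adds to the classpath; the jar version below is just an example:

cp mysql-connector-java-5.1.38.jar /usr/eagle/lib/share/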
           

2.3 Configure bin/eagle-env.sh

# set EAGLE_HOME
export EAGLE_HOME=$(dirname $0)/..

# The java implementation to use. please use jdk 1.7 or later
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/share/jdk1.7.0_79

# nimbus.host, default is localhost
export EAGLE_NIMBUS_HOST=localhost

# EAGLE_SERVICE_HOST, default is `hostname -f`
export EAGLE_SERVICE_HOST=localhost

# EAGLE_SERVICE_PORT, default is 9099
export EAGLE_SERVICE_PORT=9099

# EAGLE_SERVICE_USER
export EAGLE_SERVICE_USER=admin

# EAGLE_SERVICE_PASSWORD
export EAGLE_SERVICE_PASSWD=secret

export EAGLE_CLASSPATH=$EAGLE_HOME/conf
# Add eagle shared library jars
for file in $EAGLE_HOME/lib/share/*;do
    EAGLE_CLASSPATH=$EAGLE_CLASSPATH:$file
done

# Add eagle storm library jars
# Separate out of share directory because of asm version conflict
export EAGLE_STORM_CLASSPATH=$EAGLE_CLASSPATH
for file in $EAGLE_HOME/lib/storm/*;do
    EAGLE_STORM_CLASSPATH=$EAGLE_STORM_CLASSPATH:$file
done
           

2.4 Configure conf/eagle-scheduler.conf, mainly the Storm-related settings (use the values described in the environment dependencies section above)

### scheduler properties
appCommandLoaderEnabled = false
appCommandLoaderIntervalSecs = 1
appHealthCheckIntervalSecs = 5

### execution platform properties
envContextConfig.env = "storm"
envContextConfig.url = "http://hadoop.slave1:8744"    # Storm UI address
envContextConfig.nimbusHost = "hadoop.slave1"         # Storm nimbus host; do not use localhost
envContextConfig.nimbusThriftPort = 6627              # Thrift service port
envContextConfig.jarFile = "/usr/eagle/lib/topology/eagle-topology-0.4.0-incubating-assembly.jar"  # actual path of the topology jar

### default topology properties
eagleProps.mailHost = "mailHost.com"
eagleProps.mailSmtpPort = "25"
eagleProps.mailDebug = "true"
eagleProps.eagleService.host = "localhost"       
eagleProps.eagleService.port = 9099
eagleProps.eagleService.username = "admin"       
eagleProps.eagleService.password = "secret"
eagleProps.dataJoinPollIntervalSec = 30

dynamicConfigSource.enabled = true
dynamicConfigSource.initDelayMillis = 0
dynamicConfigSource.delayMillis = 30000
           
# bin/eagle-service.sh start
Starting eagle service ...
Eagle service started.
           
Open http://192.168.222.136:9099/eagle-service in a browser (the IP is the Eagle host's).
It should be accessible with username/password: admin/secret.
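You can also check the service from the command line (admin/secret are the credentials set in bin/eagle-env.sh; this simply fetches the UI page):

curl -s -u admin:secret http://192.168.222.136:9099/eagle-service/ | head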

      
3. How to stream HDFS log data into Kafka
       
See the official docs: http://eagle.apache.org/docs/import-hdfs-auditLog.html. Two approaches are described there; this guide uses the logstash-kafka approach.
Download and unpack Logstash in advance.
      
3.1  Create a Kafka topic as the streaming input.      
# bin/kafka-topics.sh --create --zookeeper hadoop.master:2181,hadoop.slave1:2181,hadoop.slave2:2181 --replication-factor 1 --partitions 1 --topic sandbox_hdfs_audit_log
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic "sandbox_hdfs_audit_log".
# bin/kafka-topics.sh --list --zookeeper hadoop.master:2181,hadoop.slave1:2181,hadoop.slave2:2181
sandbox_hdfs_audit_log
           
3.2 Install Logstash-kafka plugin      
For Logstash 1.5.x, logstash-kafka has been integrated into logstash-input-kafka and logstash-output-kafka and released with Logstash 1.5, so you can use it directly.
For Logstash 1.4.x, you must install logstash-kafka first. Note that this version does not support partition_key_format.
           
This guide uses Logstash 2.4, which already includes the plugin.

3.3 Create a Logstash configuration file under ${LOGSTASH_HOME}/conf
# pwd
/root/logstash-2.4.0/conf
# ls
hdfs-audit.conf
# cat hdfs-audit.conf
 input {
      file {
           type => "hdp-nn-audit"
           path => "/var/log/audit/audit.log"      # path to the HDFS audit log
           start_position => "end"
           sincedb_path => "/var/log/logstash"
      }
  }

  filter{
      if [type] == "hdp-nn-audit" {
         grok {
             match => ["message", "ugi=(?<user>([\w\d\-]+))@|ugi=(?<user>([\w\d\-]+))/[\w\d\-.]+@|ugi=(?<user>([\w\d.\-_]+))[\s(]+"]
         }
      }
  }

  output {
      if [type] == "hdp-nn-audit" {
          kafka {
              codec => plain {
                  format => "%{message}"
              }
              bootstrap_servers => "192.168.222.136:9092"    # Kafka broker address
              topic_id => "sandbox_hdfs_audit_log"           # the topic created above
              timeout_ms => 10000
              retries => 3
              client_id => "hdp-nn-audit"
          }
          # stdout { codec => rubydebug }
      }
  }
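Before starting, you can have Logstash validate the file (the --configtest flag is available in Logstash 2.x; it prints "Configuration OK" on success):

bin/logstash -f conf/hdfs-audit.conf --configtest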
           
3.4 Start Logstash
# bin/logstash -f conf/hdfs-audit.conf
Settings: Default pipeline workers: 1
Pipeline main started
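To confirm events are actually reaching Kafka, tail the topic with the console consumer from the Kafka directory (old-consumer syntax, which matches Kafka 0.8/0.9):

bin/kafka-console-consumer.sh --zookeeper hadoop.master:2181 --topic sandbox_hdfs_audit_log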
           
3.5 Troubleshooting: if no log data shows up in Kafka, a likely cause is the Kafka config file config/server.properties. Change the setting below to your own host's IP (in newer Kafka versions the setting looks like this):
# cat config/server.properties

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = security_protocol://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092

# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
advertised.listeners=PLAINTEXT://192.168.222.136:9092        # change to the host's IP

# The number of threads handling network requests
num.network.threads=3

# The number of threads doing disk I/O
num.io.threads=8
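After editing server.properties, restart the broker so the new advertised.listeners value takes effect (run from the Kafka installation directory):

bin/kafka-server-stop.sh
bin/kafka-server-start.sh -daemon config/server.properties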
           
4. Initialize and start the topology

4.1 Initialize the topology

Before initializing, first edit the initialization parameters:

vi bin/eagle-topology-init.sh
# HDFS address
classification.fs.defaultFS=hdfs://hadoop.master:8020
classification.hbase.zookeeper.property.clientPort=2181\nclassification.hbase.zookeeper.quorum=localhost
# Hive metastore database
classification.accessType=metastoredb_jdbc\nclassification.password=hive\nclassification.user=hive\nclassification.jdbcDriverClassName=com.mysql.jdbc.Driver\nclassification.jdbcUrl=jdbc:mysql://hadoop.slave1/hive?createDatabaseIfNotExist=true
classification.accessType=oozie_api\nclassification.oozieUrl=http://hadoop.master:11000/oozie\nclassification.filter=status=RUNNING\nclassification.authType=SIMPLE
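With the parameters adjusted, run the script so it registers the site and classification metadata with the running Eagle service:

bin/eagle-topology-init.sh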
           
4.2 Edit conf/sandbox-hdfsAuditLog-application.conf (used when starting the topology in 4.3):

{
  "envContextConfig" : {
    "env" : "storm",
    "mode" : "cluster",
    "topologyName" : "sandbox-hdfsAuditLog-topology",
    "stormConfigFile" : "security-auditlog-storm.yaml",
    "parallelismConfig" : {
      "kafkaMsgConsumer" : 1,
      "hdfsAuditLogAlertExecutor*" : 1
    }
  },
  "dataSourceConfig": {
    "topic" : "sandbox_hdfs_audit_log",                    <strong><span style="color:#FF6666;"># 和logstash 建立的topic一緻</span></strong>
    "zkConnection" : "hadoop.master:2181,hadoop.slave1:2181,hadoop.slave2:2181",   # zk 位址
    "brokerZkPath" : "/brokers",
    "zkConnectionTimeoutMS" : 15000,
    "fetchSize" : 1048586,
    "deserializerClass" : "org.apache.eagle.security.auditlog.HdfsAuditLogKafkaDeserializer",
    "transactionZKServers" : "hadoop.master,hadoop.slave1,hadoop.slave2",
    "transactionZKPort" : 2181,
    "transactionZKRoot" : "/consumers",
    "consumerGroupId" : "eagle.hdfsaudit.consumer",
    "transactionStateUpdateMS" : 2000
  },
  "alertExecutorConfigs" : {
     "hdfsAuditLogAlertExecutor" : {
       "parallelism" : 1,
       "partitioner" : "org.apache.eagle.policy.DefaultPolicyPartitioner",
       "needValidation" : "true"
     }
  },
  "eagleProps" : {
    "site" : "sandbox",
    "application": "hdfsAuditLog",
      "dataJoinPollIntervalSec" : 30,
    "mailHost" : "mailHost.com",
    "mailSmtpPort":"25",
    "mailDebug" : "true",
    "eagleService": {
      "host": "hadoop.slave1",  <strong><span style="color:#FF6666;"> # eagle服務位址,不要寫localhost,這個配置檔案是給storm的worker線程用的</span></strong>
      "port": 9099
      "username": "admin",
      "password": "secret"
    }
  },
  "dynamicConfigSource" : {
      "enabled" : true,
      "initDelayMillis" : 0,
      "delayMillis" : 30000
  }
}
           
4.3 Start the topology

bin/eagle-topology.sh start

The default start is equivalent to:

bin/eagle-topology.sh --main org.apache.eagle.security.auditlog.HdfsAuditLogProcessorMain --config conf/sandbox-hdfsAuditLog-application.conf start

              
To start other topologies, specify the main class and config, as below (Hive query logs must already be streaming into the Eagle platform):

bin/eagle-topology.sh --main org.apache.eagle.security.hive.jobrunning.HiveJobRunningMonitoringMain --config conf/sandbox-hiveQueryLog-application.conf start
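To confirm a topology was actually submitted, list the running topologies through Storm (storm list is a stock Storm command; the name below comes from topologyName in the application config):

$STORM_HOME/bin/storm list     # sandbox-hdfsAuditLog-topology should show ACTIVE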