ELK architecture for collecting Nginx access logs
Nginx log format and log variables
Like Apache, Nginx supports a custom output log format. Before defining it, it helps to review a few concepts around obtaining the user's real IP behind multiple proxy layers.
- remote_addr: the client address, with one caveat: without a proxy it is the client's real IP; behind a proxy it is the IP of the nearest upstream proxy. Equivalent to the Apache log variable %a.
- X-Forwarded-For: XFF for short, an HTTP extension header of the form X-Forwarded-For: client, proxy1, proxy2. If an HTTP request passes through three proxies Proxy1, Proxy2, Proxy3 (with IPs IP1, IP2, IP3) before reaching the server, and the user's real IP is IP0, then per the XFF convention the server ultimately receives: X-Forwarded-For: IP0, IP1, IP2
Note that IP3 never appears in X-Forwarded-For; it is exactly what remote_addr holds.
A few more easily confused variables are worth spelling out:
- $remote_addr: behind a proxy this holds the nearest upstream proxy's IP; without a proxy it is the client's real IP. Equivalent to %a in Apache logs.
- $http_x_forwarded_for: the value of the X-Forwarded-For header.
- $proxy_add_x_forwarded_for: the concatenation of $http_x_forwarded_for and $remote_addr.
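To make the relationship between these three variables concrete, here is a small Python sketch (illustrative only; the function and IP labels are not part of the setup) that simulates a request crossing several proxies, each appending the address it saw to X-Forwarded-For:

```python
def simulate(client_ip, proxy_ips):
    """Return (remote_addr, http_x_forwarded_for, proxy_add_x_forwarded_for)
    as seen by the final server behind the given chain of proxies."""
    xff = ""
    prev_hop = client_ip
    for proxy in proxy_ips:
        # each proxy records the address of the hop that connected to it
        xff = f"{xff}, {prev_hop}" if xff else prev_hop
        prev_hop = proxy
    remote_addr = prev_hop   # $remote_addr: the address of the last hop
    http_xff = xff           # $http_x_forwarded_for: the XFF header value
    # $proxy_add_x_forwarded_for: XFF plus $remote_addr
    proxy_add = f"{xff}, {remote_addr}" if xff else remote_addr
    return remote_addr, http_xff, proxy_add

print(simulate("IP0", ["IP1", "IP2", "IP3"]))
# → ('IP3', 'IP0, IP1, IP2', 'IP0, IP1, IP2, IP3')
```

This matches the example above: IP3 is only visible in remote_addr, never in the XFF header.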
The log format defined and referenced by default is main:
[[email protected] nginx]# grep -A 4 'log_format' /etc/nginx/nginx.conf
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
Customizing the Nginx log format
With the meaning of the Nginx log variables clear, we can reshape the output format. Here we again set the Nginx log output to JSON. Only the log-format and log-file definitions from the Nginx configuration file nginx.conf are shown below:
map $http_x_forwarded_for $clientRealIp {       # define the log variable $clientRealIp
    "" $remote_addr;                            # when $http_x_forwarded_for is empty (no proxy), fall back to $remote_addr
    ~^(?P<firstAddr>[0-9\.]+),?.*$ $firstAddr;  # otherwise capture the first IP in $http_x_forwarded_for into $firstAddr, which becomes the value of $clientRealIp
}   # in short, this map block resolves the real client IP into $clientRealIp, which the log format below references; note that map must sit in the http context, before log_format
# The custom Nginx log format follows
[[email protected] ~]# vim /etc/nginx/nginx.conf
log_format nginx_log_json '{"accessip_list":"$proxy_add_x_forwarded_for","client_ip":"$clientRealIp","http_host":"$host","@timestamp":"$time_iso8601","method":"$request_method","url":"$request_uri","status":"$status","http_referer":"$http_referer","body_bytes_sent":"$body_bytes_sent","request_time":"$request_time","http_user_agent":"$http_user_agent","total_bytes_sent":"$bytes_sent","server_ip":"$server_addr"}';
access_log /var/log/nginx/access.log nginx_log_json;
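The first-IP extraction the map block performs can be sanity-checked outside Nginx; this Python sketch (illustrative only, with the regex copied from the map directive above) reproduces its logic:

```python
import re

# Same regex as the map directive: capture the first IP of the XFF chain.
XFF_FIRST = re.compile(r"^(?P<firstAddr>[0-9\.]+),?.*$")

def client_real_ip(http_x_forwarded_for, remote_addr):
    # empty XFF -> fall back to $remote_addr, like the "" branch of the map
    if not http_x_forwarded_for:
        return remote_addr
    m = XFF_FIRST.match(http_x_forwarded_for)
    return m.group("firstAddr") if m else remote_addr

print(client_real_ip("", "192.168.126.91"))                             # → 192.168.126.91
print(client_real_ip("192.168.126.1, 192.168.126.91", "192.168.126.91"))  # → 192.168.126.1
```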
Verify the log output
[[email protected] ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
[[email protected] ~]# systemctl restart nginx
[[email protected] ~]# ifconfig ens32 | awk 'NR==2 {print $2}'
192.168.126.90
Visit http://192.168.126.90 in a browser
Check the Nginx log
[[email protected] ~]# tailf /var/log/nginx/access.log
{"accessip_list":"192.168.126.1","client_ip":"192.168.126.1","http_host":"192.168.126.90","@timestamp":"2021-08-14T22:54:46+08:00","method":"GET","url":"/","status":"200","http_referer":"-","body_bytes_sent":"612","request_time":"0.000","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0","total_bytes_sent":"850","server_ip":"192.168.126.90"}
{"accessip_list":"192.168.126.1","client_ip":"192.168.126.1","http_host":"192.168.126.90","@timestamp":"2021-08-14T22:54:46+08:00","method":"GET","url":"/favicon.ico","status":"404","http_referer":"-","body_bytes_sent":"153","request_time":"0.000","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0","total_bytes_sent":"308","server_ip":"192.168.126.90"}
Put a layer of reverse proxy in front of the Nginx server
[[email protected] ~]# ifconfig ens32 | awk 'NR==2 {print $2}'
192.168.126.91
[[email protected] ~]# vim /etc/httpd/conf/httpd.conf
ProxyPass / http://192.168.126.90
ProxyPassReverse / http://192.168.126.90/
[[email protected] ~]# systemctl restart httpd
Visit http://192.168.126.91 in a browser
Check the Nginx log
[[email protected] ~]# tailf /var/log/nginx/access.log
{"accessip_list":"192.168.126.1, 192.168.126.91","client_ip":"192.168.126.1","http_host":"192.168.126.90","@timestamp":"2021-08-14T23:02:48+08:00","method":"GET","url":"/","status":"200","http_referer":"-","body_bytes_sent":"612","request_time":"0.000","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0","total_bytes_sent":"850","server_ip":"192.168.126.90"}
# accessip_list now contains two IPs: the first is the real client IP and the second is the proxy IP; client_ip is the real client IP
Add a second reverse-proxy layer on top of the first
[[email protected] ~]# ifconfig ens32 | awk 'NR==2 {print $2}'
192.168.126.92
[[email protected] ~]# vim /etc/httpd/conf/httpd.conf
ProxyPass / http://192.168.126.91
ProxyPassReverse / http://192.168.126.91/
[[email protected] ~]# systemctl restart httpd
Visit http://192.168.126.92 in a browser
Check the Nginx log
[[email protected] ~]# tailf /var/log/nginx/access.log
{"accessip_list":"192.168.126.1, 192.168.126.92, 192.168.126.91","client_ip":"192.168.126.1","http_host":"192.168.126.90","@timestamp":"2021-08-14T23:08:11+08:00","method":"GET","url":"/","status":"200","http_referer":"-","body_bytes_sent":"612","request_time":"0.000","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0","total_bytes_sent":"850","server_ip":"192.168.126.90"}
# accessip_list now contains three IPs: the first is the real client IP, the second (192.168.126.92) is the second-layer proxy the client connected to, and the third (192.168.126.91) is the first-layer proxy; client_ip is still the real client IP
These outputs show how client_ip and accessip_list differ: client_ip is always the real client IP address, while accessip_list is the IP chain accumulated through the proxy layers. The first log entry came from visiting http://192.168.126.90 directly with no proxy, the second from visiting http://192.168.126.91 through one proxy layer, and the third from visiting http://192.168.126.92 through two proxy layers.
Obtaining the real client IP in Nginx is this simple and needs no special handling, which also saves a good deal of work when writing the Logstash pipeline file later.
Configuring Filebeat
Filebeat is installed on the Nginx server. The contents of the configured filebeat.yml are:
[[email protected] filebeat]# vim /usr/local/filebeat/filebeat.yml
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/nginx/access.log
fields:
log_topic: nginxlogs
name: "192.168.126.90"
output.kafka:
enabled: true
hosts: ["192.168.126.91:9092", "192.168.126.92:9092", "192.168.126.93:9092"]
version: "0.10"
topic: '%{[fields][log_topic]}'
partition.round_robin:
reachable_only: true
worker: 2
required_acks: 1
compression: gzip
max_message_bytes: 10000000
logging.level: debug
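The topic value above, '%{[fields][log_topic]}', is a Filebeat format string resolved per event against the event's fields, which is how the topic ends up as nginxlogs. As a rough illustration only (this resolver is a hypothetical stand-in, not Filebeat's implementation), the lookup works like this:

```python
import re

def resolve(template, event):
    """Resolve '%{[a][b]}' placeholders against a nested event dict."""
    def lookup(m):
        val = event
        for key in m.group(1).split("]["):
            val = val[key.strip("[]")]  # walk one nesting level per key
        return val
    return re.sub(r"%\{(\[[^}]+\])\}", lookup, template)

print(resolve("%{[fields][log_topic]}", {"fields": {"log_topic": "nginxlogs"}}))
# → nginxlogs
```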
# Start Filebeat
[[email protected] filebeat]# nohup /usr/local/filebeat/filebeat -e -c /usr/local/filebeat/filebeat.yml &
[1] 1056
nohup: ignoring input and appending output to ‘nohup.out’
Start the Kafka + ZooKeeper cluster
/usr/local/zookeeper/bin/zkServer.sh start
nohup /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties &
Visit Nginx in a browser
Check the Nginx access log
[[email protected] filebeat]# tailf /var/log/nginx/access.log
{"accessip_list":"192.168.126.1","client_ip":"192.168.126.1","http_host":"192.168.126.90","@timestamp":"2021-08-15T15:10:23+08:00","method":"GET","url":"/","status":"304","http_referer":"-","body_bytes_sent":"0","request_time":"0.000","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0","total_bytes_sent":"180","server_ip":"192.168.126.90"}
At the same time, verify that Filebeat is collecting the log data
2021-08-15T15:10:32.294+0800 DEBUG [publish] pipeline/processor.go:308 Publish event: {
"@timestamp": "2021-08-15T07:10:32.292Z",
"@metadata": {
"beat": "filebeat",
"type": "doc",
"version": "6.5.4"
},
"offset": 2007,
"message": "{\"accessip_list\":\"192.168.126.1\",\"client_ip\":\"192.168.126.1\",\"http_host\":\"192.168.126.90\",\"@timestamp\":\"2021-08-15T15:10:23+08:00\",\"method\":\"GET\",\"url\":\"/\",\"status\":\"304\",\"http_referer\":\"-\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.000\",\"http_user_agent\":\"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0\",\"total_bytes_sent\":\"180\",\"server_ip\":\"192.168.126.90\"}",
"fields": {
"log_topic": "nginxlogs"
},
"prospector": {
"type": "log"
},
"input": {
"type": "log"
},
"beat": {
"name": "192.168.126.90",
"hostname": "filebeatserver",
"version": "6.5.4"
},
"host": {
"name": "192.168.126.90"
},
"source": "/var/log/nginx/access.log"
}
Verify that the Kafka cluster can consume the messages
[[email protected] ~]# /usr/local/kafka/bin/kafka-console-consumer.sh --zookeeper 192.168.126.91:2181,192.168.126.92:2181,192.168.126.93:2181 --topic nginxlogs
{"@timestamp":"2021-08-15T07:10:32.292Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.5.4","topic":"nginxlogs"},"prospector":{"type":"log"},"input":{"type":"log"},"beat":{"name":"192.168.126.90","hostname":"filebeatserver","version":"6.5.4"},"host":{"name":"192.168.126.90"},"source":"/var/log/nginx/access.log","offset":2007,"message":"{\"accessip_list\":\"192.168.126.1\",\"client_ip\":\"192.168.126.1\",\"http_host\":\"192.168.126.90\",\"@timestamp\":\"2021-08-15T15:10:23+08:00\",\"method\":\"GET\",\"url\":\"/\",\"status\":\"304\",\"http_referer\":\"-\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.000\",\"http_user_agent\":\"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0\",\"total_bytes_sent\":\"180\",\"server_ip\":\"192.168.126.90\"}","fields":{"log_topic":"nginxlogs"}}
The log data is collected correctly at every stage.
Configuring Logstash
Since the log format is already defined in the Nginx output, Logstash does not need to filter or parse the logs. The Logstash pipeline file kafka_nginx_into_es.conf is given below:
[[email protected] ~]# vim /usr/local/logstash/config/kafka_nginx_into_es.conf
input {
kafka {
bootstrap_servers => "192.168.126.91:9092,192.168.126.92:9092,192.168.126.93:9092"
topics => "nginxlogs" #指定輸入源中需要從哪個topic中讀取資料,這裡會自動建立一個名為nginxlogs的topic
group_id => "logstash"
codec => json {
charset => "UTF-8"
}
add_field => { "[@metadata][myid]" => "nginxaccess-log" } # add a tag field used for matching in the filter and output sections below
}
}
filter {
if [@metadata][myid] == "nginxaccess-log" {
mutate {
gsub => ["message","\\x","\\\x"] # "message" holds the log line itself; re-escape the \x byte escapes Nginx emits for non-ASCII characters (e.g. Chinese in URLs) so the json filter can parse them
}
if ('method":"HEAD' in [message]) { # drop the event if it is a HEAD request
drop{}
}
json {
source => "message"
remove_field => "prospector"
remove_field => "beat"
remove_field => "source"
remove_field => "input"
remove_field => "offset"
remove_field => "fields"
remove_field => "host"
remove_field => "@version"
remove_field => "message"
}
}
}
output {
if [@metadata][myid] == "nginxaccess-log" {
elasticsearch {
hosts => ["192.168.126.95:9200","192.168.126.96:9200","192.168.126.97:9200"]
index => "logstash_nginxlogs-%{+YYYY.MM.dd}" # name of the Elasticsearch index for the Nginx logs, used later in Kibana; an index name starting with logstash, followed by an identifier and a date, is recommended
}
}
}
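For reference, the filter section's behavior can be mimicked outside Logstash; this Python sketch (a hypothetical stand-alone rewrite, not Logstash code) drops HEAD requests, parses the message field as JSON, and strips the same Filebeat metadata fields:

```python
import json

# Fields the logstash pipeline removes after parsing "message".
REMOVE = {"prospector", "beat", "source", "input", "offset",
          "fields", "host", "@version", "message"}

def filter_event(event):
    msg = event.get("message", "")
    if 'method":"HEAD' in msg:       # mirrors the drop{} branch
        return None
    event.update(json.loads(msg))    # mirrors the json filter
    return {k: v for k, v in event.items() if k not in REMOVE}

ev = {"message": '{"method":"GET","status":"200"}', "offset": 2007,
      "beat": {"name": "192.168.126.90"}}
print(filter_event(ev))
# → {'method': 'GET', 'status': '200'}
```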
This Logstash pipeline file is very simple and applies no special handling to the log format or logic; it is essentially the same as the one used to collect Apache logs with ELK. With everything configured, start Logstash:
[[email protected] ~]# nohup /usr/local/logstash/bin/logstash -f /usr/local/logstash/config/kafka_nginx_into_es.conf &
[1] 1084
nohup: ignoring input and appending output to ‘nohup.out’
Start the Elasticsearch cluster
su - elasticsearch
/usr/local/elasticsearch/bin/elasticsearch -d
Visit Nginx to generate some logs, and check whether the Elasticsearch cluster creates the corresponding index (index creation can take a little while).
Configuring Kibana
Filebeat ships data from Nginx to Kafka, and Logstash pulls it from Kafka. Once the data arrives correctly in Elasticsearch, we can configure the index in Kibana.
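The index pattern created in Kibana must match the daily indices the output section produces with logstash_nginxlogs-%{+YYYY.MM.dd}. The naming is equivalent to this Python sketch (Logstash derives the date from each event's @timestamp, in UTC):

```python
from datetime import date

def index_name(d):
    """Daily index name matching the logstash output's date pattern."""
    return f"logstash_nginxlogs-{d.strftime('%Y.%m.%d')}"

print(index_name(date(2021, 8, 15)))
# → logstash_nginxlogs-2021.08.15
```

A pattern such as logstash_nginxlogs-* in Kibana would therefore match every day's index.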
[[email protected] ~]# ifconfig | awk 'NR==2 {print $2}'
192.168.126.96
# Start Kibana
[[email protected] ~]# nohup /usr/local/kibana/bin/kibana &
[1] 1495
nohup: ignoring input and appending output to ‘nohup.out’
Visit http://192.168.126.96:5601 in a browser to open Kibana. First create an index pattern: click the Management menu in Kibana's left-hand navigation, select Index Patterns on the right, then click Create index pattern in the upper left.