Nginx produces two logs, access.log and error.log; the access.log format can be customized. Handle these two logs well and the health of your service becomes clear at a glance. Beyond that, visual analysis of key log metrics is, in my view, the single most important thing in operations, bar none.
For the visualization and monitoring that follow, the core idea is one word: converge. The tool is Grafana: put only the most useful key metrics on the dashboard. One layout I designed for production is shown below. Because the log volume is huge, 200 OK lines are rarely worth reading, so filebeat is configured to collect only abnormal logs, defined as a status code other than 200 or a response time above 0.5 seconds. A few points to keep in mind:
1. Build the ES index on the log's own timestamp, which makes analysis and troubleshooting easier;
2. Convert the numeric fields from strings to numeric types so they can be used in calculations;
3. The dashboard should cover three dimensions: quality, performance, and errors;
4. For easier aggregation, split the URL at the question mark to extract the API path. For example, /v1/feedback/check_regid_valid?regId=DmuZXBoN4nRulcjmZczGw1fZLya151h7%2BtY0%2FgoM6qwllq%2BR723oL8fK9Wahdxee&packageName=com.tencent.news&appId=2882303761517336368 becomes /v1/feedback/check_regid_valid. The trailing parameters may or may not be processed; a later section shows how to split them further into k/v pairs;
5. Grafana should offer dynamic selectors for business type, data center, host, response time, and so on, to support drill-down analysis.
![](https://img.laitimes.com/img/9ZDMuAjOiMmIsIjOiQnIsISPrdEZwZ1Rh5WNXp1bwNjW1ZUba9VZwlHdsATOfd3bkFGazxCMx8VesATMfhHLlN3XnxCMwEzX0xiRGZkRGZ0Xy9GbvNGLpZTY1EmMZVDUSFTU4VFRR9Fd4VGdsYTMfVmepNHLrJXYtJXZ0F2dvwVZnFWbp1zczV2YvJHctM3cv1Ce-cGcq5SOwgTNzIjN5QTM5gzN1UTMvwVNxUDM5EDMy8CXzV2Zh1WavwVbvNmLvR3YxUjLxM3Lc9CX6MHc0RHaiojIsJye.jpg)
For access.log
The raw log looks like this (I use a backtick as the field separator):
#access.log
10.132.10.29`api.xmpush.xiaomi.com`103.107.217.171`-`14/May/2019:00:00:01 +0800`/v1/feedback/check_regid_valid?regId=46quGCb9wT89VV2XzUW89bORMmralBlKriPnZbeAmIzF2nABHjLJKCI8%2FF0InyHR&packageName=com.smile.gifmaker&appId=2880376151713534`325`332`0.009`200`10.132.50.2:9085`-`0.009`GET`okhttp/3.12.1`200`lT666Qlxdq6G6iUqt/G3FrQ==
The logstash cleansing rules are as follows:
filter {
  ruby {
    init => "@kname = ['server_addr','domain','remote_addr','http_x_forwarded_for','time_local','request_uri','request_length','bytes_sent','request_time','status','upstream_addr','upstream_cache_status','upstream_response_time','request_method','http_user_agent','upstream_status','key']"
    # note: event['message'] works on older Logstash; on 5.x and later use event.get/event.set
    code => "event.append(Hash[@kname.zip(event['message'].split('`'))])"
  }
  # split the API path off the query string
  if [request_uri] {
    grok {
      match => ["request_uri","%{DATA:api}(\?%{DATA:args})?$"]
    }
  }
  # index on the log's own timestamp, and drop unused fields
  date {
    match => ["time_local", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
    remove_field => ["time_local","message","path","offset","prospector","[fields][ttopic]","[beat][version]","[beat][name]"]
  }
  # convert string fields to numeric types
  mutate {
    convert => [
      "request_length" , "integer",
      "status" , "integer",
      "upstream_response_time" , "float",
      "request_time" , "float",
      "bytes_sent" , "integer" ]
  }
}
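The ruby + grok + mutate filters above can be mirrored in Python to see what a cleansed event looks like (an illustrative sketch, not part of the pipeline):

```python
# Split the backtick-delimited line, zip it with the field names,
# cast the numeric fields, and split the API path off the query string.
FIELDS = ['server_addr', 'domain', 'remote_addr', 'http_x_forwarded_for',
          'time_local', 'request_uri', 'request_length', 'bytes_sent',
          'request_time', 'status', 'upstream_addr', 'upstream_cache_status',
          'upstream_response_time', 'request_method', 'http_user_agent',
          'upstream_status', 'key']

INT_FIELDS = {'request_length', 'bytes_sent', 'status'}
FLOAT_FIELDS = {'request_time', 'upstream_response_time'}

def _num(value, cast):
    try:
        return cast(value)
    except ValueError:  # e.g. '-' when nginx has no value for the field
        return None

def parse_access_line(line):
    event = dict(zip(FIELDS, line.split('`')))
    for f in INT_FIELDS:
        event[f] = _num(event.get(f, '-'), int)
    for f in FLOAT_FIELDS:
        event[f] = _num(event.get(f, '-'), float)
    # like the grok rule: API path before '?', raw args after it
    event['api'], _, event['args'] = event.get('request_uri', '').partition('?')
    return event
```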
output {
  # double-check here that only abnormal logs are indexed
  if [status] != 200 or [request_time] > 1 {
    elasticsearch {
      hosts => ["1.1.1.1:9220","2.2.2.2:9220","3.3.3.3:9220"]
      #workers => 4
      index => "logstash-im-push-mt-nginx-access-%{+YYYY.MM.dd.hh}"
    }
  }
  # a few hosts ship full logs, as a sample for understanding the overall picture
  else if [beat][hostname] == 'c4-hostname01.bj' or [beat][hostname] == 'c4-hostname02.bj' {
    elasticsearch {
      hosts => ["1.1.1.1:9220","2.2.2.2:9220","3.3.3.3:9220"]
      #workers => 4
      index => "logstash-im-push-mt-nginx-access-%{+YYYY.MM.dd.hh}"
    }
  }
  #stdout { codec => rubydebug }
}
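The routing logic in the output block reduces to a single predicate, sketched here in Python for clarity (the hostnames are the ones from the config above):

```python
# An event is shipped to Elasticsearch if it is "abnormal" (non-200 or
# slow), or if it comes from one of the hosts that sample full logs.
SAMPLING_HOSTS = {'c4-hostname01.bj', 'c4-hostname02.bj'}

def should_index(status, request_time, hostname):
    if status != 200 or request_time > 1:
        return True
    return hostname in SAMPLING_HOSTS
```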
A screenshot after cleansing is shown below; it meets the requirements.
For error.log
The raw log looks like this:
#error.log comes in many forms; only one line is shown here, and the regexes below match all of them
2019/05/14 10:28:16 [error] 13655#0: *3199 connect() failed (111: Connection refused) while connecting to upstream, client: 47.111.97.155, server: api.xmpush.xiaomi.com, request: "POST /v2/message/regid HTTP/1.1", upstream: "http://10.132.28.28:9085/v2/message/regid", host: "api.xmpush.xiaomi.com"
The logstash cleansing rules are as follows:
#Based on the patterns in the error log, two regexes cover all cases. I have forgotten which expert I borrowed the regex from; my thanks to them. To guard against unexpected formats, I keep the original message field.
filter {
  grok {
    match => [
      "message", "(?<time>\d{4}/\d{2}/\d{2}\s{1,}\d{2}:\d{2}:\d{2})\s{1,}\[%{DATA:err_severity}\]\s{1,}(%{NUMBER:pid:int}#%{NUMBER}:\s{1,}\*%{NUMBER}|\*%{NUMBER}) %{DATA:err_message}(?:,\s{1,}client:\s{1,}(?<client_ip>%{IP}|%{HOSTNAME}))(?:,\s{1,}server:\s{1,}%{IPORHOST:server})(?:, request: %{QS:request})?(?:, host: %{QS:host})?(?:, referrer: \"%{URI:referrer})?",
      "message", "(?<time>\d{4}/\d{2}/\d{2}\s{1,}\d{2}:\d{2}:\d{2})\s{1,}\[%{DATA:err_severity}\]\s{1,}%{GREEDYDATA:err_message}"]
  }
  date {
    match => ["time","yyyy/MM/dd HH:mm:ss"]
    target => "@timestamp"
    remove_field => ["time_local","time","path","offset","prospector","[fields][ttopic]","[beat][version]","[beat][name]"]
  }
}
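To see what the first grok pattern extracts, here is an approximate Python equivalent, reduced to the main fields (an illustration only; the second, catch-all pattern is the fallback):

```python
import re

# Roughly mirrors the first grok pattern: timestamp, severity,
# pid/connection ids, message, then the trailing client/server fields.
ERR_MAIN = re.compile(
    r'(?P<time>\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2})\s+'
    r'\[(?P<err_severity>\w+)\]\s+'
    r'(?:\d+#\d+:\s+\*\d+|\*\d+)\s'
    r'(?P<err_message>.*?)'
    r',\s+client:\s+(?P<client_ip>[\d.]+)'
    r',\s+server:\s+(?P<server>[^,]+)'
)

line = ('2019/05/14 10:28:16 [error] 13655#0: *3199 connect() failed '
        '(111: Connection refused) while connecting to upstream, '
        'client: 47.111.97.155, server: api.xmpush.xiaomi.com, '
        'request: "POST /v2/message/regid HTTP/1.1", '
        'host: "api.xmpush.xiaomi.com"')
m = ERR_MAIN.search(line)
```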
A screenshot after cleansing is shown below.
Extended configuration
For the args split out earlier, if the business needs to analyze them, the k/v pairs should be separated further. For example, a=111&b=2222 should land in ES with a as the key and the number as the value. This can be done with the following logstash configuration:
if [args] {
  # many URLs arrive percent-encoded; the encoding is unreadable, so decode it
  urldecode {
    field => "args"
  }
  # split args into k/v pairs, using '&' as the separator
  kv {
    source => "args"
    field_split => "&"
  }
}
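The urldecode + kv pair of filters can be sketched in Python (again illustrative, mirroring the same order: decode first, then split):

```python
from urllib.parse import unquote

def split_args(args):
    decoded = unquote(args)  # mirrors the urldecode filter
    # mirrors the kv filter: split on '&', then on the first '='
    return dict(p.split('=', 1) for p in decoded.split('&') if '=' in p)
```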
That's it. I hope this foundation helps you build your own nginx visual monitoring.
For more, visit the author's original site: 運維網咖社 (www.net-add.com)
Reprinted from: https://blog.51cto.com/benpaozhe/2395107