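We have a feature that sends SMS messages. There are 7,000 users and each of them should receive 3 messages. The user reported clicking the button once, yet the batch was sent twice, so every person received 6 messages. This post walks through the investigation.
First, check the nginx logs in Kibana: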
![](https://img.laitimes.com/img/_0nNw4CM6IyYiwiM6ICdiwiI0gTMx81dsQWZ4lmZf1GLlpXazVmcvwFciV2dsQXYtJ3bm9CX9s2RkBnVHFmb1clWvB3MaVnRtp1XlBXe0xCMy81dvRWYoNHLwEzX5xCMx8FesU2cfdGLwMzX0xiRGZkRGZ0Xy9GbvNGLpZTY1EmMZVDUSFTU4VFRR9Fd4VGdsYTMfVmepNHLrJXYtJXZ0F2dvwVZnFWbp1zczV2YvJHctM3cv1Ce-cmbw5CNxIzMxEGO0EzY0U2M5gTNzYzXyUjN1kDM2IzLcBTMyIDMy8CXn9Gbi9CXzV2Zh1WavwVbvNmLvR3YxUjLyM3Lc9CX6MHc0RHaiojIsJye.png)
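Judging from the log above, the user most likely clicked several times: the send times follow no regular pattern, so this does not look like something triggered by the program. In other words, the front end probably did not guard the send-SMS button against repeated clicks.
There was a twist, though. On the SMS provider's platform the delivery records did not line up with the nginx log: by the time nginx printed its log entry, the SMS platform had already started sending the messages.
At this point we suspected that the time in the nginx log was wrong, so we looked at the log format configured in nginx: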
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'"$http_host" "$upstream_response_time" "$request_time" "$request_body"';
access_log /home/logs/nginx/access.log main;
error_log
time_local is the time we see in Kibana.
The official documentation explains it only as follows:
$time_local
local time in the
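The open question is whether this is the time the request arrived or the time nginx finished processing it and wrote the log entry. We found the following English explanation and set out to verify it (the quote writes $local_time, but it refers to $time_local):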
The $local_time variable contains the time when the log entry is written.
When the HTTP request header is read, nginx does a lookup of the associated virtual server configuration. If the virtual server is found, the request goes through six phases:
server rewrite phase
location phase
location rewrite phase (which can bring the request back to the previous phase)
access control phase
try_files phase
log phase
Since the log phase is the last one, $local_time variable is much more close to the end of the request than its start.
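Next, check our own Tomcat log for the SMS-sending requests. In the directory that holds the log, run the following Linux command (the grep pattern is the "SMS sending started" marker that our application writes to the log):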
cat -n catalina.2017-11-06.out | grep "发送短信操作开始" -B 2
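From the output we can see:
The first request was sent at 20:57:27 and the SMS platform received its earliest request at 20:58:00. The backend took 560.417 seconds in total to send the whole batch, and the time shown in Kibana is 21:06:47, which is exactly 20:57:27 + 560.417 s ≈ 21:06:47.
The second request was sent at 21:08:49 and the SMS platform received its earliest request at 21:18:10. The backend took 561.225 seconds to send the whole batch, and the time shown in Kibana is 21:18:10, which is exactly 21:08:49 + 561.225 s ≈ 21:18:10.
The third request was sent at 21:20:11 and the SMS platform received no request for it. The backend finished in 40.743 seconds, and the time shown in Kibana is 21:20:51, which is exactly 21:20:11 + 40.743 s ≈ 21:20:51.
This confirms it: time_local is the time at which the log entry is written, not the time the request arrived.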
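The arithmetic behind that check can be reproduced with a few lines of Python. This is only a sketch: the three tuples are copied from the observations above, the date is taken from the catalina log file name, and the helper names `predicted_log_time` and `drift_seconds` are made up for illustration.

```python
from datetime import datetime, timedelta

# (request start in the Tomcat log, processing seconds, time shown in Kibana)
# Values copied from the three observations in this post.
OBSERVATIONS = [
    ("20:57:27", 560.417, "21:06:47"),
    ("21:08:49", 561.225, "21:18:10"),
    ("21:20:11", 40.743, "21:20:51"),
]

DAY = "2017-11-06"  # date from the catalina log file name


def _parse(clock: str) -> datetime:
    """Turn an HH:MM:SS string into a datetime on the day of the incident."""
    return datetime.strptime(f"{DAY} {clock}", "%Y-%m-%d %H:%M:%S")


def predicted_log_time(start: str, seconds: float) -> datetime:
    """Request start plus processing time, i.e. when the request finished."""
    return _parse(start) + timedelta(seconds=seconds)


def drift_seconds(start: str, seconds: float, logged: str) -> float:
    """Difference between the predicted finish time and the logged time."""
    return (predicted_log_time(start, seconds) - _parse(logged)).total_seconds()


for start, seconds, logged in OBSERVATIONS:
    finish = predicted_log_time(start, seconds).strftime("%H:%M:%S")
    drift = drift_seconds(start, seconds, logged)
    print(f"{start} + {seconds}s -> {finish} (Kibana: {logged}, drift {drift:.3f}s)")
```

In all three cases the drift stays below one second, which is what you would expect if time_local is stamped when the request ends, given that the logged time only has one-second resolution.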
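The other two time fields in the nginx log are explained below:
1. request_time
Official description: request processing time in seconds with a milliseconds resolution; time elapsed between the first bytes were read from the client and the log write after the last bytes were sent to the client.
In other words, it is the time from receiving the first byte of the client's request until the response data has been fully sent. It covers the time to receive the request data, the time the application takes to respond, and the time to write out the response data.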
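Because time_local is stamped at the end of the request and request_time measures its duration, the approximate arrival time can be recovered by subtracting one from the other. The sketch below parses a line in the `main` format shown earlier. The sample line is hypothetical (address, URL, status, and host are invented; only the timestamp and duration mirror the first request in this post), and the function name `approx_request_start` is an assumption for illustration.

```python
import re
from datetime import datetime, timedelta

# Regex for the `main` log_format shown earlier in this post.
LINE_RE = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)" "(?P<http_host>[^"]*)" '
    r'"(?P<upstream_response_time>[^"]*)" "(?P<request_time>[^"]*)" '
    r'"(?P<request_body>.*)"$'
)

# Hypothetical log line: everything except the timestamp and the duration
# is invented for this example.
SAMPLE = (
    '10.0.0.1 - - [06/Nov/2017:21:06:47 +0800] "POST /sms/send HTTP/1.1" '
    '200 15 "-" "Mozilla/5.0" "-" "example.com" "560.417" "560.417" "-"'
)


def approx_request_start(line: str) -> datetime:
    """Estimate when the request arrived: $time_local minus $request_time."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("line does not match the main log_format")
    written = datetime.strptime(match["time_local"], "%d/%b/%Y:%H:%M:%S %z")
    return written - timedelta(seconds=float(match["request_time"]))


print(approx_request_start(SAMPLE).strftime("%H:%M:%S"))
```

For the sample line the estimate lands within a second of 20:57:27, the start time seen in the Tomcat log. The result cannot be more precise than that, because time_local only has one-second resolution.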
2. upstream_response_time
Official description: keeps times of responses obtained from upstream servers; times are kept in seconds with a milliseconds resolution. Several response times are separated by commas and colons like addresses in the $upstream_addr variable.
In other words, it is the time from when nginx opens the connection to the backend (php-cgi, for example) until it has received all the data and closed the connection.
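Since the field can hold several values separated by commas and colons, a log consumer should not assume it is a single number. A minimal sketch of a tolerant parser follows. The function name `parse_upstream_times` is made up for illustration, and skipping "-" assumes that a dash marks an attempt with no recorded time.

```python
import re
from typing import List


def parse_upstream_times(raw: str) -> List[float]:
    """Split an $upstream_response_time value into individual durations.

    Values are separated by commas and colons; "-" entries (no recorded
    time) and empty fragments are skipped.
    """
    times = []
    for part in re.split(r"[,:]", raw):
        part = part.strip()
        if part and part != "-":
            times.append(float(part))
    return times


# A single upstream, then several upstream attempts for one request.
print(parse_upstream_times("560.417"))
print(parse_upstream_times("0.100, 0.250 : 0.050"))
print(parse_upstream_times("-"))
```

Summing the returned list gives the total time spent waiting on upstreams, which can then be compared with request_time to see how much of a slow request was spent in the backend.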