Tengine request_time 和 upstream_response_time 相等的问题排查

最近一段时间我们把 Tengine 从 1.4.6(nginx/1.2.9) 升级到了 Tengine 2.1.0(nginx/1.6.2),发现 request_time 出奇的小,大部分情况下都和 upstream_response_time 相等,特别对于 https 请求,一般要 0.x 秒,但是日志记录的只有 0.0x 秒。

 

搜遍了 Tengine,也没搜到有关系的内容,由于 Tengine 是 fork 的 Nginx,所以在 Nginx 测试一下,发现 nginx/1.6.2 同样有问题,继续测试 nginx/1.2.9 发现没问题。

所以是在 nginx/1.2.9 和 nginx/1.6.2 中间的某个版本引文的问题,实在没辙,只有用二分法找到了出问题的版本:

nginx-1.4.7    NOTOK
nginx-1.4.3    NOTOK
nginx-1.4.0    NOTOK
ngnx-1.3.16   NOTOK
nginx-1.3.15  NOTOK
ngnx-1.3.14   OK
nginx-1.3.9    OK
nginx-1.2.9    OK

nginx-1.3.15 的 changlog:

Changes with nginx 1.3.15 26 Mar 2013 *) Change: opening and closing a connection without sending any data in
it is no longer logged to access_log with error code 400. *) Feature: the ngx_http_spdy_module.
Thanks to Automattic for sponsoring this work.

*) Feature: the “limit_req_status” and “limit_conn_status” directives.
Thanks to Nick Marden.

*) Feature: the “image_filter_interlace” directive.
Thanks to Ian Babrou.

*) Feature: $connections_waiting variable in the
ngx_http_stub_status_module.

*) Feature: the mail proxy module now supports IPv6 backends.

*) Bugfix: request body might be transmitted incorrectly when retrying a
request to the next upstream server; the bug had appeared in 1.3.9.
Thanks to Piotr Sikora.

*) Bugfix: in the “client_body_in_file_only” directive; the bug had
appeared in 1.3.9.

*) Bugfix: responses might hang if subrequests were used and a DNS error
happened during subrequest processing.
Thanks to Lanshun Zhou.

*) Bugfix: in backend usage accounting.

changlog 都没提,擦,搜了一下,搜到了这个:

http://trac.nginx.org/nginx/ticket/356

解释如下:

As of 1.3.15, $request_time calculation was slightly changed due to some optimizations introduced, and for a first request in a connection it no longer includes time between opening a connection and reading a first byte from a client. It still works correctly though.

看样子是从收到请求数据的第一个字节开始算的,而不是按照之前建立连接时的时间(之前是否是建立时间我没真正测过,属猜测),在官方文档中看到 request_time 的说明:

request processing time in seconds with a milliseconds resolution; time elapsed between the first bytes were read from the client and the log write after the last bytes were sent to the client.

官方文档写的是第一个字节。

 

抓个包看看:request_time.pcap

日志中记录:”request_time=0.013″ “upstream_response_time=0.013”

E06FF4CC-110F-4739-B015-DBD55B0D9F75

10.0.23.74 是 Nginx,Nginx 接受到真实的请求数据是在 16,17 便是 Nginx 发送完数据的时间点,两者的时间差正好是 0.013,确认。

 

参考:

http://nginx.org/en/CHANGES-1.4

http://trac.nginx.org/nginx/ticket/356