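While studying memcached a while back, I noticed a very subtle logic bug in its state machine. It is hard to trigger, so I simply filed it in the issue tracker; the fix has since been merged into the memcached mainline.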
<code>on line 3857</code>
<code>if update_event() return false and the connection's bind event is still EV_READ</code>
<code>c->state will be conn_closing and stop will be true</code>
<code>if no more bytes comes, the socket will never be cleanup?</code>
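In other words: at line 3857 of memcached.c, if update_event() returns false while the event registered for the connection is still EV_READ,
c->state becomes conn_closing and stop becomes true.
If no more data then arrives on the socket, the state machine is never re-entered for this connection, so the socket is never cleaned up and leaks.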
Status: Fixed:
<a href="https://code.google.com/p/memcached/issues/detail?id=261&can=1&q=auxtenwpc%40gmail.com" target="_blank">https://code.google.com/p/memcached/issues/detail?id=261&can=1&q=auxtenwpc%40gmail.com</a>
<a href="https://github.com/memcached/memcached/commit/b2734f8321230bd52e36df7f82a6b1d71532e496" target="_blank">https://github.com/memcached/memcached/commit/b2734f8321230bd52e36df7f82a6b1d71532e496</a>
It took two years before anyone paid attention to it, and then it was merged into the mainline :-)
<pre><code>3839         case conn_new_cmd:
3840             /* Only process nreqs at a time to avoid starving other
3841                connections */
3842
3843             --nreqs;
3844             if (nreqs >= 0) {
3845                 reset_cmd_handler(c);
3846             } else {
3847                 pthread_mutex_lock(&c->thread->stats.mutex);
3848                 c->thread->stats.conn_yields++;
3849                 pthread_mutex_unlock(&c->thread->stats.mutex);
3850                 if (c->rbytes > 0) {
3851                     /* We have already read in data into the input buffer,
3852                        so libevent will most likely not signal read events
3853                        on the socket (unless more data is available. As a
3854                        hack we should just put in a request to write data,
3855                        because that should be possible ;-)
3856                     */
3857                     if (!update_event(c, EV_WRITE | EV_PERSIST)) {
3858                         if (settings.verbose > 0)
3859                             fprintf(stderr, "Couldn't update event\n");
3860                         conn_set_state(c, conn_closing);
3861                     }
3862                 }
3863                 stop = true;
3864             }
3865             break;
</code></pre>
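The comment at lines 3851-3856 explains why the code asks for a write event: once the pending requests have been read into the connection's user-space buffer, the kernel has nothing left to report as readable, whereas a socket is almost always writable. The following is a minimal sketch of that observation using plain `poll(2)` on a `socketpair(2)` instead of libevent. The helper names `ready_events` and `drained_socket_is_only_writable` are hypothetical and not part of memcached.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: ask poll(2), without blocking, which of the
 * requested events are currently ready on fd. Returns 0 on error. */
static short ready_events(int fd, short events) {
    struct pollfd p;
    p.fd = fd;
    p.events = events;
    p.revents = 0;
    if (poll(&p, 1, 0) < 0)
        return 0;
    return p.revents;
}

/* Hypothetical demo: returns 1 if, after the "server" end has drained
 * everything into its own buffer, the socket is no longer readable but
 * is still writable; 0 if not; -1 if the setup failed. */
static int drained_socket_is_only_writable(void) {
    static const char pipelined[] = "get a\r\nget b\r\n";
    const ssize_t len = (ssize_t)(sizeof(pipelined) - 1);
    char rbuf[64];   /* plays the role of the connection's input buffer */
    int sv[2];
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* The peer sends two pipelined requests; the server reads them all. */
    if (write(sv[1], pipelined, (size_t)len) == len &&
        read(sv[0], rbuf, sizeof(rbuf)) == len) {
        int readable = (ready_events(sv[0], POLLIN) & POLLIN) != 0;
        int writable = (ready_events(sv[0], POLLOUT) & POLLOUT) != 0;
        result = (!readable && writable) ? 1 : 0;
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

On a typical POSIX system the function should return 1: the unprocessed requests sit only in `rbuf`, so waiting for a read event would stall until the peer sends more, while a write event fires right away and lets the state machine resume.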
<code>so i think there need a patch</code>
<pre><code>--- a/memcached.c
+++ b/memcached.c
@@ -3858,6 +3858,7 @@ static void drive_machine(conn *c) {
                         if (settings.verbose > 0)
                             fprintf(stderr, "Couldn't update event\n");
                         conn_set_state(c, conn_closing);
+                        break;
                     }
                 }
                 stop = true;
</code></pre>
<code>Wow, two years old... and it looks correct to me. If that update_event fails the connection might zombie. It's very hard for that to fail and it's been that way forever.</code>
<code>Pushed a commit for the next release.</code>
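To make the effect of the one-line `break` concrete, here is a heavily simplified model of the loop in `drive_machine()`. It is a sketch under stated assumptions, not memcached's code: the `sim_*` names and `drive_once` are hypothetical, the request quota is assumed to be exhausted already, and `update_event()` is assumed to fail.

```c
#include <stdbool.h>

/* Simplified stand-ins for memcached's connection states. */
enum sim_state { SIM_NEW_CMD, SIM_CLOSING, SIM_CLOSED };

struct sim_conn {
    enum sim_state state;
    int rbytes;              /* bytes already buffered in user space */
};

/* Hypothetical model of a single drive_machine() call.
 * with_fix selects whether the patched `break` is present. */
static enum sim_state drive_once(struct sim_conn *c, bool with_fix) {
    bool stop = false;
    const bool update_event_ok = false;  /* assumption: the rare failure */

    while (!stop) {
        switch (c->state) {
        case SIM_NEW_CMD:
            if (c->rbytes > 0) {
                if (!update_event_ok) {
                    c->state = SIM_CLOSING;
                    if (with_fix)
                        break;   /* leaves the switch only; loop runs again */
                }
            }
            stop = true;         /* unpatched path: exits while still CLOSING */
            break;
        case SIM_CLOSING:
            c->state = SIM_CLOSED;   /* stands in for closing the socket */
            stop = true;
            break;
        case SIM_CLOSED:
            stop = true;
            break;
        }
    }
    return c->state;
}
```

Without the fix, the call returns with the connection still in the closing state, and nothing will run the state machine again unless another event arrives on the socket. With the fix, `stop` stays false, the loop iterates once more, and the closing state is handled in the same call.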
This article is reposted from xjtuhit's 51CTO blog. Original link: http://blog.51cto.com/51reboot/1560493