The right way to avoid the AutoReconnect exception in pymongo

    • Origin of the problem
    • Approaches
    • Solution
    • Final solution

Origin of the problem

Running the following code on Windows triggers the problem. If you are not on Windows, you can stop reading here.

import time

from pymongo import MongoClient

mongodb_setting = dict(
    host='127.0.0.1',
    port=27017,
    username='root',
    password='root',
    authSource='admin',
)
database_name = 'test'
db = MongoClient(**mongodb_setting)[database_name]
table = db['test_table']


def get_some_info():
    now = time.time()
    table.find_one({})
    print(time.time() - now)


def do_something():
    get_some_info()  # first query
    time.sleep(600)  # do something else for ten minutes
    get_some_info()  # second query


do_something()

           

On the second query, the exception pymongo.errors.AutoReconnect is raised.

The official documentation describes it as follows:

exception pymongo.errors.AutoReconnect(message='', errors=None) Raised when a connection to the database is lost and an attempt to auto-reconnect will be made. In order to auto-reconnect you must handle this exception, recognizing that the operation which caused it has not necessarily succeeded. Future operations will attempt to open a new connection to the database (and will continue to raise this exception until the first successful connection is made).

In short, pymongo will reconnect to MongoDB automatically, but we have to handle this exception ourselves.

I still do not understand why we have to handle the exception if the driver reconnects automatically anyway. If anyone knows, please enlighten me!

Debugging shows that the exception is raised at line 262 of pool.py:

def _raise_connection_failure(address, error, msg_prefix=None):
    """Convert a socket.error to ConnectionFailure and raise it."""
    host, port = address
    # If connecting to a Unix socket, port will be None.
    if port is not None:
        msg = '%s:%d: %s' % (host, port, error)
    else:
        msg = '%s: %s' % (host, error)
    if msg_prefix:
        msg = msg_prefix + msg
    if isinstance(error, socket.timeout):
        raise NetworkTimeout(msg)
    elif isinstance(error, SSLError) and 'timed out' in str(error):
        # CPython 2.6, 2.7, PyPy 2.x, and PyPy3 do not distinguish network
        # timeouts from other SSLErrors (https://bugs.python.org/issue10272).
        # Luckily, we can work around this limitation because the phrase
        # 'timed out' appears in all the timeout related SSLErrors raised
        # on the above platforms. CPython >= 3.2 and PyPy3.3 correctly raise
        # socket.timeout.
        raise NetworkTimeout(msg)
    else:
        raise AutoReconnect(msg)
           

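Approaches

  1. Do exactly what the official documentation says: catch AutoReconnect and issue the same request again. This is a lot of work, since it basically means wrapping every method, such as insert_one() and find_one(). (This is my own understanding; if there is a better way, please let me know. Thanks!)
  2. A digression: when catching AutoReconnect as in approach 1, you have to sit through a 20 s wait before the exception is raised each time. For example, find_one() only raises AutoReconnect after 20 s; we then handle the exception and run find_one() again, which takes about 0.020 s. A single query therefore takes more than 20 seconds, which is painful. The initializer of class MongoClient in mongo_client.py documents the timeout options:
          - `connectTimeoutMS`: (integer or None) Controls how long (in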
            milliseconds) the driver will wait during server monitoring when
            connecting a new socket to a server before concluding the server
            is unavailable. Defaults to ``20000`` (20 seconds).
          - `serverSelectionTimeoutMS`: (integer) Controls how long (in
            milliseconds) the driver will wait to find an available,
            appropriate server to carry out a database operation; while it is
            waiting, multiple server monitoring operations may be carried out,
            each controlled by `connectTimeoutMS`. Defaults to ``30000`` (30
            seconds).
           

The default connectTimeoutMS is 20 s. My earlier workaround was to set both connectTimeoutMS and socketTimeoutMS to 1000 ms and then handle NetworkTimeout instead of AutoReconnect. That was painful too (Windows really got me).
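The catch-and-retry idea from approach 1, combined with the short timeouts just described, can be sketched as a decorator. This is only a minimal sketch: `retry_on`, `attempts` and `delay` are hypothetical names chosen for illustration and are not part of pymongo. The pymongo usage is left in comments because it assumes a reachable server.

```python
import functools
import time


def retry_on(exc_types, attempts=3, delay=0.0):
    """Retry the wrapped call when it raises one of exc_types.

    Hypothetical helper for illustration; not part of pymongo.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_types:
                    if attempt == attempts:
                        raise  # out of attempts: let the caller see the error
                    time.sleep(delay)
        return wrapper
    return decorator


# Intended use with pymongo (assumes a reachable server, so left as comments):
#
#   from pymongo import MongoClient
#   from pymongo.errors import AutoReconnect
#
#   client = MongoClient(host='127.0.0.1', port=27017,
#                        connectTimeoutMS=1000, socketTimeoutMS=1000)
#   table = client['test']['test_table']
#
#   @retry_on(AutoReconnect)
#   def get_some_info():
#       return table.find_one({})
```

Since NetworkTimeout is a subclass of AutoReconnect, catching AutoReconnect covers both cases. As the documentation quoted earlier warns, the failed operation has not necessarily failed on the server, so blind retries are only safe for idempotent calls such as reads.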

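  3. In the end, the problem has to be traced back to the socket connection itself. An AutoReconnect exception means that the connection taken from the pool is already dead. If the pooled connections stayed connected to the MongoDB server, there would be no auto-reconnect exception. So something is wrong with the socket keepalive (heartbeat) checks, which depend on a few parameters:

    TCP_KEEPIDLE : seconds a connection may be idle, with no data exchanged, before keepalive probes are sent

    TCP_KEEPINTVL : seconds to wait before resending a keepalive probe if there is no response

    TCP_KEEPCNT : number of unanswered probes after which the connection is closed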
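To make the three parameters concrete, here is a minimal sketch that enables keepalive on a plain TCP socket and applies the tunables where the platform exposes them. `enable_keepalive` and its default values are assumptions made for illustration; this is not how pymongo configures its own sockets.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=9):
    """Turn on TCP keepalive and set the three tunables where available.

    Returns the names of the options that were actually applied.
    Hypothetical helper; the default values are illustrative only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = ['SO_KEEPALIVE']
    wanted = (('TCP_KEEPIDLE', idle),
              ('TCP_KEEPINTVL', interval),
              ('TCP_KEEPCNT', count))
    for name, value in wanted:
        # Not every platform or Python build exposes these constants.
        if not hasattr(socket, name):
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
            applied.append(name)
        except OSError:
            # The constant exists but the OS refused the option; skip it.
            pass
    return applied


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    print(enable_keepalive(sock))
finally:
    sock.close()
```

The printed list shows which options the current platform accepted, which is a quick way to see whether per-socket keepalive tuning is available at all.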

Solution

The relevant code is at line 126 of pool.py; the key function is _set_keepalive_times(sock):

_MAX_TCP_KEEPIDLE = 300
_MAX_TCP_KEEPINTVL = 10
_MAX_TCP_KEEPCNT = 9

if sys.platform == 'win32':
    try:
        import _winreg as winreg
    except ImportError:
        import winreg

    try:
        with winreg.OpenKey(
                winreg.HKEY_LOCAL_MACHINE,
                r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters") as key:
            _DEFAULT_TCP_IDLE_MS, _ = winreg.QueryValueEx(key, "KeepAliveTime")
            _DEFAULT_TCP_INTERVAL_MS, _ = winreg.QueryValueEx(
                key, "KeepAliveInterval")
            # Make sure these are integers.
            if not isinstance(_DEFAULT_TCP_IDLE_MS, integer_types):
                raise ValueError
            if not isinstance(_DEFAULT_TCP_INTERVAL_MS, integer_types):
                raise ValueError
    except (OSError, ValueError):
        # We could not check the default values so do not attempt to override.
        def _set_keepalive_times(dummy):
            pass
    else:
        def _set_keepalive_times(sock):
            idle_ms = min(_DEFAULT_TCP_IDLE_MS, _MAX_TCP_KEEPIDLE * 1000)
            interval_ms = min(_DEFAULT_TCP_INTERVAL_MS,
                              _MAX_TCP_KEEPINTVL * 1000)
            if (idle_ms < _DEFAULT_TCP_IDLE_MS or
                    interval_ms < _DEFAULT_TCP_INTERVAL_MS):
                sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                           (1, idle_ms, interval_ms))
else:
    def _set_tcp_option(sock, tcp_option, max_value):
        if hasattr(socket, tcp_option):
            sockopt = getattr(socket, tcp_option)
            try:
                # PYTHON-1350 - NetBSD doesn't implement getsockopt for
                # TCP_KEEPIDLE and friends. Don't attempt to set the
                # values there.
                default = sock.getsockopt(socket.IPPROTO_TCP, sockopt)
                if default > max_value:
                    sock.setsockopt(socket.IPPROTO_TCP, sockopt, max_value)
            except socket.error:
                pass

    def _set_keepalive_times(sock):
        _set_tcp_option(sock, 'TCP_KEEPIDLE', _MAX_TCP_KEEPIDLE)
        _set_tcp_option(sock, 'TCP_KEEPINTVL', _MAX_TCP_KEEPINTVL)
        _set_tcp_option(sock, 'TCP_KEEPCNT', _MAX_TCP_KEEPCNT)

           

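_set_keepalive_times() is defined differently on Windows and on other systems. Let's look at Windows first.

1. The code first looks up two values, KeepAliveTime and KeepAliveInterval, under SYSTEM\CurrentControlSet\Services\Tcpip\Parameters in the registry. On my Windows 10 machine neither value exists, so the lookup fails and _set_keepalive_times is defined as a no-op (pass). With no keepalive, the connection goes stale after a while and AutoReconnect is raised!

2. Even after adding these two values, they are still compared against the built-in maximums. The registry values are in milliseconds, while the maximums are defined in seconds: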

_MAX_TCP_KEEPIDLE = 300
_MAX_TCP_KEEPINTVL = 10

……
……

        def _set_keepalive_times(sock):
            idle_ms = min(_DEFAULT_TCP_IDLE_MS, _MAX_TCP_KEEPIDLE * 1000)
            interval_ms = min(_DEFAULT_TCP_INTERVAL_MS,
                              _MAX_TCP_KEEPINTVL * 1000)
            if (idle_ms < _DEFAULT_TCP_IDLE_MS or
                    interval_ms < _DEFAULT_TCP_INTERVAL_MS):
                sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                           (1, idle_ms, interval_ms))
           

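sock.ioctl() is only executed if at least one of the registry values is larger than its maximum. (PS: this check cost me a lot of time!)

In other words, KeepAliveInterval must be greater than 10 * 1000

or KeepAliveTime must be greater than 300 * 1000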
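The check can be tried out in isolation. The sketch below copies the arithmetic from the excerpt above into a pure function; `keepalive_override` is a hypothetical name, and the function only reports what would be passed to sock.ioctl() rather than touching a socket.

```python
_MAX_TCP_KEEPIDLE = 300   # seconds, as in the pool.py excerpt
_MAX_TCP_KEEPINTVL = 10   # seconds, as in the pool.py excerpt


def keepalive_override(default_idle_ms, default_interval_ms):
    """Return (idle_ms, interval_ms) if ioctl would be called, else None.

    Hypothetical helper mirroring the Windows branch of
    _set_keepalive_times(); arguments are the registry values in ms.
    """
    idle_ms = min(default_idle_ms, _MAX_TCP_KEEPIDLE * 1000)
    interval_ms = min(default_interval_ms, _MAX_TCP_KEEPINTVL * 1000)
    if idle_ms < default_idle_ms or interval_ms < default_interval_ms:
        return idle_ms, interval_ms
    return None


# Interval above its 10 s cap: keepalive is configured per socket.
print(keepalive_override(60000, 20000))    # (60000, 10000)
# Idle time above its 300 s cap: also configured.
print(keepalive_override(7200000, 1000))   # (300000, 1000)
# Both values within the caps: nothing is configured.
print(keepalive_override(60000, 1000))     # None
```

The last case is the trap: registry values that look perfectly reasonable still leave the socket untouched, because neither one exceeds its cap.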

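Final solution

Developing on Windows has far too many pitfalls.

Steps:

  1. Press Win+R, type regedit and press Enter
  2. Navigate to
Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
  3. Right-click an empty area > New > QWORD (once for each of the two values below)
  4. Name it KeepAliveTime, value 60000 (decimal)
  5. Name it KeepAliveInterval, value 20000 (decimal)

Done!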
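To confirm that the values are visible to Python, they can be read back the same way the pool.py excerpt reads them. This is a minimal sketch: `read_keepalive_registry` is a hypothetical helper, and it uses the Python 3 module name winreg.

```python
import sys


def read_keepalive_registry():
    """Return {'KeepAliveTime': ms, 'KeepAliveInterval': ms}, or None.

    None means: not running on Windows, or at least one value is missing.
    Hypothetical helper for illustration.
    """
    if sys.platform != 'win32':
        return None
    import winreg  # Windows-only standard library module
    path = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            return {
                name: winreg.QueryValueEx(key, name)[0]
                for name in ("KeepAliveTime", "KeepAliveInterval")
            }
    except OSError:
        # Key or value not found, or access denied.
        return None


print(read_keepalive_registry())
```

In the excerpt shown earlier, the registry lookup is module-level code, so it runs when pool.py is imported. A Python process that was already running will therefore not pick up the new values until it is restarted.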
