天天看点

OpenStack Swift源码分析(4)----swift-ring-builder源代码解析之一

感谢朋友支持本博客,欢迎共同探讨交流,由于能力和时间有限,错误之处在所难免,欢迎指正!

如果转载,请保留作者信息。

博客地址:http://blog.csdn.net/gaoxingnengjisuan

邮箱地址:[email protected]

    ring是swift中的核心组件,它描述和决定了数据如何在集群系统中分布。其中的一致性哈希算法更是核心内容之一。

    swift-ring-builder中包含了对ring的各种操作方法,包括create、default、search、list_parts、add、set_weight、set_info、remove、rebalance、validate、write_ring、pretend_min_part_hours_passed、set_min_part_hours和set_replicas等方法,我将会逐个解析这写方法的实现过程;

    首先来看swift-ring-builder中的main方法:

if __name__ == '__main__':
    if len(argv) < 2:
        print "swift-ring-builder %(MAJOR_VERSION)s.%(MINOR_VERSION)s\n" % globals()
        print Commands.default.__doc__.strip()
        print
        cmds = [c for c, f in Commands.__dict__.iteritems()
                if f.__doc__ and c[0] != '_' and c != 'default']
        cmds.sort()
        for cmd in cmds:
            print Commands.__dict__[cmd].__doc__.strip()
            print
        print RingBuilder.search_devs.__doc__.strip()
        print
        for line in wrap(' '.join(cmds), 79, initial_indent='Quick list: ',
                         subsequent_indent='            '):
            print line
        print ('Exit codes: 0 = operation successful\n'
               '            1 = operation completed with warnings\n'
               '            2 = error')
        exit(EXIT_SUCCESS)

    # 验证argv[1]的路径是否存在;
    if exists(argv[1]):
        # builder从argv[1]指定文件获取初始化数据;
        # 再对RingBuilder类进行实例化;
        # 根据具体情况决定是否改变builder的初始化数据;
        builder = RingBuilder.load(argv[1])
    # 如果argv[1]不存在,而且调用的还不是'create'命令,则退出;
    elif len(argv) < 3 or argv[2] != 'create':
        print 'Ring Builder file does not exist: %s' % argv[1]
        exit(EXIT_ERROR)

    # 获取文件夹'backups'的整体路径;
    backup_dir = pathjoin(dirname(argv[1]), 'backups')
    # 新建文件夹'backups';
    try:
        mkdir(backup_dir)
    except OSError, err:
        if err.errno != EEXIST:
            raise

    # 获取argv[1],作为ring_file;
    ring_file = argv[1]
    if ring_file.endswith('.builder'):
        ring_file = ring_file[:-len('.builder')]
    ring_file += '.ring.gz'   

    if len(argv) == 2:
        command = "default"
    else:
        command = argv[2]
    if argv[0].endswith('-safe'):
        try:
            with lock_parent_directory(abspath(argv[1]), 15):
                Commands.__dict__.get(command, Commands.unknown.im_func)()
        except exceptions.LockTimeout:
            print "Ring/builder dir currently locked."
            exit(2)
    else:
        Commands.__dict__.get(command, Commands.unknown.im_func)()
           

    这个main方法主要做了以下几件事:

    (1)检测命令行的正确性;

    (2)如果argv[1]文件存在,则调用类RingBuilder中的方法load加载argv[1]文件,并返回类RingBuilder的实例化对象;

    (3)建立backups文件夹;

    (4)初始化ring_file文件;

    (5)调用运行command中指定的处理ring的方法;

    在类Commands中有若干处理ring的方法,如create、default、search、list_parts、add、set_weight、set_info、remove、rebalance、validate、write_ring、pretend_min_part_hours_passed、set_min_part_hours和set_replicas等等。下面我们来逐一解析这些方法:

    1.来看方法create:

def create():
        if len(argv) < 6:
            print Commands.create.__doc__.strip()
            exit(EXIT_ERROR)
            
        builder = RingBuilder(int(argv[3]), float(argv[4]), int(argv[5]))
        backup_dir = pathjoin(dirname(argv[1]), 'backups')
        try:
            mkdir(backup_dir)
        except OSError, err:
            if err.errno != EEXIST:
                raise

        # Python中可以使用 pickle 模块将对象转化为文件保存在磁盘上,在需要的时候再读取并还原。
        # 这里即是把转化为字典格式的builder写入到文件pathjoin(backup_dir,'%d.' % time() + basename(argv[1]))文件中;
        # builder.to_dict():以字典的形式返回初始化的RingBuilder类的对象;
        pickle.dump(builder.to_dict(), open(pathjoin(backup_dir,'%d.' % time() + basename(argv[1])), 'wb'), protocol=2)

        # 再把转化为字典格式的builder写入到argv[1]指明的文件中;
        pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2)

        exit(EXIT_SUCCESS)
           

    这个方法主要实现的功能是:

    (1)根据命令初始化类RingBuilder,获取类RingBuilder的实例化对象;

    (2)创建用于备份的文件夹'backups';

    (3)使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原;这里具体是把转化为字典格式的builder保存两份,一份写入到建立的文件夹'backups'中的指定文件中,一份写入到argv[1]指明的文件中;

    2.来看方法default:

def default():
        print '%s, build version %d' % (argv[1], builder.version)
        regions = 0
        zones = 0
        balance = 0
        dev_count = 0
        if builder.devs:
            regions = len(set(d['region'] for d in builder.devs
                              if d is not None))
            zones = len(set((d['region'], d['zone']) for d in builder.devs
                            if d is not None))
            dev_count = len([d for d in builder.devs
                             if d is not None])
            balance = builder.get_balance()
        print '%d partitions, %.6f replicas, %d regions, %d zones, ' \
              '%d devices, %.02f balance' % (builder.parts, builder.replicas,
                                             regions, zones, dev_count,
                                             balance)
        print 'The minimum number of hours before a partition can be ' \
              'reassigned is %s' % builder.min_part_hours
        if builder.devs:
            print 'Devices:    id  region  zone      ip address  port' \
                  '      name weight partitions balance meta'
            weighted_parts = builder.parts * builder.replicas / sum(d['weight'] for d in builder.devs if d is not None)
            for dev in builder.devs:
                if dev is None:
                    continue
                if not dev['weight']:
                    if dev['parts']:
                        balance = 999.99
                    else:
                        balance = 0
                else:
                    balance = 100.0 * dev['parts'] / (dev['weight'] * weighted_parts) - 100.0
                print('         %5d %5d %5d %15s %5d %9s %6.02f %10s'
                      '%7.02f %s' %
                      (dev['id'], dev['region'], dev['zone'], dev['ip'],
                       dev['port'], dev['device'], dev['weight'], dev['parts'],
                       balance, dev['meta']))
        exit(EXIT_SUCCESS)
           

    这个方法主要实现的功能是显示ring和设备内部的信息;

    3.来看方法search:

def search():
        if len(argv) < 4:
            print Commands.search.__doc__.strip()
            print
            print builder.search_devs.__doc__.strip()
            exit(EXIT_ERROR)
            
        # search_devs:<search-value>可以通过<device_id>/<region>/<zone>-<ip>:<port>/<device_name>_<meta>几种格式来进行查询;
        devs = builder.search_devs(argv[3])

        if not devs:
            print 'No matching devices found'
            exit(EXIT_ERROR)
        print 'Devices:    id  region  zone      ip address  port      name ' \
              'weight partitions balance meta'

        weighted_parts = builder.parts * builder.replicas / sum(d['weight'] for d in builder.devs if d is not None)

        for dev in devs:
            if not dev['weight']:
                if dev['parts']:
                    balance = 999.99
                else:
                    balance = 0
            else:
                balance = 100.0 * dev['parts'] / \
                    (dev['weight'] * weighted_parts) - 100.0
            print('         %5d %5d %5d %15s %5d %9s %6.02f %10s %7.02f %s' %
                  (dev['id'], dev['region'], dev['zone'], dev['ip'],
                   dev['port'], dev['device'], dev['weight'], dev['parts'],
                   balance, dev['meta']))
        exit(EXIT_SUCCESS)
           

    这个方法主要实现了显示匹配的设备信息的功能;

    主要做了以下几件事:

    (1)验证命令行正确性;

    (2)调用方法search_devs实现对设备信息的搜索功能;

    (3)遍历得到的匹配设备信息,组成输出信息并进行打印输出;

    我们来看看方法search_devs:

def search_devs(self, search_value):
        """        
        <search-value>可以通过<device_id>/<region>/<zone>-<ip>:<port>/<device_name>_<meta>几种格式来进行查询;
        
        Examples::

        d74              Matches the device id 74
        r4               Matches devices in region 4
        z1               Matches devices in zone 1
        z1-1.2.3.4       Matches devices in zone 1 with the ip 1.2.3.4
        1.2.3.4          Matches devices in any zone with the ip 1.2.3.4
        z1:5678          Matches devices in zone 1 using port 5678
        :5678            Matches devices that use port 5678
        /sdb1            Matches devices with the device name sdb1
        _shiny           Matches devices with shiny in the meta data
        _"snet: 5.6.7.8" Matches devices with snet: 5.6.7.8 in the meta data
        [::1]            Matches devices in any zone with the ip ::1
        z1-[::1]:5678    Matches devices in zone 1 with ip ::1 and port 5678

        Most specific example::

        d74r4z1-1.2.3.4:5678/sdb1_"snet: 5.6.7.8"

        """
        orig_search_value = search_value
        match = []
        if search_value.startswith('d'):
            ......
        if search_value.startswith('r'):
            ......
        if search_value.startswith('z'):
            ......
        if search_value.startswith('-'):
            ......
        if len(search_value) and search_value[0].isdigit():
            ......
        elif len(search_value) and search_value[0] == '[':
            ......
        if search_value.startswith(':'):
            ......
        if search_value.startswith('/'):
            ......
        if search_value.startswith('_'):
            ......
        if search_value:
            ......

        matched_devs = []
        for dev in self.devs:
            if not dev:
                continue
            matched = True
            for key, value in match:
                if key == 'meta':
                    if value not in dev.get(key):
                        matched = False
                elif dev.get(key) != value:
                    matched = False
            if matched:
                matched_devs.append(dev)
        return matched_devs
           

    在这个方法中我们要注意的地方就是搜索指令的格式;

    4.来看方法add:

def add():
        """
        swift-ring-builder <builder_file> add
        [r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>
        [[r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>] ...

        使用给定的信息添加新的设备到ring上;
        add操作不会分配partitions到新的设备上,只有运行了'rebalance'命令后,才会进行分区的分配;
        因此,这种机制可以允许我们一次添加多个设备,并只执行一次'rebalance'实现对这些设备的分区分配;
        
        使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原;
        这里具体是把转化为字典格式的builder写入到argv[1]指定文件中;
        """
        if len(argv) < 5 or len(argv) % 2 != 1:
            print Commands.add.__doc__.strip()
            exit(EXIT_ERROR)

        # itertools.islice(iterable, start, stop[, step])
        # islice('ABCDEFG', 2) --> A B
        # islice('ABCDEFG', 2, 4) --> C D
        # islice('ABCDEFG', 2, None) --> C D E F G
        # islice('ABCDEFG', 0, None, 2) --> A C E G
              
        # 从命令行获取匹配的devstr和weightstr集合;
        devs_and_weights = izip(islice(argv, 3, len(argv), 2),islice(argv, 4, len(argv), 2))
        for devstr, weightstr in devs_and_weights:
            region = 1
            rest = devstr
            
            if devstr.startswith('r'):
                i = 1
                while i < len(devstr) and devstr[i].isdigit():
                    i += 1
                region = int(devstr[1:i])
                rest = devstr[i:]
            else:
                stderr.write("WARNING: No region specified for %s. "
                             "Defaulting to region 1.\n" % devstr)

            if not rest.startswith('z'):
                print 'Invalid add value: %s' % devstr
                exit(EXIT_ERROR)
                
            i = 1
            while i < len(rest) and rest[i].isdigit():
                i += 1
            zone = int(rest[1:i])
            rest = rest[i:]

            if not rest.startswith('-'):
                print 'Invalid add value: %s' % devstr
                print "The on-disk ring builder is unchanged.\n"
                exit(EXIT_ERROR)
                
            i = 1
            if rest[i] == '[':
                i += 1
                while i < len(rest) and rest[i] != ']':
                    i += 1
                i += 1
                ip = rest[1:i].lstrip('[').rstrip(']')
                rest = rest[i:]
            else:
                while i < len(rest) and rest[i] in '0123456789.':
                    i += 1
                ip = rest[1:i]
                rest = rest[i:]

            if not rest.startswith(':'):
                print 'Invalid add value: %s' % devstr
                print "The on-disk ring builder is unchanged.\n"
                exit(EXIT_ERROR)
                
            i = 1
            while i < len(rest) and rest[i].isdigit():
                i += 1
            port = int(rest[1:i])
            rest = rest[i:]

            if not rest.startswith('/'):
                print 'Invalid add value: %s' % devstr
                print "The on-disk ring builder is unchanged.\n"
                exit(EXIT_ERROR)
                
            i = 1
            while i < len(rest) and rest[i] != '_':
                i += 1
            device_name = rest[1:i]
            rest = rest[i:]

            meta = ''
            if rest.startswith('_'):
                meta = rest[1:]

            try:
                weight = float(weightstr)
            except ValueError:
                print 'Invalid weight value: %s' % weightstr
                print "The on-disk ring builder is unchanged.\n"
                exit(EXIT_ERROR)

            if weight < 0:
                print 'Invalid weight value (must be positive): %s' % weightstr
                print "The on-disk ring builder is unchanged.\n"
                exit(EXIT_ERROR)

            for dev in builder.devs:
                if dev is None:
                    continue
                if dev['ip'] == ip and dev['port'] == port and \
                        dev['device'] == device_name:
                    print 'Device %d already uses %s:%d/%s.' % \
                          (dev['id'], dev['ip'], dev['port'], dev['device'])
                    print "The on-disk ring builder is unchanged.\n"
                    exit(EXIT_ERROR)

            # 增加一个device到ring;
            # 这个方法不会马上执行ring的重新平衡操作,因为我们可能需要在重新平衡操作之前进行多次改变;
            # 这样做也是为了提高效率的;
            builder.add_dev({'region': region, 'zone': zone, 'ip': ip,
                             'port': port, 'device': device_name,
                             'weight': weight, 'meta': meta})
            new_dev = builder.search_devs('r%dz%d-%s:%s/%s' % (region, zone, ip, port, device_name))[0]['id']
            
            if ':' in ip:
                print(
                    'Device r%dz%d-[%s]:%s/%s_"%s" with %s weight got id %s' %
                    (region, zone, ip, port,
                     device_name, meta, weight, new_dev))
            else:
                print('Device r%dz%d-%s:%s/%s_"%s" with %s weight got id %s' %
                      (region, zone, ip, port,
                       device_name, meta, weight, new_dev))
        
        # 使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原;
        # 这里具体是把转化为字典格式的builder写入到argv[1]指定文件中;
        pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2)

        exit(EXIT_SUCCESS)
           

    可以知道命令行格式:

    swift-ring-builder <builder_file> add

    [r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>

    [[r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>]

    这个方法主要做了以下几件事:

    (1)验证命令行的正确性;

    (2)从命令行获取匹配的devstr和weightstr集合;

    调试示例:

    argv = ['/usr/bin/swift-ring-builder', 'account.builder', 'add', 'z1-127.0.0.1:6012/sda3', '1']

    izip(islice(argv, 3, len(argv), 2),islice(argv, 4, len(argv), 2))) = ('z1-127.0.0.1:6012/sda3', '1')

    (3)遍历每一对匹配的devstr和weightstr,从devstr和weightstr中获取region、zone、ip、port、device_name、weight和meta的值。调用add_dev方法实现增加这个device信息到ring,这个方法不会马上执行ring的重新平衡操作,因为可能需要在重新平衡操作之前进行多次改变(正如这里是个遍历循环操作,如果命令行中一次增加多个设备信息,就需要执行多次add_dev方法)。这样做也是为了提高效率的。

    (4)最后使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原。这里具体是把转化为字典格式的builder写入到argv[1]指定文件中。

    5.来看方法set_weight:

def set_weight():
        """
        swift-ring-builder <builder_file> set_weight <search-value> <weight>
        [<search-value> <weight] ...
        """
        if len(argv) < 5 or len(argv) % 2 != 1:
            print Commands.set_weight.__doc__.strip()
            print
            print builder.search_devs.__doc__.strip()
            exit(EXIT_ERROR)

        devs_and_weights = izip(islice(argv, 3, len(argv), 2),
                                islice(argv, 4, len(argv), 2))
        for devstr, weightstr in devs_and_weights:
            devs = builder.search_devs(devstr)
            weight = float(weightstr)
            if not devs:
                print("Search value \"%s\" matched 0 devices.\n"
                      "The on-disk ring builder is unchanged.\n"
                      % devstr)
                exit(EXIT_ERROR)
            if len(devs) > 1:
                print 'Matched more than one device:'
                for dev in devs:
                    print '    d%(id)sz%(zone)s-%(ip)s:%(port)s/%(device)s_' \
                          '"%(meta)s"' % dev
                if raw_input('Are you sure you want to update the weight for '
                             'these %s devices? (y/N) ' % len(devs)) != 'y':
                    print 'Aborting device modifications'
                    exit(EXIT_ERROR)
            for dev in devs:
                # 设置device的weight值;
                # 这个方法不是仅仅直接在device字典中变更weight值;
                # 还有builder将会需要重新设置一些内部状态来反应weight值的改变;
                builder.set_dev_weight(dev['id'], weight)
                print 'd%(id)sz%(zone)s-%(ip)s:%(port)s/%(device)s_' \
                      '"%(meta)s" weight set to %(weight)s' % dev
        pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2)
        exit(EXIT_SUCCESS)
           

    这个方法实现了重新设置设备的weight。set_weight操作后,设备上的partition不会重新分配,只有运行了'rebalance'命令后才会进行分区的分配。因此,这种机制可以允许你一次添加多个设备,并只执行一次'rebalance'实现对这些设备的分区分配。

    这个方法做了以下几件事:

    (1)验证命令行正确性;

    (2)从命令行获取匹配的devstr和weightstr集合;

    (3)遍历每一对匹配的devstr和weightstr。调用search_devs方法根据devstr查询得到对应的devs,也就是要修改weight值的devs。遍历得到的devs,调用set_dev_weight方法为每一个dev重新设置weight值为float(weightstr)。所以每一组devs的weight的值都是相同的。

    (4)最后使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原。这里具体是把转化为字典格式的builder写入到argv[1]指定文件中。

    这里再来看看方法set_dev_weight,方法比较容易理解,具体解析如注释所示;

def set_dev_weight(self, dev_id, weight):
        """
        设置device的weight值;
        这个方法不是仅仅直接在device字典中变更weight值;
        还有builder将会需要重新设置一些内部状态来反应weight值的改变;
        """
        self.devs[dev_id]['weight'] = weight
        # _set_parts_wanted:方法根据dev的weight计算dev除了目前已经分配的partition数目而外,还要分配的partition数目;
        self._set_parts_wanted()
        # 设置devs_changed为TRUE,说明已经改变;
        self.devs_changed = True
        self.version += 1
           

    这里再来看看方法_set_parts_wanted,这个方法会在很多地方被调用,这里先来解析一下,具体解析如注释所示:

def _set_parts_wanted(self):
        """       
        方法根据dev的weight计算dev除了目前已经分配的partition数目而外,还要分配的partition数目;
        计算方法是:
        1.首先计算每个partition的weight,即将partition数目乘以副本数得到总的partition数目;
          然后除以现有dev的weight总和,得到每个partition的权重;
          实际上得到的是单位权重对应的partition数目;
        2.dev根据上述结构计算自己应该获取的partition数目;
          计算方法:dev weight * weigh_of_one_part – 已经分配的partition数目;
          具体解释就是单位权重对应的partition数目乘以权重值,得到权重dev['weight']对应的总的partition数目,
          然后减去已经分配的partition数目,就得到了还要分配的partition数目;
        
        由此可见,每次add设备操作都会引起partition分配的变化,但是真正的partition搬迁操作在rebalance执行时;
        """
        
        # weight_of_one_part:从所有设备的总权重(weight)中,返回计算出来的每一个分区的权重(weight);
        # 计算方法就是将partition数目乘以副本数得到总的partition数目,然后除以现有dev的weight总和,得到每个partition的权重;
        # 也就是说具有同样副本数的分区具有同样的权重值(weight);
        weight_of_one_part = self.weight_of_one_part()

        for dev in self._iter_devs():
            if not dev['weight']:
                dev['parts_wanted'] = -self.parts * self.replicas
            else:
                dev['parts_wanted'] = int(weight_of_one_part * dev['weight']) - dev['parts']
           

    此篇博文解析到这里吧,下一篇博文将会继续解析swift-ring-builder文件。

    博文中不免有不正确的地方,欢迎朋友们不吝批评指正,谢谢大家了!

OpenStack Swift源码分析(4)----swift-ring-builder源代码解析之一

继续阅读