
Client Dynamic Degradation System

author:Flash Gene

01

Background

Whether on iOS or Android, an app running in production is affected by many factors, such as hardware, network environment, and code quality. These can cause performance problems, some of which cannot be found during development. Providing users with a consistently smooth experience in production is a problem that client development has to address.

02

Server-side degradation and circuit breaking

Servers have degradation and circuit-breaker mechanisms, and the client degradation system can borrow from these existing server-side schemes. For example, when a performance problem or network congestion occurs, the client reduces the load on the device and the network, then restores the original policy once conditions recover.

Server-side degradation is triggered when the overall server load is high or when data errors occur for some special reason. Different scenarios call for different degradation policies. For example, if the cause is a data problem, the server can return cached data instead of reading from the database. From the user's point of view, the data may not be fully up to date, but the page still displays normally.

When a microservice on the server is unavailable or responds too slowly, a circuit breaker trips and the service is no longer called. From the user's point of view, an avatar may fail to display, some page templates may be missing, or coupons may be unavailable when paying for items in the shopping cart.

03

Brief description of the solution

First, we need to work out which problems the client has to handle. I divide them into two categories, performance and network speed. Performance is further divided into CPU, memory, and battery, all of which affect how the app runs. Likewise, performance cannot simply be labeled good or bad; it needs to be refined into levels, expressed as enumerations.

Taking iOS as an example, we monitor the device's CPU, memory, battery, and network speed in real time at a reasonable interval. Each reading is compared against the thresholds for its metric type to derive the corresponding level. If the level changes, the business side is notified to downgrade or upgrade.

When a downgrade occurs, the business code performs the corresponding degradation, such as requesting smaller images over the network. Degradation reduces performance consumption, so CPU and memory gradually return to the normal range, at which point the business code upgrades and restores its original processing rules.

In this way, users can still use the app smoothly when performance or network problems occur, and the app's functions remain usable.

04

Overall design

The dynamic degradation system is divided into three parts, with responsibilities as follows.

DynamicLevelManager: calls the monitor and the decision to complete the level calculation, and notifies the business side through a notification when the level changes.

DynamicLevelMonitor: monitors key performance metrics and is called by the manager periodically.

DynamicLevelDecision: the manager hands the collected metrics to the decision, which evaluates them together, determines the performance level, and returns it to the manager.

(Figure: architecture of the client dynamic degradation system)

The following is sanitized pseudocode, intended mainly to convey the design clearly. The demo code can also be run directly and, if needed, copied and used as is.

05

DynamicLevelManager

DynamicLevelManager, referred to as the manager, is the core class of the dynamic degradation system. When the app starts, the openLevelAnalyze method is called to start a loop implemented with dispatch_source_t. The timer fires every 1.5 seconds, and each time it fires, the block registered with dispatch_source_set_event_handler is executed. The dispatch_source_t timer is driven by the phone's hardware clock and is not affected by main-thread lag, so the sampling is relatively accurate.

/// Start the dynamic degradation monitoring loop
- (void)openLevelAnalyze {
    self.sourceHandle = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, dispatch_get_global_queue(0, 0));
    dispatch_source_set_timer(self.sourceHandle, dispatch_time(DISPATCH_TIME_NOW, 0), 1.5 * NSEC_PER_SEC, 0);
    // Capture self weakly: self owns the source, and the source retains this handler block
    __weak typeof(self) weakSelf = self;
    dispatch_source_set_event_handler(self.sourceHandle, ^{
        /// Compute the overall performance level
        CGFloat cpuUsageValue = [[DynamicLevelMonitor sharedInstance] cpuUsageForApp];
        CGFloat memoryUsageValue = [[DynamicLevelMonitor sharedInstance] useMemoryForApp];
        CGFloat batteryUsageValue = [[DynamicLevelMonitor sharedInstance] batteryUsageForApp];
        [[DynamicLevelDecision sharedInstance] calculatePerformanceLevelWithMemoryValue:memoryUsageValue
                                                                               cpuValue:cpuUsageValue
                                                                           batteryValue:batteryUsageValue
                                                                        completionBlock:^(MemoryUsageLevel memoryLevel, CPUUsageLevel cpuLevel, BatteryUsageLevel batteryLevel, MultiplePerformanceLevel performanceLevel) {
            /// If the level changed, post a downgrade or recovery notification
            __strong typeof(weakSelf) strongSelf = weakSelf;
            if (strongSelf && performanceLevel != strongSelf.currentPerformanceLevel) {
                // Record the new level so the next tick compares against it
                strongSelf.currentPerformanceLevel = performanceLevel;
                [strongSelf postPerformanceNotifiWithPerformanceLevel:performanceLevel
                                                          memoryLevel:memoryLevel
                                                             cpuLevel:cpuLevel
                                                         batteryLevel:batteryLevel];
            }
        }];
        
        /// Compute the network speed level
        CGFloat networkSpeed = [[QUICManager shareQUICManager] currentNetworkSpeed];
        [[DynamicLevelDecision sharedInstance] calculateNetworkLevelWithNetworkSpeed:networkSpeed completionBlock:^(NetworkSpeedLevel speedLevel) {
            /// If the level changed, post a network downgrade or recovery notification
            __strong typeof(weakSelf) strongSelf = weakSelf;
            if (strongSelf && speedLevel != strongSelf.currentNetworkSpeedLevel) {
                strongSelf.currentNetworkSpeedLevel = speedLevel;
                [strongSelf postPerformanceNotifiWithNetworkSpeedLevel:speedLevel];
            }
        }];
    });
    dispatch_resume(self.sourceHandle);
}

/// Stop the monitoring loop
- (void)closeLevelAnalyze {
    if (self.sourceHandle) {
        dispatch_source_cancel(self.sourceHandle);
        self.sourceHandle = nil;
    }
}

/// Post a performance downgrade or recovery notification
- (void)postPerformanceNotifiWithPerformanceLevel:(MultiplePerformanceLevel)performanceLevel
                                      memoryLevel:(MemoryUsageLevel)memoryLevel
                                         cpuLevel:(CPUUsageLevel)cpuLevel
                                     batteryLevel:(BatteryUsageLevel)batteryLevel {
    [[NSNotificationCenter defaultCenter] postNotificationName:@"PerformanceLevelChanged"
                                                        object:nil
                                                      userInfo:@{@"performanceLevel": @(performanceLevel),
                                                                 @"memoryLevel": @(memoryLevel),
                                                                 @"cpuLevel": @(cpuLevel),
                                                                 @"batteryLevel": @(batteryLevel)}];
}

/// Post a network downgrade or recovery notification
- (void)postPerformanceNotifiWithNetworkSpeedLevel:(NetworkSpeedLevel)networkSpeedLevel {
    [[NSNotificationCenter defaultCenter] postNotificationName:@"NetworkSpeedLevelChanged"
                                                        object:nil
                                                      userInfo:@{@"networkSpeedLevel": @(networkSpeedLevel)}];
}
           

The manager provides two kinds of notifications: performanceLevel, calculated from CPU, memory, and battery, and networkSpeedLevel.

5.1 performanceLevel

In the handler, the monitor's cpuUsageForApp method returns the CPU usage. The value normally ranges from 0 to 1 and can exceed 1, which this article refers to as overclocking; because the value is summed over all threads, it can exceed 1 when threads run on several cores at once. The monitor's useMemoryForApp method returns the memory usage as a value from 0 to 1. The monitor's batteryUsageForApp method returns the remaining battery level as a value from 0 to 100.

With this information, the manager calls the decision's calculatePerformanceLevel method, which evaluates the values together and returns four results.

1. performanceLevel: overall performance level

2. memoryLevel: memory usage level

3. cpuLevel: CPU usage level

4. batteryLevel: battery usage level

The core value is the overall performance level, of type MultiplePerformanceLevel, which is calculated from memory, battery, and CPU together. All four values are enumerations, defined as follows.

/// Overall performance level
typedef NS_ENUM(NSUInteger, MultiplePerformanceLevel) {
    MultiplePerformanceLevelNormal,
    MultiplePerformanceLevelLow,
    MultiplePerformanceLevelVeryLow,
};

/// CPU usage level; Overclock means the CPU is overclocked
typedef NS_ENUM(NSUInteger, CPUUsageLevel) {
    CPUUsageLevelLow,
    CPUUsageLevelHigh,
    CPUUsageLevelOverclock,
};

/// Memory usage level
typedef NS_ENUM(NSUInteger, MemoryUsageLevel) {
    MemoryUsageLevelLow,
    MemoryUsageLevelMiddle,
    MemoryUsageLevelHigh,
};

/// Battery usage level; High means heavy usage, with 1% charge remaining
typedef NS_ENUM(NSUInteger, BatteryUsageLevel) {
    BatteryUsageLevelLow,
    BatteryUsageLevelMiddle,
    BatteryUsageLevelHigh,
};
           

After obtaining these levels, the manager checks whether performanceLevel has changed. If performance is lower than the current level, a downgrade occurs; if it is higher, performance has recovered. The manager then posts a notification named PerformanceLevelChanged through NSNotificationCenter, passing the four level parameters along. If the level has not changed, no notification is sent.
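The change check described above can be sketched in plain C, which also compiles inside an Objective-C file. This is a minimal illustration, not the article's code: level_tracker and level_tracker_update are hypothetical names, and the enum only mirrors the ordering of MultiplePerformanceLevel, where a larger value means worse performance.

```c
/* Mirrors the ordering of MultiplePerformanceLevel: larger value = worse performance. */
typedef enum { PERF_NORMAL, PERF_LOW, PERF_VERY_LOW } performance_level;

/* Outcome of comparing the latest level with the stored one (hypothetical type). */
typedef enum { CHANGE_NONE, CHANGE_DOWNGRADE, CHANGE_RECOVERY } level_change;

/* Holds the level that was last reported to observers (hypothetical type). */
typedef struct {
    performance_level current;
} level_tracker;

/* Returns what kind of notification, if any, should be sent for the latest
   level, and records the new level so the next call compares against it. */
level_change level_tracker_update(level_tracker *tracker, performance_level latest) {
    if (latest == tracker->current) {
        return CHANGE_NONE; /* unchanged: send nothing */
    }
    level_change change = (latest > tracker->current) ? CHANGE_DOWNGRADE : CHANGE_RECOVERY;
    tracker->current = latest;
    return change;
}
```

Storing the new level inside the update step matters: if the stored level were never updated, every later tick would compare against the initial value and keep reporting a change.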

5.2 speedLevel

The other result is the network speed level, which is kept separate from the performance level because it is a different kind of metric.

In the handler, the currentNetworkSpeed method of the network library QUICManager returns the current network speed in kb/s. QUICManager is a self-developed network library that provides the real-time network speed.

With the network speed, the manager calls the decision's calculateNetworkLevel method. The decision returns a speedLevel for the current network speed, of type NetworkSpeedLevel, which has three levels.

/// Current network speed level
typedef NS_ENUM(NSUInteger, NetworkSpeedLevel) {
    NetworkSpeedLevelNormal,
    NetworkSpeedLevelLow,
    NetworkSpeedLevelVeryLow,
};
           

After obtaining this level, the manager checks whether speedLevel has changed. If it is lower than the current level, the network speed has deteriorated; if it is higher, the network speed has recovered. The manager then posts a notification named NetworkSpeedLevelChanged through NSNotificationCenter, passing the speedLevel parameter along. If the level has not changed, no notification is sent.

06

DynamicLevelDecision

The decision receives the data from the manager and returns the corresponding performance levels. It first calculates the level for each individual performance parameter, then calculates the overall performanceLevel.

/// Compute the overall performance level
- (void)calculatePerformanceLevelWithMemoryValue:(CGFloat)memoryValue
                                        cpuValue:(CGFloat)cpuValue
                                    batteryValue:(CGFloat)batteryValue
                                 completionBlock:(DynamicPerformanceLevelBlock)completionBlock {
    MemoryUsageLevel memoryLevel = [self calculateMemoryUsageLevelWithMemoryValue:memoryValue];
    CPUUsageLevel cpuLevel = [self calculateCPUUsageLevelWithCpuValue:cpuValue];
    BatteryUsageLevel batteryLevel = [self calculateBatteryUsageLevelWithBatteryValue:batteryValue];
    
    // Any level above Low counts as elevated (Middle/High, or High/Overclock for CPU)
    BOOL memoryElevated = (memoryLevel != MemoryUsageLevelLow);
    BOOL cpuElevated = (cpuLevel != CPUUsageLevelLow);
    BOOL batteryElevated = (batteryLevel != BatteryUsageLevelLow);
    
    MultiplePerformanceLevel performanceLevel = MultiplePerformanceLevelNormal;
    if (batteryLevel == BatteryUsageLevelHigh) {
        // Battery is at the critical value
        performanceLevel = MultiplePerformanceLevelVeryLow;
    }
    else if (cpuLevel == CPUUsageLevelOverclock && memoryLevel == MemoryUsageLevelHigh) {
        // CPU and memory are both at their highest level
        performanceLevel = MultiplePerformanceLevelVeryLow;
    }
    else if ((batteryElevated && memoryElevated) ||
             (batteryElevated && cpuElevated) ||
             (memoryElevated && cpuElevated)) {
        // Any two of the three parameters are elevated
        performanceLevel = MultiplePerformanceLevelLow;
    }
    
    if (completionBlock) {
        completionBlock(memoryLevel, cpuLevel, batteryLevel, performanceLevel);
    }
}

/// Compute the network speed level
- (void)calculateNetworkLevelWithNetworkSpeed:(CGFloat)networkSpeed
                              completionBlock:(DynamicNetworkSpeedLevelBlock)completionBlock {
    // Keep only the five most recent samples (FIFO)
    [self.networkSpeedArray addObject:@(networkSpeed)];
    if (self.networkSpeedArray.count > 5) {
        [self.networkSpeedArray removeObjectsInRange:NSMakeRange(0, self.networkSpeedArray.count - 5)];
    }
    
    // Count the samples at or below each speed threshold (kb/s)
    __block NSInteger lowCount = 0;
    __block NSInteger veryLowCount = 0;
    [self.networkSpeedArray enumerateObjectsUsingBlock:^(NSNumber * _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) {
        if (obj.floatValue <= 200) {
            lowCount++;
        }
        if (obj.floatValue <= 50) {
            veryLowCount++;
        }
    }];
    
    // A level applies when at least three samples in the window reach its threshold
    NetworkSpeedLevel networkThreshold = NetworkSpeedLevelNormal;
    if (veryLowCount >= 3) {
        networkThreshold = NetworkSpeedLevelVeryLow;
    } else if (lowCount >= 3) {
        networkThreshold = NetworkSpeedLevelLow;
    }
    
    if (completionBlock) {
        completionBlock(networkThreshold);
    }
}

/// Compute the memory usage level; memoryValue is a fraction from 0 to 1
- (MemoryUsageLevel)calculateMemoryUsageLevelWithMemoryValue:(CGFloat)memoryValue {
    [self.memoryUsageArray addObject:@(memoryValue)];
    if (self.memoryUsageArray.count > 5) {
        [self.memoryUsageArray removeObjectsInRange:NSMakeRange(0, self.memoryUsageArray.count - 5)];
    }
    
    __block NSInteger middleCount = 0;
    __block NSInteger highCount = 0;
    [self.memoryUsageArray enumerateObjectsUsingBlock:^(NSNumber * _Nonnull obj, NSUInteger idx, BOOL * _Nonnull stop) {
        if (obj.floatValue > 0.45) {
            highCount++;
        }
        if (obj.floatValue > 0.4) {
            middleCount++;
        }
    }];
    
    MemoryUsageLevel memoryThreshold = MemoryUsageLevelLow;
    if (highCount >= 3) {
        memoryThreshold = MemoryUsageLevelHigh;
    } else if (middleCount >= 3) {
        memoryThreshold = MemoryUsageLevelMiddle;
    }
    return memoryThreshold;
}

/// Compute the CPU usage level
- (CPUUsageLevel)calculateCPUUsageLevelWithCpuValue:(CGFloat)cpuValue {
    [self.cpuUsageArray addObject:@(cpuValue)];
    /// CPU level calculation is omitted here; the thresholds are described in section 6.1
    return CPUUsageLevelLow;
}

/// Compute the battery usage level
- (BatteryUsageLevel)calculateBatteryUsageLevelWithBatteryValue:(CGFloat)batteryValue {
    [self.batteryUsageArray addObject:@(batteryValue)];
    /// Battery level calculation is omitted here; the thresholds are described in section 6.1
    return BatteryUsageLevelLow;
}
           

6.1 Level calculation of individual performance parameters

CPU: if the input value is greater than 0.8, that is, CPU usage exceeds 80%, CPUUsageLevel is CPUUsageLevelHigh. If CPU usage exceeds 100%, which this article treats as CPU overclocking, CPUUsageLevel is CPUUsageLevelOverclock.

Memory: on iOS, an app can use up to 50% of the device's total memory. If memory usage exceeds 40%, MemoryUsageLevel is MemoryUsageLevelMiddle; if it exceeds 45%, MemoryUsageLevel is MemoryUsageLevelHigh.

Battery: if the input value is below 6%, the battery is low and BatteryUsageLevel is BatteryUsageLevelMiddle. If the input value is below 1%, the critical value has been reached and BatteryUsageLevel is BatteryUsageLevelHigh.
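The thresholds above can be written down as a small plain C sketch, which also compiles inside an Objective-C file. It classifies a single reading only; whether the CPU and battery levels also use the five-sample window is not stated in the article, so that part is left out. The function names are hypothetical, and the enums mirror the Objective-C enums defined earlier.

```c
/* Enum values mirror CPUUsageLevel, MemoryUsageLevel and BatteryUsageLevel. */
typedef enum { CPU_USAGE_LOW, CPU_USAGE_HIGH, CPU_USAGE_OVERCLOCK } cpu_usage_level;
typedef enum { MEMORY_USAGE_LOW, MEMORY_USAGE_MIDDLE, MEMORY_USAGE_HIGH } memory_usage_level;
typedef enum { BATTERY_USAGE_LOW, BATTERY_USAGE_MIDDLE, BATTERY_USAGE_HIGH } battery_usage_level;

/* cpu is a fraction: 0.8 means 80%, and values above 1.0 are possible. */
cpu_usage_level classify_cpu(double cpu) {
    if (cpu > 1.0) return CPU_USAGE_OVERCLOCK;
    if (cpu > 0.8) return CPU_USAGE_HIGH;
    return CPU_USAGE_LOW;
}

/* fraction is the app's share of device memory, from 0 to 1. */
memory_usage_level classify_memory(double fraction) {
    if (fraction > 0.45) return MEMORY_USAGE_HIGH;
    if (fraction > 0.40) return MEMORY_USAGE_MIDDLE;
    return MEMORY_USAGE_LOW;
}

/* percent is the remaining charge, from 0 to 100. */
battery_usage_level classify_battery(double percent) {
    if (percent < 1.0) return BATTERY_USAGE_HIGH;
    if (percent < 6.0) return BATTERY_USAGE_MIDDLE;
    return BATTERY_USAGE_LOW;
}
```

In each function the stricter threshold is tested first, so a reading that satisfies both conditions gets the more severe level.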

6.2 performanceLevel calculation

After the levels of the three performance parameters are obtained, the decision's calculatePerformanceLevel method derives performanceLevel, of type MultiplePerformanceLevel, and returns it to the manager through the completion block. The following conditions are evaluated in order and are mutually exclusive.

1. If batteryLevel is high, the battery is close to the critical value, so performanceLevel is set directly to veryLow.

2. If cpuLevel is overclock and memoryLevel is high, the CPU is overclocked and memory usage is also very high. In this state the app is likely to be killed by the system because of OOM, so performanceLevel is set directly to veryLow.

3. If any two of batteryLevel, cpuLevel, and memoryLevel are at middle or high, performanceLevel is set to low.
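The three rules can be condensed into one function. The following is a plain C sketch of the same decision order, which also compiles inside an Objective-C file; combine_levels is a hypothetical name, and the enums mirror the Objective-C ones. Counting the elevated parameters replaces the three pairwise checks and gives the same result.

```c
/* Enum values mirror the article's Objective-C enums. */
typedef enum { CPU_USAGE_LOW, CPU_USAGE_HIGH, CPU_USAGE_OVERCLOCK } cpu_usage_level;
typedef enum { MEMORY_USAGE_LOW, MEMORY_USAGE_MIDDLE, MEMORY_USAGE_HIGH } memory_usage_level;
typedef enum { BATTERY_USAGE_LOW, BATTERY_USAGE_MIDDLE, BATTERY_USAGE_HIGH } battery_usage_level;
typedef enum { PERF_NORMAL, PERF_LOW, PERF_VERY_LOW } performance_level;

performance_level combine_levels(memory_usage_level memory,
                                 cpu_usage_level cpu,
                                 battery_usage_level battery) {
    /* Rule 1: battery at the critical value. */
    if (battery == BATTERY_USAGE_HIGH) {
        return PERF_VERY_LOW;
    }
    /* Rule 2: CPU overclocked and memory usage very high. */
    if (cpu == CPU_USAGE_OVERCLOCK && memory == MEMORY_USAGE_HIGH) {
        return PERF_VERY_LOW;
    }
    /* Rule 3: any two of the three parameters are above their lowest level. */
    int elevated = (memory != MEMORY_USAGE_LOW)
                 + (cpu != CPU_USAGE_LOW)
                 + (battery != BATTERY_USAGE_LOW);
    return (elevated >= 2) ? PERF_LOW : PERF_NORMAL;
}
```

A single elevated parameter, other than a critical battery, leaves the overall level at normal, which keeps one noisy metric from triggering a downgrade on its own.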

6.3 speedLevel calculation

The manager calls the decision's calculateNetworkLevel method to obtain the network level. When calculating speedLevel, a network speed below 200 kb/s is considered low and speedLevel is set to low; a network speed below 50 kb/s is considered very slow and speedLevel is set to veryLow.

6.3.1 Performance Calculation Window

Performance parameters should not be judged from a reading at a single point in time. Instead, several readings within a time window are used as the basis for calculation, which better reflects overall performance over that period.

The window is driven by the handler's callback: it collects the current reading and the previous four, five consecutive readings in total, and evaluates them together. For example, if at least three of the five readings are at or below 50 kb/s, NetworkSpeedLevel is veryLow; if at least three are at or below 200 kb/s, NetworkSpeedLevel is low.

In terms of implementation, the window is an NSMutableArray with FIFO eviction, so only the five most recent readings are kept.
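As an illustration of the window, here is a plain C sketch (also valid inside an Objective-C file) that uses a fixed-size ring buffer in place of the NSMutableArray. The ring buffer is a substitution made for this example, and speed_window and its functions are hypothetical names; the thresholds and the three-of-five rule follow the article's code.

```c
#include <stddef.h>

#define WINDOW_SIZE 5    /* readings kept in the window */
#define TRIGGER_COUNT 3  /* readings needed to trigger a level */

typedef enum { NET_NORMAL, NET_LOW, NET_VERY_LOW } network_speed_level;

/* Ring buffer holding the most recent readings in kb/s. Zero-initialize before use. */
typedef struct {
    double samples[WINDOW_SIZE];
    size_t count; /* number of valid samples, at most WINDOW_SIZE */
    size_t next;  /* slot that the next reading overwrites */
} speed_window;

/* Adds a reading; once the window is full, the oldest reading is evicted (FIFO). */
void speed_window_push(speed_window *window, double kb_per_s) {
    window->samples[window->next] = kb_per_s;
    window->next = (window->next + 1) % WINDOW_SIZE;
    if (window->count < WINDOW_SIZE) {
        window->count++;
    }
}

/* Derives the level from all readings currently in the window. */
network_speed_level speed_window_level(const speed_window *window) {
    int low = 0;
    int very_low = 0;
    for (size_t i = 0; i < window->count; i++) {
        if (window->samples[i] <= 200.0) low++;
        if (window->samples[i] <= 50.0) very_low++;
    }
    if (very_low >= TRIGGER_COUNT) return NET_VERY_LOW;
    if (low >= TRIGGER_COUNT) return NET_LOW;
    return NET_NORMAL;
}
```

With a 1.5-second sampling interval, three slow readings out of five means the network has been slow for several seconds, so a single slow reading does not change the level.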

07

DynamicLevelMonitor

The monitor provides methods for obtaining system performance information. Of the three monitor methods called in the handler, the memory and CPU methods are implemented as follows; batteryUsageForApp is described after the code.

#import <mach/mach.h>

/// Memory used by the current app, as a fraction (0 to 1) of the device's physical memory
- (CGFloat)useMemoryForApp {
    task_vm_info_data_t vmInfo;
    mach_msg_type_number_t count = TASK_VM_INFO_COUNT;
    kern_return_t kernelReturn = task_info(mach_task_self(), TASK_VM_INFO, (task_info_t) &vmInfo, &count);
    if (kernelReturn == KERN_SUCCESS) {
        uint64_t memoryUsageInByte = vmInfo.phys_footprint;
        uint64_t totalMemory = [[NSProcessInfo processInfo] physicalMemory];
        // Use floating-point division; integer division would always yield 0 here
        return (CGFloat)memoryUsageInByte / (CGFloat)totalMemory;
    } else {
        return -1;
    }
}

/// CPU usage of the current app, summed over all of its threads
- (CGFloat)cpuUsageForApp {
    kern_return_t           kr;
    thread_array_t          thread_list;
    mach_msg_type_number_t  thread_count;
    thread_info_data_t      thinfo;
    mach_msg_type_number_t  thread_info_count;
    thread_basic_info_t     basic_info_th;
    
    kr = task_threads(mach_task_self(), &thread_list, &thread_count);
    if (kr != KERN_SUCCESS) {
        return -1;
    }
    
    BOOL failed = NO;
    float total_cpu_usage = 0;
    for (mach_msg_type_number_t i = 0; i < thread_count; i++) {
        thread_info_count = THREAD_INFO_MAX;
        kr = thread_info(thread_list[i], THREAD_BASIC_INFO, (thread_info_t)thinfo, &thread_info_count);
        if (kr == KERN_SUCCESS) {
            basic_info_th = (thread_basic_info_t)thinfo;
            // Skip idle threads; cpu_usage is scaled by TH_USAGE_SCALE
            if (!(basic_info_th->flags & TH_FLAGS_IDLE)) {
                total_cpu_usage += basic_info_th->cpu_usage / (float)TH_USAGE_SCALE;
            }
        } else {
            failed = YES;
        }
        // Release the thread port returned by task_threads
        mach_port_deallocate(mach_task_self(), thread_list[i]);
    }
    
    // Always free the thread list, even when a thread_info call failed
    kr = vm_deallocate(mach_task_self(), (vm_address_t)thread_list, thread_count * sizeof(thread_t));
    assert(kr == KERN_SUCCESS);
    return failed ? -1 : total_cpu_usage;
}
           

The useMemoryForApp method obtains the memory used by the current app through the system task_info function, and the device's physical memory through the physicalMemory property of NSProcessInfo. Both are in bytes, and dividing the former by the latter gives the share of device memory used by the app.
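One detail of this calculation is easy to get wrong: both inputs are integer byte counts, so dividing them as integers truncates the result to 0 whenever the app uses less than the whole device memory. A minimal plain C sketch of the ratio, with memory_fraction as a hypothetical helper name:

```c
#include <stdint.h>

/* Returns the app's share of device memory as a fraction from 0 to 1,
   or -1.0 when the total is unknown. Both arguments are in bytes. */
double memory_fraction(uint64_t footprint_bytes, uint64_t total_bytes) {
    if (total_bytes == 0) {
        return -1.0;
    }
    /* Convert before dividing; integer division would truncate to 0. */
    return (double)footprint_bytes / (double)total_bytes;
}
```

For example, a 1 GiB footprint on a 4 GiB device gives 0.25, whereas the same two values divided as integers give 0.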

The cpuUsageForApp method calls the system task_threads function to obtain thread_list, an array holding all of the app's threads. It traverses thread_list, fetches each thread's information with thread_info, and sums the cpu_usage field of each thread (the CPU share used by that thread) to get the total CPU usage.
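The accumulation step can be shown without the Mach APIs. In the plain C sketch below, thread_sample is a hypothetical stand-in for the fields read from thread_basic_info, and USAGE_SCALE and FLAG_IDLE are stand-ins for TH_USAGE_SCALE and TH_FLAGS_IDLE; their numeric values here are assumptions made for the example.

```c
#include <stddef.h>

#define USAGE_SCALE 1000 /* stand-in for TH_USAGE_SCALE (value assumed) */
#define FLAG_IDLE 0x2    /* stand-in for TH_FLAGS_IDLE (value assumed) */

/* Hypothetical per-thread sample carrying the two fields the loop reads. */
typedef struct {
    int cpu_usage; /* scaled usage: USAGE_SCALE means 100% of one core */
    int flags;
} thread_sample;

/* Sums the usage of all non-idle threads. The result is a fraction of one
   core, so it can exceed 1.0 when threads are busy on several cores. */
double total_cpu_usage(const thread_sample *threads, size_t thread_count) {
    double total = 0.0;
    for (size_t i = 0; i < thread_count; i++) {
        if (threads[i].flags & FLAG_IDLE) {
            continue; /* idle threads do not count */
        }
        total += (double)threads[i].cpu_usage / USAGE_SCALE;
    }
    return total;
}
```

Because the sum runs over every thread, two threads at 80% and 70% produce a total above 1, which is the case the article classifies as the overclock level.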

The batteryUsageForApp method sets batteryMonitoringEnabled on UIDevice to true to enable battery monitoring, and receives battery level changes through a notification. The reported value ranges from 0 to 1 and is multiplied by 100 before being returned to the manager.

08

Business Parties

After receiving the PerformanceLevelChanged notification, the business side can act on the overall performanceLevel. If it is veryLow, it can pause autoplay in the feed; that is, in a video feed, swiping to the next video no longer starts playback automatically.

As another example, when batteryLevel is middle or high, that is, when the battery is below 6%, the user can be advised not to perform performance-intensive operations such as caching video files, to avoid the phone shutting down as a result.
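These two reactions amount to small policy functions on the business side. A plain C sketch follows, with hypothetical function names and enums that mirror the article's Objective-C enums; a real app would call such checks from its notification handler.

```c
#include <stdbool.h>

/* Enum values mirror MultiplePerformanceLevel and BatteryUsageLevel. */
typedef enum { PERF_NORMAL, PERF_LOW, PERF_VERY_LOW } performance_level;
typedef enum { BATTERY_USAGE_LOW, BATTERY_USAGE_MIDDLE, BATTERY_USAGE_HIGH } battery_usage_level;

/* Autoplay of the next video stays on unless overall performance is veryLow. */
bool should_autoplay_next_video(performance_level performance) {
    return performance != PERF_VERY_LOW;
}

/* Warn before heavy work, such as caching video files, once the battery
   level is middle or high, which corresponds to less than 6% charge. */
bool should_warn_before_heavy_task(battery_usage_level battery) {
    return battery != BATTERY_USAGE_LOW;
}
```

Keeping each policy in one function lets every feature decide how to react to a level without the manager needing to know about any of them.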

After receiving the NetworkSpeedLevelChanged notification, the business side can handle low and veryLow differently based on the speedLevel passed in the notification. For example, it can request smaller images from the server, such as 80% of the original size for low and 60% for veryLow, which can significantly speed up image loading on a weak network. When requesting an image URL, the scale ratio is appended to the URL sent to the server, and the server returns the image at the corresponding ratio.
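A plain C sketch of the image scaling idea follows. The 80% and 60% ratios come from the text above; the query parameter name scale, the URL format, and the function names are assumptions made for illustration, since the article does not show how the ratio is encoded.

```c
#include <stdio.h>

/* Enum values mirror NetworkSpeedLevel. */
typedef enum { NET_NORMAL, NET_LOW, NET_VERY_LOW } network_speed_level;

/* Image size to request, as a percentage of the original. */
int image_scale_percent(network_speed_level level) {
    switch (level) {
        case NET_LOW:      return 80;
        case NET_VERY_LOW: return 60;
        default:           return 100;
    }
}

/* Writes "<base_url>?scale=<percent>" into out. The "scale" parameter is
   hypothetical. Returns the length written, or -1 on error or truncation. */
int build_image_url(char *out, size_t out_size, const char *base_url,
                    network_speed_level level) {
    int written = snprintf(out, out_size, "%s?scale=%d",
                           base_url, image_scale_percent(level));
    if (written < 0 || (size_t)written >= out_size) {
        return -1;
    }
    return written;
}
```

For a base URL that already carries a query string, the separator would need to be & rather than ?; the sketch leaves that case out.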

Author: Liu Zhuang

Source-WeChat public account: Sohu technology products

Source: https://mp.weixin.qq.com/s/bBmmnUyU3KwJwgvaybleTg