
Troubleshooting a Redis CPU Spike in a Spring Cloud Microservice Project

If you encounter a Redis CPU spike in a Spring Cloud microservice project, you can take the following steps to troubleshoot and solve the problem:

Monitoring and analytics

When monitoring and analyzing Redis performance, it is important to pay attention to CPU usage, memory usage, network traffic, and command statistics. Below I will explain in detail how to use the commands and monitoring tools that come with Redis.

1. Use the built-in INFO command of Redis

  1. Command introduction: The INFO command is used to obtain status information about the Redis server, including detailed data on memory, CPU, client connections, persistence, etc.
  2. Example code:
redis-cli INFO           

This command outputs a large amount of information; focus on the sections you need, for example the memory section for memory information and the cpu section for CPU usage.

  3. Interpretation of key indicators:
  • used_memory: the total amount of memory allocated by the Redis allocator; a steadily growing value may indicate a memory leak or an oversized cache.
  • used_cpu_sys: the system CPU time consumed by the Redis server.
  • connected_clients: the number of currently connected clients; too many connections may cause performance degradation.
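If you only need one section, INFO accepts a section name, and you can filter for a single field with grep, for example:

redis-cli INFO memory | grep used_memory_human
redis-cli INFO cpu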

2. Use redis-cli --stat

  1. Command description: redis-cli --stat provides a real-time updated interface that displays key Redis performance metrics.
  2. Example code:
redis-cli --stat           

This command will display information such as the number of operations, network traffic, and the number of connected clients in real time.

3. Use redis-top or third-party monitoring tools

  1. Tool introduction: redis-top is a third-party real-time monitoring tool, similar to the Linux top command but designed specifically for Redis.

Third-party monitoring tools, such as Prometheus and Grafana, can provide more detailed monitoring and graphical interfaces.

  2. To install and use redis-top:
git clone https://github.com/myzhan/redis-top.git
cd redis-top
python redis-top.py           

This will start redis-top and display real-time data.

  3. Configure Prometheus and Grafana monitoring (more advanced):
  • Install Prometheus and Grafana.
  • Configure Prometheus to collect metrics from Redis.
  • Create dashboards in Grafana to showcase these metrics.

Focus on metrics

  • CPU usage: A high CPU usage may indicate that Redis is processing a large number of requests or is experiencing performance issues.
  • Memory usage: Memory leaks, improper key design, or large keys can lead to increasing memory usage.
  • Network traffic: Monitoring network traffic can help identify if there are large data transfers or frequent small data transfers, which may affect the performance of Redis.
  • Command statistics: Understanding the types and frequency of commands processed by Redis can help you identify which operations are the most resource-intensive.
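To see per-command counts and cumulative execution time, and to sample round-trip latency, the following commands can help (assuming the default localhost:6379):

redis-cli INFO commandstats
redis-cli --latency -h localhost -p 6379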

Identify hotspot operations

Identifying hotspot operations is a key part of solving Redis performance problems. You can use the SLOWLOG command to find commands that take a long time to execute, and the MONITOR command to observe Redis operations in real time. Here is a detailed explanation of how to use these two commands and how to interpret their output.

1. Use the SLOWLOG command to find slow queries

  1. Command introduction: The SLOWLOG command is used to record commands whose execution time exceeds a specified threshold. This helps identify time-consuming operations and is an important reference for performance optimization.
  2. Example code:
redis-cli SLOWLOG GET 10           

This command returns the last 10 slow queries.

  3. Analysis and Reflection:
  • View the command and execution time of a slow query.
  • Analyze the causes of slow queries, such as large-key operations and complex data structure operations.
  4. Set the threshold for slow queries:
redis-cli CONFIG SET slowlog-log-slower-than <microseconds>           

This command can set the threshold for slow queries in microseconds.
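To verify the current threshold and manage the slow log itself, the following commands may also be useful (slowlog-max-len controls how many entries are kept):

redis-cli CONFIG GET slowlog-log-slower-than
redis-cli CONFIG SET slowlog-max-len 128
redis-cli SLOWLOG LEN
redis-cli SLOWLOG RESET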

2. Use the MONITOR command to monitor Redis operations in real time

  1. Command introduction: MONITOR is a debugging command that prints every command received by the server in real time, which is useful for live monitoring and diagnosis.
  2. Example code:
redis-cli MONITOR           

After running this command, you will see all the commands executed through Redis.

  3. Analysis and Reflection:
  • Monitor the command output for frequently occurring operations.
  • Pay attention to big-key operations, such as a big hash or a big list.
  • Watch for unreasonable full-key scans (e.g., KEYS *).

Code-level troubleshooting

Code-level troubleshooting is one of the key steps to locate and resolve Redis performance issues. In a microservices architecture, different services may interact with Redis in different ways, so the code needs to be reviewed carefully. Here are the steps and examples for code-level troubleshooting:

1. Audit frequent read and write operations

  • Problem: Frequent read and write operations may cause increased Redis CPU usage and response time delays.
  • Troubleshooting method:
    • Review the code for a large number of repetitive Redis read and write operations, especially in loops.
    • Check for unused caches or inappropriate cache expiration policies.
  • Code example (Java, using Jedis):
Jedis jedis = new Jedis("localhost");
for (int i = 0; i < 10000; i++) {
    jedis.set("key" + i, "value" + i); // frequent write operation
    jedis.get("key" + i); // frequent read operation
}
jedis.close();
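If the loop itself cannot be avoided, batching commands through a pipeline usually reduces the number of network round trips dramatically. A minimal sketch with Jedis (the key names are illustrative):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

public class PipelineExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost")) {
            Pipeline pipeline = jedis.pipelined();
            for (int i = 0; i < 10000; i++) {
                pipeline.set("key" + i, "value" + i); // commands are buffered locally
            }
            pipeline.sync(); // send all buffered commands in one batch and read the replies
        }
    }
}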

2. Check the big key operation

  • Problem: Large-key operations consume a large amount of memory and CPU resources, affecting the performance of the Redis instance.
  • Troubleshooting method:
    • Check whether a large amount of data is stored in a single key, such as a large list or hash table.
    • Run the MEMORY USAGE key command to check the memory usage of the key.
  • Code examples:
Jedis jedis = new Jedis("localhost");
// assume a large list is being built
for (int i = 0; i < 100000; i++) {
    jedis.lpush("largeList", "element" + i);
}
jedis.close();           
  • This code creates a very large list that can cause performance issues.
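Redis also ships with a sampling scan for oversized keys, and MEMORY USAGE (Redis 4.0+) reports the memory footprint of a suspect key; largeList here is the key from the example above:

redis-cli --bigkeys
redis-cli MEMORY USAGE largeList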

3. Avoid unnecessary full-key scanning

  • Problem: Using a full-key scan command such as KEYS blocks Redis and consumes a lot of CPU resources.
  • Troubleshooting method:
    • Review the code to see if a full-key scan command such as KEYS is used.
    • Consider using more efficient commands, such as SCAN.
  • Code examples:
Jedis jedis = new Jedis("localhost");
Set<String> keys = jedis.keys("*"); // not recommended in production
for (String key : keys) {
    // operate on the key
}
jedis.close();           
  • This code uses the KEYS command to perform a full-key scan, which may cause performance issues.
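As a sketch of the SCAN-based alternative with Jedis (the match pattern and count are illustrative, and the package locations of ScanParams/ScanResult may differ between Jedis versions):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

public class ScanExample {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost")) {
            ScanParams params = new ScanParams().match("*").count(100);
            String cursor = ScanParams.SCAN_POINTER_START; // "0"
            do {
                ScanResult<String> result = jedis.scan(cursor, params);
                for (String key : result.getResult()) {
                    // operate on the key
                }
                cursor = result.getCursor();
            } while (!"0".equals(cursor)); // iteration is complete when the cursor returns to 0
        }
    }
}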

Connection management

In a microservices architecture, it is critical to properly manage Redis connections, as improper connection management can lead to performance issues. In the following sections, we discuss how to check and configure the number of connections and connection pools of Redis and provide the corresponding example code.

1. Check the number of Redis connections

  1. Use the INFO command:
redis-cli INFO clients           

This command displays the number of clients currently connected to Redis and some related information.

  2. Key indicators:
  • connected_clients: the number of clients that are currently connected.
  • maxclients: the maximum number of client connections configured on the server.
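For a per-connection view (client address, idle time, last command executed), CLIENT LIST is useful, and CONFIG GET shows the configured limit:

redis-cli CLIENT LIST
redis-cli CONFIG GET maxclients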

2. Configure the connection pool

The use of connection pooling can effectively reduce the overhead of frequently establishing and destroying connections and improve performance.

An example of using Jedis connection pooling in Java

  1. Create a connection pool configuration:
JedisPoolConfig poolConfig = new JedisPoolConfig();
poolConfig.setMaxTotal(50); // maximum number of connections
poolConfig.setMaxIdle(10);  // maximum number of idle connections
poolConfig.setMinIdle(5);   // minimum number of idle connections
poolConfig.setTestOnBorrow(true); // validate connections when borrowing from the pool

  2. Create the Jedis connection pool:
JedisPool jedisPool = new JedisPool(poolConfig, "localhost", 6379);

  3. Use the connection pool:
try (Jedis jedis = jedisPool.getResource()) {
    // perform Redis operations
    jedis.set("key", "value");
    String value = jedis.get("key");
    // ...
} // the connection is automatically returned to the pool

Configure the connection pool in Spring Boot

If you're using Spring Boot, you can usually configure it in application.yml or application.properties.

  1. Configure in application.yml:
spring:
  redis:
    host: localhost
    port: 6379
    jedis:
      pool:
        max-active: 50  # maximum number of connections
        max-idle: 10    # maximum number of idle connections
        min-idle: 5     # minimum number of idle connections
  2. Using RedisTemplate: In a Spring Boot application, you can directly inject a RedisTemplate or StringRedisTemplate to perform operations, as shown in the sketch below.
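A minimal sketch of a service that uses an injected StringRedisTemplate (class and key names are illustrative; the pool settings from application.yml are applied automatically):

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class CacheService {

    private final StringRedisTemplate redisTemplate;

    public CacheService(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public void save(String key, String value) {
        redisTemplate.opsForValue().set(key, value); // SET key value
    }

    public String load(String key) {
        return redisTemplate.opsForValue().get(key); // GET key
    }
}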

Redis configuration tuning

Configuring and tuning Redis is an important step to improve its performance and stability. This includes adjusting memory management, choosing the right persistence strategy, and optimizing other related configurations. Here are some of the main tuning directions and examples.

1. Adjust the maxmemory policy

  • Problem: When Redis is used as a cache, if the memory is not enough to hold all the data, a mechanism is needed to decide which data is removed.
  • Solution: Set the maxmemory configuration and select an appropriate eviction policy.
  • Configuration example:
CONFIG SET maxmemory 100mb
CONFIG SET maxmemory-policy allkeys-lru           
  • This example sets the maximum memory to 100MB and uses the allkeys-lru (least recently used) policy, which evicts the least recently used keys when memory reaches the limit.
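You can confirm the active policy and check whether keys are actually being evicted with, for example:

redis-cli CONFIG GET maxmemory-policy
redis-cli INFO stats | grep evicted_keys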

2. Optimize the persistence settings

Redis supports two main persistence methods: RDB (snapshot) and AOF (append-only).

RDB persistence

  • Advantages: RDB efficiently saves point-in-time snapshots of the Redis dataset.
  • Configuration example:
CONFIG SET save "60 10000"  # trigger an RDB snapshot if there are more than 10000 writes within 60 seconds

AOF persistence

  • Pros: AOF records every write operation and recovers data by replaying these operations on restart.
  • Configuration example:
CONFIG SET appendonly yes
CONFIG SET appendfsync everysec  # fsync to disk once per second

3. Other performance-related configurations

  • TCP connection settings:
CONFIG SET tcp-keepalive 60           

Setting a TCP keepalive interval helps prevent network devices from silently closing idle connections.

  • Disable unnecessary commands: If some dangerous commands (e.g., FLUSHALL, KEYS) don't need to be used in a production environment, consider disabling them for better security.
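One common way to do this is the rename-command directive in redis.conf; renaming a command to an empty string disables it entirely (on Redis 6+, ACLs are an alternative):

# in redis.conf
rename-command FLUSHALL ""
rename-command KEYS ""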

Hardware resource checks

Hardware resources are critical to ensure the performance of Redis and the overall microservices architecture. If the hardware resources are insufficient, no matter how much you optimize your code or Redis configuration, you may not be able to achieve the desired performance. The following are the steps to inspect and evaluate your server's hardware resources and related recommendations.

1. Check the CPU usage

  • Important: High CPU usage may cause slow response from Redis.
  • Check method: You can use the tools provided by the operating system, such as the top or htop command in Linux.
  • Example command:
top           
  • In the top interface, look at the %Cpu(s) field to understand the CPU usage.

2. Check memory usage

  • Why it matters: Redis is a memory-based storage system, and adequate memory is critical to its performance.
  • Check method: Use the free command to check the system's memory usage.
  • Example command:
free -m           
  • This shows memory usage in MB, including total, used, and available.

3. Disk I/O performance

  • Importance: Especially when AOF persistence is used, the I/O performance of the disk will directly affect the performance of Redis.
  • Check method: You can use the iostat tool to check.
  • Example command:
iostat -dx 2           
  • This displays disk I/O statistics for each device.

4. Network bandwidth

  • Why it matters: For distributed microservices and Redis clusters, network bandwidth is an important consideration.
  • How to check: Use a tool such as iftop or nload to monitor network traffic.
  • Example command:
iftop           
  • This shows the real-time bandwidth usage of the network interface.

Redis shards or clusters

If a single Redis instance cannot meet performance requirements or needs to improve availability, you can use Redis shards or clusters. These methods can improve the scalability and fault tolerance of Redis.

1. Redis Sharding

Sharding is the process of distributing data across multiple Redis instances to scale out performance and capacity.

  1. Using client-side sharding:
  • Client-side sharding is the most common sharding method, which is achieved by assigning keys to different Redis instances in the client-side logic.
  • Each instance runs independently and has no internal communication.
  2. Example code for client-side sharding

To implement Redis client sharding in the Spring Boot architecture, you usually need to use a third-party library to assist in handling the sharding logic. Spring Boot does not directly provide client-side sharding on its own, but it can be achieved by configuring multiple RedisTemplates or JedisConnectionFactories. Here's a simple example of Spring Boot-based Redis client sharding:

1. Add dependencies

First make sure you add Redis dependencies such as spring-boot-starter-data-redis to pom.xml.

2. Configure multiple Redis connection factories

Define multiple JedisConnectionFactories in the Spring configuration file, each corresponding to a Redis instance.

@Configuration
public class RedisConfig {

    @Bean
    public JedisConnectionFactory redisConnectionFactory1() {
        RedisStandaloneConfiguration config = new RedisStandaloneConfiguration("server1", 6379);
        return new JedisConnectionFactory(config);
    }

    @Bean
    public JedisConnectionFactory redisConnectionFactory2() {
        RedisStandaloneConfiguration config = new RedisStandaloneConfiguration("server2", 6379);
        return new JedisConnectionFactory(config);
    }

    // configure more Redis connection factories as needed
}
           

3. Create a sharding strategy

Next, create a simple sharding strategy that decides which Redis instance to use based on the key.

public class RedisShardingStrategy {

    private List<JedisConnectionFactory> connectionFactories;

    public RedisShardingStrategy(List<JedisConnectionFactory> connectionFactories) {
        this.connectionFactories = connectionFactories;
    }

    public JedisConnectionFactory getShard(String key) {
        // floorMod avoids a negative index when hashCode() returns Integer.MIN_VALUE
        int shardIndex = Math.floorMod(key.hashCode(), connectionFactories.size());
        return connectionFactories.get(shardIndex);
    }
}
           

4. Create RedisTemplate instances

Create a corresponding instance of RedisTemplate for each JedisConnectionFactory.

@Bean
public RedisTemplate<String, Object> redisTemplate1() {
    RedisTemplate<String, Object> template = new RedisTemplate<>();
    template.setConnectionFactory(redisConnectionFactory1());
    return template;
}

@Bean
public RedisTemplate<String, Object> redisTemplate2() {
    RedisTemplate<String, Object> template = new RedisTemplate<>();
    template.setConnectionFactory(redisConnectionFactory2());
    return template;
}

// create additional RedisTemplate instances for the other connection factories
           

5. Use a sharding strategy

Use the RedisShardingStrategy in the service to select the appropriate RedisTemplate.

@Service
public class RedisService {

    private final RedisShardingStrategy shardingStrategy;
    private final List<RedisTemplate<String, Object>> redisTemplates;

    public RedisService(RedisShardingStrategy shardingStrategy, 
                        List<RedisTemplate<String, Object>> redisTemplates) {
        this.shardingStrategy = shardingStrategy;
        this.redisTemplates = redisTemplates;
    }

    private RedisTemplate<String, Object> getRedisTemplate(String key) {
        JedisConnectionFactory factory = shardingStrategy.getShard(key);
        for (RedisTemplate<String, Object> template : redisTemplates) {
            if (template.getConnectionFactory().equals(factory)) {
                return template;
            }
        }
        throw new IllegalStateException("No RedisTemplate found for given key");
    }

    public void setValue(String key, Object value) {
        RedisTemplate<String, Object> template = getRedisTemplate(key);
        template.opsForValue().set(key, value);
    }

    public Object getValue(String key) {
        RedisTemplate<String, Object> template = getRedisTemplate(key);
        return template.opsForValue().get(key);
    }
}
           

2. Redis Clustering

Redis clusters provide a more advanced solution that scales Redis by automating sharding and providing data replication and failover.

  1. Cluster features:
  • A Redis cluster automatically shards data to multiple nodes.
  • Provides data replication and failover capabilities to improve availability.
  2. To set up a Redis cluster:
  • At least three master nodes are required to build a stable cluster.
  • Use the CLUSTER MEET command of Redis to connect nodes to each other (redis-cli --cluster create handles this automatically).
  3. Example configuration of a Redis cluster:
  • Suppose there are three Redis nodes with ports 7000, 7001, and 7002.
  • Example configuration file (created for each node):
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
           
  • Run the redis-cli --cluster create command to create a cluster.
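For example, assuming the three nodes above all run on the local machine and no replicas are added yet:

redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 --cluster-replicas 0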

Application layer optimization

Application layer optimization is an important means to improve cache efficiency and reduce dependence on Redis. It involves the optimization of caching policies, special handling of hot data, and the implementation of local caching and cache warming. Here are some of the main application-layer optimization strategies and sample code.

1. Optimize caching policies

  1. Expiration policy:
  • Set a reasonable cache expiration time to avoid stale data or cache overload.
  • For data that is frequently accessed but rarely updated, you can set a longer cache time.
  2. Hot data handling:
  • For frequently accessed hot data, more efficient caching strategies can be employed, such as adding replicas or using faster storage media.
  3. Example of caching operations in Java:
public class RedisCache {
    private JedisPool jedisPool;

    public RedisCache(JedisPool jedisPool) {
        this.jedisPool = jedisPool;
    }

    public String getData(String key) {
        try (Jedis jedis = jedisPool.getResource()) {
            // try to get the data from the cache
            String value = jedis.get(key);
            if (value == null) {
                // not cached: load it from the database or another source
                value = loadDataFromDB(key);
                // cache the value with a 1-hour expiration
                jedis.setex(key, 3600, value);
            }
            return value;
        }
    }

    private String loadDataFromDB(String key) {
        // data-loading logic...
        return "some_data";
    }
}
           

2. Local cache and cache warm-up

  1. Local caching:
  • Frequently accessed data that rarely changes can be cached in local memory to reduce round trips to Redis.
  • Local caching can be implemented using libraries such as Guava Cache and Caffeine.
  2. Cache warm-up:
  • Load hot data into the cache at system startup or at regular intervals to reduce heavy database access during cold starts (a warm-up sketch follows the local-cache example below).
  3. Example of local caching in Java (using Guava Cache):
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

public class LocalCache {
    private Cache<String, String> cache;

    public LocalCache() {
        cache = CacheBuilder.newBuilder()
                            .maximumSize(1000)
                            .expireAfterWrite(10, TimeUnit.MINUTES)
                            .build();
    }

    public String getData(String key) {
        String value = cache.getIfPresent(key);
        if (value == null) {
            value = loadDataFromDB(key);
            cache.put(key, value);
        }
        return value;
    }

    private String loadDataFromDB(String key) {
        // data-loading logic...
        return "some_data";
    }
}
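For cache warm-up, here is a minimal Spring Boot sketch that preloads hot data at startup; the hot-key list and loadFromDatabase are hypothetical placeholders for your own data source:

import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;

import java.util.Arrays;
import java.util.List;

@Component
public class CacheWarmUpRunner implements ApplicationRunner {

    private final StringRedisTemplate redisTemplate;

    public CacheWarmUpRunner(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    @Override
    public void run(ApplicationArguments args) {
        // hypothetical list of hot keys; replace with a query against your own data source
        List<String> hotKeys = Arrays.asList("product:1", "product:2", "product:3");
        for (String key : hotKeys) {
            String value = loadFromDatabase(key);
            redisTemplate.opsForValue().set(key, value); // preload into Redis before traffic arrives
        }
    }

    private String loadFromDatabase(String key) {
        // placeholder for real database access
        return "value-for-" + key;
    }
}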
           

Stress test

Conducting stress tests is an important part of evaluating and optimizing system performance, especially for caching systems like Redis. Stress testing in a non-production environment can help identify potential performance bottlenecks to optimize the overall performance of the system. Here are some steps and recommendations for conducting a Redis stress test:

1. Choose a stress testing tool

For Redis, commonly used stress testing tools are:

  • redis-benchmark: Redis comes with a performance testing tool that allows you to quickly perform basic performance tests.
  • JMeter: A widely used performance testing tool that allows for more complex test scenario simulations.
  • Custom scripts: Write test scripts based on your specific needs to simulate real-world application scenarios.

2. Set up a test environment

  • Make sure that the test environment is as similar as possible to the production environment, including hardware configuration, network environment, and data volume.
  • Test in a non-production environment to avoid impacting normal business.

3. Use redis-benchmark for basic testing

Example command

redis-benchmark -h redis-host -p 6379 -c 100 -n 100000
           


This command means:

  • -h redis-host: specifies the address of the Redis server.
  • -p 6379: specifies the Redis server port.
  • -c 100: 100 concurrent connections are used.
  • -n 100000: A total of 100,000 requests are sent.

Test content

  • Test different types of operations like GET, SET, etc.
  • Observe response time and throughput.

4. Use JMeter for advanced testing

Test plan settings

  • Set up a test plan in JMeter, including thread groups (the number of simulated users), samplers (specify Redis commands), and listeners (display results).

Example usage

  • JMeter is typically configured through a graphical interface and does not involve writing code. However, more complex test logic can be written through JMeter's scripting language.

5. Analyze the test results

  • Analyze throughput, response time, and resource utilization (e.g., CPU, memory usage).
  • Identify performance bottlenecks, such as network latency, Redis configuration, or hardware resource limitations.

6. Tweaks and optimizations

  • Adjust the Redis configuration based on the test results, such as the number of connections and the memory size.
  • Optimize application logic, such as improving data access patterns and reducing unnecessary operations.

7. Repeat the test

  • After you've made improvements, repeat the test to verify the optimization.
  • Continuous monitoring and testing to ensure consistent system performance under changing loads.

When conducting stress testing, it is important to ensure that the test scenario simulates real-world usage as closely as possible. This can include simulating high concurrent requests, large amounts of data being written, long runs, and so on. Through these tests, the problems that the system may encounter under high loads can be effectively identified and resolved.

Precautions

  • Gradually increase the load: You should start with a lower load and gradually increase until you reach the highest load you expect.
  • Monitor resource usage: Continuously monitor the server's CPU, memory, disk I/O, and network bandwidth usage while testing.
  • Test data preparation: Ensure the authenticity and diversity of test data so that the test can cover different use cases.
  • Test Result Recording: Records the test results in detail, including the test conditions, the number and type of actions performed, and the response of the system.
  • Secure backups: Make sure to back up your existing data before starting the test to avoid data loss or corruption during the test.

With these steps and precautions, Redis stress testing can be performed effectively and appropriate optimizations and adjustments can be made based on the test results.

Continuous monitoring

Continuous monitoring of the performance of Redis is a key part of ensuring the stable operation of the system. It helps you discover and respond to emerging issues in a timely manner, thus keeping your system efficient and reliable. Here are a few key steps and recommendations for conducting continuous Redis monitoring.

1. Select a monitoring tool

For Redis, there are a variety of monitoring tools to choose from:

  • Built-in Redis commands: INFO, MONITOR, SLOWLOG, etc.
  • Open-source monitoring tools: Prometheus combined with Grafana, and the Redis Exporter.
  • Commercial monitoring tools: Datadog, New Relic, RedisInsight.

2. Key Performance Indicators

The following key performance indicators should be monitored with a focus on:

  • Memory Usage: Includes total memory usage and memory fragmentation rate.
  • CPU Usage: monitors the CPU usage of the Redis process.
  • Client connections: the number of active connections and the number of rejected connections.
  • Throughput: Commands per second.
  • Slow query log: records commands that take a long time to execute.

3. Use Prometheus and Grafana for monitoring

Here are the basic steps for Redis monitoring with Prometheus and Grafana:

Install the Prometheus Redis Exporter

Redis Exporter is a Prometheus exporter that collects Redis performance metrics and exposes them for Prometheus to scrape.

docker run -d --name redis_exporter -p 9121:9121 oliver006/redis_exporter
           

Configure Prometheus to monitor Redis instance

Add the Redis exporter as a data source in the configuration file of Prometheus.

scrape_configs:
  - job_name: 'redis'
    static_configs:
      - targets: ['<REDIS_EXPORTER_ADDRESS>:9121']
           

Use Grafana to display data

  • Add Prometheus as a data source in Grafana.
  • Create dashboards to display and analyze the performance metrics of Redis.

4. Set up an alerting mechanism

By configuring Prometheus' Alertmanager or using the alerting features of a commercial monitoring tool, you can define alert rules to receive timely notifications when performance issues occur.

5. Automation and script monitoring

For some specific monitoring needs, custom scripts can be written to collect and analyze data. For example, use a Python or shell script to periodically invoke Redis commands, and then send the results to a monitoring system or log to a log file.

# Example Python script: monitor Redis using the redis-py library
import redis
import json

redis_client = redis.StrictRedis(host='localhost', port=6379, db=0)

def get_redis_info():
    info = redis_client.info()
    print(json.dumps(info, indent=4))

get_redis_info()
           

6. Logging and Analytics

  • Use Redis logging to log critical events and potential issues.
  • Regularly analyze log files for unusual patterns or potential issues.

The above steps should be adjusted and optimized according to your specific system environment and business requirements. When making changes, roll them out gradually rather than all at once, to avoid introducing new problems.

Author: A programmer ape who loves to pet cats

Link: https://juejin.cn/post/7323484289353908287
