
Troubleshooting and resolving a high-CPU-load service outage in a production environment

Author: Brother Gui said Java entrepreneurship

Symptoms

All three deployment nodes of the backend service occasionally spike to 200% CPU. The service process stays alive, but the service becomes unavailable; after 3 to 10 minutes it returns to normal on its own.

Analysis

Based on experience from previous projects, the initial suspicion was a problem in the code itself: a large number of objects being created, frequently triggering full GC.

Investigation

1. Check GC activity

[Screenshot: jstat -gcutil output for the service]

The jstat -gcutil output shows that the service's GC activity is normal.

2. Check TCP monitoring. The TCP status is normal, which rules out problems caused by HTTP requests to third parties timing out.

3. View the thread stack

1) Use jps to get the service process PID.

2) Check per-thread CPU usage within the process (for example with top -Hp PID).

[Screenshot: per-thread CPU usage within the service process]

3) Find the PIDs of the threads with high CPU usage. (The service had already recovered by the time this write-up was made, so the screenshot is not from the actual incident.) At the time, two threads were each at 99% CPU. To view their stacks, first convert each thread ID from decimal to hexadecimal with printf "%x\n" PID (for example, thread PID 25603 becomes 6403).

4) Run jstack <process PID> | grep "<hex thread ID>" -C5 --color:

[Screenshot: jstack output around the two hot thread IDs]

Fix

The thread stacks from step 4 show that the two high-CPU threads are a GC thread and a business thread.

Direction 1: Since one of the hot threads is a GC thread, GC might still be at fault; watching the service's GC status again, another full GC was observed. We adjusted memory, increasing the heap to 32G to avoid full GC, and switched the garbage collector from the ParNew + CMS combination the Java 8 service had been using to G1. Conclusion: the problem was not solved.

Direction 2: The other hot thread is a business thread. Because the service renames its worker threads and each thread name is unique, grepping the historical logs for that thread name locates the corresponding interface: URL to Base64. We extracted the user's request payload from the log and called the URL-to-Base64 interface with it again. Sure enough, the problem reproduced: the service became unavailable and CPU spiked to 200%. After restarting the service and attaching the arthas tool, we saw the thread stuck and never returning after the set-value step, which confirmed a business-code problem. Walking through the log-printing steps, the initial suspicion fell on fastJSON serialization in the unified response wrapper.
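
For context on what "unified return with fastJSON serialization" means here: the controller returns a VO, and a globally registered fastJSON message converter serializes it into the response body. Below is a minimal sketch of how such wiring typically looks, assuming Spring Boot and fastjson 1.x; this configuration class is illustrative, not taken from the project:

import java.util.List;

import org.springframework.context.annotation.Configuration;
import org.springframework.http.converter.HttpMessageConverter;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

import com.alibaba.fastjson.support.spring.FastJsonHttpMessageConverter;

// Illustrative sketch: registers fastJSON as the message converter that
// serializes every controller return value (the "unified return" path).
@Configuration
public class WebConfig implements WebMvcConfigurer {
    @Override
    public void configureMessageConverters(List<HttpMessageConverter<?>> converters) {
        // fastJSON now handles serialization of controller return values.
        converters.add(0, new FastJsonHttpMessageConverter());
    }
}

With this in place, every VO returned from a controller, including the Base64 VO above, passes through fastJSON on its way out.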

The old code is as follows (key logic only):

@GetMapping("/urlToBase64")
    public UrltoBease64Vo imgUrlToBase64(@RequestParam("imageUrl") String imageUrl){
        UrltoBease64Vo vo = new UrltoBease64Vo();
        String base64 = urlToBase64(imageUrl);
        vo.setBase64(base64);
        return vo;
    }           

Fix direction: the service is configured to serialize the unified response with fastJSON. For this interface, skip the unified return path and write the response directly, thereby bypassing fastJSON serialization.

The new code is as follows (only the modifications are listed):

public void imgUrlToBase64(HttpServletResponse response, @RequestParam("imageUrl") String imageUrl){
        UrltoBease64Vo vo = new UrltoBease64Vo();
        String base64 = urlToBase64(imageUrl);
        vo.setBase64(base64);
        // Build the JSON envelope by hand instead of returning the VO,
        // so the unified-return fastJSON serialization is never invoked.
        StringBuffer stringBuffer = new StringBuffer(20480);
        stringBuffer.append("{\"OptCode\":0,\"Data\":{\"Base64\":\"").append(base64).append("\"}}");
        try {
            // Write the prebuilt JSON directly to the response body.
            response.getWriter().write(stringBuffer.toString());
            response.getWriter().flush();
            response.getWriter().close();
        } catch (IOException e) {
            //....
        }
    }
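
One note on this fix: writing the JSON envelope by hand is safe here because Base64 output (A-Z, a-z, 0-9, +, /, =) contains no characters that need JSON escaping. For very large payloads the intermediate buffer can also be skipped entirely. Here is a sketch of that variant, under the same assumptions as the code above (same envelope fields, same urlToBase64 helper):

@GetMapping("/urlToBase64")
    public void imgUrlToBase64(HttpServletResponse response, @RequestParam("imageUrl") String imageUrl){
        String base64 = urlToBase64(imageUrl);
        try {
            PrintWriter out = response.getWriter();
            // Stream the envelope and the payload directly to the response;
            // no large intermediate String or buffer is ever built.
            out.write("{\"OptCode\":0,\"Data\":{\"Base64\":\"");
            out.write(base64);
            out.write("\"}}");
            out.flush();
        } catch (IOException e) {
            //....
        }
    }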

Replaying the problematic request against the interface again, it now returns normally.

Summary

fastJSON has problems serializing large objects: here, a VO holding one huge Base64 string pinned the CPU and hung the service, and bypassing fastJSON for that interface resolved the outage.
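
As a sanity check on that conclusion, a small standalone program can time fastJSON on a VO holding one very large string. This is a minimal sketch assuming fastjson 1.x on the classpath; the class name, field name, and 64 MB payload size are illustrative, and whether it actually hangs will depend on the fastjson version and payload size:

import com.alibaba.fastjson.JSON;

public class LargeStringSerializationCheck {
    // Mirrors the shape of the VO in the incident: one huge Base64 field.
    public static class Vo {
        private String base64;
        public String getBase64() { return base64; }
        public void setBase64(String v) { base64 = v; }
    }

    public static void main(String[] args) {
        // Roughly the Base64 size of a very large image: 64 MB of 'A's.
        int size = 64 * 1024 * 1024;
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < size; i++) sb.append('A');

        Vo vo = new Vo();
        vo.setBase64(sb.toString());

        long start = System.currentTimeMillis();
        String json = JSON.toJSONString(vo);   // the call under suspicion
        System.out.println("Serialized " + json.length() + " chars in "
                + (System.currentTimeMillis() - start) + " ms");
    }
}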