Let's talk about the importance of TCP/IP in .NET failure analysis

Author: Front-line code farmers talk about technology

One: Background

1. Tell a story

I have analyzed several network failures recently. Without a solid understanding of the TCP/IP protocol, these problems are really hard to solve: you can only run black-box tests at the application level, and you cannot see the handshake or the PSH data exchange at the TCP level.

In this article, we will use two small examples to understand the role of the TCP protocol in failure analysis.

Two: Two small examples of the TCP protocol

1. A sudden burst of program timeouts

The story originated from a problem a friend had:

The program had been running well, but every so often it suddenly became unable to reach its target. The strange thing is that accessing the domain name manually while the program was failing worked fine, and the whole thing seemed inexplicable. What is going on?

Although my friend captured a dump, it is hard to find this kind of problem in a dump, because the fault most likely lies in the HTTP communication. You need a tool such as Wireshark to monitor the traffic. The root cause turned out to be a proxy server that occasionally misbehaved, which left the C# HttpClient unable to reach its target.

For the sake of demonstration, here is a simple piece of test code.

  1. WebAPI code

Create a skeleton WebAPI project and deploy it on a Windows virtual machine.

[HttpGet]
public IEnumerable<WeatherForecast> Get()
{
    return Enumerable.Range(1, 5).Select(index => new WeatherForecast
    {
        Date = DateTime.Now.AddDays(index),
        TemperatureC = Random.Shared.Next(-20, 55),
        Summary = Summaries[Random.Shared.Next(Summaries.Length)]
    })
    .ToArray();
}

           

Then configure Kestrel to listen on port 80 in appsettings.json.

{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  },
  "Kestrel": {
    "Endpoints": {
      "Http": {
        "Url": "http://0.0.0.0:80"
      }
    }
  }
}

           
  2. Client

Here I added a hosts entry that maps myproxy.com to the virtual machine at 192.168.25.133, and then accessed it by domain name.

using System.Net;

internal class Program
{
    // Route every request through the proxy at myproxy.com.
    public static HttpClient client = new HttpClient(new HttpClientHandler()
    {
        Proxy = new WebProxy("http://myproxy.com")
    });

    static async Task Main(string[] args)
    {
        for (int i = 0; i < 100000; i++)
        {
            try
            {
                // Send the GET request
                HttpResponseMessage response = await client.GetAsync("http://youtube.com/WeatherForecast");

                // Check the response status code
                response.EnsureSuccessStatusCode();

                // Read the response content
                string responseBody = await response.Content.ReadAsStringAsync();

                // Print the response content
                Console.WriteLine(responseBody);

                await Task.Delay(1000);
            }
            catch (HttpRequestException e)
            {
                Console.WriteLine($"{DateTime.Now} HTTP request exception: {e.Message} {e.GetType().Name}");
            }
        }
    }
}

           

Open Wireshark to capture the traffic and run the program. Everything looks calm:

[Screenshot: Wireshark capture of the normal traffic]

Now suppose something goes wrong with the proxy server, simulated here by shutting it down. Looking at Wireshark again, you can see that packet 154 never receives a response, and the client retransmits it based on an RTO of 1s.

[Screenshot: Wireshark capture of the retransmissions after the proxy goes down]
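When the proxy stops answering, it helps to log which layer the failure came from. The following is a minimal sketch, not code from the original investigation: `FailureClassifier` and its labels are hypothetical names made up for illustration. It relies on two behaviors of HttpClient: socket errors are wrapped inside an HttpRequestException, and on .NET 5 and later a timeout raised by `HttpClient.Timeout` arrives as a TaskCanceledException whose inner exception is a TimeoutException.

```csharp
using System;
using System.Net.Http;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper (an assumption for illustration): maps the exceptions
// that HttpClient surfaces to a short label hinting at the failing layer.
public static class FailureClassifier
{
    public static string Classify(Exception ex)
    {
        // Walk the InnerException chain, because HttpClient wraps socket errors.
        for (var current = ex; current != null; current = current.InnerException)
        {
            if (current is SocketException socketEx)
                return $"socket:{socketEx.SocketErrorCode}"; // e.g. TimedOut, ConnectionRefused

            if (current is TimeoutException)
                return "timeout"; // HttpClient.Timeout elapsed (.NET 5+)
        }

        if (ex is OperationCanceledException) return "canceled";
        if (ex is HttpRequestException) return "http"; // e.g. a non-success status code
        return "other";
    }
}
```

To use it in the test loop, the catch clause would need to be widened, for example to `catch (Exception e) when (e is HttpRequestException or TaskCanceledException)`, because a timeout does not arrive as an HttpRequestException. A `socket:` label is a good hint that it is time to open Wireshark.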

2. DNS resolves to an unreachable IP address

Some friends have seen their programs freeze because a very long timeout was configured. This kind of timeout is quite interesting: DNS resolves the domain name to an IP address, but that IP cannot be reached, so the client keeps retrying until the timeout expires and an exception is thrown.
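One way to keep such a freeze short is to bound the connection phase separately from the whole request. The sketch below is only an illustration under stated assumptions: `BoundedHttpClientFactory` is a hypothetical name, and the timeout values are placeholders the caller must choose. `SocketsHttpHandler.ConnectTimeout` and `HttpClient.Timeout` are the standard .NET properties.

```csharp
using System;
using System.Net.Http;

// Hypothetical factory (an assumption for illustration): builds an HttpClient
// whose connect phase and overall request time are both bounded.
public static class BoundedHttpClientFactory
{
    public static HttpClient Create(TimeSpan connectTimeout, TimeSpan overallTimeout)
    {
        var handler = new SocketsHttpHandler
        {
            // Caps the time spent establishing the connection,
            // which includes the TCP handshake and its SYN retries.
            ConnectTimeout = connectTimeout
        };

        return new HttpClient(handler)
        {
            // Caps how long HttpClient waits for a request before canceling it.
            Timeout = overallTimeout
        };
    }
}
```

For example, `BoundedHttpClientFactory.Create(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(30))` would give up on an unreachable IP after about 5 seconds instead of waiting out a long request timeout.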

Next, let's use HttpClient in a small example that accesses youtube.com directly:

static async Task Main(string[] args)
{
    HttpClient client = new HttpClient();

    for (int i = 0; i < 100000; i++)
    {
        try
        {
            // Send the GET request
            HttpResponseMessage response = await client.GetAsync("http://youtube.com");

            // Check the response status code
            response.EnsureSuccessStatusCode();

            // Read the response content
            string responseBody = await response.Content.ReadAsStringAsync();

            // Print the response content
            Console.WriteLine(responseBody);

            await Task.Delay(1000);
        }
        catch (HttpRequestException e)
        {
            Console.WriteLine($"{DateTime.Now} HTTP request exception: {e.Message} {e.GetType().Name}");
        }
    }
}

           

Open Wireshark to start capturing, then run the program:

[Screenshot: Wireshark capture of the DNS query for youtube.com]

From the capture, you can see that the client issued a DNS query and the DNS server answered that the IP for youtube.com is 104.244.46.85. The next step is for the client to start a handshake with that IP, as shown in the screenshot below:

[Screenshot: Wireshark capture of the SYN retransmissions to 104.244.46.85]

The picture is not a pretty one, and it tells us two things:

  • The client sends a SYN, but nobody answers it, mainly because a firewall on the path dropped the SYN-ACK.
  • The client retransmits when the RTO timer expires, after 1s, 2s, 4s, and 8s, until HttpClient gives up waiting and throws a TimeoutException.
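The doubling pattern in the second point can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming plain exponential backoff; `RtoBackoff` is a hypothetical helper, and the real initial RTO and retry count depend on the operating system's TCP settings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (an assumption for illustration): models the doubling
// of the retransmission timeout between successive SYN retries.
public static class RtoBackoff
{
    // Returns the wait before each retransmission, doubling every time.
    public static List<TimeSpan> Waits(TimeSpan initialRto, int retransmissions)
    {
        var waits = new List<TimeSpan>();
        var rto = initialRto;
        for (int i = 0; i < retransmissions; i++)
        {
            waits.Add(rto);
            rto += rto; // exponential backoff: 1s, 2s, 4s, 8s, ...
        }
        return waits;
    }

    // Returns the total time spent waiting across all retransmission timers.
    public static TimeSpan Total(TimeSpan initialRto, int retransmissions)
    {
        var total = TimeSpan.Zero;
        foreach (var wait in Waits(initialRto, retransmissions))
        {
            total += wait;
        }
        return total;
    }
}
```

With an initial RTO of 1s and four timer expirations, `Waits` yields 1s, 2s, 4s, and 8s, and `Total` is 15s. That is how long the client sits in the handshake before the last timer in the list above expires.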

Three: Summary

People live in an intricate web of relationships, and so do programs. To resolve more kinds of .NET program failures, knowledge of the TCP/IP stack is essential.