What exactly does the HTTPS protocol add to the HTTP protocol? Introduction to HTTP, Introduction to HTTPS, Theory of HTTPS, HTTP Performance Tuning, Summary

author:CSDN program life

Author | Alaska

Source | Jacko's IT journey

Recently, I published an article about the HTTP protocol, outlined below:

<h1 toutiao-origin="h3">An introduction to HTTP</h1>

The HTTP protocol, short for Hypertext Transfer Protocol, is a transfer protocol used to transfer hypertext from a World Wide Web server to a local browser.

HTTP is an application protocol that runs on top of TCP/IP and is used to transfer data (HTML files, image files, query results, etc.).

HTTP belongs to the application layer and is well suited to distributed hypermedia information systems because it is simple and fast. It was proposed in 1990 and has been continuously improved and extended since. HTTP/1.0 was followed by HTTP/1.1, which has long since been standardized, and later by HTTP/2 and HTTP/3; the HTTP-NG (Next Generation of HTTP) proposal mentioned in older texts was never adopted.

The HTTP protocol works on a client-server architecture: the browser, acting as the HTTP client, sends requests through a URL to the HTTP server (the Web server), and the Web server sends response information back to the client based on the request it received.

<h2 toutiao-origin="h4" >HTTP features:</h2>

Simple and fast: When a client requests a service from the server, it only needs to send the request method and path. The commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of interaction between the client and the server. Because the HTTP protocol is simple, HTTP server programs are small and communication is fast;

Flexible: HTTP allows any type of data object to be transmitted. The type being transmitted is marked by Content-Type;

Connectionless: Connectionless here means that each connection handles only one request. After the server has processed the client's request and the client has received the response, the connection is closed. This saves transmission time. (HTTP/1.1 and later keep connections open by default and reuse them for several requests.);

Stateless: The HTTP protocol is stateless. Stateless means that the protocol has no memory for transaction processing. The absence of state means that if subsequent processing requires the preceding information, it must be retransmitted, which can result in an increase in the amount of data transferred per connection. On the other hand, when the server does not need previous information, its response is faster;

Supports both browser/server (B/S) and client/server (C/S) modes;
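To make the "request method and path" and Content-Type points concrete, here is a minimal sketch of what an HTTP/1.1 request message looks like on the wire. The helper names `build_request` and `parse_request`, the host `www.example.com`, and the `/login` path are assumptions made up for illustration; real clients send more headers and handle many more cases.

```python
def build_request(method, path, host, body=b"", content_type=None):
    """Serialize a minimal HTTP/1.1 request message to bytes."""
    # The request line carries the method and the path.
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        # Content-Type tells the receiver how to interpret the body.
        if content_type is not None:
            lines.append(f"Content-Type: {content_type}")
        lines.append(f"Content-Length: {len(body)}")
    # A blank line separates the header section from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_request(raw):
    """Split a message produced by build_request back into its parts."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, path, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, path, version, headers, body


# Hypothetical form submission, used only as sample data.
raw = build_request(
    "POST", "/login", "www.example.com",
    body=b"user=alice",
    content_type="application/x-www-form-urlencoded",
)
method, path, version, headers, body = parse_request(raw)
print(method, path, headers["Content-Type"], body)
```

Note that nothing in the message refers to an earlier request: each one is self-contained, which is what the stateless property describes.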

HTTP has many advantages, so does it have any drawbacks? The answer is yes, and the reason is simple: if HTTP were perfect, why would we need a secure protocol called HTTPS?

<h2 toutiao-origin="h4" > HTTP cons:</h2>

When we send private data to a server (such as a bank card number or ID card number) over HTTP, security cannot be guaranteed;

First, the data may be intercepted in transit by a man in the middle, who can then steal it;

Second, after obtaining the data, the man in the middle may modify or replace it before sending it on to the server;

Finally, when the server receives the data, it cannot determine whether the data has been modified or replaced, nor can it determine whether the data really came from the client;

To sum up, HTTP has three drawbacks:

The confidentiality of the message cannot be guaranteed;

The completeness and accuracy of the message cannot be guaranteed;

The reliability of the source of the message cannot be guaranteed;

<h1 toutiao-origin="h3">An introduction to HTTPS</h1>

How can these drawbacks of HTTP be fixed? HTTPS was born to solve these problems.

HTTPS (Hypertext Transfer Protocol over Secure Socket Layer) is an HTTP channel designed for security; put simply, it is the secure version of HTTP.

That is, an SSL layer is added beneath HTTP. The security of HTTPS rests on SSL (and its successor, TLS), so the details of encryption are handled by SSL. HTTPS is now widely used for security-sensitive communication on the World Wide Web, such as transaction payments.

By using asymmetric encryption algorithms, HTTPS ensures that the plaintext we transmit cannot be recovered by working backwards from the ciphertext. Let's take a look at the specific workflow.

<h2 toutiao-origin="h4">How it works:</h2>

The process of establishing HTTPS

Here the lifetime of an HTTPS connection, from establishment to disconnection, is divided into 6 stages and 12 steps. The 12 steps are explained below:

1. Client - Hello: The client starts the SSL communication by sending a Client Hello message. The message contains the SSL version supported by the client and a list of cipher suites (the encryption algorithms used, key lengths, etc.);

2. Server - Hello: If the server can communicate over SSL, it responds with a Server Hello message. Like the client's message, it contains the SSL version and a cipher suite. The server's cipher suite is selected from the list received from the client;

3. Server - Send certificate: The server sends a Certificate message, which contains the certificate holding its public key;

4. Server - Hello done: Finally, the server sends a Server Hello Done message to tell the client that the initial negotiation phase of the SSL handshake is over;

5. Client - Send key: After this first part of the SSL handshake, the client responds with a Client Key Exchange message. It contains a random value called the pre-master secret, which is used for encrypting the communication. The message is encrypted with the public key from step 3;

6. Client - Use this key: The client sends a Change Cipher Spec message, which tells the server that communication after this message will be encrypted with keys derived from the pre-master secret;

7. Client - Finished: The client sends a Finished message, which contains a check value computed over all the handshake messages so far. Whether the handshake succeeds depends on whether the server can correctly decrypt this message;

8. Server - Sends its own Change Cipher Spec message (I'm switching to the key);

9. Server - Sends its own Finished message (I have finished receiving the key);

10. Client - Start sending the body: The client sends the HTTP request and the related content;

11. Server - Start receiving the body: The server receives the HTTP request, processes it, and returns the HTTP response;

12. Client - Disconnect: Finally, the client closes the connection by sending a close_notify message. The figure above omits some details: after this step, a TCP FIN is sent to close the TCP connection;
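Step 2 above, where the server picks a cipher suite from the list the client offered, can be sketched as a simple selection over two sets. This is a toy model, not how any particular SSL/TLS library implements it: the function name `choose_cipher_suite` and the server configuration are assumptions, and real servers may apply their own preference order instead of the client's.

```python
def choose_cipher_suite(client_offer, server_supported):
    """Return the first suite in the client's list that the server supports.

    Returns None when there is no overlap; a real handshake would then
    fail instead of continuing.
    """
    for suite in client_offer:
        if suite in server_supported:
            return suite
    return None


# Client Hello: suites listed in the client's order of preference.
client_offer = [
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_128_GCM_SHA256",
]

# Hypothetical server configuration (an assumption for this sketch).
server_supported = {
    "TLS_AES_128_GCM_SHA256",
    "TLS_CHACHA20_POLY1305_SHA256",
}

# Server Hello: the chosen suite is always one the client offered.
chosen = choose_cipher_suite(client_offer, server_supported)
print(chosen)
```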

In addition, in the flow above, data sent by the application layer is accompanied by a message digest called a MAC (Message Authentication Code). The MAC makes it possible to check whether a message has been tampered with, thus protecting the integrity of the message;
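The idea of a MAC can be tried out with Python's standard `hmac` module. This is only a sketch of the general technique: the key and messages are made-up sample values, and HMAC-SHA256 is chosen here for illustration rather than taken from the handshake described above.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_mac(key, message), tag)


key = b"shared-session-key"  # hypothetical key known to both sides
message = b"transfer 100 to account 42"
tag = make_mac(key, message)

print(verify(key, message, tag))                        # True
print(verify(key, b"transfer 900 to account 42", tag))  # False
```

A man in the middle who changes the message cannot produce a matching tag without the key, so the receiver detects the tampering.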

The following illustration shows the process graphically; it is more detailed than the figure above (the picture comes from "Illustrated HTTP")

The process by which an HTTPS connection is established and used is described above. Given this workflow, what kind of algorithm makes it possible? How is asymmetric encryption achieved, and how is it computed from a mathematical point of view? In other words, what theoretical basis underpins HTTPS and enables it to encrypt transmissions?

<h1 toutiao-origin="h3">The theoretical principle of HTTPS</h1>

HTTPS is built on encryption and decryption, digital certificates, and digital signatures. Let's first introduce the basic concepts of these technologies.

In order to ensure the confidentiality of the message, encryption and decryption are required. The current mainstream encryption and decryption algorithms are divided into symmetric encryption and asymmetric encryption.

<h2 toutiao-origin="h4">Symmetric encryption (shared key encryption)</h2>

The client and server share a key to encrypt and decrypt messages, a method known as symmetric encryption. The client and server agree on an encryption key. The client encrypts the message with the key before sending the message, and after sending it to the server, the server uses the key to decrypt the message.

Illustration of the encryption process:

The symbols used for symmetric encryption here:

M: Plaintext, the content we intend to transmit;

C: The key. In a symmetric encryption algorithm, the same key is used to encrypt and to decrypt (the algorithm can be as simple as addition, subtraction, multiplication, or division, or it can be very complex);

N: Ciphertext, the content obtained by encrypting the plaintext with the key; it is the ciphertext that is transmitted over the network;

For example, the client wants to send 1 (the plaintext) to the server. It computes 1 + 3 = 4 (3 is the key) to get the ciphertext and transmits it. The server receives the ciphertext 4 and computes 4 - 3 = 1 (3 is the key) to recover the plaintext. This is how the client communicates with the server, and the same works in the opposite direction;
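The addition example can be written out as a toy cipher that shifts every byte by the key. It only illustrates the shared-key idea; a cipher this simple offers no real security, and the function names are assumptions for this sketch.

```python
def encrypt(plaintext: bytes, key: int) -> bytes:
    """Add the key to every byte (modulo 256 so results stay valid bytes)."""
    return bytes((b + key) % 256 for b in plaintext)


def decrypt(ciphertext: bytes, key: int) -> bytes:
    """Subtract the same key to undo the encryption."""
    return bytes((b - key) % 256 for b in ciphertext)


key = 3  # C: the key shared by client and server

ciphertext = encrypt(bytes([1]), key)   # M = 1
print(list(ciphertext))                 # [4], N = 1 + 3
print(list(decrypt(ciphertext, key)))   # [1], M = 4 - 3

# The same pair of functions works on longer messages.
print(decrypt(encrypt(b"hello", key), key))  # b'hello'
```

Both sides must know the value 3, which is exactly why leaking the key breaks the scheme.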

Advantages of symmetric encryption:

Symmetric encryption solves the problem of message confidentiality in HTTP;

Disadvantages of symmetric encryption:

Although symmetric encryption ensures message confidentiality, the client and server share one key, which makes the key particularly easy to leak;

Because the risk of key leakage is high, it is difficult to guarantee the reliability of the source or the integrity and accuracy of the message;

With symmetric encryption the risk of key leakage is very high, and a fixed key is easy to crack. So is there a better way to encrypt the transmission, for example using a different key each time, using different keys to encrypt and decrypt, or some other way to increase security?

<h2 toutiao-origin="h4">Asymmetric encryption (public key encryption)</h2>

Since the key is so easy to leak in symmetric encryption, we can use an asymmetric encryption method to solve it. With asymmetric encryption, both the client and the server have a public key and a private key. Public keys can be exposed to the outside world, while private keys are only visible to themselves.

Messages encrypted with a public key can only be decrypted with the corresponding private key. Conversely, messages encrypted with a private key can only be decrypted with the corresponding public key. So before sending a message, the client first encrypts it with the server's public key, and the server decrypts it with its own private key after receiving it.

The explanation is as follows:

M: refers to the plaintext, the content we intend to transmit;

D: refers to the public key, which needs to be encrypted with the public key in asymmetric encryption algorithms;

E: Refers to the private key, which needs to be decrypted with the private key in asymmetric encryption algorithms;

N: refers to the ciphertext, the content obtained by encrypting the plaintext with the public key; it is the ciphertext that is transmitted over the network;

This time the server generates the public key D and the private key E. It keeps the private key E to itself and publishes the public key D. A client that wants to communicate with the server encrypts its message with the public key D and sends the ciphertext to the server; the server decrypts the ciphertext with the private key E and finally obtains the plaintext.

<h2 toutiao-origin="h4">An introduction to the asymmetric encryption algorithm RSA</h2>

RSA is one of the most influential public-key cryptography algorithms. It has resisted the vast majority of known cryptographic attacks to date and has been recommended by ISO as a standard for public-key data encryption.

Today only short RSA keys can be broken by brute force. As of 2008, there was no reliable way to attack the RSA algorithm itself. As long as the key is long enough, information encrypted with RSA cannot be broken in practice. However, as distributed computing and quantum computing theory mature, the security of RSA is being challenged.

The RSA algorithm is based on a very simple fact of number theory: it is easy to multiply two large prime numbers, but extremely difficult to factor their product, so the product can be made public as part of the encryption key.
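The number-theory idea can be tried with deliberately tiny primes. This is a textbook toy, not a usable implementation: real RSA uses primes hundreds of digits long plus padding, and the specific values 61, 53, and 17 are sample numbers chosen for this sketch. The variable names are also assumptions; `public_exponent` and `private_exponent` play the roles of the public key and private key described above (each used together with the product `n`).

```python
# Two small primes (sample values); in practice they are kept secret.
p, q = 61, 53

n = p * q                  # 3233, the product that can be made public
phi = (p - 1) * (q - 1)    # 3120, easy to compute only if p and q are known

public_exponent = 17
# Modular inverse of the public exponent (requires Python 3.8+).
private_exponent = pow(public_exponent, -1, phi)   # 2753


def encrypt(m: int) -> int:
    """Anyone who knows (n, public_exponent) can encrypt."""
    return pow(m, public_exponent, n)


def decrypt(c: int) -> int:
    """Only the holder of private_exponent can decrypt."""
    return pow(c, private_exponent, n)


plaintext = 65
ciphertext = encrypt(plaintext)
print(ciphertext)            # 2790
print(decrypt(ciphertext))   # 65
```

An attacker who sees only `n = 3233` and the public exponent would have to factor `n` to find `phi` and the private exponent. That is trivial for 3233 but is believed to be infeasible for sufficiently large products.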

<h1 toutiao-origin="h3">HTTP performance tuning</h1>

<h2 toutiao-origin="h4">Reduce the number of HTTP requests</h2>

Reducing the number of HTTP requests is a very important part of performance optimization. Practically every set of optimization guidelines includes this rule, whatever else it contains: reduce the number of HTTP requests.

Let's first consider why reducing HTTP requests optimizes performance:

1. Reduce the time spent on DNS requests. Whether or not this is strictly accurate, reducing the number of HTTP requests generally does reduce the time spent on DNS requests and resolution;

2. Reduce server pressure. This is usually the reason considered most often, and it is also the main reason I give when explaining this to others. Every HTTP request consumes server resources, especially on servers that have to do computation, merging, and similar work. The CPU cost is no joke: disk space can be bought cheaply, but CPU resources are not so cheap;

3. Reduce HTTP request headers. When we send a request to the server, the HTTP headers carry the cookies for that domain and some other information, and the server's response also brings back cookies and other headers. This information can sometimes be very large, and sending it with every request and response affects bandwidth performance;

<h2 toutiao-origin="h4">DNS requests and resolution</h2>

In simple terms, take a URL such as www.taobao.com: the www part is called the hostname, the taobao part is the second-level domain, and com is the first-level (top-level) domain. In a URL such as www.ali.tao.com, ali is the third-level domain.

When we request a URL, the local nameserver is first checked for a cached resolution result. If there is none, the local nameserver queries a root nameserver, which returns the IP address of the primary nameserver responsible for the queried domain. The local nameserver then queries the nameserver at that IP address, which returns the IP address of the nameserver for the next level of the domain name, and so on until the IP address of the server the domain name refers to is found. The result is then cached for next time and returned.

The DNS resolution for a URL requested for the first time may be very expensive, but once it has been resolved the result is cached, and later requests do not have to go through the whole resolution process again.
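The resolution walk and the effect of caching can be simulated in a few lines. Everything in this sketch is made up for illustration: the `NAMESERVERS` table stands in for real nameservers, the server names are hypothetical, and `192.0.2.10` is an address from a range reserved for documentation. No real DNS traffic is involved.

```python
# Simulated nameservers: each maps a domain suffix either to a referral
# (the next nameserver to ask) or to a final answer (an IP address).
NAMESERVERS = {
    "root": {"com": ("referral", "com-tld")},
    "com-tld": {"example.com": ("referral", "example-ns")},
    "example-ns": {"www.example.com": ("answer", "192.0.2.10")},
}

cache = {}    # resolved names, kept for later lookups
queries = []  # log of every nameserver that was contacted


def lookup(server, name):
    """Return the server's record for the longest matching suffix of name."""
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in NAMESERVERS[server]:
            return NAMESERVERS[server][suffix]
    raise LookupError(f"{server} knows nothing about {name}")


def resolve(name):
    """Check the cache first; otherwise walk down from the root."""
    if name in cache:
        return cache[name]
    server = "root"
    while True:
        queries.append(server)
        kind, value = lookup(server, name)
        if kind == "answer":
            cache[name] = value
            return value
        server = value  # follow the referral one level down


print(resolve("www.example.com"), queries)  # three servers contacted
print(resolve("www.example.com"), queries)  # served from cache, log unchanged
```

The first call contacts three simulated servers, while the second is answered from the cache, which is the saving described above.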

<h2 toutiao-origin="h4">Reduce server stress</h2>

Too many HTTP requests are very dangerous for the server. If your server is not very powerful, put this point first: the other strategies are only optimizations, whereas here the server itself is at stake, and you have to make sure it can keep working properly.

But this is Taobao, where the servers are fast enough to deliver a good user experience. If your server cannot provide this kind of speed and cannot withstand such frequent asynchronous requests, apply this optimization with care: delays may make navigation unavailable, so the trade-off depends on the scenario.

Taobao now makes extensive use of CDNs, which provide plenty of back-end resources. As the CDN and back-end environment keep improving, the focus should shift toward improving front-end transfer speed and rendering and parsing speed.

<h2 toutiao-origin="h4">Reduce HTTP request headers</h2>

HTTP headers can be huge. Open the taobao.com home page and run alert(document.cookie): you will find that Taobao's cookies are relatively large. Every request to Taobao's servers carries this data back and forth, along with other header information, so the space it takes up is not small, and you can imagine how much overhead that adds.

In practice, once a CDN is used, none of this needs much thought. The CDN and Taobao's main site are not under the same domain name, so their cookies do not pollute each other, and the CDN domain carries almost no cookies or extra header information. A request for a static resource therefore does not drag the main site's cookies along; it transfers only the content of the resource itself, so the performance impact becomes very small with a CDN. But if your static resource server and main server are under the same domain name, it is important to control the size of cookies and other headers, because they are sent with every request.
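Why a separate CDN domain avoids the cookie overhead can be shown with a simplified domain-matching check. This is a rough sketch of the rule browsers apply, not a complete implementation, and the domain names, the 2 KB cookie size, and the request count are all hypothetical sample values.

```python
def cookie_is_sent(cookie_domain: str, request_host: str) -> bool:
    """Simplified domain matching: the host itself or any subdomain of it."""
    return (request_host == cookie_domain
            or request_host.endswith("." + cookie_domain))


def cookie_overhead(cookie_bytes: int, hosts, cookie_domain: str) -> int:
    """Total cookie bytes uploaded across one request per listed host."""
    return sum(cookie_bytes for h in hosts if cookie_is_sent(cookie_domain, h))


cookie_domain = "shop.example"   # hypothetical main site
cookie_bytes = 2048              # hypothetical cookie size

# 50 static resources served from the main domain vs. from a CDN domain.
same_domain = ["static.shop.example"] * 50
cdn_domain = ["img.cdn.example"] * 50

print(cookie_overhead(cookie_bytes, same_domain, cookie_domain))  # 102400
print(cookie_overhead(cookie_bytes, cdn_domain, cookie_domain))   # 0
```

With these sample numbers, serving the static files from the main domain uploads about 100 KB of cookies per page, while the separate CDN domain uploads none.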

<h1 toutiao-origin="h3">Summary</h1>

This time we gained a preliminary understanding of the network protocols HTTP and HTTPS. We looked at the advantages and disadvantages of HTTP; it is precisely because of HTTP's shortcomings that HTTPS exists. We learned how HTTPS works through the diagrams, although it is fairly complex and deserves further study. Finally, we talked about HTTP performance tuning: reducing the number of requests, reducing server pressure, and so on;

In short, different scenarios call for different priorities. Websites of different sizes and types should be optimized in moderation, and standards and best practices should not be pursued blindly.
