2024-07-12
The TCP protocol is connection-oriented, byte-stream oriented, full-duplex, and reliable, and reliable transmission is its most important characteristic.
Figure 1
The figure above shows the structure of the TCP header.
(1) The source port and destination port need no detailed introduction; they identify the sending and receiving programs.
(2) Sequence numbers and acknowledgment numbers are used in the acknowledgment response mechanism in TCP, which will be introduced later.
(3) The 4-bit data offset field actually indicates the length of the TCP header. Four bits can represent at most 15, but since the unit is 4 bytes, the maximum TCP header length is 15 × 4 = 60 bytes.
(4) We all know that the maximum length of a UDP datagram is 64 KB, which is very limiting. To avoid a similar situation, the TCP header provides 6 reserved bits; if TCP's functionality is extended in the future, the reserved bits can be used to signal it.
(5) The six 1-bit flags (URG, ACK, PSH, RST, SYN, FIN) relate to the TCP mechanisms introduced later and will not be covered here.
(6) The checksum has the same function as the checksum in UDP and is used to check whether the data has been changed during transmission.
(7) The window will be discussed later when the mechanism is introduced.
(8) Emergency pointers will be introduced later.
(9) The options field is optional and variable-length.
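To make the layout above concrete, here is a minimal sketch that parses these fields out of a raw 20-byte header. The class and field names are my own, not from any networking library; the checksum, urgent pointer, and options are omitted for brevity.

```java
import java.nio.ByteBuffer;

// A minimal sketch (not a full implementation) that pulls the fields
// discussed above out of a raw TCP header, read big-endian as on the wire.
public class TcpHeader {
    public final int sourcePort, destPort;
    public final long seqNumber, ackNumber;
    public final int headerLengthBytes;   // data offset * 4
    public final boolean urg, ack, psh, rst, syn, fin;
    public final int windowSize;

    public TcpHeader(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw); // defaults to big-endian
        sourcePort = buf.getShort() & 0xFFFF;
        destPort   = buf.getShort() & 0xFFFF;
        seqNumber  = buf.getInt() & 0xFFFFFFFFL;
        ackNumber  = buf.getInt() & 0xFFFFFFFFL;
        int offsetAndFlags = buf.getShort() & 0xFFFF;
        // Top 4 bits: data offset in 4-byte words, so at most 15 * 4 = 60 bytes.
        headerLengthBytes = ((offsetAndFlags >> 12) & 0xF) * 4;
        // Low 6 bits: the six flag bits.
        urg = (offsetAndFlags & 0x20) != 0;
        ack = (offsetAndFlags & 0x10) != 0;
        psh = (offsetAndFlags & 0x08) != 0;
        rst = (offsetAndFlags & 0x04) != 0;
        syn = (offsetAndFlags & 0x02) != 0;
        fin = (offsetAndFlags & 0x01) != 0;
        windowSize = buf.getShort() & 0xFFFF;
        // Checksum, urgent pointer, and options omitted in this sketch.
    }
}
```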
The TCP protocol has to solve a very important problem: reliable transmission. Reliable transmission does not mean the sender is 100% guaranteed to deliver data to the receiver; it means the sender gets to know whether the receiver actually received it.
Figure 2
As shown in Figure 2, every time you send a message to the girl you are pursuing, she sends one back; that reply is the acknowledgment. TCP works the same way: when the client sends a data packet, the server returns an acknowledgment packet, and in that packet the ACK flag from Figure 1 is set to 1.
Figure 3
As shown in Figure 3, when we send her several messages, her replies may come back at different speeds, which easily confuses us: we may believe she agreed to be my girlfriend when what she actually said was "get lost". This is the objective problem of packets sent later arriving first in the network. The same thing happens in TCP: when multiple data packets are sent, the receiver returns multiple acknowledgment packets, and we must be able to tell which acknowledgment corresponds to which packet. The sequence number and acknowledgment number in Figure 1 solve exactly this problem.
Each data packet sent carries a sequence number but no valid acknowledgment number. The packet acknowledging it has ACK set to 1 and a valid acknowledgment number. Because the sequence number of a sent packet and the acknowledgment number of its response correspond to each other, each acknowledgment can be matched to the packet it acknowledges, which solves the reordering problem described above.
TCP actually numbers by bytes: every byte of data is assigned a sequence number, and the sequence number in the TCP header is the number of the first byte of the payload. For example, if the first byte of the payload is numbered 1 and the payload is 1000 bytes long, then the acknowledgment number in the response packet's header is 1001, because the acknowledgment number is set to the number of the last payload byte plus 1. This is easy to understand: returning 1001 means "I have received the 1000 bytes you sent; please send the data from byte 1001 onward," as shown in Figure 4.
Figure 4
When subsequent data packets are sent, the sequence number keeps incrementing, but note one thing: the numbering does not begin at 0 or 1; each connection starts from its own initial sequence number.
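The acknowledgment rule above (acknowledgment number = sequence number of the first payload byte + payload length, i.e. "the next byte I expect") can be sketched in one line; the class name is illustrative:

```java
// A small sketch of the acknowledgment-number rule described above:
// the ACK number is the sequence number of the first payload byte
// plus the payload length, i.e. the next byte the receiver expects.
public class AckRule {
    public static long ackFor(long seqOfFirstByte, int payloadLength) {
        return seqOfFirstByte + payloadLength;
    }
}
```

`AckRule.ackFor(1, 1000)` yields 1001, matching the Figure 4 example.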
Reliable transmission is achieved mainly by relying on the "confirmation response" mechanism. It can tell the sender through the response message whether everything went well and whether there was packet loss. If packet loss occurs (the data is discarded during the data transmission process and cannot reach the other end, which is an objective random event), what should be done?
Timeout retransmission is used to deal with the problem of packet loss in the network.
There are mainly two situations here:
(1) Transmitted data packets are lost
In this case B never receives A's data packet, so it sends no acknowledgment. When the time A has been waiting exceeds a certain threshold, A concludes that the packet was lost and retransmits it.
(2) Response data packet loss
When A receives no acknowledgment packet, it still retransmits the data packet. But B already received the original, so with the retransmission it now holds two identical packets. That is clearly unacceptable: for something like a money transfer, it would mean transferring twice.
To solve this, the TCP receiver deduplicates received data by sequence number. It does not matter that the TCP layer sees duplicates; the key is that the application layer never reads duplicate data. No matter how many times a packet is retransmitted, the application must read exactly one copy.
Inside the receiving operating system's kernel there is a data structure called the receive buffer, which behaves much like a priority queue. When B receives a data packet and it is passed up through the layers to the transport layer, it is placed into this buffer. Using the packet's sequence number, the buffer determines whether the data already exists (either still buffered, or already delivered to and read by the application) and, if so, discards the packet. The receive buffer also solves the network's reordering problem: incoming data is sorted by sequence number and then consumed by the application in order.
As shown in the figure above, data arriving in the receive buffer is sorted by sequence number, which solves both the reordering problem and the duplicate-reception problem. In this example, if a retransmitted packet with sequence number 500 arrives, it is discarded immediately, because the smallest sequence number in the receive buffer is 1000, which means everything up to 500 has already been read by the application.
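The deduplicating, reordering receive buffer described above can be modeled roughly like this (a sketch, not kernel code; the class and method names are mine):

```java
import java.util.Map;
import java.util.TreeMap;

// A rough model of the receive buffer described above. Packets are keyed
// by sequence number, so out-of-order arrivals are sorted automatically,
// and anything at or below the application's read position is a
// duplicate and gets dropped.
public class ReceiveBuffer {
    private final TreeMap<Long, byte[]> pending = new TreeMap<>();
    private long nextExpected; // first byte the application has not read yet

    public ReceiveBuffer(long initialSeq) { nextExpected = initialSeq; }

    // Returns false when the packet is a duplicate and was discarded.
    public boolean receive(long seq, byte[] payload) {
        if (seq < nextExpected || pending.containsKey(seq)) {
            return false; // already read by the app, or already buffered
        }
        pending.put(seq, payload);
        return true;
    }

    // Hand the application the next in-order payload, or null at a gap.
    public byte[] read() {
        Map.Entry<Long, byte[]> first = pending.firstEntry();
        if (first == null || first.getKey() != nextExpected) return null;
        pending.remove(first.getKey());
        nextExpected += first.getValue().length;
        return first.getValue();
    }
}
```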
Packet loss is inherently a low-probability event; when losses pile up, the network is in serious trouble. So as the retransmission count grows, the retransmission interval keeps lengthening: more retransmissions imply a worse network, and retransmitting frequently into it only wastes resources. Once the retransmission count reaches a threshold, a reset segment with the RST flag set to 1 is sent to clear the connection state at both ends; if even the reset gets no response, both ends simply delete the connection.
Timeout retransmission is a supplement to confirmation response.
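As a sketch of this retransmission policy, assume an initial 200 ms timeout that doubles on each retry and a give-up threshold of 5 retries; the real kernel values are tunable and computed adaptively, so these numbers are purely illustrative:

```java
// Illustrative timeout-retransmission policy: the wait doubles after
// each failed retry, and past a retry threshold the sender gives up
// and sends RST. All constants here are assumptions, not kernel values.
public class RetransmitPolicy {
    static final long INITIAL_TIMEOUT_MS = 200;
    static final int MAX_RETRIES = 5;

    // Timeout before the (retry + 1)-th send; doubles each time.
    public static long timeoutMs(int retry) {
        return INITIAL_TIMEOUT_MS << retry;
    }

    // After too many retries, send RST and tear the connection down.
    public static boolean shouldReset(int retry) {
        return retry >= MAX_RETRIES;
    }
}
```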
Figure 5
Establishing a connection means that both parties in communication save each other's information. Three network interactions are required to complete the process as shown in Figure 5.
The first packet of the three-way handshake must come from the client; in fact, whoever initiates the handshake is by definition the client. The process is shown in Figure 5. First, the client sends a SYN packet, that is, a packet with the SYN flag in the header set to 1. The server then returns a response packet with both the ACK and SYN flags set to 1: because both are generated by the operating system kernel at the same moment, they can be merged into one packet to improve performance. Finally, the client sends an acknowledgment back to the server, and the three-way handshake is complete. None of the packets in this process carries business data.
The significance of three-way handshake:
(1) Testing the waters
Confirm whether the communication link is unobstructed
(2) Negotiate some important parameters
For example, the sequence number of the transmitted data packet
(3) Confirm the receiving and sending capabilities of both parties
Why do we need three handshakes? Are four or two handshakes OK?
A four-handshake connection setup would not break anything, but it would reduce performance, since the middle two packets can be merged into one. A two-handshake setup is not enough: the server would never learn whether its reply reached the client, so it could not confirm its own sending ability or the client's receiving ability.
In addition, two important states are involved here. The first is LISTEN, meaning the server has bound the port and is waiting for a client's SYN packet. The other is ESTABLISHED, which, as the name suggests, is the state after the three-way handshake has completed and the connection is established.
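The handshake is carried out entirely by the two kernels. As a rough illustration, the happy-path flag exchange and the two states above can be modeled as a tiny state machine; SYN_SENT and SYN_RCVD are the standard intermediate TCP states, while the class and method names are my own:

```java
// A toy state machine for the three-way handshake's happy path only;
// real TCP state machines have many more states and transitions.
public class Handshake {
    public enum State { CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED }

    // Server: a SYN arrives while listening; it replies SYN+ACK in one segment.
    public static State serverOnSyn(State s) {
        if (s != State.LISTEN) throw new IllegalStateException("not listening");
        return State.SYN_RCVD;
    }
    // Server: the client's final ACK completes the handshake.
    public static State serverOnAck(State s) {
        return s == State.SYN_RCVD ? State.ESTABLISHED : s;
    }
    // Client: sends the initial SYN.
    public static State clientSendSyn() { return State.SYN_SENT; }
    // Client: receives SYN+ACK, then sends the final ACK.
    public static State clientOnSynAck(State s) {
        return s == State.SYN_SENT ? State.ESTABLISHED : s;
    }
}
```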
Unlike the three-way handshake, which only the client can initiate, the four-way wave (connection teardown) can be initiated by either the client or the server.
Figure 6
The specific process of the four-way wave is shown in Figure 6. When the client code calls the socket.close() method (or the process exits), a FIN segment is sent to the server. The server immediately replies with an ACK, but it cannot send its own FIN to the client until the server code also calls socket.close(). Finally, the client sends an ACK to the server, and the four-way wave is complete.
The ACK and FIN in the middle usually cannot be merged: the ACK is sent by the system kernel as soon as the FIN arrives, while the server's FIN is only sent once its code executes socket.close(), so there is a time gap between the two. There is, however, a way to combine them: in some cases the ACK can be deliberately delayed so that it travels together with the FIN.
In addition, two states are involved in the four-way wave. The first is CLOSE_WAIT, the state the receiving side enters after it gets the peer's FIN. The other is TIME_WAIT, the state a side enters after it acknowledges the peer's FIN with the final ACK. That side cannot tear the connection down immediately: if the last ACK is lost, the peer will retransmit its FIN, and the connection must remain long enough to answer that retransmission. This state lasts 2MSL (MSL: Maximum Segment Lifetime, the longest a segment can survive in the network), which typically comes to about 2 minutes.
If a large number of connections on a host sit in CLOSE_WAIT, it usually means the code forgot to call close(). If a large number sit in TIME_WAIT, it means that host has been actively initiating a large number of TCP disconnections.
TCP must guarantee reliable transmission, but it also wants to move data as efficiently as possible, and the sliding window is a mechanism for improving efficiency. It is really a way of recovering lost ground: TCP sacrifices a lot of performance to ensure reliability, and no matter how large the window slides, TCP cannot transmit faster than UDP.
Figure 7
As shown in Figure 7, this is the plain data transmission process: sending one packet and waiting for its acknowledgment before sending the next is quite slow.
Figure 8
Figure 8 shows transmission after the sliding window mechanism is introduced. Instead of one packet at a time, several packets are sent, so the waits for their acknowledgments overlap. The amount of data that may be sent in a batch without waiting for an ACK is the window size.
Figure 9
The sliding process is shown in Figure 9. Suppose the window holds 4 segments: whenever one segment's ACK arrives, a new segment is sent to take its place, which is exactly a sliding motion. If the sender receives an ACK of 3001, all data up to byte 3000 has been received, so the window can move two segments to the right.
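The sliding motion in Figure 9 can be sketched as follows, assuming cumulative ACKs and a fixed window measured in bytes; the names are illustrative:

```java
// A sketch of the sliding window: the window covers bytes
// [base, base + windowBytes); a cumulative ACK slides the base forward
// to the acknowledged byte, letting new segments be sent.
public class SlidingWindow {
    public long base;            // first unacknowledged byte
    public final int windowBytes;

    public SlidingWindow(long base, int windowBytes) {
        this.base = base;
        this.windowBytes = windowBytes;
    }

    // May the segment starting at seq be sent now?
    public boolean canSend(long seq) { return seq < base + windowBytes; }

    // Cumulative ACK: everything before ackNo has been received.
    public void onAck(long ackNo) { if (ackNo > base) base = ackNo; }
}
```

With a base of 1001 and a 4000-byte window (4 segments of 1000 bytes), an ACK of 3001 slides the window two segments forward, matching the example above.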
What to do if packet loss occurs in the sliding window? There are two situations:
(1) The sent data packet is lost
If one segment is lost, then no matter how many later segments are delivered in the batch, every ACK the receiver returns still carries the acknowledgment number of the lost segment, until the sender retransmits it. For example, if the segment covering bytes 1001 to 2000 in the figure above is lost, the receiver keeps answering ACK 1001 even as later segments arrive; once the sender retransmits it and the receiver fills the gap, the receiver jumps straight to ACK 7001.
(2) Response ACK is lost
If an ACK is lost, nothing needs to be done, because a later ACK covers it: acknowledgments are cumulative. For example, losing the ACK of 1001 in the figure above does not matter; once the sender receives the ACK of 2001, it knows everything before 2001 arrived. The same holds for the ACKs of later data.
This handling of packet loss is quite efficient: a lost data segment is simply retransmitted to fill the gap, and a lost ACK can simply be ignored. This mechanism is called fast retransmit.
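The sender side of fast retransmit can be sketched like this, using the conventional threshold of three duplicate ACKs; the class name and the exact trigger logic are simplifications:

```java
// A sketch of the fast-retransmit trigger: the receiver keeps ACKing
// the missing byte, and after three duplicate ACKs for the same number
// the sender retransmits that segment without waiting for a timeout.
public class FastRetransmit {
    private long lastAck = -1;
    private int dupCount = 0;

    // Returns true when this ACK should trigger a retransmission.
    public boolean onAck(long ackNo) {
        if (ackNo == lastAck) {
            dupCount++;
            return dupCount == 3; // third duplicate: retransmit now
        }
        lastAck = ackNo;          // fresh cumulative ACK, reset the count
        dupCount = 0;
        return false;
    }
}
```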
Timeout retransmission and fast retransmission are different strategies adopted in different environments. If your TCP transmits little data and infrequently, it will trigger a timeout retransmission. If your TCP needs to transmit a large amount of data in a short period of time, it will trigger the sliding window and fast retransmission. Fast retransmission is equivalent to a variant of timeout retransmission under the sliding window.
As mentioned above, the sliding window size is variable, and changing it controls the sender's rate: the larger the window, the more data sent per unit time and the higher the efficiency; the smaller the window, the lower the efficiency. Normally higher efficiency is better, but only on the premise of reliability. If the sender transmits faster than the receiver can process, packets get dropped and must be retransmitted anyway. A more reasonable approach is for the receiver to tell the sender that it is sending too fast. This is "flow control."
As shown in the figure above, recall the receive buffer in the kernel: the receiver reports the amount of free space in it as the window size, carried in the 16-bit window size field of the TCP header introduced earlier. That field is only meaningful in packets with the ACK flag set.
As shown in the figure above, each ACK carries the current window size back to the sender, achieving flow control. When the reported window drops to 0, the sender periodically transmits a probe packet containing no business data, purely to trigger an ACK and learn whether buffer space has freed up.
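The flow-control behavior just described, advertising the free buffer space and pausing on a zero window, can be modeled roughly as follows; the sizes and names are illustrative:

```java
// A sketch of flow control: the advertised window is the free space in
// the receive buffer, and when it reaches 0 the sender must pause and
// send periodic payload-free window probes until space frees up.
public class FlowControl {
    public final int bufferCapacity;
    public int buffered; // bytes received but not yet read by the application

    public FlowControl(int capacity) { this.bufferCapacity = capacity; }

    // Window size carried back in every ACK.
    public int advertisedWindow() { return bufferCapacity - buffered; }

    public void onReceive(int bytes)         { buffered += bytes; }
    public void onApplicationRead(int bytes) { buffered -= bytes; }

    public boolean senderMustPause() { return advertisedWindow() == 0; }
}
```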
Congestion control is very similar to flow control, both of which are mechanisms used in conjunction with sliding windows.
As shown in the figure above, the links in the network are very complex, and any node along them may limit the sender's speed. The idea of congestion control is to treat the intermediate network as a single black box, however complex it is, and discover the most appropriate window size by experiment.
The figure above shows a round of congestion control. Transmission starts with a deliberately small window (slow start), since the network's congestion level is unknown. The window then grows exponentially; after reaching a threshold it grows linearly; and when it grows far enough that packet loss occurs, the window is immediately shrunk. There are two ways to shrink:
(1) Collapse all the way back to the initial slow-start window and repeat the whole process (this approach has been abandoned)
(2) Cut the window in half and then resume linear growth (the method actually used)
Congestion control is to use experiments to find a suitable window size. If there is a lot of packet loss, the window size will be reduced, and if there is no packet loss, the window size will be increased.
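The growth curve described above, exponential below the threshold, linear above it, and halved on loss, can be sketched as follows; the units (segments) and initial values are illustrative assumptions:

```java
// A sketch of the congestion-control curve: slow start doubles the
// window each round until the threshold, congestion avoidance then adds
// one segment per round, and on loss the window is halved (method (2)
// above) rather than reset to the start.
public class CongestionControl {
    public int cwnd = 1;      // congestion window, in segments
    public int ssthresh = 16; // slow-start threshold

    public void onAckRound() {
        if (cwnd < ssthresh) cwnd *= 2; // slow start: exponential growth
        else cwnd += 1;                 // congestion avoidance: linear growth
    }

    public void onPacketLoss() {
        ssthresh = Math.max(cwnd / 2, 1);
        cwnd = ssthresh;                // halve, then grow linearly again
    }
}
```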
As the name suggests, delayed acknowledgment means waiting a while before returning the ACK. This, too, comes back to window size: delaying the ACK gives the receiving application more time to consume data from the receive buffer, so the free space grows, the window size carried in the ACK grows, and the sender can send more data in a batch.
There are two ways to delay the response:
(1) Specify the delay according to a certain time
(2) According to the amount of data received
The above two strategies are used in combination.
Piggyback acknowledgment has in fact already appeared: it is a mechanism for improving transmission efficiency in which an ACK rides in the same packet as other data, as with ACK and SYN in the three-way handshake. In the four-way wave, by contrast, ACK and FIN are normally sent at different times and so cannot be piggybacked; but with delayed acknowledgment, the ACK need not be sent immediately, and its transmission can be merged with the FIN into a single packet.
Being byte-stream oriented is intrinsic to TCP, and it raises one issue that deserves attention: the sticky packet problem, caused by the lack of boundaries between different application-layer messages in the stream. Because TCP is a byte stream, the receiver may read many bytes at a time or just one, so messages easily run together.
There are two solutions to the above-mentioned sticky package problem:
(1) Use separators
Any symbol works as long as it cannot appear inside the message body.
(2) Agree on the length of the data packet
In most cases, however, Java programmers do not use TCP directly. They use ready-made protocols such as HTTP, or build network communication on tools such as Protocol Buffers or Dubbo, all of which solve the sticky packet problem internally.
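For illustration, here is a minimal sketch of solution (2), length-prefixed framing: every message is preceded by a 4-byte length, so the receiver can split the byte stream back into whole messages no matter how the bytes were chunked in transit. The class name is my own:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// A sketch of length-prefixed framing over a byte stream: each message
// is written as [4-byte length][body], and the decoder walks the stream
// reconstructing message boundaries.
public class LengthFraming {
    public static byte[] encode(List<byte[]> messages) {
        int total = 0;
        for (byte[] m : messages) total += 4 + m.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] m : messages) {
            buf.putInt(m.length); // the length prefix marks the boundary
            buf.put(m);
        }
        return buf.array();
    }

    public static List<byte[]> decode(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream);
        List<byte[]> messages = new ArrayList<>();
        while (buf.remaining() >= 4) {
            byte[] m = new byte[buf.getInt()];
            buf.get(m);
            messages.add(m);
        }
        return messages;
    }
}
```

A real decoder must also handle a partially arrived frame (buffer the tail until the rest arrives); this sketch assumes the whole stream is available.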
Finally, consider how TCP behaves in several abnormal situations.
(1) The process at one end of the communication crashes.
The operating system still completes the four-way wave on the process's behalf and then releases the PCB.
(2) A host is shut down (normal process)
The first possibility is that the operating system completes the four-way wave before shutting down. The second is that the peer, after receiving the FIN and sending its own FIN, never gets an ACK back; after retransmitting the FIN several times, it unilaterally deletes the connection. As for the host that was shut down, its connection state lived in memory and is naturally gone.
(3) A host loses power
When the powered-off host is the server, the client's data packets get no ACK and are retransmitted; after several fruitless retransmissions, the client deletes the connection.
When the powered-off host is the client, the server, having received nothing for a long time, periodically sends a payload-free heartbeat packet purely to trigger an ACK. A healthy client answers with an ACK; a dead one answers nothing. After several consecutive unanswered heartbeats, the server concludes the client is gone and deletes the connection information.
In addition, although TCP has such keep-alive heartbeats, their period is long: detecting a failed client this way often takes minutes. In real development, heartbeats are therefore implemented at the application layer with a much shorter period (seconds or milliseconds): A sends B a ping, B answers A with a pong, and once a device fails, the problem is discovered quickly.
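An application-layer heartbeat of the kind just described can be sketched as a simple missed-pong counter; the timing logic is omitted, and the threshold of 3 is an assumption:

```java
// A sketch of an application-layer heartbeat: A pings B on a timer, and
// if several pings in a row go unanswered, A declares the peer dead.
public class Heartbeat {
    static final int MAX_MISSED = 3;
    private int missed = 0;

    public void onPingSent()     { missed++; }   // ping sent, pong not yet seen
    public void onPongReceived() { missed = 0; } // peer answered, it is alive
    public boolean peerIsDead()  { return missed >= MAX_MISSED; }
}
```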
(4) The network cable is disconnected
This is essentially the same as case (3). The sending side, receiving no ACK, times out and retransmits, eventually sends RST, and then unilaterally deletes the connection. The receiving side, seeing no data, sends heartbeat packets and unilaterally deletes the connection information when they go unanswered.
Two flags in the TCP header structure have not yet been mentioned: PSH and URG. PSH urges the other side to return a response as soon as possible. URG works together with the header's urgent pointer field to control TCP out-of-band data transmission.
Out-of-band data transmission means that in addition to business data, there are some special data packets used to control the working mechanism of TCP itself.