A closer look at TCP
In this series of exercises, you will explore in more detail the operation of TCP and its congestion control scheme. TCP is a very important protocol in today's Internet since most applications use it to exchange data. We first look at TCP more closely by injecting segments in the Linux TCP stack and analyzing how the stack reacts. Then we study the TCP congestion control scheme.
Injecting segments in the Linux TCP stack
Packet capture tools like tcpdump_ and Wireshark_ are very useful to observe the segments that transport protocols exchange. TCP is a complex protocol that has evolved a lot since its first specification in :rfc:`793`. TCP includes a large number of heuristics that influence how a TCP implementation reacts to various types of events. A TCP implementation interacts with the application through the ``socket`` API.
packetdrill_ is a TCP test suite that was designed to develop unit tests to verify the correct operation of a TCP implementation. A more detailed description of packetdrill_ may be found in [CCB+2013]_. packetdrill_ uses a syntax which is a mix between the C language and the tcpdump_ syntax. To understand the operation of packetdrill_, we first discuss several examples. The TCP implementation in the Linux kernel supports all the recent TCP extensions to improve its performance. For pedagogical reasons, we disable [#fsysctl]_ most of these extensions to use a simple TCP stack. packetdrill_ can be easily installed on recent Linux kernels [#finstall]_.
Let us start with a very simple example that uses packetdrill_ to open a TCP connection on a server running on the Linux kernel. A packetdrill_ script is a sequence of lines that are executed one after the other. There are three main types of lines in a packetdrill_ script.
- packetdrill_ executes a system call and verifies its return value
- packetdrill_ injects [#ftcpdump_pdrill]_ a packet in the instrumented Linux kernel as if it were received from the network
- packetdrill_ compares a packet transmitted by the instrumented Linux kernel with the packet that the script expects
For our first packetdrill_ script, we aim at reproducing the simple connection shown in the figure below.
Let us start with the execution of a system call. A simple example is shown below.
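As a minimal sketch, such a line could be written as follows. The ``...`` is the packetdrill_ shorthand for an argument that the script leaves unspecified. The exact arguments shown here are an assumption chosen to match the description that follows.

.. code-block:: text

   // At time 0, create a TCP socket and expect file descriptor 3
   0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3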
The ``0`` indicates that the system call must be issued immediately. packetdrill_ then executes the system call and verifies that it returns ``3``. If it does, the processing continues. Otherwise the script stops and reports an error.
For this first example, we program packetdrill_ to inject the segments that a client would send. The first step is thus to prepare a :manpage:`socket` that can be used to accept this connection. This socket can be created by using the four system calls below.
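A plausible sequence of four system calls is sketched below. The use of ``SO_REUSEADDR`` and a ``listen`` backlog of ``1`` are assumptions made for illustration. Any sequence that creates, binds and listens on the socket would serve the same purpose.

.. code-block:: text

   // Create the TCP socket; the kernel is expected to return descriptor 3
   0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   // Allow the local address to be reused across test runs
   +0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   // Bind the socket; the address is left to packetdrill
   +0  bind(3, ..., ...) = 0
   // Put the socket in the listening state
   +0  listen(3, 1) = 0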
At this point, the socket is ready to accept incoming TCP connections. packetdrill_ needs to inject a TCP segment in the instrumented Linux stack. This can be done with the line below.
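A line that injects a ``SYN`` segment advertising a window of 1000 bytes and an MSS of 1000 bytes could be sketched as:

.. code-block:: text

   // Inject a SYN, relative sequence number 0, no payload
   +0  < S 0:0(0) win 1000 <mss 1000>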
Each line of a packetdrill_ script starts with a `timing` parameter that indicates at what time the event specified on this line should happen. packetdrill_ supports absolute and relative timings. An absolute timing is simply a number that indicates the delay in seconds between the start of the script and the event. A relative timing is indicated by using ``+`` followed by a number. This number is then the delay in seconds between the previous event and the current line. Additional information may be found in [CCB+2013]_.
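The two kinds of timing can be contrasted with the short sketch below. The delays and the choice of system calls are illustrative assumptions, not part of the connection example.

.. code-block:: text

   // Absolute timing: at the start of the script
   0    socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   // Relative timing: 0.1 seconds after the previous event
   +.1  bind(3, ..., ...) = 0
   // Absolute timing: 0.5 seconds after the start of the script
   0.5  listen(3, 1) = 0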
The description of TCP packets in packetdrill_ uses a syntax that is very close to the tcpdump_ one. The ``+0`` timing indicates that the line is executed immediately after the previous event. The ``<`` sign indicates that packetdrill_ injects a TCP segment and the ``S`` character indicates that the ``SYN`` flag must be set. Like tcpdump_, packetdrill_ uses sequence numbers that are relative to the initial sequence number. The three numbers that follow are the sequence number of the first byte of the payload of the segment (``0``), the sequence number that follows the last byte of the payload (``0`` after the colon) and the length of the payload (``0`` in parentheses) of the ``SYN`` segment. This segment does not contain a valid acknowledgment but advertises a window of 1000 bytes. ``SYN`` segments should also include the ``MSS`` option. In this case, we set the MSS to 1000 bytes. The next line of the packetdrill_ script verifies the reply sent by the instrumented Linux kernel.
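A line that checks for this reply could be sketched as follows. The ``>`` sign marks a segment that the kernel is expected to send.

.. code-block:: text

   // Expect a SYN+ACK acknowledging the injected SYN, with any options
   +0  > S. 0:0(0) ack 1 <...>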
This TCP segment is sent immediately by the stack. The ``SYN`` flag is set and the dot next to the ``S`` character indicates that the ``ACK`` flag is also set. The ``SYN+ACK`` segment does not contain any data but its acknowledgment number is set to 1 (relative to the initial sequence number). For outgoing packets, packetdrill_ does not verify the value of the advertised window. In this line, it also accepts any TCP options (``<...>``).
The third segment of the three-way handshake is sent by packetdrill_ after a delay of 0.1 seconds. The connection is now established and the accept system call will succeed.
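The third segment could be injected with a line such as the one below. The advertised window of 1000 bytes is an assumption carried over from the ``SYN`` segment.

.. code-block:: text

   // 0.1 seconds later, inject a pure ACK (the lone dot) that completes the handshake
   +.1 < . 1:1(0) ack 1 win 1000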
The :manpage:`accept` system call returns a new file descriptor, in this case ``4``. At this point, packetdrill_ can write data on the socket or inject packets.
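A sketch of the corresponding line is shown below. It assumes that the listening socket is file descriptor ``3`` and leaves the address arguments unspecified.

.. code-block:: text

   // Accept the established connection; the new descriptor must be 4
   +0  accept(3, ..., ...) = 4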
packetdrill_ writes 10 bytes of data through the :manpage:`write` system call. The stack immediately sends these 10 bytes inside a segment whose ``Push`` flag is set [#fpush]_. The payload starts at sequence number ``1`` and ends at sequence number ``10``. packetdrill_ replies by injecting an acknowledgment for the entire data after 100 milliseconds.
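This exchange could be sketched with the three lines below. In the tcpdump_-like notation ``1:11(10)``, the second number is the sequence number that follows the last byte of the payload, so the ten bytes cover sequence numbers 1 to 10. The window of 1000 bytes in the injected acknowledgment is an assumption.

.. code-block:: text

   // The application writes 10 bytes on the accepted socket
   +0  write(4, ..., 10) = 10
   // Expect a segment with PSH and ACK set that carries the 10 bytes
   +0  > P. 1:11(10) ack 1
   // 100 milliseconds later, acknowledge all the data
   +.1 < . 1:1(0) ack 11 win 1000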