19.3. IP Options

Because of the overhead of processing IP options, they have never been used much. In the next sections, we will go one by one through the IP options handled by the Linux kernel and see how they are processed.

Here are the main APIs involved with IP option management, all of them defined in net/ipv4/ip_options.c. To understand some of them, remember that not all of the IP options of a packet need to be replicated in all of its fragments.
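Whether an option must be replicated in every fragment is encoded in the option type itself: RFC 791 reserves the high-order bit of the type byte as the "copied" flag. The following is a minimal user-space sketch of that test, not kernel code. The names OPT_COPY_FLAG, opt_is_copied, and check_copied_flag are hypothetical and introduced only for illustration; the numeric option types are the standard ones.

```c
#include <assert.h>
#include <stdint.h>

#define OPT_COPY_FLAG 0x80  /* high-order bit of the option type (RFC 791) */

/* Hypothetical helper: nonzero if an option of this type must be
 * present in every fragment, zero if only the first fragment needs it. */
static int opt_is_copied(uint8_t type)
{
    return (type & OPT_COPY_FLAG) != 0;
}

/* Self-check against the standard option type values. */
static void check_copied_flag(void)
{
    assert(!opt_is_copied(7));    /* Record Route: first fragment only */
    assert(!opt_is_copied(68));   /* Timestamp: first fragment only */
    assert(opt_is_copied(131));   /* Loose Source Routing: every fragment */
    assert(opt_is_copied(137));   /* Strict Source Routing: every fragment */
    assert(opt_is_copied(148));   /* Router Alert: every fragment */
}
```

The sketch only classifies option types; it does not modify a header.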
Now let's see how the functions are used in practice. Because you have not yet seen the internals of all the functions in Figure 18-1 in Chapter 18, you may not understand everything at this stage. You can come back to this second part of the section once you are familiar with the other functions.

As you saw in Figure 18-1 in Chapter 18, different paths can lead to the transmission of a packet, and they handle the IP options in slightly different ways. I will cover two cases and leave you the others as an exercise.

19.3.1. Option Processing

The options of an ingress IP packet are first parsed with the ip_options_compile function, described in the next section. As mentioned in the previous section, the options are then processed by different routines at different times, depending on whether a packet is to be forwarded, fragmented, etc. Figure 19-3 summarizes where the key routines introduced in the previous section (with a lighter color) are called for ingress packets and for locally generated packets.

When an ingress packet is to be forwarded, ip_rcv_finish calls ip_forward (via dst_input) to take care of the forwarding process. ip_forward handles the Router Alert option, if present, and makes sure that there are no problems with the strict source route option. Then it asks ip_forward_finish to complete the job of forwarding. The latter can behave differently depending on whether the header contains options. Let's suppose the packet had options. In this case, ip_forward_finish calls ip_forward_options to handle those options that should be processed when forwarding a packet, and then calls dst_output to carry out the actual transmission. As shown in Figure 18-1 in Chapter 18, dst_output ends up calling ip_output when the ingress IP packet needs to be forwarded.

Figure 19-3. (a) Ingress packets; (b) locally generated packets

At this stage, the IP header is ready to be used, because all of the options have been processed.
If there was no fragmentation, options processing is finished. However, if the packet needs to be fragmented, ip_output needs to make sure that only the first fragment includes all of the options; the others should have only a subset, according to Table 18-1 in Chapter 18. In this case, ip_output calls ip_fragment. Once the first fragment is done, ip_fragment uses ip_options_fragment to clear the options that are not needed for the subsequent fragments. This way, ip_fragment can keep copying the IP header from the original packet and have all the options correct.

In a locally generated packet, options are handled with ip_options_build. We will see in Chapter 21 how that function is used by ip_queue_xmit and ip_push_pending_frames.

19.3.2. Option Parsing

Parsing, here, means extracting the IP options from the format in which they are stored in an IP packet's header and storing them in a structure called ip_options that is more convenient for program code to handle. Storing them in a dedicated data structure is useful because different options are handled in different parts of the IP code. ip_options_compile only parses the options; it does not process them. We saw in the previous section where options are processed.

The function ip_options_compile is called in two different cases:
Let's now analyze how ip_options_compile parses the options of an IP packet's header. This is the function's prototype:

    int ip_options_compile(struct ip_options * opt, struct sk_buff * skb)

The values of the two input parameters let the function know the context in which it is being called:
This means that depending on the function's context, the IP header is stored in different places. When transmitting a locally generated packet, opt is not NULL and opt->__data contains a pointer to an IP header that was previously partially generated by the caller. If instead the function is processing an ingress packet, the header is contained in the skb input buffer and opt is NULL. In this second case, the ip_options structure is stored in skb->cb.

ip_options_compile initializes local variables such as optptr according to where the IP header is located (i.e., skb->nh or opt->__data). The value of skb is also often used by ip_options_compile to distinguish between the two previous cases.

In both cases (transmit and forward), you need to fill in opt. The only choices to make are where to get the input IP header to parse and where to store the result.

    if (!opt) {
        /* Ingress packet: the result goes into the skb control buffer */
        opt = &(IPCB(skb)->opt);
        memset(opt, 0, sizeof(struct ip_options));
        iph = skb->nh.raw;
        opt->optlen = ((struct iphdr *)iph)->ihl*4 - sizeof(struct iphdr);
        optptr = iph + sizeof(struct iphdr);
        opt->is_data = 0;
    } else {
        /* The caller supplied the ip_options structure */
        optptr = opt->is_data ? opt->__data : (unsigned char*)&(skb->nh.iph[1]);
        iph = optptr - sizeof(struct iphdr);
    }

If parsing fails, ip_options_compile returns immediately. The caller will handle the event in one of the following ways, depending on whether the options were used by a received or transmitted packet:
Among the possible reasons for a parsing failure are:
Currently, there are only two single-byte options:
The main for loop simply goes option by option and stores the result of parsing in the output ip_options structure opt. The code inside the loop may look complex, but actually it is very easy to read if you take into consideration the following points:
Figure 19-4. ip_options_compile's local variables' values in the middle of an execution

There cannot be other options after the IPOPT_END option. Therefore, as soon as one is found, whatever follows it is overwritten with more IPOPT_END options.

The basic sanity checks for multibyte options include:
Since the length of each option includes the first two bytes (type and length) and since it starts counting from 1 (not 0), if length is less than 2 or bigger than the block of options left to analyze, there is an error:

    if (optlen<2 || optlen>l) {
        pp_ptr = optptr;
        goto error;
    }

Note that some options (such as TIMESTAMP) have a minimum length bigger than 2, and thus the general check just shown is necessary but not always sufficient. The more specific checks are inside the per-option handlers.

When an error is found in the options, a special ICMP message has to be sent back to the sender. This ICMP packet includes the original IP header, eight bytes of the IP payload, and an offset that points to where the error was found. The eight bytes of the IP payload consist of the start of the L4 header and usually include the L4 port numbers; this allows the receiver of the ICMP error message to find the socket associated with the faulty IP packet (more details in Chapter 25). Before jumping to the error-handling code, the function initializes pp_ptr to point to the place where the problem was found.

The switch statement uses, as its discriminator, the option type field. Therefore, each option is handled by a different case statement, exactly as was done before for the single-byte options:

    switch (*optptr)

The next sections analyze the multibyte options one by one, and Figures 19-5(a) and 19-5(b) show the big picture. The two obsolete options SEC and SIC are recognized but not processed[*] (see RFC 1812).
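The generic part of the loop (single-byte options, the length check, and the offset of the faulty byte) can be sketched in user space as follows. This is a minimal sketch, not the kernel code: walk_options, OPT_END, and OPT_NOOP are hypothetical names introduced here (the two constants mirror the values of IPOPT_END and IPOPT_NOOP), and the per-option handlers are left out. The explicit l < 2 guard is an addition so that the sketch never reads past its own input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_END  0  /* same value as IPOPT_END  */
#define OPT_NOOP 1  /* same value as IPOPT_NOOP */

/*
 * Hypothetical helper: walks a block of IP options and applies only the
 * generic checks. Returns -1 if the block passes them, otherwise the
 * offset (from the start of the block) of the option that failed.
 */
static int walk_options(uint8_t *opts, int len)
{
    uint8_t *optptr = opts;
    int l = len;                 /* bytes left to analyze */

    assert(opts != NULL);

    while (l > 0) {
        int optlen;

        if (*optptr == OPT_END) {
            /* Nothing may follow END: overwrite the rest with END. */
            for (optptr++, l--; l > 0; optptr++, l--)
                *optptr = OPT_END;
            break;
        }
        if (*optptr == OPT_NOOP) {
            /* Single-byte option: there is no length field to read. */
            optptr++;
            l--;
            continue;
        }

        /* Multibyte option: byte 1 is the length, type and length included. */
        if (l < 2)
            return (int)(optptr - opts);
        optlen = optptr[1];
        if (optlen < 2 || optlen > l)
            return (int)(optptr - opts);

        /* A real parser would dispatch on *optptr here. */
        optptr += optlen;
        l -= optlen;
    }
    return -1;
}
```

For instance, a block holding a NOOP followed by an option whose length byte is 1 fails at offset 1, which is the value a caller would use to point at the error.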
19.3.2.1. Option: strict and loose Source Routing

Only one Source Routing option can appear in a header. The flag opt->srr is used to detect that condition: if the following code does not find any error in the option, it sets that flag. If another option of the same type appears later in the header, the error will be detected. opt->is_strictroute is used to tell the caller whether the Source Routing option was loose or strict. The section "ip_forward Function" in Chapter 20 shows how packets are dropped if they cannot reach their destinations while respecting the Source Routing rules.

The option is considered faulty if the length of the option (including type and length) is less than 3. This is because the value has to contain the type, length, and pointer fields. At the same time, pointer cannot have a value smaller than 4 because the first three bytes of the option are already used by the type, length, and pointer fields.

A NULL skb input parameter means that ip_options_compile has been called to parse the options of an outgoing packet (generated locally, not forwarded). In that case, the first IP address in the array of addresses provided by user space is saved in opt->faddr and then removed from the array by shifting the other elements of the array back one position with a memmove operation. This address will be retrieved later by the functions described in Chapter 21 (ip_queue_xmit and the users of ip_append_data), so that they know the destination IP address. An easy-to-follow example of the use of opt->faddr can be found in the function udp_sendmsg.

    if (!skb) {
        if (optptr[2] != 4 || optlen < 7 || ((optlen-3) & 3)) {
            pp_ptr = optptr + 1;
            goto error;
        }
        /* Save the first address, then shift the others back one slot */
        memcpy(&opt->faddr, &optptr[3], 4);
        if (optlen > 7)
            memmove(&optptr[3], &optptr[7], optlen-7);
    }
    opt->is_strictroute = (optptr[0] == IPOPT_SSRR);
    opt->srr = optptr - iph;
    break;

Figure 19-5a. ip_options_compile overview

Figure 19-5b. ip_options_compile overview

19.3.2.2. Option: Record Route

For the Record Route option, as for Timestamp, the sender reserves the part of the header it will use in advance. Because of this, when processing the option, new elements are added to the header only if there is some room left. If there is space, the ip_options_compile function sets the flag rr_needaddr to tell the routing subsystem to write the IP address of the outgoing interface into the IP header once the routing decision is taken.[*] Note that the list of IP addresses includes the transmitting interface's address if the options belong to a locally generated packet.
    if (optptr[2] <= optlen) {
        if (optptr[2]+3 > optlen) {
            pp_ptr = optptr + 2;
            goto error;
        }
        if (skb) {
            memcpy(&optptr[optptr[2]-1], &rt->rt_spec_dst, 4);
            opt->is_changed = 1;
        }
        optptr[2] += 4;
        opt->rr_needaddr = 1;
    }
    opt->rr = optptr - iph;
    break;

Since skb is non-NULL only when you are processing the options of an ingress packet, this piece of code simply copies the preferred source IP address into the list of addresses being recorded in the header, and updates the flag is_changed, which will force the IP checksum to be updated. See the section "Preferred Source Address Selection" in Chapter 35 for the reason why the rt_spec_dst IP address is used.

Whether the address is written in the block of code shown here, because the packet is being forwarded, or will be written later thanks to the flag rr_needaddr that is set later, the pointer field of the option is moved ahead four bytes (the size of the IP address). This explains why ip_forward_options (which will be executed if the packet we are processing is being forwarded) will have to go back four bytes to write the IP address into the right position.

19.3.2.3. Option: Timestamp

Because optlen represents the length of the option being analyzed, the following if statement simply checks whether any space is left to store the new information. In this case, the length of the option represents the space reserved by the transmitter (not the space used so far).

    if (optptr[2] <= optlen) {
        __u32 * timeptr = NULL;

The handling of the option depends on the suboption specified by the sub-type field in Figure 18-8 in Chapter 18, but the three suboptions are handled in the same general way. Regardless of the subtype, whoever is going to handle the option needs two pieces of information (which will be stored in the ip_options structure):
If a timestamp needs to be recorded (this would be true for the IPOPT_TS_TSONLY and IPOPT_TS_TSANDADDR cases), timeptr is initialized to point to the place inside the IP header where it should be written. Note also that timeptr is initialized only when skb is not NULL, which is the case when the option belongs to an ingress packet (as opposed to one that is locally generated). We already saw in the section "Option Parsing" that ip_options_compile can also be called when handling locally generated packets. In that case, skb would be NULL, so timeptr would not be initialized (i.e., it would be left NULL) and no timestamp would be recorded in the header. There is nothing wrong here, because the timestamp will be put there by ip_options_build. That function will store the timestamp because opt->ts_needtime equals 1. The only difference between processing an ingress packet to be forwarded and a locally generated packet is that in the former case, a timestamp is added to the IP header and the checksum has to be recomputed (so opt->is_changed needs to be set as well).

When the subcode is IPOPT_TS_PRESPEC, the timestamp has to be added only when the next IP address to match is local to the system. The function used to make that check is inet_addr_type; here are the main return values:
Since local broadcasts and registered multicast addresses could be considered local (i.e., addresses the system listens to), the following piece of code that checks RTN_UNICAST does exactly what we want: it determines whether the address is local.

    {
        u32 addr;
        memcpy(&addr, &optptr[optptr[2]-1], 4);
        if (inet_addr_type(addr) == RTN_UNICAST)
            break;
        if (skb)
            timeptr = (__u32*)&optptr[optptr[2]+3];
    }
    opt->ts_needtime = 1;

Depending on the suboption being processed, the timestamp has to be written at a different offset within the IP header. The first part initializes timeptr accordingly, and the second part copies the timestamp to the right position. Depending on the suboption, the ts_needtime and ts_needaddr flags are also initialized.

    if (timeptr) {
        struct timeval tv;
        __u32 midtime;
        do_gettimeofday(&tv);
        /* Milliseconds since midnight, in network byte order */
        midtime = htonl((tv.tv_sec % 86400) * 1000 + tv.tv_usec / 1000);
        memcpy(timeptr, &midtime, sizeof(__u32));
        opt->is_changed = 1;
    }

This last part takes care of the counter overflow we described in the section "Timestamp Option" in Chapter 18.

    unsigned overflow = optptr[3]>>4;
    if (overflow == 15) {
        pp_ptr = optptr + 3;
        goto error;
    }
    opt->ts = optptr - iph;
    if (skb) {
        optptr[3] = (optptr[3]&0xF)|((overflow+1)<<4);
        opt->is_changed = 1;
    }

19.3.2.4. Option: Router Alert

As we explained in the section "Router Alert Option" in Chapter 18, the last two bytes of this option must be zero. If this option passes the sanity check, ip_options_compile initializes the router_alert flag so that later ip_forward will handle it accordingly. (opt->router_alert is simply treated as a Boolean: zero or nonzero.)

    if (optptr[2] == 0 && optptr[3] == 0)
        opt->router_alert = optptr - iph;

19.3.2.5. Handling parsing errors

If the error was found in a locally generated packet (skb==NULL), the function simply returns an error that will have to be handled by the caller.
If instead it was found on a received IP packet, an ICMP error message has to be sent back to the source:

    error:
        if (skb) {
            icmp_send(skb, ICMP_PARAMETERPROB, 0, htonl((pp_ptr-iph)<<24));
        }
        return -EINVAL;
    }
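To see what the last argument to icmp_send carries, here is a small user-space sketch, assuming a POSIX system for htonl. The helpers param_prob_info and wire_pointer are hypothetical names introduced for illustration. The sketch shows that shifting the offset left by 24 bits and converting to network byte order places the offset in the first byte of the 32-bit word as it appears on the wire, which is where the ICMP Parameter Problem message carries its pointer (RFC 792).

```c
#include <arpa/inet.h>  /* htonl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds the 32-bit value from the offset of the
 * faulty byte (pp_ptr) relative to the start of the IP header (iph). */
static uint32_t param_prob_info(const uint8_t *iph, const uint8_t *pp_ptr)
{
    assert(pp_ptr >= iph);
    return htonl((uint32_t)(pp_ptr - iph) << 24);
}

/* Hypothetical helper: returns the first byte of the word in memory
 * order, i.e., the byte that would be transmitted first. */
static uint8_t wire_pointer(uint32_t info)
{
    uint8_t bytes[4];

    memcpy(bytes, &info, sizeof(bytes));
    return bytes[0];
}
```

Since an IP header is at most 60 bytes long, the offset always fits in that single byte.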