Large receive offload
High-performance networking is continually faced with a challenge: local networking technologies are getting faster more quickly than processor and memory speeds. So every time that the venerable Ethernet technology provides another speed increment, networking developers must find ways to enable the rest of the system to keep up - even on fast contemporary hardware.
One recurring idea is to push more of the work into the networking hardware itself. TCP offload engines have been around since the days when systems were having trouble keeping up with 10Mb Ethernet, but that technology has always been limited in its acceptance; see this 2005 LWN article for some discussion of why. But some more restrained hardware assist techniques have been more successful; for example, TCP segmentation offload (TSO), where network adapters turn a stream of data into packets for transmission, is well supported under Linux.
Use of TSO can boost networking performance considerably. When one is dealing with thousands of packets every second, even a slight per-packet assist will add up. TSO reduces the amount of work needed to build headers and checksum the data, and it cuts down on the number of times that the driver must program operations into the network adapter. There is, however, no analogous assistance for incoming data. So, if you have two identical Linux servers with one sending a high-bandwidth stream to the other, the receiving side may be barely keeping up with the load while the transmitting side barely breaks a sweat.
Proposals for assistance for packet reception often go under the name "large receive offload" (LRO); the idea was first proposed for Linux in this OLS 2005 talk [PDF]. The initial LRO implementation used hardware features found in Neterion adapters; it never made it into the mainline and little has been heard from that direction since. The LRO idea has recently returned, though, in the form of this patch by Jan-Bernd Themann. Interestingly, the new LRO code does not require any hardware assistance at all.
With Jan-Bernd's patch, a driver must, to support LRO, fill in an LRO manager structure which looks like this:
    #include <linux/inet_lro.h>

    struct net_lro_mgr {
            struct net_device *dev;
            struct net_lro_stats stats;
            unsigned long features;
            u32 ip_summed;      /* Options to be set in generated SKB in page mode */

            int max_desc;       /* Max number of LRO descriptors */
            int max_aggr;       /* Max number of LRO packets to be aggregated */

            struct net_lro_desc *lro_arr;  /* Array of LRO descriptors */

            /*
             * Optimized driver functions
             *
             * get_skb_header: returns tcp and ip header for packet in SKB
             */
            int (*get_skb_header)(struct sk_buff *skb, void **ip_hdr,
                                  void **tcpudp_hdr, u64 *hdr_flags,
                                  void *priv);

            /*
             * get_frag_header: returns mac, tcp and ip header for packet in SKB
             *
             * @hdr_flags: Indicate what kind of LRO has to be done
             *             (IPv4/IPv6/TCP/UDP)
             */
            int (*get_frag_header)(struct skb_frag_struct *frag, void **mac_hdr,
                                   void **ip_hdr, void **tcpudp_hdr,
                                   u64 *hdr_flags, void *priv);
    };
In this structure, dev is the network interface for which LRO is to be implemented; stats contains some statistics which can be queried to see how well LRO is working. The features field controls how the LRO code should feed packets into the networking stack; it has two flags defined currently:
- LRO_F_NAPI says that the driver is NAPI compliant, and that, in particular, packets should be passed upward with netif_receive_skb().
- LRO_F_EXTRACT_VLAN_ID is for drivers with VLAN support. This article won't go further into VLAN support for the simple reason that your editor does not understand it.
Checksum information for the final packets should go into ip_summed. The maximum number of "LRO descriptors" should be stored in max_desc. Each descriptor describes one TCP stream, so the maximum limits the number of streams for which LRO can be done simultaneously. Increasing the maximum requires more memory and will slow things a bit, since packets are matched to streams by way of a linear search. max_aggr is the maximum number of incoming packets which will be aggregated into a single, larger packet. The lro_arr array contains the descriptors for tracking streams; the driver should provide it with enough memory for at least max_desc structures or very unpleasant things are likely to happen.
Finally, there are the get_skb_header() and get_frag_header() methods. Their job is to locate the IP and TCP headers in a packet as quickly as possible. Typically a driver will only provide one of the two functions, depending on how it feeds packets into the LRO aggregation code.
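To make the setup more concrete, here is a minimal sketch of how a driver operating in sk_buff mode might fill in the manager. The names my_priv, my_setup_lro(), my_get_skb_header(), and the specific limits are hypothetical, chosen only for this illustration; the assumption that eth_type_trans() has already been run on each packet is also specific to this sketch.

    #include <linux/netdevice.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <linux/tcp.h>
    #include <linux/inet_lro.h>

    #define MY_LRO_MAX_DESC  8   /* streams tracked at once (hypothetical) */
    #define MY_LRO_MAX_AGGR  64  /* packets merged per super-packet (hypothetical) */

    /* Hypothetical per-adapter private data. */
    struct my_priv {
            struct net_lro_mgr lro_mgr;
            struct net_lro_desc lro_desc[MY_LRO_MAX_DESC];
    };

    /*
     * Locate the IP and TCP headers in a completed sk_buff.  This sketch
     * assumes eth_type_trans() has already been called, so skb->data
     * points at the IP header.
     */
    static int my_get_skb_header(struct sk_buff *skb, void **ip_hdr,
                                 void **tcpudp_hdr, u64 *hdr_flags, void *priv)
    {
            struct iphdr *iph;

            if (skb->protocol != htons(ETH_P_IP))
                    return -1;              /* not IPv4, cannot aggregate */
            iph = (struct iphdr *) skb->data;
            if (iph->protocol != IPPROTO_TCP)
                    return -1;              /* only TCP streams are aggregated */

            *ip_hdr = iph;
            *tcpudp_hdr = (u8 *) iph + iph->ihl * 4;
            *hdr_flags = LRO_IPV4 | LRO_TCP;
            return 0;
    }

    /* Fill in the LRO manager at device-initialization time. */
    static void my_setup_lro(struct my_priv *priv, struct net_device *dev)
    {
            struct net_lro_mgr *mgr = &priv->lro_mgr;

            mgr->dev = dev;
            mgr->features = LRO_F_NAPI;
            mgr->ip_summed = CHECKSUM_UNNECESSARY;
            mgr->max_desc = MY_LRO_MAX_DESC;
            mgr->max_aggr = MY_LRO_MAX_AGGR;
            mgr->lro_arr = priv->lro_desc;  /* at least max_desc descriptors */
            mgr->get_skb_header = my_get_skb_header;
    }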
A driver which receives packets in fully-completed sk_buff structures would normally pass them up directly to the network stack with netif_rx() or netif_receive_skb(). If LRO is being done, instead, the packets should be handed to:
    void lro_receive_skb(struct net_lro_mgr *lro_mgr,
                         struct sk_buff *skb,
                         void *priv);
This function will attempt to identify an LRO descriptor for the given packet, creating one if need be. Then it will try to join that packet with any others in the stream, making one large, fragmented packet. In the process, it will call the driver's get_skb_header() method, passing through the pointer given as priv. If the packet cannot be aggregated with others (it may not be a TCP packet, for example, or it could have TCP options which require it to be processed separately) it will be passed directly to the network stack. Either way, the driver can consider it delivered and move on to its next task.
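For illustration, the receive loop of such a driver might look roughly like the following, continuing the earlier sketch; my_rx() and my_next_rx_skb() are hypothetical stand-ins for however the driver pulls completed packets off its receive ring.

    /*
     * Hypothetical receive loop: completed sk_buffs are handed to the LRO
     * manager instead of going straight to netif_receive_skb().
     * my_next_rx_skb() stands in for the driver's ring-handling code.
     */
    static int my_rx(struct my_priv *priv, int budget)
    {
            struct sk_buff *skb;
            int received = 0;

            while (received < budget &&
                   (skb = my_next_rx_skb(priv)) != NULL) {
                    skb->protocol = eth_type_trans(skb, priv->lro_mgr.dev);
                    lro_receive_skb(&priv->lro_mgr, skb, priv);
                    received++;
            }
            return received;
    }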
Some drivers receive packets directly into memory represented by page structures, constructing the full sk_buff structure after reception. For such drivers, the interface is:
    void lro_receive_frags(struct net_lro_mgr *lro_mgr,
                           struct skb_frag_struct *frags,
                           int len, int true_size,
                           void *priv, __wsum sum);
The LRO code will build the necessary sk_buff structure, perhaps aggregating fragments from several packets, and (sooner or later) feed the results to the network stack. It will call the driver's get_frag_header() method to locate the headers in the first element of the frags array; that method should also ensure that the packet is an IPv4 TCP packet and set LRO_IPV4 and LRO_TCP in the flags argument if so.
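A page-based driver's header lookup might look something like this sketch; the assumption that the Ethernet header sits at the very start of the first fragment is specific to this hypothetical driver and not required by the interface.

    #include <linux/mm.h>           /* page_address() */

    /*
     * Locate the MAC, IP and TCP headers in the first fragment of a
     * page-based packet.  This sketch assumes the Ethernet header starts
     * at frag->page_offset within the page.
     */
    static int my_get_frag_header(struct skb_frag_struct *frag, void **mac_hdr,
                                  void **ip_hdr, void **tcpudp_hdr,
                                  u64 *hdr_flags, void *priv)
    {
            struct ethhdr *eh = (struct ethhdr *)
                    ((u8 *) page_address(frag->page) + frag->page_offset);
            struct iphdr *iph = (struct iphdr *) (eh + 1);

            if (eh->h_proto != htons(ETH_P_IP) || iph->protocol != IPPROTO_TCP)
                    return -1;      /* only IPv4 TCP packets are aggregated */

            *mac_hdr = eh;
            *ip_hdr = iph;
            *tcpudp_hdr = (u8 *) iph + iph->ihl * 4;
            *hdr_flags = LRO_IPV4 | LRO_TCP;
            return 0;
    }

In its receive path, such a driver would then pass the assembled fragment array to lro_receive_frags() rather than constructing the sk_buff itself.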
Combined packets will be pushed up into the network stack whenever max_aggr individual packets have been merged into them. Delaying data for too long while waiting for additional packets is not a good idea, though; occasionally packets should be sent on even if they are not as large as they could be. The function for this job is:
    void lro_flush_all(struct net_lro_mgr *lro_mgr);
It will cause all accumulated packets to be sent on. A logical place for such a call might be at the end of a NAPI driver's poll() method. An individual stream can be flushed with:
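Continuing the earlier sketch, the flush fits naturally at the tail of the poll routine; the surrounding NAPI plumbing (which differs between kernel versions) is omitted here.

    /*
     * Hypothetical poll routine, continuing the earlier sketch.  After
     * processing a batch of packets, push any partially-aggregated
     * super-packets up the stack so data is not delayed until the next
     * poll.  The usual NAPI completion handling is omitted.
     */
    static int my_poll(struct my_priv *priv, int budget)
    {
            int done = my_rx(priv, budget);

            lro_flush_all(&priv->lro_mgr);

            /* ... NAPI completion handling as usual ... */
            return done;
    }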
    void lro_flush_pkt(struct net_lro_mgr *lro_mgr,
                       struct iphdr *iph,
                       struct tcphdr *tcph);
This call will locate the stream associated with the given IP and TCP headers and send its accumulated data onward. It will not add any data associated with the given headers; the packet associated with those headers should have already been added with one of the receive functions if need be.
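As one hypothetical use, a driver that wants to deliver a particular packet outside of LRO could first flush that stream's accumulated data so that in-stream ordering is preserved; as before, the assumption that skb->data points at the IP header is specific to this sketch.

    /*
     * Hypothetical: deliver one packet without aggregation, first flushing
     * whatever has already been accumulated for its stream so that
     * ordering within the stream is preserved.
     */
    static void my_deliver_unaggregated(struct my_priv *priv, struct sk_buff *skb)
    {
            struct iphdr *iph = (struct iphdr *) skb->data;
            struct tcphdr *tcph = (struct tcphdr *) ((u8 *) iph + iph->ihl * 4);

            lro_flush_pkt(&priv->lro_mgr, iph, tcph);
            netif_receive_skb(skb);
    }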
That is, for all practical purposes, the entire interface. One might well wonder how this code can improve performance, given that it is just aggregating packets which have already been received in the usual way by the driver. The answer is that it is reducing the number of packets that the network stack has to work with, cutting the per-packet overhead at higher levels in the stack. A clever driver can, using the struct page approach, also reduce the number of memory allocations required for each packet, which can be a big win. So LRO appears to be worth having, and current plans call for it to be merged in 2.6.24.
Index entries for this article:
    Kernel: Device drivers/Network drivers
    Kernel: Networking
Large receive offload

Posted Aug 4, 2007 7:59 UTC (Sat) by bgoglin (subscriber, #7800)

> The initial LRO implementation used hardware features found in Neterion
> adapters; it never made it into the mainline and little has been heard from
> that direction since.

Actually, a lot has happened in the last year or so. First, the Neterion driver (s2io) has had its own LRO implementation in the mainline since 2.6.17. Then patches were posted to add LRO to the Myri-10G driver (myri10ge) for 2.6.19, but they were rejected because the kernel maintainers did not want two driver-specific implementations; they wanted a generic LRO (which is what Jan-Bernd did in the end). For the same reason, the Chelsio driver (cxgb3) LRO got the same rejection later.

However, the long discussion took an unfair turn when the new NetXen driver (netxen-nic) was merged in 2.6.20 with its own LRO by mistake, which made the myri10ge and cxgb3 maintainers kind of jealous :)

Now that Jan-Bernd has posted this generic LRO patch, drivers are being ported to use it. "skb-mode" drivers can take a look at the patch Jan-Bernd posted to port the eHEA driver. "Page-based" drivers can look at the myri10ge driver patch posted by Andrew Gallatin. He also provided lots of useful input during Jan-Bernd's rework of his initial LRO patch (which was basically designed only for eHEA, i.e. skb mode, with only certain hardware-checksum features, and so on).

Everybody should be happy now, at least as long as the generic LRO performance is as good as the driver-specific LRO performance. So far, only the myri10ge driver can run both the generic LRO from Jan-Bernd and its own specific LRO (not included in the mainline). Andrew confirmed the performance was similar, fortunately.