Ethernet polling and patch-pulling latency
The point of this patch set is to enable low-latency data reception by applications that are willing to busy wait (in the kernel) if data is not available when a read() or poll() operation is performed on a socket. Busy waiting is normally avoided in the kernel, but, if latency matters more than anything else, some users will be willing to accept the cost of spinning in the kernel if it allows them to avoid the cost of context switches when the data arrives. The hope is that providing this functionality in the kernel will lessen the incentive for certain types of users to install user-space networking stacks.
Since this patch set was covered here in May, it has seen a few changes. As was predicted, a setsockopt() option (SO_LL) was added so that the polling behavior could be adjusted on a per-socket basis; previously, all sockets in the system would use busy waiting if the feature was enabled in the kernel. Another flag (POLL_LL) was added for the poll() system call; once again, it causes busy waiting to happen even if the kernel is otherwise configured not to use it. The runtime kernel configuration itself was split into two sysctl knobs: low_latency_read to set the polling time for read() operations, and low_latency_poll for poll() and select(). Setting either knob to zero (the default) disables busy waiting for the associated operation.
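As a concrete illustration, the per-socket knob is set with an ordinary setsockopt() call. The sketch below uses the SO_BUSY_POLL name under which the option ultimately shipped (SO_LL, as posted, did not survive review, as described below); the value is the number of microseconds to busy wait before falling back to normal blocking behavior.

    #include <stdio.h>
    #include <sys/socket.h>

    #ifndef SO_BUSY_POLL
    #define SO_BUSY_POLL 46	/* value from <asm-generic/socket.h> */
    #endif

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int busy_wait_usecs = 50;	/* spin for up to 50 microseconds */

        /* Enable busy polling for this socket only; a zero value
         * restores the normal sleep-and-wake behavior. */
        if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
                                 &busy_wait_usecs, sizeof(busy_wait_usecs)) < 0)
            perror("SO_BUSY_POLL");
        return 0;
    }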
When the time came to push the networking changes for 3.11, networking maintainer David Miller ("Dave") put the low-latency patches at the top of his list of new features. Linus was not impressed, though. He had a number of complaints, ranging from naming and documentation through to various implementation issues and the fact that changes had been made to the core poll() code without going through the usual channels. He later retracted some of his complaints, but still objected to a number of things. For example, he called out code like:
if (ll_flag && can_ll && can_poll_ll(ll_start, ll_time))
saying that it "should have made anybody sane go 'WTF?' and wonder about bad drugs." More seriously, he strongly disliked the "low-latency" name, saying that it obscured the real effect of the patch. That name, he said, had to change; he was not going to accept POLL_LL in the user-space interface, for example, and asked for POLL_BUSY_LOOP instead.
Beyond that, Linus disliked how the core polling code worked, saying that it was more complicated than it needed to be. He made a number of suggestions for improving the implementation. Importantly, he wanted to be sure that polling would not happen if the need_resched flag is set in the current structure. That flag indicates that a higher-priority process is waiting to run on the CPU; when it is set, the current process needs to get out of the way as quickly as possible. Clearly, performing a busy wait for network data would not be the right thing to do in such a situation. Linus did not say that the proposed patch violated that rule, but it was not sufficiently clear to him that things would work as they needed to.
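For illustration, the loop structure Linus was asking for looks roughly like the following. The helper names are stand-ins invented for this sketch, not the code that was actually merged, but need_resched() and cpu_relax() are the real kernel primitives involved.

    /* Illustration only: a bounded busy-poll loop that gives up the CPU
     * as soon as another task needs it or the polling budget runs out.
     * socket_has_data() and poll_time_expired() are stand-in helpers. */
    static int busy_poll_for_data(struct sock *sk, unsigned long end_time)
    {
        while (!socket_has_data(sk)) {
            if (need_resched() || poll_time_expired(end_time))
                return 0;	/* stop spinning; fall back to sleeping */
            cpu_relax();	/* let the CPU breathe between checks */
        }
        return 1;		/* data arrived during the busy wait */
    }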
In response to these perceived shortcomings, Linus refused to pull the entire networking tree, putting just over 1,200 changes on hold. He didn't reject the low-latency work altogether, though; he left the door open for an updated pull request once the problems had been addressed.
As one might imagine, that put a bit of pressure on Eliezer Tamir, the author of the patches in question. The merge window is only two weeks long, so the requested changes needed to be made in a hurry. Eliezer was up to the challenge, though, producing the requested changes in short order. On July 9, Dave posted a new pull request with the updated code; Linus pulled the networking tree the same day, though not before posting a complaint about some unrelated issues.
In this case, the last-minute review clearly improved the quality of the implementation; in particular, the user-visible option to poll() is now more representative of what it really does (SO_LL remains unchanged, but it will become SO_BUSY_WAIT before 3.11 is released). The cost, of course, was undoubtedly a fair amount of adrenaline on Eliezer's part as he imagined Dave busy waiting for the fixes. Better review earlier in the process might have allowed some of these issues to be found and fixed in a more relaxed manner. But review bandwidth is, as is the case in most projects, the most severely limited resource of all.
Posted Jul 11, 2013 5:04 UTC (Thu) by eliezert (subscriber, #35757)
Maybe, just like we give credit to reviewers, when we add their suggestion to the code, we should also credit them when Linus gets angry and asks to revert it. ;)
Posted Jul 11, 2013 18:42 UTC (Thu) by error27 (subscriber, #8346)
Honestly, there will always be people who want to busy loop, so you're right that it's an implementation detail but it's also probably not going to change.
Posted Jul 11, 2013 12:57 UTC (Thu) by nowster (subscriber, #67)
#define KML_DAVE_BUSY_WAIT
perhaps?