KS2007: The greater kernel ecosystem and user-space APIs

By Jonathan Corbet
September 6, 2007

LWN.net Kernel Summit 2007 coverage

Once upon a time, the kernel exported a small set of system calls which made up almost the entire interface with user space. In current times, that interface has grown quite a bit more complex. For all practical purposes, the bottom layer of the system now consists of the kernel plus a fair amount of user-space software - udev, HAL, X, etc. - which presents the interface that the user actually sees. A panel at the 2007 Kernel Summit made up of Greg Kroah-Hartman, David Zeuthen, Kay Sievers, and David Airlie looked at issues involving this combined software layer.

No discussion of the user-space interface is complete without bringing up sysfs and its well-known habit of breaking applications. There are several things being done to minimize sysfs-related problems in the future. The kernel developers have taken a while to learn how to design and manage this interface, and how to represent things in ways that don't break. There is an ongoing effort to break the much-maligned coupling between sysfs and internal kernel data structures. And there is an education effort aimed at helping user-space developers avoid using sysfs in ways which will break in the future. The key here is to bear in mind that things can move as the structure of the system changes; they don't necessarily stay put even over a single reboot cycle. Any application which assumes that the system's hardware configuration is stable will break sooner or later.
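For code which does touch sysfs directly, the "things can move" advice might be sketched as follows: look a device up by class every time it is needed, and resolve the entry to wherever it currently lives, rather than storing a full device path. This is only an illustration and not something presented at the summit; the helper name find_class_devices and the sysfs_root parameter are made up for the example, and it assumes that entries under the class directory are symbolic links into the device tree.

```python
import os


def find_class_devices(class_name, sysfs_root="/sys"):
    """Map device name -> current resolved path for one device class.

    The result is a snapshot: call this again when the device is needed
    instead of caching the paths, since devices can move between boots
    or when hardware is added and removed.
    """
    class_dir = os.path.join(sysfs_root, "class", class_name)
    try:
        names = sorted(os.listdir(class_dir))
    except FileNotFoundError:
        # The class may simply not exist on this system.
        return {}
    # Follow the class entry to wherever the device lives right now.
    return {
        name: os.path.realpath(os.path.join(class_dir, name))
        for name in names
    }
```

A caller would ask for, say, find_class_devices("net") and pick the interface by name, never by the position of the device in the tree.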

Part of this problem is that the dynamic tree structure implemented by sysfs is hard for application developers to work with. The simple, flat namespace found in /dev was much easier. User-space developers who don't want to deal directly with sysfs should use the libraries which are part of HAL. The old libsysfs library is gone forevermore; libhal is the new libsysfs. Not everybody in the room agreed that HAL is the way of the future, but that does appear to be the way things are going.

Hidden file descriptors were discussed briefly. Linus said that he thought it was a reasonable idea, but that there have not been a whole lot of developers screaming for that feature. Unless that changes, hidden file descriptors will probably remain outside of the mainline.

On the X front, much of the work at the moment is aimed at moving video mode setting into the kernel. There are a number of tricky transition issues to take care of; once the kernel is in charge of video modes, it really will not do to have user-space programs changing them behind its back. So kernel-space mode setting will likely remain disabled until the distributor sets a flag indicating that user-space knows not to try to play with the hardware directly.

There were some questions about how some of the video driver code is managed. This code lives in a repository which provides drivers for both Linux and BSD; there are a certain number of macros in the code designed to make that support easier to maintain. It's a sort of favor being done for the BSD world, and David sees no real need to stop doing that for now. The in-kernel mode setting may force a change, though, as the BSD side is not interested in doing things that way.

From here, it was a fairly straightforward transition into the next session, which covered review of user-space API additions - system calls in particular. Michael Kerrisk presented an abbreviated version of his LCE talk on system call review; it was generally received well.

Christoph Hellwig asked if anybody had reviewed the timerfd() and signalfd() system calls before they were merged. What followed was one of the few times all day that the room was silent.

Part of Michael's proposal is that new system calls should come equipped with manual pages. It was suggested that this requirement will be hard to enforce unless the man pages are packaged with the kernel itself. That led to an interesting question: the man pages, as currently written, document the system call interface as presented by the C library. But the API exported directly by the kernel can be different, and often is. Which API should be documented? It seems that the kernel-implemented API is the one to cover, especially considering that glibc is not the only C library and that other library implementors may well be very interested in that information.
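A small illustration of how the two APIs can differ, using getcwd(): the C library wrapper returns a pointer to the caller's buffer, while the raw system call returns the number of bytes it filled in, terminating null byte included. The sketch below is an assumption-laden example, not anything shown in the session: raw_getcwd and the SYS_GETCWD table are names invented here, the system call numbers are hard-coded for two 64-bit Linux architectures only, and the function returns None anywhere else rather than guess.

```python
import ctypes
import os
import platform
import sys

# System call numbers are architecture-specific. Only two 64-bit Linux
# architectures are listed; anything else is skipped, not guessed.
SYS_GETCWD = {"x86_64": 79, "aarch64": 17}


def raw_getcwd(bufsize=4096):
    """Invoke the getcwd system call directly, bypassing the libc wrapper.

    Returns (kernel_return_value, path_bytes), or None on platforms this
    sketch does not cover.
    """
    if not sys.platform.startswith("linux"):
        return None
    if ctypes.sizeof(ctypes.c_void_p) != 8:
        return None  # 32-bit processes use a different syscall table
    nr = SYS_GETCWD.get(platform.machine())
    if nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    buf = ctypes.create_string_buffer(bufsize)
    ret = libc.syscall(ctypes.c_long(nr), buf, ctypes.c_size_t(bufsize))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret, buf.value


if __name__ == "__main__":
    print("raw syscall:", raw_getcwd())
    print("via libc:   ", os.getcwdb())
```

A page documenting only the wrapper would leave an author of an alternative C library without the return-value convention that the kernel actually implements.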

From there the discussion went into the idea of including test cases in the kernel tree as well. In fact, perhaps the entire Linux Test Project suite could be so packaged. That may be taking things a little too far, but there was interest in getting a simple set of test cases for new system calls into the kernel. If nothing else, they would help architecture maintainers wire up system calls on their target machines. Christoph Hellwig volunteered to do some of the work to get those tests into the tree, so it might just happen.

Toward the end, the discussion headed back toward review of new system calls. Linus expressed a fear that an overly severe review process would just force system calls underground (in the form of ioctl() commands). No formal decision was made on any sort of review process. But it seems likely that any proposed new system calls will be looked at harder than in the past - at least for a while.

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 11, 2007 18:41 UTC (Tue) by nix (subscriber, #2304) [Link] (6 responses)

[...] the man pages, as currently written, document the system call interface as presented by the C library. But the API exported directly by the kernel can be different, and often is. Which API should be documented?

There's already a scheme for this, and long has been. The syscall docs go into section 2: the docs for the C interface go into section 3.

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 13, 2007 11:14 UTC (Thu) by mkerrisk (subscriber, #1978) [Link] (5 responses)

[...] the man pages, as currently written, document the system call interface as presented by the C library. But the API exported directly by the kernel can be different, and often is. Which API should be documented?
There's already a scheme for this, and long has been. The syscall docs go into section 2: the docs for the C interface go into section 3.

Life is not so simple, on Linux at least (and I suspect the same is true of a number of other Unix implementations): there is a fairly close intertwining of kernel and (g)libc interfaces. Often the glibc wrapper for a system call adds nothing, or very little, on top of the kernel interface. But sometimes the wrapper makes significant changes (e.g., does some manipulation of arguments). Where that is done, the application programmer is almost always interested in the (g)libc interface, rather than the raw kernel interface. The alternative would be to have two man pages for each system call: one in section 2 describing the raw kernel interface, and one in section 3 describing the (g)libc interface. That is kind of clumsy for the following reasons:

often the section 3 page will describe no difference from the section 2 page (i.e., the wrapper does nothing except invoke the syscall); and
in cases where the wrapper does add something to the syscall, the reader needs to read two man pages to get the full picture.

My preference (already embodied in some pages), is to describe all syscalls in section 2 pages, and, if the (g)libc wrapper provides a different behavior/interface, then document that interface in the main text of the section 2 page, and include a NOTE that describes the differences for the raw kernel interface.

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 13, 2007 15:25 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

In the wrapper-and-syscall-nearly-identical case, you could describe the
differences in a NOTE in the section 3 page. It just seems clumsy to have
user-callable stuff documented in section 2: on other Unixes that's not
what it's for.

But you're the manpage maintainer and I'm just a hanger-on, so ignore
me. :)

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 16, 2007 6:31 UTC (Sun) by mkerrisk (subscriber, #1978) [Link] (3 responses)

In the wrapper-and-syscall-nearly-identical case, you could describe the differences in a NOTE in the section 3 page.

Yes, but what I want to avoid is people having to look in two places to get all the information they need. Or looking in just one of those two places and not getting all the info that they require (and not realizing that they don't have all the information, if for example they only look in the section 2 page). Ideas are always welcome!

It just seems clumsy to have user-callable stuff documented in section 2: on other Unixes that's not what it's for.

It is not clear to me that other Unix implementations always have a clear .2 / .3 divide. Lacking the source, it's not easy to be sure what is done in libc before a syscall is invoked.

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 16, 2007 11:07 UTC (Sun) by nix (subscriber, #2304) [Link] (2 responses)

Er, why would people need to simultaneously know the details of the
kernel-level interface (only of interest to people writing libcs) and of
the POSIX interface (only of interest to people using libcs).

It seems to me that your division is of most use only to libc authors :/
everyone else will need either one half of the info, or the other half.

(Of course this is relevant only for the small minority of syscall/libc
calls that differ significantly, and as I said, I'm not doing the *work*,
so my opinion is worth basically nothing :) )

Solaris has a clear .2 / .3 divide: it's only that it then subdivides
section 3 into enough subsections that you're then left guessing which of
*those* your page might be in. Let's not do *that*. :)

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 16, 2007 15:16 UTC (Sun) by mkerrisk (subscriber, #1978) [Link] (1 responses)

Er, why would people need to simultaneously know the details of the kernel-level interface (only of interest to people writing libcs) and of the POSIX interface (only of interest to people using libcs).

The majority audience for man pages is of course userland programmers. I suppose that 99.99% (give or take a 9) of those userland programmers use a libc, rather than invoking syscalls directly, and let's say that 99% of them use glibc, and are thus interested in the glibc interface. In terms of documenting the APIs, these are the choices I see:

1. Document the details of the system call in .2, and have .3 pages that note just the differences in the (g)libc API. I dislike this option, because the (userland) programmer must look at two pages to put together the information they need.
2. Document the details of the system call in .2, and have .3 pages that fully document the (g)libc API, reproducing all of the details that also appeared in the corresponding .2 page. I dislike this solution because of the duplication involved. Furthermore, for the many interfaces where the glibc wrapper does nothing, the .2 and .3 pages would be exactly the same.
3. Have .2 pages which include details of the (g)libc API, but clearly indicate those parts where the raw syscall API differs.

So far, I prefer option 3, but I realize it's not perfect, for various reasons, some of which you mention. It may be that someone comes up with a better solution than any of these three.

It seems to me that your division is of most use only to libc authors :/ everyone else will need either one half of the info, or the other half. (Of course this is relevant only for the small minority of syscall/libc calls that differ significantly, and as I said, I'm not doing the *work*, so my opinion is worth basically nothing :) )

But you're polite, and interested, so I can't help but respond ;-).

Solaris has a clear .2 / .3 divide

What I'm suggesting (and it's just a guess), is that maybe the divide on Solaris is no more real than that on Linux. Is *everything* documented in .2 on Solaris a raw syscall? Is *anything* documented in .3 in fact a syscall? I don't know the definitive answer to either question, but I wouldn't be surprised to find that the answer to both questions is "yes".

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 16, 2007 17:10 UTC (Sun) by nix (subscriber, #2304) [Link]

Well, option 2 is implementable by having the nearly-identical subset of
section 2 and 3 manpages generated from a common source (it'd be pretty
trivial to sed out markers that indicate that `this bit is section 2 only'
and `this bit is section 3 only').

But I really will shut up now until I have actual patches implementing
this (medical crud means it may be some time, biology is best observed
from a long way away).
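The common-source idea in the comment above could be sketched roughly as follows. This is an illustration only, not an existing man-pages tool: the %%BEGIN-n / %%END-n marker syntax and the render_section name are hypothetical, and a real implementation might well be a few lines of sed instead.

```python
def render_section(source, section):
    """Return the page text for one man section ("2" or "3").

    Lines between %%BEGIN-n and %%END-n appear only in section n;
    unmarked lines are shared by both pages. Regions do not nest.
    """
    out = []
    only = None  # section the current marked region belongs to, if any
    for line in source.splitlines():
        if line.startswith("%%BEGIN-"):
            only = line[len("%%BEGIN-"):].strip()
        elif line.startswith("%%END-"):
            only = None
        elif only is None or only == section:
            out.append(line)
    return "\n".join(out)


PAGE = """\
getcwd - get current working directory
%%BEGIN-2
Returns the length of the path, including the terminating null byte.
%%END-2
%%BEGIN-3
Returns a pointer to the buffer holding the path.
%%END-3
Fails with ERANGE if the buffer is too small."""

if __name__ == "__main__":
    print(render_section(PAGE, "2"))
    print()
    print(render_section(PAGE, "3"))
```

The shared text is written once, and each generated page carries only the lines which apply to its section.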

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 13, 2007 10:30 UTC (Thu) by gypsumfantastic (guest, #31134) [Link]

"The in-kernel mode setting may force a change, though, as the BSD side is not interested in doing things that way."

Why not? Because it's a Linux idea, and NIH applies? Because they're BSD, and inertia rules triumphant? Just because? Or do they actually have sound technical reasons for rejecting in-kernel mode setting?

KS2007: The greater kernel ecosystem and user-space APIs

Posted Sep 21, 2007 9:32 UTC (Fri) by malcolmparsons (guest, #46787) [Link]

That maybe taking things a little too far, but there was interest in getting a simple set of test cases for new system calls into the kernel.

Rather confusingly, "maybe" has a different meaning to "may be". You wanted "may be" in this sentence.